Python Dialog Serialization

NOTE: Evaluation of proposal deferred until after the Techdemo2 release.

Description
The current YAML serialization of dialogues relies on a complicated parser built on PyYaml, which is difficult to maintain and extend. In addition, it has been determined that a human-readable serialization of dialogues is not a strict requirement. This document proposes that dialogues be serialized in Python instead of YAML, which would directly leverage the extensibility of the in-memory data structures and minimize the effort required to maintain the parsing code.
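As a rough illustration of what "serialized in Python" could mean, the sketch below writes a dialogue out as a Python expression and reads it back by evaluating that expression. The class names (Response, DialogueSection), their fields, and the helper functions to_source and from_source are all hypothetical and are not taken from the actual dialogue engine. Note that eval executes code, so this approach is only suitable for trusted dialogue files.

```python
from dataclasses import dataclass, field, fields


# Hypothetical in-memory dialogue structures, for illustration only.
@dataclass
class Response:
    text: str
    next_section: str


@dataclass
class DialogueSection:
    section_id: str
    text: str
    responses: list = field(default_factory=list)


def to_source(obj):
    """Render dataclasses, lists and primitives as a Python expression."""
    if isinstance(obj, list):
        return "[" + ", ".join(to_source(item) for item in obj) + "]"
    if hasattr(obj, "__dataclass_fields__"):
        args = ", ".join(
            f"{f.name}={to_source(getattr(obj, f.name))}" for f in fields(obj)
        )
        return f"{type(obj).__name__}({args})"
    return repr(obj)


def from_source(source):
    """Rebuild the objects by evaluating the expression (trusted input only)."""
    namespace = {
        "__builtins__": {},  # expose only the dialogue classes
        "Response": Response,
        "DialogueSection": DialogueSection,
    }
    return eval(source, namespace)


section = DialogueSection("greeting", "Hello.", [Response("Bye.", "end")])
source = to_source(section)
restored = from_source(source)
```

In this sketch, adding a new field to a dialogue class requires no change to the serialization code, which is the kind of maintenance saving the proposal is aiming for.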

Rationale
TODO: write up rationale.

Pros and Cons
TODO: evaluate pros and cons.

Alternatives
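Three alternatives were evaluated: refactoring the existing PyYaml parser, implementing our own YAML parser, and switching to another serialization format.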

Refactor the Existing PyYaml Parser
Much of the complexity of the current YAML parser exists because we did not use PyYaml's datatype tagging feature. PyYaml can generically serialize and deserialize any Python object: the __dict__ of the object is serialized as a YAML mapping, and tags are added to the serialization which declare the type of object that was serialized.

The datatype tagging feature was originally rejected because it cluttered the dialogue files, and also required integrating some of the parsing code into the dialogue classes.
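The tagging behaviour described above can be seen with a minimal sketch such as the following. The DialogueSection class and its fields are hypothetical stand-ins and not the real dialogue classes; yaml.dump is the standard PyYaml entry point, and the example assumes PyYaml is installed.

```python
import yaml


# Hypothetical dialogue class, for illustration only.
class DialogueSection:
    def __init__(self, section_id, text):
        self.section_id = section_id
        self.text = text


section = DialogueSection("greeting", "Hello there")

# Dumping an arbitrary object serializes its __dict__ as a mapping and
# prefixes it with a "!!python/object:<module>.<class>" tag.
serialized = yaml.dump(section, default_flow_style=False)
print(serialized)
```

The printed document begins with a tag naming the module and class of the serialized object, followed by the object's attributes as a mapping. Every nested object would carry a similar tag, which is the clutter that led to the feature being rejected.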

TODO: finish this section

Implement Our Own YAML Parser
Given the shortfalls of PyYaml and the human-readability benefits of YAML, it is tempting to implement our own YAML parser. With the wide range of parser generators available, such as ANTLR, LEX/YACC, and Bison, it would seem that implementing a YAML parser using one of these third-party tools should not be prohibitively difficult.

Both PyParsing and ANTLR were evaluated for their ability to implement a basic YAML parser. PyParsing lets grammars be written directly in Python in an intuitive notation, and isn't strictly speaking a "parser generator", since the grammar is run by a generic parser integrated into the library rather than compiled into parser code. ANTLR is more of a "typical" parser generator: it is implemented in Java, but has Python bindings and the ability to generate Python code.

As it turns out, YAML is a very difficult language to parse because, like Python, it uses indentation to determine scope. PyParsing simply wasn't up to the task, and implementing a lexer/parser with ANTLR proved to be possible but very difficult.
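To show where the difficulty comes from: indentation-based scope cannot be expressed directly in a context-free grammar, so the usual workaround (the one Python's own tokenizer uses) is to have the lexer track a stack of indentation widths and emit explicit INDENT and DEDENT tokens for the parser. The sketch below is a heavily simplified illustration of that idea and not a YAML lexer; the function name tokenize_indentation and the token names are assumptions made for this example, and it ignores tabs, comments, flow collections, and multi-line scalars.

```python
def tokenize_indentation(text):
    """Return (kind, value) tokens, turning indentation changes into
    INDENT/DEDENT tokens that a grammar-based parser can consume."""
    indent_stack = [0]  # widths of the currently open scopes
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines do not affect scope
        width = len(line) - len(line.lstrip(" "))
        if width > indent_stack[-1]:
            # Deeper than the current scope: open a new one.
            indent_stack.append(width)
            tokens.append(("INDENT", width))
        while width < indent_stack[-1]:
            # Shallower: close scopes until the widths line up again.
            indent_stack.pop()
            tokens.append(("DEDENT", width))
        if width != indent_stack[-1]:
            raise ValueError(f"inconsistent indentation: {line!r}")
        tokens.append(("LINE", line.strip()))
    # Close any scopes still open at the end of the input.
    while len(indent_stack) > 1:
        indent_stack.pop()
        tokens.append(("DEDENT", 0))
    return tokens


sample = "a:\n  b: 1\n  c:\n    d: 2\ne: 3\n"
tokens = tokenize_indentation(sample)
```

Even this small fragment needs state that lives outside the grammar, which gives a sense of why a complete YAML lexer is a substantial piece of work.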

Although it would be possible to implement our own YAML parser using ANTLR, the basic parser would take a good deal of time and effort to implement. Once implemented, the parser would likely be less work to maintain than the current PyYaml parser, but it would still need to be updated frequently while the dialogue engine is prototyped and extended. Maintaining it would also require substantial knowledge of how context-free grammars are constructed and parsed.

Switch to Another Serialization Format
Many of the deficiencies attributed to YAML are due to the complicated and unwieldy PyYaml implementation rather than to the language itself. Other human-readable serialization languages exist which have much more mature Python implementations.

XML and JSON are both mature data serialization languages with good Python support.
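For comparison, a JSON round trip needs only the standard library. The sketch below assumes a dialogue can be represented as nested dictionaries and lists; the keys used here (npc_name, start_section, sections, and so on) are hypothetical and do not reflect the actual dialogue file layout.

```python
import json

# Hypothetical dialogue layout, for illustration only.
dialogue = {
    "npc_name": "Guard",
    "start_section": "greeting",
    "sections": [
        {
            "id": "greeting",
            "say": "Halt! Who goes there?",
            "responses": [
                {"text": "A friend.", "goto": "end"},
            ],
        },
    ],
}

# Serialize to human-readable text, then parse it back.
text = json.dumps(dialogue, indent=2, sort_keys=True)
restored = json.loads(text)
```

The trade-off in this sketch is that JSON only carries plain dictionaries, lists, strings, numbers, booleans, and null, so converting between those and the dialogue classes would still have to be written by hand.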

TODO: finish section.