Python Dialog Serialization

From Post-Apocalyptic RPG wiki


This article features a work in progress code proposal.

A work in progress code proposal is incomplete, but is being actively worked on. The proposal may be in various stages of development, from a somewhat organized idea dump to a fully-formed proposal that just needs a quick spell check.

NOTE: Evaluation of proposal deferred until after the Techdemo2 release.



The current YAML serialization of dialogues relies on a complicated parser, implemented with PyYaml, that is difficult to maintain and extend. In addition, it has been determined that a human-readable serialization of dialogues is not a strict requirement. This document proposes that dialogues be serialized in Python instead of YAML, which directly leverages the extensibility of the in-memory data structures and minimizes the effort required to maintain the parsing code.
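As a rough illustration of the idea, a dialogue file could simply be Python source that builds the in-memory objects directly. The class names (Dialogue, Section, Response), their fields, and the load_dialogue helper below are hypothetical and are not taken from the actual dialogue engine.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory dialogue classes, for illustration only.
@dataclass
class Response:
    text: str
    goto: str


@dataclass
class Section:
    id: str
    say: str
    responses: list = field(default_factory=list)


@dataclass
class Dialogue:
    npc_name: str
    start: str
    sections: dict = field(default_factory=dict)


# What a dialogue file written as plain Python source might contain.
DIALOGUE_SOURCE = '''
dialogue = Dialogue(
    npc_name="Old Trader",
    start="greeting",
    sections={
        "greeting": Section(
            id="greeting",
            say="Looking to trade?",
            responses=[Response(text="Show me your wares.", goto="shop")],
        ),
        "shop": Section(id="shop", say="Take a look."),
    },
)
'''


def load_dialogue(source):
    """Execute dialogue source with the dialogue classes in scope."""
    namespace = {"Dialogue": Dialogue, "Section": Section, "Response": Response}
    exec(source, namespace)  # runs arbitrary code: only load trusted files
    return namespace["dialogue"]


dialogue = load_dialogue(DIALOGUE_SOURCE)
print(dialogue.sections[dialogue.start].say)
```

With this approach the "parser" is the Python interpreter itself, so adding a field to a class requires no change to the loading code. The trade-off is that dialogue files are executable, so they have to come from a trusted source.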


TODO: write up rationale.

Pros and Cons

TODO: evaluate pros and cons.


Several alternatives to serializing dialogues in Python were evaluated; each is described below.

Refactor the Existing PyYaml Parser

Much of the complexity of the current YAML parser exists because we did not use PyYaml's datatype tagging feature. PyYaml can generically serialize and deserialize any Python object: the __dict__ of the object is serialized as a YAML mapping, and a tag declaring the type of the serialized object is added to the output.

The datatype tagging feature was originally rejected because it cluttered the dialogue files, and also required integrating some of the parsing code into the dialogue classes.
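The tagging mechanism, and the two drawbacks mentioned above, can be seen in a minimal sketch. It uses PyYaml's yaml.YAMLObject base class; the DialogueSection class, its fields, and the "!DialogueSection" tag are hypothetical and chosen only for illustration.

```python
import yaml


class DialogueSection(yaml.YAMLObject):
    # Serialization details live inside the dialogue class itself,
    # which is one of the objections raised against this approach.
    yaml_tag = "!DialogueSection"
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper

    def __init__(self, id, text):
        self.id = id
        self.text = text


section = DialogueSection("greeting", "Hello, stranger.")

# The object's __dict__ is written as a mapping prefixed by the tag.
dumped = yaml.safe_dump(section)
print(dumped)

# The tag tells the loader which class to rebuild.
restored = yaml.safe_load(dumped)
print(type(restored).__name__, restored.id)
```

Every tagged object adds a "!DialogueSection"-style marker to the file, which is the clutter that led to the feature being rejected originally.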

TODO: finish this section

Implement Our Own YAML Parser

Given the shortcomings of PyYaml and the human-readability benefits of YAML, it is tempting to implement our own YAML parser. With the wide range of parser generators available, such as ANTLR, LEX/YACC, and Bison, it would not be prohibitively difficult to implement a YAML parser using one of these third-party tools.

Both PyParsing and ANTLR were evaluated for their ability to implement a basic YAML parser. PyParsing provides an intuitive grammar language written in Python. It is not, strictly speaking, a "parser generator", since it uses a generic parser that is integrated into the language rather than generating parser code. ANTLR is a more typical parser generator: it is implemented in Java, but it has Python bindings and can generate Python code.

As it turns out, YAML is a very difficult language to parse because, like Python, it uses indentation to determine scope. PyParsing simply wasn't up to the task, and implementing a lexer/parser with ANTLR proved to be possible but very difficult.
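The difficulty comes from the fact that indentation depth cannot be described by the grammar alone; the lexer has to keep a stack of indentation levels and emit explicit block-open and block-close tokens. The following is a simplified sketch of that technique, not the code used in the evaluation. The tokenize_indentation function and the INDENT, DEDENT, and LINE token names are assumptions, and real YAML has many more cases (sequences, flow style, multi-line scalars).

```python
def tokenize_indentation(text):
    """Return a list of (kind, value) tokens, tracking indentation with a stack."""
    indents = [0]  # stack of currently open indentation widths
    tokens = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect scope
        width = len(raw) - len(raw.lstrip(" "))
        if width > indents[-1]:
            # Deeper than the current block: open a new scope.
            indents.append(width)
            tokens.append(("INDENT", width))
        while width < indents[-1]:
            # Shallower: close every scope that is deeper than this line.
            indents.pop()
            tokens.append(("DEDENT", width))
        if width != indents[-1]:
            raise ValueError(f"inconsistent indentation: {raw!r}")
        tokens.append(("LINE", raw.strip()))
    # Close any scopes still open at end of input.
    while len(indents) > 1:
        indents.pop()
        tokens.append(("DEDENT", 0))
    return tokens


sample = "a:\n  b: 1\n  c:\n    d: 2\ne: 3\n"
for token in tokenize_indentation(sample):
    print(token)
```

Once the token stream contains explicit INDENT and DEDENT markers, the rest of the language can be handled by an ordinary grammar, but this stateful lexing step has to be hand-written and maintained alongside the grammar.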

Although it would be possible to implement our own YAML parser using ANTLR, the basic parser would take a good deal of time and effort to implement. Once implemented, the parser would likely require less maintenance than the current PyYaml parser, but it would still need to be updated frequently while the dialogue engine is prototyped and extended. Maintaining it would also require substantial knowledge of how context-free grammars are constructed and parsed.

Switch to Another Serialization Format

Many of the deficiencies attributed to YAML are really due to the complicated and unwieldy PyYaml implementation of the language. Other human-readable serialization languages exist that have much more mature Python implementations.

XML and JSON are both mature data serialization languages with good Python support.
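As a minimal sketch of what the JSON option could look like, the example below round-trips a dialogue through the json module from the Python standard library. The layout of the dialogue dictionary (npc_name, start, sections, and so on) is hypothetical and not the project's actual schema.

```python
import json

# Hypothetical dialogue structure built from plain dicts and lists.
dialogue = {
    "npc_name": "Old Trader",
    "start": "greeting",
    "sections": {
        "greeting": {
            "say": "Looking to trade?",
            "responses": [{"text": "Show me your wares.", "goto": "shop"}],
        },
        "shop": {"say": "Take a look.", "responses": []},
    },
}

# Serialize to human-readable text and parse it back.
text = json.dumps(dialogue, indent=2, sort_keys=True)
restored = json.loads(text)

print(text)
print(restored == dialogue)
```

Note that json.loads returns plain dictionaries and lists, so a conversion step between this structure and the dialogue engine's own classes would still have to be written and maintained.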

TODO: finish section.
