June 28, 2013

Enabling formulas and code highlighting on Blogspot

This blog uses MathJax to embed LaTeX-style formulas and Google Code Prettify for syntax highlighting.

To install these tools, select the Template tab in the blog admin panel and click Edit HTML. Insert the following code just after the opening head tag:
<script src='https://google-code-prettify.googlecode.com/svn/loader/run_prettify.js'/>    
  <script type='text/x-mathjax-config'>//<![CDATA[
    MathJax.Hub.Config({
      extensions: ["tex2jax.js"],
      jax: ["input/TeX", "output/HTML-CSS"],
      displayAlign: "left",
      styles: {
        ".MathJax_Display": {
          //"background-color": "rgb(230,230,230)",
          "padding-left": "4em",
          "float": "left",
          "display": "inline"
        }
      },
      tex2jax: {
        inlineMath: [ ['$','$'] ],
        displayMath: [ ['$$', '$$'] ],
        processEscapes: true
      },
      "HTML-CSS": { availableFonts: ["TeX"] }
    });
  //]]></script>
  <script src='http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default' type='text/javascript'>
</script>   
Then, in posts, you'll be able to use the following syntax to highlight code:
<pre class="prettyprint">
int f(int x)
{
  return 2 * x;
}
</pre>
and the \$ and \$\$ delimiters to insert LaTeX math inline or in display style, respectively.

A couple of off-topic tips. To make the text area adapt to the browser window, search in the same editor for the text .content-outer and edit the code around it as follows (note that each declaration needs its own semicolon outside the comment):
body {
min-width: 100px; /*$(content.width);*/
}
.content-outer, .content-fauxcolumn-outer, .region-inner {
min-width: 100px; /*$(content.width);*/
max-width: 2000px; /*$(content.width);*/
_width: 100%; /*$(content.width);*/
}
Also, in MathJax it's possible to define macros by adding a TeX block to the MathJax.Hub.Config call, as follows:
TeX: {
  Macros: {
    cov: '{\\operatorname{cov}}',
    reals: '{\\mathbb{R}}',
    T: '{\\mathrm{T}}',
    tr: '{\\operatorname{tr}}',
    paren: ['\\left( #1 \\right)', 1],
    bracket: ['\\left[ #1 \\right]', 1],
    brace: ['\\left\\{ #1 \\right\\}', 1]
  }
}
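
As a quick usage sketch, assuming the macros above have been added to the configuration, a post could then contain a display formula like the one below. The formula itself is only an illustration and is not taken from any post on this blog.

```latex
$$
\cov\paren{X, Y} = \operatorname{E}\bracket{X Y^\T} - \operatorname{E}\bracket{X} \operatorname{E}\bracket{Y}^\T
$$
```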

June 25, 2013

Pyson, something between JSON and Python

JSON, by virtue of its minimalist nature, lacks a few features which make it difficult to efficiently represent some instances of structured data.

In particular, JSON only allows the representation of trees, i.e. every node except the root has exactly one parent. This means that a value cannot be shared between several places in a document: it has to be copied each time it is needed.

In addition, JSON's type system is very limited: there is no way to define custom types. This means that most structures must be represented as dictionaries, making it necessary to repeat the schema (i.e. the dictionary keys) along with every instance of the data (i.e. the dictionary values), or requiring ugly mechanisms like special keys (e.g. "__type__") to give each dictionary a label indicating how to interpret the data.

So the idea is to extend JSON with a couple of additional lightweight constructs to solve these issues. In the same spirit that gave rise to JSON, I decided to borrow some syntax from Python, of which JSON is a subset (kind of). So what we have is
$$
\text{JSON} \subset \text{Pyson} \subset \text{Python}
$$

Syntax

The extra syntactic features in Pyson are:
  • The root element is no longer any value but a document, i.e. a list of key-value pairs of the form key = value:
    x = 3
    y = {'a': 3, 'b': 2}
    z = [true, false]
    
  • It is possible to reference values which have been defined previously:
    x = [1,2,3]
    y = {'z': x}
    u = [y, x, y]
    
  • It is possible to define new types and instantiate them:
    DateTime = Type("DateTime", ["year", "month", "day"])
    today = DateTime(2013, 6, 24)
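
Because Pyson is meant to be a subset of Python, the three features above can be tried out directly in a Python interpreter. The following is only a rough sketch, not the Pyson implementation: Type is a hypothetical stand-in built on namedtuple, load_pyson is a made-up helper name, and the JSON literals true, false and null are supplied by hand since Python spells them differently.

```python
from collections import namedtuple

def Type(name, fields):
    """Hypothetical stand-in for Pyson's Type constructor."""
    return namedtuple(name, fields)

# Names made available to the document; they are not part of its data.
RESERVED = {"Type": Type, "true": True, "false": False, "null": None}

def load_pyson(text):
    """Evaluate a Pyson document as Python and return its top-level bindings.

    Illustration only: exec() must never be used on untrusted input, and
    emptying __builtins__ is not a real sandbox.
    """
    env = {"__builtins__": {}, **RESERVED}
    exec(text, env)
    return {k: v for k, v in env.items()
            if k != "__builtins__" and k not in RESERVED}

doc = """
x = [1, 2, 3]
y = {'z': x}
u = [y, x, y]
DateTime = Type("DateTime", ["year", "month", "day"])
today = DateTime(2013, 6, 24)
flags = [true, false]
"""

data = load_pyson(doc)
print(data["y"]["z"] is data["x"])   # the reference shares the list, no copy
print(data["today"].year)
```

A real parser would of course accept only the Pyson grammar rather than arbitrary Python, but the sketch shows that references come for free from Python's own variable semantics.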
    
Parsing and semantic analysis

Modifying a JSON parser to accommodate the new syntax is very straightforward. It's more complicated to get the reference semantics right.

In JSON, there is no difference between parsing and semantic analysis, because a JSON value is a one-to-one representation of the abstract syntax tree (AST) generated by the parser.
$$
\text{JSON string} \quad\to_{\text{parser}}\quad \text{AST} = \text{data}
$$
This also means that converting a value stored in memory to a JSON string is immediate.
$$
\text{AST} = \text{data}\quad \to_{\text{to string}}\quad \text{JSON string}
$$

References, on the other hand, need to be resolved in the AST before it is a useful in-memory representation of the data.
$$
\text{Pyson string} \quad\to_{\text{parser}}\quad\text{AST}\quad\to_\text{linker}\quad \text{data}
$$
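
To make the linking step concrete, here is a minimal Python sketch. It assumes a simplified AST in which a document is a list of (name, node) pairs and a reference is a Ref node; both the Ref class and the link function are hypothetical names chosen for illustration and do not describe the actual C# data structures. Types are left out for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """Hypothetical AST node: a reference to a previously defined variable."""
    name: str

def link(ast):
    """Resolve every Ref in a list of (name, node) pairs, in document order."""
    data = {}

    def resolve(node):
        if isinstance(node, Ref):
            if node.name not in data:
                raise NameError(f"'{node.name}' is referenced before its definition")
            return data[node.name]      # reuse the same object, no copy
        if isinstance(node, list):
            return [resolve(item) for item in node]
        if isinstance(node, dict):
            return {key: resolve(value) for key, value in node.items()}
        return node                     # numbers, strings, booleans, None

    for name, node in ast:
        data[name] = resolve(node)
    return data

# AST for:  x = [1,2,3]   y = {'z': x}   u = [y, x, y]
ast = [
    ("x", [1, 2, 3]),
    ("y", {"z": Ref("x")}),
    ("u", [Ref("y"), Ref("x"), Ref("y")]),
]
data = link(ast)
print(data["u"][0] is data["y"])   # linked data shares nodes instead of copying them
```

Since a reference may only point at an earlier definition, a single pass in document order is enough and the linked data can never contain cycles.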
Serialization also becomes more complicated, because it's necessary to analyze the data, factor out common nodes and substitute in the references. It's also necessary that the variables are output in the correct order as required by the references.
$$
\text{data} \quad\to_\text{unlinker}\quad \text{AST} \quad\to_{\text{to string}}\quad \text{Pyson string}
$$
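
The unlinking step can be sketched in Python as well. This is an assumption-laden illustration rather than the actual algorithm: the function name to_pyson and the objN naming scheme are made up here, types are ignored, only lists and dictionaries are treated as shareable nodes, and the data is assumed to be acyclic. The sketch names every container that is reached more than once and emits each definition before its first use.

```python
import json

def to_pyson(data):
    """Serialize {name: value}, factoring shared lists/dicts into variables."""
    def is_node(value):
        return isinstance(value, (list, dict))

    # Top-level containers keep their own name (the first one, if aliased).
    names = {}
    for name, value in data.items():
        if is_node(value):
            names.setdefault(id(value), name)

    seen, counter = set(), 0

    def scan(value):
        """Give an automatic name to unnamed containers reached more than once."""
        nonlocal counter
        if not is_node(value):
            return
        if id(value) in seen:
            if id(value) not in names:
                counter += 1
                while f"obj{counter}" in data:   # avoid clashing with user names
                    counter += 1
                names[id(value)] = f"obj{counter}"
            return
        seen.add(id(value))
        for child in (value.values() if isinstance(value, dict) else value):
            scan(child)

    for value in data.values():
        scan(value)

    lines, emitted = [], set()

    def render(value, define=False):
        if not is_node(value):
            return json.dumps(value)             # JSON spelling of scalars
        name = names.get(id(value))
        if name is not None and not define:
            emit(name, value)                    # definition must precede this use
            return name
        if isinstance(value, list):
            return "[" + ", ".join(render(v) for v in value) + "]"
        return "{" + ", ".join(f"{json.dumps(k)}: {render(v)}"
                               for k, v in value.items()) + "}"

    def emit(name, value):
        if name in emitted:
            return
        emitted.add(name)
        owns = is_node(value) and names.get(id(value)) == name
        text = render(value, define=owns)        # emits dependencies first
        lines.append(f"{name} = {text}")

    for name, value in data.items():
        emit(name, value)
    return "\n".join(lines)

today = [2013, 6, 24]        # a plain list standing in for a typed instance
data = {"period": [today, [2014, 6, 1]], "simulation": {"start": today}}
print(to_pyson(data))
# obj1 = [2013, 6, 24]
# period = [obj1, [2014, 6, 1]]
# simulation = {"start": obj1}
```

The shared list has no variable name of its own in the input, so one is generated automatically, which is exactly the situation described in the next paragraph.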

Note that when a Pyson string is parsed, linked, unlinked and converted back into a string, the output will be the same as the input (except for some irrelevant differences in variable order). When, however, the data being serialized has not come from a Pyson string, there may be common nodes which don't have an explicit variable name, so names must be generated for them automatically.

Examples: reading Pyson with C#

We'll read in this Pyson document
DateTime = Type("DateTime", ["year", "month", "day"])
today = DateTime(2013, 6, 24)
expiry = DateTime(2014, 6, 1)
period = [today, expiry]
The following C# code steps through parsing, linking, unlinking and conversion to string:
List<KeyValuePair<string, object>> inputAST = PysonParser.Parse(inputString);

Dictionary<string, object> data = PysonParser.Link(inputAST);

List<KeyValuePair<string, object>> outputAST = PysonParser.Unlink(data);

string outputString = PysonParser.ToString(outputAST);
The output string comes out exactly the same as the input string. We'll rarely be interested in the intermediate ASTs, so we can alternatively just write:
Dictionary<string, object> data = PysonParser.Link(inputString);

string outputString = PysonParser.ToString(data);
From C# we can easily access the data using the dynamic keyword.
dynamic data = PysonParser.Link(inputString);
double startYear = data["period"][0]["year"];
double endYear = data["period"][1]["year"];
bool test1 = data["period"][0] is PysonInstance;
bool test2 = data["DateTime"] is PysonType;
bool test3 = data["period"][0].Type == data["DateTime"];

Examples: generating Pyson with C#

Here's an example of how to generate a Pyson string from an in-memory object.
var dateTime = new PysonType("DateTime", new object[] { "year", "month", "day" });
var today = dateTime.New(2013, 6, 24);
var expiry = dateTime.New(2014, 6, 1);

dynamic data = new Dictionary<string, object>();
data["period"] = new object[] { today, expiry };
 
string outputString = PysonParser.ToString(data);
The output isn't as readable as we might expect:
Type1 = Type("DateTime", ["year", "month", "day"])
period = [Type1(2013, 6, 24), Type1(2014, 6, 1)]
This is because the Dictionary object doesn't directly contain the DateTime type, so the rather ugly name 'Type1' is generated automatically. This can be fixed by adding it to the Dictionary:
var dateTime = new PysonType("DateTime", new object[] { "year", "month", "day" });
var today = dateTime.New(2013, 6, 24);
var expiry = dateTime.New(2014, 6, 1);

dynamic data = new Dictionary<string, object>();
data["period"] = new object[] { today, expiry };
data["DateTime"] = dateTime;
 
string outputString = PysonParser.ToString(data);
which generates:
DateTime = Type("DateTime", ["year", "month", "day"])
period = [DateTime(2013, 6, 24), DateTime(2014, 6, 1)]
Note that objects which appear more than once in the data are automatically factored out and placed before the first reference. For example:
var dateTime = new PysonType("DateTime", new object[] { "year", "month", "day" });
var today = dateTime.New(2013, 6, 24);
var expiry = dateTime.New(2014, 6, 1);

dynamic data = new Dictionary<string, object>();
data["period"] = new object[] { today, expiry };
data["DateTime"] = dateTime;
data["simulation"] = new Dictionary<string, object>();
data["simulation"]["start"] = today;
 
string outputString = PysonParser.ToString(data);
and the output is:
DateTime = Type("DateTime", ["year", "month", "day"])
obj3 = DateTime(2013, 6, 24)
period = [obj3, DateTime(2014, 6, 1)]
simulation = { "start": obj3 }