August 7, 2013

Markowitz Optimization with CVXOPT

Let $\mu$ be the expected return vector and $\Sigma$ the return covariance matrix. Markowitz portfolio optimization then seeks to minimize the portfolio variance while achieving a given portfolio return $\mu^*$:
$$
\begin{align}
\text{minimize}\quad & w^\T \Sigma w \\
\text{s.t.}\quad & w_i \geq 0 \\
& \sum_i w_i = 1 \\
& \sum_i \mu_i w_i = \mu^*
\end{align}
$$
This is a quadratic programming (QP) problem which can easily be solved in Python using the cvxopt library.
from cvxopt import matrix, solvers

def markowitz(mu, sigma, mu_p):
    # uncomment to silence the solver's progress output
    # solvers.options['show_progress'] = False

    d = len(sigma)

    # P and q determine the objective function to minimize,
    # which cvxopt defines as .5 x^T P x + q^T x
    P = matrix(sigma)
    q = matrix([0.0] * d)

    # G and h determine the inequality constraints in the form
    # G x <= h. We write w_i >= 0 as -w_i <= 0 and also add a
    # (superfluous) w_i <= 1 constraint
    G = matrix([
        [(-1.0) ** (1 + j % 2) * (1.0 if i == j // 2 else 0.0) for i in range(d)]
        for j in range(2 * d)
    ]).trans()
    h = matrix([float(j % 2) for j in range(2 * d)])

    # A and b determine the equality constraints A x = b:
    # the weights sum to one and the portfolio return equals mu_p
    A = matrix([
        [1.0] * d,
        [float(m) for m in mu]
    ]).trans()
    b = matrix([1.0, float(mu_p)])

    sol = solvers.qp(P, q, G, h, A, b)

    if sol['status'] != 'optimal':
        raise Exception("Could not solve problem.")

    w = list(sol['x'])
    f = 2.0 * sol['primal objective']  # the minimum variance w^T Sigma w

    return {'w': w, 'f': f, 'args': (P, q, G, h, A, b), 'result': sol}
Using my ExcelPython library to call the function from a spreadsheet, I tried calculating the portfolio weights and efficient frontier with parameters
$$
\begin{align}
\mu &= \paren{ \begin{matrix} 4\% \\ 5\% \\ 6\%  \end{matrix} } &
\Sigma &= \paren{ \begin{matrix}
20\% & -9\% & 12\% \\
-9\% & 15\% & -11\% \\
12\% & -11\% & 30\%
\end{matrix} }
\end{align}
$$
with $\mu^*$ varying from 4% to 6%. Here's the resulting chart of the portfolio weights and the efficient frontier.

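For reference, here's a minimal pure-Python sketch of the same sweep, printing the weights and minimum variance rather than charting them (the grid of target returns is an arbitrary choice):

mu = [0.04, 0.05, 0.06]
sigma = [[0.20, -0.09, 0.12],
         [-0.09, 0.15, -0.11],
         [0.12, -0.11, 0.30]]

# sweep the target return from 4% to 6% in 21 steps
for k in range(21):
    mu_p = 0.04 + 0.02 * k / 20.0
    res = markowitz(mu, sigma, mu_p)
    print(mu_p, res['w'], res['f'])
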
July 15, 2013

Covariance estimation

Estimating the covariance matrix of a $d$-dimensional Gaussian random variable $X$ from $n$ observations is easy, right? We all know we get an unbiased, efficient estimate of the covariance between $X_i$ and $X_j$ by computing
$$ \cov(X_i, X_j) \approx \frac 1 {n-1} \sum_{k=1}^n (x_{i,k} - \bar x_i) (x_{j,k} - \bar x_j) $$
so we just do this for each pair $(i, j)$ and we have our covariance matrix! Actually, it's not always that simple...

Theoretical covariance (and correlation) matrices are positive semi-definite, and so estimated covariance matrices should be viewed as elements of the manifold embedded in $\reals^{d\times d}$ consisting of PSD matrices. Interpreted in this light, the pairwise correlation estimator is biased and inefficient. Indeed, for small $n/d$ it's not unlikely that pairwise correlation will yield a matrix which isn't PSD, leading to problems further down the line when, for example, performing an eigenvalue decomposition.

One solution to this problem is shrinkage, an implementation of which is available in the scikit-learn Python library.

The idea behind shrinkage is that of substituting the traditional estimator with a convex combination $(1-\lambda) S + \lambda F$ of the sample covariance matrix $S$ and a shrinkage target $F$, a well-conditioned matrix which makes milder claims about the theoretical distribution.

So, to shrink a sample covariance matrix you need to decide on a shrinkage target and the convex weight $\lambda$. $F$ is typically taken to be the identity matrix multiplied by the average sample variance
$$
F = \frac{\tr S}{d}I
$$
so the tricky part is choosing $\lambda$: the optimality conditions one would intuitively impose depend on knowing the theoretical, unobservable covariance matrix $\Sigma$. It turns out, however, that the optimal parameter $\lambda^*$ can be estimated from the sample, ultimately providing a practical procedure for shrinking the sample covariance matrix. Such optimal parameters were derived by Ledoit and Wolf and subsequently improved by Chen, Wiesel, Eldar and Hero.

Ledoit and Wolf derive the LW shrinkage intensity
$$
\hat\lambda_{LW}^* = \frac{b^2}{d^2}
$$
where $d^2 = \| S-F \|^2$, $b^2 = \min(d^2, n^{-2}\sum_{k=1}^n \| x^{(n)}_k (x^{(n)}_k)^\T - S\|^2)$ and $\|M\|^2=\tr(MM^\T)$ is the squared Frobenius norm. Asymptotically, the shrunk estimator is shown to have, amongst all convex combinations of $S$ and $F$, minimum quadratic risk with respect to $\Sigma$. This asymptotic result does not, however, help in situations in which $n/d$ is low: for example, when $n = d = 2$ we have $b = 0$ and hence no shrinkage occurs, which is the opposite of what one would want in such a situation.

The CWEH estimator, which its authors call the oracle approximating shrinkage (OAS) estimator, performs significantly better in low $n/d$ situations. The shrinkage intensity is given by
$$
\hat\lambda_{OAS}^* = \min\paren{ \frac{(1-2/d)\tr(S^2)+\tr^2(S)}{(n+1-2/d)\bracket{ \tr(S^2) - \tr^2(S)/d } }, 1 }.
$$
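Both estimators are available in scikit-learn's covariance module; here's a minimal sketch (the simulated data and its dimensions are illustrative):

import numpy as np
from sklearn.covariance import LedoitWolf, OAS

# n = 10 observations of a d = 5 dimensional Gaussian
rng = np.random.RandomState(0)
X = rng.multivariate_normal(np.zeros(5), np.eye(5), size=10)

lw = LedoitWolf().fit(X)
oas = OAS().fit(X)

print(lw.shrinkage_, oas.shrinkage_)  # the estimated optimal lambda*
print(lw.covariance_)                 # the shrunk covariance matrix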

July 3, 2013

References in JSON

Modern object-oriented programming languages store data as objects which reference each other, in other words as a directed graph where each node is an object and each arc is a reference. JSON, on the other hand, can natively only represent trees, which are a strict subset of all possible directed graphs, specifically
$$ \begin{align*} \text{(trees)} &= \left\{ g \in \text{(directed graphs)} : \text{ one node in }g\text{ has zero predecessors and the others have exactly one} \right\} \\ &\subset \text{(directed graphs)} \end{align*} $$
In order to represent generic directed graphs in JSON, it's necessary to introduce reference semantics, in other words some mechanism to allow objects to be referenced by multiple predecessors, including objects they in turn reference (cycles).

Let's look at a few approaches that have been tried elsewhere.

ID-based references

One approach is to recognise JSON dictionaries of the form
{ "$ref": <identifier> }
as a reference to the unique dictionary containing the key
{ "$id": <identifier>, ... }
This approach has the disadvantage that only JSON dictionaries can be referenced, as there is no mechanism for assigning an id to an array.

Path-based references

An alternative method is to reference other objects by describing where they are in relation to the referencing object. One popular proposal for doing this is JSONPath.

A simple object names-based approach

Getting back to our original problem of how to represent an entire directed graph of objects: clearly it's necessary to have some way to retrieve at least a subset of those objects for manipulation, hence a JSON representation of the graph should have as its root object a dictionary associating names to objects.

When this is the case, it's very straightforward to reference objects via their associated names. In order to represent any directed graph of objects, all that is needed is to assign arbitrary, unique names to multiply-referenced objects. Singly-referenced objects do not need to have names as they can be represented directly inline within the JSON document.

References can then be represented using the {"$ref": <id>} notation or even more simply by deciding on a custom marker such as the '@' sign and specifying that any string starting with that symbol is a reference to a named object. Clearly strings which are really supposed to start with '@' must be escaped, e.g. with a double '@@' which must be processed during the linking/unlinking algorithms.
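For example, a two-object cycle can be written with named objects and '@' references (an illustrative document):

{
  "alice": { "name": "Alice", "friend": "@bob" },
  "bob": { "name": "Bob", "friend": "@alice" },
  "note": "@@this string really starts with a single at sign"
}

Here 'alice' and 'bob' reference each other by name, while the '@@' prefix escapes a string that genuinely starts with '@'.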

Note that in reality this approach is not too dissimilar to the ID-based approach, but it provides a different way of assigning IDs which can be applied to any type of object, not just dictionaries.

C# implementation

Here is a C# implementation of the object names-based approach. Note that here it has been assumed that the JSON structure is represented in-memory with IDictionary and Array objects, but this can easily be generalized to any JSON value system. The linking and unlinking procedures act in-place, i.e. they modify the instances passed to them rather than returning new, modified instances.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace JsonUtils
{
  public class JsonLinker
  {
    public static void Link(IDictionary<string, object> vars)
    {
      Link(vars, vars);
    }

    static object Link(object obj, IDictionary<string, object> variables)
    {
      if(obj == null)
        return null;

      if(obj is IDictionary<string, object>)
      {
        var dict = (IDictionary<string, object>) obj;
        foreach(var key in dict.Keys.ToArray())
          dict[key] = Link(dict[key], variables);
      }
      else if(obj is Array)
      {
        var a = (Array) obj;
        for(int k = 0; k < a.Length; k++)
          a.SetValue(Link(a.GetValue(k), variables), k);
      }
      else if(obj is string)
      {
        var s = (string) obj;
        if(s.Length >  0 && s[0] == '@')
        {
          if(s.Length > 1 && s[1] == '@')
            return s.Substring(1);
          else
            return variables[s.Substring(1)];
        }
      }

      return obj;
    }

    public static void Unlink(IDictionary<string, object> vars)
    {
      var context = new UnlinkContext(vars);
    }

    class UnlinkContext
    {
      IDictionary<string, object> variables;
      IDictionary<object, string> objects = new Dictionary<object, string>();
      ISet<object> visited = new HashSet<object>();

      public UnlinkContext(IDictionary<string, object> variables)
      {
        this.variables = variables;
        foreach(var kv in variables)
        {
          if(kv.Value != null && (kv.Value is IDictionary<string, object> || kv.Value is Array))
            objects[kv.Value] = kv.Key;
        }

        foreach(var v in variables.Values.ToArray())
          Traverse(v);

        foreach(var k in variables.Keys.ToArray())
          variables[k] = Unlink(variables[k], k);
      }

      void Traverse(object obj)
      {
        if(obj != null && (obj is IDictionary<string, object> || obj is Array))
        {
          if(visited.Contains(obj))
          {
            if(!objects.ContainsKey(obj))
            {
              string name = "obj" + objects.Count.ToString();
              objects.Add(obj, name);
              variables.Add(name, obj);
            }
            return;
          }

          visited.Add(obj);

          if(obj is IDictionary<string, object>)
          {
            var dict = (IDictionary<string, object>) obj;
            foreach(var value in dict.Values)
              Traverse(value);
          }
          else if(obj is Array)
          {
            var a = (Array) obj;
            for(int k = 0; k < a.Length; k++)
              Traverse(a.GetValue(k));
          }
        }
      }

      object Unlink(object obj, string label = null)
      {
        if(obj == null)
          return null;

        if(obj is IDictionary<string, object> || obj is Array)
        {
          string name;
          if(objects.TryGetValue(obj, out name) && name != label)
            return "@" + name;

          if(obj is IDictionary<string, object>)
          {
            var dict = (IDictionary<string, object>) obj;
            foreach(var key in dict.Keys.ToArray())
              dict[key] = Unlink(dict[key]);
          }
          else
          {
            var a = (Array) obj;
            for(int k = 0; k < a.Length; k++)
              a.SetValue(Unlink(a.GetValue(k)), k);
          }
        }
        else if(obj is string)
        {
          // escape strings which really start with '@' so that they
          // survive a subsequent Link
          var s = (string) obj;
          if(s.Length > 0 && s[0] == '@')
            return "@" + s;
        }

        return obj;
      }
    }
  }
}

July 2, 2013

Mapping the Pyson syntax to JSON

The problem with creating a new syntax like Pyson is that it's necessary to reimplement a parser in every programming language you wish to use it in. An alternative could be to map the Pyson syntax onto JSON and keep only the semantic concepts associated with it, i.e. the
$$\text{AST} \to \text{data}$$
step.

For example, the following could be a JSON representation of a Pyson document we have seen a few times already
{
  "DateTime": {
    "#type": " @Type",
    "#args": [
      "DateTime",
      ["year", "month", "day"]
    ],
    "defaults"=[0, 0, 0],
    "format"="{year:D4}/{month:D2}/{day:D2} {hour:D2}:{minute:D2}:{second:D2}"
  }
  "obj3": {
    "#type": "@DateTime",
    "#args": [2013, 6, 24]
  },
  "period": [
    "@obj3",
    {
      "#type": "@DateTime",
      "#args": [2014, 6, 1]
    }
  ],
  "simulation": {
    "start": "@obj3"
  }
}
Here we've used strings starting with '@' to denote references and dictionary keys starting with '#' for keys with special meaning. The JSON version is somewhat more verbose and less readable than the Pyson version, but it has the advantage of being immediately parsable in any language that already supports JSON. To render it usable in the same way as Pyson, a second pass is needed to link the references and unescape strings that really should start with '@', and vice versa when serializing.
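As a rough illustration, the linking pass might look like this in Python; this sketch handles only the '@' references and '@@' escapes, not the '#type'/'#args' keys:

def link(variables):
    # Resolve '@name' references in place, so that shared objects stay
    # shared and cycles terminate (mirrors the C# Link method in the
    # post above). '@@' unescapes to a literal leading '@'.
    def resolve(obj):
        if isinstance(obj, dict):
            for key in list(obj):
                obj[key] = resolve(obj[key])
        elif isinstance(obj, list):
            for k, value in enumerate(obj):
                obj[k] = resolve(value)
        elif isinstance(obj, str):
            if obj.startswith('@@'):
                return obj[1:]
            if obj.startswith('@'):
                return variables[obj[1:]]
        return obj
    resolve(variables)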

Pyson gets some more syntax

In keeping with the 'Python subset' idea, I thought of allowing keyword arguments in constructors as follows
DateTime = Type(
    "DateTime",
    ["year", "month", "day", "hour", "minute", "second"],
    defaults=[0, 0, 0],
    format="{year:D4}/{month:D2}/{day:D2} {hour:D2}:{minute:D2}:{second:D2}"
    )

June 28, 2013

Enabling formulas and code highlighting on Blogspot

This blog uses MathJax to embed LaTeX-style formulas and Google Code Prettify for syntax highlighting.

To install these tools select the Template tab from the blog admin panel and click Edit HTML. Insert the following code just after the opening head tag.
<script src='https://google-code-prettify.googlecode.com/svn/loader/run_prettify.js'/>    
  <script type='text/x-mathjax-config'>//<![CDATA[
    MathJax.Hub.Config({
      extensions: ["tex2jax.js"],
      jax: ["input/TeX", "output/HTML-CSS"],
      displayAlign: "left",
      styles: {
        ".MathJax_Display": {
          //"background-color": "rgb(230,230,230)",
          "padding-left": "4em",
          "float": "left",
          "display": "inline"
        }
      },
      tex2jax: {
        inlineMath: [ ['$','$'] ],
        displayMath: [ ['$$', '$$'] ],
        processEscapes: true
      },
      "HTML-CSS": { availableFonts: ["TeX"] }
    });
  //]]></script>
  <script src='http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default' type='text/javascript'>
</script>   
Then in posts you'll be able to use the following syntax to highlight code
<pre class="prettyprint">
int f(int x)
{
  return 2 * x;
}
</pre>
and \$ and \$\$ delimiters to insert LaTeX math inline or in display style.
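For example, a post containing

$$ e^{i\pi} + 1 = 0 $$

will render the formula in display style on its own line.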

A couple of off-topic tips. To make the text area adapt to the browser window, in the same editor search for the text .content-outer and edit the code around it as follows:
body {
min-width: 100px /*$(content.width);*/
}
.content-outer, .content-fauxcolumn-outer, .region-inner {
min-width: 100px /*$(content.width);*/
max-width: 2000px /*$(content.width);*/
_width: 100% /*$(content.width);*/
}
Also, in MathJax it's possible to define macros by adding a TeX section to the MathJax.Hub.Config block above:
TeX: {
  Macros: {
    cov: '{\\operatorname{cov}}',
    reals: '{\\mathbb{R}}',
    T: '{\\mathrm{T}}',
    tr: '{\\operatorname{tr}}',
    paren: ['\\left( #1 \\right)', 1],
    bracket: ['\\left[ #1 \\right]', 1],
    brace: ['\\left\\{ #1 \\right\\}', 1]
  }
}

June 25, 2013

Pyson, something between JSON and Python

JSON, by virtue of its minimalist nature, lacks a few features which make it difficult to efficiently represent some instances of structured data.

In particular, JSON only allows the representation of trees, i.e. each node has exactly one parent node with the exception of the root element. What this means is that data cannot be reused in multiple places within a document.

In addition, JSON's type system is very limited - there is no way to define custom types. This means that most structures must be represented as dictionaries, making it necessary to repeat the schema (i.e. the dictionary keys) along with every instance of the data (i.e. the dictionary values), or requiring ugly mechanisms like special keys (e.g. "__type__") to give each dictionary a label indicating how to interpret the data.

So the idea is to extend JSON with a couple of additional light-weight constructs to solve these issues. In the same original spirit which gave rise to JSON, I decided to borrow some syntax from Python, of which JSON is a subset (kind of). So what we have is
$$
\text{JSON} \subset \text{Pyson} \subset \text{Python}
$$

Syntax

The extra syntactic features in Pyson are:
  • The root element is no longer any value but a document, i.e. a list of key-value pairs of the form key = value:
    x = 3
    y = {'a': 3, 'b': 2}
    z = [true, false]
    
  • It is possible to reference values which have been defined previously
    x = [1,2,3]
    y = {'z': x}
    u = [y, x, y]
    
  • It is possible to define new types and instantiate them
    DateTime = Type("DateTime", ["year", "month", "day"])
    today = DateTime(2013, 6, 24)
    
Parsing and semantic analysis

Modifying a JSON parser to accommodate the new syntax is very straightforward. It's more complicated to get the reference semantics right.

In JSON, there is no difference between parsing and semantic analysis because JSON values are a one-to-one representation of the abstract syntax tree (AST) generated by the parser.
$$
\text{JSON string} \quad\to_{\text{parser}}\quad \text{AST} = \text{data}
$$
This also means that converting a value stored in memory to a JSON string is immediate.
$$
\text{AST} = \text{data}\quad \to_{\text{to string}}\quad \text{JSON string}
$$

References, on the other hand, need to be resolved in the AST before it is a useful in-memory representation of the data.
$$
\text{Pyson string} \quad\to_{\text{parser}}\quad\text{AST}\quad\to_\text{linker}\quad \text{data}
$$
Serialization also becomes more complicated, because it's necessary to analyze the data, factor out common nodes and substitute in the references. It's also necessary that the variables are output in the correct order as required by the references.
$$
\text{data} \quad\to_\text{unlinker}\quad \text{AST} \quad\to_{\text{to string}}\quad \text{Pyson string}
$$

Note that when a Pyson string is parsed, linked, unlinked and converted back into a string, the output will be the same as the input (except for some irrelevant differences in variable order). When, however, the data being serialized did not come from a Pyson string, it is possible that there are common nodes which don't have an explicit variable name - hence names must be generated automatically.

Examples: reading Pyson with C#

We'll read in this Pyson document
DateTime = Type("DateTime", ["year", "month", "day"])
today = DateTime(2013, 6, 24)
expiry = DateTime(2014, 6, 1)
period = [today, expiry]
The following C# code steps through parsing, linking, unlinking and conversion to string
List<KeyValuePair<string, object>> inputAST = PysonParser.Parse(inputString);

Dictionary<string, object> data = PysonParser.Link(inputAST);

List<KeyValuePair<string, object>> outputAST = PysonParser.Unlink(data);

string outputString = PysonParser.ToString(outputAST);
The output string comes out exactly the same as the input string. We'll rarely be interested in the intermediate ASTs so we can alternatively just write
Dictionary<string, object> data = PysonParser.Link(inputString);

string outputString = PysonParser.ToString(data);
From C# we can easily access the data using the dynamic keyword.
dynamic data = PysonParser.Link(inputString);
double startYear = data["period"][0]["year"];
double endYear = data["period"][1]["year"];
bool test1 = data["period"][0] is PysonInstance;
bool test2 = data["DateTime"] is PysonType;
bool test3 = data["period"][0].Type == data["DateTime"];

Examples: generating Pyson with C#

Here's an example of how to generate a Pyson string from an in-memory object.
var dateTime = new PysonType("DateTime", new object[] { "year", "month", "day" });
var today = dateTime.New(2013, 6, 24);
var expiry = dateTime.New(2014, 6, 1);

dynamic data = new Dictionary<string, object>();
data["period"] = new object[] { today, expiry };
 
string outputString = PysonParser.ToString(data);
The output isn't as readable as we might expect:
Type1 = Type("DateTime", ["year", "month", "day"])
period = [Type1(2013, 6, 24), Type1(2014, 6, 1)]
This is because the Dictionary object doesn't directly contain the DateTime type, so the rather ugly name 'Type1' is generated automatically. This can be fixed by adding it to the Dictionary:
var dateTime = new PysonType("DateTime", new object[] { "year", "month", "day" });
var today = dateTime.New(2013, 6, 24);
var expiry = dateTime.New(2014, 6, 1);

dynamic data = new Dictionary<string, object>();
data["period"] = new object[] { today, expiry };
data["DateTime"] = dateTime;
 
string outputString = PysonParser.ToString(data);
which generates:
DateTime = Type("DateTime", ["year", "month", "day"])
period = [DateTime(2013, 6, 24), DateTime(2014, 6, 1)]
Note that objects which appear more than once in the data are automatically factored out and placed before the first reference.
var dateTime = new PysonType("DateTime", new object[] { "year", "month", "day" });
var today = dateTime.New(2013, 6, 24);
var expiry = dateTime.New(2014, 6, 1);

dynamic data = new Dictionary<string, object>();
data["period"] = new object[] { today, expiry };
data["DateTime"] = dateTime;
data["simulation"] = new Dictionary>string, object<();
data["simulation"]["start"] = today;
 
string outputString = PysonParser.ToString(data);
and the output
DateTime = Type("DateTime", ["year", "month", "day"])
obj3 = DateTime(2013, 6, 24)
period = [obj3, DateTime(2014, 6, 1)]
simulation = { "start": obj3 }

May 21, 2013

Using COM constants in Python

The Python library win32com allows you to instantiate and manipulate COM objects transparently from Python. One aspect which is not automatic is the treatment of constants; for example, the following code will generate an error unless you've first performed some additional steps.
import win32com.client
xl = win32com.client.Dispatch("Excel.Application")
print win32com.client.constants.xlWorkbookNormal
To make this work you first need to run the script makepy.py passing the relevant COM type in the command line.
C:\Python27\Lib\site-packages\win32com\client>makepy.py "Excel.Application"
After this step the above Python code should print -4143.
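Alternatively, the makepy step can be performed programmatically via win32com's gencache module; a minimal sketch:

import win32com.client

# EnsureDispatch generates the makepy wrappers on the fly, making the
# constants available without running makepy.py by hand
xl = win32com.client.gencache.EnsureDispatch("Excel.Application")
print(win32com.client.constants.xlWorkbookNormal)  # prints -4143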

April 26, 2013

Visual Studio-style docking windows with DockPanel Suite

Installation

These instructions work with Visual Studio 2010.

Download the binaries from the DockPanel Suite website. In Visual Studio, select Choose Items from the Toolbox window's context menu (it may be hidden, display it with View > Toolbox). In .NET Framework Components click Browse and select the WeifenLuo.WinFormsUI.Docking.dll assembly downloaded from the website. This will add the DockPanel component to the Toolbox.

Usage

The container form must be an MDI form containing a DockPanel component. Add a new form to your Visual Studio project, and set its IsMdiContainer property to True. Place a DockPanel component on it and make it fill up the form's client area by setting its Dock property to Fill.

To add docking windows to the container form you can instantiate them in code and associate them to the container form by calling the Show method specifying the DockPanel as the owner.

public partial class Form1 : Form
{
  Form2 formA = new Form2();
  Form2 formB = new Form2();
  Form2 formC = new Form2();
 
  public Form1()
  {
    InitializeComponent();

    formA.Show(this.dockPanel1);
    formB.Show(this.dockPanel1);
    formC.Show(this.dockPanel1);
  }
}

Here's the result.

April 20, 2013

The Metropolis algorithm applied to Bayesian analysis

As the dimensionality of a Bayesian model increases, numerically integrating the parameter space becomes computationally intensive and, in all but the most carefully constructed cases, analytic computation is infeasible. An alternative is to sample the posterior distribution
$$p(\theta|D) = \frac{p(D|\theta) p(\theta)}{\int p(D|\theta)p(\theta) d\theta}$$
via Monte Carlo. The Metropolis algorithm, first discussed in Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953), allows you to sample from any distribution $p(x)$ provided you can compute a function $Cp(x)$ proportional to $p(x)$. In this case, the algorithm allows us to sample the posterior without performing the integral in the denominator.

The algorithm

We start by defining a transition probability
$$M(x,x') = q(x,x')\min\left(\frac{p(x')}{p(x)}, 1\right)$$
where $q$ is an arbitrary symmetric proposal distribution. In practice, to create a chain of samples based on $M$ we
  1. Sample $q(x, \cdot)$ to choose $x'$
  2. If $p(x') \geq p(x)$ move to $x'$; otherwise move to $x'$ with probability $p(x')/p(x)$ and remain in $x$ with the complementary probability.
As the number of iterations increases, the probability of being in state $x$ converges to $p(x)$. Intuitively, this happens because $p$ is an eigenfunction of $M$ with eigenvalue $1$. Further details can be found in Robert, Casella (2004).
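As a concrete illustration, here's a minimal one-dimensional sampler in Python; the Gaussian random-walk proposal, its scale and the chain length are all illustrative choices:

import math
import random

def metropolis(log_p, x0, scale=1.0, n=10000):
    # log_p computes log(C p(x)) for some unknown constant C
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n):
        x_new = x + random.gauss(0.0, scale)  # 1. sample q(x, .) to choose x'
        lp_new = log_p(x_new)
        # 2. accept with probability min(p(x')/p(x), 1)
        if random.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
        samples.append(x)
    return samples

# e.g. sample a standard normal, discarding a burn-in period
draws = metropolis(lambda x: -0.5 * x * x, 0.0)[1000:]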

Practical issues

Although the algorithm works in principle with any non-degenerate proposal distribution $q$, the optimal choice minimizes the burn-in period, i.e. the number of initial iterations we discard before considering the chain to have 'converged' to the target distribution.

April 17, 2013

The simplest possible Bayesian model

In chapter 5, Kruschke goes through the process of using Bayes' rule for updating belief in the 'lopsidedness' of a coin. Instead of using R I decided to try implementing the model in Excel.

The Model

Coin tosses are IID with a $\mathrm{Bernoulli}(\theta)$ distribution, thus the probability of observing a particular sequence of $N$ tosses of which $z$ are heads is
$$p(z|\theta, N) = \theta^z (1-\theta)^{(N-z)}$$
Using Bayes' rule we get
$$p(\theta|z,N) = \frac{\theta^z (1-\theta)^{N-z} p(\theta)}{p(z|N)}$$
Note that if $p(\theta)$ is already of the form
$$f(\theta;a,b) = \frac{\theta^{a-1} (1-\theta)^{b-1}}{B(a,b)}$$
where the denominator is a normalizing constant, then $p(\theta|z,N) = f(\theta;a+z,b+N-z)$.

Excel Implementation

This is very straightforward using the BETADIST function. Since BETADIST is cumulative, we divide the unit interval into equally spaced subintervals, calculate the probability of $\theta$ falling in each subinterval, and update it on the basis of the new observations.
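A rough Python equivalent of the spreadsheet logic, using scipy's (cumulative) beta distribution in place of BETADIST, with illustrative values for the prior and the observations:

from scipy.stats import beta

a, b = 1.0, 1.0   # uniform Beta(1, 1) prior
z, N = 7, 10      # observed: 7 heads in 10 tosses
n = 20            # number of subintervals of [0, 1]

# probability that theta falls in each subinterval under the
# Beta(a + z, b + N - z) posterior
posterior = [
    beta.cdf((k + 1.0) / n, a + z, b + N - z)
    - beta.cdf(float(k) / n, a + z, b + N - z)
    for k in range(n)
]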

Here's the result in Excel:


Discrete model

Using a discrete distribution for $\theta$ rather than a continuous parametric form we can allow for much more freedom in the choice of initial prior distribution. The update rule becomes
$$p(\theta|z_N,N)=\frac{p(z_N|z_{N-1},N-1,\theta)p(\theta|z_{N-1},N-1)}{\sum_{i=1}^n p(z_N|z_{N-1},N-1,\theta_i) p(\theta_i|z_{N-1},N-1)}$$
in other words
$$p_{N,j} = \frac{\tilde p_{N,j}}{\sum_{i=1}^n \tilde p_{N,i}}$$
where
$$\tilde p_{N,k} = (\theta_k y_N + (1-\theta_k)(1-y_N)) p_{N-1,k}$$
and $y_N = z_N - z_{N-1}$ is the result of the $N$th coin toss.

As expected, the result is very similar to the continuous model.

In the discrete model we can assign any initial prior. Here is an example with an initial prior that assigns non-zero probability to only 3 values of $\theta$ (22.5%, 52.5% and 77.5%) and random tosses generated with a real theta of 20%. The distribution quickly converges to probability 1 that $\theta=22.5\%$.
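A minimal Python sketch of this experiment (the number of tosses is arbitrary):

import random

thetas = [0.225, 0.525, 0.775]     # support of the three-point prior
p = [1.0 / 3, 1.0 / 3, 1.0 / 3]    # initial prior
real_theta = 0.20

for _ in range(100):
    y = 1 if random.random() < real_theta else 0  # result of the next toss
    p_tilde = [(t * y + (1 - t) * (1 - y)) * pk for t, pk in zip(thetas, p)]
    norm = sum(p_tilde)
    p = [v / norm for v in p_tilde]

print(p)  # nearly all the mass ends up on theta = 22.5%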


April 12, 2013

Doing Bayesian Data Analysis

Assuming I can keep at it, I'll be making my way through Kruschke's Doing Bayesian Data Analysis. Here are a few concepts he goes through in Chapter 4.

The Bayes factor

This is a ratio which lets you compare which of two models better fits the data. By introducing a binary parameter which determines the choice of model, Bayes' rule
$$p(M_i|D)=\frac{p(D|M_i)p(M_i)}{p(D)}$$
gives us the Bayes factor
$$\frac{p(M_1|D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)}$$
or also
$$\frac{p(D|M_1)}{p(D|M_2)}=\frac{p(M_1|D)}{p(M_2|D)}\frac{p(M_2)}{p(M_1)}$$
Note that if we have no prior preference between the two models, these ratios are equal. The Bayes factor is useful because the evidences $p(D|M_i)$ on their own scale with the complexity of the model - the more complex the model, the lower the absolute value - but this is actually a good thing: by considering the ratio we automatically trade off complexity (when there's little data) against descriptive value (when the simpler model doesn't fit the data).

Data order invariance

I don't get this part - it's always true that $p(\theta|D', D)$ is invariant to the order in which Bayes' rule is applied, by definition. The factorized rule cited at the end is just the result of applying the definition of conditional independence of the data given the parameters.

The problem with Bayesian mathematics

Computing the posterior given some new data usually means performing an integral which, even using approximations, can be computationally intensive (especially since the posterior will be fed into an optimizer). Numerical integration, on the other hand, would impose a limit on the dimensionality of the model. Thus the winning method is Markov Chain Monte Carlo (MCMC).