April 20, 2013

The Metropolis algorithm applied to Bayesian analysis

As the dimensionality of a Bayesian model increases, numerically integrating the parameter space becomes computationally intensive and, in all but the most carefully constructed cases (e.g. conjugate priors), analytic computation is infeasible. An alternative is to sample the posterior distribution
$$p(\theta|D) = \frac{p(D|\theta) p(\theta)}{\int p(D|\theta)p(\theta) d\theta}$$
via Monte Carlo. The Metropolis algorithm, first discussed in Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953), allows you to sample from any distribution $p(x)$ provided you can compute a function $Cp(x)$ proportional to $p(x)$. In this case, the algorithm lets us sample the posterior without ever performing the integral in the denominator.
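For concreteness, here is what such an unnormalized posterior might look like in Python for a toy model of my own choosing (a normal likelihood with known unit variance and a normal prior on the mean; the function and parameter names are purely illustrative):

```python
import numpy as np

def log_unnormalized_posterior(theta, data, prior_mu=0.0, prior_sigma=10.0):
    """log p(D|theta) + log p(theta), up to an additive constant.

    Likelihood: data ~ Normal(theta, 1); prior: theta ~ Normal(prior_mu, prior_sigma).
    The evidence integral in the denominator is never computed.
    """
    log_likelihood = -0.5 * np.sum((data - theta) ** 2)
    log_prior = -0.5 * ((theta - prior_mu) / prior_sigma) ** 2
    return log_likelihood + log_prior
```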

The algorithm

We start by defining a transition probability
$$M(x,x') = q(x,x')\min\left(\frac{p(x')}{p(x)}, 1\right)$$
where $q$ is any symmetric proposal distribution, i.e. $q(x,x') = q(x',x)$. In practice, to create a chain of samples based on $M$ we
  1. Sample a proposal $x'$ from $q(x, \cdot)$.
  2. If $p(x') \geq p(x)$, move to $x'$; otherwise move to $x'$ with probability $p(x')/p(x)$ and stay at $x$ otherwise.
Note that only the ratio $p(x')/p(x)$ appears, so the unknown normalizing constant cancels and computing $Cp(x)$ suffices.
As the number of iterations increases, the probability of being in state $x$ converges to $p(x)$. Intuitively, this happens because $M$ satisfies detailed balance with respect to $p$, i.e. $p(x)M(x,x') = p(x')M(x',x)$, which makes $p$ a stationary distribution of the chain ($p$ is an eigenfunction of $M$ with eigenvalue $1$). Further details can be found in Robert and Casella (2004).
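A minimal sketch of the sampler in Python, assuming a symmetric Gaussian random-walk proposal $q(x,\cdot) = \mathcal{N}(x, \mathrm{step}^2)$ (the function name, default step size, and use of log-densities to avoid numerical underflow are my own choices, not part of the algorithm itself):

```python
import numpy as np

def metropolis(log_p, x0, n_samples, step=0.5, rng=None):
    """Sample from p(x) given only log_p, the log of a function proportional to p."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = x + step * rng.standard_normal()   # step 1: draw x' from q(x, .)
        # step 2: accept with probability min(p(x')/p(x), 1); the constant C
        # cancels in the ratio, so log_p only needs to be known up to a constant.
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x = x_prop
        samples[i] = x
    return samples
```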

Practical issues

Although the algorithm works in principle with any non-degenerate proposal distribution $q$, in practice the choice matters: a good proposal shortens the burn-in period, i.e. the number of initial iterations we discard before considering the chain to have 'converged' to the target distribution, and reduces the correlation between successive samples.
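Putting the two sketches above together, an illustrative run on synthetic data discards the burn-in before summarizing the posterior (the cutoff of 2,000 iterations is an arbitrary choice for this toy problem):

```python
rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)        # synthetic data with true mean 2
chain = metropolis(lambda t: log_unnormalized_posterior(t, data),
                   x0=0.0, n_samples=20_000, step=0.5)
burn_in = 2_000                             # discard pre-convergence iterations
posterior_samples = chain[burn_in:]
print(posterior_samples.mean(), posterior_samples.std())
```

In practice the proposal scale (`step` above) is usually tuned so that a moderate fraction of proposals is accepted: too small a step explores the space slowly, while too large a step is rarely accepted.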
