$$p(\theta|D) = \frac{p(D|\theta) p(\theta)}{\int p(D|\theta)p(\theta) d\theta}$$
via Monte Carlo. The Metropolis algorithm, first discussed in Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953), allows us to sample from any distribution $p(x)$ provided we can compute a function $Cp(x)$ proportional to $p(x)$, even if the constant $C$ is unknown. In our case, this means we can sample from the posterior without evaluating the integral in the denominator.
The algorithm
We start by defining a transition probability
$$M(x,x') = q(x,x')\min\left(\frac{p(x')}{p(x)}, 1\right)$$
where $q$ is an arbitrary symmetric proposal distribution, i.e. $q(x,x') = q(x',x)$. In practice, to create a chain of samples based on $M$ we
- Sample a candidate $x'$ from $q(x, \cdot)$
- Accept the move to $x'$ if $p(x') \geq p(x)$; otherwise accept it with probability $p(x')/p(x)$, remaining at $x$ if the move is rejected. Note that only the ratio $p(x')/p(x)$ is needed, so knowing $Cp(x)$ is enough (see the sketch below).
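As a concrete illustration, here is a minimal sketch in Python of the steps above, using a Gaussian random-walk proposal; the function name `metropolis`, the example target, and the step size are illustrative choices, not part of the original text.

```python
import numpy as np

def metropolis(unnorm_p, x0, n_samples, step=1.0, rng=None):
    """Random-walk Metropolis sampler (illustrative sketch).

    unnorm_p  : callable returning C*p(x), the unnormalized target density
    x0        : starting point of the chain
    n_samples : number of iterations to run
    step      : standard deviation of the symmetric Gaussian proposal
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    px = unnorm_p(x)
    chain = np.empty(n_samples)
    for i in range(n_samples):
        # Propose x' from the symmetric proposal q(x, .) = Normal(x, step^2)
        x_new = x + step * rng.standard_normal()
        px_new = unnorm_p(x_new)
        # Accept with probability min(p(x')/p(x), 1); only the ratio is needed,
        # so the unknown normalizing constant C cancels.
        if rng.random() < px_new / px:
            x, px = x_new, px_new
        chain[i] = x
    return chain

# Example: sample from a standard normal via its unnormalized density exp(-x^2 / 2).
samples = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0, n_samples=10_000)
```

Running this and discarding an initial burn-in segment leaves approximate draws from the target.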
As the number of iterations increases, the probability of finding the chain at $x$ converges to $p(x)$; further details can be found in Robert and Casella (2004). Intuitively, the convergence happens because $p$ is an eigenfunction of $M$ with eigenvalue $1$.
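To spell this intuition out (a sketch, relying on the symmetric-proposal assumption above), note that $M$ satisfies detailed balance with respect to $p$: for $x \neq x'$,
$$p(x)\,M(x,x') = q(x,x')\,\min\big(p(x'),\,p(x)\big) = p(x')\,M(x',x)$$
Together with the probability of remaining at $x$ when a proposal is rejected, this implies that $p$ is left invariant by the full transition kernel of the chain.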
Practical issues
Although the algorithm works in principle with any non-degenerate proposal distribution $q$, the optimal choice minimizes the burn-in period, i.e. the number of initial iterations we discard before considering the chain to have 'converged' to the target distribution.
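As a quick illustration of how much the proposal scale matters, the following sketch reuses the hypothetical `metropolis` function from above and compares a too-small, a moderate, and a too-large Gaussian step size (the specific values are arbitrary assumptions):

```python
import numpy as np

# Reuses the `metropolis` sketch defined earlier; target is an unnormalized standard normal.
target = lambda x: np.exp(-0.5 * x**2)

for step in (0.01, 1.0, 100.0):
    chain = metropolis(target, x0=10.0, n_samples=5_000, step=step)
    burn_in = 1_000                               # discard the initial transient
    kept = chain[burn_in:]
    accept_rate = np.mean(np.diff(chain) != 0)    # fraction of proposals accepted
    print(f"step={step:>6}: acceptance={accept_rate:.2f}, "
          f"mean={kept.mean():+.2f}, std={kept.std():.2f}")
```

A tiny step accepts almost every proposal but drifts away from the starting point very slowly, while a huge step is almost always rejected; both inflate the burn-in period.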