Introduction to Bayesian Statistics, Part 3

The two most popular Markov chain Monte Carlo (MCMC) sampling algorithms are Gibbs sampling and Metropolis-Hastings. Both produce Markov chains: sequences in which each value depends only on the value immediately before it. In the context of sampling, this means that whether we accept a proposed value depends only on how its probability compares with that of the current value, and not on any earlier values in the chain.

Metropolis-Hastings

The Metropolis-Hastings sampling algorithm first draws a proposed value from the proposal distribution, which is centered at the current value (the initial guess, on the first iteration). The probability ratio is then calculated: the value of the posterior probability distribution at the proposed value, divided by its value at the current value.

Finally, a random number is drawn from a uniform distribution between 0 and 1. If this number is greater than the probability ratio, the proposed value is rejected and the chain stays at the current value. Otherwise the proposed value is accepted and becomes the new current value.
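The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a symmetric Normal proposal (so the ratio needs no extra correction), it works with log probabilities to avoid numerical underflow, and the names `metropolis_hastings`, `log_target` and `proposal_sd` are made up for this example. The target here is a standard Normal, chosen only because the answer is easy to check.

```python
import math
import random


def metropolis_hastings(log_target, initial, n_samples, proposal_sd=1.0, seed=0):
    """Random-walk Metropolis sampler with a symmetric Normal proposal.

    log_target: function returning the log of the (unnormalized) target density.
    Returns the list of samples and the fraction of proposals accepted.
    """
    rng = random.Random(seed)
    current = initial
    current_logp = log_target(current)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Step 1: propose a value from a Normal centered at the current value.
        proposed = rng.gauss(current, proposal_sd)
        proposed_logp = log_target(proposed)
        # Step 2: probability ratio (proposed / current), on the log scale.
        log_ratio = proposed_logp - current_logp
        # Step 3: draw u from Uniform(0, 1]; reject if u is greater than the ratio.
        u = 1.0 - rng.random()  # shifts [0, 1) to (0, 1] so log(u) is defined
        if math.log(u) <= log_ratio:
            current, current_logp = proposed, proposed_logp
            accepted += 1
        # On rejection the chain repeats the current value.
        samples.append(current)
    return samples, accepted / n_samples


def log_standard_normal(x):
    # Log density of a standard Normal, up to an additive constant.
    return -0.5 * x * x


# Start deliberately far from the bulk of the target to show the chain moving in.
samples, acceptance_rate = metropolis_hastings(log_standard_normal, initial=5.0, n_samples=20000)
kept = samples[1000:]  # drop early draws taken before the chain settled
print("mean of kept samples:", sum(kept) / len(kept))
print("acceptance rate:", acceptance_rate)
```

Because the ratio only ever compares the proposed value with the current one, any constant that multiplies the target cancels out, which is why the unnormalized density is enough.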

Gibbs sampling

Standard Metropolis-Hastings requires the joint posterior probability distribution, that is, the priors times the likelihood. For situations with multiple parameters (e.g., estimating a mean and a standard deviation), it might not be feasible to sample from the full posterior distribution (or even to describe it nicely mathematically).

In this case, we use Gibbs sampling. Instead of sampling from the full posterior, Gibbs samples each parameter in turn from its full conditional distribution: the distribution of that parameter given the data and the current values of all the other parameters.


Take the Normal distribution as an example. It has two parameters: the mean (mu) and the standard deviation (sigma). Each parameter needs its own prior, and both parameters will be in the likelihood expression.

The full conditional for the mean is proportional to the prior of the mean multiplied by the Normal likelihood. In this case we can drop the first term of the likelihood, the normalizing factor that contains only sigma, because it does not involve the mean and so is just a constant in the full conditional.
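Written out, and assuming the likelihood is for n independent observations x_1, ..., x_n (the symbols x_i, n, mu and sigma are my notation for this sketch), the two terms and the resulting full conditional would look something like this:

```latex
% Normal likelihood, split into the two terms discussed in the text
p(x \mid \mu, \sigma)
  = \underbrace{\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}}_{\text{first term: only } \sigma}
    \;\underbrace{\exp\!\left(-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2}\right)}_{\text{second term: } \mu \text{ and } \sigma}

% Full conditional for the mean: the first term is constant in mu and drops out
p(\mu \mid \sigma, x)
  \;\propto\; p(\mu)\,\exp\!\left(-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2}\right)
```

Here p(mu) stands for whatever prior is chosen for the mean. The first factor is absorbed into the proportionality constant, while sigma still appears inside the exponential.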

We can’t easily remove sigma from the second term, because sigma and the mean appear in it together. Being unable to derive a full conditional that contains only one parameter is fairly common. In this case, we simply plug in the current value of sigma when sampling the mean.
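To make the back-and-forth concrete, here is a minimal Gibbs sampler for the mean and standard deviation of Normal data. It rests on assumptions that are mine, not part of the discussion above: a Normal prior on the mean (hypothetical parameters `mu0`, `tau0`) and an Inverse-Gamma prior on the variance (hypothetical parameters `a`, `b`). These were picked because they give full conditionals that can be sampled directly. The function name `gibbs_normal` and the simulated data are also just for illustration.

```python
import math
import random


def gibbs_normal(data, n_samples, mu0=0.0, tau0=10.0, a=2.0, b=2.0, seed=0):
    """Gibbs sampler for Normal data with unknown mean and standard deviation.

    Assumed priors: mu ~ Normal(mu0, tau0^2), sigma^2 ~ Inverse-Gamma(a, b).
    Returns a list of (mu, sigma) draws.
    """
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    mu = total / n   # starting value for the mean
    sigma2 = 1.0     # starting value for the variance
    draws = []
    for _ in range(n_samples):
        # Full conditional for mu, using the CURRENT value of sigma^2: a Normal.
        precision = 1.0 / tau0 ** 2 + n / sigma2
        cond_mean = (mu0 / tau0 ** 2 + total / sigma2) / precision
        mu = rng.gauss(cond_mean, math.sqrt(1.0 / precision))

        # Full conditional for sigma^2, using the mu just drawn: an Inverse-Gamma.
        shape = a + n / 2.0
        rate = b + 0.5 * sum((x - mu) ** 2 for x in data)
        # If G ~ Gamma(shape, scale=1/rate), then 1/G ~ Inverse-Gamma(shape, rate).
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)

        draws.append((mu, math.sqrt(sigma2)))
    return draws


# Simulated data with a known mean of 5 and standard deviation of 2.
data_rng = random.Random(1)
data = [data_rng.gauss(5.0, 2.0) for _ in range(200)]

draws = gibbs_normal(data, n_samples=5000)
kept = draws[500:]  # drop early draws
mu_mean = sum(d[0] for d in kept) / len(kept)
sigma_mean = sum(d[1] for d in kept) / len(kept)
print("posterior mean of mu:", mu_mean)
print("posterior mean of sigma:", sigma_mean)
```

Note that there is no accept or reject step here: every draw comes straight from a full conditional, and each one uses the most recent value of the other parameter.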

Which to use?

If your model involves multiple parameters and you can’t simplify the posterior into a single probability distribution, you should use Gibbs.

Next week in Part 4 I’ll walk through some R code and interpretation of output.
