Statistics

Probability distributions that aren’t Normal

Many people are aware of the normal distribution or “bell curve”. What are some other probability distributions and when are they useful?


You can think of a probability distribution as a record of how often each possible outcome happens. For example, how many students get each grade (70%, 73%, 94%, etc.). We can visualize this type of information using a bar graph or histogram.

[Figure: histogram of example data]
In this histogram, 15 of the data values fell between 0.20 and 0.21.
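To make the idea of counting values per bin concrete, here is a minimal sketch in Python. The data are made up (random draws, not the data behind the figure), and the helper name `histogram_counts` and the bin width of 0.01 are my own choices for illustration.

```python
import math
import random
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each bin of width bin_width.

    Bin k covers the range from k * bin_width up to (k + 1) * bin_width.
    """
    counts = Counter()
    for v in values:
        counts[math.floor(v / bin_width)] += 1
    return counts

random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(0.25, 0.05) for _ in range(500)]  # made-up data
counts = histogram_counts(data, 0.01)

# Print a rough text histogram, one row per bin.
for k in sorted(counts):
    print(f"{k * 0.01:.2f} to {(k + 1) * 0.01:.2f}: {'#' * counts[k]}")
```

Each row of `#` marks plays the role of one bar in the histogram.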

The normal distribution is nice because it’s symmetrical. However, many things are not symmetrical. Imagine the number of people in each income range. Since you can’t make less than $0 there is a lower boundary, but there is no upper boundary. This makes the distribution skewed.

[Figure: distribution of annual household income in the United States, 2010]

You could describe this as a skewed bell curve, but the normal distribution is symmetrical by definition, so it cannot capture this shape. Other probability distributions have shapes better suited to data that are not symmetrical.

[Figure: binomial distribution probability mass function]

Binomial Distribution

This is probably the second most popular distribution. The binomial distribution counts how many times something occurs in a fixed number of independent tries that each have the same chance of success, such as the number of heads in ten coin flips or the number of sixes in ten rolls of a die. The binomial distribution is discrete, meaning it only takes whole-number values.
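As a small worked example, the binomial probabilities for ten flips of a fair coin can be computed directly from the standard formula, C(n, k) p^k (1 - p)^(n - k). The function name `binomial_pmf` is just a label I picked for this sketch.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # ten flips of a fair coin
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 4))
```

The probabilities are only defined at the whole numbers 0 through 10, and they add up to 1.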

[Figure: gamma distribution probability density function]

Gamma Distribution 

This distribution would be a better fit for income data: it is only defined for positive values and, with certain parameters, it can be very skewed. Because it is so flexible, it is often used in Bayesian statistics to define a prior distribution. See my introduction to Bayesian statistics here.
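A quick way to see the skew is to draw samples from a gamma distribution and compare the mean with the median. This is only a sketch: the shape and scale values below are invented for illustration and are not fitted to real income data, and `gamma_sample` is a hypothetical helper name.

```python
import random
import statistics

def gamma_sample(shape, scale, size, seed=0):
    """Draw `size` values from a gamma distribution (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gammavariate(shape, scale) for _ in range(size)]

# Assumed parameters, chosen only to give a right-skewed shape.
incomes = gamma_sample(shape=2.0, scale=25000.0, size=10000)

print("smallest:", round(min(incomes)))
print("median:  ", round(statistics.median(incomes)))
print("mean:    ", round(statistics.mean(incomes)))
```

In a right-skewed sample like this one, the long upper tail pulls the mean above the median, and no value falls below zero.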

[Figure: beta distribution probability density function]

Beta Distribution

The beta distribution is similar to the gamma in that it is very flexible, but it is only defined between 0 and 1. It is useful in Bayesian statistics when you are looking at binomial data, since the chance that something occurs is always between 0 and 1.
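One reason the beta distribution pairs so well with binomial data is that updating it is simple arithmetic: a Beta(a, b) prior combined with s successes and f failures gives a Beta(a + s, b + f) posterior. Here is a minimal sketch, assuming a flat Beta(1, 1) prior and made-up coin-flip counts. The function names are my own.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior beta parameters after observing binomial data."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (1, 1)  # Beta(1, 1) is flat between 0 and 1 (assumed prior)
posterior = update_beta(*prior, successes=7, failures=3)  # made-up data

print("posterior parameters:", posterior)
print("posterior mean:", round(beta_mean(*posterior), 3))
```

The posterior mean always stays between 0 and 1, which is what you want for an estimate of a probability.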

 

[Figure: Cauchy distribution probability density function]

Cauchy Distribution

The Cauchy distribution is my favorite because its mean does not exist! If your data came from a Cauchy distribution, the sample average would not settle toward any fixed value, no matter how much data you collected. Its heavy tails make extreme values common, so data like this are highly variable. Chemical chromatograms are like this, since so many variables influence the data.
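You can watch this happen with a small simulation that tracks the running average of normal draws next to the running average of Cauchy draws. This is a sketch rather than a proof. The Cauchy values are generated with the standard inverse-CDF trick, tan(pi * (u - 0.5)) for uniform u, and the helper names are my own.

```python
import math
import random

def cauchy_sample(rng):
    """One draw from a standard Cauchy distribution (inverse-CDF method)."""
    return math.tan(math.pi * (rng.random() - 0.5))

def running_means(draws):
    """Average of the first 1, 2, 3, ... values in `draws`."""
    means, total = [], 0.0
    for i, x in enumerate(draws, start=1):
        total += x
        means.append(total / i)
    return means

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 100000
normal_means = running_means(rng.gauss(0, 1) for _ in range(n))
cauchy_means = running_means(cauchy_sample(rng) for _ in range(n))

# Columns: sample size, running mean (normal), running mean (Cauchy).
for i in (100, 1000, 10000, 100000):
    print(i, round(normal_means[i - 1], 3), round(cauchy_means[i - 1], 3))
```

The normal column should shrink toward 0 as the sample grows. The Cauchy column typically keeps jumping whenever an extreme value lands, however many draws you add.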

[Figure: base peak chromatogram]
A chromatogram. Each peak is a different substance that comes out in order. The peak height, retention time, and even the order can change depending on many variables, including time of day or room temperature.

 
