Normal Distribution Curve

CellPhone_BellCurve_Normal_Distribution_Big_Data

Community posts are submitted by members of the Big Data Community and span a range of themes. If you would like to contribute to the blog, just register to join the community.

In the orange juice problem, you’re trying to blend juices to take out variability, so is it reasonable to expect that the product you’re getting from your suppliers In the orange juice problem, you’re trying to blend juices to take out variability, so is it reasonable to expect that the product you’re getting from your suppliers won’t have variable specs?

Chances are that shipment of Biondo Commune orange juice you’re getting from Egypt won’t have an exact 13 Brix/Acid ratio. That may be the expected number, but there’s probably some give around it. And oftentimes, that wiggle room can be characterized using a probability distribution.

A probability distribution, loosely speaking, gives a likelihood to each possible outcome of some situation, and all the probabilities add up to 1. Perhaps the most famous and widely used distribution is the normal distribution, otherwise known as the “bell curve.” The reason why the bell curve crops up a lot is because when you have a bunch of independent, complex, real-world factors added together that produce randomly distributed data, that data will often be distributed in a normal or bell-like way. This is called the central limit theorem.

To see this, let’s do a little experiment. Pull out your cell phone and grab the last four digits of each of your saved contacts’ phone numbers. Digit one will probably be uniformly distributed between 0 and 9, meaning each of those digits will show up roughly the same amount. Same goes for digits 2, 3, and 4.

Now, let’s take these four “random variables” and sum them. The lowest number you could get is 0 (0 + 0 + 0 + 0). The highest is 36 (9 + 9 + 9 + 9). There’s only one way to get 0 and 36. There are four ways to get 1 and four ways to get 35, but there’s a ton of ways to get 20. So if you did this to enough phone numbers and graphed a bar chart of the various sums, you’d have a bell curve that looks a little like the above.

So what do you think? Pretty juicy, eh?

Read more and see other examples from Chapter 4 of John W. Foreman’s book, Data Smart: Using Data Science to Transform Information into Insight.

Want to add your own blog to the Big Data Week community? Just register to get started.