Perhaps the most famous and widely used distribution is the normal distribution, otherwise known as the “bell curve.” The bell curve crops up so often because when many independent, complex, real-world factors are added together, the resulting data will often be distributed in a normal, bell-like way. This tendency is described by the central limit theorem.
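A small simulation can make the paragraph above concrete. The sketch below is only an illustration under stated assumptions: each "factor" is modeled as an independent uniform random number, and the function names, the count of 30 factors, and the 20,000 samples are all hypothetical choices, not anything taken from the text. It adds the factors together and checks how much of the data falls within one and two standard deviations of the mean.

```python
import random
import statistics


def simulate_sums(n_factors=30, n_samples=20000, seed=0):
    """Return n_samples values, each the sum of n_factors independent draws.

    Assumption: every factor is uniform on [0, 1). Real-world factors
    would differ; this is only a stand-in for "many independent factors".
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(n_factors))
        for _ in range(n_samples)
    ]


def fraction_within(data, k):
    """Fraction of data lying within k standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


sums = simulate_sums()

# For a normal distribution these fractions are about 0.68 and 0.95.
print(round(fraction_within(sums, 1), 3))
print(round(fraction_within(sums, 2), 3))
```

Even though no single uniform factor looks anything like a bell curve, the two printed fractions should land close to the familiar 68% and 95% figures of the normal distribution. Raising `n_factors` generally tightens the match, while setting it to 1 or 2 shows how far from bell-shaped the sum of only a few factors can be.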
Data science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions.
When you were a child, perhaps there came a day when someone explained to you that Santa Claus didn’t exist, outside of men with bad rosacea dressed up at the mall. Well, today I’m going to shatter another belief: your not-from-concentrate premium orange juice was not hand-squeezed. In fact, the pulp in it is probably from different oranges than the juice, and the juice has been pulled from different vats and blended according to mathematical models to ensure that each carafe you drink tastes the same as the last.
In this modern (Big Data), connected (smartphones), advertising-ridden (Facebook, Twitter, Google) world, where the Internet of Things means devices from your watch and fridge to your thermostat and houseplants spit out data, the challenge of bringing data together to a valuable end will become increasingly difficult unless this data is consilient. Here I explain what consilience means for data and discuss its benefits.