Under certain conditions, people are gullible, such as when they are shown persuasive statistical evidence of something. I don’t know whether statistics make people gullible or whether gullible people misinterpret statistical evidence.

The correlation coefficient is a statistical measure of how well a straight line describes the relationship between two data sets, a kind of “goodness of fit.” How closely does data set A track data set B? A correlation coefficient of +1 means they move in perfect lockstep. If one is above its average at some point, then the other is above its own average by a proportional amount.

At -1 they are perfectly inversely related. When one goes up, the other goes down by a proportional amount.

At 0, there is no linear relationship. Knowing that one went up tells you nothing about which way the other went.

In some fields a correlation of two groups of data between .3 and .5 is considered moderate and anything over .8 is considered very strong.
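For anyone who wants to see those values come out of actual numbers, here is a minimal sketch of the usual (Pearson) correlation coefficient in plain Python. The function name `pearson_r` and the toy data sets are made up for illustration; they are not taken from any of the sources mentioned in this post.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two series vary together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical toy data, chosen so the results are easy to check by hand.
a = [1, 2, 3, 4, 5]
lockstep = [10 + 3 * x for x in a]   # rises whenever a rises
inverse = [50 - 2 * x for x in a]    # falls whenever a rises
u_shaped = [(x - 3) ** 2 for x in a] # depends on a, but not along a line

print(pearson_r(a, lockstep))   # perfect positive correlation
print(pearson_r(a, inverse))    # perfect negative correlation
print(pearson_r(a, u_shaped))   # no linear relationship
```

The last series is worth a second look. It is completely determined by `a`, yet the coefficient comes out at zero, because the coefficient only picks up straight-line relationships.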

The problems arise when we assume that the strength of a correlation says something about causation. Assuming that correlation shows causation is a big, as in BIG, mistake.

The most you could say about highly correlated data sets is that perhaps there is a reason to look deeper. Perhaps there is a common motivator.

I know people who would argue that you cannot get very high correlations without some driving force. These people are “probability challenged.” Even very rare, and certainly unconnected, events can occur together. Compare enough unrelated data sets and some pair will line up by chance alone. In theory, any correlation that is not prohibited will eventually occur.
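You can check the chance argument yourself with a small simulation. The sketch below is only an illustration under assumptions of my own: 200 made-up series, each a 10-point random walk generated independently of the others, and a fixed seed so the run is repeatable. It compares every pair and reports the strongest correlation it finds. All the names in it (`random_walk`, `strongest_chance_pair`) are hypothetical.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


def random_walk(rng, length):
    """One made-up 'yearly statistic': each value drifts randomly from the last."""
    level = 0.0
    walk = []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


def strongest_chance_pair(num_series=200, length=10, seed=0):
    """Generate unrelated series and return the most correlated pair and its r."""
    rng = random.Random(seed)
    series = [random_walk(rng, length) for _ in range(num_series)]
    # Compare every series with every other one and keep the largest |r|.
    best = max(
        itertools.combinations(range(num_series), 2),
        key=lambda pair: abs(pearson_r(series[pair[0]], series[pair[1]])),
    )
    return best, pearson_r(series[best[0]], series[best[1]])


pair, r = strongest_chance_pair()
print(f"series {pair[0]} and {pair[1]} have correlation {r:.3f}")
```

Nothing connects any of these series, yet with nearly 20,000 pairs to choose from, the best match looks very strong. Short series that drift over time, which is what most ten-year statistics are, make this especially easy.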

As a test, suppose I could find two data sets with a correlation coefficient of .993. Very strongly correlated. Nearly perfect. There must be something that drives them to move together. Wrong!

- For the 10 years from 2000 to 2009, that .993 is the correlation of the divorce rate in Maine with US per capita consumption of margarine.
- Maybe you can find the common driving force of a .947 correlation between per capita consumption of cheese and the number of people who die by becoming tangled in their bed sheets. The correlation between such deaths and skiing facility revenue is higher still.
- There is a weaker but still strong correlation of .666 (curious coincidence) between the number of people who die by falling into a swimming pool and the number of films where Nicolas Cage appears.
- The number of lawyers in Ohio is correlated (.51) with banana prices and very strongly negatively correlated (-.89) with per capita consumption of high fructose corn syrup.

Clearly statistical correlation “evidence” is not to be taken as conclusive.

You will be wise to take any statistic with a dose of skepticism. As you know, 42.3% of all published statistics have been manufactured on the spot. 🙂

Statistics can help you focus your attention, but they do not provide proof of anything. Be cautious.

The correlation statistics above come from Spurious Correlations at tylervigen.com. If you were not a statistical skeptic before you go there, you will be when you leave.

