Last week we looked at averages and saw how they can be abused. This week we turn our attention to correlation and causation. Correlations are probably the most abused statistical entity, particularly by pseudo-science and disreputable media.
Let's be curious and ask 'correlation or causation?'.
What is a correlation?
A correlation is a relationship between two sets of data, whereby as one changes the other changes. A positive correlation means that when one set increases, the other increases. A negative correlation means when one set increases, the other decreases.
What is causation?
Causation is where one factor causes a change to another. There is cause and effect. An increase in one thing causes the other to also increase (positive causation), or decrease (negative causation).
What's the issue?
Those definitions are solid, with seemingly little room to go wrong. A correlation is a relationship; causation means one thing causes another. But a correlation does not imply causation. Correlations happen all the time between essentially disparate data. If you look hard enough you are likely to find one. As a bit of fun I went to gov.uk's statistical data sets and picked two completely unrelated data sets: museum and gallery visits, and driving offences. I put the data into Excel and graphed it:
There is a strong negative correlation there (the most negative correlation possible is -1, and this got -0.937). As the visits to the Natural History Museum increase, the total number of driving offences (defined as motor vehicle offences including a finding of guilt at court, fixed penalty notice, and/or written warning) decreases. It appears that as society gets more cultured it also gets safer at driving!
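If you want to see where a number like -0.937 comes from, here is a minimal sketch of the usual (Pearson) correlation coefficient in Python. I am assuming that is the calculation Excel performed. The figures below are made up purely for illustration and are not the gov.uk data, and the names `pearson_r`, `museum_visits` and `driving_offences` are my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two series move together around their means
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series moves on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical yearly figures, invented for this example (NOT the gov.uk data)
museum_visits = [3.2, 3.6, 3.8, 4.1, 4.6, 4.9]      # millions of visits
driving_offences = [5.1, 4.9, 4.4, 4.3, 3.8, 3.5]   # millions of offences

r = pearson_r(museum_visits, driving_offences)
print(f"r = {r:.3f}")  # a value close to -1 means a strong negative correlation
```

Notice that nothing in the calculation knows or cares what the numbers represent. It only measures how closely the two series move together, which is exactly why it cannot tell you whether one causes the other.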
Of course it would be absurd to suggest that there is a causal link between visits to the Natural History Museum and the total number of driving offences. There is no denying the correlation in the data presented above. But there is no causality. The visits to the museum do not cause fewer driving offences. You could just as easily state that as we are spending less time dealing with court cases related to our bad driving, we have more time to spend at museums.
I'm interested to hear your theories (sensible and wacky) on this correlation.
EDIT: Although there isn't always a causal link between two sets of data, there can be a shared causal link to a third phenomenon (thanks to Izzy Rill for the comment). As suggested above it could be that society is getting more cultured and cultured drivers are safer, or as Izzy suggested the roads leading to the museum could have been strewn with traps causing driving offences and putting off visitors! There is not always a common link, either.
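To make the "third phenomenon" idea concrete, here is a small simulation sketch in Python. Everything in it is an assumption for illustration: a hidden factor `z` pushes one series up and the other down, and the two series never influence each other directly. The helper names and the noise levels are hypothetical choices of mine.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residuals(ys, zs):
    """What is left of ys after subtracting a straight-line fit on zs."""
    mz, my = sum(zs) / len(zs), sum(ys) / len(ys)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000

# Hidden common cause (hypothetical), e.g. some broad trend in society
z = [rng.gauss(0, 1) for _ in range(n)]
# x rises with z, y falls with z; x and y never affect each other
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [-zi + rng.gauss(0, 0.5) for zi in z]

r_raw = pearson_r(x, y)
# Correlate only what z cannot explain in each series
r_adjusted = pearson_r(residuals(x, z), residuals(y, z))

print(f"correlation of x and y:          {r_raw:.2f}")
print(f"after accounting for z (approx): {r_adjusted:.2f}")
```

The raw correlation comes out strongly negative, yet it all but disappears once the hidden factor is accounted for. In real life, of course, you rarely get to measure the hidden factor so neatly, which is the whole problem.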
Is this dangerous?
The above example is amusing. There is no risk of death by having fun with that graph. However, things can take a very sinister turn when certain parties get hold of data presenting a strong correlation and wrongly conclude there is causation.
The biggest example of this is, of course, the claim that the MMR vaccine causes autism. There is a correlation:
But, as we have seen, a correlation does not imply causation. Epidemiological studies (ones that study a large group of the population) have shown no link between the MMR vaccine and autism.
So, what could explain the correlation? Some correlations are mere coincidences, some have explanations. One school of thought is that autism diagnoses became more common around the time of the MMR vaccine roll out. As autism became more and more understood, more and more people were diagnosed with it. Autism is also one of those tricky subjects. Many people claim that people are mis-diagnosed with autism simply as a way of explaining a child's behaviour. I am not an expert in autism, so I won't pass any view on this, but I am certain that though there is a correlation there is no causation between the vaccine and autism.
It is very dangerous to jump to conclusions when it comes to vaccines. The MMR vaccine practically wiped out measles, mumps and rubella, but they are on the rise again as fewer and fewer parents are taking their children to be vaccinated. I understand the concern. If the media is scaremongering about a vaccine causing autism, you can't blame parents for not wanting to give it to their children. That is why it is important to understand the difference between correlation and causation, and to look at all sides of every argument.
The risk of not having the MMR vaccine is great. The evidence supporting the vaccine-autism causation is not.
Curious Correlation
Searching for 'correlation' and 'causation' in Google's Ngram Viewer presents this interesting correlation:
Did you find any interesting correlations in the gov.uk data sets? Any theories on them? How about in the Google Ngrams? Let me know in the comments below. As always you can share this post using the tools to the left and below the post. You can follow It Is All Science using the buttons on the right.
Remember, it is all science. So, let's be curious.