I'd love to be able to recognise the creator of this graph, but it's popped up in so many places I'm not sure whose work it actually is. Still, we can all see from the graph quite clearly that the more chocolate each person in a country consumes, the more Nobel Prizes that country has received over the years.
More importantly, how well these two items are related can be measured relatively easily, using something called correlation. The wonderful person who created this graph has in fact also measured the correlation between chocolate consumption and Nobel Prizes awarded, and you can see it in the top left corner of the graph, where it says r=0.791.
Now, there are a large number of ways in which correlation can be calculated, depending on what sort of data you have, but what they all measure fundamentally is how close your data is to a straight line. If all your points are on an upward-sloping straight line, your correlation, or r, will be equal to 1. If the points are spread out randomly, it will be close to zero. If the line is pointing downwards, the r value will actually be negative, but r=-1 is just as strong a relationship as r=+1. A value of r=0.791 is pretty reasonable - the two variables are said to be highly correlated.
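To make that concrete, here is a minimal sketch of one common way of calculating it, the Pearson correlation, in plain Python. The function name `pearson_r` and the little data sets are made up for illustration; they are not taken from the chocolate graph.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, and how much each moves on its own.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return co_movement / math.sqrt(spread_x * spread_y)

# Made-up illustrative data, not read from any graph.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on an upward line
falling = [10 - 3 * x for x in xs]  # points exactly on a downward line
no_trend = [2, 1, 3, 1, 2]          # no upward or downward tendency

r_rising = pearson_r(xs, rising)      # 1.0
r_falling = pearson_r(xs, falling)    # -1.0
r_no_trend = pearson_r(xs, no_trend)  # 0.0
print(r_rising, r_falling, r_no_trend)
```

The sign only tells you which way the line slopes; how close r is to 1 or -1 tells you how tightly the points hug the line.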
Now, in no way does this imply causality. A country could not hand out copious amounts of free chocolate to its school kids or population at large, and expect to start receiving Nobel Prizes left, right and centre. The pattern in the data above could, in fact, be pure coincidence. Or, there could be some other, underlying, relationship. Perhaps people who receive Nobel Prizes tend to come from wealthy countries, countries where people can afford to eat chocolate at will. Sometimes looking at the outliers can tell you something as useful as looking at the data points in the trend itself. In this case, what's going wrong in Germany? All that chocolate eating, and yet it's substandard when it comes to Nobel Prizes? Sweden does alright though, plenty of Nobel Prizes, without all the investment in sweets.
All this thinking about correlation brought to mind a graph that floated around the twitterverse earlier this year, purporting to prove that forcing an economy like Greece to reform and consolidate quickly (i.e. austerity measures) will only lead to a worse outcome at the end of the day. Here it is below, again unattributed (but happy to correct if the author lets me know).
Looks pretty convincing, doesn't it? Nice straight line, lots of points on the line, including, down at the bottom, Greece, trying to consolidate faster than the rest, and getting it more wrong than anyone else (the x axis is attempted reduction in spending, the y axis is how wrong they got their growth predictions).
Using a spreadsheet like Excel, it's pretty easy to get a basic value for correlation here (it doesn't really matter what sort for the moment), using the =CORREL(x values, y values) formula. To get the values off the graph, I simply read off the axes, to the nearest half value (not particularly precise, but good enough for our purposes).
Using my rough and ready measurements, I calculated a correlation of r=-0.68! Ignoring the sign, that is weaker than the r=0.791 for Nobel Prizes and chocolate consumption.
Remember I said sometimes it is instructive to look at the outliers in any data set? I redid the calculations, taking out Greece, and got a new value of r=-0.5. Borderline low correlation. Much weaker than for Nobel Prizes and chocolate.
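The same kind of before-and-after check can be sketched in a few lines of Python. To be clear, the numbers below are entirely made up to show the effect of a single extreme point; they are not the values I read off the austerity graph, and the names `pearson_r`, `points` and `outlier` are just for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical (x, y) pairs: a loose cluster plus one point far from the rest.
points = [(0, 1), (1, -1), (1, 1), (2, 0), (2, -1), (3, 0), (3, -1)]
outlier = (7, -7)

all_points = points + [outlier]
r_with = pearson_r([x for x, _ in all_points], [y for _, y in all_points])
r_without = pearson_r([x for x, _ in points], [y for _, y in points])

print(f"r with the outlier:    {r_with:.2f}")
print(f"r without the outlier: {r_without:.2f}")
```

With these invented numbers, the one extreme point makes the relationship look much stronger than the remaining points do on their own.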
Hmm. What does this mean for all those economists claiming that austerity is not a good plan? Not a lot really. They may well be right. But this graph doesn't prove it convincingly. Nor does it disprove it, for that matter. What we can take from this little analysis is that whatever is happening in Greece is somewhat different to what is happening in the other parts of the world, given that it has such a disproportionate effect on the data. Certainly makes it an interesting place to look at, economically speaking.
For our students, it's a nice little study on the importance of looking at data critically. Graphs can be a great way to communicate information, but students need to be aware both that correlation is not causality, and of the impact of outliers.