Saturday 16 August 2014

How are Graphs Manipulated? Skewed Statistics 3

How Are Graphs Manipulated?

Graphs are useful for presenting sets of data in a visually appealing way whilst still containing all the information. Last week in the Skewed Statistics series, we looked at correlation and causation, a topic which often includes graphs. We saw then how data can be manipulated to imply causation. Graphs, too, are easily manipulated. Sometimes this is to make the data easier to read or understand. Other times this manipulation can mislead. It is important to know how graphs are used and abused.

Let's be curious and ask, 'how are graphs manipulated?'


Graph Basics

Before we look at how graphs can be manipulated, I feel it is important to go over the basics of graphs. If you are already well versed in graphs, feel free to skip this section.

A graph typically has two axes. The x-axis (or abscissa) is the horizontal axis. The y-axis (or ordinate) is the vertical axis. Some graphs have a third z-axis, at right angles to the other two. This post will focus on two-dimensional graphs using only x- and y-axes.

The x-axis will normally be the independent axis. That means it is not affected by the other data. For example, it could be the date, gender, or amount given. The y-axis, therefore, is the dependent axis, meaning it is affected by the data on the x-axis. For example, it could be the number of deer in a field (possibly dependent on time of year), average height (possibly dependent on gender), or time taken for a drug to act (possibly dependent on amount given).

Example 1 - titles, labels, and bars

Everybody knows that you should title the axes of your graph. The title should clearly state what the axis shows and what the units are. However, by leaving the titles off, one can leave the graph open to interpretation. As well as the titles saying what the axis shows, we also need labels so we know what each point relates to. Is it going up in 2s? 10s? Millions?

Let's say we have a graph investigating the number of customers visiting a shop between June and August. The shop wants to know whether its customer numbers increase throughout the summer. Here's the graph:
Unlabelled Graph
Wow. Whatever the shop is doing, it is working. August got more than twice as many customers as June. Right? Wrong. Here's what's wrong:
  1. No y-axis title (what's it showing?)
  2. No y-axis data labels (what is it going up in?)
  3. Bar graphs are notoriously easy to manipulate. The relative sizes of the bars do not correspond to a percentage increase
This is the actual data I plotted:
Labelled Graph
Labelled and Proportionate bar graph

Neither graph is ideal. In the first, the relative sizes of the bars are misleading. In the second, it is very difficult to see any change, though maybe that's for the best as there was very little change. However, you can now see the actual values. Between June and August the shop only had 10 more customers, an increase of just 1 %, a far cry from the 100+ % increase implied by the relative bar sizes.
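
To put some numbers on that, here is a rough Python sketch comparing the real percentage change with the ratio your eye reads off the bar heights. The customer counts (1000 and 1010) and the axis baseline of 992 are made-up values of mine, chosen only to match the '10 more customers, about 1 %' described above. They are not the actual data behind the graphs.

```python
def percent_change(old, new):
    """Real percentage change from old to new."""
    return (new - old) * 100 / old


def apparent_bar_ratio(old, new, baseline):
    """Ratio of the drawn bar heights when the y-axis starts at `baseline`."""
    return (new - baseline) / (old - baseline)


# Hypothetical figures: only the difference (10) and the ~1 % rise come from the post.
june, august = 1000, 1010

print(percent_change(june, august))                    # 1.0
print(apparent_bar_ratio(june, august, baseline=992))  # 2.25
print(apparent_bar_ratio(june, august, baseline=0))    # about 1.01
```

With the axis starting just below the smallest value, the August bar is drawn more than twice as tall as the June bar, even though the real change is 1 %. Start the axis at 0 and the bar ratio matches the data.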

Example 2 - minimums and maximums

Graphs don't have to start at 0. However, they do require even spacing. You can't have an axis that goes up in 2s, then 5s, then 2s again. Nor can you put a 0 in, jump up to a million, then go up in 10s thereafter. You are better off starting at a million and going up in 10s.
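
If you want to check an axis for even spacing, a small sketch like this would do it. The function name and the example tick values are my own, purely for illustration, and it assumes whole-number ticks (decimals would need a small tolerance).

```python
def is_evenly_spaced(ticks):
    """Return True if every step between neighbouring tick values is the same."""
    steps = {b - a for a, b in zip(ticks, ticks[1:])}
    return len(steps) <= 1


print(is_evenly_spaced([0, 2, 4, 9, 11]))                     # False: 2s, then 5, then 2
print(is_evenly_spaced([0, 1_000_000, 1_000_010]))            # False: jumps a million, then 10
print(is_evenly_spaced([1_000_000, 1_000_010, 1_000_020]))    # True: starts at a million, goes up in 10s
```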

This issue leads to graphs that make small differences seem large (like the example above), or large differences seem small. Look at these graphs:
Natural History Museum Graph comparison
Both graphs show the same data. The top one makes the change look large. The bottom one makes the increase seem slight. Personally, I think the top one is the more representative. There is a change of around 2.4 million visitors (reading from the line-of-best-fit), which is a 77 % increase.

The issue with the bottom one is that it starts at 0, even though there are no values below 3 million. Then it ends at 8 million, despite having no values above 6 million. All of that makes the slope look shallow and insignificant.
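
One way to see why the axis range matters is to work out how much of the plot's height the change actually fills. In this sketch the start and end values (3.1 and 5.5 million) are my own rough guesses that fit the '2.4 million, 77 %' figures above, and the tight axis range of 3 to 6 million is an assumption too.

```python
def visible_fraction(low, high, axis_min, axis_max):
    """Fraction of the plot height covered by the change from low to high."""
    return (high - low) / (axis_max - axis_min)


# Assumed values in millions of visitors; not read from the original data.
start, end = 3.1, 5.5

print(round(visible_fraction(start, end, axis_min=0, axis_max=8), 2))  # 0.3
print(round(visible_fraction(start, end, axis_min=3, axis_max=6), 2))  # 0.8
```

On the 0 to 8 million axis the rise covers under a third of the height. On the tighter axis it covers about four fifths, so the same data looks far steeper.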

Fox News bad graph
Image Source: statisticshowto.com/misleading-graphs
This is a real-life example from Fox News. Look at that: if the Bush tax cuts expired, the top tax rate would be many, many times larger. Only it wouldn't be, if you look at the actual figures: 35 % 'now', and 39.6 % after. A difference of 4.6 percentage points. That just shows how easy it is to make a graph look shocking by adjusting the minimum.
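
For the curious, here is a quick sketch of the arithmetic using the two rates quoted above. The function names are mine. It shows the gap both in percentage points and as a relative increase, neither of which is anywhere near 'many times larger'.

```python
def point_change(old_rate, new_rate):
    """Difference between two percentages, in percentage points."""
    return new_rate - old_rate


def relative_change(old_rate, new_rate):
    """Relative change between two percentages, as a percentage of the old rate."""
    return (new_rate - old_rate) / old_rate * 100


now, after = 35, 39.6  # the two tax rates quoted in the post

print(round(point_change(now, after), 1))     # 4.6
print(round(relative_change(now, after), 1))  # 13.1
```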

Example 3 - pictorial histograms

Pictorial histograms are the perennial favourite for news outlets. You can put a pretty picture of the thing you're representing, and change the size of the image to show larger numbers. The issue is that when you double the height of a picture, you also double the width, making it look 4 times bigger. Here's an example:
Bad pictorial histogram
Image Source: narragansett.k12.ri.us
The data is saying that the US produced twice as much trash in 1980 as in 1960. But what happens when you glance at it? Well, the second bin is 4x the size of the first bin. This shows just that:
Bad pictorial histogram explanation
Image Source: narragansett.k12.ri.us
Edited by Matthew Bird
For this reason, it is better to use pictographs, where each identical picture stands for a fixed amount, rather than pictorial histograms. The graph would be far clearer if the image had just shown two bins stacked on top of one another.
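
The doubling problem is easy to check with a little Python. This is only a sketch with function names of my own, and it assumes the picture is scaled by the same factor in height and width.

```python
import math


def apparent_area_ratio(scale):
    """Area ratio when a picture's height and width are both multiplied by `scale`."""
    return scale ** 2


def honest_linear_scale(value_ratio):
    """Scale for height and width that makes the picture's area match `value_ratio`."""
    return math.sqrt(value_ratio)


print(apparent_area_ratio(2))            # 4
print(round(honest_linear_scale(2), 2))  # 1.41
```

So a bin drawn twice as tall and twice as wide covers 4 times the area. If you really must resize the picture, scaling each side by about 1.41 gives a bin with twice the area.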


Have you got any other examples of badly manipulated graphs? Let me know in the comments below. As ever, you can share this post with the links to the left, and follow It Is All Science with the links to the right.

Remember, it is all science. Let's be curious.
