Causation and Lurking Variables (1 of 2)


Learning Objectives

  • Distinguish between association and causation. Identify lurking variables that may explain an observed relationship.


A common mistake when describing the relationship between two quantitative variables is to confuse association with causation. This mistake is so common that we devote this entire section to clarifying the difference.

This confusion often occurs when there is a strong relationship between the two quantitative variables. In the case of a linear relationship, people mistakenly interpret an r-value that is close to 1 or -1 as evidence that the explanatory variable causes changes in the response variable. In this case, the correct interpretation is that there is a statistical relationship between the variables, not a causal link. In other words, the explanatory variable and the response variable vary together in a predictable way. There is an association between the variables. But this should not be interpreted as a cause-and-effect relationship.
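It helps to look at how r is computed to see why a value near 1 or -1 says nothing about cause and effect. The short Python sketch below computes r from its standard definition. The function name `pearson_r` and the data values are hypothetical and are used only for illustration. The formula treats x and y identically, so swapping them gives the same r. That means r cannot tell us which variable, if either, is doing the causing.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, chosen only to illustrate the computation.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(pearson_r(x, y))  # close to 1: a strong positive linear association
print(pearson_r(y, x))  # identical value: r has no notion of "cause" and "effect"
```

A large |r| only measures how tightly the points cluster around a straight line. Whether one variable causes changes in the other has to be argued on other grounds.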

Let’s look at an example.


Fire Damage

The scatterplot below shows the relationship between the number of firefighters sent to fires (x) and the amount of damage caused by fires (y) in a certain city.

Figure: Scatterplot of the number of firefighters sent to fires (x) versus the amount of property damage in US dollars (y).

The scatterplot shows a positive association with a somewhat strong curvilinear form. An increase in the number of firefighters is associated with an increase in the damage done by the fire.

Can we conclude that the increase in firefighters causes the increase in damage? Of course not.

A third variable is at play in the background – the seriousness of the fire – and is responsible for the observed relationship. More serious fires require more firefighters and also result in more damage.

The following figure will help you visualize this situation:

Figure: Illustration of a lurking variable. The seriousness of the fire (the lurking variable) affects both the X variable (the number of firefighters) and the Y variable (the amount of damage), even though seriousness is not studied as a factor.

The seriousness of the fire is a lurking variable. A lurking variable is a variable that is not measured in the study. It is a third variable that is neither the explanatory nor the response variable, but it affects your interpretation of the relationship between the explanatory and response variables.

In our example, the lurking variable has an effect on both the explanatory and the response variables. This common effect creates the observed association between the explanatory and response variables even though there is no cause-and-effect link between them.
