The Replication Crisis in Psychology

LEARNING OBJECTIVES

  • Describe the role of replication within the scientific method
  • Describe how statistics and biases have contributed to a crisis of replication within the field of psychology

The Scientific Method

As addressed elsewhere in this course, within psychology we use the scientific method as our tool to advance our knowledge within the field. Using this method we take our current understanding of the psychological world, and from that we derive testable hypotheses about our theoretical models. We test those hypotheses by gathering data, we openly share our methodology and our data via peer-reviewed publication, and based on these data we draw conclusions about the implications for our theoretical models of psychology.

Within this model, our current understanding of psychology is built on all the published research that exists within the field. This includes historical research, and it includes the newer research that has since superseded the historical data. Those who conduct research in the field today derive their hypotheses from the aggregated body of research that has taken place in the past.

The Role of Replication

The openness of psychological research – sharing our methodology and data via publication – is a key to the effectiveness of the scientific method. This allows other psychologists to know exactly how data were gathered, and it allows them to potentially use the same methods to test new hypotheses.

In an ideal world, this openness also allows other researchers to check whether a study was valid by replication – essentially, using the same methods to see if they yield the same results. The ability to replicate allows us to hold researchers accountable for their work.

Statistics and p-values

Replication is particularly important in a field such as psychology, which relies on statistics to make sense of its data. Human beings are a very diverse group, and so any study of human psychology will yield a diverse set of results. Statistics are used to identify the hidden patterns within the noisy, diverse data of human psychology.

The challenge within statistics is that sometimes randomness doesn’t look random. For instance, if you toss a coin 100 times, then most of the time you’ll get about 50 heads and 50 tails. But if a hundred people each toss a coin a hundred times, then some of those people will get 60 heads and 40 tails, or an even more lopsided split. Some people will get stretches where heads come up 10 times in a row. The question, then, is whether these kinds of patterns happened for a reason, or happened just out of randomness. That’s where statistics come in.
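
To make this concrete, here is a minimal simulation – a sketch in Python with NumPy, with the seed and group sizes chosen purely for illustration – of a hundred people each tossing a fair coin a hundred times:

    import numpy as np

    rng = np.random.default_rng(seed=1)            # fixed seed so the sketch is repeatable
    flips = rng.integers(0, 2, size=(100, 100))    # 100 people x 100 tosses; 1 = heads
    heads_per_person = flips.sum(axis=1)

    print("Average heads per person:", heads_per_person.mean())
    print("People with 60 or more heads:", int((heads_per_person >= 60).sum()))
    print("Most lopsided person:", int(heads_per_person.max()), "heads")

Most people land near 50 heads, but a few land far enough away that their results look like a “pattern,” even though nothing but chance is at work.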

Many statistical tests are used in the field of psychology, and many of them yield results in the form of what is called a p-value. As noted previously in our course, a p-value is a statement of probability: the probability of obtaining a result at least as strong as the one observed if nothing but randomness were at work.

Such statistics are used in many fields – psychology, economics, medicine – and within each field there are generally accepted standards for p-values. Within psychology, the most common standard is “p < .05”. What this means is that there is less than a 5% probability that a result this strong would have appeared by random chance alone. We call such a result statistically significant, and, statistically speaking, this is a pretty high standard. You need a pretty clear pattern within a research sample to reach that threshold.

However, this standard contains two problems. The first is that, if your sample is large enough, you can get a statistically significant p-value for a pretty small effect. So, while we may be able to say that a result is statistically significant, it might not actually be all that interesting in the context of how we understand human psychology.
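
As a sketch of that first problem – in Python with NumPy and SciPy, using an invented effect size and invented sample sizes – the same tiny difference between two groups can move from non-significant to significant simply by collecting more data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)
    for n in (50, 500, 50_000):
        control = rng.normal(loc=0.00, scale=1.0, size=n)   # group with no treatment
        treated = rng.normal(loc=0.03, scale=1.0, size=n)   # true difference of only 0.03 standard deviations
        t_stat, p_value = stats.ttest_ind(control, treated)
        print(f"n = {n:6d} per group, p = {p_value:.4f}")

With a large enough sample, even a difference of 0.03 standard deviations will typically cross the p < .05 threshold, which is why statistical significance alone does not tell us whether an effect is large enough to matter.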

The greater problem, however, comes when we consider p-values across many studies. It’s one thing to say that there is only a 5% chance that a single study’s result came from randomness. However, when we publish thousands of studies a year, all held to the standard of p < .05, we must conclude that some nontrivial number of those studies are built around results that were, in fact, random. A 5% chance of a false positive, multiplied across thousands of studies, makes this inevitable.
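
The arithmetic behind this is simple. As a deliberately oversimplified sketch in Python – assuming a hypothetical 10,000 published studies a year and, for the sake of the example, that every tested effect is truly zero:

    studies_per_year = 10_000        # hypothetical round number of published studies
    false_positive_rate = 0.05       # the p < .05 threshold
    # assumes, for simplicity, that every tested effect is truly zero
    expected_false_positives = studies_per_year * false_positive_rate
    print(expected_false_positives)  # 500.0 studies whose "pattern" was only noise

Real literatures are a mix of true and null effects, so the actual number is unknowable, but the point stands: chance alone guarantees that some published findings are noise.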

Publication Bias and Unreported Data

However, that probability may actually underestimate the number of published studies that reflect random results that simply didn’t look random. This is because of the problem of publication bias within psychology. Publication bias takes many forms, but one of the most problematic is that psychology research journals are far more likely to publish studies that find statistically significant results than studies that fail to find them. What this means is that studies whose results are not statistically significant are very unlikely to get published. And, because researchers know this, they often do not bother to write up research and submit it for publication when they find non-significant results. Writing up research is time-consuming and hard, and doing so with almost no chance of publication is seen as a waste of time.

This has profound implications for the body of published research in psychology. Let’s say that twenty researchers are all studying the same phenomenon. Out of the twenty, one gets statistically significant results, while the other nineteen all get non-significant results. The statistically significant result was likely just a result of randomness, but because of publication bias that study’s results are far more likely to be published than are the results of the other nineteen. Thus, our knowledge of the field will be distorted.
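
A small simulation shows how this distortion plays out. In this sketch – Python with NumPy and SciPy, with all numbers invented – twenty research teams study an effect that is truly zero, and only significant results get “published”:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=7)
    published = []
    for team in range(20):
        a = rng.normal(size=40)              # control group; the true effect is zero
        b = rng.normal(size=40)              # "treatment" group; the true effect is zero
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:                         # only significant results get written up and submitted
            published.append((team, round(p, 3)))

    print("Teams whose result reaches the journals:", published)

On a typical run, about one team in twenty crosses p < .05 by chance alone; if that is the only study that reaches the journals, the published record suggests an effect where none exists.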

Publication Bias and the Replication Crisis

In an ideal scientific world, this aberrant result would then be tested by replicating the study; and if the replication failed, we would better understand whether that result really meant something.

However, replication within psychology is not an appealing task for researchers. For one thing, most researchers got into the field with the goal of learning exciting new things, not simply whiling away their time testing other people’s claims. For another, psychology journals also like to publish exciting new things, rather than tests of other people’s claims. This means there is little incentive for researchers to engage in replication – it’s a lot of work that isn’t often rewarded.

The implications of all this are profound, and largely suggest that a lot of what we think we know about human psychology may, in fact, be wrong.

It’s only in recent years that researchers have started to grapple with this reality. The term replication crisis was coined to capture the idea that the lack of replication within the field is the main driving factor behind this problem.

The Reproducibility Project

Since the field started focusing more attention on the replication crisis, some researchers have started doing replications of classic, accepted studies within psychology. The Reproducibility Project has attempted to replicate 100 studies within the field of psychology that were published with statistically significant results; they found that many of these results did not replicate well. Some did not reach statistical significance when replicated. Others reached statistical significance, but with much weaker effects than in the original study.

Note that this in itself does not mean that the original studies were bad, fraudulent, or even wrong. What it means, at its core, is that replication found results that were different from the results of the original studies, sufficiently different that we might no longer be secure in our knowledge of what those results mean. Further replication and testing in other directions might give us a better understanding of why the results were different, but that too will require time and resources.

These results have, unsurprisingly, met with some resistance, both from researchers who published the original research, and from others who believe the replication efforts have their own methodological problems.

Replication Crises in Other Fields

Within science media, psychology has received the most attention for its replication crisis. However, other fields are facing similar issues. Two Federal Reserve researchers found they were unable to replicate a large number of results from the field of economics. Researchers in medical science have found that many clinical medical results cannot be reproduced. A recent study had difficulty replicating results of important cancer studies.

Any field that is subject to publication bias, or that relies on probabilistic statistics in its research, needs replication in order to test and validate its results. Until recently, however, replication was not a priority within these fields.

An Exciting Time in Psychology!

All of this may sound like these are dark times for the field, but in fact this is an exciting time to be a psychological scientist. The replication crisis is highlighting the scientific method as a process, and how we can use that method better. Researchers and journals are exploring how to use better statistics to analyze data, and how the field might make better use of replication in the future. We are coming to a better understanding of what we know, and that’s a reality that’s exciting for any scientist. The best feature of the scientific method is that, over time, science is self-correcting. Within science the data always win, and as we make better use of data our field will grow in exciting ways.

GLOSSARY

publication bias: a number of biases that affect which scientific studies will be published in journals; a common example is the preference within journals to publish studies that find statistically significant results, rather than those that fail to find statistically significant results
replication: within psychological science, the recreation of a study using the same methods to validate whether its conclusions were correct