- Describe replication and its importance to psychology
This is a little difficult for a psychologist to ask, but here goes: when you think of a “science” which one of these is more likely to come to mind: physics or psychology?
We suspect you chose “physics” (though we don’t have the data, so maybe not!).
Despite the higher “status” of physics and chemistry in the world of science over psychology, good scientific reasoning is just as important in psychology. Valid logic, careful methodology, strong results, and empirically supported conclusions should be sought after regardless of the topic area.
We would like you to exercise your scientific reasoning using the example below. Read the passage “Watching TV is Related to Math Ability” and answer a few questions afterward.
Watching TV is Related to Math Ability
Television is often criticized for having a negative impact on our youth. Everything from aggressive behavior to obesity in children seems to be blamed on their television viewing habits. On the other hand, TV also provides us with much of our news and entertainment, and has become a major source of education for children, with shows like Sesame Street teaching children to count and say the alphabet.
Recently, researchers Ian McAtee and Leo Geraci at Harvard University did some research to examine if TV watching might have beneficial effects on cognition. The approach was fairly simple. Children between the ages of 12-14 were either asked to watch a television sitcom or do arithmetic problems, and while they were doing these activities, images of their brains were recorded using fMRI (functional magnetic resonance imaging). This technique measures the flow of blood to specific parts of the brain during performance, allowing scientists to create images of the areas that are activated during cognition.
Results revealed that similar areas of the parietal lobes were active during TV watching (the red area of the brain image on the top) and during arithmetic problem solving (the red area of the brain image on the bottom). This area of the brain has been implicated in other research as being important for abstract thought, suggesting that both TV watching and arithmetic processing may have beneficial effects on cognition. “We were somewhat surprised that TV watching would activate brain areas involved in higher-order thought processes because TV watching is typically considered a passive activity,” said McAtee. Added Geraci, “The next step is to see what specific content on the TV show led to the pattern of activation that mimicked math performance, so we need to better understand that aspect of the data. We also need to compare TV watching to other types of cognitive skills, like reading comprehension and writing.” Although this is only the beginning to this type of research, these findings certainly question the accepted wisdom that the “idiot box” is harmful to children’s cognitive functioning.
Indicate whether you agree or disagree with the following statements about the article. There are no incorrect answers.
The article was well written.
- strongly disagree
- strongly agree
The title, “Watching TV is Related to Math Ability” was a good description of the results.
- strongly disagree
- strongly agree
The scientific argument in the article made sense.
- strongly disagree
- strongly agree
It is pretty surprising to learn that watching television can improve your math ability, and the fact that we can identify the area in the brain that produces this relationship shows how far psychology has progressed as a science.
Or maybe not.
The article you just read and rated was not an account of real research. Ian McAtee and Leo Geraci are not real people and the study discussed was never conducted (as far as we know). The article was written by psychologists David McCabe and Alan Castel for a study they published in 2008. They asked people to do exactly what you just did: read this article and two others and rate them.
McCabe and Castel wondered if people’s biases about science influence the way they judge the information they read. In other words, if what you are reading looks more scientific, do you assume it is better science?
In recent years, neuroscience has impressed a lot of people as “real science,” when compared to the “soft science” of psychology. Did you notice the pictures of the brain next to the article that you just read? Do you think that picture had any influence on your evaluation of the scientific quality of the article? The brain pictures actually added no new information that was not already in the article itself other than showing you exactly where in the brain the relevant part of the parietal lobe is located. The red marks are in the same locations in both brain pictures, but we already knew that “Results revealed that similar areas in the parietal lobes were active during TV watching…and during arithmetic solving.”
McCabe & Castel Experiment
McCabe and Castel wrote three brief (fake) scientific articles that appeared to be typical reports like those you might find in a textbook or news source, all with brain activity as part of the story. In addition to the one you read (“Watching TV is related to math ability “) others had these titles: “Meditation enhances creative thought” and “Playing video games benefits attention.”
All of the articles had flawed scientific reasoning. In the “Watching TV is Related to Math Ability” article that you read, the only “result” that is reported is that a particular brain area (a part of the parietal lobe) is active when a person is watching TV and when he or she is working on math. The highlighted part of the next sentence is where the article goes too far: “This area of the brain has been implicated in other research as being important for abstract thought, suggesting that both tv watching and arithmetic processing may have beneficial effects on cognition.”
The fact that the same area of the brain is active for two different activities does not “suggest” that either one is beneficial or that there is any interesting similarity in mental or brain activity between the processes. The final part of the article goes on and on about how this supposedly surprising finding is intriguing and deserves extensive exploration.
The researchers asked 156 college students to read the three articles and rate them for how much they made sense scientifically, as well as rating the quality of the writing and the accuracy of the title.
Everybody read the same articles, but the picture that accompanied the article differed according to create three experimental conditions. For the article in the brain image condition, subjects saw one of the following brain images to the side of the article:
Graphs are a common and effective way to display results in science and other areas, but most people are so used to seeing graphs that (according to McCabe and Castel) people should be less impressed by them than by brain images. The figures below show the graphs that accompanied the three articles for the bar graph condition. The results shown in the graphs were made up by the experimenters, but what they show is consistent with the information in the article.
Finally, in the control condition, the article was presented without any accompanying figure or picture. The control condition tells us how the subjects rate the articles without any extraneous, but potentially biasing, illustrations.
Each participant read all three articles: one with a brain image, one with a bar graph, and one without any illustration (the control condition). Across all the participants, each article was presented approximately the same number of times in each condition, and the order in which the articles were presented was randomized.
Immediately after reading each article, the participants rated their agreement with three statements: (a) The article was well written, (b) The title was a good description of the results, and (c) The scientific reasoning in the article made sense. Each rating was on a 4-point scale: (score=1) strongly disagree, (score=2) disagree, (score=3) agree, and (score=4) strongly agree. Remember that the written part of the articles was exactly the same in all three conditions, so the ratings should have been the same if people were not using the illustrations to influence their conclusions.
Before going on, let’s make sure you know the basic design of this experiment. In other words, can you identify the critical variables used in the study according to their function?
Results for (a) Accuracy of the Title and (b) Quality of the Writing
The first two questions for the participants were about (a) the accuracy of the title and (b) the quality of the writing. These questions were included to assure that the participants had read the articles closely. The experimenters expected that there would be no differences in the ratings for the three conditions for these questions. For the question about the title, their prediction was correct. Subjects gave about the same rating to the titles in all three conditions, agreeing that it was accurate.
For question (b) about the quality of the writing, the experimenters found that the two conditions with illustrations (the brain images and the bar graphs) were rated higher than the control condition. Apparently just the presence of an illustration made the writing seem better. This result was not predicted.
Results for (c) Scientific Reasoning Assessment
The main hypothesis behind this study was that subjects would rate the quality of the scientific reasoning in the article higher when it was accompanied by a brain image than when there was a bar graph or there was no illustration at all. If the ratings differed among conditions, then the illustrations—which added nothing substantial that was not in the writing—had to be the cause.
Use the graph below to show your predicted results of the experiment. Move the bars to the point where you think people generally agreed or disagreed with the statement that “the scientific reasoning in the article made sense.” Higher bars mean that the person believes the reasoning in the article is better, and a lower bar means that they judge the reasoning as worse. Click on “Show Results” when you are done to compare your prediction with the actual results.
RESULTS: The results supported the experimenters’ prediction. The scientific reasoning for the Brain Image condition was rated as significantly higher than for either other condition. There was no significant difference between the Bar Graph condition and the Control condition. Here is a graph of the results:
McCabe and Castel conducted two more experiments, changing the stories, the images, and the wording of the questions in each. Across the three experiments, they tested almost 400 college students and their results were consistent: participants rated the quality of scientific reasoning higher when the writing was accompanied by a brain image than in other conditions.
The implications of this study go beyond brain images. The deeper idea is that any information that symbolizes something we believe is important can influence our thinking, sometimes making us less thoughtful than we might otherwise be. This other information could be a brain image or some statistical jargon that sounds impressive or a mathematical formula that we don’t understand or a statement that the author teaches at Harvard University rather than Littletown State College.
In a study also published in 2008, Deena Weisberg and her colleagues at Yale University conducted a study similar to the one you just read. Weisberg had people read brief descriptions of psychological phenomena (involving memory, attention, reasoning, emotion, and other similar topics). They rated the scientific quality of the explanations. Instead of images, Weisberg had some explanations that included entirely superfluous and useless brain information (e.g., “people feel strong emotion because the amygdala processes emotion”) or no such brain information. Weisberg found that a good explanation was rated as even better when it included a brain reference (which was completely irrelevant). When the explanation was flawed, students were fairly good at catching the reasoning problems UNLESS the explanation contained the irrelevant brain reference. In that case, the students rated the flawed explanations as being good. Weisberg and her colleague call the problem “the seductive allure of neuroscience explanations.”
Does it Replicate? The Messy World of Real Science
A few years after the McCabe and Castel study was published, some psychologists at the University of Victoria in New Zealand, led by Robert Michael, were intrigued by the results and they were impressed by how frequently the paper had been cited by other researchers (about 40 citations per year between 2008 and 2012—a reasonably strong citation record). They wanted to explore the brain image effect, so they started by simply replicating the original study.
In their first attempt at replication, the researchers recruited and tested people using an online site called Mechanical Turk. With 197 participants, they found no hint of an effect of the brain image on people’s judgments about the validity of the conclusions of the article they read. In a second replication study, they tested students from their university and again found no statistically significant effect. In this second attempt, the results were in the predicted direction (the presence of a brain image was associated with higher ratings), but the differences were not strong enough to be persuasive. They tried slight variations on instructions and people recruited, but across 10 different replication studies, only one produced a statistically significant effect.
So, did Dr. Michael and his colleagues accuse McCabe and Castel of doing something wrong? Did they tear apart the experiments we described earlier and show that they were poorly planned, incorrectly analyzed, or interpreted in a deceptive way?
Not at all.
It is instructive to see how professional scientists approached the problem of failing to replicate a study. Here is a quick review of the approach taken by the researchers who did not replicate the McCabe and Castel study:
- First, they did not question the integrity of the original research. David McCabe and Alan Castel are respected researchers who carefully reported on a series of well-conducted experiments. They even noted that the original paper was carefully reported, even if journalists and other psychologists had occasionally exaggerated the findings: “Although McCabe and Castel (2008) did not overstate their findings, many others have. Sometimes these overstatements were linguistic exaggerations…Other overstatements made claims beyond what McCabe and Castel themselves reported.” [p. 720]
- Replication is an essential part of the scientific process. Michael and his colleagues did not back off of the importance of their difficulty reproducing the McCabe and Castel results. Clearly, McCabe and Castel’s conclusions—that “there is something special about the brain images with respect to influencing judgments of scientific credibility”—need to taken as possibly incorrect.
- Michael and his colleagues looked closely at the McCabe and Castel results and their own, and they looked for interesting reasons that the results of the two sets of studies might be different.
- Subtle effects: Perhaps the brain pictures really do influence their judgments, but only for some people or under very specific circumstances.
- Alternative explanations: Perhaps people assume that irrelevant information is not typically presented in scientific reports. People may have believed that the brain images provided additional evidence for the claims.
- Things have changed: The McCabe and Castel study was conducted in 2008 and the failed replication was in 2013. Neuroscience was very new to the general public in 2008, but a mere 5 years later, in 2013, it may have seemed less impressive.
Do images really directly affect people’s judgments of the quality of scientific thinking? Maybe yes. Maybe no. That’s still an open question.
The “Replication Crisis”
In recent years, there has been increased effort in the sciences (psychology, medicine, economics, etc.) to redo previous experiments to test their reliability. The findings have been disappointing at times.
The Reproducibility Project has attempted to replicate 100 studies within the field of psychology that were published with statistically significant results; they found that many of these results did not replicate well. Some did not reach statistical significance when replicated. Others reached statistical significance, but with much weaker effects than in the original study.
How could this happen?
- Chance. Psychologist use statistics to confirm that their results did not occur simply because of chance. Within psychology, the most common standard for p-values is “p < .05”. This p-value means that there is less than a 5% probability that the results of an experiment happened just by random chance, and a 95% probability that the results were statistically significant. Even though a published study may reveal statistically significant results, there is still a possibility that those results were random.
- Publication bias. Psychology research journals are far more likely to publish studies that find statistically significant results than they are studies that fail to find statistically significant results. What this means is that studies that yield results that are not statistically significant are very unlikely to get published. Let’s say that twenty researchers are all studying the same phenomenon. Out of the twenty, one gets statistically significant results, while the other nineteen all get non-significant results. The statistically significant result was likely just a result of randomness, but because of publication bias, that one study’s results are far more likely to be published than are the results of the other nineteen.
Note that this “replication crisis” itself does not mean that the original studies were bad, fraudulent, or even wrong. What it means, at its core, is that replication found results that were different from the results of the original studies. These results were sufficiently different that we might no longer be secure in our knowledge of what those results mean. Further replication and testing in other directions might give us a better understanding of why the results were different, but that too will require time and resources.
One Final Note
When we wrote to Dr. Alan Castel for permission to use his stimuli in this article, he not only consented, but he also sent us his data and copies of all of his stimuli. He sent copies of research by a variety of people, some research that has supported his work with David McCabe and some that has not. He even included a copy of the 10-experiment paper that you just read about, the one that failed to replicate the McCabe and Castel study.
The goal is to find the truth, not to insist that everything you publish is the last word on the topic. In fact, if it is the last word, then you are probably studying something so boring that no one else really cares.
Scientists disagree with one another all the time. But the disagreements are (usually) not personal. The evidence is not always neat and tidy, and the best interpretation of complex results is seldom obvious. At its best, scientists can disagree passionately about theory and evidence, and later to relax over a cool drink as friends.
- David P. McCabe & Alan D. Castel (2008). Seeing is believing: The effect of brain images on judgments of scientific reasoning. Cognition, 107, 343-352. ↵
- Deena Skolnick Weisberg, Frank C. Keil, Joshua Goodstein, Elizabeth Rawson, & Jeremy R. Gray (2008). The seductive allure of neuroscience explanations. Journal of Cognitive Neuroscience, 20(3), 470-477 ↵
- Robert B. Michael, Eryn J. Newman, Matti Vuorre, Geoff Cumming, and Maryanne Garry (2013). On the (non)persuasive power of a brain image. Psychonomic Bulletin & Review, 20(4), 720-725. ↵
- They actually tried to replicate Experiment 3 in the McCabe and Castel study. You read Experiment 1. These two experiments were similar and supported the same conclusions, but Dr. Michael and his colleagues preferred Experiment 3 for some technical reasons. ↵
- David McCabe, the first author of the original paper, tragically passed away in 2011 at the age of 41. At the time of his death, he was an assistant professor of Psychology at Colorado State University and he had started to build a solid body of published research, and he was also married with two young children. The problems with replicating his experiments were only published after his death, so it is impossible to know what his thoughts might have been about the issues these challenges raised. ↵