Biases in Experimental Design: Validity, Reliability, and Other Issues
Research studies with small sample sizes, high variability, and sampling bias are usually not representative of the general population.
Explain the factors that can threaten the external validity of a study
- Sampling bias is when the sample in question is not representative of the general population.
- Selection bias occurs when the participants in the sample are not equally and fairly selected for both the experimental and control groups; this can make comparisons between the groups meaningless.
- Response bias is when only highly motivated people return a survey. When this occurs, the resulting data is biased toward those with the motivation to answer and submit the survey, and is therefore not representative of the population as a whole.
- External validity is the ability to apply conclusions gathered from the results of an experiment to the general population.
- Data sets with little variability have values that are similar to each other; data sets with high variability have values that are more spread out.
- reliability: The overall consistency of a measure.
- external validity: In research, whether or not study findings can be generalized to real-world scenarios.
- law of diminishing returns: The tendency for a continuing effort toward a particular goal to decline in efficacy after a certain amount of success has been achieved.
- bias: An inclination, predisposition, or prejudice toward something.
Research studies often fall prey to experimental bias, in which the results are not representative of what they are supposed to measure. This limits the applicability of the results to anything beyond the experiment itself, which decreases or eliminates the value of those results.
A study that is externally valid is one in which the data and conclusions gathered from the results of an experiment can be applied to the general population outside of the experiment itself. If the study’s data and conclusions cannot be applied to the general population, including general events or scenarios, then the experiment’s results are only relevant to that experiment, and nothing more. A study’s external validity can be threatened by such factors as small sample sizes, high variability, and sampling bias.
Small Sample Sizes
The smaller the sample size for an experiment, the less applicable the results will be to the general population. The world has roughly 8 billion people, so a sample meant to represent all of them would have to be large and carefully drawn. In general, the larger the sample group is in relation to the population to which the results are to be applied, the more likely the results are to be applicable.
This premise, however, is limited by the law of diminishing returns, which states that continued effort becomes less effective after a certain amount of success has been achieved. After a certain point, each additional participant adds less value for the researchers, while the cost and time put into the research keep growing. Generally it is best to attain a reasonable sample size that is representative of the population being studied.
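One common way to see these diminishing returns is the standard error of a sample mean, which shrinks with the square root of the sample size. The sketch below is illustrative only: the function name `standard_error` and the standard deviation of 15 are assumptions chosen for the example, not values from any study.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)

# Hypothetical measure with a standard deviation of 15 points.
for n in (25, 100, 400, 1600):
    print(f"n={n:5d}  standard error={standard_error(15, n):.3f}")
```

Each fourfold increase in participants only halves the standard error, so the gain in precision per added participant keeps shrinking.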
Variability, also known as dispersion or spread, refers to how spread out a group of data is, or how much the measures differ from each other. Data sets with similar values are considered to have little variability because the values are within a smaller spread, whereas data sets with values that are spread out have high variability because the values are within a larger spread. In many instances of high variability there are outliers, which are values that exist far outside of the area where the majority of values are found. In many cases these outliers, which increase the variability of the data set, are removed when conducting statistical analysis of the data.
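As a rough illustration, variability can be summarized with the standard deviation, and outliers can be flagged with a rule of thumb. The sketch below uses the common 1.5 × interquartile range rule, which is one convention among several and an assumption here; the data and the function name `find_outliers` are hypothetical.

```python
import statistics

def find_outliers(values):
    """Return values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

tight = [49, 50, 50, 51, 50, 49, 51]    # similar values: little variability
spread = [20, 80, 35, 65, 50, 10, 90]   # scattered values: high variability
print(statistics.stdev(tight), statistics.stdev(spread))

scores = [10, 12, 11, 13, 12, 11, 95]   # 95 sits far from the rest
print(find_outliers(scores))
```

With these made-up numbers, only the value 95 is flagged. Whether a flagged value should actually be removed is a judgment the researcher has to justify.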
Sampling bias occurs when the sample participating in the study is not representative of the general population. This may be the result of purposeful selection of participants by the researcher, but there are many other factors that can create sampling bias. One example is surveys taken during a presidential election. The results of the surveys often depend on the city, state, or area being surveyed. For example, people in cities tend to vote one way, while people in rural environments often vote another. Similarly, the geographic region being surveyed (the Northeast, South, Midwest, etc.) can affect the responses collected. If there is a high saturation of a given political party in an area surveyed, then the results will be skewed in the direction of that party, and will not be representative of the general population.
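The effect of surveying only one kind of area can be sketched with a small simulation. Everything below is hypothetical: the population of 1,000 voters, the 70% and 30% support rates, and the helper `support_rate` are assumptions made up for illustration.

```python
import random

# Hypothetical population: 600 urban voters (70% favor candidate A)
# and 400 rural voters (30% favor candidate A). 1 = favors A, 0 = does not.
urban = [1] * 420 + [0] * 180
rural = [1] * 120 + [0] * 280
population = urban + rural

def support_rate(sample):
    """Share of a sample that favors candidate A."""
    return sum(sample) / len(sample)

rng = random.Random(0)                        # fixed seed for repeatability
random_sample = rng.sample(population, 500)   # drawn from everyone
urban_only_sample = rng.sample(urban, 500)    # drawn from cities only

print("population:", support_rate(population))
print("random:    ", round(support_rate(random_sample), 3))
print("urban only:", round(support_rate(urban_only_sample), 3))
```

The random sample should land near the true population rate of 0.54, while the urban-only sample overstates support no matter how many people are surveyed.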
Selection bias occurs when the participants in the sample are not equally and fairly selected for both the experimental and control groups, so that comparisons between the groups lose their meaning. Both the experimental and control groups should be representative of the general population, as well as representative of each other. One group should not score substantially higher on a given variable than the other, as this can distort the findings.
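A standard safeguard is to let chance, rather than the researcher, decide who goes into which group. The following is a minimal sketch of random assignment; the function name `randomly_assign` and the participant IDs are hypothetical.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of
    (nearly) equal size: experimental and control."""
    shuffled = list(participants)            # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs.
ids = [f"P{i:02d}" for i in range(1, 21)]
experimental, control = randomly_assign(ids, seed=42)
print("experimental:", sorted(experimental))
print("control:     ", sorted(control))
```

Random assignment does not guarantee identical groups, but it prevents any systematic rule from sorting participants with a given characteristic into one group.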
Response bias (also known as “self-selection bias”) occurs when only certain types of people respond to a survey or study. When this occurs, the resulting data is biased toward those with the motivation to answer and submit the survey or participate in the study. The resulting data is therefore not representative of either the desired sample or the population at large, because only a select few have answered the survey or participated in the experiment. Such data should be reported with a disclaimer stating that the findings describe only those who responded. Even with a disclaimer, the results cannot be applied to the general population or to the entire desired sample group.
For example, imagine that a university newspaper ran an ad asking for students to volunteer for a study in which intimate details of their sex lives would be discussed. Clearly, the sample of students who would volunteer for such a study would not be representative of all of the students at the university (many of whom would never want to volunteer for such a study due to privacy concerns). Similarly, an online survey about computer use is likely to attract people more interested in technology than is typical. In both of these examples, people who “self-select” themselves for the study are likely to differ in important ways from the population the experimenter wishes to draw conclusions about. Thus, the responses collected are biased and not representative of the general population of interest. Many of the admittedly “non-scientific” polls taken on television or websites suffer from response bias.
A response bias can also result when the non-random component occurs after the potential subject has enlisted in the experiment. Consider again the hypothetical experiment in which subjects are to be asked intimate details of their sex lives, and assume this time that the subjects did not know what the experiment was going to be about until they showed up. When they found out, many of the subjects would refuse to participate, leaving only those students who are very interested in discussing their sex lives, which results in a biased sample.
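The size of this distortion can be worked out with simple arithmetic. In the sketch below, the interest scores, the group sizes, and the response rates are all assumptions invented for illustration; the point is only that respondents drift away from the population when willingness to respond depends on the trait being measured.

```python
# Hypothetical population: interest in the survey topic on a 1-5 scale,
# with 200 people at each level.
population = {1: 200, 2: 200, 3: 200, 4: 200, 5: 200}

# Assumed response rates: more interested people are more likely to reply.
response_rate = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.80}

def mean_interest(counts):
    """Mean interest score for a mapping of score -> number of people."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total

# Expected number of respondents at each interest level.
respondents = {s: n * response_rate[s] for s, n in population.items()}

print("population mean:", mean_interest(population))
print("respondent mean:", round(mean_interest(respondents), 2))
```

Under these assumed rates, the respondents average well above the population mean of 3.0, even though nobody answered dishonestly.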
Another important issue to consider when collecting data is reliability. Reliability refers to the overall consistency of a measure. This means that any surveys, tasks, or measures the researcher administers during a study need to produce similar results each time they are used under similar conditions. If a measure is not reliable, it will produce different results, even under the same conditions. Consider a scale that measures how much you weigh. If one day the scale shows that you weigh 150 lbs yet the next day it shows you 170 lbs, it may be time to shop for a more reliable scale. Let’s look at another example. A researcher is running a study using a questionnaire that assesses emotion, specifically negative affect. If the emotion questionnaire produces completely different results, even when very similar participants with identical levels of negative affect under identical experimental conditions complete the questionnaire, it is not reliable, and the data cannot be trusted. Ideally, the two similar participants with identical levels of negative affect should score very similarly on the emotion questionnaire. This would indicate the measure’s ability to produce consistent results under similar conditions: the measure would be considered reliable.
Therefore, when determining the reliability of a measure, a researcher must determine how much variability is stemming from measurement error (assumed to be random error) and how much is stemming from the “true score” or the actual, replicable aspects of the phenomenon being measured. This concept is sometimes referred to as “classical test theory.” Researchers typically do this by pre-testing their measures on preliminary samples of participants, and by running descriptive reliability analyses that indicate to them the measure’s overall consistency.
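One simple descriptive check of consistency is test-retest reliability: the correlation between two administrations of the same measure to the same people. The sketch below computes a Pearson correlation by hand; the weights and the helper name `pearson_r` are hypothetical, and real reliability analyses typically use larger samples and dedicated statistical software.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weights (lbs) for five people, measured on two days.
day1 = [150, 162, 171, 180, 195]
day2_good_scale = [151, 161, 172, 179, 196]   # nearly identical readings
day2_bad_scale = [170, 150, 190, 165, 178]    # readings jump around

print(round(pearson_r(day1, day2_good_scale), 3))
print(round(pearson_r(day1, day2_bad_scale), 3))
```

A correlation close to 1 suggests that most of the variability reflects stable differences between people, while a low correlation suggests that much of it is measurement error.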
Heuristics and Cognitive Biases
When interpreting data, a researcher must avoid cognitive bias and be aware of the use of heuristics to avoid drawing incorrect conclusions.
Explain the heuristics and cognitive biases that can impact a researcher’s interpretation of data
- When analyzing results, researchers must be mindful of heuristics and cognitive biases that may skew their interpretations.
- Common heuristics, or mental shortcuts, include availability, representativeness, and similarity heuristics.
- Common cognitive biases include hindsight bias, confirmation bias, and the illusory correlation bias. Pareidolia and apophenia can also result in researchers inferring connections or patterns when there are none.
- The representativeness heuristic is in use when we assess the probability of an uncertain event according to how closely it resembles, or is representative of, the population or process it comes from.
- pareidolia: The tendency to interpret a vague stimulus as something known to the observer, such as interpreting marks on Mars as canals, seeing shapes in clouds, or hearing hidden messages in reversed music.
- heuristic: Experience-based techniques for problem solving, learning, and discovery that give a solution which is not guaranteed to be optimal.
- apophenia: The perception of or belief in connectedness among unrelated phenomena.
Once data has been gathered, the researcher must analyze and interpret it. However, people are prone to certain tendencies and biases that must be avoided when making inferences about findings. When solving problems or reasoning, people often make use of certain heuristics, or mental shortcuts. These shortcuts, which are often influenced by cognitive biases, save time and energy but can cause errors in reasoning.
There are several types of heuristics used to save time when drawing conclusions about large amounts of information, including availability, representativeness, and similarity heuristics.
An availability heuristic involves estimating how common an event is based on how easily we can remember the event occurring previously. Things that are more easily remembered are thought to be more common than things that are not easily recalled. The availability heuristic leads to people overestimating the occurrence of situations they are familiar with.
An example of this is when we incorrectly believe that “spectacular” occurrences (like deaths caused by severe weather) happen more often than regular ones (like deaths caused by disease). Since the media covers these “spectacular” occurrences more often, and with more emphasis, they become more available to our memory. In the same way, vivid or exceptional outcomes will be perceived as more likely than those that are harder to picture, or are difficult to understand.
Consider the following research example. A researcher is conducting a clinical study that requires her to screen participants for mental illnesses. This researcher has recently read about and heard news stories on antisocial personality disorder. As a result, she ends up incorrectly flagging several participants in the sample as having antisocial personality disorder, when in reality, this mental illness is quite infrequent in the general population.
We use the representativeness heuristic when we judge the probability of an event under uncertainty by how closely it resembles our idea of a typical case. When people rely on representativeness to make judgments, they are likely to judge incorrectly because the fact that something is more representative does not make it more likely. People tend to overestimate the ability of an event’s representativeness to accurately predict the likelihood of that event.
For example, suppose you are conducting a study examining how often men versus women choose careers as scientists. Based on the schemas you hold about men most often being in science (what is most representative in your mind when you think of a typical scientist), you predict that most scientists will be male. However, the actual share of women among scientists may be considerably higher than this schema suggests, and it can only be established from data.
The similarity heuristic describes how people make judgments based on the similarity between the current situation and other situations. It is at work when we base decisions on favorable versus unfavorable past experiences and on how closely the present resembles them.
We rely on the similarity heuristic all the time when making decisions. For example, if someone enjoys a book by John Irving, they may generalize that to assume that they would also enjoy other books by John Irving, and they will be more likely to read another. Much of the time this is helpful and saves us time in making decisions.
However, this heuristic can introduce bias in research, in which it is by definition important to remain an objective observer. For example, a researcher may unconsciously draw the same conclusions as earlier studies that followed similar methods, precisely because the new study resembles those earlier ones. Although this assumption may sometimes accurately reflect the data (results do frequently replicate across studies), it can also lead the researcher to exclude other, valid interpretations.
Cognitive biases are another factor that can lead researchers to make incorrect inferences when analyzing data. A cognitive bias is the mind’s tendency to come to incorrect conclusions based on a variety of factors.
The illusory correlation bias is our predisposition to perceive a relationship between variables (typically people, events, or behaviors) even when no such relationship exists. A common example of this phenomenon would be when people form false associations between membership in a statistical minority group and rare (typically negative) behaviors. This is one way stereotypes form and endure. Hamilton & Rose (1980) found that stereotypes can lead people to expect certain groups and traits to fit together, and then to overestimate the frequency with which these correlations actually occur.
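Checking the actual counts is one way to guard against an illusory correlation. The sketch below computes the phi coefficient, a standard measure of association for a 2x2 table; the counts and the function name `phi_coefficient` are hypothetical, and the table is constructed so that negative behavior is equally common in both groups.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    return (a * d - b * c) / denom

# Hypothetical counts of observed behaviors.
#                 positive  negative
# majority group     18        8
# minority group      9        4
print(phi_coefficient(18, 8, 9, 4))

# The rate of negative behavior is the same in both groups.
print(8 / 26, 4 / 13)
```

Here the association is exactly zero, yet the rare combination of minority group and negative behavior is the kind of pairing that observers tend to overestimate.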
Hindsight bias is a false memory of having predicted events, or an exaggeration of actual predictions, after becoming aware of the outcome. This is the moment after something occurs where we look back and say, “I saw that coming,” or look back and put all the signs and pieces together which led to the eventual outcome. Hindsight bias occurs in psychological research when researchers form “post hoc hypotheses.” When a researcher obtains a certain result that is counter to what he or she originally predicted, the researcher may use post hoc hypotheses to revise their prediction to fit the actual, obtained result.
Confirmation bias is the tendency to search for, or interpret, information in a way that confirms one’s existing beliefs. This occurs when we look only for information that affirms what we already believe to be true. Confirmation bias is especially dangerous in psychological research. If a researcher has a particular hypothesis in mind, he or she may look for patterns in the data that support that hypothesis, while ignoring other important patterns that oppose it.
Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data. This is a person’s tendency to seek patterns in random information. Researchers make this mistake when they obtain mostly null results (results that do not support their hypothesis), and compensate by exaggerating or magnifying any pattern they do find. In reality, statistically meaningless data or null findings are common, which is why researchers typically conduct multiple studies to examine their research questions.
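A short calculation shows why an isolated pattern in otherwise null data deserves caution. Assuming the tests are independent and each uses the conventional 0.05 significance threshold (both assumptions, as is the function name `chance_of_false_pattern`), the chance that at least one comparison looks "significant" by luck alone grows quickly with the number of comparisons.

```python
def chance_of_false_pattern(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several independent tests on
    pure noise comes out 'significant' at the given alpha level."""
    if num_tests < 0 or not 0 <= alpha <= 1:
        raise ValueError("num_tests must be >= 0 and alpha between 0 and 1")
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20, 50):
    print(f"{k:2d} tests -> {chance_of_false_pattern(k):.2f}")
```

With 20 comparisons the chance of at least one spurious finding is roughly 64%, so a single striking result among many null ones is weak evidence until it is replicated.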
Pareidolia is when a vague and random stimulus is perceived as significant when it is not. This occurs when we see images of animals or other things in the clouds, see the Man in the Moon or the face on Mars, hear hidden messages in songs played in reverse, or give inanimate objects qualities that make them seem human.