{"id":550,"date":"2020-05-01T21:44:09","date_gmt":"2020-05-01T21:44:09","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/adolescent\/?post_type=chapter&#038;p=550"},"modified":"2021-07-11T15:27:31","modified_gmt":"2021-07-11T15:27:31","slug":"analyzing-data-correlational-and-experimental-research","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/adolescent\/chapter\/analyzing-data-correlational-and-experimental-research\/","title":{"raw":"Analyzing Data: Correlational and Experimental Research","rendered":"Analyzing Data: Correlational and Experimental Research"},"content":{"raw":"Did you know that as sales of ice cream increase, so does the overall rate of crime? Is it possible that indulging in your favorite flavor of ice cream could send you on a crime spree? Or, after committing a crime, do you think you might decide to treat yourself to a cone? There is no question that a relationship exists between ice cream and crime (e.g., Harper, 2013), but does one thing actually cause the other to occur?\r\n\r\nIt is much more likely that both ice cream sales and crime rates are related to the temperature outside. When the temperature is warm, there are lots of people out of their houses, interacting with each other, getting annoyed with one another, and sometimes committing crimes. Also, when it is warm outside, we are more likely to seek a refreshing treat like ice cream. How do we determine if there is indeed a relationship between two things? And when there is a relationship, how can we discern whether it is attributable to coincidence or causation? We do this through statistical analysis of the data. 
Which analysis we use will depend on several conditions outlined next.\r\n<h2>Introduction to Statistical Thinking<\/h2>\r\n[caption id=\"attachment_1883\" align=\"alignright\" width=\"379\"]<a href=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17151911\/coffee.jpg\"><img class=\" wp-image-1883\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17151911\/coffee.jpg\" alt=\"Coffee cup with heart shaped cream inside.\" width=\"379\" height=\"284\" \/><\/a> <strong>Figure 2.6.1<\/strong>. People around the world differ in their preferences for drinking coffee versus drinking tea. Would the results of the coffee study be the same in Canada as in China? [Image: Duncan, https:\/\/goo.gl\/vbMyTm, CC BY-NC 2.0, https:\/\/goo.gl\/l8UUGY][\/caption]Does drinking coffee actually increase your life expectancy? A recent study (Freedman, Park, Abnet, Hollenbeck, &amp; Sinha, 2012) found that men who drank at least six cups of coffee a day had a 10% lower chance of dying (women 15% lower) than those who drank none. Does this mean you should pick up or increase your own coffee habit? Modern society has become awash in studies such as this; you can read about several such studies in the news every day. Conducting such a study well, and interpreting the results of such studies, requires understanding basic ideas of <strong>statistics<\/strong>, the science of gaining insight from data. Key components of a statistical investigation are:\r\n<ul>\r\n \t<li>Planning the study: Start by asking a testable research question and deciding how to collect data. For example, how long was the study period of the coffee study? How many people were recruited for the study, how were they recruited, and from where? How old were they? What other variables were recorded about the individuals? 
Were changes made to the participants\u2019 coffee habits during the course of the study?<\/li>\r\n \t<li>Examining the data: What are appropriate ways to examine the data? What graphs are relevant, and what do they reveal? What descriptive statistics can be calculated to summarize relevant aspects of the data, and what do they reveal? What patterns do you see in the data? Are there any individual observations that deviate from the overall pattern, and what do they reveal? For example, in the coffee study, did the proportions differ when we compared the smokers to the non-smokers?<\/li>\r\n \t<li>Inferring from the data: What are valid statistical methods for drawing inferences \u201cbeyond\u201d the data you collected? In the coffee study, is the 10%\u201315% reduction in risk of death something that could have happened just by chance?<\/li>\r\n \t<li>Drawing conclusions: Based on what you learned from your data, what conclusions can you draw? Who do you think these conclusions apply to? (Were the people in the coffee study older? Healthy? Living in cities?) Can you draw a <strong>cause-and-effect<\/strong> conclusion about your treatments? (Are scientists now saying that the coffee drinking is the cause of the decreased risk of death?)<\/li>\r\n<\/ul>\r\nNotice that the numerical analysis (\u201ccrunching numbers\u201d on the computer) comprises only a small part of overall statistical investigation. 
In this section, you will see how we can answer some of these questions and what questions you should be asking about any statistical investigation you read about.\r\n\r\n[embed]https:\/\/youtu.be\/z-Qi4w6Xkuc[\/embed]\r\n\r\n<strong>Video 2.6.1.\u00a0<\/strong><em>Types of Statistical Studies<\/em> explains the differences between correlational and experimental research.\r\n\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/2708\r\n\r\n<section>\r\n<h2>Distributional Thinking<\/h2>\r\nWhen data are collected to address a particular question, an important first step is to think of meaningful ways to organize and examine the data. Let's take a look at an example.\r\n\r\n<strong>Example 1<\/strong>: Researchers investigated whether cancer pamphlets are written at an appropriate level to be read and understood by cancer patients (Short, Moriarty, &amp; Cooley, 1995). Tests of reading ability were given to 63 patients. In addition, readability level was determined for a <strong>sample<\/strong> of 30 pamphlets, based on characteristics such as the lengths of words and sentences in the pamphlet. The results, reported in terms of grade levels, are displayed in Figure 2.6.2.\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"800\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212647\/000001456original.jpg\" alt=\"Table showing patients' reading levels and pamphlets' reading levels.\" width=\"800\" height=\"236\" \/> <strong>Figure 2.6.2<\/strong>. Frequency tables of patient reading levels and pamphlet readability levels.[\/caption]\r\n<figure><figcaption>Testing these two variables reveals two fundamental aspects of statistical thinking:<\/figcaption><\/figure>\r\n<ul>\r\n \t<li>Data <em>vary<\/em>. 
More specifically, values of a variable (such as reading level of a cancer patient or readability level of a cancer pamphlet) vary.<\/li>\r\n \t<li>Analyzing the pattern of variation, called the <strong>distribution<\/strong> of the variable, often reveals insights.<\/li>\r\n<\/ul>\r\nAddressing the research question of whether the cancer pamphlets are written at appropriate levels for the cancer patients requires comparing the two distributions. A na\u00efve comparison might focus only on the centers of the distributions. Both medians turn out to be ninth grade, but considering only medians ignores the variability and the overall distributions of these data. A more illuminating approach is to compare the entire distributions, for example with a graph, as in Figure 2.6.3.\r\n<figure>\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"800\"]<img src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212639\/000001457original.jpg\" alt=\"Bar graph showing that the reading level of pamphlets is typically higher than the reading level of the patients.\" width=\"800\" height=\"388\" \/> <strong>Figure 2.6.3<\/strong>. Comparison of patient reading levels and pamphlet readability levels.[\/caption]<\/figure>\r\nFigure 2.6.3 makes clear that the two distributions are not well aligned at all. The most glaring discrepancy is that many patients (17\/63, or 27%, to be precise) have a reading level below that of the most readable pamphlet. These patients will need help to understand the information provided in the cancer pamphlets. 
Notice that this conclusion follows from considering the distributions as a whole, not simply measures of center or variability, and that the graph contrasts those distributions more immediately than the frequency tables.\r\n\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/2741\r\n<h2>Statistical Significance<\/h2>\r\nEven when we find patterns in data, often there is still uncertainty in various aspects of the data. For example, there may be potential for measurement errors (even your own body temperature can fluctuate by almost 1\u00b0F over the course of the day). Or we may only have a \u201csnapshot\u201d of observations from a more long-term process or only a small subset of individuals from the <strong>population<\/strong> of interest. In such cases, how can we determine whether the patterns we see in our small set of data are convincing evidence of a systematic phenomenon in the larger process or population? Let's take a look at another example.\r\n\r\n<strong>Example 2<\/strong>: In a study reported in the November 2007 issue of <em>Nature<\/em>, researchers investigated whether pre-verbal infants take into account an individual\u2019s actions toward others in evaluating that individual as appealing or aversive (Hamlin, Wynn, &amp; Bloom, 2007). In one component of the study, 10-month-old infants were shown a \u201cclimber\u201d character (a piece of wood with \u201cgoogly\u201d eyes glued onto it) that could not make it up a hill in two tries. Then the infants were shown two scenarios for the climber\u2019s next try, one where the climber was pushed to the top of the hill by another character (\u201chelper\u201d), and one where the climber was pushed back down the hill by another character (\u201chinderer\u201d). The infant was alternately shown these two scenarios several times. 
Then the infant was presented with two pieces of wood (representing the helper and the hinderer characters) and asked to pick one to play with.\r\n\r\nThe researchers found that of the 16 infants who made a clear choice, 14 chose to play with the helper toy. One possible explanation for this clear majority result is that the helping behavior of the one toy increases the infants\u2019 likelihood of choosing that toy. But are there other possible explanations? What about the color of the toy? Well, prior to collecting the data, the researchers arranged so that each color and shape (red square and blue circle) would be seen by the same number of infants. Or maybe the infants had right-handed tendencies and so picked whichever toy was closer to their right hand?\r\n\r\nWell, prior to collecting the data, the researchers arranged it so half the infants saw the helper toy on the right and half on the left. Or, maybe the shapes of these wooden characters (square, triangle, circle) had an effect? Perhaps, but again, the researchers controlled for this by rotating which shape was the helper toy, the hinderer toy, and the climber. When designing experiments, it is important to <em>control<\/em> for as many variables as might affect the responses as possible. It is beginning to appear that the researchers accounted for all the other plausible explanations. But there is one more important consideration that cannot be controlled\u2014if we did the study again with these 16 infants, they might not make the same choices. In other words, there is some <em>randomness<\/em> inherent in their selection process.\r\n<h3>P-value<\/h3>\r\nMaybe each infant had no genuine preference at all, and it was simply \u201crandom luck\u201d that led to 14 infants picking the helper toy. 
Although this random component cannot be controlled, we can apply a <em>probability model<\/em> to investigate the pattern of results that would occur in the long run if random chance were the only factor.\r\n\r\nIf the infants were equally likely to pick between the two toys, then each infant had a 50% chance of picking the helper toy. It\u2019s like each infant tossed a coin, and if it landed heads, the infant picked the helper toy. So if we tossed a coin 16 times, could it land heads 14 times? Sure, it\u2019s possible, but it turns out to be very unlikely. Getting 14 (or more) heads in 16 tosses is about as likely as tossing a coin and getting 9 heads in a row. This probability is referred to as a <strong>p-value<\/strong>. The p-value is the probability of obtaining results at least as extreme as those observed if random chance alone were at work.\u00a0Within psychology, the most common standard for p-values is \u201cp &lt; .05\u201d. This means that, if chance alone were operating, results this extreme would occur less than 5% of the time. It does not mean that there is a 95% probability that the results reflect a meaningful pattern in human psychology; it means only that chance alone is an unlikely explanation for the results. We call this <strong>statistical significance<\/strong>.\r\n\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/2743\r\n\r\nSo, in the study above, if we assume that each infant was choosing equally, then the probability that 14 or more out of 16 infants would choose the helper toy is found to be 0.0021. We have only two logical possibilities: either the infants have a genuine preference for the helper toy, or the infants have no preference (50\/50), and an outcome that would occur only 2 times in 1,000 iterations happened in this study. 
Because this p-value of 0.0021 is quite small, we conclude that the study provides very strong evidence that these infants have a genuine preference for the helper toy.\r\n\r\nIf we compare the p-value to some cut-off value, like 0.05, we see that the p-value is smaller.\u00a0Because the p-value is smaller than that cut-off value, we reject the hypothesis that only random chance was at play here. In this case, these researchers would conclude that <em>significantly<\/em> more than half of the infants in the study chose the helper toy, giving strong evidence of a genuine preference for the toy with the helping behavior.\r\n\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/2742\r\n\r\n<\/section>\r\n<h2>Generalizability<\/h2>\r\n[caption id=\"attachment_1880\" align=\"alignright\" width=\"429\"]<a href=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150423\/generalizability.jpg\"><img class=\"wp-image-1880 \" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150423\/generalizability.jpg\" alt=\"Photo of a diverse group of college-aged students.\" width=\"429\" height=\"229\" \/><\/a> <strong>Figure 2.6.4<\/strong>. Generalizability is an important research consideration: The results of studies with widely representative samples are more likely to generalize to the population. [Image: Barnacles Budget Accommodation][\/caption]One limitation to the study mentioned previously about the babies choosing the \"helper\" toy\u00a0is that the conclusion only applies to the 16 infants in the study. We don\u2019t know much about how those 16 infants were selected. Suppose we want to select a subset of individuals (a <strong>sample<\/strong>) from a much larger group of individuals (the <strong>population<\/strong>) in such a way that conclusions from the sample can be <strong>generalized<\/strong> to the larger population. 
This is the question faced by pollsters every day.\r\n\r\n<strong>Example 3<\/strong>: The General Social Survey (GSS) is a survey on societal trends conducted every other year in the United States. Based on a sample of about 2,000 adult Americans, researchers make claims about what percentage of the U.S. population consider themselves to be \u201cliberal,\u201d what percentage consider themselves \u201chappy,\u201d what percentage feel \u201crushed\u201d in their daily lives, and many other issues. The key to making these claims about the larger population of all American adults lies in how the sample is selected. The goal is to select a sample that is representative of the population, and a common way to achieve this goal is to select a <strong>random sample<\/strong> that gives every member of the population an equal chance of being selected for the sample. In its simplest form, random sampling involves numbering every member of the population and then using a computer to randomly select the subset to be surveyed. Most polls don\u2019t operate exactly like this, but they do use probability-based sampling methods to select individuals from nationally representative panels.\r\n\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/2747\r\n\r\nIn 2004, the GSS reported that 817 of 977 respondents (or 83.6%) indicated that they always or sometimes feel rushed. This is a clear majority, but we again need to consider variation due to <em>random sampling<\/em>. Fortunately, we can use the same probability model we did in the previous example to investigate the probable size of this error. (Note, we can use the coin-tossing model when the actual population size is much, much larger than the sample size, as then we can still consider the probability to be the same for every individual in the sample.) 
This probability model predicts that the sample result will be within 3 percentage points of the population value (roughly 1 over the square root of the sample size, the <strong>margin of error<\/strong>). A statistician would conclude, with 95% confidence, that between 80.6% and 86.6% of all adult Americans in 2004 would have responded that they sometimes or always feel rushed.\r\n\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/2748\r\n\r\nThe key to the margin of error is that when we use a probability sampling method, we can make claims about how often (in the long run, with repeated random sampling) the sample result would fall within a certain distance from the unknown population value by chance (meaning by random sampling variation) alone. Conversely, non-random samples are often subject to bias, meaning the sampling method systematically over-represents some segments of the population and under-represents others. We also still need to consider other sources of bias, such as individuals not responding honestly. These sources of error are not measured by the margin of error.\r\n<h2>Cause and Effect Conclusions<\/h2>\r\nIn many research studies, the primary question of interest concerns differences between groups. Then the question becomes how were the groups formed (e.g., selecting people who already drink coffee vs. those who don\u2019t). In some studies, the researchers actively form the groups themselves. But then we have a similar question\u2014could any differences we observe in the groups be an artifact of that group-formation process? 
Or maybe the difference we observe in the groups is so large that we can discount a \u201cfluke\u201d in the group-formation process as a reasonable explanation for what we find?\r\n\r\n<strong>Example 4<\/strong>: A psychology study investigated whether people tend to display more creativity when they are thinking about intrinsic (internal) or extrinsic (external) motivations (Ramsey &amp; Schafer, 2002, based on a study by Amabile, 1985). The subjects were 47 people with extensive experience with creative writing. Subjects began by answering survey questions about either intrinsic motivations for writing (such as the pleasure of self-expression) or extrinsic motivations (such as public recognition). Then all subjects were instructed to write a haiku, and those poems were evaluated for creativity by a panel of judges. The researchers conjectured beforehand that subjects who were thinking about intrinsic motivations would display more creativity than subjects who were thinking about extrinsic motivations. The creativity scores from the 47 subjects in this study are displayed in Figure 2.6.5, where higher scores indicate more creativity.\r\n<figure style=\"width: 91.2087912087912%;\">\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"800\"]<img style=\"width: 91.2087912087912%;\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212632\/000001452original.jpg\" alt=\"Image showing a dot for creativity scores, which vary between 5 and 27, and the types of motivation each person was given as a motivator, either extrinsic or intrinsic.\" width=\"800\" height=\"226\" \/> <strong>Figure 2.6.5<\/strong>. Creativity scores separated by type of motivation.[\/caption]\r\n\r\n<figcaption><\/figcaption><\/figure>\r\nIn this example, the key question is whether the type of motivation <em>affects<\/em> creativity scores. 
In particular, do subjects who were asked about intrinsic motivations tend to have higher creativity scores than subjects who were asked about extrinsic motivations?\r\n\r\nFigure 2.6.5 reveals that both motivation groups saw considerable variability in creativity scores, and these scores have considerable overlap between the groups. In other words, it\u2019s certainly not always the case that those with intrinsic motivations have higher creativity than those with extrinsic motivations, but there may still be a statistical <em>tendency<\/em> in this direction. (Psychologist Keith Stanovich (2013) refers to people\u2019s difficulties with thinking about such probabilistic tendencies as \u201cthe Achilles heel of human cognition.\u201d)\r\n\r\nThe mean creativity score is 19.88 for the intrinsic group, compared to 15.74 for the extrinsic group, which supports the researchers\u2019 conjecture. Yet comparing only the means of the two groups fails to consider the variability of creativity scores in the groups. We can measure variability with statistics using, for instance, the standard deviation: 5.25 for the extrinsic group and 4.40 for the intrinsic group. The standard deviations tell us that most of the creativity scores are within about 5 points of the mean score in each group. We see that the mean score for the intrinsic group lies within one standard deviation of the mean score for the extrinsic group. So, although there is a tendency for the creativity scores to be higher in the intrinsic group, on average, the difference is not extremely large.\r\n\r\nWe again want to consider possible explanations for this difference. The study only involved individuals with extensive creative writing experience. Although this limits the population to which we can generalize, it does not explain why the mean creativity score was a bit larger for the intrinsic group than for the extrinsic group. Maybe women tend to receive higher creativity scores? 
Here is where we need to focus on how the individuals were assigned to the motivation groups. If only women were in the intrinsic motivation group and only men in the extrinsic group, then this would present a problem because we wouldn\u2019t know if the intrinsic group did better because of the different type of motivation or because they were women. However, the researchers guarded against such a problem by randomly assigning the individuals to the motivation groups. Like flipping a coin, each individual was just as likely to be assigned to either type of motivation. Why is this helpful? Because this <strong>random assignment<\/strong> tends to balance out all the variables related to creativity we can think of, and even those we don\u2019t think of in advance, between the two groups. So we should have a similar male\/female split between the two groups; we should have a similar age distribution between the two groups; we should have a similar distribution of educational background between the two groups; and so on. Random assignment should produce groups that are as similar as possible except for the type of motivation, which presumably eliminates all those other variables as possible explanations for the observed tendency for higher scores in the intrinsic group.\r\n\r\nBut does this always work? No, so by \u201cluck of the draw\u201d the groups may be a little different prior to answering the motivation survey. So then the question is, is it possible that an unlucky random assignment is responsible for the observed difference in creativity scores between the groups? In other words, suppose each individual\u2019s poem was going to get the same creativity score no matter which group they were assigned to, that the type of motivation in no way impacted their score. 
Then how often would the random-assignment process alone lead to a difference in mean creativity scores as large (or larger) than 19.88 \u2013 15.74 = 4.14 points?\r\n\r\nWe again want to apply a probability model to approximate a <strong>p-value<\/strong>, but this time the model will be a bit different. Think of writing each person\u2019s creativity score on an index card, shuffling up the index cards, and then dealing out 23 to the extrinsic motivation group and 24 to the intrinsic motivation group, and finding the difference in the group means. We (better yet, the computer) can repeat this process over and over to see how often, when the scores don\u2019t change, random assignment leads to a difference in means at least as large as 4.14. Figure 2.6.6 shows the results from 1,000 such hypothetical random assignments for these scores.\r\n<figure style=\"width: 51.0989010989011%;\">\r\n\r\n[caption id=\"\" align=\"alignleft\" width=\"800\"]<img class=\"\" style=\"width: 307px;\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212623\/000001454original.jpg\" alt=\"Standard distribution in a typical bell curve.\" width=\"800\" height=\"307\" \/> <strong>Figure 2.6.6<\/strong>. Differences in group means under random assignment alone.[\/caption]\r\n\r\n<figcaption><\/figcaption><\/figure>\r\nOnly 2 of the 1,000 simulated random assignments produced a difference in group means of 4.14 or larger. In other words, the approximate p-value is 2\/1000 = 0.002. This small p-value indicates that it would be very surprising for the random assignment process alone to produce such a large difference in group means. 
Therefore, as with Example 2, we have strong evidence that focusing on intrinsic motivations tends to increase creativity scores, as compared to thinking about extrinsic motivations.\r\n\r\nNotice that the previous statement implies a cause-and-effect relationship between motivation and creativity score; is such a strong conclusion justified? Yes, because of the random assignment used in the study. That should have balanced out any other variables between the two groups, so now that the small p-value convinces us that the higher mean in the intrinsic group wasn\u2019t just a coincidence, the only reasonable explanation left is the difference in the type of motivation. Can we generalize this conclusion to everyone? Not necessarily\u2014we could cautiously generalize this conclusion to individuals with extensive experience in creative writing similar to the individuals in this study, but we would still want to know more about how these individuals were selected to participate.\r\n\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/2709\r\n<h2>Conclusion<\/h2>\r\n[caption id=\"attachment_1881\" align=\"alignleft\" width=\"276\"]<a href=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150557\/conclusion.jpg\"><img class=\"wp-image-1881 \" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150557\/conclusion.jpg\" alt=\"Close-up photo of mathematical equations.\" width=\"276\" height=\"197\" \/><\/a> <strong>Figure 2.6.7<\/strong>. Researchers employ the scientific method that involves a great deal of statistical thinking: generate a hypothesis --&gt; design a study to test that hypothesis --&gt; conduct the study --&gt; analyze the data --&gt; report the results. 
[Image: widdowquinn][\/caption]Statistical thinking involves the careful design of a study to collect meaningful data to answer a focused research question, detailed analysis of patterns in the data, and drawing conclusions that go beyond the observed data. Random sampling is paramount to generalizing results from our sample to a larger population, and random assignment is key to drawing cause-and-effect conclusions. With both kinds of randomness, probability models help us assess how much random variation we can expect in our results, in order to determine whether our results could happen by chance alone and to estimate a margin of error.\r\n\r\nSo where does this leave us with regard to the coffee study mentioned previously (Freedman, Park, Abnet, Hollenbeck, &amp; Sinha, 2012, which found that men who drank at least six cups of coffee a day had a 10% lower chance of dying (women 15% lower) than those who drank none)? We can answer many of the questions:\r\n<ul>\r\n \t<li>This was a 14-year study conducted by researchers at the National Cancer Institute.<\/li>\r\n \t<li>The results were published in the June issue of the <em>New England Journal of Medicine<\/em>, a respected, peer-reviewed journal.<\/li>\r\n \t<li>The study reviewed coffee habits of more than 402,000 people ages 50 to 71 from six states and two metropolitan areas. Those with cancer, heart disease, and stroke were excluded at the start of the study. 
Coffee consumption was assessed once at the start of the study.<\/li>\r\n \t<li>About 52,000 people died during the course of the study.<\/li>\r\n \t<li>People who drank between two and five cups of coffee daily showed a lower risk as well, but the amount of reduction increased for those drinking six or more cups.<\/li>\r\n \t<li>The sample sizes were fairly large and so the p-values are quite small, even though percent reduction in risk was not extremely large (dropping from a 12% chance to about 10%\u201311%).<\/li>\r\n \t<li>Whether coffee was caffeinated or decaffeinated did not appear to affect the results.<\/li>\r\n \t<li>This was an observational study, so no cause-and-effect conclusions can be drawn between coffee drinking and increased longevity, contrary to the impression conveyed by many news headlines about this study. In particular, it\u2019s possible that those with chronic diseases don\u2019t tend to drink coffee.<\/li>\r\n<\/ul>\r\nThis study needs to be reviewed in the larger context of similar studies and consistency of results across studies, with the constant caution that this was not a randomized experiment. Whereas a statistical analysis can still \u201cadjust\u201d for other potential confounding variables, we are not yet convinced that researchers have identified them all or completely isolated why this decrease in death risk is evident. 
Researchers can now take the findings of this study and develop more focused studies that address new questions.\r\n<div class=\"textbox examples\">\r\n<h3>Learn More<\/h3>\r\nExplore these outside resources to learn more about applied statistics:\r\n<ul>\r\n \t<li>Video about p-values:\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=bVMVGHkt2cg\" target=\"_blank\" rel=\"noopener noreferrer\">P-Value Extravaganza<\/a><\/li>\r\n \t<li><a href=\"http:\/\/www.rossmanchance.com\/applets\/\" target=\"_blank\" rel=\"noopener\">Interactive web applets for teaching and learning statistics<\/a><\/li>\r\n \t<li>Inter-university Consortium for Political and Social Research\u00a0<a href=\"http:\/\/www.icpsr.umich.edu\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">where you can find and analyze data.<\/a><\/li>\r\n \t<li><a href=\"https:\/\/www.causeweb.org\/\" target=\"_blank\" rel=\"noopener\">The Consortium for the Advancement of Undergraduate Statistics<\/a><\/li>\r\n<\/ul>\r\n<\/div>\r\n<section><\/section>","rendered":"<p>Did you know that as sales of ice cream increase, so does the overall rate of crime? Is it possible that indulging in your favorite flavor of ice cream could send you on a crime spree? Or, after committing a crime, do you think you might decide to treat yourself to a cone? There is no question that a relationship exists between ice cream and crime (e.g., Harper, 2013), but does one thing actually cause the other to occur?<\/p>\n<p>It is much more likely that both ice cream sales and crime rates are related to the temperature outside. When the temperature is warm, there are lots of people out of their houses, interacting with each other, getting annoyed with one another, and sometimes committing crimes. Also, when it is warm outside, we are more likely to seek a refreshing treat like ice cream. How do we determine if there is indeed a relationship between two things? 
And when there is a relationship, how can we discern whether it is attributable to coincidence or causation? We do this through statistical analysis of the data. Which analysis we use will depend on several conditions outlined next.<\/p>\n<h2>Introduction to Statistical Thinking<\/h2>\n<div id=\"attachment_1883\" style=\"width: 389px\" class=\"wp-caption alignright\"><a href=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17151911\/coffee.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1883\" class=\"wp-image-1883\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17151911\/coffee.jpg\" alt=\"Coffee cup with heart shaped cream inside.\" width=\"379\" height=\"284\" \/><\/a><\/p>\n<p id=\"caption-attachment-1883\" class=\"wp-caption-text\"><strong>Figure 2.6.1<\/strong>. People around the world differ in their preferences for drinking coffee versus drinking tea. Would the results of the coffee study be the same in Canada as in China? [Image: Duncan, https:\/\/goo.gl\/vbMyTm, CC BY-NC 2.0, https:\/\/goo.gl\/l8UUGY]<\/p>\n<\/div>\n<p>Does drinking coffee actually increase your life expectancy? A recent study (Freedman, Park, Abnet, Hollenbeck, &amp; Sinha, 2012) found that men who drank at least six cups of coffee a day had a 10% lower chance of dying (women 15% lower) than those who drank none. Does this mean you should pick up or increase your own coffee habit? Modern society has become awash in studies such as this; you can read about several such studies in the news every day. Conducting such a study well, and interpreting the results of such studies requires understanding basic ideas of <strong>statistics<\/strong>, the science of gaining insight from data. Key components to a statistical investigation are:<\/p>\n<ul>\n<li>Planning the study: Start by asking a testable research question and deciding how to collect data. 
For example, how long was the study period of the coffee study? How many people were recruited for the study, how were they recruited, and from where? How old were they? What other variables were recorded about the individuals? Were changes made to the participants\u2019 coffee habits during the course of the study?<\/li>\n<li>Examining the data: What are appropriate ways to examine the data? What graphs are relevant, and what do they reveal? What descriptive statistics can be calculated to summarize relevant aspects of the data, and what do they reveal? What patterns do you see in the data? Are there any individual observations that deviate from the overall pattern, and what do they reveal? For example, in the coffee study, did the proportions differ when we compared the smokers to the non-smokers?<\/li>\n<li>Inferring from the data: What are valid statistical methods for drawing inferences \u201cbeyond\u201d the data you collected? In the coffee study, is the 10%\u201315% reduction in risk of death something that could have happened just by chance?<\/li>\n<li>Drawing conclusions: Based on what you learned from your data, what conclusions can you draw? Who do you think these conclusions apply to? (Were the people in the coffee study older? Healthy? Living in cities?) Can you draw a <strong>cause-and-effect<\/strong> conclusion about your treatments? (Are scientists now saying that coffee drinking is the cause of the decreased risk of death?)<\/li>\n<\/ul>\n<p>Notice that the numerical analysis (\u201ccrunching numbers\u201d on the computer) comprises only a small part of the overall statistical investigation. 
In this section, you will see how we can answer some of these questions and what questions you should be asking about any statistical investigation you read about.<\/p>\n<p><iframe loading=\"lazy\" id=\"oembed-1\" title=\"Types of statistical studies | Statistical studies | Probability and Statistics | Khan Academy\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/z-Qi4w6Xkuc?feature=oembed&#38;rel=0\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p><strong>Video 2.6.1.\u00a0<\/strong><em>Types of Statistical Studies<\/em> explains the differences between correlational and experimental research.<\/p>\n<p>\t<iframe id=\"lumen_assessment_2708\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=2708&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_2708\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<section>\n<h2>Distributional Thinking<\/h2>\n<p>When data are collected to address a particular question, an important first step is to think of meaningful ways to organize and examine the data. Let&#8217;s take a look at an example.<\/p>\n<p><strong>Example 1<\/strong>: Researchers investigated whether cancer pamphlets are written at an appropriate level to be read and understood by cancer patients (Short, Moriarty, &amp; Cooley, 1995). Tests of reading ability were given to 63 patients. In addition, readability level was determined for a <strong>sample<\/strong> of 30 pamphlets, based on characteristics such as the lengths of words and sentences in the pamphlet. 
The results, reported in terms of grade levels, are displayed in Figure 2.6.2.<\/p>\n<div style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212647\/000001456original.jpg\" alt=\"Table showing patients' reading levels and pamphlets' reading levels.\" width=\"800\" height=\"236\" \/><\/p>\n<p class=\"wp-caption-text\"><strong>Figure 2.6.2<\/strong>. Frequency tables of patient reading levels and pamphlet readability levels.<\/p>\n<\/div>\n<figure><figcaption>Examining these two variables reveals two fundamental aspects of statistical thinking:<\/figcaption><\/figure>\n<ul>\n<li>Data <em>vary<\/em>. More specifically, values of a variable (such as reading level of a cancer patient or readability level of a cancer pamphlet) vary.<\/li>\n<li>Analyzing the pattern of variation, called the <strong>distribution<\/strong> of the variable, often reveals insights.<\/li>\n<\/ul>\n<p>Addressing the research question of whether the cancer pamphlets are written at appropriate levels for the cancer patients requires comparing the two distributions. A na\u00efve comparison might focus only on the centers of the distributions. Both medians turn out to be ninth grade, but considering only medians ignores the variability and the overall distributions of these data. 
A more illuminating approach is to compare the entire distributions, for example with a graph, as in Figure 2.6.3.<\/p>\n<figure>\n<div style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212639\/000001457original.jpg\" alt=\"Bar graph showing that the reading level of pamphlets is typically higher than the reading level of the patients.\" width=\"800\" height=\"388\" \/><\/p>\n<p class=\"wp-caption-text\"><strong>Figure 2.6.3<\/strong>. Comparison of patient reading levels and pamphlet readability levels.<\/p>\n<\/div>\n<\/figure>\n<p>Figure 2.6.3 makes clear that the two distributions are not well aligned at all. The most glaring discrepancy is that many patients (17\/63, or 27%, to be precise) have a reading level below that of the most readable pamphlet. These patients will need help to understand the information provided in the cancer pamphlets. Notice that this conclusion follows from considering the distributions as a whole, not simply measures of center or variability, and that the graph contrasts those distributions more immediately than the frequency tables.<\/p>\n<p>\t<iframe id=\"lumen_assessment_2741\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=2741&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_2741\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<h2>Statistical Significance<\/h2>\n<p>Even when we find patterns in data, often there is still uncertainty in various aspects of the data. For example, there may be potential for measurement errors (even your own body temperature can fluctuate by almost 1\u00b0F over the course of the day). 
Or we may only have a \u201csnapshot\u201d of observations from a longer-term process or only a small subset of individuals from the <strong>population<\/strong> of interest. In such cases, how can we determine whether patterns we see in our small set of data are convincing evidence of a systematic phenomenon in the larger process or population? Let&#8217;s take a look at another example.<\/p>\n<p><strong>Example 2<\/strong>: In a study reported in the November 2007 issue of <em>Nature<\/em>, researchers investigated whether pre-verbal infants take into account an individual\u2019s actions toward others in evaluating that individual as appealing or aversive (Hamlin, Wynn, &amp; Bloom, 2007). In one component of the study, 10-month-old infants were shown a \u201cclimber\u201d character (a piece of wood with \u201cgoogly\u201d eyes glued onto it) that could not make it up a hill in two tries. Then the infants were shown two scenarios for the climber\u2019s next try, one where the climber was pushed to the top of the hill by another character (\u201chelper\u201d), and one where the climber was pushed back down the hill by another character (\u201chinderer\u201d). The infant was alternately shown these two scenarios several times. Then the infant was presented with two pieces of wood (representing the helper and the hinderer characters) and asked to pick one to play with.<\/p>\n<p>The researchers found that of the 16 infants who made a clear choice, 14 chose to play with the helper toy. One possible explanation for this clear majority result is that the helping behavior of the one toy increases the infants\u2019 likelihood of choosing that toy. But are there other possible explanations? What about the color of the toy? Well, prior to collecting the data, the researchers arranged it so that each color and shape (red square and blue circle) would be seen by the same number of infants. 
Or maybe the infants had right-handed tendencies and so picked whichever toy was closer to their right hand?<\/p>\n<p>Here too, prior to collecting the data, the researchers arranged it so half the infants saw the helper toy on the right and half on the left. Or, maybe the shapes of these wooden characters (square, triangle, circle) had an effect? Perhaps, but again, the researchers controlled for this by rotating which shape was the helper toy, the hinderer toy, and the climber. When designing experiments, it is important to <em>control<\/em> for as many of the variables that might affect the responses as possible. It is beginning to appear that the researchers accounted for all the other plausible explanations. But there is one more important consideration that cannot be controlled\u2014if we did the study again with these 16 infants, they might not make the same choices. In other words, there is some <em>randomness<\/em> inherent in their selection process.<\/p>\n<h3>P-value<\/h3>\n<p>Maybe each infant had no genuine preference at all, and it was simply \u201crandom luck\u201d that led to 14 infants picking the helper toy. Although this random component cannot be controlled, we can apply a <em>probability model<\/em> to investigate the pattern of results that would occur in the long run if random chance were the only factor.<\/p>\n<p>If the infants were equally likely to pick between the two toys, then each infant had a 50% chance of picking the helper toy. It\u2019s like each infant tossed a coin, and if it landed heads, the infant picked the helper toy. So if we tossed a coin 16 times, could it land heads 14 times? Sure, it\u2019s possible, but it turns out to be very unlikely. Getting 14 (or more) heads in 16 tosses is about as likely as tossing a coin and getting 9 heads in a row. This probability is referred to as a <strong>p-value<\/strong>. 
The p-value is the probability of obtaining results at least as extreme as those observed if random chance were the only factor at work.\u00a0Within psychology, the most common standard for p-values is \u201cp &lt; .05\u201d. What this means is that, if chance alone were operating, results this extreme would occur less than 5% of the time; it does not mean that there is a 95% probability that the results reflect a meaningful pattern in human psychology. Results that meet this standard are said to have <strong>statistical significance<\/strong>.<\/p>\n<p>\t<iframe id=\"lumen_assessment_2743\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=2743&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_2743\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>So, in the study above, if we assume that each infant was equally likely to choose either toy, then the probability that 14 or more out of 16 infants would choose the helper toy is found to be 0.0021. We have only two logical possibilities: either the infants have a genuine preference for the helper toy, or the infants have no preference (50\/50), and an outcome that would occur only about 2 times in 1,000 repetitions happened in this study. Because this p-value of 0.0021 is quite small, we conclude that the study provides very strong evidence that these infants have a genuine preference for the helper toy.<\/p>\n<p>If we compare the p-value to some cut-off value, like 0.05, we see that the p-value is smaller.\u00a0Because the p-value is smaller than that cut-off value, we reject the hypothesis that only random chance was at play here. 
In this case, these researchers would conclude that <em>significantly<\/em> more than half of the infants in the study chose the helper toy, giving strong evidence of a genuine preference for the toy with the helping behavior.<\/p>\n<p>\t<iframe id=\"lumen_assessment_2742\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=2742&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_2742\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<\/section>\n<h2>Generalizability<\/h2>\n<div id=\"attachment_1880\" style=\"width: 439px\" class=\"wp-caption alignright\"><a href=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150423\/generalizability.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1880\" class=\"wp-image-1880\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150423\/generalizability.jpg\" alt=\"Photo of a diverse group of college-aged students.\" width=\"429\" height=\"229\" \/><\/a><\/p>\n<p id=\"caption-attachment-1880\" class=\"wp-caption-text\"><strong>Figure 2.6.4<\/strong>. Generalizability is an important research consideration: The results of studies with widely representative samples are more likely to generalize to the population. [Image: Barnacles Budget Accommodation]<\/p>\n<\/div>\n<p>One limitation to the study mentioned previously about the babies choosing the &#8220;helper&#8221; toy\u00a0is that the conclusion only applies to the 16 infants in the study. We don\u2019t know much about how those 16 infants were selected. 
Suppose we want to select a subset of individuals (a <strong>sample<\/strong>) from a much larger group of individuals (the <strong>population<\/strong>) in such a way that conclusions from the sample can be <strong>generalized<\/strong> to the larger population. This is the question faced by pollsters every day.<\/p>\n<p><strong>Example 3<\/strong>: The General Social Survey (GSS) is a survey on societal trends conducted every other year in the United States. Based on a sample of about 2,000 adult Americans, researchers make claims about what percentage of the U.S. population consider themselves to be \u201cliberal,\u201d what percentage consider themselves \u201chappy,\u201d what percentage feel \u201crushed\u201d in their daily lives, and many other issues. The key to making these claims about the larger population of all American adults lies in how the sample is selected. The goal is to select a sample that is representative of the population, and a common way to achieve this goal is to select a <strong>random sample<\/strong> that gives every member of the population an equal chance of being selected for the sample. In its simplest form, random sampling involves numbering every member of the population and then using a computer to randomly select the subset to be surveyed. Most polls don\u2019t operate exactly like this, but they do use probability-based sampling methods to select individuals from nationally representative panels.<\/p>\n<p>\t<iframe id=\"lumen_assessment_2747\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=2747&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_2747\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>In 2004, the GSS reported that 817 of 977 respondents (or 83.6%) indicated that they always or sometimes feel rushed. 
This is a clear majority, but we again need to consider variation due to <em>random sampling<\/em>. Fortunately, we can use the same probability model we did in the previous example to investigate the probable size of this error. (Note, we can use the coin-tossing model when the actual population size is much, much larger than the sample size, as then we can still consider the probability to be the same for every individual in the sample.) This probability model predicts that the sample result will be within 3 percentage points of the population value (roughly 1 over the square root of the sample size, the <strong>margin of error<\/strong>). A statistician would conclude, with 95% confidence, that between 80.6% and 86.6% of all adult Americans in 2004 would have responded that they sometimes or always feel rushed.<\/p>\n<p>\t<iframe id=\"lumen_assessment_2748\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=2748&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_2748\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<p>The key to the margin of error is that when we use a probability sampling method, we can make claims about how often (in the long run, with repeated random sampling) the sample result would fall within a certain distance from the unknown population value by chance (meaning by random sampling variation) alone. Conversely, non-random samples are often subject to bias, meaning the sampling method systematically over-represents some segments of the population and under-represents others. We also still need to consider other sources of bias, such as individuals not responding honestly. These sources of error are not measured by the margin of error.<\/p>\n<h2>Cause and Effect Conclusions<\/h2>\n<p>In many research studies, the primary question of interest concerns differences between groups. 
The question then becomes how the groups were formed (e.g., selecting people who already drink coffee vs. those who don\u2019t). In some studies, the researchers actively form the groups themselves. But then we have a similar question\u2014could any differences we observe in the groups be an artifact of that group-formation process? Or maybe the difference we observe in the groups is so large that we can discount a \u201cfluke\u201d in the group-formation process as a reasonable explanation for what we find?<\/p>\n<p><strong>Example 4<\/strong>: A psychology study investigated whether people tend to display more creativity when they are thinking about intrinsic (internal) or extrinsic (external) motivations (Ramsey &amp; Schafer, 2002, based on a study by Amabile, 1985). The subjects were 47 people with extensive experience with creative writing. Subjects began by answering survey questions about either intrinsic motivations for writing (such as the pleasure of self-expression) or extrinsic motivations (such as public recognition). Then all subjects were instructed to write a haiku, and those poems were evaluated for creativity by a panel of judges. The researchers conjectured beforehand that subjects who were thinking about intrinsic motivations would display more creativity than subjects who were thinking about extrinsic motivations. 
The creativity scores from the 47 subjects in this study are displayed in Figure 2.6.5, where higher scores indicate more creativity.<\/p>\n<figure style=\"width: 91.2087912087912%;\">\n<div style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" style=\"width: 91.2087912087912%;\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212632\/000001452original.jpg\" alt=\"Image showing a dot for creativity scores, which vary between 5 and 27, and the types of motivation each person was given as a motivator, either extrinsic or intrinsic.\" width=\"800\" height=\"226\" \/><\/p>\n<p class=\"wp-caption-text\"><strong>Figure 2.6.5<\/strong>. Creativity scores separated by type of motivation.<\/p>\n<\/div><figcaption><\/figcaption><\/figure>\n<p>In this example, the key question is whether the type of motivation <em>affects<\/em> creativity scores. In particular, do subjects who were asked about intrinsic motivations tend to have higher creativity scores than subjects who were asked about extrinsic motivations?<\/p>\n<p>Figure 2.6.5 reveals that both motivation groups saw considerable variability in creativity scores, and these scores have considerable overlap between the groups. In other words, it\u2019s certainly not always the case that those with intrinsic motivations have higher creativity than those with extrinsic motivations, but there may still be a statistical <em>tendency<\/em> in this direction. (Psychologist Keith Stanovich (2013) refers to people\u2019s difficulties with thinking about such probabilistic tendencies as \u201cthe Achilles heel of human cognition.\u201d)<\/p>\n<p>The mean creativity score is 19.88 for the intrinsic group, compared to 15.74 for the extrinsic group, which supports the researchers\u2019 conjecture. Yet comparing only the means of the two groups fails to consider the variability of creativity scores in the groups. 
We can measure variability with statistics such as the standard deviation: 5.25 for the extrinsic group and 4.40 for the intrinsic group. The standard deviations tell us that most of the creativity scores are within about 5 points of the mean score in each group. We see that the mean score for the intrinsic group lies within one standard deviation of the mean score for the extrinsic group. So, although there is a tendency for the creativity scores to be higher in the intrinsic group, on average, the difference is not extremely large.<\/p>\n<p>We again want to consider possible explanations for this difference. The study only involved individuals with extensive creative writing experience. Although this limits the population to which we can generalize, it does not explain why the mean creativity score was a bit larger for the intrinsic group than for the extrinsic group. Maybe women tend to receive higher creativity scores? Here is where we need to focus on how the individuals were assigned to the motivation groups. If only women were in the intrinsic motivation group and only men in the extrinsic group, then this would present a problem because we wouldn\u2019t know if the intrinsic group did better because of the different type of motivation or because they were women. However, the researchers guarded against such a problem by randomly assigning the individuals to the motivation groups. Like flipping a coin, each individual was just as likely to be assigned to either type of motivation. Why is this helpful? Because this <strong>random assignment<\/strong> tends to balance out all the variables related to creativity we can think of, and even those we don\u2019t think of in advance, between the two groups. So we should have a similar male\/female split between the two groups; we should have a similar age distribution between the two groups; we should have a similar distribution of educational background between the two groups; and so on. 
Random assignment should produce groups that are as similar as possible except for the type of motivation, which presumably eliminates all those other variables as possible explanations for the observed tendency for higher scores in the intrinsic group.<\/p>\n<p>But does this always work? Not always: by \u201cluck of the draw\u201d the groups may be a little different prior to answering the motivation survey. So then the question is, is it possible that an unlucky random assignment is responsible for the observed difference in creativity scores between the groups? In other words, suppose each individual\u2019s poem was going to get the same creativity score no matter which group they were assigned to, that the type of motivation in no way impacted their score. Then how often would the random-assignment process alone lead to a difference in mean creativity scores as large as (or larger than) 19.88 \u2013 15.74 = 4.14 points?<\/p>\n<p>We again want to apply a probability model to approximate a <strong>p-value<\/strong>, but this time the model will be a bit different. Think of writing everyone\u2019s creativity scores on an index card, shuffling up the index cards, and then dealing out 23 to the extrinsic motivation group and 24 to the intrinsic motivation group, and finding the difference in the group means. We (better yet, the computer) can repeat this process over and over to see how often, when the scores don\u2019t change, random assignment leads to a difference in means at least as large as 4.14. 
Figure 2.6.6 shows the results from 1,000 such hypothetical random assignments for these scores.<\/p>\n<figure style=\"width: 51.0989010989011%;\">\n<div style=\"width: 810px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"\" style=\"width: 307px;\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images-archive-read-only\/wp-content\/uploads\/sites\/902\/2016\/06\/10212623\/000001454original.jpg\" alt=\"Standard distribution in a typical bell curve.\" width=\"800\" height=\"307\" \/><\/p>\n<p class=\"wp-caption-text\"><strong>Figure 2.6.6<\/strong>. Differences in group means under random assignment alone.<\/p>\n<\/div><figcaption><\/figcaption><\/figure>\n<p>Only 2 of the 1,000 simulated random assignments produced a difference in group means of 4.14 or larger. In other words, the approximate p-value is 2\/1000 = 0.002. This small p-value indicates that it would be very surprising for the random assignment process alone to produce such a large difference in group means. Therefore, as with Example 2, we have strong evidence that focusing on intrinsic motivations tends to increase creativity scores, as compared to thinking about extrinsic motivations.<\/p>\n<p>Notice that the previous statement implies a cause-and-effect relationship between motivation and creativity score; is such a strong conclusion justified? Yes, because of the random assignment used in the study. That should have balanced out any other variables between the two groups, so now that the small p-value convinces us that the higher mean in the intrinsic group wasn\u2019t just a coincidence, the only reasonable explanation left is the difference in the type of motivation. Can we generalize this conclusion to everyone? 
Not necessarily\u2014we could cautiously generalize this conclusion to individuals with extensive experience in creative writing similar to the individuals in this study, but we would still want to know more about how these individuals were selected to participate.<\/p>\n<p>\t<iframe id=\"lumen_assessment_2709\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=2709&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_2709\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<h2>Conclusion<\/h2>\n<div id=\"attachment_1881\" style=\"width: 286px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150557\/conclusion.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1881\" class=\"wp-image-1881\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/855\/2016\/10\/17150557\/conclusion.jpg\" alt=\"Close-up photo of mathematical equations.\" width=\"276\" height=\"197\" \/><\/a><\/p>\n<p id=\"caption-attachment-1881\" class=\"wp-caption-text\"><strong>Figure 2.6.7<\/strong>. Researchers employ the scientific method that involves a great deal of statistical thinking: generate a hypothesis &#8211;&gt; design a study to test that hypothesis &#8211;&gt; conduct the study &#8211;&gt; analyze the data &#8211;&gt; report the results. [Image: widdowquinn]<\/p>\n<\/div>\n<p>Statistical thinking involves the careful design of a study to collect meaningful data to answer a focused research question, detailed analysis of patterns in the data, and drawing conclusions that go beyond the observed data. Random sampling is paramount to generalizing results from our sample to a larger population, and random assignment is key to drawing cause-and-effect conclusions. 
With both kinds of randomness, probability models help us assess how much random variation we can expect in our results, in order to determine whether our results could happen by chance alone and to estimate a margin of error.<\/p>\n<p>So where does this leave us with regard to the coffee study mentioned previously (the\u00a0Freedman, Park, Abnet, Hollenbeck, &amp; Sinha, 2012 study found that men who drank at least six cups of coffee a day had a 10% lower chance of dying (women 15% lower) than those who drank none)? We can answer many of the questions:<\/p>\n<ul>\n<li>This was a 14-year study conducted by researchers at the National Cancer Institute.<\/li>\n<li>The results were published in the June issue of the <em>New England Journal of Medicine<\/em>, a respected, peer-reviewed journal.<\/li>\n<li>The study reviewed coffee habits of more than 402,000 people ages 50 to 71 from six states and two metropolitan areas. Those with cancer, heart disease, and stroke were excluded at the start of the study. Coffee consumption was assessed once at the start of the study.<\/li>\n<li>About 52,000 people died during the course of the study.<\/li>\n<li>People who drank between two and five cups of coffee daily showed a lower risk as well, but the amount of reduction increased for those drinking six or more cups.<\/li>\n<li>The sample sizes were fairly large and so the p-values are quite small, even though percent reduction in risk was not extremely large (dropping from a 12% chance to about 10%\u201311%).<\/li>\n<li>Whether coffee was caffeinated or decaffeinated did not appear to affect the results.<\/li>\n<li>This was an observational study, so no cause-and-effect conclusions can be drawn between coffee drinking and increased longevity, contrary to the impression conveyed by many news headlines about this study. 
In particular, it\u2019s possible that those with chronic diseases don\u2019t tend to drink coffee.<\/li>\n<\/ul>\n<p>This study needs to be reviewed in the larger context of similar studies and consistency of results across studies, with the constant caution that this was not a randomized experiment. Whereas a statistical analysis can still \u201cadjust\u201d for other potential confounding variables, we are not yet convinced that researchers have identified them all or completely isolated why this decrease in death risk is evident. Researchers can now take the findings of this study and develop more focused studies that address new questions.<\/p>\n<div class=\"textbox examples\">\n<h3>Learn More<\/h3>\n<p>Explore these outside resources to learn more about applied statistics:<\/p>\n<ul>\n<li>Video about p-values:\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=bVMVGHkt2cg\" target=\"_blank\" rel=\"noopener noreferrer\">P-Value Extravaganza<\/a><\/li>\n<li><a href=\"http:\/\/www.rossmanchance.com\/applets\/\" target=\"_blank\" rel=\"noopener\">Interactive web applets for teaching and learning statistics<\/a><\/li>\n<li>Inter-university Consortium for Political and Social Research\u00a0<a href=\"http:\/\/www.icpsr.umich.edu\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">where you can find and analyze data.<\/a><\/li>\n<li><a href=\"https:\/\/www.causeweb.org\/\" target=\"_blank\" rel=\"noopener\">The Consortium for the Advancement of Undergraduate Statistics<\/a><\/li>\n<\/ul>\n<\/div>\n<section><\/section>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-550\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Original<\/div><ul class=\"citation-list\"><li>Analyzing Data: Correlational and Experimental Research. <strong>Authored by<\/strong>: Nicole Arduini-Van Hoose. 
<strong>Provided by<\/strong>: Hudson Valley Community College. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/courses.lumenlearning.com\/adolescent\/chapter\/analyzing-data-correlational-and-experimental-research\/\">https:\/\/courses.lumenlearning.com\/adolescent\/chapter\/analyzing-data-correlational-and-experimental-research\/<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\">CC BY-NC-SA: Attribution-NonCommercial-ShareAlike<\/a><\/em><\/li><\/ul><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Psychology. <strong>Provided by<\/strong>: OpenStax. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/openstax.org\/details\/books\/psychology\">https:\/\/openstax.org\/details\/books\/psychology<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Specific attribution<\/div><ul class=\"citation-list\"><li>Types of Statistical Studies. <strong>Authored by<\/strong>: Sal Khan. <strong>Provided by<\/strong>: Khan Academy. <strong>Located at<\/strong>: <a target=\"_blank\" href=\"https:\/\/youtu.be\/z-Qi4w6Xkuc\">https:\/\/youtu.be\/z-Qi4w6Xkuc<\/a>. 
<strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">CC BY-NC-ND: Attribution-NonCommercial-NoDerivatives <\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":185983,"menu_order":5,"template":"","meta":{"_candela_citation":"[{\"type\":\"original\",\"description\":\"Analyzing Data: Correlational and Experimental Research\",\"author\":\"Nicole Arduini-Van Hoose\",\"organization\":\"Hudson Valley Community College\",\"url\":\"https:\/\/courses.lumenlearning.com\/adolescent\/chapter\/analyzing-data-correlational-and-experimental-research\/\",\"project\":\"\",\"license\":\"cc-by-nc-sa\",\"license_terms\":\"\"},{\"type\":\"cc-attribution\",\"description\":\"Types of Statistical Studies\",\"author\":\"Sal Khan\",\"organization\":\"Khan Academy\",\"url\":\"https:\/\/youtu.be\/z-Qi4w6Xkuc\",\"project\":\"\",\"license\":\"cc-by-nc-nd\",\"license_terms\":\"\"},{\"type\":\"cc\",\"description\":\"Psychology\",\"author\":\"\",\"organization\":\"OpenStax\",\"url\":\"https:\/\/openstax.org\/details\/books\/psychology\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"Psychological Research","pb_subtitle":"Analyzing Data: Correlational and Experimental 
Research","pb_authors":["narduinivanhoos"],"pb_section_license":""},"chapter-type":[],"contributor":[57],"license":[],"class_list":["post-550","chapter","type-chapter","status-publish","hentry","contributor-narduinivanhoos"],"part":440,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/pressbooks\/v2\/chapters\/550","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/wp\/v2\/users\/185983"}],"version-history":[{"count":9,"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/pressbooks\/v2\/chapters\/550\/revisions"}],"predecessor-version":[{"id":1462,"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/pressbooks\/v2\/chapters\/550\/revisions\/1462"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/pressbooks\/v2\/parts\/440"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/pressbooks\/v2\/chapters\/550\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/wp\/v2\/media?parent=550"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/pressbooks\/v2\/chapter-type?post=550"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/wp\/v2\/contributor?post=550"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/adolescent\/wp-json\/wp\/v2\/license?post=550"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}