{"id":91,"date":"2021-01-25T00:57:29","date_gmt":"2021-01-25T00:57:29","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/?post_type=chapter&#038;p=91"},"modified":"2021-01-25T00:57:29","modified_gmt":"2021-01-25T00:57:29","slug":"7-data-statistics","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/chapter\/7-data-statistics\/","title":{"raw":"7. Data: Statistics","rendered":"7. Data: Statistics"},"content":{"raw":"<div class=\"article-introduction\">\r\n\r\nModern science is often based on statements of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistical+significance\/pop\">statistical significance<\/a> and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a>. For example: 1) studies have shown that the probability of developing lung cancer is almost 20 times greater in cigarette smokers compared to nonsmokers (ACS, 2004); 2) there is a significant likelihood of a catastrophic meteorite impact on Earth sometime in the next 200,000 years (Bland, 2005); and 3) first-born male children exhibit IQ test scores that are 2.82 points higher than second-born males, a difference that is significant at the 95% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/confidence+level\/pop\">confidence level<\/a> (Kristensen &amp; Bjerkedal, 2007). But why do scientists speak in terms that seem obscure? If cigarette smoking causes lung cancer, why not simply say so? If we should immediately establish a colony on the moon to escape extraterrestrial disaster, why not inform people? 
And if older children are smarter than their younger siblings, why not let them know?\r\n\r\nThe reason is that none of these latter statements accurately reflects the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>. Scientific data rarely lead to absolute conclusions. Not all smokers die from lung cancer \u2013 some smokers decide to quit, thus reducing their risk; some die prematurely from cardiovascular disease or other diseases besides lung cancer; and some simply never contract the disease. All data exhibit variability, and it is the role of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> to <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/quantify\/pop\">quantify<\/a> this variability and allow scientists to make more accurate statements about their data.\r\n<figure class=\"centered\"><img src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4077-081104111140.jpg\" alt=\"dive\" \/><figcaption>Figure 1: The field of statistics has its roots in calculations of the probable outcomes of games of chance.<\/figcaption><\/figure>\r\nA common misconception is that <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> provide a measure of proof that something is true, but they actually do no such thing. Instead, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> provide a measure of the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of observing a certain result. This is a critical distinction. 
For example, the American Cancer Society has conducted several massive studies of cancer in an effort to make statements about the risks of the disease in US citizens. Cancer Prevention Study I enrolled approximately 1 million people between 1959 and 1960, and Cancer Prevention Study II was even larger, enrolling 1.2 million people in 1982. Both of these studies found much higher rates of lung cancer among cigarette smokers than among nonsmokers; however, not all individuals who smoked contracted lung cancer (and, in fact, some nonsmokers did contract lung cancer). Thus, the development of lung cancer is a probability-based event, not a simple cause-and-effect relationship.\r\n\r\nStatistical techniques allow scientists to put numbers to this <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a>, moving from a statement like \"If you smoke cigarettes, you are more likely to develop lung cancer\" to the one that started this module: \"The probability of developing lung cancer is almost 20 times greater in cigarette smokers compared to nonsmokers.\" The quantification of probability offered by <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> is a powerful tool used widely throughout science, but it is frequently misunderstood.\r\n<div class=\"comprehension-checkpoint\">\r\n<p class=\"leader\">Comprehension Checkpoint<\/p>\r\n<p class=\"question\">Statistics can<\/p>\r\n\r\n<form class=\"question\" name=\"cc5754\">\r\n<ul class=\"quiz-options\">\r\n \t<li class=\"option-a\"><label class=\"choice\" for=\"q1-5754-0-option-a\">describe how much uncertainty there is in scientific results.<\/label><\/li>\r\n \t<li class=\"option-b\"><label class=\"choice\" for=\"q1-5754-1-option-b\">provide proof that something is true.<\/label><\/li>\r\n<\/ul>\r\n<\/form><\/div>\r\n<\/div>\r\n<section id=\"toc_1\" 
class=\"article-section\">\r\n<h2>What is statistics?<\/h2>\r\nThe field of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> dates to 1654 when a French gambler, Antoine Gombaud, asked the noted mathematician and philosopher <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/Pascal%2C+Blaise\/pop\">Blaise Pascal<\/a> about how one should divide the stakes among players when a game of chance is interrupted prematurely. Pascal posed the question to the lawyer and mathematician Pierre de Fermat, and over a series of letters, Pascal and Fermat devised a mathematical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/system\/pop\">system<\/a> that not only answered Gombaud's original question, but laid the foundations of modern <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/theory\/pop\">theory<\/a> and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a>.\r\n\r\nFrom its roots in gambling, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> has grown into a field of study that involves the development of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> and tests that are used to quantitatively define the variability <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/inherent\/pop\">inherent<\/a> in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>, the <a class=\"term\" title=\"\" 
href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of certain <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/outcome\/pop\">outcomes<\/a>, and the error and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/uncertainty\/pop\">uncertainty<\/a> associated with those outcomes (see our <a href=\"https:\/\/www.visionlearning.com\/library\/module_viewer.php?mid=157\">Uncertainty, Error, and Confidence<\/a> module). As such, statistical methods are used extensively throughout the scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/process\/pop\">process<\/a>, from the design of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> questions through data <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> and to the final <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/interpretation\/pop\">interpretation<\/a> of data.\r\n\r\nThe specific statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> used vary widely between different scientific disciplines; however, the reasons that these tests and techniques are used are similar across disciplines. This module does not attempt to introduce the many different statistical concepts and tests that have been developed, but rather provides an overview of how various statistical methods are used in science. 
More information about specific statistical tests and methods can be found under the Resources tab.\r\n\r\n<\/section><section id=\"toc_2\" class=\"article-section\">\r\n<h2>Statistics in research design<\/h2>\r\nMany people misinterpret statements of likelihood and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> as a sign of weakness or <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/uncertainty\/pop\">uncertainty<\/a> in scientific results. However, the use of statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> and probability tests in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> is an important aspect of science that adds strength and certainty to scientific conclusions. For example, in 1843, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/Lawes%2C+John+Bennet\/pop\">John Bennet Lawes<\/a>, an English entrepreneur, founded the Rothamsted Experimental Station in Hertfordshire, England to investigate the impact of fertilizer application on crop yield. Lawes was motivated to do so because he had established one of the first artificial fertilizer factories a year earlier. 
For the next 80 years, researchers at the Station conducted <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/experiment\/pop\">experiments<\/a> in which they applied fertilizers, planted different crops, kept track of the amount of rain that fell, and measured the size of the harvest at the end of each growing season.\r\n\r\nBy the turn of the century, the Station had a vast collection of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> but few useful conclusions: One fertilizer would outperform another one year but underperform the next, certain fertilizers appeared to affect only certain crops, and the differing amounts of rainfall that fell each year continually confounded the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/experiment\/pop\">experiments<\/a> (Salsburg, 2001). The data were essentially useless because there were a large number of uncontrolled <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/variable\/pop\">variables<\/a>.\r\n<figure class=\"centered\"><a title=\"Building at the Rothamsted Research Station\" href=\"https:\/\/www.visionlearning.com\/img\/library\/large_images\/image_4078.jpg\"> <img src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4078-081104111154.jpg\" alt=\"Building at the Rothamsted Research Station\" \/> <\/a><figcaption>Figure 2: A building at the Rothamsted Research Station<\/figcaption><\/figure>\r\nIn 1919, the Rothamsted Station hired a young statistician by the name of Ronald Aylmer Fisher to try to make some sense of the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>. 
Fisher's statistical analyses suggested that the relationship between rainfall and plant growth was far more statistically significant than the relationship between fertilizer type and plant growth. But the agricultural scientists at the station weren't out to test for weather \u2013 they wanted to know which fertilizers were most effective for which crops. No one could remove weather as a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/variable\/pop\">variable<\/a> in the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/experiment\/pop\">experiments<\/a>, but Fisher realized that its effects could essentially be separated out if the experiments were designed appropriately.\r\n\r\nIn order to share his insights with the scientific community, Fisher published two books: <em>Statistical Methods for Research Workers<\/em> in 1925 and <em>The Design of Experiments<\/em> in 1935. By highlighting the need to consider statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> during the planning stages of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a>, Fisher revolutionized the practice of science and transformed the Rothamsted Station into a major center for research on <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> and agriculture, which it still is today.\r\n\r\nIn <em>The Design of Experiments<\/em>, Fisher introduced several concepts that have become hallmarks of good scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a>, including the use of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/control\/pop\">controls<\/a>, randomization, and <a 
class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/replication\/pop\">replication<\/a> (Figure 3).\r\n<figure><img src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4258-081126041115.jpg\" alt=\"Fisher's Barley Treatment Plot Design\" \/><figcaption>Figure 3: An original figure from Fisher's <em>The Design of Experiments<\/em> showing the arrangement of treatment groups and yields of barley in an experiment at the Rothamsted station in 1927 (Fisher, 1935). Letters in parentheses denote control plots not treated with fertilizer (<em>I<\/em>) or those treated with different fertilizers (<em>s<\/em> = sulfate of ammonia, <em>m<\/em> = chloride of ammonia, <em>c<\/em> = cyanamide, and <em>u<\/em> = urea) with or without the addition of superphosphate (<em>p<\/em>). Subscripted numbers in parentheses indicate relative quantities of fertilizer used. Numbers at the bottom of each block indicate the relative yield of barley from the plot.<\/figcaption><\/figure>\r\n<blockquote><b>Controls:<\/b> The use of controls is based on the concept of variability. Since any phenomenon has some measure of variability, controls allow the researcher to measure natural, random, or systematic variability in a similar system and use that estimate as a baseline for comparison to the observed variable or phenomenon. At Rothamsted, a control would be a crop that did not receive the application of fertilizer (see plots labeled <em>I<\/em> in Figure 3). The variability inherent in plant growth would still produce plants of varying heights and sizes. The control then could provide a measure of the impact that weather or other variables could have on crop growth independent of fertilizer application, thus allowing the researchers to statistically remove this as a factor.<\/blockquote>\r\n<blockquote><b>Randomization:<\/b> Statistical randomization helps to manage bias in scientific research. 
Unlike the common use of the word <em>random,<\/em> which implies haphazard or disorganized, statistical randomization is a precise procedure in which units being observed are assigned to a treatment or control group in a manner that takes into account the potential influence of confounding variables. This allows the researcher to quantify the influence of these confounding variables by observing them in both the control and treatment groups. For example, before Fisher, fertilizers were applied along different crop rows at Rothamsted, some of which fell entirely along the edge of fields. Yet edges are known to affect agricultural yield, and so it was difficult in many cases to distinguish edge effects from fertilizer effects \u2013 the edge effects would be considered a confounding variable. Fisher introduced a process of randomly assigning different fertilizers to different plots within a field in a single year while assuring that not all of the treatment (or control) plots for any particular fertilizer fell along the edge of the field (see Figure 3).<\/blockquote>\r\n<blockquote><b>Replication:<\/b> Fisher also advocated for replicating experimental trials and measurements. This way the range of variability inherently associated with the experiment or measurement could be quantified and the robustness of the results could be evaluated. At Rothamsted this meant planting multiple plots with the same crop and applying the same fertilizer to each of those plots (see Figure 3). 
Further, this meant repeating similar applications in different years so that the variability of different fertilizer applications as a function of different weather conditions could be quantified.<\/blockquote>\r\nIn general, scientists design <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> studies based on the nature of the question they are seeking to investigate, but they refine their research plan in line with many of Fisher's statistical concepts to increase the likelihood that their findings will be useful. The incorporation of these techniques facilitates the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/interpretation\/pop\">interpretation<\/a> of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>, another place where <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> are used.\r\n<div class=\"comprehension-checkpoint\">\r\n<p class=\"leader\">Comprehension Checkpoint<\/p>\r\n<p class=\"question\"><em>Statistical randomization<\/em> is a term that scientists apply to research that does not follow a set procedure.<\/p>\r\n\r\n<form class=\"question\" name=\"cc5755\">\r\n<ul class=\"quiz-options\">\r\n \t<li class=\"option-a\"><label class=\"choice\" for=\"q1-5755-0-option-a\">True<\/label><\/li>\r\n \t<li class=\"option-b\"><label class=\"choice\" for=\"q1-5755-1-option-b\">False<\/label><\/li>\r\n<\/ul>\r\n<\/form><\/div>\r\n<\/section><section id=\"toc_3\" class=\"article-section\">\r\n<h2>Statistics in data analysis<\/h2>\r\nA multitude of statistical techniques have been developed for <a class=\"term\" title=\"\" 
href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a>, but they generally fall into two groups: descriptive and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/inferential\/pop\">inferential<\/a>.\r\n<blockquote><b>Descriptive Statistics:<\/b> Descriptive statistics allow a scientist to quickly sum up major attributes of a dataset using measures such as the mean, median, and standard deviation. These measures provide a general sense of the group being studied, allowing scientists to place the study within a larger context. For example, Cancer Prevention Study I (CPS-I), mentioned earlier, was a prospective mortality study initiated in 1959. Researchers conducting the study reported the age and demographics of participants, among other variables, to allow a comparison between the study group and the broader population of the United States at the time. Adults participating in the study ranged from 30 to 108 years of age, with the median age reported as 52 years. The study subjects were 57% female, 97% white, and 2% black. By comparison, the median age in the United States in 1959 was 29.4 years, much younger than that of the study group, since CPS-I did not enroll anyone under 30 years of age. Further, in 1960, 51% of US residents were female, 89% were white, and about 11% were black. One recognized shortcoming of CPS-I, easily identifiable from the descriptive statistics, was that, with 97% of participants categorized as white, the study did not adequately assess disease profiles in minority groups of the US.<\/blockquote>\r\n<blockquote><b>Inferential Statistics:<\/b> Inferential statistics are used to model patterns in data, make judgments about data, identify relationships between variables in datasets, and make inferences about larger populations based on smaller samples of data. 
It is important to keep in mind that from a statistical perspective, the word \"population\" does not have to mean a group of people as it does in common language. A statistical population is the larger group that a dataset is used to make inferences about \u2013 this can be a group of people, corn plants, meteor impacts, oil field locations, or any other group of measurements as the case may be.<\/blockquote>\r\n<blockquote>Transferring results from small sample sizes to large populations is especially important with respect to scientific studies. For example, while Cancer Prevention Studies I and II enrolled approximately 1 million and 1.2 million people, respectively, they represented a small fraction of the 179 and 226 million people who were living in the United States in 1960 and 1980. Common inferential techniques include regression, correlation, and point estimation\/testing. For example, Petter Kristensen and Tor Bjerkedal (2007) examined IQ test scores in a group of 250,000 male Norwegian military personnel. Their analyses suggested that first-born male children had an average IQ test score 2.82 \u00b1 0.07 points higher than second-born male children, a statistically significant difference at the 95% confidence level (Kristensen &amp; Bjerkedal, 2007).<\/blockquote>\r\nThe phrase \"statistically significant\" is a key concept in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a>, and it is commonly misunderstood. Many people assume that, like the common use of the word <em>significant<\/em>, calling a result statistically significant means that the result is important or momentous, but this is not the case. 
Instead, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistical+significance\/pop\">statistical significance<\/a> is a statement about the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of seeing a result like the observed one through chance alone. In other words, tests of statistical significance describe the likelihood that an association or difference at least as strong as the one observed would be seen even if there were no real association or difference actually present. The measure of significance is often expressed in terms of confidence, which has the same meaning in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> as it does in common language, but can be quantified.\r\n\r\nIn Kristensen and Bjerkedal's work, for example, the IQ difference between first- and second-born men was found to be significant at a 95% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/confidence+level\/pop\">confidence level<\/a>, meaning that there would be less than a 5% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of seeing a difference this large if birth order had no real relationship to IQ scores. This does not mean that the difference is large or even important: 2.82 IQ points is a tiny blip on the IQ scale and hardly enough to declare first-borns geniuses in relation to their younger siblings. 
Nor do the findings imply that the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/outcome\/pop\">outcome<\/a> is 95% \"correct.\" Instead, they indicate that the observed difference is unlikely to be due simply to random <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/sampling\/pop\">sampling<\/a> variability and would probably be seen again if another researcher conducted a similar study in a different <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/population\/pop\">population<\/a> of Norwegian men. A second-born Norwegian who has a higher IQ than his older brother does not disprove the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> \u2013 it is just a statistically less likely outcome.\r\n\r\nJust as revealing as a statistically significant difference or relationship is the lack of a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistical+significance\/pop\">statistically significant<\/a> difference. For example, researchers have found that the risk of dying from heart disease in men who quit smoking at least two years earlier is not significantly different from the risk of the disease in male nonsmokers (Rosenberg et al., 1985). 
So, the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> show that while smokers have a significantly higher rate of heart disease than nonsmokers, this risk falls back to baseline within just two years after having quit smoking.\r\n<div class=\"comprehension-checkpoint\">\r\n<p class=\"leader\">Comprehension Checkpoint<\/p>\r\n<p class=\"question\">If a result is <em>statistically significant<\/em>, it means that the result is likely<\/p>\r\n\r\n<form class=\"question\" name=\"cc5756\">\r\n<ul class=\"quiz-options\">\r\n \t<li class=\"option-a\"><label class=\"choice\" for=\"q1-5756-0-option-a\">due to a pattern or trend as opposed to a random error.<\/label><\/li>\r\n \t<li class=\"option-b\"><label class=\"choice\" for=\"q1-5756-1-option-b\">of crucial importance in the scientific community.<\/label><\/li>\r\n<\/ul>\r\n<\/form><\/div>\r\n<\/section><section id=\"toc_4\" class=\"article-section\">\r\n<h2>Limitations, misconceptions, and the misuse of statistics<\/h2>\r\nGiven the wide variety of possible statistical tests, it is easy to misuse <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a>, often to the point of deception. One reason for this is that <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> do not address <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/systematic+error\/pop\">systematic error<\/a> that can be introduced into a study either intentionally or accidentally. For example, in one of the first studies that reported on the effects of quitting smoking, E. 
Cuyler Hammond and Daniel Horn found that individuals who had smoked more than one pack of cigarettes a day but had quit smoking within the past year had a death rate of 198.0, significantly higher than the rate of 157.1 for individuals who were still smoking more than one pack a day at the time of their study (Hammond &amp; Horn, 1958). Without a proper understanding of the study, one might conclude from the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistics\/pop\">statistics<\/a> that quitting smoking is actually dangerous for heavy smokers. However, Hammond later offered an explanation for this finding: \"This is not surprising in light of the fact that recent ex-smokers, as a group, are heavily weighted with men in ill health\" (Hammond, 1965). In other words, the group of heavy smokers who had stopped smoking included many individuals who had quit because they had already been diagnosed with an illness, which added systematic error to the sample set. Without a complete understanding of these facts, the statistics alone could be misinterpreted.\r\n\r\nThe most effective use of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a>, then, is to identify trends and features within a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/dataset\/pop\">dataset<\/a>. These trends can then be interpreted by the researcher in light of his or her understanding of their scientific basis, possibly opening up opportunities for further study. 
Andrew Lang, a Scottish poet and novelist, famously summed up this aspect of statistical testing when he stated, \"An unsophisticated forecaster uses <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> as a drunken man uses lamp-posts \u2013 for support rather than for illumination.\"\r\n\r\nAnother misconception of statistical testing is that statistical relationships and associations prove causation. In reality, identification of a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/correlation\/pop\">correlation<\/a> or association between <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/variable\/pop\">variables<\/a> does not mean that a change in one variable actually caused the change in another variable. For example, in 1950 Richard Doll and Austin Hill, British researchers who became known for conducting one of the first scientifically valid <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/comparative\/pop\">comparative<\/a> studies (see our <a href=\"https:\/\/www.visionlearning.com\/library\/module_viewer.php?mid=152\">Comparison in Research<\/a> module) of smoking and the development of lung cancer, famously wrote about the correlation they uncovered:\r\n<blockquote>This is not necessarily to state that smoking causes carcinoma of the lung. The association would occur if carcinoma of the lung caused people to smoke or if both attributes were end-effects of a common cause. 
(Doll &amp; Hill, 1950)<\/blockquote>\r\nDoll and Hill went on to discuss the scientific basis of the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/correlation\/pop\">correlation<\/a> and the fact that the habit of smoking preceded the development of lung cancer in all of their study <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/subject\/pop\">subjects<\/a>, leading them to conclude \"...that smoking is a factor, and an important factor, in the production of carcinoma of the lung.\" As multiple lines of scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/evidence\/pop\">evidence<\/a> have accumulated regarding the association between smoking and lung cancer, scientists are now able to make very accurate statements about the statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of risk associated with smoking cigarettes.\r\n<figure><a title=\"cigarette\" href=\"https:\/\/www.visionlearning.com\/img\/library\/large_images\/image_4079.jpg\"> <img src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4079-081104121109.jpg\" alt=\"cigarette\" \/> <\/a><figcaption>Figure 4: Filtered and low tar cigarettes were advertised as less dangerous based on hollow statistics. 
<span class=\"credit\">image \u00a9 Tomasz Sienicki<\/span><\/figcaption><\/figure>\r\nWhile <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> help uncover patterns, relationships, and variability in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>, they can unfortunately be used to misrepresent data, relationships, and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/interpretation\/pop\">interpretations<\/a>. For example, in the late 1950s, in light of the mounting <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/comparative\/pop\">comparative<\/a> studies that demonstrated a causative relationship between cigarette smoking and lung cancer, the major tobacco companies began to investigate the viability of marketing alternative products that they could promote as \"healthier\" than regular cigarettes. As a result, filtered and light cigarettes were developed. The tobacco industry then sponsored and widely advertised <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> that suggested that the common <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/cellulose\/pop\">cellulose<\/a> acetate filter reduced tar in regular cigarettes by 42-46% and nicotine by 19-35%. Marlboro<sup>\u00ae<\/sup> filtered cigarettes claimed to have \"22 percent less tar, 34 percent less nicotine\" than other brands. 
The tobacco industry launched a similar advertising campaign promoting low tar cigarettes (6 to 12 mg tar compared to 12 to 16 mg in \"regular\" cigarettes) and ultra low tar cigarettes (under 6 mg) (Glantz et al., 1996).\r\n\r\nWhile the industry flooded the public with <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> on tar content, the tobacco companies did not advertise the fact that there was no <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> to indicate that tar or nicotine were the causative agents in the development of smoking-induced lung cancer. In fact, several research studies showed that the risks associated with low tar products were no different than regular products, and worse still, some studies showed that \"low tar\" cigarettes led to increased consumption of cigarettes among smokers (Stepney, 1980; NCI, 2001). Thus hollow <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> were used to mislead the public and detract from the real issue.\r\n<div class=\"comprehension-checkpoint\">\r\n<p class=\"leader\">Comprehension Checkpoint<\/p>\r\n<p class=\"question\">If there is a statistical correlation between two events or variables, this means that one event <em>causes<\/em> the other.<\/p>\r\n\r\n<form class=\"question\" name=\"cc5757\">\r\n<ul class=\"quiz-options\">\r\n \t<li class=\"option-a\"><label class=\"choice\" for=\"q1-5757-0-option-a\">True<\/label><\/li>\r\n \t<li class=\"option-b\"><label class=\"choice\" for=\"q1-5757-1-option-b\">False<\/label><\/li>\r\n<\/ul>\r\n<\/form><\/div>\r\n<\/section><section id=\"toc_5\" class=\"article-section\">\r\n<h2>Statistics and scientific research<\/h2>\r\nAll measurements contain some <a class=\"term\" title=\"\" 
href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/uncertainty\/pop\">uncertainty<\/a> and error, and statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> help us <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/quantify\/pop\">quantify<\/a> and characterize this uncertainty. This helps explain why scientists often speak in qualified statements. For example, no <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/seismologist\/pop\">seismologist<\/a> who studies <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/earthquake\/pop\">earthquakes<\/a> would be willing to tell you exactly when an earthquake is going to occur; instead, the US Geological Survey issues statements like this: \"There is ... a 62% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of at least one <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/magnitude\/pop\">magnitude<\/a> 6.7 or greater earthquake in the 3-decade interval 2003-2032 within the San Francisco Bay Region\" (USGS, 2007). This may sound ambiguous, but it is in fact a very precise, mathematically derived description of how confident seismologists are that a major earthquake will occur. Open reporting of error and uncertainty is a hallmark of quality scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a>.\r\n\r\nToday, science and statistical analyses have become so intertwined that many scientific disciplines have developed their own subsets of statistical techniques and terminology. 
For example, the field of biostatistics (sometimes referred to as biometry) involves the application of specific statistical techniques to disciplines in biology such as <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/population\/pop\">population<\/a> genetics, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/epidemiology\/pop\">epidemiology<\/a>, and public health. The field of geostatistics has evolved to develop specialized spatial <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> techniques that help geologists map the location of petroleum and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/mineral\/pop\">mineral<\/a> deposits; these spatial analysis techniques have also helped Starbucks<sup>\u00ae<\/sup> determine the ideal distribution of coffee shops based on maximizing the number of customers visiting each store. Used correctly, statistical analysis goes well beyond finding the next oil field or cup of coffee to illuminating scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> in a way that helps <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/validate\/pop\">validate<\/a> scientific knowledge.\r\n\r\n<\/section><section id=\"toc-999\" class=\"article-section\">\r\n<h3>Summary<\/h3>\r\nScientific research rarely leads to absolute certainty. There is some degree of uncertainty in all conclusions, and statistics allow us to discuss that uncertainty. Statistical methods are used in all areas of science. The module explores the difference between (a) proving that something is true and (b) measuring the probability of getting a certain result. 
It explains how common words like \"significant,\" \"control,\" and \"random\" have different meanings in the field of statistics than in everyday life.\r\n<h3>Key Concepts<\/h3>\r\n<ul class=\"bulleted\">\r\n \t<li>Statistics are used to describe the variability inherent in data in a quantitative fashion, and to quantify relationships between variables.<\/li>\r\n \t<li>Statistical analysis is used in designing scientific studies to increase consistency, measure uncertainty, and produce robust datasets.<\/li>\r\n \t<li>There are a number of misconceptions that surround statistics, including confusion between statistical terms and the common language use of similar terms, and the role that statistics play in data analysis.<\/li>\r\n<\/ul>\r\n<\/section><footer>\r\n<ul class=\"indented links\">\r\n \t<li>\r\n<h5>Further Reading<\/h5>\r\n<\/li>\r\n \t<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Process-of-Science\/49\/Using-Graphs-and-Visual-Data-in-Science\/156\">Using Graphs and Visual Data in Science<\/a><\/li>\r\n \t<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Process-of-Science\/49\/Uncertainty-Error-and-Confidence\/157\">Uncertainty, Error, and Confidence<\/a><\/li>\r\n \t<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Process-of-Science\/49\/Data-Analysis-and-Interpretation\/154\">Data Analysis and Interpretation<\/a><\/li>\r\n \t<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Math-in-Science\/62\/Introduction-to-Descriptive-Statistics\/218\">Introduction to Descriptive Statistics<\/a><\/li>\r\n<\/ul>\r\n<a name=\"refs\"><\/a>\r\n<ul class=\"indented list\">\r\n \t<li>\r\n<h5>References<\/h5>\r\n<\/li>\r\n \t<li>ACS (American Cancer Society). (2004). <em>Cancer facts &amp; figures - 2004.<\/em> Atlanta, GA: American Cancer Society.<\/li>\r\n \t<li>ACS (American Cancer Society). (2007). 
<a href=\"http:\/\/www.cancer.org\/docroot\/RES\/content\/RES_6_2_Study_Overviews.asp\">Cancer prevention studies overview<\/a>. Atlanta, GA: American Cancer Society.<\/li>\r\n \t<li>ACS (American Cancer Society). (2008). <em><a href=\"http:\/\/www.cancer.org\/docroot\/RES\/content\/RES_6_1_Characteristics_of_American_Cancer_Society_cohorts.asp?sitearea=&amp;level=\">Characteristics of American Cancer Society cohorts<\/a>.<\/em> Atlanta, GA: American Cancer Society. Retrieved July 18, 2008.<\/li>\r\n \t<li>Bland, P. A. (2005). The impact rate on Earth. <em>Philosophical Transactions of the Royal Society A, 363,<\/em> 2793-2810.<\/li>\r\n \t<li>Cohen, J. (1988). <em>Statistical power analysis for the behavioral sciences<\/em> (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.<\/li>\r\n \t<li>Doll, R., &amp; Hill, A. B. (1950). Smoking and carcinoma of the lung. <em>British Medical Journal 2<\/em>(4682), 739-748.<\/li>\r\n \t<li>Fisher, R. A. (1935). <em>The design of experiments.<\/em> Oxford: Oxford University Press.<\/li>\r\n \t<li>Glantz, S. A., Slade, J., Bero, L. A., Hanauer, P., &amp; Barnes, D. E. (1996). <em>The cigarette papers.<\/em> Berkeley, CA: University of California Press.<\/li>\r\n \t<li>Hamilton, W. L., Norton, G. d., Ouellette, T. K., Rhodes, W. M., Kling, R., &amp; Connolly, G. N. (2004). Smokers' responses to advertisements for regular and light cigarettes and potential reduced-exposure tobacco products. <em>Nicotine &amp; Tobacco Research, 6<\/em>(Supp. 3), S353-S362.<\/li>\r\n \t<li>Hammond, E. C., &amp; Horn, D. (1958). Smoking and death rates: Report on forty-four months of follow-up of 187,783 men. 2. Death rates by cause.<em> Journal of the American Medical Association, 166<\/em>(11), 1294-308.<\/li>\r\n \t<li>Hammond, E. C. (1965). Evidence of the effects of giving up cigarette smoking. <em>American Journal of Public Health, 55,<\/em> 682-691.<\/li>\r\n \t<li>Kristensen, P., &amp; Bjerkedal, T. (2007). 
Explaining the relation between birth order and intelligence. <em>Science 316<\/em>(5832), 1717.<\/li>\r\n \t<li>National Center for Health Statistics. (2006).<em> Health, United States, 2006<\/em>. NCHS, Centers for Disease Control and Prevention, U.S. Department of Health and Human Services.<\/li>\r\n \t<li>NCI - National Cancer Institute. (2001). <em><a href=\"http:\/\/dccps.cancer.gov\/TCRB\/monographs\/13\/\">Monograph 13: Risks associated with smoking cigarettes with low tar machine-measured yields of tar and nicotine<\/a><\/em>. NCI, Tobacco Control Research, Document M914.<\/li>\r\n \t<li>Rosenberg, L., Kaufman, D. W., Helmrich, S. P., Shapiro, S. (1985). The risk of myocardial infarction after quitting smoking in men under 55 years of age. <em>New England Journal of Medicine, 313,<\/em> 1511-1514.<\/li>\r\n \t<li>Salsburg, D. (2001). <em>The lady tasting tea: How statistics revolutionized science in the twentieth century.<\/em> New York: W. H. Freeman &amp; Company.<\/li>\r\n \t<li>Silverstein, B., Feld, S., Kozlowski, L. T. (1980). The availability of low-nicotine cigarettes as a cause of cigarette smoking among teenage females (in Research Notes) <em>Journal of Health and Social Behavior, 21<\/em>(4),383-388.<\/li>\r\n \t<li>Stepney, R. (1980). Consumption of cigarettes of reduced tar and nicotine delivery. <em>Addiction, 75<\/em>(1), 81-88.<\/li>\r\n \t<li>Fisher, R. A. (1935). <em>Design of experiments.<\/em> New York: Hafner Press.<\/li>\r\n<\/ul>\r\n<\/footer>","rendered":"<div class=\"article-introduction\">\n<p>Modern science is often based on statements of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistical+significance\/pop\">statistical significance<\/a> and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a>. 
For example: 1) studies have shown that the probability of developing lung cancer is almost 20 times greater in cigarette smokers compared to nonsmokers (ACS, 2004); 2) there is a significant likelihood of a catastrophic meteorite impact on Earth sometime in the next 200,000 years (Bland, 2005); and 3) first-born male children exhibit IQ test scores that are 2.82 points higher than second-born males, a difference that is significant at the 95% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/confidence+level\/pop\">confidence level<\/a> (Kristensen &amp; Bjerkedal, 2007). But why do scientists speak in terms that seem obscure? If cigarette smoking causes lung cancer, why not simply say so? If we should immediately establish a colony on the moon to escape extraterrestrial disaster, why not inform people? And if older children are smarter than their younger siblings, why not let them know?<\/p>\n<p>The reason is that none of these latter statements accurately reflects the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>. Scientific data rarely lead to absolute conclusions. Not all smokers die from lung cancer \u2013 some smokers decide to quit, thus reducing their risk, some smokers may die prematurely from cardiovascular or diseases other than lung cancer, and some smokers may simply never contract the disease. 
All data exhibit variability, and it is the role of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> to <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/quantify\/pop\">quantify<\/a> this variability and allow scientists to make more accurate statements about their data.<\/p>\n<figure class=\"centered\"><img decoding=\"async\" src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4077-081104111140.jpg\" alt=\"dive\" \/><figcaption>Figure 1: The field of statistics has its roots in calculations of the probable outcomes of games of chance.<\/figcaption><\/figure>\n<p>A common misconception is that <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> provide a measure of proof that something is true, but they actually do no such thing. Instead, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> provide a measure of the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of observing a certain result. This is a critical distinction. For example, the American Cancer Society has conducted several massive studies of cancer in an effort to make statements about the risks of the disease in US citizens. Cancer Prevention Study I enrolled approximately 1 million people between 1959 and 1960, and Cancer Prevention Study II was even larger, enrolling 1.2 million people in 1982. Both of these studies found much higher rates of lung cancer among cigarette smokers compared to nonsmokers; however, not all individuals who smoked contracted lung cancer (and, in fact, some nonsmokers did contract lung cancer). 
Thus, the development of lung cancer is a probability-based event, not a simple cause-and-effect relationship.<\/p>\n<p>Statistical techniques allow scientists to put numbers to this <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a>, moving from a statement like &#8220;If you smoke cigarettes, you are more likely to develop lung cancer&#8221; to the one that started this module: &#8220;The probability of developing lung cancer is almost 20 times greater in cigarette smokers compared to nonsmokers.&#8221; The quantification of probability offered by <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> is a powerful tool used widely throughout science, but it is frequently misunderstood.<\/p>\n<div class=\"comprehension-checkpoint\">\n<p class=\"leader\">Comprehension Checkpoint<\/p>\n<p class=\"question\">Statistics can<\/p>\n<form class=\"question\" action=\"action\" id=\"cc5754\">\n<ul class=\"quiz-options\">\n<li class=\"option-a\"><label class=\"choice\" for=\"q1-5754-0-option-a\">describe how much uncertainty there is in scientific results.<\/label><\/li>\n<li class=\"option-b\"><label class=\"choice\" for=\"q1-5754-1-option-b\">provide proof that something is true.<\/label><\/li>\n<\/ul>\n<\/form>\n<\/div>\n<\/div>\n<section id=\"toc_1\" class=\"article-section\">\n<h2>What is statistics?<\/h2>\n<p>The field of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> dates to 1654 when a French gambler, Antoine Gombaud, asked the noted mathematician and philosopher <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/Pascal%2C+Blaise\/pop\">Blaise Pascal<\/a> about how one should divide the stakes among players when a game of chance is interrupted prematurely. 
Pascal posed the question to the lawyer and mathematician Pierre de Fermat, and over a series of letters, Pascal and Fermat devised a mathematical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/system\/pop\">system<\/a> that not only answered Gombaud&#8217;s original question, but laid the foundations of modern <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/theory\/pop\">theory<\/a> and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a>.<\/p>\n<p>From its roots in gambling, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> has grown into a field of study that involves the development of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> and tests that are used to quantitatively define the variability <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/inherent\/pop\">inherent<\/a> in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>, the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of certain <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/outcome\/pop\">outcomes<\/a>, and the error and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/uncertainty\/pop\">uncertainty<\/a> associated with those outcomes (see our <a href=\"https:\/\/www.visionlearning.com\/library\/module_viewer.php?mid=157\">Uncertainty, Error, and Confidence<\/a> module). 
As such, statistical methods are used extensively throughout the scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/process\/pop\">process<\/a>, from the design of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> questions through data <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> and to the final <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/interpretation\/pop\">interpretation<\/a> of data.<\/p>\n<p>The specific statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> used vary widely between different scientific disciplines; however, the reasons that these tests and techniques are used are similar across disciplines. This module does not attempt to introduce the many different statistical concepts and tests that have been developed, but rather provides an overview of how various statistical methods are used in science. More information about specific statistical tests and methods can be found under the Resources tab.<\/p>\n<\/section>\n<section id=\"toc_2\" class=\"article-section\">\n<h2>Statistics in research design<\/h2>\n<p>Many people misinterpret statements of likelihood and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> as a sign of weakness or <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/uncertainty\/pop\">uncertainty<\/a> in scientific results. 
However, the use of statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> and probability tests in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> is an important aspect of science that adds strength and certainty to scientific conclusions. For example, in 1843, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/Lawes%2C+John+Bennet\/pop\">John Bennet Lawes<\/a>, an English entrepreneur, founded the Rothamsted Experimental Station in Hertfordshire, England to investigate the impact of fertilizer application on crop yield. Lawes was motivated to do so because he had established one of the first artificial fertilizer factories a year earlier. For the next 80 years, researchers at the Station conducted <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/experiment\/pop\">experiments<\/a> in which they applied fertilizers, planted different crops, kept track of the amount of rain that fell, and measured the size of the harvest at the end of each growing season.<\/p>\n<p>By the turn of the century, the Station had a vast collection of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> but few useful conclusions: One fertilizer would outperform another one year but underperform the next, certain fertilizers appeared to affect only certain crops, and the differing amounts of rainfall that fell each year continually confounded the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/experiment\/pop\">experiments<\/a> (Salsburg, 2001). 
The data were essentially useless because there were a large number of uncontrolled <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/variable\/pop\">variables<\/a>.<\/p>\n<figure class=\"centered\"><a title=\"Building at the Rothamsted Research Station\" href=\"https:\/\/www.visionlearning.com\/img\/library\/large_images\/image_4078.jpg\"> <img decoding=\"async\" src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4078-081104111154.jpg\" alt=\"Building at the Rothamsted Research Station\" \/> <\/a><figcaption>Figure 2: A building at the Rothamsted Research Station<\/figcaption><\/figure>\n<p>In 1919, the Rothamsted Station hired a young statistician by the name of Ronald Aylmer Fisher to try to make some sense of the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>. Fisher&#8217;s statistical analyses suggested that the relationship between rainfall and plant growth was far more statistically significant than the relationship between fertilizer type and plant growth. But the agricultural scientists at the station weren&#8217;t out to test for weather \u2013 they wanted to know which fertilizers were most effective for which crops. No one could remove weather as a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/variable\/pop\">variable<\/a> in the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/experiment\/pop\">experiments<\/a>, but Fisher realized that its effects could essentially be separated out if the experiments were designed appropriately.<\/p>\n<p>In order to share his insights with the scientific community, Fisher published two books: <em>Statistical Methods for Research Workers<\/em> in 1925 and <em>The Design of Experiments<\/em> in 1935. 
By highlighting the need to consider statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> during the planning stages of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a>, Fisher revolutionized the practice of science and transformed the Rothamsted Station into a major center for research on <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> and agriculture, which it still is today.<\/p>\n<p>In <em>The Design of Experiments<\/em>, Fisher introduced several concepts that have become hallmarks of good scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a>, including the use of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/control\/pop\">controls<\/a>, randomization, and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/replication\/pop\">replication<\/a> (Figure 3).<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4258-081126041115.jpg\" alt=\"Fisher's Barley Treatment Plot Design\" \/><figcaption>Figure 3: An original figure from Fisher&#8217;s <em>The Design of Experiments<\/em> showing the arrangement of treatment groups and yields of barley in an experiment at the Rothamsted station in 1927 (Fisher, 1935). Letters in parentheses denote control plots not treated with fertilizer (<em>I<\/em>) or those treated with different fertilizers (<em>s<\/em> = sulfate of ammonia, <em>m<\/em> = chloride of ammonia, <em>c<\/em> = cyanamide, and <em>u<\/em> = urea) with or without the addition of superphosphate (<em>p<\/em>). Subscripted numbers in parentheses indicate relative quantities of fertilizer used. 
Numbers at the bottom of each block indicate the relative yield of barley from the plot.<\/figcaption><\/figure>\n<blockquote><p><b>Controls:<\/b> The use of controls is based on the concept of variability. Since any phenomenon has some measure of variability, controls allow the researcher to measure natural, random, or systematic variability in a similar system and use that estimate as a baseline for comparison to the observed variable or phenomenon. At Rothamsted, a control would be a crop that did not receive the application of fertilizer (see plots labeled <em>I<\/em> in Figure 3). The variability inherent in plant growth would still produce plants of varying heights and sizes. The control then could provide a measure of the impact that weather or other variables could have on crop growth independent of fertilizer application, thus allowing the researchers to statistically remove this as a factor.<\/p><\/blockquote>\n<blockquote><p><b>Randomization:<\/b> Statistical randomization helps to manage bias in scientific research. Unlike the common use of the word <em>random,<\/em> which implies haphazard or disorganized, statistical randomization is a precise procedure in which units being observed are assigned to a treatment or control group in a manner that takes into account the potential influence of confounding variables. This allows the researcher to quantify the influence of these confounding variables by observing them in both the control and treatment groups. For example, before Fisher, fertilizers were applied along different crop rows at Rothamsted, some of which fell entirely along the edge of fields. Yet edges are known to affect agricultural yield, and so it was difficult in many cases to distinguish edge effects from fertilizer effects \u2013 the edge effects would be considered a confounding variable. 
Fisher introduced a process of randomly assigning different fertilizers to different plots within a field in a single year while assuring that not all of the treatment (or control) plots for any particular fertilizer fell along the edge of the field (see Figure 3).<\/p><\/blockquote>\n<blockquote><p><b>Replication:<\/b> Fisher also advocated for replicating experimental trials and measurements. This way the range of variability inherently associated with the experiment or measurement could be quantified and the robustness of the results could be evaluated. At Rothamsted this meant planting multiple plots with the same crop and applying the same fertilizer to each of those plots (see Figure 3). Further, this meant repeating similar applications in different years so that the variability of different fertilizer applications as a function of different weather conditions could be quantified.<\/p><\/blockquote>\n<p>In general, scientists design <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> studies based on the nature of the question they are seeking to investigate, but they refine their research plan in line with many of Fisher&#8217;s statistical concepts to increase the likelihood that their findings will be useful. 
The incorporation of these techniques facilitates the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/interpretation\/pop\">interpretation<\/a> of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>, another place where <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> are used.<\/p>\n<div class=\"comprehension-checkpoint\">\n<p class=\"leader\">Comprehension Checkpoint<\/p>\n<p class=\"question\"><em>Statistical randomization<\/em> is a term that scientists apply to research that does not follow a set procedure.<\/p>\n<form class=\"question\" action=\"action\" id=\"cc5755\">\n<ul class=\"quiz-options\">\n<li class=\"option-a\"><label class=\"choice\" for=\"q1-5755-0-option-a\">True<\/label><\/li>\n<li class=\"option-b\"><label class=\"choice\" for=\"q1-5755-1-option-b\">False<\/label><\/li>\n<\/ul>\n<\/form>\n<\/div>\n<\/section>\n<section id=\"toc_3\" class=\"article-section\">\n<h2>Statistics in data analysis<\/h2>\n<p>A multitude of statistical techniques have been developed for <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a>, but they generally fall into two groups: descriptive and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/inferential\/pop\">inferential<\/a>.<\/p>\n<blockquote><p><b>Descriptive Statistics:<\/b> Descriptive statistics allow a scientist to quickly sum up major attributes of a dataset using measures such as the mean, median, and standard deviation. 
These measures provide a general sense of the group being studied, allowing scientists to place the study within a larger context. For example, as mentioned earlier, Cancer Prevention Study I (CPS-I) was a prospective mortality study initiated in 1959. Researchers conducting the study reported the age and demographics of participants, among other variables, to allow a comparison between the study group and the broader population of the United States at the time. Adults participating in the study ranged from 30 to 108 years of age, with the median age reported as 52 years. The study subjects were 57% female, 97% white, and 2% black. By comparison, the median age in the United States in 1959 was 29.4 years, much younger than that of the study group, since CPS-I did not enroll anyone under 30 years of age. Further, in 1960, 51% of US residents were female, 89% were white, and about 11% were black. One recognized shortcoming of CPS-I, easily identifiable from the descriptive statistics, was that with 97% of participants categorized as white, the study did not adequately assess disease profiles in minority groups of the US.<\/p><\/blockquote>\n<blockquote><p><b>Inferential Statistics:<\/b> Inferential statistics are used to model patterns in data, make judgments about data, identify relationships between variables in datasets, and make inferences about larger populations based on smaller samples of data. It is important to keep in mind that from a statistical perspective, the word &#8220;population&#8221; does not have to mean a group of people as it does in common language. A statistical population is the larger group that a dataset is used to make inferences about \u2013 this can be a group of people, corn plants, meteor impacts, oil field locations, or any other group of measurements as the case may be.<\/p><\/blockquote>\n<blockquote><p>Generalizing results from small samples to large populations is especially important with respect to scientific studies. 
For example, while Cancer Prevention Studies I and II enrolled approximately 1 million and 1.2 million people, respectively, they represented a small fraction of the 179 and 226 million people who were living in the United States in 1960 and 1980, respectively. Common inferential techniques include regression, correlation, and point estimation\/testing. For example, Petter Kristensen and Tor Bjerkedal (2007) examined IQ test scores in a group of 250,000 male Norwegian military personnel. Their analyses suggested that first-born male children had an average IQ test score 2.82 \u00b1 0.07 points higher than second-born male children, a statistically significant difference at the 95% confidence level (Kristensen &amp; Bjerkedal, 2007).<\/p><\/blockquote>\n<p>The phrase &#8220;statistically significant&#8221; is a key concept in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a>, and it is commonly misunderstood. Many people assume that, like the common use of the word <em>significant<\/em>, calling a result statistically significant means that the result is important or momentous, but this is not the case. Instead, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistical+significance\/pop\">statistical significance<\/a> is based on an estimate of the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> that an association or difference as large as the one observed would arise by chance alone. In other words, tests of statistical significance describe the likelihood that an observed association or difference would be seen even if there were no real association or difference actually present. 
The measure of significance is often expressed in terms of confidence, which has the same meaning in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> as it does in common language, but can be quantified.<\/p>\n<p>In Kristensen and Bjerkedal&#8217;s work, for example, the IQ difference between first- and second-born men was found to be significant at a 95% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/confidence+level\/pop\">confidence level<\/a>, meaning that there would be less than a 5% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of seeing a difference this large purely by chance if first- and second-born men did not actually differ. This does not mean that the difference is large or even important: 2.82 IQ points is a tiny blip on the IQ scale and hardly enough to declare first-borns geniuses in relation to their younger siblings. Nor do the findings imply that the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/outcome\/pop\">outcome<\/a> is 95% &#8220;correct.&#8221; Instead, they indicate that the observed difference is unlikely to be due simply to random <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/sampling\/pop\">sampling<\/a> error, which gives reason to expect a similar difference if another researcher conducted a similar study in a different <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/population\/pop\">population<\/a> of Norwegian men. 
A second-born Norwegian who has a higher IQ than his older brother does not disprove the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> \u2013 it is just a statistically less likely outcome.<\/p>\n<p>Just as revealing as a statistically significant difference or relationship is the lack of a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistical+significance\/pop\">statistically significant<\/a> difference. For example, researchers have found that the risk of dying from heart disease in men who have quit smoking for at least two years is not significantly different from the risk of the disease in male nonsmokers (Rosenberg et al., 1985). So, the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> show that while smokers have a significantly higher rate of heart disease than nonsmokers, this risk falls back to baseline within just two years after having quit smoking.<\/p>\n<div class=\"comprehension-checkpoint\">\n<p class=\"leader\">Comprehension Checkpoint<\/p>\n<p class=\"question\">If a result is <em>statistically significant<\/em>, it means that the result is likely<\/p>\n<form class=\"question\" action=\"action\" id=\"cc5756\">\n<ul class=\"quiz-options\">\n<li class=\"option-a\"><label class=\"choice\" for=\"q1-5756-0-option-a\">due to a pattern or trend as opposed to a random error.<\/label><\/li>\n<li class=\"option-b\"><label class=\"choice\" for=\"q1-5756-1-option-b\">of crucial importance in the scientific community.<\/label><\/li>\n<\/ul>\n<\/form>\n<\/div>\n<\/section>\n<section id=\"toc_4\" class=\"article-section\">\n<h2>Limitations, misconceptions, and the misuse of statistics<\/h2>\n<p>Given the wide variety of possible statistical tests, it is easy to misuse <a class=\"term\" title=\"\" 
href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a>, often to the point of deception. One reason for this is that <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> do not address <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/systematic+error\/pop\">systematic error<\/a> that can be introduced into a study either intentionally or accidentally. For example, in one of the first studies that reported on the effects of quitting smoking, E. Cuyler Hammond and Daniel Horn found that individuals who smoked more than one pack of cigarettes a day but had quit smoking within the past year had a death rate of 198.0, significantly higher than the rate of 157.1 for individuals who were still smoking more than one pack a day at the time of their study (Hammond &amp; Horn, 1958). Without a proper understanding of the study, one might conclude from the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistics\/pop\">statistics<\/a> that quitting smoking is actually dangerous for heavy smokers. However, Hammond later offers an explanation for this finding when he says, &#8220;This is not surprising in light of the fact that recent ex-smokers, as a group, are heavily weighted with men in ill health&#8221; (Hammond, 1965). Thus, heavy smokers who had stopped smoking included many individuals who had quit because they were already diagnosed with an illness, thus adding systematic error to the sample set. 
Without a complete understanding of these facts, the statistics alone could be misinterpreted.<\/p>\n<p>The most effective use of <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a>, then, is to identify trends and features within a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/dataset\/pop\">dataset<\/a>. These trends can then be interpreted by the researcher in light of his or her understanding of their scientific basis, possibly opening up opportunities for further study. Andrew Lang, a Scottish poet and novelist, famously summed up this aspect of statistical testing when he stated, &#8220;An unsophisticated forecaster uses <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> as a drunken man uses lamp-posts \u2013 for support rather than for illumination.&#8221;<\/p>\n<p>Another misconception of statistical testing is that statistical relationships and associations prove causation. In reality, identification of a <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/correlation\/pop\">correlation<\/a> or association between <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/variable\/pop\">variables<\/a> does not mean that a change in one variable actually caused the change in another variable. 
For example, in 1950 Richard Doll and Austin Hill, British researchers who became known for conducting one of the first scientifically valid <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/comparative\/pop\">comparative<\/a> studies (see our <a href=\"https:\/\/www.visionlearning.com\/library\/module_viewer.php?mid=152\">Comparison in Research<\/a> module) of smoking and the development of lung cancer, famously wrote about the correlation they uncovered:<\/p>\n<blockquote><p>This is not necessarily to state that smoking causes carcinoma of the lung. The association would occur if carcinoma of the lung caused people to smoke or if both attributes were end-effects of a common cause. (Doll &amp; Hill, 1950)<\/p><\/blockquote>\n<p>Doll and Hill went on to discuss the scientific basis of the <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/correlation\/pop\">correlation<\/a> and the fact that the habit of smoking preceded the development of lung cancer in all of their study <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/subject\/pop\">subjects<\/a>, leading them to conclude &#8220;&#8230;that smoking is a factor, and an important factor, in the production of carcinoma of the lung.&#8221; As multiple lines of scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/evidence\/pop\">evidence<\/a> have accumulated regarding the association between smoking and lung cancer, scientists are now able to make very accurate statements about the statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of risk associated with smoking cigarettes.<\/p>\n<figure><a title=\"cigarette\" href=\"https:\/\/www.visionlearning.com\/img\/library\/large_images\/image_4079.jpg\"> <img decoding=\"async\" 
src=\"https:\/\/www.visionlearning.com\/img\/library\/modules\/mid155\/Image\/VLObject-4079-081104121109.jpg\" alt=\"cigarette\" \/> <\/a><figcaption>Figure 4: Filtered and low tar cigarettes were advertised as less dangerous based on hollow statistics. <span class=\"credit\">image \u00a9 Tomasz Sienicki<\/span><\/figcaption><\/figure>\n<p>While <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> help uncover patterns, relationships, and variability in <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a>, they can unfortunately be used to misrepresent data, relationships, and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/interpretation\/pop\">interpretations<\/a>. For example, in the late 1950s, in light of the mounting <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/comparative\/pop\">comparative<\/a> studies that demonstrated a causative relationship between cigarette smoking and lung cancer, the major tobacco companies began to investigate the viability of marketing alternative products that they could promote as &#8220;healthier&#8221; than regular cigarettes. As a result, filtered and light cigarettes were developed. The tobacco industry then sponsored and widely advertised <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> that suggested that the common <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/cellulose\/pop\">cellulose<\/a> acetate filter reduced tar in regular cigarettes by 42-46% and nicotine by 19-35%. Marlboro<sup>\u00ae<\/sup> filtered cigarettes claimed to have &#8220;22 percent less tar, 34 percent less nicotine&#8221; than other brands. 
The tobacco industry launched a similar advertising campaign promoting low tar cigarettes (6 to 12 mg tar compared to 12 to 16 mg in &#8220;regular&#8221; cigarettes) and ultra low tar cigarettes (under 6 mg) (Glantz et al., 1996).<\/p>\n<p>While the industry flooded the public with <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> on tar content, the tobacco companies did not advertise the fact that there was no <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a> to indicate that tar or nicotine was the causative agent in the development of smoking-induced lung cancer. In fact, several research studies showed that the risks associated with low tar products were no different from those of regular products, and worse still, some studies showed that &#8220;low tar&#8221; cigarettes led to increased consumption of cigarettes among smokers (Stepney, 1980; NCI, 2001). 
Thus hollow <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/statistic\/pop\">statistics<\/a> were used to mislead the public and detract from the real issue.<\/p>\n<div class=\"comprehension-checkpoint\">\n<p class=\"leader\">Comprehension Checkpoint<\/p>\n<p class=\"question\">If there is a statistical correlation between two events or variables, this means that one event <em>causes<\/em> the other.<\/p>\n<form class=\"question\" action=\"action\" id=\"cc5757\">\n<ul class=\"quiz-options\">\n<li class=\"option-a\"><label class=\"choice\" for=\"q1-5757-0-option-a\">True<\/label><\/li>\n<li class=\"option-b\"><label class=\"choice\" for=\"q1-5757-1-option-b\">False<\/label><\/li>\n<\/ul>\n<\/form>\n<\/div>\n<\/section>\n<section id=\"toc_5\" class=\"article-section\">\n<h2>Statistics and scientific research<\/h2>\n<p>All measurements contain some <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/uncertainty\/pop\">uncertainty<\/a> and error, and statistical <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/method\/pop\">methods<\/a> help us <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/quantify\/pop\">quantify<\/a> and characterize this uncertainty. This helps explain why scientists often speak in qualified statements. 
For example, no <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/seismologist\/pop\">seismologist<\/a> who studies <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/earthquake\/pop\">earthquakes<\/a> would be willing to tell you exactly when an earthquake is going to occur; instead, the US Geological Survey issues statements like this: &#8220;There is &#8230; a 62% <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/probability\/pop\">probability<\/a> of at least one <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/magnitude\/pop\">magnitude<\/a> 6.7 or greater earthquake in the 3-decade interval 2003-2032 within the San Francisco Bay Region&#8221; (USGS, 2007). This may sound ambiguous, but it is in fact a very precise, mathematically-derived description of how confident seismologists are that a major earthquake will occur, and open reporting of error and uncertainty is a hallmark of quality scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/research\/pop\">research<\/a>.<\/p>\n<p>Today, science and statistical analyses have become so intertwined that many scientific disciplines have developed their own subsets of statistical techniques and terminology. For example, the field of biostatistics (sometimes referred to as biometry) involves the application of specific statistical techniques to disciplines in biology such as <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/population\/pop\">population<\/a> genetics, <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/epidemiology\/pop\">epidemiology<\/a>, and public health. 
The field of geostatistics has evolved to develop specialized spatial <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/analysis\/pop\">analysis<\/a> techniques that help geologists map the location of petroleum and <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/mineral\/pop\">mineral<\/a> deposits; these spatial analysis techniques have also helped Starbucks<sup>\u00ae<\/sup> determine the ideal distribution of coffee shops based on maximizing the number of customers visiting each store. Used correctly, statistical analysis goes well beyond finding the next oil field or cup of coffee to illuminating scientific <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/data\/pop\">data<\/a> in a way that helps <a class=\"term\" title=\"\" href=\"https:\/\/www.visionlearning.com\/en\/glossary\/view\/validate\/pop\">validate<\/a> scientific knowledge.<\/p>\n<\/section>\n<section id=\"toc-999\" class=\"article-section\">\n<h3>Summary<\/h3>\n<p>Scientific research rarely leads to absolute certainty. There is some degree of uncertainty in all conclusions, and statistics allow us to discuss that uncertainty. Statistical methods are used in all areas of science. The module explores the difference between (a) proving that something is true and (b) measuring the probability of getting a certain result. 
It explains how common words like &#8220;significant,&#8221; &#8220;control,&#8221; and &#8220;random&#8221; have different meanings in the field of statistics than in everyday life.<\/p>\n<h3>Key Concepts<\/h3>\n<ul class=\"bulleted\">\n<li>Statistics are used to describe the variability inherent in data in a quantitative fashion, and to quantify relationships between variables.<\/li>\n<li>Statistical analysis is used in designing scientific studies to increase consistency, measure uncertainty, and produce robust datasets.<\/li>\n<li>There are a number of misconceptions that surround statistics, including confusion between statistical terms and the common language use of similar terms, and the role that statistics play in data analysis.<\/li>\n<\/ul>\n<\/section>\n<footer>\n<ul class=\"indented links\">\n<li>\n<h5>Further Reading<\/h5>\n<\/li>\n<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Process-of-Science\/49\/Using-Graphs-and-Visual-Data-in-Science\/156\">Using Graphs and Visual Data in Science<\/a><\/li>\n<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Process-of-Science\/49\/Uncertainty-Error-and-Confidence\/157\">Uncertainty, Error, and Confidence<\/a><\/li>\n<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Process-of-Science\/49\/Data-Analysis-and-Interpretation\/154\">Data Analysis and Interpretation<\/a><\/li>\n<li><a href=\"https:\/\/www.visionlearning.com\/en\/library\/Math-in-Science\/62\/Introduction-to-Descriptive-Statistics\/218\">Introduction to Descriptive Statistics<\/a><\/li>\n<\/ul>\n<p><a name=\"refs\" id=\"refs\"><\/a><\/p>\n<ul class=\"indented list\">\n<li>\n<h5>References<\/h5>\n<\/li>\n<li>ACS (American Cancer Society). (2004). <em>Cancer facts &amp; figures &#8211; 2004.<\/em> Atlanta, GA: American Cancer Society.<\/li>\n<li>ACS (American Cancer Society). (2007). <a href=\"http:\/\/www.cancer.org\/docroot\/RES\/content\/RES_6_2_Study_Overviews.asp\">Cancer prevention studies overview<\/a>. 
Atlanta, GA: American Cancer Society.<\/li>\n<li>ACS (American Cancer Society). (2008). <em><a href=\"http:\/\/www.cancer.org\/docroot\/RES\/content\/RES_6_1_Characteristics_of_American_Cancer_Society_cohorts.asp?sitearea=&amp;level=\">Characteristics of American Cancer Society cohorts<\/a>.<\/em> Atlanta, GA: American Cancer Society. Retrieved July 18, 2008.<\/li>\n<li>Bland, P. A. (2005). The impact rate on Earth. <em>Philosophical Transactions of the Royal Society A, 363,<\/em> 2793-2810.<\/li>\n<li>Cohen, J. (1988). <em>Statistical power analysis for the behavioral sciences<\/em> (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.<\/li>\n<li>Doll, R., &amp; Hill, A. B. (1950). Smoking and carcinoma of the lung. <em>British Medical Journal 2<\/em>(4682), 739-748.<\/li>\n<li>Fisher, R. A. (1935). <em>The design of experiments.<\/em> Oxford: Oxford University Press.<\/li>\n<li>Glantz, S. A., Slade, J., Bero, L. A., Hanauer, P., &amp; Barnes, D. E. (1996). <em>The cigarette papers.<\/em> Berkeley, CA: University of California Press.<\/li>\n<li>Hamilton, W. L., Norton, G. d., Ouellette, T. K., Rhodes, W. M., Kling, R., &amp; Connolly, G. N. (2004). Smokers&#8217; responses to advertisements for regular and light cigarettes and potential reduced-exposure tobacco products. <em>Nicotine &amp; Tobacco Research, 6<\/em>(Supp. 3), S353-S362.<\/li>\n<li>Hammond, E. C., &amp; Horn, D. (1958). Smoking and death rates: Report on forty-four months of follow-up of 187,783 men. 2. Death rates by cause.<em> Journal of the American Medical Association, 166<\/em>(11), 1294-308.<\/li>\n<li>Hammond, E. C. (1965). Evidence of the effects of giving up cigarette smoking. <em>American Journal of Public Health, 55,<\/em> 682-691.<\/li>\n<li>Kristensen, P., &amp; Bjerkedal, T. (2007). Explaining the relation between birth order and intelligence. <em>Science 316<\/em>(5832), 1717.<\/li>\n<li>National Center for Health Statistics. (2006).<em> Health, United States, 2006<\/em>. 
NCHS, Centers for Disease Control and Prevention, U.S. Department of Health and Human Services.<\/li>\n<li>NCI &#8211; National Cancer Institute. (2001). <em><a href=\"http:\/\/dccps.cancer.gov\/TCRB\/monographs\/13\/\">Monograph 13: Risks associated with smoking cigarettes with low tar machine-measured yields of tar and nicotine<\/a><\/em>. NCI, Tobacco Control Research, Document M914.<\/li>\n<li>Rosenberg, L., Kaufman, D. W., Helmrich, S. P., Shapiro, S. (1985). The risk of myocardial infarction after quitting smoking in men under 55 years of age. <em>New England Journal of Medicine, 313,<\/em> 1511-1514.<\/li>\n<li>Salsburg, D. (2001). <em>The lady tasting tea: How statistics revolutionized science in the twentieth century.<\/em> New York: W. H. Freeman &amp; Company.<\/li>\n<li>Silverstein, B., Feld, S., Kozlowski, L. T. (1980). The availability of low-nicotine cigarettes as a cause of cigarette smoking among teenage females (in Research Notes) <em>Journal of Health and Social Behavior, 21<\/em>(4),383-388.<\/li>\n<li>Stepney, R. (1980). Consumption of cigarettes of reduced tar and nicotine delivery. <em>Addiction, 75<\/em>(1), 81-88.<\/li>\n<li>Fisher, R. A. (1935). 
<em>Design of experiments.<\/em> New York: Hafner Press.<\/li>\n<\/ul>\n<\/footer>\n","protected":false},"author":51812,"menu_order":9,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-91","chapter","type-chapter","status-publish","hentry"],"part":49,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/pressbooks\/v2\/chapters\/91","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/wp\/v2\/users\/51812"}],"version-history":[{"count":1,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/pressbooks\/v2\/chapters\/91\/revisions"}],"predecessor-version":[{"id":92,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/pressbooks\/v2\/chapters\/91\/revisions\/92"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/pressbooks\/v2\/parts\/49"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/pressbooks\/v2\/chapters\/91\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/wp\/v2\/media?parent=91"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/pressbooks\/v2\/chapter-type?post=91"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/wp\/v2\/contributor?post=91"},{"
taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-hccc-generalscience\/wp-json\/wp\/v2\/license?post=91"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}