{"id":184,"date":"2017-04-15T03:18:49","date_gmt":"2017-04-15T03:18:49","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/conceptstest1\/chapter\/causation-and-lurking-variables-2-of-2\/"},"modified":"2017-06-05T05:52:28","modified_gmt":"2017-06-05T05:52:28","slug":"causation-and-lurking-variables-2-of-2","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/chapter\/causation-and-lurking-variables-2-of-2\/","title":{"raw":"Causation and Lurking Variables (2 of 2)","rendered":"Causation and Lurking Variables (2 of 2)"},"content":{"raw":"&nbsp;\r\n<div class=\"textbox learning-objectives\">\r\n<h3>Learning Objectives<\/h3>\r\n<ul>\r\n \t<li>Distinguish between association and causation. Identify lurking variables that may explain an observed relationship.<\/li>\r\n<\/ul>\r\n<\/div>\r\nIn the next example, we investigate a subtle point about the confusion between association and causation. In this example, a cause-and-effect connection is logical but not justified by an observed association in a single study.\r\n<div class=\"textbox examples\">\r\n<h3>Example<\/h3>\r\n<h2>Smoking and Lung Cancer<\/h2>\r\nIn this data, <em>x<\/em> = cigarette consumption per capita in the United States, and <em>y<\/em> = lung cancers per 100,000. To investigate the connection between cigarette consumption and lung cancers, the data is offset by 30 years because cancer takes time to develop. 
For example, cigarette consumption in 1945 is paired with cancer rates for 1975.\r\n\r\n<img class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031848\/m3_examining_relationships_topic_3_1_scatter_linear_corr_causation_1.gif\" alt=\"Scatterplot correlating cigarette smoking with lung cancer\" width=\"401\" height=\"264\" \/>&nbsp;\r\n\r\nIn the scatterplot, we see a fairly strong positive correlation.\r\n\r\nCan we conclude from this data that cigarette smoking causes lung cancer? The answer is no.\r\n\r\nThe data comes from an observational study. Recall from our previous discussions in Module 1 that we can draw cause-and-effect conclusions only from randomized comparative experiments. From this study, we can say that cigarette smoking is <strong>associated<\/strong> with lung cancer. We can also say that cigarette smoking <strong>correlates<\/strong> with lung cancer. We <em>cannot<\/em> say that cigarette smoking <strong>causes<\/strong> lung cancer.\r\n\r\nYet the National Cancer Institute\u2019s website states that \u201ccigarette smoking causes many types of cancer, including cancers of the lung\u201d (<a href=\"http:\/\/www.cancer.gov\/cancertopics\/factsheet\/Tobacco\/cessation\/\" target=\"_blank\" rel=\"noopener noreferrer\">National Cancer Institute<\/a>).\r\n\r\nHow can this be? Did the National Cancer Institute conduct a randomized comparative experiment to establish this cause-and-effect relationship? Of course not. We cannot randomly assign people to smoke or not smoke. All of the studies linking smoking with cancer are observational studies. Alone, each study can show only an association.\r\n\r\nSo is it possible to draw a causal link between cigarette consumption and cancer rates? The answer is a qualified yes. 
In practice, researchers use criteria such as the following to provide evidence of a causal connection from observational studies:\r\n<ul>\r\n \t<li>There is a reasonable explanation for how one variable might cause the other.<\/li>\r\n \t<li>The association is seen in repeated studies under varying conditions.<\/li>\r\n \t<li>The effects of potential lurking variables are ruled out when we look across studies.<\/li>\r\n<\/ul>\r\n<\/div>\r\nThe point of the previous example is again that association does not imply causation. But researchers can use an <em>observed association as the first step in building a case for causation.<\/em>\r\n\r\nThis point is subtle but important. When experiments cannot be conducted, it can be difficult and controversial to explain an observed association between two variables. Many of the current disputes involving data and statistics involve questions of causation that we cannot investigate through an experiment. Does the death penalty reduce violent crime? Does cell phone use cause brain tumors? Does pollution cause global warming? All of these questions imply a cause-and-effect relationship in situations that are complex and involve many interacting variables. In these situations, a single observational study cannot establish a causal link between two variables. But researchers can use the observed association as a first step in building a case for causation.\r\n<div class=\"textbox exercises\">\r\n<h3>Learn By Doing<\/h3>\r\nhttps:\/\/assessments.lumenlearning.com\/assessments\/3853\r\n\r\n<\/div>\r\n&nbsp;\r\n\r\n&nbsp;\r\n<h3><strong>Let\u2019s Summarize<\/strong><\/h3>\r\n<ul>\r\n \t<li>The relationship between two quantitative variables is visually displayed using the <em>scatterplot<\/em>, where each point represents an individual. 
We always plot the explanatory variable on the horizontal axis and the response variable on the vertical axis.<\/li>\r\n \t<li>When we explore a relationship using the scatterplot, we should describe the <em>overall pattern<\/em> of the relationship and any <em>deviations<\/em> from that pattern. To describe the overall pattern, consider the <em>direction<\/em>, <em>form<\/em>, and <em>strength<\/em> of the relationship.<\/li>\r\n \t<li>Adding labels to the scatterplot that indicate different groups or categories within the data might help us gain more insight about the relationship we are exploring.<\/li>\r\n \t<li>A special case of the relationship between two quantitative variables is the <em>linear <\/em>relationship. In this case, a straight line simply and adequately summarizes the relationship.<\/li>\r\n \t<li>When the scatterplot displays a linear relationship, we supplement it with the <em>correlation coefficient (r)<\/em>, which measures the <em>strength<\/em> and the <em>direction<\/em> of a linear relationship between two quantitative variables. The correlation ranges between -1 and 1. Values near -1 indicate a strong negative linear relationship. Values near 0 can indicate a weak or no linear relationship. Values near 1 indicate a strong positive linear relationship. Remember, we use the correlation coefficient only <em>after<\/em> we have looked at the data and observed that there is a linear relationship. If you have no information about what the data actually looks like, then you should not use the correlation coefficient in your analysis.<\/li>\r\n \t<li>The correlation is an appropriate numerical measure only for linear relationships, and it is sensitive to outliers. Therefore, the correlation should be used only as a supplement to a scatterplot (after we look at the data).<\/li>\r\n \t<li>A <em>lurking variable<\/em> is a variable that is not measured in the study. 
It is a third variable that is neither the explanatory nor the response variable, but it affects your interpretation of the relationship between the explanatory and response variable.<\/li>\r\n \t<li><em>Association does not imply causation.<\/em> Do not interpret a high correlation between explanatory and response variables as a cause-and-effect relationship.<\/li>\r\n \t<li>An observational study alone cannot establish a causal connection between explanatory and response variables. To establish a cause-and-effect relationship, researchers must conduct a comparative randomized experiment. In reality, it is often impossible to conduct an experiment. So observational studies that show an association between two variables can be used as a first step in building a case for causation.<\/li>\r\n<\/ul>\r\n<h3><\/h3>","rendered":"<p>&nbsp;<\/p>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<ul>\n<li>Distinguish between association and causation. Identify lurking variables that may explain an observed relationship.<\/li>\n<\/ul>\n<\/div>\n<p>In the next example, we investigate a subtle point about the confusion between association and causation. In this example, a cause-and-effect connection is logical but not justified by an observed association in a single study.<\/p>\n<div class=\"textbox examples\">\n<h3>Example<\/h3>\n<h2>Smoking and Lung Cancer<\/h2>\n<p>In this data, <em>x<\/em> = cigarette consumption per capita in the United States, and <em>y<\/em> = lung cancers per 100,000. To investigate the connection between cigarette consumption and lung cancers, the data is offset by 30 years because cancer takes time to develop. 
For example, cigarette consumption in 1945 is paired with cancer rates for 1975.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/1729\/2017\/04\/15031848\/m3_examining_relationships_topic_3_1_scatter_linear_corr_causation_1.gif\" alt=\"Scatterplot correlating cigarette smoking with lung cancer\" width=\"401\" height=\"264\" \/>&nbsp;<\/p>\n<p>In the scatterplot, we see a fairly strong positive correlation.<\/p>\n<p>Can we conclude from this data that cigarette smoking causes lung cancer? The answer is no.<\/p>\n<p>The data comes from an observational study. Recall from our previous discussions in Module 1 that we can draw cause-and-effect conclusions only from randomized comparative experiments. From this study, we can say that cigarette smoking is <strong>associated<\/strong> with lung cancer. We can also say that cigarette smoking <strong>correlates<\/strong> with lung cancer. We <em>cannot<\/em> say that cigarette smoking <strong>causes<\/strong> lung cancer.<\/p>\n<p>Yet the National Cancer Institute\u2019s website states that \u201ccigarette smoking causes many types of cancer, including cancers of the lung\u201d (<a href=\"http:\/\/www.cancer.gov\/cancertopics\/factsheet\/Tobacco\/cessation\/\" target=\"_blank\" rel=\"noopener noreferrer\">National Cancer Institute<\/a>).<\/p>\n<p>How can this be? Did the National Cancer Institute conduct a randomized comparative experiment to establish this cause-and-effect relationship? Of course not. We cannot randomly assign people to smoke or not smoke. All of the studies linking smoking with cancer are observational studies. Alone, each study can show only an association.<\/p>\n<p>So is it possible to draw a causal link between cigarette consumption and cancer rates? The answer is a qualified yes. 
In practice, researchers use criteria such as the following to provide evidence of a causal connection from observational studies:<\/p>\n<ul>\n<li>There is a reasonable explanation for how one variable might cause the other.<\/li>\n<li>The association is seen in repeated studies under varying conditions.<\/li>\n<li>The effects of potential lurking variables are ruled out when we look across studies.<\/li>\n<\/ul>\n<\/div>\n<p>The point of the previous example is again that association does not imply causation. But researchers can use an <em>observed association as the first step in building a case for causation.<\/em><\/p>\n<p>This point is subtle but important. When experiments cannot be conducted, it can be difficult and controversial to explain an observed association between two variables. Many of the current disputes involving data and statistics involve questions of causation that we cannot investigate through an experiment. Does the death penalty reduce violent crime? Does cell phone use cause brain tumors? Does pollution cause global warming? All of these questions imply a cause-and-effect relationship in situations that are complex and involve many interacting variables. In these situations, a single observational study cannot establish a causal link between two variables. 
But researchers can use the observed association as a first step in building a case for causation.<\/p>\n<div class=\"textbox exercises\">\n<h3>Learn By Doing<\/h3>\n<p>\t<iframe id=\"lumen_assessment_3853\" class=\"resizable\" src=\"https:\/\/assessments.lumenlearning.com\/assessments\/load?assessment_id=3853&#38;embed=1&#38;external_user_id=&#38;external_context_id=&#38;iframe_resize_id=lumen_assessment_3853\" frameborder=\"0\" style=\"border:none;width:100%;height:100%;min-height:400px;\"><br \/>\n\t<\/iframe><\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h3><strong>Let\u2019s Summarize<\/strong><\/h3>\n<ul>\n<li>The relationship between two quantitative variables is visually displayed using the <em>scatterplot<\/em>, where each point represents an individual. We always plot the explanatory variable on the horizontal axis and the response variable on the vertical axis.<\/li>\n<li>When we explore a relationship using the scatterplot, we should describe the <em>overall pattern<\/em> of the relationship and any <em>deviations<\/em> from that pattern. To describe the overall pattern, consider the <em>direction<\/em>, <em>form<\/em>, and <em>strength<\/em> of the relationship.<\/li>\n<li>Adding labels to the scatterplot that indicate different groups or categories within the data might help us gain more insight about the relationship we are exploring.<\/li>\n<li>A special case of the relationship between two quantitative variables is the <em>linear <\/em>relationship. In this case, a straight line simply and adequately summarizes the relationship.<\/li>\n<li>When the scatterplot displays a linear relationship, we supplement it with the <em>correlation coefficient (r)<\/em>, which measures the <em>strength<\/em> and the <em>direction<\/em> of a linear relationship between two quantitative variables. The correlation ranges between -1 and 1. Values near -1 indicate a strong negative linear relationship. Values near 0 can indicate a weak or no linear relationship. 
Values near 1 indicate a strong positive linear relationship. Remember, we use the correlation coefficient only <em>after<\/em> we have looked at the data and observed that there is a linear relationship. If you have no information about what the data actually looks like, then you should not use the correlation coefficient in your analysis.<\/li>\n<li>The correlation is an appropriate numerical measure only for linear relationships, and it is sensitive to outliers. Therefore, the correlation should be used only as a supplement to a scatterplot (after we look at the data).<\/li>\n<li>A <em>lurking variable<\/em> is a variable that is not measured in the study. It is a third variable that is neither the explanatory nor the response variable, but it affects your interpretation of the relationship between the explanatory and response variable.<\/li>\n<li><em>Association does not imply causation.<\/em> Do not interpret a high correlation between explanatory and response variables as a cause-and-effect relationship.<\/li>\n<li>An observational study alone cannot establish a causal connection between explanatory and response variables. To establish a cause-and-effect relationship, researchers must conduct a comparative randomized experiment. In reality, it is often impossible to conduct an experiment. So observational studies that show an association between two variables can be used as a first step in building a case for causation.<\/li>\n<\/ul>\n<h3><\/h3>\n\n\t\t\t <section class=\"citations-section\" role=\"contentinfo\">\n\t\t\t <h3>Candela Citations<\/h3>\n\t\t\t\t\t <div>\n\t\t\t\t\t\t <div id=\"citation-list-184\">\n\t\t\t\t\t\t\t <div class=\"licensing\"><div class=\"license-attribution-dropdown-subheading\">CC licensed content, Shared previously<\/div><ul class=\"citation-list\"><li>Concepts in Statistics. <strong>Provided by<\/strong>: Open Learning Initiative. 
<strong>Located at<\/strong>: <a target=\"_blank\" href=\"http:\/\/oli.cmu.edu\">http:\/\/oli.cmu.edu<\/a>. <strong>License<\/strong>: <em><a target=\"_blank\" rel=\"license\" href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\">CC BY: Attribution<\/a><\/em><\/li><\/ul><\/div>\n\t\t\t\t\t\t <\/div>\n\t\t\t\t\t <\/div>\n\t\t\t <\/section>","protected":false},"author":163,"menu_order":13,"template":"","meta":{"_candela_citation":"[{\"type\":\"cc\",\"description\":\"Concepts in Statistics\",\"author\":\"\",\"organization\":\"Open Learning Initiative\",\"url\":\"http:\/\/oli.cmu.edu\",\"project\":\"\",\"license\":\"cc-by\",\"license_terms\":\"\"}]","CANDELA_OUTCOMES_GUID":"465d484a-d81a-4a00-9f26-d684ab10add2","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-184","chapter","type-chapter","status-publish","hentry"],"part":140,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/184","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/wp\/v2\/users\/163"}],"version-history":[{"count":4,"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/184\/revisions"}],"predecessor-version":[{"id":1536,"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/184\/revisions\/1536"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/pressbooks\/v2\/parts\/140"}],"metadata":[{"href":"h
ttps:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/pressbooks\/v2\/chapters\/184\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/wp\/v2\/media?parent=184"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/pressbooks\/v2\/chapter-type?post=184"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/wp\/v2\/contributor?post=184"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/suny-wmopen-concepts-statistics\/wp-json\/wp\/v2\/license?post=184"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}