{"id":3841,"date":"2022-03-15T23:16:13","date_gmt":"2022-03-15T23:16:13","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/?post_type=chapter&#038;p=3841"},"modified":"2022-06-02T07:53:19","modified_gmt":"2022-06-02T07:53:19","slug":"what-to-know-about-6-a","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/what-to-know-about-6-a\/","title":{"raw":"What to Know About 6.A: Exploring Lines of Best Fit","rendered":"What to Know About 6.A: Exploring Lines of Best Fit"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>Learning Goals<\/h3>\r\nAt the end of this page, you should feel comfortable performing these skills:\r\n<ul>\r\n \t<li>Identify the explanatory and response variables in a given scenario.<\/li>\r\n \t<li>Identify when a linear regression analysis might be appropriate.<\/li>\r\n \t<li>Use technology to perform a least squares regression analysis.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<h2>Bivariate Data<\/h2>\r\nIn the upcoming activity, you will need to identify the explanatory and response variable given a scenario and understand when linear regression analysis might be appropriate. 
In this page, you'll prepare for that by looking carefully at definitions, applying the definitions in given scenarios, and seeing specific situations in which data may or may not be related linearly.\r\n\r\nOften, we do statistical studies to find relationships between two or more variables that can help us to better predict future outcomes and perhaps make changes that will improve our lives.\r\n\r\nIn the next activity, we will be focusing on studies and relationships involving <strong>two quantitative variables.<\/strong> In each dataset, the two variables will be linked because both observations will be measured from the same individual or unit.\r\n\r\nThese types of linked data are called <strong>bivariate data<\/strong> and are often presented in scatterplots. Bivariate data are defined as pairs of data values, where each pair consists of two different measurements that come from the same individual or unit.\r\n\r\nSee the video below for a quick explanation.\r\n<div class=\"textbox tryit\">\r\n<h3>Video Placement<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[Perspective Video: ] A short video (1 minute or less) that shows graphs with common examples of bivariate data -- just showing the placement of explanatory and response variables on the axis and showing how each data point on a scatterplot indicates one input\/response observation in the data set. Examples might include miles driven over gas prices, revenue over marketing expenditures, annual income over total years of school, etc.\u00a0<\/span>\r\n\r\n<\/div>\r\nNow that you have the idea of two related quantitative variables, read on to see how to determine the nature of the two variables in an existing bivariate data set or study of bivariate data. The key idea is that one of the variables will measure the outcome of the study. 
It will be dependent upon the other variable.\r\n<h3>Explanatory and Response Variables<\/h3>\r\nA teacher wonders if \u201cnumber of absences per semester\u201d is related to \u201cacademic performance\u201d for students in her classes. She might look back on her class records from previous semesters and generate a dataset by observing both the final overall average grade and total number of missed classes for each student in a random sample of students. This is an example of a bivariate dataset.\r\n\r\nWhen working with a bivariate dataset, there are two variables to consider:\r\n<ul>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">The <strong>explanatory variable<\/strong> ([latex]x[\/latex]) is the variable that is thought to explain or predict the response variable of a study.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">The <strong>response variable<\/strong> ([latex]y[\/latex]) measures the outcome of interest in the study. This variable is thought to depend in some way on the explanatory variable. It is often referred to as the \u201cvariable of interest\u201d for the researcher. (In your previous math classes this variable may have been referred to as the dependent variable.)<\/li>\r\n<\/ul>\r\nIn this example, the outcome the teacher is most interested in is how well her students will do in her class, so the response variable is <em>Overall Average Grade<\/em>. The other variable, <em>Number of Absences<\/em>, is the explanatory variable.\r\n\r\nIdentifying explanatory and response variables can sometimes be difficult. 
When trying to identify explanatory and response variables, make sure to carefully read the scenario and keep the following phrases in mind:\r\n<p style=\"text-align: center;\"><strong><span style=\"text-decoration: underline;\">Explanatory<\/span> is used to predict <span style=\"text-decoration: underline;\">Response<\/span><\/strong>\r\n<strong>(or calculate)<\/strong>\r\n<strong>(or determine)<\/strong><\/p>\r\nIt is good practice to identify both variables and then ask, \u201cWhich one is the main outcome or focus of the study?\u201d This variable will be the response variable, and the other variable will be the explanatory variable. When reading a pre-existing study, carefully read the context of the study to identify\u00a0which variable is being used to explain (the explanatory variable) an outcome or response (the response variable).\r\n<div class=\"textbox exercises\">\r\n<h3>example<\/h3>\r\n<span style=\"background-color: #ffff99;\">[This is a good place to use socially equitable or topical data to replace this common example]<\/span>\r\n\r\nScenario 1. Suppose a chamber of commerce wants to investigate sales in an historic shopping district under various weather conditions. They choose to keep track of the total in-person sales by day and daily high temperatures.\r\n\r\nWhich of these two variables\u00a0<em>Daily Sales<\/em> or\u00a0<em>Daily High Temp<\/em> is the response variable? Which is the explanatory variable?\r\n\r\n[reveal-answer q=\"140397\"]Show Answer[\/reveal-answer]\r\n[hidden-answer a=\"140397\"]In this case, the chamber wants to measure the sales in dollars dependent upon the daily high temperature. Daily sales measures the response. The high temperature would be used to predict the response.[\/hidden-answer]\r\n\r\nScenario 2. Later, the chamber of commerce wishes to explore whether the daily number of shoppers in the district is related to the daytime precipitation. 
They collect the number of inches of precipitation that fell between 9am and 6pm each day for a certain time period and also the number of times a person crossed through the gate to the main shopping courtyard.\r\n\r\nWhich of these two variables,\u00a0<em>Daily Precipitation<\/em> or\u00a0<em>Number of Shoppers<\/em> is the response variable? Which is the explanatory variable?\r\n\r\n[reveal-answer q=\"278911\"]Show Answer[\/reveal-answer]\r\n[hidden-answer a=\"278911\"]The chamber wishes to show that the number of shoppers is dependent upon the precipitation during business hours. The number of shoppers is the response variable and the daily precipitation in inches is the explanatory variable.\u00a0 [\/hidden-answer]\r\n\r\n<\/div>\r\nWe'll see shortly that both variables present in a bivariate data set will need to be quantitative in order to determine a linear relationship. Note for example that both of the scenarios in the example above included only quantitative variables. When a study seeks only an association rather than a linear model, however, the variables need not both be quantitative, and you can still identify which one is the explanatory variable and which one is the response variable.\u00a0 Keep this in mind as you answer Questions 1 and 2 below to define and identify explanatory and response variables in a given scenario.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 1<\/h3>\r\nTrue or False: The response variable can be thought of as the predicted variable or outcome.\r\n<ol>\r\n \t<li>a) True<\/li>\r\n \t<li>b) False<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"457387\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"457387\"]See the definition given above.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>question 2<\/h3>\r\nA researcher wonders if a new cancer treatment leads to a higher five-year survival rate for people diagnosed with a certain type of lung cancer. 
She creates an experiment where the experimental group gets the new treatment and the control group gets the traditional treatment. After five years, she gathers data on the people in each group to see which cancer patients survived and which did not.\r\n\r\n&nbsp;\r\n\r\nPart A: Identify the explanatory variable. Select the best answer.\r\n<ol>\r\n \t<li>a) Survival status of the patient after five years (survived or did not survive)<\/li>\r\n \t<li>b) Treatment status of the patient (control group or experimental group)<\/li>\r\n \t<li>c) Cancer status (diagnosed with cancer or no cancer)<\/li>\r\n \t<li>d) The years of study (1, 2, 3, 4, or 5)<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"442694\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"442694\"]It may be easier to identify the response first, then choose the explanatory.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: Identify the response variable. Select the best answer.\r\n<ol>\r\n \t<li>a) Survival status of the patient after five years (survived or did not survive)<\/li>\r\n \t<li>b) Treatment status of the patient (control group or new treatment group)<\/li>\r\n \t<li>c) Cancer status (diagnosed with cancer or no cancer)<\/li>\r\n \t<li>d) The years of study (1, 2, 3, 4, or 5)<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"974413\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"974413\"]Which variable is dependent upon the other?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h2>Linear Relationships<\/h2>\r\nA method we will use to make predictions about missing observations or future observations in bivariate data is called <strong>Least Squares Regression (LSR) analysis<\/strong>. The language might seem intimidating at first, but the ideas are quite straightforward, especially with examples to illustrate each new term. For example, LSR analysis can also be described as <strong>linear modeling<\/strong>, where we determine the equation of a <em>line of best fit<\/em> to make predictions based on an existing dataset. 
In this type of analysis, both the explanatory and response variables must be quantitative, since the linear model requires numerical values in its calculations.\r\n<div class=\"textbox tryit\">\r\n<h3>Video Placement<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[A 3-Instructor Perspective Video: A description and explanation of least squares regression using a scatterplot, line of best fit, vertical error (residuals), and linear equation. Note in the video that both variables must be quantitative in order to perform a linear analysis. It should end with an explanation of common statistical notation for slope and y-intercept, showing that the equation for the line of best fit is the same equation students learned in algebra I as y=mx+b, except that since this line is a rough representation of the data set, the outcome is merely a prediction, thus denoted [latex]\\hat{y}[\/latex].<\/span>\r\n\r\n<\/div>\r\n<h3>Line of Best Fit<\/h3>\r\nThe line of best fit is simply the best line that describes the data points. For real data with natural deviations, the line cannot go through all of the points. In fact, very often, the line does not go through any of the data points.\r\n\r\nSince no line will be perfect, the best we can do is minimize its error. In this class, we will do this by minimizing the sum total of the squared vertical errors from all data points to the line. This is why the <em>line of best fit<\/em> is also called the Least Squares Regression Line (LSRL).\r\n\r\n<img class=\"alignnone wp-image-1181\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12030231\/Picture123-300x160.jpg\" alt=\"A graph with several points and a line of best fit. Each point is connected to the line of best fit vertically. 
Beside one of the vertical lines, it reads &quot;Residual = 4 - 10 = -6.&quot;\" width=\"1178\" height=\"628\" \/>\r\n\r\nThe <strong>vertical error<\/strong> associated with each data point is called the <strong>residual<\/strong> of that observation. This error, illustrated by the length of the vertical line, represents how far the prediction calculated from the line is from the actual, observed [latex]y[\/latex] value; the longer the vertical line, the greater the error associated with that particular observation.\r\n\r\nNote: A residual is the observed [latex]y[\/latex] value minus the predicted value, so residuals are positive for data points above the line of best fit and negative for data points below it.\r\n\r\nThe equation for the line of best fit is very similar to one you may have seen in a previous math class:\r\n<p style=\"text-align: center;\">[latex]\\hat{y} = a+bx[\/latex]<\/p>\r\nwhere [latex]\\hat{y}[\/latex] (pronounced y-hat) is the predicted value of the response variable, [latex]a[\/latex] is the estimated value of the y-intercept, and [latex]b[\/latex] is the estimated slope.\r\n\r\nWhile the actual process of finding the <em>line of best fit<\/em> might seem complicated, the concept of the <em>line of best fit<\/em> is very straightforward. We can use technology to take care of long and tedious calculations.\r\n<h3>When is LSR Analysis Appropriate?<\/h3>\r\nAs you answer Questions 3 and 4, keep in mind that in order to create a linear model during LSR analysis, both of the variables in the bivariate data must be quantitative.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 3<\/h3>\r\nWhich of the following questions could be explored using LSR analysis involving bivariate data? 
Select all that apply.\r\n<ol>\r\n \t<li>a) Could the number of cigarettes a person smokes per day be used to predict a person\u2019s lifespan?<\/li>\r\n \t<li>b) Does our race, ethnicity, and\/or gender impact the likelihood that we will be treated fairly when seeking a loan, seeking medical treatment, or pursuing an educational degree?<\/li>\r\n \t<li>c) Does the amount of sleep we get per day have an impact on our weight?<\/li>\r\n \t<li>d) Is there an association between the type of pet people own and their level of general happiness?<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"427838\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"427838\"]Both variables must be quantitative.[\/hidden-answer]\r\n\r\n<\/div>\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 4<\/h3>\r\nCan we use LSR analysis to better understand the data generated from the experiment in Question 2? Select the best answer.\r\n<ol>\r\n \t<li>a) Yes, if it is a well-designed experiment.<\/li>\r\n \t<li>b) Yes, because LSR analysis can be used to understand and make better predictions for all datasets.<\/li>\r\n \t<li>c) No, because at least one of the variables is categorical.<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"836710\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"836710\"]What types of variables were involved in the scenario in Question 2?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Performing LSR Analysis<\/h3>\r\nNow let's put everything you've seen in this activity together to perform an LSR analysis using technology. See the example below for guidance, then answer Question 5.\r\n<div class=\"textbox tryit\">\r\n<h3>Video Placement<\/h3>\r\n<span style=\"background-color: #e6daf7;\">[Worked Example: A 3-instructor worked example that follows the structure of Question 5. This would be an excellent placement for a social justice or inclusion topic. 
The data should be appropriate for LSR, it should identify the explanatory and response variables, and it should be used to create and visually inspect a scatterplot using technology. It should then use technology to calculate a line of best fit and the correlation coefficient. ]<\/span>\r\n\r\n<\/div>\r\nNow it's your turn to try.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 5<\/h3>\r\nA scientist gathered data on the striped ground cricket to see if ground temperature (measured in degrees Fahrenheit) can be predicted by the number of chirps the cricket makes per second (measured in number of wing vibrations per second).\u00a0 After collecting the data, he could create a scatterplot to understand if there is a positive linear trend.\r\n\r\n&nbsp;\r\n\r\nPart A: Can LSR analysis be used to examine these data? Select the best answer.\r\n<ol>\r\n \t<li>a) No, because this is an observational study and not an experiment.<\/li>\r\n \t<li>b) Yes, because these are bivariate data and both variables are quantitative (and it does not matter that this was an observational study).<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"382508\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"382508\"]Put Answer Here[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: Identify the explanatory variable. Select the best answer.\r\n<ol>\r\n \t<li>a) Ground temperature<\/li>\r\n \t<li>b) Number of crickets<\/li>\r\n \t<li>c) Time of day<\/li>\r\n \t<li>d) Number of chirps per second<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"791176\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"791176\"]Put Answer Here[\/hidden-answer]\r\n\r\nPart C: Identify the response variable. 
Select the best answer.\r\n<ol>\r\n \t<li>a) Ground temperature<\/li>\r\n \t<li>b) Number of crickets<\/li>\r\n \t<li>c) Time of day<\/li>\r\n \t<li>d) Number of chirps per second<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"963120\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"963120\"]Put Answer Here[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nThe following is a chart of the data the scientist collected to help him answer his question.\r\n<div align=\"center\">\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><strong>Chirps per second<\/strong><\/td>\r\n<td><strong>Temperature in degrees Fahrenheit<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>20<\/td>\r\n<td>88.6<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>16<\/td>\r\n<td>71.6<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>19.8<\/td>\r\n<td>93.3<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>18.4<\/td>\r\n<td>84.3<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>17.1<\/td>\r\n<td>80.6<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>15.5<\/td>\r\n<td>75.2<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>14.7<\/td>\r\n<td>69.7<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>17.1<\/td>\r\n<td>82<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>15.4<\/td>\r\n<td>69.4<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>16.2<\/td>\r\n<td>83.3<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>15<\/td>\r\n<td>79.6<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>17.2<\/td>\r\n<td>82.6<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>16<\/td>\r\n<td>80.6<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>17<\/td>\r\n<td>83.5<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>14.4<\/td>\r\n<td>76.3<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nGo to the Linear Regression tool at <a href=\"https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/\">https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/<\/a> and plot the data using the following steps: under \u201cEnter Data,\u201d select \u201cEnter Own;\u201d name the x (explanatory) and y (response) variables appropriately; copy and paste the data from the table (make sure the explanatory variable is in the first column and the response variable is in the second column); under \u201cPlot Options,\u201d select \u201cRegression 
Line;\u201d and select \u201cSubmit Data.\u201d\r\n\r\n&nbsp;\r\n\r\nPart D: Does the scatterplot look fairly linear?\r\n\r\n[reveal-answer q=\"688222\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"688222\"]Put Answer Here[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart E: What is the equation of the line of best fit?\r\n\r\n[reveal-answer q=\"238091\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"238091\"]Make sure to use proper notation (don't forget the hat).[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart F: What is the value of the correlation coefficient?\r\n\r\n[reveal-answer q=\"288608\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"288608\"]Make sure to use the proper letter.[\/hidden-answer]\r\n\r\n<\/div>\r\nYou've seen some of these ideas before in <em>Forming Connections<\/em>\u00a0[5A]. Review those notes now to answer Question 6. Make sure to have your answer for Question 6 handy as you begin the upcoming <em>Forming Connections\u00a0<\/em>activity!\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 6<\/h3>\r\nLook over your notes from In-Class Activity 5.A. Write down three different examples where you noted an explanatory variable that can be used to predict a response variable and where both variables are quantitative. One scenario should have a positive association, one should have a negative association, and one should have no association or almost no association.\r\n\r\n[reveal-answer q=\"978167\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"978167\"]Feel free to return to [section 5A] to build up notes if you didn't the first time.[\/hidden-answer]\r\n\r\nYou will be looking at your three examples at the beginning of the upcoming <em>Forming Connections<\/em>. 
Make sure you have your examples available at the start of that activity.\r\n\r\n<\/div>\r\n<h2>Summary<\/h2>\r\nIn this <em>What to Know\u00a0<\/em>page,\u00a0you learned to recognize when a linear regression analysis is appropriate, how to identify the explanatory and response variables in bivariate data, and how to calculate the line of best fit.\u00a0 Let\u2019s summarize the skills as you saw them in each question.\r\n<ul>\r\n \t<li>In Questions 3, 4, and Question 5 Part A, you identified when a linear regression analysis might be appropriate.<\/li>\r\n \t<li>In Questions 1, 2, and Question 5 Parts B and C, you identified the explanatory and response variables in a given scenario.<\/li>\r\n \t<li>In Question 5 Parts D through F, you calculated the line of best fit and wrote it using proper notation.<\/li>\r\n<\/ul>\r\nIf you feel comfortable with these ideas, it\u2019s time to move on to <em>Forming Connections<\/em> in the next activity!","rendered":"<div class=\"textbox learning-objectives\">\n<h3>Learning Goals<\/h3>\n<p>At the end of this page, you should feel comfortable performing these skills:<\/p>\n<ul>\n<li>Identify the explanatory and response variables in a given scenario.<\/li>\n<li>Identify when a linear regression analysis might be appropriate.<\/li>\n<li>Use technology to perform a least squares regression analysis.<\/li>\n<\/ul>\n<\/div>\n<h2>Bivariate Data<\/h2>\n<p>In the upcoming activity, you will need to identify the explanatory and response variable given a scenario and understand when linear regression analysis might be appropriate. 
In this page, you&#8217;ll prepare for that by looking carefully at definitions, applying the definitions in given scenarios, and seeing specific situations in which data may or may not be related linearly.<\/p>\n<p>Often, we do statistical studies to find relationships between two or more variables that can help us to better predict future outcomes and perhaps make changes that will improve our lives.<\/p>\n<p>In the next activity, we will be focusing on studies and relationships involving <strong>two quantitative variables.<\/strong> In each dataset, the two variables will be linked because both observations will be measured from the same individual or unit.<\/p>\n<p>These types of linked data are called <strong>bivariate data<\/strong> and are often presented in scatterplots. Bivariate data are defined as pairs of data values, where each pair consists of two different measurements that come from the same individual or unit.<\/p>\n<p>See the video below for a quick explanation.<\/p>\n<div class=\"textbox tryit\">\n<h3>Video Placement<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[Perspective Video: ] A short video (1 minute or less) that shows graphs with common examples of bivariate data &#8212; just showing the placement of explanatory and response variables on the axis and showing how each data point on a scatterplot indicates one input\/response observation in the data set. Examples might include miles driven over gas prices, revenue over marketing expenditures, annual income over total years of school, etc.\u00a0<\/span><\/p>\n<\/div>\n<p>Now that you have the idea of two related quantitative variables, read on to see how to determine the nature of the two variables in an existing bivariate data set or study of bivariate data. The key idea is that one of the variables will measure the outcome of the study. 
It will be dependent upon the other variable.<\/p>\n<h3>Explanatory and Response Variables<\/h3>\n<p>A teacher wonders if \u201cnumber of absences per semester\u201d is related to \u201cacademic performance\u201d for students in her classes. She might look back on her class records from previous semesters and generate a dataset by observing both the final overall average grade and total number of missed classes for each student in a random sample of students. This is an example of a bivariate dataset.<\/p>\n<p>When working with a bivariate dataset, there are two variables to consider:<\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The <strong>explanatory variable<\/strong> ([latex]x[\/latex]) is the variable that is thought to explain or predict the response variable of a study.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">The <strong>response variable<\/strong> ([latex]y[\/latex]) measures the outcome of interest in the study. This variable is thought to depend in some way on the explanatory variable. It is often referred to as the \u201cvariable of interest\u201d for the researcher. (In your previous math classes this variable may have been referred to as the dependent variable.)<\/li>\n<\/ul>\n<p>In this example, the outcome the teacher is most interested in is how well her students will do in her class, so the response variable is <em>Overall Average Grade<\/em>. The other variable, <em>Number of Absences<\/em>, is the explanatory variable.<\/p>\n<p>Identifying explanatory and response variables can sometimes be difficult. 
When trying to identify explanatory and response variables, make sure to carefully read the scenario and keep the following phrases in mind:<\/p>\n<p style=\"text-align: center;\"><strong><span style=\"text-decoration: underline;\">Explanatory<\/span> is used to predict <span style=\"text-decoration: underline;\">Response<\/span><\/strong><br \/>\n<strong>(or calculate)<\/strong><br \/>\n<strong>(or determine)<\/strong><\/p>\n<p>It is good practice to identify both variables and then ask, \u201cWhich one is the main outcome or focus of the study?\u201d This variable will be the response variable, and the other variable will be the explanatory variable. When reading a pre-existing study, carefully read the context of the study to identify\u00a0which variable is being used to explain (the explanatory variable) an outcome or response (the response variable).<\/p>\n<div class=\"textbox exercises\">\n<h3>example<\/h3>\n<p><span style=\"background-color: #ffff99;\">[This is a good place to use socially equitable or topical data to replace this common example]<\/span><\/p>\n<p>Scenario 1. Suppose a chamber of commerce wants to investigate sales in an historic shopping district under various weather conditions. They choose to keep track of the total in-person sales by day and daily high temperatures.<\/p>\n<p>Which of these two variables\u00a0<em>Daily Sales<\/em> or\u00a0<em>Daily High Temp<\/em> is the response variable? Which is the explanatory variable?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q140397\">Show Answer<\/span><\/p>\n<div id=\"q140397\" class=\"hidden-answer\" style=\"display: none\">In this case, the chamber wants to measure the sales in dollars dependent upon the daily high temperature. Daily sales measures the response. The high temperature would be used to predict the response.<\/div>\n<\/div>\n<p>Scenario 2. 
Later, the chamber of commerce wishes to explore whether the daily number of shoppers in the district is related to the daytime precipitation. They collect the number of inches of precipitation that fell between 9am and 6pm each day for a certain time period and also the number of times a person crossed through the gate to the main shopping courtyard.<\/p>\n<p>Which of these two variables,\u00a0<em>Daily Precipitation<\/em> or\u00a0<em>Number of Shoppers<\/em> is the response variable? Which is the explanatory variable?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q278911\">Show Answer<\/span><\/p>\n<div id=\"q278911\" class=\"hidden-answer\" style=\"display: none\">The chamber wishes to show that the number of shoppers is dependent upon the precipitation during business hours. The number of shoppers is the response variable and the daily precipitation in inches is the explanatory variable.\u00a0 <\/div>\n<\/div>\n<\/div>\n<p>We&#8217;ll see shortly that both variables present in a bivariate data set will need to be quantitative in order to determine a linear relationship. Note for example that both of the scenarios in the example above included only quantitative variables. 
When a study seeks only an association rather than a linear model, however, the variables need not both be quantitative, and you can still identify which one is the explanatory variable and which one is the response variable.\u00a0 Keep this in mind as you answer Questions 1 and 2 below to define and identify explanatory and response variables in a given scenario.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>question 1<\/h3>\n<p>True or False: The response variable can be thought of as the predicted variable or outcome.<\/p>\n<ol>\n<li>a) True<\/li>\n<li>b) False<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q457387\">Hint<\/span><\/p>\n<div id=\"q457387\" class=\"hidden-answer\" style=\"display: none\">See the definition given above.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>question 2<\/h3>\n<p>A researcher wonders if a new cancer treatment leads to a higher five-year survival rate for people diagnosed with a certain type of lung cancer. She creates an experiment where the experimental group gets the new treatment and the control group gets the traditional treatment. After five years, she gathers data on the people in each group to see which cancer patients survived and which did not.<\/p>\n<p>&nbsp;<\/p>\n<p>Part A: Identify the explanatory variable. 
Select the best answer.<\/p>\n<ol>\n<li>a) Survival status of the patient after five years (survived or did not survive)<\/li>\n<li>b) Treatment status of the patient (control group or experimental group)<\/li>\n<li>c) Cancer status (diagnosed with cancer or no cancer)<\/li>\n<li>d) The years of study (1, 2, 3, 4, or 5)<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q442694\">Hint<\/span><\/p>\n<div id=\"q442694\" class=\"hidden-answer\" style=\"display: none\">It may be easier to identify the response first, then choose the explanatory.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: Identify the response variable. Select the best answer.<\/p>\n<ol>\n<li>a) Survival status of the patient after five years (survived or did not survive)<\/li>\n<li>b) Treatment status of the patient (control group or new treatment group)<\/li>\n<li>c) Cancer status (diagnosed with cancer or no cancer)<\/li>\n<li>d) The years of study (1, 2, 3, 4, or 5)<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q974413\">Hint<\/span><\/p>\n<div id=\"q974413\" class=\"hidden-answer\" style=\"display: none\">Which variable is dependent upon the other?<\/div>\n<\/div>\n<\/div>\n<h2>Linear Relationships<\/h2>\n<p>A method we will use to make predictions about missing observations or future observations in bivariate data is called <strong>Least Squares Regression (LSR) analysis<\/strong>. The language might seem intimidating at first, but the ideas are quite straightforward, especially with examples to illustrate each new term. For example, LSR analysis can also be described as <strong>linear modeling<\/strong>, where we determine the equation of a <em>line of best fit<\/em> to make predictions based on an existing dataset. 
In this type of analysis, both the explanatory and response variables must be quantitative, since the linear model requires numerical values in its calculations.<\/p>\n<div class=\"textbox tryit\">\n<h3>Video Placement<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[A 3-Instructor Perspective Video: A description and explanation of least squares regression using a scatterplot, line of best fit, vertical error (residuals), and linear equation. Note in the video that both variables must be quantitative in order to perform a linear analysis. It should end with an explanation of common statistical notation for slope and y-intercept, showing that the equation for the line of best fit is the same equation students learned in algebra I as y=mx+b, except that since this line is a rough representation of the data set, the outcome is merely a prediction, thus denoted [latex]\\hat{y}[\/latex].<\/span><\/p>\n<\/div>\n<h3>Line of Best Fit<\/h3>\n<p>The line of best fit is simply the line that best describes the data points. For real data with natural deviations, the line usually cannot go through all of the points. In fact, very often, the line does not go through any of the data points.<\/p>\n<p>Since no line will be perfect, the best we can do is minimize its error. In this class, we will do this by minimizing the sum of the squared vertical distances from the data points to the line. This is why the <em>line of best fit<\/em> is also called the Least Squares Regression Line (LSRL).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1181\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/12030231\/Picture123-300x160.jpg\" alt=\"A graph with several points and a line of best fit. Each point is connected to the line of best fit vertically. 
Beside one of the vertical lines, it reads &quot;Residual = 4 - 10 = -6.&quot;\" width=\"1178\" height=\"628\" \/><\/p>\n<p>The <strong>vertical error<\/strong> associated with each data point is called the <strong>residual<\/strong> of that observation. This error, illustrated by the length of the vertical line, represents how far the prediction calculated from the line is from the actual, observed [latex]y[\/latex] value; the longer the line, the greater the error associated with that particular observation.<\/p>\n<p>Note: A residual is the observed value minus the predicted value, so for data points that are above the line of best fit, the residuals are positive, and for data points that are below the line, the residuals are negative.<\/p>\n<p>The equation for the line of best fit is very similar to one you may have seen in a previous math class:<\/p>\n<p style=\"text-align: center;\">[latex]\\hat{y} = a+bx[\/latex]<\/p>\n<p>where [latex]\\hat{y}[\/latex] (pronounced y-hat) is the predicted value of the response variable, [latex]a[\/latex] is the estimated value of the [latex]y[\/latex]-intercept, and [latex]b[\/latex] is the estimated slope.<\/p>\n<p>While the actual process of finding the <em>line of best fit<\/em> might seem complicated, the concept of the <em>line of best fit<\/em> is very straightforward. We can use technology to take care of long and tedious calculations.<\/p>\n<h3>When is LSR Analysis Appropriate?<\/h3>\n<p>As you answer Questions 3 and 4, keep in mind that in order to create a linear model during LSR analysis, both of the variables in the bivariate data must be quantitative.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 3<\/h3>\n<p>Which of the following questions could be explored using LSR analysis involving bivariate data? 
Select all that apply.<\/p>\n<ol>\n<li>a) Could the number of cigarettes a person smokes per day be used to predict a person\u2019s lifespan?<\/li>\n<li>b) Does our race, ethnicity, and\/or gender impact the likelihood that we will be treated fairly when seeking a loan or medical treatment, or pursuing an educational degree?<\/li>\n<li>c) Does the amount of sleep we get per day have an impact on our weight?<\/li>\n<li>d) Is there an association between the type of pet people own and their level of general happiness?<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q427838\">Hint<\/span><\/p>\n<div id=\"q427838\" class=\"hidden-answer\" style=\"display: none\">Both variables must be quantitative.<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox key-takeaways\">\n<h3>Question 4<\/h3>\n<p>Can we use LSR analysis to better understand the data generated from the experiment in Question 2? Select the best answer.<\/p>\n<ol>\n<li>a) Yes, if it is a well-designed experiment.<\/li>\n<li>b) Yes, because LSR analysis can be used to understand and make better predictions for all datasets.<\/li>\n<li>c) No, because at least one of the variables is categorical.<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q836710\">Hint<\/span><\/p>\n<div id=\"q836710\" class=\"hidden-answer\" style=\"display: none\">What type of variables were involved in the scenario in Question 2?<\/div>\n<\/div>\n<\/div>\n<h3>Performing LSR Analysis<\/h3>\n<p>Now let&#8217;s put everything you&#8217;ve seen in this activity together to perform an LSR analysis using technology. 
See the example below for guidance, then answer Question 5.<\/p>\n<div class=\"textbox tryit\">\n<h3>Video Placement<\/h3>\n<p><span style=\"background-color: #e6daf7;\">[Worked Example: A 3-instructor worked example that follows the structure of Question 5. This would be an excellent placement for a social justice or inclusion topic. The data should be appropriate for LSR, it should identify the explanatory and response variables, and it should be used to create and visually inspect a scatterplot using technology. It should then use technology to calculate a line of best fit and the correlation coefficient. ]<\/span><\/p>\n<\/div>\n<p>Now it&#8217;s your turn to try.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 5<\/h3>\n<p>A scientist gathered data on the striped ground cricket to see if ground temperature (measured in degrees Fahrenheit) can be predicted by the number of chirps the cricket makes per second (measured in number of wing vibrations per second).\u00a0 After collecting the data, he could create a scatterplot to understand if there is a positive linear trend.<\/p>\n<p>&nbsp;<\/p>\n<p>Part A: Can LSR analysis be used to examine these data? Select the best answer.<\/p>\n<ol>\n<li>a) No, because this is an observational study and not an experiment.<\/li>\n<li>b) Yes, because these are bivariate data and both variables are quantitative (and it does not matter that this was an observational study).<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q382508\">Hint<\/span><\/p>\n<div id=\"q382508\" class=\"hidden-answer\" style=\"display: none\">Put Answer Here<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: Identify the explanatory variable. 
Select the best answer.<\/p>\n<ol>\n<li>a) Ground temperature<\/li>\n<li>b) Number of crickets<\/li>\n<li>c) Time of day<\/li>\n<li>d) Number of chirps per second<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q791176\">Hint<\/span><\/p>\n<div id=\"q791176\" class=\"hidden-answer\" style=\"display: none\">Put Answer Here<\/div>\n<\/div>\n<p>Part C: Identify the response variable. Select the best answer.<\/p>\n<ol>\n<li>a) Ground temperature<\/li>\n<li>b) Number of crickets<\/li>\n<li>c) Time of day<\/li>\n<li>d) Number of chirps per second<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q963120\">Hint<\/span><\/p>\n<div id=\"q963120\" class=\"hidden-answer\" style=\"display: none\">Put Answer Here<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The following is a chart of the data the scientist collected to help him answer his question.<\/p>\n<div style=\"margin: auto;\">\n<table>\n<tbody>\n<tr>\n<td><strong>Chirps per second<\/strong><\/td>\n<td><strong>Temperature in degrees Fahrenheit<\/strong><\/td>\n<\/tr>\n<tr>\n<td>20<\/td>\n<td>88.6<\/td>\n<\/tr>\n<tr>\n<td>16<\/td>\n<td>71.6<\/td>\n<\/tr>\n<tr>\n<td>19.8<\/td>\n<td>93.3<\/td>\n<\/tr>\n<tr>\n<td>18.4<\/td>\n<td>84.3<\/td>\n<\/tr>\n<tr>\n<td>17.1<\/td>\n<td>80.6<\/td>\n<\/tr>\n<tr>\n<td>15.5<\/td>\n<td>75.2<\/td>\n<\/tr>\n<tr>\n<td>14.7<\/td>\n<td>69.7<\/td>\n<\/tr>\n<tr>\n<td>17.1<\/td>\n<td>82<\/td>\n<\/tr>\n<tr>\n<td>15.4<\/td>\n<td>69.4<\/td>\n<\/tr>\n<tr>\n<td>16.2<\/td>\n<td>83.3<\/td>\n<\/tr>\n<tr>\n<td>15<\/td>\n<td>79.6<\/td>\n<\/tr>\n<tr>\n<td>17.2<\/td>\n<td>82.6<\/td>\n<\/tr>\n<tr>\n<td>16<\/td>\n<td>80.6<\/td>\n<\/tr>\n<tr>\n<td>17<\/td>\n<td>83.5<\/td>\n<\/tr>\n<tr>\n<td>14.4<\/td>\n<td>76.3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>Go to the Linear Regression tool at <a 
href=\"https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/\">https:\/\/dcmathpathways.shinyapps.io\/LinearRegression\/<\/a> and plot the data using the following steps: under \u201cEnter Data,\u201d select \u201cEnter Own;\u201d name the x (explanatory) and y (response) variables appropriately; copy and paste the data from the table (make sure the explanatory variable is in the first column and the response variable is in the second column); under \u201cPlot Options,\u201d select \u201cRegression Line;\u201d and select \u201cSubmit Data.\u201d<\/p>\n<p>&nbsp;<\/p>\n<p>Part D: Does the scatterplot look fairly linear?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q688222\">Hint<\/span><\/p>\n<div id=\"q688222\" class=\"hidden-answer\" style=\"display: none\">Put Answer Here<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part E: What is the equation of the line of best fit?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q238091\">Hint<\/span><\/p>\n<div id=\"q238091\" class=\"hidden-answer\" style=\"display: none\">Make sure to use proper notation (don&#8217;t forget the hat).<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part F: What is the value of the correlation coefficient?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q288608\">Hint<\/span><\/p>\n<div id=\"q288608\" class=\"hidden-answer\" style=\"display: none\">Make sure to use the proper letter.<\/div>\n<\/div>\n<\/div>\n<p>You&#8217;ve seen some of these ideas before in <em>Forming Connections<\/em>\u00a0[5A]. Review those notes now to answer Question 6. 
Make sure to have your answer for Question 6 handy as you begin the upcoming <em>Forming Connections\u00a0<\/em>activity!<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 6<\/h3>\n<p>Look over your notes from In-Class Activity 5.A. Write down three different examples where you noted an explanatory variable that can be used to predict a response variable and where both variables are quantitative. One scenario should have a positive association, one should have a negative association, and one should have no association or almost no association.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q978167\">Hint<\/span><\/p>\n<div id=\"q978167\" class=\"hidden-answer\" style=\"display: none\">Feel free to return to [section 5A] to build up notes if you didn&#8217;t the first time.<\/div>\n<\/div>\n<p>You will be looking at your three examples at the beginning of the upcoming <em>Forming Connections<\/em>. 
Make sure you have your examples available at the start of that activity.<\/p>\n<\/div>\n<h2>Summary<\/h2>\n<p>In this <em>What to Know\u00a0<\/em>page,\u00a0you learned to recognize when a linear regression analysis is appropriate, how to identify the explanatory and response variables in bivariate data, and how to calculate the line of best fit.\u00a0 Let\u2019s summarize the skills as you saw them in each question.<\/p>\n<ul>\n<li>In Questions 3, 4, and Question 5 Part A, you identified when a linear regression analysis might be appropriate.<\/li>\n<li>In Questions 1, 2, and Question 5 Parts B and C, you identified the explanatory and response variables in a given scenario.<\/li>\n<li>In Question 5 Parts D through F, you calculated the line of best fit and wrote it using proper notation.<\/li>\n<\/ul>\n<p>If you feel comfortable with these ideas, it\u2019s time to move on to <em>Forming Connections<\/em> in the next activity!<\/p>\n","protected":false},"author":428269,"menu_order":3,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-3841","chapter","type-chapter","status-publish","hentry"],"part":4241,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3841","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/users\/428269"}],"version-history":[{"count":11,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/cha
pters\/3841\/revisions"}],"predecessor-version":[{"id":4837,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3841\/revisions\/4837"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/parts\/4241"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3841\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/media?parent=3841"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapter-type?post=3841"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/contributor?post=3841"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/license?post=3841"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}