{"id":3453,"date":"2022-03-02T15:38:50","date_gmt":"2022-03-02T15:38:50","guid":{"rendered":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/?post_type=chapter&#038;p=3453"},"modified":"2022-03-29T18:50:17","modified_gmt":"2022-03-29T18:50:17","slug":"forming-connections-in-1c","status":"publish","type":"chapter","link":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/chapter\/forming-connections-in-1c\/","title":{"raw":"Forming Connections in 1C: Data Collection and Organization","rendered":"Forming Connections in 1C: Data Collection and Organization"},"content":{"raw":"<div class=\"textbox learning-objectives\">\r\n<h3>Objectives for the activity<\/h3>\r\nDuring this activity, you will:\r\n<ul>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Organize data in a spreadsheet.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Distinguish between observational units and variables in a dataset.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Distinguish between categorical and quantitative variables.<\/li>\r\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Distinguish between quantitative variables that are discrete or continuous.<\/li>\r\n \t<li aria-level=\"1\">Identify variables that can be used to collect data.<\/li>\r\n<\/ul>\r\n<\/div>\r\nIn\u00a0<em>What to Know [1C]<\/em>, you learned to distinguish between statistical investigative questions and survey questions. You also began to see that some data could be numerical or non-numerical. In this activity, we'll extend your understanding of statistical problem-solving by learning some key terms and organizational strategies associated with data collection.\r\n\r\nRecall the four steps of the statistical problem-solving process, from (1) forming a statistical question and (2) collecting data to (3) analyzing the data and (4) interpreting the results. Today we\u2019ll consider the connection between the first two steps. 
That is, how do we get from the statistical investigative question to a data collection plan? Along the way, you'll be able to see that there\u00a0are multiple data collection and organization strategies that may be considered for a single statistical question. You'll also consider ethical obligations related to data collection and storage.\r\n<h2>Data Collection and Organization<\/h2>\r\nIn practice, there are often multiple data collection options to consider. For example, if we were interested in the relationship between phone use in class and grades, there are many ways to define the relevant variables and collect and organize the information.\r\n\r\n<img class=\"wp-image-1064 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/11231620\/Picture71-300x200.jpg\" alt=\"Several people sitting in a row all using their smartphones.\" width=\"726\" height=\"484\" \/>\r\n\r\nConsider Question 1 below individually, then compare your answer with a partner and discuss the similarities and differences in your answers.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 1<\/h3>\r\nDo you think there is a relationship between a student\u2019s phone use in class and their grades? Are there any details about \u201cphone use\u201d that are important to consider?\r\n\r\n[reveal-answer q=\"536624\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"536624\"]What do you think? Are there different ways to use a phone in class, some helpful, some not so helpful? [\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Data Organization<\/h3>\r\nA dataset contains information about a group of individuals or <strong>observational units<\/strong>. The characteristics of these observational units are recorded as <strong>variables<\/strong>. For example, the researcher collecting data on student phone use might ask individual students to report the number of times they checked their messages during class. 
In this case, the variable is the number of times messages were checked during class, and the observational unit is one student. Before the data can be analyzed, it needs to be organized into a spreadsheet of rows and columns. See the example below for a demonstration.\r\n<div class=\"textbox exercises\">\r\n<h3>Example<\/h3>\r\nPicture yourself as the researcher collecting responses to many survey questions (<strong>variables<\/strong>) from each individual (<strong>observational unit<\/strong>) you survey. The data will be organized into a spreadsheet, which consists of rows and columns. Naturally, there are only two possibilities for arranging the variable responses for each individual surveyed.\r\n\r\nWhich of the following two options do you think represents the way observational units and variables are usually organized in a spreadsheet?\r\n\r\nOption A: Each row is a variable and each column is an observational unit\r\n<table style=\"border-collapse: collapse; width: 99.7423%;\" border=\"1\">\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 33.2907%;\">Variables<\/td>\r\n<td style=\"width: 33.2907%;\">Individual 1<\/td>\r\n<td style=\"width: 33.2907%;\">Individual 2<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 33.2907%;\">Variable 1<\/td>\r\n<td style=\"width: 33.2907%;\">response 1<\/td>\r\n<td style=\"width: 33.2907%;\">response 1<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 33.2907%;\">Variable 2<\/td>\r\n<td style=\"width: 33.2907%;\">response 2<\/td>\r\n<td style=\"width: 33.2907%;\">response 2<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 33.2907%;\">Variable 3<\/td>\r\n<td style=\"width: 33.2907%;\">response 3<\/td>\r\n<td style=\"width: 33.2907%;\">response 3<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nOption B: Each row is an observational unit and each column is a variable\r\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 183.891px;\">Individual surveyed<\/td>\r\n<td 
style=\"width: 183.891px;\">Variable 1<\/td>\r\n<td style=\"width: 183.891px;\">Variable 2<\/td>\r\n<td style=\"width: 183.922px;\">Variable 3<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 183.891px;\">Individual 1<\/td>\r\n<td style=\"width: 183.891px;\">response 1<\/td>\r\n<td style=\"width: 183.891px;\">response 2<\/td>\r\n<td style=\"width: 183.922px;\">response 3<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 183.891px;\">Individual 2<\/td>\r\n<td style=\"width: 183.891px;\">response 1<\/td>\r\n<td style=\"width: 183.891px;\">response 2<\/td>\r\n<td style=\"width: 183.922px;\">response 3<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[reveal-answer q=\"251414\"]Show Answer[\/reveal-answer]\r\n[hidden-answer a=\"251414\"]If you haven't prepared spreadsheets for data collection before, it may not be obvious which of the two organization strategies is preferred. Option B, with one row for each observational unit and one column for each variable, is the usual, recommended[footnote]<a href=\"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/00031305.2017.1375989\">https:\/\/www.biostat.wisc.edu\/~kbroman\/publications\/dataorg.pdf<\/a>[\/footnote] way to organize a spreadsheet because it makes the data easier to analyze. [\/hidden-answer]\r\n\r\n<span style=\"background-color: #ffff99;\">[The hidden answer includes a link to an open access article: Data Organization in Spreadsheets published in <em>The American Statistician<\/em>\u00a0and located at\u00a0 Taylor &amp; Francis Online. Please edit as needed in the preferred citation style.]<\/span>\r\n\r\n<\/div>\r\nAre you beginning to develop an image of how data can be organized in a spreadsheet? Answer Question 2 below to check your understanding.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 2<\/h3>\r\nA dataset contains information about a group of individuals or <strong>observational units<\/strong>. The characteristics of these observational units are recorded as <strong>variables<\/strong>. 
How are observational units and variables usually organized in a spreadsheet?\r\n\r\n[reveal-answer q=\"153850\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"153850\"]Look back at Options A and B in the example above. In which arrangement does each row hold all of the responses from a single individual?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Types of Variables<\/h3>\r\nA variable is classified as <strong>categorical<\/strong> if it places an individual into one of several groups; it is classified as <strong>quantitative<\/strong> if it takes numerical values that can be used in arithmetic.\r\n\r\nThere are two types of quantitative variables. A <strong>discrete<\/strong> variable can take only separate, countable values (such as whole numbers), and it is not possible to get any value in between. In contrast, a <strong>continuous<\/strong> variable can take any value within a range, so there are infinitely many possible values between any two outcomes. The discussion below provides a demonstration and examples of these types of variables. Try the question given in the Example before moving to Question 3.\r\n<div class=\"textbox exercises\">\r\n<h3>Example<\/h3>\r\n<strong>Categorical Variables<\/strong>\r\n\r\nThese variables place an individual into one of several groups. Categorical survey questions are often encountered when completing forms that ask for information such as gender and race.\r\n\r\n<strong>Quantitative Variables<\/strong>\r\n\r\nQuantitative variables may be discrete or continuous.\r\n\r\n<strong>Discrete<\/strong> variables often require non-negative whole numbers as responses. For example, an automobile insurance applicant may be asked in how many accidents they were found to be at fault. 
Responses would necessarily be a whole number like [latex]0[\/latex],\u00a0[latex]1[\/latex], or\u00a0[latex]2[\/latex].\r\n\r\n<strong>Continuous<\/strong> variables can take any value within a range, including fractions of a unit, such as weight in pounds ([latex]155[\/latex], [latex]187.2[\/latex], or [latex]221.9[\/latex]).\r\n\r\nEx. Imagine that you have been selected as a statistics intern in a veterinary clinic. 
The veterinarian wants to collect data about the dogs seen in her office. You've been asked to record information from the patient files to answer the survey questions listed below. For each, state whether the associated variable is categorical, discrete quantitative, or continuous quantitative and explain how you know.\r\n<ol>\r\n \t<li>What zip code does the dog's owner live in?<\/li>\r\n \t<li>What is the dog's weight in pounds?<\/li>\r\n \t<li>How many times has the dog been seen in the office?<\/li>\r\n \t<li>Does the owner have an outstanding balance due?<\/li>\r\n \t<li>How many pets are in the household in addition to the dog? ([latex]0[\/latex], [latex]1-2[\/latex], [latex]3-5[\/latex], more than [latex]5[\/latex])<\/li>\r\n<\/ol>\r\n[reveal-answer q=\"318506\"]Show Answer[\/reveal-answer]\r\n[hidden-answer a=\"318506\"]\r\n<ol>\r\n \t<li>Categorical: this question places individuals into one of a finite set of zip codes. If you thought this might be quantitative, note that it doesn't make sense to perform arithmetic on this variable (an average zip code is meaningless!)<\/li>\r\n \t<li>Quantitative continuous:\u00a0weights may be recorded in fractions of a pound such as 60.0, 30.833, or 19.5. This response can also be restricted to weights rounded to the nearest whole pound, which would make it a discrete variable.<\/li>\r\n \t<li>Quantitative discrete: it would be impossible to have been seen 1.5 times.<\/li>\r\n \t<li>Categorical: this question requires one of two responses: yes or no.<\/li>\r\n \t<li>Categorical: this may seem like a quantitative variable at first, but the response asks for one of four categories.<\/li>\r\n<\/ol>\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\nNow try identifying the types of variables in survey questions yourself. Work in pairs to discuss the list of survey questions given in Question 3.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 3<\/h3>\r\nConsider the survey questions below. 
If you used these questions to collect data, would the resulting variables be categorical or quantitative? For variables that are quantitative, classify them as discrete or continuous.\r\n\r\n&nbsp;\r\n\r\nWhat type of mobile phone do you have? (iPhone, Android, other)\r\n\r\n[reveal-answer q=\"242358\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"242358\"]Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nWhat is your area code?\r\n\r\n[reveal-answer q=\"980274\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"980274\"]Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nHow many devices capable of connecting to the Internet do you bring with you to class on a typical day?\r\n\r\n[reveal-answer q=\"638864\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"638864\"]Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nHow much time did you spend on your phone yesterday? 
(less than 2 hours, 2\u20135 hours, more than 5 hours)\r\n\r\n[reveal-answer q=\"574624\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"574624\"]Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nApproximately how much time do you spend on your phone in a typical day?\r\n\r\n[reveal-answer q=\"990667\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"990667\"]Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nDo you usually spend more time on your phone on weekdays or on weekends?\r\n\r\n[reveal-answer q=\"281623\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"281623\"]Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Data Collection<\/h3>\r\nDifferent survey questions offer different advantages and disadvantages for data collection. For example, it may be easier to remember how much time you spent on your phone yesterday than to describe your general habits, but a single day of phone use may not be representative of your phone use in general. The next few questions ask you to consider a statistical question that allows for many different options for collecting data, each with its own advantages and disadvantages.\r\n\r\nWork in groups of two or three to complete the remainder of this activity.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 4<\/h3>\r\nSuppose you want to investigate whether there is a relationship between a student\u2019s phone use in class and their grades. Write survey questions or state variable names to answer Parts A and B below.\r\n\r\n&nbsp;\r\n\r\nPart A: List three ways that you could measure phone use. 
Make sure your list involves at least one categorical variable and at least one quantitative variable.\r\n\r\n[reveal-answer q=\"947182\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"947182\"]Consider what types of data different types of variables take.[\/hidden-answer]\r\n\r\n&nbsp;\r\n\r\nPart B: List three ways that you could measure grades. Make sure your list involves at least one categorical variable and at least one quantitative variable.\r\n\r\n[reveal-answer q=\"671060\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"671060\"]Consider what types of data different types of variables take.[\/hidden-answer]\r\n\r\nPart C: Revisit the lists that you made in Parts A and B. Which of the approaches do you like the best?\r\n\r\n[reveal-answer q=\"898393\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"898393\"]What do <em>you<\/em> think? Which of the items you listed do you think will reveal the kind of data that will help to answer the research question?[\/hidden-answer]\r\n\r\n<\/div>\r\nWhen you developed variables for Question 4, you treated individual students as the observational units. How might your variable selections change if the observational unit shifts from individuals to class sections of students? Keep in mind that variables should describe characteristics of the observational units as you answer Question 5 next. 
How should the data collection change when observing class sections rather than individuals?\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 5<\/h3>\r\nSuppose you want to investigate the relationship between phone use in class and grades using class sections as the observational units instead of individual students.\u00a0Write survey questions or state variable names to answer both of the questions below.\r\n\r\n&nbsp;\r\n\r\nPart A: Name one way you could measure phone use in a class section.\r\n\r\n[reveal-answer q=\"77027\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"77027\"]Consider what types of data different types of variables take. How can the data reveal information about the class section rather than individual students?[\/hidden-answer]\r\n\r\nPart B: Name one way you could measure grades in a class section.\r\n\r\n[reveal-answer q=\"750738\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"750738\"]Consider what types of data different types of variables take. How can the data reveal information about the class section (e.g., class policies, average grades) rather than individual students?[\/hidden-answer]\r\n\r\n<\/div>\r\n<h3>Ethical Issues<\/h3>\r\nThink back on the survey questions and variables given in this activity as well as the ones you wrote to answer Questions 4 and 5. Do you have any ethical concerns about the data collection plans proposed? Some examples to consider include:\r\n<ul>\r\n \t<li>In what ways could the data collection process or the information revealed cause some students to be treated differently than others?<\/li>\r\n \t<li>In what ways could the questions asked or methods of collection cause the data to imply associations that are not representative of the true situation?<\/li>\r\n \t<li>Are there any ethical considerations surrounding how the collected data will be stored?<\/li>\r\n<\/ul>\r\n<div class=\"textbox exercises\">\r\n<h3>Example<\/h3>\r\nPrivacy concerns in data collection are paramount. 
You may be familiar with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.[footnote]<a href=\"https:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/special-topics\/de-identification\/index.html\">https:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/special-topics\/de-identification\/index.html<\/a>[\/footnote] For individuals at least 18 years old, this rule generally prevents the individual's medical information from being revealed to anyone whom the individual has not identified as eligible to receive it.\r\n\r\nA similar protection exists for college students at least 18 years old: a federal law called the Family Educational Rights and Privacy Act (FERPA)[footnote]The U.S. Department of Education provides information about FERPA on its website: <a href=\"https:\/\/www2.ed.gov\/policy\/gen\/guid\/fpco\/ferpa\">https:\/\/www2.ed.gov\/policy\/gen\/guid\/fpco\/ferpa<\/a>[\/footnote] protects the privacy of student records.\r\n\r\nHow could data be collected and stored in a study of phone use and grades so that the privacy of student information is protected?\r\n\r\n[reveal-answer q=\"466002\"]Show Answer[\/reveal-answer]\r\n[hidden-answer a=\"466002\"]Any personally identifiable information obtained during data collection from students must be removed to protect the privacy of student information.\r\n<ul>\r\n \t<li>Data that has been <em>de-identified<\/em> has had all personally identifying information removed from the data.<\/li>\r\n \t<li>Data that has been <em>anonymized<\/em> has been permanently de-identified so that the personally identifying information can never be reassociated with the data.<\/li>\r\n<\/ul>\r\n[\/hidden-answer]\r\n\r\n<\/div>\r\nWork in pairs or groups to summarize your understanding of the ethical concerns associated with data collection and storage as you answer Question 6.\r\n<div class=\"textbox key-takeaways\">\r\n<h3>Question 6<\/h3>\r\nAre there any ethical concerns associated with a study of phone use 
and grades?\r\n\r\n[reveal-answer q=\"318324\"]Hint[\/reveal-answer]\r\n[hidden-answer a=\"318324\"]Consider privacy concerns surrounding data collection and storage.[\/hidden-answer]\r\n\r\n<\/div>\r\n<span style=\"background-color: #ffff99;\">[Note: a question could be inserted here to specifically add an LO for ethics as needed.]<\/span>","rendered":"<div class=\"textbox learning-objectives\">\n<h3>Objectives for the activity<\/h3>\n<p>During this activity, you will:<\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Organize data in a spreadsheet.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Distinguish between observational units and variables in a dataset.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Distinguish between categorical and quantitative variables.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Distinguish between quantitative variables that are discrete or continuous.<\/li>\n<li aria-level=\"1\">Identify variables that can be used to collect data.<\/li>\n<\/ul>\n<\/div>\n<p>In\u00a0<em>What to Know [1C]<\/em>, you learned to distinguish between statistical investigative questions and survey questions. You also began to see that some data could be numerical or non-numerical. In this activity, we&#8217;ll extend your understanding of statistical problem-solving by learning some key terms and organizational strategies associated with data collection.<\/p>\n<p>Recall the four steps of the statistical problem-solving process, from (1) forming a statistical question and (2) collecting data to (3) analyzing the data and (4) interpreting the results. Today we\u2019ll consider the connection between the first two steps. That is, how do we get from the statistical investigative question to a data collection plan? Along the way, you&#8217;ll be able to see that there\u00a0are multiple data collection and organization strategies that may be considered for a single statistical question. 
You&#8217;ll also consider ethical obligations related to data collection and storage.<\/p>\n<h2>Data Collection and Organization<\/h2>\n<p>In practice, there are often multiple data collection options to consider. For example, if we were interested in the relationship between phone use in class and grades, there are many ways to define the relevant variables and collect and organize the information.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1064 aligncenter\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5738\/2022\/01\/11231620\/Picture71-300x200.jpg\" alt=\"Several people sitting in a row all using their smartphones.\" width=\"726\" height=\"484\" \/><\/p>\n<p>Consider Question 1 below individually, then compare your answer with a partner and discuss the similarities and differences in your answers.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 1<\/h3>\n<p>Do you think there is a relationship between a student\u2019s phone use in class and their grades? Are there any details about \u201cphone use\u201d that are important to consider?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q536624\">Hint<\/span><\/p>\n<div id=\"q536624\" class=\"hidden-answer\" style=\"display: none\">What do you think? Are there different ways to use a phone in class, some helpful, some not so helpful? <\/div>\n<\/div>\n<\/div>\n<h3>Data Organization<\/h3>\n<p>A dataset contains information about a group of individuals or <strong>observational units<\/strong>. The characteristics of these observational units are recorded as <strong>variables<\/strong>. For example, the researcher collecting data on student phone use might ask individual students to report the number of times they checked their messages during class. 
In this case, the variable is the number of times messages were checked during class, and the observational unit is one student. Before the data can be analyzed, it needs to be organized into a spreadsheet of rows and columns. See the example below for a demonstration.<\/p>\n<div class=\"textbox exercises\">\n<h3>Example<\/h3>\n<p>Picture yourself as the researcher collecting responses to many survey questions (<strong>variables<\/strong>) from each individual (<strong>observational unit<\/strong>) you survey. The data will be organized into a spreadsheet, which consists of rows and columns. Naturally, there are only two possibilities for arranging the variable responses for each individual surveyed.<\/p>\n<p>Which of the following two options do you think represents the way observational units and variables are usually organized in a spreadsheet?<\/p>\n<p>Option A: Each row is a variable and each column is an observational unit<\/p>\n<table style=\"border-collapse: collapse; width: 99.7423%;\">\n<tbody>\n<tr>\n<td style=\"width: 33.2907%;\">Variables<\/td>\n<td style=\"width: 33.2907%;\">Individual 1<\/td>\n<td style=\"width: 33.2907%;\">Individual 2<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 33.2907%;\">Variable 1<\/td>\n<td style=\"width: 33.2907%;\">response 1<\/td>\n<td style=\"width: 33.2907%;\">response 1<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 33.2907%;\">Variable 2<\/td>\n<td style=\"width: 33.2907%;\">response 2<\/td>\n<td style=\"width: 33.2907%;\">response 2<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 33.2907%;\">Variable 3<\/td>\n<td style=\"width: 33.2907%;\">response 3<\/td>\n<td style=\"width: 33.2907%;\">response 3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Option B: Each row is an observational unit and each column is a variable<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 183.891px;\">Individual surveyed<\/td>\n<td style=\"width: 183.891px;\">Variable 1<\/td>\n<td style=\"width: 
183.891px;\">Variable 2<\/td>\n<td style=\"width: 183.922px;\">Variable 3<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 183.891px;\">Individual 1<\/td>\n<td style=\"width: 183.891px;\">response 1<\/td>\n<td style=\"width: 183.891px;\">response 2<\/td>\n<td style=\"width: 183.922px;\">response 3<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 183.891px;\">Individual 2<\/td>\n<td style=\"width: 183.891px;\">response 1<\/td>\n<td style=\"width: 183.891px;\">response 2<\/td>\n<td style=\"width: 183.922px;\">response 3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q251414\">Show Answer<\/span><\/p>\n<div id=\"q251414\" class=\"hidden-answer\" style=\"display: none\">If you haven&#8217;t prepared spreadsheets for data collection before, it may not be obvious which of the two organization strategies is preferred. Option B, with one row for each observational unit and one column for each variable, is the usual, recommended<a class=\"footnote\" title=\"https:\/\/www.biostat.wisc.edu\/~kbroman\/publications\/dataorg.pdf\" id=\"return-footnote-3453-1\" href=\"#footnote-3453-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a> way to organize a spreadsheet because it makes the data easier to analyze. <\/div>\n<\/div>\n<p><span style=\"background-color: #ffff99;\">[The hidden answer includes a link to an open access article: Data Organization in Spreadsheets published in <em>The American Statistician<\/em>\u00a0and located at\u00a0 Taylor &amp; Francis Online. Please edit as needed in the preferred citation style.]<\/span><\/p>\n<\/div>\n<p>Are you beginning to develop an image of how data can be organized in a spreadsheet? Answer Question 2 below to check your understanding.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 2<\/h3>\n<p>A dataset contains information about a group of individuals or <strong>observational units<\/strong>. 
The characteristics of these observational units are recorded as <strong>variables<\/strong>. How are observational units and variables usually organized in a spreadsheet?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q153850\">Hint<\/span><\/p>\n<div id=\"q153850\" class=\"hidden-answer\" style=\"display: none\">Look back at Options A and B in the example above. In which arrangement does each row hold all of the responses from a single individual?<\/div>\n<\/div>\n<\/div>\n<h3>Types of Variables<\/h3>\n<p>A variable is classified as <strong>categorical<\/strong> if it places an individual into one of several groups; it is classified as <strong>quantitative<\/strong> if it takes numerical values that can be used in arithmetic.<\/p>\n<p>There are two types of quantitative variables. A <strong>discrete<\/strong> variable can take only separate, countable values (such as whole numbers), and it is not possible to get any value in between. In contrast, a <strong>continuous<\/strong> variable can take any value within a range, so there are infinitely many possible values between any two outcomes. The discussion below provides a demonstration and examples of these types of variables. Try the question given in the Example before moving to Question 3.<\/p>\n<div class=\"textbox exercises\">\n<h3>Example<\/h3>\n<p><strong>Categorical Variables<\/strong><\/p>\n<p>These variables place an individual into one of several groups. Categorical survey questions are often encountered when completing forms that ask for information such as gender and race.<\/p>\n<p><strong>Quantitative Variables<\/strong><\/p>\n<p>Quantitative variables may be discrete or continuous.<\/p>\n<p><strong>Discrete<\/strong> variables often require non-negative whole numbers as responses. For example, an automobile insurance applicant may be asked in how many accidents they were found to be at fault. 
Responses would necessarily be a whole number like [latex]0[\/latex],\u00a0[latex]1[\/latex], or\u00a0[latex]2[\/latex].<\/p>\n<p><strong>Continuous<\/strong> variables can take any value within a range, including fractions of a unit, such as weight in pounds ([latex]155[\/latex], [latex]187.2[\/latex], or [latex]221.9[\/latex]).<\/p>\n<p>Ex. Imagine that you have been selected as a statistics intern in a veterinary clinic. The veterinarian wants to collect data about the dogs seen in her office. You&#8217;ve been asked to record information from the patient files to answer the survey questions listed below. For each, state whether the associated variable is categorical, discrete quantitative, or continuous quantitative and explain how you know.<\/p>\n<ol>\n<li>What zip code does the dog&#8217;s owner live in?<\/li>\n<li>What is the dog&#8217;s weight in pounds?<\/li>\n<li>How many times has the dog been seen in the office?<\/li>\n<li>Does the owner have an outstanding balance due?<\/li>\n<li>How many pets are in the household in addition to the dog? ([latex]0[\/latex], [latex]1-2[\/latex], [latex]3-5[\/latex], more than [latex]5[\/latex])<\/li>\n<\/ol>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q318506\">Show Answer<\/span><\/p>\n<div id=\"q318506\" class=\"hidden-answer\" style=\"display: none\">\n<ol>\n<li>Categorical: this question places individuals into one of a finite set of zip codes. If you thought this might be quantitative, note that it doesn&#8217;t make sense to perform arithmetic on this variable (an average zip code is meaningless!)<\/li>\n<li>Quantitative continuous:\u00a0weights may be recorded in fractions of a pound such as 60.0, 30.833, or 19.5. 
This response can also be restricted to weights rounded to the nearest whole pound, which would make it a discrete variable.<\/li>\n<li>Quantitative discrete: it would be impossible to have been seen 1.5 times.<\/li>\n<li>Categorical: this question requires one of two responses: yes or no.<\/li>\n<li>Categorical: this may seem like a quantitative variable at first, but the response asks for one of four categories.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<\/div>\n<p>Now try identifying the types of variables in survey questions yourself. Work in pairs to discuss the list of survey questions given in Question 3.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 3<\/h3>\n<p>Consider the survey questions below. If you used these questions to collect data, would the resulting variables be categorical or quantitative? For variables that are quantitative, classify them as discrete or continuous.<\/p>\n<p>&nbsp;<\/p>\n<p>What type of mobile phone do you have? (iPhone, Android, other)<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q242358\">Hint<\/span><\/p>\n<div id=\"q242358\" class=\"hidden-answer\" style=\"display: none\">Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>What is your area code?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q980274\">Hint<\/span><\/p>\n<div id=\"q980274\" class=\"hidden-answer\" style=\"display: none\">Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>How many devices capable of connecting to the Internet do you bring with you to class on a typical day?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span 
class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q638864\">Hint<\/span><\/p>\n<div id=\"q638864\" class=\"hidden-answer\" style=\"display: none\">Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>How much time did you spend on your phone yesterday? (less than 2 hours, 2\u20135 hours, more than 5 hours)<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q574624\">Hint<\/span><\/p>\n<div id=\"q574624\" class=\"hidden-answer\" style=\"display: none\">Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Approximately how much time do you spend on your phone in a typical day?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q990667\">Hint<\/span><\/p>\n<div id=\"q990667\" class=\"hidden-answer\" style=\"display: none\">Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Do you usually spend more time on your phone on weekdays or on weekends?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q281623\">Hint<\/span><\/p>\n<div id=\"q281623\" class=\"hidden-answer\" style=\"display: none\">Consider what type of answer this question could take: one of a list of options, a whole number, or a fraction of a number.<\/div>\n<\/div>\n<\/div>\n<h3>Data Collection<\/h3>\n<p>Different survey questions offer different advantages and disadvantages for data collection. 
For example, it may be easier to remember how much time you spent on your phone yesterday than to describe your general habits, but a single day of phone use may not be representative of your phone use in general. The next few questions ask you to consider a statistical question that allows for many different options for collecting data, each with its own advantages and disadvantages.<\/p>\n<p>Work in groups of two or three to complete the remainder of this activity.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 4<\/h3>\n<p>Suppose you want to investigate whether there is a relationship between a student\u2019s phone use in class and their grades. Write survey questions or state variable names to answer Parts A and B below.<\/p>\n<p>&nbsp;<\/p>\n<p>Part A: List three ways that you could measure phone use. Make sure your list includes at least one categorical variable and at least one quantitative variable.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q947182\">Hint<\/span><\/p>\n<div id=\"q947182\" class=\"hidden-answer\" style=\"display: none\">Consider what types of data different types of variables take.<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Part B: List three ways that you could measure grades. Make sure your list includes at least one categorical variable and at least one quantitative variable.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q671060\">Hint<\/span><\/p>\n<div id=\"q671060\" class=\"hidden-answer\" style=\"display: none\">Consider what types of data different types of variables take.<\/div>\n<\/div>\n<p>Part C: Revisit the lists that you made in Parts A and B. 
Which of these approaches do you like best?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q898393\">Hint<\/span><\/p>\n<div id=\"q898393\" class=\"hidden-answer\" style=\"display: none\">What do <em>you<\/em> think? Which of the items you listed would produce data that help answer the research question?<\/div>\n<\/div>\n<\/div>\n<p>When you developed variables for the question above, you treated individual students as the observational units. How might your choice of variables change if the observational units were class sections of students instead? As you answer Question 5, keep in mind that each variable must describe a characteristic of the observational units, which here are class sections rather than individual students.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 5<\/h3>\n<p>Suppose you want to investigate the relationship between phone use in class and grades using class sections as the observational units instead of individual students.\u00a0Write survey questions or state variable names to answer both parts below.<\/p>\n<p>&nbsp;<\/p>\n<p>Part A: Name one way you could measure phone use in a class section.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q77027\">Hint<\/span><\/p>\n<div id=\"q77027\" class=\"hidden-answer\" style=\"display: none\">Consider what types of data different types of variables take. 
How can the data reveal information about the class section rather than individual students?<\/div>\n<\/div>\n<p>Part B: Name one way you could measure grades in a class section.<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q750738\">Hint<\/span><\/p>\n<div id=\"q750738\" class=\"hidden-answer\" style=\"display: none\">Consider what types of data different types of variables take. How can the data reveal information about the class section (e.g., class policies, average grades) rather than individual students?<\/div>\n<\/div>\n<\/div>\n<h3>Ethical Issues<\/h3>\n<p>Think back to the survey questions and variables in this activity, as well as the ones you wrote to answer Questions 4 and 5. Do you have any ethical concerns about the proposed data collection plans? Some questions to consider include:<\/p>\n<ul>\n<li>In what ways could the data collection process or the information revealed cause some students to be treated differently than others?<\/li>\n<li>In what ways could the questions asked or methods of collection cause the data to imply associations that are not representative of the true situation?<\/li>\n<li>Are there any ethical considerations surrounding how the collected data will be stored?<\/li>\n<\/ul>\n<div class=\"textbox exercises\">\n<h3>Example<\/h3>\n<p>Privacy concerns in data collection are paramount. 
You may be familiar with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.<a class=\"footnote\" title=\"https:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/special-topics\/de-identification\/index.html\" id=\"return-footnote-3453-2\" href=\"#footnote-3453-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a> For individuals at least 18 years old, this rule generally prevents the individual&#8217;s medical information from being revealed to anyone whom the individual has not authorized to receive it.<\/p>\n<p>A similar protection exists for college students at least 18 years old: the Family Educational Rights and Privacy Act (FERPA),<a class=\"footnote\" title=\"The U.S. Department of Education provides information about FERPA on their website: https:\/\/www2.ed.gov\/policy\/gen\/guid\/fpco\/ferpa\" id=\"return-footnote-3453-4\" href=\"#footnote-3453-4\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a> a federal law, protects the privacy of student records.<\/p>\n<p>In a study of phone use and grades, how could data collection and storage be designed to protect the privacy of student information?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q466002\">Show Answer<\/span><\/p>\n<div id=\"q466002\" class=\"hidden-answer\" style=\"display: none\">Any personally identifiable information obtained from students during data collection must be removed to protect the privacy of student information.<\/p>\n<ul>\n<li>Data that has been <em>de-identified<\/em> has had all personally identifying information removed.<\/li>\n<li>Data that has been <em>anonymized<\/em> has been permanently de-identified, so that the personally identifying information can never be reassociated with the data.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<p>Work in pairs or groups to summarize your understanding of the ethical concerns associated with data collection and storage as 
you answer Question 6.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Question 6<\/h3>\n<p>Are there any ethical concerns associated with a study of phone use and grades?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><span class=\"show-answer collapsed\" style=\"cursor: pointer\" data-target=\"q318324\">Hint<\/span><\/p>\n<div id=\"q318324\" class=\"hidden-answer\" style=\"display: none\">Consider privacy concerns surrounding data collection and storage.<\/div>\n<\/div>\n<\/div>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-3453-1\"><a href=\"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/00031305.2017.1375989\">https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/00031305.2017.1375989<\/a> <a href=\"#return-footnote-3453-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-3453-2\"><a href=\"https:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/special-topics\/de-identification\/index.html\">https:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/special-topics\/de-identification\/index.html<\/a> <a href=\"#return-footnote-3453-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><li id=\"footnote-3453-4\">The U.S. 
Department of Education provides information about FERPA on their website: <a href=\"https:\/\/www2.ed.gov\/policy\/gen\/guid\/fpco\/ferpa\">https:\/\/www2.ed.gov\/policy\/gen\/guid\/fpco\/ferpa<\/a> <a href=\"#return-footnote-3453-4\" class=\"return-footnote\" aria-label=\"Return to footnote 4\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":493460,"menu_order":16,"template":"","meta":{"_candela_citation":"[]","CANDELA_OUTCOMES_GUID":"","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-3453","chapter","type-chapter","status-publish","hentry"],"part":3418,"_links":{"self":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3453","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/users\/493460"}],"version-history":[{"count":13,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3453\/revisions"}],"predecessor-version":[{"id":4247,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3453\/revisions\/4247"}],"part":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/parts\/3418"}],"metadata":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapters\/3453\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/media?parent=3453"}],"wp:term":[{"taxonomy":"ch
apter-type","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/pressbooks\/v2\/chapter-type?post=3453"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/contributor?post=3453"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/courses.lumenlearning.com\/lumen-danacenter-statsmockup\/wp-json\/wp\/v2\/license?post=3453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}