In the next preview assignment and in the next class, you will need to understand contingency tables (also called two-way tables). For the preview assignment in particular, you will need to understand conditional and marginal distributions.
Matching Book Distribution
In this corequisite support activity, we’ll be using two-way tables (also called contingency tables), which classify the counts for two categorical variables measured on the same set of individuals. The Pew Research Center is a nonpartisan social science research think tank. One of the surveys it conducts periodically is the Core Trends Survey, which polls a representative sample of American adults on a multitude of variables. The contingency table of observed counts for the variables Number of books read in the last year and Type of residence is given below.
| Number of books read | Type of residence: Urban | Type of residence: Suburban | Type of residence: Rural | Total |
|---|---|---|---|---|
| None | 133 | 144 | 81 | 358 |
| 1-4 | 146 | 149 | 53 | 348 |
| 5-9 | 76 | 74 | 33 | 183 |
| 10+ | 194 | 216 | 76 | 486 |
| Total | 549 | 583 | 243 | 1,375 |
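For readers who like to check a table with software, here is a minimal Python sketch. The counts come from the table above; the variable names (`counts`, `row_totals`, `column_totals`, `grand_total`) are illustrative assumptions, not part of the survey. The sketch stores only the interior counts and recomputes the totals from them.

```python
# Interior counts from the contingency table above.
# Outer keys: Number of books read; inner keys: Type of residence.
counts = {
    "None": {"Urban": 133, "Suburban": 144, "Rural": 81},
    "1-4":  {"Urban": 146, "Suburban": 149, "Rural": 53},
    "5-9":  {"Urban": 76,  "Suburban": 74,  "Rural": 33},
    "10+":  {"Urban": 194, "Suburban": 216, "Rural": 76},
}

# Row totals: add across the residence types for each reading category.
row_totals = {books: sum(row.values()) for books, row in counts.items()}

# Column totals: add down the reading categories for each residence type.
residences = ["Urban", "Suburban", "Rural"]
column_totals = {r: sum(row[r] for row in counts.values()) for r in residences}

# Grand total: everyone in the table.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

If the recomputed totals agree with the Total row and Total column of the table, the interior counts were entered correctly.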
The conditional distribution of one variable with respect to a value of a second variable gives the counts or the relative frequencies of the first variable restricted to only that value of the second variable. In terms of the table, this means we will restrict ourselves to either one row or one column of the interior part of the table. For example, if we consider the conditional distribution of Number of books read in the last year for people who live in urban residences (as shown in the following table), we are restricting ourselves to the “Urban” column of the table and looking at the distribution of Number of books read in the last year for just the urban dwellers.
| Number of books read | Urban |
|---|---|
| None | 133 |
| 1-4 | 146 |
| 5-9 | 76 |
| 10+ | 194 |
| Total | 549 |
Often, when we discuss the conditional distribution, we’re more interested in the relative frequencies, or the proportion corresponding to each value of the variable of interest. For example, among all the people living in an urban setting, the relative frequency of individuals who read no books in the last year is:
[latex]\frac{133}{549}\approx 0.2423=24.23\%[/latex]
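The same division can be repeated for every category in a column. The sketch below is one possible way to do that in Python; the function name `conditional_distribution` is a hypothetical helper, and it is applied here to the Rural column as an illustration. Each count in the chosen column is divided by that column's total.

```python
# Counts of Number of books read for each Type of residence (from the table above).
counts = {
    "None": {"Urban": 133, "Suburban": 144, "Rural": 81},
    "1-4":  {"Urban": 146, "Suburban": 149, "Rural": 53},
    "5-9":  {"Urban": 76,  "Suburban": 74,  "Rural": 33},
    "10+":  {"Urban": 194, "Suburban": 216, "Rural": 76},
}

def conditional_distribution(counts, residence):
    """Relative frequencies of books read among people in one residence type."""
    # Restrict attention to a single column of the table.
    column = {books: row[residence] for books, row in counts.items()}
    column_total = sum(column.values())
    # Divide each count by the column total.
    return {books: n / column_total for books, n in column.items()}

rural = conditional_distribution(counts, "Rural")
for books, proportion in rural.items():
    print(f"{books}: {proportion:.4f}")
```

Calling the function with `"Urban"` instead of `"Rural"` restricts the calculation to the urban column, whose first value is the 133/549 shown above.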
Question 1
1) Complete the following table for the relative frequencies of the conditional distribution of Number of books read in the last year for urban dwellers.
| Number of books read | Relative frequency for urban dwellers (as a proportion) |
|---|---|
| None | 0.2423 |
| 1-4 | |
| 5-9 | |
| 10+ | |
| Total | 1 |
The marginal distribution of a variable gives the distribution of one of the variables with no regard to the other variable whatsoever. In the table, this will be either the total row or the total column. One way to remember this is that the “margins” are on the outsides of a piece of paper (sides, top, and bottom), and the total row and column are the outside row and column of the table (on the side and bottom).
For example, if we are considering the marginal distribution of Number of books read in the last year, we will look only at the totals in the far right column of the table because those give us the counts for each category of the variable Number of books read in the last year, with no regard to the other variable.
| Number of books read | Total |
|---|---|
| None | 358 |
| 1-4 | 348 |
| 5-9 | 183 |
| 10+ | 486 |
| Total | 1,375 |
As before, we are often interested in the relative frequencies of the marginal distribution. For example, the relative frequency of individuals who read no books last year is:
[latex]\frac{358}{1375}\approx 0.2604=26.04\%[/latex]
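A marginal distribution can come from the total row just as well as from the total column. As an illustration, the Python sketch below computes the marginal distribution of the other variable, Type of residence, from the bottom row of totals. The function name `marginal_of_residence` is a hypothetical helper chosen for this example.

```python
# Counts of Number of books read for each Type of residence (from the table above).
counts = {
    "None": {"Urban": 133, "Suburban": 144, "Rural": 81},
    "1-4":  {"Urban": 146, "Suburban": 149, "Rural": 53},
    "5-9":  {"Urban": 76,  "Suburban": 74,  "Rural": 33},
    "10+":  {"Urban": 194, "Suburban": 216, "Rural": 76},
}

def marginal_of_residence(counts):
    """Relative frequencies of Type of residence, ignoring books read."""
    totals = {}
    # Add down each column to rebuild the Total row of the table.
    for row in counts.values():
        for residence, n in row.items():
            totals[residence] = totals.get(residence, 0) + n
    grand_total = sum(totals.values())
    # Divide each column total by the grand total.
    return {residence: n / grand_total for residence, n in totals.items()}

residence_marginal = marginal_of_residence(counts)
for residence, proportion in residence_marginal.items():
    print(f"{residence}: {proportion:.4f}")
```

The denominator here is the grand total of 1,375, not a single row or column total, because a marginal distribution uses every respondent.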
Question 2
2) Complete the following table with the relative frequencies for the marginal distribution of Number of books read in the last year by respondents.
| Number of books read | Relative frequency (as a percentage) |
|---|---|
| None | |
| 1-4 | |
| 5-9 | |
| 10+ | |
| Total | 100% |
Note: Sometimes the percentages will not sum to exactly 100%. This happens because each percentage is rounded separately, and the small rounding errors can add up.
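To see this rounding effect in action, here is a small Python sketch that uses the Suburban column of the contingency table and rounds each percentage to the nearest whole number. Rounding to whole percentages is an assumption made for this illustration only; the variable names are also illustrative.

```python
# Suburban column of the contingency table (counts by Number of books read).
suburban = {"None": 144, "1-4": 149, "5-9": 74, "10+": 216}
total = sum(suburban.values())  # 583 suburban dwellers

# Exact percentages, then each one rounded to the nearest whole percent.
exact = {books: 100 * n / total for books, n in suburban.items()}
rounded = {books: round(p) for books, p in exact.items()}

print(rounded)
print(sum(exact.values()))    # 100, up to floating-point error
print(sum(rounded.values()))  # sum of the separately rounded percentages
```

The unrounded percentages sum to 100, but the separately rounded ones sum to 101, because three of the four values happen to round up.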
In the upcoming in-class activity, we will be considering whether two variables are independent or not. Recall that two variables are independent if knowing the value of one does not affect the likelihood of any value of the other. For example, if our two variables are independent, then knowing that someone lives in an urban area should not affect the probability that they fall into any one category of Number of books read in the last year.
Consider the contingency table above again. If knowing the Type of residence does not affect the likelihood of Number of books read in the last year, each column in our contingency table should have approximately the same distribution of Number of books read in the last year. In other words, the conditional distribution of Number of books read in the last year for each value of Type of residence should match the marginal distribution of Number of books read in the last year. For example, the relative frequencies for the conditional distribution of Number of books read in the last year for urban dwellers should match the marginal distribution you found in Question 2. The relative frequencies of Number of books read in the last year for rural dwellers should also match that marginal distribution.
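The idea that every conditional distribution matches the marginal distribution can be seen in a tiny made-up example. The counts in the Python sketch below are hypothetical, invented only for illustration, and are not survey data. The helper names `conditional` and `marginal` are likewise assumptions. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented for illustration only (not survey data).
# Rows are categories of one variable; columns are categories of another.
toy = {
    "row A": {"col X": 10, "col Y": 30},
    "row B": {"col X": 20, "col Y": 60},
}

def conditional(table, column):
    """Distribution of the row variable within a single column."""
    total = sum(row[column] for row in table.values())
    return {name: Fraction(row[column], total) for name, row in table.items()}

def marginal(table):
    """Distribution of the row variable ignoring the column variable."""
    grand = sum(sum(row.values()) for row in table.values())
    return {name: Fraction(sum(row.values()), grand) for name, row in table.items()}

print(conditional(toy, "col X"))
print(conditional(toy, "col Y"))
print(marginal(toy))
```

In this toy table, both columns split one-third to two-thirds, and so does the total column, which is exactly the pattern independence predicts. Real survey counts will rarely match this perfectly, which is why the comparison is described as approximate.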
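Let’s look again at the marginal distribution for number of books read, but this time, we’ll include more decimal places so we can avoid rounding errors in our next calculation.
| Number of books read | Relative frequency (as a proportion) |
|---|---|
| None | 0.26036364 |
| 1-4 | 0.25309091 |
| 5-9 | 0.13309091 |
| 10+ | 0.35345455 |
| Total | 1 |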
Question 3
3) Let’s imagine that the conditional distribution of Number of books read in the last year for urban dwellers had relative frequencies that matched the marginal distribution. There are 549 urban dwellers in total, so 0.26036364 (26.036364%) of them would have read no books. That is, 0.26036364 × 549 ≈ 142.940 urban dwellers would have read no books.
Part A: Find the expected count of urban dwellers who would read 1–4 books if the conditional distribution matched the marginal distribution.
Part B: Find the expected count of urban dwellers who would read 5–9 books if the conditional distribution matched the marginal distribution.
Part C: Find the expected count of urban dwellers who would read 10 or more books if the conditional distribution matched the marginal distribution. Since you know that there are 549 urban dwellers total and you already know how many fall into each of the other categories, find this value by subtracting.
Part D: Compare the observed counts of urban dwellers in each category to the expected counts (as shown in the following table). Do the expected numbers look fairly close to the observed ones, or are they very far off?
| Number of books read by urban dwellers | Observed | Expected |
|---|---|---|
| None | 133 | 142.940 |
| 1-4 | 146 | 138.947 |
| 5-9 | 76 | 73.067 |
| 10+ | 194 | 194.046 |
| Total | 549 | 549 |
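The expected counts in this table follow one rule: multiply the marginal proportion for a reading category by the number of people in the column. A minimal Python sketch of that rule is given below. The function name `expected_counts` is a hypothetical helper, and the sketch is only one way to organize the arithmetic.

```python
# Counts of Number of books read for each Type of residence (from the contingency table).
counts = {
    "None": {"Urban": 133, "Suburban": 144, "Rural": 81},
    "1-4":  {"Urban": 146, "Suburban": 149, "Rural": 53},
    "5-9":  {"Urban": 76,  "Suburban": 74,  "Rural": 33},
    "10+":  {"Urban": 194, "Suburban": 216, "Rural": 76},
}

def expected_counts(counts, residence):
    """Expected counts for one residence type if its conditional
    distribution matched the marginal distribution of books read."""
    grand_total = sum(sum(row.values()) for row in counts.values())
    column_total = sum(row[residence] for row in counts.values())
    # (row total / grand total) is the marginal proportion for that row.
    return {
        books: sum(row.values()) / grand_total * column_total
        for books, row in counts.items()
    }

urban_expected = expected_counts(counts, "Urban")
for books, value in urban_expected.items():
    print(f"{books}: observed {counts[books]['Urban']}, expected {value:.3f}")
```

For the 10+ category, direct multiplication gives about 194.0465, which can differ in the third decimal place from the table's 194.046. The table value is what you get by subtracting the three rounded expected counts from 549, as Part C suggests.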
Part E: Consider your answer to Part D. If the observed and expected counts were fairly close for the rural dwellers and suburban dwellers as well, would you guess that the variables Number of books read in the last year and Type of residence are independent or not independent?
In the next in-class activity, we will learn how to use technology to test whether these two variables are independent. Stay tuned!