The goal of this in-class activity is to investigate if there are differences in voting behaviors between eligible voters who affiliate with one of the two major political parties in the United States (Democrat and Republican) and those who do not. Specifically, we will look at the difference in the proportions of “regular voters” between the two groups. In these data, a “regular voter” is a person who indicated that they voted in “all or all-but-one of the elections they were eligible for” in the survey.
The authors in the FiveThirtyEight article “Why Many Americans Don’t Vote”[1] limited their analysis to only include survey responses from voters who were eligible for at least four election cycles. For this activity, we will also limit our data to this group.

Question 1
Why do you think the authors chose to only include survey responses from people who were eligible to vote for at least four election cycles in their analysis?
The data come from the data journalism website FiveThirtyEight. The data were originally collected as part of an online survey conducted by Ipsos, where respondents answered demographic questions along with questions about political party affiliation and voting behavior.
The data contain information for 3,594 respondents who said they identify with a major party (Republican or Democrat) and 2,242 respondents who said they do not have a major party affiliation. All the respondents had been eligible to vote for at least four election cycles at the time of the survey. The primary variables of interest are:
party_id: Republican, Democrat, or Other
major_party: Yes if the respondent identified as Republican or Democrat; no otherwise
regular_voter: Yes if the respondent voted in all or all-but-one of the elections they were eligible for; no otherwise
The first 15 observations from the dataset are presented in the table below.
| party_id | major_party | regular_voter |
| --- | --- | --- |
| Democrat | Yes | Yes |
| Other | No | Yes |
| Democrat | Yes | No |
| Democrat | Yes | No |
| Republican | Yes | Yes |
| Other | No | No |
| Republican | Yes | Yes |
| Democrat | Yes | Yes |
| Republican | Yes | Yes |
| Other | No | Yes |
| Republican | Yes | No |
| Democrat | Yes | No |
| Republican | Yes | No |
| Other | No | No |
| Republican | Yes | Yes |
Of the 3,594 respondents with a major party affiliation, 1,255 were regular voters. Of the 2,242 respondents with no major party affiliation, 556 were regular voters.
Question 2
Our primary goal is to use the data to examine whether there’s a difference in the proportions of regular voters between eligible voters who said they affiliate with a major political party and those who said they don’t.
Let’s start by looking at a graphical display of the two variables. Select the “Voter” dataset in the DCMP Compare Two Population Proportions tool at
https://dcmathpathways.shinyapps.io/2sample_prop/.
- What are the two groups in this analysis? Are they independent or dependent? Explain.
- Use software to visualize the distribution of regular voters based on whether the respondents said they have major party affiliations. You can use the DCMP Compare Two Population Proportions tool at https://dcmathpathways.shinyapps.io/2sample_prop/.
- Interpret the plot. Does there appear to be a difference in the proportions of regular voters between eligible voters who said they have a major party affiliation and those who said they don’t? Explain.
It looks like the proportion of regular voters is higher for those who said they have an affiliation with a major party; however, the difference does not appear to be very large.
Question 3
Ultimately, we want to understand the value [latex]p_{1} - p_{2}[/latex], where [latex]p_{1}[/latex] is the true population proportion of eligible voters with a major party affiliation who are regular voters and [latex]p_{2}[/latex] is the true population proportion of eligible voters with no major party affiliation who are regular voters.
Ideally, we would have complete data about the two populations of interest (eligible voters with and without a major party affiliation) so we could calculate [latex]p_{1} - p_{2}[/latex] directly. However, we don’t have complete data on each population, so we’ll use our samples to draw conclusions about the differences between these two groups.
- What are the samples for this analysis? Include a description of each sample along with the sample size.
- The sample statistic [latex]\hat{p}_{1} - \hat{p}_{2}[/latex] is our “best guess” for the true difference in proportions, [latex]p_{1} - p_{2}[/latex]. Use technology to obtain the sample statistic and state what this value means in the context of the data.
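As one way to "use technology" for this question, the sample statistic can be computed directly from the counts reported earlier in the activity. The following is a minimal Python sketch; the variable names are illustrative and are not part of the DCMP tool.

```python
# Counts reported in the activity
n1, x1 = 3594, 1255   # major party affiliation: sample size, regular voters
n2, x2 = 2242, 556    # no major party affiliation: sample size, regular voters

# Sample proportions of regular voters in each group
p_hat1 = x1 / n1
p_hat2 = x2 / n2

# Sample statistic: difference in sample proportions
diff = p_hat1 - p_hat2

print(f"p_hat1 = {p_hat1:.4f}")
print(f"p_hat2 = {p_hat2:.4f}")
print(f"p_hat1 - p_hat2 = {diff:.4f}")
```

The printed difference is in proportion units; multiplying by 100 expresses it in percentage points, which is often easier to interpret in context.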
Question 4
Though we have a “best guess” for the difference in proportions of regular voters between the two groups, we expect there is some variability associated with that guess. In other words, if we calculated the difference in proportions of regular voters from two other random samples of 3,594 eligible voters with a major party affiliation and 2,242 eligible voters without an affiliation, we would expect to get a different (yet probably close) value of [latex]\hat{p}_{1} - \hat{p}_{2}[/latex] than we did in the previous question. When certain conditions apply (more on those later), the sampling distribution tells us three things about the distribution of [latex]\hat{p}_{1} - \hat{p}_{2}[/latex]:
- For large samples, the distribution is approximately normal.
- The distribution has a mean of [latex]p_{1} - p_{2}[/latex], the true population difference.
- The distribution has a standard deviation of [latex]\sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}[/latex]. This measures the sample-to-sample variability, the random variability we expect in [latex]\hat{p}_{1} - \hat{p}_{2}[/latex] if we take random samples of the same size repeatedly. In the formula, [latex]p_{1}[/latex] and [latex]p_{2}[/latex] are the true population proportions as previously stated, and [latex]n_{1}[/latex] and [latex]n_{2}[/latex] are the sample sizes for the group with a major party affiliation and the group without a major party affiliation, respectively.
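The properties listed above can be illustrated with a small simulation. The sketch below assumes hypothetical population proportions of 0.35 and 0.25; these values are chosen only for illustration, since the true proportions are unknown. It repeatedly draws two random samples of the sizes used in this activity and compares the simulated mean and standard deviation of the difference with the formulas.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical true proportions (assumed for illustration only)
p1, p2 = 0.35, 0.25
n1, n2 = 3594, 2242   # sample sizes from the activity
reps = 500            # number of simulated pairs of samples

def sample_proportion(p, n):
    """Proportion of 'regular voters' in one simulated random sample of size n."""
    return sum(random.random() < p for _ in range(n)) / n

# Difference in sample proportions for each simulated pair of samples
diffs = [sample_proportion(p1, n1) - sample_proportion(p2, n2)
         for _ in range(reps)]

sim_mean = statistics.mean(diffs)
sim_sd = statistics.stdev(diffs)
theory_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean of differences: {sim_mean:.4f} (formula: {p1 - p2:.4f})")
print(f"simulated standard deviation:  {sim_sd:.4f} (formula: {theory_sd:.4f})")
```

A histogram of `diffs` would show the roughly bell-shaped curve described in the first property.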
Before calculating a confidence interval, let’s calculate the standard deviation. Similar to previous calculations, we will replace [latex]p_{1}[/latex] and [latex]p_{2}[/latex] in the formula with the respective sample proportions [latex]\hat{p}_{1}[/latex] and [latex]\hat{p}_{2}[/latex]. The resulting estimate is called the standard error.
Use technology to obtain the standard error.
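One possible way to carry out this step is a short Python calculation that plugs the sample proportions into the standard deviation formula. This is a sketch using the counts reported in the activity; the variable names are illustrative.

```python
import math

# Counts reported in the activity
n1, x1 = 3594, 1255   # major party affiliation: sample size, regular voters
n2, x2 = 2242, 556    # no major party affiliation: sample size, regular voters

p_hat1 = x1 / n1
p_hat2 = x2 / n2

# Standard error: the standard deviation formula with the sample
# proportions substituted for the unknown population proportions
se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)

print(f"standard error = {se:.4f}")
```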
Question 5
Now that we have our estimate of the difference in proportions and the standard error, let’s calculate the confidence interval. The formula for the confidence interval is:
Estimate [latex]\pm[/latex] Margin of Error
[latex](\hat{p}_{1} - \hat{p}_{2}) \pm z^{*} \times \sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}} + \frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}[/latex]
Recall from Question 3 that the estimate is the difference in the sample proportions. Let’s break down the margin of error. The margin of error is half the width of the confidence interval and consists of two parts:
- [latex]z^{*}[/latex]: The z critical value; this is the point on the standard normal distribution such that the proportion of area under the curve between [latex]-z^{*}[/latex] and [latex]+z^{*}[/latex] is [latex]C[/latex], the confidence level.
- Standard error: A measure of the sample-to-sample variability, as discussed in Question 4.
In practice, we can use technology to calculate the confidence interval.
- Use the DCMP Compare Two Population Proportions tool to calculate the 95% confidence interval for the true difference in the proportions of regular voters among those who have a major party affiliation and those who do not.
- Interpret the interval in the context of the data. Hint: Think about the interpretation you wrote for a single proportion to help guide your interpretation for this context.
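For readers who want to see the formula at work outside the DCMP tool, the sketch below computes the interval in Python. The function name `two_prop_ci` is hypothetical, and the z critical value comes from the standard library's `statistics.NormalDist`; results from the DCMP tool may differ slightly in rounding.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the normal approximation."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    estimate = p_hat1 - p_hat2
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    # z*: the area between -z* and +z* under the standard normal curve
    # equals the confidence level
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * se
    return estimate - margin_of_error, estimate + margin_of_error

# Counts reported in the activity: regular voters and sample size per group
lower, upper = two_prop_ci(1255, 3594, 556, 2242)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

Changing the `confidence` argument (for example, to 0.90 or 0.99) shows how the critical value, and therefore the margin of error, grows with the confidence level.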
Question 6
Similar to when we use a confidence interval to draw conclusions about a single proportion, we need to check a set of conditions to ensure the interval is appropriate for the data. The three conditions when calculating confidence intervals for the difference between two proportions are:
- Random samples: The observations represent a random sample of the population.
- Independence: The samples are independently selected. (This is the condition you assessed in Question 2.)
- Sample size: [latex]n_{1}\hat{p}_{1} \geq 10[/latex], [latex]n_{1}(1-\hat{p}_{1}) \geq 10[/latex], [latex]n_{2}\hat{p}_{2} \geq 10[/latex], and [latex]n_{2}(1-\hat{p}_{2}) \geq 10[/latex]
Check the conditions for these data. State whether each condition is satisfied, along with a brief explanation about your decision. If you are working in a group, have each group member check a single condition and then discuss the results as a group.
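The sample size condition is the only one of the three that can be checked by arithmetic; the other two depend on how the survey was conducted. The sketch below is one way to check it in Python. The helper name `check_sample_size` is hypothetical, and the sketch also reports the count of non-regular voters, [latex]n(1-\hat{p})[/latex], a check that is commonly paired with the success count; treat that part as an assumption if your course uses only the success count.

```python
def check_sample_size(label, x, n, minimum=10):
    """Report whether a group has enough regular and non-regular voters."""
    p_hat = x / n
    successes = n * p_hat        # n * p_hat: regular voters in the sample
    failures = n * (1 - p_hat)   # n * (1 - p_hat): assumed additional check
    ok = successes >= minimum and failures >= minimum
    print(f"{label}: n*p_hat = {successes:.0f}, "
          f"n*(1 - p_hat) = {failures:.0f}, condition met: {ok}")
    return ok

# Counts reported in the activity
major_ok = check_sample_size("Major party affiliation", 1255, 3594)
other_ok = check_sample_size("No major party affiliation", 556, 2242)
```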
Question 7
Now, let’s use our confidence interval to answer the primary question of interest—is there a difference in the proportions of regular voters among eligible voters who have a major party affiliation and those who do not?
- What value would you expect [latex]p_{1} - p_{2}[/latex] to be if there were truly no difference in the proportions of regular voters between the two groups?
- Based on your interval, what would you conclude regarding whether there is a difference in the proportions of regular voters among those who have a major party affiliation and those who don’t? Explain.
- Suppose there is a “Get Out the Vote” campaign that would like to mail fliers to eligible voters who do not regularly vote to encourage them to vote in an upcoming election. Based on your analysis, would you advise them to target one group of voters over the other? Explain.
[1] Thomson-DeVeaux, A., Mithani, J., & Bronner, L. (2020, October 26). Why many Americans don’t vote. FiveThirtyEight. https://projects.fivethirtyeight.com/non-voters-poll-2020-election/