11F In-Class Activity

In 2004, two University of Chicago economists (Marianne Bertrand and Sendhil Mullainathan) decided to conduct an experiment[1] to test for labor market discrimination.

The investigators created 4,890 identical mock resumés, which were sent in response to job placement ads in Chicago and Boston. To gauge racial discrimination in the labor market, each resumé was randomly assigned either a commonly-white or commonly-black name. The experimenters then measured the proportion of resumés from each group (white and black) that received callbacks.[2]


Question 1

Conducting this experiment took a lot of work and resources. Why didn’t the investigators just compare observed black and white wages in current data? Why do you believe they went through all the trouble of conducting this experiment?

Questions 2–6: These questions reference the study results, which are summarized in the following table:

| | Commonly-White Names | Commonly-Black Names | Total |
|---|---|---|---|
| Called back | 246 | 164 | 410 |
| Not called back | 2,199 | 2,281 | 4,480 |
| Total | 2,445 | 2,445 | 4,890 |

[latex]p_{1}[/latex] = true proportion of commonly-white-named resumés that received callbacks

[latex]p_{2}[/latex] = true proportion of commonly-black-named resumés that received callbacks

Question 2

Before conducting a test, let’s explore the results.

  1. Part A: Calculate [latex]\hat{p}_{1}[/latex] and [latex]\hat{p}_{2}[/latex] (the sample proportions of white/black-named resumés that received callbacks, respectively).
  2. Part B: Are these two sample proportions different enough to show that the difference didn’t occur by chance alone? Justify using basic reasoning (without conducting a hypothesis test).
  3. Part C: If the sample sizes were only 20 resumés per group, would you be more likely to believe the difference in proportions could have occurred by chance alone? Explain.
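
One way to check the Part A arithmetic is a short script. This is a minimal sketch that only uses the counts from the results table; the variable names are illustrative and not part of the original activity.

```python
# Counts taken from the study's results table.
called_back_white, n_white = 246, 2445   # commonly-white names
called_back_black, n_black = 164, 2445   # commonly-black names

# Sample proportion = callbacks / resumés sent, computed for each group.
p_hat_1 = called_back_white / n_white
p_hat_2 = called_back_black / n_black

# Difference in sample proportions (Group 1 minus Group 2).
observed_difference = p_hat_1 - p_hat_2

print(f"p-hat 1 (commonly-white names): {p_hat_1:.4f}")
print(f"p-hat 2 (commonly-black names): {p_hat_2:.4f}")
print(f"difference in sample proportions: {observed_difference:.4f}")
```

The script reports the proportions only. Judging whether the gap is larger than chance alone would produce is still the reasoning asked for in Parts B and C.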

Question 3

If we wanted to test if the difference in callback rates is statistically significant,  should we use a one-sample or two-sample inference procedure? Explain.

Question 4

We will now conduct a two-sample z-test of proportions. Here are the hypotheses for  this test:

[latex]H_{0}: p_{1} - p_{2} = 0[/latex]

[latex]H_{a}: p_{1} - p_{2} > 0[/latex]

  1. Part A: In which hypothesis are the callback rates between the groups equal? In  which hypothesis could the commonly-white names get higher callback  rates?
  2. Part B: What is the null hypothesis value of the difference in proportions?

Question 5

To ensure our inferences are accurate, we must check certain conditions for  conducting a two-sample z-test of proportions:

  1. Part A: Confirm that the data were collected via a random sample or an experiment  with random assignment to treatment. This would allow us to propose that  we have comparable experimental groups.
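  2. Part B: Confirm that we have a large enough sample size to meet the “sample size” condition. This will help ensure that the underlying sampling distribution we use to calculate the P-value can be modeled with a normal curve. The conditions are: [latex]n_{1}\hat{p}_{c} \geq 10[/latex], [latex]n_{1}(1-\hat{p}_{c}) \geq 10[/latex], [latex]n_{2}\hat{p}_{c} \geq 10[/latex], and [latex]n_{2}(1-\hat{p}_{c}) \geq 10[/latex].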

To calculate the combined (“pooled”) sample proportion, add the total number of resumés that received callbacks in both groups and divide this sum by the total number of resumés in the study:

[latex]\hat{p}_{c} = \frac{x_{1}+x_{2}}{n_{1}+n_{2}}[/latex], where [latex]x_{1}[/latex] and [latex]x_{2}[/latex] are the number of “successes” from Groups 1 and 2, respectively.

Note: The final condition is that the sample sizes are each less than a tenth of the size  of the populations from which they’re drawn [[latex]n_{1} < 0.10(N_{1})[/latex] and [latex]n_{2} < 0.10(N_{2})[/latex]]. This  helps ensure our estimates for the standard errors are accurate. However, this condition  does not need to be checked in the case of a randomized experiment.

Conditions for Two-Sample Z-Test of Proportions

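  1. Large Counts: Check that [latex]n_{1}\hat{p}_{c}[/latex], [latex]n_{1}(1-\hat{p}_{c})[/latex], [latex]n_{2}\hat{p}_{c}[/latex], and [latex]n_{2}(1-\hat{p}_{c})[/latex] are all at least 10.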
  2. Random Samples/Assignment: Check that the two samples  are independent and random samples or that they come from  randomly assigned groups in an experiment.
  3. 10%: Check that [latex]n_{1} < 0.10(N_{1})[/latex] and [latex]n_{2} < 0.10(N_{2})[/latex].
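
The Large Counts check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the activity's official procedure: the threshold of 10 is the commonly used cutoff and is assumed here, and the variable names are hypothetical.

```python
# Counts from the study's results table.
x1, n1 = 246, 2445   # callbacks and resumés, commonly-white names
x2, n2 = 164, 2445   # callbacks and resumés, commonly-black names

THRESHOLD = 10  # assumed cutoff for the Large Counts condition

# Pooled proportion: total callbacks divided by total resumés.
p_pooled = (x1 + x2) / (n1 + n2)

# Expected successes and failures in each group if both groups
# shared the pooled callback rate.
expected_counts = {
    "group 1 successes": n1 * p_pooled,
    "group 1 failures": n1 * (1 - p_pooled),
    "group 2 successes": n2 * p_pooled,
    "group 2 failures": n2 * (1 - p_pooled),
}

large_counts_met = all(count >= THRESHOLD for count in expected_counts.values())

print(f"pooled proportion: {p_pooled:.4f}")
for label, count in expected_counts.items():
    print(f"{label}: {count:.1f}")
print(f"Large Counts condition met: {large_counts_met}")
```

Because both groups have the same number of resumés, the expected counts are identical across the two groups. With unequal group sizes, all four values would need to be checked separately.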

Question 6

Go to the DCMP Compare Two Population Proportions tool at https://dcmathpathways.shinyapps.io/2sample_prop/. Select the “Number of successes” option and input the relevant data from the study. Under the type of inference section, select “Significance Test” and the appropriate alternative hypothesis.

  1. Part A: Interpret the “observed difference” value. How was this calculated and what  does it mean?
  2. Part B: State and interpret the z-test statistic value.
  3. Part C: State and interpret the P-value.
  4. Part D: Using the previous information, draw a conclusion for this test. State what  your conclusion means in the context of the study. Use significance level [latex]\alpha =0.05[/latex].
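
To see where numbers like these come from, the following sketch computes a pooled two-sample z-test of proportions by hand. It assumes the one-sided alternative that the commonly-white-named group has the higher callback rate, and it uses the standard pooled standard error. The DCMP tool's internal calculation is not shown in this activity, so small differences in rounding or method are possible.

```python
import math

# Counts from the study's results table.
x1, n1 = 246, 2445   # callbacks and resumés, commonly-white names
x2, n2 = 164, 2445   # callbacks and resumés, commonly-black names

p_hat_1 = x1 / n1
p_hat_2 = x2 / n2
observed_difference = p_hat_1 - p_hat_2

# Under the null hypothesis both groups share one callback rate,
# so the standard error is built from the pooled proportion.
p_pooled = (x1 + x2) / (n1 + n2)
standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# z counts how many standard errors the observed difference lies
# above the null value of 0.
z = observed_difference / standard_error

# One-sided P-value P(Z >= z) from the standard normal upper tail
# (assumes the alternative p1 > p2).
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"observed difference = {observed_difference:.4f}")
print(f"z = {z:.2f}")
print(f"P-value = {p_value:.2e}")
```

Comparing the printed P-value with [latex]\alpha = 0.05[/latex] gives the decision for Part D. Interpreting that decision in the context of the study is still up to you.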

Question 7

We found that the difference between callback rates was statistically significant, but  is it practically significant? Justify your answer using the observed difference in  sample proportions.

 

 


  1. Bertrand, M. & Mullainathan, S. (2003, July). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. National Bureau of Economic Research. https://www.nber.org/papers/w9873
  2. Lesson adapted from Skew The Script. https://skewthescript.org/7-8