Airbnb is a website that connects people who are renting out rooms or homes with people looking for accommodations in the same area. In this in-class activity, we are going to look at the prices of Airbnb listings under $500 per night in New York City and examine the behavior of sample mean prices for random samples of listings.
Question 1
1) The average Airbnb listing price in New York City is $130 per night.[1] Which of the following would be more surprising: a single randomly selected listing price of $200 per night or an average listing price of $200 per night in a random sample of 25 listings? Explain.
In-Class Activities 9.B and 9.C introduced you to sampling variability. A sampling distribution is the probability distribution of a sample statistic, such as a sample mean or sample proportion, as it varies from sample to sample. In-Class Activity 9.B introduced you to the sampling distribution of a sample proportion.
In this activity, you will explore the sampling distribution of the sample mean for varying sample sizes and discover the Central Limit Theorem as it applies to means.
Go to the DCMP Sampling Distribution of the Sample Mean (Continuous Population) tool at https://dcmathpathways.shinyapps.io/SampDist_cont/.
You will use this tool to simulate random samples of different sizes from all Airbnb listings under $500 in New York City (NYC), where the sample mean price (in USD) is calculated in each sample. Enter the following inputs:
- Select Population Distribution: Real Population Data
- Select Example: New York Airbnb Prices
Question 2
2) Examine the population distribution of the NYC Airbnb listing prices displayed in the data analysis tool. What shape is this distribution?
(a) Symmetric
(b) Skewed right
(c) Skewed left
(d) Normal
Question 3
3) Before using the tool to generate random samples, let’s make a few predictions.
a) If you were to take 1,000 random samples of [latex]n=2[/latex] NYC Airbnb listings and calculate the sample mean price for each sample, what shape would you expect the distribution of the sample means to have?
b) If you were to take 1,000 random samples of [latex]n=10[/latex] NYC Airbnb listings and calculate the sample mean price for each sample, what shape would you expect the distribution of the sample means to have?
c) If you were to take 1,000 random samples of [latex]n=50[/latex] NYC Airbnb listings and calculate the sample mean price for each sample, what shape would you expect the distribution of the sample means to have?
d) How would you expect the variability in sample mean listing prices to change as the sample size increases?
e) How would you expect the average of the sample mean listing prices to change as the sample size increases?
Question 4
4) Use the tool to generate 1,000 random samples for each of the following sample sizes. Select the “Show Normal Approximation” box to overlay a normal distribution on each simulated sampling distribution. For each sample size:
- Sketch the graph of the resulting sampling distribution of the sample mean listing prices, including the x-axis label and scale (use the same x-axis scale for all three plots) and the overlaid normal distribution.
- Write down the mean of the sampling distribution of the sample mean listing prices.
- Write down the standard deviation of the sampling distribution of the sample mean listing prices.
a) [latex]n= 2[/latex]
b) [latex]n= 10[/latex]
c) [latex]n= 50[/latex]
Question 5
5) Now, you’ll compare the results in Question 4 to your predictions from Question 3.
a) As the sample size increases, how does the shape of the sampling distribution of the mean listing prices change? Does this match the pattern in your predicted shapes from Question 3, Parts A, B, and C?
b) As the sample size increases, how does the standard deviation of the sampling distribution of the mean listing prices change? Does this match your prediction from Question 3, Part D?
c) As the sample size increases, how does the mean of the sampling distribution of the mean listing prices change? Does this match your prediction from Question 3, Part E?
We can find the mean and standard deviation of the sample means through simulation, as in Question 4, or through mathematical formulas. Suppose the mean of the population is [latex]\mu[/latex] and the standard deviation of the population is [latex]\sigma[/latex]. Then the mean of the sample means is the same as the population mean, [latex]\mu[/latex], but the standard deviation of the sample means decreases as the sample size [latex]n[/latex] increases. Specifically, the standard deviation of the sample means is equal to [latex]\frac{\sigma}{\sqrt{n}}[/latex].
Sampling Distribution of the Sample Mean
When taking many random samples of size [latex]n[/latex] from a population distribution with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex]:
- The mean of the distribution of the sample means is [latex]\mu[/latex].
- The standard deviation of the distribution of the sample means is [latex]\frac{\sigma}{\sqrt{n}}[/latex].
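As a rough illustration of these two formulas, the following Python sketch repeats the kind of simulation the tool performs. It does not use the Airbnb data: the population is a hypothetical right-skewed set of 20,000 values drawn from an exponential distribution, and the function name `simulate_sample_means` is made up for this example.

```python
import math
import random
import statistics


def simulate_sample_means(population, n, num_samples=1000, seed=0):
    """Return the means of num_samples random samples of size n."""
    rng = random.Random(seed)
    # rng.sample draws without replacement within each sample.
    return [statistics.fmean(rng.sample(population, n))
            for _ in range(num_samples)]


# Hypothetical right-skewed population (an assumption, NOT the Airbnb prices).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 100) for _ in range(20_000)]
mu = statistics.fmean(population)        # population mean
sigma = statistics.pstdev(population)    # population standard deviation

results = {}
for n in (2, 10, 50):
    means = simulate_sample_means(population, n)
    results[n] = {
        "sim_mean": statistics.fmean(means),   # mean of the sample means
        "sim_sd": statistics.stdev(means),     # sd of the sample means
        "formula_sd": sigma / math.sqrt(n),    # sigma / sqrt(n)
    }
    print(f"n={n:>2}  mean of sample means={results[n]['sim_mean']:.1f}  "
          f"(mu={mu:.1f})  sd of sample means={results[n]['sim_sd']:.1f}  "
          f"(sigma/sqrt(n)={results[n]['formula_sd']:.1f})")
```

Because only 1,000 samples are drawn for each sample size, the simulated values should land close to the formula values without matching them exactly.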
Question 6
6) Examine again the population distribution of the NYC Airbnb listing prices displayed in the data analysis tool. What are the values of [latex]\mu[/latex] and [latex]\sigma[/latex]?
Question 7
7) Calculate the mean and standard deviation of sample mean listing prices for each of the following sample sizes using the mathematical formulas given previously.
a) [latex]n=2[/latex]
b) [latex]n=10[/latex]
c) [latex]n=50[/latex]
Question 8
8) Compare the simulated mean and standard deviation of the sample mean listing prices from Question 4 to those calculated in Question 7. Do the values seem similar? (Hint: They should!)
In In-Class Activity 9.C, you saw the Central Limit Theorem at work for sample proportions. In this activity, you just witnessed the Central Limit Theorem at work for sample means. The Central Limit Theorem states that, as the sample size gets larger, the distribution of the sample mean will become closer to a normal distribution. Combining this with the expressions for the mean and standard deviation of the sample means results in:
Sampling Distribution of the Sample Mean
When taking many random samples of size [latex]n[/latex] from a population distribution with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex]:
- The mean of the distribution of the sample means is [latex]\mu[/latex].
- The standard deviation of the distribution of the sample means is [latex]\frac{\sigma}{\sqrt{n}}[/latex].
- If the population distribution is not normal, the Central Limit Theorem states that the distribution of the sample means still follows an approximate normal distribution as long as the sample size is large (e.g., [latex]n \geq 30[/latex]) and the population distribution is not strongly skewed.
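One way to see the Central Limit Theorem numerically, rather than by eye, is to track the skewness of the simulated sample means as the sample size grows. The sketch below is only an illustration under stated assumptions: the population is a hypothetical right-skewed set of values (not the Airbnb prices), and the helper names `skewness` and `sample_means` are invented for this example.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_means(population, n, num_samples=1000, seed=1):
    """Return the means of num_samples random samples of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n))
            for _ in range(num_samples)]


# Hypothetical right-skewed population (an assumption, NOT the Airbnb prices).
pop_rng = random.Random(7)
population = [pop_rng.expovariate(1 / 100) for _ in range(20_000)]

# Skewness of the simulated sampling distribution for each sample size.
skew_by_n = {n: skewness(sample_means(population, n)) for n in (2, 10, 50)}

print(f"population skewness: {skewness(population):.2f}")
for n, g in skew_by_n.items():
    print(f"n={n:>2}  skewness of sample means: {g:.2f}")
```

A skewness near 0 is consistent with a roughly symmetric, bell-shaped distribution, so the values for larger sample sizes would be expected to sit closer to 0 than the value for the population itself.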
Question 9
9) Suppose you are planning a vacation to Los Angeles (LA), and you would like to learn more about the distribution of the Airbnb listing prices in LA. You take a random sample of 50 LA Airbnb listings. The mean listing price in your sample is $152.
Assume the population of all LA Airbnb listing prices has the same mean and standard deviation as that for NYC. (You found these in Question 6.)
a) Use the mean and standard deviation you found in Question 7, Part C to calculate the z-score for a sample mean of $152. Write a sentence interpreting this value in the context of the problem.
b) Using the normal approximation, find the probability of observing a sample mean listing price of $152 or higher from a random sample of 50 listings.
c) Based on your answers in Parts A and B, do these data provide evidence that the mean Airbnb listing price in LA is higher than the mean Airbnb listing price in NYC? Explain.
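The z-score and normal-approximation steps in Question 9 can also be carried out in code. The sketch below is one possible way to do so using Python's standard library; the function name `sample_mean_z_and_tail` and the numbers in the example call are hypothetical and are not the values from this activity.

```python
import math
from statistics import NormalDist


def sample_mean_z_and_tail(xbar, mu, sigma, n):
    """Return (z, P(sample mean >= xbar)) under the normal approximation."""
    se = sigma / math.sqrt(n)            # standard deviation of the sample means
    z = (xbar - mu) / se                 # standard deviations above the mean
    p_upper = 1 - NormalDist().cdf(z)    # upper-tail area of the standard normal
    return z, p_upper


# Hypothetical inputs (assumptions for illustration only):
# population mean 100, population sd 40, sample size 25, observed mean 110.
z, p = sample_mean_z_and_tail(xbar=110, mu=100, sigma=40, n=25)
print(f"z = {z:.2f}, P(sample mean >= 110) = {p:.4f}")  # z = 1.25 for these inputs
```

To apply this to Question 9, replace the hypothetical inputs with the population values you found in Question 6, the sample size of 50, and the observed sample mean.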
[1] Kaggle. (2019). New York City Airbnb open data. https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data