16C Coreq

In the next preview assignment and in the next class, you will need to use technology to calculate the linear regression equation and use that equation to calculate predicted values of the response variable. You will assess assumptions for regression and interpret confidence intervals.

The objective of this analysis is to predict a coffee’s aftertaste based on its acidity. We will use the “coffeeratings” dataset. These data are originally from the Coffee Quality Database^[1] compiled by James LeDoux, a data scientist at BuzzFeed. The dataset contains information about the origin, processing, and taste quality for a sample of 1,338 coffees. The taste quality characteristics are on a scale of 0–10, as determined by a panel of expert coffee tasters.

• acidity: Measure of acidity (a sharp, tangy feeling, like when biting into an orange^[2]); higher values correspond to more acidic taste

• aftertaste: Measure of taste after you take a sip of the coffee; higher values indicate better quality taste

Question 1

1) What are the response and explanatory variables? Explain.

Question 2

2) Before using linear regression, let’s look at the variable aftertaste. The mean of the sample of aftertaste values is 7.41, and the 95% confidence interval for the mean aftertaste value for all coffees is (7.389, 7.425). Which of the following statements are true? Select all that apply.

a) We are 95% confident that the mean value of aftertaste for all coffees is between 7.389 and 7.425.

b) The population mean value of aftertaste is 7.41.

c) The sample mean value of aftertaste is 7.41.

d) If we take thousands of random samples of 1,338 coffees and calculate a 95% confidence interval from each sample, we expect 95% of the intervals to contain the true mean value of aftertaste.
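For readers who want to see where an interval like the one in Question 2 comes from, here is a minimal Python sketch of a large-sample 95% confidence interval for a mean. It assumes the normal approximation (sample mean plus or minus a critical value times the standard error); the DCMP tools may use a t-based interval instead, which is nearly identical for a sample this large. The function name `mean_confidence_interval` and the scores are made up for illustration and are not the coffee ratings.

```python
import math
import random
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Large-sample (normal-approximation) confidence interval for a mean."""
    n = len(values)
    sample_mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Critical value z* leaving (1 - confidence) / 2 in each tail
    # (about 1.96 when confidence = 0.95).
    z_star = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * standard_error
    return sample_mean - margin, sample_mean + margin


# Made-up scores for illustration only; these are NOT the coffee ratings.
random.seed(1)
fake_scores = [random.gauss(7.4, 0.35) for _ in range(1338)]

low, high = mean_confidence_interval(fake_scores)
print(f"sample mean = {statistics.fmean(fake_scores):.3f}")
print(f"95% CI for the mean = ({low:.3f}, {high:.3f})")
```

The interval is centered on the sample mean, and its width shrinks as the sample size grows because the standard error is divided by the square root of n.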

Question 3

3) What is one reason we may want to use a linear regression model to predict the value of aftertaste based on acidity rather than just using the sample mean from Question 2 as our best guess?

Question 4

4) Use spreadsheet DCMP_STAT_16C_Coffee_Ratings to answer the following questions. You will need the DCMP Linear Regression tool at https://dcmathpathways.shinyapps.io/LinearRegression/. Make sure to select the appropriate explanatory variable (𝑋) and response variable (𝑌).

Part A: Make a scatterplot to visualize the relationship between acidity and aftertaste.

Part B: Describe the relationship between the two variables.

Part C: Based on the scatterplot, is linear regression appropriate to describe the relationship between the variables? Explain.

Question 5

5) Use the DCMP Linear Regression tool to calculate the linear regression equation.

Part A: Write the linear regression equation using customized variable names.

Part B: Interpret the slope in the context of the data.

Part C: Is it meaningful to interpret the intercept? If so, interpret the intercept in the context of the data. If not, explain why not.
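As an optional companion to the DCMP tool, the sketch below shows one way the least-squares line could be computed by hand in Python. It is an illustration under stated assumptions, not the tool's implementation: the helper `fit_line` is hypothetical, and the data are synthetic stand-ins generated from arbitrary coefficients, so the printed equation is not the answer for the coffee ratings.

```python
import random
import statistics


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line
    y-hat = intercept + slope * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Synthetic stand-in data (arbitrary coefficients); NOT the coffee ratings.
random.seed(2)
acidity = [random.uniform(6.5, 8.5) for _ in range(200)]
aftertaste = [1.0 + 0.8 * a + random.gauss(0, 0.2) for a in acidity]

intercept, slope = fit_line(acidity, aftertaste)
print(f"predicted aftertaste = {intercept:.3f} + {slope:.3f} * acidity")
```

Writing the equation with the variable names in place of x and y, as in the last line, is what "customized variable names" asks for.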

Question 6

6) Now let’s use the model for prediction.

Part A: What is the predicted aftertaste value for a coffee with an acidity of 7.25?

Part B: How is the aftertaste value expected to change if the acidity decreases by 0.25 points?
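The arithmetic behind both parts can be sketched in a few lines of Python. The coefficients below are placeholders chosen only to make the arithmetic easy to follow; they are not the fitted values for the coffee data, so substitute the intercept and slope from your own regression equation. The function names are hypothetical.

```python
def predict(intercept, slope, x):
    """Predicted response from the line y-hat = intercept + slope * x."""
    return intercept + slope * x


def predicted_change(slope, delta_x):
    """Change in the predicted response when x changes by delta_x.
    The intercept cancels, so only the slope matters."""
    return slope * delta_x


# Placeholder coefficients for illustration only; NOT the fitted values.
example_intercept = 2.0
example_slope = 0.5

# Prediction at x = 7.25: 2.0 + 0.5 * 7.25
print(predict(example_intercept, example_slope, 7.25))   # 5.625

# Effect of decreasing x by 0.25: 0.5 * (-0.25)
print(predicted_change(example_slope, -0.25))            # -0.125
```

Note that the expected change depends only on the slope and the size of the change in the explanatory variable, not on the starting value.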

Question 7

7) Let’s conclude by looking at plots of the residuals.

Part A: Make a scatterplot of the residuals vs. the predicted values by selecting the Fitted Values & Residual Analysis tab.

Part B: Based on the plot, is the linear regression equation a good fit for the relationship between acidity and aftertaste? Explain.

Part C: Use the linear regression tool (i.e., data analysis tool) to make a histogram of the residuals. Select the option “Histogram/Boxplot of Residuals.” If desired, select the option “Superimpose the Normal Curve” as well.

Part D: Describe the distribution of the residuals.
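To make the quantities in Question 7 concrete, the following Python sketch computes fitted values and residuals (observed minus predicted) and prints a rough text histogram of the residuals. It is a simplified stand-in for the plots the DCMP tool draws, under these assumptions: the helpers `residuals_and_fitted` and `text_histogram` are hypothetical, and the data are synthetic, not the coffee ratings.

```python
import random
import statistics


def residuals_and_fitted(x, y):
    """Fit the least-squares line, then return (fitted values, residuals)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    # Residual = observed value - predicted (fitted) value.
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


def text_histogram(values, bins=8, width=40):
    """Count values in equal-width bins; return (counts, printable lines)."""
    low, high = min(values), max(values)
    step = (high - low) / bins or 1.0  # avoid a zero bin width
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        counts[min(int((v - low) / step), bins - 1)] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = low + i * step
        bar = "#" * round(width * c / peak)
        lines.append(f"{left:7.3f} to {left + step:7.3f} | {bar} {c}")
    return counts, lines


# Synthetic stand-in data (arbitrary coefficients); NOT the coffee ratings.
random.seed(3)
acidity = [random.uniform(6.5, 8.5) for _ in range(200)]
aftertaste = [1.0 + 0.8 * a + random.gauss(0, 0.2) for a in acidity]

fitted, residuals = residuals_and_fitted(acidity, aftertaste)
counts, lines = text_histogram(residuals)
print("\n".join(lines))
```

For a least-squares fit the residuals always sum to zero, so the histogram is centered near zero; what to look for is its shape (roughly symmetric and bell-shaped, or skewed) and any unusually large residuals.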

[1] coffee-quality-database. (2018, June 16). GitHub. Retrieved from https://github.com/jldbc/coffee-quality-database
[2] Coffee cupping. (2006, July 26). In Wikipedia. https://en.wikipedia.org/wiki/Coffee_cupping