15C InClass

In the preview assignment, you were introduced to flight data for three airlines in March 2021. The two-way table is given again below.^[1] Recall that we are considering the question of whether the distributions of flight status are the same among the flights of the three airlines. For our test, the null hypothesis is that the distribution of flight status is the same for these three airlines, and the alternative hypothesis is that the distribution of flight status is not the same for these three airlines.

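| | On-Time Flights | Delayed Flights | Canceled Flights | Diverted Flights | Total |
|---|---|---|---|---|---|
| American Airlines | 42,600 | 4,657 | 296 | 95 | 47,648 |
| Delta Airlines | 51,620 | 4,030 | 150 | 56 | 55,856 |
| Southwest Airlines | 69,384 | 9,280 | 1,782 | 128 | 80,574 |
| Total | 163,604 | 17,967 | 2,228 | 279 | 184,078 |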

Question 1

1) Suppose that we obtain enough evidence to reject the null hypothesis; that is, we decide that there is strong evidence to support the idea that the flight status distributions are not the same for at least one of the three airlines. At that point, what else might you want to know about the situation?

Question 2

2) Consider the following listed assumptions and conditions required to perform a chi-square test of homogeneity. Discuss whether each condition is met and whether there is any cause for concern.

Appropriate Data and Variables: This is not an official condition, but it is important to make sure that we are dealing with data that give the counts for each value of a categorical variable. That categorical variable should be measured for a sample from each population of interest.

Part A: Is this condition met? Explain.

Independence/Randomness Condition: The samples from our populations should be independent, random samples or independent samples that can be considered representative of the respective populations.

Part B: Is this condition met with our sample of March 2021 flights? Explain.

Large Sample Sizes Condition: The sample sizes need to be large enough so that the expected count in each cell is at least five. Notice that this condition requires us to know the expected count in each cell. We can use technology to obtain that information. Go to the DCMP Chi-Square Test tool at https://dcmathpathways.shinyapps.io/ChiSquaredTest/.

• Click the Test of Independence/Homogeneity tab at the top.
• In the drop-down menu for “Enter Data,” select “Textbook” and “Airline Flight Status.”
• Notice that the rows represent the different populations you are testing for homogeneity.
• Notice that the columns are the values of the categorical variable whose distributions we are comparing.
• Make sure to check the box for “Show Expected Cell Counts.”

Part C: Is the expected cell frequency condition met? Explain.
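As a cross-check on the tool, the expected counts can also be computed by hand. The following is a minimal Python sketch, assuming the usual formula for a two-way table (row total times column total, divided by the grand total). The variable names are illustrative and do not come from the DCMP tool.

```python
# Observed counts from the two-way table (rows: airlines; columns: flight status).
observed = {
    "American Airlines":  [42600, 4657, 296, 95],
    "Delta Airlines":     [51620, 4030, 150, 56],
    "Southwest Airlines": [69384, 9280, 1782, 128],
}
statuses = ["On-Time", "Delayed", "Canceled", "Diverted"]

row_totals = {airline: sum(counts) for airline, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values())
              for j in range(len(statuses))]
grand_total = sum(row_totals.values())

# Expected count if the null hypothesis is true:
# (row total * column total) / grand total.
expected = {
    airline: [row_totals[airline] * col_totals[j] / grand_total
              for j in range(len(statuses))]
    for airline in observed
}

# The large sample sizes condition asks for every expected count to be at least 5.
smallest = min(min(row) for row in expected.values())
condition_met = smallest >= 5

for airline, row in expected.items():
    print(airline, [round(e, 1) for e in row])
print("Smallest expected count:", round(smallest, 1))
print("All expected counts at least 5:", condition_met)
```

The smallest expected count is the only one that needs checking, since every other cell is larger.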

Question 3

3) Let’s proceed with this chi-square test at a significance level of 0.01. Continue using the data analysis tool. What is the value of the chi-square test statistic obtained from the test?
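To see where the tool's number comes from, here is a short Python sketch of the standard Pearson chi-square statistic, which sums (Observed − Expected)² / Expected over all cells. It assumes the tool uses this standard statistic with no correction. The function name `chi_square_statistic` is made up for this illustration.

```python
# Observed counts (columns: On-Time, Delayed, Canceled, Diverted).
observed = [
    [42600, 4657, 296, 95],     # American Airlines
    [51620, 4030, 150, 56],     # Delta Airlines
    [69384, 9280, 1782, 128],   # Southwest Airlines
]

def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count for this cell
            statistic += (obs - exp) ** 2 / exp
    # Degrees of freedom depend only on the table's shape, not on the sample size.
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

statistic, df = chi_square_statistic(observed)
print(f"chi-square = {statistic:.1f}, df = {df}")
```

Compare the printed value with the one reported by the data analysis tool; small differences may come from rounding.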

Question 4

4) What is the P-value obtained from the chi-square test? What does the P-value represent, and what does it tell you?
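The P-value is the area under the chi-square curve to the right of the test statistic. The sketch below is one way to compute that area without extra libraries. It relies on a closed form that holds only when the degrees of freedom are even, which is the case here since (3 − 1)(4 − 1) = 6. The function name is hypothetical, and 16.812 is the standard table cutoff for 6 degrees of freedom at the 0.01 level, used only to check the function.

```python
import math

def chi_square_p_value(statistic, df):
    """Upper-tail chi-square probability.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only for a positive, even number of degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut requires a positive even df")
    half = statistic / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term            # add (x/2)^k / k!
        term *= half / (k + 1)   # build the next term of the sum
    return math.exp(-half) * total

df = (3 - 1) * (4 - 1)  # (rows - 1) * (columns - 1)

# Sanity check: the table cutoff 16.812 should give a P-value close to 0.01.
p_at_cutoff = chi_square_p_value(16.812, df)
print(round(p_at_cutoff, 4))
```

Passing the test statistic from Question 3 to `chi_square_p_value` gives the P-value for the airline data. For very large statistics the result may print as 0.0 because the true value is too small for floating-point arithmetic to represent.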

Question 5

5) What is the conclusion of our hypothesis test? State your conclusion in context.

Even though we have already drawn a conclusion from our hypothesis test, there is still some information we can glean by looking at the difference between the observed count and the expected count for each cell. The data analysis tool calls this difference the residual for that cell (and the idea is similar to the concept of residuals you saw when looking at the differences between observed values and predicted values in the linear regression context). Residuals are calculated using the formula:

Residual = Observed − Expected

Since the values in our cells may vary quite a bit, it’s a good idea to look at what the data analysis tool calls standardized residuals instead. These are sometimes referred to as standardized Pearson residuals. They standardize the residuals so that, if the null hypothesis is assumed to be true, they can be interpreted as normal z-scores. In particular, if the null hypothesis is true, most standardized residuals for a given test will fall between −2 and 2.

We can use these standardized residuals to determine how far off our observed count is from what was expected if the null hypothesis is true (i.e., if the distributions are really the same). The sign of the standardized residual tells us whether we observed more cases in that cell than we expected (a positive residual) or fewer cases than we expected (a negative residual).
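The two kinds of residuals can be sketched in Python as follows. The worksheet does not state the standardizing formula, so this sketch assumes the common one for standardized Pearson residuals: the residual divided by the square root of Expected × (1 − row proportion) × (1 − column proportion). That assumption reproduces the two on-time values printed in the Question 6 tables (31.9 for Delta Airlines and −33.3 for Southwest Airlines). All names in the code are illustrative.

```python
import math

airlines = ["American Airlines", "Delta Airlines", "Southwest Airlines"]
statuses = ["On-Time", "Delayed", "Canceled", "Diverted"]
observed = [
    [42600, 4657, 296, 95],
    [51620, 4030, 150, 56],
    [69384, 9280, 1782, 128],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

def residual(i, j):
    """Residual = Observed - Expected for row i, column j."""
    expected = row_totals[i] * col_totals[j] / n
    return observed[i][j] - expected

def standardized_residual(i, j):
    """Residual divided by its estimated standard error (assumed formula)."""
    expected = row_totals[i] * col_totals[j] / n
    std_error = math.sqrt(
        expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
    )
    return residual(i, j) / std_error

std_resid = [
    [standardized_residual(i, j) for j in range(len(statuses))]
    for i in range(len(airlines))
]

for airline, row in zip(airlines, std_resid):
    print(airline, [round(z, 1) for z in row])
```

A positive entry means more flights were observed in that cell than expected under the null hypothesis, and a negative entry means fewer.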

Question 6

6) In the data analysis tool, check the boxes for “Show Residuals” and “Show Standardized Residuals.”

Part A: Complete the following table with the standardized residuals for Delta Airlines and then interpret the standardized residuals.

| | Standardized Residual for On-time Flights | Standardized Residual for Delayed Flights | Standardized Residual for Canceled Flights | Standardized Residual for Diverted Flights |
|---|---|---|---|---|
| Delta Airlines | 31.9 | | | |

Part B: Complete the following table with the standardized residuals for Southwest Airlines and then interpret the standardized residuals.

| | Standardized Residual for On-time Flights | Standardized Residual for Delayed Flights | Standardized Residual for Canceled Flights | Standardized Residual for Diverted Flights |
|---|---|---|---|---|
| Southwest Airlines | −33.3 | | | |

Part C: Based on your answers to Parts A and B, which airline might you choose between Delta Airlines and Southwest Airlines if you were planning a flight?

A word of caution: As we saw in the preview assignment, the degrees of freedom for a chi-square test of homogeneity are not related to the sample size at all, so they do not increase as the sample size increases. The degrees of freedom depend only on the number of rows and columns in the associated two-way table. As a consequence, when the sample size is very large, a chi-square test may result in rejecting the null hypothesis even when the actual differences between the distributions are small. In our airline example, we had a very large sample size for each population, and we got a very small P-value that led us to reject the null hypothesis.
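The effect of sample size described above can be illustrated with a small Python sketch. It compares the airline table with a hypothetical table (not real data) in which every count is divided by 100, so all percentages stay the same while the sample is 100 times smaller. The function name is made up for this illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            statistic += (obs - exp) ** 2 / exp
    return statistic

observed = [
    [42600, 4657, 296, 95],     # American Airlines
    [51620, 4030, 150, 56],     # Delta Airlines
    [69384, 9280, 1782, 128],   # Southwest Airlines
]

# HYPOTHETICAL table: same proportions in every cell, one hundredth of the flights.
scaled_down = [[count / 100 for count in row] for row in observed]

full_stat = chi_square_statistic(observed)
small_stat = chi_square_statistic(scaled_down)
ratio = full_stat / small_stat

print(f"full table:   {full_stat:.1f}")
print(f"scaled table: {small_stat:.1f}")
print(f"ratio:        {ratio:.1f}")
```

The distributions being compared are identical in both tables, yet the test statistic changes by the same factor as the sample size while the degrees of freedom stay at 6. This is why a very small P-value from a very large sample does not by itself show that the differences are large.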

Question 7

7) Discuss this result in terms of practical significance and statistical significance. Can we safely conclude that any of our three airlines are doing much better than the others?

[1] U.S. Department of Transportation, Bureau of Transportation Statistics. (n.d.). On-time performance - Reporting operating carrier flight delays at a glance. https://www.transtats.bts.gov/HomeDrillChart_Month.asp?5ry_lrn4=FDFD&N44_Qry=E&5ry_Pn44vr4=DDD&5ry_Nv42146=DDD&heY_fryrp6lrn4=FDFE&heY_fryrp6Z106u=F