13A InClass

Question 1

1) Which corporations from around the world do you think contribute the most to polluting our environment with plastic waste?

A globe in the middle of a pile of garbage.

Credit: iStock/piotr_malczyk

In Questions 2–9, you will use a dataset called “plastics.” This dataset is a sample of plastic products collected by community volunteers in countries around the world during a brand audit to see what types of plastics are found in waste and which companies those plastics came from.

The dataset can be accessed here:

https://docs.google.com/spreadsheets/d/15RIs8K9MaGVfYQmi15bH87dUIz8X1o60WcsukoLlFFo/edit?usp=sharing.

The dataset variables and variable descriptions are provided in the following table.

| Variable | Description |
| --- | --- |
| country | Country of cleanup |
| year | Year (2019 or 2020) |
| parent_company | Source of plastic |
| empty | Category left empty count |
| hdpe | High-density polyethylene count (plastic milk containers, plastic bags, bottle caps, trash cans, oil cans, plastic lumber, toolboxes, supplement containers) |
| ldpe | Low-density polyethylene count (plastic bags, Ziploc bags, buckets, squeeze bottles, plastic tubes, chopping boards) |
| other_count | Category marked other count |
| pet | Polyester plastic count (polyester fibers, soft drink bottles, food containers [also see plastic bottles]) |
| pp | Polypropylene count (flower pots, bumpers, car interior trim, industrial fibers, carry-out beverage cups, microwavable food containers, DVD cases) |
| ps | Polystyrene count (toys, video cassettes, ashtrays, trunks, beverage/food coolers, beer cups, wine and champagne cups, carry-out food containers, styrofoam) |
| pvc | PVC plastic count (window frames, bottles for chemicals, flooring, plumbing pipes) |
| grand_total | Grand total count (all types of plastic) |
| num_events | Number of counting events |
| volunteers | Number of volunteers |

Question 2

2) Plastic items contain recycling codes that appear within a chasing arrow logo like the one shown here:

A recycling symbol

What do you think these codes mean?

Question 3

3) Looking at the “plastics” dataset, how many different types of plastic are represented in the dataset? List the different types.

Question 4

4) Use the following dashboard website that has the “plastics” dataset loaded: https://sarahsauve.shinyapps.io/TidyTuesdayBrandAuditDashboard/.

Select a country that interests you from the drop-down menu at the top of the #breakfreefromplastic dashboard.

a) What are the total numbers of plastics recorded in 2019 and 2020 for that country?

b) Are there any companies reported as top polluters in 2019 or 2020 for that country? If so, what is the top company, and what percentage of recorded plastics does it account for?

In Questions 5–8, you will use a subset of the “plastics” dataset. This subset contains all of the countries from the original data that have counts of plastics from The Coca-Cola Company in 2020. To create the subset, go to the “plastics” dataset and complete the following steps:

  1. Highlight the top row.
  2. Select “Data.”
  3. Select “Filter views.”
  4. Select “Create new temporary filter view.”

Now, filter the data using parent_company = “The Coca-Cola Company” and year = 2020. If filtered correctly, the number of observations in the dataset should be [latex]n[/latex] = 50, and the first few observations should look like this:

Screenshot of the first few rows of the filtered dataset (image not available).
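If you would rather check the filter outside the spreadsheet, the same two conditions can be sketched in a few lines of Python. This is only an illustration: the rows below are made up to mimic the column names in the variable table, and the helper name `filter_rows` is hypothetical. The values are not taken from the “plastics” dataset.

```python
# Hypothetical rows that mimic the column names in the "plastics" table.
# The values are made up for illustration; they are NOT from the dataset.
rows = [
    {"country": "A", "year": 2019, "parent_company": "The Coca-Cola Company", "grand_total": 120},
    {"country": "A", "year": 2020, "parent_company": "The Coca-Cola Company", "grand_total": 95},
    {"country": "B", "year": 2020, "parent_company": "Another Company", "grand_total": 40},
    {"country": "C", "year": 2020, "parent_company": "The Coca-Cola Company", "grand_total": 310},
]


def filter_rows(rows, company, year):
    """Keep only the rows that match both the company and the year."""
    return [r for r in rows if r["parent_company"] == company and r["year"] == year]


subset = filter_rows(rows, "The Coca-Cola Company", 2020)

# The number of rows left after filtering is the sample size n.
print("n =", len(subset))
for r in subset:
    print(r["country"], r["grand_total"])
```

With the real dataset, the count printed on the first line is the number you would compare against the expected [latex]n[/latex] = 50.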

Question 5

5) We are interested in answering the following research question about the “plastics” dataset:

“For the products reported from The Coca-Cola Company, is the average total plastics count found in various countries in 2020 different from the claimed value of 275 items for The Coca-Cola Company?”

a) Based on the research question, are we interested in testing a proportion or a mean?

Remember that when the population standard deviation is not known, confidence intervals for a mean are based on the t Distribution rather than the normal distribution.

b) What distribution do you think would be used in a hypothesis test that would answer this research question?

c) Why should that type of distribution be used?

d) Is this a one-sample or two-sample test?

Recall from In-Class Activity 12.B:

t Distribution

When taking many, many random samples of size [latex]n[/latex] from a population distribution with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex], the t-statistic

[latex]t=\frac{\bar{x}-\mu}{SE(\bar{x})}=\frac{\bar{x}-\mu}{s/\sqrt{n}}[/latex]

will follow a t Distribution with [latex]n-1[/latex] degrees of freedom if:

1. the population distribution is normal; or

2. the population distribution is not too skewed and the sample size is large (e.g., [latex]n[/latex] ≥ 30).
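The t-statistic formula above can be traced step by step in code. The following is a minimal sketch using only Python's standard library. The function name `t_statistic`, the sample values, and the hypothesized mean of 4 are made up for illustration and have nothing to do with the “plastics” data.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-statistic."""
    n = len(sample)
    x_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)             # standard error of the sample mean
    return (x_bar - mu0) / se, n - 1


# Made-up sample, used only to show the arithmetic.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = t_statistic(sample, mu0=4)
print(round(t, 3), df)  # prints: 1.323 7
```

Here the sample mean is 5 and the sample standard deviation is about 2.138, so the standard error is about 0.756 and the t-statistic is (5 − 4)/0.756 ≈ 1.323 with 8 − 1 = 7 degrees of freedom.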

As was the case with previous inference methods, there are a few assumptions/conditions that you should check before using the one-sample t-test.

Conditions for a One-Sample t-Test

  1. The sample is a random sample from the population of interest or it is reasonable to regard the sample as if it is random. It is reasonable to regard the sample as a random sample if it was selected in a way that should result in a sample that is representative of the population.
  2. The distribution of the variable that was measured is approximately normal in the population, or the sample size is large. Usually, a sample of size 30 or more is considered to be “large.” If the sample size is less than 30, you should look at a plot of the data (a dotplot, a boxplot, or, if the sample size isn’t really small, a histogram) to make sure that the distribution looks approximately symmetric and that there are no outliers.

Question 6

6) Create a histogram of the grand totals of all plastics found in various countries in 2020 for The Coca-Cola Company using the DCMP Describing and Exploring Quantitative Variables tool at https://dcmathpathways.shinyapps.io/EDA_quantitative/. Describe the shape and spread of the histogram. Note that a random sample of countries was taken each year, and the actual countries included varied from year to year.
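The DCMP tool is the intended way to build this histogram. To see what a histogram does behind the scenes, the sketch below groups values into equal-width bins and prints one star per observation. The function name `bin_counts`, the bin width of 100, and the values in `totals` are all assumptions made for illustration. The values are not from the “plastics” dataset.

```python
def bin_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width   # left edge of the bin holding v
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Made-up counts, used only to show how the binning works.
totals = [3, 12, 18, 25, 40, 41, 77, 130, 260, 940]
bin_width = 100
counts = bin_counts(totals, bin_width)

# Text histogram: one star per observation. Empty bins are not printed.
for start, count in counts.items():
    print(f"{start:>4} to {start + bin_width - 1:<4} {'*' * count}")
```

In this made-up example most values pile up in the lowest bin and a few sit far to the right, which is the kind of pattern to look for when describing shape and spread.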

Question 7

7) Verify the conditions are met for a one-sample t-test.

a) Are the data a random sample from the population of interest? Is this condition met? Explain.

b) Is the distribution of the variable total plastic counts approximately normal, or is the sample size for the sample from that population large? Is this condition met? Explain.

Question 8

8) Write out the null and alternative hypotheses that would be used to answer the research question stated in Question 5. Remember to use the correct notation.

Question 9

9) What if we are interested in answering the following question using samples from the “plastics” dataset:

“For the products reported from The Coca-Cola Company, is the average total plastics count found in various countries in 2020 less than the average total plastics count from 2019?”

a) Is this a one-sample or two-sample test?

b) Write the null and alternative hypotheses that would be used to answer the research question. Define the parameters of interest using the correct notation, using [latex]\mu_{1}[/latex] to represent the population mean for total plastics for The Coca-Cola Company in 2019 and [latex]\mu_{2}[/latex] to represent the population mean for total plastics for The Coca-Cola Company in 2020.