objectives for this activity
During this activity, you will:
- Use standardized scores and the Empirical Rule to determine whether an observation is unusual.
- Compare two observations by calculating and comparing z-scores.
What Is Unusual?
In a medical study, many observations are made in an effort to obtain a data sample representative of the population from which it was taken. In this activity, you’ll see how standardized scores and the Empirical Rule can be used to determine if an observation is usual or unusual.
Around the world, pharmaceutical companies conduct clinical trials, which are research studies performed on people, to evaluate whether a new drug is safe and effective. The people who participate in clinical trials are volunteers.

question 1
video placement
[Intro video: In this activity, we’ll use the setting of a medical study to learn how the Empirical Rule can help us identify unusual observations of a quantitative variable. We’ll also compare two observations by calculating and comparing their standardized scores. Data collected during medical studies tends to form a bell-shaped, unimodal distribution, with a large number of observations located near the mean and roughly equal numbers of values tapering off on either side of it. [voice over the empirical rule image from What to Know 4E]. Under these conditions, we know that almost all the observations in the distribution fall within three standard deviations of the mean. This lets us set a threshold for how far from the mean an observation must be for us to consider it unusual. But human clinical trials require people willing to participate. Trials are costly, and gathering the appropriate data takes a great deal of time, even decades when studying the effects of a drug over an entire lifespan. Can you think of a good alternative to having human volunteers participate in clinical trials? How about using mice instead? In this activity, we’ll explore a medical study involving mice as we learn to apply the Empirical Rule and standardized scores.]
Mice are often used in medical studies to evaluate the effects of chemicals and pharmaceuticals. One reason is that scientists know a lot about the mouse genome. Laboratory mice are also bred to be identical, so the only difference between them is the treatment. Finally, mice have short lifespans (about [latex]800[/latex] days), which allows scientists to model the effects of a drug over an entire lifespan. It is much more difficult to understand the effects of a drug over the lifetime of a human.
Consider a study concerned with learning how a drug or a treatment affects the body. The toxicity of a chemical and its impact on vital organs are of interest when assessing the effects of a chemical treatment. A standard way to gauge the level of toxicity in an organ is to measure the organ’s weight.[1]
The Empirical Rule
recall
Before beginning the activity below, recall the definition of the Empirical Rule. What does it state?
Core skill:
Consider the weights of the livers and spleens in [latex]26[/latex]-week-old female C57BL/6J laboratory mice. The mean liver weight is [latex]0.999[/latex] grams (g) with a standard deviation of [latex]0.087[/latex] g, and the mean spleen weight is [latex]0.086[/latex] g with a standard deviation of [latex]0.007[/latex] g. Use this information along with the Empirical Rule to answer Questions 2 and 3 below. Round your answers to three decimal places.
question 2
question 3
video placement
[insert sub-summary video: “In these questions, you calculated specific values for the liver and spleen weights of the mice that marked locations in the data exactly one, two, and three standard deviations below the mean (to the left on the graph) and above the mean (to the right). [this is voice over the graph of the Empirical Rule again.] So, [pointing to the horizontal axis] what values are associated with 68% of the liver weights? That’s right, liver weights between 0.912 g and 1.086 g make up 68% of all the liver weights because these are all within one standard deviation of the mean. So, what do you think you’d consider an unusual liver weight, either unusually high or unusually low? In statistics, we often consider an observation unusual if it is at least two standard deviations away from the mean of a data set. What percentage of this data is within two standard deviations? That’s right, 95%. In this context, between which two values of spleen and liver weights are 95% of the data located? I’ll let you figure that one out for yourself and use it to answer the following question.”]
question 4
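The interval calculations in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the names `ORGANS`, `COVERAGE`, and `empirical_intervals` are made up for this example, the means and standard deviations are the ones quoted above, and the 99.7% figure is the usual Empirical Rule value for three standard deviations.

```python
# Means and standard deviations (in grams) quoted in this activity.
ORGANS = {
    "liver": (0.999, 0.087),
    "spleen": (0.086, 0.007),
}

# Approximate Empirical Rule coverage within k = 1, 2, 3 standard deviations.
COVERAGE = {1: "68%", 2: "95%", 3: "99.7%"}


def empirical_intervals(mean, sd):
    """Return {k: (low, high)} for k = 1, 2, 3, rounded to three decimal places."""
    return {k: (round(mean - k * sd, 3), round(mean + k * sd, 3)) for k in COVERAGE}


for organ, (mean, sd) in ORGANS.items():
    for k, (low, high) in empirical_intervals(mean, sd).items():
        print(f"{organ}: about {COVERAGE[k]} of weights lie between {low:.3f} g and {high:.3f} g")
```

Under the two-standard-deviation convention used in this activity, a weight outside the `k = 2` interval would be considered unusual.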
Z-Scores
A higher organ weight is an indicator of higher toxicity. Suppose a mouse has a liver weight of [latex]1.07[/latex] g and a spleen weight of [latex]0.104[/latex] g. Is either of these values extreme? How many standard deviations from the mean do these values lie, and in what direction? We can use the z-score for each of these values to help us answer these questions. In the following questions, calculate the z-score for each weight, then interpret it. Remember, a z-score is a number of standard deviations and has no units associated with it. It gives only the relative position (distance and direction) of an observation with respect to the mean of a quantitative variable.
recall
In the following questions, you’ll need to calculate and interpret z-scores. Take a moment to review the formula if needed.
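For reference, the z-score is usually written as follows. The symbols are notation assumed for this sketch, since the activity does not fix any: [latex]x[/latex] is the observation, [latex]\mu[/latex] is the mean, and [latex]\sigma[/latex] is the standard deviation.

[latex]z = \dfrac{x - \mu}{\sigma}[/latex]

A positive z-score places the observation above the mean, and a negative z-score places it below the mean.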
Core skill:
question 5
question 6
question 7
video placement
[wrap-up video: In the final question of this activity, you compared two organ weights, one liver and one spleen, to determine which had a higher level of toxicity. But the distributions of liver and spleen weights didn’t have the same mean, so simply comparing one weight to the other wouldn’t help. Mouse spleens are naturally much lighter than mouse livers. You needed to compare their “unusualness” instead. To do so, you calculated z-scores for each weight. This let you determine which of the two was further from the mean weight for all such mouse organs [voice over the Empirical graph again here], which let you know which of the two was relatively heavier for its type. Remember that by calculating the z-score, you are calculating a distance in the distribution, not a weight in grams. Z-scores have no units associated with them. You found that the spleen showed a higher level of toxicity because the weight of the spleen was unusual, at 2.571 standard deviations above the mean. The weight of the liver, by contrast, was only 0.816 standard deviations away, within the middle 68% of all mouse liver weights.]
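The z-score comparison described in this section can be checked with a short Python sketch. The names `z_score`, `observations`, `z_scores`, and `most_extreme` are illustrative only, and the cutoff of two standard deviations is the convention this activity uses for "unusual", not a universal rule.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean (unitless)."""
    return (x - mean) / sd


# (observed weight, mean, standard deviation), all in grams, from this activity.
observations = {
    "liver": (1.07, 0.999, 0.087),
    "spleen": (0.104, 0.086, 0.007),
}

# Round to three decimal places, as the activity asks.
z_scores = {organ: round(z_score(*values), 3) for organ, values in observations.items()}

for organ, z in z_scores.items():
    # An observation at least two standard deviations from the mean is flagged.
    label = "unusual" if abs(z) >= 2 else "not unusual"
    print(f"{organ}: z = {z:.3f} ({label})")

# The organ whose weight is farthest from its own mean, in standard deviations.
most_extreme = max(z_scores, key=lambda organ: abs(z_scores[organ]))
print(f"Relatively heavier for its type: {most_extreme}")
```

Because both weights are converted to the same unitless scale, the comparison is fair even though livers and spleens have very different typical weights.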
- Sellers, R. S., Mortan, D., Michael, B., Bindhu, M., Roome, N., Johnson, J. K., Yano, B. L., Perry, R., & Schafer, K. (2007). Society of toxicologic pathology position paper: Organ weight recommendations for toxicology studies. Toxicologic Pathology, 35(5), 751-755. https://doi.org/10.1080/01926230701595300
- What is a genome? (2017, January 6). Yourgenome. Retrieved from https://www.yourgenome.org/facts/what-is-a-genome
- Definition of toxicity. (2021, March 29). RxList. Retrieved from https://www.rxlist.com/toxicity/definition.htm
- Lazic, S. E., Semenova, E., & Williams, D. P. (2020, April 20). Determining organ weight toxicity with Bayesian causal models: Improving on the analysis of relative organ weights. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7170916/