Epidemiology is a science that studies the causes and effects of health-related events as they occur in populations. Disease, defined as a deviation from health, is one such health-related event of concern to epidemiologists, so in that regard, epidemiology is often thought of as the study of disease in populations.
Although epidemiology originated in investigations of epidemics of infectious disease, modern epidemiology has expanded to include not only contagious diseases but also environmental connections to disease states and even accidental injuries. Epidemiologists gather data on the frequency of various diseases in populations and correlate those frequencies with risk factors associated with disease development.
The information compiled by epidemiologists provides the foundation for the concept of “public health.” The focus of public health is to prevent and manage diseases, injuries, and other conditions that threaten human health. Keeping track of the number of people who acquire or have a particular health-related condition guides the deployment of interventions, distribution of grant funding for research on particular diseases, and development of public health policy.
In the United States, the Centers for Disease Control and Prevention (CDC) is the arm of the federal government responsible for promoting and protecting public health. On the infectious disease front, the CDC receives reports on the occurrence of certain infectious diseases, called notifiable diseases, from regions in the United States and its territories. The data received from state and local health agencies each week is compiled into a large searchable database called the National Notifiable Diseases Surveillance System (NNDSS) and published in the Morbidity and Mortality Weekly Report (MMWR), which is available in both print and electronic formats.
The data maintained within the NNDSS tables is available for retrospective analysis and also used to predict trends in disease occurrence in populations by time and place.
Two important measurements of disease occurrence and distribution are morbidity (illnesses due to a disease) and mortality (deaths due to a disease). The morbidity of a specific disease is defined as the number of susceptible people in a population that have the disease during a specific period of time, and is usually expressed as a rate. Mortality may also be expressed as a rate, and reflects the number of deaths due to a particular disease in a population over time.
Frequency of Disease in a Population
The frequency at which a disease occurs in a population is a way to assess risk and disease impact. One way to measure disease frequency is to simply count how many people are afflicted with it in a given period of time. However, using simple counts prevents comparison among populations, which may vary vastly in size. Therefore, disease frequency is usually expressed as a proportion of the number of people affected by the disease to the population size, over a specified time period.
Two specific statistical measures widely used in epidemiological investigations are incidence and prevalence. Incidence is a measure of the number of NEW cases of a disease during a specific time period. Incidence is used as a way to understand risk factors, such as the cause of a health-related event or concern for disease spread. Prevalence refers to the total number of both new and existing cases in a population over time, and provides an indication of the overall health of the population during a time period.
Both of these statistics are measures of disease over time. For this reason, they are often expressed as a rate:
Incidence rate = Number of new cases of a disease in a population ÷ Number of at-risk people during a time period
Prevalence rate = Number of cases of a disease in a population ÷ Number of at-risk people during a time period
Because the number of cases of any disease may be small, and the size of the population under study may be very large, the resulting number may be so exceptionally small that it is perceived to be of no consequence. Therefore, these measures are often expressed as a percent, or multiplied by a factor of 100, 1,000, or even 100,000 so that the rates are expressed in number of people per 100, 1,000, or 100,000 individuals, respectively. For example, if over the course of one year, five women in a study population of 200 women (5/200) develop breast cancer, then the calculated incidence of breast cancer in this population is 0.025. Such a small number might lead some people to presume their disease risk is also small. Therefore, the incidence may be expressed as a percent (2.5%), or a multiplier can be used to express the disease rate as 25 breast cancer cases per 1,000 women per year.
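The scaling described above can be sketched in a few lines of Python. This is a minimal illustration that uses the breast cancer figures from the text; the function name `rate_per` is a hypothetical helper chosen for this example, not part of any epidemiology library.

```python
def rate_per(cases, population, multiplier=1):
    """Return cases per `multiplier` people (multiplier=1 gives the raw proportion)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that round figures stay exact.
    return cases * multiplier / population

# Figures from the text: 5 new cases among 200 women over one year.
new_cases = 5
women_at_risk = 200

print(f"proportion: {rate_per(new_cases, women_at_risk):.3f}")         # 0.025
print(f"percent:    {rate_per(new_cases, women_at_risk, 100):.1f}%")   # 2.5%
print(f"per 1,000:  {rate_per(new_cases, women_at_risk, 1000):.0f}")   # 25
```

The multiplier changes only how the rate is reported; the underlying proportion is the same in all three lines.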
Prevalence estimates the likelihood that someone in a group will have a disease, and is often used as an indicator of the overall healthcare burden of a disease. Prevalence is highly dependent on the duration of the morbidity associated with the disease. The prevalence of chronic diseases will continually increase as the cases accumulate over time since it is a measure of both new and existing cases.
For example, a survey asking about personal experience with colon cancer was provided to 80,000 people, with 2,400 responding that they had been recently diagnosed with the disease, and 7,000 people responding that they’d had the disease for more than a year. The prevalence rate for colon cancer in this population can be determined by adding new and existing cases (9,400) and dividing by the size of the population (80,000). Therefore, the prevalence of colon cancer in this population is 0.1175, which can be expressed as 11.75%, or about 118 colon cancer cases per 1,000 people.
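The arithmetic in the colon cancer example can be checked with a short Python sketch. The variable names are illustrative only; the numbers are the ones given in the survey example.

```python
# Survey figures from the example above.
new_cases = 2_400        # recently diagnosed
existing_cases = 7_000   # had the disease for more than a year
surveyed = 80_000        # size of the surveyed population

# Prevalence counts both new and existing cases.
total_cases = new_cases + existing_cases
prevalence = total_cases / surveyed
per_1000 = total_cases * 1000 / surveyed

print(total_cases)                      # 9400
print(f"{prevalence:.4f}")              # 0.1175
print(f"{prevalence:.2%}")              # 11.75%
print(f"{per_1000:.1f} per 1,000")      # 117.5 per 1,000
```

The unrounded value is 117.5 cases per 1,000 people, which is reported as 118 when rounded to a whole number of cases.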
Incidence and prevalence are two fundamentally different statistics. Keeping track of new cases of a disease requires an extensive network of reporting, while prevalence can be determined by surveying members of a population at a given point in time. Although there are limitations, if the disease is fairly stable in the population, has an average time of duration, and is not irreversible, incidence can be estimated using the prevalence data, and vice versa, using the following relationships (where duration refers to the average amount of time a person is sick with the disease):
Prevalence rate = Incidence x duration (in days, weeks, or months) of the disease
Incidence rate = Prevalence / duration (in days, weeks, or months) of the disease
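The two relationships above can be sketched as a pair of Python functions. The function names and the input values below are hypothetical and chosen only to illustrate the unit handling; they are not taken from any study. The duration must be expressed in the same time unit as the incidence rate (for an annual incidence, the duration is in years).

```python
def prevalence_from_incidence(incidence_rate, duration):
    """Prevalence = incidence rate x average duration of illness."""
    return incidence_rate * duration

def incidence_from_prevalence(prevalence, duration):
    """Incidence rate = prevalence / average duration of illness."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return prevalence / duration

# Hypothetical inputs (assumptions for illustration only):
prevalence = 0.02       # 20 cases per 1,000 people at the time of a survey
duration_years = 0.5    # average illness lasts six months

annual_incidence = incidence_from_prevalence(prevalence, duration_years)
print(f"{annual_incidence * 1000:.0f} new cases per 1,000 people per year")  # 40

# Converting back recovers the starting prevalence.
print(f"{prevalence_from_incidence(annual_incidence, duration_years):.2f}")  # 0.02
```

A short illness gives an incidence that is higher than the prevalence, because each case contributes to the prevalence count for only a fraction of the year.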
Example: A prevalence survey conducted in upstate New York in 2013 revealed that 200 people in a study population of 16,000 Saratoga County residents were diagnosed with anaplasmosis, a bacterial disease transmitted to humans by ticks. For appropriately treated patients, the average amount of time that a person is sick with this disease is approximately four weeks.
1. What is the prevalence of anaplasmosis in Saratoga County, expressed per 1,000 people?
2. What is the estimated annual incidence of anaplasmosis per 1,000 Saratoga County residents? (Hint: Because this asks for the annual incidence, time should be expressed in years.)
3. In 2013, there were 223,865 people living in Saratoga County. Therefore, how many of those people would be expected to have anaplasmosis in 2013?
Measures of Association
Measuring the frequency of health-related events in populations is a useful way to assess and compare the health status of people in a population at one time, at different times, among subgroups of the population, or between populations. However, knowing how frequently a disease occurs in a single group does not indicate whether being a member of that group increases a person’s risk of experiencing a specific health-related event.
Therefore, identifying the cause of a health-related event in epidemiology usually includes comparing disease rates between groups of people who differ by exposure. By measuring and comparing the frequency of health-related events between groups where one is exposed and one is not, it is possible to evaluate whether there is an association between a particular risk factor (such as smoking) and a positive or negative impact on health (such as cardiovascular disease).
For cohort studies, which involve a group of people who share the same experiences, epidemiologists may make comparisons of disease frequency by calculating ratios of the variables. The risk ratio (also known as relative risk) gives an indication of the strength of the association between a factor and a disease or other health outcome. To calculate the relative risk, the incidence of the health-related event in a group that was exposed to the condition or variable is divided by the incidence of the same event in the group that was not exposed. In general, a calculated risk ratio equal to or close to one indicates that there is no difference in risk, because the incidence is approximately equal in both groups. Ratios greater than or less than one suggest higher or lower risk, respectively.
To calculate relative risk in a study involving a cohort, the conventional method is to organize the data in a format known in statistics as a “2 x 2” table. An example is shown in Table 1:
Table 1. Standard 2 x 2 table for relative risk calculation.
Group | Outcome: Yes | Outcome: No | Total | Incidence of outcome
Exposed | 16 | 108 | 124 | 16/124 = 0.13
Not Exposed | 14 | 341 | 355 | 14/355 = 0.04
Relative risk is calculated by dividing the incidence of the health event for the exposed group by the incidence of the health event in the unexposed group:
RR = incidence of outcome in exposed group / incidence of outcome of non-exposed group
RR = 0.13/0.04 = 3.25
In this case, because the calculated value is more than one, there is an increased risk associated with exposure to the risk factor. Specifically, the people in the exposed group were 3.25 times more likely to have the health event than those in the non-exposed group.
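The relative risk calculation for Table 1 can be sketched in Python as follows. The function names are hypothetical helpers written for this example. The sketch also shows that rounding each incidence to two decimal places before dividing, as done in the text, gives a slightly different result from dividing the unrounded incidences.

```python
def incidence(cases, total):
    """Proportion of a group that experienced the outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    unexposed_incidence = incidence(unexposed_cases, unexposed_total)
    if unexposed_incidence == 0:
        raise ValueError("relative risk is undefined when the unexposed incidence is zero")
    return incidence(exposed_cases, exposed_total) / unexposed_incidence

# Counts from Table 1.
rr_exact = relative_risk(16, 124, 14, 355)

# Same calculation, rounding each incidence to two decimals first (as in the text).
rr_rounded = round(incidence(16, 124), 2) / round(incidence(14, 355), 2)

print(f"unrounded incidences: RR = {rr_exact:.2f}")    # 3.27
print(f"rounded incidences:   RR = {rr_rounded:.2f}")  # 3.25
```

Both values lead to the same interpretation here, but carrying the unrounded incidences through the division avoids compounding rounding error when the incidences are small.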
Example: To determine if patients who take prophylactic antibiotics before surgery are more or less likely to develop a hospital-acquired infection (HAI) of the wound, two groups of surgery patients were compared. One group with eighty participants took an antibiotic prior to surgery, and a second group of seventy patients did not take the antibiotic. Six people in the antibiotic group developed an HAI after surgery, and nine people in the no antibiotic group ended up with an HAI. Calculate the relative risk for this health-related event.
Table 2. Relative Risk Example
Group | HAI | No HAI | Total | Incidence of outcome
Antibiotic | 6 | 74 | 80 | 6/80 = 0.075
No antibiotic | 9 | 61 | 70 | 9/70 = 0.13
RR = incidence of HAI for exposed group / incidence for non-exposed group
RR = 0.075/0.13 = 0.58
Because the relative risk is less than one, a patient who is given an antibiotic before surgery has a reduced risk of getting a hospital-acquired infection. Specifically, someone who gets a pre-surgery antibiotic has 0.58 times the risk of an HAI, meaning that taking a pre-surgery antibiotic reduces the risk of HAI by about 42%, or almost half.
Another option to compare frequencies of health events is to calculate the risk difference, in which the difference between the two measures is determined by subtraction. The risk difference provides a measure of the public health impact of the risk factor and indicates how the health event might be prevented if the risk factor were eliminated.
The cohort study above examined whether prophylactic antibiotics reduced the risk of getting a hospital-acquired infection for patients. Note that the incidence of HAI in the antibiotic group was 75 per 1,000 people, and the incidence of HAI in the no antibiotic group was 130 per 1,000. The difference between these two values (55) indicates the number of HAI cases per 1,000 patients that could be prevented through prophylactic antibiotics before surgery. In this case, HAI would be prevented for 55 people (per thousand) if they were given an antibiotic before surgery.
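The risk difference for the antibiotic study can be sketched in Python using the counts from Table 2. The function name `risk_difference_per` is a hypothetical helper for this example. As with relative risk, rounding the incidences before subtracting changes the result slightly.

```python
def risk_difference_per(cases_without, total_without, cases_with, total_with, per=1000):
    """Incidence without the intervention minus incidence with it, per `per` people."""
    return (cases_without / total_without - cases_with / total_with) * per

# Counts from Table 2: 9 of 70 without the antibiotic, 6 of 80 with it.
rd_exact = risk_difference_per(9, 70, 6, 80)

# Same subtraction using the rounded incidences quoted in the text (0.13 and 0.075).
rd_rounded = round(9 / 70, 2) * 1000 - round(6 / 80, 3) * 1000

print(f"unrounded: {rd_exact:.1f} fewer HAI cases per 1,000 patients")   # 53.6
print(f"rounded:   {rd_rounded:.0f} fewer HAI cases per 1,000 patients") # 55
```

The unrounded calculation gives about 54 cases prevented per 1,000 patients; the figure of 55 comes from rounding 9/70 up to 0.13 before subtracting.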
Example: To determine if people who take a proton-pump inhibitor (PPI) to combat heartburn are more or less likely to develop gastroesophageal reflux disease (GERD), two groups of patients were compared. One group with 43 participants took the PPI daily, and a second group with 39 patients did not. After 3 months, 6 people in the PPI group developed GERD, while 5 people in the no PPI group developed GERD. Calculate the risk difference and indicate whether taking a PPI reduces the risk for GERD.
Using a case-control (as opposed to a cohort) study, relative risk is also a way for epidemiologists to track risk factors associated with disease outbreaks and potentially assign a cause, such as during a sporadic outbreak of a food-borne disease.
Example: On February 12, 2014 a forty-three-year-old man in New York was hospitalized with a one-week history of diarrhea and vomiting followed by fever, neck pain, and headache. This was the first reported (index) case of a sporadic outbreak of listeriosis, a disease caused by the bacterium Listeria monocytogenes. Almost everyone who is diagnosed with listeriosis has an invasive infection, meaning that the bacteria spread from their intestines to their bloodstream or other body sites, including the central nervous system.
An epidemiological investigation of this event identified 630 laboratory-confirmed listeriosis cases across 11 states. To identify the source of the bacteria, a case-control study was conducted to compare the foods eaten by 52 of the patients with confirmed cases with those eaten by a group of 48 healthy controls who were matched to the case patients by gender, age, and geographic location. All 100 people were asked to complete a questionnaire about the foods they had eaten just prior to the index case report. The data is illustrated in Table 3.
Table 3. Questionnaire data
Food item | Ate food: Sick | Ate food: Not Sick | Did not eat food: Sick | Did not eat food: Not Sick
Weiner brand hot dog | 24 | 28 | 22 | 26 |
Raggle brand sausage | 20 | 32 | 29 | 19 |
Dairydelish yogurt | 38 | 14 | 13 | 35 |
Yummyum ice cream bar | 28 | 24 | 23 | 25 |
So… which food was contaminated? Calculate the relative risk for each food, and the highest number wins. Start by calculating the incidence for each group (first food item is shown):
Food item | Incidence exposed | Incidence not exposed | RR
Weiner hot dogs | 24/52 = 0.46 | 22/48 = 0.46 | 0.46/0.46 = 1.0
Raggle sausage | __________ | __________ | __________
Dairydelish | __________ | __________ | __________
Yummyum | __________ | __________ | __________
Based on your calculations, which food is associated with this food-borne outbreak of Listeria?
Epidemiology Problem
On July 30, 2013, the New York State Department of Health received a complaint from a person who said that he and his entire family had become very ill with vomiting and diarrhea after eating at a particular restaurant. He went on to say that his two-year-old son, Devin, became so dehydrated that he required hospitalization. After rehydration therapy, Devin was well enough to return home. A specimen taken from Devin’s stool was cultured on several types of media, including Sorbitol-MacConkey (SMAC) Agar, Salmonella-Shigella (SS) Agar, and Mannitol Salt Agar (MSA). Pink colonies grew on the SMAC plates, but no colonies appeared on the MSA plate. Pertinent results and additional tests are provided in Table 4, or your instructor may provide you with the actual media containing the cultures you should use in this analysis.
Table 4. Laboratory results for bacteria cultured from stool specimen.
Gram stain | MacConkey Agar | Catalase | Oxidase | TSI |
Pink single bacilli | Colonies were translucent and beige colored | Bubbles formed when H2O2 was added | No color change when smeared on a DrySlide Oxidase card | K/A H2S gas |
Based on the laboratory results, what bacterial genus is the most likely cause of Devin’s illness?
___________________________________________________________________________
What is the name of the gastrointestinal disease caused by infection with this bacterium?
___________________________________________________________________________
Over the next 10 days, the hospital where Devin had been treated saw an additional 19 cases of rapid-onset gastroenteritis in people who dined at the same restaurant as Devin and his family. The Department of Health initiated an investigation, which included interviewing restaurant staff and people who had eaten there at some point over the previous two weeks. Samples of food taken from the restaurant at the time of the interviews did not test positive for any harmful bacterial agents.
To determine what food item might have been contaminated, a case-control study was conducted with the 19 people who developed food poisoning after dining at the restaurant matched with 20 controls, who had eaten at the restaurant but did not get sick. The responses were compiled and the data is shown in Table 5.
Table 5. Data from the case-control study
Food item | Exposed: Sick | Exposed: Not Sick | Not Exposed: Sick | Not Exposed: Not Sick
Hamburger | 8 | 11 | 9 | 11 |
Hot dog | 7 | 12 | 8 | 12 |
Fried chicken | 9 | 10 | 12 | 8 |
French fries | 10 | 9 | 11 | 9 |
Potato salad | 16 | 3 | 4 | 16 |
Soda | 11 | 8 | 11 | 9 |
Water | 9 | 10 | 6 | 14 |
Beer | 10 | 9 | 10 | 10 |
Can a particular food item be associated with the occurrence of disease among the people that ate at the restaurant? If yes, which food?
_________________________________________________________________________
Epidemiologists were interested in knowing if this was a sporadic outbreak or an indication of a disease becoming more common in the upstate New York region. Therefore, a quick analysis was performed by comparing the incidence of this disease at 4 times throughout the year of 2013, at weeks 12, 26, 40, and 52. Retrieve this data from the NNDSS database (you can find it at cdc.gov → MMWR → State Health Statistics → NNDSS Tables → Search Morbidity Tables).
Week 12: Number of reported cases __________
Week 26: Number of reported cases __________
Week 40: Number of reported cases __________
Week 52: Number of reported cases __________
In this case, the size of the population would be considered the same for each of the weeks; therefore, it is possible to compare the number of reported cases without calculating the incidence. From this data, what can you conclude overall about the occurrence of this disease in upstate New York? Is there any indication that we are on the verge of an epidemic of this disease?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________