Epidemiology

Abstract

Classical epidemiology is the study of the distribution and determinants of disease in populations. Clinical epidemiology applies the principles of epidemiology to improve the prevention, detection, and treatment of disease in patients. Epidemiological studies can be descriptive, in which case they investigate individual characteristics, places, and/or the time of events in relation to an outcome, or analytical, in which case they seek to determine the influence of an exposure on an outcome. Descriptive studies may take the form of case reports, case series, and ecological studies. Analytical studies can be further divided into experimental (e.g., randomized controlled trials) and observational (e.g., cohort or case-control studies) types. Several factors influence the strength of the clinical evidence that epidemiological studies contribute. Limiting bias and confounding and accounting for effect modification make conclusions drawn from studies more reliable.

In epidemiological studies, the strength of the relationship between two events is measured using ratios, rates, and proportions. This relationship can be presented in the form of a two-by-two table, which helps to visualize the number of false positive and true positive diagnostic test results, as well as the number of patients who actually have the disease and those who do not (tested with a gold standard test). A diagnostic test is considered precise if the results it yields are reproducible under similar conditions (reliable) and if it measures what it was developed to measure (valid). The higher a test's reliability and validity, the fewer random errors it will generate.

Also see statistical analysis of data.

Introduction to epidemiology

  • Classical epidemiology: the study of determinants and distribution of disease in populations
  • Clinical epidemiology: the study and application of principles of epidemiology to improve the prevention, detection, and treatment of disease in patients
  • Population (epidemiology): the total number of people or inhabitants in a country or region from which a sample is drawn for statistical measurement
  • Data (epidemiology): factual information, collected during observation and/or experimentation, that is used as a basis for analysis and discussion
  • Sample (epidemiology): a small group of people that are representative of a population
  • Endemic
    • Definition: a disease that affects individuals at a relatively constant and expected rate within a specific population/region
    • Time: unlimited
    • Area: limited
    • Possible factors
      • Local requirements
        • Spread of disease vectors and pathogen reservoirs
        • Geographical conditions
        • Climate
        • Living conditions (e.g., sewage systems, housing, work)
  • Epidemic
    • Definition: a disease that affects individuals at an unusually fast or unexpected rate within a specific population/region
    • Time: limited
    • Area: limited
    • Possible factors
      • Increased infectivity of a pathogen
      • Living conditions (e.g., living in crowded areas)
      • Spread/introduction of the pathogen to a new geographical area
  • Pandemic
    • Definition: worldwide epidemic
    • Time: limited
    • Area: unlimited
    • Example: Spanish flu (1918/19)
    • Possible factors
      • Global trade and travel
      • Increased infectivity of a pathogen (e.g., antigenic shift)

Epidemiological studies

Principles of study design

  • Study designs should be tailored to the question that needs to be answered.
  • A good study design with high levels of evidence increases the strength of conclusions drawn from the results.

Types of epidemiological studies

Description Example
Descriptive studies
  • Studies that try to identify individual characteristics (age, sex, occupation), place (e.g., residence, hospital), or time of events (e.g., during diagnosis, reporting) in relation to an outcome (e.g., disease).
Analytical studies
  • Studies that determine the relationship between an exposure and outcome
  • Always involve a comparison group

Interpretation

  • Epidemiological studies suggest relationships between two events (e.g., exposure and disease).
  • This comparison can be measured using rates, proportions, and/or ratios.
    • Ratios: comparison of two related or unrelated values
    • Proportion: comparison of one part of the population to the whole
    • Rates: measure of the frequency of an event in a population over a specific period of time
      • Crude: rates that apply to the entire population (do not take specific characteristics into account)
      • Specific: rates that apply to a population group with specific characteristics taken into account (e.g., sex-specific, age-specific)
      • Standardized (adjusted rates): crude rates that have been adjusted to consider specific population characteristics to allow for comparison (e.g., usually used in death rates)
  • These measures determine the strength of association between two events and allow us to describe population characteristics (e.g., detect populations at risk) and quantify morbidity/mortality.
  • Furthermore, researchers can eventually develop hypotheses about why these groups are at risk.
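
The difference between crude, specific, and standardized rates can be illustrated with a minimal Python sketch of direct standardization. All numbers, the choice of two age groups, and the standard population are hypothetical and serve only as an illustration; the function names are not taken from any library.

```python
def crude_rate(deaths, population):
    """Crude rate: all events divided by the entire population."""
    return sum(deaths) / sum(population)


def directly_standardized_rate(deaths, population, standard_population):
    """Weight each stratum-specific rate by the standard population's share."""
    total_standard = sum(standard_population)
    rate = 0.0
    for d, n, s in zip(deaths, population, standard_population):
        specific_rate = d / n  # e.g., an age-specific rate
        rate += specific_rate * (s / total_standard)
    return rate


# Hypothetical data for two age groups (young, old) in two regions.
deaths_a, pop_a = [9, 90], [9000, 1000]    # region A: mostly young
deaths_b, pop_b = [1, 810], [1000, 9000]   # region B: mostly old
standard = [5000, 5000]                    # assumed standard population

crude_a = crude_rate(deaths_a, pop_a)
crude_b = crude_rate(deaths_b, pop_b)
std_a = directly_standardized_rate(deaths_a, pop_a, standard)
std_b = directly_standardized_rate(deaths_b, pop_b, standard)

print(f"crude: A={crude_a:.4f}, B={crude_b:.4f}")
print(f"standardized: A={std_a:.4f}, B={std_b:.4f}")
```

In this made-up example the age-specific rates are identical in both regions, so the crude rates differ only because of the age structure, and the standardized rates coincide.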

Descriptive studies

Case report

  • Description: a report of a disease presentation, treatment, and outcome in a single subject or event
  • Example: report of a single case of cervical cancer in a 25-year-old female subject

Case series report

  • Description: a report of a disease course/response to treatment compiled by aggregating several similar patient cases
  • Example: collecting and reporting several cases of pericarditis at a local hospital

Ecological study

  • Aim: to identify an exposure associated with an outcome (e.g., disease), especially if the outcome is rare
  • Study method: assesses aggregated data where at least one variable (e.g., an outcome) is at a population level and not an individual level
  • Example: determining the incidence of cholera deaths based on specific locations (e.g., different parts of a city) to identify the exposure (e.g., water from a single contaminated pump)

Analytical studies

Experimental studies

Randomized controlled trials (RCT; interventional studies)

  • Aim: determines the possible effect of a specific intervention on a population of interest
  • Study method: patients are randomly allocated as either treatment or control subjects, after which they are monitored and evaluated for the outcome of interest
  • Special variants
    • Blinding: the practice of not informing an individual or group about which individuals are a control or treatment candidate; used to minimize bias
      • Single-blind study: Only researchers know who is a control or treatment candidate.
      • Double-blind study: Neither the researcher nor the study participants know who is a control or treatment candidate.
      • Triple-blind study: The researcher, the study participants, and the person who analyzes the data do not know who is a control or treatment candidate.
    • Cluster randomized controlled trials
      • Different participants are grouped together into clusters and then randomly assigned to the control or intervention groups.
      • A cluster RCT is easier to perform than a classical RCT, but may have less validity than a classical RCT.

Field trials

  • Aim: determines the effect of disease-preventing interventions in non-institutionalized individuals
  • Example: following subjects who have received the Salk vaccine for prevention of poliomyelitis

Community trials

  • Aim: similar to field trials, but follows entire communities instead
  • Example: following communities who implement lifestyle changes to prevent cardiovascular disease

Clinical drug trials

  • Definition: studies involving human subjects to assess new health interventions to provide safe and effective medical care
  • Compares the benefits of a single treatment vs. a placebo or between 2 or more drugs

  • Phases of clinical drug trials
    • Preclinical studies
      • Study population: animals
      • Research aim: determine the effect, dose, and side effects (teratogenic/carcinogenic potential) of the drug
    • Phase 0
      • Study population: small number of healthy individuals
      • Research aim: test subtherapeutic doses of a new drug to determine preliminary pharmacokinetic and/or pharmacodynamic properties of the drug
    • Phase I
      • Study population: small number of healthy individuals
      • Research aim: determine the side effects, toxicity, pharmacokinetics, and pharmacodynamics of the drug
    • Phase II
      • Study population: small number of patients with a specific disease
      • Research aim: determine the efficacy, effective dosing, and side effects of the drug
    • Phase III
      • Study population: randomized controlled trial with a large number of patients with a specific disease
      • Research aim: compare the new drug with current treatment options or placebo
    • Phase IV
      • Study population: large number of patients with a specific disease after drug approval
      • Research aim: ascertain the effects of long-term therapy and effects on special patient groups (e.g., patients with chronic renal failure); can lead to withdrawal of a drug from the market

Clinical drug trials assess adverse event rates and drug interactions. They can be used to develop warnings and precautions as well as contraindications for the use of a drug. For example, if the rate of hyperglycemia is significantly higher in the treatment group than in the control group, the drug may not be appropriate for use in patients with diabetes mellitus.

Factorial study

  • Aim: to test the effect and interactions of two or more factors (e.g., treatments)
  • Study method: Individuals are randomly assigned to groups receiving different doses and combinations of drugs.
  • Example: In order to study 5 dose levels of a drug X and 2 dose levels of drug Y, 10 different intervention combinations should be examined.
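
The number of intervention combinations in the example above is the product of the dose levels of each factor. A minimal Python sketch, in which the dose labels are hypothetical placeholders:

```python
from itertools import product

drug_x_doses = [1, 2, 3, 4, 5]  # 5 hypothetical dose levels of drug X
drug_y_doses = [1, 2]           # 2 hypothetical dose levels of drug Y

# Each study arm is one (dose of X, dose of Y) combination.
arms = list(product(drug_x_doses, drug_y_doses))
print(len(arms))  # 5 * 2 combinations
```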

Crossover study

  • Aim: to obtain a more efficient comparison of treatments with fewer patients
  • Study method: each patient switches from one treatment to another during the trial period and serves as their own control
  • Example: each patient receives both drug X and drug Y, but at different time periods during the study

Observational studies

Cross-sectional study (prevalence study)

  • Aim: to determine the prevalence of exposure and disease
  • Study method: the prevalence of disease and other variables (e.g., risk factors) are measured simultaneously at a particular point in time (i.e., a snapshot of the population)
  • Example: investigating the number of patients with both coronary heart disease as well as hypertension in the year 1998

Case-control study

  • Aim: to study if an exposure is associated with an outcome (e.g., disease)
  • Study method
    1. Researchers begin by selecting patients with the disease (cases) and without the disease (controls) with matching baseline characteristics from the same source population.
    2. The observer compares the presence of risk factors between these two groups.
    3. The odds ratio is then determined between these groups.
  • Example: determining the link between cervical cancer and human papillomavirus (HPV) exposure by comparing otherwise similar (e.g., same age) patients with and without histologically confirmed cervical cancer

Cohort study

  • Aim: to study the incidence rate and whether the exposure is associated with the outcome of interest (e.g., a disease)
  • Study method and examples
    • Retrospective cohort study
      • Starts with individuals who are either exposed or not exposed to a particular risk factor (e.g., smoking)
      • A review of medical records of patients in both groups is then conducted to determine if the disease of interest (e.g., lung cancer) has developed.
    • Prospective cohort study
      • Starts with individuals who are either exposed or not exposed to a particular risk factor (e.g., smoking)
      • These two groups are then followed for a period of time to see if the disease of interest (e.g., lung cancer) develops.

Twin concordance study

  • Aim: determines the heritability of disease vs. environmental risk factors
  • Study method: comparing the frequency of disease in twins (monozygotic or dizygotic)
  • Example: twins are followed over a 30-year period, following the diagnosis of Hodgkin disease in the first twin, to see if the frequency of cancer differed between monozygotic and dizygotic twins

Adoption study

  • Aim: determines the heritability of disease vs. environmental risk factors
  • Study method: comparing the frequency of disease in adopted children vs. children who live with their biological parents
  • Example: the prevalence of schizophrenia in adopted children and the prevalence of schizophrenia in children who live with their biological parents is compared to determine the influence of genetic and environmental factors on schizophrenia

Randomized controlled trials are considered the gold standard for clinical trials!

A case-control study compares a small population group over a short period of time (less cost-intensive) and determines how multiple exposures lead to one outcome; a cohort study compares a large population over a long period of time (more cost-intensive) and determines how one exposure leads to multiple outcomes!

In cohort studies, researchers select individuals based on exposure first and then determine if these individuals develop a disease. This is in contrast to case-control studies, in which patients with disease (cases) and those without disease (controls) are selected first to determine if they were exposed or not!

Other types of studies

Survival analysis (prognosis study)

  • Survival analysis is used to measure disease prognosis.
  • Survival analysis is always prospective in nature:
    • Time-to-event analysis: Individuals are followed from a defined starting point until an event occurs (e.g., from the onset of a disease until death, or from exposure to a risk factor until the onset of disease).
    • Five-year survival rate: the percentage of patients with a particular disease who have survived for 5 years after the initial diagnosis
  • Pitfalls of survival analysis
    • No prediction can be made about the average duration of survival in the case of subjects who did not die within the period of observation. (These subjects are called “censored cases”.)
    • Patients may drop out or die before the end of the follow-up period.
  • Kaplan-Meier analysis
    • Allows survival analysis to be displayed graphically
    • Overcomes problems associated with regular survival analysis
    • Used to analyze incomplete survival data
      • Ideal for a small number of cases and to describe the survival of a cohort
      • Allows survival over time to be estimated even when individuals are studied over different time intervals
      • The horizontal axis represents the time of follow-up
      • The vertical axis represents the estimated probability of survival (the Kaplan-Meier estimator); the curve is a step function that drops at each time point at which an event occurs
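
How a Kaplan-Meier curve handles censored cases can be sketched in a few lines of Python. This is a minimal illustration and not a replacement for a statistics package; the function name `kaplan_meier` and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival probability)] for each event time.

    times  -- follow-up time of each subject
    events -- True if the event (e.g., death) was observed at that time,
              False if the subject was censored (e.g., lost to follow-up)
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        # Subjects who experienced the event exactly at time t.
        died = sum(1 for x, e in zip(times, events) if x == t and e)
        if died:
            survival *= 1 - died / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up data in months; False marks a censored subject.
times = [2, 3, 3, 5, 8, 8]
events = [True, True, False, True, False, True]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored subjects count toward the number at risk until they leave the study but never trigger a drop in the curve, which is why incomplete survival data can still be used.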

Meta-analysis

  • Data from multiple studies is systematically assessed.
  • Aims to increase statistical power and to identify differences between and/or common effects in individual studies (more precise results)
  • Limiting factors depend on the individual study types; a meta-analysis is only as good as the individual studies used.

Registry study

  • Brief description: a retrospective study that uses data obtained from disease registries (e.g., cancer registries)
    • Criteria for a good quality cancer registry:
      • Complete entries
      • Low percentage of cases with a DCO (death certificate only)

Descriptive studies

  • Characteristics: no intervention; instead, patients are observed and the clinical course of the disease is studied. → The observations are used to form a hypothesis.
  • Examples of descriptive studies:
    • Incidence study
      • Incidence studies are used to determine the incidence of a particular event in a population during a certain time period (usually a year). If the event in consideration is death, the study is called a mortality study.
      • Incidence studies are usually performed as cohort studies in order to compare the incidence of an event (e.g., disease) between two groups.
    • Correlation study
      • The unit of analysis is the entire population.
      • Any conclusions that are drawn from the correlation study can only be applied to the entire population and not to an individual.
      • Correlation studies help form hypotheses but cannot be used to test them!
      • E.g.: a study to look at correlation between consumption of wine and death due to cardiovascular disease.

Measures of disease frequency

Morbidity, incidence, and prevalence

  • Morbidity: the disease burden in a population
  • Incidence rate
    • Description: the number of new cases of disease per unit of time
    • Formula: number of new cases/person-time units
  • Prevalence
    • Description
      • The ratio of all people with a disease to the total number of people in a population at a particular point in time
      • Corresponds to disease frequency
      • An increased prevalence of disease with a stable incidence can be explained by factors that result in increased survival and prolonged duration of the disease (e.g., improved quality of care of patients)
    • Formula: total number of cases/total population at a given point in time
  • Relationship between prevalence and incidence
    • If the population is in a steady state, the relationship between incidence rate (IR), prevalence (P), and the average duration of the disease (T) can be described mathematically as
      • P/(1-P) = IR × T
        OR
      • IR = (P / (1-P)) ÷ T
    • If the disease is extremely rare, P ≈ IR × T
    • The number of new cases per unit time can be given by the formula: IR × population at risk (population without the disease)
  • Cumulative incidence
    • Description
      • The proportion of new cases of disease (in an initially disease-free population) over a defined period of time
      • The term attack rate, a synonym of cumulative incidence, is usually used during a disease outbreak.
    • Formula: number of new cases in a given time period/population at risk in the same time interval

Prevalence is usually greater than the incidence of a long-lasting disease: incidence * average duration of disease = prevalence
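
The steady-state relationship between incidence rate, disease duration, and prevalence can be checked numerically. The following Python sketch uses hypothetical numbers; the function names are illustrative only.

```python
def incidence_rate(new_cases, person_time):
    """Number of new cases per person-time unit."""
    return new_cases / person_time


def steady_state_prevalence(ir, duration):
    """Solve P/(1-P) = IR × T for P."""
    odds = ir * duration
    return odds / (1 + odds)


# Hypothetical cohort: 20 new cases over 10,000 person-years of follow-up.
ir = incidence_rate(new_cases=20, person_time=10_000)

# Assumed average disease duration T of 5 years.
p_exact = steady_state_prevalence(ir, duration=5)
p_rare = ir * 5  # rare disease approximation: P ≈ IR × T

print(f"IR = {ir:.4f} per person-year")
print(f"P (exact) = {p_exact:.5f}, P (approximation) = {p_rare:.5f}")
```

For a rare disease the exact and approximate values are nearly identical; the gap widens as the prevalence grows.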

Birth, fertility, and mortality

  • Birth rate: the number of live births during a specific time interval
  • Fertility rate: rate of live births among women of childbearing age (15–44 years) in a population during a specific time interval
  • Mortality: the occurrence of death in a population
Measure Description Formula
Mortality rate (crude death rate)
  • The total mortality rate from all causes of death for a population in a specific time period
  • MR = (deaths/population) * 100
Fetal mortality rate
  • Yearly rate of fetal deaths
  • FMR = (number of fetal deaths/(number of live births + number of fetal deaths)) * 1000
Neonatal mortality rate
  • Yearly rate of neonatal deaths
  • NMR = (number of infant deaths during the first 28 days of life/total number of live births) * 1000
    • Late neonatal mortality = number of infant deaths during postnatal days 7–28/total number of live births
    • Early neonatal mortality = number of infant deaths during the first week after birth/total number of live births
Post neonatal mortality rate
  • Yearly rate of post-neonatal deaths (from 28 days up to, but not including, 1 year of age)
  • PNMR = (number of infant deaths between 28 and 365 days of age/total number of live births) * 1000
Infant mortality rate
  • Yearly rate of total infant deaths (from birth to 1 year of age)
  • IMR = (number of infant deaths during the first year after birth/total number of live births) * 1000
Perinatal mortality rate
  • Yearly rate of fetal deaths (stillbirths) and early neonatal deaths
Maternal mortality rate
  • The number of maternal deaths per 100,000 live births in the same year
  • MMR = (maternal deaths/live child births) * 100,000
Case fatality rate (lethality)
  • Percentage of cases (patients with a specific condition) that result in death within a specific time period
  • CFR = (number of deaths from a specific condition/number of cases with the same specific condition) * 100
Proportionate mortality rate
  • Percentage of deaths due to a specific cause at a specific time
  • PMR = (deaths from a specific cause in a year/total deaths from all causes in a year) * 100
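
Several of the formulas above share the same structure (events divided by a denominator, scaled by a multiplier). A minimal Python sketch with hypothetical counts for one year:

```python
def rate(events, denominator, multiplier):
    """Generic rate: events per `multiplier` units of the denominator."""
    return events / denominator * multiplier


# Hypothetical counts for one year in one region.
live_births = 50_000
neonatal_deaths = 150        # deaths during the first 28 days of life
post_neonatal_deaths = 100   # deaths from 28 days up to 1 year of age
maternal_deaths = 10

nmr = rate(neonatal_deaths, live_births, 1000)
pnmr = rate(post_neonatal_deaths, live_births, 1000)
imr = rate(neonatal_deaths + post_neonatal_deaths, live_births, 1000)
mmr = rate(maternal_deaths, live_births, 100_000)

# Case fatality rate: hypothetical 30 deaths among 200 cases of a condition.
cfr = rate(30, 200, 100)

print(f"NMR={nmr:.1f}, PNMR={pnmr:.1f}, IMR={imr:.1f} per 1000 live births")
print(f"MMR={mmr:.1f} per 100,000 live births, CFR={cfr:.1f}%")
```

Because neonatal and post-neonatal deaths together cover the first year of life, the infant mortality rate equals the sum of the two component rates.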

Leading causes of death by age in the US

Leading cause of death by age

1st 2nd 3rd
< 1 yr Congenital anomalies Preterm birth SIDS
1–4 yr Accident Congenital anomalies Homicide
5–14 yr Accident Cancer Suicide
15–34 yr Accident Suicide Homicide
35–44 yr Accident Cancer Heart disease
45–64 yr Cancer Heart disease Accident
65+ yr Heart disease Cancer Chronic respiratory disease

Measures of risk

  • Risk factors: variables or attributes that increase the probability of developing disease or injury
  • The variables in the formulas below refer to the cells of the following two-by-two table:
    • a: exposed, with disease
    • b: exposed, without disease
    • c: not exposed, with disease
    • d: not exposed, without disease

Absolute risk

  • Incidence rate
  • Measures the probability of acquiring disease/injury in a given study population
  • Used in cohort studies
  • Formula: (number of new cases) / (total individuals at risk of developing disease) = (a + c)/(a + b + c + d)

Relative risk (RR; risk ratio)

  • The risk of an outcome (e.g., disease) among one group compared to the risk among another group
  • Measures how strongly a risk factor in exposed individuals is associated with an outcome (e.g., death/injury/disease)
  • Used in cohort studies
  • Considered statistically significant if the corresponding p-value is < 0.05
  • Formula: (incidence of disease in exposed group) / (incidence of disease in unexposed group) = (a/(a + b))/(c/(c + d))

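Attributable risk (AR)

  • The difference in risk between the exposed and unexposed groups, i.e., the excess risk that can be attributed to the exposure
  • Formula: (incidence of disease in exposed group) – (incidence of disease in unexposed group) = (a/(a + b)) – (c/(c + d))
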
Attributable risk percent (ARP)

  • The percentage of incidence of disease among exposed individuals that can be attributed to the exposure
  • Formula
    • ARP = ((RR - 1)/RR) * 100
    • Alternatively, ARP = AR/(incidence of disease in exposed group) * 100

Odds ratio (OR)

  • Compares the odds of exposure in individuals with disease/injury to those without disease/injury
  • Used in case-control studies
  • Rare disease assumption: if the disease is rare, the OR approximates the RR
  • Formula
    • Odds
      • The probability of an event occurring divided by the probability of this event not occurring
      • Odds of exposure in cases = cases exposed/cases not exposed = a/c
      • Odds of exposure in controls = controls exposed/controls not exposed = b/d
    • OR = (odds of exposure in cases)/(odds of exposure in controls) = (a/c) / (b/d) = (a × d)/(b × c)
      • OR = 1 means the event is equally likely in both groups.
      • OR > 1 means the event is more likely to occur in the group exposed to the risk factor.
      • OR < 1 means the event is less likely to occur in the group exposed to the risk factor.
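
The risk measures defined so far can be computed directly from the cells a, b, c, and d of the two-by-two table. The Python sketch below uses hypothetical counts; attributable risk is computed as the risk difference between the exposed and unexposed groups.

```python
def relative_risk(a, b, c, d):
    """RR = (a/(a + b)) / (c/(c + d))."""
    return (a / (a + b)) / (c / (c + d))


def attributable_risk(a, b, c, d):
    """AR = risk in exposed group - risk in unexposed group."""
    return a / (a + b) - c / (c + d)


def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d)."""
    return (a / c) / (b / d)


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed get the disease.
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)
ar = attributable_risk(a, b, c, d)
arp = (rr - 1) / rr * 100
odds = odds_ratio(a, b, c, d)
print(f"RR={rr:.2f}, AR={ar:.2f}, ARP={arp:.1f}%, OR={odds:.2f}")

# Hypothetical rare disease: the OR is close to the RR.
rare = (3, 997, 1, 999)
rr_rare = relative_risk(*rare)
or_rare = odds_ratio(*rare)
print(f"rare disease: RR={rr_rare:.3f}, OR={or_rare:.3f}")
```

With a common outcome the OR overstates the RR, whereas in the rare disease example the two values nearly coincide, which is the rationale for the rare disease assumption.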

Relative risk reduction (RRR)

  • The proportion of decreased risk due to an intervention compared to the control group
  • Formula: 1 - RR

Absolute risk reduction (ARR)

  • The difference in risk as a result of an intervention compared to the control group (e.g., risk of death)
  • Formula: risk in control group – risk in intervention group = (c/(c + d)) – (a/(a + b))

Number needed to treat (NNT)

  • The number of individuals that must be treated, in a particular time period, for one person to benefit from treatment (i.e., not develop disease/injury)
  • Formula: 1/absolute risk reduction (ARR)

Number needed to harm (NNH)

  • The number of individuals who need to be exposed to a certain risk factor before one person develops disease/injury
  • Formula: 1/attributable risk (AR)

Number needed to screen (NNS)

  • The number of individuals who need to be screened in a particular time period in order to detect a single case of the disease
  • Formula: 1/absolute risk reduction
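
The formulas for RRR, ARR, NNT, and NNH can be worked through with a short Python sketch. The trial and exposure counts are hypothetical, and the intervention group is treated as the "exposed" row of the two-by-two table.

```python
def risk(events, total):
    """Proportion of a group that experiences the outcome."""
    return events / total


# Hypothetical trial: 10 of 100 treated and 20 of 100 control patients die.
treatment_risk = risk(10, 100)
control_risk = risk(20, 100)

rr = treatment_risk / control_risk    # relative risk
rrr = 1 - rr                          # relative risk reduction
arr = control_risk - treatment_risk   # absolute risk reduction
nnt = 1 / arr                         # number needed to treat

# Hypothetical harmful exposure: 5 of 100 exposed vs. 3 of 100 unexposed.
ar = risk(5, 100) - risk(3, 100)      # attributable risk
nnh = 1 / ar                          # number needed to harm

print(f"RR={rr:.2f}, RRR={rrr:.2f}, ARR={arr:.2f}, NNT={nnt:.0f}")
print(f"AR={ar:.2f}, NNH={nnh:.0f}")
```

In practice, NNT is conventionally rounded up to the next whole number because only whole patients can be treated.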

Hazard ratio

  • The measure of an effect of an intervention on an outcome (death/cure) over a period of time
  • Used in survival analysis
  • Formula
    • HR = (hazard rate in exposed group)/(hazard rate in unexposed group), where the hazard rate is the rate at which the event occurs at a given point in time among individuals who are still at risk
      • HR = 1: no relationship
      • HR > 1: the event (the outcome of interest, e.g., death, cure) is more likely to occur in the exposed group
      • HR < 1: the event is less likely to occur in the exposed group

Dose-response relationship (epidemiology)

  • One of the criteria required to establish causality in epidemiological studies
  • The other criteria for causality include:
    • Consistency (e.g., consistent results between different studies)
    • Strong correlation (e.g., a high relative risk)
    • Temporality (e.g., the exposure precedes the outcome)
    • Experimental evidence (e.g., includes both human and animal studies)
    • Biologic plausibility (e.g., a suitable theory: lung cancer can be caused by cigarette smoking, but not by drinking water)
    • Biologic coherence (e.g., the suspected causality is fitting with the natural history of the disease)
    • Specificity
    • Analogy
  • Refers to the presence of a dose-response curve (e.g., the presence of disease increases/decreases in direct proportion with the level of exposure)
  • A causal dose-response relationship assumes that the greater the exposure, the greater the risk of disease
  • Can be influenced by confounding

The relative risk, odds ratio, and hazard ratio are usually displayed with a corresponding p-value. Per convention, they are considered statistically significant if the related p-value is < 0.05!

Bias, confounding, effect modification, and latent period

Bias

  • Definition: an error in the study design or way in which it is conducted that causes systematic deviation of findings from the truth
Types of bias
Bias Problem Example Solution
Selection bias
  • Problem: The individuals in a sample group are not representative of the population from which the sample is drawn.
  • Examples
    • Healthy worker effect: The working population is healthier on average than the general population → Any sample consisting of only working individuals does not represent the general population.
    • Berkson bias: Sample groups drawn from a hospital population are more likely to be ill than the general population.
    • Non-response bias: Responder characteristics differ significantly from nonresponder characteristics because nonresponders do not return information during a study (e.g., patients do not return a call or do not respond to a written survey response).
    • Volunteer bias: Individuals who volunteer to participate in a study have different characteristics than the general population.
    • Attrition bias
      • Selective loss to follow-up of participants, especially in prospective studies
      • Risk of over- or underestimating the association between exposure and outcome because the remaining participants differ significantly from those lost to follow-up.
    • Survival bias
      • Also known as prevalence-incidence bias or Neyman bias
      • A type of selection bias in which those observed as having a disease have either more severe or less severe disease than is true for all those who truly have the disease. In comparison to the true population with disease:
        • If those with severe disease die before the moment of observation, those with less severe disease are observed.
        • If those with less severe disease have a resolution of their disease before the moment of observation, those with more severe disease will be observed.
      • Typically occurs in case-control and cross-sectional studies.
  • Solutions
    • Randomization
      • Subjects are randomly assigned to the intervention and control groups to ensure that both groups are roughly equal in baseline characteristics (often displayed in a table, e.g., in randomized controlled trials).
      • Controls for both known and unknown confounders
      • Successful if possible confounding characteristics (e.g., socioeconomic demographics, family history) are approximately equally distributed between the groups
    • Ensure the sample group is representative of the population of interest (e.g., in case-control studies).
    • Collect as much data on characteristics of the participants as possible.
    • Nonresponder characteristics should not be assumed and incorrectly included in data analysis; instead, undisclosed characteristics of nonresponders should be recorded as unknown.
    • Intention-to-treat analysis: All patients who initially enrolled in the study (including drop-outs) are included in the analysis of study data; helps to reduce selection bias; preserves randomization
Allocation bias
  • Problem: A systematic difference in the way that participants are assigned to treatment or intervention groups
  • Example: Assigning all female patients to one group and all male patients to another group
Recall bias
  • Problem: Awareness of condition by subjects changes their recall of related risk factors; common in retrospective studies
  • Example: Subjects recall a certain exposure after finding out about others with the same condition
Information bias
  • Problem: Incorrectly collected data
  • Examples
    • Insufficient information about exposure and disease frequency among subjects
    • Information is gathered differently between the treatment and control group
    • Reporting bias: selective disclosure or suppression of information or study results, resulting in underreporting or overreporting of exposure or outcome
    • Interviewer bias: different interviewing approaches towards exposed and unexposed groups, or cases and controls, prompt different responses and lead to the false conclusion that the groups differ systematically
  • Solution: Standardize data collection
Cognitive bias
  • Problem: Tendency to favor something because of personal beliefs or ideas
  • Examples
    • Response bias: study participants do not respond truthfully or accurately because of the manner in which questions are phrased (e.g., leading questions) and/or the possibility of more socially acceptable answer options; especially common in surveys
    • Observer bias (experimenter-expectancy effect or Pygmalion effect): measurement of a variable or classification of subjects is influenced by the experimenter's knowledge or expectations
    • Confirmation bias: the tendency of the investigator to include only those results which support his/her hypothesis and ignore the rest
    • Hawthorne effect: subjects change their behavior once they are aware that they are being observed; especially relevant for psychiatric research; difficult bias to eliminate
    • Placebo and nocebo effects: effect of the subject's preconceptions/beliefs on the outcome
  • Solutions
    • Use of placebo
    • Researchers are discreet about their observations
    • Prolong the time of observation to monitor long-term effects
    • Blinding
Lead-time bias
  • Problem
    • Lead time: the average length of time between detection of a disease and the predetermined outcome
    • Early detection of disease is misinterpreted as increased survival
Length-time bias
  • Problem
    • An apparent improvement in the duration of survival that arises because screening preferentially detects disease with a long clinical course (e.g., a slow-growing tumor)
    • Often discussed in the context of cancer screening
  • Solution: Arrange patients according to severity of disease
Surveillance bias
  • Problem
    • An outcome (e.g., disease) is diagnosed more frequently in a sample group than in the general population because of increased testing and monitoring.
    • Leads to falsely high incidence and prevalence rates
  • Example: Subjects who receive the trial treatment are monitored more frequently
  • Solutions
    • Comparing to an unexposed control group with a similar likelihood of screening
    • Selecting an outcome that is possible in both the exposed and unexposed group

Confounding

  • Definition: any third variable that has not been considered in the study but that correlates with the exposure and the outcome
  • Example: A confounder can be responsible for the observed relationship between the dependent and independent variables. For instance, while exposure to coal can result in lung cancer in individuals at a mining company, many miners smoke cigarettes, which acts as a third variable that can lead to lung cancer.
  • Solution
    • Perform multiple studies with different populations.
    • Randomization
    • Crossover study
    • Restriction (epidemiology)
      • The researcher only studies a part of the population that meets certain criteria (e.g., only males with a particular disease are included in a study to avoid the influence of gender on exposure and outcome)
      • Problems
    • Matching (epidemiology)
      • Commonly used in case-control studies
      • Cases and controls are grouped into pairs with similar attributes to avoid confounding
      • Problems
        • Matching does not completely eliminate confounding
        • Can introduce confounding if the investigators match by factors that are not matched in the source population
        • Can introduce bias
    • Standardization of data (see Z-score)
    • Stratified analysis
      • Study groups are divided into subgroups according to the third variable.
      • Measures of association (e.g., the odds ratio) can be calculated for each subgroup (e.g., stratum-specific odds ratios)
      • In confounding:
        • Stratifying participants into subgroups according to the third variable will eliminate the confounder
        • The measures of association between subgroups will be similar, but the stratified measure of association is different from the whole population measure of association (e.g., crude odds ratio)
      • In effect modification:
        • Stratifying participants into subgroups according to the third variable will result in a stronger relationship in one subgroup
        • The measures of association will differ between subgroups (i.e., there is a strong association in the subgroup in which the effect modifier is present, while there is no association in the subgroup in which the effect modifier is absent)
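
The difference between the two patterns can be sketched numerically. The counts below are hypothetical and were chosen only to make the patterns visible; the function names are illustrative, not part of any library.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio (a*d)/(b*c) of a two-by-two table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


def crude_and_stratified(strata):
    """Return (crude OR, {stratum: OR}) for a dict of two-by-two count tuples."""
    # Pool the strata cell by cell to obtain the whole-population table.
    pooled = [sum(cell) for cell in zip(*strata.values())]
    per_stratum = {name: odds_ratio(*cells) for name, cells in strata.items()}
    return odds_ratio(*pooled), per_stratum


# Hypothetical counts: (exposed cases, exposed non-cases,
#                       unexposed cases, unexposed non-cases)
# Confounding: coal exposure and lung cancer, stratified by smoking.
confounding = {"smokers": (80, 120, 20, 30), "nonsmokers": (5, 45, 20, 180)}
# Effect modification: drug and recovery, stratified by age group.
effect_modification = {"children": (60, 40, 30, 70), "adults": (30, 70, 30, 70)}

for label, strata in (("confounding", confounding),
                      ("effect modification", effect_modification)):
    crude, per_stratum = crude_and_stratified(strata)
    print(label, "crude OR:", round(crude, 2),
          {name: round(value, 2) for name, value in per_stratum.items()})
```

In the first data set the stratum-specific odds ratios are both 1.0 while the crude odds ratio is about 2.7 (confounding). In the second, the odds ratio is 3.5 in children and 1.0 in adults (effect modification).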

Effect modification

  • Definition: a third variable that positively or negatively influences a study outcome; occurs when the exposure has a different effect between groups; not considered a type of bias in itself
  • Example: a certain drug works in children, but does not have any effect on adults
  • Solution: stratified analysis

Latency period

  • Definition: a seemingly inactive period between exposure to a risk modifier and the time at which its effect becomes clinically apparent
  • Example: The incubation period for infectious diseases is often very short, while there may be a very long latency period between pathogenesis of a malignancy and clinical manifestation.

References:[1]

Evidence-based medicine

Levels of evidence

Level Source of evidence
I
  Ia: Evidence from a meta-analysis of many randomized controlled studies
  Ib: Evidence from at least one high-quality randomized controlled study
II
  IIa: Evidence from at least one high-quality, non-randomized controlled study
  IIb: Evidence from a quasi-experimental study or a cohort study
III: Evidence from a descriptive study
IV: Expert opinions, case reports, and other forms of anecdotal evidence

Grades of clinical recommendation (according to Evidence-Based Medicine (EBM) guidelines)

Grade Level of recommendation Type of study
A Very high
  • Many peer-reviewed studies
  • A large, high-quality, multicentric study
B High
  • A high-quality study
  • Many studies with significant limitations
C Low
  • Studies with significant limitations
D Very low
  • No clinical studies
  • Expert opinion

Evaluation of diagnostic tests

Sensitivity and specificity

Predictive values

Unlike sensitivity and specificity, which rely solely on the diagnostic test itself, predictive values are also influenced by disease prevalence!
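
The dependence on prevalence can be sketched with Bayes' theorem. The sensitivity (80%) and specificity (90%) used below are assumed illustration values, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative test)
    return ppv, npv


# Same test, three assumed prevalences.
for prevalence in (0.01, 0.20, 0.50):
    ppv, npv = predictive_values(0.80, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With the test characteristics held constant, the positive predictive value rises and the negative predictive value falls as prevalence increases.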

Verifying the presence or absence of a disease

Receiver operating characteristic curve (ROC curve)

  • A graph that compares the sensitivity and specificity of a diagnostic test
  • Used to show the trade-off between clinical sensitivity and specificity for every possible cutoff value to evaluate the ability of the test to adequately diagnose subjects (e.g., diseased vs. nondiseased)
  • The y-axis represents the sensitivity (i.e., true positive rate) and the x-axis corresponds to 1 - specificity (i.e., false positive rate).
    • A test is considered more accurate if the curve follows the y-axis.
    • A test is considered less accurate if the curve is closer to the diagonal.
  • The area under the curve also allows the usefulness of tests to be compared: The larger the area under the ROC curve, the higher the validity of the test.
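
As a minimal sketch of how such a curve is built, the code below computes one point per cutoff and the area under the curve with the trapezoidal rule. The test values are hypothetical, and it is assumed that a value at or above the cutoff counts as a positive result.

```python
def roc_points(diseased, healthy):
    """Return (1 - specificity, sensitivity) for every possible cutoff.

    Assumption: a test value at or above the cutoff counts as positive.
    """
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every value: no positive results
    for cutoff in cutoffs:
        sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
        false_positive_rate = sum(v >= cutoff for v in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical test values for subjects with and without the disease.
diseased_values = [9, 8, 7, 6, 4]
healthy_values = [7, 5, 3, 2, 1]

curve = roc_points(diseased_values, healthy_values)
print("ROC points:", curve)
print("AUC:", round(area_under_curve(curve), 2))
```

An area of 0.5 corresponds to the diagonal (no discrimination), and an area of 1.0 corresponds to a test that separates the two groups perfectly.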

Two-by-two table

  • Definition: a type of contingency table that displays the frequency of two categorical variables, often exposure and outcome of disease
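  • Columns: disease, no disease, total; rows: positive test result, negative test result, total
  • Positive test result: true positives (TP) and false positives (FP); row total = all individuals with positive test results (TP + FP)
  • Negative test result: false negatives (FN) and true negatives (TN); row total = all individuals with negative test results (FN + TN)
  • Total: all individuals with disease (TP + FN); all individuals without disease (FP + TN); all individuals (TP + FP + FN + TN)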
Interpretation

Example of a two-by-two table

Diagnostic test for tuberculosis (TB)

(The table below is an annotated 2x2 table, with additional columns detailing total amounts and their interpretation.)

Patients with TB Patients without TB Total
Positive test result 800 (true positive) = TP 400 (false positive) = FP 1200
Negative test result 200 (false negative) = FN 3600 (true negative) = TN 3800
Total 1000 (TP + FN) 4000 (FP + TN) 5000
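
The standard test characteristics follow directly from these counts. The sketch below uses the usual textbook definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), positive predictive value = TP / (TP + FP), negative predictive value = TN / (TN + FN)); the variable names are illustrative.

```python
# Counts taken from the tuberculosis example table above.
tp, fp = 800, 400    # positive test results: true and false positives
fn, tn = 200, 3600   # negative test results: false and true negatives
total = tp + fp + fn + tn

sensitivity = tp / (tp + fn)      # proportion of patients with TB who test positive
specificity = tn / (tn + fp)      # proportion of patients without TB who test negative
ppv = tp / (tp + fp)              # probability of TB given a positive result
npv = tn / (tn + fn)              # probability of no TB given a negative result
prevalence = (tp + fn) / total    # proportion of all patients who have TB

print(f"Sensitivity: {sensitivity:.1%}")
print(f"Specificity: {specificity:.1%}")
print(f"PPV:         {ppv:.1%}")
print(f"NPV:         {npv:.1%}")
print(f"Prevalence:  {prevalence:.1%}")
```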

References:[2]

Random error, precision, and validity

Random error

  • Definition: an error that occurs due to chance and/or precision limitations of a test
  • Can be reduced by repeated measurements and averaging over a large number of observations
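
A small simulation can illustrate why averaging helps. The true value (120), the standard deviation of a single measurement (10), and the normally distributed error are assumptions made for this sketch only.

```python
import random
import statistics


def spread_of_means(n_measurements, true_value=120.0, sd=10.0, repeats=2000, seed=0):
    """Estimate, by simulation, the standard deviation of the mean of n noisy measurements."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = [
        statistics.fmean(rng.gauss(true_value, sd) for _ in range(n_measurements))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


for n in (1, 4, 100):
    print(f"{n:>3} measurements averaged: spread of the result = {spread_of_means(n):.2f}")
```

The spread of the averaged result shrinks roughly with the square root of the number of measurements, so averaging 100 measurements leaves about one tenth of the random error of a single measurement.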

Precision (reliability)

  • Definition: the reproducibility of test results on the same sample under similar conditions
  • A test with a high precision will have minimal random error.
  • Precision improves with a ↓ standard deviation and ↑ power of a statistical test.
  • Precision is measured quantitatively with a reliability coefficient between 0 and 1.
  • Reliability coefficient = 1 → the variance of the sample mean is equal to the variance of the true measure → the study/test is highly reliable
  • If the variance of the sample is very large as a result of an error in measurement, the value of the reliability coefficient will approach 0.
  • Methods of estimating precision:
    • Interrater reliability: the test yields similar results when performed by different examiners.
    • Parallel-test reliability: the results of a new test are compared with those of another test whose reliability has already been established. Similar statistical results imply a similar degree of reliability.
    • Test-retest reliability: the test yields the same results when repeated on the same subjects.

Validity (accuracy)

  • Definition: the correspondence between test measurements/results and what the test was developed to measure
  • A test with high validity/accuracy will have minimal systematic error and bias.
  • Sensitivity and specificity are measures of validity (i.e., a highly valid test is highly specific and sensitive).
  • There are two forms of validity:
    • Internal validity
      • The extent to which a study is free of error (most often in the form of bias) and the results therefore true for the sample of individuals being studied
      • High internal validity can be achieved by:
        • Matching study groups according to age, sex, and other characteristics
        • Observing measures to reduce systematic errors (bias) to a minimum
    • External validity
      • Refers to whether study results can be extrapolated from a sample population to the general population (generalizability).
      • A study with high external validity has the following characteristics:

Other types of studies

Survival analysis (prognosis study)

  • Survival analysis is used to measure disease prognosis.
  • Survival analysis is always prospective in nature:
    • Time-to-event analysis: individual follow‑ups are done after the onset of a disease until death occurs.
    • Five-year survival rate: the percentage of patients with a particular disease who have survived for five years after the initial diagnosis
  • Pitfalls of survival analysis
    • No prediction can be made about the average duration of survival in the case of subjects who did not die within the period of observation. (These subjects are called “censored cases”.)
    • Patients may drop out or die before the end of the follow-up period.
  • Kaplan-Meier analysis
    • Graphical way of displaying survival analysis
    • Overcomes problems associated with regular survival analysis.
    • Used to analyze incomplete survival data
      • Ideal for a small number of cases and to describe the survival of a cohort.
      • Allows us to estimate survival over time even when individuals are studied over different time intervals
      • The curve is split into the following:
        • Horizontal axis which represents the time of follow-up
        • Vertical axis, which represents the estimated probability of survival (the Kaplan-Meier estimate, which is recalculated at the end of each time interval defined by a specific event)
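
A minimal sketch of the Kaplan-Meier (product-limit) calculation is shown below. The follow-up times and outcomes are hypothetical, and `kaplan_meier` is an illustrative helper rather than a library function.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival probability)] at each observed event time.

    times:  follow-up time of each subject
    events: True if death was observed at that time, False if the subject was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, event in zip(times, events) if ti == t and event)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at time t
        if deaths:
            # Multiply by the proportion of subjects at risk who survive this event time.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored subjects leave the risk set without an event
    return curve


# Hypothetical cohort of eight patients (follow-up in months).
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [True, True, False, True, False, True, True, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t:>2}: estimated survival {s:.3f}")
```

Censored patients contribute to the number at risk until they leave the study, which is how the method makes use of incomplete survival data.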

Meta-analysis

  • Data from multiple studies is systematically assessed
  • Aims to increase statistical power and to identify discrepancies and/or common effects among individual studies
  • Limiting factors depend on the individual study types
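
One common way of combining studies is fixed-effect, inverse-variance pooling, sketched below. The source does not specify a pooling method, so this choice is an assumption; the effect sizes (log odds ratios) and standard errors are hypothetical values.

```python
import math


def fixed_effect_pool(studies):
    """Pool (effect, standard error) pairs with inverse-variance weights.

    Returns (pooled effect, standard error of the pooled effect).
    """
    weights = [1 / se ** 2 for _, se in studies]  # precise studies get more weight
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical studies: (log odds ratio, standard error)
studies = [(0.40, 0.30), (0.25, 0.20), (0.55, 0.40)]

pooled, pooled_se = fixed_effect_pool(studies)
print(f"Pooled log odds ratio: {pooled:.3f} (standard error {pooled_se:.3f})")
```

The pooled standard error is smaller than that of any individual study, which reflects the gain in statistical power.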

Registry study

  • Brief description: a retrospective study that uses data obtained from disease registries (e.g., cancer registries)
    • Criteria for a good quality cancer registry:
      • Complete entries
      • Low percentage of cases with a DCO (DCO = death certificate only)

Descriptive studies

  • Characteristics: no intervention; instead, patients are observed and the clinical course of the disease is studied. → The observations are used to form a hypothesis.
  • Examples of descriptive studies:
    • Incidence study
      • Incidence studies are used to determine the incidence of a particular event in a population during a certain time period (usually a year). If the event in consideration is death, the study is called a mortality study.
      • Incidence studies are usually performed as cohort studies in order to compare the incidence of an event (e.g., disease) between two groups.
    • Correlation study
      • The unit of analysis is the entire population.
      • Any conclusions that are drawn from the correlation study can only be applied to the entire population and not to an individual.
      • Correlation studies help form hypotheses but cannot be used to test them!
      • E.g., a study to look at correlation between consumption of wine and death due to cardiovascular disease.

Basic definitions used in epidemiology

  • Statistics: science of collecting, analyzing, and interpreting data
  • Population: a group designated for gathering data
  • Data: information collected from a population
  • Sample: a small group that is part of and representative of a population
  • Control group: A group in a study that does not receive the intervention (e.g., a drug) or did not develop the outcome (e.g., a disease), which is recruited from the same source population as the study group. It is matched for baseline characteristics with the study population to reduce confounding factors.

Probabilities

Independent probability

Conditional probability (Non-independent probability)

  • Definition: Events are affected by previous events.
  • Probability: P(A and B) = P(A) × P(B | A)
  • Examples:
  1. Drawing two red marbles without putting them back
    • A bag contains five marbles: two red and three blue. What is the chance of drawing two red ones?
    • 2/5 for the first event, 1/4 for the second event → 2/5 × 1/4 = 2/20 = 1/10 = 10% chance of drawing two red ones.
  2. What is the chance of survival in a patient who has already survived a certain time?
    • A patient has survived the first year; the probability of surviving that interval was 90%. What is the probability that the patient survives to 10 years (interval 1-10 years), given that survival from 0 to 10 years is 64%?
    • In this case we know P(A and B) and P(A) → 64/100 = 90/100 × P(B | A) → P(B | A) = 64/100 × 100/90 = 64/90 ≈ 0.71
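
Both examples can be checked with exact fractions. The sketch below only restates the numbers given above; the variable names are illustrative.

```python
from fractions import Fraction

# Example 1: two red and three blue marbles, drawn without replacement.
p_first_red = Fraction(2, 5)
p_second_red_given_first_red = Fraction(1, 4)  # one red marble left among four
p_two_red = p_first_red * p_second_red_given_first_red

# Example 2: survival to 10 years, given survival of the first year.
p_survive_first_year = Fraction(90, 100)   # P(A)
p_survive_ten_years = Fraction(64, 100)    # P(A and B)
p_ten_years_given_first_year = p_survive_ten_years / p_survive_first_year  # P(B | A)

print("P(two red marbles) =", p_two_red)
print("P(10 years | survived first year) =", p_ten_years_given_first_year,
      "≈", round(float(p_ten_years_given_first_year), 2))
```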

Mutually exclusive probability

  • Definition: Events cannot both happen.
  • Probability: P(A or B) = P(A) + P(B)
  • Example: Drawing an ace or a queen out of a deck with 52 cards
    • 4/52 + 4/52 = 8/52 = 2/13
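
The card example can be verified by enumerating a standard 52-card deck. This is a small sketch; the rank and suit labels are arbitrary.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

p_ace = Fraction(sum(rank == "A" for rank, _ in deck), len(deck))
p_queen = Fraction(sum(rank == "Q" for rank, _ in deck), len(deck))
# Counted directly: cards that are an ace or a queen.
p_ace_or_queen = Fraction(sum(rank in {"A", "Q"} for rank, _ in deck), len(deck))

print("P(ace) + P(queen) =", p_ace + p_queen)
print("P(ace or queen)   =", p_ace_or_queen)
```

Because no card is both an ace and a queen, the direct count and the sum of the two probabilities agree.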

Non-mutually exclusive probability

last updated 12/03/2018