Comprehensive Biostatistics Cheat Sheet

Introduction to Biostatistics

Biostatistics is the application of statistical methods to biological data and problems in the health sciences. It plays a crucial role in:

  • Designing rigorous medical studies
  • Analyzing health and disease patterns in populations
  • Evaluating the effectiveness of treatments and interventions
  • Identifying risk factors for diseases
  • Interpreting and communicating research findings
  • Supporting evidence-based medicine and public health decisions

Core Concepts & Principles

Types of Variables

Type | Description | Examples
Categorical | Qualitative data that can be sorted into groups | Blood type (A, B, AB, O), Disease status (yes/no)
Numerical | Quantitative data represented by numbers | See Discrete and Continuous below
Discrete | Countable values with gaps between them | Number of heart attacks, Children per family
Continuous | Can take any value within a range | Blood pressure, BMI, Temperature
Ordinal | Categorical data with a natural order | Disease severity (mild, moderate, severe), Pain scales (1-10)

Measures of Central Tendency

  • Mean: The average value (sum of values divided by count)
  • Median: The middle value when data is arranged in order
  • Mode: The most frequently occurring value
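
As a minimal sketch, the three measures above can be computed with Python's standard `statistics` module. The blood pressure readings are hypothetical values chosen for illustration, not data from any study.

```python
import statistics

# Hypothetical systolic blood pressure readings in mmHg (illustrative only).
readings = [118, 122, 122, 130, 135, 141, 190]

mean_bp = statistics.mean(readings)      # sum of values divided by count
median_bp = statistics.median(readings)  # middle value of the sorted data
mode_bp = statistics.mode(readings)      # most frequently occurring value

print(f"mean={mean_bp:.1f} median={median_bp} mode={mode_bp}")
# prints: mean=136.9 median=130 mode=122
```

The single high reading (190) pulls the mean above the median, which is one reason the median is often reported for skewed data.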

Measures of Dispersion

  • Range: Difference between maximum and minimum values
  • Variance: Average of squared deviations from the mean (divide by n − 1 rather than n for a sample)
  • Standard Deviation (SD): Square root of variance
  • Interquartile Range (IQR): Difference between the 75th and 25th percentiles (Q3 − Q1)
  • Coefficient of Variation: (SD / Mean) × 100%
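
The dispersion measures above can be sketched with the standard `statistics` module. The values are hypothetical, and the quartile method (`"inclusive"`) is an assumption: statistical packages use several quartile conventions that give slightly different IQRs.

```python
import statistics

# Hypothetical measurements (illustrative only).
values = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 6.0]

data_range = max(values) - min(values)
sample_var = statistics.variance(values)       # divides by n - 1
population_var = statistics.pvariance(values)  # divides by n
sd = statistics.stdev(values)                  # square root of the sample variance

# Quartiles with the "inclusive" method; other conventions exist.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

cv_percent = sd / statistics.mean(values) * 100

print(data_range, sample_var, population_var, sd, iqr, round(cv_percent, 1))
```

The gap between `sample_var` and `population_var` shrinks as the sample grows, so the choice of divisor matters most for small samples.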

Probability Distributions

Distribution | Description | Applications
Normal | Bell-shaped curve defined by mean and SD | Heights, blood pressure, measurement errors
Binomial | Probability of x successes in n independent trials | Disease occurrence, treatment success/failure
Poisson | Count of events occurring in a fixed interval of time or space | Disease incidence, number of mutations
Exponential | Time between independent events occurring at a constant rate | Survival times, waiting times
Chi-square | Sum of squared standard normal variables | Testing independence, goodness of fit
t-distribution | Bell-shaped with heavier tails than the normal; shape depends on degrees of freedom | Small sample inference about means
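
A short sketch of how probabilities from three of the distributions above can be evaluated using only the standard library. The helper names `binomial_pmf` and `poisson_pmf` and the example parameters are hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events) when the mean number of events per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Probability that exactly 2 of 10 patients respond if each responds with p = 0.3.
p_two_responders = binomial_pmf(2, 10, 0.3)

# Probability of zero cases in a period that averages 2 cases.
p_zero_cases = poisson_pmf(0, 2.0)

# Share of a normal distribution lying within 1.96 SD of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
central_mass = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(round(p_two_responders, 4), round(p_zero_cases, 4), round(central_mass, 4))
```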

Study Design Methodology

Types of Research Studies

Study Type | Description | Strengths | Limitations
Randomized Controlled Trial (RCT) | Participants randomly assigned to treatment/control | Gold standard; reduces confounding & bias | Expensive; ethical limitations; may lack external validity
Cohort Study | Follows groups with different exposures over time | Establishes temporal sequence; can study multiple outcomes | Time-consuming; expensive; susceptible to loss to follow-up
Case-Control Study | Compares people with disease to those without | Efficient for rare diseases; requires fewer subjects | Susceptible to recall & selection bias; cannot calculate incidence
Cross-sectional Study | Data collected at one point in time | Quick, inexpensive; good for prevalence | Cannot establish causality; temporal ambiguity
Ecological Study | Compares groups, not individuals | Useful for generating hypotheses; uses existing data | Ecological fallacy; cannot link exposure to outcome at individual level

Key Study Design Elements

  • Randomization: Random assignment to reduce systematic differences
  • Blinding: Single (participant) or double (participant and researcher) to reduce bias
  • Controls: Comparison groups to isolate variable effects
  • Sample Size Calculation: Ensures adequate statistical power
  • Inclusion/Exclusion Criteria: Defines study population
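
One way to sketch the randomization element above is permuted-block allocation, which keeps the two arms balanced after every block. This is an illustrative assumption rather than a recommended procedure: the function name `block_randomize`, the block size, and the seed are all hypothetical, and real trials usually rely on validated allocation systems with concealment.

```python
import random

def block_randomize(n_participants, block_size=4, seed=2024):
    """Return a two-arm allocation list built from shuffled, balanced blocks."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    allocations = []
    while len(allocations) < n_participants:
        half = block_size // 2
        block = ["treatment"] * half + ["control"] * half
        rng.shuffle(block)  # random order within each block
        allocations.extend(block)
    return allocations[:n_participants]

schedule = block_randomize(12)
print(schedule.count("treatment"), schedule.count("control"))
# prints: 6 6
```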

Statistical Methods by Purpose

Descriptive Statistics

  • Frequency Tables: Counts and percentages
  • Measures of Central Tendency: Mean, median, mode
  • Measures of Dispersion: SD, variance, range, IQR
  • Data Visualization: Histograms, box plots, scatter plots, bar charts

Inferential Statistics

  • Point Estimation: Single value estimate of a parameter
  • Interval Estimation: Range of plausible values (confidence intervals)
  • Hypothesis Testing: Process to test claims about populations
    • Null hypothesis (H₀): No effect/difference
    • Alternative hypothesis (H₁): Effect/difference exists

Comparative Analyses

Test | Purpose | Data Type | Assumptions
t-test (independent) | Compare means of 2 independent groups | Continuous | Normal distribution, equal variances (use Welch's t-test if variances differ)
t-test (paired) | Compare means of matched pairs | Continuous | Normal distribution of differences
ANOVA | Compare means of 3+ groups | Continuous | Normal distribution, equal variances
Chi-square | Compare proportions between groups | Categorical | Expected frequencies ≥5 in each cell
Fisher’s Exact | Compare proportions (small samples) | Categorical | No minimum expected frequency; used when expected counts fall below 5
Mann-Whitney U | Compare 2 groups (non-parametric) | Ordinal/continuous | Does not require normality
Kruskal-Wallis | Compare 3+ groups (non-parametric) | Ordinal/continuous | Does not require normality
Wilcoxon Signed-Rank | Paired data (non-parametric) | Ordinal/continuous | Does not require normality
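
As an illustration of one row of the table, the sketch below computes the Pearson chi-square statistic for a 2×2 table without continuity correction. The p-value uses the identity P(χ² > x) = erfc(√(x/2)), which holds for 1 degree of freedom only. The function name `chi_square_2x2` and the counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d

    statistic = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, min_expected

# Hypothetical trial: 30/50 improved on treatment, 18/50 improved on control.
statistic, p_value, min_expected = chi_square_2x2(30, 20, 18, 32)
print(round(statistic, 3), round(p_value, 4), min_expected)
```

Returning `min_expected` makes it easy to check the "expected frequencies ≥5" assumption before trusting the result. If it fails, Fisher's exact test is the usual fallback.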

Correlation and Regression Analyses

Analysis | Purpose | Output | Assumptions
Pearson Correlation | Linear relationship between 2 continuous variables | r (-1 to +1) | Normal distribution, linear relationship
Spearman Correlation | Monotonic relationship between 2 variables | rₛ (-1 to +1) | Monotonic relationship
Simple Linear Regression | Predict continuous outcome from one predictor | β coefficients, R² | Linearity, normally distributed residuals, homoscedasticity, independence
Multiple Linear Regression | Predict continuous outcome from multiple predictors | β coefficients, R² | Linearity, normally distributed residuals, homoscedasticity, independence, no multicollinearity
Logistic Regression | Predict binary outcome | Odds ratios, log odds | Binary outcome, independence, no multicollinearity
Cox Proportional Hazards | Analyze time-to-event data with censoring | Hazard ratios | Proportional hazards, independent censoring
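
The first and third rows of the table share the same building blocks (sums of squares and cross-products), as this sketch shows. The function name `pearson_and_ols` and the age and blood pressure values are hypothetical.

```python
import math

def pearson_and_ols(x, y):
    """Return (r, slope, intercept, r_squared) for paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
    slope = sxy / sxx                    # least-squares slope (beta_1)
    intercept = mean_y - slope * mean_x  # least-squares intercept (beta_0)
    return r, slope, intercept, r**2

# Hypothetical ages (years) and systolic blood pressures (mmHg).
age = [30, 40, 50, 60, 70]
sbp = [118, 124, 131, 135, 142]

r, slope, intercept, r_squared = pearson_and_ols(age, sbp)
print(round(r, 3), round(slope, 2), round(intercept, 1), round(r_squared, 3))
```

In simple linear regression, R² equals the square of Pearson's r, so the function returns `r**2` rather than computing it separately.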

Advanced Methods

  • Survival Analysis: Analyzes time until event occurs
    • Kaplan-Meier curves
    • Log-rank test
    • Cox proportional hazards models
  • Meta-Analysis: Statistically combines results of multiple studies
  • Multivariate Analysis: Analyzes multiple dependent variables simultaneously
  • Cluster Analysis: Groups similar observations together
  • Principal Component Analysis: Reduces data dimensionality
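
The Kaplan-Meier estimate listed under Survival Analysis can be sketched in a few lines: at each event time, survival is multiplied by the fraction of at-risk subjects who did not have the event. The function name `kaplan_meier` and the follow-up data are hypothetical. Subjects censored at an event time are counted as at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up in months; 0 marks a censored observation.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```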

Statistical Power and Sample Size

Key Determinants of Statistical Power

  • Sample Size: Larger samples provide more power
  • Effect Size: Larger effects are easier to detect
  • Variability: Less variability gives more power
  • Significance Level (α): Usually set at 0.05; a stricter (smaller) α lowers power
  • Type I Error: False positive (rejecting a true null hypothesis); occurs with probability α
  • Type II Error: False negative (failing to reject a false null hypothesis); occurs with probability β
  • Power = 1 – β: Probability of detecting a true effect (usually aim for 0.8 or 80%)

Sample Size Calculation Components

  1. Expected effect size
  2. Desired power level (typically 80% or 90%)
  3. Significance level (typically α = 0.05)
  4. Variability estimate
  5. Study design factors (one vs. two-sided tests, paired vs. independent samples)
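
To show how these components combine, the sketch below uses the common normal-approximation formula for comparing two independent means with a two-sided test: n per group = 2 × ((z₁₋α/₂ + z₁₋β) × SD / δ)². The function name `n_per_group` and the inputs are hypothetical. Calculations based on the t-distribution give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    delta : smallest difference in means worth detecting (effect size)
    sd    : assumed common standard deviation (variability estimate)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                 # round up to whole participants

# Detect a 5 mmHg difference assuming SD = 10 mmHg.
print(n_per_group(delta=5, sd=10))              # prints: 63
print(n_per_group(delta=5, sd=10, power=0.90))  # prints: 85
```

Halving the detectable difference roughly quadruples the required sample size, because delta enters the formula squared.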

P-values, Confidence Intervals, and Significance

P-value Interpretation

  • Definition: Probability of obtaining results at least as extreme as observed, if null hypothesis is true
  • Interpretation:
    • p < 0.05: Statistically significant (by convention)
    • p ≥ 0.05: Not statistically significant (this is not evidence that no effect exists)
  • Caution: Statistical significance ≠ clinical significance
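
The definition above can be made concrete with an exact permutation test, a technique chosen here for illustration. If the null hypothesis is true, the group labels are interchangeable, so the p-value is the share of all possible relabelings that give a mean difference at least as extreme as the observed one. The function name `permutation_p_value` and the data are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    arrangements = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        arrangements += 1
    return extreme / arrangements

# Hypothetical outcome scores for two small groups.
treated = [12, 14, 15, 17]
control = [8, 9, 10, 11]

p = permutation_p_value(treated, control)
print(round(p, 4))
# prints: 0.0286
```

Only 2 of the 70 possible splits are as extreme as the observed one, so p = 2/70. Enumerating every split is feasible only for small samples. Larger studies sample random relabelings instead.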

Confidence Intervals (CI)

  • Definition: Range of values, computed from the sample, that is likely to contain the true population parameter
  • Interpretation:
    • 95% CI: If the study were repeated many times, about 95% of the intervals built this way would contain the true parameter
    • Narrow CI indicates precise estimate
    • Wide CI indicates less precision
  • Advantage: Provides both magnitude and precision of effect
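
As a minimal sketch, a confidence interval for a proportion can be computed with the Wald (normal approximation) formula p̂ ± z × √(p̂(1 − p̂)/n). The function name `wald_ci_proportion` and the counts are hypothetical. The Wald interval is an assumption of convenience: it behaves poorly for small samples or proportions near 0 or 1, where alternatives such as the Wilson interval are usually preferred.

```python
import math
from statistics import NormalDist

def wald_ci_proportion(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 45 of 150 participants report the exposure.
lower, upper = wald_ci_proportion(45, 150)
print(round(lower, 3), round(upper, 3))
# prints: 0.227 0.373
```

Quadrupling the sample size halves the width of the interval, which is the sense in which a narrow CI indicates a precise estimate.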

Common Challenges and Solutions

Selection Bias

  • Problem: Study participants not representative of target population
  • Solutions: Random sampling, clear inclusion/exclusion criteria, reporting participation rates

Confounding

  • Problem: Extraneous variable associated with both exposure and outcome
  • Solutions: Randomization, matching, stratification, multivariable analysis, restriction

Missing Data

  • Problem: Incomplete datasets leading to bias or reduced power
  • Solutions:
    • Complete case analysis (unbiased only if data are missing completely at random)
    • Imputation methods (multiple imputation is generally preferred; simple mean/median substitution understates variability)
    • Sensitivity analyses

Multiple Comparisons

  • Problem: Increased risk of Type I errors when performing many tests
  • Solutions: Bonferroni correction, False Discovery Rate, pre-specified primary endpoints
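
Two of the corrections named above can be sketched as follows. Bonferroni compares every p-value with α/m. The Benjamini-Hochberg procedure, one common way to control the False Discovery Rate, rejects the k smallest p-values, where k is the largest rank with p₍ₖ₎ ≤ (k/m) × α. The function names and p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypotheses whose p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            largest_rank = rank  # keep the largest rank meeting its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[i] = True
    return rejected

# Hypothetical p-values from five secondary endpoints.
p_values = [0.001, 0.012, 0.021, 0.045, 0.30]

print(bonferroni(p_values))
# prints: [True, False, False, False, False]
print(benjamini_hochberg(p_values))
# prints: [True, True, True, False, False]
```

Bonferroni is the more conservative of the two, which is why it rejects fewer hypotheses here.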

Low Statistical Power

  • Problem: Inability to detect true effects
  • Solutions: Increase sample size, reduce measurement variability, use more efficient designs

Best Practices and Tips

Study Design

  • Clearly define research question and hypothesis before starting
  • Choose appropriate study design for your research question
  • Conduct proper sample size calculations before beginning
  • Pre-register study protocols and analysis plans
  • Use validated measurement tools when possible

Data Analysis

  • Examine data distribution before choosing statistical tests
  • Check assumptions of statistical tests
  • Present effect sizes along with p-values
  • Report confidence intervals
  • Conduct sensitivity analyses for important findings
  • Consider clinical significance, not just statistical significance

Reporting

  • Follow relevant reporting guidelines (CONSORT, STROBE, PRISMA)
  • Report all outcomes, not just significant ones
  • Be transparent about analytical methods and decisions
  • Avoid overinterpreting results (especially with observational data)
  • Include appropriate visualizations of data
  • Present absolute risk differences, not just relative risks
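
The last point is easiest to see with numbers. The sketch below computes the relative risk, the absolute risk reduction, and the number needed to treat from hypothetical trial counts. The function name `risk_summary` is also hypothetical.

```python
def risk_summary(events_treated, n_treated, events_control, n_control):
    """Return relative and absolute effect measures for a two-arm comparison."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    relative_risk = risk_treated / risk_control
    absolute_risk_reduction = risk_control - risk_treated
    # Number needed to treat is defined only when treatment lowers the risk.
    if absolute_risk_reduction > 0:
        number_needed_to_treat = 1 / absolute_risk_reduction
    else:
        number_needed_to_treat = float("inf")
    return {
        "relative_risk": relative_risk,
        "absolute_risk_reduction": absolute_risk_reduction,
        "number_needed_to_treat": number_needed_to_treat,
    }

# Hypothetical trial: 10/1000 events on treatment versus 20/1000 on control.
summary = risk_summary(10, 1000, 20, 1000)
print({name: round(value, 3) for name, value in summary.items()})
```

A relative risk of 0.5 ("halves the risk") here corresponds to an absolute reduction of only one percentage point, or about 100 people treated to prevent one event.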

Software Tools for Biostatistics

Statistical Packages

  • R: Free, powerful, versatile; steep learning curve
  • SPSS: User-friendly interface; limited advanced capabilities
  • SAS: Industry standard in healthcare; expensive
  • Stata: Popular in epidemiology; clean syntax
  • GraphPad Prism: User-friendly; focused on biological research

Key Functions to Know

  • Data import/cleaning
  • Descriptive statistics
  • Basic visualizations
  • Common statistical tests
  • Regression analyses
  • Power calculations

Resources for Further Learning

Books

  • Fundamentals of Biostatistics by Bernard Rosner
  • Essential Medical Statistics by Betty Kirkwood and Jonathan Sterne
  • Statistical Methods in Medical Research by P. Armitage, G. Berry, J.N.S. Matthews
  • Practical Statistics for Medical Research by Douglas G. Altman

Online Courses and Resources

  • Coursera: “Statistics for Life Sciences” specialization
  • EdX: “Statistics and R” by Harvard
  • StatLearning.com: Free course materials
  • UCLA Statistical Computing Resources
  • BMJ Statistics at Square One series

Key Journals

  • Statistics in Medicine
  • Biostatistics
  • Statistical Methods in Medical Research
  • Journal of the Royal Statistical Society

Professional Organizations

  • American Statistical Association (ASA)
  • International Biometric Society
  • Royal Statistical Society
  • Society for Clinical Trials