Complete Bayesian Statistics Cheat Sheet: Theory, Applications & Practical Implementation

Introduction to Bayesian Statistics

Bayesian statistics is a framework for statistical inference in which probability represents a degree of belief, or state of knowledge, about unknown quantities. It provides a mathematical procedure for updating those beliefs as new evidence arrives, and it allows direct probability statements about parameters and hypotheses.

Why Bayesian Statistics Matters:

  • Allows incorporation of prior knowledge and domain expertise
  • Provides full probability distributions rather than point estimates
  • Can perform well with small samples by borrowing strength from prior information
  • Enables sequential updating as new data arrives
  • Offers natural framework for hierarchical modeling
  • Makes uncertainty quantification straightforward
  • Provides intuitive interpretation of probability as degree of belief

Core Concepts & Principles

Bayes’ Theorem

The foundation of Bayesian statistics:

$$P(\theta|D) = \frac{P(D|\theta) \times P(\theta)}{P(D)}$$

Where:

  • $P(\theta|D)$ is the posterior probability of parameters $\theta$ given data $D$
  • $P(D|\theta)$ is the likelihood of observing data $D$ given parameters $\theta$
  • $P(\theta)$ is the prior probability of parameters $\theta$
  • $P(D)$ is the marginal likelihood or evidence (normalizing constant)

In words: $\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$
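
A quick numeric example makes the update concrete; the prevalence, sensitivity, and false-positive rate below are illustrative assumptions, not real figures:

```python
# Minimal worked example of Bayes' theorem (all numbers illustrative).
# Hypothesis: a patient has a disease; data: a positive test result.
prior = 0.01        # P(disease): assumed 1% prevalence
sensitivity = 0.95  # P(positive | disease)
false_pos = 0.10    # P(positive | no disease)

# Evidence P(positive): average the likelihood over both hypotheses
evidence = sensitivity * prior + false_pos * (1 - prior)

# Posterior P(disease | positive) = likelihood * prior / evidence
posterior = sensitivity * prior / evidence
print(f"P(disease | positive) = {posterior:.3f}")  # about 0.088
```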

Key Bayesian Concepts

| Concept | Description | Importance |
|---|---|---|
| Prior Distribution | Probability distribution representing beliefs about parameters before seeing data | Encodes existing knowledge; influences posterior with small datasets |
| Likelihood Function | Probability of observing the data given specific parameter values | Connects model to observed data |
| Posterior Distribution | Updated probability distribution of parameters after observing data | Main output of Bayesian analysis; used for inference and decisions |
| Marginal Likelihood | Total probability of observed data averaged over all parameter values | Used for model comparison; denominator in Bayes’ theorem |
| Credible Interval | Interval containing the parameter with specified probability (e.g., 95%) | Bayesian alternative to confidence intervals; directly interpretable |
| Posterior Predictive | Distribution of future observations given observed data | Used for prediction and model checking |
| Bayes Factor | Ratio of marginal likelihoods of two competing models | Used for model comparison |
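
Several of these quantities reduce to a few lines of code once posterior draws are available. A minimal NumPy sketch, using simulated Beta draws as a stand-in for real posterior samples:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for posterior draws of a parameter theta
# (in practice these come from MCMC or a conjugate posterior)
theta = rng.beta(12, 8, size=10_000)

# Point summaries of the posterior
print("mean:", theta.mean(), "median:", np.median(theta))

# 95% equal-tailed credible interval from the draws
lo, hi = np.percentile(theta, [2.5, 97.5])
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")

# Direct probability statements are legitimate in the Bayesian framework
print("P(theta > 0.5) =", (theta > 0.5).mean())
```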

Bayesian vs. Frequentist Statistics

| Aspect | Frequentist Approach | Bayesian Approach |
|---|---|---|
| Parameters | Fixed but unknown | Random variables with probability distributions |
| Probability Definition | Long-run frequency of events | Degree of belief |
| Inference Focus | P(data \| hypothesis) | P(hypothesis \| data) |
| Prior Information | Not formally incorporated | Explicitly modeled |
| Uncertainty Quantification | Confidence intervals | Credible intervals |
| Hypothesis Testing | p-values, significance tests | Posterior probabilities, Bayes factors |
| Small Sample Performance | Often requires large samples | Can work well with small samples |
| Sequential Analysis | Requires special techniques | Natural updating process |
| Computation | Often analytical | Often requires simulation methods |

Step-by-Step Bayesian Statistical Analysis

  1. Formulate the Problem

    • Define parameters of interest
    • Determine goals (estimation, prediction, testing, etc.)
    • Identify observed and unobserved variables
  2. Construct Probabilistic Model

    • Specify likelihood function
    • Account for data-generating process
    • Include measurement error if applicable
  3. Choose Prior Distributions

    • Select appropriate families for parameters
    • Set hyperparameters based on existing knowledge
    • Consider sensitivity to prior choice
  4. Collect Data & Compute Posterior

    • For conjugate models: analytical solution
    • For simple models: grid approximation
    • For complex models: MCMC or variational methods
  5. Check Model Adequacy

    • Posterior predictive checks
    • Sensitivity analysis for priors
    • Convergence diagnostics for MCMC
  6. Draw Inferences

    • Summarize posterior (mean, median, mode)
    • Compute credible intervals
    • Calculate probabilities of hypotheses
  7. Make Predictions

    • Simulate from posterior predictive distribution
    • Compute predictive intervals
    • Validate with out-of-sample data if available
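
The steps above map directly onto probabilistic-programming code. A minimal end-to-end sketch in PyMC, fitting a linear regression to simulated data; the priors and model structure here are illustrative choices, not a prescription:

```python
import numpy as np
import pymc as pm
import arviz as az

# Steps 1-2: simulate data from a known process to validate the model
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

with pm.Model():
    # Step 3: weakly informative priors (illustrative choices)
    alpha = pm.Normal("alpha", mu=0, sigma=2)
    beta = pm.Normal("beta", mu=0, sigma=2)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Step 2: likelihood linking parameters to the observed data
    pm.Normal("y", mu=alpha + beta * x, sigma=sigma, observed=y)

    # Step 4: sample the posterior (PyMC defaults to NUTS)
    idata = pm.sample(1000, tune=1000, chains=4, random_seed=0)

# Steps 5-6: convergence diagnostics and posterior summaries
print(az.summary(idata, var_names=["alpha", "beta", "sigma"]))
```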

Key Techniques & Methods

Common Prior Distributions

| Distribution | Typical Use | Conjugate For | Properties |
|---|---|---|---|
| Beta | Proportions, probabilities | Binomial, Bernoulli | Range: [0,1]; flexible shapes |
| Normal | Continuous unbounded parameters | Normal (known variance) | Symmetric; well-understood |
| Gamma | Positive continuous parameters | Poisson, Exponential | Range: (0,∞); right-skewed |
| Inverse Gamma | Variances | Normal (known mean) | Range: (0,∞); heavy right tail |
| Dirichlet | Multinomial probabilities | Multinomial | Multivariate generalization of Beta |
| Uniform | Minimal prior information | Various | Assigns equal probability across range |
| Half-Cauchy | Scale parameters | — | Range: (0,∞); heavy tail for robustness |
| Student’s t | Robust alternatives to Normal | — | Heavier tails than Normal |
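
Before committing to a prior, it can help to inspect what the hyperparameters actually imply. A small SciPy sketch comparing three illustrative Beta priors for a proportion:

```python
from scipy import stats

# Three Beta priors for a proportion, from vague to fairly confident
priors = {
    "Beta(1, 1), uniform": stats.beta(1, 1),
    "Beta(2, 2), weakly informative": stats.beta(2, 2),
    "Beta(20, 20), concentrated near 0.5": stats.beta(20, 20),
}

for name, dist in priors.items():
    lo, hi = dist.interval(0.95)  # central 95% prior interval
    print(f"{name}: mean={dist.mean():.2f}, 95% interval=({lo:.2f}, {hi:.2f})")
```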

Conjugate Prior-Likelihood Pairs

| Likelihood | Conjugate Prior | Posterior |
|---|---|---|
| Bernoulli/Binomial | Beta(α, β) | Beta(α + successes, β + failures) |
| Poisson(λ) | Gamma(α, β) | Gamma(α + ∑x, β + n) |
| Normal(μ, σ² known) | Normal(μ₀, σ₀²) | Normal((σ₀²∑x + σ²μ₀)/(nσ₀² + σ²), σ²σ₀²/(nσ₀² + σ²)) |
| Normal(μ known, σ²) | Inverse-Gamma(α, β) | Inverse-Gamma(α + n/2, β + ∑(x − μ)²/2) |
| Multinomial | Dirichlet(α₁,…,αₖ) | Dirichlet(α₁ + x₁,…,αₖ + xₖ) |
| Exponential(λ) | Gamma(α, β) | Gamma(α + n, β + ∑x) |
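
Conjugate updating reduces to arithmetic on the hyperparameters. The first row of the table (Beta prior, binomial data) as a short SciPy sketch with made-up counts:

```python
from scipy import stats

# Prior Beta(2, 2); data: 17 successes in 20 trials (illustrative counts)
a_prior, b_prior = 2, 2
successes, failures = 17, 3

# Conjugate update: add the counts to the hyperparameters
posterior = stats.beta(a_prior + successes, b_prior + failures)

print("posterior mean:", posterior.mean())  # 19/24, about 0.792
lo, hi = posterior.interval(0.95)
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```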

Computational Methods

For Simple Models

  • Analytical solutions (for conjugate models)
  • Grid approximation (discretize the parameter space; a sketch follows this list)
  • Quadrature methods (numerical integration)
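
A minimal sketch of grid approximation, reusing the Beta-Binomial example from the conjugacy table (17 successes in 20 trials, Beta(2, 2) prior) so the answer can be checked against the exact posterior:

```python
import numpy as np

grid = np.linspace(0.001, 0.999, 1000)  # discretized parameter space
prior = grid * (1 - grid)               # Beta(2, 2) density, up to a constant
likelihood = grid**17 * (1 - grid)**3   # binomial kernel for 17/20 successes

posterior = prior * likelihood
posterior /= posterior.sum()            # normalize over the grid

# Posterior summaries straight from the grid
print("posterior mean ≈", (grid * posterior).sum())  # close to exact 19/24

# Resample the grid to get draws for intervals and predictions
rng = np.random.default_rng(1)
draws = rng.choice(grid, size=10_000, p=posterior)
print("95% interval ≈", np.percentile(draws, [2.5, 97.5]))
```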

For Complex Models

  • Markov Chain Monte Carlo (MCMC)

    • Metropolis-Hastings algorithm (a minimal sketch follows this list)
    • Gibbs sampling
    • Hamiltonian Monte Carlo (HMC)
    • No-U-Turn Sampler (NUTS)
  • Variational Inference

    • Mean-field approximation
    • Full-rank approximation
    • Automatic differentiation variational inference (ADVI)
  • Approximate Bayesian Computation (ABC)

    • Rejection sampling
    • MCMC-ABC
    • Sequential Monte Carlo ABC
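
For intuition, here is a minimal random-walk Metropolis-Hastings sampler targeting the same Beta-Binomial posterior used in the earlier examples. In practice, rely on mature samplers (Stan, PyMC) rather than hand-rolled code:

```python
import numpy as np

def log_post(theta):
    """Unnormalized log posterior: Beta(2, 2) prior, 17/20 successes."""
    if not 0.0 < theta < 1.0:
        return -np.inf  # outside the support
    return np.log(theta * (1 - theta)) + 17 * np.log(theta) + 3 * np.log(1 - theta)

rng = np.random.default_rng(0)
theta, samples = 0.5, []
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)  # symmetric random-walk step
    # Accept with probability min(1, post(proposal) / post(current))
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal
    samples.append(theta)

draws = np.array(samples[5_000:])        # discard warmup
print("posterior mean ≈", draws.mean())  # close to exact 19/24
```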

Common Bayesian Models

| Model Type | Description | Applications |
|---|---|---|
| Bayesian Linear Regression | Linear model with prior distributions on coefficients | Prediction, regression analysis |
| Hierarchical Models | Multi-level models with shared parameters | Grouped/clustered data, shrinkage |
| Bayesian GLMs | Generalized linear models (logistic, Poisson, etc.) | Count data, binary outcomes |
| Bayesian Time Series | Dynamic models, state-space models, ARIMA | Forecasting, trend analysis |
| Mixture Models | Combination of multiple distributions | Clustering, density estimation |
| Bayesian Networks | Graphical models showing probabilistic relationships | Causal inference, expert systems |
| Dirichlet Process Models | Nonparametric models with infinite components | Clustering with unknown number of clusters |
| Gaussian Processes | Nonparametric regression | Flexible function approximation |

Common Challenges & Solutions

| Challenge | Description | Solutions |
|---|---|---|
| Prior Selection | Choosing appropriate prior distributions | Use weakly informative priors; sensitivity analysis; hierarchical modeling |
| Computational Burden | Long run times for complex models | Efficient samplers; GPU acceleration; approximation methods |
| MCMC Convergence | Determining if chains have converged | Multiple chains; convergence diagnostics (R-hat, ESS); trace plots |
| High Dimensionality | Many parameters causing sampling inefficiency | Reparameterization; marginalization; improved samplers |
| Model Comparison | Selecting between competing models | Bayes factors; information criteria (WAIC, LOO); predictive performance |
| Identifiability | Multiple parameter values giving the same likelihood | Informative priors; parameter constraints; reparameterization |
| Communicating Results | Explaining Bayesian analysis to non-experts | Visual posterior summaries; analogies to familiar concepts |

Diagnostics & Model Checking

MCMC Diagnostics

| Diagnostic | Description | Good Values |
|---|---|---|
| R-hat | Potential scale reduction factor | < 1.01 (closer to 1 is better) |
| Effective Sample Size (ESS) | Equivalent number of independent samples | > 100 per chain (higher is better) |
| Trace Plots | Time series of samples | Should look like “hairy caterpillars” |
| Autocorrelation | Correlation between consecutive samples | Should decay quickly to zero |
| Divergences | Issues with geometry of posterior | Zero divergent transitions |
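
Libraries such as ArviZ compute these diagnostics automatically, but the basic (non-split) R-hat is easy to implement by hand; modern tools use a rank-normalized split variant of the same idea. A NumPy sketch on simulated chains:

```python
import numpy as np

def rhat(chains):
    """Basic Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    n = chains.shape[1]
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))       # four well-mixed chains
bad = good + np.arange(4)[:, None]      # chains stuck at different values
print("mixed chains:", rhat(good))      # about 1.00
print("stuck chains:", rhat(bad))       # far above 1.01
```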

Model Checking Methods

| Method | Description | Use For |
|---|---|---|
| Posterior Predictive Checks | Compare replicated data to observed data | Overall model fit assessment |
| Prior Predictive Checks | Simulate from prior predictive distribution | Verify prior assumptions |
| Leave-One-Out Cross-Validation | Estimate out-of-sample predictive accuracy | Model comparison, overfitting detection |
| Information Criteria | WAIC, LOO-IC, DIC | Model comparison |
| Bayes Factors | Ratio of marginal likelihoods | Hypothesis testing, model comparison |
| Sensitivity Analysis | Vary priors and check posterior changes | Assess robustness to prior specification |
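
A posterior predictive check takes only a few lines: simulate replicated datasets from posterior draws and compare a test statistic with its observed value. The Poisson model and Gamma(1, 1) prior below are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.poisson(4.2, size=100)  # stand-in for observed count data

# Posterior draws of the Poisson rate via the conjugate Gamma update:
# Gamma(1 + sum(y), rate 1 + n); NumPy's gamma takes scale = 1/rate
lam = rng.gamma(1 + y_obs.sum(), 1 / (1 + len(y_obs)), size=2_000)

# Replicate the dataset for each draw; compare the max to the observed max
T_rep = np.array([rng.poisson(rate, size=len(y_obs)).max() for rate in lam])
p_val = (T_rep >= y_obs.max()).mean()  # posterior predictive p-value
print("observed max:", y_obs.max(), "| Bayesian p-value:", p_val)
```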

Best Practices & Practical Tips

Problem Formulation

  • Start with clear research questions
  • Consider what parameters actually represent
  • Use domain knowledge to inform modeling choices
  • Parameterize models for interpretability

Prior Selection

  • Use weakly informative priors when possible
  • Avoid flat improper priors for scale parameters
  • Standardize/normalize predictors for better prior specification
  • Document and justify prior choices
  • Check prior predictive distribution
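
The prior predictive check recommended above is also only a few lines: draw parameters from the prior, simulate data from them, and ask whether the results are plausible on the scale of the problem. A NumPy sketch with illustrative priors for a simple linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_obs = 1000, 50

# Draw parameters from candidate priors (illustrative choices)
beta = rng.normal(0, 2, size=n_sims)           # slope ~ Normal(0, 2)
sigma = np.abs(rng.normal(0, 1, size=n_sims))  # noise scale ~ Half-Normal(1)

# Simulate one dataset per prior draw
x = rng.normal(size=n_obs)
y_sim = beta[:, None] * x + sigma[:, None] * rng.normal(size=(n_sims, n_obs))

# If this range is wildly implausible for the domain, revisit the priors
print("central 98% of simulated outcomes:", np.percentile(y_sim, [1, 99]))
```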

Computation

  • Run multiple MCMC chains with different starting points
  • Use adequate warmup/burn-in period
  • Monitor convergence diagnostics
  • Save all MCMC samples for later analysis
  • Use appropriate transformations for constrained parameters

Reporting Results

  • Show full posterior distributions, not just summaries
  • Report credible intervals, not just point estimates
  • Be transparent about modeling assumptions
  • Include model checking and validation results
  • Provide code and data when possible

Practical Workflow Tips

  • Start with simple models and gradually add complexity
  • Simulate data to validate model implementation
  • Build models incrementally and check each step
  • Use graphical representations of the model
  • Compare with non-Bayesian alternatives when appropriate

Software for Bayesian Statistics

| Software | Language | Features | Best For |
|---|---|---|---|
| Stan | Own language; interfaces to R/Python/Julia | HMC/NUTS sampling; efficient for complex models | General-purpose Bayesian modeling |
| PyMC | Python | User-friendly API, visualization tools | Python users, general modeling |
| JAGS | Own language; R interface | Gibbs sampling | Conditionally conjugate models |
| BUGS/OpenBUGS | Own language | Established, extensive examples | Legacy applications |
| brms | R | Formula interface, Stan backend | R users, regression models |
| rstanarm | R | Pre-compiled Stan models | Common models with R syntax |
| bayesplot | R | Specialized plots for Bayesian analysis | Visualization of Bayesian results |
| ArviZ | Python | Exploratory analysis of Bayesian models | Visualization and diagnostics |
| TensorFlow Probability | Python | Variational inference, deep learning integration | Large-scale models, neural networks |
| Turing.jl | Julia | Fast performance, model composability | Julia users, custom algorithms |

Applications of Bayesian Statistics

| Field | Applications | Benefits |
|---|---|---|
| Machine Learning | Probabilistic neural networks, Gaussian processes | Uncertainty quantification, regularization |
| Clinical Trials | Adaptive designs, interim analyses | Efficiency, ethical considerations |
| Epidemiology | Disease mapping, outbreak detection | Spatial modeling, small area estimation |
| Finance | Portfolio optimization, risk assessment | Handling uncertainty, incorporating beliefs |
| Environmental Science | Climate models, ecological analysis | Hierarchical modeling, spatial-temporal data |
| Sports Analytics | Player performance, game predictions | Prior knowledge inclusion, uncertainty |
| Marketing | A/B testing, customer behavior | Sequential decision making |
| Genomics | Gene expression, phylogenetics | High-dimensional data, complex relationships |
| Social Sciences | Survey analysis, causal inference | Missing data handling, multilevel models |

Resources for Further Learning

Introductory Books

  • “Statistical Rethinking” by Richard McElreath – Accessible introduction with R examples
  • “Doing Bayesian Data Analysis” by John Kruschke – Beginner-friendly with diagrams
  • “Bayesian Statistics the Fun Way” by Will Kurt – Gentle introduction
  • “Think Bayes” by Allen Downey – Computational approach, Python

Advanced Books

  • “Bayesian Data Analysis” by Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin – Comprehensive reference
  • “Bayesian Cognitive Modeling” by Lee and Wagenmakers – Cognitive science applications
  • “Gaussian Processes for Machine Learning” by Rasmussen and Williams – GP-specific
  • “Monte Carlo Statistical Methods” by Robert and Casella – Computational methods

Online Courses

  • Statistical Rethinking (Richard McElreath) – Available on YouTube
  • Bayesian Statistics: From Concept to Data Analysis (Coursera)
  • Probabilistic Programming and Bayesian Methods (PyMC Labs)

Journals & Conferences

  • Bayesian Analysis – Official journal of ISBA
  • Journal of Statistical Software – Often features Bayesian methods
  • ISBA World Meeting – International Society for Bayesian Analysis conference
  • StanCon – Stan users conference
