The Complete Differential Privacy Cheatsheet: Protect Data While Preserving Utility

What is Differential Privacy and Why It Matters

Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when analyzing datasets containing sensitive information. It ensures that the presence or absence of any individual’s data doesn’t significantly affect the outcome of statistical queries.

Why It’s Critical:

  • Enables statistical analysis while protecting individual privacy
  • Provides mathematical guarantees (not just heuristics)
  • Prevents re-identification attacks and membership inference
  • Required by regulations (GDPR, CCPA) and used by tech giants (Apple, Google, Microsoft)
  • Balances data utility with privacy protection

Core Concepts and Principles

The Fundamental Definition

A randomized algorithm M satisfies (ε, δ)-differential privacy if for all datasets D₁ and D₂ that differ by exactly one record, and for all possible outputs S:

Pr[M(D₁) ∈ S] ≤ e^ε × Pr[M(D₂) ∈ S] + δ
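To make the inequality concrete, here is a quick numeric check (an illustrative sketch, not part of any library) that the Laplace mechanism with scale Δf/ε satisfies it with δ = 0: the density ratio between outputs on neighboring datasets never exceeds e^ε.

```python
import math

eps, sensitivity = 1.0, 1.0   # privacy parameter and query sensitivity
b = sensitivity / eps         # Laplace scale calibrated to the query

def laplace_pdf(x, mu, scale):
    """Density of the Laplace distribution centred at mu."""
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

# Neighbouring datasets whose true counts differ by the sensitivity (0 vs 1):
# the output-density ratio is bounded by e^eps everywhere, i.e. (eps, 0)-DP.
worst_ratio = max(
    laplace_pdf(x / 10, 0.0, b) / laplace_pdf(x / 10, 1.0, b)
    for x in range(-100, 101)
)
print(worst_ratio <= math.exp(eps) + 1e-12)  # → True
```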

Key Parameters

| Parameter | Name | Description | Typical Values |
|---|---|---|---|
| ε (epsilon) | Privacy budget | Lower = more private | 0.1 – 10 |
| δ (delta) | Failure probability | Probability the pure-ε guarantee fails | 10⁻⁵ to 10⁻⁹ |
| Δf | Global sensitivity | Max change in function output from one record | Function-dependent |
| LS(D) | Local sensitivity | Actual sensitivity for a specific dataset | ≤ Δf |

Privacy Guarantees Hierarchy

  • Pure DP (ε-DP): δ = 0, strongest guarantee
  • Approximate DP ((ε, δ)-DP): δ > 0, more practical
  • Local DP: each individual adds noise to their own data before sharing it
  • Central DP: a trusted curator adds noise to aggregate results
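The local model can be illustrated with randomized response, the classic ε-local-DP protocol for a yes/no question: each user flips their answer with a calibrated probability, and the aggregator debiases the noisy tallies. A minimal sketch (illustrative only):

```python
import math
import random

def randomized_response(true_bit: int, eps: float) -> int:
    """Report the truth with probability e^eps / (e^eps + 1),
    otherwise flip the bit -- this satisfies eps-local-DP."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1)
    return true_bit if random.random() < p_truth else 1 - true_bit

def debias(reports, eps):
    """Unbiased estimate of the true proportion from noisy reports."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

random.seed(0)
true_bits = [1] * 300 + [0] * 700                       # true proportion = 0.30
reports = [randomized_response(b, eps=1.0) for b in true_bits]
estimate = debias(reports, 1.0)
print(round(estimate, 2))                               # close to 0.30, up to noise
```

Note the utility cost: each individual report is nearly meaningless on its own, so local DP needs far more data than central DP for the same accuracy.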


Step-by-Step Implementation Process

Phase 1: Problem Assessment

  1. Define the Query/Analysis

    • Identify what statistics you need
    • Determine acceptable accuracy loss
    • Set privacy requirements
  2. Calculate Sensitivity

    • Global sensitivity: worst-case impact of one record
    • Local sensitivity: actual impact for your dataset
    • Choose appropriate sensitivity measure
  3. Set Privacy Parameters

    • Choose ε based on privacy needs (lower = more private)
    • Set δ (typically 1/n² where n = dataset size)
    • Allocate privacy budget across queries
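The budget-allocation step above can be sketched as a small ledger that charges each query against a fixed total under basic composition. This is an illustrative toy, not a library API; production code should use a real privacy accountant.

```python
class PrivacyBudget:
    """Minimal budget ledger: epsilons add up (basic composition)."""

    def __init__(self, total_eps: float):
        self.total_eps = total_eps
        self.spent = 0.0

    def charge(self, eps: float) -> None:
        """Spend eps on a query, refusing once the budget is exhausted."""
        if self.spent + eps > self.total_eps:
            raise RuntimeError("privacy budget exhausted")
        self.spent += eps

    def remaining(self) -> float:
        return self.total_eps - self.spent

budget = PrivacyBudget(total_eps=1.0)
budget.charge(0.25)   # a low-priority count query
budget.charge(0.50)   # a more important average query
print(budget.remaining())  # → 0.25
```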

Phase 2: Mechanism Selection

  1. Choose DP Mechanism

    • Laplace for numerical queries (pure DP)
    • Gaussian for numerical queries (approximate DP)
    • Exponential for categorical selection
    • Report Noisy Max for top-k queries
  2. Implementation

    • Add calibrated noise to results
    • Implement composition tracking
    • Set up privacy accounting

Phase 3: Validation and Monitoring

  1. Test and Validate
    • Verify noise calibration
    • Test privacy budget tracking
    • Validate result utility

Key Techniques and Mechanisms

Fundamental Mechanisms

| Mechanism | Use Case | Noise Distribution | Privacy Type |
|---|---|---|---|
| Laplace | Count, sum, average queries | Laplace(Δf/ε) | Pure DP |
| Gaussian | Numerical queries with large datasets | N(0, (Δf·σ)²), σ ≥ √(2 ln(1.25/δ))/ε | Approximate DP |
| Exponential | Selection queries (argmax) | Exponential weighting of utilities | Pure DP |
| Report Noisy Max | Top-1 selection (repeat for top-k) | Laplace (or Gumbel) noise added to each score | Pure DP |
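The three core mechanisms can be sketched in a few lines each. This is an illustrative implementation using NumPy, not a hardened library; the Gaussian calibration uses the classical bound, which requires ε < 1.

```python
import math
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(value, sensitivity, eps):
    """Pure eps-DP: add Laplace(sensitivity/eps) noise."""
    return value + rng.laplace(scale=sensitivity / eps)

def gaussian_mechanism(value, sensitivity, eps, delta):
    """(eps, delta)-DP for eps < 1: noise std = sensitivity * sigma,
    with sigma >= sqrt(2 ln(1.25/delta)) / eps."""
    sigma = math.sqrt(2 * math.log(1.25 / delta)) / eps
    return value + rng.normal(scale=sensitivity * sigma)

def exponential_mechanism(candidates, utility, sensitivity, eps):
    """Pure eps-DP selection: sample with prob ∝ exp(eps·u/(2·Δu))."""
    scores = np.array([eps * utility(c) / (2 * sensitivity) for c in candidates])
    probs = np.exp(scores - scores.max())     # subtract max for numerical stability
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

noisy_count = laplace_mechanism(42, sensitivity=1, eps=0.5)
```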

Advanced Techniques

Sparse Vector Technique (SVT)

  • Answers many threshold queries efficiently
  • Uses privacy budget only for queries above threshold
  • Ideal for monitoring applications
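The classic SVT primitive, AboveThreshold, can be sketched as follows for sensitivity-1 queries; the noise scales (2/ε for the threshold, 4/ε per query) follow the standard textbook presentation. Illustrative only:

```python
import numpy as np

def above_threshold(queries, dataset, threshold, eps):
    """Sparse Vector Technique (AboveThreshold): scan a stream of
    sensitivity-1 queries and halt at the first whose noisy answer
    exceeds the noisy threshold. Total cost is eps, regardless of
    how many queries fall below the threshold."""
    rng = np.random.default_rng()
    noisy_t = threshold + rng.laplace(scale=2 / eps)
    for i, q in enumerate(queries):
        if q(dataset) + rng.laplace(scale=4 / eps) >= noisy_t:
            return i            # index of first query over threshold
    return None                 # no query exceeded the threshold

# With a huge gap, the first crossing is found almost surely:
idx = above_threshold(
    [lambda d: 0, lambda d: 0, lambda d: 1000],
    None, threshold=500, eps=1.0,
)
```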

Private Multiplicative Weights (PMW)

  • Interactive query answering
  • Adapts to query sequence
  • Maintains utility over many rounds

Smooth Sensitivity

  • Data-dependent noise calibration
  • Better utility than global sensitivity
  • More complex implementation

Composition Methods

| Composition Type | Privacy Cost | When to Use |
|---|---|---|
| Basic | ε_total = Σε_i, δ_total = Σδ_i | Simple, conservative |
| Advanced | ε_total = √(2k ln(1/δ′))·ε + kε(e^ε − 1) | Multiple similar queries |
| Moments Accountant | Tighter bounds via moment-generating functions | Complex ML pipelines |
| Rényi DP | Track RDP(α), then convert to (ε, δ)-DP | Deep learning applications |
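The gap between basic and advanced composition is easy to see numerically. A sketch, using the Dwork–Roth advanced composition bound for k runs of an ε-DP mechanism:

```python
import math

def basic_composition(eps: float, k: int) -> float:
    """Total epsilon after k eps-DP queries under basic composition."""
    return k * eps

def advanced_composition(eps: float, k: int, delta_prime: float) -> float:
    """Dwork-Roth advanced composition: k eps-DP mechanisms together
    satisfy (eps_total, delta_prime)-DP with this eps_total."""
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
            + k * eps * (math.exp(eps) - 1))

k, eps = 100, 0.1
print(basic_composition(eps, k))                      # 10
print(advanced_composition(eps, k, 1e-5))             # noticeably smaller (~5.85)
```

Advanced composition pays a small δ′ to shrink the ε cost from linear in k toward √k, which is why it wins for many similar queries.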

Implementation Tools and Libraries

Python Libraries

Google’s DP Library

from dp_accounting import dp_event
from dp_accounting import rdp

# Track privacy budget with the RDP accountant
accountant = rdp.RdpAccountant()
event = dp_event.GaussianDpEvent(noise_multiplier=1.0)
accountant.compose(event, count=1000)
epsilon = accountant.get_epsilon(target_delta=1e-5)

OpenMined PySyft

import syft as sy
# Differential privacy with federated learning

IBM Diffprivlib

import numpy as np
from diffprivlib.models import LinearRegression
from diffprivlib.mechanisms import Laplace

# DP linear regression; explicit bounds avoid diffprivlib's
# privacy-leak warning about ranges inferred from the data
X = np.array([[0.1], [0.4], [0.8]])
y = np.array([0.2, 0.5, 0.9])
model = LinearRegression(epsilon=1.0, bounds_X=(0, 1), bounds_y=(0, 1))
model.fit(X, y)

# Standalone Laplace mechanism: randomise a single sensitivity-1 value
noisy = Laplace(epsilon=1.0, sensitivity=1).randomise(42)

R Libraries

  • diffpriv: General-purpose DP mechanisms for R
  • PSIlence: DP statistical releases from Harvard's PSI (Private data Sharing Interface) project

Cloud Platforms

  • Google Cloud AI Platform: DP-SGD integration
  • Microsoft SmartNoise: End-to-end DP workflows
  • AWS Clean Rooms: DP query processing

Common Challenges and Solutions

Challenge 1: Privacy Budget Exhaustion

Problem: Running out of ε across multiple queries.
Solutions:

  • Use composition theorems for tighter bounds
  • Implement query scheduling and prioritization
  • Apply sparse vector technique for threshold queries
  • Consider local sensitivity for data-dependent bounds

Challenge 2: Poor Utility with High Privacy

Problem: Results too noisy to be useful.
Solutions:

  • Optimize sensitivity calculations
  • Use post-processing for better estimates
  • Apply smooth sensitivity techniques
  • Consider approximate DP (ε,δ) instead of pure DP

Challenge 3: Parameter Selection

Problem: Choosing appropriate ε and δ values.
Solutions:

  • Start with ε=1.0 as baseline, adjust based on needs
  • Set δ ≤ 1/n² where n is dataset size
  • Use privacy risk assessment frameworks
  • Benchmark against industry standards
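The rules of thumb above translate into a couple of one-line helpers. These are illustrative conventions from this cheatsheet, not standards:

```python
def suggest_delta(n: int) -> float:
    """Rule of thumb: pick delta no larger than 1/n^2 for n records."""
    return 1.0 / (n * n)

def per_query_epsilon(total_eps: float, n_queries: int) -> float:
    """Even split of the budget across queries under basic composition."""
    return total_eps / n_queries

print(suggest_delta(100_000))       # → 1e-10
print(per_query_epsilon(1.0, 4))    # → 0.25
```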

Challenge 4: Implementation Complexity

Problem: Correctly implementing DP mechanisms.
Solutions:

  • Use established libraries (don’t roll your own)
  • Implement comprehensive testing suites
  • Apply formal verification where possible
  • Regular security audits and code reviews

Best Practices and Practical Tips

Privacy Budget Management

  • Pre-allocate Budget: Plan all queries in advance
  • Track Composition: Use privacy accountants religiously
  • Batch Queries: Group similar queries to save budget
  • Prioritize Queries: Spend more budget on critical analyses

Implementation Guidelines

  • Clamp Inputs: Bound all input values to prevent manipulation
  • Add Noise Last: Apply DP noise as final step before output
  • Validate Sensitivity: Double-check all sensitivity calculations
  • Test Thoroughly: Verify privacy guarantees with unit tests
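The "clamp inputs, add noise last" guidelines can be combined in one short sketch of a DP sum. Clamping is what bounds the sensitivity; without it a single outlier makes the sensitivity unbounded. Illustrative only:

```python
import numpy as np

def dp_sum(values, lower, upper, eps, rng=None):
    """Clamp first, then aggregate, then add noise as the final step.
    Clamping into [lower, upper] bounds the sensitivity of the sum
    at max(|lower|, |upper|)."""
    rng = rng or np.random.default_rng()
    clamped = np.clip(values, lower, upper)
    sensitivity = max(abs(lower), abs(upper))
    return clamped.sum() + rng.laplace(scale=sensitivity / eps)

salaries = np.array([48_000, 52_000, 1_000_000])   # one wild outlier
noisy_total = dp_sum(salaries, lower=0, upper=200_000, eps=1.0)
```

The outlier is clipped to 200,000 before summing, so its owner's presence changes the (pre-noise) result by at most the sensitivity.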

Accuracy Optimization

  • Choose Right Mechanism: Match mechanism to query type
  • Optimize Sensitivity: Use local or smooth sensitivity when possible
  • Post-process Intelligently: Apply constraints after adding noise
  • Consider Trade-offs: Balance privacy and utility requirements

Security Considerations

  • Secure Noise Generation: Use cryptographically secure RNGs
  • Prevent Side Channels: Avoid data-dependent timing or memory-access patterns
  • Audit Regularly: Review privacy accounting and implementations
  • Monitor Attacks: Watch for correlation and reconstruction attempts
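For the secure-RNG point: a Laplace sampler can draw from the OS CSPRNG via Python's `secrets` module instead of a seedable PRNG like `random` or NumPy's default. A sketch using the standard inverse-CDF transform (note that real libraries also guard against floating-point attacks on noise, which this sketch does not):

```python
import math
import secrets

def secure_laplace(scale: float) -> float:
    """Laplace sample via inverse-CDF, driven by the OS CSPRNG."""
    # uniform in (-0.5, 0.5), bounded away from 0 and the endpoints
    u = (secrets.randbelow(2**53) + 1) / (2**53 + 2) - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

noisy_count = 42 + secure_laplace(scale=1.0)
```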

Comparison: DP Variants

| Variant | Trust Model | Noise Location | Privacy Strength | Utility | Complexity |
|---|---|---|---|---|---|
| Central DP | Trusted curator | Server-side | Strong | High | Medium |
| Local DP | No trust required | Client-side | Weaker | Lower | High |
| Shuffle DP | Trusted shuffler only | Client + shuffle | Medium | Medium | High |
| Federated DP | Trusted aggregator | During aggregation | Strong | High | Very high |

Quick Reference Formulas

Noise Calibration

Laplace Noise: Lap(Δf/ε)
Gaussian Noise: N(0, (Δf·σ)²) where σ ≥ sqrt(2·ln(1.25/δ))/ε
Exponential Weights: exp(ε·u(x)/(2·Δu)) where u is the utility function

Sensitivity Calculations

Count Query: Δf = 1
Sum Query: Δf = max |value| (after clamping values to a known bound)
Average Query: Δf = (max_value − min_value)/n (bounded values, known n)
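The sensitivity formulas above, expressed as trivial helpers (an illustrative sketch following this cheatsheet's conventions):

```python
def count_sensitivity() -> int:
    return 1                            # one record changes a count by at most 1

def sum_sensitivity(lower: float, upper: float) -> float:
    return max(abs(lower), abs(upper))  # after clamping values into [lower, upper]

def mean_sensitivity(lower: float, upper: float, n: int) -> float:
    return (upper - lower) / n          # bounded values, known n

# Laplace scale Δf/ε for an eps=0.5 average over 1000 values in [0, 100]:
scale = mean_sensitivity(0, 100, 1000) / 0.5
print(scale)   # → 0.2
```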

Composition Bounds

Basic: (Σε_i, Σδ_i)
Advanced: (sqrt(2k·ln(1/δ'))·ε + k·ε·(e^ε − 1), δ')
Moments: Use DP accounting libraries

Practical Implementation Checklist

Before Implementation

  • [ ] Define privacy requirements and threat model
  • [ ] Calculate query sensitivity accurately
  • [ ] Choose appropriate ε and δ parameters
  • [ ] Select suitable DP mechanism
  • [ ] Plan privacy budget allocation

During Implementation

  • [ ] Use established DP libraries
  • [ ] Implement privacy accounting system
  • [ ] Add comprehensive input validation
  • [ ] Apply proper noise generation
  • [ ] Test with edge cases

After Implementation

  • [ ] Validate privacy guarantees
  • [ ] Monitor utility metrics
  • [ ] Audit privacy budget usage
  • [ ] Test against known attacks
  • [ ] Document privacy parameters

Resources for Further Learning

Essential Papers

  • “Differential Privacy” (Dwork, 2006): Original foundational paper
  • “The Algorithmic Foundations of Differential Privacy” (Dwork & Roth, 2014): Comprehensive textbook
  • “Deep Learning with Differential Privacy” (Abadi et al., 2016): DP in ML
  • “Concentrated Differential Privacy” (Dwork & Rothblum, 2016): Advanced composition

Online Courses and Tutorials

  • Coursera: “Applied Privacy for Data Science” (University of Michigan)
  • edX: “Differential Privacy” (Harvard)
  • YouTube: “Differential Privacy in Practice” (Google AI)
  • OpenMined: Differential Privacy tutorials and courses

Books and References

  • “The Ethical Algorithm” (Kearns & Roth): Accessible introduction
  • “Differential Privacy: A Primer for a Non-technical Audience” (Wood et al.)
  • “Programming Differential Privacy” (Near & Abuah): Practical implementation guide

Tools and Platforms

  • OpenDP: Open-source DP platform
  • SmartNoise: Microsoft’s DP toolkit
  • Google’s DP Library: Production-ready implementations
  • DP-Bench: Benchmarking differential privacy mechanisms

Communities and Forums

  • r/DifferentialPrivacy: Reddit community
  • OpenMined Slack: DP discussion channels
  • TPDP Workshop: Theory and Practice of Differential Privacy
  • Privacy-Preserving ML Meetups: Local and virtual events

Last Updated: May 2025 | This cheatsheet covers the essential concepts and practical aspects of differential privacy. For the most current research and implementations, consult the latest academic papers and official library documentation.
