
Conformal Prediction Under Distribution Shift: Coverage Guarantees When the World Changes

Conformal prediction provides distribution-free coverage guarantees, but only when calibration and test data are exchangeable. Three 2025 papers extend CP to the real world: adaptive methods for drifting time series, optimal transport for distribution shift, and robust calibration under label corruption.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Uncertainty quantification is not optional for consequential predictions. A medical diagnosis without a confidence interval is a guess. A financial forecast without a prediction interval is a liability. A manufacturing quality prediction without an uncertainty band is an invitation to produce defective products.

Conformal prediction (CP) offers something that no other uncertainty quantification method provides: finite-sample, distribution-free coverage guarantees. For any predictive model (neural network, random forest, linear regression), CP constructs prediction sets that contain the true value with a user-specified probability (e.g., 90%), without any assumption about the data distribution or the model's correctness. This guarantee holds in finite samples, not just asymptotically.
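
To make the guarantee concrete, here is a minimal sketch of split CP for regression with absolute-residual nonconformity scores (the function name is illustrative; the `method` argument of `np.quantile` requires NumPy 1.22+):

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_pred, alpha=0.1):
    """Split CP interval for regression with absolute-residual scores.

    cal_preds, cal_labels: model predictions and true values on a held-out
    calibration set; test_pred: the model's prediction for a new point.
    Under exchangeability, the returned interval covers the true value
    with probability >= 1 - alpha, for any model and any distribution.
    """
    n = len(cal_labels)
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(np.asarray(cal_labels) - np.asarray(cal_preds))
    # Finite-sample correction: the ceil((n+1)(1-alpha))/n empirical
    # quantile is what makes the guarantee exact rather than asymptotic.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")  # NumPy >= 1.22
    return test_pred - q, test_pred + q
```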

The catch is exchangeability: CP assumes that calibration data and test data are drawn from the same distribution. In practice, distributions shift: manufacturing processes drift over time, patient populations change between hospitals, and financial markets evolve. When exchangeability is violated, CP's coverage guarantee breaks, and prediction intervals may be misleadingly narrow or wastefully wide.

The 2025 research frontier addresses three distinct violations of exchangeability, extending CP's rigorous guarantees to the messy, non-stationary real world.

Adaptive CP for Temporal Drift

Zhang & Zhou (IEEE Transactions on Industrial Informatics) address the most common violation in industrial applications: temporal distribution shift in time series data. Manufacturing sensor readings, equipment performance metrics, and process quality indicators all drift over time as equipment ages, raw materials change, and operating conditions fluctuate.

Their adaptive conformal prediction maintains coverage under drift through a dynamic learning rate that tracks the empirical coverage of recent predictions:

  • If recent coverage falls below the target (intervals are too narrow for the current distribution), the algorithm widens future intervals
  • If recent coverage exceeds the target (intervals are wastefully wide), it narrows them
  • The adaptation rate is itself adaptive, responding more aggressively to rapid shifts and more conservatively to gradual drift

The theoretical contribution is a convergence proof: under mild regularity conditions on the drift process, the long-run average coverage converges to the target rate. This is weaker than the finite-sample guarantee of standard CP (which holds exactly for each test point) but meaningful for applications where approximate coverage over time is acceptable.
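
A minimal sketch of this feedback loop, written in the spirit of adaptive conformal inference rather than as Zhang & Zhou's exact algorithm (their adaptation rate is itself dynamic, whereas `gamma` is a fixed step size here; all names are illustrative):

```python
import numpy as np

def adaptive_conformal_halfwidths(init_scores, residual_stream,
                                  alpha=0.1, gamma=0.05):
    """Stream of interval half-widths under drift (a simplified sketch).

    init_scores: nonconformity scores (absolute residuals) from an initial
    calibration window; residual_stream: |y_t - yhat_t| revealed after
    each prediction. gamma is a fixed step size in this sketch; in the
    paper the adaptation rate is itself adaptive.
    """
    scores = list(init_scores)
    alpha_t = alpha
    for resid in residual_stream:
        # Half-width: empirical (1 - alpha_t) quantile of scores seen so far.
        level = float(np.clip(1.0 - alpha_t, 0.0, 1.0))
        q = np.quantile(scores, level)
        yield q
        # err_t = 1 if the interval missed the realized value, else 0.
        err = float(resid > q)
        # Feedback update: a miss lowers alpha_t (widening future intervals),
        # a hit raises it (narrowing them), so coverage is tracked online.
        alpha_t += gamma * (alpha - err)
        scores.append(resid)
```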

Optimal Transport for Arbitrary Distribution Shifts

Correia & Louizos provide an elegant solution for a different violation scenario: arbitrary distribution shifts between calibration and test data, crucially without requiring prior knowledge of what type of shift has occurred. Existing methods for handling non-exchangeable CP typically require specifying the nature of the shift (e.g., covariate shift, label shift) before applying the correction, a requirement that is often infeasible in practice.

Their insight: optimal transport can estimate the mapping between the calibration feature distribution and the test feature distribution using only unlabeled test data. This mapping enables reweighting of calibration nonconformity scores to reflect the test distribution, approximately restoring the coverage guarantee, regardless of whether the shift is covariate shift, label shift, or a more complex combination.

The method requires no labels from the test distribution, only features. This is practically significant because in many deployment scenarios (a medical model deployed at a new hospital, a quality model applied to a new factory), unlabeled data from the target domain is abundant even when labeled data is unavailable, and the nature of the distribution shift is unknown.
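
The sketch below shows the weighted split-CP skeleton such methods build on, using a logistic-regression density-ratio estimate as a simple stand-in for the paper's optimal-transport coupling (names and the abstention convention are illustrative; requires NumPy and scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_conformal_halfwidth(cal_X, cal_scores, test_X, x_new, alpha=0.1):
    """Weighted split-CP quantile under covariate shift (a sketch).

    Importance weights w(x) ~ p_test(x) / p_cal(x) are estimated with a
    logistic-regression density-ratio classifier, a simple stand-in for
    the optimal-transport coupling in the paper. Only unlabeled test
    features (test_X) are used; cal_scores are absolute residuals.
    """
    # Classifier separating calibration (label 0) from test (label 1) features.
    X = np.vstack([cal_X, test_X])
    z = np.concatenate([np.zeros(len(cal_X)), np.ones(len(test_X))])
    clf = LogisticRegression(max_iter=1000).fit(X, z)

    def ratio(x):
        p = clf.predict_proba(x)[:, 1]
        return p / np.clip(1.0 - p, 1e-12, None)  # odds ~ density ratio

    w = ratio(cal_X)
    w_new = ratio(np.asarray(x_new).reshape(1, -1))[0]
    # Normalize weights jointly with the test point's own weight, whose
    # probability mass conceptually sits at +infinity.
    probs = np.append(w, w_new)
    probs /= probs.sum()
    order = np.argsort(cal_scores)
    cum = np.cumsum(probs[:-1][order])
    # Smallest score whose cumulative weighted mass reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha)
    if idx >= len(cal_scores):
        return np.inf  # too little calibration mass: abstain (infinite width)
    return np.asarray(cal_scores)[order][idx]
```

The returned half-width is used exactly as in split CP; when the estimated weights place too little mass on the calibration set, the method honestly abstains with an infinite interval rather than reporting an unsupported one.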

Robust CP Under Label Corruption

Feldman et al. address a third practical concern: corrupted calibration labels. Real-world calibration data contains annotation errorsโ€”mislabeled examples, missing values, noisy measurements. Standard CP assumes correct calibration labels and provides no guarantee when this assumption is violated.

Their framework distinguishes between two types of label corruption:

  • Missing labels: Some calibration examples have no label (missing completely at random or missing at random). The framework uses multiple imputation to generate plausible labels for missing entries, then applies CP with appropriate coverage adjustment.
  • Noisy labels: Some calibration labels are incorrect. The framework uses density ratio reweighting to down-weight examples likely to be mislabeled, maintaining approximate coverage despite the noise (a minimal sketch of the reweighting idea follows this list).
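
A minimal sketch of the noisy-label case, assuming a per-example estimate of label correctness is available; this is the down-weighting idea in its simplest form, not Feldman et al.'s exact estimator, and all names are illustrative:

```python
import numpy as np

def robust_conformal_halfwidth(cal_scores, clean_prob, alpha=0.1):
    """Down-weighted calibration quantile for noisy labels (a sketch).

    cal_scores: nonconformity scores computed from possibly-noisy labels.
    clean_prob: assumed per-example estimate of the probability that the
    label is correct (e.g., from a noise model); suspected mislabels get
    proportionally less influence on the quantile.
    """
    w = np.asarray(clean_prob, dtype=float)
    w = w / w.sum()
    order = np.argsort(cal_scores)
    cum = np.cumsum(w[order])
    # Smallest score whose cumulative clean-weighted mass reaches 1 - alpha.
    idx = min(int(np.searchsorted(cum, 1.0 - alpha)), len(w) - 1)
    return np.asarray(cal_scores)[order][idx]
```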

A Practitioner's Decision Framework

For researchers and engineers choosing among CP variants, the decision depends on the nature of the exchangeability violation:

| Violation type | Method | Data requirement | Guarantee strength |
|---|---|---|---|
| No violation (exchangeable) | Standard split CP | Labeled calibration set | Exact finite-sample |
| Temporal drift | Adaptive CP (Zhang & Zhou) | Recent prediction outcomes | Long-run average |
| Arbitrary distribution shift (type unknown) | OT-weighted CP (Correia & Louizos) | Unlabeled test features | Approximate |
| Label corruption | Robust CP (Feldman et al.) | Corruption rate estimate | Approximate |
| Multiple violations | Combination needed | Domain-specific design | Case-by-case |

Claims and Evidence

| Claim | Evidence | Verdict |
|---|---|---|
| Standard CP provides exact finite-sample coverage | Mathematical proof under exchangeability | ✅ Proven |
| Adaptive CP maintains coverage under temporal drift | Convergence proof + empirical validation on industrial data | ✅ Supported |
| OT-based reweighting restores coverage under arbitrary distribution shift (without knowing shift type) | Theoretical bounds + experimental validation | ✅ Supported |
| Robust CP handles label corruption gracefully | Framework with theoretical analysis; empirical validation | ✅ Supported |
| A single CP method handles all types of distribution shift | Each method addresses a specific violation type | ❌ No universal method |

Open Questions

  • Conditional coverage: All methods discussed provide marginal coverage (averaged over the test distribution). Can we achieve conditional coverage (valid for specific subgroups) under distribution shift? This is substantially harder and remains open.
  • Multi-dimensional prediction sets: CP for scalar outputs is well-understood. For vector-valued outputs (multi-target regression, image reconstruction), constructing efficient prediction sets with valid coverage is an active research area.
  • Online learning integration: Can CP be integrated with online learning algorithms that continuously update the predictive model? The interaction between model updates and calibration set management creates non-trivial challenges.
  • Adversarial shift: The methods above assume natural (non-adversarial) distribution shift. Under adversarial shift, where an attacker deliberately manipulates the test distribution to invalidate CP guarantees, different defenses are needed.
  • Computational cost: OT-based reweighting and multiple imputation add computational overhead to CP. For real-time applications, this overhead must be bounded. What are the minimal-cost approximations that maintain coverage?

What This Means for Your Research

For statisticians, conformal prediction under distribution shift is a vibrant research frontier where theoretical rigor meets practical necessity. The three papers reviewed here demonstrate that CP's foundational insights (using calibration residuals to construct prediction sets) are flexible enough to accommodate violations that the original framework did not anticipate.

For ML practitioners, CP should be the default uncertainty quantification method for any deployment where prediction errors have consequences. The distribution shift extensions reviewed here remove the primary objection to CP adoption ("my data isn't exchangeable"), providing robust uncertainty quantification that is practical, theoretically grounded, and model-agnostic.

For domain scientists (industrial engineers, clinicians, environmental scientists) who use ML predictions as inputs to decisions, CP provides something no other method offers: a prediction interval you can trust, not because the model is perfect, but because the coverage guarantee holds regardless of model quality.

References

[1] Zhang, R. & Zhou, P. (2025). Uncertainty Quantification Based on Conformal Prediction for Industrial Time Series With Distribution Shift. IEEE Transactions on Industrial Informatics.
[2] Correia, A. & Louizos, C. (2025). Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shifts with Unlabeled Data. arXiv:2507.10425.
[3] Feldman, S., Bates, S., & Romano, Y. (2025). Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting. arXiv:2505.04733.
