
Conformal Prediction Under Distribution Shift: Coverage Guarantees When the World Changes

Conformal prediction provides distribution-free coverage guarantees, but only when calibration and test data are exchangeable. Three 2025 papers extend CP to the real world: adaptive methods for drifting time series, optimal transport for distribution shift, and robust calibration under label corruption.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Uncertainty quantification is not optional for consequential predictions. A medical diagnosis without a confidence interval is a guess. A financial forecast without a prediction interval is a liability. A manufacturing quality prediction without an uncertainty band is an invitation to produce defective products.

Conformal prediction (CP) offers something that no other uncertainty quantification method provides: finite-sample, distribution-free coverage guarantees. For any predictive model (neural network, random forest, linear regression), CP constructs prediction sets that contain the true value with a user-specified probability (e.g., 90%), without any assumption about the data distribution or the model's correctness. This guarantee holds in finite samples, not just asymptotically.
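
To make the guarantee concrete, here is a minimal sketch of split CP for regression with absolute-residual nonconformity scores (the function name is illustrative; the `method` argument of `np.quantile` requires NumPy 1.22+):

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_pred, alpha=0.1):
    """Split CP interval for regression with absolute-residual scores.

    cal_preds, cal_labels: model predictions and true values on a held-out
    calibration set; test_pred: the model's prediction for a new point.
    Under exchangeability, the returned interval covers the true value
    with probability >= 1 - alpha, for any model and any distribution.
    """
    n = len(cal_labels)
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(np.asarray(cal_labels) - np.asarray(cal_preds))
    # Finite-sample correction: the ceil((n+1)(1-alpha))/n empirical
    # quantile is what makes the guarantee exact rather than asymptotic.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")  # NumPy >= 1.22
    return test_pred - q, test_pred + q
```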

The catch is exchangeability: CP assumes that calibration data and test data are drawn from the same distribution. In practice, distributions shift: manufacturing processes drift over time, patient populations change between hospitals, and financial markets evolve. When exchangeability is violated, CP's coverage guarantee breaks, and prediction intervals may be misleadingly narrow or wastefully wide.

The 2025 research frontier addresses three distinct violations of exchangeability, extending CP's rigorous guarantees to the messy, non-stationary real world.

Adaptive CP for Temporal Drift

Zhang & Zhou (IEEE Transactions on Industrial Informatics) address the most common violation in industrial applications: temporal distribution shift in time series data. Manufacturing sensor readings, equipment performance metrics, and process quality indicators all drift over time as equipment ages, raw materials change, and operating conditions fluctuate.

Their adaptive conformal prediction maintains coverage under drift through a dynamic learning rate that tracks the empirical coverage of recent predictions:

  • If recent coverage falls below the target (intervals are too narrow for the current distribution), the algorithm widens future intervals
  • If recent coverage exceeds the target (intervals are wastefully wide), it narrows them
  • The adaptation rate is itself adaptive, responding more aggressively to rapid shifts and more conservatively to gradual drift

The theoretical contribution is a convergence proof: under mild regularity conditions on the drift process, the long-run average coverage converges to the target rate. This is weaker than the finite-sample guarantee of standard CP (which holds exactly for each test point) but meaningful for applications where approximate coverage over time is acceptable.
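
A minimal sketch of this feedback loop, written in the spirit of adaptive conformal inference rather than as Zhang & Zhou's exact algorithm (their adaptation rate is itself dynamic, whereas `gamma` is a fixed step size here; all names are illustrative):

```python
import numpy as np

def adaptive_conformal_halfwidths(init_scores, residual_stream,
                                  alpha=0.1, gamma=0.05):
    """Stream of interval half-widths under drift (a simplified sketch).

    init_scores: nonconformity scores (absolute residuals) from an initial
    calibration window; residual_stream: |y_t - yhat_t| revealed after
    each prediction. gamma is a fixed step size in this sketch; in the
    paper the adaptation rate is itself adaptive.
    """
    scores = list(init_scores)
    alpha_t = alpha
    for resid in residual_stream:
        # Half-width: empirical (1 - alpha_t) quantile of scores seen so far.
        level = float(np.clip(1.0 - alpha_t, 0.0, 1.0))
        q = np.quantile(scores, level)
        yield q
        # err_t = 1 if the interval missed the realized value, else 0.
        err = float(resid > q)
        # Feedback update: a miss lowers alpha_t (widening future intervals),
        # a hit raises it (narrowing them), so coverage is tracked online.
        alpha_t += gamma * (alpha - err)
        scores.append(resid)
```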

Optimal Transport for Arbitrary Distribution Shifts

Correia & Louizos provide an elegant solution for a different violation scenario: arbitrary distribution shifts between calibration and test data, crucially without requiring prior knowledge of what type of shift has occurred. Existing methods for handling non-exchangeable CP typically require specifying the nature of the shift (e.g., covariate shift, label shift) before applying the correction, a requirement that is often infeasible in practice.

Their insight: optimal transport can estimate the mapping between the calibration feature distribution and the test feature distribution using only unlabeled test data. This mapping enables reweighting of calibration nonconformity scores to reflect the test distribution, approximately restoring the coverage guarantee, regardless of whether the shift is covariate shift, label shift, or a more complex combination.

The method requires no labels from the test distribution, only features. This is practically significant because in many deployment scenarios (a medical model deployed at a new hospital, a quality model applied to a new factory), unlabeled data from the target domain is abundant even when labeled data is unavailable, and the nature of the distribution shift is unknown.
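
The sketch below shows the weighted split-CP skeleton such methods build on, using a logistic-regression density-ratio estimate as a simple stand-in for the paper's optimal-transport coupling (names and the abstention convention are illustrative; requires NumPy and scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_conformal_halfwidth(cal_X, cal_scores, test_X, x_new, alpha=0.1):
    """Weighted split-CP quantile under covariate shift (a sketch).

    Importance weights w(x) ~ p_test(x) / p_cal(x) are estimated with a
    logistic-regression density-ratio classifier, a simple stand-in for
    the optimal-transport coupling in the paper. Only unlabeled test
    features (test_X) are used; cal_scores are absolute residuals.
    """
    # Classifier separating calibration (label 0) from test (label 1) features.
    X = np.vstack([cal_X, test_X])
    z = np.concatenate([np.zeros(len(cal_X)), np.ones(len(test_X))])
    clf = LogisticRegression(max_iter=1000).fit(X, z)

    def ratio(x):
        p = clf.predict_proba(x)[:, 1]
        return p / np.clip(1.0 - p, 1e-12, None)  # odds ~ density ratio

    w = ratio(cal_X)
    w_new = ratio(np.asarray(x_new).reshape(1, -1))[0]
    # Normalize weights jointly with the test point's own weight, whose
    # probability mass conceptually sits at +infinity.
    probs = np.append(w, w_new)
    probs /= probs.sum()
    order = np.argsort(cal_scores)
    cum = np.cumsum(probs[:-1][order])
    # Smallest score whose cumulative weighted mass reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha)
    if idx >= len(cal_scores):
        return np.inf  # too little calibration mass: abstain (infinite width)
    return np.asarray(cal_scores)[order][idx]
```

The returned half-width is used exactly as in split CP; when the estimated weights place too little mass on the calibration set, the method honestly abstains with an infinite interval rather than reporting an unsupported one.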

Robust CP Under Label Corruption

Feldman et al. address a third practical concern: corrupted calibration labels. Real-world calibration data contains annotation errorsโ€”mislabeled examples, missing values, noisy measurements. Standard CP assumes correct calibration labels and provides no guarantee when this assumption is violated.

Their framework distinguishes between two types of label corruption:

  • Missing labels: Some calibration examples have no label (missing completely at random or missing at random). The framework uses multiple imputation to generate plausible labels for missing entries, then applies CP with appropriate coverage adjustment.
  • Noisy labels: Some calibration labels are incorrect. The framework uses density ratio reweighting to down-weight examples likely to be mislabeled, maintaining approximate coverage despite the noise (a minimal sketch of the reweighting idea follows this list).
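
A minimal sketch of the noisy-label case, assuming a per-example estimate of label correctness is available; this is the down-weighting idea in its simplest form, not Feldman et al.'s exact estimator, and all names are illustrative:

```python
import numpy as np

def robust_conformal_halfwidth(cal_scores, clean_prob, alpha=0.1):
    """Down-weighted calibration quantile for noisy labels (a sketch).

    cal_scores: nonconformity scores computed from possibly-noisy labels.
    clean_prob: assumed per-example estimate of the probability that the
    label is correct (e.g., from a noise model); suspected mislabels get
    proportionally less influence on the quantile.
    """
    w = np.asarray(clean_prob, dtype=float)
    w = w / w.sum()
    order = np.argsort(cal_scores)
    cum = np.cumsum(w[order])
    # Smallest score whose cumulative clean-weighted mass reaches 1 - alpha.
    idx = min(int(np.searchsorted(cum, 1.0 - alpha)), len(w) - 1)
    return np.asarray(cal_scores)[order][idx]
```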

A Practitioner's Decision Framework

For researchers and engineers choosing among CP variants, the decision depends on the nature of the exchangeability violation:

| Violation type | Method | Data requirement | Guarantee strength |
|---|---|---|---|
| No violation (exchangeable) | Standard split CP | Labeled calibration set | Exact finite-sample |
| Temporal drift | Adaptive CP (Zhang & Zhou) | Recent prediction outcomes | Long-run average |
| Arbitrary distribution shift (type unknown) | OT-weighted CP (Correia & Louizos) | Unlabeled test features | Approximate |
| Label corruption | Robust CP (Feldman et al.) | Corruption rate estimate | Approximate |
| Multiple violations | Combination needed | Domain-specific design | Case-by-case |

Claims and Evidence

| Claim | Evidence | Verdict |
|---|---|---|
| Standard CP provides exact finite-sample coverage | Mathematical proof under exchangeability | ✅ Proven |
| Adaptive CP maintains coverage under temporal drift | Convergence proof + empirical validation on industrial data | ✅ Supported |
| OT-based reweighting restores coverage under arbitrary distribution shift (without knowing shift type) | Theoretical bounds + experimental validation | ✅ Supported |
| Robust CP handles label corruption gracefully | Framework with theoretical analysis; empirical validation | ✅ Supported |
| A single CP method handles all types of distribution shift | Each method addresses a specific violation type | ❌ No universal method |

Open Questions

  • Conditional coverage: All methods discussed provide marginal coverage (averaged over the test distribution). Can we achieve conditional coverage (valid for specific subgroups) under distribution shift? This is substantially harder and remains open.
  • Multi-dimensional prediction sets: CP for scalar outputs is well-understood. For vector-valued outputs (multi-target regression, image reconstruction), constructing efficient prediction sets with valid coverage is an active research area.
  • Online learning integration: Can CP be integrated with online learning algorithms that continuously update the predictive model? The interaction between model updates and calibration set management creates non-trivial challenges.
  • Adversarial shift: The methods above assume natural (non-adversarial) distribution shift. Under adversarial shift, where an attacker deliberately manipulates the test distribution to invalidate CP guarantees, different defenses are needed.
  • Computational cost: OT-based reweighting and multiple imputation add computational overhead to CP. For real-time applications, this overhead must be bounded. What are the minimal-cost approximations that maintain coverage?

What This Means for Your Research

For statisticians, conformal prediction under distribution shift is a vibrant research frontier where theoretical rigor meets practical necessity. The three papers reviewed here demonstrate that CP's foundational insights (using calibration residuals to construct prediction sets) are flexible enough to accommodate violations that the original framework did not anticipate.

For ML practitioners, CP should be the default uncertainty quantification method for any deployment where prediction errors have consequences. The distribution shift extensions reviewed here remove the primary objection to CP adoption ("my data isn't exchangeable"), providing robust uncertainty quantification that is practical, theoretically grounded, and model-agnostic.

For domain scientists (industrial engineers, clinicians, environmental scientists) who use ML predictions as inputs to decisions, CP provides something no other method offers: a prediction interval you can trust, not because the model is perfect, but because the coverage guarantee holds regardless of model quality.

References

[1] Zhang, R. & Zhou, P. (2025). Uncertainty Quantification Based on Conformal Prediction for Industrial Time Series With Distribution Shift. IEEE Transactions on Industrial Informatics.
[2] Correia, A. & Louizos, C. (2025). Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shifts with Unlabeled Data. arXiv:2507.10425.
[3] Feldman, S., Bates, S., & Romano, Y. (2025). Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting. arXiv:2505.04733.
