
Conformal Prediction: Distribution-Free Uncertainty That Actually Works

Most ML models give you a prediction but no reliable measure of how wrong it might be. Conformal prediction offers something remarkable: finite-sample coverage guarantees with no distributional assumptions. In 2025, the method is conquering its two remaining weaknesses: distribution shift and label corruption.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Every prediction is wrong. The only question is by how much, and whether the system tells you. Most machine learning models produce point predictions with no reliable indication of their uncertainty. A neural network that predicts tomorrow's stock price at $142.37 gives you no principled way to know whether the true value might be $142 or $120. Bayesian methods offer uncertainty estimates but require distributional assumptions that are routinely violated. Ensemble methods provide heuristic uncertainty but no formal guarantees.

Conformal prediction is different. It provides prediction sets (intervals for regression, collections of labels for classification) that are guaranteed to contain the true value with a user-specified probability. Not asymptotically. Not under Gaussian assumptions. Guaranteed in finite samples, for any underlying distribution. The only requirement: exchangeability of the calibration and test data.

In 2025, conformal prediction is transitioning from a theoretical curiosity to a practical tool, as researchers systematically dismantle the two remaining barriers to real-world deployment: distribution shift and data corruption.

The Elegance of Split Conformal Prediction

The core idea is disarmingly simple. Split your labeled data into training and calibration sets. Train any predictive model on the training set. On the calibration set, compute nonconformity scores: a measure of how "surprising" each true label is given the model's prediction (e.g., the absolute residual |y - ŷ| for regression). Then, for a new test point, construct a prediction set by including all labels whose nonconformity score is at most the ⌈(n+1)(1-α)⌉/n empirical quantile of the calibration scores, where n is the calibration set size.

The result: a prediction interval that covers the true value with probability at least (1-α), regardless of the model's quality or the data distribution. A bad model produces wide intervals; a good model produces narrow ones. But both provide valid coverage.
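
A minimal sketch of this recipe in Python, assuming a scikit-learn-style regressor with a .predict method and absolute residuals as the nonconformity score (the ⌈(n+1)(1-α)⌉/n quantile level is the standard finite-sample correction):

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction with absolute-residual scores.

    `model` is any fitted regressor with a .predict() method;
    the coverage guarantee holds regardless of how good it is.
    """
    # Nonconformity scores on the held-out calibration set: |y - y_hat|
    scores = np.abs(y_cal - model.predict(X_cal))

    # Finite-sample-corrected (1 - alpha) quantile of the scores
    n = len(scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(scores, min(q_level, 1.0), method="higher")

    # Symmetric prediction intervals around the point predictions
    preds = model.predict(X_test)
    return preds - q_hat, preds + q_hat
```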

This universality is conformal prediction's greatest strength and its most counterintuitive property. It seems too good to be true, and the catch is the exchangeability requirement. If the calibration data and test data are drawn from different distributions, the coverage guarantee breaks.

Conquering Distribution Shift

Zhang & Zhou tackle the most practically important violation of exchangeability: temporal distribution shift in industrial time series. Manufacturing processes drift over time: sensor calibration degrades, raw materials change suppliers, equipment ages. A conformal prediction interval calibrated on last month's data may not provide valid coverage for next month's predictions.

Their approach uses adaptive conformal prediction with a dynamic learning rate that tracks the empirical coverage in a sliding window. When coverage drops below the target, the algorithm widens prediction intervals; when coverage exceeds the target, it narrows them. The adaptation is principled, not heuristic, and they prove that the long-run average coverage converges to the target rate even under continuous drift.
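
To make the feedback mechanism concrete, here is a simplified online update in the spirit of adaptive conformal prediction. The fixed learning rate `gamma` and the monitoring `window` are illustrative placeholders; Zhang & Zhou's dynamic learning-rate rule is not reproduced here:

```python
import numpy as np
from collections import deque

def adaptive_intervals(score_stream, alpha=0.1, gamma=0.005, window=200):
    """Online adaptation of the miscoverage level alpha_t.

    `score_stream` yields (cal_scores, test_score) pairs per time
    step: recent calibration scores plus the realized test score.
    """
    alpha_t = alpha
    errs = deque(maxlen=window)  # sliding window of miss indicators
    for cal_scores, test_score in score_stream:
        # Interval radius: conformal quantile at the adapted level
        q_hat = np.quantile(cal_scores, 1 - alpha_t, method="higher")
        miss = float(test_score > q_hat)  # 1 if the interval missed
        errs.append(miss)
        # Missed -> decrease alpha_t (widen); covered -> increase (narrow)
        alpha_t = float(np.clip(alpha_t + gamma * (alpha - miss), 0.001, 0.999))
        yield q_hat, sum(errs) / len(errs)  # radius, rolling miss rate
```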

Correia & Louizos propose a more theoretically ambitious solution using optimal transport. Their insight: even when labeled calibration data and unlabeled test data come from different distributions, we can estimate the transport map between them using the unlabeled test features. This map allows reweighting calibration nonconformity scores to reflect the test distribution, restoring approximate coverage guarantees.

The optimal transport approach is particularly elegant because it requires no labels from the test distribution, only unlabeled features. In many practical settings (deploying a medical model at a new hospital, applying a financial model to a new market), unlabeled data from the target domain is abundant even when labeled data is scarce.
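
The mechanical core of such reweighting is a weighted calibration quantile. A sketch, assuming the importance weights have already been estimated (Correia & Louizos derive them from an estimated optimal-transport map, but any density-ratio estimate fits this interface); giving the test point a unit weight is a common simplification:

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    """(1 - alpha) quantile of calibration scores under importance
    weights that encode how representative each calibration point
    is of the test distribution. Estimating the weights is the hard
    part and is not shown here.
    """
    order = np.argsort(scores)
    scores = np.asarray(scores)[order]
    weights = np.asarray(weights)[order]
    p = weights / (weights.sum() + 1.0)  # reserve unit mass for the test point
    cum = np.cumsum(p)
    # Smallest score whose cumulative weight reaches 1 - alpha; if
    # none does, the prediction set must be the whole label space.
    idx = np.searchsorted(cum, 1 - alpha)
    return scores[idx] if idx < len(scores) else np.inf
```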

Beyond Standard Settings

Everink et al. extend conformal prediction to inverse imaging problems: tasks like MRI reconstruction, deblurring, and super-resolution where the goal is to recover a clean image from corrupted observations. These problems involve massive uncertainty (there are infinitely many clean images consistent with a blurry observation), and existing methods provide pixel-level uncertainty maps that are difficult to calibrate.

Their self-supervised approach constructs conformal prediction sets in the image space itself, providing regions of pixel values that are guaranteed to contain the true image with specified probability. The method requires no ground-truth clean images for calibration, only the corrupted observations, making it applicable in settings where ground truth is unavailable by definition.

Feldman et al. address a different practical concern: corrupted labels. Real-world datasets contain annotation errors: mislabeled training examples, missing values, noisy measurements. Standard conformal prediction assumes correct calibration labels and fails silently when this assumption is violated. Their framework provides robust coverage guarantees under specified rates of label corruption, using imputation techniques to handle missing labels and reweighting to handle noisy ones.
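
As a rough illustration of why corruption matters, one conservative workaround, assuming a known upper bound `eps` on the corruption rate, is to inflate the calibration quantile. This is deliberately blunter than Feldman et al.'s imputation-and-reweighting framework:

```python
import numpy as np

def corruption_robust_quantile(scores, alpha=0.1, eps=0.05):
    """Conservative conformal quantile when up to an `eps` fraction
    of calibration labels may be corrupted.

    Worst-case rank argument: corrupted labels could have pushed up
    to eps*n scores spuriously low, so the clean quantile may sit up
    to eps*n ranks above the observed one. Inflating the quantile
    level by eps absorbs that in the worst case.
    """
    n = len(scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n + eps  # inflate by eps
    return np.quantile(scores, min(q_level, 1.0), method="higher")
```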

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| Conformal prediction provides finite-sample coverage guarantees | Mathematical proof under exchangeability; widely replicated | ✅ Proven |
| Adaptive methods maintain coverage under temporal drift | Zhang & Zhou demonstrate convergent coverage on industrial data | ✅ Supported |
| Optimal transport restores coverage under distribution shift | Correia & Louizos prove approximate coverage bounds | ✅ Supported (theoretical) |
| Conformal prediction works for imaging inverse problems | Everink et al. demonstrate on MRI and deblurring | ✅ Supported |
| Standard conformal prediction is robust to label noise | Feldman et al. show it fails; their method corrects this | ❌ Standard CP is not robust; the corrected version is |

Open Questions

  • Conditional coverage: Standard conformal prediction guarantees marginal coverage (averaged over all test points) but not conditional coverage (for specific subgroups). A model might provide valid overall coverage while systematically undercovering rare but important subpopulations. How do we achieve group-conditional coverage without requiring group labels?
  • Prediction set size as a metric: Valid but uninformatively wide prediction sets are useless. The field needs standardized metrics that reward informativeness (narrow sets) alongside validity (correct coverage).
  • Integration with decision-making: Coverage guarantees are stated in terms of prediction accuracy. But decisions depend on costs, and the cost of under-coverage may differ dramatically from the cost of over-coverage. How do we build cost-sensitive conformal prediction?
  • Conformal prediction for generative models: Can we provide coverage guarantees for the outputs of language models or image generators? The high-dimensional, discrete (language) or continuous (image) output spaces present novel challenges.
  • Computational scalability: Full conformal prediction requires retraining the model for every test point, which is computationally prohibitive for large models. Split conformal prediction is efficient but potentially less powerful. Is there a middle ground?

What This Means for Your Research

If you deploy machine learning models in any domain where prediction errors have consequences (medicine, finance, engineering, policy), conformal prediction should be in your toolkit. It is the only method that provides genuine coverage guarantees without distributional assumptions.

The 2025 advances tackle the two objections that previously limited practical adoption: "my data has distribution shift" (addressed by adaptive and optimal-transport methods) and "my labels are noisy" (addressed by robust calibration). The remaining challenge, conditional coverage for subgroups, is an active research frontier with high practical relevance.

For the broader AI community, conformal prediction embodies a principle that deserves wider adoption: it is better to admit uncertainty honestly than to provide precise predictions that are unreliably calibrated. In a field obsessed with pushing accuracy numbers higher, conformal prediction insists that knowing what you don't know is at least as important as knowing what you do.

References

[1] Zhang, R. & Zhou, P. (2025). Uncertainty Quantification Based on Conformal Prediction for Industrial Time Series With Distribution Shift. IEEE TII.
[2] Correia, A. & Louizos, C. (2025). Non-exchangeable Conformal Prediction with Optimal Transport. arXiv:2507.10425.
[3] Everink, J., Tamo Amougou, B., & Pereyra, M. (2025). Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems. arXiv:2502.05127.
[4] Feldman, S., Bates, S., & Romano, Y. (2025). Conformal Prediction with Corrupted Labels. arXiv:2505.04733.
