Critical Review | Medicine & Health

1,300 AI Medical Devices Cleared: Where Is the Clinical Evidence?

Over 1,300 AI-enabled medical devices have received FDA clearance, with 258 added in 2025 alone. Yet clinical performance data exists for only about half of analyzed devices, and fewer than one-third report sex-specific outcomes.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Why It Matters

A tally that once seemed futuristic now reads as bureaucratic routine: the U.S. Food and Drug Administration had cleared more than 1,300 artificial-intelligence-enabled medical devices by December 2025, adding 258 in that calendar year alone. Radiology accounts for the largest share, followed by cardiology and pathology. The number is impressive. The question it raises is uncomfortable: how many of these devices have been tested against the clinical outcomes they are supposed to improve?

The answer, according to a growing body of regulatory scholarship, including an analysis published in JAMA Network Open, is "far fewer than you would expect."

The Research Landscape

How AI Devices Reach the Market

Most AI medical devices enter through the FDA's 510(k) pathway, which requires manufacturers to demonstrate that a new device is "substantially equivalent" to a predicate device already on the market. The pathway was built around conventional hardware, not software that reads chest X-rays, and it does not generally require clinical trials. A smaller number enter through the De Novo pathway, which applies to novel low-to-moderate-risk devices that have no predicate, or through premarket approval (PMA), which does require clinical data but is used for fewer than 5% of AI device submissions.

The 510(k) route means that a device can reach hospitals and clinics with bench-test data and algorithmic performance metrics—sensitivity, specificity, area under the curve—measured on curated datasets, without ever being tested on a living patient in a prospective study.
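To make the metrics named above concrete, here is a minimal Python sketch of how sensitivity, specificity, and area under the curve are typically computed from a labeled dataset. The function names, the 0.5 decision threshold, and the ten-case dataset are hypothetical illustrations and are not drawn from any FDA submission.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive case outscores a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical curated test set: 4 diseased cases, 6 healthy cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(labels, preds)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={auc(labels, scores):.3f}")
# prints: sensitivity=0.750 specificity=0.833 auc=0.875
```

All three numbers describe agreement with reference labels on a fixed dataset. None of them says whether acting on the device's output changed what happened to a patient, which is the gap the article is concerned with.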

What the Evidence Shows

Analyses of FDA-authorized AI devices have converged on several findings:

Clinical performance studies exist for only approximately half of the devices examined. The remainder rely on retrospective dataset evaluations or internal validation alone.
Fewer than one-third report sex-specific performance data. For most devices, then, it is unknown whether algorithmic accuracy differs between male and female patients, a concern given well-documented sex-based differences in disease presentation, particularly in cardiology and dermatology.
Geographic and demographic representation in training data is rarely disclosed. When it is, the datasets skew toward populations served by large U.S. academic medical centers.
Post-market surveillance is sparse. Few devices have published real-world performance data after deployment.
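The sex-specific reporting gap can be illustrated with a short sketch. The Python below, written under assumed record fields (`sex`, `label`, `pred`) and with invented counts, shows how a single pooled sensitivity figure can sit between two quite different subgroup figures. It is an illustration of stratified reporting in general, not a reconstruction of any device's data.

```python
from collections import defaultdict


def sensitivity_by_group(records, group_key="sex"):
    """Return {group: (sensitivity, n_positive_cases)} using disease-positive records only."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0})
    for r in records:
        if r["label"] == 1:  # sensitivity only concerns true disease cases
            counts[r[group_key]]["tp" if r["pred"] == 1 else "fn"] += 1
    return {
        g: (c["tp"] / (c["tp"] + c["fn"]), c["tp"] + c["fn"])
        for g, c in counts.items()
    }


def case(sex, label, pred):
    """Build one hypothetical evaluation record."""
    return {"sex": sex, "label": label, "pred": pred}


# Invented counts: 4 diseased women (2 detected), 4 diseased men (3 detected).
records = (
    [case("F", 1, 1)] * 2 + [case("F", 1, 0)] * 2
    + [case("M", 1, 1)] * 3 + [case("M", 1, 0)] * 1
)

by_sex = sensitivity_by_group(records)
pooled = sensitivity_by_group([dict(r, sex="all") for r in records])
print(by_sex)   # {'F': (0.5, 4), 'M': (0.75, 4)}
print(pooled)   # {'all': (0.625, 8)}
```

Returning the number of positive cases alongside each estimate matters in practice: a subgroup figure based on a handful of cases is weak evidence in either direction.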

Critical Analysis

Claim	Source Basis	Confidence	Caveat
1,300+ AI devices FDA-cleared as of Dec 2025	FDA device database, cumulative count	High	Count includes all AI/ML-enabled devices; definitions vary
258 new clearances in 2025	FDA annual tally	High	Calendar year figure; some filed in prior year
Clinical performance studies for ~half of analyzed devices	JAMA Network Open analysis	Moderate	Denominator varies by which devices were analyzed
Fewer than one-third report sex-specific data	Same analysis	Moderate	Absence of reporting ≠ absence of data; some may collect but not publish
Most enter through 510(k)	FDA pathway records	High	Consistent across multiple analyses

The gap between regulatory clearance and clinical evidence is not unique to AI—it has existed for decades in the broader medical device ecosystem. But AI devices introduce a complication: they can change. Software updates, retraining on new data, and drift in input distributions mean that the device a hospital purchases may not behave identically to the device that was cleared. The FDA's Predetermined Change Control Plan framework, introduced in 2023, attempts to address this, but adoption remains voluntary and uneven.
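One way a deploying site might watch for drift in input distributions is to compare a summary statistic of incoming cases against a baseline. The sketch below uses the Population Stability Index (PSI), a generic monitoring statistic chosen here for illustration; it is not part of the FDA framework and is not described in the cited sources. The feature (a normalized image statistic between 0 and 1), the bin edges, and the sample values are all hypothetical.

```python
import bisect
import math


def bin_proportions(values, edges, eps=1e-6):
    """Share of values in each [edge_i, edge_i+1) bin; the last bin also includes its upper edge.

    Values outside the edges are ignored. Assumes at least one value falls in range.
    Empty bins are floored at eps so the logarithm in psi() stays defined.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    return [max(c / total, eps) for c in counts]


def psi(baseline, current, edges):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    e = bin_proportions(baseline, edges)
    a = bin_proportions(current, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
at_clearance = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]   # hypothetical baseline inputs
in_deployment = [0.3, 0.4, 0.6, 0.6, 0.7, 0.8, 0.9, 0.9]  # hypothetical site inputs

print(psi(at_clearance, at_clearance, edges))   # 0.0 for identical distributions
print(psi(at_clearance, in_deployment, edges))  # positive when the distributions differ
```

A statistic like this only flags that inputs look different from the baseline. Whether the device's accuracy has actually changed still has to be checked against labeled local cases, and any alert threshold would be a local choice.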

The Radiology Concentration Problem

Radiology dominates the AI device landscape, accounting for roughly 75% of all clearances. This concentration reflects both the availability of large imaging datasets and the relative ease of defining a bounded task ("detect nodule in chest CT"). It also means that, in absolute terms, the clinical-evidence gap falls most heavily on the specialty with the most devices. Radiologists are being asked to integrate AI tools whose performance in their specific patient population, on their specific scanner hardware, may never have been measured.
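One reason local measurement matters is that the meaning of a positive result depends on how common the disease is in the population being scanned. The following sketch applies Bayes' rule to compute positive predictive value (PPV) from sensitivity, specificity, and prevalence. The 90% figures and the two prevalence values are assumed for illustration and do not describe any particular device.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(disease | positive result), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical device, two hypothetical settings.
enriched_dataset = ppv(0.90, 0.90, prevalence=0.30)   # curated set with many disease cases
screening_clinic = ppv(0.90, 0.90, prevalence=0.02)   # low-prevalence screening population

print(f"PPV at 30% prevalence: {enriched_dataset:.3f}")  # 0.794
print(f"PPV at 2% prevalence:  {screening_clinic:.3f}")  # 0.155
```

With identical sensitivity and specificity, roughly four in five positive flags are correct in the enriched dataset but fewer than one in six in the screening setting. That is before accounting for any change in sensitivity or specificity caused by different scanners or patient mix.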

What "Cleared" Does Not Mean

A common misunderstanding—shared by patients, administrators, and sometimes clinicians—is that FDA clearance implies clinical validation. It does not. Clearance through 510(k) means the device is substantially equivalent to a predicate. It does not mean the device improves patient outcomes, reduces diagnostic errors, or is cost-effective. The distinction matters because hospital procurement decisions, insurance coverage, and patient trust often rest on the assumption that regulatory clearance equals clinical proof.

Open Questions

Should the FDA require prospective clinical trials for AI diagnostic devices? The agency has signaled interest but faces industry pushback citing development speed and cost. A tiered approach—clinical data required for high-risk applications, post-market studies for lower-risk ones—seems more plausible than a blanket mandate.

How should continuous-learning devices be regulated? If an algorithm updates its weights after deployment, at what point does it become a new device requiring new clearance? The current framework lacks clear thresholds.

Who is responsible when an under-validated device contributes to a misdiagnosis? Product liability law for AI devices remains unsettled. The clinician-in-the-loop argument—that a physician always makes the final call—weakens as AI tools become more autonomous and as non-specialist settings adopt them without radiologist oversight.

Can post-market registries close the evidence gap faster than pre-market trials? Some researchers advocate for mandatory real-world performance registries, analogous to surgical outcome databases, as a pragmatic alternative to requiring pre-market clinical trials for every device.

The Measured View

The proliferation of FDA-cleared AI medical devices is a regulatory success story in throughput and a clinical evidence story still being written. Over 1,300 devices are on the market. Whether they make patients healthier, and which patients specifically, has been answered for only a fraction of them. The gap is not a scandal; it is a structural feature of a regulatory system designed before software became medicine. Closing it will require not just more studies, but a rethinking of what "evidence" means for devices that learn.

References (3)

Wu, E., Wu, K., Daneshjou, R., et al. (2024). FDA-Authorized AI/ML Medical Devices: Trends in Clinical Evidence and Regulatory Pathways. JAMA Network Open.

U.S. Food and Drug Administration. (2025). Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. FDA Device Database.

U.S. Food and Drug Administration. (2023). Marketing Submission Recommendations for a Predetermined Change Control Plan for AI/ML-Enabled Device Software Functions. FDA Guidance Document.