
1,300 AI Medical Devices Approved: Where Is the Clinical Evidence?

Over 1,300 AI-enabled medical devices have received FDA clearance, with 258 added in 2025 alone. Yet clinical performance data exists for only about half of analyzed devices, and fewer than one-third report sex-specific outcomes.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Why It Matters

A tally that once seemed futuristic now reads as bureaucratic routine: the U.S. Food and Drug Administration had cleared more than 1,300 artificial-intelligence-enabled medical devices by December 2025, adding 258 in that calendar year alone. Radiology accounts for the largest share, followed by cardiology and pathology. The number is impressive. The question it raises is uncomfortable: how many of these devices have been tested against the clinical outcomes they are supposed to improve?

The answer, according to a growing body of regulatory scholarship published in JAMA Network Open, is "far fewer than you would expect."

The Research Landscape

How AI Devices Reach the Market

Most AI medical devices enter through the FDA's 510(k) pathway, which requires manufacturers to demonstrate that a new device is "substantially equivalent" to one already on the market. This pathway was designed for stethoscopes and tongue depressors, not for software that reads chest X-rays. It does not require clinical trials. A smaller number enter through the De Novo pathway, which applies to novel low-to-moderate-risk devices, or through premarket approval (PMA), which does require clinical data but is used for fewer than 5% of AI device submissions.

The 510(k) route means that a device can reach hospitals and clinics with bench-test data and algorithmic performance metrics (sensitivity, specificity, area under the curve) measured on curated datasets, without ever being tested on a living patient in a prospective study.
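To make the curated-dataset concern concrete, here is a minimal sketch of how sensitivity and specificity are computed from a confusion matrix, and how the same two numbers translate into very different positive predictive values once disease prevalence changes. All counts and prevalence figures are invented for illustration; they do not come from any FDA submission or from the analyses cited in this post.

```python
def sensitivity(tp, fn):
    """Share of truly diseased cases the device flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly healthy cases the device correctly passes."""
    return tn / (tn + fp)


def positive_predictive_value(sens, spec, prevalence):
    """Probability that a positive flag is a true positive (Bayes' rule)."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical curated test set: 500 diseased and 500 healthy cases.
tp, fn, tn, fp = 450, 50, 450, 50
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

# Same device characteristics, two assumed settings.
ppv_curated = positive_predictive_value(sens, spec, prevalence=0.50)
ppv_screening = positive_predictive_value(sens, spec, prevalence=0.02)

print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"PPV at 50% prevalence: {ppv_curated:.2f}")
print(f"PPV at 2% prevalence:  {ppv_screening:.2f}")
```

Under these assumed numbers the device looks equally good on paper in both settings, yet most positive flags in the low-prevalence setting would be false alarms. That is one reason bench metrics alone say little about clinical usefulness.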

What the Evidence Shows

Analyses of FDA-authorized AI devices have converged on several findings:

  • Clinical performance studies exist for only approximately half of the devices examined. The remainder rely on retrospective dataset evaluations or internal validation alone.
  • Less than one-third report sex-specific performance data. This means that for most devices, it is unknown whether algorithmic accuracy differs between male and female patients, a concern given well-documented sex-based differences in disease presentation, particularly in cardiology and dermatology.
  • Geographic and demographic representation in training data is rarely disclosed. When it is, the datasets skew toward populations served by large U.S. academic medical centers.
  • Post-market surveillance is sparse. Few devices have published real-world performance data after deployment.
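
The sex-specific reporting gap in the list above is easy to illustrate. The sketch below computes sensitivity separately for each subgroup and compares it with the pooled figure. The records are synthetic and the function name `sensitivity_by_group` is a hypothetical helper, not part of any regulatory tool or cited study.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, has_disease, flagged) tuples.

    Returns sensitivity per group, using diseased cases only.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, has_disease, flagged in records:
        if not has_disease:
            continue  # sensitivity ignores healthy cases
        if flagged:
            true_pos[group] += 1
        else:
            false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Synthetic diseased cases: 100 female patients, 200 male patients.
records = (
    [("female", True, True)] * 70
    + [("female", True, False)] * 30
    + [("male", True, True)] * 180
    + [("male", True, False)] * 20
)

by_sex = sensitivity_by_group(records)
diseased = [r for r in records if r[1]]
pooled = sum(1 for r in diseased if r[2]) / len(diseased)

print(f"pooled sensitivity: {pooled:.2f}")
for group in sorted(by_sex):
    print(f"{group}: {by_sex[group]:.2f}")
```

In this made-up example, a pooled sensitivity of about 0.83 conceals a 20-point gap between groups. A summary that reports only the pooled number gives no way to detect that difference.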

Critical Analysis

Claim | Source Basis | Confidence | Caveat
1,300+ AI devices FDA-cleared as of Dec 2025 | FDA device database, cumulative count | High | Count includes all AI/ML-enabled devices; definitions vary
258 new clearances in 2025 | FDA annual tally | High | Calendar year figure; some filed in prior year
Clinical performance studies for ~half of analyzed devices | JAMA Network Open analysis | Moderate | Denominator varies by which devices were analyzed
Less than one-third report sex-specific data | Same analysis | Moderate | Reporting ≠ absence of data; some may collect but not publish
Most enter through 510(k) | FDA pathway records | High | Consistent across multiple analyses

The gap between regulatory clearance and clinical evidence is not unique to AI; it has existed for decades in the broader medical device ecosystem. But AI devices introduce a complication: they can change. Software updates, retraining on new data, and drift in input distributions mean that the device a hospital purchases may not behave identically to the device that was cleared. The FDA's Predetermined Change Control Plan framework, introduced in 2023, attempts to address this, but adoption remains voluntary and uneven.
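
One way a deploying site might watch for drift in input distributions is the population stability index (PSI), a generic monitoring statistic that compares binned proportions of an input feature between a reference sample and a current sample. The sketch below is an illustration only. PSI is not prescribed by the FDA framework mentioned above, and the feature, bin edges, and sample values are all assumptions made for the example.

```python
import math


def population_stability_index(reference, current, edges, eps=1e-6):
    """PSI between two samples over shared bin edges.

    Values outside [edges[0], edges[-1]] are ignored. Empty bins are
    floored at eps so the logarithm stays defined.
    """
    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        return [max(c / total, eps) for c in counts]

    ref = proportions(reference)
    cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature: mean normalized pixel intensity per image.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.1] * 25 + [0.3] * 25 + [0.6] * 25 + [0.9] * 25
current = [0.1] * 10 + [0.3] * 20 + [0.6] * 30 + [0.9] * 40

psi_same = population_stability_index(reference, reference, edges)
psi_shifted = population_stability_index(reference, current, edges)

print(f"PSI, reference vs itself:  {psi_same:.3f}")
print(f"PSI, reference vs current: {psi_shifted:.3f}")
```

An identical distribution gives a PSI of zero, and the value grows as the current inputs move away from the reference. The level that should trigger a review is a local policy choice, and a shift in inputs does not by itself show that diagnostic accuracy has changed.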

The Radiology Concentration Problem

Radiology dominates the AI device landscape, accounting for roughly 75% of all clearances. This concentration reflects both the availability of large imaging datasets and the relative ease of defining a bounded task ("detect nodule in chest CT"). It also means that the clinical-evidence gap is most acute in the specialty with the most devices. Radiologists are being asked to integrate AI tools whose performance in their specific patient population, on their specific scanner hardware, may never have been measured.

What "Cleared" Does Not Mean

A common misunderstanding, shared by patients, administrators, and sometimes clinicians, is that FDA clearance implies clinical validation. It does not. Clearance through 510(k) means the device is substantially equivalent to a predicate. It does not mean the device improves patient outcomes, reduces diagnostic errors, or is cost-effective. The distinction matters because hospital procurement decisions, insurance coverage, and patient trust often rest on the assumption that regulatory clearance equals clinical proof.

Open Questions

  • Should the FDA require prospective clinical trials for AI diagnostic devices? The agency has signaled interest but faces industry pushback citing development speed and cost. A tiered approach (clinical data required for high-risk applications, post-market studies for lower-risk ones) seems more plausible than a blanket mandate.
  • How should continuous-learning devices be regulated? If an algorithm updates its weights after deployment, at what point does it become a new device requiring new clearance? The current framework lacks clear thresholds.
  • Who is responsible when an under-validated device contributes to a misdiagnosis? Product liability law for AI devices remains unsettled. The clinician-in-the-loop argument, that a physician always makes the final call, weakens as AI tools become more autonomous and as non-specialist settings adopt them without radiologist oversight.
  • Can post-market registries close the evidence gap faster than pre-market trials? Some researchers advocate for mandatory real-world performance registries, analogous to surgical outcome databases, as a pragmatic alternative to requiring pre-market clinical trials for every device.

The Measured View

The proliferation of FDA-cleared AI medical devices is a regulatory success story in throughput and a clinical evidence story still being written. Over 1,300 devices are on the market. Whether they make patients healthier, and which patients specifically, has been answered for only a fraction of them. The gap is not a scandal; it is a structural feature of a regulatory system designed before software became medicine. Closing it will require not just more studies, but a rethinking of what "evidence" means for devices that learn.


References

Wu, E., Wu, K., Daneshjou, R., et al. (2024). FDA-Authorized AI/ML Medical Devices: Trends in Clinical Evidence and Regulatory Pathways. JAMA Network Open.
U.S. Food and Drug Administration. (2025). Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. FDA Device Database.
U.S. Food and Drug Administration. (2023). Marketing Submission Recommendations for a Predetermined Change Control Plan for AI/ML-Enabled Device Software Functions. FDA Guidance Document.
