Trend Analysis | Engineering

Federated Learning in Healthcare: Training AI Without Sharing Patient Data

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

The Question

Healthcare AI models require large, diverse training datasets, but medical data is siloed across hospitals, clinics, and health systems and protected by regulations (HIPAA, GDPR) that tightly restrict centralised data pooling. Federated learning (FL) offers a way around this: instead of moving the data to the model, it moves the model to the data. Each institution trains locally and shares only model updates (gradients or weight changes), never raw patient records. But shared updates can still leak information about the training data (gradient inversion attacks), and heterogeneous clinical data across institutions (non-IID distributions) degrades model quality. Can federated learning deliver clinical-grade AI while providing mathematically provable privacy guarantees?
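The train-locally, share-updates loop described above is most often instantiated as federated averaging (FedAvg). The sketch below is a minimal pure-Python illustration, assuming a one-feature linear model, squared-error loss, and weighting by sample count; `local_update` and `fedavg_round` are hypothetical names, not code from any of the cited papers.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, label)


def local_update(weights: Vector, data: Sequence[Example],
                 lr: float = 0.1, epochs: int = 5) -> Vector:
    """Runs at one site: full-batch gradient descent on mean squared error
    for a linear model y ~ w . x. Only the returned weights leave the site."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fedavg_round(global_w: Vector, sites: List[Sequence[Example]]) -> Vector:
    """One FedAvg round: every site trains from the current global weights,
    then the server averages the returned weights, weighted by sample count."""
    total = sum(len(d) for d in sites)
    new_w = [0.0] * len(global_w)
    for data in sites:
        local_w = local_update(global_w, data)  # in a deployment this runs on-site
        share = len(data) / total
        new_w = [nw + share * lw for nw, lw in zip(new_w, local_w)]
    return new_w


# Toy usage: two hypothetical sites whose data both follow y = 2x.
site_a = [([1.0], 2.0), ([2.0], 4.0)]
site_b = [([3.0], 6.0)]
w = [0.0]
for _ in range(20):
    w = fedavg_round(w, [site_a, site_b])
print(w)  # very close to [2.0]
```

Weighting by sample count is the usual FedAvg choice; it is also why sites with more data pull the global model harder, which is the root of the fairness question raised under Open Questions.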

Landscape

Yazdinejad et al. (2024) developed AP2FL, an auditable privacy-preserving FL framework for healthcare electronics. Its key innovation is combining trusted execution environments (TEEs) with an auditing mechanism that verifies each institution's contribution without revealing its data. The auditability feature addresses a practical concern: healthcare institutions need to verify that participating sites are contributing genuine model updates, not adversarial inputs.
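AP2FL's own mechanism pairs TEEs with auditing, and its internals are not described here. As a much simpler stand-in for what "verifying genuine contributions" can mean at the aggregation step, the sketch below flags any update whose direction disagrees with the coordinate-wise median of all updates. This median-based screening is a swapped-in technique, not the AP2FL method; the `min_cos` threshold and the site names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def audit_updates(updates: Dict[str, List[float]],
                  min_cos: float = 0.5) -> Dict[str, bool]:
    """Return True (accepted) or False (flagged) for each site's update.

    The reference direction is the coordinate-wise median of all updates,
    which a minority of adversarial sites cannot drag far. min_cos is a
    hypothetical threshold, not a value taken from AP2FL."""
    dims = len(next(iter(updates.values())))
    reference = [median(u[i] for u in updates.values()) for i in range(dims)]
    return {site: cosine(u, reference) >= min_cos for site, u in updates.items()}


updates = {
    "hospital_a": [0.9, 1.1],
    "hospital_b": [1.0, 0.8],
    "hospital_c": [1.2, 1.0],
    "rogue": [-5.0, -4.0],  # hypothetical adversarial update
}
print(audit_updates(updates))  # only "rogue" is flagged (False)
```

Unlike a TEE, this check sees the updates in the clear, so it illustrates the verification goal rather than the privacy-preserving way AP2FL pursues it.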

Aminifar et al. (2024) focused on edge FL for mobile-health (mHealth) systems, in which wearable devices generate continuous physiological data streams. Training models on-device (smartphones, wearables) keeps raw signals local and avoids sending gradient updates to a central server, a stronger privacy posture than server-based FL, although whatever updates are exchanged can still leak information. Their framework demonstrated seizure detection with a privacy-preserving edge FL approach while keeping all data on-device.
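Keeping data on-device does not by itself stop exchanged updates from leaking information. One widely used hardening step, sketched below under stated assumptions, is to clip each update to a maximum L2 norm and add Gaussian noise before it leaves the device (the Gaussian mechanism applied locally; the original DP-FedAvg adds the noise to the aggregate at the server instead). This is not the Aminifar et al. framework; `clip_norm` and `noise_multiplier` are illustrative parameters, and turning them into a formal (epsilon, delta) guarantee requires a privacy accountant that is not shown.

```python
import math
import random
from typing import List, Optional


def privatize_update(update: List[float], clip_norm: float,
                     noise_multiplier: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip the update to L2 norm <= clip_norm, then add Gaussian noise with
    standard deviation noise_multiplier * clip_norm to every coordinate.
    Intended to run on the device, before anything is transmitted."""
    if rng is None:
        rng = random.Random()
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [u * scale for u in update]
    sigma = noise_multiplier * clip_norm
    return [c + rng.gauss(0.0, sigma) for c in clipped]


raw = [3.0, 4.0]  # hypothetical local model update with L2 norm 5
print(privatize_update(raw, clip_norm=1.0, noise_multiplier=0.0))  # clipped only, about [0.6, 0.8]
print(privatize_update(raw, clip_norm=1.0, noise_multiplier=1.0, rng=random.Random(0)))
```

Larger `noise_multiplier` values give stronger privacy and noisier models, which is the privacy-utility trade-off noted in the table below.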

Collins & Wang (2025) provided a comprehensive FL survey covering horizontal FL (same features, different patients across institutions), vertical FL (different features for the same patients), and federated transfer learning. They identified the non-IID data problem as FL's central technical challenge: when hospitals serve different patient populations (urban vs. rural, paediatric vs. geriatric), local model updates diverge, and simple averaging produces suboptimal global models.
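Why does simple averaging go wrong on non-IID data? A one-parameter toy makes the effect (often called client drift) visible. Assume two hypothetical sites with quadratic losses L_k(w) = a_k/2 (w - c_k)^2 that differ in both optimum c_k and curvature a_k. With one local gradient step per round, FedAvg converges to the true weighted optimum; with several local steps, each site runs toward its own optimum and the average settles somewhere else. The example is illustrative and not taken from Collins & Wang.

```python
from typing import List, Tuple

# Hypothetical site k: sample share p_k and loss L_k(w) = a_k / 2 * (w - c_k) ** 2.
Site = Tuple[float, float, float]  # (p_k, a_k, c_k)


def true_optimum(sites: List[Site]) -> float:
    """Minimiser of the sample-weighted global loss sum_k p_k * L_k(w)."""
    return sum(p * a * c for p, a, c in sites) / sum(p * a for p, a, _ in sites)


def fedavg_fixed_point(sites: List[Site], lr: float, local_steps: int,
                       rounds: int = 500) -> float:
    """Run FedAvg: each round, every site takes `local_steps` gradient steps
    from the current global w; the server averages the results by p_k."""
    w = 0.0
    for _round in range(rounds):
        new_w = 0.0
        for p, a, c in sites:
            local = w
            for _step in range(local_steps):
                local -= lr * a * (local - c)  # gradient of a/2 * (w - c)^2
            new_w += p * local
        w = new_w
    return w


sites = [(0.5, 1.0, 0.0), (0.5, 4.0, 1.0)]  # equal shares, different optima and curvature
print(true_optimum(sites))                                # 0.8
print(fedavg_fixed_point(sites, lr=0.1, local_steps=1))   # approximately 0.8
print(fedavg_fixed_point(sites, lr=0.1, local_steps=10))  # approximately 0.604
```

Real clinical models are far from quadratic, but the mechanism carries over: more local computation per round saves communication, at the cost of a biased global model when sites disagree.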

Alrashed et al. (2025) addressed vertical FL with split neural networks. In this scenario, different healthcare providers hold complementary features for the same patients (lab results at one site, imaging at another, genomics at a third).
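The vertical setting can be sketched as a split forward pass. Assumptions: records are already aligned by a shared patient ID (real systems need a private alignment step), each party's bottom model is a single linear layer with ReLU, and a coordinator owns the top layer. All weights and party names are hypothetical, and this is not the PPVFL-SplitNN architecture. Training would additionally send the gradient with respect to each embedding back to its owner across the cut layer, which is omitted here.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def relu_layer(x: List[float], weights: Matrix) -> List[float]:
    """Bottom model held by one party: a single linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def split_forward(party_features: Dict[str, List[float]],
                  bottom: Dict[str, Matrix], top: List[float]) -> float:
    """Vertical-FL forward pass for one patient.

    Each party turns only its own feature slice into an embedding; the
    coordinator sees embeddings, never raw lab values, images or genomes."""
    embedding: List[float] = []
    for party in sorted(party_features):  # fixed order so top weights line up
        embedding += relu_layer(party_features[party], bottom[party])
    logit = sum(w * e for w, e in zip(top, embedding))
    return 1.0 / (1.0 + math.exp(-logit))  # predicted probability


features = {"lab": [1.0, 2.0], "imaging": [0.5], "genomics": [1.0, 0.0, 1.0]}
bottom = {"lab": [[0.5, 0.5]], "imaging": [[2.0]], "genomics": [[1.0, 1.0, -1.0]]}
top = [1.0, 1.5, -1.0]  # one weight per embedding: genomics, imaging, lab
print(split_forward(features, bottom, top))  # 0.5
```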

Key Claims & Evidence

Claim	Evidence	Verdict
FL achieves comparable accuracy to centralised training	Edge FL demonstrated for seizure detection in mobile-health systems (Aminifar et al. 2024)	Supported for specific tasks; gap varies by data heterogeneity
Privacy-preserving mechanisms prevent gradient inversion attacks	TEE-based privacy guarantees (Yazdinejad et al. 2024)	Conditionally supported: TEE protection rests on trusted hardware rather than a mathematical proof; privacy-utility trade-off exists
Non-IID data remains FL's central challenge	Survey across FL literature identifies data heterogeneity as primary accuracy bottleneck (Collins & Wang 2025)	Confirmed; personalisation and clustering strategies emerging
Auditable FL verifies contributions	Audit trails verify institutional contributions without data exposure (Yazdinejad et al. 2024)	Demonstrated; computational overhead may be limiting
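The gradient-inversion row is easiest to appreciate with the simplest possible leak. For a fully connected layer z = Wx + b trained on a single example, the gradients satisfy dL/dW = (dL/dz) x^T and dL/db = dL/dz, so dividing any row of dL/dW by the matching entry of dL/db returns the input x exactly. The sketch assumes squared-error loss and hypothetical patient features; attacks on deep networks and averaged batches need optimisation-based reconstruction and are not shown.

```python
from typing import List, Tuple


def linear_grads(W: List[List[float]], b: List[float], x: List[float],
                 y: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Gradients of 0.5 * ||W x + b - y||^2 for ONE training example,
    i.e. what a site would reveal by sharing a raw single-sample gradient."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    delta = [zi - yi for zi, yi in zip(z, y)]      # dL/dz
    grad_W = [[d * xj for xj in x] for d in delta]  # dL/dW = delta x^T
    grad_b = delta                                  # dL/db = delta
    return grad_W, grad_b


def invert_input(grad_W: List[List[float]], grad_b: List[float]) -> List[float]:
    """Recover x from the gradients alone: x = grad_W[i] / grad_b[i] for a row
    with nonzero bias gradient (assumes the prediction error is not all zero)."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    return [g / grad_b[i] for g in grad_W[i]]


W = [[0.2, -0.1, 0.4], [0.0, 0.3, 0.1]]
b = [0.0, 0.1]
x = [1.5, 0.2, 3.0]  # hypothetical patient features
y = [1.0, 0.0]
gW, gb = linear_grads(W, b, x, y)
print(invert_input(gW, gb))  # recovers x up to floating-point rounding
```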

Open Questions

Regulatory acceptance: Will regulatory agencies (FDA, EMA) accept FL-trained models for clinical use, or will they require access to centralised training data for validation?

Incentive alignment: Why should hospitals contribute to FL if they don't directly benefit from the global model? Can incentive mechanisms (model improvement guarantees, data valuation) encourage participation?
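Data valuation, one of the incentive mechanisms mentioned above, can be illustrated with its simplest scheme, leave-one-out: a site's value is how much worse the global model gets without it. The sketch assumes a toy "model" (a pooled mean that sites can compute from their own sums and counts) and a validation target of 1.0; all names and numbers are hypothetical. Leave-one-out needs one extra training run per site and can undervalue sites whose data overlap; Shapley-value-based schemes are a common, costlier alternative.

```python
from typing import Callable, Dict, List

SiteData = List[float]


def train_pooled_mean(sites: List[SiteData]) -> float:
    """Hypothetical 'model': the count-weighted average of site means, which
    each site can contribute as a (sum, count) pair without sharing values."""
    total = sum(len(d) for d in sites)
    return sum(sum(d) for d in sites) / total


def leave_one_out_values(sites: Dict[str, SiteData],
                         train: Callable[[List[SiteData]], float],
                         val_loss: Callable[[float], float]) -> Dict[str, float]:
    """Value of site k = validation loss without k minus loss with everyone.
    Positive means the site helped; negative means including it hurt."""
    full_loss = val_loss(train(list(sites.values())))
    return {
        name: val_loss(train([d for other, d in sites.items() if other != name])) - full_loss
        for name in sites
    }


sites = {"hospital_a": [1.0, 1.2], "hospital_b": [0.9, 1.1], "hospital_c": [5.0]}
values = leave_one_out_values(sites, train_pooled_mean, lambda m: (m - 1.0) ** 2)
print(values)  # hospital_a and hospital_b positive, hospital_c negative
```

Here hospital_c is valued negatively only because its data sit far from the validation target; a site serving a genuinely different population could be penalised the same way, which ties incentive design back to the fairness question.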

Communication efficiency: FL requires many rounds of gradient communication. Can compression, quantisation, and sparse update techniques reduce bandwidth requirements for resource-constrained clinical networks?
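Two of the techniques named above, sparsification and quantisation, combine in a few lines. The sketch keeps only the k largest-magnitude entries of an update (top-k sparsification) and encodes the survivors as 8-bit integers with a per-message offset and step (uniform affine quantisation). The value of `k` and the bit width are hypothetical choices; production systems typically also carry the dropped residual into the next round (error feedback), which is omitted here.

```python
from typing import Dict, List, Tuple


def top_k_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep only the k largest-magnitude entries, sent as {index: value}."""
    idx = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    return {i: update[i] for i in idx}


def quantize_8bit(values: List[float]) -> Tuple[List[int], float, float]:
    """Uniform affine quantisation to integer codes in [0, 255].
    Returns the codes plus the (offset, step) needed to decode them."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / 255 or 1.0  # avoid division by zero if all values are equal
    return [round((v - lo) / step) for v in values], lo, step


def dequantize(codes: List[int], lo: float, step: float) -> List[float]:
    return [lo + c * step for c in codes]


update = [0.01, -0.8, 0.03, 0.5, -0.02, 0.0, 0.9, -0.05]  # hypothetical update
sparse = top_k_sparsify(update, k=3)
codes, lo, step = quantize_8bit(list(sparse.values()))
restored = dequantize(codes, lo, step)
print(sorted(sparse), codes)
print(restored)  # each value within step / 2 of the original
```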

Fairness: If FL training is dominated by large hospitals with more data, will the resulting model perform poorly for underrepresented patient populations?

Referenced Papers

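[1] Yazdinejad, A., Dehghantanha, A., & Srivastava, G. (2024). AP2FL: Auditable Privacy-Preserving Federated Learning Framework for Electronics in Healthcare. IEEE Transactions on Consumer Electronics, 70(1), 2527-2535. DOI: 10.1109/TCE.2023.3318509
[2] Aminifar, A., Shokri, M., & Aminifar, A. (2024). Privacy-preserving edge federated learning for intelligent mobile-health systems. Future Generation Computer Systems, 161, 625-637. DOI: 10.1016/j.future.2024.07.035
[3] Collins, E., & Wang, M. (2025). Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence. arXiv preprint. DOI: 10.48550/arXiv.2504.17703
[4] Alrashed, B., Nanda, P., Dinh, H., Aldahiri, A., Alhosaini, H., & Alghamdi, N. (2025). PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with Split Neural Networks for Distributed Patient Data. Proceedings of the 22nd International Conference on Security and Cryptography, 13-24. DOI: 10.5220/0013445300003979
[5] Nawaz, A., Irfan, M., Yu, X., Aldawsari, H., Alsisi, R. H., Zou, Z., et al. (2025). Blockchain-Enabled Privacy-Preserving Second-Order Federated Edge Learning in Personalized Healthcare. IEEE Transactions on Consumer Electronics, 71(4), 9983-9992. DOI: 10.1109/TCE.2025.3620115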

Field: Engineering | Methodology: Computational

Author: Sean K.S. Shin | Date: 2026-03-17


