
Fully Homomorphic Encryption Meets Federated Learning: The Lancelot Framework and the Road to Private ML

Combining fully homomorphic encryption with federated learning promises ML training where no party, not even the aggregation server, can see raw gradients. The Lancelot framework reports a more than 20x speedup over prior FHE-based Byzantine-robust FL methods, but significant overhead remains.

By ORAA Research

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Federated learning promised to keep data private by design: train models locally, share only gradients. But gradients leak information. Reconstruction attacks can recover training data from shared model updates with alarming fidelity. The logical next step—encrypting gradients so that even the aggregation server cannot read them—requires performing mathematical operations on encrypted data. This is precisely what fully homomorphic encryption (FHE) enables, and a recent line of work culminating in the Lancelot framework demonstrates that the approach is becoming computationally tractable.

The Privacy Gap in Standard Federated Learning

Federated learning (FL) distributes model training across clients that hold local data. Each client trains on its own partition and sends model updates—typically gradients or weight differences—to a central server for aggregation. The raw data never leaves the client. This architecture, introduced by McMahan et al. (2017), offered an appealing privacy narrative.
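The aggregation step described above can be sketched in a few lines. This is a minimal illustration of sample-weighted averaging in the spirit of federated averaging, not code from any cited paper; the function name `fed_avg` and the toy inputs are assumptions made for the example.

```python
from typing import List, Sequence


def fed_avg(updates: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """Average client updates, weighting each client by its local sample count."""
    if not updates or len(updates) != len(num_samples):
        raise ValueError("need exactly one sample count per client update")
    total = sum(num_samples)
    aggregated = [0.0] * len(updates[0])
    for update, n in zip(updates, num_samples):
        for i, value in enumerate(update):
            aggregated[i] += (n / total) * value
    return aggregated


# Two hypothetical clients; the second holds three times as much data.
client_updates = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]
global_update = fed_avg(client_updates, client_sizes)
```

Note that the server needs every update in the clear to run this loop, which is exactly the exposure the rest of the post is concerned with.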

However, subsequent research revealed serious vulnerabilities. Gradient inversion attacks can reconstruct training samples from shared updates. Membership inference attacks can determine whether a specific record was in a client's training set. Model poisoning attacks allow malicious clients to corrupt the global model. Standard FL, without additional protection, is vulnerable on all three fronts. The first two are confidentiality failures that encryption can address; poisoning is an integrity failure, and hiding updates from the server makes it harder, not easier, to detect.
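To see why gradients leak, consider the simplest case: a single fully connected layer y = Wx + b trained on one sample. The weight gradient is the outer product of the output error and the input, and the bias gradient is the output error itself, so dividing one by the other returns the input exactly. The sketch below assumes a batch size of one and uses hypothetical helper names; real gradient inversion attacks on deep networks rely on optimization and are considerably more involved.

```python
from typing import List, Tuple


def linear_layer_gradients(x: List[float], delta: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Gradients of a loss w.r.t. W and b for y = W x + b, given dL/dy = delta."""
    grad_w = [[d * xi for xi in x] for d in delta]  # outer product delta * x^T
    grad_b = list(delta)                            # dL/db equals dL/dy
    return grad_w, grad_b


def recover_input(grad_w: List[List[float]], grad_b: List[float]) -> List[float]:
    """Recover x from any row whose bias gradient is nonzero."""
    for row, b in zip(grad_w, grad_b):
        if abs(b) > 1e-12:
            return [w / b for w in row]
    raise ValueError("all bias gradients are zero; input cannot be recovered this way")


private_input = [0.5, -1.0, 2.0]   # the client's "training sample"
output_error = [0.0, 0.25]         # hypothetical dL/dy for this sample
gw, gb = linear_layer_gradients(private_input, output_error)
reconstructed = recover_input(gw, gb)
```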

Differential privacy (DP) addresses some of these concerns by adding calibrated noise to gradients, but the noise creates an accuracy-privacy tradeoff: stronger privacy means noisier updates. Secure multi-party computation (SMPC) can hide individual updates without adding noise, but it requires complex interaction protocols among participants. FHE occupies a distinct position: it lets the server perform meaningful computation (specifically, aggregation) on encrypted gradients without ever decrypting them.
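For comparison, the DP approach can be sketched as clipping each update to a bounded norm and then adding Gaussian noise scaled to that bound. This is a rough illustration only: the names `clip_norm` and `noise_multiplier` are assumptions, and the code does not compute or certify any particular privacy budget.

```python
import math
import random
from typing import List


def clip_and_noise(gradient: List[float], clip_norm: float,
                   noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip the gradient to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm  # noise std grows with the clipping bound
    return [g * scale + rng.gauss(0.0, sigma) for g in gradient]


rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy_update = clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

A larger `noise_multiplier` hides more about any single record but perturbs the update more, which is the tradeoff mentioned above.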

How FHE-Based Federated Learning Works

In an FHE-based FL system, clients encrypt their local model updates using a homomorphic encryption scheme (typically CKKS for approximate arithmetic on real-valued gradients). The server receives these ciphertexts and performs aggregation operations—weighted averaging, for example—directly in the encrypted domain. The result is an encrypted global model update that clients can decrypt locally to update their models.

The critical property is that the server never sees plaintext gradients. Even a fully compromised aggregation server learns nothing about individual client contributions beyond what can be inferred from the final aggregated model, provided it never obtains the decryption key and does not collude with a party that holds it.
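The encrypt, aggregate, decrypt flow can be sketched end to end with a toy scheme. Important assumptions: the code below uses a textbook Paillier cryptosystem, which is only additively homomorphic, as a stand-in for CKKS; the primes are far too small to be secure; and all clients are assumed to share one key pair. It illustrates the data flow, not the scheme or protocol used in the cited papers.

```python
import math
import secrets
from typing import List, Tuple

SCALE = 10**6  # fixed-point scaling so real-valued updates become integers


def keygen() -> Tuple[int, Tuple[int, int]]:
    """Toy key generation. The primes are tiny Mersenne primes: NOT secure."""
    p, q = 2**31 - 1, 2**61 - 1
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, value: float) -> int:
    m = round(value * SCALE) % n  # negative values wrap around modulo n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return ((1 + m * n) * pow(r, n, n2)) % n2  # g^m * r^n mod n^2 with g = n + 1


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, private_key: Tuple[int, int], c: int) -> float:
    lam, mu = private_key
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    if m > n // 2:  # map the upper half of the range back to negatives
        m -= n
    return m / SCALE


def server_aggregate(n: int, encrypted_updates: List[List[int]]) -> List[int]:
    """Server side: sums ciphertexts coordinate-wise, never decrypts."""
    total = list(encrypted_updates[0])
    for update in encrypted_updates[1:]:
        total = [add_encrypted(n, a, b) for a, b in zip(total, update)]
    return total


public_n, private_key = keygen()  # private key stays with the clients
updates = [[0.10, -0.20], [0.30, 0.40], [-0.10, 0.10]]
ciphertexts = [[encrypt(public_n, v) for v in u] for u in updates]
encrypted_sum = server_aggregate(public_n, ciphertexts)
mean_update = [decrypt(public_n, private_key, c) / len(updates) for c in encrypted_sum]
```

The division by the number of clients happens after decryption here, since this toy scheme supports only addition on ciphertexts.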

Xie et al. (2024) provide a comprehensive survey of efficiency optimization techniques for HE-based FL in IEEE Internet of Things Journal. They categorize approaches into algorithmic optimizations (batching, quantization-aware encryption, sparsification before encryption), hardware accelerations (GPU and FPGA-based polynomial arithmetic), and hybrid strategies combining multiple techniques. Their taxonomy reveals that computational overhead remains the primary bottleneck: FHE operations on neural network gradients can be 100-1000x slower than plaintext equivalents.
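Two of the algorithmic ideas named above, sparsification and quantization before encryption, can be sketched as a client-side preprocessing step. This is a generic illustration with hypothetical function names, not a technique taken from the survey; in particular, top-k index sets differ between clients, which complicates slot-wise encrypted aggregation, and the sketch ignores that.

```python
from typing import List, Tuple


def top_k_sparsify(gradient: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Keep only the k largest-magnitude entries; return their indices and values."""
    order = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    kept = sorted(order[:k])
    return kept, [gradient[i] for i in kept]


def quantize(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric quantization to signed integers of the given bit width."""
    max_abs = max((abs(v) for v in values), default=0.0)
    levels = 2 ** (bits - 1) - 1
    step = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / step) for v in values], step


gradient = [0.01, -0.9, 0.05, 0.3, -0.02]
indices, values = top_k_sparsify(gradient, k=2)
codes, step = quantize(values, bits=8)
approx = [c * step for c in codes]  # what the receiver would reconstruct
```

Fewer and smaller values mean fewer ciphertexts to produce, send, and add, at the cost of a lossy update.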

The Lancelot Contribution

Jiang et al. (2025), published in Nature Machine Intelligence, present Lancelot, a framework that addresses both privacy and Byzantine robustness within a single FHE-based protocol. The key insight is that the two requirements pull against each other: Byzantine robustness requires checking whether a client's contribution is malicious, while confidentiality requires that nobody can inspect that contribution. Lancelot resolves this tension through a carefully designed protocol that performs robustness checks in the encrypted domain.

The reported performance improvement is substantial: Lancelot achieves more than a 20-fold speedup over prior FHE-based Byzantine-robust FL methods. The framework was tested on medical imaging diagnostics and standard image classification benchmarks, demonstrating that accuracy remains comparable to non-encrypted baselines while providing formal privacy guarantees.

The architectural innovation involves restructuring the aggregation pipeline so that Byzantine detection—identifying and excluding poisoned updates—can be performed on ciphertexts without requiring decryption at any intermediate step. This avoids the common workaround of decrypting for verification and re-encrypting, which introduces both latency and a potential privacy leak at the verification point.
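To make "Byzantine detection" concrete, here is what a distance-based robust aggregator computes on plaintext updates. The sketch uses a Krum-style selection rule as a generic example; the summary above does not say which rule Lancelot evaluates or how it does so on ciphertexts, so treat both the rule and the names as assumptions. The pairwise distances and the sorting step are the kind of operations that are expensive to carry out under encryption.

```python
from typing import List


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_select(updates: List[List[float]], num_byzantine: int) -> int:
    """Return the index of the update closest to its nearest neighbours.

    Each update is scored by the summed squared distance to its
    n - f - 2 nearest peers; the lowest score wins.
    """
    n = len(updates)
    closest = n - num_byzantine - 2
    if closest < 1:
        raise ValueError("too few clients for the assumed number of attackers")
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(squared_distance(u, v) for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[:closest]))
    return min(range(n), key=scores.__getitem__)


# Four roughly consistent updates and one hypothetical poisoned update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.2], [50.0, -40.0]]
selected = krum_select(updates, num_byzantine=1)
```

The poisoned update sits far from every honest one, so its score is large and it is never selected.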

Practical Limitations and Open Challenges

Despite the 20x improvement, the absolute computational overhead of FHE-based FL remains significant. For large models (e.g., transformers with billions of parameters), encrypting and homomorphically aggregating gradient tensors generates ciphertext sizes and computation times that may exceed practical network and time budgets. Current FHE-based FL systems work best with smaller models or compressed gradient representations.

Key size and ciphertext expansion remain problematic. CKKS ciphertexts can be 10-100x larger than the plaintext data they encode, multiplying communication costs in FL systems where bandwidth is often the binding constraint.
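A back-of-the-envelope calculation shows where the expansion comes from. A fresh CKKS ciphertext consists of two polynomials with N coefficients each, every coefficient taken modulo a Q of log2(Q) bits, and it packs up to N/2 real values. The parameters below (N = 8192, a 200-bit modulus, 4-byte plaintext floats) are illustrative assumptions, and real libraries typically store coefficients in 64-bit words, so actual sizes tend to be larger.

```python
import math


def ckks_ciphertext_bytes(ring_degree: int, modulus_bits: int) -> int:
    """Idealized size of a fresh ciphertext: 2 polynomials x N coefficients."""
    return 2 * ring_degree * modulus_bits // 8


def expansion_factor(ring_degree: int, modulus_bits: int, bytes_per_value: int = 4) -> float:
    """Ciphertext bytes divided by the plaintext bytes it can pack."""
    slots = ring_degree // 2
    return ckks_ciphertext_bytes(ring_degree, modulus_bits) / (slots * bytes_per_value)


def ciphertexts_needed(num_parameters: int, ring_degree: int) -> int:
    """Number of fully packed ciphertexts required to hold one model update."""
    return math.ceil(num_parameters / (ring_degree // 2))


N, LOG_Q = 8192, 200  # assumed, illustrative parameters
size = ckks_ciphertext_bytes(N, LOG_Q)
factor = expansion_factor(N, LOG_Q)
count = ciphertexts_needed(10**9, N)
total_gb = count * size / 1e9  # upload per client per round for 1e9 parameters
```

Under these assumptions a single ciphertext is about 400 KB for 16 KB of packed floats, a 25x expansion, and a billion-parameter update comes to roughly 100 GB per client per round.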

Multi-key scenarios introduce additional complexity. When different clients use different encryption keys (a natural requirement in cross-organizational settings), the server must perform multi-key homomorphic operations, which are substantially more expensive than single-key variants.

The trust model requires scrutiny. FHE-based FL eliminates the need to trust the server with gradient privacy, but clients must still trust the encryption scheme, the key generation protocol, and the correctness of the aggregation. Side-channel attacks on client-side encryption remain relevant.

Where This Matters Most

The healthcare and financial sectors represent the most compelling use cases. In healthcare, multi-hospital ML collaborations face strict regulatory constraints (HIPAA, GDPR) that make even standard FL legally complex. FHE provides a defense-in-depth layer that could simplify compliance arguments. In finance, anti-money-laundering models trained across institutions could benefit from encrypted gradient aggregation to avoid sharing customer transaction patterns.

The convergence of hardware acceleration (dedicated FHE chips from companies like Intel, and the DARPA-funded DPRIVE program) with algorithmic improvements like Lancelot suggests that the computational gap will narrow over the next several years. Whether it narrows enough for production deployment with large language models remains an open question.

Open Questions

Can FHE-based FL scale to modern LLM fine-tuning, where gradient tensors contain billions of parameters?
How does the accuracy-computation tradeoff compare quantitatively between FHE-based FL and differential-privacy-based FL across different threat models?
Will hardware-accelerated FHE (e.g., custom ASICs) close the performance gap enough for real-time applications?

Closing Reflection

Fully homomorphic encryption in federated learning transforms a soft privacy promise into a cryptographic guarantee. The Lancelot framework represents a meaningful step toward making this practical by dramatically reducing computational overhead while simultaneously addressing Byzantine threats. Yet the field remains in a phase where theoretical elegance outpaces deployment readiness. The gap is closing, but it has not yet closed.

References

Jiang, S., et al. (2025). Towards compute-efficient Byzantine-robust federated learning with fully homomorphic encryption. Nature Machine Intelligence.

Xie, Q., et al. (2024). Efficiency optimization techniques in privacy-preserving federated learning with homomorphic encryption: A brief survey. IEEE Internet of Things Journal.