Confidential Computing Goes GPU: Protecting AI Models and Data in Use

Confidential computing extends hardware-based trusted execution environments to GPU memory, enabling organizations to run AI training and inference on untrusted infrastructure without exposing models or data. With 75% of organizations reportedly adopting confidential computing by 2025, the technology addresses a growing tension between cloud AI economics and data sovereignty.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Data security has traditionally focused on two states: data at rest (encrypted on disk) and data in transit (encrypted over networks). The third state—data in use, actively being processed in memory—has remained largely unprotected. This gap matters because computation requires data to be decrypted in memory, creating a window during which privileged software (hypervisors, operating systems, firmware) or physical attackers can access plaintext data.

Confidential computing closes this gap through hardware-based trusted execution environments (TEEs) that isolate code and data even from the infrastructure owner. CPU-based TEEs (Intel SGX, Intel TDX, AMD SEV-SNP, Arm CCA) have matured over the past decade. But AI workloads run predominantly on GPUs rather than CPUs, and extending confidential computing guarantees to GPU memory introduces a different set of architectural challenges.

Why This Matters for AI

The tension is straightforward: AI training and inference increasingly happen on cloud infrastructure that the model owner does not control. This creates three simultaneous exposure risks:

Training data exposure. Organizations training models on sensitive data (medical records, financial transactions, legal documents) must trust the cloud provider's infrastructure, staff, and supply chain. A compromised hypervisor or a malicious insider can access training data in GPU memory.

Model exposure. Trained models represent substantial intellectual property. A model deployed for inference on third-party infrastructure can be extracted by an adversary with access to the host system's memory.

Inference data exposure. When users send queries to a cloud-hosted model, the input data (which may contain sensitive personal or business information) passes through the cloud provider's infrastructure in plaintext during processing.

Confidential computing addresses all three by ensuring that data remains encrypted in GPU memory and is only decrypted within a hardware-enforced enclave that the cloud provider cannot access.

The Architecture: From CPU Enclaves to GPU TEEs

CPU-based confidential computing works by creating isolated memory regions (enclaves) that are encrypted with keys held only by the hardware. Even the operating system and hypervisor cannot read enclave memory. Attestation protocols allow remote parties to verify that a specific enclave is running expected code on genuine hardware.

Extending this to GPUs requires solving several additional problems:

GPU memory isolation. GPU memory (HBM) operates differently from CPU memory. Data moves between CPU and GPU over PCIe or NVLink, and within the GPU across a complex memory hierarchy (registers, shared memory, L2 cache, HBM). Each of these transfer points must be protected.

Encrypted data transfer. The CPU-to-GPU link must be encrypted to prevent bus-snooping attacks. NVIDIA's confidential computing implementation uses a secure channel between the CPU TEE and a GPU-side security processor that manages encryption keys for GPU memory.

Attestation chain. Remote attestation must cover not just the CPU enclave but the GPU hardware, firmware, and driver stack. A compromised GPU driver could redirect computation or exfiltrate data, so the attestation chain must verify the integrity of the entire software stack from CPU through GPU.

Performance overhead. Encryption and isolation add computational overhead. For AI workloads where GPU utilization and memory bandwidth are critical, the performance cost of confidential computing determines practical viability.
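
The attestation chain described above can be illustrated with a minimal Python sketch. It is not NVIDIA's or any other vendor's protocol: the component names, the `Evidence` structure, and the use of an HMAC with a shared key in place of a hardware-rooted signature and certificate chain are all assumptions made to keep the example self-contained. It shows only the core idea, that a verifier accepts a platform when the report is authentic and fresh and every layer (CPU TEE, GPU firmware, GPU driver) reports a measurement matching a known-good reference value.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Hypothetical layers a verifier wants covered, from CPU through GPU.
REQUIRED_COMPONENTS = ("cpu_tee", "gpu_firmware", "gpu_driver")


def measure(blob: bytes) -> str:
    """Return a SHA-256 digest standing in for a launch measurement."""
    return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class Evidence:
    measurements: dict  # component name -> hex digest
    nonce: str          # verifier-chosen value to prevent replay
    tag: str            # HMAC over the report (stand-in for a hardware signature)


def _payload(measurements: dict, nonce: str) -> bytes:
    # Canonical serialization so signer and verifier hash identical bytes.
    return json.dumps({"m": measurements, "n": nonce}, sort_keys=True).encode()


def sign_evidence(key: bytes, measurements: dict, nonce: str) -> Evidence:
    """Produce a report; real hardware would sign with a device-unique key."""
    tag = hmac.new(key, _payload(measurements, nonce), hashlib.sha256).hexdigest()
    return Evidence(dict(measurements), nonce, tag)


def verify_evidence(key: bytes, evidence: Evidence, expected: dict, nonce: str) -> bool:
    """Accept only if the report is authentic, fresh, and every layer matches."""
    good_tag = hmac.new(
        key, _payload(evidence.measurements, evidence.nonce), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(good_tag, evidence.tag):
        return False  # report was forged or altered
    if evidence.nonce != nonce:
        return False  # stale report replayed from an earlier session
    for component in REQUIRED_COMPONENTS:
        reference = expected.get(component)
        if reference is None or evidence.measurements.get(component) != reference:
            return False  # a layer is missing or does not match its reference
    return True


if __name__ == "__main__":
    key = b"demo-key-not-a-real-root-of-trust"
    expected = {c: measure(c.encode() + b"-v1") for c in REQUIRED_COMPONENTS}
    report = sign_evidence(key, expected, nonce="session-1")
    print(verify_evidence(key, report, expected, nonce="session-1"))
```

The design point is that verification fails closed: a single mismatched or missing layer, such as a modified GPU driver, rejects the whole platform rather than only that component.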

Adoption Landscape

Industry analysis (IDC, 2025) reports that approximately 75% of organizations are adopting confidential computing technologies by 2025, and that 45% are deploying them specifically in hybrid cloud environments. These figures reflect broad organizational interest, though the depth and maturity of adoption vary substantially.

Claims and Evidence

Claim	Source	Verdict
Confidential computing extends hardware enclaves to GPU memory for AI workloads	arXiv 2511.04550 — architectural description	Stated in abstract
Enables simultaneous protection of AI models and training data	arXiv 2511.04550 — security model	Stated in abstract
75% of organizations adopting confidential computing by 2025	IDC, 2025 — industry survey	Stated in source; survey methodology should be examined
45% deploying in hybrid cloud environments	IDC, 2025 — deployment survey	Stated in source; self-reported adoption figures

Critical Analysis

Performance overhead remains the central question. AI workloads are performance-sensitive: a 10% throughput reduction on a training run that costs $100,000 in compute means the same work takes 1/0.9 ≈ 1.11 times as long, which translates to roughly $11,000 in additional cost if billing scales with runtime. Published benchmarks for GPU confidential computing show variable overhead depending on workload characteristics, but the abstract of the cited paper (arXiv 2511.04550) does not specify performance impact. Users need workload-specific benchmarks, not aggregate claims.
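
The cost arithmetic depends on how "overhead" is defined, and a short Python sketch makes the difference explicit. It assumes that compute cost scales linearly with wall-clock time, which is a simplification, and the function names are illustrative only. A throughput reduction of r stretches a fixed amount of work by 1/(1 - r), so a 10% throughput drop on a $100,000 run adds about $11,111, whereas a 10% increase in runtime adds $10,000.

```python
def extra_cost_from_throughput_drop(base_cost: float, drop: float) -> float:
    """Added cost when throughput falls by `drop` (0 <= drop < 1) for fixed work."""
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    # Runtime scales by 1 / (1 - drop); the extra cost is the part above 1.
    return base_cost * (1.0 / (1.0 - drop) - 1.0)


def extra_cost_from_runtime_increase(base_cost: float, increase: float) -> float:
    """Added cost when wall-clock time grows by `increase` (0.10 means 10%)."""
    if increase < 0.0:
        raise ValueError("increase must be non-negative")
    return base_cost * increase


if __name__ == "__main__":
    base = 100_000.0
    print(round(extra_cost_from_throughput_drop(base, 0.10), 2))
    print(round(extra_cost_from_runtime_increase(base, 0.10), 2))
```

The gap between the two readings widens as overhead grows, which is one reason benchmark reports should state whether they measure throughput loss or added runtime.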

The trusted computing base is larger than it appears. Confidential computing reduces trust requirements but does not eliminate them. Users must trust the hardware manufacturer (NVIDIA, AMD), the firmware signing authority, and the attestation infrastructure. Side-channel attacks against TEEs—demonstrated against Intel SGX in particular—remind us that hardware isolation is not absolute.

Supply chain integrity. If the hardware itself is compromised (e.g., through supply chain tampering), hardware-based security guarantees collapse. Confidential computing assumes genuine, unmodified hardware—an assumption that is difficult to verify and increasingly relevant given geopolitical tensions around semiconductor supply chains.

Multi-tenant GPU sharing. Cloud economics favor multi-tenant GPU sharing (MIG, time-slicing). Confidential computing in multi-tenant GPU environments requires isolation between tenants at the hardware level, which adds complexity and may reduce the efficiency gains that multi-tenancy provides.

Adoption figures deserve scrutiny. The 75% adoption figure from IDC likely reflects stated plans or pilot deployments rather than production-scale implementation. The gap between organizational awareness of confidential computing and actual deployment of GPU-based TEEs for AI workloads is likely substantial.

Open Questions

What is the real-world performance overhead for large-scale training? Published microbenchmarks may not reflect the overhead patterns of actual training runs with mixed communication patterns, gradient aggregation, and checkpointing.

How does confidential computing interact with distributed training? Multi-GPU and multi-node training requires communication between GPUs (NCCL, all-reduce). Encrypting inter-GPU communication while maintaining training throughput is a non-trivial challenge.
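
To make the throughput concern concrete, here is a toy cost model in Python. It relies on one well-known property of ring all-reduce, that each GPU sends about 2(N-1)/N times the payload size per reduction, and on an assumption that encrypted traffic is limited by the slower of the link and the encryption engine. The function names are illustrative, every number in the usage example is a placeholder rather than a measurement of any real GPU or interconnect, and the model ignores latency and overlap with computation.

```python
from typing import Optional


def ring_allreduce_bytes_per_gpu(payload_bytes: float, num_gpus: int) -> float:
    """Bytes each GPU sends in one ring all-reduce of `payload_bytes`."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return 2.0 * (num_gpus - 1) / num_gpus * payload_bytes


def allreduce_seconds(
    payload_bytes: float,
    num_gpus: int,
    link_bytes_per_s: float,
    crypto_bytes_per_s: Optional[float] = None,
) -> float:
    """Bandwidth-only estimate; crypto_bytes_per_s=None models unencrypted links."""
    if link_bytes_per_s <= 0:
        raise ValueError("link_bytes_per_s must be positive")
    effective = link_bytes_per_s
    if crypto_bytes_per_s is not None:
        if crypto_bytes_per_s <= 0:
            raise ValueError("crypto_bytes_per_s must be positive")
        # Assumption: the slower of link and encryption engine sets the pace.
        effective = min(link_bytes_per_s, crypto_bytes_per_s)
    return ring_allreduce_bytes_per_gpu(payload_bytes, num_gpus) / effective


if __name__ == "__main__":
    grads = 2e9  # placeholder gradient size in bytes
    plain = allreduce_seconds(grads, 8, link_bytes_per_s=100e9)
    sealed = allreduce_seconds(grads, 8, link_bytes_per_s=100e9, crypto_bytes_per_s=50e9)
    print(f"plain: {plain:.4f} s, encrypted: {sealed:.4f} s, slowdown: {sealed / plain:.2f}x")
```

Under this model, encryption costs nothing as long as the encryption engine keeps up with the link, and the slowdown equals the ratio of the two bandwidths once it does not. Real systems will deviate from this, which is why the question remains open.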

Can confidential computing prevent model extraction via side channels? Even with memory encryption, timing side channels, power analysis, and electromagnetic emanation could leak information about model architecture or weights. Hardware TEEs are designed mainly to stop privileged software attackers; they do not necessarily stop physical attackers.

What regulatory frameworks recognize confidential computing? If healthcare or financial organizations adopt GPU TEEs to meet data protection requirements, regulators need to evaluate whether hardware-based isolation satisfies existing data protection standards (HIPAA, GDPR, PCI DSS).

Will this enable new AI collaboration models? If multiple organizations can train jointly on combined data without any party accessing the other's data, confidential computing could enable privacy-preserving federated learning at the hardware level—a stronger guarantee than algorithmic approaches alone.

Closing Reflection

GPU-based confidential computing addresses a real and growing tension: organizations want the economic benefits of cloud-based AI training and inference but cannot accept the data exposure risks inherent in running workloads on infrastructure they do not control. The technology is architecturally sound—extending proven CPU TEE principles to GPU memory—but its practical value depends on performance overhead, which must be low enough that security does not become a luxury tax on AI computation.

References

Confidential Computing for GPU-Accelerated AI Workloads. arXiv (2025). DOI: 10.48550/arXiv.2511.04550.