Paper Review | Computer Systems

Quantum Error Correction Below Threshold: What Google's Willow Chip Actually Demonstrates

Google's Willow processor achieved below-threshold quantum error correction with a 101-qubit distance-7 surface code, suppressing logical errors by a factor of 2.14 each time the code distance increases by two. We examine what this milestone means, and does not mean, for practical quantum computing.

By ORAA Research

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

For decades, the central promise of quantum error correction (QEC) has rested on a conditional: if physical error rates fall below a critical threshold, then adding more qubits should exponentially suppress logical errors rather than compound them. In December 2024, a Google Quantum AI team reported crossing that threshold on their Willow superconducting processor, presenting a distance-7 surface code operating at 0.143% error per correction cycle. The result is significant, but the distance between a laboratory demonstration and a fault-tolerant quantum computer remains substantial.

The Threshold Problem in Context

Quantum error correction encodes a single logical qubit across many physical qubits, using syndrome measurements to detect and correct errors without destroying quantum information. The surface code—the most studied QEC architecture—arranges qubits on a two-dimensional lattice and tolerates relatively high individual qubit error rates compared to other codes. However, the approach only works when physical error rates sit below a critical threshold, typically estimated near 1% for the standard depolarizing noise model.

Prior experiments had demonstrated surface code elements, but none had convincingly shown that increasing the code distance actually reduces the logical error rate. The code distance d is the minimum number of physical errors needed to cause an undetected logical error; a distance-d surface code uses on the order of 2d^2 physical qubits. The theoretical prediction is exponential suppression characterized by a factor Lambda: each time the code distance increases by two (for example, from 5 to 7), the logical error rate should drop by a factor of Lambda. Below threshold, Lambda exceeds 1; above threshold, Lambda falls below 1 and adding qubits makes things worse.
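The scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's analysis: it assumes a constant Lambda, uses the textbook qubit count for a rotated surface code patch (d^2 data qubits plus d^2 - 1 measure qubits), and the function names and the sample inputs in the demo loop are hypothetical. Note that the textbook count gives 97 qubits at distance 7, slightly fewer than the 101 reported for the experiment, so treat it as an approximation of real device layouts.

```python
# Illustrative sketch of surface-code scaling under a constant-Lambda assumption.
# Function names and the demo inputs are hypothetical, chosen for illustration.

def physical_qubits(d: int) -> int:
    """Textbook qubit count for one rotated surface-code patch:
    d*d data qubits plus d*d - 1 measure qubits."""
    if d < 3 or d % 2 == 0:
        raise ValueError("distance should be an odd integer >= 3")
    return 2 * d * d - 1


def logical_error_rate(d: int, eps_ref: float, d_ref: int, lam: float) -> float:
    """Scale a reference error rate at distance d_ref to distance d.

    Every increase of the distance by 2 divides the rate by lam.
    """
    return eps_ref * lam ** (-(d - d_ref) / 2)


if __name__ == "__main__":
    # Hypothetical reference rate of 1% at distance 3, for three Lambda values.
    for lam in (0.8, 1.0, 2.0):
        rates = [logical_error_rate(d, 1e-2, 3, lam) for d in (3, 5, 7)]
        print(f"Lambda={lam}: {rates}")
```

With Lambda below 1 the rates grow with distance, with Lambda equal to 1 they stay flat, and only with Lambda above 1 does a larger code pay for its extra qubits.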

What Willow Achieved

The Google team (Acharya et al., 2024) operated two surface code memories on the Willow processor. The headline result is a distance-7 code using 101 physical qubits, achieving a logical error rate of 0.143% ± 0.003% per cycle of error correction. Critically, when the team compared performance between their distance-5 and distance-7 codes, they measured Lambda = 2.14 ± 0.02, meaning the logical error rate roughly halved when the code distance increased by two.

The logical memory also exceeded breakeven: the encoded logical qubit survived 2.4 times longer than the best individual physical qubit on the chip. This is a meaningful milestone because it demonstrates that the collective encoding genuinely outperforms its components, rather than merely redistributing errors.

A second notable achievement was real-time decoding. The distance-5 code operated with a real-time decoder achieving 63-microsecond average latency at cycle times of 1.1 microseconds, sustained over a million correction cycles. Real-time decoding is essential for any practical quantum computation, since classical post-processing delays would negate the advantage of error correction during active computation.
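A decoder latency of 63 microseconds can coexist with a 1.1-microsecond cycle because latency and throughput are different quantities: the decoder may lag by a fixed delay as long as it consumes syndrome rounds as fast as they arrive. The sketch below is a deliberately simple deterministic queue model, not a description of Google's decoder. The per-round decode times passed to `backlog_after` are hypothetical values chosen for illustration; only the 63 and 1.1 microsecond figures come from the text.

```python
# Toy model of decoder latency versus throughput (illustrative assumptions only).

def rounds_in_flight(latency_us: float, cycle_us: float) -> float:
    """Number of syndrome rounds produced while one decoding result is pending."""
    return latency_us / cycle_us


def backlog_after(rounds: int, cycle_us: float, decode_us_per_round: float) -> float:
    """Rounds still waiting for the decoder once `rounds` cycles have elapsed.

    Deterministic single-decoder queue: one round arrives every `cycle_us`,
    and the decoder needs `decode_us_per_round` of work per round.
    """
    elapsed_us = rounds * cycle_us
    decoded = min(float(rounds), elapsed_us / decode_us_per_round)
    return rounds - decoded


if __name__ == "__main__":
    # Figures from the text: 63 us average latency, 1.1 us cycle time.
    print(rounds_in_flight(63.0, 1.1))        # roughly 57 rounds behind
    # Hypothetical per-round decode costs, below and above the cycle time.
    print(backlog_after(1_000_000, 1.1, 1.0))  # decoder keeps up
    print(backlog_after(1_000_000, 1.1, 2.2))  # backlog grows with run length
```

In this model a decoder that needs less than one cycle time of work per round never falls behind, while one that needs twice the cycle time has half of all rounds still queued after a million cycles. That growing backlog is the failure mode a real-time decoder has to avoid.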

Critical Assessment: What the Numbers Do Not Say

Several aspects of this work warrant careful interpretation.

Lambda of 2.14 is modest. While any Lambda above 1 confirms below-threshold operation, practical fault-tolerant quantum computing requires logical error rates many orders of magnitude below current levels, typically 10^-10 or better for useful algorithms. If Lambda stayed at 2.14, reaching such rates would require code distances far beyond 7, which means thousands of physical qubits per logical qubit and millions across a machine with many logical qubits. Whether larger, more optimized chips will hold Lambda at this value or improve on it remains an open extrapolation.
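To make the extrapolation concrete, the following sketch starts from the reported figures (0.143% per cycle at distance 7, Lambda = 2.14) and asks what distance a target error rate would need. It rests on strong assumptions: Lambda stays constant at every distance, the qubit count follows the textbook 2d^2 - 1 formula, and the count of 1,000 logical qubits is a hypothetical placeholder. Overheads for routing and magic state distillation are ignored, so the totals are better read as a floor than as an estimate.

```python
import math

# Reported values from the text.
EPS_D7 = 1.43e-3   # logical error per cycle at distance 7 (0.143%)
LAMBDA = 2.14      # suppression factor per distance increase of 2
D_REF = 7


def required_distance(target: float, eps_ref: float = EPS_D7,
                      d_ref: int = D_REF, lam: float = LAMBDA) -> int:
    """Smallest odd distance whose extrapolated error rate is <= target,
    assuming Lambda stays constant (an assumption, not a measured fact)."""
    if target >= eps_ref:
        return d_ref
    # Each step raises the distance by 2 and divides the error rate by lam.
    steps = math.ceil(math.log(eps_ref / target) / math.log(lam))
    return d_ref + 2 * steps


def physical_qubits(d: int) -> int:
    """Textbook rotated-surface-code count: d*d data + d*d - 1 measure qubits."""
    return 2 * d * d - 1


if __name__ == "__main__":
    d = required_distance(1e-10)
    per_logical = physical_qubits(d)
    logical_qubits = 1_000  # hypothetical machine size, for illustration only
    print(d, per_logical, per_logical * logical_qubits)
```

Under these assumptions a 10^-10 target needs distance 51, or 5,201 physical qubits per logical qubit, and the hypothetical 1,000-logical-qubit machine lands at about 5.2 million physical qubits. A modest rise in Lambda would shrink these numbers considerably, which is why the open question about Lambda matters so much.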

Correlated errors impose limits. The team ran repetition codes up to distance 29 and found that logical performance was ultimately limited by rare correlated error events occurring approximately once per hour (roughly 3 x 10^9 cycles). These events, which simultaneously affect multiple qubits, fall outside the independent error model on which threshold estimates are based. How these correlations scale with chip size and code distance is not yet well understood.
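The "once per hour" and "3 x 10^9 cycles" figures are linked by the cycle time, as a quick check shows. This sketch assumes the repetition-code experiment ran at the 1.1-microsecond cycle time quoted earlier for the distance-5 code. The resulting per-cycle figure is the rate of correlated events, not a measured logical error floor.

```python
# Arithmetic check linking "once per hour" to a cycle count.
# Assumption: a 1.1 microsecond cycle time applies to the repetition-code runs.

CYCLE_TIME_S = 1.1e-6
SECONDS_PER_HOUR = 3600.0


def cycles_in(seconds: float, cycle_time_s: float = CYCLE_TIME_S) -> float:
    """Number of error-correction cycles that fit in the given wall-clock time."""
    return seconds / cycle_time_s


cycles_per_hour = cycles_in(SECONDS_PER_HOUR)
events_per_cycle = 1.0 / cycles_per_hour  # one correlated event per hour

if __name__ == "__main__":
    print(f"{cycles_per_hour:.2e} cycles per hour")
    print(f"{events_per_cycle:.2e} correlated events per cycle")
```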

The gap between memory and computation is wide. Willow demonstrated a quantum memory—maintaining a stored logical state—not a fault-tolerant computation. Executing logical gates between surface-code qubits, particularly the non-Clifford gates required for universal quantum computing, introduces additional error channels and architectural complexity through techniques like magic state distillation. Memory experiments, while necessary, test a simpler operational regime.

Resource overhead remains daunting. Even optimistic extrapolations from these results suggest that running algorithms like Shor's factoring or quantum chemistry simulations at useful scales will require physical qubit counts in the millions, well beyond the 105-qubit Willow chip. The engineering challenges of scaling superconducting qubit systems—wiring, cooling, cross-talk mitigation—grow nonlinearly with qubit count.

Broader Implications for the Field

The Willow results do confirm a core theoretical prediction of QEC: below-threshold operation with genuine error suppression is physically achievable. This lends empirical support to the entire fault-tolerant quantum computing program, which has operated largely on theoretical guarantees since Shor's 1996 work on fault-tolerant computation and the threshold theorems proved in the years that followed.

The demonstration also creates a concrete performance benchmark. Other platforms (trapped ions, neutral atoms, photonic systems) can now be compared against a clearly defined target: Lambda > 2 up to distance 7, with real-time decoding demonstrated at distance 5. Competition among hardware platforms is likely to accelerate as a result.

However, the quantum computing community should resist interpreting this as evidence that fault-tolerant quantum advantage is imminent. The path from Lambda = 2.14 at distance 7 to the error rates needed for practical quantum algorithms spans multiple orders of magnitude in both qubit quality and qubit count.

Open Questions

Can Lambda be significantly improved through better qubit fabrication, or does it represent a fundamental characteristic of superconducting platforms at this stage?
How do correlated error events scale as chip sizes increase, and can they be mitigated through architectural innovations?
What is the realistic timeline for demonstrating fault-tolerant logical gates (not just memory) below threshold?
Will competing QEC codes—such as the recently proposed quantum LDPC codes—offer better scaling characteristics than the surface code?

Closing Reflection

Google's Willow result is a genuine scientific milestone: the first clear demonstration of a surface code memory whose logical error rate falls as the code distance grows, as theory predicted. Yet milestones are not destinations. The engineering, physics, and systems challenges that remain between this proof-of-principle and practical fault-tolerant quantum computing are formidable. The field has crossed a threshold in the laboratory; crossing it at scale is the work of the coming decades.

References

Acharya, R., et al. (2024). Quantum error correction below the surface code threshold. Nature.
