Paper Review · Computer Systems · Experimental Design

Gigapixel Pathology at Scale: Distributed Computing for Whole-Slide Image Analysis

A single whole-slide pathology image can exceed 10 gigapixels, far too large for any single GPU to process. ComPRePS 2.0 demonstrates how HPC clusters can process these images in parallel, enabling computational pathology at the scale needed for population-level cancer screening.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

The digitization of pathology has created a data challenge unlike anything else in medicine. A single whole-slide image (WSI), a microscopy scan of a tissue sample at diagnostic resolution, typically contains 1 to 10 gigapixels, with large specimens at 40× magnification reaching 10 gigapixels or more. A moderate-sized hospital generates thousands of WSIs daily. A national cancer screening program processes millions annually.

Processing these images with AI (detecting cancer cells, grading tumors, quantifying biomarkers) requires computational resources that no single machine can provide. A single WSI may take minutes to process on a high-end GPU; multiplied by thousands or millions of images, the total computation is enormous. The bottleneck is not the AI model (which is relatively compact) but the data pipeline: reading multi-gigabyte image files, tiling them into processable patches, distributing patches across compute nodes, running inference, and aggregating results back into slide-level predictions.
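
To make the scale concrete, here is a rough back-of-envelope estimate. Both inputs are illustrative assumptions, not figures from the paper:

```python
# Back-of-envelope throughput estimate. Both inputs are illustrative
# assumptions, not numbers reported by Kumar et al.
SECONDS_PER_SLIDE = 120      # assume ~2 minutes of GPU time per WSI
SLIDES_PER_DAY = 5_000       # assume a large hospital's daily volume

gpu_hours_per_day = SECONDS_PER_SLIDE * SLIDES_PER_DAY / 3600
print(f"GPU-hours per day: {gpu_hours_per_day:.0f}")   # -> 167

# Finishing a day's slides within 24 hours needs this many GPUs busy
# around the clock, before any I/O or coordination overhead:
print(f"GPUs needed: {gpu_hours_per_day / 24:.1f}")    # -> 6.9
```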

Kumar et al.'s ComPRePS 2.0 tackles this pipeline challenge head-on, demonstrating how HPC clusters can be organized to process histopathological data at the scale required for clinical deployment.

The Data Pipeline Challenge

Processing a WSI with AI involves a pipeline that is I/O-intensive, compute-intensive, and coordination-intensive in roughly equal measure:

Reading: WSIs are stored in pyramidal formats (SVS, NDPI, MRXS) where the full-resolution image is accompanied by lower-resolution overview layers. Reading the full-resolution layer requires streaming gigabytes of compressed image data from storage, a process that is often storage-bandwidth-limited rather than compute-limited.
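
As a concrete illustration of that access pattern, here is a minimal sketch using the openslide-python bindings. The file name is a placeholder, and the paper's actual reader implementation is not specified in the abstract:

```python
# Minimal sketch of pyramidal WSI reading with openslide-python.
import openslide

slide = openslide.OpenSlide("specimen.svs")  # hypothetical file

# The pyramid: level 0 is full resolution, higher levels are downsampled.
for level, (w, h) in enumerate(slide.level_dimensions):
    print(f"level {level}: {w} x {h} pixels")

# Reading a region decompresses only the tiles that cover it, so the
# multi-gigapixel level-0 image never has to fit in memory at once.
region = slide.read_region(location=(0, 0), level=0, size=(512, 512))
slide.close()
```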

Tiling: The full-resolution image is divided into overlapping tiles (typically 256×256 or 512×512 pixels) that can be processed independently by the AI model. A single WSI may produce 50,000 to 200,000 tiles. Managing this tile set requires careful bookkeeping: tracking coordinates, handling overlap regions, and maintaining tissue masks to skip background tiles.
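
A sketch of how background skipping might work, assuming a thresholded low-resolution tissue mask; the function and its parameters are illustrative, not ComPRePS 2.0's implementation:

```python
# Background-skipping tile enumeration (assumed logic). A tile origin is
# kept only if its footprint in a low-resolution tissue mask has tissue.
import numpy as np

def tile_coords(slide_w, slide_h, mask, tile=512, overlap=64):
    """Yield (x, y) origins of tiles whose footprint intersects tissue."""
    scale = slide_w / mask.shape[1]       # slide pixels per mask pixel
    step = tile - overlap
    ms = max(1, int(tile / scale))        # tile footprint in mask pixels
    for y in range(0, slide_h - tile + 1, step):
        for x in range(0, slide_w - tile + 1, step):
            mx, my = int(x / scale), int(y / scale)
            if mask[my:my + ms, mx:mx + ms].any():
                yield x, y

# Toy example: a fake 1/64-scale mask for a 100k x 80k pixel slide.
mask = np.zeros((80_000 // 64, 100_000 // 64), dtype=bool)
mask[200:600, 300:900] = True             # pretend tissue blob
coords = list(tile_coords(100_000, 80_000, mask))
print(f"{len(coords)} foreground tiles kept; background skipped")
```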

Inference: Each tile is processed by a deep learning model that classifies tissue type, detects cellular features, or segments structures. Individual tile inference is fast (milliseconds on a GPU), but the volume of tiles makes total inference time substantial.
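
The inference stage, sketched with a stand-in PyTorch classifier; the tile tensor and model are placeholders, since the paper's models are not specified:

```python
# Batched tile inference with a stand-in classifier (placeholder model
# and data; real tiles would be 256x256 or 512x512 RGB patches).
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(), torch.nn.Linear(8, 2),
).eval()

tiles = torch.rand(2_048, 3, 64, 64)   # small stand-in tile stack
scores = []
with torch.no_grad():
    for batch in tiles.split(256):     # batching amortizes per-call overhead
        scores.append(model(batch).softmax(dim=1)[:, 1])
scores = torch.cat(scores)             # one "tumor" probability per tile
print(scores.shape)                    # torch.Size([2048])
```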

Aggregation: Tile-level predictions must be aggregated into slide-level results, combining thousands of tile predictions into a single diagnosis, tumor grade, or biomarker quantification. The aggregation logic must handle tile boundaries (features that span multiple tiles) and spatial context (a cluster of positive tiles is more significant than isolated positives).
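
One common aggregation strategy, sketched below under the assumption of a grid layout and connected-component clustering; this is not necessarily the paper's rule:

```python
# Spatial aggregation sketch: paint tile probabilities onto a slide-level
# grid, then require a contiguous cluster of positive tiles rather than
# isolated hits. Thresholds are illustrative.
import numpy as np
from scipy import ndimage

def slide_positive(coords, probs, grid_shape, step, thresh=0.5, min_cluster=5):
    grid = np.zeros(grid_shape)
    for (x, y), p in zip(coords, probs):
        grid[y // step, x // step] = p
    positive = grid > thresh
    labels, n = ndimage.label(positive)       # connected positive regions
    if n == 0:
        return False
    sizes = ndimage.sum(positive, labels, range(1, n + 1))
    return bool(sizes.max() >= min_cluster)   # demand a real cluster
```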

The ComPRePS 2.0 Architecture

ComPRePS 2.0 distributes this pipeline across an HPC cluster with three key design decisions:

Task-level parallelism: Each WSI is processed as an independent task. Tasks are distributed across cluster nodes by a job scheduler that balances load and respects resource constraints (GPU memory, storage bandwidth).
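
In spirit, each slide maps to one scheduler job; a local process pool gives a single-machine approximation of the same pattern. The scheduler details below are assumptions, not the paper's configuration:

```python
# One-WSI-per-task scheduling, approximated with a local process pool.
# On a real cluster each call would be a scheduler job (e.g. one SLURM
# array task per slide); paths and the worker body are placeholders.
from concurrent.futures import ProcessPoolExecutor

def process_slide(path):
    # read -> tile -> infer -> aggregate for one slide (stubbed out)
    return path, "benign"

if __name__ == "__main__":
    slides = [f"slide_{i:04d}.svs" for i in range(64)]
    with ProcessPoolExecutor(max_workers=8) as pool:  # 8 stand-in "nodes"
        for path, verdict in pool.map(process_slide, slides):
            print(path, verdict)
```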

Pipeline parallelism within tasks: Within a single WSI, the read-tile-infer-aggregate stages overlap: while one batch of tiles is being processed by the GPU, the next batch is being read and tiled by the CPU, and the previous batch's results are being aggregated. This pipelining hides I/O latency behind compute time.
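
A minimal sketch of that overlap using a bounded queue between a reader thread and the inference loop; the structure is assumed, since ComPRePS 2.0's internals are not detailed in the abstract:

```python
# Intra-slide pipeline parallelism: a CPU thread reads and tiles the next
# batch while the main thread runs inference on the current one.
import queue, threading

def reader(batches, q):
    for batch in batches:          # each batch: tiles already decoded
        q.put(batch)               # blocks if inference is falling behind
    q.put(None)                    # sentinel: no more batches

def run_pipeline(batches, infer):
    q = queue.Queue(maxsize=4)     # bounded buffer between stages
    t = threading.Thread(target=reader, args=(batches, q))
    t.start()
    results = []
    while (batch := q.get()) is not None:
        results.append(infer(batch))   # compute overlaps the next read
    t.join()
    return results

# Toy usage: "reading" is a generator, "inference" is sum().
print(run_pipeline(([i] * 3 for i in range(10)), infer=sum))
```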

Storage optimization: WSIs are pre-processed to extract tissue masks and tile coordinates before the inference phase begins. This pre-processing can be done once and cached, avoiding redundant computation when the same slide is processed with different AI models.
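
The caching idea in miniature; the cache directory and JSON layout are assumptions for illustration:

```python
# Cache-once preprocessing sketch: tile coordinates are keyed by slide ID
# so re-running with a different AI model skips the expensive pass.
import json, pathlib

CACHE = pathlib.Path("preproc_cache")         # hypothetical cache directory

def get_tile_coords(slide_id, compute_fn):
    entry = CACHE / f"{slide_id}.json"
    if entry.exists():                        # cache hit: skip preprocessing
        return json.loads(entry.read_text())
    coords = compute_fn(slide_id)             # expensive mask + tiling pass
    CACHE.mkdir(exist_ok=True)
    entry.write_text(json.dumps(coords))
    return coords
```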

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| WSI processing is computationally intensive at clinical scale | Data volume analysis for hospital and population-scale pathology | ✅ Well-documented |
| HPC clusters can parallelize WSI processing effectively | ComPRePS 2.0 demonstrates parallel processing across cluster nodes | ✅ Supported |
| Pipeline parallelism hides I/O latency | Overlapping read/compute/aggregate stages demonstrated | ✅ Supported |
| Current computational pathology systems meet clinical throughput requirements | Processing speed depends on cluster size and model complexity | ⚠️ Achievable but resource-intensive |

Open Questions

  • Real-time pathology: Can computational pathology achieve fast enough turnaround for intra-operative consultation, where a surgeon waits for a diagnosis while the patient is on the operating table? This requires processing in minutes, not hours.
  • Cloud vs. on-premises: Clinical data governance often requires processing within the hospital network. Should computational pathology run on cloud HPC (scalable but raises data sovereignty concerns) or on-premises clusters (secure but limited in scale)?
  • Multi-stain integration: Pathologists use multiple staining techniques (H&E, IHC, special stains) on consecutive tissue sections. AI systems that integrate multi-stain information require cross-slide registration (aligning images from different stains of adjacent tissue sections), which adds geometric complexity to the processing pipeline.
  • Quality control: Not all WSIs are suitable for AI analysis: out-of-focus regions, tissue folds, air bubbles, and staining artifacts can cause incorrect predictions. Automated quality control that detects and flags problematic regions before inference improves reliability but adds processing overhead.
What This Means for Your Research

For computational pathology researchers, the systems infrastructure for processing WSIs at scale is as important as the AI models that analyze them. A model that achieves 99% accuracy on a curated benchmark but cannot process a hospital's daily slide volume is not clinically useful. ComPRePS 2.0 demonstrates that the infrastructure challenge is solvable with careful pipeline design.

For HPC researchers, digital pathology provides a domain with clear throughput requirements, well-defined pipeline stages, and enormous data volumes, characteristics that map well onto HPC cluster architectures. The challenge of processing petabytes of image data with AI models is shared with satellite remote sensing, genomics, and other data-intensive scientific domains.

References (1)

[1] Katari Chaluva Kumar, S., Paul, A. S., Abdelazim, H., Dunklin, W., Manthey, D., Moskalenko, O., et al. (2025). ComPRePS 2.0: enabling massive-scale distributed computing on high-performance computing cluster for histopathological data processing. Medical Imaging 2025: Digital and Computational Pathology, SPIE, 44.
