HTAP: The End of the Analytics-Transactions Divide in Enterprise Databases

For decades, enterprises maintained separate databases for transactions (OLTP) and analytics (OLAP). HTAP systems promise to unify them—processing real-time transactions and complex analytics on the same data store. Kim et al. show how application-database co-design makes this practical.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Enterprise data architecture has been built on a fundamental separation for three decades: transactional systems (OLTP) that process real-time business operations—orders, payments, inventory updates—and analytical systems (OLAP) that process complex queries for business intelligence—trends, forecasts, anomaly detection. Data flows from OLTP to OLAP through ETL (Extract, Transform, Load) pipelines that introduce latency ranging from hours to days.

This separation made engineering sense when memory was expensive and workloads were predictable. In 2025, however, businesses want analytics on live transactional data: detecting fraud as it happens, adjusting prices to current demand, and optimizing supply chains against live inventory levels. An ETL delay that was acceptable for monthly reporting is unacceptable for real-time decision-making.

Hybrid Transactional/Analytical Processing (HTAP) systems promise to eliminate this delay by processing both workload types on the same data store. Kim et al. demonstrate how this unification requires not just database innovation but co-innovation between applications and databases—a collaboration that challenges traditional boundaries between application development and database engineering.

The Virtual Data Model Approach

Kim et al.'s central contribution is the concept of a virtual data model that bridges the semantic gap between application logic and database operations. Traditional applications write SQL directly against the physical schema, so each query encodes assumptions about how the data is stored and joined. A virtual data model instead specifies what data the application needs, leaving the database to decide how to retrieve it based on its knowledge of data layout, indexing, and workload patterns.
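
As a rough illustration of the "what, not how" idea, a plain SQL view can act as a very small virtual data model. The sketch below uses Python's built-in sqlite3 module. The table and view names (order_header, order_item, v_sales_order) are hypothetical and are not taken from Kim et al.; a real virtual data model would carry far richer semantics than a single view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_header (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_item (order_id INTEGER, sku TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO order_header VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO order_item VALUES (1, 'A', 2, 10.0), (1, 'B', 1, 5.0), (2, 'A', 3, 10.0);

    -- Hypothetical virtual entity: states WHAT a sales order total is.
    -- The application never names the underlying tables or the join.
    CREATE VIEW v_sales_order AS
        SELECT h.order_id, h.customer, SUM(i.qty * i.unit_price) AS total
        FROM order_header h JOIN order_item i ON h.order_id = i.order_id
        GROUP BY h.order_id, h.customer;
""")


def customer_totals(conn):
    """Application code queries the virtual entity; the engine plans the access."""
    rows = conn.execute(
        "SELECT customer, total FROM v_sales_order ORDER BY customer"
    ).fetchall()
    return dict(rows)


totals = customer_totals(conn)
print(totals)  # {'acme': 25.0, 'globex': 30.0}
```

Because the application depends only on v_sales_order, the physical tables underneath could be reorganized without touching application code, which is the property the virtual data model is meant to provide.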

This inversion, from application-driven query specification to database-driven query optimization, is particularly valuable for HTAP because the optimal access strategy depends on whether the workload is transactional or analytical:

Transactional queries access specific rows by primary key, so they benefit from row-oriented storage and index lookups.
Analytical queries scan a few columns across many rows, so they benefit from columnar storage and parallel scans.

An HTAP system with a virtual data model can choose the access strategy for each query on the application's behalf. It maintains a dual representation (a row store for transactions, a column store for analytics) and routes each query to the appropriate one, so the application never needs to know which representation served it.
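
The dual representation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the design of any particular HTAP engine: the DualStore class and its method names are hypothetical, and both layouts are updated synchronously on every write, which is a simplification.

```python
class DualStore:
    """Toy table kept in two layouts: rows by primary key and per-column lists."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.row_store = {}                         # pk -> row dict (point lookups)
        self.col_store = {c: [] for c in columns}   # column -> values (scans)
        self._pos = {}                              # pk -> index in the column lists

    def upsert(self, pk, row):
        """Write path: update both representations together (an assumption)."""
        if pk in self._pos:
            idx = self._pos[pk]
            for c in self.columns:
                self.col_store[c][idx] = row[c]
        else:
            self._pos[pk] = len(self._pos)
            for c in self.columns:
                self.col_store[c].append(row[c])
        self.row_store[pk] = dict(row)

    def get(self, pk):
        """Transactional access: one row by key, served from the row store."""
        return self.row_store[pk]

    def total(self, column):
        """Analytical access: scan one column, served from the column store."""
        return sum(self.col_store[column])


store = DualStore(["sku", "qty"])
store.upsert(1, {"sku": "A", "qty": 2})
store.upsert(2, {"sku": "B", "qty": 5})
store.upsert(1, {"sku": "A", "qty": 3})   # overwrite the row with key 1
print(store.get(1))        # {'sku': 'A', 'qty': 3}
print(store.total("qty"))  # 8
```

The caller only ever uses get and total. Which layout answers each call is an internal decision, which mirrors the routing the text describes.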

Cloud-Native HTAP

Sundaramoorthy et al. extend the HTAP concept to cloud-native environments, where the additional challenge is dynamic resource allocation. Cloud databases must scale transaction processing and analytical processing independently—a spike in analytical queries should not degrade transactional performance, and vice versa.

Their autonomous scaling approach uses workload classification to dynamically allocate compute resources between OLTP and OLAP workloads. Machine learning models predict workload patterns from historical data and pre-allocate resources before demand spikes occur—a proactive approach that avoids the latency of reactive autoscaling.
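
To make the proactive idea concrete, here is a minimal sketch that forecasts the next interval's load for each workload class and splits a fixed compute budget before the load arrives. Simple exponential smoothing stands in for the machine learning models, which this post does not describe in detail. The function names, the alpha parameter, and the min_oltp_units floor are all assumptions made for illustration, not part of Sundaramoorthy et al.'s framework.

```python
def forecast_next(history, alpha=0.5):
    """Exponentially weighted forecast of the next value in `history`."""
    estimate = history[0]
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def allocate(oltp_history, olap_history, total_units, min_oltp_units=2):
    """Split `total_units` of compute by forecast demand, with an OLTP floor."""
    oltp = forecast_next(oltp_history)
    olap = forecast_next(olap_history)
    demand = oltp + olap
    if demand == 0:
        oltp_units = min_oltp_units
    else:
        oltp_units = max(min_oltp_units, round(total_units * oltp / demand))
    oltp_units = min(oltp_units, total_units)
    return {"oltp": oltp_units, "olap": total_units - oltp_units}


# Steady transactional load, analytical load that has just spiked.
print(allocate([100, 100, 100], [0, 0, 300], total_units=10))
# {'oltp': 4, 'olap': 6}

# The floor keeps transactions from being starved by heavy analytics.
print(allocate([1, 1], [99, 99], total_units=10))
# {'oltp': 2, 'olap': 8}
```

The OLTP floor is one possible answer to the requirement that an analytical spike should not degrade transactional performance. A production system would also need per-class queues and real resource isolation.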

Claims and Evidence

Claim	Evidence	Verdict
HTAP eliminates the ETL delay for real-time analytics	Architectural demonstration with SAP HANA-style systems	✅ Supported
Application-database co-design improves HTAP performance	Kim et al. demonstrate virtual data model benefits	✅ Supported
Cloud-native HTAP can independently scale OLTP and OLAP	Sundaramoorthy et al. propose autonomous scaling framework	⚠️ Framework described, limited deployment evidence
HTAP replaces the need for separate OLTP and OLAP systems	Some workloads still benefit from separation; HTAP adds complexity	⚠️ Depends on workload
Current HTAP systems handle all enterprise workload patterns	Complex analytical workloads may still require dedicated OLAP	⚠️ Workload-dependent

Open Questions

Workload interference: Even with resource isolation, HTAP systems must prevent analytical queries from degrading transactional performance. What level of isolation is sufficient, and what is the overhead?

Consistency semantics: Should analytical queries see the absolute latest transactional state, or is slight staleness acceptable? The answer affects both performance (strong consistency is expensive) and business logic (some analytics require exact consistency).
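
One way to frame the trade-off is as a bounded-staleness read: the analytical copy may lag the transactional state, but only by a configurable amount. The sketch below measures lag in commits rather than wall-clock time, and every name in it (Primary, Replica, catch_up, analytical_sum, max_lag) is hypothetical. It illustrates the idea and is not how any specific HTAP system implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Transactional side: applies writes and appends them to a commit log."""
    commit_lsn: int = 0
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.commit_lsn += 1
        self.data[key] = value
        self.log.append((self.commit_lsn, key, value))


@dataclass
class Replica:
    """Analytical copy that has applied the log up to `applied_lsn`."""
    applied_lsn: int = 0
    data: dict = field(default_factory=dict)


def catch_up(primary, replica, up_to_lsn):
    """Apply log entries to the replica until it reaches `up_to_lsn`."""
    for lsn, key, value in primary.log:
        if replica.applied_lsn < lsn <= up_to_lsn:
            replica.data[key] = value
            replica.applied_lsn = lsn


def analytical_sum(primary, replica, max_lag):
    """Sum replica values, catching up first if it lags by more than `max_lag` commits."""
    if primary.commit_lsn - replica.applied_lsn > max_lag:
        catch_up(primary, replica, primary.commit_lsn - max_lag)
    return sum(replica.data.values())


primary, replica = Primary(), Replica()
primary.write("a", 10)
primary.write("b", 20)
primary.write("c", 30)

print(analytical_sum(primary, replica, max_lag=1))  # 30: may miss the latest commit
print(analytical_sum(primary, replica, max_lag=0))  # 60: exact, but waits for full catch-up
```

Setting max_lag to 0 corresponds to analytics that require the exact latest state. Larger values trade freshness for less synchronization work on the read path.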

Migration path: Enterprises have massive investments in separate OLTP/OLAP infrastructure. What is the practical migration path to HTAP, and how do you maintain operations during transition?

Cost comparison: HTAP systems require more sophisticated (and expensive) database technology than separate OLTP and OLAP. Under what conditions does the elimination of ETL infrastructure offset the higher database cost?
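
The break-even condition can be written down directly: HTAP is cheaper when its cost is below the combined cost of the separate OLTP system, the separate OLAP system, and the ETL infrastructure it removes. The helper below is a minimal sketch of that arithmetic. The function names are hypothetical and the figures are placeholders chosen only to show the calculation, not measurements from either paper.

```python
def etl_savings_needed(htap_cost, oltp_cost, olap_cost):
    """Minimum ETL cost that HTAP must eliminate in order to break even."""
    return max(0.0, htap_cost - (oltp_cost + olap_cost))


def htap_is_cheaper(htap_cost, oltp_cost, olap_cost, etl_cost):
    """True when HTAP costs less than the separate stack including ETL."""
    return htap_cost < oltp_cost + olap_cost + etl_cost


# Placeholder figures in arbitrary cost units per year.
print(etl_savings_needed(150.0, 60.0, 50.0))           # 40.0
print(htap_is_cheaper(150.0, 60.0, 50.0, etl_cost=45.0))  # True
print(htap_is_cheaper(150.0, 60.0, 50.0, etl_cost=30.0))  # False
```

A fuller comparison would also need to price migration effort and operational risk, which this inequality leaves out.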

Query optimization complexity: HTAP query optimizers must handle a much wider range of workload types than either OLTP or OLAP optimizers alone. Does this wider search space lead to more frequent poor plan choices?

What This Means for Your Research

For database researchers, HTAP co-design (Kim et al.) opens a research direction that requires expertise in both database internals and application architecture—a combination that traditional database research rarely demands. The virtual data model concept is generalizable beyond HTAP to any setting where applications and databases must co-evolve.

For enterprise architects, HTAP promises a real architectural simplification: fewer moving parts, no ETL pipelines, and real-time analytics capability. That simplification comes at the price of a more sophisticated database platform, so adoption requires careful evaluation of workload patterns, consistency requirements, and the organization's readiness to operate it.

For the data engineering community, the HTAP trend has implications for ETL tool providers, data warehouse vendors, and the practitioners who build and maintain data pipelines. If HTAP succeeds at scale, a significant portion of current data engineering infrastructure becomes unnecessary—a creative destruction that the industry should anticipate.

References

[1] Kim, K., Kim, H., Lee, J. et al. (2025). Enterprise Application-Database Co-Innovation for Hybrid Transactional/Analytical Processing: A Virtual Data Model and Its Query Optimization Needs. ACM SIGMOD.

[2] Sundaramoorthy, P., Parikh, M., Keshireddy, S. (2025). Adaptive Resource Management in Cloud-Native Databases: A Study of Autonomous Scaling and Query Optimization. IEEE ICCES.