When Graph Databases Get Optimization Wrong: Understanding Query Bugs in GDBMSs

Graph databases (Neo4j, TigerGraph, NebulaGraph) are growing rapidly, but their query optimizers harbor bugs that can silently return incorrect results or cause catastrophic slowdowns. Chen & Yu systematically analyze these bugs and find patterns that differ from those in traditional relational databases.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Graph databases are no longer niche. Neo4j, TigerGraph, Amazon Neptune, and NebulaGraph power applications from social network analysis to fraud detection to knowledge graph querying. Their query languages (Cypher, GSQL, Gremlin, the emerging GQL standard) enable natural expression of graph patterns—relationships, paths, neighborhoods—that relational SQL handles awkwardly at best.

But graph database management systems (GDBMSs) are younger and less battle-tested than their relational counterparts. The query optimizers that transform declarative queries into efficient execution plans are, in many cases, less mature than those in PostgreSQL or Oracle—systems that have benefited from decades of optimization research and production hardening.

Chen & Yu provide a systematic study of query optimization bugs in GDBMSs—bugs where the optimizer produces an incorrect result or a catastrophically inefficient execution plan. Their analysis reveals patterns specific to graph databases that do not occur in relational systems, suggesting that graph query optimization requires its own research agenda rather than adaptation of relational techniques.

The Bug Taxonomy

The authors analyzed hundreds of reported bugs across major GDBMSs, categorizing them into three primary types:

Correctness bugs: The optimizer produces a query plan that returns incorrect results. This is the most dangerous category because it fails silently—the user receives a result that looks plausible but is wrong. In a fraud detection application, a correctness bug might cause the system to miss fraudulent transactions; in a recommendation system, it might return irrelevant suggestions.

The graph-specific correctness bugs arise from:

Path semantics: Graph queries often specify path patterns (for example, find all paths between A and B of length ≤ 5). The optimizer must correctly handle path uniqueness constraints: should a path that visits the same node, or traverses the same edge, more than once be counted? Query languages and GDBMSs answer this differently, and optimizer bugs around path semantics are common.
Variable-length pattern matching: Queries with variable-length relationships, such as MATCH (a)-[*1..5]->(b), create optimization challenges with no close analogue in typical relational workloads. The optimizer must decide when to expand the variable-length pattern and how to prune the search space, and bugs in those decisions can lead to missing results.
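
To make the path-uniqueness question concrete, here is a minimal Python sketch that enumerates paths under three common interpretations: walks (anything may repeat), trails (no repeated edge), and simple paths (no repeated node). The function name, the toy graph, and the mode names are illustrative assumptions, not taken from the paper or from any particular GDBMS. As a rough point of reference, openCypher-style matching is usually described as forbidding repeated relationships within one pattern, which is close to the trail mode below.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # directed edge as (source, target)


def enumerate_paths(edges: List[Edge], start: str, end: str,
                    max_len: int, semantics: str) -> List[List[str]]:
    """Return every path from start to end that uses 1..max_len edges.

    semantics is one of:
      "walk"   - nodes and edges may repeat
      "trail"  - no edge may repeat
      "simple" - no node may repeat
    """
    if semantics not in ("walk", "trail", "simple"):
        raise ValueError(f"unknown semantics: {semantics}")

    # Index outgoing edges by source node; an edge is identified by its position.
    outgoing: Dict[str, List[int]] = {}
    for idx, (src, _dst) in enumerate(edges):
        outgoing.setdefault(src, []).append(idx)

    results: List[List[str]] = []

    def extend(node: str, used_edges: List[int], visited: List[str]) -> None:
        if used_edges and node == end:
            results.append(visited)
        if len(used_edges) == max_len:
            return
        for idx in outgoing.get(node, []):
            nxt = edges[idx][1]
            if semantics == "trail" and idx in used_edges:
                continue  # this edge was already traversed on the current path
            if semantics == "simple" and nxt in visited:
                continue  # this node was already visited on the current path
            extend(nxt, used_edges + [idx], visited + [nxt])

    extend(start, [], [start])
    return results


# Hypothetical toy graph: A -> B, a self-loop on B, and B -> D.
toy_edges = [("A", "B"), ("B", "B"), ("B", "D")]
for mode in ("walk", "trail", "simple"):
    print(mode, len(enumerate_paths(toy_edges, "A", "D", 5, mode)))
```

On this toy graph the three modes return 4, 2, and 1 paths respectively for the same "A to D within 5 hops" query. An optimizer rewrite that quietly switches from one interpretation to another would therefore change the result without raising any error.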

Performance bugs: The optimizer selects a valid but catastrophically slow execution plan. A query that should complete in milliseconds takes hours because the optimizer chose a nested-loop join over a hash join, or expanded a variable-length pattern in the wrong order.
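
The effect of expansion order can be sketched with a fixed two-hop pattern, a deliberate simplification of the variable-length case. The Python below evaluates the same pattern, (s)-->(mid)-->(t), by expanding forward from s or backward from t, and counts how many edges each plan touches. The graph, the node names, and the "edges touched" metric are assumptions chosen for illustration; real optimizers use richer cost models.

```python
from typing import Dict, List, Tuple

Adjacency = Dict[str, List[str]]
Match = Tuple[str, str, str]


def build_adjacency(edges: List[Tuple[str, str]]) -> Tuple[Adjacency, Adjacency]:
    """Build outgoing and incoming adjacency lists from directed edges."""
    out_adj: Adjacency = {}
    in_adj: Adjacency = {}
    for src, dst in edges:
        out_adj.setdefault(src, []).append(dst)
        in_adj.setdefault(dst, []).append(src)
    return out_adj, in_adj


def two_hop_forward(out_adj: Adjacency, source: str, target: str) -> Tuple[List[Match], int]:
    """Start at source, expand two hops forward, keep paths ending at target."""
    touched = 0
    matches: List[Match] = []
    for mid in out_adj.get(source, []):
        touched += 1
        for dst in out_adj.get(mid, []):
            touched += 1
            if dst == target:
                matches.append((source, mid, dst))
    return matches, touched


def two_hop_backward(in_adj: Adjacency, source: str, target: str) -> Tuple[List[Match], int]:
    """Start at target, expand two hops backward, keep paths starting at source."""
    touched = 0
    matches: List[Match] = []
    for mid in in_adj.get(target, []):
        touched += 1
        for src in in_adj.get(mid, []):
            touched += 1
            if src == source:
                matches.append((src, mid, target))
    return matches, touched


# Hypothetical skewed graph: "m" fans out to 1000 leaves but has one incoming edge.
edges = [("s", "m"), ("m", "t")] + [("m", f"leaf{i}") for i in range(1000)]
out_adj, in_adj = build_adjacency(edges)

forward_matches, forward_cost = two_hop_forward(out_adj, "s", "t")
backward_matches, backward_cost = two_hop_backward(in_adj, "s", "t")
print(forward_matches == backward_matches, forward_cost, backward_cost)
```

Both plans return the same single match, but the forward plan touches 1002 edges and the backward plan touches 2. A plan choice like this is correct in both directions, which is why such bugs show up only as slowness.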

Crash bugs: The optimizer encounters an edge case that causes it to crash—leaving the user with no result at all. While less dangerous than silent correctness bugs, crash bugs erode confidence in the GDBMS and may cause cascading failures in applications that depend on query results.

Patterns and Root Causes

The analysis reveals that graph-specific optimization bugs cluster around features that distinguish graph databases from relational ones:

Recursive queries: Graph traversals are inherently recursive—following paths of unknown length through the graph. Recursion handling in query optimizers is error-prone, especially when combined with filtering conditions that should prune the recursion.
Property graph model complexity: Unlike relational tables with fixed schemas, property graphs allow arbitrary properties on both nodes and edges. The optimizer must handle this schema flexibility without the statistical assumptions (column cardinality, join selectivity) that relational optimizers rely on.
Multi-hop joins: A single graph pattern may imply dozens of join operations—one for each edge traversal. The join ordering space is correspondingly enormous, and heuristic pruning may eliminate the optimal plan.
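
A quick back-of-the-envelope calculation shows how fast the join ordering space grows. The sketch below uses the standard textbook counts for k relations: k! left-deep orders and (2(k-1))!/(k-1)! bushy join trees, both of which include plans with cross products. Treating each edge traversal as one relation is an assumption made here for illustration; real optimizers usually restrict the search to connected sub-plans, so the space they explore is smaller but still grows very quickly.

```python
from math import factorial


def left_deep_orders(k: int) -> int:
    """Number of left-deep join orders for k relations (cross products allowed)."""
    return factorial(k)


def bushy_trees(k: int) -> int:
    """Number of bushy join trees for k relations (cross products allowed)."""
    return factorial(2 * (k - 1)) // factorial(k - 1)


# Each edge traversal in a pattern is treated here as one relation to join.
for hops in (3, 4, 10, 12):
    print(hops, left_deep_orders(hops), bushy_trees(hops))
```

Even at ten hops there are already 3,628,800 left-deep orders, so exhaustive search is impractical and some form of heuristic pruning is unavoidable.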

Claims and Evidence

Claim	Evidence	Verdict
GDBMS query optimizers contain correctness bugs	Systematic analysis of reported bugs across multiple GDBMSs	✅ Documented
Graph-specific query features create optimization challenges absent in RDBMS	Path semantics, variable-length matching, recursive traversal	✅ Supported
Performance bugs can cause slowdowns of several orders of magnitude	Bug reports document queries slowing from milliseconds to hours	✅ Documented
Current GDBMSs are as reliable as mature RDBMSs	Bug density suggests less maturity in graph query optimization	❌ Not yet
The GQL standard will reduce implementation inconsistencies	Standard is emerging; adoption and compliance are uncertain	⚠️ Hopeful but unproven

Open Questions

Automated bug detection: Can we build automated tools that test GDBMS query optimizers for correctness—generating queries, comparing results across different execution plans, and flagging discrepancies?
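
The idea of comparing results across execution plans can be sketched in a few lines of Python. The code below is a toy differential tester, not a description of any existing tool: it generates small random graphs, runs a reference plan and a candidate plan for the query "flagged nodes reachable from a start node within 1..k hops", and reports the first input on which they disagree. The candidate plan contains a deliberately planted bug (it applies the endpoint filter to intermediate nodes as well), and all names and parameters are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Set, Tuple

Graph = Dict[int, List[int]]
Plan = Callable[[Graph, Set[int], int, int], Set[int]]


def random_graph(rng: random.Random, n_nodes: int, n_edges: int) -> Tuple[Graph, Set[int]]:
    """Random directed graph plus a random subset of 'flagged' nodes."""
    adj: Graph = {v: [] for v in range(n_nodes)}
    for _ in range(n_edges):
        adj[rng.randrange(n_nodes)].append(rng.randrange(n_nodes))
    flagged = {v for v in range(n_nodes) if rng.random() < 0.5}
    return adj, flagged


def reference_plan(adj: Graph, flagged: Set[int], start: int, max_hops: int) -> Set[int]:
    """Expand 1..max_hops from start, then filter the endpoints."""
    frontier, reached = {start}, set()
    for _ in range(max_hops):
        frontier = {nxt for v in frontier for nxt in adj[v]}
        reached |= frontier
    return reached & flagged


def buggy_plan(adj: Graph, flagged: Set[int], start: int, max_hops: int) -> Set[int]:
    """Planted bug: the endpoint filter is pushed into the expansion,
    so paths through unflagged intermediate nodes are pruned."""
    frontier, reached = {start}, set()
    for _ in range(max_hops):
        frontier = {nxt for v in frontier for nxt in adj[v] if nxt in flagged}
        reached |= frontier
    return reached


def find_discrepancy(plan_a: Plan, plan_b: Plan, trials: int = 200,
                     seed: int = 0) -> Optional[Tuple[Graph, Set[int], int, Set[int], Set[int]]]:
    """Return the first (graph, flags, start, result_a, result_b) where plans differ."""
    rng = random.Random(seed)
    for _ in range(trials):
        adj, flagged = random_graph(rng, n_nodes=6, n_edges=10)
        for start in adj:
            result_a = plan_a(adj, flagged, start, 3)
            result_b = plan_b(adj, flagged, start, 3)
            if result_a != result_b:
                return adj, flagged, start, result_a, result_b
    return None


# Hand-built case: 0 -> 1 -> 2, only node 2 is flagged.
chain: Graph = {0: [1], 1: [2], 2: []}
print(reference_plan(chain, {2}, 0, 3), buggy_plan(chain, {2}, 0, 3))
print(find_discrepancy(reference_plan, buggy_plan) is not None)
```

On the hand-built chain the reference plan returns {2} while the buggy plan returns an empty set, which is exactly the "missing results" failure mode described earlier. In a real setting the two plans would come from the same GDBMS under different optimizer settings or from equivalent query rewrites, rather than from two Python functions.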

Graph-specific cost models: Relational cost models estimate join costs based on table sizes and selectivity. What are the appropriate cost model primitives for graph traversals, where the cost depends on graph topology (degree distribution, clustering coefficient) rather than flat statistics?
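
A small worked example shows why flat statistics can mislead a graph cost model. The exact number of two-hop paths in a directed graph is the sum over nodes of in-degree times out-degree, whereas a naive uniform estimate that only knows the node count n and edge count m gives n × (m/n)² = m²/n. The Python sketch below compares the two on a hypothetical hub-shaped graph; the function names and the graph are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]


def exact_two_hop_count(edges: List[Edge]) -> int:
    """Exact number of two-edge paths: sum of in_degree(v) * out_degree(v)."""
    out_deg = Counter(src for src, _dst in edges)
    in_deg = Counter(dst for _src, dst in edges)
    return sum(in_deg[v] * out_deg[v] for v in in_deg)


def uniform_two_hop_estimate(edges: List[Edge]) -> float:
    """Estimate assuming every node has the average out-degree m/n."""
    nodes = {v for edge in edges for v in edge}
    avg_degree = len(edges) / len(nodes)
    return len(nodes) * avg_degree * avg_degree


# Hypothetical hub: 100 sources point at "hub", and "hub" points at 100 sinks.
hub_edges = [(f"src{i}", "hub") for i in range(100)]
hub_edges += [("hub", f"dst{i}") for i in range(100)]

print(exact_two_hop_count(hub_edges), round(uniform_two_hop_estimate(hub_edges), 1))
```

Here the exact count is 10,000 two-hop paths while the uniform estimate is about 199, an underestimate of roughly fifty times on a graph with only 200 edges. Degree skew of this kind is common in real graphs, which is one reason topology-aware statistics matter.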

Formal verification of graph optimizers: Can we formally verify that GDBMS query rewrite rules preserve query semantics? This is partially solved for relational databases but remains largely unexplored for graph databases.

Benchmark standardization: Efforts such as the LDBC Social Network Benchmark exist, but the graph database community arguably still lacks a benchmark that is as established for comparing optimizers as TPC-H/TPC-DS are for relational databases. Without one, comparing optimizer quality across GDBMSs is difficult.

Hybrid relational-graph optimization: Many applications use both relational and graph queries on the same data. How should optimizers handle queries that span both paradigms?

What This Means for Your Research

For database researchers, graph query optimization is a field where foundational work remains to be done. The relational query optimization literature spans thousands of papers over four decades; the graph equivalent is in its early stages. The bug patterns identified by Chen & Yu provide a roadmap for where research investment is most needed.

For practitioners using graph databases in production, the message is caution: verify query results against known ground truth, especially for complex queries involving variable-length paths and recursive patterns. The optimizer may not be wrong often, but when it is wrong, it fails silently.

For the knowledge graph and AI communities that increasingly rely on graph databases as backend stores for RAG systems, citation networks, and ontologies, optimizer correctness is a prerequisite for trustworthy AI—if the underlying data retrieval is buggy, no amount of LLM sophistication can compensate.

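Disclaimer: This post is an overview of research trends provided for informational purposes. Before citing it in scholarly work, verify specific findings, statistics, and claims against the original papers.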

References

[1] Chen, Y. & Yu, Z. (2026). Understanding Query Optimization Bugs in Graph Database Systems. ACM ICSE.

[2] Soulé, R., Neville-Neil, G., Kasouridis, S. et al. (2025). OSDB: Exposing the Operating System's Inner Database. Semantic Scholar.
