
Talk to Your Database: Natural Language Queries Through Multi-Modal LLMs

SQL remains the gatekeeping language of enterprise data—accessible to database specialists but opaque to the business users who most need data-driven insights. Multi-modal LLMs that translate natural language questions (and even dashboard screenshots) into database queries promise to democratize data access.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

The gap between data and decision-makers is not technological; it is linguistic. Enterprise databases contain terabytes of structured information that could inform business decisions ranging from marketing spend allocation to supply chain optimization to customer churn prediction. But accessing this information requires SQL, a language that most decision-makers do not speak and have neither the time nor the inclination to learn.

Natural language to SQL (NL-to-SQL) systems have been pursued for decades, with each generation of AI technology bringing incremental improvements. The LLM generation represents something qualitatively different: models that understand not just the syntax of natural language questions but their intent—disambiguating vague questions, inferring implicit constraints, and generating SQL that reflects what the user means rather than merely what they said.

Zhang's system [1] extends this capability to the multi-modal domain, accepting not only text questions but also references to charts, tables, and dashboard visualizations. When a user points at a spike in a revenue graph and asks "Why did this happen?", the system must understand both the visual reference (identifying the time period and metric from the chart) and the causal question (generating SQL that retrieves relevant explanatory data for that period).
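The abstract does not say how the visual reference is grounded, so the sketch below covers only the second half of that example under stated assumptions: a vision component (not shown) is assumed to have already turned the user's selection into a structured `ChartSelection` (metric plus date range), and the `daily_sales` table and its columns are hypothetical. The code turns that selection into a parameterized drill-down query. Dates are bound as parameters, while identifiers coming from model output are checked against allowlists because SQL parameters cannot stand in for column names.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class ChartSelection:
    """Hypothetical output of a vision step that has read the user's chart selection."""
    metric: str      # column plotted on the y-axis, e.g. "revenue"
    start_date: str  # ISO-8601 dates bounding the highlighted spike
    end_date: str

# Identifiers cannot be bound as parameters, so model-supplied names are allowlisted.
ALLOWED_METRICS = {"revenue", "units"}
ALLOWED_DIMENSIONS = {"region", "channel"}

def drill_down_sql(sel: ChartSelection, dimension: str) -> tuple[str, tuple[str, str]]:
    """Break the selected metric down by one dimension over the selected period."""
    if sel.metric not in ALLOWED_METRICS or dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown metric or dimension: {sel.metric!r}, {dimension!r}")
    sql = (
        f"SELECT {dimension}, SUM({sel.metric}) AS total FROM daily_sales "
        f"WHERE sale_date BETWEEN ? AND ? GROUP BY {dimension} ORDER BY total DESC"
    )
    return sql, (sel.start_date, sel.end_date)

def make_demo_db() -> sqlite3.Connection:
    """Toy data: a revenue spike on 10-12 March, plus one row outside the selection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (sale_date TEXT, region TEXT, "
                 "channel TEXT, revenue REAL, units INTEGER)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?, ?)", [
        ("2024-03-10", "EU", "web", 500.0, 5),
        ("2024-03-11", "US", "web", 1200.0, 9),
        ("2024-03-12", "US", "store", 300.0, 2),
        ("2024-04-01", "EU", "store", 50.0, 1),
    ])
    return conn

if __name__ == "__main__":
    conn = make_demo_db()
    sql, params = drill_down_sql(ChartSelection("revenue", "2024-03-10", "2024-03-12"), "region")
    print(conn.execute(sql, params).fetchall())  # [('US', 1500.0), ('EU', 500.0)]
```

Which dimensions to break the spike down by is itself a choice the system would have to make; the allowlist only bounds what it can choose.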

Beyond Simple Translation

Early NL-to-SQL systems treated the problem as straightforward translation: parse the natural language question, map entities to table/column names, and generate SQL. This approach handles simple questions ("How many orders were placed last month?") but fails on the complex, ambiguous, contextual questions that real users ask:

Ambiguous references: "Show me our best customers" requires understanding what "best" means in context—highest revenue? Most frequent purchases? Longest tenure?
Implicit joins: "What products do our top customers buy?" requires joining customer, order, and product tables without the user specifying the join path
Temporal context: "How did sales change after the price increase?" requires identifying when the price increase occurred (from the database) and comparing sales before and after
Conversational context: "Now break that down by region" refers to the previous query result—the system must maintain conversational state
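To make the implicit-join and temporal cases concrete, here are hand-written queries of the kind a system would have to produce, run against a toy schema. Everything here is an assumption for illustration: the table names, the reading of "top customers" as highest total revenue (one of the interpretations listed above), and the simplification that each product has at most one recorded price change. These are not outputs of the cited systems.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id), order_date TEXT);
CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                          product_id INTEGER REFERENCES products(id),
                          quantity INTEGER, unit_price REAL);
CREATE TABLE price_changes (product_id INTEGER REFERENCES products(id), changed_on TEXT);
"""

# "What products do our top customers buy?" The user never names the join path
# orders -> order_items -> products; the system has to infer it from the schema.
TOP_CUSTOMER_PRODUCTS = """
WITH top_customers AS (
    SELECT o.customer_id, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM orders o JOIN order_items oi ON oi.order_id = o.id
    GROUP BY o.customer_id ORDER BY revenue DESC LIMIT ?
)
SELECT DISTINCT p.name
FROM top_customers t
JOIN orders o ON o.customer_id = t.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
ORDER BY p.name
"""

# "How did sales change after the price increase?" The change date is looked up
# in the data rather than supplied by the user (assumes one change per product).
UNITS_AROUND_PRICE_CHANGE = """
SELECT SUM(CASE WHEN o.order_date <  pc.changed_on THEN oi.quantity ELSE 0 END) AS units_before,
       SUM(CASE WHEN o.order_date >= pc.changed_on THEN oi.quantity ELSE 0 END) AS units_after
FROM price_changes pc
JOIN order_items oi ON oi.product_id = pc.product_id
JOIN orders o ON o.id = oi.order_id
WHERE pc.product_id = ?
"""

def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bo"), (3, "Cy")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(1, "Widget"), (2, "Gadget"), (3, "Gizmo")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 1, "2024-01-05"), (2, 2, "2024-01-20"),
                      (3, 3, "2024-02-10"), (4, 1, "2024-02-15")])
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)",
                     [(1, 1, 10, 5.0), (2, 2, 3, 20.0), (3, 3, 1, 10.0), (4, 1, 4, 6.0)])
    conn.execute("INSERT INTO price_changes VALUES (1, '2024-02-01')")  # Widget price rose
    return conn

if __name__ == "__main__":
    conn = make_demo_db()
    print(conn.execute(TOP_CUSTOMER_PRODUCTS, (2,)).fetchall())     # [('Gadget',), ('Widget',)]
    print(conn.execute(UNITS_AROUND_PRICE_CHANGE, (1,)).fetchall())  # [(10, 4)]
```

Comparing all history before the change against all history after it is a simplification; a careful answer would compare windows of equal length, which is exactly the kind of implicit constraint the system would need to infer.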

LLM-based approaches handle these challenges through the model's understanding of language semantics and its ability to reason about database schema in context. The schema—table names, column types, foreign key relationships—is provided as context, and the LLM generates SQL that reflects both the question's intent and the schema's structure.
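A minimal sketch of "schema provided as context", assuming a SQLite database: tables, declared column types, and foreign keys are read through SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`, earlier turns are appended so follow-ups such as "now break that down by region" have something to refer to, and generated SQL is checked against the live schema with `EXPLAIN`, which compiles a statement without running it. The helper names, the prompt wording, and the omission of the actual LLM call are all assumptions, not the format used in the cited work.

```python
import sqlite3

def describe_schema(conn: sqlite3.Connection) -> str:
    """Serialize tables, declared column types and foreign keys into compact prompt text."""
    lines = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = [f"{c[1]} {c[2]}" for c in conn.execute(f'PRAGMA table_info("{table}")')]
        lines.append(f"TABLE {table}({', '.join(cols)})")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            lines.append(f"  FK {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
    return "\n".join(lines)

def build_prompt(conn: sqlite3.Connection, question: str, history=()) -> str:
    """Assemble schema, earlier (question, SQL) turns, and the new question."""
    parts = ["Translate the question into one SQLite SELECT statement.",
             "Schema:", describe_schema(conn)]
    for prev_question, prev_sql in history:
        parts += [f"Earlier question: {prev_question}", f"Earlier SQL: {prev_sql}"]
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)

def compiles(conn: sqlite3.Connection, sql: str) -> bool:
    """True if SQLite can compile the statement against the current schema (nothing is run)."""
    try:
        conn.execute("EXPLAIN " + sql)
        return True
    except (sqlite3.Error, sqlite3.Warning):  # unknown table/column, syntax error, multiple statements
        return False
```

The compile check catches hallucinated tables and columns before anything reaches the user; it cannot tell whether a query that compiles actually answers the question.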

ML-Enhanced Query Optimization

Wan's complementary work [2] addresses a parallel challenge inside the database management system itself: optimizing query execution with machine learning. Traditional query optimizers rely on plan enumeration and cost estimation to choose a query plan, and inaccurate cost estimates can lead them to select inefficient plans. Wan proposes a tree-structured query plan representation, combined with attention and ranking-based learning, to improve the accuracy of cost estimation and, in turn, plan selection.
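The abstract names the ingredients (a tree-structured plan representation, attention, and ranking-based learning) but not the architecture, so the sketch below illustrates only the ranking idea with deliberately simple stand-ins. Plans are encoded bottom-up by adding each node's own features to the sum of its children's encodings (no attention), and a linear scorer is trained with a pairwise logistic loss so that, within each pair of candidate plans for the same query, the faster plan scores lower. Operator names, features, and training pairs are hypothetical. Training on pairs rather than absolute costs reflects the point that an optimizer only needs to put candidate plans in the right order.

```python
import math
import random
from dataclasses import dataclass, field

OPERATORS = ["SeqScan", "IndexScan", "HashJoin", "NestedLoop"]

@dataclass
class PlanNode:
    op: str
    est_rows: float
    children: list["PlanNode"] = field(default_factory=list)

def encode(node: PlanNode) -> list[float]:
    """Bottom-up tree encoding: one-hot operator + log row estimate, plus children's encodings."""
    vec = [1.0 if node.op == op else 0.0 for op in OPERATORS] + [math.log1p(node.est_rows)]
    for child in node.children:
        vec = [a + b for a, b in zip(vec, encode(child))]
    return vec

def score(w: list[float], plan: PlanNode) -> float:
    """Lower score = plan predicted to be cheaper."""
    return sum(wi * xi for wi, xi in zip(w, encode(plan)))

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_pairwise(pairs, epochs=200, lr=0.1, seed=0) -> list[float]:
    """pairs: (faster_plan, slower_plan) measured on the same query.

    SGD on log(1 + exp(score(faster) - score(slower))).
    """
    rng = random.Random(seed)
    pairs = list(pairs)  # copy so shuffling does not reorder the caller's list
    w = [0.0] * (len(OPERATORS) + 1)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for fast, slow in pairs:
            diff = [a - b for a, b in zip(encode(fast), encode(slow))]
            margin = sum(wi * d for wi, d in zip(w, diff))
            g = sigmoid(margin)  # derivative of the loss with respect to the margin
            w = [wi - lr * g * d for wi, d in zip(w, diff)]
    return w

def demo_pairs():
    """Two toy queries: hash join wins on large inputs, index nested loop on small ones."""
    big = [PlanNode("SeqScan", 1000), PlanNode("SeqScan", 1000)]
    small_idx = [PlanNode("IndexScan", 10), PlanNode("IndexScan", 10)]
    small_seq = [PlanNode("SeqScan", 10), PlanNode("SeqScan", 10)]
    return [
        (PlanNode("HashJoin", 1000, big), PlanNode("NestedLoop", 1000, big)),
        (PlanNode("NestedLoop", 10, small_idx), PlanNode("HashJoin", 10, small_seq)),
    ]

if __name__ == "__main__":
    pairs = demo_pairs()
    w = train_pairwise(pairs)
    print(all(score(w, fast) < score(w, slow) for fast, slow in pairs))
```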

This work is independent of the NL-to-SQL translation step—it operates at the DBMS level regardless of whether queries arrive via natural language, hand-written SQL, or any other source. Its relevance to the NL-to-SQL context is that any gains in query optimization directly improve the execution efficiency of queries, including those generated by NL-to-SQL systems. The two capabilities—translation and optimization—are complementary layers in a complete data-access pipeline.

Claims and Evidence

Claim	Evidence	Verdict
LLMs improve NL-to-SQL accuracy over rule-based approaches	Consistent finding across multiple NL-to-SQL benchmarks	✅ Supported
Multi-modal input (text + visual) enables richer queries	Zhang demonstrates chart-referencing queries	✅ Demonstrated
NL-to-SQL is production-ready for all query types	Complex analytical queries with multiple joins remain challenging	⚠️ Simple queries: yes; complex: improving
ML-based query plan optimization improves DBMS execution efficiency	Wan demonstrates ranking-learning-based plan selection improvement	✅ Supported
Non-technical users can effectively query databases through NL	Limited user study evidence; usability depends on system's ability to handle ambiguity	⚠️ Promising, needs user validation

Open Questions

Error communication: When the NL-to-SQL system generates incorrect SQL, how should it communicate the error to a non-technical user? Showing the SQL is unhelpful; showing wrong results without warning is dangerous.

Schema complexity: Enterprise databases may have thousands of tables with cryptic column names (CUST_ACCT_STAT_CD). How do LLMs handle schemas where the column names provide little semantic information?
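One common mitigation, not described in the cited abstracts, is schema linking: pair cryptic physical names with a human-maintained data dictionary and pass the LLM only the tables most relevant to the question, annotated with their descriptions. The sketch below uses naive keyword overlap for the ranking step (production systems more often use embedding retrieval). Apart from `CUST_ACCT_STAT_CD`, every table, column, and description is hypothetical.

```python
import re

# Hypothetical data dictionary: cryptic physical names -> plain-language descriptions.
# The "_table" key holds the table-level description.
DATA_DICTIONARY = {
    "CUST_ACCT": {
        "_table": "customer accounts",
        "CUST_ACCT_STAT_CD": "customer account status code (active, suspended, closed)",
        "CUST_ACCT_OPEN_DT": "date the customer account was opened",
    },
    "ORD_HDR": {
        "_table": "order headers",
        "ORD_DT": "date the order was placed",
        "ORD_TOT_AMT": "total order amount in USD",
    },
    "INV_LOC": {
        "_table": "warehouse inventory by location",
        "QTY_ON_HND": "quantity on hand",
    },
}

STOPWORDS = {"a", "an", "the", "of", "in", "on", "by", "is", "are", "was", "what", "how", "many"}

def tokens(text: str) -> set[str]:
    """Lowercase word set with common function words removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def rank_tables(question: str, top_k: int = 2) -> list[str]:
    """Keep up to top_k tables whose descriptions share at least one word with the question."""
    q = tokens(question)
    scored = sorted(
        ((len(q & tokens(" ".join(cols.values()))), table)
         for table, cols in DATA_DICTIONARY.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [table for overlap, table in scored[:top_k] if overlap > 0]

def annotated_schema(tables: list[str]) -> str:
    """Render the selected tables with descriptions so the LLM sees meanings, not just codes."""
    lines = []
    for table in tables:
        cols = DATA_DICTIONARY[table]
        lines.append(f"TABLE {table} -- {cols['_table']}")
        lines += [f"  {name} -- {desc}" for name, desc in cols.items() if name != "_table"]
    return "\n".join(lines)
```

This shifts the problem rather than solving it: the quality of the answer now depends on someone writing and maintaining the dictionary.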

Security and access control: NL-to-SQL must respect the user's data access permissions. A query that is syntactically correct but accesses data the user is not authorized to see must be blocked. How do we integrate row-level and column-level security into the generation pipeline?
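One place to enforce this is the database connection rather than the generator: whatever SQL the model writes runs under an authorizer that refuses anything the user's role may not read. The sketch below uses Python's `sqlite3.Connection.set_authorizer`, which SQLite consults while compiling each statement, to allow reads of allowlisted columns only and to deny writes, DDL, and every other action. The role name and the `POLICY` mapping are hypothetical. This covers column-level control only; row-level filtering would typically rely on views or the DBMS's own row-security features.

```python
import sqlite3

# Hypothetical policy: role -> table -> columns that role may read.
POLICY = {"analyst": {"orders": {"id", "region", "amount"}}}

def restrict(conn: sqlite3.Connection, role: str) -> None:
    """Install an authorizer so every later statement on conn is checked against POLICY."""
    allowed = POLICY.get(role, {})

    def authorizer(action, arg1, arg2, db_name, trigger_or_view):
        if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION):
            return sqlite3.SQLITE_OK
        if action == sqlite3.SQLITE_READ:
            # arg1 is the table, arg2 the column ("" when no column is read, e.g. COUNT(*)).
            columns = allowed.get(arg1)
            if columns is not None and (arg2 == "" or arg2 in columns):
                return sqlite3.SQLITE_OK
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_DENY  # writes, DDL, PRAGMA, ATTACH, transactions, ...

    conn.set_authorizer(authorizer)

def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, card_number TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(1, "EU", 10.0, "4111"), (2, "US", 25.0, "5500")])
    conn.commit()  # finish setup before the restrictions are installed
    return conn

def rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """True if SQLite refuses the statement (denied by the authorizer, or otherwise invalid)."""
    try:
        conn.execute(sql).fetchall()
        return False
    except sqlite3.DatabaseError:
        return True
```

A denied statement fails at compile time, so the user can be told the question touches data outside their permissions instead of receiving partial or silently filtered results.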

Confidence calibration: Can the system express confidence in its SQL generation? A user should know whether the system is confident it understood their question or is guessing—information that determines whether the result can be trusted without manual verification.
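One heuristic from the broader NL-to-SQL literature (not from the cited papers) is execution-based self-consistency: sample several candidate queries for the same question, run them, and treat agreement among their result sets as a rough confidence signal. The sketch assumes the candidates have already been sampled (they are hard-coded in the test) and uses SQLite for execution. Agreement is a proxy, not a calibrated probability; any threshold for "show the result" versus "ask a clarifying question" would be a tuning choice.

```python
import sqlite3
from collections import Counter

def consensus(conn: sqlite3.Connection, candidates: list[str]):
    """Execute each candidate SQL; return (majority result, share of candidates agreeing).

    Candidates that fail to execute count as non-agreeing votes. Rows are compared
    as unordered multisets (sorted by repr only to get a canonical, hashable form),
    which ignores ORDER BY differences: an assumption that is wrong for questions
    where the order itself is the answer.
    """
    votes = Counter()
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except (sqlite3.Error, sqlite3.Warning):
            continue
        votes[tuple(sorted(rows, key=repr))] += 1
    if not votes:
        return None, 0.0
    result, count = votes.most_common(1)[0]
    return list(result), count / len(candidates)
```

With four candidates where two agree, one computes a different aggregate, and one fails to run, the agreement is 0.5: a signal to flag the answer rather than present it as settled.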

Training data bias: NL-to-SQL models trained on benchmark datasets may not generalize to enterprise-specific terminology, table structures, and query patterns. How much enterprise-specific training is needed for production deployment?

What This Means for Your Research

For database researchers, NL-to-SQL via LLMs shifts the research frontier from parsing techniques to intent understanding—ensuring that generated SQL captures what users mean, not just what they say. The multi-modal extension (referencing visualizations, documents, and previous query results) opens a rich design space for conversational data analysis.

For enterprise data teams, NL-to-SQL is approaching the threshold where it can meaningfully expand data access beyond the SQL-literate minority. The practical advice: pilot with simple reporting queries where incorrect results are easily verified, and expand to complex analytics as the system demonstrates reliability.
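One way to make "easily verified" measurable during such a pilot is execution accuracy, the metric used by benchmarks such as Spider and BIRD: keep a small set of reporting questions with analyst-written reference SQL and count how often the system's query returns the same rows. In the sketch, `generate_sql` is a stand-in for whatever NL-to-SQL system is being piloted, and the questions and tables in the test are hypothetical.

```python
import sqlite3
from typing import Callable

def execution_accuracy(conn: sqlite3.Connection,
                       cases: list[tuple[str, str]],
                       generate_sql: Callable[[str], str]):
    """cases: (question, reference_sql) pairs written by an analyst.

    A generated query counts as correct when it returns the same rows as the
    reference (order-insensitive). Returns (accuracy, questions that failed).
    """
    failed = []
    for question, reference_sql in cases:
        expected = sorted(conn.execute(reference_sql).fetchall(), key=repr)
        try:
            got = sorted(conn.execute(generate_sql(question)).fetchall(), key=repr)
        except (sqlite3.Error, sqlite3.Warning):
            got = None  # SQL that does not run counts as a failure
        if got != expected:
            failed.append(question)
    return 1 - len(failed) / len(cases), failed
```

The list of failed questions matters more than the headline number: it shows which kinds of questions should stay with analysts until the system handles them reliably.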


References (2)

[1] Zhang, X. (2025). An Intelligent Database Query and Management System Based on NLP and Multi-Modal Large Models. IEEE DSIS.


[2] Wan, S. (2025). Research on Improving the Performance of Query Optimization Framework Based on ML in DBMS. IEEE AIC.