
Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect

Naihao Deng (University of Michigan)
Yulong Chen (Westlake University)
Yue Zhang (Westlake University)

Abstract

Text-to-SQL has attracted attention from both the natural language processing and database communities because of its ability to convert the semantics in natural language into SQL queries and its practical application in building natural language interfaces to database systems. The major challenges in text-to-SQL lie in encoding the meaning of natural utterances, decoding to SQL queries, and translating the semantics between these two forms. These challenges have been addressed to different extents by recent advances. However, there is still a lack of comprehensive surveys for this task. To this end, we review recent progress on text-to-SQL in terms of datasets, methods, and evaluation, and provide this systematic survey, addressing the aforementioned challenges and discussing potential future directions. We hope that this survey can serve as quick access to existing work and motivate future research.¹

[Figure 1: The framework for text-to-SQL systems. Given the database schema and user utterance, the system outputs a corresponding SQL query to query the database system for the result. Appendix B gives more text-to-SQL examples. The figure shows an end user asking "What are the major cities in the state of Kansas?" over a database, with the model producing: SELECT T1.CITY_NAME FROM CITY AS T1 WHERE T1.POPULATION > 150000 AND T1.STATE_NAME = "Kansas" ;]

1 Introduction

The task of text-to-SQL is to convert natural utterances into SQL queries (Zhong et al., 2017; Yu et al., 2018c). Figure 1 shows an example. Given a user utterance "What are the major cities in the state of Kansas?", the system outputs a corresponding SQL query that can be used for retrieving the answer from a database. It builds a natural language interface to the database (NLIDB) to help lay users access information in the database (Popescu et al., 2003; Li and Jagadish, 2014), inspiring research in human-computer interaction (Elgohary et al., 2020). Because the SQL query can be regarded as a semantic representation (Guo et al., 2020), text-to-SQL is also a representative task in semantic parsing, helping downstream applications such as question answering (Wang et al., 2020d). Thus, text-to-SQL has attracted researchers from both the natural language processing (NLP) and database (DB) communities for decades (Codd, 1970; Hemphill et al., 1990; Dahl et al., 1994; Zelle and Mooney, 1996; Popescu et al., 2003; Bertomeu et al., 2006; Wang et al., 2020a; Scholak et al., 2021b).

The challenges in text-to-SQL lie within three aspects: (1) extracting the meaning of natural utterances (encoding); (2) transforming the extracted meaning into another expression that is pragmatically equivalent to the NL meaning (translating); and (3) producing the corresponding SQL queries (decoding). A wide range of methods has been investigated to address these technical challenges, from representation learning, intermediate structures, decoding, model structures, and training objectives to other perspectives. In addition, much work has been conducted on data resources and evaluation. However, relatively little work has been done in the literature to provide a comprehensive survey of the landscape. The only exceptions are Katsogiannis-Meimarakis and Koutrika (2021) and Kalajdjieski et al. (2020), but they cover a limited scope. To this end, we aim to provide a systematic survey that involves a broader range of text-to-SQL research and addresses the aforementioned challenges.

In this paper, we survey the recent progress on text-to-SQL, from datasets (§ 2) and methods (§ 3) to evaluation (§ 4),² and highlight potential directions for future work (§ 5). Appendix A shows the topology for the text-to-SQL task.

¹ The GitHub link for this survey is: https://github.com/text-to-sql-survey-coling22/text-to-sql-survey-coling22.github.io.
² Note that most work discussed in this paper is in English unless otherwise specified.

Proceedings of the 29th International Conference on Computational Linguistics, pages 2166–2187, October 12–17, 2022.
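To make the framework in Figure 1 concrete, the following sketch runs the example query against a toy SQLite database. The CITY rows are invented for illustration, and the figure's double-quoted string is rewritten with single quotes for portability; in a real system the SQL string would be produced by the text-to-SQL model, not hard-coded.

```python
import sqlite3

# Toy database mirroring the CITY table of Figure 1 (rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CITY (CITY_NAME TEXT, POPULATION INTEGER, STATE_NAME TEXT)")
conn.executemany(
    "INSERT INTO CITY VALUES (?, ?, ?)",
    [("wichita", 382368, "Kansas"),
     ("topeka", 118209, "Kansas"),
     ("omaha", 409416, "Nebraska")],
)

# The utterance a text-to-SQL model would map to the query below.
utterance = "What are the major cities in the state of Kansas?"
sql = ("SELECT T1.CITY_NAME FROM CITY AS T1 "
       "WHERE T1.POPULATION > 150000 AND T1.STATE_NAME = 'Kansas'")

# Executing the predicted SQL query retrieves the answer for the end user.
result = [row[0] for row in conn.execute(sql)]
print(result)  # ['wichita']
```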
Datasets                                #Size   #DB     #D   #T/DB  Issues addressed           Sources for data
Spider (Yu et al., 2018c)               10,181  200     138  5.1    Domain generalization      College courses, DatabaseAnswers, WikiSQL
WikiSQL (Zhong et al., 2017)            80,654  26,521  -    1      Data size                  Wikipedia
Squall (Shi et al., 2020b)              11,468  1,679   -    1      Lexicon-level supervision  WikiTableQuestions
KaggleDBQA (Lee et al., 2021)           272     8       8    2.3    Domain generalization      Real web databases
--------------------------------------------------------------------------------------------------------------------
IMDB (Yaghmazadeh et al., 2017)         131     1       1    16     -                          Internet Movie Database
Yelp (Yaghmazadeh et al., 2017)         128     1       1    7      -                          Yelp website
Advising (Finegan-Dollak et al., 2018)  3,898   1       1    10     -                          University of Michigan course information
MIMICSQL (Wang et al., 2020d)           10,000  1       1    5      -                          Healthcare domain
SEDE (Hazoom et al., 2021)              12,023  1       1    29     SQL template diversity     Stack Exchange

Table 1: The statistics for recent text-to-SQL datasets. #Size, #DB, #D, and #T/DB represent the numbers of question-SQL pairs, databases, domains, and the averaged number of tables per DB, respectively. The "-" in the #D column indicates an unknown number of domains, and the "-" in the Issues Addressed column indicates no specific issue addressed by the dataset. Datasets above and below the line are cross-domain and single-domain, respectively. The complete statistics are listed in Table 7 in Appendix C.

2 Datasets

As shown in Table 1, existing text-to-SQL datasets can be classified into three categories: single-domain datasets, cross-domain datasets, and others.

Single-Domain Datasets  Single-domain text-to-SQL datasets typically collect question-SQL pairs for a single database in some real-world task, including early ones such as Academic (Li and Jagadish, 2014), Advising (Finegan-Dollak et al., 2018), ATIS (Price, 1990; Dahl et al., 1994), GeoQuery (Zelle and Mooney, 1996), Yelp and IMDB (Yaghmazadeh et al., 2017), Scholar (Iyer et al., 2017), and Restaurants (Tang and Mooney, 2000; Popescu et al., 2003), as well as recent ones such as SEDE (Hazoom et al., 2021), ESQL (Chen et al., 2021a), and MIMICSQL (Wang et al., 2020d).

These single-domain datasets, particularly the early ones, are usually limited in size, containing only a few hundred to a few thousand examples. Because of the limited size and the similar SQL patterns in the training and testing phases, text-to-SQL models trained on these single-domain datasets can achieve decent performance by simply memorizing the SQL patterns, and fail to generalize to unseen SQL queries or SQL queries from other domains (Finegan-Dollak et al., 2018; Yu et al., 2018c). However, since these datasets are adapted from real-life applications, most of them contain domain knowledge (Gan et al., 2021b) and dataset conventions (Suhr et al., 2020). Thus, they are still valuable for evaluating models' ability to generalize to new domains and for exploring how to incorporate domain knowledge and dataset conventions into model predictions. Appendix B gives a detailed discussion of domain knowledge and dataset conventions, along with concrete text-to-SQL examples.

Large-Scale Cross-domain Datasets  Large cross-domain datasets such as WikiSQL (Zhong et al., 2017) and Spider (Yu et al., 2018c) are proposed to better evaluate deep neural models. WikiSQL uses tables extracted from Wikipedia and lets annotators paraphrase questions generated for the tables. Compared to other datasets, WikiSQL is an order of magnitude larger, containing 80,654 natural utterances in total (Zhong et al., 2017). However, WikiSQL contains only simple SQL queries, and only a single table is queried within each SQL query (Yu et al., 2018c).

Yu et al. (2018c) propose Spider, which contains 200 databases with an average of 5 tables per database, to test models' performance on complicated unseen SQL queries and their ability to generalize to new domains. Furthermore, researchers expand Spider to study various issues of their interest (Lei et al., 2020; Zeng et al., 2020; Gan et al., 2021b; Taniguchi et al., 2021; Gan et al., 2021a).
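The contrast between WikiSQL-style and Spider-style queries can be illustrated against a toy two-table SQLite schema (the table and column names here are hypothetical, merely in the flavor of Spider's databases):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-table schema of the kind found in Spider databases.
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER,
                      year INTEGER,
                      FOREIGN KEY (singer_id) REFERENCES singer(singer_id));
INSERT INTO singer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO concert VALUES (10, 1, 2020), (11, 1, 2021), (12, 2, 2021);
""")

# WikiSQL-style query: a single table, no joins, a simple WHERE clause.
simple = "SELECT name FROM singer WHERE singer_id = 2"

# Spider-style query: a join across tables with grouping and aggregation.
complex_q = """
SELECT s.name, COUNT(*) FROM singer AS s
JOIN concert AS c ON s.singer_id = c.singer_id
GROUP BY s.singer_id ORDER BY COUNT(*) DESC
"""

print(conn.execute(simple).fetchall())     # [('Bob',)]
print(conn.execute(complex_q).fetchall())  # [('Ann', 2), ('Bob', 1)]
```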
Besides, researchers build several large-scale text-to-SQL datasets in languages other than English, such as CSpider (Min et al., 2019a), TableQA (Sun et al., 2020), and DuSQL (Wang et al., 2020c) in Chinese, ViText2SQL (Tuan Nguyen et al., 2020) in Vietnamese, and PortugueseSpider (José and Cozman, 2021) in Portuguese. Given that human translation has been shown to be more accurate than machine translation (Min et al., 2019a), these datasets are annotated mainly by human experts based on the English Spider dataset. These Spider-based datasets can serve as potential resources for multi-lingual text-to-SQL research.

Other Datasets  Several context-dependent text-to-SQL datasets have been proposed, which involve user interactions with the text-to-SQL system in English (Price, 1990; Dahl et al., 1994; Yu et al., 2019a,b) and Chinese (Guo et al., 2021). In addition, researchers collect datasets to study whether questions in text-to-SQL are answerable or not (Zhang et al., 2020), lexicon-level mapping (Shi et al., 2020b), and cross-domain evaluation for real Web databases (Lee et al., 2021). Appendix C.1 discusses more details about the datasets mentioned in § 2.

3 Methods

Early text-to-SQL systems employ rule-based and template-based methods (Li and Jagadish, 2014; Mahmud et al., 2015), which are suitable for simple user queries and databases. However, with the progress in both the DB and NLP communities, recent work focuses on more complex settings (Yu et al., 2018c). In these settings, deep models can be more useful because of their great feature representation and generalization abilities.

In this survey, we focus primarily on deep learning methods. We divide the methods employed in text-to-SQL research into Data Augmentation (§ 3.1), Encoding (§ 3.2), Decoding (§ 3.3), Learning Techniques (§ 3.4), and Miscellaneous (§ 3.5).

3.1 Data Augmentation

Data augmentation can help text-to-SQL models handle complex or unseen questions (Zhong et al., 2020b; Wang et al., 2021b), achieve state-of-the-art results with less supervised data (Guo et al., 2018), and attain robustness towards different types of questions (Radhakrishnan et al., 2020).

Typical data augmentation techniques involve paraphrasing questions and filling pre-defined templates to increase data diversity. Iyer et al. (2017) use the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013) to generate paraphrases for training questions; Appendix B gives an example of this augmentation method. Iyer et al. (2017) and Yu et al. (2018b) collect question-SQL templates and fill them in with DB schemas. Researchers also employ neural models to generate natural utterances for sampled SQL queries to acquire more data. For instance, Li et al. (2020a) fine-tune a pre-trained T5 model (Raffel et al., 2019) on WikiSQL using the SQL query as input to predict the natural utterance, then randomly synthesize SQL queries from tables in WikiSQL and use the tuned model to generate the corresponding natural utterances.

The quality of the augmented data is important because low-quality data can hurt the performance of the models (Wu et al., 2021). Various approaches have been exploited to improve the quality of the augmented data. After sampling SQL queries, Zhong et al. (2020b) employ an utterance generator to produce natural utterances and a semantic parser to convert the generated natural utterances back into SQL queries; to filter out low-quality augmented data, they only keep data whose generated SQL queries match the sampled ones. Wu et al. (2021) use a hierarchical SQL-to-question generation process to obtain high-quality data. Observing that there is a strong segment-level mapping between SQL queries and natural utterances, Wu et al. (2021) decompose SQL queries into several clauses, translate each clause into a sub-question, and then combine the sub-questions into a complete question.

To increase the diversity of the augmented data, Guo et al. (2018) incorporate a latent variable in their SQL-to-text model to encourage question diversity. Radhakrishnan et al. (2020) augment the WikiSQL dataset by simplifying and compressing questions to simulate the colloquial query behavior of end-users. Wang et al. (2021b) exploit a probabilistic context-free grammar (PCFG) to explicitly model the composition of SQL queries, encouraging the sampling of compositional SQL queries.
Methods         Adopted by                    Applied datasets
Encode type     TypeSQL (Yu et al., 2018a)    WikiSQL
Graph-based     GNN (Bogin et al., 2019a)     Spider
Self-attention  RAT-SQL (Wang et al., 2020a)  Spider
Adapt PLM       SQLova (Hwang et al., 2019)   WikiSQL
Pre-training    TaBERT (Yin et al., 2020)     Spider

Table 2: Typical methods used for encoding in text-to-SQL. The full table of existing methods and more details are listed in Table 8 in Appendix D.

3.2 Encoding

Various methods have been adopted to address the challenges of representing the meaning of questions, representing the structure of the DB schema, and linking the DB content to the question. We group them into five categories, as shown in Table 2.

Encode Token Types  To better encode keywords such as entities and numbers in questions, Yu et al. (2018a) assign a type to each word in the question, with a word being an entity from the knowledge graph, a column, or a number. Yu et al. (2018c) concatenate word embeddings and the corresponding type embeddings to feed into their model.

Graph-based Methods  Since DB schemas contain rich structural information, graph-based methods are used to better encode such structures. As summarized in § 2, datasets prior to Spider typically involve simple DBs that contain only one table, or a single DB in both training and testing. As a result, modeling the DB schema received little attention. Because Spider contains complex and different DBs in training and testing, Bogin et al. (2019a) propose to use graphs to represent the structure of the DB schemas. Specifically, Bogin et al. (2019a) use nodes to represent tables and columns, and edges to represent relationships between tables and columns, such as tables containing columns and primary- and foreign-key constraints, and then use graph neural networks (GNNs) (Li et al., 2016) to encode the graph structure. In their subsequent work, Bogin et al. (2019b) use a graph convolutional network (GCN) to capture DB structures and a gated GCN to select the relevant DB information for SQL generation. RAT-SQL (Wang et al., 2020a) encodes more relationships for DB schemas, such as "both columns are from the same table", in their graph.

Graphs have also been used to encode questions together with the DB schema. Researchers have been using different types of graphs to capture the semantics in NL and facilitate linking between NL and table schema. Cao et al. (2021) adopt a line graph (Gross et al., 2018) to capture multi-hop semantics via meta-paths (e.g., an exact match between a question token and a column, together with the column belonging to a table, can form a 2-hop meta-path) and distinguish between local and non-local neighbors so that different tables and columns are attended to differently. SADGA (Cai et al., 2021) adopts the graph structure to provide a unified encoding for both natural utterances and DB schemas to help question-schema linking. Apart from the relations between entities in both questions and the DB schema and the structure of DB schemas, S2SQL (Hui et al., 2022) integrates syntax dependencies among question tokens into the graph to improve model performance. To improve the generalization of graph methods to unseen domains, ShadowGNN (Chen et al., 2021b) ignores the names of tables and columns in the database and uses abstract schemas in a graph projection neural network to obtain delexicalized representations of questions and DB schemas.

Finally, graph-based techniques are also exploited in context-dependent text-to-SQL. For instance, IGSQL (Cai and Wan, 2020) uses a graph encoder to utilize historical information of DB schemas from previous turns.

Self-attention  Models using transformer-based encoders (He et al., 2019; Hwang et al., 2019; Xie et al., 2022) incorporate the original self-attention mechanism by default because it is the building block of the transformer structure. RAT-SQL (Wang et al., 2020a) applies relation-aware self-attention, a modified version of self-attention (Vaswani et al., 2017), to leverage relations between tables and columns. DuoRAT (Scholak et al., 2021a) also adopts relation-aware self-attention in its encoder.

Adapt PLM  Various methods have been proposed to leverage the knowledge in pre-trained language models (PLMs) and better align PLMs with the text-to-SQL task. PLMs such as BERT (Devlin et al., 2019) are used to encode questions and DB schemas. The modus operandi is to input the concatenation of question words and schema words to the BERT encoder (Hwang et al., 2019; Choi et al., 2021).
Other methods adjust the embeddings produced by PLMs. On WikiSQL, for instance, X-SQL (He et al., 2019) replaces the segment embeddings from the pre-trained encoder with column type embeddings. Guo and Gao (2019) encode two additional feature vectors for matching question tokens with table cells and column names, and concatenate them with the BERT embeddings of questions and DB schemas.

HydraNet (Lyu et al., 2020) uses BERT to encode the question and an individual column, aligning with the tasks BERT is pre-trained on. After obtaining the BERT representations of all columns, Lyu et al. (2020) select the top-ranked columns for SQL prediction. Liu et al. (2021b) train an auxiliary concept prediction module to predict which tables and columns correspond to the question. They detect important question tokens by finding the largest drop in the confidence score caused by erasing a token from the question. Lastly, they train the PLM with a grounding module using the question tokens and the corresponding tables and columns. Through empirical studies, Liu et al. (2021b) claim that their approach can awaken the latent grounding within the PLM via this erase-and-predict technique.

Pre-training  Various works propose different pre-training objectives and use different pre-training data to better align the transformer-based encoder with the text-to-SQL task. For instance, TaBERT (Yin et al., 2020) pre-trains BERT on tabular data with the objectives of masked column prediction and cell value recovery. Grappa (Yu et al., 2021) synthesizes question-SQL pairs over tables and pre-trains BERT with the objectives of masked language modeling (MLM), predicting whether a column appears in the SQL query, and predicting which SQL operations are triggered. GAP (Shi et al., 2020a) pre-trains BART (Lewis et al., 2020) on synthesized text-to-SQL and tabular data with the objectives of MLM, column prediction, column recovery, and SQL generation.

3.3 Decoding

Various methods have been proposed for decoding to achieve a fine-grained and easier process for SQL generation and to bridge the gap between natural language and SQL queries. As shown in Table 3, we group these methods into five main categories and other technologies.

Methods    Adopted by                        Applied datasets
Tree       SyntaxSQLNet (Yu et al., 2018b)   Spider
Sketch     SQLNet (Xu et al., 2017)          WikiSQL
Bottom-up  SmBop (Rubin and Berant, 2021)    Spider
Attention  Wang et al. (2019)                WikiSQL
Copy       Wang et al. (2018a)               WikiSQL
IR         IRNet (Guo et al., 2019)          Spider
Others     Global-GCN (Bogin et al., 2019b)  Spider
           Kelkar et al. (2020)              Spider

Table 3: Typical methods used for decoding in text-to-SQL. The full table and more details are listed in Table 9 in Appendix D. IR: Intermediate Representation.

Tree-based  Seq2Tree (Dong and Lapata, 2016) employs a decoder that generates logical forms in a top-down manner. The components in a sub-tree are generated conditioned on their parents in addition to the input question. Note that for Seq2Tree, the syntax of the logical forms is implicitly learned from data. Similarly, Seq2AST (Yin and Neubig, 2017) uses an abstract syntax tree (AST) for decoding the target programming language, where the syntax is explicitly integrated into the AST. Although neither Seq2Tree (Dong and Lapata, 2016) nor Seq2AST (Yin and Neubig, 2017) studies text-to-SQL datasets, their use of trees inspires tree-based decoding in text-to-SQL. SyntaxSQLNet (Yu et al., 2018b) employs a tree-based decoding method specific to SQL syntax and recursively calls modules to predict different SQL components.

Sketch-based  SQLNet (Xu et al., 2017) designs a sketch aligned with the SQL grammar, so that SQLNet only needs to fill in the slots in the sketch rather than predict both the output grammar and the content. Besides, the sketch captures the dependencies among the predictions, so the prediction of one slot is conditioned only on the slots it depends on, which avoids the issue that the same SQL query can have multiple equivalent serializations. Dong and Lapata (2018) decompose decoding into two stages, where the first decoder predicts a rough sketch, and the second decoder fills in the low-level details conditioned on the question and the sketch. Such coarse-to-fine decoding has also been adopted in other works such as IRNet (Guo et al., 2019).
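The sketch-based idea can be illustrated with a simplified WikiSQL-style sketch. SQLNet's actual sketch and its dedicated sub-models are richer than this; the slot values below stand in for model predictions.

```python
# A fixed sketch aligned with a (simplified) WikiSQL grammar: the decoder
# never produces grammar tokens, only values for the named slots.
SKETCH = "SELECT {agg}({sel_col}) FROM {table} WHERE {cond_col} {op} {value}"

def fill_sketch(slots: dict) -> str:
    # In SQLNet, each slot is predicted by a dedicated sub-model conditioned
    # only on the slots it depends on; here the predictions are given.
    return SKETCH.format(**slots)

predicted = {"agg": "MAX", "sel_col": "population", "table": "city",
             "cond_col": "state_name", "op": "=", "value": "'Kansas'"}
query = fill_sketch(predicted)
print(query)
# SELECT MAX(population) FROM city WHERE state_name = 'Kansas'
```

Because only slot values are predicted, two decoders can never disagree on the serialization of the surrounding grammar.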
To address complex SQL queries with nested structures, RYANSQL (Choi et al., 2021) recursively yields SELECT statements and uses sketch-based slot filling for each of the SELECT statements.

Bottom-up  Both the tree-based and the sketch-based decoding mechanisms can be viewed as top-down. Rubin and Berant (2021) use a bottom-up decoding mechanism: given K trees of height t, the decoder scores trees of height t+1 constructed from the current beam according to the SQL grammar, and the K trees with the highest scores are kept. Then, representations of the new K trees are generated and placed in the new beam.

Attention Mechanism  To integrate the encoder-side information during decoding, an attention score is computed and multiplied with the hidden vectors from the encoder to get a context vector, which is then used to generate an output token (Dong and Lapata, 2016; Zhong et al., 2017). Variants of the attention mechanism have been used to better propagate the information encoded from questions and DB schemas to the decoder. SQLNet (Xu et al., 2017) designs column attention, where hidden states from columns are multiplied by embeddings of the question to calculate attention scores for a column given the question. Guo and Gao (2018) incorporate bi-attention over questions and column names for SQL component selection. Wang et al. (2019) adopt structured attention (Kim et al., 2017) by computing marginal probabilities to fill in the slots of their generated abstract SQL queries. DuoRAT (Scholak et al., 2021a) adopts the relation-aware self-attention mechanism in both its encoder and decoder. Other works that use sequence-to-sequence or decoder-only transformer-based models incorporate the self-attention mechanism by default (Scholak et al., 2021b; Xie et al., 2022).

Copy Mechanism  Seq2AST (Yin and Neubig, 2017) and Seq2SQL (Zhong et al., 2017) employ the pointer network (Vinyals et al., 2015) to compute the probability of copying words from the input. Wang et al. (2018a) use types (e.g., columns, SQL operators, constants from questions) to explicitly restrict the locations in the query to copy from, and develop a new training objective to copy only from the first occurrence in the input. In addition, the copy mechanism is also adopted in the context-dependent text-to-SQL task (Wang et al., 2020b).

Intermediate Representations  Researchers use intermediate representations to bridge the gap between natural language and SQL queries. IncSQL (Shi et al., 2018) defines actions for different SQL components and lets the decoder decode actions instead of SQL queries. IRNet (Guo et al., 2019) introduces SemQL, an intermediate representation for SQL queries that can cover most of the challenging Spider benchmark; specifically, SemQL removes the JOIN ON, FROM, and GROUP BY clauses and merges the HAVING and WHERE clauses of SQL queries. ValueNet (Brunner and Stockinger, 2021) uses SemQL 2.0, which extends SemQL to include value representations. Based on SemQL, NatSQL (Gan et al., 2021c) removes the set operators.³ Suhr et al. (2020) implement SemQL as a mapping from SQL to a representation with an under-specified FROM clause, which they call SQL_UF. Rubin and Berant (2021) employ a relational algebra augmented with SQL operators as their intermediate representation.

³ The operators that combine the results of two or more SELECT statements, such as INTERSECT.

However, intermediate representations are usually designed for a specific dataset and cannot be easily adapted to others (Suhr et al., 2020). To construct a more generalized intermediate representation, Herzig et al. (2021) propose to omit tokens in the SQL query that do not align with any phrase in the utterance. Inspired by the success of the text-to-SQL task, intermediate representations are also studied for SPARQL, another executable language for database systems (Saparina and Osokin, 2021; Herzig et al., 2021).

Others  PICARD (Scholak et al., 2021b) and UniSAr (Dou et al., 2022) set constraints on the decoder to prevent it from generating invalid tokens. Several methods adopt an execution-guided decoding mechanism to exclude non-executable partial SQL queries from the output candidates (Wang et al., 2018b; Hwang et al., 2019). Global-GNN (Bogin et al., 2019b) employs a separately trained discriminative model to rerank the top-K SQL queries in the decoder's output beam, which makes it possible to reason about complete SQL queries instead of considering each word and DB schema in isolation. Similarly, Kelkar et al. (2020) train a separate discriminator to better search among candidate SQL queries.
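The execution-guided filtering mentioned above can be sketched with SQLite: candidate queries that raise an error on a trial execution are discarded before any (re)ranking. The schema and the candidate list are toy assumptions for illustration.

```python
import sqlite3

# Toy database; real systems execute against the target DB (or a copy).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_name TEXT, population INTEGER)")

# Hypothetical candidates from a decoder's output beam.
candidates = [
    "SELECT city_name FROM city WHERE population > 150000",  # executable
    "SELECT name FROM city WHERE population > 150000",       # unknown column
    "SELECT city_name FROM town",                            # unknown table
]

def executable(sql: str) -> bool:
    # A candidate survives only if a trial execution raises no error.
    try:
        conn.execute(sql)
        return True
    except sqlite3.Error:
        return False

kept = [sql for sql in candidates if executable(sql)]
print(kept)  # only the first candidate survives
```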
Xu et al. (2017); Yu et al. (2018b); Guo and Gao (2018); Lee (2019) use separate submodules to predict different SQL components, easing the difficulty of generating a complete SQL query. Chen et al. (2020b) employ a gate to select, at each step of SQL generation, between the output sequence encoded for the question and the output sequence from the previous decoding steps. Inspired by machine translation, Müller and Vlachos (2019) apply byte-pair encoding (BPE) (Sennrich et al., 2016) to compress SQL queries into shorter sequences guided by the AST, reducing the difficulty of SQL generation.

3.4 Learning Techniques

Apart from end-to-end supervised learning, different learning techniques have been proposed to help text-to-SQL research. Here we summarize these learning techniques, each addressing a specific issue for the task.

Fully supervised  Ni et al. (2020) adopt active learning to save human annotation effort. Yao et al. (2019, 2020) and Li et al. (2020b) employ interactive or imitation learning to enhance text-to-SQL systems via interactions with end-users. Huang et al. (2018); Wang et al. (2021a); Chen et al. (2021a) adopt meta-learning (Finn et al., 2017) for domain generalization. Various multi-task learning settings have been proposed to improve text-to-SQL models by enhancing their abilities on relevant tasks. Chang et al. (2020) set an auxiliary task of mapping between columns and condition values. SeaD (Xuan et al., 2021) integrates two denoising objectives to help the model better encode information from the structural data. Hui et al. (2021b) integrate a task of learning the correspondence between questions and DB schemas. Shi et al. (2021) integrate a column classification task to classify which columns appear in the SQL query. McCann et al. (2018) and Xie et al. (2022) train their models together with other semantic parsing tasks, which improves the models' performance on the text-to-SQL task.

Weakly supervised  Seq2SQL (Zhong et al., 2017) uses reinforcement learning to learn the WHERE clause, allowing different orders for the components in the WHERE clause. Liang et al. (2018) leverage a memory buffer to reduce the variance of policy gradient estimates when applying reinforcement learning to text-to-SQL. Agarwal et al. (2019) use meta-learning and Bayesian optimization (Snoek et al., 2012) to learn an auxiliary reward that discounts spurious SQL queries in SQL generation. Min et al. (2019b) model the possible SQL queries as a discrete latent variable and adopt hard-EM-style parameter updates, letting their model take advantage of possible pre-computed solutions.

3.5 Miscellaneous

In DB linking, BRIDGE (Lin et al., 2020) appends a representation of the DB cell values mentioned in the question to the corresponding fields in the encoded sequence, which links the DB content to the question. Ma et al. (2020) employ an explicit extractor for slots mentioned in the question and then link them with DB schemas.

Model-wise, Finegan-Dollak et al. (2018) use a template-based model which copies slots from the question. Shaw et al. (2021) use a hybrid model which first applies a high-precision grammar-based approach (NQG) to generate SQL queries, then falls back to T5 (Raffel et al., 2019) if NQG fails. Yan et al. (2020) formulate submodule slot-filling as a machine reading comprehension (MRC) task and apply BERT-based MRC models to it. Besides, DT-Fixup (Xu et al., 2021) designs an optimization approach for training deeper Transformers on small datasets for the text-to-SQL task.

In SQL generation, IncSQL (Shi et al., 2018) allows parsers to explore alternative correct action sequences to generate different SQL queries. Brunner and Stockinger (2021) search values in the DB to insert values into the SQL query.

For context-dependent text-to-SQL, researchers adopt techniques such as turn-level encoders and the copy mechanism (Suhr et al., 2018; Zhang et al., 2019; Wang et al., 2020b), constrained decoding (Wang et al., 2020b), a dynamic memory decay mechanism (Hui et al., 2021a), and treating questions and SQL queries as two modalities with bi-modal pre-trained models (Zheng et al., 2022).

4 Evaluation

Metrics  Table 4 shows widely used automatic evaluation metrics for the text-to-SQL task. Early works evaluate SQL queries by comparing the database querying results of the predicted SQL query and the ground-truth (or gold) SQL query (Zelle and Mooney, 1996; Yaghmazadeh et al., 2017), or use exact string match to compare the predicted SQL query with the gold one (Finegan-Dollak et al., 2018).
Metrics Datasets Errors performance loss when tested against different text-
Naiive Execution GeoQuery, IMDB, False to-SQL datasets from other domains (Suhr et al.,
Accuracy Yelp, WikiSQL, etc positive 2020; Lee et al., 2021). It is unclear how to in-
Advising, WikiSQL, False
Exact String Match corporate domain knowledge to the models trained
etc negative
Exact Set Match Spider
False on Spider and deploy these models efficiently on
negative different domains, especially those with similar in-
Test Suite Accuracy
(execution accuracy Spider, GeoQuery, False formation stored in DB but slightly different DB
with generated etc positive schemas. Although large-scale datasets promote
databases) the cross-domain settings, question-SQL pairs from
Spider are free from domain knowledge, ambiguity,
Table 4: The summary of metrics, datasets that use these
metrics, and their potential error cases. or domain convention. Thus, cross-domain text-to-
SQL needs to be studied in future research to build
a practical cross-domain system that can handle
execution accuracy can create false positives for se-
real-world requests.
mantically different SQL queries even if they yield
the same execution results (Yu et al., 2018c). The There are different use cases in real-world sce-
exact string match can be too strict as two different narios, which requires models to be robust to dif-
strings can still have the same semantics (Zhong ferent settings and be smart to handle different user
et al., 2020a). Aware of these issues, Yu et al. requests. For instance, the model trained with DB
(2018c) adopt exact set match (ESM) in Spider, schemas can need to handle a corrupted table, or no
deciding the correctness of SQL queries by com- table is provided in its practical use. Besides, the
paring the sub-clauses of SQL queries. Zhong et al. input from users can vary from the standard ques-
(2020a) generate databases that can distinguish the tion input in Spider or WikiSQL, which poses chal-
predicted SQL query and gold one. Both methods lenges to models trained on these datasets. More
are used as official metrics on Spider. user studies need to be done to study how well
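The failure modes of these metrics can be illustrated with a minimal, self-contained sketch. The toy CITY table and the queries below are invented for illustration and are not taken from any of the cited benchmarks; the two helper functions are simplified stand-ins for real evaluation scripts.

```python
import sqlite3

# Toy database: a single CITY table with two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_name TEXT, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Wichita", 389000), ("Topeka", 126000)])

def execution_match(pred, gold):
    # Execution accuracy: compare the result sets of the two queries.
    return conn.execute(pred).fetchall() == conn.execute(gold).fetchall()

def exact_string_match(pred, gold):
    # Exact string match: compare whitespace/case-normalized query strings.
    return " ".join(pred.lower().split()) == " ".join(gold.lower().split())

gold = "SELECT city_name FROM city WHERE population > 150000"

# False positive for execution accuracy: a semantically different query
# that happens to return the same rows on this particular database state.
pred1 = "SELECT city_name FROM city WHERE city_name = 'Wichita'"
print(execution_match(pred1, gold))     # True, although the queries differ in meaning

# False negative for exact string match: same semantics, different string.
pred2 = "SELECT city_name FROM city WHERE 150000 < population"
print(execution_match(pred2, gold))     # True
print(exact_string_match(pred2, gold))  # False, although the queries are equivalent
```

Exact set match mitigates the string-match problem by comparing sets of sub-clauses (e.g., the set of WHERE conditions) rather than raw strings, while test suite accuracy reduces execution false positives by executing both queries on many purpose-generated database states instead of a single one.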
the current systems serve the end-users and the in-
Evaluation Setup Early single-domain datasets put pattern from the end-users. Apart from SQL
typically use the standard train/dev/test split (Iyer queries, administrators can want to change DB
et al., 2017) by splitting the question-SQL pairs ran- schemas, where a system that can translate the
domly. To evaluate generalization to unseen SQL natural language to such DB commands can be
queries within the current domain, Finegan-Dollak helpful. Also, although there are already works
et al. (2018) propose SQL query split, where no on text-to-SQL beyond English (Min et al., 2019a;
SQL query is allowed to appear in more than one Tuan Nguyen et al., 2020; José and Cozman, 2021),
set among the train, dev, and test sets. Further- but we still lack a comprehensive study on multi-
more, Yu et al. (2018c) propose a database split, lingual text-to-SQL, which can be challenging but
where the model does not see the databases in the useful in real-life scenarios. Finally, it is important
test set in its training time. Other splitting methods to build NLIDB for people with disabilities. Song
also exist to help different research topics (Shaw et al. (2022) propose speech-to-SQL that translates
et al., 2021; Chang et al., 2020). voice input to SQL queries, which helps visually
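The three splitting strategies can be sketched over a list of (question, SQL, database) triples. The example records, the field names (question, sql, db_id), and the helper functions here are illustrative assumptions, not an actual dataset loader; in particular, the original SQL query split groups templatized (value-anonymized) queries, whereas this sketch groups raw strings.

```python
import random

# Each example pairs a question with its SQL query and source database.
examples = [
    {"question": "Cities in Kansas?", "sql": "SELECT city_name FROM city WHERE state = 'Kansas'", "db_id": "geo"},
    {"question": "Cities in Texas?",  "sql": "SELECT city_name FROM city WHERE state = 'Texas'",  "db_id": "geo"},
    {"question": "All singers?",      "sql": "SELECT name FROM singer",                           "db_id": "concert"},
]

def _group_split(examples, key, test_frac, seed):
    # Hold out whole groups of examples sharing the same value of `key`.
    groups = sorted({ex[key] for ex in examples})
    random.Random(seed).shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    test_keys = set(groups[:n_test])
    train = [ex for ex in examples if ex[key] not in test_keys]
    test = [ex for ex in examples if ex[key] in test_keys]
    return train, test

def question_split(examples, test_frac=0.2, seed=0):
    # Standard split: question-SQL pairs are assigned to sets at random, so
    # near-duplicate SQL queries can appear in both train and test.
    shuffled = random.Random(seed).sample(examples, len(examples))
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

def sql_query_split(examples, test_frac=0.2, seed=0):
    # SQL query split (Finegan-Dollak et al., 2018): no SQL query appears
    # in more than one set.
    return _group_split(examples, key="sql", test_frac=test_frac, seed=seed)

def database_split(examples, test_frac=0.2, seed=0):
    # Database split (Yu et al., 2018c): test-time databases (and thus
    # schemas) are never seen during training.
    return _group_split(examples, key="db_id", test_frac=test_frac, seed=seed)

train, test = database_split(examples)
# No database overlaps between the two sets.
assert not {ex["db_id"] for ex in train} & {ex["db_id"] for ex in test}
```

Grouping by `db_id` is what makes the database split strictly harder than the question split: the model must generalize to unseen schemas, not just unseen questions.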
impaired end users. More work can be done to
5 Discussion and Future Directions address various needs from the perspective of end-
Ever since the LUNAR system (Woods et al., 1972; users, in particular, the needs from minorities.
Woods, 1973), systems for retrieving DB informa- Text-to-SQL research can also be integrated into
tion have witnessed an increasing amount of re- a larger scope of research. Application-wise, Xu
search interest and an enormous growth, especially et al. (2020) develop a question answering system
in the field of text-to-SQL in the deep learning for the database, Chen et al. (2020a) generate task-
era. With the ever-increasing model performance oriented dialogue by retrieving knowledge from the
on the WikiSQL and Spider leaderboards, one can database using the text-to-SQL model. An example
be optimistic because models are becoming more of the possible directions is to employ the text-to-
sophisticated than ever. But there are still several SQL model to query databases for fact-checking.
challenges to overcome. Research-wise, Guo et al. (2020) compare SQL
First, these sophisticated models suffer a great queries to other logical forms in semantic pars-
2173
ing, Xie et al. (2022) include text-to-SQL as one of Stewart for proofreading and suggestions. The
the tasks to achieve a generalized semantic parsing work is funded by the Zhejiang Province Key
framework. The inter-relations between various Project 2022SDXHDX0003.
logical forms in semantic parsing can be further
studied. A generalized framework or a general-
ized model can come as the fruit for our semantic References
parsing community. Rishabh Agarwal, Chen Liang, Dale Schuurmans, and
In hindsight, the development of text-to-SQL Mohammad Norouzi. 2019. Learning to generalize
from sparse and underspecified rewards. In Proceed-
has been pushed by the innovation in the general ings of the 36th International Conference on Machine
ML/NLP community, such as LSTM (Hochreiter Learning, ICML 2019, 9-15 June 2019, Long Beach,
and Schmidhuber, 1997), self-attention (Vaswani California, USA, volume 97 of Proceedings of Ma-
et al., 2017), PLMs (Devlin et al., 2019), etc. Re- chine Learning Research, pages 130–140. PMLR.
cently, prompt learning has achieved decent perfor- Núria Bertomeu, Hans Uszkoreit, Anette Frank, Hans-
mance on various tasks, in particular, in the low- Ulrich Krieger, and Brigitte Jörg. 2006. Contextual
resource setting (Liu et al., 2021a). Such charac- phenomena and thematic relations in database QA
dialogues: results from a Wizard-of-Oz experiment.
teristics align well with the expectation of having a
In Proceedings of the Interactive Question Answer-
functional text-to-SQL model with a few training ing Workshop at HLT-NAACL 2006, pages 1–8, New
samples. Some recent works already explore apply- York, NY, USA. Association for Computational Lin-
ing prompt learning to the text-to-SQL task (Xie guistics.
et al., 2022). The practical expectation for the Shikhar Bharadwaj and Shirish Shevade. 2022. Effi-
text-to-SQL task is to deploy the model in differ- cient constituency tree based encoding for natural
ent scenarios, requiring robustness across domains. language to bash translation. In Proceedings of the
However, prompt learning struggles with being ro- 2022 Conference of the North American Chapter of
the Association for Computational Linguistics: Hu-
bust, and the performance can be easily affected man Language Technologies, pages 3159–3168.
by the selected data. This misalignment encour-
ages researchers to study how to employ prompt Ben Bogin, Jonathan Berant, and Matt Gardner. 2019a.
Representing schema structure with graph neural net-
learning in the real-world text-to-SQL task, which
works for text-to-SQL parsing. In Proceedings of the
can need further understanding of the cross-domain 57th Annual Meeting of the Association for Computa-
challenges for text-to-SQL. tional Linguistics, pages 4560–4565, Florence, Italy.
Another line of research is to evaluate these so- Association for Computational Linguistics.
phisticated text-to-SQL systems. The typical mea- Ben Bogin, Matt Gardner, and Jonathan Berant. 2019b.
sure is to evaluate the performance of the system Global reasoning over database structures for text-
on some existing datasets. As there are operational to-SQL parsing. In Proceedings of the 2019 Confer-
systems using NL input to perform tasks such as ence on Empirical Methods in Natural Language Pro-
cessing and the 9th International Joint Conference
getting answers from database management system on Natural Language Processing (EMNLP-IJCNLP),
or building ontologies or playing some games, the pages 3659–3664, Hong Kong, China. Association
performance of these systems can be measured by for Computational Linguistics.
the diminution of the (human) time taken to get Sridevi Bonthu, S Rama Sree, and MHM Kr-
the searched information (Deng et al., 2021; Zhou ishna Prasad. 2021. Text2PyCode: Machine trans-
et al., 2022). While there are context-dependent lation of natural language intent to python source
text-to-SQL datasets available (Yu et al., 2019a,b), code. In International Cross-Domain Conference for
Machine Learning and Knowledge Extraction, pages
researchers can draw inspirations from other fields 51–60. Springer.
of research (Zellers et al., 2021) to design interac-
tive set-ups to evaluate text-to-SQL systems. Ap- Ursin Brunner and Kurt Stockinger. 2021. Valuenet:
A natural language-to-SQL system that learns from
pendix E discusses tasks relevant to the task of
database information. In 2021 IEEE 37th Inter-
text-to-SQL. national Conference on Data Engineering (ICDE),
pages 2177–2182. IEEE.
Acknowledgement
Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang
Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ra-
Yue Zhang is the corresponding author. We thank madan, and Milica Gašić. 2018. MultiWOZ - a large-
all reviewers for their insightful comments, and scale multi-domain Wizard-of-Oz dataset for task-
Rada Mihalcea, Siqi Shen, Winston Wu and Ian oriented dialogue modelling. In Proceedings of the
2174
2018 Conference on Empirical Methods in Natural Language Processing, pages 5016–5026, Brussels, Belgium. Association for Computational Linguistics.

Ruichu Cai, Jinjie Yuan, Boyan Xu, and Zhifeng Hao. 2021. SADGA: Structure-aware dual graph aggregation network for text-to-SQL. Advances in Neural Information Processing Systems, 34.

Yitao Cai and Xiaojun Wan. 2020. IGSQL: Database schema interaction graph based neural model for context-dependent text-to-SQL generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6903–6912, Online. Association for Computational Linguistics.

Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu, and Kai Yu. 2021. LGESQL: Line graph enhanced text-to-SQL model with mixed local and non-local relations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2541–2555, Online. Association for Computational Linguistics.

Shuaichen Chang, Pengfei Liu, Yun Tang, Jing Huang, Xiaodong He, and Bowen Zhou. 2020. Zero-shot text-to-SQL learning with auxiliary task. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7488–7495.

Chieh-Yang Chen, Pei-Hsin Wang, Shih-Chieh Chang, Da-Cheng Juan, Wei Wei, and Jia-Yu Pan. 2020a. AirConcierge: Generating task-oriented dialogue via efficient large-scale knowledge retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 884–897, Online. Association for Computational Linguistics.

Sanxing Chen, Aidan San, Xiaodong Liu, and Yangfeng Ji. 2020b. A tale of two linkings: Dynamically gating between schema linking and structural linking for text-to-SQL parsing. In Proceedings of COLING-2020, the 28th International Conference on Computational Linguistics, pages 2900–2912, Barcelona, Spain (Online). Association for Computational Linguistics.

Yongrui Chen, Xinnan Guo, Chaojie Wang, Jian Qiu, Guilin Qi, Meng Wang, and Huiying Li. 2021a. Leveraging table content for zero-shot text-to-SQL with meta-learning. ArXiv preprint, abs/2109.05395.

Zhi Chen, Lu Chen, Yanbin Zhao, Ruisheng Cao, Zihan Xu, Su Zhu, and Kai Yu. 2021b. ShadowGNN: Graph projection neural network for text-to-SQL parser. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5567–5577, Online. Association for Computational Linguistics.

DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, and Dong Ryeol Shin. 2021. RYANSQL: Recursively applying sketch-based slot fillings for complex text-to-SQL in cross-domain databases. Computational Linguistics, 47(2):309–332.

E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387.

Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, and Elizabeth Shriberg. 1994. Expanding the scope of the ATIS task: The ATIS-3 corpus. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994.

Naihao Deng, Shuaichen Chang, Peng Shi, Tao Yu, and Rui Zhang. 2021. Prefix-to-SQL: Text-to-SQL generation from incomplete user questions. ArXiv preprint, abs/2109.13066.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33–43, Berlin, Germany. Association for Computational Linguistics.

Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 731–742, Melbourne, Australia. Association for Computational Linguistics.

Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Jian-Guang Lou, Wanxiang Che, and Dechen Zhan. 2022. UniSAr: A unified structure-aware autoregressive language model for text-to-SQL. ArXiv preprint, abs/2203.07781.

Ahmed Elgohary, Saghar Hosseini, and Ahmed Hassan Awadallah. 2020. Speak to your parser: Interactive text-to-SQL with natural language feedback. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2065–2077, Online. Association for Computational Linguistics.

Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, and Dragomir Radev. 2018. Improving text-to-SQL evaluation methodology. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 351–360, Melbourne, Australia. Association for Computational Linguistics.

Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 1126–1135. PMLR.

Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver, John R. Woodward, Jinxia Xie, and Pengsheng Huang. 2021a. Towards robustness of text-to-SQL models against synonym substitution. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2505–2515, Online. Association for Computational Linguistics.

Yujian Gan, Xinyun Chen, and Matthew Purver. 2021b. Exploring underexplored limitations of cross-domain text-to-SQL generalization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8926–8931, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R. Woodward, John Drake, and Qiaofu Zhang. 2021c. Natural SQL: Making SQL easier to infer from natural language specifications. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2030–2042, Punta Cana, Dominican Republic. Association for Computational Linguistics.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The paraphrase database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 758–764, Atlanta, Georgia. Association for Computational Linguistics.

Jonathan L Gross, Jay Yellen, and Mark Anderson. 2018. Graph theory and its applications. Chapman and Hall/CRC.

Daya Guo, Yibo Sun, Duyu Tang, Nan Duan, Jian Yin, Hong Chi, James Cao, Peng Chen, and Ming Zhou. 2018. Question generation from SQL queries improves neural semantic parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1597–1607, Brussels, Belgium. Association for Computational Linguistics.

Jiaqi Guo, Qian Liu, Jian-Guang Lou, Zhenwen Li, Xueqing Liu, Tao Xie, and Ting Liu. 2020. Benchmarking meaning representations in neural semantic parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1520–1540, Online. Association for Computational Linguistics.

Jiaqi Guo, Ziliang Si, Yu Wang, Qian Liu, Ming Fan, Jian-Guang Lou, Zijiang Yang, and Ting Liu. 2021. Chase: A large-scale and pragmatic Chinese dataset for cross-database context-dependent text-to-SQL. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2316–2331, Online. Association for Computational Linguistics.

Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535, Florence, Italy. Association for Computational Linguistics.

Tong Guo and Huilin Gao. 2018. Bidirectional attention for SQL generation. ArXiv preprint, abs/1801.00076.

Tong Guo and Huilin Gao. 2019. Content enhanced BERT-based text-to-SQL generation. ArXiv preprint, abs/1910.07179.

Moshe Hazoom, Vibhor Malik, and Ben Bogin. 2021. Text-to-SQL in the wild: A naturally-occurring dataset based on stack exchange data. In Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021), pages 77–87, Online. Association for Computational Linguistics.

Pengcheng He, Yi Mao, Kaushik Chakrabarti, and Weizhu Chen. 2019. X-SQL: reinforce schema representation with context. ArXiv preprint, abs/1908.08113.

Charles T. Hemphill, John J. Godfrey, and George R. Doddington. 1990. The ATIS spoken language systems pilot corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990.

Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, and Yuan Zhang. 2021. Unlocking compositional generalization in pre-trained models using intermediate representations. ArXiv preprint, abs/2104.07478.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.

Po-Sen Huang, Chenglong Wang, Rishabh Singh, Wen-tau Yih, and Xiaodong He. 2018. Natural language to structured query generation via meta-learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 732–738, New Orleans, Louisiana. Association for Computational Linguistics.

Binyuan Hui, Ruiying Geng, Qiyu Ren, Binhua Li, Yongbin Li, Jian Sun, Fei Huang, Luo Si, Pengfei Zhu, and Xiaodan Zhu. 2021a. Dynamic hybrid relation network for cross-domain context-dependent semantic parsing. ArXiv preprint, abs/2101.01686.

Binyuan Hui, Ruiying Geng, Lihan Wang, Bowen Qin, Bowen Li, Jian Sun, and Yongbin Li. 2022. S2SQL: Injecting syntax to question-schema interaction graph encoder for text-to-SQL parsers. ArXiv preprint, abs/2203.06958.

Binyuan Hui, Xiang Shi, Ruiying Geng, Binhua Li, Yongbin Li, Jian Sun, and Xiaodan Zhu. 2021b. Improving text-to-SQL with schema dependency learning. ArXiv preprint, abs/2103.04399.

Wonseok Hwang, Jinyeong Yim, Seunghyun Park, and Minjoon Seo. 2019. A comprehensive exploration on WikiSQL with table-aware word contextualization. ArXiv preprint, abs/1902.01069.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 963–973, Vancouver, Canada. Association for Computational Linguistics.

Marcelo Archanjo José and Fabio Gagliardi Cozman. 2021. mRAT-SQL+GAP: A Portuguese text-to-SQL transformer. In Brazilian Conference on Intelligent Systems, pages 511–525. Springer.

Jovan Kalajdjieski, Martina Toshevska, and Frosina Stojanovska. 2020. Recent advances in SQL query generation: A survey. ArXiv preprint, abs/2005.07667.

George Katsogiannis-Meimarakis and Georgia Koutrika. 2021. A deep dive into deep learning approaches for text-to-SQL systems. In Proceedings of the 2021 International Conference on Management of Data, pages 2846–2851.

Amol Kelkar, Rohan Relan, Vaishali Bhardwaj, Saurabh Vaichal, Chandra Khatri, and Peter Relan. 2020. Bertrand-DR: Improving text-to-SQL using a discriminative re-ranker. ArXiv preprint, abs/2002.00557.

Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush. 2017. Structured attention networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.

Chia-Hsuan Lee, Oleksandr Polozov, and Matthew Richardson. 2021. KaggleDBQA: Realistic evaluation of text-to-SQL parsers. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2261–2273, Online. Association for Computational Linguistics.

Dongjun Lee. 2019. Clause-wise and recursive decoding for complex and cross-domain text-to-SQL generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6045–6051, Hong Kong, China. Association for Computational Linguistics.

Wenqiang Lei, Weixin Wang, Zhixin Ma, Tian Gan, Wei Lu, Min-Yen Kan, and Tat-Seng Chua. 2020. Re-examining the role of schema linking in text-to-SQL. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6943–6954, Online. Association for Computational Linguistics.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.

Fei Li and Hosagrahar V Jagadish. 2014. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment, 8(1):73–84.

Ning Li, Bethany Keller, Mark Butler, and Daniel Cer. 2020a. SeqGenSQL – a robust sequence generation model for structured query language. ArXiv preprint, abs/2011.03836.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. 2016. Gated graph sequence neural networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.

Yuntao Li, Bei Chen, Qian Liu, Yan Gao, Jian-Guang Lou, Yan Zhang, and Dongmei Zhang. 2020b. “What do you mean by that?” A parser-independent interactive approach for enhancing text-to-SQL. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6913–6922, Online. Association for Computational Linguistics.

Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, and Ni Lao. 2018. Memory augmented policy optimization for program synthesis and semantic parsing. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 10015–10027.

Xi Victoria Lin, Richard Socher, and Caiming Xiong. 2020. Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4870–4888, Online. Association for Computational Linguistics.

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021a. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ArXiv preprint, abs/2107.13586.

Qian Liu, Dejian Yang, Jiahui Zhang, Jiaqi Guo, Bin Zhou, and Jian-Guang Lou. 2021b. Awakening latent grounding from pretrained language models for semantic parsing. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1174–1189, Online. Association for Computational Linguistics.

Qin Lyu, Kaushik Chakrabarti, Shobhit Hathi, Souvik Kundu, Jianwen Zhang, and Zheng Chen. 2020. Hybrid ranking network for text-to-SQL. ArXiv preprint, abs/2008.04759.

Jianqiang Ma, Zeyu Yan, Shuai Pang, Yang Zhang, and Jianping Shen. 2020. Mention extraction and linking for SQL query generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6936–6942, Online. Association for Computational Linguistics.

Tanzim Mahmud, KM Azharul Hasan, Mahtab Ahmed, and Thwoi Hla Ching Chak. 2015. A rule based approach for NLP based query processing. In 2015 2nd International Conference on Electrical Information and Communication Technologies (EICT), pages 78–82. IEEE.

Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The natural language decathlon: Multitask learning as question answering. ArXiv preprint, abs/1806.08730.

Qingkai Min, Yuefeng Shi, and Yue Zhang. 2019a. A pilot study for Chinese SQL semantic parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3652–3658, Hong Kong, China. Association for Computational Linguistics.

Sewon Min, Danqi Chen, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2019b. A discrete hard EM approach for weakly supervised question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2851–2864, Hong Kong, China. Association for Computational Linguistics.

Samuel Müller and Andreas Vlachos. 2019. Byte-pair encoding for text-to-SQL generation. ArXiv preprint, abs/1910.08962.

Ansong Ni, Pengcheng Yin, and Graham Neubig. 2020. Merging weak and active supervision for semantic parsing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8536–8543.

Peter Ochieng. 2020. Parot: Translating natural language to SPARQL. Expert Systems with Applications: X, 5:100024.

Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1470–1480, Beijing, China. Association for Computational Linguistics.

Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 149–157.

P. J. Price. 1990. Evaluation of spoken language systems: the ATIS domain. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990.

Karthik Radhakrishnan, Arvind Srikantan, and Xi Victoria Lin. 2020. ColloQL: Robust cross-domain text-to-SQL over search queries. ArXiv preprint, abs/2010.09927.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. ArXiv preprint, abs/1910.10683.

Ohad Rubin and Jonathan Berant. 2021. SmBoP: Semi-autoregressive bottom-up semantic parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 311–324, Online. Association for Computational Linguistics.

Irina Saparina and Anton Osokin. 2021. SPARQLing database queries from intermediate question decompositions. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8984–8998, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Torsten Scholak, Raymond Li, Dzmitry Bahdanau, Harm de Vries, and Chris Pal. 2021a. DuoRAT: Towards simpler text-to-SQL models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1313–1321, Online. Association for Computational Linguistics.

Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021b. PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9895–9901, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Peter Shaw, Ming-Wei Chang, Panupong Pasupat, and Kristina Toutanova. 2021. Compositional generalization and natural language variation: Can a semantic parsing approach handle both? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 922–938, Online. Association for Computational Linguistics.

Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, and Bing Xiang. 2020a. Learning contextual representations for semantic parsing with generation-augmented pre-training. ArXiv preprint, abs/2012.10309.

Peng Shi, Tao Yu, Patrick Ng, and Zhiguo Wang. 2021. End-to-end cross-domain text-to-SQL semantic parsing with auxiliary task. ArXiv preprint, abs/2106.09588.

Tianze Shi, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, and Weizhu Chen. 2018. IncSQL: Training incremental text-to-SQL parsers with non-deterministic oracles. ArXiv preprint, abs/1809.05054.

Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daumé III, and Lillian Lee. 2020b. On the potential of lexico-logical alignments for semantic parsing to SQL queries. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1849–1864, Online. Association for Computational Linguistics.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 2960–2968.

Yuanfeng Song, Raymond Chi-Wing Wong, Xuefang Zhao, and Di Jiang. 2022. Speech-to-SQL: Towards speech-driven SQL query generation from natural

Alane Suhr, Srinivasan Iyer, and Yoav Artzi. 2018. Learning to map context-dependent sentences to executable formal queries. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2238–2249, New Orleans, Louisiana. Association for Computational Linguistics.

Ningyuan Sun, Xuefeng Yang, and Yunfeng Liu. 2020. TableQA: a large-scale Chinese text-to-SQL dataset for table-aware SQL generation. ArXiv preprint, abs/2006.06434.

Lappoon R. Tang and Raymond J. Mooney. 2000. Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing. In 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 133–141, Hong Kong, China. Association for Computational Linguistics.

Yasufumi Taniguchi, Hiroki Nakayama, Kubo Takahiro, and Jun Suzuki. 2021. An investigation between schema linking and text-to-SQL performance. ArXiv preprint, abs/2102.01847.

Anh Tuan Nguyen, Mai Hoang Dao, and Dat Quoc Nguyen. 2020. A pilot study of text-to-SQL semantic parsing for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4079–4085, Online. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2692–2700.

Bailin Wang, Mirella Lapata, and Ivan Titov. 2021a. Meta-learning for domain generalization in semantic parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 366–379, Online. Association
language question. ArXiv preprint, abs/2201.01209. for Computational Linguistics.

Alane Suhr, Ming-Wei Chang, Peter Shaw, and Ken- Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr
ton Lee. 2020. Exploring unexplored generalization Polozov, and Matthew Richardson. 2020a. RAT-
challenges for cross-database semantic parsing. In SQL: Relation-aware schema encoding and linking
Proceedings of the 58th Annual Meeting of the Asso- for text-to-SQL parsers. In Proceedings of the 58th
ciation for Computational Linguistics, pages 8372– Annual Meeting of the Association for Computational
8388, Online. Association for Computational Lin- Linguistics, pages 7567–7578, Online. Association
guistics. for Computational Linguistics.
2179
Bailin Wang, Ivan Titov, and Mirella Lapata. 2019. Learning semantic parsers from denotations with latent structured alignments and abstract programs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3774–3785, Hong Kong, China. Association for Computational Linguistics.

Bailin Wang, Wenpeng Yin, Xi Victoria Lin, and Caiming Xiong. 2021b. Learning to synthesize data for semantic parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2760–2766, Online. Association for Computational Linguistics.

Chenglong Wang, Marc Brockschmidt, and Rishabh Singh. 2018a. Pointing out SQL queries from text.

Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. 2018b. Robust text-to-SQL generation with execution-guided decoding. ArXiv preprint, abs/1807.03100.

Huajie Wang, Mei Li, and Lei Chen. 2020b. PG-GSQL: Pointer-generator network with guide decoding for cross-domain context-dependent text-to-SQL generation. In Proceedings of COLING 2020, the 28th International Conference on Computational Linguistics, pages 370–380, Barcelona, Spain (Online). Association for Computational Linguistics.

Lijie Wang, Ao Zhang, Kun Wu, Ke Sun, Zhenghua Li, Hua Wu, Min Zhang, and Haifeng Wang. 2020c. DuSQL: A large-scale and pragmatic Chinese text-to-SQL dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6923–6935, Online. Association for Computational Linguistics.

Ping Wang, Tian Shi, and Chandan K. Reddy. 2020d. Text-to-SQL generation for question answering on electronic medical records. In WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, pages 350–361. ACM / IW3C2.

W. Woods, Ronald Kaplan, and Bonnie Webber. 1972. The lunar sciences natural language information system.

William A Woods. 1973. Progress in natural language understanding: an application to lunar geology. In Proceedings of the June 4-8, 1973, national computer conference and exposition, pages 441–450.

Kun Wu, Lijie Wang, Zhenghua Li, Ao Zhang, Xinyan Xiao, Hua Wu, Min Zhang, and Haifeng Wang. 2021. Data augmentation with hierarchical SQL-to-question generation for cross-domain text-to-SQL parsing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8974–8983, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I Wang, et al. 2022. UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. ArXiv preprint, abs/2201.05966.

Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie Chi Kit Cheung, Simon J.D. Prince, and Yanshuai Cao. 2021. Optimizing deeper transformers on small datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2089–2102, Online. Association for Computational Linguistics.

Silei Xu, Sina Semnani, Giovanni Campagna, and Monica Lam. 2020. AutoQA: From databases to QA semantic parsers with only synthetic training data. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 422–434, Online. Association for Computational Linguistics.

Xiaojun Xu, Chang Liu, and Dawn Song. 2017. SQLNet: Generating structured queries from natural language without reinforcement learning. ArXiv preprint, abs/1711.04436.

Kuan Xuan, Yongbo Wang, Yongliang Wang, Zujie Wen, and Yang Dong. 2021. SeaD: End-to-end text-to-SQL generation with schema-aware denoising. ArXiv preprint, abs/2105.07911.

Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: query synthesis from natural language. Proceedings of the ACM on Programming Languages, 1(OOPSLA):1–26.

Zeyu Yan, Jianqiang Ma, Yang Zhang, and Jianping Shen. 2020. SQL generation via machine reading comprehension. In Proceedings of COLING 2020, the 28th International Conference on Computational Linguistics, pages 350–356, Barcelona, Spain (Online). Association for Computational Linguistics.

Ziyu Yao, Yu Su, Huan Sun, and Wen-tau Yih. 2019. Model-based interactive semantic parsing: A unified framework and a text-to-SQL case study. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5447–5458, Hong Kong, China. Association for Computational Linguistics.

Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, and Yu Su. 2020. An imitation game for learning semantic parsers from user interaction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6883–6902, Online. Association for Computational Linguistics.

Xi Ye, Qiaochu Chen, Xinyu Wang, Isil Dillig, and Greg Durrett. 2020. Sketch-driven regular expression generation from natural language and examples. Transactions of the Association for Computational Linguistics, 8:679–694.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada. Association for Computational Linguistics.

Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for joint understanding of textual and tabular data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8413–8426, Online. Association for Computational Linguistics.

Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, and Dragomir Radev. 2018a. TypeSQL: Knowledge-based type-aware neural text-to-SQL generation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 588–594, New Orleans, Louisiana. Association for Computational Linguistics.

Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir R. Radev, Richard Socher, and Caiming Xiong. 2021. GraPPa: Grammar-augmented pre-training for table semantic parsing. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.

Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev. 2018b. SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1653–1663, Brussels, Belgium. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Heyang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, and Dragomir Radev. 2019a. CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1962–1979, Hong Kong, China. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018c. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, Brussels, Belgium. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019b. SParC: Cross-domain semantic parsing in context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4511–4523, Florence, Italy. Association for Computational Linguistics.

John M Zelle and Raymond J Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the national conference on artificial intelligence, pages 1050–1055.

Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, and Yejin Choi. 2021. TuringAdvice: A generative and dynamic evaluation of language use. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4856–4880, Online. Association for Computational Linguistics.

Jichuan Zeng, Xi Victoria Lin, Steven C.H. Hoi, Richard Socher, Caiming Xiong, Michael Lyu, and Irwin King. 2020. Photon: A robust cross-domain text-to-SQL system. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 204–214, Online. Association for Computational Linguistics.

Rui Zhang, Tao Yu, Heyang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. Editing-based SQL query generation for cross-domain context-dependent questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5338–5349, Hong Kong, China. Association for Computational Linguistics.

Yusen Zhang, Xiangyu Dong, Shuaichen Chang, Tao Yu, Peng Shi, and Rui Zhang. 2020. Did you ask a good question? A cross-domain question intention classification benchmark for text-to-SQL. ArXiv preprint, abs/2010.12634.

Yanzhao Zheng, Haibin Wang, Baohua Dong, Xingjun Wang, and Changshan Li. 2022. HIE-SQL: History information enhanced network for context-dependent text-to-SQL semantic parsing. ArXiv preprint, abs/2203.07376.
Ruiqi Zhong, Tao Yu, and Dan Klein. 2020a. Semantic evaluation for text-to-SQL with distilled test suites. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 396–411, Online. Association for Computational Linguistics.

Victor Zhong, Mike Lewis, Sida I. Wang, and Luke Zettlemoyer. 2020b. Grounded adaptation for zero-shot executable semantic parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6869–6882, Online. Association for Computational Linguistics.

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. ArXiv preprint, abs/1709.00103.

Jiawei Zhou, Jason Eisner, Michael Newman, Emmanouil Antonios Platanios, and Sam Thomson. 2022. Online semantic parsing for latency reduction in task-oriented dialogue. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1554–1576, Dublin, Ireland. Association for Computational Linguistics.

A Topology for Text-to-SQL

Table 5 shows the topology for the text-to-SQL task.

B Text-to-SQL Examples

B.1 Table and Database

Table 6 shows an example of a table in the database for the Restaurants dataset. The domain of this dataset is restaurant information, where questions are typically about food type, restaurant location, etc.

Databases differ widely in how many tables they contain: there are 3 tables in the Restaurants database, while there are 32 tables in ATIS (Suhr et al., 2020).

B.2 Domain Knowledge

Question: Will undergrads be okay to take 581 ?
SQL query:

SELECT DISTINCT T1.ADVISORY_REQUIREMENT ,
T1.ENFORCED_REQUIREMENT , T1.NAME
FROM COURSE AS T1
WHERE T1.DEPARTMENT = "EECS"
AND T1.NUMBER = 581 ;

In the Advising dataset, the department "EECS" is considered domain knowledge: "581" in the utterance means the course in the "EECS" department with course number "581".

B.3 Dataset Convention

Question: Give me some restaurants in alameda ?
SQL query:

SELECT T1.HOUSE_NUMBER , T2.NAME
FROM LOCATION AS T1 , RESTAURANT AS T2
WHERE T1.CITY_NAME = "alameda"
AND T2.ID = T1.RESTAURANT_ID ;

In the Restaurants dataset, when the user queries "restaurants", by dataset convention the corresponding SQL query returns the columns "HOUSE_NUMBER" and "NAME".

B.4 Text-to-SQL Templates

An example of the template for a text-to-SQL pair used by Iyer et al. (2017) is as follows:

Question template: Get all <ENT1>.<NAME> having <ENT2>.<COL1>.<NAME> as <ENT2>.<COL1>.<TYPE>
SQL query template:

SELECT <ENT1>.<DEF>
FROM JOIN_FROM(<ENT1>, <ENT2>)
WHERE JOIN_WHERE(<ENT1>, <ENT2>)
AND <ENT2>.<COL1> = <ENT2>.<COL1>.<TYPE> ;
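The dataset convention in B.3 is easy to see by executing the query. Below is a minimal sketch against a toy SQLite instance of the Restaurants schema; the table contents are hypothetical, only the column names follow the example above.

```python
import sqlite3

# Toy instance of the Restaurants schema; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RESTAURANT (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE LOCATION (RESTAURANT_ID INTEGER, HOUSE_NUMBER INTEGER, CITY_NAME TEXT);
INSERT INTO RESTAURANT VALUES (1, 'Zachary''s'), (2, 'La Val''s');
INSERT INTO LOCATION VALUES (1, 1853, 'alameda'), (2, 2516, 'berkeley');
""")

# The dataset-convention query from Appendix B.3:
# it returns HOUSE_NUMBER and NAME, not the full restaurant record.
rows = conn.execute("""
SELECT T1.HOUSE_NUMBER, T2.NAME
FROM LOCATION AS T1, RESTAURANT AS T2
WHERE T1.CITY_NAME = 'alameda' AND T2.ID = T1.RESTAURANT_ID
""").fetchall()
print(rows)  # [(1853, "Zachary's")]
```

Running the query on any instance of the schema returns (house number, name) pairs, which is exactly the convention the dataset encodes.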
Datasets (§2)
- Single-domain: ATIS; GeoQuery; Restaurants; Scholar; Academic; Yelp; IMDB; Advising; MIMICSQL; ESQL (zh); SEDE
- Cross-domain
  - Large Scale: WikiSQL; Spider; Spider-DK; Spider-Utran; Spider-L; Spider-SL; Spider-Syn
  - Others: TableQA (zh); DuSQL (zh); ViText2SQL (vi); CSpider (zh); PortugueseSpider (pt)
- Multi-turn: ATIS; Sparc; CoSQL; Splash; Chase (zh)
- Others: TriageSQL; Squall; KaggleDBQA

Methodologies (§3)
- Data Augmentation
- Encoding: Encode Token Types; Graph-based; Self-attention; Adapt PLM; Pre-training
- Decoding: Tree-based; Sketch-based; Bottom-up; Attention Mechanism; Copy Mechanism; Intermediate Representation; Others
- Learning Techniques
  - Fully Supervised: Active Learning; Interactive/Imitation Learning; Meta-learning; Multi-task learning
  - Weakly Supervised: Reinforcement Learning; Meta-Learning; Bayesian Optimization; Hard-EM-style Parameter Updates
- Miscellaneous

Evaluations (§4)
- Metrics: Exact string match; Exact set match; Execution accuracy
- Split Methods: Example split; SQL query split; Database split

Table 5: Topology for text-to-SQL. Format adapted from Liu et al. (2021a).
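Among the evaluation metrics listed above, execution accuracy compares denotations rather than query strings. A minimal sketch of such a comparison follows; it is order-insensitive but uses a single toy database, whereas Zhong et al. (2020a) harden this against false positives with distilled test suites of many databases. The schema and rows are hypothetical.

```python
import sqlite3

def execution_match(pred_sql: str, gold_sql: str, db: sqlite3.Connection) -> bool:
    """Compare two queries by their result sets on one database.
    Order-insensitive; a single database can still yield false positives,
    which is why test-suite evaluation executes on many databases."""
    try:
        pred = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # an unexecutable prediction counts as wrong
    gold = db.execute(gold_sql).fetchall()
    return sorted(map(repr, pred)) == sorted(map(repr, gold))

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE city (city_name TEXT, state_name TEXT, population INTEGER);
INSERT INTO city VALUES ('Wichita', 'Kansas', 389000), ('Topeka', 'Kansas', 126000);
""")
gold = "SELECT city_name FROM city WHERE state_name = 'Kansas' AND population > 150000"
pred = "SELECT city_name FROM city WHERE population > 150000 AND state_name = 'Kansas'"
print(execution_match(pred, gold, db))  # True: different strings, same denotation
```

The example also shows why exact string match is too strict: the two queries differ textually but are semantically identical.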

They populate the slots in the templates with table and column names from the database schema and join the corresponding tables accordingly.

Generated question: Get all author having dataset as DATASET_TYPE
Generated SQL query:

SELECT author.authorId
FROM author , writes , paper , paperDataset , dataset
WHERE author.authorId = writes.authorId
AND writes.paperId = paper.paperId
AND paper.paperId = paperDataset.paperId
AND paperDataset.datasetId = dataset.datasetId
AND dataset.datasetName = DATASET_TYPE ;

An example of PPDB (Ganitkevitch et al., 2013) paraphrasing is "thrown into jail" and "imprisoned". The English portion of PPDB contains over 220 million paraphrase pairs.

CITY_NAME* | COUNTY | REGION
VARCHAR(255) | VARCHAR(255) | VARCHAR(255)
Alameda | Alameda County | Bay Area
Alamo | Contra Costa County | Bay Area
Albany | Alameda County | Bay Area
... | ... | ...

Table 6: Geography, one of the tables in the Restaurants database. * denotes the primary key of this table. We only include 3 rows for demonstration purposes.
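Template instantiation of the kind shown above can be sketched in a few lines. The slot names, toy schema values, and simplified join below are illustrative only; the actual pipeline of Iyer et al. (2017) resolves join paths through the JOIN_FROM/JOIN_WHERE macros rather than a fixed join.

```python
# Simplified slot filling for a question/SQL template pair.
# Slot names and the example schema values are hypothetical.
question_tpl = "Get all {ent1} having {col1} as {val}"
sql_tpl = ("SELECT {ent1}.{ent1}Id FROM {ent1} JOIN {ent2} "
           "ON {ent1}.{ent1}Id = {ent2}.{ent1}Id "
           "WHERE {ent2}.{col1} = '{val}'")

def instantiate(ent1: str, ent2: str, col1: str, val: str):
    """Fill both templates with the same slot assignment, yielding
    a synthetic (question, SQL) training pair."""
    slots = {"ent1": ent1, "ent2": ent2, "col1": col1, "val": val}
    return question_tpl.format(**slots), sql_tpl.format(**slots)

question, sql = instantiate("author", "dataset", "datasetName", "ImageNet")
print(question)  # Get all author having datasetName as ImageNet
```

Enumerating schema entities and values through such templates is one inexpensive way to synthesize in-domain training pairs.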
B.5 Complexity of Natural Language and SQL Query Pairs

In terms of the complexity of SQL queries, Finegan-Dollak et al. (2018) find that models perform better on shorter SQL queries than on longer ones, which indicates that shorter SQL queries are easier in general. Yu et al. (2018c) define SQL hardness as the number of SQL components: a SQL query is harder when it contains more SQL keywords such as GROUP BY and nested subqueries. Yu et al. (2018c) give some examples of SQL queries with different difficulty levels:

Easy:

SELECT COUNT(*)
FROM cars_data
WHERE cylinders > 4 ;

Medium:

SELECT T2.name, COUNT(*)
FROM concert AS T1 JOIN stadium AS T2
ON T1.stadium_id = T2.stadium_id
GROUP BY T1.stadium_id ;

Hard:

SELECT T1.country_name
FROM countries AS T1
JOIN continents AS T2 ON T1.continent = T2.cont_id
JOIN car_makers AS T3 ON T1.country_id = T3.country
WHERE T2.continent = 'Europe'
GROUP BY T1.country_name
HAVING COUNT(*) >= 3 ;

Extra Hard:

SELECT AVG(life_expectancy) FROM country
WHERE name NOT IN
(SELECT T1.name
FROM country AS T1 JOIN country_language AS T2
ON T1.code = T2.country_code
WHERE T2.language = "English"
AND T2.is_official = "T") ;

In terms of the complexity of the natural utterance, there is no qualitative measure of how hard an utterance is. Intuitively, models' performance can decrease when faced with longer questions from users. However, the information conveyed in longer sentences can be more complete, while shorter sentences can be ambiguous. Besides, there can be domain-specific phrases that confuse the model in both short and long utterances (Suhr et al., 2020). Thus, researchers need to consider various perspectives to determine the complexity of a natural utterance.

C Text-to-SQL Datasets

Table 7 lists statistics for text-to-SQL datasets.

C.1 More Discussion on Text-to-SQL Datasets

CSpider (Min et al., 2019a), ViText2SQL (Tuan Nguyen et al., 2020) and José and Cozman (2021) translate all the English questions in Spider into Chinese, Vietnamese and Portuguese, respectively. TableQA (Sun et al., 2020) follows the data collection method from WikiSQL, while DuSQL (Wang et al., 2020c) follows Spider. Both TableQA and DuSQL collect Chinese utterance and SQL query pairs across different domains. Chen et al. (2021a) propose a Chinese domain-specific dataset, ESQL.

For multi-turn, context-dependent text-to-SQL benchmarks, ATIS (Price, 1990; Dahl et al., 1994) includes user interactions with a SQL flight database over multiple turns. Sparc (Yu et al., 2019b) takes a further step and collects multi-turn interactions across 200 databases and 138 domains. However, both ATIS and Sparc assume that all user questions can be mapped into SQL queries, and neither includes system responses. Later, inspired by task-oriented dialogue systems (Budzianowski et al., 2018), Yu et al. (2019a) propose CoSQL, in which the dialogue state is tracked by SQL. CoSQL includes three tasks: SQL-grounded dialogue state tracking to generate SQL queries from the user's utterances, system response generation from query results, and user dialogue act prediction to detect and resolve ambiguous and unanswerable questions.

Besides, TriageSQL (Zhang et al., 2020) collects unanswerable questions in addition to the natural utterance and SQL query pairs from Spider and WikiSQL, bringing up the challenge of distinguishing answerable questions from unanswerable ones in text-to-SQL systems.

D Encoding and Decoding Methods

Table 8 and Table 9 show the encoding and decoding methods that have been discussed in § 3.2 and § 3.3, respectively.

E Other Related Tasks

Other tasks related to text-to-SQL include text-to-python (Bonthu et al., 2021), text-to-shell/bash script (Bharadwaj and Shevade, 2022), text-to-regex (Ye et al., 2020), text-to-SPARQL (Ochieng, 2020), etc. They all take natural language queries as input and output different logical forms. Among these tasks, text-to-SPARQL is closest to text-to-SQL, as both SPARQL and SQL queries execute against database systems.
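The component-counting notion of hardness above can be approximated in a few lines of Python. The keyword list and the bucket thresholds here are an illustrative heuristic only, not the official Spider evaluation script.

```python
import re

# Components roughly in the spirit of the Spider hardness criteria.
COMPONENTS = ["GROUP BY", "ORDER BY", "HAVING", "JOIN", "INTERSECT",
              "UNION", "EXCEPT", "LIKE", "NOT IN"]

def rough_hardness(sql: str) -> str:
    """Bucket a SQL query by counting keyword components, nested
    SELECTs, and extra WHERE conjuncts. Thresholds are illustrative."""
    s = sql.upper()
    score = sum(len(re.findall(re.escape(kw), s)) for kw in COMPONENTS)
    score += s.count("SELECT") - 1                    # nested subqueries
    score += max(s.count(" AND ") + s.count(" OR ") - 1, 0)
    if score == 0:
        return "easy"
    if score <= 2:
        return "medium"
    if score <= 4:
        return "hard"
    return "extra hard"

print(rough_hardness("SELECT COUNT(*) FROM cars_data WHERE cylinders > 4"))  # easy
```

On the four example queries above, this heuristic recovers the intended ordering: the cars_data query scores 0, while the nested life_expectancy query accumulates points from the extra SELECT, the JOIN, and the NOT IN.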
Some end-to-end models that take user queries as input and output a sequence of logical forms can therefore be applied to both tasks (Raffel et al., 2019). In contrast, methods designed around the specific structure of SQL (Xu et al., 2017) cannot be directly applied to SPARQL and require careful modification.

Datasets | #Size | #DB | #D | #T/DB | Issues addressed | Sources for data
Spider (Yu et al., 2018c) | 10,181 | 200 | 138 | 5.1 | Domain generalization | College courses, DatabaseAnswers, WikiSQL
Spider-DK (Gan et al., 2021b) | 535 | 10 | - | 4.8 | Domain knowledge | Spider dev set
Spider-Utran (Zeng et al., 2020) | 15,023 | 200 | 138 | 5.1 | Untranslatable questions | Spider + 5,330 untranslatable questions
Spider-L (Lei et al., 2020) | 8,034 | 160 | - | 5.1 | Schema linking | Spider train/dev
Spider-SL (Taniguchi et al., 2021) | 1,034 | 10 | - | 4.8 | Schema linking | Spider dev set
Spider-Syn (Gan et al., 2021a) | 8,034 | 160 | - | 5.1 | Robustness | Spider train/dev
WikiSQL (Zhong et al., 2017) | 80,654 | 26,521 | - | 1 | Data size | Wikipedia
Squall (Shi et al., 2020b) | 11,468 | 1,679 | - | 1 | Lexicon-level supervision | WikiTableQuestions (Pasupat and Liang, 2015)
KaggleDBQA (Lee et al., 2021) | 272 | 8 | 8 | 2.3 | Domain generalization | Real web databases
---
ATIS (Price, 1990; Dahl et al., 1994) | 5,280 | 1 | 1 | 32 | - | Flight-booking
GeoQuery (Zelle and Mooney, 1996) | 877 | 1 | 1 | 6 | - | US geography
Scholar (Iyer et al., 2017) | 817 | 1 | 1 | 7 | - | Academic publications
Academic (Li and Jagadish, 2014) | 196 | 1 | 1 | 15 | - | Microsoft Academic Search (MAS) database
IMDB (Yaghmazadeh et al., 2017) | 131 | 1 | 1 | 16 | - | Internet Movie Database
Yelp (Yaghmazadeh et al., 2017) | 128 | 1 | 1 | 7 | - | Yelp website
Advising (Finegan-Dollak et al., 2018) | 3,898 | 1 | 1 | 10 | - | University of Michigan course information
Restaurants (Tang and Mooney, 2000; Popescu et al., 2003) | 378 | 1 | 1 | 3 | - | Restaurants
MIMICSQL (Wang et al., 2020d) | 10,000 | 1 | 1 | 5 | - | Healthcare domain
SEDE (Hazoom et al., 2021) | 12,023 | 1 | 1 | 29 | SQL template diversity | Stack Exchange

Table 7: Summarization for text-to-SQL datasets. #Size, #DB, #D, and #T/DB represent the number of question-SQL pairs, databases, domains, and tables per database, respectively. We put "-" in the #D column because we do not know how many domains are in the Spider dev set, and "-" in the Issues addressed column when there is no specific issue addressed by the dataset. Datasets above and below the line are cross-domain and single-domain, respectively.
Encode token type — TypeSQL (Yu et al., 2018a; WikiSQL). Addresses: representing question meaning.

Graph-based — GNN (Bogin et al., 2019a; Spider), Global-GCN (Bogin et al., 2019b; Spider), IGSQL (Cai and Wan, 2020; Sparc, CoSQL), RAT-SQL (Wang et al., 2020a; Spider), LGESQL (Cao et al., 2021; Spider), SADGA (Cai et al., 2021; Spider), ShadowGNN (Chen et al., 2021b; Spider), S²SQL (Hui et al., 2022; Spider, Spider-Syn). Addresses: (1) representing question and DB schemas in a structured way; (2) schema linking.

Self-attention — X-SQL (He et al., 2019; WikiSQL), SQLova (Hwang et al., 2019; WikiSQL), RAT-SQL (Wang et al., 2020a; Spider), DuoRAT (Scholak et al., 2021a; Spider), UnifiedSKG (Xie et al., 2022; WikiSQL, Spider). Addresses: the same two challenges as graph-based methods.

Adapt PLM — X-SQL (He et al., 2019; WikiSQL), SQLova (Hwang et al., 2019; WikiSQL), Guo and Gao (2019; WikiSQL), HydraNet (Lyu et al., 2020; WikiSQL), Liu et al. (2021b), etc. (Spider-L, SQUALL). Addresses: leveraging external data to represent question and DB schemas.

Pre-training — TaBERT (Yin et al., 2020; Spider), GraPPa (Yu et al., 2021; Spider), GAP (Shi et al., 2020a; Spider). Addresses: leveraging external data to represent question and DB schemas.

Table 8: Methods used for encoding in text-to-SQL, with the datasets each system is applied to in parentheses.
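Schema linking, listed in Table 8 as a challenge addressed by the graph-based and self-attention encoders, can be illustrated with naive exact-token matching. The matching rules in systems like RAT-SQL are far richer (partial n-gram matches, value matches against cell contents); the schema below is a toy example.

```python
def schema_link(question: str, schema: dict) -> list:
    """Tag question tokens that exactly match a table or column name.
    schema maps table names to lists of column names; all hypothetical."""
    names = {t.lower(): "table" for t in schema}
    for cols in schema.values():
        for c in cols:
            names[c.lower()] = "column"
    links = []
    for tok in question.lower().replace("?", " ").split():
        if tok in names:
            links.append((tok, names[tok]))
    return links

schema = {"city": ["city_name", "population", "state_name"]}
print(schema_link("What is the population of each city ?", schema))
# [('population', 'column'), ('city', 'table')]
```

Even this crude linker shows why the problem is hard: synonyms ("town" for city) and multi-word column mentions ("state name" for state_name) are missed entirely, which is what the richer matching and learned linking in the encoders above target.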
Tree-based — Seq2Tree (Dong and Lapata, 2016), Seq2AST (Yin and Neubig, 2017), SyntaxSQLNet (Yu et al., 2018b; Spider). Addresses: hierarchical decoding.

Sketch-based — SQLNet (Xu et al., 2017; WikiSQL), Dong and Lapata (2018; WikiSQL), IRNet (Guo et al., 2019; Spider), RYANSQL (Choi et al., 2021; Spider). Addresses: hierarchical decoding.

Bottom-up — SmBop (Rubin and Berant, 2021; Spider). Addresses: hierarchical decoding.

Attention mechanism — attention: Seq2Tree (Dong and Lapata, 2016), Seq2SQL (Zhong et al., 2017; WikiSQL); bi-attention: Guo and Gao (2018; WikiSQL); structured attention: Wang et al. (2019; WikiSQL); relation-aware self-attention: DuoRAT (Scholak et al., 2021a; Spider). Addresses: synthesizing information for decoding.

Copy mechanism — Seq2AST (Yin and Neubig, 2017), Seq2SQL (Zhong et al., 2017; WikiSQL), Wang et al. (2018a; WikiSQL), SeqGenSQL (Li et al., 2020a; WikiSQL), IncSQL (Shi et al., 2018; WikiSQL). Addresses: synthesizing information for decoding.

Intermediate representation — IRNet (Guo et al., 2019; Spider), Suhr et al. (2020; Spider and others♠), Herzig et al. (2021; GeoQuery, ATIS, Scholar), Gan et al. (2021c; Spider), Brunner and Stockinger (2021; Spider). Addresses: bridging the gap between natural language and SQL query.

Others:
- Constrained decoding — UniSAr (Dou et al., 2022; WikiSQL, Spider and others♡), PICARD (Scholak et al., 2021b; Spider, CoSQL). Addresses: fine-grained decoding.
- Execution-guided — SQLova (Hwang et al., 2019; WikiSQL), Wang et al. (2018b; WikiSQL). Addresses: fine-grained decoding.
- Discriminative re-ranking — Global-GCN (Bogin et al., 2019b; Spider), Kelkar et al. (2020; Spider). Addresses: SQL ranking.
- Separate submodule — SQLNet (Xu et al., 2017; WikiSQL), Guo and Gao (2018; WikiSQL), Lee (2019; Spider). Addresses: easier decoding.
- BPE — Müller and Vlachos (2019; Advising, ATIS, GeoQuery). Addresses: easier decoding.
- Link gating — Chen et al. (2020b; Spider). Addresses: synthesizing information for decoding.

Table 9: Methods used for decoding in text-to-SQL, with the datasets each system is applied to in parentheses. ♠: Academic, Advising, ATIS, GeoQuery, Yelp, IMDB, Scholar, Restaurants; ♡: TableQA, DuSQL, CoSQL, Sparc, Chase.
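Execution-guided decoding (Wang et al., 2018b), listed in Table 9, prunes candidates that fail to execute or return empty results. A minimal post-hoc re-ranking sketch over a toy database is shown below; the candidate list stands in for beam outputs and is hypothetical, and the full technique applies such checks to partial programs during decoding rather than only to finished queries.

```python
import sqlite3

def execution_guided_pick(candidates, db):
    """Return the first candidate that executes and yields rows,
    mimicking the pruning signal used in execution-guided decoding."""
    for sql in candidates:
        try:
            rows = db.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # ill-formed candidate: prune it
        if rows:
            return sql, rows
    return None, []

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE city (city_name TEXT, population INTEGER);
INSERT INTO city VALUES ('Wichita', 389000), ('Topeka', 126000);
""")
candidates = [
    "SELECT name FROM city WHERE population > 150000",       # wrong column name
    "SELECT city_name FROM city WHERE population > 150000",  # executes
]
best, rows = execution_guided_pick(candidates, db)
print(best, rows)  # the second candidate survives the execution check
```

The first candidate raises an error (no column named name) and is pruned, so the executable candidate is selected even though the model ranked it lower.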
