RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL

Abstract

One of the recent best attempts at Text-to-SQL is the pre-trained language model. Due to the structural property of SQL queries, the seq2seq model takes the responsibility of parsing both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords). Such coupled targets increase the difficulty of parsing the correct SQL queries, especially when they involve many schema items and logic operators. This paper proposes a ranking-enhanced encoding and skeleton-aware decoding framework to decouple the schema linking and the skeleton parsing. Specifically, for a seq2seq encoder-decoder model, its encoder is injected with the most relevant schema items instead of the whole unordered ones, which could alleviate the schema linking effort during SQL parsing, and its decoder first generates the skeleton and then the actual SQL query, which could implicitly constrain the SQL parsing. We evaluate our proposed framework on Spider and its three robustness variants: Spider-DK, Spider-Syn, and Spider-Realistic. The experimental results show that our framework delivers promising performance and robustness. Our code is available at https://github.com/RUCKBReasoning/RESDSQL.

[Figure 1: Illustration of a Text-to-SQL instance solved by a seq2seq PLM. In the database schema, each schema item is denoted by its "original name (semantic name)".
Question: What are flight numbers of flights departing from City "Aberdeen"?
Database schema: airlines (airlines): uid (airline id), airline (airline name), abbreviation, country; airports (airports): city, airportcode (airport code), airportname (airport name), country, countryabbrev (country abbrev); flights (flights): airline, flightno (flight number), sourceairport (source airport), destairport (destination airport).
Serialize schema items -> Schema sequence: airlines: uid, airline, abbreviation, country | airports: city, airportcode, airportname, country, countryabbrev | flights: airline, flightno, sourceairport, destairport
Question + Schema sequence -> Seq2seq PLM (such as BART and T5) -> SQL query: select flights.flightno from flights join airports on flights.sourceairport = airports.airportcode where airports.city = 'Aberdeen']

Introduction

Relational databases, which are used to store heterogeneous data types including text, integer, float, etc., are omnipresent in modern data management systems. However, ordinary users usually cannot make the best use of databases because they are not good at translating their requirements into the database language, i.e., the structured query language (SQL). To assist these non-professional users in querying databases, researchers propose the Text-to-SQL task (Yu et al. 2018a; Cai et al. 2018), which aims to automatically translate users' natural language questions into SQL queries. At the same time, related benchmarks are becoming increasingly complex, from single-domain benchmarks such as ATIS (Iyer et al. 2017) and GeoQuery (Zelle and Mooney 1996) to cross-domain benchmarks such as WikiSQL (Zhong, Xiong, and Socher 2017) and Spider (Yu et al. 2018c). Most of the recent works are done on Spider because it is the most challenging benchmark, involving many complex SQL operators (such as GROUP BY, ORDER BY, and HAVING) and nested SQL queries.

With the recent advances in pre-trained language models (PLMs), many existing works formulate the Text-to-SQL task as a semantic parsing problem and use a sequence-to-sequence (seq2seq) model to solve it (Scholak, Schucher, and Bahdanau 2021; Shi et al. 2021; Shaw et al. 2021). Concretely, as shown in Figure 1, given a question and a database schema, the schema items are serialized into a schema sequence where the order of the schema items is either default or random. Then, a seq2seq PLM, such as BART (Lewis et al. 2020) or T5 (Raffel et al. 2020), is leveraged to generate the SQL query based on the concatenation of the question and the schema sequence.

* Jing Zhang is the corresponding author.
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
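To make the serialization step in Figure 1 concrete, the following is a minimal sketch (hypothetical helper functions, not the released implementation) of flattening schema items into a schema sequence and concatenating it with the question:

```python
# Minimal sketch (hypothetical helpers, not the paper's released code) of
# the serialization step in Figure 1: flatten the schema items into a
# schema sequence and concatenate it with the question.
def serialize_schema(schema: dict) -> str:
    # schema maps each table name to its list of column names
    return "|".join(f"{t}: {', '.join(cols)}" for t, cols in schema.items())

def build_input(question: str, schema: dict) -> str:
    # the seq2seq PLM consumes "question | schema sequence"
    return f"{question}|{serialize_schema(schema)}"

schema = {
    "airlines": ["uid", "airline", "abbreviation", "country"],
    "flights": ["airline", "flightno", "sourceairport", "destairport"],
}
model_input = build_input(
    'What are flight numbers of flights departing from City "Aberdeen"?', schema
)
```

The resulting string is what the PLM's encoder receives; whether original or semantic column names are used at this point is a separate design choice discussed later in the paper.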
We observe that the target SQL query contains not only the skeleton that reveals the logic of the question but also the required schema items. For instance, for the SQL query "SELECT petid FROM pets WHERE pet_age = 1", its skeleton is "SELECT _ FROM _ WHERE _" and its required schema items are "petid", "pets", and "pet_age".

Since Text-to-SQL needs to perform not only schema linking, which aligns the mentioned entities in the question to schema items in the database schema, but also skeleton parsing, which parses out the skeleton of the SQL query, the major challenges are caused by a large number of required schema items and the complex composition of operators such as GROUP BY, HAVING, and JOIN ON involved in a SQL query. The intertwining of the schema linking and the skeleton parsing complicates learning even more.

To investigate whether the Text-to-SQL task could become easier if the schema linking and the skeleton parsing were decoupled, we conduct a preliminary experiment on Spider's dev set. Concretely, we fine-tune a T5-Base model to generate the pure skeletons based on the questions (i.e., the skeleton parsing task). We observe that the exact match accuracy on this task reaches about 80% using the fine-tuned T5-Base. However, even the T5-3B model only achieves about 70% accuracy on the full Text-to-SQL task (Shaw et al. 2021; Scholak, Schucher, and Bahdanau 2021). This pre-experiment indicates that decoupling the two objectives could be a potential way of reducing the difficulty of Text-to-SQL.

To realize the above decoupling idea, we propose a Ranking-enhanced Encoding plus Skeleton-aware Decoding framework for Text-to-SQL (RESDSQL). The former injects a few but most relevant schema items into the seq2seq model's encoder instead of all schema items. In other words, the schema linking is conducted beforehand to filter out most of the irrelevant schema items in the database schema, which can alleviate the difficulty of the schema linking for the seq2seq model. For this purpose, we train an additional cross-encoder to classify the tables and columns simultaneously based on the input question, and then rank and filter them according to the classification probabilities to form a ranked schema sequence. The latter does not add any new modules but simply lets the seq2seq model's decoder first generate the SQL skeleton and then the actual SQL query. Since skeleton parsing is much easier than SQL parsing, the first generated skeleton could implicitly guide the subsequent SQL parsing via the masked self-attention mechanism in the decoder.

Contributions (1) We investigate a potential way of decoupling the schema linking and the skeleton parsing to reduce the difficulty of Text-to-SQL. Specifically, we propose a ranking-enhanced encoder to alleviate the effort of the schema linking and a skeleton-aware decoder to implicitly guide the SQL parsing by the skeleton. (2) We conduct extensive evaluation and analysis and show that our framework not only achieves new state-of-the-art (SOTA) performance on Spider but also exhibits strong robustness.

Problem Definition

Database Schema A relational database is denoted as D. The database schema S of D includes (1) a set of N tables T = {t_1, t_2, ..., t_N}; (2) a set of columns C = {c^1_1, ..., c^1_{n_1}, c^2_1, ..., c^2_{n_2}, ..., c^N_1, ..., c^N_{n_N}} associated with the tables, where n_i is the number of columns in the i-th table; and (3) a set of foreign key relations R = {(c^i_k, c^j_h) | c^i_k, c^j_h ∈ C}, where each (c^i_k, c^j_h) denotes a foreign key relation between column c^i_k and column c^j_h. We use M = Σ^N_{i=1} n_i to denote the total number of columns in D.

Original Name and Semantic Name We use "schema items" to uniformly refer to tables and columns in the database. Each schema item can be represented by an original name and a semantic name. The semantic name can indicate the semantics of the schema item more precisely. As shown in Figure 1, the semantic names "airline id" and "destination airport" are clearer than their original names "uid" and "destairport". Sometimes the semantic name is the same as the original name.

Text-to-SQL Task Formally, given a question q in natural language and a database D with its schema S, the Text-to-SQL task aims to translate q into a SQL query l that can be executed on D to answer the question q.

Methodology

In this section, we first give an overview of the proposed framework and then delve into its design details.

Model Overview

Following Shaw et al. (2021) and Scholak, Schucher, and Bahdanau (2021), we treat Text-to-SQL as a translation task that can be solved by an encoder-decoder transformer model. Facing the above problems, we extend existing seq2seq Text-to-SQL methods by injecting the most relevant schema items into the input sequence and the SQL skeleton into the output sequence, which results in a ranking-enhanced encoder and a skeleton-aware decoder. We provide a high-level overview of the proposed RESDSQL framework in Figure 2. The encoder of the seq2seq model receives the ranked schema sequence, such that the schema linking effort could be alleviated during SQL parsing. To obtain such a ranked schema sequence, an additional cross-encoder is proposed to classify the schema items according to the given question, and then we rank and filter them based on the classification probabilities. The decoder of the seq2seq model first parses out the SQL skeleton and then the actual SQL query, such that the SQL generation can be implicitly constrained by the previously parsed skeleton. In this way, to a certain extent, the schema linking and the skeleton parsing are not intertwined but decoupled.

Ranking-Enhanced Encoder

Instead of injecting all schema items, we only consider the most relevant schema items in the input of the encoder. For this purpose, we devise a cross-encoder to classify the tables and columns simultaneously and then rank them based on their probabilities.
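The rank-and-filter step can be sketched as follows (a hypothetical helper, not the released implementation; keeping the top-k1 tables and the top-k2 columns per table follows the hyper-parameters described later, where k1 = 4 and k2 = 5 are used):

```python
# Minimal sketch (hypothetical helper, not the paper's code) of ranking
# and filtering schema items by classifier probability: keep the top-k1
# tables and, for each kept table, its top-k2 columns.
def rank_and_filter(table_probs, column_probs, k1=4, k2=5):
    # table_probs: {table: prob}; column_probs: {table: {column: prob}}
    tables = sorted(table_probs, key=table_probs.get, reverse=True)[:k1]
    return {
        t: sorted(column_probs[t], key=column_probs[t].get, reverse=True)[:k2]
        for t in tables
    }
```

The kept tables and columns, in descending probability order, are then flattened into the ranked schema sequence that the seq2seq encoder consumes.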
[Figure 2: An overview of the ranking-enhanced encoding and skeleton-aware decoding framework. We train a cross-encoder for classifying the schema items. Then we take the question, the ranked schema sequence, and optional foreign keys as the input of the ranking-enhanced encoder. The skeleton-aware decoder first decodes the SQL skeleton and then the actual SQL query. (Input example: What are … City "Aberdeen"? | airlines: airline id, airline name, … | airports: city, airport code, …; decoder output: <BOS> select _ from _ where … followed by the SQL query.)]

Based on the ranking order, on one hand, we filter out the irrelevant schema items. On the other hand, we use the ranked schema sequence instead of the unordered schema sequence, so that the seq2seq model could capture potential position information for schema linking.

As for the input of the cross-encoder, we flatten the schema items into a schema sequence in their default order and concatenate it with the question to form an input sequence: X = q | t_1 : c^1_1, ..., c^1_{n_1} | ... | t_N : c^N_1, ..., c^N_{n_N}, where | is the delimiter. To better represent the semantics of schema items, instead of their original names, we use their semantic names, which are closer to natural expression.

Encoding Module We feed X into RoBERTa (Liu et al. 2019), an improved version of BERT (Devlin et al. 2019). Since each schema item will be tokenized into one or more tokens by the PLM's tokenizer (e.g., the column "airline id" will be split into two tokens, "airline" and "id"), and our target is to represent each schema item as a whole for classification, we need to pool the output embeddings belonging to each schema item. To achieve this goal, we use a pooling module that consists of a two-layer BiLSTM (Hochreiter and Schmidhuber 1997) and a non-linear fully-connected layer. After pooling, each table embedding can be denoted by T_i ∈ R^{1×d} (i ∈ {1, ..., N}) and each column embedding by C^i_k ∈ R^{1×d} (i ∈ {1, ..., N}, k ∈ {1, ..., n_i}), where d denotes the hidden size.

Column-Enhanced Layer We observe that some questions mention only the column name rather than the table name. For example, in Figure 1, the question mentions the column name "city", but its corresponding table name "airports" is not mentioned. This table-name-missing issue may compromise the table classification performance. Therefore, we propose a column-enhanced layer to inject the column information into the corresponding table embedding. In this way, a table could be identified even if the question only mentions its columns. Concretely, for the i-th table, we inject the column information C_{:i} ∈ R^{n_i×d} into the table embedding T_i by stacking a multi-head scaled dot-product attention layer (Vaswani et al. 2017) and a feature fusion layer on top of the encoding module:

    T^C_i = MultiHeadAttn(T_i, C_{:i}, C_{:i}, h),    (1)
    T̂_i = Norm(T_i + T^C_i).

Here, T_i acts as the query and C_{:i} acts as both the key and the value, h is the number of heads, and Norm(·) is a row-wise L2 normalization function. T^C_i represents the column-attentive table embedding. We fuse the original table embedding T_i and the column-attentive table embedding T^C_i to obtain the column-enhanced table embedding T̂_i ∈ R^{1×d}.

Loss Function of Cross-Encoder Cross-entropy loss is a well-adopted loss function in classification tasks. However, since a SQL query usually involves only a few tables and columns in the database, the label distribution of the training set is highly imbalanced: the number of negative examples is many times that of positive examples, which will induce serious training bias. To alleviate this issue, we employ the focal loss (Lin et al. 2017) as our classification loss. Then, we form the loss function of the cross-encoder in a multi-task learning way, consisting of both the table classification loss and the column classification loss, i.e.,

    L_1 = (1/N) Σ^N_{i=1} FL(y_i, ŷ_i) + (1/M) Σ^N_{i=1} Σ^{n_i}_{k=1} FL(y^i_k, ŷ^i_k),    (2)

where FL denotes the focal loss function and y_i is the ground truth label of the i-th table: y_i = 1 indicates the table is referenced by the SQL query and 0 otherwise. y^i_k is the ground truth label of the k-th column in the i-th table.
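For reference, the binary focal loss FL in Eq. (2) can be sketched from scratch as follows (plain-Python sketch, not the paper's implementation; the defaults γ = 2 and α = 0.75 follow the values reported in the implementation details):

```python
import math

# From-scratch sketch (not the paper's code) of the binary focal loss FL
# used in Eq. (2), with focusing parameter gamma and weighting factor
# alpha as in Lin et al. (2017). y is the 0/1 label and p is the
# predicted probability of the positive class.
def focal_loss(y: int, p: float, gamma: float = 2.0, alpha: float = 0.75) -> float:
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # the (1 - p_t)^gamma factor down-weights well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 0 and α = 1 this reduces to the standard cross-entropy; the γ > 0 focusing term is what lets the many easy negatives contribute little to the loss.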
Similarly, y^i_k = 1 indicates the column is referenced by the SQL query and 0 otherwise. ŷ_i and ŷ^i_k are predicted probabilities, estimated by two different MLP modules based on the table and column embeddings T̂_i and C^i_k:

    ŷ_i = σ((T̂_i U^t_1 + b^t_1) U^t_2 + b^t_2),    (3)
    ŷ^i_k = σ((C^i_k U^c_1 + b^c_1) U^c_2 + b^c_2),

where U^t_1, U^c_1 ∈ R^{d×w}, b^t_1, b^c_1 ∈ R^w, U^t_2, U^c_2 ∈ R^{w×2}, and b^t_2, b^c_2 ∈ R^2 are trainable parameters, and σ(·) denotes Softmax.

Prepare Input for Ranking-Enhanced Encoder During inference, for each Text-to-SQL instance, we leverage the trained cross-encoder to compute a probability for each schema item. Then, we keep only the top-k1 tables in the database and the top-k2 columns for each remaining table to form a ranked schema sequence. k1 and k2 are two important hyper-parameters. When k1 or k2 is too small, a portion of the required tables or columns may be excluded, which is fatal for the subsequent seq2seq model. As k1 or k2 becomes larger, more and more irrelevant tables or columns may be introduced as noise. Therefore, we need to choose appropriate values for k1 and k2 to ensure a high recall while preventing the introduction of too much noise. The input sequence for the ranking-enhanced encoder (i.e., the seq2seq model's encoder) is formed as the concatenation of the question, the ranked schema sequence, and optional foreign key relations (see Figure 2). Foreign key relations contain rich information about the structure of the database, which could promote the generation of the JOIN ON clauses. In the ranked schema sequence, we use the original names instead of the semantic names because the schema items in the SQL queries are represented by their original names, and using the former facilitates the decoder in directly copying required schema items from the input sequence.

Skeleton-Aware Decoder

Most seq2seq Text-to-SQL methods let the decoder generate the target SQL query directly. However, the apparent gap between natural language and SQL makes it difficult to perform the correct generation. To alleviate this problem, we decompose the SQL generation into two steps: (1) generate the SQL skeleton based on the semantics of the question, and then (2) select the required "data" (i.e., tables, columns, and values) from the input sequence to fill the slots in the skeleton.

To realize the above decomposition idea without adding additional modules, we propose a new generation objective based on an intrinsic characteristic of the transformer decoder, which generates the t-th token depending not only on the output of the encoder but also on the output of the decoder before the t-th time step (Vaswani et al. 2017). Concretely, instead of decoding the target SQL directly, we encourage the decoder to first decode the skeleton of the SQL query and, based on this, continue to decode the SQL query. By parsing the skeleton first and then parsing the SQL query, at each decoding step, SQL generation will be easier because the decoder could either copy a "data" token from the input sequence or a SQL keyword from the previously parsed skeleton. Now, the objective of the seq2seq model is:

    L_2 = (1/G) Σ^G_{i=1} p(l^s_i, l_i | S_i),    (4)

where G is the number of Text-to-SQL instances, S_i is the input sequence of the i-th instance, which consists of the question, the ranked schema sequence, and optional foreign key relations, l_i denotes the i-th target SQL query, and l^s_i is the skeleton extracted from l_i. We now present some necessary details on how to normalize SQL queries and how to extract their skeletons.

SQL Normalization The Spider dataset is manually created by 11 annotators with different annotation habits, which results in slightly different styles among the final annotated SQL queries, such as uppercase versus lowercase keywords. Although different styles have no impact on the execution results, the model requires extra effort to learn and adapt to them. To reduce the learning difficulty, we normalize the original SQL queries before training by (1) unifying the keywords and schema items into lowercase, (2) adding spaces around parentheses and replacing double quotes with single quotes, (3) adding an ASC keyword after the ORDER BY clause if it does not specify the order, and (4) removing the AS clauses and replacing all table aliases with their original names. We present an example in Table 1.

[Table 1: An example from Spider. Here, Q, SQL_o, SQL_n, and SQL_s denote the question, the original SQL query, the normalized SQL query, and the SQL skeleton, respectively.
Q: List the duration, file size and format of songs whose genre is pop, ordered by title?
SQL_o: SELECT T1.duration, T1.file_size, T1.formats FROM files AS T1 JOIN song AS T2 ON T1.f_id = T2.f_id WHERE T2.genre_is = "pop" ORDER BY T2.song_name
SQL_n: select files.duration, files.file_size, files.formats from files join song on files.f_id = song.f_id where song.genre_is = 'pop' order by song.song_name asc
SQL_s: select _ from _ where _ order by _ asc]

SQL Skeleton Extraction Based on the normalized SQL queries, we can extract skeletons that contain only SQL keywords and slots. Specifically, given a normalized SQL query, we keep its keywords and replace the rest with slots. Note that we do not keep the JOIN ON keyword because it is difficult to find its counterpart in the question (Gan et al. 2021b). As shown in Table 1, although the original SQL query looks complex, its skeleton is simple, and each keyword can find a counterpart in the question. For example, "order by _ asc" in the skeleton can be inferred from "ordered by title?" in the question.

Execution-Guided SQL Selector Since we do not constrain the decoder with SQL grammar, the model may generate some illegal SQL queries. To alleviate this problem, we follow Suhr et al. (2020) and use an execution-guided SQL selector, which performs beam search during the decoding procedure and then selects the first executable SQL query in the beam as the final result.
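The skeleton-extraction step above can be sketched as follows (a simplified regex-based sketch, not the released implementation; the keyword list is illustrative, and JOIN and ON are deliberately left out of it so that join clauses collapse into slots, as described above):

```python
import re

# Simplified sketch (not the paper's code) of skeleton extraction from a
# normalized SQL query: keep SQL keywords, collapse everything else into
# "_" slots. JOIN and ON are intentionally absent from KEYWORDS so they
# are replaced by slots as well.
KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "distinct", "and", "or", "not", "in", "like", "between", "exists",
    "union", "intersect", "except", "asc", "desc",
}

def extract_skeleton(normalized_sql: str) -> str:
    skeleton = []
    for tok in re.findall(r"[\w.']+|[(),=<>!*]", normalized_sql.lower()):
        if tok in KEYWORDS:
            skeleton.append(tok)
        elif not skeleton or skeleton[-1] != "_":
            skeleton.append("_")  # merge consecutive non-keyword tokens
    return " ".join(skeleton)
```

Applied to the normalized query SQL_n of Table 1, this sketch yields the skeleton "select _ from _ where _ order by _ asc" shown as SQL_s.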
Experiments

Experimental Setup

Datasets We conduct extensive experiments on Spider and three variants of it that are proposed to evaluate the robustness of Text-to-SQL parsers. Spider (Yu et al. 2018c) is the most challenging benchmark for the cross-domain and multi-table Text-to-SQL task. Spider contains a training set with 7,000 samples¹, a dev set with 1,034 samples, and a hidden test set with 2,147 samples. There is no overlap between the databases in different splits. For robustness, we train the model on Spider's training set but evaluate it on Spider-DK (Gan, Chen, and Purver 2021) with 535 samples, Spider-Syn (Gan et al. 2021a) with 1,034 samples, and Spider-Realistic (Deng et al. 2021) with 508 samples. These evaluation sets are derived from Spider by modifying questions to simulate real-world application scenarios. Concretely, Spider-DK incorporates some domain knowledge to paraphrase questions, Spider-Syn replaces schema-related words with synonyms in questions, and Spider-Realistic removes explicitly mentioned column names from questions.

¹ Spider also provides an additional 1,659 training samples collected from some single-domain datasets, such as GeoQuery (Zelle and Mooney 1996) and Restaurants (Giordani and Moschitti 2012). Following Scholak, Schucher, and Bahdanau (2021), we do not include this part in our training set.

Evaluation Metrics To evaluate the performance of the Text-to-SQL parser, following Yu et al. (2018c) and Zhong, Yu, and Klein (2020), we adopt two metrics: Exact-set-Match accuracy (EM) and EXecution accuracy (EX). The former measures whether the predicted SQL query can be exactly matched with the gold SQL query by converting them into a special data structure (Yu et al. 2018c). The latter compares the execution results of the predicted SQL query and the gold SQL query. The EX metric is sensitive to the generated values, but the EM metric is not. In practice, we use the sum of EM and EX to select the best checkpoint of the seq2seq model. For the cross-encoder, we use the Area Under the ROC Curve (AUC) to evaluate its performance. Since the cross-encoder classifies tables and columns simultaneously, we adopt the sum of the table AUC and the column AUC to select its best checkpoint.

Implementation Details We train RESDSQL in two stages. In the first stage, we train the cross-encoder for ranking schema items. The number of heads h in the column-enhanced layer is 8. We use AdamW (Loshchilov and Hutter 2019) with batch size 32 and learning rate 1e-5 for optimization. In the focal loss, the focusing parameter γ and the weighting factor α are set to 2 and 0.75, respectively. Then, k1 and k2 are set to 4 and 5 according to the statistics of the datasets. For training the seq2seq model in the second stage, we consider three scales of T5: Base, Large, and 3B. We fine-tune them with Adafactor (Shazeer and Stern 2018) using different batch sizes (bs) and learning rates (lr), resulting in RESDSQL-Base (bs = 32, lr = 1e-4), RESDSQL-Large (bs = 32, lr = 5e-5), and RESDSQL-3B (bs = 96, lr = 5e-5). For both stages of training, we adopt linear warm-up (over the first 10% of training steps) and cosine decay to adjust the learning rate. We set the beam size to 8 during decoding. Moreover, following Lin, Socher, and Xiong (2020), we extract potentially useful contents from the database to enrich the column information.

Environments We conduct all experiments on a server with one NVIDIA A100 (80G) GPU, one Intel(R) Xeon(R) Silver 4316 CPU, 256 GB of memory, and the Ubuntu 20.04.2 LTS operating system.

Results on Spider

Table 2 reports EM and EX results on Spider. Noticeably, RESDSQL-Base achieves better performance than the bare T5-3B, which indicates that our decoupling idea can substantially reduce the learning difficulty of Text-to-SQL. Then, RESDSQL-3B outperforms the best baseline by 1.6% EM and 1.3% EX on the dev set. Furthermore, when combined with NatSQL (Gan et al. 2021b), an intermediate representation of SQL, RESDSQL-Large achieves competitive results compared to powerful baselines on the dev set, and RESDSQL-3B achieves new SOTA performance on both the dev set and the test set. Specifically, on the dev set, RESDSQL-3B + NatSQL brings 4.2% EM and 3.6% EX absolute improvements. On the hidden test set, RESDSQL-3B + NatSQL achieves competitive performance on EM and dramatically increases EX from 75.5% to 79.9% (+4.4%), showing the effectiveness of our approach. The reason for the large gap between EM (72.0%) and EX (79.9%) is that EM is overly strict (Zhong, Yu, and Klein 2020). For example, in Spider, given the question "Find id of the candidate who most recently accessed the course?", its gold SQL query is "select candidate_id from candidate_assessments order by assessment_date desc limit 1". In fact, there is another SQL query, "select candidate_id from candidate_assessments where assessment_date = (select max(assessment_date) from candidate_assessments)", which can also be executed to answer the question (i.e., EX is positive). However, EM will judge the latter to be wrong, which leads to false negatives.

Results on Robustness Settings

Recent studies (Gan et al. 2021a; Deng et al. 2021) show that neural Text-to-SQL parsers are fragile to question perturbations in which explicitly mentioned schema items are removed or replaced with semantically consistent words (e.g., synonyms), which increases the difficulty of schema linking. Therefore, more and more efforts have recently been devoted to improving the robustness of neural Text-to-SQL parsers, such as TKK (Qin et al. 2022) and SUN (Gao et al. 2022). To validate the robustness of RESDSQL, we train our model on Spider's training set and evaluate it on three challenging Spider variants: Spider-DK, Spider-Syn, and Spider-Realistic. Results are reported in Table 3. We observe that on all three datasets, RESDSQL-3B + NatSQL surprisingly outperforms all strong competitors by a large margin, which suggests that our decoupling idea can also improve robustness.
                                                              Dev Set        Test Set
Approach                                                      EM     EX      EM     EX
Non-seq2seq methods
RAT-SQL + GraPPa (Yu et al. 2021)                             73.4   -       69.6   -
RAT-SQL + GAP + NatSQL (Gan et al. 2021b)                     73.7   75.0    68.7   73.3
SmBoP + GraPPa (Rubin and Berant 2021)                        74.7   75.0    69.5   71.1
DT-Fixup SQL-SP + RoBERTa (Xu et al. 2021)                    75.0   -       70.9   -
LGESQL + ELECTRA (Cao et al. 2021)                            75.1   -       72.0   -
S2SQL + ELECTRA (Hui et al. 2022)                             76.4   -       72.1   -
Seq2seq methods
T5-3B (Scholak, Schucher, and Bahdanau 2021)                  71.5   74.4    68.0   70.1
T5-3B + PICARD (Scholak, Schucher, and Bahdanau 2021)         75.5   79.3    71.9   75.1
RASAT + PICARD (Qi et al. 2022)                               75.3   80.5    70.9   75.5
Our proposed method
RESDSQL-Base                                                  71.7   77.9    -      -
RESDSQL-Base + NatSQL                                         74.1   80.2    -      -
RESDSQL-Large                                                 75.8   80.1    -      -
RESDSQL-Large + NatSQL                                        76.7   81.9    -      -
RESDSQL-3B                                                    78.0   81.8    -      -
RESDSQL-3B + NatSQL                                           80.5   84.1    72.0   79.9

Table 2: EM and EX results on Spider's development set and hidden test set (%). We compare our approach with some powerful baseline methods from the top of the official leaderboard of Spider.
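The EM/EX gap discussed in the Spider results can be reproduced on a toy SQLite database (hypothetical data; the two queries are the gold query and the alternative prediction from the example in the text): the queries differ syntactically, so EM rejects the prediction, yet they return identical results, so EX accepts it.

```python
import sqlite3

# Toy illustration (hypothetical data) of the EM/EX gap: two
# syntactically different SQL queries (an EM mismatch) that return
# identical execution results (an EX match).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table candidate_assessments (candidate_id int, assessment_date text)"
)
conn.executemany(
    "insert into candidate_assessments values (?, ?)",
    [(1, "2020-01-01"), (2, "2021-06-30"), (3, "2019-12-31")],
)
gold = ("select candidate_id from candidate_assessments "
        "order by assessment_date desc limit 1")
pred = ("select candidate_id from candidate_assessments "
        "where assessment_date = "
        "(select max(assessment_date) from candidate_assessments)")
gold_rows = conn.execute(gold).fetchall()  # the most recent assessment
pred_rows = conn.execute(pred).fetchall()
same_result = gold_rows == pred_rows  # True: EX accepts the prediction
```

EM, by contrast, compares the queries' normalized structures, which differ here (ORDER BY ... LIMIT vs. a nested MAX subquery), producing a false negative.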
                                                    Spider-DK      Spider-Syn     Spider-Realistic
Approach                                            EM     EX      EM     EX      EM     EX
RAT-SQL + BERT (Wang et al. 2020a)                  40.9   -       48.2   -       58.1   62.1
RAT-SQL + GraPPa (Yu et al. 2021)                   38.5   -       49.1   -       59.3   -
T5-3B (Gao et al. 2022)                             -      -       59.4   65.3    63.2   65.0
LGESQL + ELECTRA (Cao et al. 2021)                  48.4   -       64.6   -       69.2   -
TKK-3B (Gao et al. 2022)                            -      -       63.0   68.2    68.5   71.1
T5-3B + PICARD (Qi et al. 2022)                     -      -       -      -       68.7   71.4
RASAT + PICARD (Qi et al. 2022)                     -      -       -      -       69.7   71.9
LGESQL + ELECTRA + SUN (Qin et al. 2022)            52.7   -       66.9   -       70.9   -
RESDSQL-3B + NatSQL                                 53.3   66.0    69.1   76.9    77.4   81.9

Table 3: EM and EX results on Spider-DK, Spider-Syn, and Spider-Realistic (%).
Model variant Table AUC Column AUC Total Schema Item Classification
Cross-encoder 0.9973 0.9957 1.9930 Schema item classification is often introduced as an aux-
- w/o enh. layer 0.9965 0.9939 1.9904 iliary task to improve the schema linking performance for
- w/o focal loss 0.9958 0.9943 1.9901 Text-to-SQL. For example, G RAPPA (Yu et al. 2021) and
GAP (Shi et al. 2021) further pre-train the PLMs by using
Table 4: Ablation studies of the cross-encoder. the schema item classification task as one of the pre-training
objectives. Then, Text-to-SQL can be viewed as a down-
stream task to be fine-tuned. Cao et al. (2021) combine the
Model variant EM (%) EX (%)
schema item classification task with the Text-to-SQL task
RESDSQL-Base 71.7 77.9 in a multi-task learning way. The above-mentioned methods
- w/o ranking schema items 67.2 70.1 enhance the encoder by pre-training or the multi-task learn-
- w/o skeleton parsing 71.0 77.1 ing paradigm. Instead, we propose an independent cross-
encoder as the schema item classifier which is easier to be
Table 5: The effect of key designs. trained. We use the classifier to re-organize the input of the
seq2seq model, which can produce a more direct impact on
schema linking. Bogin, Gardner, and Berant (2019) calcu-
PLMs, graph neural networks (GNNs) usually cannot be de- late a relevance score for each schema item, which is then
signed too deep due to the limitation of the over-smoothing used as the soft coefficient of the schema items in the subse-
issue (Chen et al. 2020), which restricts the representation quent graph encoder. Compared with them, our method can
ability of GNNs. Then, PLMs have already encoded lan- be viewed as a hard filtering of schema items which can re-
guage patterns in their parameters after pre-training (Zhang et al. 2021), whereas the parameters of GNNs are usually randomly initialized. Moreover, the graph encoder relies heavily on the design of relations, which may limit its robustness and generality on other datasets (Gao et al. 2022).

Grammar-Based Decoder To inject the SQL grammar into the decoder, Yin and Neubig (2017) and Krishnamurthy, Dasigi, and Gardner (2017) propose top-down decoders that generate a sequence of pre-defined actions describing the grammar tree of the SQL query. Rubin and Berant (2021) devise a bottom-up decoder instead of the top-down paradigm. PICARD (Scholak, Schucher, and Bahdanau 2021) incorporates an incremental parser into the auto-regressive decoder of PLMs to prune invalid partially generated SQL queries during beam search.

Execution-Guided Decoder Some works use an off-the-shelf SQL executor such as SQLite to ensure grammatical correctness. Wang et al. (2018) leverage a SQL executor to check and discard partially generated SQL queries that raise errors during decoding. To avoid modifying the decoder, Suhr et al. (2020) check the executability of each candidate SQL query, a strategy that our method also adopts.

duce noise more effectively.

Intermediate Representation
Because there is a huge gap between natural language questions and their corresponding SQL queries, some works have focused on designing an efficient intermediate representation (IR) to bridge it (Yu et al. 2018b; Guo et al. 2019; Gan et al. 2021b). Instead of directly generating full-fledged SQL queries, these IR-based methods encourage models to generate IRs, which can then be translated into SQL queries by a non-trainable transpiler.

Conclusion
In this paper, we propose RESDSQL, a simple yet powerful Text-to-SQL parser. We first train a cross-encoder to rank and filter schema items, which are then injected into the encoder of the seq2seq model. We also let the decoder generate the SQL skeleton first, which can implicitly guide the subsequent SQL generation. To a certain extent, such a framework decouples schema linking and skeleton parsing, which can alleviate the difficulty of Text-to-SQL. Extensive experiments on Spider and its three variants demonstrate the performance and robustness of RESDSQL.
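PICARD-style constrained decoding can be pictured as a filter applied to the beam at every decoding step. The sketch below is not PICARD's incremental parser; `is_valid_prefix` is a hypothetical, much weaker stand-in that checks only two surface properties of a partial SQL decode:

```python
def is_valid_prefix(tokens):
    """Hypothetical stand-in for an incremental SQL parser: accept a partial
    decode only if it starts with SELECT and never closes an unopened paren."""
    text = " ".join(tokens).upper()
    if not text.startswith("SELECT"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' can never be repaired
                return False
    return True

def prune_beam(hypotheses):
    """Drop partial decodes that can no longer extend to well-formed SQL,
    but never empty the beam entirely."""
    kept = [h for h in hypotheses if is_valid_prefix(h)]
    return kept or hypotheses
```

A real implementation hooks such a check into the PLM's beam search so that pruned hypotheses free their slots for valid continuations.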
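The executability check described above needs nothing more than SQLite's Python bindings: run each candidate and keep the best-scored one that does not raise an error. A minimal sketch (the function names are ours, not from the paper's released code):

```python
import sqlite3

def is_executable(sql, db_path):
    """Return True iff the candidate SQL query runs on the database without error."""
    try:
        with sqlite3.connect(db_path) as conn:
            conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False

def pick_first_executable(candidates, db_path):
    """Scan beam candidates in descending model score and return the first
    one that executes, falling back to the top-scored candidate."""
    for sql in candidates:
        if is_executable(sql, db_path):
            return sql
    return candidates[0]
```

Because the check runs on complete queries only, the decoder itself stays untouched, which is what makes the strategy easy to bolt onto any seq2seq model.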
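As a concrete toy illustration of the non-trainable translation step, consider a dictionary-shaped IR (invented here for exposition; real IRs such as SemQL or NatSQL are tree-structured and far richer). The transpiler is pure deterministic string assembly with no learned parameters:

```python
def ir_to_sql(ir):
    """Expand a toy dictionary IR into a SQL string with fixed rules."""
    sql = "SELECT " + ", ".join(ir["select"]) + " FROM " + ir["from"]
    if ir.get("where"):
        sql += " WHERE " + " AND ".join(ir["where"])
    return sql

# A possible IR for "What are flight numbers of flights departing
# from City 'Aberdeen'?" over the schema shown earlier:
ir = {
    "select": ["flightno"],
    "from": "flights",
    "where": ["sourceairport IN (SELECT airportcode FROM airports WHERE city = 'Aberdeen')"],
}
```

Because the model only has to produce the simpler IR, the transpiler absorbs SQL's syntactic bookkeeping (clause order, keywords), which is the gap-bridging effect these methods rely on.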
Acknowledgments
We thank Hongjin Su and Tao Yu for their efforts in evaluating our model on Spider's test set. We also thank the anonymous reviewers for their helpful suggestions. This work is supported by National Natural Science Foundation of China (62076245, 62072460, 62172424, 62276270) and Beijing Natural Science Foundation (4212022).

References
Bogin, B.; Gardner, M.; and Berant, J. 2019. Global Reasoning over Database Structures for Text-to-SQL Parsing. In EMNLP-IJCNLP 2019, 3657–3662.
Cai, R.; Xu, B.; Zhang, Z.; Yang, X.; Li, Z.; and Liang, Z. 2018. An Encoder-Decoder Framework Translating Natural Language to Database Queries. In IJCAI 2018, 3977–3983.
Cai, R.; Yuan, J.; Xu, B.; and Hao, Z. 2021. SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL. In NeurIPS 2021, 7664–7676.
Cao, R.; Chen, L.; Chen, Z.; Zhao, Y.; Zhu, S.; and Yu, K. 2021. LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations. In ACL/IJCNLP 2021, 2541–2555.
Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; and Sun, X. 2020. Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View. In AAAI 2020, 3438–3445.
Deng, X.; Awadallah, A. H.; Meek, C.; Polozov, O.; Sun, H.; and Richardson, M. 2021. Structure-Grounded Pretraining for Text-to-SQL. In NAACL-HLT 2021, 1337–1350.
Devlin, J.; Chang, M.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT 2019, 4171–4186.
Gan, Y.; Chen, X.; Huang, Q.; Purver, M.; Woodward, J. R.; Xie, J.; and Huang, P. 2021a. Towards Robustness of Text-to-SQL Models against Synonym Substitution. In ACL/IJCNLP 2021, 2505–2515.
Gan, Y.; Chen, X.; and Purver, M. 2021. Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization. In EMNLP 2021, 8926–8931.
Gan, Y.; Chen, X.; Xie, J.; Purver, M.; Woodward, J. R.; Drake, J. H.; and Zhang, Q. 2021b. Natural SQL: Making SQL Easier to Infer from Natural Language Specifications. In Findings of EMNLP 2021, 2030–2042.
Gao, C.; Li, B.; Zhang, W.; Lam, W.; Li, B.; Huang, F.; Si, L.; and Li, Y. 2022. Towards Generalizable and Robust Text-to-SQL Parsing. In Findings of EMNLP 2022.
Giordani, A.; and Moschitti, A. 2012. Automatic Generation and Reranking of SQL-derived Answers to NL Questions. In Proceedings of the Second International Conference on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge, 59–76.
Guo, J.; Zhan, Z.; Gao, Y.; Xiao, Y.; Lou, J.; Liu, T.; and Zhang, D. 2019. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation. In ACL 2019, 4524–4535.
Hochreiter, S.; and Schmidhuber, J. 1997. Long Short-Term Memory. Neural Computation, 1735–1780.
Hui, B.; Geng, R.; Wang, L.; Qin, B.; Li, Y.; Li, B.; Sun, J.; and Li, Y. 2022. S²SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers. In Findings of ACL 2022, 1254–1262.
Iyer, S.; Konstas, I.; Cheung, A.; Krishnamurthy, J.; and Zettlemoyer, L. 2017. Learning a Neural Semantic Parser from User Feedback. In ACL 2017, 963–973.
Krishnamurthy, J.; Dasigi, P.; and Gardner, M. 2017. Neural Semantic Parsing with Type Constraints for Semi-Structured Tables. In EMNLP 2017, 1516–1526.
Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; and Zettlemoyer, L. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In ACL 2020, 7871–7880.
Lin, T.; Goyal, P.; Girshick, R. B.; He, K.; and Dollár, P. 2017. Focal Loss for Dense Object Detection. In ICCV 2017, 2999–3007.
Lin, X. V.; Socher, R.; and Xiong, C. 2020. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing. In Findings of EMNLP 2020, 4870–4888.
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.
Loshchilov, I.; and Hutter, F. 2019. Decoupled Weight Decay Regularization. In ICLR 2019.
Qi, J.; Tang, J.; He, Z.; Wan, X.; Cheng, Y.; Zhou, C.; Wang, X.; Zhang, Q.; and Lin, Z. 2022. RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL. In EMNLP 2022.
Qin, B.; Wang, L.; Hui, B.; Li, B.; Wei, X.; Li, B.; Huang, F.; Si, L.; Yang, M.; and Li, Y. 2022. SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers. In COLING 2022, 5298–5308.
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 140:1–140:67.
Rubin, O.; and Berant, J. 2021. SmBoP: Semi-autoregressive Bottom-up Semantic Parsing. In NAACL-HLT 2021, 311–324.
Schlichtkrull, M. S.; Kipf, T. N.; Bloem, P.; van den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling Relational Data with Graph Convolutional Networks. In ESWC 2018, 593–607.
Scholak, T.; Schucher, N.; and Bahdanau, D. 2021. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. In EMNLP 2021, 9895–9901.
Shaw, P.; Chang, M.; Pasupat, P.; and Toutanova, K. 2021. Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? In ACL/IJCNLP 2021, 922–938.
Shaw, P.; Uszkoreit, J.; and Vaswani, A. 2018. Self-Attention with Relative Position Representations. In NAACL-HLT 2018, 464–468.
Shazeer, N.; and Stern, M. 2018. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. In ICML 2018, 4603–4611.
Shi, P.; Ng, P.; Wang, Z.; Zhu, H.; Li, A. H.; Wang, J.; dos Santos, C. N.; and Xiang, B. 2021. Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training. In AAAI 2021, 13806–13814.
Suhr, A.; Chang, M.; Shaw, P.; and Lee, K. 2020. Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing. In ACL 2020, 8372–8388.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2017. Attention is All you Need. In NeurIPS 2017, 5998–6008.
Wang, B.; Shin, R.; Liu, X.; Polozov, O.; and Richardson, M. 2020a. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. In ACL 2020, 7567–7578.
Wang, C.; Tatwawadi, K.; Brockschmidt, M.; Huang, P.-S.; Mao, Y.; Polozov, O.; and Singh, R. 2018. Robust Text-to-SQL Generation with Execution-Guided Decoding. arXiv preprint arXiv:1807.03100.
Wang, K.; Shen, W.; Yang, Y.; Quan, X.; and Wang, R. 2020b. Relational Graph Attention Network for Aspect-based Sentiment Analysis. In ACL 2020, 3229–3238.
Xu, P.; Kumar, D.; Yang, W.; Zi, W.; Tang, K.; Huang, C.; Cheung, J. C. K.; Prince, S. J. D.; and Cao, Y. 2021. Optimizing Deeper Transformers on Small Datasets. In ACL/IJCNLP 2021, 2089–2102.
Yin, P.; and Neubig, G. 2017. A Syntactic Neural Model for General-Purpose Code Generation. In ACL 2017, 440–450.
Yu, T.; Li, Z.; Zhang, Z.; Zhang, R.; and Radev, D. R. 2018a. TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation. In NAACL-HLT 2018, 588–594.
Yu, T.; Wu, C.; Lin, X. V.; Wang, B.; Tan, Y. C.; Yang, X.; Radev, D. R.; Socher, R.; and Xiong, C. 2021. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing. In ICLR 2021.
Yu, T.; Yasunaga, M.; Yang, K.; Zhang, R.; Wang, D.; Li, Z.; and Radev, D. R. 2018b. SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task. arXiv preprint arXiv:1810.05237.
Yu, T.; Zhang, R.; Yang, K.; Yasunaga, M.; Wang, D.; Li, Z.; Ma, J.; Li, I.; Yao, Q.; Roman, S.; Zhang, Z.; and Radev, D. R. 2018c. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In EMNLP 2018, 3911–3921.
Zelle, J. M.; and Mooney, R. J. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. In AAAI 1996, 1050–1055.
Zhang, Y.; Warstadt, A.; Li, X.; and Bowman, S. R. 2021. When Do You Need Billions of Words of Pretraining Data? In ACL/IJCNLP 2021, 1112–1125.
Zhong, R.; Yu, T.; and Klein, D. 2020. Semantic Evaluation for Text-to-SQL with Distilled Test Suites. In EMNLP 2020, 396–411.
Zhong, V.; Xiong, C.; and Socher, R. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arXiv preprint arXiv:1709.00103.