
Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

Jinyang Li 1,2 *, Binyuan Hui 2, Reynold Cheng 1,5 †, Bowen Qin 3, Chenhao Ma 4, Nan Huo 1,
Fei Huang 2, Wenyu Du 1, Luo Si 2, Yongbin Li 2 †
1 The University of Hong Kong   2 DAMO Academy, Alibaba Group
3 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
4 The Chinese University of Hong Kong (Shenzhen)   5 Guangdong–Hong Kong–Macau Joint Laboratory
{jl0725,huonan,wenyudu}@connect.hku.hk, [email protected], [email protected], [email protected],
{binyuan.hby,f.huang,luo.si,shuide.lyb}@alibaba-inc.com

arXiv:2301.07507v1 [cs.CL] 18 Jan 2023

Abstract

The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for a technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model's capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQLs. To this end, we propose a new architecture, GRAPHIX-T5, a mixed model in which the standard pre-trained transformer model is augmented by specially-designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large surpasses the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution accuracy (EX). It even outperforms T5-3B by 1.2% on EM and 1.5% on EX.

[Figure 1: An illustration of the cross-domain text-to-SQL challenge. The natural language question "Find the number of dog pets that are raised by female students" is parsed against a student database (tables Pets, Has_Pet, Student) into the SQL query: SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T2.petid = T3.petid WHERE T1.sex = 'F' AND T3.pettype = 'dog'. The link between the target column sex and the token female is highly desired but extremely challenging for the model to capture, especially when domain-specific data or effective rules are absent. However, this dilemma can be mitigated by a multi-hop reasoning path (female --MOD--> student --EM--> Student --HAS--> Sex).]

* Work done during an internship at Alibaba DAMO Academy.
† Corresponding authors are Reynold Cheng and Yongbin Li.
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

Relational databases, serving as an important resource for users to make decisions in many fields such as health care, sports, and entertainment, have become ubiquitous in the big data era. It is efficient for data users to access the information in databases via a structured query language, e.g., SQL. Despite its effectiveness and efficiency, the complex nature of SQL imposes an extremely expensive learning effort on non-technical users. Therefore, text-to-SQL (Cai et al. 2018; Zelle and Mooney 1996; Xu, Liu, and Song 2017; Yu et al. 2018a; Yaghmazadeh et al. 2017), aiming to convert natural language instructions or questions into SQL queries, has attracted remarkable attention.

In this work, we explore the challenging cross-domain setting where a text-to-SQL parser needs to achieve domain generalization, i.e., the ability to generalize to domains that are unseen during training. Achieving this goal would, in principle, contribute to a universal natural language interface that allows users to interact with data in arbitrary domains. The major challenge towards domain generalization (Wang et al. 2020a; Cao et al. 2021; Wang et al. 2022; Cai et al. 2021; Hui et al. 2022) is that generating structure-rich SQLs requires (potentially multi-hop) reasoning, i.e., the ability to properly contextualize a user question against a given database by considering many explicit relations (e.g., table-column relations specified by the database schema) and implicit relations (e.g., whether a phrase refers to a column or a table). Figure 1 shows an introductory example of multi-hop reasoning in text-to-SQL parsing and Figure 6 presents two more detailed cases.
[Figure 2: Graphical illustration of existing methods and the proposed variants: (a) RATSQL [pre-trained BERT encoder → graph-based module → randomly initialized decoder]; (b) T5 [pre-trained T5 encoder → pre-trained T5 decoder]; (c) GNN-T5 [pre-trained T5 encoder → graph-based module → pre-trained T5 decoder]; (d) GRAPHIX-T5 [semi-pre-trained Graphix module → pre-trained T5 decoder].]

From the modeling perspective, there are two critical dimensions along which we can differentiate current text-to-SQL parsers. The first is how to effectively imbue relational structures (both explicit and implicit) in the form of graphs into neural networks, and the second is how to take the most advantage of pre-trained models (e.g., T5 (Raffel et al. 2020)). These two dimensions are inter-connected and form a spectrum of methods. On one end of the spectrum, PICARD (Scholak, Schucher, and Bahdanau 2021) uses the original pre-trained T5 model by linearizing database schemas into sequences, hoping that T5 can successfully capture the underlying relational structures. On the other end of the spectrum, RAT-SQL (Wang et al. 2020a) only utilizes pre-trained encoders (e.g., BERT (Devlin et al. 2019)) and explicitly captures the desired relations via specialized relation-aware models. This framework accommodates relational structures well, but it does not exploit the more powerful encoder-decoder pre-trained models. In this work, we explore the middle ground where encoder-decoder pre-trained models (specifically T5) and relation-aware encodings are deeply coupled in favor of better domain generalization. We first observe that naively adding a relational graph-based module in the middle of T5, resulting in a 'T5-encoder → graph-based module → T5-decoder' architecture (see Figure 2(c), namely GNN-T5), does not work very well on standard benchmarks. Presumably, the deficiency comes from the middle graph-based modules breaking the original information flow inside T5.

In order to address this problem, we present a novel architecture called GRAPHIX-T5 that is capable of effectively modelling relational structure information while maintaining the powerful contextual encoding capability of the pre-trained T5. First, we design a GRAPHIX layer that simultaneously encodes a mixture of semantic and structural information. Concretely, the hidden states of inputs composed of questions and databases are modelled by contextualized semantic encoding, and the structural representation is injected in each transformer layer using a relational GNN block that enhances multi-hop reasoning through message passing (Fang et al. 2020; Velickovic et al. 2018) to capture explicit and implicit relations. Second, we construct a new encoder by stacking the GRAPHIX layers and replacing the original T5 encoder. In each GRAPHIX layer, the parameters of the semantic block are still initialized by T5, in an attempt to maintain the contextualized encoding power of the pre-training. In contrast to the severed GNN-T5 (Figure 2(c)), GRAPHIX-T5 (Figure 2(d)) allows intensive interaction between semantics and structure from the starting layers.

We empirically show the effectiveness of GRAPHIX-T5 on several cross-domain text-to-SQL benchmarks, i.e., SPIDER, SYN, DK and REALISTIC. On these datasets, the proposed model achieves new state-of-the-art performance, substantially outperforming all existing models by large margins. Specifically, GRAPHIX-T5-large surprisingly beats the vanilla T5-3B. Furthermore, we verified that GRAPHIX-T5 also achieves significant improvements in the low-resource and compositional generalization settings, thanks to the introduction of structural bias. It should be noted that, although we only focus on text-to-SQL parsing in this work, we believe that the general methodology of GRAPHIX-T5 can be extended to other structured knowledge grounding tasks, e.g., TableQA (Pasupat and Liang 2015), Data-to-text (Nan et al. 2021) and KBQA (Talmor and Berant 2018).

2 Task Formulation and Notations

2.1 Task Definition

Given a natural language question Q = {q_1, ..., q_{|Q|}} with its corresponding database schema D = ⟨C, T⟩, where C = {c_1, ..., c_{|C|}} and T = {t_1, ..., t_{|T|}} represent columns and tables, and |C| and |T| refer to the number of columns and tables in each database respectively, the goal of text-to-SQL is to generate the corresponding SQL query y.

2.2 Vanilla T5 Architecture

Model Inputs  The most canonical and effective input format for T5 on the text-to-SQL task is the PeteShaw format (Shaw et al. 2021), which unifies the natural language question Q and the database schema D into a joint sequence:

    x = [q_1, \ldots, q_{|Q|} \mid D_{name} \mid t_1 : c_1^{t_1}, \ldots, c_{|C_1|}^{t_1} \mid \ldots \mid t_{|T|} : c_1^{t_{|T|}}, \ldots, c_{|C_{|T|}|}^{t_{|T|}} \mid *],   (1)

where q_i is the i-th token in the question, t_j represents the j-th table in D, and c_k^{t_j} refers to the k-th column in the j-th table. * is the special column token shared by every database, and D_name is the name of the database.
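To make the linearization concrete, the following is a minimal sketch (our illustration, not the authors' released code) of how a question and schema could be serialized into the format of Eq. (1); the database name "pets_1", the lowercase column names, and the exact delimiters are assumptions for this example and may differ from the official implementation.

```python
from typing import Dict, List

def serialize_example(question: str, db_name: str,
                      schema: Dict[str, List[str]]) -> str:
    """Linearize a question and database schema in the spirit of Eq. (1).

    `schema` maps each table name to its list of column names.
    """
    parts = [question, db_name]
    for table, columns in schema.items():
        parts.append(f"{table} : " + " , ".join(columns))
    parts.append("*")  # special column token shared by every database
    return " | ".join(parts)

# Example usage with the schema of Figure 1 (names are illustrative).
schema = {
    "pets": ["petid", "pettype", "pet_age"],
    "has_pet": ["petid", "stuid"],
    "student": ["stuid", "sex", "age"],
}
print(serialize_example(
    "Find the number of dog pets that are raised by female students",
    "pets_1", schema))
```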
Encoder-Decoder Training Mechanism  Following (Shaw et al. 2021), T5 (Raffel et al. 2020) adopts an encoder-decoder mechanism to generate SQLs. First, the bi-directional encoder learns the hidden state h of the input x, then the decoder generates the SQL based on h:

    h = \mathrm{Enc}_{\Theta}(x); \quad y = \mathrm{Dec}_{\Upsilon}(h),   (2)

where Θ and Υ refer to the parameters of the encoder and decoder, and h connects the encoder and the decoder. The model is initialized with pre-trained T5 parameters and optimized with the following objective:

    \max_{\Theta,\Upsilon} \log p_{\Theta,\Upsilon}(y \mid x) = \sum_{i=1}^{|y|} \log p_{\Theta,\Upsilon}(y_i \mid y_{1:i-1}, x),   (3)

where x and y indicate the input and output tokens respectively, and |y| is the length of the generated SQL.
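As a concrete illustration of Eqs. (2)-(3), the hedged sketch below fine-tunes an off-the-shelf T5 checkpoint on one serialized example with the Hugging Face transformers library; the data handling and hyperparameters are simplified relative to the paper's actual training setup, and the example strings reuse the serialization sketched above.

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

source = ("Find the number of dog pets that are raised by female students | pets_1 | "
          "pets : petid , pettype , pet_age | has_pet : petid , stuid | "
          "student : stuid , sex , age | *")
target = ("SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid "
          "JOIN pets AS T3 ON T2.petid = T3.petid WHERE T1.sex = 'F' AND T3.pettype = 'dog'")

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=1024)
labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=128).input_ids

# The model internally computes the autoregressive cross-entropy of Eq. (3).
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
```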
3 Proposed Approach: GRAPHIX-T5

3.1 Model Inputs

Contextual Encoding  We continue to take both questions and database schemas, as depicted in Eq. (1), to encode the contextual information through the original T5.

Graph Construction  The joint input of questions and schemas can be displayed as a heterogeneous graph G = ⟨V, R⟩ consisting of three types of nodes V = Q ∪ C ∪ T and multiple types of relations R = {r_1, ..., r_{|R|}}, where each r_i refers to a one-hop relation between nodes, and a multi-hop relation r_k is defined as a composition of one-hop relations r_k = r_1 ∘ r_2 ∘ ... ∘ r_I, as shown in Figure 1, where I refers to the length of r_k. Inspired by (Wang et al. 2020a; Cao et al. 2021; Qin et al. 2022b; Hui et al. 2022), we enumerate a list of pre-defined relations to connect nodes. The relation sets can be divided into three main categories:

• Schema relations: FOREIGN-KEY, PRIMARY-KEY, and SAME-TABLE pertain to the explicit schema relations that the original T5 cannot obtain from linear inputs.

• Schema linking relations: EXACT-MATCH, PARTIAL-MATCH, and VALUE-MATCH are implicit linking relations between question and schema nodes. A new type of relation, BRIDGE, is introduced.

• Question relations: MODIFIER and ARGUMENT are implicit dependency relations between tokens in a question.

[Figure 3: The circumstances when entities in the question are hard to string-match with the schema items. (a) The NO-MATCH Mode strategy, which fully connects schema nodes with all question token nodes. (b) Our solution, which adds a bridge node to link the question and schema nodes via far fewer edges.]

NO-MATCH Mode vs. BRIDGE Mode  Previous works (Cao et al. 2021; Hui et al. 2022) add dummy edges, called NO-MATCH, to indicate that some question tokens and schema items should be correlated but cannot be linked by the existing string-matching rules. However, as shown in Figure 3, NO-MATCH may lead to the over-smoothing problem (Chen et al. 2020a), since it brings in too many noisy neighbors when computing the attention scores. Suppose there are A question tokens and B schema items that are semantically relevant but not linked by any rule; the number of edges NO-MATCH needs to add is A × B. In contrast, we leverage the special token * as a bridge node, allowing all schema nodes to be reached from the question token nodes while decreasing the number of added edges drastically from A × B to A + B.
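The short sketch below (our illustration, not the released implementation) contrasts the two linking strategies: it builds the edge set for A unmatched question tokens and B unmatched schema items under NO-MATCH (A × B edges) and under BRIDGE (A + B edges routed through the * node).

```python
from itertools import product
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node, relation type, target node)

def no_match_edges(q_tokens: List[str], schema_items: List[str]) -> List[Edge]:
    # Fully connect every unmatched question token to every unmatched schema item.
    return [(q, "NO-MATCH", s) for q, s in product(q_tokens, schema_items)]

def bridge_edges(q_tokens: List[str], schema_items: List[str]) -> List[Edge]:
    # Route all unmatched nodes through the special column token "*" instead.
    bridge = "*"
    return ([(q, "BRIDGE", bridge) for q in q_tokens] +
            [(s, "BRIDGE", bridge) for s in schema_items])

q = ["paper", "ids", "names"]                 # A = 3 unmatched question tokens
s = ["document_id", "document_name"]          # B = 2 unmatched schema items
assert len(no_match_edges(q, s)) == len(q) * len(s)   # A x B = 6
assert len(bridge_edges(q, s)) == len(q) + len(s)     # A + B = 5
```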
3.2 GRAPHIX Layer

The GRAPHIX layer is designed to integrate the semantic information obtained from each transformer block with the structural information of a relational graph neural network (GNN) block.

Semantic Representation  The semantic representations of the hidden states are first encoded by a Transformer (Vaswani et al. 2017) block, which contains two important components: a Multi-Head Self-Attention network (MHA) and a Fully-Connected Feed-Forward Network (FFN). In the l-th GRAPHIX layer, the hidden states are H_S^{(l)} = {h_1^{(l)}, ..., h_N^{(l)}}, where N is the maximum length of the inputs. MHA first maps a query matrix Q ∈ R^{m×d_k} and key and value matrices K ∈ R^{n×d_k}, V ∈ R^{n×d_v} into an attention vector via the self-attention mechanism of Eq. (4):

    \mathrm{Attn}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V,   (4)

in which m is the number of query vectors and n is the number of key or value vectors. MHA executes the self-attention over h heads, with each head i independently parameterized by W_i^Q ∈ R^{d_m×d_k}, W_i^K ∈ R^{d_m×d_k}, W_i^V ∈ R^{d_m×d_v}, mapping the inputs into queries and key-value pairs. Usually d_k = d_v = d_m/h in the transformer blocks of T5, where d_m denotes the hidden dimension of T5. MHA then calculates the attention output of each head and concatenates them as follows:

    \mathrm{head}_i = \mathrm{Attn}(Q W_i^Q, K W_i^K, V W_i^V),   (5)

    \mathrm{MHA}(H_S^{(l)}) = \mathrm{Concat}(\mathrm{head}_1, \cdots, \mathrm{head}_h)\, W^O,   (6)

    \hat{H}_S^{(l)} = \mathrm{MHA}(H_S^{(l)}),   (7)

where W^O ∈ R^{h d_v × d_m} is a trainable parameter matrix. The semantic hidden states then pass through the other component, i.e., the FFN, which is applied as in Eq. (8):

    \mathrm{FFN}(\hat{H}_S^{(l)}) = \max\left(0, \hat{H}_S^{(l)} W_1 + b_1\right) W_2 + b_2,   (8)

where the linear weight matrices are W_1 ∈ R^{d_m×d_{ff}} and W_2 ∈ R^{d_{ff}×d_m} respectively. Experimentally, a larger d_{ff} is preferred, usually set as d_{ff} = 4 d_m. Eventually, the semantic hidden states are acquired after layer normalization and a residual connection:

    \tilde{H}_S^{(l)} = \mathrm{LayerNorm}(\hat{H}_S^{(l)} + \mathrm{FFN}(\hat{H}_S^{(l)})).   (9)

Structural Representation  In each GRAPHIX layer, structural representations are produced through a relational graph attention network (RGAT) (Wang et al. 2020b) over the pre-defined question-schema heterogeneous graph. Formally, given the initial node embedding¹ e_i^{init} of the i-th node and its j-th neighbor e_j^{init}, linked by a specific type of relation, the update is computed as:

    \tilde{\alpha}_{ij} = \frac{e_i^{init} \tilde{W}^Q \left(e_j^{init} \tilde{W}^K + \phi(r_{ij})\right)^\top}{\sqrt{d_z}},   (10)

    \alpha_{ij} = \mathrm{softmax}_j(\tilde{\alpha}_{ij}),   (11)

    \hat{e}_i^{init} = \sum_{j \in \tilde{\mathcal{N}}_i} \alpha_{ij} \left(e_j^{init} \tilde{W}^V + \phi(r_{ij})\right),   (12)

    \hat{e}_i^{(l)} = \mathrm{LayerNorm}(e_i^{init} + \hat{e}_i^{init} \tilde{W}^O),   (13)

    \tilde{e}_i^{(l)} = \mathrm{LayerNorm}(\hat{e}_i^{(l)} + \mathrm{FFN}(\hat{e}_i^{(l)})).   (14)

The output node embeddings are then collected as Ẽ_G^{(l)} = {ẽ_1^{(l)}, ..., ẽ_N^{(l)}}, where W̃^Q, W̃^K, W̃^V, W̃^O ∈ R^{d×d} are trainable parameters of the RGAT. φ(r_{ij}) is a mapping function that produces a d-dimensional embedding representing the relation between the i-th node and the j-th node. More importantly, Ñ_i denotes the relational receptive field, i.e., the set of neighbors of the i-th node that the RGAT considers when updating the representation of each node via message passing.

¹ Various initialization strategies could be implemented. In this work, we initialize the node embeddings with their semantic representations.
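To make Eqs. (10)-(14) concrete, the following is a minimal single-head PyTorch sketch of a relation-aware attention update over dense per-pair relation embeddings. It is our simplified illustration under assumed tensor shapes, not the released RGAT implementation (which is multi-headed and operates on sparse neighborhoods).

```python
import math
import torch
import torch.nn as nn

class SimpleRGATLayer(nn.Module):
    """One-head relational graph attention in the spirit of Eqs. (10)-(14)."""

    def __init__(self, d: int, num_relations: int, d_ff: int):
        super().__init__()
        self.wq, self.wk, self.wv, self.wo = (nn.Linear(d, d, bias=False) for _ in range(4))
        self.rel_emb = nn.Embedding(num_relations, d)          # phi(r_ij)
        self.ffn = nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d))
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, e: torch.Tensor, rel: torch.LongTensor, adj: torch.Tensor):
        # e: (N, d) node embeddings; rel: (N, N) relation ids; adj: (N, N) 0/1 neighbor mask.
        phi = self.rel_emb(rel)                                 # (N, N, d)
        q, k, v = self.wq(e), self.wk(e), self.wv(e)
        # Eq. (10): relation-aware attention scores q_i . (k_j + phi_ij).
        scores = torch.einsum("id,ijd->ij", q, k.unsqueeze(0) + phi) / math.sqrt(e.size(-1))
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)                   # Eq. (11), over neighbors only
        # Eq. (12): aggregate relation-augmented values over the receptive field.
        agg = torch.einsum("ij,ijd->id", alpha, v.unsqueeze(0) + phi)
        h = self.ln1(e + self.wo(agg))                          # Eq. (13)
        return self.ln2(h + self.ffn(h))                        # Eq. (14)
```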
Joint Representation  After computing representations from both the semantic and the structural space, the l-th GRAPHIX layer employs a mixture of semantic and structural information to enable information integration as follows:

    \tilde{H}_M^{(l)} = \tilde{H}_S^{(l)} + \tilde{E}_G^{(l)}.   (15)
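As a rough sketch of how a GRAPHIX layer could combine the two streams in Eq. (15), the snippet below wires a generic semantic transformer block together with the relational block sketched above. The actual model splices this into the T5 encoder stack and initializes the semantic block from pre-trained T5 weights, which is not shown here; the `semantic_block` argument is an assumed stand-in for that block.

```python
import torch
import torch.nn as nn

class GraphixLayer(nn.Module):
    """One GRAPHIX layer: semantic block output + structural RGAT output (Eq. 15)."""

    def __init__(self, semantic_block: nn.Module, d: int, num_relations: int, d_ff: int):
        super().__init__()
        # `semantic_block` stands in for a pre-trained T5 encoder block (Eqs. 4-9).
        self.semantic_block = semantic_block
        self.structural_block = SimpleRGATLayer(d, num_relations, d_ff)

    def forward(self, hidden: torch.Tensor, rel: torch.LongTensor, adj: torch.Tensor):
        h_sem = self.semantic_block(hidden)                  # H~_S^(l)
        e_struct = self.structural_block(hidden, rel, adj)   # E~_G^(l)
        return h_sem + e_struct                              # H~_M^(l), Eq. (15)
```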
3.3 GRAPHIX-T5

Here we present the entire GRAPHIX-T5 model formally. The hidden states of the last layer of the GRAPHIX encoder can be represented as:

    h = \mathrm{Enc}_{\Theta,\Psi}(x, G),   (16)

where G is the question-schema heterogeneous graph and Ψ are the additional parameters of the RGAT, which are initialized randomly. In order to preserve the pre-trained semantic knowledge, we migrate the parameters Θ from the original T5 encoder as the initial parameters of the semantic transformer block of each GRAPHIX layer.

3.4 Training

Similar to the original T5, we also follow a fine-tuning strategy. The whole training framework optimizes the following log-likelihood:

    \max_{\Theta,\Upsilon,\Psi} \log p_{\Theta,\Upsilon,\Psi}(y \mid x) = \sum_{i=1}^{|y|} \log p_{\Theta,\Upsilon,\Psi}(y_i \mid y_{1:i-1}, x, G).   (17)

4 Experiment

4.1 Setup

Datasets and Settings  We conduct extensive experiments on four challenging benchmarks for cross-domain text-to-SQL and two additional training settings. (1) SPIDER (Yu et al. 2018b) is a large-scale cross-domain text-to-SQL benchmark that also incorporates 9 previous classic datasets, e.g., Scholar (Iyer et al. 2017), WikiSQL (Zhong, Xiong, and Socher 2017), GeoQuery (Zelle and Mooney 1996), etc. It contains 8659 training examples and 1034 development examples, covering 200 complex databases across 138 domains. The test set is not available for individual review. (2) SYN (Gan et al. 2021a) replaces the simple string-matched question tokens or schema names with their synonyms. (3) DK (Gan, Chen, and Purver 2021) requires text-to-SQL parsers to be equipped with the capability of domain knowledge reasoning. (4) REALISTIC removes and switches the obvious mentions of schema items in questions, making it closer to real scenarios. Furthermore, we also test the compositional generalization ability of our model on SPIDER-SSP (Shaw et al. 2021) with three splits of SPIDER: Spider-Length (split based on varying lengths), Spider-TMCD (Target Maximum Compound Divergence) and Spider-Template (split based on different parsing templates). Finally, the performance of GRAPHIX-T5 in the LOW-RESOURCE setting is evaluated with 10%, 20%, and 50% of the training data separately.

Evaluation Metrics  Following (Yu et al. 2018b), Exact Match (EM) and Execution Accuracy (EX) are the two standard metrics we use to measure the performance of our model. EM evaluates how closely a generated SQL matches the gold SQL, while EX reflects whether a predicted SQL is valid and returns exactly the result desired by the user.
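As an illustration of the EX metric, a predicted query can be checked against the gold query by executing both on the target SQLite database, as in the hedged sketch below; this is our simplification, not the official Spider evaluation script, which additionally normalizes values and handles ordering.

```python
import sqlite3
from collections import Counter

def execution_match(pred_sql: str, gold_sql: str, db_path: str) -> bool:
    """Return True if both queries run and produce the same multiset of rows."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False  # an invalid predicted SQL counts as an execution miss
    finally:
        conn.close()
    # Order-insensitive comparison of the returned rows.
    return Counter(pred_rows) == Counter(gold_rows)
```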
Implementation Details  We implement our code² mainly based on the Hugging Face transformers library (Wolf et al. 2020)³. We set the maximum input length to 1024, the maximum generation length to 128, and the batch size to 32. We adopt Adafactor (Shazeer and Stern 2018) as our primary optimizer with a linearly decayed learning rate of 5e-5. During the experiments, GRAPHIX layers are mainly injected into the encoder to learn better representations for structural generalization. We evaluate the effectiveness of GRAPHIX-T5 across two main versions: T5-large, with approximately 800M parameters, and T5-3B, with more than 3 billion parameters. All experiments are conducted on one NVIDIA Tesla A100, which is available to most research centers.

² https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/graphix
³ https://huggingface.co/

Compared Methods  Our model is compared mainly to mainstream strong baseline models such as GNNSQL (Bogin, Berant, and Gardner 2019), RATSQL (Wang et al. 2020a), GAZP (Zhong et al. 2020), BRIDGE (Chen et al. 2020b), SMBOP (Rubin and Berant 2021), NatSQL (Gan et al. 2021b), LGESQL (Cao et al. 2021), S2SQL (Hui et al. 2022) and T5+PICARD (Scholak, Schucher, and Bahdanau 2021) across the disparate datasets and settings.

4.2 Overall Performance

MODEL                              EM           EX
RAT-SQL + BERT ♥                   69.7         -
RAT-SQL + Grappa ♥                 73.9         -
GAZP + BERT                        59.1         59.2
BRIDGE v2 + BERT                   70.0         68.3
NatSQL + GAP                       73.7         75.0
SMBOP + GRAPPA                     74.7         75.0
LGESQL + ELECTRA ♥                 75.1         -
S2SQL + ELECTRA ♥                  76.4         -
T5-large                           67.0         69.3
GRAPHIX-T5-large                   72.7 (↑5.7)  75.9 (↑6.6)
T5-large + PICARD ♣                69.1         72.9
GRAPHIX-T5-large + PICARD ♣        76.6 (↑7.5)  80.5 (↑7.6)
T5-3B                              71.5         74.4
GRAPHIX-T5-3B                      75.6 (↑4.1)  78.2 (↑3.8)
T5-3B + PICARD ♣                   75.5         79.3
GRAPHIX-T5-3B + PICARD ♣           77.1 (↑1.6)  81.0 (↑1.7)

Table 1: Exact match (EM) and execution (EX) accuracy (%) on the SPIDER development set. ♥ means the model does not predict SQL values. ♣ means the model uses the constrained decoding of PICARD. ↑ is an absolute improvement.

Results on SPIDER  Table 1 displays the performance of GRAPHIX-T5 and other competitive baseline models on the official SPIDER benchmark. First, GRAPHIX-T5-3B with the constrained decoding module PICARD (Scholak, Schucher, and Bahdanau 2021) achieves the state of the art on this challenging cross-domain text-to-SQL benchmark. Also, it is evident that GRAPHIX-T5 is vastly superior to the vanilla T5 at both the large and 3B scales by a significant margin, which indicates that the structural generalization capability of the GRAPHIX layer is crucial for a text-to-text PLM such as T5 to perform the text-to-SQL task.

MODEL                    SYN          DK           REALISTIC
GNN                      23.6         26.0         -
IRNet                    28.4         33.1         -
RAT-SQL                  33.6         35.8         -
RAT-SQL + BERT           48.2         40.9         58.1
RAT-SQL + Grappa         49.1         38.5         59.3
LGESQL + ELECTRA         64.6         48.4         69.2
T5-large                 53.6         40.0         58.5
GRAPHIX-T5-large         61.1 (↑7.5)  48.6 (↑8.6)  67.3 (↑8.8)
T5-3B                    58.0         46.9         62.0
GRAPHIX-T5-3B            66.9 (↑8.9)  51.2 (↑4.3)  72.4 (↑10.4)

Table 2: Exact match (EM) accuracy (%) on the SYN, DK and REALISTIC benchmarks.

Zero-shot Results on More Challenging Settings  As shown in Table 2, we further demonstrate the robustness of GRAPHIX-T5 when it confronts the more challenging and more realistic evaluations of SYN, DK and REALISTIC without any additional training. First of all, the results show that GRAPHIX-T5-3B outperforms the other baseline models across all three datasets. Furthermore, we observe that GRAPHIX-T5-large and GRAPHIX-T5-3B surpass vanilla T5-large and T5-3B by a clear margin, respectively. This demonstrates that vanilla T5 is hungry for structural reasoning when dealing with the more flexible and complicated questions of real-world text-to-SQL scenarios, and that GRAPHIX can mitigate this problem.

MODEL              TEMPLATE     LENGTH       TMCD
T5-base            59.3         49.0         60.9
T5-3B              64.8         56.7         69.6
NQG-T5-3B          64.7         56.7         69.5
GRAPHIX-T5-3B      70.1 (↑5.4)  60.6 (↑3.9)  73.8 (↑4.3)

Table 3: Exact match (EM) accuracy (%) on the compositional dataset SPIDER-SSP.

Results on Compositional Generalization  As shown in Table 3, on SPIDER-SSP the grammar-based inductive T5 model provided by (Shaw et al. 2021), named NQG-T5, has no obvious advantage over vanilla T5, which indicates that the grammar of natural language does not help T5 with compositional generalization. However, GRAPHIX-T5 helps T5 gain SQL knowledge and makes it less vulnerable to these modifications through the effective fusion of structural information.
[Figure 4: Exact match (EM) (left) and execution (EX) (right) accuracy (%) on the SPIDER low-resource setting.]

Results on Low-resource Settings  Figure 4 records the performance of GRAPHIX-T5-large and T5-large under different low-resource settings. It shows that 1) in each low-resource setting, GRAPHIX-T5-large performs considerably better than vanilla T5-large, demonstrating that human-designed structural knowledge can compensate for the inadequate learning caused by low-resource data (Ye et al. 2022); and 2) notably, GRAPHIX-T5-large performs obviously better than vanilla T5-large trained on 100% of the data even when using only 50% of it. This further verifies the strength of GRAPHIX-T5 when training in low-data regimes.

MODEL            | SPIDER                        | SYN                           | DK                            | REALISTIC
                 | easy  medium hard  extra all  | easy  medium hard  extra all  | easy  medium hard  extra all  | easy  medium hard  extra all
T5-large         | 85.5  70.9   55.2  41.6  67.0 | 69.0  56.8   46.3  30.2  53.6 | 64.1  44.3   22.9  18.1  40.0 | 79.8  68.0   44.4  28.9  58.5
GRAPHIX-T5-large | 89.9  78.7   59.8  44.0  72.6 | 75.8  67.5   50.6  33.1  61.1 | 63.6  54.5   33.8  29.5  48.6 | 88.1  77.3   50.5  40.2  67.3
T5-3B            | 89.5  78.3   58.6  40.4  71.6 | 74.2  64.5   48.0  27.8  58.0 | 69.9  53.5   24.3  24.8  46.9 | 85.3  73.4   46.5  27.8  62.0
GRAPHIX-T5-3B    | 91.9  81.6   61.5  50.0  75.6 | 80.6  73.1   52.9  44.6  66.9 | 69.1  55.3   39.2  31.4  51.2 | 93.6  85.7   52.5  41.2  72.4

Table 4: Exact match (EM) accuracy by varying the difficulty level of the inference data on the four benchmarks.

Results on Complex Queries  As presented in Table 4, we also compare the more fine-grained performance of GRAPHIX-T5 against vanilla T5 across the four SQL difficulty levels defined officially by SPIDER, in order to better understand the performance improvements. We observe that GRAPHIX-T5 is more capable of handling harder text-to-SQL cases, as illustrated by the Hard and Extra-hard examples, indicating that training with structural bias is beneficial for text-to-text PLMs reasoning over complex scenarios.

4.3 Ablation Study

MODEL                          EM     EX
(a) RAT-SQL + BERT             69.7   -
(b) T5-large                   67.0   69.3
(c) GNN-T5-large               51.6   54.5
(d) GRAPHIX-T5-large
    w/ BRIDGE Mode             72.7   75.9
    w/ NO-MATCH Mode           71.1   74.2
    w/ DOUBLE-GRAPH            72.0   74.7

Table 5: Ablation study of the variant GNN + PLM tactics on cross-domain text-to-SQL, echoing Figure 2: (a) is RAT-SQL, (b) is vanilla T5, (c) is GNN-T5 and (d) is GRAPHIX.

[Figure 5: Validation-set performance (accuracy (%) vs. training step (k)) during the convergence of GRAPHIX-T5 and GNN-T5 on SPIDER. It clearly shows that GNN-T5 performs extremely poorly, due to catastrophic forgetting.]

As shown in Table 5, to better validate the function of each component of GRAPHIX-T5, ablation studies are performed on the large version and are expected to answer the following questions.

[1] How effective is BRIDGE Mode?  GRAPHIX-T5-large with BRIDGE Mode achieves better performance than with NO-MATCH Mode. This indicates that NO-MATCH mode greatly increases the number of noisy neighbors, resulting in a higher risk of over-smoothing (Chen et al. 2020a).

[2] Could GRAPHIX be incorporated into the decoder?  DOUBLE-GRAPH means that GRAPHIX-T5 incorporates GRAPHIX layers into both the encoder and the decoder. The results reveal that adding GRAPHIX layers to the decoder does not lead to any improvement. The decoder is an auto-regressive model that only considers the history tokens when generating the current token; GRAPHIX, which can propagate information about future tokens through global linking, may disrupt this characteristic and negatively impact the decoder. Therefore, we propose that the best tactic is to incorporate GRAPHIX layers only into the encoder.

[3] Is GRAPHIX superior to other architecture variants?  Echoing Figure 2, we assess the performance of the 4 categories of models using PLMs on SPIDER. According to Table 5 (c), the performance of GNN-T5 decreases by roughly 20% compared to GRAPHIX-T5, proving the GNN-T5 training strategy to be ineffective. Moreover, we notice that such a severed GNN-T5 encounters a catastrophic forgetting problem (French 1999) during training. Since the accuracy of GNN-T5 remains 0 during the first thousands of steps, as shown in Figure 5, it is evident that almost all pre-trained knowledge from T5 is forgotten. After convergence, the performance of GNN-T5 is significantly lower than that of GRAPHIX-T5, indicating that only a small portion of the semantic information from T5 has been utilized. In contrast, GRAPHIX-T5 achieves almost 50% accuracy within the first 1000 training steps and more than a 20% improvement over GNN-T5 after convergence, which verifies that GRAPHIX-T5 avoids catastrophic forgetting and augments generalization capability.
4.4 Case Study

[Figure 6: Case study: two illustrative cases sampled randomly from SYN, showing that multi-hop reasoning helps GRAPHIX-T5 generate SQLs that are correct in terms of both semantic meaning and database schema structure.
Case 1 — Question: "List paper IDs, paper names, and paper descriptions for all papers."
T5-3B: SELECT paper_id, paper_name, paper_description FROM documents
Graphix-T5-3B: SELECT document_id, document_name, document_description FROM documents
Gold: SELECT document_id, document_name, document_description FROM documents
Multi-hop paths: paper --Modifier--> ids --Partial-Match--> document_id; paper --Modifier--> name --Partial-Match--> document_name; paper --Modifier--> description --Partial-Match--> document_description.
Case 2 — Question: "How many French car manufacturers are there?"
T5-3B: SELECT COUNT(*) FROM car_makers WHERE country = "France"
Graphix-T5-3B: SELECT COUNT(*) FROM car_makers AS T1 JOIN countries AS T2 ON T1.country = T2.countryid WHERE T2.countryname = "France"
Gold: SELECT COUNT(*) FROM car_makers AS T1 JOIN countries AS T2 ON T1.country = T2.countryid WHERE T2.countryname = 'France';
Multi-hop paths: French --Value-Match--> countryname --Same-Table--> countryid --Foreign-Key--> country; French --Value-Match--> countryname --Belongs-To--> countries.]

To illustrate the effectiveness of GRAPHIX qualitatively, two examples, sampled randomly from SYN, are displayed in Figure 6, which compares the SQLs predicted by vanilla T5-3B and GRAPHIX-T5-3B. We can observe that GRAPHIX generates correct SQLs even in hard scenarios. This is because, even with only a small number of overlapping keywords, GRAPHIX-T5 can accurately identify the counterpart column or table objects and generate a high-quality SQL through multi-hop reasoning and structural grounding. For example, in the first case, vanilla T5-3B picks the incorrect columns paper_id, paper_name, and paper_description, which do not even appear in the table documents. This implies that vanilla T5-3B is unable to reach the target schema elements without the capability of structural grounding when confronting challenging text-to-SQL cases. Instead, GRAPHIX-T5-3B can map the question entities to the correct column names through the multi-hop paths presented in Figure 6. In the second case, vanilla T5-3B misidentifies country as the target column; however, "France" only appears in the column countryname of the table countries. This suggests that T5-3B is only able to generate semantically plausible SQLs and fails to take the real database structure into account. On the contrary, GRAPHIX-T5 produces truly valid SQLs in terms of both the questions and the databases via a successful mixture of semantic and structural information during training.

5 Related Works

The basic principle of a cross-domain text-to-SQL parser is to build an encoder that learns the representations of the questions and schemas, while employing a decoder to generate SQLs from the information learnt by the encoder (Qin et al. 2022a). In particular, IRNET (Guo et al. 2019) proposes an encoder that learns the representations of questions and schemas via an attention-based Bi-LSTM and a decoder that predicts SQLs from encoded intermediate representations. Later, graph-based encoders were successfully proven effective on text-to-SQL tasks; for example, some works (Bogin, Berant, and Gardner 2019; Chen et al. 2021) construct the schema graph to enhance the input representations. RATSQL (Wang et al. 2020a), SDSQL (Hui et al. 2021b), LGESQL (Cao et al. 2021) and S2SQL (Hui et al. 2022) further improve structural reasoning by modelling relations between the schema and the questions. R2SQL (Hui et al. 2021a), SCORE (Yu et al. 2021) and STAR (Cai et al. 2022) enhance structural reasoning for context-dependent text-to-SQL parsing. In these works, the PLM independently builds the semantic features, and a graph-based module subsequently injects the structural information; however, such a training strategy is only effective for encoder-based PLMs (e.g., BERT (Devlin et al. 2019), ELECTRA (Clark et al. 2020)). Recently, the text-to-text PLM T5 has been proven effective for text-to-SQL (Shaw et al. 2021; Qin et al. 2022c). Besides, (Scholak, Schucher, and Bahdanau 2021) designs a constrained decoding process, namely PICARD, to detect and reject erroneous tokens during the beam-search phase. Xie et al. (2022) further injects knowledge from other structured knowledge grounding tasks into T5 via multi-task learning to boost performance on text-to-SQL. Despite their effectiveness, these methods still struggle to generate SQLs in the more challenging and complex scenarios without explicit and implicit structural information. GRAPHIX-T5 overcomes this issue by augmenting the encoder with graph representation learning. Concurrently, RASAT (Qi et al. 2022) also attempts to provide T5 with structural information by adding edge embeddings into the multi-head self-attention, whereas we keep the pre-trained transformers complete in order to benefit the most from prior semantic knowledge, which leads to better performance.

6 Conclusion

In this paper, we proposed an effective architecture to boost the structural encoding capability of T5 cohesively while keeping the pre-trained T5's potent contextual encoding ability. In order to achieve this goal, we designed a graph-aware, semi-pretrained text-to-text PLM, namely GRAPHIX-T5, to augment multi-hop reasoning for the challenging text-to-SQL task. The results of extensive experiments demonstrate the effectiveness of GRAPHIX-T5, proving that structural information is crucial for current text-to-text PLMs on complicated text-to-SQL cases.
Acknowledgement

We thank Dr. Tao Yu and Tianbao Xie for the evaluation of our work on the SPIDER leaderboard. We thank Dr. Bailin Wang and Dr. Bowen Li for constructive suggestions. Reynold Cheng, Jinyang Li, Nan Huo, and Wenyu Du were supported by the University of Hong Kong (Project 104006830), the Guangdong–Hong Kong-Macau Joint Laboratory Program 2020 (Project No: 2020B1212030009), and the Innovation Wing Two Research fund. Jinyang Li was also supported by the HKU Presidential PhD Scholar Programme and Alibaba Group through the Alibaba Research Intern Program. Chenhao Ma was supported in part by Shenzhen Science and Technology Program under grant No. ZDSYS20211021111415025.

References

Bogin, B.; Berant, J.; and Gardner, M. 2019. Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing. In Proc. of ACL.
Cai, R.; Xu, B.; Zhang, Z.; Yang, X.; Li, Z.; and Liang, Z. 2018. An Encoder-Decoder Framework Translating Natural Language to Database Queries. In Proc. of IJCAI.
Cai, R.; Yuan, J.; Xu, B.; and Hao, Z. 2021. SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL. In Proc. of NeurIPS.
Cai, Z.; Li, X.; Hui, B.; Yang, M.; Li, B.; Li, B.; Cao, Z.; Li, W.; Huang, F.; Si, L.; and Li, Y. 2022. STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing. In Proc. of EMNLP Findings.
Cao, R.; Chen, L.; Chen, Z.; Zhao, Y.; Zhu, S.; and Yu, K. 2021. LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations. In Proc. of ACL.
Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; and Sun, X. 2020a. Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View. In Proc. of AAAI.
Chen, X.; Meng, F.; Li, P.; Chen, F.; Xu, S.; Xu, B.; and Zhou, J. 2020b. Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation. In Proc. of EMNLP.
Chen, Z.; Chen, L.; Zhao, Y.; Cao, R.; Xu, Z.; Zhu, S.; and Yu, K. 2021. ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser. In Proc. of NAACL.
Clark, K.; Luong, M.; Le, Q. V.; and Manning, C. D. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proc. of ICLR.
Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. of NAACL.
Fang, Y.; Sun, S.; Gan, Z.; Pillai, R.; Wang, S.; and Liu, J. 2020. Hierarchical Graph Network for Multi-hop Question Answering. In Proc. of EMNLP.
French, R. M. 1999. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences.
Gan, Y.; Chen, X.; Huang, Q.; Purver, M.; Woodward, J. R.; Xie, J.; and Huang, P. 2021a. Towards Robustness of Text-to-SQL Models against Synonym Substitution. In Proc. of ACL.
Gan, Y.; Chen, X.; and Purver, M. 2021. Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization. In Proc. of EMNLP.
Gan, Y.; Chen, X.; Xie, J.; Purver, M.; Woodward, J. R.; Drake, J.; and Zhang, Q. 2021b. Natural SQL: Making SQL Easier to Infer from Natural Language Specifications. In Proc. of EMNLP Findings.
Guo, J.; Zhan, Z.; Gao, Y.; Xiao, Y.; Lou, J.-G.; Liu, T.; and Zhang, D. 2019. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation. In Proc. of ACL.
Hui, B.; Geng, R.; Ren, Q.; Li, B.; Li, Y.; Sun, J.; Huang, F.; Si, L.; Zhu, P.; and Zhu, X. 2021a. Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. In Proc. of AAAI.
Hui, B.; Geng, R.; Wang, L.; Qin, B.; Li, Y.; Li, B.; Sun, J.; and Li, Y. 2022. S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers. In Proc. of ACL Findings.
Hui, B.; Shi, X.; Geng, R.; Li, B.; Li, Y.; Sun, J.; and Zhu, X. 2021b. Improving Text-to-SQL with Schema Dependency Learning. In arXiv:2103.04399.
Iyer, S.; Konstas, I.; Cheung, A.; Krishnamurthy, J.; and Zettlemoyer, L. 2017. Learning a Neural Semantic Parser from User Feedback. In Proc. of ACL.
Nan, L.; Radev, D.; Zhang, R.; Rau, A.; Sivaprasad, A.; Hsieh, C.; Tang, X.; Vyas, A.; Verma, N.; Krishna, P.; Liu, Y.; Irwanto, N.; Pan, J.; Rahman, F.; Zaidi, A.; Mutuma, M.; Tarabar, Y.; Gupta, A.; Yu, T.; Tan, Y. C.; Lin, X. V.; Xiong, C.; Socher, R.; and Rajani, N. F. 2021. DART: Open-Domain Structured Data Record to Text Generation. In Proc. of NAACL.
Pasupat, P.; and Liang, P. 2015. Compositional Semantic Parsing on Semi-Structured Tables. In Proc. of ACL.
Qi, J.; Tang, J.; He, Z.; Wan, X.; Cheng, Y.; Zhou, C.; Wang, X.; Zhang, Q.; and Lin, Z. 2022. RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL. In Proc. of EMNLP.
Qin, B.; Hui, B.; Wang, L.; Yang, M.; Li, J.; Li, B.; Geng, R.; Cao, R.; Sun, J.; Si, L.; Huang, F.; and Li, Y. 2022a. A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions. In arXiv:2208.13629.
Qin, B.; Wang, L.; Hui, B.; Geng, R.; Cao, Z.; Yang, M.; Sun, J.; and Li, Y. 2022b. Linking-Enhanced Pre-Training for Table Semantic Parsing. In arXiv:2111.09486.
Qin, B.; Wang, L.; Hui, B.; Li, B.; Wei, X.; Li, B.; Huang, F.; Si, L.; Yang, M.; and Li, Y. 2022c. SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers. In Proc. of COLING.
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research.
Rubin, O.; and Berant, J. 2021. SmBoP: Semi-autoregressive Bottom-up Semantic Parsing. In Proc. of NAACL.
Scholak, T.; Schucher, N.; and Bahdanau, D. 2021. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. In Proc. of EMNLP.
Shaw, P.; Chang, M.-W.; Pasupat, P.; and Toutanova, K. 2021. Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? In Proc. of ACL.
Shazeer, N.; and Stern, M. 2018. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. In Proc. of ICML.
Talmor, A.; and Berant, J. 2018. The Web as a Knowledge-Base for Answering Complex Questions. In Proc. of NAACL.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2017. Attention is All you Need. In Proc. of NeurIPS.
Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In Proc. of ICLR.
Wang, B.; Shin, R.; Liu, X.; Polozov, O.; and Richardson, M.
2020a. RAT-SQL: Relation-Aware Schema Encoding and
Linking for Text-to-SQL Parsers. In Proc. of ACL.
Wang, K.; Shen, W.; Yang, Y.; Quan, X.; and Wang, R.
2020b. Relational Graph Attention Network for Aspect-
based Sentiment Analysis. In Proc. of ACL.
Wang, L.; Qin, B.; Hui, B.; Li, B.; Yang, M.; Wang, B.;
Li, B.; Huang, F.; Si, L.; and Li, Y. 2022. Proton: Prob-
ing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing. In Proc. of KDD.
Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.;
Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; Davi-
son, J.; Shleifer, S.; von Platen, P.; Ma, C.; Jernite, Y.; Plu,
J.; Xu, C.; Le Scao, T.; Gugger, S.; Drame, M.; Lhoest, Q.;
and Rush, A. 2020. Transformers: State-of-the-Art Natural
Language Processing. In Proc. of EMNLP.
Xie, T.; Wu, C. H.; Shi, P.; Zhong, R.; Scholak, T.; Yasunaga,
M.; Wu, C.-S.; Zhong, M.; Yin, P.; Wang, S. I.; Zhong, V.;
Wang, B.; Li, C.; Boyle, C.; Ni, A.; Yao, Z.; Radev, D.;
Xiong, C.; Kong, L.; Zhang, R.; Smith, N. A.; Zettlemoyer,
L.; and Yu, T. 2022. UnifiedSKG: Unifying and Multi-
Tasking Structured Knowledge Grounding with Text-to-Text
Language Models. ArXiv preprint.
Xu, X.; Liu, C.; and Song, D. 2017. Sqlnet: Generating
structured queries from natural language without reinforce-
ment learning. ArXiv preprint.
Yaghmazadeh, N.; Wang, Y.; Dillig, I.; and Dillig, T. 2017.
SQLizer: query synthesis from natural language. Proceed-
ings of the ACM on Programming Languages.
Ye, H.; Zhang, N.; Deng, S.; Chen, X.; Chen, H.; Xiong, F.; Chen, X.; and Chen, H. 2022. Ontology-enhanced Prompt-tuning for Few-shot Learning. In Proc. of WWW.
Yu, T.; Li, Z.; Zhang, Z.; Zhang, R.; and Radev, D. 2018a. TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation. In Proc. of NAACL.
Yu, T.; Zhang, R.; Polozov, A.; Meek, C.; and Awadallah, A. H. 2021. SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing. In Proc. of ICLR.
Yu, T.; Zhang, R.; Yang, K.; Yasunaga, M.; Wang, D.; Li, Z.; Ma, J.; Li, I.; Yao, Q.; Roman, S.; Zhang, Z.; and Radev, D. 2018b. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In Proc. of EMNLP.
Zelle, J. M.; and Mooney, R. J. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. In Proc. of AAAI.
Zhong, V.; Lewis, M.; Wang, S. I.; and Zettlemoyer, L. 2020. Grounded Adaptation for Zero-shot Executable Semantic Parsing. In Proc. of EMNLP.
Zhong, V.; Xiong, C.; and Socher, R. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. In CoRR abs/1709.00103.
Source x    Target y    Relation Type     Description
Question    Question    MODIFIER          y is a modifier of x.
Question    Question    ARGUMENT          y is the source token of x under a syntax dependency other than modifier.
Question    Question    DISTANCE-1        y is the nearest (1-hop) neighbor of x.
Column      Column      FOREIGN-KEY       y is the foreign key of x.
Column      Column      SAME-TABLE        x and y appear in the same table.
Column      *           BRIDGE            x and y are linked when y is the special column token '*'.
Table       Column      HAS               The column y belongs to the table x.
Table       Column      PRIMARY-KEY       The column y is the primary key of the table x.
Table       *           BRIDGE            x and y are connected when y is the special column token '*'.
Question    Table       EXACT-MATCH       x is part of y, and y is a span of the entire question.
Question    Table       PARTIAL-MATCH     x is part of y, but the entire question does not contain y.
Question    Column      EXACT-MATCH       x is part of y, and y is a span of the entire question.
Question    Column      PARTIAL-MATCH     x is part of y, but the entire question does not contain y.
Question    Column      VALUE-MATCH       x is part of the candidate cell values of column y.
Question    *           BRIDGE            x and y are linked when y is the special column token '*'.

Table 6: The checklist of the main types of relations used in GRAPHIX-T5. All relations above are asymmetric.

A Fine-grained Syntax Relations

Previous works (Cao et al. 2021; Wang et al. 2020a), which employed distance as the only relationship between tokens when constructing a graph, were unable to account for the deterministic relationships between tokens. For example, given two sentences with the same meaning, "List names of students who are not from France." and "What are the names of students whose nationality is not France?", the relation between not and France should be the same in both sentences. However, under the distance-based definition it is represented as two different relations, DISTANCE-2 and DISTANCE-1 respectively, which will lead PLMs to learn wrong relation representations. Even though Hui et al. (2022) proposed Forward and Backward as additional abstract correlations between question tokens, it is still hard to discern the more important relations that are useful for text-to-SQL. In this work, we observe that nouns, together with other tokens that potentially indicate characteristics of those nouns, can help the model find the corresponding database items. To achieve this goal, we manually cluster dependency parsing relations into two new categories of syntax relations: MODIFIER and ARGUMENT. As shown in Table 6, MODIFIER denotes that some property of the source token node is being modified by the target token node. For example, in the phrase Female Students, Female is a modifier of Students; Production is a modifier of the token Time in the phrase Production Time. All other dependency parsing relations are marked as ARGUMENT.
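As an illustration of this clustering (our sketch, not the authors' preprocessing code), a dependency parse can be mapped to MODIFIER/ARGUMENT edges; the set of dependency labels treated as modifier-like below is an assumption for illustration and may differ from the paper's manual clustering.

```python
import spacy

# Dependency labels treated as "modifier-like" in this sketch (assumed).
MODIFIER_DEPS = {"amod", "compound", "nmod", "poss", "nummod"}

nlp = spacy.load("en_core_web_sm")

def question_relations(question: str):
    """Yield (source token, relation, target token) triples for a question."""
    for token in nlp(question):
        if token.dep_ == "ROOT":
            continue
        relation = "MODIFIER" if token.dep_ in MODIFIER_DEPS else "ARGUMENT"
        # Source is the syntactic head; the target (dependent) modifies it.
        yield (token.head.text, relation, token.text)

# "female" modifies "students", so that pair is tagged MODIFIER.
print(list(question_relations("Find the number of dog pets raised by female students")))
```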
B Leaderboard Results

After being equipped with PICARD, GRAPHIX-T5 achieves No. 1 on the SPIDER test leaderboard with a clear margin, as shown in Table 7 and Table 8.

Model                            Dev     Test
w/ Encoder-based PLM
RATSQL + BERT                    69.7    65.6
RATSQL + GRAPPA                  73.4    69.6
GAZP + BERT                      59.1    53.3
BRIDGE v2 + BERT                 70.0    65.0
NatSQL + GAP                     73.7    68.7
SMBOP + GRAPPA                   74.7    69.7
LGESQL + ELECTRA                 75.1    72.0
S2SQL + ELECTRA                  76.4    72.1
w/ Text-to-Text PLM: T5
PICARD + T5-3B                   75.5    71.9
UnifiedSKG + T5-3B               71.8    -
RASAT + PICARD + T5-3B           75.3    70.9
GRAPHIX + PICARD + T5-3B         77.1    74.0

Table 7: Results of Exact Match (EM) on the SPIDER test set.

Model                            Dev     Test
w/ Encoder-based PLM
GAZP + BERT                      59.2    53.5
BRIDGE v2 + BERT                 68.3    64.3
NatSQL + GAP                     75.0    73.3
SMBOP + GRAPPA                   75.0    71.1
w/ Text-to-Text PLM: T5
PICARD + T5-3B                   79.3    75.1
UnifiedSKG + T5-3B               74.4    -
RASAT + PICARD + T5-3B           80.5    75.5
GRAPHIX + PICARD + T5-3B         81.0    77.6

Table 8: Results of Execution (EX) on the SPIDER test set.
