Graphix-T5
Text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it enables users to retrieve information from databases without the need for a technical background. One of the major challenges in text-to-SQL parsing is domain generalization.
Figure 2: Graphical illustration of existing methods: (a) RAT-SQL [pre-trained BERT encoder → graph-based module → randomly initialized decoder]; (b) T5 [pre-trained T5 encoder → pre-trained T5 decoder]; the proposed variant (c) GNN-T5 [pre-trained T5 encoder → graph-based module → pre-trained T5 decoder]; and (d) GRAPHIX-T5 [semi-pre-trained Graphix module → pre-trained T5 decoder].
From the modeling perspective, there are two critical dimensions along which we can differentiate current text-to-SQL parsers. The first is how to effectively imbue relational structures (both explicit and implicit) in the form of graphs into neural networks, and the second is how to take the most advantage of pre-trained models (e.g., T5 (Raffel et al. 2020)). These two dimensions are inter-connected and form a spectrum of methods. On one end of the spectrum, PICARD (Scholak, Schucher, and Bahdanau 2021) uses the original pre-trained T5 model by linearizing database schemas into sequences, hoping that T5 can successfully capture the underlying relational structures. On the other end of the spectrum, RAT-SQL (Wang et al. 2020a) only utilizes pre-trained encoders (e.g., BERT (Devlin et al. 2019)) and explicitly captures the desired relations via specialized relation-aware models. However, although relational structures are accommodated to the fullest in this framework, the more powerful encoder-decoder pre-trained models are left unexploited. In this work, we explore the middle ground, where encoder-decoder pre-trained models (specifically T5) and relation-aware encodings are deeply coupled in favor of better domain generalization. We first observe that naively adding a relational graph-based module in the middle of T5, resulting in a "T5-encoder → graph-based module → T5-decoder" architecture (see Figure 2(c), namely GNN-T5), does not work well on standard benchmarks. Presumably, the deficiency comes from the middle graph-based module breaking the original information flow inside T5.

To address this problem, we present a novel architecture called GRAPHIX-T5 that is capable of effectively modelling relational structure information while maintaining the powerful contextual encoding capability of the pre-trained T5. First, we design a GRAPHIX layer that simultaneously encodes a mixture of semantic and structural information. Concretely, the hidden states of inputs composed of questions and databases are modelled by contextualized semantic encoding, and structural representations are injected into each transformer layer using a relational GNN block that enhances multi-hop reasoning through message passing (Fang et al. 2020; Velickovic et al. 2018) to capture explicit and implicit relations. Second, we construct a new encoder by stacking the GRAPHIX layers to replace the original T5 encoder. In each GRAPHIX layer, the parameters of the semantic block are still initialized from T5, so as to maintain the contextualized encoding power gained in pre-training. In contrast to the severed GNN-T5 (Figure 2(c)), GRAPHIX-T5 (Figure 2(d)) allows intensive interaction between semantics and structure from the very first layers.
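To make this design concrete, the following is a minimal PyTorch sketch of the idea behind a GRAPHIX layer. It is our own illustrative simplification, not the released implementation: a generic transformer layer stands in for the T5-initialized semantic block, and a single untyped message-passing step stands in for the relational GNN block over typed edges.

```python
import torch
import torch.nn as nn

class GraphixLayerSketch(nn.Module):
    """Conceptual sketch of one GRAPHIX layer: a semantic block (to be
    initialized from a T5 encoder layer) mixed with a structural
    message-passing block in every layer."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Semantic block: stands in for a T5 encoder layer.
        self.semantic = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Structural block: one step of message passing over the question-schema graph.
        self.msg = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (batch, nodes, d_model) hidden states of question/schema tokens.
        # adj: (batch, nodes, nodes) adjacency of the question-schema graph.
        h_sem = self.semantic(h)                       # contextual semantic encoding
        h_struct = torch.relu(adj @ self.msg(h_sem))   # relational message passing
        return self.norm(h_sem + h_struct)             # mix semantics and structure
```

Stacking such layers, with the semantic sub-blocks initialized from a pre-trained T5 encoder, yields the semi-pre-trained Graphix module depicted in Figure 2(d).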
We empirically show the effectiveness of GRAPHIX-T5 on several cross-domain text-to-SQL benchmarks, i.e., SPIDER, SYN, DK, and REALISTIC. On these datasets, the proposed model achieves new state-of-the-art performance, substantially outperforming all existing models by large margins. Notably, GRAPHIX-T5-large even beats the vanilla T5-3B. Furthermore, we verify that GRAPHIX-T5 also achieves significant improvements in the low-resource and compositional generalization settings, thanks to the introduction of structural bias. Although we focus only on text-to-SQL parsing in this work, we believe the general methodology of GRAPHIX-T5 can be extended to other structured knowledge grounding tasks, e.g., TableQA (Pasupat and Liang 2015), Data-to-text (Nan et al. 2021), and KBQA (Talmor and Berant 2018).
2 Task Formulation and Notations

2.1 Task Definition

Given a natural language question $Q = q_1, \ldots, q_{|Q|}$ with its corresponding database schema $D = \langle C, T \rangle$, where $C = c_1, \ldots, c_{|C|}$ and $T = t_1, \ldots, t_{|T|}$ represent the columns and tables, and $|C|$ and $|T|$ refer to the number of columns and tables in each database respectively, the goal of text-to-SQL is to generate the corresponding SQL query $y$.

2.2 Vanilla T5 Architecture

Model Inputs  The most canonical and effective input format for T5 on the text-to-SQL task is PeteShaw (Shaw et al. 2021), which unifies the natural language question $Q$ and the database schema $D$ into a joint sequence:

$$x = [q_1, \ldots, q_{|Q|} \mid D_{\text{name}} \mid t_1 : c_1^{t_1}, \ldots, c_{|C|}^{t_1} \mid \ldots \mid t_{|T|} : c_1^{t_{|T|}}, \ldots, c_{|C|}^{t_{|T|}} \mid *], \tag{1}$$
where $q_i$ is the $i$-th token in the question, $t_j$ represents the $j$-th table in $D$, and $c_k^{t_j}$ refers to the $k$-th column of the $j$-th table. $*$ is the special column token in the database, and $D_{\text{name}}$ is the name of each database.
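For illustration, a minimal sketch of this serialization is given below; the helper name and toy schema are our own (hypothetical), and the exact delimiters used by the released preprocessing may differ.

```python
def serialize_example(question: str, db_name: str, schema: dict) -> str:
    """Flatten a question and database schema into the joint sequence of Eq. (1).

    `schema` maps table names to their column lists; '*' is appended as the
    special column token.
    """
    parts = [question, db_name]
    for table, columns in schema.items():
        parts.append(f"{table} : " + " , ".join(columns))
    parts.append("*")
    return " | ".join(parts)

# Hypothetical toy example (not taken from any benchmark):
schema = {"documents": ["document_id", "document_name", "paragraph_text"]}
x = serialize_example("How many documents are there?", "document_db", schema)
# -> "How many documents are there? | document_db | documents : document_id , ... | *"
```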
Encoder-Decoder Training Mechanism  Following Shaw et al. (2021), T5 (Raffel et al. 2020) adopts an encoder-decoder mechanism to generate SQL. First, the bi-directional encoder learns the hidden states $h$ of the input $x$; then the decoder generates the SQL autoregressively conditioned on $h$:

$$h = \mathrm{Encoder}(x), \qquad y_i = \mathrm{Decoder}(h, y_{<i}).$$

Figure 3: The circumstances when question entities cannot be matched to any schema items: (a) No-Match Mode; (b) Bridge Node Mode.
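For reference, the following hedged sketch shows how the serialized input can be fed to a vanilla T5 with the HuggingFace transformers library; the checkpoint name is illustrative, and the paper's actual setup (fine-tuning on text-to-SQL data, beam search, etc.) involves more machinery.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Illustrative checkpoint; the paper experiments with T5-large and T5-3B.
tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

inputs = tokenizer(x, return_tensors="pt")  # x from serialize_example above
outputs = model.generate(**inputs, max_length=128, num_beams=4)
sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sql)  # an un-fine-tuned checkpoint will not emit valid SQL
```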
Table 4: Exact matching (EM) accuracy by varying the levels of difficulty of the inference data on four benchmarks.

Figure 6: Case study: two illustrative cases sampled randomly from SYN, showing that multi-hop reasoning helps GRAPHIX-T5 generate more correct SQL with respect to both semantic meaning and database schema structure.
GRAPHIX-T5 can still generate correct SQLs even in these hard scenarios. This is because, even with only a small number of overlapping keywords, GRAPHIX-T5 can accurately identify the counterpart column or table objects and generate a high-quality SQL query through multi-hop reasoning and structural grounding. For example, in the first case, vanilla T5-3B picks the incorrect columns paper id, paper name, and paper description, which do not even appear in the table documents. This implies that vanilla T5-3B is unable to reach the target schema elements without the capability of structural grounding when confronting challenging text-to-SQL examples. Instead, GRAPHIX-T5-3B can map the question entities to the correct column names through the multi-hop paths presented in Figure 6. In the second case, vanilla T5-3B misidentifies country as the target column; however, "France" only appears in the column countryname of the table countries. This suggests that T5-3B is only able to generate semantically plausible SQL, failing to take the real database structure into account. On the contrary, GRAPHIX-T5 can produce truly valid SQL with respect to both the question and the database via a successful mixture of semantic and structural information during training.

5 Related Works

The basic principle of a cross-domain text-to-SQL parser is to build an encoder to learn the representations of the questions and schemas, while employing a decoder to generate SQL from the information learnt by the encoder (Qin et al. 2022a). In particular, IRNET (Guo et al. 2019) proposes an encoder that learns the representations of questions and schemas via an attention-based Bi-LSTM, and a decoder that predicts SQL from the encoded intermediate representations. Later, graph-based encoders proved their effectiveness on text-to-SQL tasks; for example, some works (Bogin, Berant, and Gardner 2019; Chen et al. 2021) construct a schema graph to enhance the input representations. RAT-SQL (Wang et al. 2020a), SDSQL (Hui et al. 2021b), LGESQL (Cao et al. 2021), and S2SQL (Hui et al. 2022) further improve structural reasoning by modelling relations between the schema and the question. R2SQL (Hui et al. 2021a), SCORE (Yu et al. 2021), and STAR (Cai et al. 2022) enhance structural reasoning for context-dependent text-to-SQL parsing. In these works, the PLM independently builds the semantic features, and a graph-based module then injects the structural information; however, such a training strategy is only effective for encoder-based PLMs (i.e., BERT (Devlin et al. 2019), ELECTRA (Clark et al. 2020), etc.). Recently, the text-to-text PLM T5 has been proven effective for text-to-SQL (Shaw et al. 2021; Qin et al. 2022c). In addition, Scholak, Schucher, and Bahdanau (2021) design a constrained decoding procedure, namely PICARD, to detect and reject erroneous tokens during the beam-search phase. Xie et al. (2022) further inject knowledge from other structured knowledge grounding tasks into T5 via multi-task learning to boost text-to-SQL performance. Despite their effectiveness, these methods still struggle to generate SQL in more challenging and complex scenarios without explicit and implicit structural information, an issue GRAPHIX-T5 overcomes by augmenting the encoder with graph representation learning. Concurrently, RASAT (Qi et al. 2022) also attempts to provide T5 with structural information by adding edge embeddings to the multi-head self-attention, whereas we keep the pre-trained transformer layers complete in order to benefit the most from prior semantic knowledge, which leads to better performance.

6 Conclusion

In this paper, we proposed an effective architecture to boost the structural encoding capability of T5 cohesively while keeping the pre-trained T5's potent contextual encoding ability. To achieve this goal, we designed a graph-aware, semi-pre-trained text-to-text PLM, namely GRAPHIX-T5, to augment multi-hop reasoning for the challenging text-to-SQL task. Extensive experimental results demonstrate the effectiveness of GRAPHIX-T5, showing that structural information is crucial for current text-to-text PLMs on complicated text-to-SQL cases.
Acknowledgement

We thank Dr. Tao Yu and Tianbao Xie for the evaluation of our work on the SPIDER leaderboard. We thank Dr. Bailin Wang and Dr. Bowen Li for constructive suggestions. Reynold Cheng, Jinyang Li, Nan Huo, and Wenyu Du were supported by the University of Hong Kong (Project 104006830), the Guangdong–Hong Kong–Macau Joint Laboratory Program 2020 (Project No. 2020B1212030009), and the Innovation Wing Two Research Fund. Jinyang Li was also supported by the HKU Presidential PhD Scholar Programme and Alibaba Group through the Alibaba Research Intern Program. Chenhao Ma was supported in part by the Shenzhen Science and Technology Program under grant No. ZDSYS20211021111415025.

References

Bogin, B.; Berant, J.; and Gardner, M. 2019. Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing. In Proc. of ACL.

Cai, R.; Xu, B.; Zhang, Z.; Yang, X.; Li, Z.; and Liang, Z. 2018. An Encoder-Decoder Framework Translating Natural Language to Database Queries. In Proc. of IJCAI.

Cai, R.; Yuan, J.; Xu, B.; and Hao, Z. 2021. SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL. In Proc. of NeurIPS.

Cai, Z.; Li, X.; Hui, B.; Yang, M.; Li, B.; Li, B.; Cao, Z.; Li, W.; Huang, F.; Si, L.; and Li, Y. 2022. STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing. In Proc. of EMNLP Findings.

Cao, R.; Chen, L.; Chen, Z.; Zhao, Y.; Zhu, S.; and Yu, K. 2021. LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations. In Proc. of ACL.

Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; and Sun, X. 2020a. Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View. In Proc. of AAAI.

Chen, X.; Meng, F.; Li, P.; Chen, F.; Xu, S.; Xu, B.; and Zhou, J. 2020b. Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation. In Proc. of EMNLP.

Chen, Z.; Chen, L.; Zhao, Y.; Cao, R.; Xu, Z.; Zhu, S.; and Yu, K. 2021. ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser. In Proc. of NAACL.

Clark, K.; Luong, M.; Le, Q. V.; and Manning, C. D. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proc. of ICLR.

Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. of NAACL.

Fang, Y.; Sun, S.; Gan, Z.; Pillai, R.; Wang, S.; and Liu, J. 2020. Hierarchical Graph Network for Multi-hop Question Answering. In Proc. of EMNLP.

French, R. M. 1999. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences.

Gan, Y.; Chen, X.; Huang, Q.; Purver, M.; Woodward, J. R.; Xie, J.; and Huang, P. 2021a. Towards Robustness of Text-to-SQL Models against Synonym Substitution. In Proc. of ACL.

Gan, Y.; Chen, X.; and Purver, M. 2021. Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization. In Proc. of EMNLP.

Gan, Y.; Chen, X.; Xie, J.; Purver, M.; Woodward, J. R.; Drake, J.; and Zhang, Q. 2021b. Natural SQL: Making SQL Easier to Infer from Natural Language Specifications. In Proc. of EMNLP Findings.

Guo, J.; Zhan, Z.; Gao, Y.; Xiao, Y.; Lou, J.-G.; Liu, T.; and Zhang, D. 2019. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation. In Proc. of ACL.

Hui, B.; Geng, R.; Ren, Q.; Li, B.; Li, Y.; Sun, J.; Huang, F.; Si, L.; Zhu, P.; and Zhu, X. 2021a. Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. In Proc. of AAAI.

Hui, B.; Geng, R.; Wang, L.; Qin, B.; Li, Y.; Li, B.; Sun, J.; and Li, Y. 2022. S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers. In Proc. of ACL Findings.

Hui, B.; Shi, X.; Geng, R.; Li, B.; Li, Y.; Sun, J.; and Zhu, X. 2021b. Improving Text-to-SQL with Schema Dependency Learning. arXiv:2103.04399.

Iyer, S.; Konstas, I.; Cheung, A.; Krishnamurthy, J.; and Zettlemoyer, L. 2017. Learning a Neural Semantic Parser from User Feedback. In Proc. of ACL.

Nan, L.; Radev, D.; Zhang, R.; Rau, A.; Sivaprasad, A.; Hsieh, C.; Tang, X.; Vyas, A.; Verma, N.; Krishna, P.; Liu, Y.; Irwanto, N.; Pan, J.; Rahman, F.; Zaidi, A.; Mutuma, M.; Tarabar, Y.; Gupta, A.; Yu, T.; Tan, Y. C.; Lin, X. V.; Xiong, C.; Socher, R.; and Rajani, N. F. 2021. DART: Open-Domain Structured Data Record to Text Generation. In Proc. of NAACL.

Pasupat, P.; and Liang, P. 2015. Compositional Semantic Parsing on Semi-Structured Tables. In Proc. of ACL.

Qi, J.; Tang, J.; He, Z.; Wan, X.; Cheng, Y.; Zhou, C.; Wang, X.; Zhang, Q.; and Lin, Z. 2022. RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL. In Proc. of EMNLP.

Qin, B.; Hui, B.; Wang, L.; Yang, M.; Li, J.; Li, B.; Geng, R.; Cao, R.; Sun, J.; Si, L.; Huang, F.; and Li, Y. 2022a. A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions. arXiv:2208.13629.

Qin, B.; Wang, L.; Hui, B.; Geng, R.; Cao, Z.; Yang, M.; Sun, J.; and Li, Y. 2022b. Linking-Enhanced Pre-Training for Table Semantic Parsing. arXiv:2111.09486.

Qin, B.; Wang, L.; Hui, B.; Li, B.; Wei, X.; Li, B.; Huang, F.; Si, L.; Yang, M.; and Li, Y. 2022c. SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers. In Proc. of COLING.

Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research.
Rubin, O.; and Berant, J. 2021. SmBoP: Semi-autoregressive Bottom-up Semantic Parsing. In Proc. of NAACL.

Scholak, T.; Schucher, N.; and Bahdanau, D. 2021. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. In Proc. of EMNLP.

Shaw, P.; Chang, M.-W.; Pasupat, P.; and Toutanova, K. 2021. Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? In Proc. of ACL.

Shazeer, N.; and Stern, M. 2018. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. In Proc. of ICML.

Talmor, A.; and Berant, J. 2018. The Web as a Knowledge-Base for Answering Complex Questions. In Proc. of NAACL.

Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2017. Attention is All you Need. In Proc. of NeurIPS.

Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In Proc. of ICLR.

Wang, B.; Shin, R.; Liu, X.; Polozov, O.; and Richardson, M. 2020a. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. In Proc. of ACL.

Wang, K.; Shen, W.; Yang, Y.; Quan, X.; and Wang, R. 2020b. Relational Graph Attention Network for Aspect-based Sentiment Analysis. In Proc. of ACL.

Wang, L.; Qin, B.; Hui, B.; Li, B.; Yang, M.; Wang, B.; Li, B.; Huang, F.; Si, L.; and Li, Y. 2022. Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing. In Proc. of KDD.

Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; Davison, J.; Shleifer, S.; von Platen, P.; Ma, C.; Jernite, Y.; Plu, J.; Xu, C.; Le Scao, T.; Gugger, S.; Drame, M.; Lhoest, Q.; and Rush, A. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proc. of EMNLP.

Xie, T.; Wu, C. H.; Shi, P.; Zhong, R.; Scholak, T.; Yasunaga, M.; Wu, C.-S.; Zhong, M.; Yin, P.; Wang, S. I.; Zhong, V.; Wang, B.; Li, C.; Boyle, C.; Ni, A.; Yao, Z.; Radev, D.; Xiong, C.; Kong, L.; Zhang, R.; Smith, N. A.; Zettlemoyer, L.; and Yu, T. 2022. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. arXiv preprint.

Xu, X.; Liu, C.; and Song, D. 2017. SQLNet: Generating Structured Queries from Natural Language without Reinforcement Learning. arXiv preprint.

Yaghmazadeh, N.; Wang, Y.; Dillig, I.; and Dillig, T. 2017. SQLizer: Query Synthesis from Natural Language. Proceedings of the ACM on Programming Languages.

Ye, H.; Zhang, N.; Deng, S.; Chen, X.; Chen, H.; Xiong, F.; Chen, X.; and Chen, H. 2022. Ontology-enhanced Prompt-tuning for Few-shot Learning. In Proc. of WWW.

Yu, T.; Li, Z.; Zhang, Z.; Zhang, R.; and Radev, D. 2018a. TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation. In Proc. of NAACL.

Yu, T.; Zhang, R.; Polozov, A.; Meek, C.; and Awadallah, A. H. 2021. SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing. In Proc. of ICLR.

Yu, T.; Zhang, R.; Yang, K.; Yasunaga, M.; Wang, D.; Li, Z.; Ma, J.; Li, I.; Yao, Q.; Roman, S.; Zhang, Z.; and Radev, D. 2018b. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In Proc. of EMNLP.

Zelle, J. M.; and Mooney, R. J. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. In Proc. of AAAI.

Zhong, V.; Lewis, M.; Wang, S. I.; and Zettlemoyer, L. 2020. Grounded Adaptation for Zero-shot Executable Semantic Parsing. In Proc. of EMNLP.

Zhong, V.; Xiong, C.; and Socher, R. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. CoRR abs/1709.00103.
Source x | Target y | Relation Type | Description
Question | Question | MODIFIER | y is a modifier of x.
Question | Question | ARGUMENT | y is the source token of x under a syntax dependency other than modifier.
Question | Question | DISTANCE-1 | y is the nearest (1-hop) neighbor of x.
Column | Column | FOREIGN-KEY | y is the foreign key of x.
Column | Column | SAME-TABLE | x and y appear in the same table.
Column | * | BRIDGE | x and y are linked when y is the special column token '*'.
Table | Column | HAS | The column y belongs to the table x.
Table | Column | PRIMARY-KEY | The column y is the primary key of the table x.
Table | * | BRIDGE | x and y are connected when y is the special column token '*'.
Question | Table | EXACT-MATCH | x is part of y, and y is a span of the entire question.
Question | Table | PARTIAL-MATCH | x is part of y, but the entire question does not contain y.
Question | Column | EXACT-MATCH | x is part of y, and y is a span of the entire question.
Question | Column | PARTIAL-MATCH | x is part of y, but the entire question does not contain y.
Question | Column | VALUE-MATCH | x is part of the candidate cell values of column y.
Question | * | BRIDGE | x and y are linked when y is the special column token '*'.

Table 6: The checklist of the main relation types used in GRAPHIX-T5. All relations above are asymmetric.
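To illustrate how typed relations of this kind can be assembled into a question-schema graph, here is a small sketch under our own assumptions; the helper name and node-id layout are hypothetical, and the paper's preprocessing additionally handles question-side relations such as EXACT-MATCH and VALUE-MATCH.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (source node id, target node id, relation type)

def build_schema_edges(tables: List[str],
                       columns: List[Tuple[int, str]],  # (owning table id, name)
                       primary_keys: List[int],
                       foreign_keys: List[Tuple[int, int]]) -> List[Edge]:
    """Assemble schema-side edges using a few relation types from Table 6.
    Node ids: tables occupy 0..|T|-1, columns |T|..|T|+|C|-1, '*' is last."""
    n_tab = len(tables)
    star = n_tab + len(columns)  # node id of the special column token '*'
    edges: List[Edge] = []
    for i, (tab, _name) in enumerate(columns):
        col = n_tab + i
        edges.append((tab, col, "HAS"))               # table -> its column
        if i in primary_keys:
            edges.append((tab, col, "PRIMARY-KEY"))   # table -> primary key column
    for src, tgt in foreign_keys:
        edges.append((n_tab + src, n_tab + tgt, "FOREIGN-KEY"))  # column -> column
    for t in range(n_tab):
        edges.append((t, star, "BRIDGE"))             # table -> special '*'
    return edges

# Hypothetical toy schema: one table 'documents' with two columns.
edges = build_schema_edges(["documents"],
                           [(0, "document_id"), (0, "paragraph_text")],
                           primary_keys=[0], foreign_keys=[])
```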