TableLlama: Towards Open Large Generalist Models for Tables
Figure 1: An overview of TableInstruct and TableLlama. TableInstruct includes a wide variety of realistic
tables and tasks with instructions. We make the first step towards developing open-source generalist models for
tables with TableInstruct and TableLlama.
as a large number of candidates for classification and ranking tasks.

In pursuing this goal, we realize there lacks a comprehensive collection of realistic tables and tasks that can support the development and evaluation of generalist models. Therefore, we construct TableInstruct by meticulously selecting representative table-based tasks from widely used datasets, unifying the format for all tasks, and manually annotating instructions. TableInstruct, shown in Table 1, offers the following unique features: (1) Diverse coverage of tables and tasks. TableInstruct boasts a collection of 14 datasets of 11 tasks in total, with both in-domain and out-of-domain evaluation settings. Our training data includes 8 tasks, which are curated from 1.24M tables containing 2.6M instances spanning table interpretation, table augmentation, table-based QA, and table-based fact verification. We choose 8 datasets for these 8 tasks for in-domain evaluation and leave the other 6 datasets for 4 tasks for out-of-domain evaluation. The in-domain training tasks can enable the model to learn more fundamental table understanding abilities such as table interpretation and table augmentation, while we choose tasks that require more high-level reasoning abilities such as table QA and cell description to test the model's generalization ability. This extensive range of tables and diverse tasks not only provides valuable resources for table modeling, but also fosters a more comprehensive evaluation of generalist models. (2) The use of real-world tables and realistic tasks. TableInstruct uses authentic real-world task data instead of the overly simplified synthetic task data used in existing work (Li et al., 2023b). We incorporate a large number of Wikipedia tables and spreadsheets from statistical scientific reports with varied lengths of contents, realistic and complex semantic types from Freebase (Google, 2015) for column type annotation and relation extraction, and a large referent entity corpus with rich metadata from Wikidata (Vrandečić and Krötzsch, 2014) for entity linking. In addition, we include complicated numerical reasoning tasks with hierarchical table structure and existing manually annotated table QA and fact verification tasks. By doing so, we aim to equip models with the capability to cope with realistic and complex table-based tasks.

TableInstruct requires models to accommodate long inputs (Table 1). We adopt LongLoRA (Chen et al., 2023b) based on Llama 2 (7B) (Touvron et al., 2023) as our backbone model, which has been shown efficient and effective in handling long contexts. We fine-tune it on TableInstruct and name our model TableLlama. We conducted extensive experiments under both in-domain and out-of-domain settings. Our experiments show TableLlama has strong capabilities for various in-domain table understanding and augmentation tasks, and also achieves promising performance in generalizing to unseen tasks and datasets.

In summary, our main contributions are:

• We construct TableInstruct, a large-scale instruction tuning dataset with diverse, realistic tasks based on real-world tables. We unify their format and manually annotate instructions to guarantee quality.
[Figure 2(a), Column Type Annotation example: the instruction describes the task; the input serializes the "Pitching leaders" table from the 1958 Nippon Professional Baseball season page (columns: stat, player, team, total); the question lists the candidate types for the "player" column; the response is "sports.pro_athlete, baseball.baseball_player, people.person". The full prompt is reproduced in Figure 3.]
Figure 2: Illustration of three exemplary tasks: (a) Column type annotation. This task is to annotate the selected
column with the correct semantic types. (b) Row population. This task is to populate rows given table metadata and
partial row entities. (c) Hierarchical table QA. For subfigures (a) and (b), we mark candidates with red color in the
“task instruction” part. The candidate set size can be hundreds to thousands in TableInstruct.
Table 1: Statistics of train/test tasks and datasets in our TableInstruct. For each task, we explain its definition and
show an example in Appendix E.
Table QA is to answer questions with tables and optional highlighted cells or passages as evidence. Fact verification is to discriminate whether the tables can support or refute the claims. Dialogue generation is to generate a response grounded on the table and dialogue history. Data-to-text is to generate a description based on the highlighted cells. By choosing the tasks that require models to learn more fundamental table understanding abilities such as table interpretation and table augmentation for training, we hope the model can demonstrate generalization ability on out-of-domain datasets such as high-level table QA and table cell description tasks.

In-domain: The tasks for training the generalist table model include column type annotation, relation extraction, entity linking, row population, schema augmentation, hierarchical table QA, highlighted cells QA, and table fact verification. These tasks require the model to understand the semantics of table columns, the relations between table column pairs, and the semantics of table cells, and require the model to gain reasoning ability to answer table-related questions and verify the facts. For the dataset of each task, we intentionally pick those that enjoy realistic task complexity without simplifying assumptions. For example, column type annotation and relation extraction are multi-choice classification tasks in essence. We use real-world column semantic types and relation types from Freebase (Google, 2015), which contains hundreds of complex choices such as "government.politician.party-government.political_party_tenure.party", as shown in Figure 4 in Appendix E. For entity linking, the referent entities are from real-world Wikidata (Vrandečić and Krötzsch, 2014), which contains hundreds of complex metadata, such as "<2011-12 Melbourne Victory season [DESCRIPTION] Association football club 2011/12 season for Melbourne Victory [TYPE] SoccerClubSeason>", as shown in Figure 5 in Appendix E. For schema augmentation and row population, there are a huge number of candidates that LLMs need to rank. For hierarchical table QA, all the tables are engaged with intricate structures with multi-level column names and row names. In addition, it is intensive in numerical reasoning, which requires LLMs to understand table structure, identify related cells and do calculations. By doing so, we hope to enable LLMs to become truly powerful generalist models that can handle sophisticated table tasks, and TableInstruct can be a realistic benchmark to evaluate LLMs' abilities compared with specially designed table models.

Out-of-domain: A powerful generalist table model is expected to not only demonstrate strong performance on in-domain tasks, but also generalize well to unseen tasks or unseen datasets of the same tasks. We choose tasks such as table QA and cell description that require the model's high-level table understanding and reasoning ability as out-of-domain datasets. We involve HybridQA (Chen et al., 2020b), KVRET (Eric et al., 2017), FEVEROUS (Aly et al., 2021), ToTTo (Parikh et al., 2020), WikiSQL (Zhong et al., 2017) and WikiTQ (Pasupat and Liang, 2015) as 6 out-of-domain datasets to test our model's generalization ability.

2.2 Task Formulation and Challenges

The primary objective of TableInstruct is to design one generalist model for all table-based tasks.
As Figure 2 (a)-(c) shows, each instance in our dataset maps three components, <instruction, table input, question>, to an output. The instruction is manually designed to point out the task and give a detailed task description. We concatenate table metadata such as the Wikipedia page title, section title and table caption with the serialized table as the table input. In the question, we put all the information the model needs to complete the task and prompt the model to generate an answer. For example, for the column type annotation task, as Figure 2 (a) shows, the column named "Player" needs to be annotated with its semantic types. In this format, the "instruction" gives the description of the task. The "input" contains the table-related information. Then we provide the entire candidate pool in the "question" and ask the model to choose one or multiple correct semantic types for this column.
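To make this format concrete, the following minimal Python sketch shows how a table and its metadata could be serialized into the [TLE]/[TAB]/[SEP] input format illustrated in Figure 2. The function and argument names are our own illustration under these assumptions, not the released TableLlama code.

def serialize_instance(instruction, page_title, section_title, caption, header, rows, question):
    # Serialize table metadata and cells in the [TLE] ... [TAB] ... [SEP] style shown in Figure 2.
    table_input = (
        f"[TLE] The Wikipedia page is about {page_title}. "
        f"The Wikipedia section is about {section_title}. "
        f"The table caption is {caption}. "
        f"[TAB] col: | " + " | ".join(header) + " |"
    )
    for i, row in enumerate(rows, start=1):
        table_input += f" [SEP] row {i}: | " + " | ".join(row) + " |"
    # The instruction, serialized table and question form one instance; the answer is the target.
    return {"instruction": instruction, "input": table_input, "question": question}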
Challenges. Since we select realistic tasks and tables, the table length can vary from several to thousands of rows. Besides, for some tasks that are essentially multi-choice classification or ranking, the entire candidate pool can be very large, up to thousands. Furthermore, as the candidates are from real-world Freebase and Wikidata, each candidate is long: for example, "<2011-12 Melbourne Victory season [DESCRIPTION] Association football club 2011/12 season for Melbourne Victory [TYPE] SoccerClubSeason>" is one candidate for entity linking. These characteristics not only make it difficult for the model to learn, but also introduce the challenge of handling long contexts.

3 Experimental Setup
Model Construction. Although a few existing LLMs (Chen et al., 2023a; Tworkowski et al., 2023) can handle longer than 4K contexts, their training time increases quadratically with context length, which becomes very costly for us to further fine-tune them on TableInstruct due to our large data scale. As LongLoRA (Chen et al., 2023b) has been shown to be an effective and efficient technique to train long-context LLMs with shift short attention, we adopt it as our backbone model. Shift short attention splits the context length into several groups and conducts attention in each group individually. The tokens are shifted by half the group size in half of the attention heads to ensure information flow between neighboring groups. For example, LongLoRA can use shift short attention with group size 2048 to approximate a total 8192 context length during training, which leads to less computation cost with similar performance compared to fine-tuning with vanilla attention. We fine-tune LongLoRA on TableInstruct to get our generalist model TableLlama.
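As a rough illustration of the shift-short-attention pattern described above (a minimal sketch of the grouping and shifting only, not the LongLoRA implementation itself), the hidden states can be split into groups, with half of the heads rolled by half a group so neighboring groups overlap:

import torch

def group_for_shift_short_attention(x, group_size):
    # x: (batch, seq_len, num_heads, head_dim); seq_len must be divisible by group_size.
    # Half of the heads are rolled by half a group along the sequence dimension, then
    # attention would be computed independently inside each group.
    bsz, seq_len, num_heads, head_dim = x.shape
    assert seq_len % group_size == 0
    shifted = x.clone()
    shifted[:, :, num_heads // 2:] = torch.roll(
        shifted[:, :, num_heads // 2:], shifts=-group_size // 2, dims=1
    )
    # Reshape so that each group of tokens forms its own attention window.
    return shifted.reshape(bsz * (seq_len // group_size), group_size, num_heads, head_dim)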
Existing SOTA Models. In our evaluation settings, 9 out of 14 SOTA models utilize table pretraining and/or have special model architecture designs for tables. The detailed description of each SOTA model is in Appendix A.
Evaluation Metrics. We follow the above baselines to use their evaluation metrics. For column type annotation, relation extraction and KVRET, we use Micro F1. For entity linking, TabFact, FEVEROUS, HybridQA, WikiSQL and WikiTQ, we use accuracy. For row population and schema augmentation, we use MAP. For HiTab, we use execution accuracy (Zhong et al., 2017). For FeTaQA and ToTTo, we use BLEU (Papineni et al., 2002).
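For the two ranking tasks, MAP is the mean over test queries of the per-query average precision. A minimal sketch of average precision for a single query, written as our own illustration of the standard definition, is:

def average_precision(ranked_candidates, gold_set):
    # ranked_candidates: the model's ranking, best first; gold_set: the correct entities/headers.
    hits, ap = 0, 0.0
    for rank, cand in enumerate(ranked_candidates, start=1):
        if cand in gold_set:
            hits += 1
            ap += hits / rank
    return ap / max(len(gold_set), 1)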
Training and Inference Details. We choose LongLoRA 7B (Chen et al., 2023b), the fully fine-tuning version with an 8K context length limit, as our base model. The fully fine-tuning version replaces the vanilla attention in Llama 2 with shift short attention. We fine-tune the model with the Huggingface transformers library (Wolf et al., 2020). We merge all eight datasets, repeat the three smaller datasets (i.e., FeTaQA, HiTab and TabFact) six times, and randomly shuffle them as our final training data.
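A minimal sketch of this mixing step, assuming each dataset is already a list of serialized instances (names and the helper itself are illustrative):

import random

def build_training_mixture(datasets, upsampled=("FeTaQA", "HiTab", "TabFact"), factor=6, seed=0):
    mixture = []
    for name, examples in datasets.items():
        # Repeat the three smaller datasets six times; keep the others once.
        mixture.extend(examples * (factor if name in upsampled else 1))
    random.Random(seed).shuffle(mixture)
    return mixture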
We use a learning rate of 2e-5 and set the batch size at 3. We train the model in a streaming fashion on 48 A100 80GB GPUs and use a cosine scheduler with a 3% warm-up period for 2 epochs. To efficiently train the model, we employ DeepSpeed training with the ZeRO-2 stage (Rajbhandari et al., 2020). For both training and inference, we set the input length as 8192. For inference on TableLlama, as different tasks have different lengths of the ground truth, we use 64 as the output length for column type annotation, relation extraction, entity linking, HiTab, TabFact, FEVEROUS, HybridQA, WikiSQL and WikiTQ, 128 for schema augmentation, FeTaQA, KVRET and ToTTo, and 512 for row population. For column type annotation and entity linking, we uniformly sample a subset from the original test data as our test set due to the large test size. For row population, we filter out the examples with more than 500 candidate entities from the original test set and randomly sample a subset as our test set. For all the downsampled test sets, we reproduce the SOTA results using the SOTA model.
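Assuming the Huggingface Trainer is used, the hyperparameters above correspond roughly to a configuration like the following sketch; the output path, precision choice and DeepSpeed config file name are assumptions, not taken from the released training scripts.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tablellama-checkpoints",   # hypothetical output path
    per_device_train_batch_size=3,
    learning_rate=2e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    deepspeed="ds_config_zero2.json",      # ZeRO-2 stage config; file name assumed
    bf16=True,                              # precision choice is an assumption
)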
For closed-source LLMs, we use the gpt-4-1106-preview version for GPT-4, which is the latest version that supports 128K context and reports the best performance. For GPT-3.5, we use the gpt-3.5-turbo-1106 version, which supports 16K context.
In-domain Evaluation
Datasets Metric Base TableLlama SOTA GPT-3.5 GPT-4§
Column Type Annotation F1 3.01 94.39 94.54*† (Deng et al., 2020) 30.88 31.75
Relation Extraction F1 0.96 91.95 94.91*† (Deng et al., 2020) 27.42 52.95
Entity Linking Accuracy 31.80 93.65 84.90*† (Deng et al., 2020) 72.15 90.80
Schema Augmentation MAP 36.75 80.50 77.55*† (Deng et al., 2020) 49.11 58.19
Row Population MAP 4.53 58.44 73.31*† (Deng et al., 2020) 22.36 53.40
HiTab Exec Acc 14.96 64.71 47.00*† (Cheng et al., 2022a) 43.62 48.40
FeTaQA BLEU 8.54 39.05 33.44 (Xie et al., 2022) 26.49 21.70
TabFact Accuracy 41.65 82.55 84.87* (Zhao and Yang, 2022) 67.41 74.40
Table 2: In-domain evaluation results. “Base”: LongLoRA model w/o fine-tuning on TableInstruct; “*”: w/
special model architecture design for tables/tasks; “†”: w/ table pretraining; “§”: for GPT-4, we uniformly sample
500 examples from test set for each task due to limited budget.
4 Result Analysis

4.1 Main Results

In-domain Results. As Table 2 shows, we train TableLlama on eight table-based tasks and evaluate it on their test sets as the in-domain results. Due to the special semi-structured nature of tables, for most table-based tasks, existing work achieves SOTA results by using pretraining on large-scale tables and/or special model architecture design tailored for tables. Nonetheless, we observe that:

By simply fine-tuning a large language model on TableInstruct, TableLlama can achieve comparable or even better performance on almost all the tasks without any table pretraining or special table model architecture design. For most of the tasks, the performance gap is within 3 absolute points, except for row population. For entity linking, schema augmentation, HiTab and FeTaQA, TableLlama can exceed the SOTA performance by up to 17.71 absolute points. This demonstrates that empowering open-source LLMs with more powerful table understanding abilities via instruction tuning can be a promising research direction to further explore.

TableLlama displays advantages in table QA tasks. HiTab and FeTaQA are two table question answering tasks we include for training. By comparing the results, we found that TableLlama can surpass the SOTA by 5.61 points for FeTaQA and 17.71 points for HiTab, which is full of numerical reasoning on tables. As LLMs have been shown superior in interacting with humans and answering questions, this indicates that the existing underlying strong language understanding ability of LLMs may be beneficial for such table QA tasks despite the semi-structured nature of tables.

For entity linking, which requires the model to link the mention in a table cell to the correct referent entity in Wikidata, TableLlama also presents superior performance with an 8-point gain over the SOTA. Since the candidates are composed of referent entity names and descriptions, we hypothesize LLMs have certain abilities to understand the descriptions, which help identify the correct entities.

Row population is the only task where TableLlama has a large performance gap compared to the SOTA. Here we provide a large number of candidates for the model to rank given the table metadata and the seed row entity. By analyzing the errors, we found that the model can easily identify the entities containing similar numbers in sequence, such as the first example shown in Table 6 in Appendix D. However, for entities that share high similarities, as the second example in Table 6 shows, the target row entities are the competitions in which "Oleg Veretelnikov" got achievements. To correctly populate the entities from the large number of given candidates highly related to "competitions", the model needs to understand the inherent relation between the athlete and each given candidate, which is still challenging for the current model.

Out-of-domain results. We evaluate TableLlama on six out-of-domain datasets. We observe that:

By comparing with the base model, TableLlama can achieve a gain of 5-44 points on the 6 out-of-domain datasets, which demonstrates TableInstruct can enhance the model's generalization ability. By learning from the table-based training tasks, the model has acquired essential underlying table understanding ability, which can be transferred to other table-based tasks/datasets and facilitate their performance. Among these 6 datasets, we found …
Out-of-domain Evaluation
Datasets Metric Base TableLlama SOTA ∆Base GPT-3.5 GPT-4§
FEVEROUS Accuracy 29.68 73.77 85.60 (Tay et al., 2022) +44.09 60.79 71.60
HybridQA Accuracy 23.46 39.38 65.40* (Lee et al., 2023) +15.92 40.22 58.60
KVRET Micro F1 38.90 48.73 67.80 (Xie et al., 2022) +9.83 54.56 56.46
ToTTo BLEU 10.39 20.77 48.95 (Xie et al., 2022) +10.38 16.81 12.21
WikiSQL Accuracy 15.56 50.48 92.70 (Xu et al., 2023b) +34.92 41.91 47.60
WikiTQ Accuracy 29.26 35.01 57.50† (Liu et al., 2022) +5.75 53.13 68.40
Table 3: Out-of-domain evaluation results. “Base”: LongLoRA model w/o fine-tuning on TableInstruct; “*”: w/
special model architecture design for tables/tasks; “†”: w/ table pretraining; “§”: for GPT-4, we uniformly sample
500 examples from test set for each task due to limited budget. We put the SOTA performances here in grey for
reference and note that they were achieved under full-dataset training for each task while TableLlama is zero-shot.
Table 4: Transfer between different datasets. Bold numbers are the best results for each evaluation dataset. For
models trained on schema augmentation (ScheAug) and row population (RowPop), their predictions on other
datasets tend to repeat the candidates in the training data, which means they cannot generalize to other datasets, and
hence we use “-” to represent their performances.
Compared with the models individually trained on two table-based QA datasets (i.e., HiTab and FeTaQA), we can see TableLlama achieves better zero-shot performance. This indicates that including the other tasks (i.e., TableInstruct) to train the model can further enhance the model's underlying table question answering ability.

Individually fine-tuning models on tasks that are highly different from others tends to make models overfit and hardly generalize to others. As Table 4 shows, the models individually fine-tuned on 4 tasks (column type annotation, relation extraction, entity linking and TabFact) tend to have weaker performance when evaluated on other tasks. We hypothesize that these four tasks are highly different from the others, so a model individually trained on such tasks will overfit to the task itself, thus becoming hard to generalize to other unseen tasks.

5 Related Work

Table Representation Learning. Given the vast amount of knowledge stored in tables, various table-based tasks have been proposed (Pujara et al., 2021), such as column type annotation (Hulsebos et al., 2019), row population (Zhang and Balog, 2017), table QA (Sun et al., 2016; Pasupat and Liang, 2015; Cheng et al., 2022b; Nan et al., 2022), etc. In order to handle semi-structured tables, existing work puts effort into designing special model architectures, such as TURL with structure-aware attention (Deng et al., 2020), TUTA with tree-based attention (Wang et al., 2021) and TaBERT with a vertical self-attention mechanism (Yin et al., 2020); or designing special encodings such as table position encoding (Herzig et al., 2020; Wang et al., 2021) and numerical encoding (Wang et al., 2021) to better encode the table structure and infuse more information into the neural architecture. In addition, some work focuses on table pretraining (Liu et al., 2022; Yin et al., 2020; Deng et al., 2020; Iida et al., 2021) to encode knowledge in large-scale tables. However, although such existing works have shown promising progress, they are still data-specific and downstream task-specific, which requires special designs tailored for tables and table-based tasks.

Our work proposes TableInstruct to unify different table-based tasks and develops a one-for-all LLM, TableLlama, to reduce those extra efforts during modeling. This high-level insight is similar to UnifiedSKG (Xie et al., 2022), which unifies a diverse set of structured knowledge grounding tasks into a text-to-text format. However, UnifiedSKG deals with different knowledge sources such as databases, knowledge graphs and web tables and does not explore instruction tuning, while we focus on a wide range of realistic tasks based on real-world tables via instruction tuning. In addition, a concurrent work (Li et al., 2023b) synthesizes diverse table-related tasks and finetunes closed-source LLMs such as GPT-3.5 via instruction tuning. Compared to theirs, we collect more realistic and complex task data such as HiTab as well as classification and ranking tasks with candidates from Freebase and Wikidata, and develop open-source LLMs for table-based tasks. We believe both our constructed high-quality table instruction tuning dataset and the trained model can be valuable resources for facilitating this line of research.

Instruction Tuning. Instruction tuning that trains
LLMs using <instruction, output> pairs in a supervised fashion is a crucial technique to enhance the capabilities and controllability of LLMs (Chung et al., 2022; Wang et al., 2022; Mishra et al., 2022). The instructions serve to constrain the model's outputs to align with the desired response characteristics or domain knowledge and can help LLMs rapidly adapt to a specific domain without extensive retraining or architecture designs (Zhang et al., 2023). Therefore, different instruction tuning datasets have been proposed to guide LLMs' behaviors (Wang et al., 2022; Honovich et al., 2022; Longpre et al., 2023; Xu et al., 2023a; Yue et al., 2024). Different instruction tuning models such as InstructGPT (Ouyang et al., 2022), Vicuna (Zheng et al., 2023) and Claude2 have emerged and demonstrate boosted performance compared with the pre-trained models. In addition, instruction tuning has been applied to different modalities such as images, videos and audio (Li et al., 2023a) and has shown promising results. This signals that instruction tuning can be a promising technique to enable large pre-trained models to handle various tasks. However, how to utilize instruction tuning to guide LLMs to complete table-based tasks is still underexplored. Our work fills this gap by constructing a high-quality table instruction tuning dataset, TableInstruct, which covers large-scale diverse and realistic tables and tasks to enable both modeling and evaluation. We also release TableLlama, an open-source LLM-based generalist model fine-tuned on TableInstruct, to promote this avenue of research.

6 Conclusion

This paper makes the first step towards developing open-source large generalist models for a diversity of table-based tasks. Towards that end, we construct TableInstruct and develop the first open-source generalist model for tables, TableLlama. We evaluate under both in-domain and out-of-domain settings, and the experiments show that TableLlama has gained strong table understanding ability and generalization ability.
Limitations

… which are not included in TableInstruct. Therefore, even if TableLlama has demonstrated the generalization ability on different out-of-domain datasets and tasks, the model's performance may vary based on the complexity and specifics of the new unseen table tasks and datasets. As we have made the first step towards building an open large generalist model for tables, we encourage future work to further explore this line of research and to further enhance the model's generalization ability for tables.

Acknowledgements

The authors would like to thank all members of the OSU NLP group for providing feedback about the project. This research was sponsored in part by NSF IIS-1815674, NSF CAREER #1942980, and NSF OAC-2112606. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notice herein.

References

Rami Aly, Zhijiang Guo, Michael Sejr Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, and Arpit Mittal. 2021. The fact extraction and VERification over unstructured and structured information (FEVEROUS) shared task. In Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER), pages 1–13, Dominican Republic. Association for Computational Linguistics.

Shouyuan Chen, Sherman Wong, Liangjian Chen, and Yuandong Tian. 2023a. Extending context window of large language models via positional interpolation.

Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, and William Yang Wang. 2020a. Tabfact: A large-scale dataset for table-based fact verification. In International Conference on Learning Representations.
Hiroshi Iida, Dung Thai, Varun Manjunatha, and Mohit Iyyer. 2021. TABBIE: Pretrained representations of tabular data. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3446–3456, Online. Association for Computational Linguistics.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Ankur Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, and Dipanjan Das. 2020. ToTTo: A controlled table-to-text generation dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1173–1186, Online. Association for Computational Linguistics.

Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1470–1480, Beijing, China. Association for Computational Linguistics.

Jay Pujara, Pedro Szekely, Huan Sun, and Muhao Chen. 2021. From tables to knowledge: Recent advances in table understanding. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 4060–4061.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J Liu, et al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67.

Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. 2020. Zero: Memory optimizations toward training trillion parameter models.

Dominique Ritze, Oliver Lehmberg, and Christian Bizer. 2015. Matching html tables to dbpedia. In Proceedings of the 5th international conference on web intelligence, mining and semantics, pages 1–6.

Huan Sun, Hao Ma, Xiaodong He, Wen-tau Yih, Yu Su, and Xifeng Yan. 2016. Table cell search for question answering. In Proceedings of the 25th International Conference on World Wide Web, pages 771–782.

Yi Tay, Mostafa Dehghani, Vinh Q Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, et al. 2022. Ul2: Unifying language learning paradigms. arXiv preprint arXiv:2205.05131.

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. 2023. Llama 2: Open foundation and fine-tuned chat models.

Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, and Piotr Miłoś. 2023. Focused transformer: Contrastive training for context scaling.

Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85.

Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Atharva Naik, Arjun Ashok, Arut Selvan Dhanasekaran, Anjana Arunkumar, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Kuntal Kumar Pal, Maitreya Patel, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Savan Doshi, Shailaja Keyur Sampat, Siddhartha Mishra, Sujan Reddy A, Sumanta Patro, Tanay Dixit, and Xudong Shen. 2022. Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5085–5109, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Zhiruo Wang, Haoyu Dong, Ran Jia, Jia Li, Zhiyi Fu, Shi Han, and Dongmei Zhang. 2021. Tuta: Tree-based transformers for generally structured table pre-training. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 1780–1790.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I Wang, et al. 2022. Unifiedskg: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 602–631.
Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, and Daxin Jiang. 2023a. Wizardlm: Empowering large language models to follow complex instructions.

Kuan Xu, Yongbo Wang, Yongliang Wang, Zujie Wen, and Yang Dong. 2023b. Sead: End-to-end text-to-sql generation with schema-aware denoising.

Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for joint understanding of textual and tabular data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8413–8426, Online. Association for Computational Linguistics.

Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen. 2024. MAmmoTH: Building math generalist models through hybrid instruction tuning. In The Twelfth International Conference on Learning Representations.

Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, and Guoyin Wang. 2023. Instruction tuning for large language models: A survey.

Shuo Zhang and Krisztian Balog. 2017. Entitables: Smart assistance for entity-focused tables. In Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pages 255–264.

Guangzhen Zhao and Peng Yang. 2022. Table-based fact verification with self-labeled keypoint alignment. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1401–1411, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023. Judging llm-as-a-judge with mt-bench and chatbot arena.

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2sql: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103.

A Existing SOTA Models

TURL (Deng et al., 2020) is an encoder-based BERT-like model pre-trained on 570K tables. Though TURL has shown SOTA performance on various table tasks such as column type annotation, relation extraction, entity linking, row population and schema augmentation, it requires fine-tuning task-specific modules on labeled data. The SOTA method for HiTab builds on 1) TUTA (Wang et al., 2021), which uses tree attention as the encoder to capture table structures, and 2) FORTAP (Cheng et al., 2022a), which leverages spreadsheet formulas for table pre-training to better handle numerical reasoning. The SOTA method for TabFact designs a self-labeled keypoint alignment (Zhao and Yang, 2022) to align salient evidence and aggregate essential information between the statement and table. For HybridQA, the SOTA method MAFiD (Lee et al., 2023) deploys special fusion in the decoder and uses a gated cross-attention layer to enhance the reasoning ability on tables. The SOTA method for WikiTQ is TAPEX (Liu et al., 2022), which fuses table pre-training by learning a neural SQL executor over a synthetic corpus. The SOTA method for WikiSQL uses two denoising objectives and a clause-sensitive execution guided (EG) decoding strategy to generate better SQL and then get the answer (Xu et al., 2023b). For FeTaQA, KVRET and ToTTo, the SOTA results come from T5-3B fine-tuned on their own individual training data (Xie et al., 2022). For FEVEROUS, the SOTA is from a 20B large language model: FLAN UL2 (Tay et al., 2022).

B More details about TableInstruct

B.1 Data Selection

We choose the datasets and tasks based on three criteria: diversity, realisticness and reliability.
• Diversity: we hope to cover table-based tasks
as comprehensively as possible both in the
NLP community and database community.
That’s why we include 14 datasets of 11 tasks.
• Realisticness: we include table sources from Wikipedia tables and National Science Foundation reports (e.g., https://www.nsf.gov/statistics/2019/nsf19319/), which makes sure the table types are realistic and include both simple tables and hierarchical tables with complex table structures.
• Reliability: we compile existing datasets that are widely used in the NLP community and database community.

We split TableInstruct into in-domain (for training and evaluation) and out-of-domain (for evaluation) sets based on three constraints:

• to make the tasks in the training and out-of-domain evaluation set as disjoint as possible;

• if there are two datasets for the same task, we will divide them into the training set and the out-of-domain evaluation set;

• since tables have special two-dimensional
structures, we need the model to gain fundamental table understanding abilities, i.e., to recognize the relations of cells within and among different columns and rows, and to correlate the headers and row names with the corresponding columns and rows. So
we mainly select different table interpretation
and table augmentation tasks to encourage the
model to understand table structures. In addi-
tion, we try to engage the model with strong
numerical reasoning ability, open-ended table
QA and fact verification ability, so we include
HiTab, FeTaQA and TabFact for training as
well. For out-of-domain tasks, we mainly test
the more high-level ability to see the model’s
generalization. For example, the table ques-
tion answering datasets in the training set are
two types: one is full of numerical reasoning
on hierarchical tables and the other is to gener-
ate open-ended answer based on highlighted
table cells. We hope the learned table QA abil-
ity can transfer to different kinds of unseen
table QA tasks such as adding extra compo-
nents (passages or dialogues, etc) as evidence
and letting the model infer the answer from
both tables and added components.
B.2 Data Annotation
The raw tables in our collected datasets are stored
in JSON, CSV or text files. We mainly annotate
instructions and questions based on the metadata
of each task, serialize the table format and put the
ground truth as the response (more details and example cases are in Appendix E). We sampled 30 instances for each task to double check the data and make sure there are no errors. We also have two annotators to do the cross-checking.
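As a concrete sketch of this annotation step, a final training example can be assembled with the prompt template used throughout Appendix E; the helper function below is our own illustration, not the released data-construction code.

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Question:\n{question}\n\n"
    "### Response:\n"
)

def make_training_example(instruction, table_input, question, answer):
    # The prompt holds the annotated instruction, serialized table and question;
    # the ground-truth answer is kept as the response/target.
    prompt = PROMPT_TEMPLATE.format(instruction=instruction, input=table_input, question=question)
    return {"prompt": prompt, "response": answer}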
C More detailed statistics of TableInstruct

Table 5 shows more detailed statistics of TableInstruct in terms of the average word count of different parts of the datasets (i.e., instruction, input, question and response), table size (average column size and row size per table), table type (Wikipedia tables or NSF reports), task type (ranking or classification) and whether the tables are hierarchical or not.
Table 5: More detailed statistics of TableInstruct in terms of the average word count of different parts of the
datasets (i.e., instruction, input, question and response), table size (average column size and row size per table), table
type (Wikipedia tables or NSF reports), task type (ranking or classification) and whether the tables are hierarchical
or not. ’Y’ indicates ’Yes’ and ’N’ indicates ’No’.
D Case Study
Table 6: Case study for row population task. “Query Caption" refers to the table metadata such as Wikipedia page
title and table caption. “AP" means average precision.
E Example Prompts
Column Type Annotation
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a column type annotation task. The goal for this task is to choose the correct types for one selected column of the
table from the given candidates. The Wikipedia page, section and table caption (if any) provide important information
for choosing the correct column types.
### Input:
[TLE] The Wikipedia page is about 1958 Nippon Professional Baseball season. The Wikipedia section is about Central
League. The table caption is Pitching leaders. [TAB] col: | stat | player | team | total | [SEP] row 1: | Wins | Masaichi
Kaneda | Kokutetsu Swallows | 31| [SEP] row 2: | Losses | Noboru Akiyama | ...
### Question:
The column ’player’ contains the following entities: <Masaichi Kaneda>, <Noboru Akiyama>, etc. The column type
candidates are: tv.tv_producer, astronomy.star_system_body, location.citytown, sports.pro_athlete, biology.organism,
medicine.muscle, baseball.baseball_team, baseball.baseball_player, aviation.aircraft_owner, people.person, ... What are
the correct column types for this column (column name: player; entities: <Masaichi Kaneda>, <Noboru Akiyama>, etc)?
### Response:
sports.pro_athlete, baseball.baseball_player, people.person.
Figure 3: Column type annotation task. This task is to annotate the selected column with the correct semantic
types. We mark candidates with red color in the "task instruction" part. The candidate set size can range from hundreds to thousands in TableInstruct.
Relation Extraction
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a relation extraction task. The goal for this task is to choose the correct relations between two selected columns
of the table from the given candidates. The Wikipedia page, section and table caption (if any) provide important
information for choosing the correct relation types.
### Input:
[TLE] The Wikipedia page is about Yukon Legislative Assembly. The Wikipedia section is about Current members.
[TAB] col: | | name | party | riding | row 1: | | Kevin Barr | New Democratic Party | Mount Lorne-Southern Lakes | [SEP]
row 2: | | Brad Cathers | ...
### Question:
The two selected column names are: <(name),(party)>. The entity pairs for these two columns are:
<(Kevin Barr),(New Democratic Party)>, <(Brad Cathers),(Yukon Party)>, <(Currie Dixon),(Yukon Party)>,
<(Darius Elias),(Yukon Party)>, ... The relation type candidates are: location.location.contains, avia-
tion.airline.hubs, film.film.written_by, time.event.instance_of_recurring_even , people.person.place_of_birth, mu-
sic.composer.compositions, sports.sports_team.roster- sports.sports_team_roster.player, location.location.containedby,
soccer.football_player.statistics- soccer.football_player_stats.team... What are the correct relation types for the two
selected columns (column names: <(name),(party)>. entity pairs: <(Kevin Barr),(New Democratic Party)>, <(Brad
Cathers),(Yukon Party)>, <(Currie Dixon), (Yukon Party)>, <(Darius Elias),(Yukon Party)>, etc)?
### Response:
government.politician.party-government.political_party_tenure.party.
Figure 4: Relation extraction task. This task is to annotate the selected column pairs with the correct relations. We
mark candidates with red color in the "task instruction" part. The candidate set size can range from hundreds to thousands in TableInstruct.
Entity Linking
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction: This is an entity linking task. The goal for this task is to link the selected entity mention in the table
cells to the entity in the knowledge base. You will be given a list of referent entities, with each one composed of an
entity name, its description and its type. Please choose the correct one from the referent entity candidates. Note that the
Wikipedia page, Wikipedia section and table caption (if any) provide important information for choosing the correct
referent entity.
### Input: [TLE] The Wikipedia page is about A-League all-time records. The Wikipedia section is about Average
season attendances. [TAB] col: | season | league average | total gate receipts | highest club | average | lowest club |
average | row 1: | 2005-06 | 10,955 | 920,219 | Sydney FC | 16,669 | New Zealand Knights | 3,909 | [SEP] row 2: |
2006-07 | 12,927 | ...
### Question: The selected entity mention in the table cell is: Melbourne Victory. The column name for ’Melbourne
Victory’ is highest club. The referent entity candidates are: <Melbourne Victory FC W-League [DESCRIPTION] None
[TYPE] SoccerClub>, <2016-17 Melbourne Victory FC season [DESCRIPTION] None [TYPE] SoccerClubSeason>,
<2011-12 Melbourne Victory season [DESCRIPTION] Association football club 2011/12 season for Melbourne Victory
[TYPE] SoccerClubSeason>, ... What is the correct referent entity for the entity mention ’Melbourne Victory’ ?
### Response: <Melbourne Victory [DESCRIPTION] association football team from Australia [TYPE] SoccerClub>.
Figure 5: Entity linking task. This task is to link the selected entity mention in the table cells to the entity in the
knowledge base. We mark candidates with red color in the "task instruction" part. The candidate set size can range from hundreds to thousands in TableInstruct.
Row Population
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction: This is a table row population task. The goal of this task is to populate the possible entities
of the selected column for a table, given the Wikipedia page title, Wikipedia section title, table caption (if any)
and table headers. You will be given a list of entity candidates. Please rank them so that the most likely entities come first.
### Input: [TLE] The Wikipedia page is about NBA conference finals. The Wikipedia section is about eastern
conference finals. The table headers are: | year | champion | coach | result | runner-up | coach |. You need to populate the
column: year. [SEED] The seed entity is <1971_NBA_playoffs>.
Figure 6: Row population task. This task is to populate the possible entities of the selected column for a table given
a partial table and table metadata. We mark candidates with red color in the "task instruction" part. The candidate set size can range from hundreds to thousands in TableInstruct.
Schema Augmentation
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a table schema augmentation task. The goal of this task is to populate the possible headers for a table, given the
table caption and the seed table header. You will be given a list of table header candidates. Please rank them so that the
most likely headers come first.
### Input:
[TLE] The table caption is 2010-11 rangers f.c. season. [SEED] The seed table header is <competition>.
### Question:
The header candidates are: <from>, <fee (£)>, <opponents>, <final position / round>, <started round>, <player>,
<fee>, <scorers>, <position (s)>, <name>, <venue>. Please rank the headers in the header candidates.
### Response:
<town/city>, <summary>, <suburb>, <county>, <region>, <district>, <stadium>, <city>, <home team>, <remarks>,
<city name>, <film name>
Figure 7: Schema augmentation task. This task is to populate the possible headers for a table, given the table
caption and the seed table header. The targets in this case are: <started round>, <final position/round>.
Hierarchical Table QA
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction: This is a hierarchical table question answering task. The goal for this task is to answer the given
question based on the given table. The table might be hierarchical.
### Input: [TLE] The table caption is department of defense obligations for research, development, test, and evaluation,
by agency: 2015-18. [TAB] | agency | 2015 | 2016 | 2017 | 2018 | [SEP] | department of defense | department of defense |
department of defense | department of defense | department of defense | [SEP] | rdt&e | 61513.5 | 69306.1| 70866.1 |
83725 | [SEP] | total research | 6691.5 | 7152 | 7178 | 7652.7 | [SEP] | basic research | 2133.4 | 2238.7 | 2110.1 | 2389.9 |
[SEP] | defense advanced research projects agency | defense advanced research projects agency | defense advanced
research projects agency | ...
### Question: How many dollars are the difference for total research of department of the air force increase between
2016 and 2018?
### Response:
142.3.
Figure 8: Hierarchical table QA task. This task is to answer the question based on the tables with complex
hierarchical structures.
Highlighted Cells QA
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a free-form table question answering task. The goal for this task is to answer the given question based on the
given table and the highlighted cells.
### Input:
[TLE] The Wikipedia page title of this table is Holly Dunn. The Wikipedia section title of this table is Singles. [TAB] |
Year | Single | Peak chart positions | Peak chart positions | Album | [SEP] | Year | Single | US Country | CAN Country |
Album | [SEP] | 1985 | ...
### Question:
The highlighted cells of the table are: [HIGHLIGHTED_BEGIN] [1988], [Across the Rio Grande in 1988 included the
singles \"That’s What Your Love Does to Me\" and \"(It’s Always Gonna Be) Someday\".], [\"That’s What Your Love
Does to Me\"], [Across the Rio Grande], [1988], [\"(It’s Always Gonna Be) Someday\"], [Across the Rio Grande]
[HIGHLIGHTED_END] What singles were Included in Across the Rio Grande in 1988?
### Response:
Across the Rio Grande in 1988 included the singles \"That's What Your Love Does to Me\" and \"(It's Always Gonna Be) Someday\".
Figure 9: Highlighted cells QA task. This task is to answer the question based on the tables with highlighted cells.
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a table fact verification task. The goal of this task is to distinguish whether the given statement is entailed or
refuted by the given table.
### Input:
[TLE] The table caption is about tony lema. [TAB] | tournament | wins | top - 5 | top - 10 | top - 25 |
events | cuts made [SEP] | masters tournament | 0 | 1 | 2 | 4 | 4 | 4 | [SEP] | us open | 0 | 2 | 3 | 4 | 6 | 5 | [SEP] |
the open championship | 1 | 2 | 2 | 2 | 3 | 3 | [SEP] | pga championship | 0 | 0 | 1 | 2 | 5 | 4 | [SEP] | totals | 1 | 5 | 8 | 12 | 18 | 16 |.
### Question:
The statement is: <tony lema be in the top 5 for the master tournament, the us open, and the open championship>. Is it
entailed or refuted by the table above?
### Response:
Entailed.
Figure 10: Table fact verification task. This task is to discriminate whether the claim can be entailed or refuted by
the given table.
Hybrid Question Answering
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a hybrid question answering task. The goal of this task is to answer the question given tables and passages.
### Input:
[TAB] col: | rank | player | team (s) by season | carries | yards | average | [SEP] | 1 | emmitt smith | dallas cowboys ( 1990
- 2002 ) arizona cardinals ( | 4,409 | 18,355 | 4.2 | [SEP] | 3 | frank gore | san francisco 49ers ( 2005 - 2014 ) indianapolis
colts | 3,548 | 15,347 | 4.3 | [SEP] | ...
### Question:
The passage may also provide related context. You can refer to both the passages and the table when you answer the
question. Passages: emmitt smith: smith led the league in rushing and won the super bowl in the same year three times
( 1992 , 1993 , and 1995 ) when to that point it had never been done . | walter payton: walter jerry payton ( july 25 ,
1954 - november 1 , 1999 ) was an american professional football player who was a running back for the chicago bears
of the national football league ( nfl ) for thirteen seasons . | ... The question: what is the middle name of the player with
the second most national football league career rushing yards?
### Response:
Jerry.
Figure 11: HybridQA task. This task is to answer the question based on the table and passages.
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a dialogue response generation task grounded on tables. The goal of this task is to generate response based
on the given dialogue history and the given table. The dialogues are grounded through underlying tables and span
three distinct tasks in the in-car personal assistant space: calendar scheduling, weather information retrieval, and
point-of-interest navigation.
### Input:
col : poi | distance | traffic_info | poi_type | address row 1 : chevron | 5_miles | moderate_traffic | gas_station |
783_arcadia_pl row 2 : town_and_country | 5_miles | no_traffic | shopping_center | 383_university_ave
### Question:
The dialogue history is: <what is the address ? || taking you to chevron | that s good ! please pick the quickest route to
get there and avoid all heavy_traffic ! | there is a chevron | what gas_station are here ?>. Please generate the response
based on the given table and the given dialogue history.
### Response:
783_arcadia_pl is the address for chevron gas_station.
Figure 12: Table grounded dialogue generation task. This task is to generate the response based on the given
table and the dialogue history.
Highlighted Cells Description
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a highlighted cells description task. The goal of this task is to generate the language description given table cells.
### Input:
<page_title> List of Governors of South Carolina </page_title> <section_title> Governors under the Constitution of
1868 </section_title> <table> <cell> 76 <col_header> # </col_header> <col_header> 74 </col_header> <col_header>
75 </col_header> </cell> <cell> Daniel Henry Chamberlain <col_header> Governor </col_header> <row_header>
76 </row_header> </cell> <cell> December 1, 1874 <col_header> Took Office </col_header> <row_header> 76
</row_header> </cell> </table>.
### Question:
Please generate one natural language description to describe the given highlighted table cells.
### Response:
Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.
Figure 13: Highlighted cells description task. This task is to generate the language description for the highlighted
table cells.
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a table fact verification task. The goal of this task is to distinguish whether the given statement is entailed or
refuted by the given table.
### Input:
[TAB] col: | logical system | lindenbaum–tarski algebra | [SEP] | classical sentential logic | boolean algebra | [SEP] |
intuitionistic propositional logic | heyting algebra | [SEP] | ...
### Question:
The statement is: <algebraic logic has five logical system and lindenbaum–tarski algebra which includes physics
algebra and nodal algebra (provide models of propositional modal logics).>. Is it entailed or refuted by the table above?
If you think the current information can not provide enough evidence for determining it, please choose ’not enough
info’, otherwise please choose the answer from ’supports’ or ’refutes’.
### Response:
Refutes.
Figure 14: Table fact verification task. This task is to discriminate whether the claim can be entailed or refuted by
the given table.
Table QA (WikiSQL)
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a table QA task. The goal of this task is to answer the question given the table.
### Input:
[TAB] col: | player | no. | nationality | position | years in toronto | school/club team | [SEP] | aleksandar radojević |
25 | serbia | center | 1999-2000 | barton cc (ks) | [SEP] | shawn respert | 31 | united states | guard | 1997-98 | michigan
state | [SEP] | ...
### Question:
What is terrence ross’ nationality?
### Response:
United states.
Figure 15: Table QA task. This task is to answer the question based on the given table.
Table QA (WikiTQ)
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.
### Instruction:
This is a table QA task. The goal of this task is to answer the question given the table.
### Input:
[TAB] col: | series # | season # | title | notes | original air date | [SEP] | 1 | 1 | "the charity" | alfie, dee dee, and melanie are
supposed to be helping | october 15, 1994 | [SEP] | 2 | 1 | "the practical joke war" | alfie and goo unleash harsh practical
jokes on dee dee | october 22, 1994 | [SEP] | ...
### Question:
Alfie’s birthday party aired on january 19. What was the airdate of the next episode?
### Response:
January 26, 1995.
Figure 16: Table QA task. This task is to answer the question based on the given table.