Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks

H Dong, Z Cheng, X He, M Zhou, A Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …

NeuroLogic A*esque decoding: Constrained text generation with lookahead heuristics

X Lu, S Welleck, P West, L Jiang, J Kasai… - arXiv preprint arXiv …, 2021 - arxiv.org
The dominant paradigm for neural text generation is left-to-right decoding from
autoregressive language models. Constrained or controllable generation under complex …

ToTTo: A controlled table-to-text generation dataset

AP Parikh, X Wang, S Gehrmann, M Faruqui… - arXiv preprint arXiv …, 2020 - arxiv.org
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training
examples that proposes a controlled generation task: given a Wikipedia table and a set of …

Large language models are few(1)-shot table reasoners

W Chen - arXiv preprint arXiv:2210.06710, 2022 - arxiv.org
Recent literature has shown that large language models (LLMs) are generally excellent few-shot
reasoners to solve text reasoning tasks. However, the capability of LLMs on table …

Chart-to-text: A large-scale benchmark for chart summarization

S Kantharaj, RTK Leong, X Lin, A Masry… - arXiv preprint arXiv …, 2022 - arxiv.org
Charts are commonly used for exploring data and communicating insights. Generating
natural language summaries from charts can be very helpful for people in inferring key …

DART: Open-domain structured data record to text generation

L Nan, D Radev, R Zhang, A Rau, A Sivaprasad… - arXiv preprint arXiv …, 2020 - arxiv.org
We present DART, an open domain structured DAta Record to Text generation dataset with
over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially …

KGPT: Knowledge-grounded pre-training for data-to-text generation

W Chen, Y Su, X Yan, WY Wang - arXiv preprint arXiv:2010.02307, 2020 - arxiv.org
Data-to-text generation has recently attracted substantial interest due to its wide
applications. Existing methods have shown impressive performance on an array of tasks …

FOLIO: Natural language reasoning with first-order logic

S Han, H Schoelkopf, Y Zhao, Z Qi, M Riddell… - arXiv preprint arXiv …, 2022 - arxiv.org
We present FOLIO, a human-annotated, open-domain, and logically complex and diverse
dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) …

Table understanding: Problem overview

A Shigarov - Wiley Interdisciplinary Reviews: Data Mining and …, 2023 - Wiley Online Library
Tables are probably the most natural way to represent relational data in various media and
formats. They store a large number of valuable facts that could be utilized for question …

Transformers for tabular data representation: A survey of models and applications

G Badaro, M Saeed, P Papotti - Transactions of the Association for …, 2023 - direct.mit.edu
In the last few years, the natural language processing community has witnessed advances
in neural representations of free texts with transformer-based language models (LMs). Given …