Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks

H Dong, Z Cheng, X He, M Zhou, A Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …

Deep learning for table detection and structure recognition: A survey

M Salaheldin Kasem, A Abdallah, A Berendeyev… - ACM Computing …, 2024 - dl.acm.org
Tables are everywhere, from scientific journals, articles, websites, and newspapers all the
way to items we buy at the supermarket. Detecting them is thus of utmost importance to …

“What it wants me to say”: Bridging the abstraction gap between end-user programmers and code-generating large language models

MX Liu, A Sarkar, C Negreanu, B Zorn… - Proceedings of the …, 2023 - dl.acm.org
Code-generating large language models map natural language to code. However, only a
small portion of the infinite space of naturalistic utterances is effective at guiding code …

TUTA: Tree-based transformers for generally structured table pre-training

Z Wang, H Dong, R Jia, J Li, Z Fu, S Han… - Proceedings of the 27th …, 2021 - dl.acm.org
We propose TUTA, a unified pre-training architecture for understanding generally structured
tables. Noticing that understanding a table requires spatial, hierarchical, and semantic …

Table structure recognition using top-down and bottom-up cues

S Raja, A Mondal, CV Jawahar - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
Tables are information-rich structured objects in document images. While significant work
has been done in localizing tables as graphic objects in document images, only limited …

TabularNet: A neural network architecture for understanding semantic structures of tabular data

L Du, F Gao, X Chen, R Jia, J Wang, J Zhang… - Proceedings of the 27th …, 2021 - dl.acm.org
Tabular data are ubiquitous for the widespread applications of tables and hence have
attracted the attention of researchers to extract underlying information. One of the critical …

HiTab: A hierarchical table dataset for question answering and natural language generation

Z Cheng, H Dong, Z Wang, R Jia, J Guo, Y Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Tables are often created with hierarchies, but existing works on table reasoning mainly focus
on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods …

Data preparation: A survey of commercial tools

M Hameed, F Naumann - ACM SIGMOD Record, 2020 - dl.acm.org
Raw data are often messy: they follow different encodings, records are not well structured,
values do not adhere to patterns, etc. Such data are in general not fit to be ingested by …

Large language models for tabular data: Progresses and future directions

H Dong, Z Wang - Proceedings of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Tables contain a significant portion of the world's structured information. The ability to
efficiently and accurately understand, process, reason about, analyze, and generate tabular …

A survey of deep learning approaches for OCR and document understanding

N Subramani, A Matton, M Greaves, A Lam - arXiv preprint arXiv …, 2020 - arxiv.org
Documents are a core part of many businesses in many fields such as law, finance, and
technology among others. Automatic understanding of documents such as invoices …