Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …
Deep learning for table detection and structure recognition: A survey
M Salaheldin Kasem, A Abdallah, A Berendeyev… - ACM Computing …, 2024 - dl.acm.org
Tables are everywhere, from scientific journals, articles, websites, and newspapers all the
way to items we buy at the supermarket. Detecting them is thus of utmost importance to …
“What it wants me to say”: Bridging the abstraction gap between end-user programmers and code-generating large language models
Code-generating large language models map natural language to code. However, only a
small portion of the infinite space of naturalistic utterances is effective at guiding code …
TUTA: Tree-based transformers for generally structured table pre-training
We propose TUTA, a unified pre-training architecture for understanding generally structured
tables. Noticing that understanding a table requires spatial, hierarchical, and semantic …
Table structure recognition using top-down and bottom-up cues
Tables are information-rich structured objects in document images. While significant work
has been done in localizing tables as graphic objects in document images, only limited …
TabularNet: A neural network architecture for understanding semantic structures of tabular data
Tabular data are ubiquitous for the widespread applications of tables and hence have
attracted the attention of researchers to extract underlying information. One of the critical …
HiTab: A hierarchical table dataset for question answering and natural language generation
Tables are often created with hierarchies, but existing works on table reasoning mainly focus
on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods …
Data preparation: A survey of commercial tools
M Hameed, F Naumann - ACM SIGMOD Record, 2020 - dl.acm.org
Raw data are often messy: they follow different encodings, records are not well structured,
values do not adhere to patterns, etc. Such data are in general not fit to be ingested by …
Large language models for tabular data: Progresses and future directions
Tables contain a significant portion of the world's structured information. The ability to
efficiently and accurately understand, process, reason about, analyze, and generate tabular …
A survey of deep learning approaches for OCR and document understanding
Documents are a core part of many businesses in many fields such as law, finance, and
technology among others. Automatic understanding of documents such as invoices …