Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks

H Dong, Z Cheng, X He, M Zhou, A Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …

A survey of deep learning approaches for OCR and document understanding

N Subramani, A Matton, M Greaves, A Lam - arXiv preprint arXiv …, 2020 - arxiv.org
Documents are a core part of many businesses in many fields such as law, finance, and
technology among others. Automatic understanding of documents such as invoices …

TABBIE: Pretrained representations of tabular data

H Iida, D Thai, V Manjunatha, M Iyyer - arXiv preprint arXiv:2105.02584, 2021 - arxiv.org
Existing work on tabular representation learning jointly models tables and associated text
using self-supervised objective functions derived from pretrained language models such as …

Annotating columns with pre-trained language models

Y Suhara, J Li, Y Li, D Zhang, Ç Demiralp… - Proceedings of the …, 2022 - dl.acm.org
Inferring meta information about tables, such as column headers or relationships between
columns, is an active research topic in data management as we find many tables are …

HiTab: A hierarchical table dataset for question answering and natural language generation

Z Cheng, H Dong, Z Wang, R Jia, J Guo, Y Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Tables are often created with hierarchies, but existing works on table reasoning mainly focus
on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods …

Preview, attend and review: Schema-aware curriculum learning for multi-domain dialog state tracking

Y Dai, H Li, Y Li, J Sun, F Huang, L Si, X Zhu - arXiv preprint arXiv …, 2021 - arxiv.org
Existing dialog state tracking (DST) models are trained with dialog data in a random order,
neglecting rich structural information in a dataset. In this paper, we propose to use …

Data augmentation for ML-driven data preparation and integration

Y Li, X Wang, Z Miao, WC Tan - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
In recent years, we have witnessed the development of novel data augmentation (DA)
techniques for creating additional training data needed by machine learning based …

External Knowledge Infusion for Tabular Pre-training Models with Dual-adapters

C Qin, S Kim, H Zhao, T Yu, RA Rossi… - Proceedings of the 28th …, 2022 - dl.acm.org
Tabular pre-training models have received increasing attention due to the wide-ranging
applications for tabular data analysis. However, most of the existing solutions are directly …

DavarOCR: A toolbox for OCR and multi-modal document understanding

L Qiao, H Jiang, Y Chen, C Li, P Li, Z Li, B Zou… - Proceedings of the 30th …, 2022 - dl.acm.org
This paper presents DavarOCR, an open-source toolbox for OCR and document
understanding tasks. DavarOCR currently implements 19 advanced algorithms, covering 9 …

DemOpts: Fairness corrections in COVID-19 case prediction models

N Awasthi, S Abrar, D Smolyak… - arXiv preprint arXiv …, 2024 - arxiv.org
COVID-19 forecasting models have been used to inform decision making around resource
allocation and intervention decisions, e.g., hospital beds or stay-at-home orders. State of the …