Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …
A survey of deep learning approaches for OCR and document understanding
Documents are a core part of many businesses in many fields such as law, finance, and
technology among others. Automatic understanding of documents such as invoices …
Tabbie: Pretrained representations of tabular data
Existing work on tabular representation learning jointly models tables and associated text
using self-supervised objective functions derived from pretrained language models such as …
Annotating columns with pre-trained language models
Inferring meta information about tables, such as column headers or relationships between
columns, is an active research topic in data management as we find many tables are …
HiTab: A hierarchical table dataset for question answering and natural language generation
Tables are often created with hierarchies, but existing works on table reasoning mainly focus
on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods …
Preview, attend and review: Schema-aware curriculum learning for multi-domain dialog state tracking
Existing dialog state tracking (DST) models are trained with dialog data in a random order,
neglecting rich structural information in a dataset. In this paper, we propose to use …
Data augmentation for ML-driven data preparation and integration
In recent years, we have witnessed the development of novel data augmentation (DA)
techniques for creating additional training data needed by machine learning based …
External Knowledge Infusion for Tabular Pre-training Models with Dual-adapters
Tabular pre-training models have received increasing attention due to the wide-ranging
applications for tabular data analysis. However, most of the existing solutions are directly …
DavarOCR: A toolbox for OCR and multi-modal document understanding
This paper presents DavarOCR, an open-source toolbox for OCR and document
understanding tasks. DavarOCR currently implements 19 advanced algorithms, covering 9 …
DemOpts: Fairness corrections in COVID-19 case prediction models
COVID-19 forecasting models have been used to inform decision making around resource
allocation and intervention decisions, e.g., hospital beds or stay-at-home orders. State of the …