XTab: Cross-table pretraining for tabular transformers

B Zhu, X Shi, N Erickson, M Li, G Karypis… - arXiv preprint arXiv …, 2023 - arxiv.org
The success of self-supervised learning in computer vision and natural language processing
has motivated pretraining methods on tabular data. However, most existing tabular self …

A survey on masked autoencoder for self-supervised learning in vision and beyond

C Zhang, C Zhang, J Song, JSK Yi, K Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
"Masked autoencoders are scalable vision learners", as the title of MAE (He et al., 2022) puts it,
which suggests that self-supervised learning (SSL) in vision might undertake a similar …

STUNT: Few-shot tabular learning with self-generated tasks from unlabeled tables

J Nam, J Tack, K Lee, H Lee, J Shin - arXiv preprint arXiv:2303.00918, 2023 - arxiv.org
Learning with few labeled tabular samples is often an essential requirement for industrial
machine learning applications, as many varieties of tabular data suffer from high annotation costs …

Self-supervised representation learning from random data projectors

Y Sui, T Wu, JC Cresswell, G Wu, G Stein… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised representation learning (SSRL) has advanced considerably by exploiting
the transformation invariance assumption under artificially designed data augmentations …

TIP: Tabular-image pre-training for multimodal classification with incomplete data

S Du, S Zheng, Y Wang, W Bai, DP O'Regan… - arXiv preprint arXiv …, 2024 - arxiv.org
Images and structured tables are essential parts of real-world databases. Although tabular-
image representation learning promises new insights, it remains a challenging …

ReMasker: Imputing tabular data with masked autoencoding

T Du, L Melis, T Wang - arXiv preprint arXiv:2309.13793, 2023 - arxiv.org
We present ReMasker, a new method of imputing missing values in tabular data by
extending the masked autoencoding framework. Compared with prior work, ReMasker is …
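The core idea sketched in this abstract (on top of naturally missing cells, hide additional observed cells and train a model to reconstruct them) can be illustrated as below; the `remask` helper and its parameters are hypothetical illustrations of the masking scheme, not the paper's implementation:

```python
import numpy as np

def remask(x, observed, extra_mask_ratio=0.25, rng=None):
    """Illustrative re-masking step (hypothetical helper).
    On top of naturally missing cells (~observed), hide a random fraction
    of the *observed* cells; a model would be trained to reconstruct them."""
    rng = np.random.default_rng(rng)
    # extra cells to hide, drawn only from the observed ones
    extra = observed & (rng.random(x.shape) < extra_mask_ratio)
    # model input keeps only cells that are observed and not re-masked
    model_input = np.where(observed & ~extra, x, np.nan)
    # reconstruction loss would be evaluated on the re-masked cells,
    # where ground truth is known (unlike the naturally missing ones)
    targets = extra
    return model_input, targets

# toy table: 4 rows x 3 columns with one naturally missing cell
x = np.arange(12, dtype=float).reshape(4, 3)
observed = np.ones_like(x, dtype=bool)
observed[1, 2] = False
inp, tgt = remask(x, observed, extra_mask_ratio=0.5, rng=0)
```

The key point the sketch makes is that re-masked cells have known ground truth, so they provide a self-supervised reconstruction target even when the table also contains truly missing values.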

Modality-agnostic self-supervised learning with meta-learned masked auto-encoder

H Jang, J Tack, D Choi, J Jeong… - Advances in Neural …, 2024 - proceedings.neurips.cc
Despite its practical importance across a wide range of modalities, recent advances in self-
supervised learning (SSL) have been primarily focused on a few well-curated domains, e.g., …

Stochastic re-weighted gradient descent via distributionally robust optimization

R Kumar, K Majmundar, D Nagaraj… - arXiv preprint arXiv …, 2023 - arxiv.org
We develop a re-weighted gradient descent technique for boosting the performance of deep
neural networks. Our algorithm involves the importance weighting of data points during each …
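The per-step importance weighting this abstract describes can be illustrated with a loss-proportional (softmax) weighting, a common heuristic in KL-constrained DRO; the sketch below applies it to least-squares regression and is an assumption-laden illustration, not the paper's exact algorithm:

```python
import numpy as np

def reweighted_grad_step(w, X, y, lr=0.1, temperature=1.0):
    """One step of loss-proportional re-weighted gradient descent (sketch).
    Per-example squared-error losses define softmax weights, so harder
    examples get larger weight; the update uses the weighted gradient."""
    residuals = X @ w - y
    losses = 0.5 * residuals**2
    # softmax over losses (max-subtracted for numerical stability)
    z = (losses - losses.max()) / temperature
    weights = np.exp(z)
    weights /= weights.sum()
    # gradient of the weighted least-squares loss
    grad = X.T @ (weights * residuals)
    return w - lr * grad

# toy problem: recover w_true from noiseless linear measurements
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(2000):
    w = reweighted_grad_step(w, X, y, lr=0.1)
```

As the fit improves, the losses flatten and the weights approach uniform, so the scheme smoothly interpolates between focusing on hard examples early and ordinary gradient descent late.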

A Comprehensive Survey on Data Augmentation

Z Wang, P Wang, K Liu, P Wang, Y Fu, CT Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Data augmentation is a series of techniques that generate high-quality artificial data by
manipulating existing data samples. By leveraging data augmentation techniques, AI …

Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

C Hou, KK Thekumparampil, M Shavlovsky… - arXiv preprint arXiv …, 2023 - arxiv.org
While deep learning (DL) models are state-of-the-art in text and image domains, they have
not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular …