Tabular Data Augmentation for Machine Learning: Progress and Prospects of Embracing Generative AI

L Cui, H Li, K Chen, L Shou, G Chen - arXiv preprint arXiv:2407.21523, 2024 - arxiv.org
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …

Feasibility Study of Edge Computing Empowered by Artificial Intelligence—A Quantitative Analysis Based on Large Models

Y Chen, C Wu, R Sui, J Zhang - Big Data and Cognitive Computing, 2024 - mdpi.com
The advancement of artificial intelligence (AI) demands significant data and computational
resources that have an adverse impact on the environment. To address this issue, a novel …

NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli

X Wang, C Li, Y Chang, J Wang, Y Wu - arXiv preprint arXiv:2405.02814, 2024 - arxiv.org
Large Language Models (LLMs) have become integral to a wide spectrum of applications,
ranging from traditional computing tasks to advanced artificial intelligence (AI) applications …

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

T Nguyen, Y Bin, J Xiao, L Qu, Y Li, JZ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans use multiple senses to comprehend the environment. Vision and language are two
of the most vital senses since they allow us to easily communicate our thoughts and …

LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?

J Cegin, J Simko, P Brusilovsky - arXiv preprint arXiv:2408.16502, 2024 - arxiv.org
Generative large language models (LLMs) are increasingly used for data
augmentation tasks, where text samples are LLM-paraphrased and then used for classifier …

Stochastic Adversarial Networks for Multi-Domain Text Classification

X Wang, Y Wu - arXiv preprint arXiv:2406.00044, 2024 - arxiv.org
Adversarial training has been instrumental in advancing multi-domain text classification
(MDTC). Traditionally, MDTC methods employ a shared-private paradigm, with a shared …
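
The snippet names the shared-private paradigm; for context, here is a minimal sketch of that split (one domain-invariant extractor plus one private extractor per domain), leaving out the stochastic and adversarial components specific to this paper. Layer shapes are illustrative:

```python
import torch
import torch.nn as nn

class SharedPrivateMDTC(nn.Module):
    def __init__(self, input_dim: int, feat_dim: int,
                 n_domains: int, n_classes: int):
        super().__init__()
        # Shared extractor: trained on all domains, learns
        # domain-invariant features.
        self.shared = nn.Sequential(nn.Linear(input_dim, feat_dim), nn.ReLU())
        # Private extractors: one per domain, learn domain-specific features.
        self.private = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, feat_dim), nn.ReLU())
            for _ in range(n_domains)
        )
        self.classifier = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, x: torch.Tensor, domain: int) -> torch.Tensor:
        # Classify on the concatenation of shared and private features.
        feats = torch.cat([self.shared(x), self.private[domain](x)], dim=-1)
        return self.classifier(feats)
```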

A Survey of Data Synthesis Approaches

HY Chang, PY Chen, TH Chou, CS Kao, HY Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper provides a detailed survey of synthetic data techniques. We first discuss the
expected goals of using synthetic data in data augmentation, which can be divided into four …

You Only Need Half: Boosting Data Augmentation by Using Partial Content

J Hu, Y Wu - arXiv preprint arXiv:2405.02830, 2024 - arxiv.org
We propose a novel data augmentation method termed You Only Need hAlf (YONA), which
simplifies the augmentation process. YONA bisects an image, substitutes one half with …
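
The snippet is cut off before saying what the removed half is replaced with; the sketch below assumes noise substitution on a float image in [0, 1], so both the noise distribution and the vertical-only cut are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def yona_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Bisect the image vertically and replace one randomly chosen half
    # with uniform noise; the other half is left untouched.
    out = image.copy()
    w = image.shape[1]
    half = np.s_[:, : w // 2] if rng.random() < 0.5 else np.s_[:, w // 2 :]
    out[half] = rng.uniform(0.0, 1.0, size=out[half].shape)
    return out
```

Usage: `augmented = yona_augment(img, np.random.default_rng(0))` for an `img` of shape (H, W, C).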

Margin Discrepancy-based Adversarial Training for Multi-Domain Text Classification

Y Wu - arXiv preprint arXiv:2403.00888, 2024 - arxiv.org
Multi-domain text classification (MDTC) endeavors to harness available resources from
correlated domains to enhance the classification accuracy of the target domain. Presently …

Vision Transformer-based Adversarial Domain Adaptation

Y Li, Y Wu - arXiv preprint arXiv:2404.15817, 2024 - arxiv.org
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source
domain to an unlabeled target domain. The most recent UDA methods always resort to …
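
The snippet defines UDA but is cut off before naming the paper's mechanism. As generic context only: adversarial UDA is classically implemented with a gradient reversal layer between the feature extractor (here presumably a ViT backbone) and a domain discriminator. A sketch of that standard layer, not this paper's specific design:

```python
import torch

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; negated, scaled gradient on the backward
    # pass, so the feature extractor learns to fool the domain discriminator.
    @staticmethod
    def forward(ctx, x: torch.Tensor, lambd: float) -> torch.Tensor:
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        return -ctx.lambd * grad_output, None

def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambd)
```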