Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - Ai Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data scarcity in scenarios where
deep learning techniques may fail. It is widely applied in computer vision and then introduced to …

A survey on data augmentation for text classification

M Bayer, MA Kaufhold, C Reuter - ACM Computing Surveys, 2022 - dl.acm.org
Data augmentation, the artificial creation of training data for machine learning by
transformations, is a widely studied research field across machine learning disciplines …

Flipda: Effective and robust data augmentation for few-shot learning

J Zhou, Y Zheng, J Tang, J Li, Z Yang - arXiv preprint arXiv:2108.06332, 2021 - arxiv.org
Most previous methods for text data augmentation are limited to simple tasks and weak
baselines. We explore data augmentation on hard tasks (i.e., few-shot natural language …

Label-specific feature augmentation for long-tailed multi-label text classification

P Xu, L Xiao, B Liu, S Lu, L Jing, J Yu - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Multi-label text classification (MLTC) involves tagging a document with its most relevant
subset of labels from a label set. In real applications, labels usually follow a long-tailed …

Data augmentation using llms: Data perspectives, learning paradigms and challenges

B Ding, C Qin, R Zhao, T Luo, X Li, G Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged
as a pivotal technique for enhancing model performance by diversifying training examples …

TreeMix: Compositional constituency-based data augmentation for natural language understanding

L Zhang, Z Yang, D Yang - arXiv preprint arXiv:2205.06153, 2022 - arxiv.org
Data augmentation is an effective approach to tackle over-fitting. Many previous works have
proposed different data augmentation strategies for NLP, such as noise injection, word …

Substructure substitution: Structured data augmentation for NLP

H Shi, K Livescu, K Gimpel - arXiv preprint arXiv:2101.00411, 2021 - arxiv.org
We study a family of data augmentation methods, substructure substitution (SUB2), for
natural language processing (NLP) tasks. SUB2 generates new examples by substituting …

Genius: Sketch-based language model pre-training via extreme and selective masking for text generation and augmentation

B Guo, Y Gong, Y Shen, S Han, H Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GENIUS: a conditional text generation model using sketches as input, which
can fill in the missing contexts for a given sketch (key information consisting of textual spans …

Learn to resolve conversational dependency: A consistency training framework for conversational question answering

G Kim, H Kim, J Park, J Kang - arXiv preprint arXiv:2106.11575, 2021 - arxiv.org
One of the main challenges in conversational question answering (CQA) is to resolve the
conversational dependency, such as anaphora and ellipsis. However, existing approaches …

Asking questions like educational experts: Automatically generating question-answer pairs on real-world examination data

F Qu, X Jia, Y Wu - arXiv preprint arXiv:2109.05179, 2021 - arxiv.org
Generating high-quality question-answer pairs is a hard but meaningful task. Although
previous works have achieved great results on answer-aware question generation, it is …