A survey of controllable text generation using transformer-based pre-trained language models

H Zhang, H Song, S Li, M Zhou, D Song - ACM Computing Surveys, 2023 - dl.acm.org
Controllable Text Generation (CTG) is an emerging area in the field of natural language
generation (NLG). It is regarded as crucial for the development of advanced text generation …

Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision and was then introduced to …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …
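
As an illustration of the word-level operations such surveys catalogue, the sketch below applies two common augmentations, random swap and random deletion, to a tokenized sentence. The function names and default rates are illustrative, in the spirit of EDA-style operations covered by these surveys, and are not taken from any single surveyed method.

```python
import random

# Illustrative token-level augmentations of the kind surveyed here
# (random swap and random deletion); rates are arbitrary examples.
def random_swap(words, n_swaps=1):
    words = words.copy()
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    kept = [w for w in words if random.random() > p]
    return kept or [random.choice(words)]  # never return an empty sentence

tokens = "data augmentation creates new training examples from existing ones".split()
print(" ".join(random_swap(tokens)))
print(" ".join(random_deletion(tokens)))
```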

A survey on data augmentation for text classification

M Bayer, MA Kaufhold, C Reuter - ACM Computing Surveys, 2022 - dl.acm.org
Data augmentation, the artificial creation of training data for machine learning by
transformations, is a widely studied research field across machine learning disciplines …

FUDGE: Controlled text generation with future discriminators

K Yang, D Klein - arXiv preprint arXiv:2104.05218, 2021 - arxiv.org
We propose Future Discriminators for Generation (FUDGE), a flexible and modular method
for controlled text generation. Given a pre-existing model G for generating text from a …
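
The core idea in FUDGE is to reweight the base generator's next-token distribution with a lightweight discriminator that estimates whether the desired attribute will hold given the prefix plus each candidate token. The sketch below is a minimal, hedged illustration of one decoding step; `base_model` (returning vocabulary logits) and `attribute_discriminator` (returning an attribute log-probability for a prefix) are hypothetical stand-ins, not the paper's actual models or API.

```python
import torch

def fudge_step(base_model, attribute_discriminator, prefix_ids, top_k=50):
    """One FUDGE-style decoding step: combine generator log-probabilities
    with the discriminator's estimate that the attribute will be satisfied."""
    with torch.no_grad():
        # Log-probabilities over the vocabulary from the base generator G.
        base_logprobs = torch.log_softmax(base_model(prefix_ids), dim=-1)
        # Restrict to the top-k candidates to keep discriminator calls cheap.
        cand_logprobs, cand_ids = base_logprobs.topk(top_k)
        # Score each candidate continuation: log P(attribute | prefix + token).
        attr_logprobs = torch.stack([
            attribute_discriminator(torch.cat([prefix_ids, tok.view(1)]))
            for tok in cand_ids
        ])
        # P(token | prefix, attribute) is proportional to
        # P(token | prefix) * P(attribute | prefix + token).
        combined = cand_logprobs + attr_logprobs
        next_id = cand_ids[torch.multinomial(torch.softmax(combined, dim=-1), 1)]
    return next_id
```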

AEDA: an easier data augmentation technique for text classification

A Karimi, L Rossi, A Prati - arXiv preprint arXiv:2108.13230, 2021 - arxiv.org
This paper proposes the AEDA (An Easier Data Augmentation) technique to help improve
performance on text classification tasks. AEDA includes only random insertion of …
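
AEDA augments a sentence purely by inserting random punctuation marks at random positions, leaving the original words untouched. The sketch below illustrates that idea; the punctuation set and the one-third insertion ratio follow the paper's description, but treat the exact defaults as assumptions.

```python
import random

# Sketch of the AEDA idea: insert random punctuation marks at random
# positions; the original word order and content are preserved.
PUNCTUATION = [".", ";", "?", ":", "!", ","]

def aeda(sentence, punc_ratio=1/3):
    words = sentence.split()
    # Insert between 1 and roughly punc_ratio * len(words) punctuation marks.
    n_inserts = random.randint(1, max(1, int(punc_ratio * len(words))))
    for _ in range(n_inserts):
        pos = random.randint(0, len(words))
        words.insert(pos, random.choice(PUNCTUATION))
    return " ".join(words)

print(aeda("a simple example sentence for text classification"))
```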

Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions

JJY Chung, E Kamar, S Amershi - arXiv preprint arXiv:2306.04140, 2023 - arxiv.org
Large language models (LLMs) can be used to generate text data for training and evaluating
other models. However, creating high-quality datasets with LLMs can be challenging. In this …

Data augmentation techniques in natural language processing

LFAO Pellicer, TM Ferreira, AHR Costa - Applied Soft Computing, 2023 - Elsevier
Data Augmentation (DA) methods, a family of techniques designed for synthetic generation
of training data, have shown remarkable results in various Deep Learning and Machine …

Mitigating political bias in language models through reinforced calibration

R Liu, C Jia, J Wei, G Xu, L Wang… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Current large-scale language models can be politically biased as a result of the data they
are trained on, potentially causing serious problems when they are deployed in real-world …

Improving short text classification with augmented data using GPT-3

SV Balkus, D Yan - Natural Language Engineering, 2024 - cambridge.org
GPT-3 is a large-scale natural language model developed by OpenAI that can perform many
different tasks, including topic classification. Although researchers claim that it requires only …
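
A common pattern behind this line of work is to prompt an LLM to paraphrase existing labeled examples and add the generations to the training set. The sketch below is only an illustration of that pattern, assuming the OpenAI Python client; the model name, prompt, and example label are placeholders, not the setup used in the cited paper.

```python
# Illustrative LLM-based augmentation for short text classification:
# paraphrase each labeled example to enlarge the training set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def paraphrase(text, n=3, model="gpt-3.5-turbo"):
    prompt = (
        f"Paraphrase the following sentence {n} times, one per line, "
        f"keeping its meaning and topic unchanged:\n{text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return [line.strip() for line in
            response.choices[0].message.content.splitlines() if line.strip()]

# Hypothetical usage: keep the original label for each paraphrase.
augmented = [(p, "sports") for p in paraphrase("The team won the final match.")]
```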