A survey of controllable text generation using transformer-based pre-trained language models

H Zhang, H Song, S Li, M Zhou, D Song - ACM Computing Surveys, 2023 - dl.acm.org
Controllable Text Generation (CTG) is an emerging area in the field of natural language
generation (NLG). It is regarded as crucial for the development of advanced text generation …

Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision and was then introduced to …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …
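
As an illustration of the word-level operations such surveys catalogue, the sketch below applies two common augmentations, random swap and random deletion, to a tokenized sentence. The function names and default rates are illustrative, in the spirit of EDA-style operations covered by these surveys, and are not taken from any single surveyed method.

```python
import random

# Illustrative token-level augmentations of the kind surveyed here
# (random swap and random deletion); rates are arbitrary examples.
def random_swap(words, n_swaps=1):
    words = words.copy()
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    kept = [w for w in words if random.random() > p]
    return kept or [random.choice(words)]  # never return an empty sentence

tokens = "data augmentation creates new training examples from existing ones".split()
print(" ".join(random_swap(tokens)))
print(" ".join(random_deletion(tokens)))
```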

A survey on data augmentation for text classification

M Bayer, MA Kaufhold, C Reuter - ACM Computing Surveys, 2022 - dl.acm.org
Data augmentation, the artificial creation of training data for machine learning by
transformations, is a widely studied research field across machine learning disciplines …

FUDGE: Controlled text generation with future discriminators

K Yang, D Klein - arXiv preprint arXiv:2104.05218, 2021 - arxiv.org
We propose Future Discriminators for Generation (FUDGE), a flexible and modular method
for controlled text generation. Given a pre-existing model G for generating text from a …
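
The core idea in FUDGE is to reweight the base generator's next-token distribution with a lightweight discriminator that estimates whether the desired attribute will hold given the prefix plus each candidate token. The sketch below is a minimal, hedged illustration of one decoding step; `base_model` (returning vocabulary logits) and `attribute_discriminator` (returning an attribute log-probability for a prefix) are hypothetical stand-ins, not the paper's actual models or API.

```python
import torch

def fudge_step(base_model, attribute_discriminator, prefix_ids, top_k=50):
    """One FUDGE-style decoding step: combine generator log-probabilities
    with the discriminator's estimate that the attribute will be satisfied."""
    with torch.no_grad():
        # Log-probabilities over the vocabulary from the base generator G.
        base_logprobs = torch.log_softmax(base_model(prefix_ids), dim=-1)
        # Restrict to the top-k candidates to keep discriminator calls cheap.
        cand_logprobs, cand_ids = base_logprobs.topk(top_k)
        # Score each candidate continuation: log P(attribute | prefix + token).
        attr_logprobs = torch.stack([
            attribute_discriminator(torch.cat([prefix_ids, tok.view(1)]))
            for tok in cand_ids
        ])
        # P(token | prefix, attribute) is proportional to
        # P(token | prefix) * P(attribute | prefix + token).
        combined = cand_logprobs + attr_logprobs
        next_id = cand_ids[torch.multinomial(torch.softmax(combined, dim=-1), 1)]
    return next_id
```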

AEDA: an easier data augmentation technique for text classification

A Karimi, L Rossi, A Prati - arXiv preprint arXiv:2108.13230, 2021 - arxiv.org
This paper proposes the AEDA (An Easier Data Augmentation) technique to help improve
performance on text classification tasks. AEDA includes only random insertion of …
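
AEDA augments a sentence purely by inserting random punctuation marks at random positions, leaving the original words untouched. The sketch below illustrates that idea; the punctuation set and the one-third insertion ratio follow the paper's description, but treat the exact defaults as assumptions.

```python
import random

# Sketch of the AEDA idea: insert random punctuation marks at random
# positions; the original word order and content are preserved.
PUNCTUATION = [".", ";", "?", ":", "!", ","]

def aeda(sentence, punc_ratio=1/3):
    words = sentence.split()
    # Insert between 1 and roughly punc_ratio * len(words) punctuation marks.
    n_inserts = random.randint(1, max(1, int(punc_ratio * len(words))))
    for _ in range(n_inserts):
        pos = random.randint(0, len(words))
        words.insert(pos, random.choice(PUNCTUATION))
    return " ".join(words)

print(aeda("a simple example sentence for text classification"))
```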

Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions

JJY Chung, E Kamar, S Amershi - arXiv preprint arXiv:2306.04140, 2023 - arxiv.org
Large language models (LLMs) can be used to generate text data for training and evaluating
other models. However, creating high-quality datasets with LLMs can be challenging. In this …

Data augmentation techniques in natural language processing

LFAO Pellicer, TM Ferreira, AHR Costa - Applied Soft Computing, 2023 - Elsevier
Data Augmentation (DA) methods, a family of techniques designed for synthetic generation
of training data, have shown remarkable results in various Deep Learning and Machine …

Mitigating political bias in language models through reinforced calibration

R Liu, C Jia, J Wei, G Xu, L Wang… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Current large-scale language models can be politically biased as a result of the data they
are trained on, potentially causing serious problems when they are deployed in real-world …

Improving short text classification with augmented data using GPT-3

SV Balkus, D Yan - Natural Language Engineering, 2024 - cambridge.org
GPT-3 is a large-scale natural language model developed by OpenAI that can perform many
different tasks, including topic classification. Although researchers claim that it requires only …
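
A common pattern behind this line of work is to prompt an LLM to paraphrase existing labeled examples and add the generations to the training set. The sketch below is only an illustration of that pattern, assuming the OpenAI Python client; the model name, prompt, and example label are placeholders, not the setup used in the cited paper.

```python
# Illustrative LLM-based augmentation for short text classification:
# paraphrase each labeled example to enlarge the training set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def paraphrase(text, n=3, model="gpt-3.5-turbo"):
    prompt = (
        f"Paraphrase the following sentence {n} times, one per line, "
        f"keeping its meaning and topic unchanged:\n{text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return [line.strip() for line in
            response.choices[0].message.content.splitlines() if line.strip()]

# Hypothetical usage: keep the original label for each paraphrase.
augmented = [(p, "sports") for p in paraphrase("The team won the final match.")]
```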