Machine learning methods for small data challenges in molecular science

B Dou, Z Zhu, E Merkurjev, L Ke, L Chen… - Chemical …, 2023 - ACS Publications
Small data are often used in scientific and engineering research due to the presence of
various constraints, such as time, cost, ethics, privacy, security, and technical limitations in …

Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision then introduced to …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in
low-resource domains, new tasks, and the popularity of large-scale neural networks that require …

A survey of active learning for natural language processing

Z Zhang, E Strubell, E Hovy - arXiv preprint arXiv:2210.10109, 2022 - arxiv.org
In this work, we provide a survey of active learning (AL) for its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …

Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

Data expansion using back translation and paraphrasing for hate speech detection

DR Beddiar, MS Jahan, M Oussalah - Online Social Networks and Media, 2021 - Elsevier
With proliferation of user generated contents in social media platforms, establishing
mechanisms to automatically identify toxic and abusive content becomes a prime concern …

Improving short text classification with augmented data using GPT-3

SV Balkus, D Yan - Natural Language Engineering, 2024 - cambridge.org
GPT-3 is a large-scale natural language model developed by OpenAI that can perform many
different tasks, including topic classification. Although researchers claim that it requires only …

Generative pre-trained transformer (GPT) in research: A systematic review on data augmentation

F Sufi - Information, 2024 - mdpi.com
GPT (Generative Pre-trained Transformer) represents advanced language models that have
significantly reshaped the academic writing landscape. These sophisticated language …

Mask-then-fill: A flexible and effective data augmentation framework for event extraction

J Gao, C Yu, W Wang, H Zhao, R Xu - arXiv preprint arXiv:2301.02427, 2023 - arxiv.org
We present Mask-then-Fill, a flexible and effective data augmentation framework for event
extraction. Our approach allows for more flexible manipulation of text and thus can generate …

Text autoaugment: Learning compositional augmentation policy for text classification

S Ren, J Zhang, L Li, X Sun, J Zhou - arXiv preprint arXiv:2109.00523, 2021 - arxiv.org
Data augmentation aims to enrich training samples for alleviating the overfitting issue in
low-resource or class-imbalanced situations. Traditional methods first devise task-specific …