Textual data augmentation for efficient active learning on tiny datasets

B Dou, Z Zhu, E Merkurjev, L Ke, L Chen… - Chemical …, 2023 - ACS Publications

Small data are often used in scientific and engineering research due to the presence of
various constraints, such as time, cost, ethics, privacy, security, and technical limitations in …

Cited by 158 · Related articles · All 5 versions

Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision then introduced to …

Cited by 324 · Related articles · All 5 versions

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org

Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …

Cited by 907 · Related articles · All 9 versions

A survey of active learning for natural language processing

Z Zhang, E Strubell, E Hovy - arXiv preprint arXiv:2210.10109, 2022 - arxiv.org

In this work, we provide a survey of active learning (AL) for its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …

Cited by 102 · Related articles · All 3 versions

Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

Cited by 53 · Related articles · All 5 versions

Data expansion using back translation and paraphrasing for hate speech detection

DR Beddiar, MS Jahan, M Oussalah - Online Social Networks and Media, 2021 - Elsevier

With proliferation of user generated contents in social media platforms, establishing
mechanisms to automatically identify toxic and abusive content becomes a prime concern …

Cited by 105 · Related articles · All 6 versions

Improving short text classification with augmented data using GPT-3

SV Balkus, D Yan - Natural Language Engineering, 2024 - cambridge.org

GPT-3 is a large-scale natural language model developed by OpenAI that can perform many
different tasks, including topic classification. Although researchers claim that it requires only …

Cited by 50 · Related articles · All 3 versions

Generative pre-trained transformer (GPT) in research: A systematic review on data augmentation

F Sufi - Information, 2024 - mdpi.com

GPT (Generative Pre-trained Transformer) represents advanced language models that have
significantly reshaped the academic writing landscape. These sophisticated language …

Cited by 33 · Related articles · All 3 versions

Mask-then-fill: A flexible and effective data augmentation framework for event extraction

J Gao, C Yu, W Wang, H Zhao, R Xu - arXiv preprint arXiv:2301.02427, 2023 - arxiv.org

We present Mask-then-Fill, a flexible and effective data augmentation framework for event
extraction. Our approach allows for more flexible manipulation of text and thus can generate …

Cited by 29 · Related articles · All 3 versions

Text autoaugment: Learning compositional augmentation policy for text classification

S Ren, J Zhang, L Li, X Sun, J Zhou - arXiv preprint arXiv:2109.00523, 2021 - arxiv.org

Data augmentation aims to enrich training samples for alleviating the overfitting issue in low-
resource or class-imbalanced situations. Traditional methods first devise task-specific …

Cited by 37 · Related articles · All 5 versions