Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

NL-Augmenter: A framework for task-sensitive natural language augmentation

KD Dhole, V Gangal, S Gehrmann, A Gupta, Z Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation is an important component in the robustness evaluation of models in
natural language processing (NLP) and in enhancing the diversity of the data they are …

Robust encodings: A framework for combating adversarial typos

E Jones, R Jia, A Raghunathan, P Liang - arXiv preprint arXiv:2005.01229, 2020 - arxiv.org
Despite excellent performance on many tasks, NLP systems are easily fooled by small
adversarial perturbations of inputs. Existing procedures to defend against such perturbations …

Context-aware misinformation detection: A benchmark of deep learning architectures using word embeddings

VI Ilie, CO Truică, ES Apostol, A Paschke - IEEE Access, 2021 - ieeexplore.ieee.org
New mass media paradigms for information distribution have emerged with the digital age.
With new digital-enabled mass media, the communication process is centered around the …

Text adversarial attacks and defenses: Issues, taxonomy, and perspectives

X Han, Y Zhang, W Wang… - Security and …, 2022 - Wiley Online Library
Deep neural networks (DNNs) have been widely used in many fields due to their powerful
representation learning capabilities. However, they are exposed to serious threats caused …

Computational models to study language processing in the human brain: A survey

S Wang, J Sun, Y Zhang, N Lin, MF Moens… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite differing from the human language processing mechanism in implementation and
algorithms, current language models demonstrate remarkable human-like or surpassing …

Classification benchmarks for under-resourced Bengali language based on multichannel convolutional-LSTM network

MR Karim, BR Chakravarthi… - 2020 IEEE 7th …, 2020 - ieeexplore.ieee.org
The exponential growth of social media and micro-blogging sites not only provides platforms for
empowering freedom of expression and individual voices, but also enables people to …

Automatic classification of scanned electronic health record documents

H Goodrum, K Roberts, EV Bernstam - International journal of medical …, 2020 - Elsevier
Objectives: Electronic Health Records (EHRs) contain scanned documents from a
variety of sources such as identification cards, radiology reports, clinical correspondence …

On learning and representing social meaning in NLP: a sociolinguistic perspective

D Nguyen, L Rosseel, J Grieve - … of the 2021 Conference of the …, 2021 - aclanthology.org
The field of NLP has made substantial progress in building meaning representations.
However, an important aspect of linguistic meaning, social meaning, has been largely …

Improving zero-shot cross-lingual transfer learning via robust training

KH Huang, WU Ahmad, N Peng, KW Chang - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show
great potential for zero-shot cross-lingual transfer. However, these multilingual encoders do …