Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Comprehensive review and comparative analysis of transformer models in sentiment analysis

H Bashiri, H Naderi - Knowledge and Information Systems, 2024 - Springer
Sentiment analysis has become an important task in natural language processing because it
is used in many different areas. This paper gives a detailed review of sentiment analysis …

Streamlining social media information retrieval for public health research with deep learning

Y Hua, J Wu, S Lin, M Li, Y Zhang… - Journal of the …, 2024 - academic.oup.com
Objective: Social media-based public health research is crucial for epidemic surveillance, but
most studies identify relevant corpora with keyword-matching. This study develops a system …

Sexual and gender-diverse individuals face more health challenges during COVID-19: A large-scale social media analysis with natural language processing

Z Zhang, Y Hua, P Zhou, S Lin, M Li, Y Zhang… - Health Data …, 2024 - spj.science.org
Background: The COVID-19 pandemic has caused a disproportionate impact on the sexual
and gender-diverse (SGD) community. Compared with non-SGD populations, their social …

ICDXML: enhancing ICD coding with probabilistic label trees and dynamic semantic representations

Z Wang, Y Wang, H Zhang, W Wang, J Qi, J Chen… - Scientific Reports, 2024 - nature.com
Accurately assigning standardized diagnosis and procedure codes from clinical text is
crucial for healthcare applications. However, this remains challenging due to the complexity …

HealthE: Recognizing Health Advice & Entities in Online Health Communities

J Gatto, P Seegmiller, GM Johnston, M Basak… - Proceedings of the …, 2023 - ojs.aaai.org
The task of extracting and classifying entities is at the core of important Health-NLP systems
such as misinformation detection, medical dialogue modeling, and patient-centric …

Characterizing Public Sentiments and Drug Interactions during COVID-19: A Pretrained Language Model and Network Analysis of Social Media Discourse

W Li, Y Hua, P Zhou, L Zhou, X Xu, J Yang - medRxiv, 2024 - medrxiv.org
Objective: Harnessing drug-related data posted on social media in real time can offer
insights into how the pandemic impacts drug use and monitor misinformation. This study …

A Dataset for Entity Recognition of COVID-19 Public Opinion in Social Media

L Hou, L Li, D Ren, X Wang, T Yu… - 2023 10th International …, 2023 - ieeexplore.ieee.org
The outbreak of the epidemic has had a major impact on the economy, society, and
people's lives. The entity mining of network public opinion is important, which is helpful for …

Multi-step Transfer Learning in Natural Language Processing for the Health Domain

T Manaka, TV Zyl, D Kar, A Wade - Neural Processing Letters, 2024 - Springer
The restricted access to data in healthcare facilities due to patient privacy and confidentiality
policies has led to the application of general natural language processing (NLP) techniques …

Denoising Longitudinal Social Media for Pandemic Monitoring

S Lin, L Garay, Y Hua, Z Guo, X Xu, J Yang - medRxiv, 2024 - medrxiv.org
Objective: Current studies leveraging social media data for disease monitoring face
challenges like noisy colloquial language and insufficient tracking of user disease …