Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

Keyword‐assisted topic models

S Eshima, K Imai, T Sasaki - American Journal of Political …, 2024 - Wiley Online Library
In recent years, fully automated content analysis based on probabilistic topic models has
become popular among social scientists because of their scalability. However, researchers …

Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality

R Mozer, L Miratrix, AR Kaufman… - Political …, 2020 - cambridge.org
Matching for causal inference is a well-studied problem, but standard methods fail when the
units to match are text documents: the high-dimensional and rich nature of the data renders …

A universal information theoretic approach to the identification of stopwords

M Gerlach, H Shi, LAN Amaral - Nature Machine Intelligence, 2019 - nature.com
One of the most widely used approaches in natural language processing and information
retrieval is the so-called bag-of-words model. A common component of such methods is the …

Seeded topic models in digital archives: Analyzing interpretations of immigration in Swedish newspapers, 1945–2019

M Hurtado Bodell, M Magnusson… - … Methods & Research, 2024 - journals.sagepub.com
Sociologists are discussing the need for more formal ways to extract meaning from digital
text archives. We focus attention on the seeded topic model, a semi-supervised extension to …

oolong: An R package for validating automated content analysis tools

C Chan, M Sältzer - The Journal of Open Source …, 2020 - madoc.bib.uni-mannheim.de
While the validation of statistical properties of topic models is well established, the
substantive meaning of categories uncovered is often less clear and their interpretation …

Negation, coordination, and quantifiers in contextualized language models

AL Kalouli, R Sevastjanova, C Beck… - arXiv preprint arXiv …, 2022 - arxiv.org
With the success of contextualized language models, much research explores what these
models really learn and in which cases they still fail. Most of this work focuses on specific …

Twitter-based analysis reveals differential COVID-19 concerns across areas with socioeconomic disparities

Y Su, A Venkat, Y Yadav, LB Puglisi… - Computers in Biology and …, 2021 - Elsevier
Objective We sought to understand spatial-temporal factors and socioeconomic disparities
that shaped US residents' response to COVID-19 as it emerged. Methods We mined …

Semantic Topic Chains for Modeling Temporality of Themes in Online Student Discussion Forums

H Chopra, Y Lin, MA Samadi, JG Cavazos, R Yu… - … Educational Data Mining …, 2023 - ERIC
Exploring students' discourse in academic settings over time can provide valuable insight
into the evolution of learner engagement and participation in online learning. In this study …

Monitoring cyber SentiHate social behavior during COVID-19 pandemic in North America

F Alzamzami, A El Saddik - IEEE Access, 2021 - ieeexplore.ieee.org
With communications being shifted to online social networks (OSNs) as a result of travel and
social restrictions during COVID-19 pandemic, the need has arisen for discovering emerging …