Comparison of text preprocessing methods
CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …
Keyword‐assisted topic models
In recent years, fully automated content analysis based on probabilistic topic models has
become popular among social scientists because of their scalability. However, researchers …
Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality
Matching for causal inference is a well-studied problem, but standard methods fail when the
units to match are text documents: the high-dimensional and rich nature of the data renders …
A universal information theoretic approach to the identification of stopwords
One of the most widely used approaches in natural language processing and information
retrieval is the so-called bag-of-words model. A common component of such methods is the …
Seeded topic models in digital archives: Analyzing interpretations of immigration in Swedish newspapers, 1945–2019
M Hurtado Bodell, M Magnusson… - … Methods & Research, 2024 - journals.sagepub.com
Sociologists are discussing the need for more formal ways to extract meaning from digital
text archives. We focus attention on the seeded topic model, a semi-supervised extension to …
oolong: An R package for validating automated content analysis tools
While the validation of statistical properties of topic models is well established, the
substantive meaning of categories uncovered is often less clear and their interpretation …
Negation, coordination, and quantifiers in contextualized language models
AL Kalouli, R Sevastjanova, C Beck… - arXiv preprint arXiv …, 2022 - arxiv.org
With the success of contextualized language models, much research explores what these
models really learn and in which cases they still fail. Most of this work focuses on specific …
Twitter-based analysis reveals differential COVID-19 concerns across areas with socioeconomic disparities
Objective We sought to understand spatial-temporal factors and socioeconomic disparities
that shaped US residents' response to COVID-19 as it emerged. Methods We mined …
Semantic Topic Chains for Modeling Temporality of Themes in Online Student Discussion Forums.
Exploring students' discourse in academic settings over time can provide valuable insight
into the evolution of learner engagement and participation in online learning. In this study …
Monitoring cyber SentiHate social behavior during COVID-19 pandemic in North America
F Alzamzami, A El Saddik - IEEE Access, 2021 - ieeexplore.ieee.org
With communications being shifted to online social networks (OSNs) as a result of travel and
social restrictions during COVID-19 pandemic, the need has arisen for discovering emerging …