Information retrieval on the web

M Kobayashi, K Takeda - ACM computing surveys (CSUR), 2000 - dl.acm.org
In this paper we review studies of the growth of the Internet and technologies that are useful
for information search and retrieval on the Web. We present data on the Internet from several …

Recent advances in directional statistics

A Pewsey, E García-Portugués - Test, 2021 - Springer
Mainstream statistical methodology is generally applicable to data observed in Euclidean
space. There are, however, numerous contexts of considerable scientific interest in which …

[HTML] Characterizing long COVID in an international cohort: 7 months of symptoms and their impact

HE Davis, GS Assaf, L McCorkell, H Wei, RJ Low… - …, 2021 - thelancet.com
Background: A significant number of patients with COVID-19 experience prolonged
symptoms, known as Long COVID. Few systematic studies have investigated this …

Fast, sensitive and accurate integration of single-cell data with Harmony

I Korsunsky, N Millard, J Fan, K Slowikowski, F Zhang… - Nature …, 2019 - nature.com
The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional
characterization of cell types across a wide variety of biological and clinical conditions …

[Book] Machine learning for text: An introduction

CC Aggarwal - 2018 - Springer
The extraction of useful insights from text with various types of statistical algorithms is
referred to as text mining, text analytics, or machine learning from text. The choice of …

Semantic encoding during language comprehension at single-cell resolution

M Jamali, B Grannan, J Cai, AR Khanna, W Muñoz… - Nature, 2024 - nature.com
From sequences of speech sounds, or letters, humans can extract rich and nuanced
meaning through language. This capacity is essential for human communication. Yet …

Stop using the elbow criterion for k-means and how to choose the number of clusters instead

E Schubert - ACM SIGKDD Explorations Newsletter, 2023 - dl.acm.org
A major challenge when using k-means clustering often is how to choose the parameter k,
the number of clusters. In this letter, we want to point out that it is very easy to draw poor …

[PDF] Improving word representations via global context and multiple word prototypes

EH Huang, R Socher, CD Manning… - Proceedings of the 50th …, 2012 - aclanthology.org
Unsupervised word representations are very useful in NLP tasks both as inputs to learning
algorithms and as extra word features in NLP systems. However, most of these models are …

Learning feature representations with k-means

A Coates, AY Ng - Neural Networks: Tricks of the Trade: Second Edition, 2012 - Springer
Many algorithms are available to learn deep hierarchies of features from unlabeled data,
especially images. In many cases, these algorithms involve multi-layered networks of …

Unsupervised grouped axial data modeling via hierarchical Bayesian nonparametric models with Watson distributions

W Fan, L Yang, N Bouguila - IEEE Transactions on Pattern …, 2021 - ieeexplore.ieee.org
This paper aims at proposing an unsupervised hierarchical nonparametric Bayesian
framework for modeling axial data (i.e., observations are axes of direction) that can be …