Distance estimation in numerical data sets with missing values
The possibility of missing or incomplete data is often ignored when describing statistical or
machine learning methods, but as it is a common problem in practice, it is relevant to …
machine learning methods, but as it is a common problem in practice, it is relevant to …
FINNIM: Iterative imputation of missing values in dissolved gas analysis dataset
Missing values are a common occurrence in a number of real world databases, and
statistical methods have been developed to deal with this problem, referred to as missing …
statistical methods have been developed to deal with this problem, referred to as missing …
Coresets for clustering with missing values
V Braverman, S Jiang… - Advances in Neural …, 2021 - proceedings.neurips.cc
We provide the first coreset for clustering points in $\mathbb {R}^ d $ that have multiple
missing values (coordinates). Previous coreset constructions only allow one missing …
missing values (coordinates). Previous coreset constructions only allow one missing …
Clustering with missing features: a penalized dissimilarity measure based approach
Many real-world clustering problems are plagued by incomplete data characterized by
missing or absent features for some or all of the data instances. Traditional clustering …
missing or absent features for some or all of the data instances. Traditional clustering …
CKNNI: an improved knn-based missing value handling technique
C Jiang, Z Yang - … Intelligent Computing Theories and Applications: 11th …, 2015 - Springer
In data mining field, experimental data sets are often incomplete due to the imperfect nature
of real world situations. However, the incompleteness of data sets generally leads to biased …
of real world situations. However, the incompleteness of data sets generally leads to biased …
What are clusters in high dimensions and are they difficult to find?
The distribution of distances between points in a high-dimensional data set tends to look
quite different from the distribution of the distances in a low-dimensional data set …
quite different from the distribution of the distances in a low-dimensional data set …
Know your monkey: identifying primate conservation challenges in an indigenous Kichwa community using an ethnoprimatological approach
CA Stafford, J Alarcon-Valenzuela, J Patiño… - Folia Primatologica, 2016 - brill.com
Increasing pressure on tropical forests is continually highlighting the need to find new
solutions that mitigate the impact of human populations on biodiversity. However …
solutions that mitigate the impact of human populations on biodiversity. However …
An efficient k‐means‐type algorithm for clustering datasets with incomplete records
A Lithio, R Maitra - Statistical Analysis and Data Mining: The …, 2018 - Wiley Online Library
The k‐means algorithm is arguably the most popular nonparametric clustering method but
cannot generally be applied to datasets with incomplete records. The usual practice then is …
cannot generally be applied to datasets with incomplete records. The usual practice then is …
Imputation method of missing values for dissolved gas analysis data based on iterative KNN and XGBoost
L Qiao, R Ran, H Wu, Q Zhou, S Liu, Y Liu - Proceedings of the 2018 …, 2018 - dl.acm.org
Power transformers are an important part of the power system. Accurate monitoring of its
operating status is particularly important for the normal and stable operation of the entire …
operating status is particularly important for the normal and stable operation of the entire …
Making kernel density estimation robust towards missing values in highly incomplete multivariate data without imputation
R Leibrandt, S Günnemann - Proceedings of the 2018 SIAM International …, 2018 - SIAM
Density estimation is one of the most frequently used data analytics techniques. A major
challenge of real-world datasets is missing values, originating eg from sampling errors or …
challenge of real-world datasets is missing values, originating eg from sampling errors or …