Distance estimation in numerical data sets with missing values

E Eirola, G Doquire, M Verleysen, A Lendasse - Information Sciences, 2013 - Elsevier
The possibility of missing or incomplete data is often ignored when describing statistical or
machine learning methods, but as it is a common problem in practice, it is relevant to …

FINNIM: Iterative imputation of missing values in dissolved gas analysis dataset

Z Sahri, R Yusof, J Watada - IEEE Transactions on Industrial …, 2014 - ieeexplore.ieee.org
Missing values are a common occurrence in a number of real world databases, and
statistical methods have been developed to deal with this problem, referred to as missing …

Coresets for clustering with missing values

V Braverman, S Jiang… - Advances in Neural …, 2021 - proceedings.neurips.cc
We provide the first coreset for clustering points in $\mathbb {R}^ d $ that have multiple
missing values (coordinates). Previous coreset constructions only allow one missing …

Clustering with missing features: a penalized dissimilarity measure based approach

S Datta, S Bhattacharjee, S Das - Machine Learning, 2018 - Springer
Many real-world clustering problems are plagued by incomplete data characterized by
missing or absent features for some or all of the data instances. Traditional clustering …

CKNNI: an improved knn-based missing value handling technique

C Jiang, Z Yang - … Intelligent Computing Theories and Applications: 11th …, 2015 - Springer
In data mining field, experimental data sets are often incomplete due to the imperfect nature
of real world situations. However, the incompleteness of data sets generally leads to biased …

What are clusters in high dimensions and are they difficult to find?

F Klawonn, F Höppner, B Jayaram - … , CHDD 2012, Naples, Italy, May 15 …, 2015 - Springer
The distribution of distances between points in a high-dimensional data set tends to look
quite different from the distribution of the distances in a low-dimensional data set …

Know your monkey: identifying primate conservation challenges in an indigenous Kichwa community using an ethnoprimatological approach

CA Stafford, J Alarcon-Valenzuela, J Patiño… - Folia Primatologica, 2016 - brill.com
Increasing pressure on tropical forests is continually highlighting the need to find new
solutions that mitigate the impact of human populations on biodiversity. However …

An efficient k‐means‐type algorithm for clustering datasets with incomplete records

A Lithio, R Maitra - Statistical Analysis and Data Mining: The …, 2018 - Wiley Online Library
The k‐means algorithm is arguably the most popular nonparametric clustering method but
cannot generally be applied to datasets with incomplete records. The usual practice then is …

Imputation method of missing values for dissolved gas analysis data based on iterative KNN and XGBoost

L Qiao, R Ran, H Wu, Q Zhou, S Liu, Y Liu - Proceedings of the 2018 …, 2018 - dl.acm.org
Power transformers are an important part of the power system. Accurate monitoring of its
operating status is particularly important for the normal and stable operation of the entire …

Making kernel density estimation robust towards missing values in highly incomplete multivariate data without imputation

R Leibrandt, S Günnemann - Proceedings of the 2018 SIAM International …, 2018 - SIAM
Density estimation is one of the most frequently used data analytics techniques. A major
challenge of real-world datasets is missing values, originating eg from sampling errors or …