A survey of tax risk detection using data mining techniques

Q Zheng, Y Xu, H Liu, B Shi, J Wang, B Dong - Engineering, 2024 - Elsevier
Tax risk behavior causes serious loss of fiscal revenue, damages the country's public
infrastructure, and disturbs the market economic order of fair competition. In recent years, tax …

Fast noise removal for k-means clustering

S Im, MM Qaem, B Moseley, X Sun… - International …, 2020 - proceedings.mlr.press
This paper considers k-means clustering in the presence of noise. It is known that k-means
clustering is highly sensitive to noise, and thus noise should be removed to obtain a quality …

A weighted k-member clustering algorithm for k-anonymization

Y Yan, EA Herman, A Mahmood, T Feng, P Xie - Computing, 2021 - Springer
As a representative model for privacy preserving data publishing, K-anonymity has raised a
considerable number of questions for researchers over the past few decades. Among them …

Fast algorithms for distributed k-clustering with outliers

J Huang, Q Feng, Z Huang, J Xu… - … on Machine Learning, 2023 - proceedings.mlr.press
In this paper, we study the $k$-clustering problems with outliers in distributed setting. The
current best results for the distributed $k$-center problem with outliers have quadratic local …

Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

H Ding, H Yu, Z Wang - arXiv preprint arXiv:1901.08219, 2019 - arxiv.org
We study the problem of $k$-center clustering with outliers in arbitrary metrics and
Euclidean space. Though a number of methods have been developed in the past decades, it …

Privacy preserving dynamic data release against synonymous linkage based on microaggregation

Y Yan, AH Eyeleko, A Mahmood, J Li, Z Dong, F Xu - Scientific Reports, 2022 - nature.com
The rapid development of the mobile Internet coupled with the widespread use of intelligent
terminals have intensified the digitization of personal information and accelerated the …

MapReduce algorithms for robust center-based clustering in doubling metrics

E Dandolo, A Mazzetto, A Pietracaprina… - Journal of Parallel and …, 2024 - Elsevier
Clustering is a pivotal primitive for unsupervised learning and data analysis. A popular
variant is the (k, ℓ)-clustering problem, where, given a pointset P from a metric space, one …

Federated matrix factorization: Algorithm design and application to data clustering

S Wang, TH Chang - IEEE Transactions on Signal Processing, 2022 - ieeexplore.ieee.org
Recent demands on data privacy have called for federated learning (FL) as a new
distributed learning paradigm in massive and heterogeneous networks. Although many FL …

An Improved Approximation Algorithm for the k-Means Problem with Penalties

Q Feng, Z Zhang, F Shi, J Wang - … , FAW 2019, Sanya, China, April 29 …, 2019 - Springer
The clustering problem has been paid lots of attention in various fields of computer science.
However, in many applications, the existence of noisy data poses a big challenge for the …

A practical algorithm for distributed clustering and outlier detection

J Chen, E Sadeqi Azer… - Advances in Neural …, 2018 - proceedings.neurips.cc
We study the classic k-means/median clustering, which are fundamental problems in
unsupervised learning, in the setting where data are partitioned across multiple sites, and …