Machine learning with oversampling and undersampling techniques: overview study and experimental results

R Mohammed, J Rawashdeh… - 2020 11th international …, 2020 - ieeexplore.ieee.org
Data imbalance in Machine Learning refers to an unequal distribution of classes within a
dataset. This issue is encountered mostly in classification tasks in which the distribution of …

REFORMS: Consensus-based Recommendations for Machine-learning-based Science

S Kapoor, EM Cantrell, K Peng, TH Pham, CA Bail… - Science …, 2024 - science.org
Machine learning (ML) methods are proliferating in scientific research. However, the
adoption of these methods has been accompanied by failures of validity, reproducibility, and …

Handling data irregularities in classification: Foundations, trends, and future challenges

S Das, S Datta, BB Chaudhuri - Pattern Recognition, 2018 - Elsevier
Most of the traditional pattern classifiers assume their input data to be well-behaved in terms
of similar underlying class distributions, balanced size of classes, the presence of a full set of …

Experimental perspectives on learning from imbalanced data

J Van Hulse, TM Khoshgoftaar… - Proceedings of the 24th …, 2007 - dl.acm.org
We present a comprehensive suite of experimentation on the subject of learning from
imbalanced data. When classes are imbalanced, many learning algorithms can suffer from …

On the class imbalance problem

X Guo, Y Yin, C Dong, G Yang… - 2008 Fourth international …, 2008 - ieeexplore.ieee.org
The class imbalance problem has been recognized in many practical domains and a hot
topic of machine learning in recent years. In such a problem, almost all the examples are …

Balancing training data for automated annotation of keywords: a case study.

GE Batista, ALC Bazzan, MC Monard - Wob, 2003 - inf.ufrgs.br
There has been an increasing interest in tools for automating the annotation of databases.
Machine learning techniques are promising candidates to help curators to, at least, guide …

Leave a reply: An analysis of weblog comments

G Mishne, N Glance - Third annual workshop on the …, 2006 - ambuehler.ethz.ch
Access to weblogs, both through commercial services and in academic studies, is usually
limited to the content of the weblog posts. This overlooks an important aspect distinguishing …

Preprocessing unbalanced data using support vector machine

MAH Farquad, I Bose - Decision Support Systems, 2012 - Elsevier
This paper deals with the application of support vector machine (SVM) to deal with the class
imbalance problem. The objective of this paper is to examine the feasibility and efficiency of …

Characterizing and predicting blocking bugs in open source projects

H Valdivia Garcia, E Shihab - … of the 11th working conference on mining …, 2014 - dl.acm.org
As software becomes increasingly important, its quality becomes an increasingly important
issue. Therefore, prior work focused on software quality and proposed many prediction …

New algorithms for efficient high-dimensional nonparametric classification.

T Liu, AW Moore, A Gray, C Cardie - Journal of machine learning research, 2006 - jmlr.org
This paper is about non-approximate acceleration of high-dimensional nonparametric
operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if …