Performance of Classification Algorithms Under Class Imbalance: Simulation and Real-World Evidence

Iqra Arshad, Muhammad Umair, Faheem Jan, Hasnain Iftikhar, Paulo Canas Rodrigues, Ronny Ivan Gonzales Medina, Javier Linkolk Lopez-Gonzales

Research output: Contribution to journal › Article › peer-review

Abstract

Class imbalance is a persistent challenge in machine learning, particularly in high-stakes applications such as medical diagnostics, bioinformatics, and fraud detection, where the minority class often represents the critical cases that require special attention. While prior research has examined the effect of imbalance on classifier performance, little attention has been paid to establishing practical guidelines for the minimum proportion of minority samples required to achieve reliable sensitivity. In this study, we conduct extensive simulations using synthetic datasets and evaluate five widely used classification algorithms: Logistic Regression (Logit), Support Vector Machines (SVM), Random Forest, XGBoost, and Neural Networks (NNs). Our analysis shows that logistic regression identifies minority-class instances more effectively under an imbalanced class distribution, as measured by F1 score and sensitivity. In contrast, neural networks perform slightly better than logistic regression when the classes are balanced. Importantly, we identify a practical threshold for minority-class representation: classifier sensitivity declines sharply when the proportion of positive samples falls below approximately 25–30%. This finding is validated on eight real-world datasets, including large-scale applications, where Neural Networks and XGBoost demonstrate superior sensitivity. By establishing an actionable threshold, this study contributes practical guidance for dataset design and model selection in imbalanced classification problems.
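
As a rough illustration of why sensitivity tends to fall as the positive class becomes rarer, the sketch below works through an idealized case that is not the simulation design used in the study: one feature, two equal-variance Gaussian classes, and the minimum-error (MAP) decision rule. The function names, class means, and variance are hypothetical choices made for this example only, and the numbers it prints are not meant to reproduce the 25–30% threshold reported above.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def map_threshold(p_pos: float, mu_neg: float = 0.0,
                  mu_pos: float = 2.0, sigma: float = 1.0) -> float:
    """Boundary of the minimum-error rule: predict positive when x > threshold.

    Assumes mu_pos > mu_neg and a shared standard deviation (illustrative only).
    """
    midpoint = (mu_neg + mu_pos) / 2.0
    # A rarer positive class (small p_pos) pushes the boundary toward mu_pos.
    shift = sigma ** 2 / (mu_pos - mu_neg) * math.log((1.0 - p_pos) / p_pos)
    return midpoint + shift


def sensitivity_and_f1(p_pos: float, mu_neg: float = 0.0,
                       mu_pos: float = 2.0, sigma: float = 1.0):
    """Population-level sensitivity and F1 of the MAP rule at prevalence p_pos."""
    t = map_threshold(p_pos, mu_neg, mu_pos, sigma)
    sens = 1.0 - normal_cdf((t - mu_pos) / sigma)   # true-positive rate
    fpr = 1.0 - normal_cdf((t - mu_neg) / sigma)    # false-positive rate
    tp = p_pos * sens
    fp = (1.0 - p_pos) * fpr
    fn = p_pos * (1.0 - sens)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn) if tp > 0 else 0.0
    return sens, f1


# Sweep the proportion of positive samples from rare to balanced.
for p in (0.05, 0.10, 0.20, 0.30, 0.50):
    sens, f1 = sensitivity_and_f1(p)
    print(f"positive proportion {p:.2f}: sensitivity {sens:.3f}, F1 {f1:.3f}")
```

In this toy setting the decision boundary moves toward the positive-class mean as the positive proportion shrinks, so sensitivity drops even though the rule is optimal for overall error. Real classifiers fitted to finite samples, such as the five evaluated in the study, will behave differently in detail.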

Original language: English
Pages (from-to): 179672-179685
Number of pages: 14
Journal: IEEE Access
Volume: 13
State: Published - 2025
Externally published: Yes
