Performance of Classification Algorithms Under Class Imbalance: Simulation and Real-World Evidence

Author(s)
Iqra Arshad
Muhammad Umair
Faheem Jan
Hasnain Iftikhar
Paulo Canas Rodrigues
Ronny Ivan Gonzales Medina
Date Issued
January 1, 2025
Type
Article
Volume
13
Start Page
179672
End Page
179685
DOI
10.1109/access.2025.3620264
Abstract
Class imbalance is a persistent challenge in machine learning, particularly in high-stakes applications such as medical diagnostics, bioinformatics, and fraud detection, where the minority class often represents critical cases. While prior research has examined the effect of imbalance on classifier performance, little attention has been paid to establishing practical guidelines for the minimum proportion of minority samples required to achieve reliable sensitivity. In this study, we conduct extensive simulations using synthetic datasets and evaluate five widely used classification algorithms: Logistic Regression (Logit), Support Vector Machines (SVM), Random Forest, XGBoost, and Neural Networks (NNs). Our analysis reveals that Logistic Regression is more effective at identifying minority-class instances under an imbalanced class distribution, in terms of F1 score and sensitivity, whereas Neural Networks perform slightly better than Logistic Regression under a balanced class distribution. Importantly, we identify a practical threshold for minority-class representation: classifier sensitivity declines sharply when positive samples fall below approximately 25–30%. This finding is validated on eight real-world datasets, including large-scale applications, where Neural Networks and XGBoost demonstrate superior sensitivity. By establishing an actionable threshold, this study contributes practical guidance for dataset design and model selection in imbalanced classification problems.
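The abstract compares classifiers by sensitivity and F1 score on the minority (positive) class. As a minimal sketch of why these metrics matter under imbalance, the following Python computes them from their standard definitions on a toy example. The function names and the toy labels (10% positives) are hypothetical illustrations, not the authors' code or data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def sensitivity(y_true, y_pred):
    """Recall on the positive class: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 10 positives and 90 negatives (10% minority class).
# The assumed classifier finds only 2 of the 10 positives and raises 2 false alarms.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 2 + [0] * 8 + [0] * 88 + [1] * 2

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} f1={f1:.3f}")
```

In this toy case accuracy looks high because the majority class dominates, while sensitivity and F1 expose how few minority-class cases are detected.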
Subjects

Logistic regression

Classifier (UML)

Artificial intelligen...

Machine learning

Computer science

Support vector machin...

Artificial neural net...

Statistical classific...

Class (philosophy)

Random forest

Logistic model tree

Multiclass classifica...

Data mining

Selection (genetic al...

Feature selection

Linear classifier

Regression

One-class classificat...

Pattern recognition (...

Deep neural networks

Algorithm

Physical Sciences Com...

Social Sciences Busin...
