TY - JOUR
T1 - A Flat-Hierarchical Approach Based on Machine Learning Model for e-Commerce Product Classification
AU - Cotacallapa, Harold
AU - Saboya, Nemias
AU - Canas Rodrigues, Paulo
AU - Salas, Rodrigo
AU - Linkolk Lopez-Gonzales, Javier
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - Within the e-commerce sphere, optimizing the product classification process is of pivotal importance because it directly influences operational efficiency and profitability. In this context, machine learning algorithms stand out as a leading solution for automating this process effectively. These models are commonly designed using either a flat or a local (hierarchical) approach; however, each has significant limitations. The local approach introduces taxonomic inconsistencies in predictions, whereas the flat approach becomes inefficient on extensive datasets with high granularity. Therefore, our research introduces a hierarchical product classification solution based on a machine learning model that integrates the flat and local (hierarchical) classification approaches, using a 4-level electronic product dataset obtained from a renowned e-commerce platform in Latin America. To this end, a comparative analysis of seven machine learning algorithms was conducted: Multinomial Naive Bayes, Linear Support Vector Classifier, Multinomial Logistic Regression, Random Forest, XGBoost, FastText, and Voting Ensemble. The hybrid model performs better than models using a single approach: it surpassed the top-performing flat approach model by 0.15% and the leading local approach (Local Classifier per Level) model by 4.88%, as measured by the weighted F1-score. Additionally, this paper contributes to the academic community by presenting a large Spanish-language dataset comprising over one million products and discussing the preprocessing techniques tailored to the dataset. It also addresses the study's inherent limitations and potential avenues for future research in this field.
KW - Machine learning
KW - e-commerce
KW - ensemble
KW - hierarchical product classification
KW - local classifier per level
UR - http://www.scopus.com/inward/record.url?scp=85193263312&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3400693
DO - 10.1109/ACCESS.2024.3400693
M3 - Article
AN - SCOPUS:85193263312
SN - 2169-3536
VL - 12
SP - 72730
EP - 72745
JO - IEEE Access
JF - IEEE Access
ER -