ABC-SVM: A Novel Artificial Bee Colony Optimization Framework for Optimal Feature Selection in Breast Cancer Diagnosis

Authors

  • Satish Kumar Kalagotla
  • Thoudam Basanta
  • Mutum Bidyarani Devi

Abstract

Background: Feature selection constitutes a critical pre-processing step in medical diagnosis because high-dimensional datasets frequently contain irrelevant or redundant features that degrade classifier performance and increase computational complexity. The artificial bee colony algorithm offers a powerful metaheuristic approach for solving complex feature selection problems, while support vector machines provide robust classification with strong theoretical foundations.

Objective: This paper proposes ABC-SVM, a novel hybrid framework that integrates artificial bee colony optimization with support vector machines for optimal feature selection in breast cancer diagnosis. The framework simultaneously optimizes feature subsets to maximize classification accuracy while minimizing the number of selected features.

Methods: The proposed ABC-SVM framework employs binary-encoded food sources to represent candidate feature subsets. A multi-objective fitness function balances five-fold cross-validated SVM accuracy against feature parsimony. The ABC algorithm iterates through employed bee, onlooker bee, and scout bee phases to evolve optimal feature subsets. The framework was evaluated on four benchmark medical datasets: Wisconsin Breast Cancer, PIMA Indian Diabetes, Hepatitis, and Mammographic Mass. Performance was compared against genetic algorithm (GA) and particle swarm optimization (PSO) baselines using ten-fold cross-validation with five repeats.
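The fitness evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight `alpha` and the use of scikit-learn's breast cancer loader are assumptions introduced for the example, and the full framework would additionally run the employed, onlooker, and scout bee phases to perturb and replace these binary food sources.

```python
# Hypothetical sketch of the ABC-SVM fitness evaluation: a binary food
# source encodes a feature subset, and fitness trades off five-fold
# cross-validated SVM accuracy against the fraction of features kept.
# The weight alpha below is an assumed value, not from the paper.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]
alpha = 0.9  # assumed weight: favors accuracy over parsimony

def fitness(food_source):
    """Score a binary feature mask: high accuracy, few features."""
    mask = food_source.astype(bool)
    if not mask.any():                      # empty subsets are infeasible
        return 0.0
    acc = cross_val_score(SVC(), X[:, mask], y, cv=5).mean()
    parsimony = 1.0 - mask.sum() / n_features
    return alpha * acc + (1 - alpha) * parsimony

# One randomly initialized food source, as in the ABC initialization step.
food = rng.integers(0, 2, size=n_features)
print(round(fitness(food), 3))
```

Because both terms lie in [0, 1], the combined fitness is bounded in [0, 1], so maximizing it simultaneously rewards accuracy and penalizes large subsets, matching the parsimony objective stated above.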

Results: ABC-SVM achieved 98.71% accuracy on the Wisconsin dataset with 66.7% feature reduction, selecting on average 3.0 of the original 9 features, thereby outperforming GA-SVM, which achieved 97.94% accuracy with 55.6% reduction, and PSO-SVM, which achieved 98.21% accuracy with 61.1% reduction. On the PIMA dataset, ABC-SVM achieved 86.78% accuracy with 60.0% feature reduction, compared to 84.56% for GA-SVM and 85.12% for PSO-SVM. On the Hepatitis dataset, ABC-SVM achieved 87.93% accuracy with 62.6% reduction. The algorithm converged within 50 to 80 iterations, demonstrating an efficient exploration-exploitation balance. The selected feature subsets aligned with clinical knowledge: bare nuclei, clump thickness, and uniformity of cell size for breast cancer; glucose and BMI for diabetes; and liver function tests for hepatitis.

Conclusion: ABC-SVM provides an effective framework for optimal feature selection in medical diagnosis, achieving superior feature reduction and improved classification accuracy compared to standard SVM and competing metaheuristics. The multi-objective fitness function successfully balances accuracy and parsimony, producing clinically interpretable feature subsets. The framework’s consistent performance across diverse medical datasets demonstrates its broad applicability for developing parsimonious, accurate, and interpretable clinical decision support systems.

References

R. Iranzad and X. Liu, “A review of random forest-based feature selection methods for data science education and applications,” International Journal of Data Science and Analytics, vol. 20, no. 2, pp. 197–211, Aug. 2025.

N. Pudjihartono, T. Fadason, A. W. Kempa-Liehr, and J. M. O’Sullivan, “A review of feature selection methods for machine learning-based disease risk prediction,” Frontiers in Bioinformatics, vol. 2, p. 927312, Jun. 2022.

B. Xue, M. Zhang, W. N. Browne, and X. Yao, “A survey on evolutionary computation approaches to feature selection,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 4, pp. 606–626, Nov. 2015.

J. R. Devadason, P. S. Hepsiba, and D. G. Solomon, “Case studies on the applications of the artificial bee colony algorithm,” Sādhanā, vol. 49, no. 2, p. 152, Apr. 2024.

B. Abdollahzadeh et al., “Puma optimizer (PO): A novel metaheuristic optimization algorithm and its application in machine learning,” Cluster Computing, vol. 27, no. 4, pp. 5235–5283, Jul. 2024.

D. Karaboga, B. Gorkemli, C. Ozturk, and N. Karaboga, “A comprehensive survey: Artificial bee colony (ABC) algorithm and applications,” Artificial Intelligence Review, vol. 42, no. 1, pp. 21–57, Jun. 2014.

H. M. Zawbaa, E. Emary, and C. Grosan, “Feature selection via chaotic antlion optimization,” PLOS ONE, vol. 11, no. 3, p. e0150652, Mar. 2016.

G. James, D. Witten, T. Hastie, R. Tibshirani, and J. Taylor, “Statistical learning,” in An Introduction to Statistical Learning: With Applications in Python, Cham, Switzerland: Springer, 2023, pp. 15–67.

C.-L. Huang and C.-J. Wang, “A GA-based feature selection and parameter optimisation for support vector machines,” Expert Systems with Applications, vol. 31, no. 2, pp. 231–240, Aug. 2006.

S.-W. Lin, K.-C. Ying, S.-C. Chen, and Z.-J. Lee, “Particle swarm optimization for parameter determination and feature selection of support vector machines,” Expert Systems with Applications, vol. 35, no. 4, pp. 1817–1824, Nov. 2008.

M. S. Uzer, N. Yilmaz, and O. Inan, “Feature selection method based on artificial bee colony algorithm and support vector machines for medical datasets classification,” The Scientific World Journal, vol. 2013, no. 1, p. 419187, 2013.

Published

2026-04-13