Machine Learning-Driven Evolutionary Fuzzy Clustering for High-Dimensional Genomic Data Analysis
DOI:
https://doi.org/10.46610/JoFSFLD.2025.v02i03.002
Keywords:
Biomarker discovery, Evolutionary fuzzy clustering, Genomic data analysis, High-dimensional data, Machine learning, Precision medicine
Abstract
Genomic data analysis is a fundamental part of contemporary bioinformatics and precision medicine, yet clustering and classifying high-dimensional genomic data is extremely challenging because of noise and uncertainty. Traditional hard-partitioning clustering methods are poorly suited to biological data with overlapping cluster structures, and existing fuzzy clustering methods such as Fuzzy C-Means (FCM) are sensitive to initialisation and tend to converge to local optima. This paper presents a Machine Learning-based Evolutionary Fuzzy Clustering (MLEFC) framework designed for high-dimensional genomic data to address these challenges. The proposed framework uses evolutionary algorithms such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) to optimize fuzzy cluster centres and membership degrees, addressing the weaknesses of traditional clustering. In addition, machine learning models are incorporated for cluster validation, predictive analysis, and improved biological interpretability. The framework is evaluated on benchmark genomic datasets, and the results show significantly better clustering accuracy, stability, and scalability than traditional methods. Biological validation further highlights the framework's potential for identifying functional gene groups, disease subtypes, and candidate biomarkers. By integrating fuzzy logic, evolutionary computing, and machine learning, MLEFC provides a powerful and interpretable tool for genomic data mining, with direct applications to biomarker discovery, cancer subtype classification, and personalized medicine.
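To make the core idea concrete, the following is a minimal sketch of how a particle swarm can search for fuzzy cluster centres by minimising the standard FCM objective, rather than relying on a single initialisation. It is not the authors' MLEFC implementation: the function names (`fcm_memberships`, `fcm_objective`, `pso_fcm`), the hyperparameter defaults (`w`, `c1`, `c2`, swarm size, iteration count), and the choice of PSO alone (no GA step, no downstream machine learning validation) are all assumptions made for illustration.

```python
import numpy as np

def fcm_memberships(X, centres, m=2.0, eps=1e-12):
    """Standard FCM membership update for fixed centres.

    Returns U with shape (n_samples, n_clusters), rows summing to 1,
    together with the squared sample-to-centre distances.
    """
    # Squared Euclidean distances, shape (n_samples, n_clusters)
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True), d2

def fcm_objective(X, centres, m=2.0):
    """FCM cost: sum over samples and clusters of u^m * squared distance."""
    U, d2 = fcm_memberships(X, centres, m)
    return float(((U ** m) * d2).sum())

def pso_fcm(X, n_clusters, n_particles=20, n_iter=100, m=2.0,
            w=0.7, c1=1.5, c2=1.5, seed=0):
    """Illustrative PSO over cluster centres (hyperparameters are assumptions)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Each particle encodes a full set of centres, seeded from random samples
    idx = np.stack([rng.choice(n, n_clusters, replace=False)
                    for _ in range(n_particles)])
    pos = X[idx]                      # shape (n_particles, n_clusters, n_features)
    vel = np.zeros_like(pos)

    # Personal and global bests
    pbest = pos.copy()
    pbest_cost = np.array([fcm_objective(X, p, m) for p in pos])
    g = int(pbest_cost.argmin())
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])

    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia + pull towards personal best + pull towards global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel

        cost = np.array([fcm_objective(X, p, m) for p in pos])
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]

        g = int(pbest_cost.argmin())
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])

    # Membership degrees implied by the best centres found
    U, _ = fcm_memberships(X, gbest, m)
    return gbest, U, gbest_cost
```

Calling `pso_fcm(X, n_clusters=3)` on a samples-by-features expression matrix returns the best centres found, the fuzzy membership matrix, and the final objective value. In this sketch the memberships are derived from the centres at every evaluation, so the swarm only has to search centre positions; encoding memberships directly in each particle is another possible design, at the cost of a much larger search space.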