Comparative Analysis of Machine Learning Algorithms for Cyber Threat Attribution
Keywords:
Attribution, Cyber threat, Indicators of Compromise (IOCs), Machine learning, MITRE ATT&CK, Tactics, Techniques and Procedures (TTPs), Threat actor
Abstract
In the ever-changing cybersecurity landscape, accurately attributing cyber threats is critical. This paper compares the effectiveness of machine learning algorithms for cyber threat attribution. We evaluate several machine learning models on datasets of high-level Indicators of Compromise (IOCs) obtained from a publicly available repository and preprocessed to ensure data consistency. The study covers the Naive Bayes, K-Nearest Neighbor (KNN), Random Forest, XGBoost, and CatBoost algorithms, whose ability to attribute cyber threats is evaluated using accuracy, precision, recall, and F1-score. The results show significant variation in algorithm performance, with CatBoost emerging as the most effective, achieving an accuracy of 95.35% and a precision of 96.12%. Feature importance analysis identifies critical Tactics, Techniques, and Procedures (TTPs) such as "Obfuscated Files or Information" and "Tools," which improve the interpretability and effectiveness of cyber threat attribution strategies. This study advances cybersecurity practice by providing empirical insight into the strengths and limitations of machine learning approaches for cyber threat attribution. By clarifying algorithm performance and providing interpretability analysis, it gives cybersecurity professionals better tools for identifying, tracking, and attributing cyber threats, strengthening collective defences against evolving adversaries.
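The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes attribution is framed as multi-class classification (one label per threat actor) and that precision, recall, and F1 are macro-averaged across actors; the abstract does not state the averaging scheme, so that choice is an assumption. The function name `macro_metrics` and the labels `actor_a`, `actor_b`, and `actor_c` are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for illustration; the paper's
    abstract does not say which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct attribution for this actor
        else:
            fp[pred] += 1    # sample wrongly attributed to `pred`
            fn[truth] += 1   # sample of `truth` that was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical ground-truth and predicted threat actors for six samples.
y_true = ["actor_a", "actor_a", "actor_b", "actor_b", "actor_c", "actor_c"]
y_pred = ["actor_a", "actor_b", "actor_b", "actor_b", "actor_c", "actor_a"]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting all four metrics together matters when threat actors are unevenly represented: accuracy alone can look strong even if a model rarely identifies the less frequent actors, whereas macro-averaged recall and F1 weight every actor equally.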