Explainable AI for Predicting Malware Threat Levels: A SHAP-Enhanced Random Forest Approach
Keywords:
AI transparency, Cyber threat, Explainable AI (XAI), Machine learning, Random Forest, SHAP (SHapley Additive exPlanations)
Abstract
This project presents an explainable AI (XAI) approach for predicting malware threat levels using a SHAP-enhanced Random Forest model. As malware detection becomes increasingly vital to cybersecurity, the need for transparent and interpretable models grows, especially in high-stakes environments. We address this need by combining SHAP (SHapley Additive exPlanations), a widely used explainability technique, with Random Forests, an ensemble learning method known for strong predictive performance in classification tasks. The Random Forest model is trained on a dataset of malware features, and SHAP values are then computed to provide human-understandable explanations of its predictions. These explanations identify the features that most strongly influence each threat level assessment and show how different malware characteristics affect the model's decisions. We evaluate the model on a benchmark malware dataset and compare it against traditional black-box models, such as deep learning-based approaches. The experiments show that the SHAP-enhanced Random Forest maintains high predictive accuracy while being significantly more interpretable than the black-box baselines. This transparency improves trust in AI-driven security systems and enables security analysts to better interpret and act on predictions. The results indicate the method's potential for practical deployment in real-world cybersecurity applications, where both accuracy and explainability are crucial for timely, informed decision-making in threat management.
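To make the attribution idea concrete, the following is a minimal sketch of how Shapley values split one prediction of a tree ensemble across its input features. It is not the implementation used in this project: the three hand-written trees, the feature names (`entropy`, `num_imports`, `packed`), the background samples, and all thresholds are hypothetical and chosen only for illustration. The sketch enumerates every feature coalition exactly, which is feasible only for a handful of features; SHAP libraries use faster algorithms for tree ensembles to obtain the same kind of attribution.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the actual feature set is not specified in the abstract.
FEATURES = ["entropy", "num_imports", "packed"]


# Hypothetical hand-built "forest": each tree maps a feature vector to a threat score in [0, 1].
def tree_a(x):
    return 0.9 if x[0] > 7.0 else 0.2


def tree_b(x):
    if x[2] == 1:
        return 0.8 if x[1] < 10 else 0.5
    return 0.1


def tree_c(x):
    return 0.7 if x[1] < 5 and x[0] > 6.0 else 0.3


def forest_predict(x):
    """Average the tree outputs, as a Random Forest regressor would."""
    trees = (tree_a, tree_b, tree_c)
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical background samples used to "switch off" features outside a coalition.
BACKGROUND = [
    (4.5, 40, 0),
    (5.0, 25, 0),
    (7.5, 3, 1),
    (6.5, 12, 1),
]


def coalition_value(x, subset, background):
    """Mean prediction when features in `subset` take their values from x
    and all other features take their values from each background row."""
    total = 0.0
    for row in background:
        mixed = tuple(x[i] if i in subset else row[i] for i in range(len(x)))
        total += forest_predict(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating all coalitions (exponential in feature count)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(x, s | {i}, background)
                        - coalition_value(x, s, background))
                phi[i] += weight * gain
    return phi


sample = (7.8, 2, 1)  # hypothetical malware sample
phi = shapley_values(sample, BACKGROUND)
base_value = coalition_value(sample, set(), BACKGROUND)
prediction = forest_predict(sample)

# Rank features by the magnitude of their contribution to this prediction.
for name, value in sorted(zip(FEATURES, phi), key=lambda pair: -abs(pair[1])):
    print(f"{name:12s} {value:+.3f}")
print(f"base {base_value:.3f} + contributions {sum(phi):.3f} = prediction {prediction:.3f}")
```

The final line illustrates the additivity property that makes these explanations readable by an analyst: the base value (the mean prediction over the background samples) plus the per-feature contributions equals the model's score for the sample.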