Explainable Hate Meme Detection Using Multimodal Learning
Abstract
Memes have become a popular form of communication on social media, combining images and text to express opinions, humor, and emotions. While many memes are harmless, some are used to spread hate, discrimination, and offensive stereotypes. Detecting such hate memes is challenging because the meaning often depends on both visual content and textual context, making traditional text-based approaches insufficient. This paper presents an Explainable Hate Meme Detection System (EHMDS) that analyses memes using a multimodal approach. The proposed system processes textual information using transformer-based language models and visual information using deep learning-based image encoders. These features are combined through a cross-modal attention mechanism to identify hateful content more effectively. In addition to classification, the system provides explanations by highlighting important words and image regions that contribute to the final decision.
Experiments on the Facebook Hateful Memes dataset demonstrate that the proposed system detects hateful memes accurately while improving transparency and interpretability. By providing human-readable explanations alongside its predictions, the system supports more trustworthy and ethical automated content moderation.
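The cross-modal fusion step described above can be sketched as a single-head scaled dot-product attention in which text-token features act as queries over image-region features; the resulting attention weights are exactly the quantities one would visualize to highlight the image regions driving a decision. The dimensions, the single-head form, and the function name below are illustrative assumptions for exposition, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Text tokens (queries) attend over image regions (keys/values).

    text_feats:  (T, d) array of T text-token embeddings
    image_feats: (R, d) array of R image-region embeddings
    Returns the fused (T, d) features and the (T, R) attention map.
    """
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (T, R) similarity
    weights = softmax(scores, axis=-1)                 # distribution over regions
    fused = weights @ image_feats                      # image-informed token features
    return fused, weights

# Toy example: 6 text tokens and 9 image regions in a shared 64-dim space.
rng = np.random.default_rng(0)
text = rng.standard_normal((6, 64))
image = rng.standard_normal((9, 64))
fused, w = cross_modal_attention(text, image)
print(fused.shape, w.shape)  # (6, 64) (6, 9)
```

Because each row of `w` sums to one, the per-token attention map doubles as an interpretable saliency signal: high-weight regions are the visual evidence the model attended to for that word, which is the mechanism behind the word/region explanations mentioned in the abstract.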