A Review on Spam Detection Based on Machine Learning Techniques
Keywords:
Big data, Classification, Machine learning, Opinion mining, Review spam, Web mining
Abstract
Online reviews let prospective buyers see how the general public feels about a product or service, and they are often among the most influential factors in a purchase decision. Because of this influence, manufacturers and retailers pay close attention to customer comments and feedback. Given the prevalence of online review platforms, there is an understandable concern that dishonest actors will use them to artificially inflate or deflate the perceived value of products and services. Spammers do exactly this: they post fake, misleading, or otherwise deceptive reviews for financial gain. Since not all reviews on the internet are genuine, reliable ways to identify review spam are critical. Several machine learning algorithms can be applied to spam identification, using natural language processing (NLP) to extract useful features from the review text. Beyond the content itself, these approaches can also draw on information about the reviewer. This article surveys the performance of methods for classifying and identifying spam reviews, along with the leading machine learning strategies proposed for review spam detection.
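As a rough illustration of the text-based approach described in the abstract, the sketch below extracts bag-of-words features from review text and classifies them with a multinomial naive Bayes model using Laplace smoothing. It is a minimal example under stated assumptions, not a method taken from the surveyed work: the function names, the labels "spam" and "genuine", and the four toy reviews are all hypothetical and chosen only to make the example runnable.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (bag-of-words features)."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """Count tokens and documents per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training reviews
    for text, label in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {tok for counts in word_counts.values() for tok in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)          # Laplace smoothing
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for illustration only.
TOY_REVIEWS = [
    ("best product ever buy now amazing deal", "spam"),
    ("amazing amazing amazing buy now click here", "spam"),
    ("the battery lasted two days and the screen is sharp", "genuine"),
    ("arrived late but the fit is good and the fabric feels sturdy", "genuine"),
]

model = train(TOY_REVIEWS)
print(predict(model, "amazing deal buy now"))
```

A practical system would train on a labeled review corpus and, as the abstract notes, could add reviewer-level features alongside the text features shown here.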