Smart Water Quality Prediction Using Advanced Machine Learning and Explainable AI

Authors

  • Nishant Khare, Postgraduate Student, Department of Computer Science Engineering, LNCT Group of Colleges, Bhopal, Madhya Pradesh, India
  • Priyanka Asthana, Assistant Professor, Department of Computer Science Engineering, LNCT Group of Colleges, Bhopal, Madhya Pradesh, India

Keywords:

Extra Trees classifier, LIME analysis, Machine learning, Standard scaler, Voting classifier, Water quality classification

Abstract

Global population growth has degraded the environment, and water quality in particular. Water quality prediction has therefore been a significant research topic over the last ten years, yet existing methods often fall short of high accuracy. This study explores a comprehensive approach to water quality classification that uses advanced machine learning techniques to improve prediction accuracy and reliability. The data preprocessing phase includes dataset integrity checks, handling of missing values, and balancing of the imbalanced target variable "Potability" using random oversampling so that the models are trained on both classes fairly. Standard Scaler was used for feature scaling, followed by a 70:30 split of the data into training and testing sets. Two classification models were proposed: the Extra Trees Classifier achieved an accuracy of 82.5%, and the Voting Classifier outperformed all other models with an accuracy of 84.17%. Explainable AI analysis with LIME identified features such as "Sulfate" and "pH" as key influences on the predictions. A comparative analysis against base models showed a significant improvement for the proposed models, with the Voting Classifier outperforming the others across all metrics. These findings underscore the effectiveness of ensemble learning and gradient boosting methods in water quality classification and offer a reliable framework for practical implementation.
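
The preprocessing steps named in the abstract (random oversampling, standard scaling, 70:30 split) can be sketched as follows. This is a minimal illustration in plain Python and not the authors' implementation: the toy data, the function names, the seeds, and the two feature columns (standing in for pH and Sulfate) are all assumptions, and the steps are applied in the order the abstract lists them. The classifiers and the LIME analysis are not covered here.

```python
import random
import statistics


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def standard_scale(rows):
    """Rescale each column to zero mean and unit population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def split_70_30(rows, labels, seed=0):
    """Shuffle the row indices and return a 70:30 train/test split."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = (7 * len(idx)) // 10
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical toy data: two features per sample, 10 samples of class 0
# and 4 samples of class 1 (an imbalanced "Potability" target).
rng = random.Random(42)
rows = [[rng.gauss(7.0, 1.5), rng.gauss(330.0, 40.0)] for _ in range(14)]
labels = [0] * 10 + [1] * 4

bal_rows, bal_labels = random_oversample(rows, labels)
scaled = standard_scale(bal_rows)
X_train, y_train, X_test, y_test = split_70_30(scaled, bal_labels)

print(bal_labels.count(0), bal_labels.count(1))  # 10 10
print(len(X_train), len(X_test))                 # 14 6
```

In practice, a library scaler and train/test splitter would normally replace these helper functions; they are written out here only to make each step explicit.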

Published

2025-04-17

Section

Articles