Developing a Unified Ontology for Cross-sector Data Quality Assessment in AI Applications

Authors

  • Swapnali Pawar

Keywords

Artificial intelligence, Bias mitigation, Data governance, Data quality, Healthcare AI, Manufacturing AI, Trustworthy AI

Abstract

The effectiveness and trustworthiness of Artificial Intelligence (AI) systems depend fundamentally on the quality of the data they consume. As AI applications increasingly span sectors such as healthcare, geospatial intelligence, and smart environments, the lack of unified Data Quality (DQ) standards makes it difficult to ensure consistency, transparency, and ethical compliance. Current DQ frameworks are often fragmented, domain-specific, and insufficient for cross-sector integration. This paper proposes a unified ontology-based framework for cross-sector DQ assessment tailored to AI applications. The methodology involves a comprehensive review of existing domain-specific DQ ontologies, followed by the design and formalization of a modular and extensible ontology that captures both common and context-specific dimensions of DQ. Ontology alignment, semantic reasoning, and rule-based inference are employed to enable automated, context-aware DQ evaluation. The proposed ontology was validated using real-world datasets from the healthcare and geospatial domains, demonstrating its ability to detect inconsistencies, assess completeness, and support scalable integration of quality metrics. The results highlight the framework’s potential to enhance interoperability, improve data governance, and provide a standardized foundation for reliable and ethically aligned AI systems.
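
The idea of pairing common DQ dimensions with context-specific rules can be illustrated with a minimal sketch. It is not the paper's ontology or its implementation: plain Python dataclasses stand in for ontology classes, simple predicate functions stand in for rule-based inference, and every name (`Rule`, `required`, `in_range`, `SECTOR_RULES`, `assess`), every field name, and every threshold is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


@dataclass(frozen=True)
class Rule:
    """One quality rule tied to a DQ dimension (hypothetical structure)."""
    dimension: str                    # e.g. "completeness", "consistency"
    description: str
    check: Callable[[Record], bool]   # returns True when the record passes


def required(field: str) -> Rule:
    """Common rule shared by all sectors: the field must be non-null."""
    return Rule("completeness", f"{field} is present",
                lambda r: r.get(field) is not None)


def in_range(field: str, low: float, high: float) -> Rule:
    """Context-specific rule: a present value must lie in [low, high].

    Missing values pass here so they are only penalized once,
    under the completeness dimension.
    """
    def check(r: Record) -> bool:
        value = r.get(field)
        return value is None or low <= value <= high
    return Rule("consistency", f"{field} in [{low}, {high}]", check)


# Sector modules. Thresholds are illustrative assumptions, not from the paper.
SECTOR_RULES: Dict[str, List[Rule]] = {
    "healthcare": [required("heart_rate"), in_range("heart_rate", 20, 250)],
    "geospatial": [required("lat"), required("lon"),
                   in_range("lat", -90, 90), in_range("lon", -180, 180)],
}


def assess(sector: str, records: List[Record]) -> Dict[str, float]:
    """Return the fraction of passed rule checks per DQ dimension."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for record in records:
        for rule in SECTOR_RULES[sector]:
            total[rule.dimension] = total.get(rule.dimension, 0) + 1
            passed[rule.dimension] = (passed.get(rule.dimension, 0)
                                      + int(rule.check(record)))
    return {dim: passed[dim] / total[dim] for dim in total}


if __name__ == "__main__":
    geo = [{"lat": 48.1, "lon": 11.6}, {"lat": 95.0, "lon": None}]
    print(assess("geospatial", geo))
```

Under these assumptions, adding a sector means adding one entry to `SECTOR_RULES`, which loosely mirrors the modular, extensible design the abstract describes. A real ontology-based system would more likely express the dimensions and rules in OWL and evaluate them with a reasoner.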

Published

2025-12-19

How to Cite

Pawar, S. (2025). Developing a Unified Ontology for Cross-sector Data Quality Assessment in AI Applications. Journal of Information Technology and Sciences, 11(3), 21–28. Retrieved from https://matjournals.net/engineering/index.php/JOITS/article/view/2860

Section

Articles