A Quality Assurance Framework for a Cross-Industry Standard Process Model in Data Mining Application Development

Authors

  • Sohan Lal Gupta, Assistant Professor, Department of Computer Science & Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan (SKIT), Jaipur, Rajasthan, India
  • Vikram Khandelwal, Assistant Professor, Department of Computer Science & Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan (SKIT), Jaipur, Rajasthan, India
  • Vinod Kataria, Associate Professor, Department of Computer Science & Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan (SKIT), Jaipur, Rajasthan, India
  • Arpita Sharma, Assistant Professor, Department of Computer Science & Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan (SKIT), Jaipur, Rajasthan, India
  • Anjali Pandey, Assistant Professor, Department of Information Technology, Swami Keshvanand Institute of Technology, Management & Gramothan (SKIT), Jaipur, Rajasthan, India
  • Vipin Kumar Gupta, Assistant Professor, Department of Electronics and Communication Engineering, Suresh Gyan Vihar University, Jaipur, Rajasthan, India

Keywords:

Application development, Data mining, Framework, Lean Six Sigma, Model validation, Process model, Quality assurance methodology

Abstract

Data mining has proven highly valuable in recent decades, particularly when integrated with continuous improvement and quality management practices. A common approach is the CRISP-DM model, frequently combined with the Lean Six Sigma DMAIC cycle (Define, Measure, Analyze, Improve, and Control). This combination brings together best practices in modeling techniques and algorithms to improve quality management activities. This study presents an approach for developing data mining applications grounded in the CRISP-DM framework. The model comprises six stages, from defining the project scope to maintaining the deployed data mining applications, and integrates a quality assurance methodology to address challenges in data mining development, particularly those identified as risks. The approach is informed by practical experience and systematic study and is intended to be generic and consistent across projects. The proposed method builds on the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework, which already enjoys strong industry support, and incorporates additional quality assurance practices to further enhance its effectiveness.
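The idea of attaching quality assurance checks to process stages, so that problems surface as recorded risks, can be illustrated with a small phase-gated pipeline. The sketch below is a minimal illustration under stated assumptions, not the authors' framework: it uses the six standard CRISP-DM phase names, while the `QualityGate` and `run_project` helpers, the check names, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The six standard CRISP-DM phases. The QA gates attached to them are hypothetical.
CRISP_DM_PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]


@dataclass
class QualityGate:
    """Hypothetical quality-assurance checkpoint attached to one phase."""
    phase: str
    checks: Dict[str, Callable[[dict], bool]]

    def run(self, context: dict) -> List[str]:
        # Return the names of failed checks; each failure is treated as a risk.
        return [name for name, check in self.checks.items() if not check(context)]


@dataclass
class ProjectLog:
    completed: List[str] = field(default_factory=list)
    risks: Dict[str, List[str]] = field(default_factory=dict)


def run_project(gates: Dict[str, QualityGate], context: dict) -> ProjectLog:
    """Walk the phases in order and halt at the first gate that fails."""
    log = ProjectLog()
    for phase in CRISP_DM_PHASES:
        gate = gates.get(phase)
        failed = gate.run(context) if gate else []
        if failed:
            # Record the failed checks as risks for this phase and stop.
            log.risks[phase] = failed
            break
        log.completed.append(phase)
    return log


# Usage with hypothetical thresholds: data quality passes, model accuracy does not.
gates = {
    "Data Understanding": QualityGate(
        "Data Understanding",
        {"missing_rate_below_5pct": lambda ctx: ctx["missing_rate"] < 0.05},
    ),
    "Evaluation": QualityGate(
        "Evaluation",
        {"accuracy_meets_target": lambda ctx: ctx["accuracy"] >= ctx["target_accuracy"]},
    ),
}
log = run_project(gates, {"missing_rate": 0.02, "accuracy": 0.81, "target_accuracy": 0.85})
print(log.completed)  # ['Business Understanding', 'Data Understanding', 'Data Preparation', 'Modeling']
print(log.risks)      # {'Evaluation': ['accuracy_meets_target']}
```

Halting at the first failed gate is a simplification. In CRISP-DM proper, a failed evaluation would typically loop back to an earlier phase, such as Data Preparation or Modeling, rather than end the project.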

Published

2025-04-09