Personally Identifiable Information Detection Using Natural Language Processing

Authors

  • Aakash Deshmukh Bhilai Institute of Technology, Durg, Chhattisgarh, India
  • Apurv Singh Bhilai Institute of Technology, Durg, Chhattisgarh, India
  • Gaurav Singh Bhilai Institute of Technology, Durg, Chhattisgarh, India
  • Shrey Mittal Bhilai Institute of Technology, Durg, Chhattisgarh, India
  • Shiv Dutta Mishra Bhilai Institute of Technology, Durg, Chhattisgarh, India

Keywords:

DeBERTa v3, Hyperparameter, Large Language Model (LLM), Natural Language Processing (NLP), Personally Identifiable Information (PII)

Abstract

Data is now one of the most valuable assets in the world, and its value grows as technology advances. As a result, there is an increasing need to protect personal information that does not have to be disclosed, in order to prevent problems such as identity theft and financial loss. This report discusses an approach to refining the detection of Personally Identifiable Information (PII) in diverse text data using advanced Natural Language Processing (NLP) and Transformer models implemented in PyTorch. Beyond this primary objective, PII detection can also help organizations ensure compliance with data protection regulations. The methodology uses large language models such as DeBERTa v3 to distinguish PII from non-PII within text data while remaining flexible enough to meet changing regulatory requirements. Hyperparameter tuning is used to optimize the model's performance. The primary aim of this project is to contribute to data privacy protection by providing a complete and flexible solution for PII detection in diverse textual datasets.
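The abstract frames PII detection as deciding which parts of a text are PII and which are not. A common way to set this up with a Transformer such as DeBERTa v3 is token classification with BIO-style labels; the sketch below assumes that framing and shows only the post-processing step: turning per-token class probabilities (which a fine-tuned model would produce) into labels, and then into PII spans. The label set, the example probabilities, and the `o_threshold` parameter are hypothetical illustrations, not details taken from the article.

```python
from typing import List, Optional, Tuple

# Hypothetical BIO label set; the article does not list its PII categories.
LABELS = ["O", "B-NAME", "I-NAME", "B-EMAIL", "I-EMAIL"]


def predict_labels(probs: List[List[float]], o_threshold: float = 0.5) -> List[str]:
    """Assign one label per token from its class probabilities.

    A token is labelled "O" (non-PII) only if P("O") reaches o_threshold;
    otherwise the most probable PII label is chosen. Raising o_threshold
    flags more tokens as PII (favouring recall), so it is a natural
    hyperparameter to tune.
    """
    o_idx = LABELS.index("O")
    labels = []
    for p in probs:
        if p[o_idx] >= o_threshold:
            labels.append("O")
        else:
            best = max((i for i in range(len(LABELS)) if i != o_idx), key=lambda i: p[i])
            labels.append(LABELS[best])
    return labels


def extract_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labelled tokens into (pii_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != current_type):
            # A new entity starts (a stray I- tag is treated as a start).
            flush()
            current_type, current_tokens = lab[2:], [tok]
        elif lab.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(tok)
        else:
            # "O": close any open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


if __name__ == "__main__":
    tokens = ["Contact", "John", "Smith", "at", "john@example.com"]
    # Made-up probabilities standing in for a fine-tuned model's softmax output.
    probs = [
        [0.90, 0.05, 0.02, 0.02, 0.01],
        [0.10, 0.80, 0.05, 0.03, 0.02],
        [0.20, 0.10, 0.65, 0.03, 0.02],
        [0.95, 0.01, 0.01, 0.02, 0.01],
        [0.30, 0.05, 0.05, 0.55, 0.05],
    ]
    labels = predict_labels(probs)
    print(labels)                         # ['O', 'B-NAME', 'I-NAME', 'O', 'B-EMAIL']
    print(extract_spans(tokens, labels))  # [('NAME', 'John Smith'), ('EMAIL', 'john@example.com')]
```

Separating the thresholding from the span grouping keeps the model-specific part (producing probabilities) independent of the decision rule, so the threshold can be tuned on validation data without retraining.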

Author Biographies

Aakash Deshmukh, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Apurv Singh, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Gaurav Singh, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Shrey Mittal, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Shiv Dutta Mishra, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Assistant Professor, Department of Computer Science & Engineering

Published

2024-05-20

How to Cite

Aakash Deshmukh, Apurv Singh, Gaurav Singh, Shrey Mittal, & Shiv Dutta Mishra. (2024). Personally Identifiable Information Detection Using Natural Language Processing. Journal of Information Security System and Cyber Criminology Research, 1(2), 1–7. Retrieved from https://matjournals.net/engineering/index.php/JoISSCCR/article/view/443

Issue

Vol. 1 No. 2 (2024)

Section

Articles