Personally Identifiable Information Detection Using Natural Language Processing

Aakash Deshmukh; Apurv Singh; Gaurav Singh; Shrey Mittal; Shiv Dutta Mishra

Authors

Aakash Deshmukh, Bhilai Institute of Technology, Durg, Chhattisgarh, India
Apurv Singh, Bhilai Institute of Technology, Durg, Chhattisgarh, India
Gaurav Singh, Bhilai Institute of Technology, Durg, Chhattisgarh, India
Shrey Mittal, Bhilai Institute of Technology, Durg, Chhattisgarh, India
Shiv Dutta Mishra, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Keywords:

DeBERTa v3, Hyperparameter, Large Language Model (LLM), Natural Language Processing (NLP), Personally Identifiable Information (PII)

Abstract

Data is now one of the most valuable assets in the world, and its value grows as technology advances. So does the need to protect personal information that does not have to be disclosed, in order to prevent problems such as identity theft and financial loss. This report discusses an approach to refining the detection of Personally Identifiable Information (PII) in diverse text data using advanced Natural Language Processing (NLP) and Transformer models implemented in PyTorch. Beyond this primary objective, PII detection can also help organizations ensure compliance with data protection regulations. The methodology involves developing large language models such as DeBERTa v3 to distinguish between PII and non-PII in text data while remaining flexible enough to meet changing regulatory needs. Hyperparameter tuning is applied to optimize model performance. The primary aim of this project is to contribute to advancing data privacy protection by providing a complete and flexible solution for PII detection in diverse textual datasets.

Author Biographies

Aakash Deshmukh, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Apurv Singh, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Gaurav Singh, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Shrey Mittal, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Undergraduate Student, Department of Computer Science & Engineering

Shiv Dutta Mishra, Bhilai Institute of Technology, Durg, Chhattisgarh, India

Assistant Professor, Department of Computer Science & Engineering
