DocSort AI: Document Governance System for Secure Business Records

Authors

  • Dipti Patil
  • Aditya Prabhash Lal
  • Harsh Santosh Patil
  • Karan Vijay Pendhari

Abstract

Individuals and enterprises alike struggle to manage a growing volume of identity records, tax forms, and insurance policies scattered across uncoordinated platforms. DocSort AI addresses this fragmentation with an autonomous, secure document management system. On upload, a local Vision Language Model (VLM) performs zero-click categorisation: it identifies the document type and extracts essential metadata (e.g., names, dates of birth, and unique identifiers) so that each file is routed to the correct client-specific folder. To limit cloud storage growth without compromising text legibility, the platform applies an adaptive, multi-tiered WebP compression algorithm that shrinks high-resolution PDFs and images to a target size in kilobytes chosen according to their original size. Data privacy is protected by an envelope encryption architecture backed by Google Cloud KMS, in which every user's vault is secured by its own KMS-wrapped Data Encryption Key (DEK). Together, these components deliver a scalable, duplicate-resistant, and confidential archiving solution designed to handle thousands of records.
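
The size-dependent compression step described in the abstract can be illustrated with a minimal sketch. The tier boundaries, target sizes, and function names below are hypothetical and are not taken from the paper; the sketch also assumes that encoded size does not decrease as the quality setting rises. The encoder is passed in as a callable so the example runs with the standard library alone; in practice it might wrap a WebP encoder such as Pillow's `Image.save(buffer, "WEBP", quality=q)`.

```python
from typing import Callable, Tuple

# Hypothetical tiers: (upper bound on original size in KB, target size in KB).
# These numbers are illustrative only; the paper does not state its thresholds.
TIERS = ((500, 100), (2_000, 200), (10_000, 400))
FALLBACK_TARGET_KB = 600  # hypothetical target for anything above the last tier


def target_kb(original_kb: float) -> int:
    """Pick a target size from the tier that contains the original size."""
    for upper_bound_kb, tier_target_kb in TIERS:
        if original_kb <= upper_bound_kb:
            return tier_target_kb
    return FALLBACK_TARGET_KB


def pick_quality(
    encode: Callable[[int], bytes],
    limit_kb: float,
    lo: int = 10,
    hi: int = 95,
) -> Tuple[int, bytes]:
    """Binary-search the highest quality whose encoded output fits limit_kb.

    Assumes len(encode(q)) is non-decreasing in q. If even the lowest
    quality exceeds the limit, the lowest-quality output is returned.
    """
    best_quality, best_data = lo, encode(lo)
    limit_bytes = limit_kb * 1024
    while lo <= hi:
        mid = (lo + hi) // 2
        data = encode(mid)
        if len(data) <= limit_bytes:
            best_quality, best_data = mid, data  # fits: try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1  # too large: try a lower quality
    return best_quality, best_data


if __name__ == "__main__":
    # Stand-in encoder for demonstration: output grows by 4 KB per quality step.
    fake_encode = lambda q: bytes(q * 4096)
    quality, blob = pick_quality(fake_encode, target_kb(1500))
    print(quality, len(blob) // 1024, "KB")
```

Searching for the highest quality that still fits the target, rather than using one fixed quality per tier, is one way to preserve text legibility while keeping output sizes predictable; whether DocSort AI works this way is not stated in the abstract.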

Published

2026-04-06

How to Cite

Patil, D., Prabhash Lal, A., Santosh Patil, H., & Vijay Pendhari, K. (2026). DocSort AI: Document Governance System for Secure Business Records. Journal of Hacking Techniques, Digital Crime Prevention and Computer Virology, 3(1), 18–24. Retrieved from https://matjournals.net/engineering/index.php/JoHTDCPCV/article/view/3382