Multiscript Handwritten Receipt Recognition and Information Extraction Using Transformer Architectures

Snehal Ghoparkar; Tarun Parihar; Pranay Pathare; Ashish Patil

Keywords

Character error rate (CER), Optical character recognition (OCR), Transaction digitization, Transformer-based OCR (TrOCR), Word error rate (WER)

Abstract

The digitization of handwritten receipts written in Indian languages presents challenges due to script diversity, handwriting variability, and irregular document layouts. Conventional OCR systems, primarily optimized for printed or English-centric data, often fail to generalize effectively to Indic scripts containing conjunct characters and diacritical modifiers. This study proposes an end-to-end framework for multilingual handwritten receipt recognition and structured transaction extraction. The system integrates a transformer-based OCR model for script-aware text recognition with a semantic processing layer for contextual interpretation of extracted content. Preprocessing techniques are applied to enhance visual clarity under degraded imaging conditions, while schema-guided language modeling converts unstructured OCR output into structured financial records. The framework also supports natural language-based transaction queries for improved usability. Experimental evaluation using character error rate (CER), word error rate (WER), and transaction extraction accuracy demonstrates improved robustness over baseline OCR systems. The proposed solution provides an integrated approach for intelligent receipt digitization in multilingual environments.
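
For readers unfamiliar with the two recognition metrics named above, the following is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between the reference and the recognized text, divided by the reference length in characters or words. The function names and the sample strings are illustrative assumptions and are not taken from the study's implementation. Note that Python strings iterate over Unicode code points, so an Indic conjunct or a base character with a diacritic counts as several units here; a grapheme-aware variant may be preferable for Indic scripts.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical ground truth and OCR output for a single receipt line.
reference = "total 450 rupees"
hypothesis = "total 480 rupee"
print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

In this made-up example one digit is misread and one trailing letter is dropped, so two of the sixteen reference characters are wrong, while two of the three words are wrong. This illustrates why WER is usually the harsher of the two metrics on short receipt lines.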
