BhaashaNet: A Real-Time Neural Framework for Indian Sign Language Recognition
Keywords:
CNN-LSTM hybrid, Deep learning, Educational technology, Gesture recognition, Human-Computer Interaction (HCI), Indian Sign Language (ISL), LSTM, MediaPipe Holistic, Multimodal learning, Pose estimation, Real-time systems, Sign Language Recognition (SLR)

Abstract
BhaashaNet aims to bridge the communication gap between hearing-impaired individuals and the wider population through a real-time sign language recognition platform. Leveraging a hybrid deep learning architecture that integrates MediaPipe for keypoint extraction with a stacked LSTM network for temporal modelling, the proposed system accurately interprets Indian Sign Language (ISL) gestures. The framework emphasizes a lightweight architecture and real-time responsiveness, making it suitable for deployment on web and mobile platforms. By capturing both spatial and temporal dependencies via a CNN-LSTM pipeline, BhaashaNet delivers robust performance in noisy environments, as demonstrated by its accuracy and loss convergence curves. The research is supplemented by a comparative analysis across existing frameworks and datasets, including OpenHands and the ISL corpus, highlighting its versatility and domain generalization. BhaashaNet is envisioned not only as a recognition system but also as an educational platform that promotes inclusivity, learning, and awareness of sign language across digital domains.
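To make the pipeline concrete, the following minimal Python sketch illustrates how per-frame keypoints can be extracted with MediaPipe Holistic and classified with a stacked LSTM of the kind described above. This is an illustrative sketch only: the layer widths, the 30-frame window, and the 10-class vocabulary are assumptions for demonstration, not hyperparameters reported in this paper.

# Illustrative sketch: layer widths, SEQ_LEN, and NUM_CLASSES are assumptions,
# not values taken from the BhaashaNet paper.
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQ_LEN = 30         # assumed frames per gesture window
NUM_FEATURES = 1662  # pose 33x4 + face 468x3 + 2 hands x 21x3
NUM_CLASSES = 10     # placeholder ISL vocabulary size

holistic = mp.solutions.holistic.Holistic(
    min_detection_confidence=0.5, min_tracking_confidence=0.5)

def extract_keypoints(results):
    """Flatten one frame's MediaPipe Holistic landmarks into a 1662-D
    vector, zero-filling any body part the detector missed."""
    pose = (np.array([[p.x, p.y, p.z, p.visibility]
                      for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[p.x, p.y, p.z]
                      for p in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[p.x, p.y, p.z]
                    for p in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[p.x, p.y, p.z]
                    for p in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])

def build_model():
    """Stacked LSTM over a window of keypoint frames, softmax over gestures."""
    model = Sequential([
        LSTM(64, return_sequences=True, input_shape=(SEQ_LEN, NUM_FEATURES)),
        LSTM(128, return_sequences=True),
        LSTM(64),
        Dense(64, activation="relu"),
        Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    return model

At run time, each camera frame would be converted to RGB, passed through holistic.process(frame), reduced to a keypoint vector, and appended to a sliding 30-frame buffer that is fed to the model for a per-window gesture prediction.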
References
D. Kumari and R. S. Anand, “Isolated Video-Based Sign Language Recognition Using a Hybrid CNN-LSTM Framework Based on Attention Mechanism,” Electronics, vol. 13, no. 7, Art. no. 1229, Mar. 2024, doi: https://doi.org/10.3390/electronics13071229.
P. Selvaraj, G. NC, P. Kumar, and M. Khapra, “OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages,” arXiv preprint arXiv:2110.05877, 2021. Available: https://arxiv.org/abs/2110.05877.
S. Srivastava, S. Singh, Pooja, and S. Prakash, “Continuous Sign Language Recognition System Using Deep Learning with MediaPipe Holistic,” Wireless Personal Communications, vol. 137, no. 3, pp. 1455–1468, Jul. 2024, doi: https://doi.org/10.1007/s11277-024-11356-0.
R. Kumar, A. Bajpai, and A. Sinha, “Mediapipe and CNNs for Real-Time ASL Gesture Recognition,” arXiv preprint arXiv:2305.05296, May 2023, doi: https://doi.org/10.48550/arXiv.2305.05296.
B. Dash, “Remote Work and Innovation During this Covid-19 Pandemic: An Employers’ Challenge,” International Journal of Computer Science and Information Technology, vol. 14, no. 2, pp. 13–18, Apr. 2022, doi: https://doi.org/10.5121/ijcsit.2022.14202.
O. M. Sincan, J. C. S. Jacques Junior, S. Escalera, and H. Y. Keles, “ChaLearn LAP Large Scale Signer Independent Isolated Sign Language Recognition Challenge: Design, Results, and Future Research,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 3472–3481, doi: https://doi.org/10.48550/arXiv.2105.05066.
J. Huang and V. Chouvatut, “Video-Based Sign Language Recognition via ResNet and LSTM Network,” Journal of Imaging, vol. 10, no. 6, Art. no. 149, Jun. 2024, doi: https://doi.org/10.3390/jimaging10060149.
C. Correia, D. Macêdo, and C. Zanchettin, “Spatial-Temporal Graph Convolutional Networks for Sign Language Recognition,” in Lecture Notes in Computer Science, pp. 646–657, Jan. 2019, doi: https://doi.org/10.1007/978-3-030-30493-5_59.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), vol. 1, 2019, pp. 4171–4186, doi: https://doi.org/10.18653/v1/N19-1423.
N. Pugeault and R. Bowden, “Spelling it out: Real-time ASL fingerspelling recognition,” in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 2011, pp. 1114–1119, doi: https://doi.org/10.1109/ICCVW.2011.6130290.
J. Bora, S. Dehingia, A. Boruah, A. A. Chetia, and D. Gogoi, “Real-time Assamese Sign Language Recognition using MediaPipe and Deep Learning,” Procedia Computer Science, vol. 218, pp. 1384–1393, Jan. 2023, doi: https://doi.org/10.1016/j.procs.2023.01.117.
A. Singh, S. Arora, P. Shukla, and A. Mittal, “Indian Sign Language gesture classification as single or double handed gestures,” in 2015 Third International Conference on Image Information Processing (ICIIP), Waknaghat, India, 2015, pp. 378–381, doi: https://doi.org/10.1109/ICIIP.2015.7414800.