CuraVox: An AI-Powered Mobile Medical Assistant for Visually Impaired Users Using a Hybrid LLM and OCR Pipeline
Abstract
CuraVox is a fully implemented AI-powered mobile pharmaceutical assistant designed for visually impaired users. To minimize medication confusion, the assistant combines a vision-language model (VLM) and a large language model (LLM) to identify tablets by their shape, color, and imprint, while barcode and QR-code scanning instantly retrieve safety information and expiry dates. The application integrates a camera-based optical character recognition (OCR) pipeline built on Google ML Kit and Tesseract; a hybrid LLM reasoning layer that uses Google Gemini 2.0 Flash in the cloud with a local Ollama fallback; proactive text-to-speech (TTS) narration via expo-speech; haptic confirmation via expo-haptics; a drug interaction checker supporting up to five concurrent medications; a prescription image vault; and customizable medication reminders, all within a single React Native / Expo mobile application that supports the TalkBack and VoiceOver screen readers and is backed by a Python FastAPI asynchronous REST API. The system auto-scans medicine labels every four seconds; identifies the medicine name, dosage, manufacturer, expiry date, clinical insights, safety flags, indications, and side effects; and narrates the complete result aloud without any user interaction. In cloud mode, response latency averages 1.2 seconds with approximately 97% medicine identification accuracy; in offline mode, the system operates with 100% data privacy using on-device Ollama inference. A voice agent module powered by @react-native-voice and speech-to-text (STT), together with a backend Gemini natural language understanding (NLU) layer, enables completely hands-free navigation across all application features.
This paper presents the complete system design, architecture, implementation details, API specification, and evaluation results.
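The hybrid reasoning layer summarized above (cloud-first Gemini inference with an on-device Ollama fallback) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, prompt, and timeout budget are assumptions, and the two model calls are stubbed out.

```python
# Illustrative sketch of a cloud-first LLM call with a local fallback,
# as described in the abstract. All names here are hypothetical.
import asyncio

CLOUD_TIMEOUT_S = 3.0  # assumed budget; cloud latency averages ~1.2 s


async def query_cloud(prompt: str) -> str:
    """Placeholder for a Gemini 2.0 Flash API call."""
    raise ConnectionError("offline")  # simulate no network for this demo


async def query_local(prompt: str) -> str:
    """Placeholder for a local Ollama inference call (fully on-device)."""
    return f"[local answer] {prompt}"


async def identify_medicine(ocr_text: str) -> tuple[str, str]:
    """Return (source, answer): prefer the cloud, degrade to on-device."""
    prompt = f"Extract name, dosage, and expiry from label text: {ocr_text}"
    try:
        answer = await asyncio.wait_for(query_cloud(prompt), CLOUD_TIMEOUT_S)
        return "cloud", answer
    except (ConnectionError, asyncio.TimeoutError):
        # Offline path: no data leaves the device.
        answer = await query_local(prompt)
        return "local", answer


source, answer = asyncio.run(identify_medicine("Paracetamol 500 mg"))
print(source)  # "local" here, because the cloud stub raises
```

The key design point is that the fallback is transparent to the caller: downstream TTS narration consumes the same `(source, answer)` shape regardless of which backend produced it.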