A Full-stack Generative AI-powered Platform for Automated Voice-based Candidate Evaluation

Authors

  • Archana Kale
  • Rohan Mathad
  • Chaitanya Mitkari
  • Tejas Danane
  • Krishna Sadre

Keywords

Automatic speech recognition (ASR), Cognitive analysis, Emotion detection, Natural language processing (NLP), Voice biomarkers, Web-based interface

Abstract

Advances in artificial intelligence (AI) and natural language processing (NLP) have substantially improved real-time human-computer interaction. One promising application is the AI-driven real-time voice interview platform, which combines automatic speech recognition (ASR), natural language understanding (NLU), and analysis of emotional or cognitive state to evaluate human speech. This review critically synthesizes three selected research works: one on AI-enhanced interview simulation, one on AI-based speaking practice in language education, and one on AI-based voice biomarker models for cognitive assessment. Together, these works show the potential of AI voice systems for assessing human communication, emotion, cognition, and performance. The review focuses on how their methods can inform the design of a real-time voice-based interview platform for automated evaluation and personalized feedback.

References

B. Nofal, H. Ali, M. Hadi, A. Ahmad, A. Qayyum, A. Johri, A. Al-Fuqaha, and J. Qadir, “AI-enhanced interview simulation in the metaverse: Transforming professional skills training through VR and generative conversational AI,” Computers & Education: Artificial Intelligence, vol. 8, p. 100347, 2025.

J. Du and B. K. Daniel, “Transforming language education: A systematic review of AI-powered chatbots for English as a foreign language speaking practice,” Computers & Education: Artificial Intelligence, vol. 6, p. 100230, 2024.

E. Kiyoshige, S. Ogata, N. Kwon, Y. Nakaoku, C. Hayashi, N. Blaylock, R. Brueckner, V. Subramanian, H. J. O’Connell, Y. Yoshikawa, K. Teramoto, K. Nakatsuka, S. Saito, M. Ihara, M. Takegami, and K. Nishimura, “Developing and testing AI-based voice biomarker models to detect cognitive impairment among community-dwelling adults: A cross-sectional study in Japan,” The Lancet Regional Health–Western Pacific, vol. 59, p. 101598, 2025.

M. U. Islam and B. M. Chaudhry, “A framework to enhance user experience of older adults with speech-based intelligent personal assistants,” IEEE Access, vol. 11, pp. 16683–16698, 2023.

B. Li, C. J. Bonk, C. Wang, and X. Kou, “Reconceptualising self-directed learning in the era of generative AI: An exploratory analysis of language learning,” Computers & Education: Artificial Intelligence, vol. 5, p. 100168, 2024.

J. Lian, C. Zhang, G. K. Anumanchipalli, and D. Yu, “Unsupervised TTS acoustic modeling for TTS with conditional disentangled sequential VAE,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2915–2928, 2023.

F. Nagasawa, S. Okada, T. Ishihara, and K. Nitta, “Adaptive interview strategy based on interviewees’ speaking willingness recognition for interview robots,” IEEE Transactions on Affective Computing, vol. 15, no. 3, pp. 942–957, 2024.

T. Fister and G. K. Thiruvathukal, “Artificial intelligence employment interviews: Examining limitations, biases, and perceptions,” Computer, vol. 57, no. 9, pp. 92–101, 2024.

H. Zhang, “Exploring the impact of AI on human resource management: A case study of organizational adaptation and employee dynamics,” IEEE Transactions on Engineering Management, vol. 71, p. 14991, 2024.

T. Zhang, A. Koutsoumpis, J. K. Oostrom, D. Holtrop, S. Ghassemi, and R. E. de Vries, “Can large language models assess personality from asynchronous video interviews? A comprehensive evaluation of validity, reliability, fairness, and rating patterns,” IEEE Transactions on Affective Computing, vol. 15, no. 3, pp. 1769–1785, 2024.

C. Kim, J. Choi, J. Yoon, D. Yoo, and W. Lee, “Fairness-aware multimodal learning in automatic video interview assessment,” IEEE Access, vol. 11, pp. 122677–122693, 2023.

S. Artiran, P. S. Bedmutha, and P. Cosman, “Analysis of gaze, head orientation, and joint attention in autism with triadic VR interviews,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 32, pp. 760–768, 2024.

Y. Xu, Z. Chen, and M. Dong, “Shaping the fairness journey: The roles of AI literacy, explanation, and interpersonal interaction in AI interviews,” Computers in Human Behavior, vol. 148, p. 112145, 2025.

Y. Qian, X. Gong, H. Huang, Y. Liu, and Y. Zhao, “Layer-wise fast adaptation for end-to-end multi-accent speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1229–1242, 2022.

B. Sisman, J. Yamagishi, S. King, and H. Li, “An overview of voice conversion and its challenges: From statistical modeling to deep learning,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 132–157, 2021.

M. Jayaratne and B. Jayatilleke, “Predicting personality using answers to open-ended interview questions,” IEEE Access, vol. 8, pp. 115345–115355, 2020.

Y.-S. Joo, H. Bae, Y.-I. Kim, H.-Y. Cho, and H.-G. Kang, “Effective emotion transplantation in an end-to-end text-to-speech system,” IEEE Signal Processing Letters, vol. 27, pp. 219–223, 2020.

C. Chen, D. Jiang, J. Peng, R. Lian, Y. Li, C. Zhang, L. Chen, and L. Fan, “Scalable identity-oriented speech retrieval,” IEEE Transactions on Multimedia, vol. 25, pp. 5211–5224, 2023.

J. Yu, J. Zhao, L. Miranda-Moreno, and M. Korp, “Modular AI agents for transportation surveys and interviews: Advancing engagement, transparency, and cost efficiency,” Communications in Transportation Research, vol. 5, p. 100172, 2025.

B. A. Appiah Otoo, K. Osei-Frimpong, and N. Islam, “Do perceived privacy risks of AI matter? A longitudinal study on the drivers of continued use of intelligent voice assistants,” IEEE Transactions on Engineering Management, vol. 72, p. 2521, 2025.

Y. A. Wubet and K.-Y. Lian, “Speaker anonymization for voice biometrics protection using voice conversion and multi-target speaker voice fusion,” IEEE Access, vol. 20, pp. 6046–6057, 2025.

J. Wang, J. Zhang, J. N. Y. Zhu, and L. Bai, “Choose what suits you: The role of relative competency strength in shaping job applicants’ reactions and strategies toward AI-based interviews,” Computers in Human Behavior Reports, vol. 19, p. 100777, 2025.

S. L. King and T. Neal, “Applications of AI-enabled deception detection using video, audio, and physiological data: A systematic review,” IEEE Access, vol. 12, pp. 135207–135240, 2024.

Z.-Y. Sheng, L.-J. Liu, Y. Ai, J. Pan, and Z.-H. Ling, “Voice attribute editing with text prompt,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 33, pp. 1641–1652, 2025.

C. Du, Y. Guo, X. Chen, and K. Yu, “Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3446–3456, 2023.

J. Lu, R. Zheng, Z. Gong, and H. Xu, “Supporting teachers’ professional development with generative AI: The effects on higher order thinking and self-efficacy,” Computers & Education, vol. 17, pp. 1267–1277, 2024.

Published

2026-01-22