Robust Speaker Recognition using Spectrogram and CNN Against Replay Attacks
Keywords:
Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Replay attacks, Speaker command recognition, Spectrogram
Abstract
Speaker recognition systems support a range of security applications, including access control and authentication. However, these systems are vulnerable to replay attacks, in which an adversary records a legitimate speaker's voice and plays the recording back to deceive the system. This study aims to improve the robustness of speaker recognition systems against replay attacks by combining spectrogram analysis with Convolutional Neural Networks (CNNs). The process begins by converting speech signals into spectrograms, which are time-frequency representations of the audio that capture the features essential for speaker identification. A CNN architecture is then used to extract discriminative features from the spectrogram images. The CNN is trained on a dataset of genuine speech samples. The proposed system's performance is evaluated experimentally using benchmark datasets and a range of replay attack scenarios. The findings show that the spectrogram-based technique, when paired with CNNs, mitigates the impact of replay attacks on speaker recognition systems. The genuine speaker recognition system achieves an average accuracy of 85.3% when the training data are reused as test data and 76% on independent test data. When the models trained on genuine speech are tested against replay attacks, the system achieves an 85% rejection rate. These results indicate improved accuracy and resilience under realistic conditions, making the proposed system a promising approach to secure and dependable speaker recognition in the face of rising security threats.
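The speech-to-spectrogram step described above can be illustrated with a minimal sketch. It uses only NumPy and computes a log-magnitude spectrogram with a short-time Fourier transform. The function name `log_spectrogram` and the framing parameters (a 400-sample Hann window, 160-sample hop, and 512-point FFT, typical for 16 kHz speech) are assumptions chosen for illustration and are not taken from this study.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, frame_len=400, hop=160, n_fft=512):
    """Return (freqs, times, log_spec) for a 1-D audio signal.

    log_spec has shape (n_fft // 2 + 1, n_frames) and is in decibels.
    All default parameters are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short signals so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)

    # Build an index matrix of overlapping frames: (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window

    # One-sided FFT of each windowed frame, zero-padded to n_fft points.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    power = np.abs(spectrum) ** 2

    # Small constant avoids log(0) in silent frames.
    log_spec = 10.0 * np.log10(power + 1e-10)

    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, log_spec.T


if __name__ == "__main__":
    # Synthetic stand-in for speech: one second of a 440 Hz tone at 16 kHz.
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)
    freqs, times, spec = log_spectrogram(tone, sr)
    print(spec.shape)
```

The resulting two-dimensional array (frequency bins by time frames) is the kind of image-like input that a CNN could consume, for example after normalization and resizing to a fixed shape.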