Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection

[Teaser figure]

Abstract

Voice spoofing attacks pose a significant threat to automated speaker verification systems. Existing anti-spoofing methods are often tailored to specific attack types, such as synthetic or replay attacks. In real-world scenarios, however, countermeasures are unaware of the attack's generation scheme, necessitating a unified solution. Current unified solutions struggle to detect spoofing artifacts, especially those produced by recent spoofing mechanisms: such algorithms inject spectral or temporal anomalies that are challenging to identify. To this end, we present a spectra-temporal fusion approach that leverages frame-level and utterance-level coefficients. We introduce a novel local spectral deviation coefficient (SDC) to capture frame-level inconsistencies and employ a bi-LSTM-based network to extract sequential temporal coefficients (STC), which capture utterance-level artifacts. Our spectra-temporal fusion strategy combines these coefficients, and an auto-encoder generates spectra-temporal deviated coefficients (STDC) to enhance robustness. The proposed approach addresses multiple spoofing categories, including synthetic, replay, and partial deepfake attacks. Extensive evaluation on diverse datasets (ASVspoof2019, ASVspoof2021, VSDC, partial spoofs, and in-the-wild deepfakes) demonstrates its robustness for a wide range of voice applications.


Citation

Awais Khan, Khalid Mahmood Malik, Shah Nawaz
Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection
In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 10761–10765, 2024. doi:10.1109/ICASSP48485.2024.10447500

BibTeX

@inproceedings{khan2024frame,
    title = {Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection},
    author = {Khan, Awais and Malik, Khalid Mahmood and Nawaz, Shah},
    booktitle = {ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    doi = {10.1109/ICASSP48485.2024.10447500},
    pages = {10761--10765},
    year = {2024}
}