Language:
English
In:
IEEE/ACM Transactions on Audio, Speech, and Language Processing, November 2017, Vol.25(11), pp.2098-2111
Description:
Automatic speaker verification systems can be spoofed through recorded, synthetic, or voice-converted speech of target speakers. To make these systems practically viable, the detection of such attacks, referred to as presentation attacks, is of paramount interest. In that direction, this paper investigates two aspects: 1) a novel approach to detect presentation attacks where, unlike conventional approaches, no assumptions related to speech signal modeling are made; rather, the attacks are detected by computing first-order and second-order spectral statistics and feeding them to a classifier, and 2) generalization of presentation attack detection systems across databases. Our investigations on the ASVspoof 2015 challenge database and the AVspoof database show that, when compared to approaches based on conventional short-term spectral features, the proposed approach with a linear discriminative classifier yields a better system, irrespective of whether the spoofed signal is replayed to the microphone or is directly injected into the system software process. Cross-database investigations show that neither the short-term spectral processing-based approaches nor the proposed approach yields systems that are able to generalize across databases or methods of attack. This reveals the difficulty of the problem and the need for further resources and research.
Keywords:
Speech ; Feature Extraction ; Speech Processing ; Databases ; Mel Frequency Cepstral Coefficient ; Computational Modeling ; Anti-Spoofing ; Cross-Database ; Presentation Attack Detection ; Spectral Statistics ; Engineering
ISSN:
2329-9290
E-ISSN:
2329-9304
DOI:
10.1109/TASLP.2017.2743340
Source:
IEEE Conference Publications
Source:
IEEE Journals & Magazines
Source:
IEEE Xplore