Increase Noise Resistance of the Automatic Speaker Recognition System of Critical Use

Authors

  • T. V. Hryshchuk, Vinnytsia National Technical University
  • V. V. Kovtun, Vinnytsia National Technical University

Keywords:

automatic speaker recognition system of critical use, i-vectors, PLDA mixture

Abstract

Existing speaker recognition systems that use i-vector/PLDA modeling to describe recordings build a single generalized PLDA model whose parameters are averaged over the entire recording database, without separating the recordings by noise level. As a result, such systems reach an acceptable level of reliability only when the training set is large in both the number and the duration of recordings. The authors propose building separate PLDA models for recordings with specified signal-to-noise ratio (SNR) levels, so that the factors characterizing the specific features of a speaker's voice are concentrated in the most variable regions of the i-vector space. Statistical analysis of the parameters of these variability regions for recordings with a specified SNR level made it possible to identify factors that are robust to noise and informative for speaker recognition. To solve this task, an analytical expression is derived for a PLDA model whose parameters are determined only by the values of the i-vectors and which incorporates parameters describing the SNR levels. Criterion functions and the stages of the EM algorithm for training the SNR-dependent PLDA mixture are also derived, and the effectiveness of the proposed models is checked by comparing them with the results of an SNR-independent PLDA mixture on a given database of speaker recordings. For comprehensive testing of the proposed theoretical results, the authors formed two test sets of recordings that differ in the way noise was added to the signal.
Experimental results show that the SNR-dependent PLDA model outperforms the SNR-independent PLDA model for almost all test data variants when recordings from the first set were used to train the models. However, when the models were trained on data from the second set, the opposite holds. This can be explained by the fact that using recordings with three SNR levels to form the first training set provides more information than the second way of obtaining training data.
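
The idea of separating training recordings by noise level before fitting per-level models can be illustrated with a minimal Python sketch. It is not the authors' PLDA mixture or its EM training: it only groups i-vectors into SNR bands and computes, for each band, a mean and a within-speaker covariance, which is the kind of per-band statistic a separate model could be built from. The band edges (in dB), the function name `per_snr_statistics`, and the returned dictionary layout are assumptions made for illustration.

```python
import numpy as np

def per_snr_statistics(ivectors, speaker_ids, snr_db, band_edges=(6.0, 15.0)):
    """Group i-vectors by SNR band; return per-band mean and within-speaker covariance.

    band_edges are hypothetical thresholds in dB, not values from the paper.
    """
    ivectors = np.asarray(ivectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    # Band index: 0 below the first edge, 1 between the edges, 2 at or above the last edge.
    bands = np.digitize(snr_db, band_edges)
    stats = {}
    for band in np.unique(bands):
        mask = bands == band
        x, spk = ivectors[mask], speaker_ids[mask]
        # Remove each speaker's own mean so that only within-speaker scatter remains.
        centered = np.empty_like(x)
        for s in np.unique(spk):
            rows = spk == s
            centered[rows] = x[rows] - x[rows].mean(axis=0)
        stats[int(band)] = {
            "count": int(mask.sum()),
            "mean": x.mean(axis=0),
            "within_cov": centered.T @ centered / len(x),
        }
    return stats
```

Calling `per_snr_statistics(ivectors, speaker_ids, snr_db)` returns one entry per SNR band that actually occurs in the data, so a band with no recordings simply has no entry.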

Author Biographies

T. V. Hryshchuk, Vinnytsia National Technical University

Cand. Sc. (Eng.), Assistant Professor, Assistant Professor of the Chair of Computer Control Systems

V. V. Kovtun, Vinnytsia National Technical University

Cand. Sc. (Eng.), Assistant Professor, Assistant Professor of the Chair of Computer Control Systems


Published

2018-02-28

How to Cite

[1]
T. V. Hryshchuk and V. V. Kovtun, “Increase Noise Resistance of the Automatic Speaker Recognition System of Critical Use”, Вісник ВПІ, no. 1, pp. 98–111, Feb. 2018.

Section

Information technologies and computer sciences
