PITCH ESTIMATION IN AN AUTOMATED SPEAKER RECOGNITION SYSTEM FOR CRITICAL USE

V. V. Kovtun

Pitch Estimation for Automated Speaker Recognition System for Critical Use

Authors

V. V. Kovtun Vinnytsia National Technical University

Keywords:

automated speaker recognition system for critical use, pitch, deep neural network, recurrent neural network, factorial hidden Markov model

Abstract

The article proposes a method for estimating the pitch trend. Unlike existing methods, it uses a factorial hidden Markov model (FHMM), optimized with the junction tree algorithm, to generalize the outputs of pitch state detectors based on deep and recurrent neural networks. This makes it possible to predict the pitch trend accurately from long-term information in packets of speech frames, to describe the dynamics of the pitch in the time domain, and to reduce the influence of noise on the quality of the pitch estimates. Methods for estimating pitch states with deep and recurrent neural networks, and a method for estimating the pitch trend with the FHMM, are developed. The parameters of the proposed methods were optimized for use in an automated speaker recognition system for critical use (ASRSCU). In particular, the results support the following recommendations: use power-normalized cepstral coefficients as the features for pitch estimation, use frame packets 10 frames long, use 1024 neurons in the hidden layers of the neural networks, and use 68 states to describe the pitch. The dependence of ASRSCU speaker recognition quality on the signal-to-noise ratio (SNR) of the input speech and on the pitch estimates produced by the proposed methods, with their parameters set according to these recommendations, was then studied. At all SNR levels, the FHMM method gave the most accurate pitch estimates, with a probability of correct speaker recognition by the ASRSCU of 96–99% on the selected test sample.
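
As a rough illustration of how a Markov model can turn noisy per-frame pitch-state scores into a smooth pitch track, the sketch below runs Viterbi decoding over 68 pitch states. It is not the article's method: it substitutes an ordinary single-chain HMM for the FHMM and junction tree algorithm, and the transition model, the `jump_penalty` value, the `peaked_frame` helper and the synthetic scores are all assumptions made for this example only.

```python
import math

N_STATES = 68  # number of pitch states recommended in the abstract


def make_log_transitions(n_states, jump_penalty=0.5):
    """Hypothetical transition model: the log-score falls linearly with jump size."""
    rows = []
    for i in range(n_states):
        scores = [-jump_penalty * abs(i - j) for j in range(n_states)]
        # Normalize each row so that it is a valid log-probability distribution.
        log_norm = math.log(sum(math.exp(s) for s in scores))
        rows.append([s - log_norm for s in scores])
    return rows


def viterbi(log_obs, log_trans):
    """Return the most likely state sequence for per-frame log-scores."""
    n_states = len(log_trans)
    delta = list(log_obs[0])  # uniform prior assumed, so its constant is dropped
    backptr = []
    for frame in log_obs[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def peaked_frame(peak, n_states=N_STATES, mass=0.5):
    """Synthetic stand-in for a network output: `mass` on one state, rest uniform."""
    rest = (1.0 - mass) / (n_states - 1)
    return [math.log(mass if s == peak else rest) for s in range(n_states)]


# Five frames whose detector output peaks at state 20, except for one
# noisy frame that peaks at state 60.
log_obs = [peaked_frame(p) for p in (20, 20, 60, 20, 20)]
track = viterbi(log_obs, make_log_transitions(N_STATES))
print(track)
```

In this toy setting the decoder keeps the track at state 20 through the noisy frame, because two 40-state jumps cost more than the evidence from a single frame can repay. That is the general reason temporal models reduce the influence of noise on pitch estimates.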

Author Biography

V. V. Kovtun, Vinnytsia National Technical University

Cand. Sc. (Eng.), Assistant Professor of the Chair of Computer Control Systems

Published

2018-10-18

How to Cite

[1]

V. V. Kovtun, “Pitch Estimation for Automated Speaker Recognition System for Critical Use”, Вісник ВПІ, no. 4, pp. 61–73, Oct. 2018.

Issue

No. 4 (2018)

Section

Information technologies and computer sciences

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
