Essay On Speech Recognition


ABSTRACT In speech recognition, the speaker dependence of a recognition system comes from the speech feature, and variation in vocal tract shape is the major source of inter-speaker variation in that feature. Speaker normalization is a process that transforms the short-time speech feature of a given speaker to better match a speaker-independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme in which the frequency axis of the short-time spectrum of an utterance is rescaled, or warped, to normalize the speech. In this work, we develop a speaker normalization scheme by exploiting the fact that frequency-domain transformations can be accomplished entirely in the cepstral domain through the …

The reasons for the speaker dependence of the speech signal are complicated. It is related not only to physical differences between speakers, such as vocal tract shape, but also to linguistic differences, such as accent and dialect, and even to the mental state of the speaker. Researchers nevertheless agree that one of the major sources of inter-speaker variance is the vocal tract shape, especially the vocal tract length (VTL). Therefore, one of the most popular speaker normalization schemes is vocal tract length normalization (VTLN). Many researchers have worked on VTLN via frequency warping (FWP) to compensate for speaker variation. In a typical implementation of VTLN, a digitally sampled utterance is windowed to isolate a short segment, which is then analyzed with the FFT to obtain the short-time spectrum. Normalization is achieved by warping the frequency axis of the short-time spectrum using a suitable
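The window, FFT, and warp steps described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the essay's method: the text does not say which warping function is used, so the sketch assumes a simple linear warp with a hypothetical factor `alpha`, a Hamming window, and linear interpolation between FFT bins. The function names `short_time_spectrum` and `warp_spectrum` are invented for this example.

```python
import numpy as np


def short_time_spectrum(signal, start, frame_len):
    """Window one short segment and return its magnitude spectrum."""
    # Isolate the segment and taper it (Hamming window is an assumption).
    frame = signal[start:start + frame_len] * np.hamming(frame_len)
    # rfft keeps only the non-negative frequencies, 0 .. Nyquist.
    return np.abs(np.fft.rfft(frame))


def warp_spectrum(mag, alpha):
    """Linearly warp the frequency axis of a magnitude spectrum.

    Assumed convention: output bin k takes its value from input
    position alpha * k, so alpha > 1 moves spectral peaks toward
    lower frequencies. Positions past Nyquist are clipped.
    """
    n = len(mag)
    bins = np.arange(n)
    src = np.clip(bins * alpha, 0, n - 1)
    # Linear interpolation between neighbouring FFT bins.
    return np.interp(src, bins, mag)


# Usage: a pure tone sampled at 16 kHz, analysed over one 512-sample frame.
fs = 16000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 1562.5 * t)
spectrum = short_time_spectrum(tone, start=0, frame_len=512)
warped = warp_spectrum(spectrum, alpha=1.25)
```

In practice the warp factor is usually chosen per speaker, and piecewise-linear or bilinear warping functions are common alternatives to the plain linear warp assumed here.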
