Human Speech Production Research

Chapter 2
Human Speech Production and Perception

2.1 Human Speech Production
Speech signals are composed of a sequence of sounds. These sounds and the transitions between them serve as a symbolic representation of information. The arrangement of the sounds (symbols) is governed by the rules of the language. The study of these rules is the domain of linguistics, while the study and classification of speech sounds is called phonetics. The purpose of processing speech signals is to enhance them and to extract information from them, providing as much knowledge as possible about the signal's structure, i.e., about the way in which information is encoded in the signal.
2.1.1 The mechanism of speech production
Human speech production requires three elements – a power source, a sound source and sound modifiers.
2.2.1 Body size
The most obvious influence on pitch is the size of the sound-producing apparatus: the instruments of the orchestra show that smaller objects tend to make higher-pitched sounds and larger ones lower-pitched sounds. It is therefore logical to assume that small people make high sounds and large people make low sounds, and this assumption is borne out by the facts, at least to an extent. Baby cries have a fundamental frequency (referred to as f0) of around 500 Hz. Child speech ranges from 250 to 400 Hz, adult females tend to speak at around 200 Hz on average, and adult males at around 125 Hz.
Thus, body size is one of the factors related to f0. On the other hand, big opera singers do not always make low sounds: there are very large sopranos and some rather short, slender basses. So body weight and height are not the sole determining factors.
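To make the quoted f0 values concrete, the following is a minimal Python sketch that synthesizes toy voiced sounds at three of the frequencies above and recovers f0 from the autocorrelation peak. The autocorrelation method, the 8000 Hz sampling rate, the three-harmonic test signal, and the names `synth_voiced` and `estimate_f0` are all assumptions made for illustration; none of them comes from the text.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for this sketch, not taken from the text


def synth_voiced(f0, duration=0.1):
    """Toy voiced sound: a fundamental at f0 plus two weaker harmonics."""
    n = int(SAMPLE_RATE * duration)
    return [
        sum(math.sin(2 * math.pi * k * f0 * t / SAMPLE_RATE) / k for k in (1, 2, 3))
        for t in range(n)
    ]


def estimate_f0(samples, f_min=60.0, f_max=600.0):
    """Return the f0 (Hz) whose period maximizes the autocorrelation."""
    min_lag = int(SAMPLE_RATE / f_max)  # shortest period searched, in samples
    max_lag = int(SAMPLE_RATE / f_min)  # longest period searched, in samples
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag


# Typical values quoted in the text (Hz).
TYPICAL_F0 = {"baby cry": 500, "adult female": 200, "adult male": 125}
estimates = {name: estimate_f0(synth_voiced(f0)) for name, f0 in TYPICAL_F0.items()}
for name, f0 in estimates.items():
    print(f"{name}: about {f0:.0f} Hz")
```

Because the lag is an integer number of samples, the estimate is quantized; real speech would also need framing and a voicing decision, which this sketch leaves out.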
2.2.2 Laryngeal size
Perhaps a factor more relevant to the voice source is the size of the larynx. Men, on average, have a larynx about 40% taller and longer (measured along the axis of the vocal folds) than women, as seen in figure 2.4. Nevertheless, this does not completely explain the difference between male and female fundamental frequency f0; there is a size difference inside the larynx, which fully explains the difference in …
In this type of analysis, the speech signal is correlated with a set of orthogonal basis functions, which represent the impulse responses of a set of filters of increasing bandwidth. The resulting computation structure is very similar to the tree-structured quadrature filter bank used in speech coding. In fact, the quadrature-mirror filter bank is a form of wavelet transform, with the output samples of the filters representing the transform coefficients. Because the bandwidth is variable and proportional to frequency, the basis functions are simply rescaled and shifted versions of each other in time. One of the important characteristics of wavelet transforms, in addition to their variable bandwidth, is that they are simultaneously localized in time and frequency, which gives them good time resolution and good frequency resolution at the same time.
2.3.4 Cepstral analysis
One of the problems of simple spectral analysis is that the resulting output has elements of both the vocal tract (formants) and its excitation (harmonics). This mixture is often confusing and inappropriate for further analysis, such as speech recognition. Ideally, some method of separating out the effects of the vocal tract and the excitation would be appropriate. Unfortunately, these two speech aspects are convolved together and they cannot …
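The observation that the vocal tract and the excitation are convolved together can be illustrated with the real cepstrum, defined here as the inverse DFT of the log magnitude spectrum. Convolution in time is multiplication in the spectrum, and the logarithm turns that product into a sum, so the cepstrum of the convolved signal is the sum of the two individual cepstra. The Python sketch below checks this numerically. The toy signals (three excitation pulses, an exponentially decaying stand-in for the vocal tract response), the use of circular convolution, and all function names are assumptions made for illustration; they are not taken from the text.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform; adequate for short toy signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum of a real signal x."""
    n = len(x)
    # Assumes no spectral bin is exactly zero (true for the signals below).
    log_mag = [math.log(abs(v)) for v in dft(x)]
    # log_mag is real and even for real x, so the inverse DFT is real.
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def circular_convolve(a, b):
    """Circular convolution, so the DFT product rule holds exactly."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


N = 64
# Hypothetical excitation: three glottal-like pulses, 20 samples apart.
excitation = [1.0 if t in (0, 20, 40) else 0.0 for t in range(N)]
# Hypothetical vocal tract response: a simple decaying exponential.
vocal_tract = [0.6 ** t for t in range(N)]
speech = circular_convolve(excitation, vocal_tract)

c_speech = real_cepstrum(speech)
c_sum = [e + v for e, v in zip(real_cepstrum(excitation), real_cepstrum(vocal_tract))]
max_error = max(abs(a - b) for a, b in zip(c_speech, c_sum))
print(f"largest difference between cepstrum of speech and sum of cepstra: {max_error:.2e}")
```

Once the two contributions are additive, they can in principle be separated by keeping only part of the cepstrum; that step is not shown here.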
