Speech Waveform Characteristics

1.7 Speech analysis

One of the important characteristics of a speech waveform is the time-varying nature of the speech pressure signal. Determining the time-varying parameters of speech is therefore a key area of analysis in speech research. Another key area is the classification of speech waveform segments as voiced or voiceless (mixed excitation is usually considered voiced). As mentioned previously, when speech is voiced, the most important parameter is the fundamental frequency, f0. This section introduces these two areas of analysis and discusses the principles and limitations involved. First, fundamental frequency (f0) analysis is considered, followed by the spectral analysis method of dynamic speech…
This device produces a two-dimensional pattern called a spectrogram, in which the vertical dimension corresponds to frequency and the horizontal dimension to time. The spectrogram shown here is rendered on a 16-bit grey scale. Although a colour representation is visually more appealing, it can lead to misleading interpretations of the spectrogram. The darkness of the pattern is proportional to signal energy, so the resonance frequencies of the vocal tract show up as dark bands. Voiced regions are characterized by a striated appearance due to the periodicity of the time waveform, while unvoiced intervals are more solidly filled in. An example spectrogram of the utterance “What do you think about that”, spoken by a female speaker (waveform in Figure 1.5a), is shown in Figure 1.5b. The spectrogram is labelled to correspond to the labelling of the waveform in Figure 1.5a, so that the time domain and the frequency domain can be correlated.

The time and frequency resolution of the spectrograph play a vital role in the representation of speech spectral energy. The most rapid changes in time occur during the release stages of plosives, which are of the order of 5-10 ms. To represent the harmonics of male speech individually, a frequency resolution finer than the minimum expected f0 for males, approximately 50 Hz, is required. Because a filter's time resolution is inversely related to its bandwidth, there is a direct trade-off between frequency and time resolution, and this can be controlled by altering the bandwidth of the spectrograph’s analysis filter. Usually, this is indicated as wide or narrow based on the relation between the filter’s bandwidth and the f0 of the speech being
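
The trade-off between time and frequency resolution can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the common rule of thumb that an analysis filter's time resolution is roughly the reciprocal of its bandwidth, and the two bandwidths (45 Hz and 300 Hz) are conventional example settings chosen here, not values taken from the text. The function names `time_resolution_ms` and `describe` are hypothetical. The 50 Hz minimum f0 and the 5 ms plosive release time are the figures quoted above.

```python
def time_resolution_ms(bandwidth_hz: float) -> float:
    """Approximate time resolution in ms, assuming dt ~ 1 / bandwidth."""
    return 1000.0 / bandwidth_hz


def describe(bandwidth_hz: float,
             min_f0_hz: float = 50.0,
             fastest_event_ms: float = 5.0) -> dict:
    """Summarise what an analysis filter of the given bandwidth can resolve.

    min_f0_hz: lowest expected f0 (harmonics are spaced f0 apart).
    fastest_event_ms: shortest event of interest (plosive release).
    """
    dt = time_resolution_ms(bandwidth_hz)
    return {
        "bandwidth_hz": bandwidth_hz,
        "time_resolution_ms": round(dt, 1),
        # Individual harmonics are separated only if the filter is
        # narrower than the spacing between them.
        "resolves_harmonics": bandwidth_hz < min_f0_hz,
        # Rapid events are captured only if the filter responds
        # at least as fast as the event itself.
        "resolves_plosive_release": dt <= fastest_event_ms,
    }


if __name__ == "__main__":
    # Assumed example settings for a narrow-band and a wide-band filter.
    for name, bandwidth in (("narrow-band", 45.0), ("wide-band", 300.0)):
        print(name, describe(bandwidth))
```

Under these assumptions, neither setting satisfies both requirements at once: the narrow filter separates the harmonics but is too slow for plosive releases, and the wide filter does the opposite.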

