Phone Recognition System

Chapter 1 Overview of Phone Recognition Systems

This chapter gives an overview of the state-of-the-art ASR systems used for phone recognition. First, the phone recognition problem is formalized and the basic components of a phone recognition system are explained. Gaussian Mixture Model based Hidden Markov Models (GMM/HMMs) are then explained in detail as acoustic models. Finally, Multilayer Perceptron (MLP) neural networks are explained, and their strengths and weaknesses are explored with respect to their use in the speech recognition framework.

1.1 The Phone Recognition Problem

This work focuses on the phone recognition problem in ASR. The phone recognition problem involves mapping a raw speech signal to a sequence …

Using Bayes' rule, this equation can be rewritten as:

$$W^{*} = \arg\max_{W} \frac{P(X \mid W, M)\, P(W \mid M)}{P(X \mid M)} \tag{1.2}$$

In this equation, the term $P(X \mid M)$ is common to all phone sequence hypotheses $W$ and can be ignored. The term $P(X \mid W, M)$ is called the acoustic model term, while the term $P(W \mid M)$ is called the language model term. The acoustic and language models are usually estimated independently, and the model parameters $M$ are broken down into $M_a$, the acoustic model parameters, and $M_l$, the language model parameters. Thus the phone recognition equation becomes:

$$W^{*} = \arg\max_{W} P(X \mid W, M_a)\, P(W \mid M_l) \tag{1.3}$$

The complete phone recognition system is shown in Figure 1.1. Each block is explained in the next subsections.

Figure 1.1: The complete phone recognition system

1.1.1 Feature …

… the model parameters which maximize the likelihood of generating all the acoustic sequences in the training data $D$. The most common acoustic model is the GMM/HMM. A Hidden Markov Model combines two stochastic processes: an underlying Markov chain of states, and a probability distribution associated with each state, modeled by a Gaussian mixture. The acoustic model probability $P(X \mid W, M)$ for an HMM is given by the equation:

$$P(X \mid W, M) = \sum_{S} P(X, S \mid W, M_a) \tag{1.5}$$

i.e.,

$$P(X \mid W, M) = \sum_{S} P(X \mid S, W, M_a)\, P(S \mid W, M_a) \tag{1.6}$$

Here, $S = \{s_1, s_2, \ldots, s_t, \ldots, s_T\}$ is a sequence of HMM states and $\sum_{S}$ is the sum over all possible state sequences for the phone sequence $W$. $M_a$ is dropped from the equations for the remainder of this subsection, with it always being implied. The terms $P(X \mid S, W)$ and $P(S \mid W)$ are calculated separately, based on two simplifying assumptions. The first assumption is that the observation $x_t$ depends only on the state $s_t$. This gives us:

$$P(X \mid S, W) = \prod_{t=1}^{T} P(x_t \mid s_t) \tag{1.7}$$

The second assumption is the first-order Markov assumption, i.e., the state $s_t$ depends only on the previous state $s_{t-1}$, giving us:

$$P(S \mid W) = P(s_1) \prod_{t=2}^{T} P(s_t \mid s_{t-1}) \tag{1.8}$$
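
Equations (1.6) to (1.8) can be checked numerically on a toy model. The sketch below is illustrative only and is not taken from the original work: it assumes a hypothetical two-state HMM with one-dimensional Gaussian-mixture emissions, and all parameter values (`pi`, `A`, `gmms`, `X`) are invented. It evaluates the sum over all state sequences directly, then compares it with the standard forward recursion, a technique the excerpt above does not describe but which computes the same quantity without enumerating every sequence.

```python
import itertools
import numpy as np

def gmm_pdf(x, weights, means, variances):
    """Density of a 1-D Gaussian mixture evaluated at scalar x."""
    comps = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    return float(np.dot(weights, comps))

def emission_matrix(X, gmms):
    """B[t, j] = P(x_t | s_t = j), the per-frame factors of equation (1.7)."""
    return np.array([[gmm_pdf(x, *g) for g in gmms] for x in X])

def likelihood_brute_force(B, pi, A):
    """Equation (1.6): sum over every state sequence S of P(X|S,W) * P(S|W)."""
    T, N = B.shape
    total = 0.0
    for S in itertools.product(range(N), repeat=T):
        # Equation (1.8): initial state probability times transition probabilities.
        p_S = pi[S[0]] * np.prod([A[S[t - 1], S[t]] for t in range(1, T)])
        # Equation (1.7): product of per-frame emission densities.
        p_X_given_S = np.prod([B[t, S[t]] for t in range(T)])
        total += p_X_given_S * p_S
    return float(total)

def likelihood_forward(B, pi, A):
    """Same sum via the forward recursion, O(T * N^2) instead of O(N^T)."""
    alpha = pi * B[0]
    for t in range(1, B.shape[0]):
        alpha = (alpha @ A) * B[t]
    return float(alpha.sum())

# Hypothetical toy parameters (assumed for illustration, not from the source).
pi = np.array([0.6, 0.4])                      # P(s_1)
A = np.array([[0.7, 0.3],                      # A[i, j] = P(s_t = j | s_{t-1} = i)
              [0.4, 0.6]])
gmms = [                                       # (weights, means, variances) per state
    (np.array([0.5, 0.5]), np.array([0.0, 1.0]), np.array([1.0, 0.5])),
    (np.array([0.3, 0.7]), np.array([2.0, -1.0]), np.array([0.8, 1.2])),
]
X = np.array([0.1, 1.2, -0.3, 2.0])            # observation sequence x_1 .. x_T

B = emission_matrix(X, gmms)
p_brute = likelihood_brute_force(B, pi, A)
p_forward = likelihood_forward(B, pi, A)
print(p_brute, p_forward)
```

The toy HMM stands in for the model composed for a single phone sequence W; in a real recognizer, W would determine which states and transitions are available. The brute-force sum grows exponentially with T, so it is practical only as a correctness check on short sequences.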
