Speaker Diarization Research Paper


Abstract—Speaker diarization is the first step in many audio processing pipelines and aims to solve the problem of "who spoke when." It therefore relies on efficient use of temporal information from extracted audio features. Because the solution search space is huge and ground truth is often unavailable, it is a tough research problem. Most implementations from different research groups fail when the number of speakers varies or when noise or background music levels are high. In this project we explore the conventional techniques, which involve hierarchical agglomerative clustering, and later shift to Integer Linear Programming clustering, which gives state-of-the-art results for unsupervised speaker diarization. Keywords: Speaker Diarization, hierarchical …

Stephen Shum developed an approach that builds on the success of factor analysis-based methods in speaker diarization and is inspired by the total variability subspace; it extracts speaker-specific features from short-term segments of speech. In this paper we introduce a speaker diarization technique that uses VAD (Voice Activity Detection) and feature extraction based on MFCCs (Mel Frequency Cepstral Coefficients); we then apply a diarization technique that provides better results than the previous techniques. Section 2 explains the diarization method, and Section 3 explains the proposed diarization …

Speech Activity Detection

Speech activity detection is generally used to detect the silent parts of an audio signal. The required characteristics of a voice activity detector are reliability, robustness, accuracy, adaptation, simplicity, and real-time processing. Voice detection algorithms generally work efficiently at high SNR values, but their performance degrades at low SNR values. In this method we use the short-term energy (E), the spectral flatness measure (SFM), and the dominant frequency component of each frame to separate speech from silence; the dominant frequency component is found by computing the frequency corresponding to the maximum spectral magnitude. Energy is the most common feature for speech and silence detection, but it is not effective in noisy environments with low SNR values. SFM is a measure of the noisiness of the spectrum. This can be calculated …
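The three frame-level features named above (short-term energy, spectral flatness, and dominant frequency) can be illustrated with a minimal Python sketch. The source text is cut off before it gives its own SFM formula, so the definition used here (geometric mean over arithmetic mean of the power spectrum, expressed in dB) is a common textbook form and an assumption, not necessarily the authors' exact formula. The function names, the 10 ms frame length, the 8 kHz sample rate, and the `eps` floor are likewise hypothetical choices made only for illustration.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    return power


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_flatness_db(power, eps=1e-12):
    """Assumed SFM form: 10*log10(geometric mean / arithmetic mean).

    eps is a hypothetical floor that avoids log(0) on empty bins.
    """
    geometric = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic = sum(power) / len(power) + eps
    return 10.0 * math.log10(geometric / arithmetic)


def dominant_frequency(power, sample_rate, frame_len):
    """Frequency (Hz) of the bin with the maximum spectral magnitude."""
    peak_bin = max(range(len(power)), key=power.__getitem__)
    return peak_bin * sample_rate / frame_len


def frame_features(frame, sample_rate):
    """Collect the three features for one frame."""
    power = power_spectrum(frame)
    return {
        "energy": short_term_energy(frame),
        "sfm_db": spectral_flatness_db(power),
        "dominant_freq_hz": dominant_frequency(power, sample_rate, len(frame)),
    }


# Illustrative input: a 10 ms frame at an assumed 8 kHz sample rate.
SAMPLE_RATE = 8000
FRAME_LEN = 80
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(FRAME_LEN)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

tone_features = frame_features(tone, SAMPLE_RATE)
noise_features = frame_features(noise, SAMPLE_RATE)
```

In this sketch a pure tone concentrates its power in one bin, so its flatness is far lower than that of the noise frame, whose spectrum is spread across bins. How the three features are combined into a speech or silence decision (for example, with thresholds) is not specified in the excerpt and is left out here.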
