Spoken Content Retrieval Research Paper


Chapter 2: Spoken Content Retrieval

2.1 Introduction

Spoken Content Retrieval (SCR) is defined as "the task of returning speech media results that are relevant to an information need expressed as a user query" [4]. The authors of [4] state that it differs from Spoken Document Retrieval (SDR), a term that refers to retrieval techniques for collections with a pre-defined document structure, such as stories in broadcast news, and that was adopted by the Text REtrieval Conference (TREC) evaluations. The term Speech Retrieval (SR) was used in the first IR paper to treat the retrieval of spoken content, which explored search of radio news. We believe the three terms deal with the same objective and hence may be used interchangeably. Since the work in this thesis …

Discussion of approaches that go beyond cascading ASR and IR can be found in [13]. The extent to which it is possible to create an SDR system by indexing the output of an out-of-the-box ASR system using an out-of-the-box IR system will ultimately depend on the domain of application and the use case, including the user tasks, the complexity and content of the data, the types of queries that users issue to the system, and the form of results that they expect to receive in return [8].

When you search for video or audio on YouTube, SoundCloud, or other video or audio sharing websites, the retrieved results are ranked mainly by the title of the video and its metadata, such as description, tags, views, ratings, playlist additions, flagging, embeds, shares, comments, age of video, channel views, subscribers, and inbound links (links from outside YouTube pointing to your videos). This sometimes makes the retrieved results inaccurate, because the title of the video and its metadata can be faked using keywords known for their high hit rate while the actual content of the video may be irrelevant. This is usually done by dishonest search engine optimization (SEO) …

Applications involving SUR include browsing broadcast news, voice mail, teleconferences and lectures. SUR can be based on ASR lattice search [15] or a phonetic recognition system [16], as we will discuss later.
• Spoken Term Detection (STD): The STD task returns instances of particular words (maximum of 5 words) being pronounced within the speech stream. Examples of research done in STD can be found in [17, 18].
• Spoken Topic Detection and Tracking (STDT): STDT is a partially supervised task in which new speech content is regularly arriving (i.e. dynamic) and the goal of the system is to make a judgement about the newly incoming stream. It aims at identifying the topic of the stream or discovering a new topic within it [James et al. 2007]. Topic tracking follows how these topics develop over time by monitoring the stream of news for subsequent stories on the same topic. The core of most approaches to STDT is computing term overlap between different segments: the more common terms, the more likely those two segments have the same
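
The term-overlap idea behind STDT can be illustrated with a minimal sketch. It assumes that segments are already available as plain-text transcripts and uses the Jaccard coefficient as the overlap measure; the source does not name a specific measure, so this choice, the tiny stopword list, the function names and the threshold value are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are"}


def terms(segment: str) -> set:
    """Lowercase a transcript segment and return its set of content terms."""
    tokens = re.findall(r"[a-z0-9']+", segment.lower())
    return {t for t in tokens if t not in STOPWORDS}


def term_overlap(seg_a: str, seg_b: str) -> float:
    """Jaccard overlap between the term sets of two segments (0.0 to 1.0)."""
    a, b = terms(seg_a), terms(seg_b)
    if not a and not b:
        return 0.0  # two empty segments share no evidence of a common topic
    return len(a & b) / len(a | b)


def same_topic(seg_a: str, seg_b: str, threshold: float = 0.2) -> bool:
    """Guess whether two segments share a topic (threshold is an assumption)."""
    return term_overlap(seg_a, seg_b) >= threshold


if __name__ == "__main__":
    story_1 = "The flood hit the city"
    story_2 = "Flood waters rise in the city"
    print(term_overlap(story_1, story_2))
    print(same_topic(story_1, "Stock markets fell sharply"))
```

A set-based measure ignores how often a term occurs; a real system would more likely weight terms (for example with TF-IDF) and would have to cope with ASR recognition errors in the transcripts.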
