Document Retrieval System

This thesis aims to build a tool for document retrieval in a journalistic context. This chapter opens with an overview of the research territory, then defines the technical vocabulary used throughout the thesis and sets out the theoretical and practical problems the thesis tries to solve. It concludes with an overview of the rest of the thesis: Chapter 2 reviews the literature in the research space, Chapter 3 outlines the proposed system, and Chapter 4 describes the experiments run in the process of building the system.
1.1 ‘Subjective Objectivity’
According to [DK93], subjective objectivity is one of the core professional values of journalism. The authors examine many different notions of objectivity.
These documents are called relevant documents. A perfect retrieval system would retrieve only the relevant documents and no irrelevant ones. However, a perfect retrieval system is unlikely ever to exist, because relevance is a subjective matter. [Coo71] gives a formal definition of relevance, but in our case, we define a document to be ‘relevant’ if it pertains to the particular story in which we are interested.
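The idea of a perfect retrieval system can be made concrete with two standard measures, precision and recall. The excerpt does not name these measures, so the sketch below is an illustrative assumption rather than the thesis's own evaluation method; the function name `precision_recall` and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision: fraction of retrieved documents that are relevant.
    recall:    fraction of relevant documents that were retrieved.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers, for illustration only.
retrieved_docs = ["d1", "d2", "d3"]
relevant_docs = ["d1", "d3", "d4", "d5"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Under this reading, a system that retrieves only relevant documents has a precision of 1. A recall of 1 would additionally require that no relevant document is missed.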
To search for documents, we must use search terms, and these terms must capture as much information about the story as possible. This is the first major step of automation for the journalist: they will no longer have to pick search terms manually. Instead, a list will be generated specifically for the story that they wish to work
In practice, approximate matching models are most commonly used to rank retrieved documents. Vector Space Models (VSMs) were introduced in 1975 [SWY75] and describe a simple model intended for information filtering, information retrieval, indexing and producing relevancy rankings. They are an algebraic model that represents documents as vectors of identifiers. Document vectors are compared against each other and a metric is calculated for the intended purpose, i.e. filtering, retrieval, indexing or relevancy ranking. Probabilistic models treat the process of document retrieval as probabilistic inference. They were introduced in 1977 [Rob77] with the probability ranking principle. It
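A minimal sketch of vector space ranking may help make the model concrete. It assumes whitespace tokenisation, TF-IDF term weights and cosine similarity as the comparison metric. These are common choices for a VSM, but the excerpt does not state which weighting or metric the thesis uses, and all function names and example documents below are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Represent each document as a sparse {term: weight} vector."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed IDF variant: log(N / df) + 1, so common terms keep some weight.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get zero weight.
    query_vec = {
        term: tf * idf.get(term, 0.0)
        for term, tf in Counter(tokenize(query)).items()
    }
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Hypothetical mini-collection, for illustration only.
docs = [
    "election results announced in the capital",
    "football match ends in a draw",
    "election campaign funding questioned",
]
for score, index in rank("election results", docs):
    print(f"{score:.3f}  {docs[index]}")
```

In this example the first document shares both query terms and ranks highest, the third shares one term and ranks second, and the second shares none and scores zero.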
