Word Sense Disambiguation Research Paper

Abstract—One of the issues in information retrieval is ambiguity in a text document or in a query. This problem occurs when a word has more than one meaning. Ambiguity in the text or query negatively affects both the search for information and the query expansion process. Adding supplementary keywords during query expansion would be inaccurate without identifying the exact sense of the word. An ambiguous term needs to be disambiguated to avoid this problem. The process of identifying the proper sense is known as word sense disambiguation (WSD). Word sense disambiguation in the sentences of text documents has been studied by researchers worldwide. However, a study on this issue in Malay…
Section I, the introduction, provides a general overview of word sense disambiguation and of why it is needed for text documents. Section II describes previous research in the field of word sense disambiguation, categorized by the type of disambiguation technique. Section III covers previous research on word sense disambiguation in Malay documents. The paper continues with the proposed method for Malay word sense disambiguation in Section IV, and the conclusion is in Section…
This study addresses word sense disambiguation in a specific domain. The approach identifies the major sense of ambiguous words from WordNet. The method works with two corpora: a domain-specific test corpus (containing the target ambiguous words) and a domain-specific auxiliary corpus (obtained by using relevant words from the domain-specific test corpus). The method consists of four key steps: (1) auxiliary corpus generation; (2) related feature extraction (from the auxiliary corpus); (3) test feature extraction (from the test corpus); and (4) feature integration. The approach has been tested on domain-specific corpora (Sports and Finance) and on one balanced corpus, the BNC. It showed some limitations when dealing with the general-domain corpus, but the results obtained for the domain-specific corpora were better compared to previous
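
The four steps listed above can be illustrated with a small Python sketch. It is not the implementation used in the study, only one possible reading of the description. Several things are assumptions made for illustration: the toy sentences, the hand-written sense glosses standing in for WordNet, the co-occurrence window used as "features", the weight given to auxiliary features, and every function name.

```python
from collections import Counter

# Assumed stopword list, only for this toy example.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "at", "for", "by"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def build_auxiliary_corpus(test_corpus, pool, target, top_k=5):
    """Step 1 (assumed form): keep pool documents that share at least one of
    the top_k most frequent non-target words of the test corpus."""
    freq = Counter(w for doc in test_corpus for w in tokenize(doc) if w != target)
    relevant = {w for w, _ in freq.most_common(top_k)}
    return [doc for doc in pool if relevant & set(tokenize(doc))]


def context_features(corpus, target, window=3):
    """Steps 2 and 3 (assumed form): count words within `window` tokens of the target."""
    feats = Counter()
    for doc in corpus:
        toks = tokenize(doc)
        for i, tok in enumerate(toks):
            if tok == target:
                feats.update(toks[max(0, i - window):i] + toks[i + 1:i + 1 + window])
    return feats


def integrate(test_feats, aux_feats, aux_weight=0.5):
    """Step 4 (assumed form): weighted sum of test and auxiliary feature counts."""
    merged = Counter()
    for w, c in test_feats.items():
        merged[w] += c
    for w, c in aux_feats.items():
        merged[w] += aux_weight * c
    return merged


def major_sense(features, sense_glosses):
    """Pick the sense whose gloss words overlap most with the integrated features."""
    scores = {
        sense: sum(features[w] for w in tokenize(gloss))
        for sense, gloss in sense_glosses.items()
    }
    return max(scores, key=scores.get)


# Toy finance-domain data (invented for illustration).
test_corpus = [
    "the bank raised interest rates on loans",
    "investors trust the bank with deposits",
]
pool = [
    "the central bank cut interest rates",
    "the river bank was muddy after rain",
    "loans from the bank fund new investors",
]
# Hand-written glosses standing in for a WordNet sense inventory.
glosses = {
    "bank.financial": "institution that accepts deposits and makes loans",
    "bank.river": "sloping land beside a river",
}

aux = build_auxiliary_corpus(test_corpus, pool, target="bank")
merged = integrate(context_features(test_corpus, "bank"), context_features(aux, "bank"))
sense = major_sense(merged, glosses)
print(sense)
```

In this toy setup the auxiliary corpus adds domain context (for example "loans") that the test sentences alone provide only weakly, which is the intuition behind integrating the two feature sets. A real system would replace the gloss dictionary with an actual sense inventory and use a far larger document pool.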