Stemming Algorithm In Hindi Literature

A Literature Review: Stemming Algorithms for Hindi Language
Lalit Kumar, M.Tech, Department of C.S.E., B.T.K.I.T. Dwarahat, Almora, India
rastogi.lalit12@gmail.com

Abstract - Stemming is a technique for extracting the root word from a given inflected word. Stemming algorithms belong to the preprocessing step of text mining applications and play a significant role in numerous applications of Natural Language Processing (NLP). Web search engines also use stemming to strip prefixes and suffixes from derived words. Stemming also makes it possible to group similar documents together. The main purpose of stemming is to reduce the different grammatical forms of a word (its noun, adjective, verb, adverb forms, etc.) to a single root form. This…
Related Work
The performance of a stemming algorithm depends on two things: the way the stemmer produces its results (for example, lightweight stemmers [ ] and rule-based stemmers [ ]) and the resource the algorithm relies on (for example, corpus-based algorithms [1] and dictionary-based algorithms [ ]). The first published paper on stemming used a rule-based approach and was written by Julie Beth Lovins in 1968. After that, researchers began to investigate a variety of techniques for extracting the root word from a given word. In the same line of work, Ananthakrishnan Ramanathan and Durgesh D. Rao published a paper on a lightweight stemmer for Hindi [ ]; their approach is based on a predefined list of suffixes, which the authors also compiled. Another stemmer, developed by Vishal Gupta, was published as ‘A rule based stemmer for nouns’ [ ]. This stemmer uses a set of rules for stemming. Upendra Mishra and Chandra Prakash used a hybrid approach in their stemmer, named MAULIK [ ]. The hybrid approach is a combination of the brute force approach and the suffix removal approach.

2. Stemming…
The first is the algorithmic approach; the second has rapidly diverged into a separate field of research called lemmatization. In general we use the algorithmic approach, which takes no linguistic considerations such as gender or verbal tense into account. These stemming algorithms work from a set of predefined rules that are applied when certain conditions hold. Several stemming algorithms are available.

i. Brute force algorithm: The brute force algorithm uses a lookup table that already contains the root words. It is a very simple approach, but its performance depends directly on the size of the database, because it can stem only those words that the database contains.

ii. Rule based algorithm: This algorithm removes suffixes according to a set of predefined rules. Its main drawback is that the stemming rules may have to be created manually for each language. Some Hindi rules are: if the word ends with ता (taa) or ते (te), replace it with ना (naa)
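
The two algorithms above can be illustrated with a minimal Python sketch. It is not the implementation of any stemmer cited in this review. The lookup table entries, the function names, and the fallback order in `hybrid_stem` are assumptions made for illustration, and the single suffix rule assumes that the legacy-font strings "rk", "rs" and "uk" in the rule quoted above stand for ता, ते and ना.

```python
# Hypothetical lookup table for the brute force approach:
# inflected form -> root word. A real table would be far larger.
LOOKUP_TABLE = {
    "किताबें": "किताब",
    "घरों": "घर",
}

# (suffix, replacement) pairs for the rule based approach.
# Only the one rule quoted in the text is encoded here (assumed reading).
SUFFIX_RULES = [
    ("ता", "ना"),
    ("ते", "ना"),
]


def brute_force_stem(word, table=LOOKUP_TABLE):
    """Return the root from the lookup table, or None if the word is absent."""
    return table.get(word)


def rule_based_stem(word, rules=SUFFIX_RULES):
    """Apply the first matching suffix rule; return the word unchanged otherwise."""
    for suffix, replacement in rules:
        # Require a non-empty remainder so a bare suffix is not rewritten.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement
    return word


def hybrid_stem(word):
    """Try the lookup table first, then fall back to the suffix rules."""
    root = brute_force_stem(word)
    return root if root is not None else rule_based_stem(word)


if __name__ == "__main__":
    for w in ["किताबें", "खाता", "जाते", "घर"]:
        print(w, "->", hybrid_stem(w))
```

The sketch makes the trade-off visible: `brute_force_stem` returns nothing for a word missing from the table, while `rule_based_stem` handles unseen words but only for suffixes that someone has written a rule for.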
