Essay On Text Mining

Text mining is the process of extracting high-quality information from unstructured or semi-structured data. Here, "high quality" refers to a combination of relevance and novelty. Figure 2 shows the main stages of the text mining process.

Figure 2: Text mining process flow
Data Gathering
Text mining deals with unstructured or semi-structured data. The source text may be a file, a single document, or a collection of documents, gathered online or offline. It may take the form of user commands, web pages, documents, etc. In every case the data, i.e. the document or collection of documents, must be unstructured or semi-structured.
Text Preprocessing
Text preprocessing is an important task in text mining, information retrieval (IR) and Natural Language …

1. To reduce the file size of the text documents: stop words account for 20-30% of the total word count in a given document, and stemming may reduce the index size by up to 50%.
2. To improve the efficiency and effectiveness of the text mining system: stop words carry little meaning of their own, so they are not useful for mining the text, and stemming is used to match related words within a document.
The important preprocessing steps in text mining are tokenization, stop word removal and stemming.
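
The three steps named above can be illustrated with a minimal Python sketch. It is not the method of any particular text mining system: the stop word list is a tiny hypothetical sample, the tokenizer is a simple regular-expression heuristic, and `naive_stem` is a toy suffix stripper standing in for a real stemming algorithm such as Porter's. All function names are assumptions made for this illustration.

```python
import re

# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Suffixes tried by the toy stemmer, in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase word tokens with a simple regex heuristic."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip at most one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stop word removal and stemming in sequence."""
    tokens = tokenize(text)
    content_tokens = remove_stop_words(tokens)
    return [naive_stem(t) for t in content_tokens]
```

For example, `preprocess("Mining the mined documents")` returns `['min', 'min', 'document']`: the stop word "the" is dropped, and "mining" and "mined" are mapped to the same stem, which is how stemming lets related words be matched and shrinks the index.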
Tokenization
Tokenization is the process of breaking a stream of text into words, phrases, symbols and other meaningful elements, which are called tokens. The main objective of tokenization is to identify the words in a sentence. Tokenization mostly happens at the word level, but it is sometimes difficult to define what is meant by a "word". Commonly, a tokenizer relies on simple heuristics, for
