Data Stream Mining And Outlier Analysis

Data mining is a process that brings to light hidden and valuable information in data; the facts it reveals are previously unknown, potentially useful, and of high quality. It offers a means by which we can explore the knowledge held in databases. Data stream mining and outlier detection are active research areas within data mining. Outlier detection is a branch of data mining with many applications in data stream analysis, and it deserves attention from researchers. Research on data stream mining and outlier detection is thought to have greatly expanded the range of data analysis, and it is expected to have a profound impact on data mining methodologies and applications in the long run. However, there are …

… a normal or Poisson distribution. In the case of data streams, prior knowledge about the data distribution may not be available. Clustering-based approaches, such as DBSCAN and CLARANS [14], have been used for outlier detection on a variety of datasets. In these approaches, outliers are found as a byproduct of identifying clusters. The problem is that this mixes two problems instead of solving each one individually. Density-based approaches adopt a Local Outlier Factor (LOF) for outlier detection [15]. Outlier detection (also called exception mining, deviation detection, or novelty detection) is a critical issue that has attracted wide interest and many different solutions from researchers. There are many outlier detection methods in the literature and in practical use. If the data for analysis comes with labels provided by a domain expert, those labels can be used to build an outlier detection model. On this basis, methods can be divided into supervised, semi-supervised, and unsupervised methods, whereas based on their assumptions, outlier detection methods are categorized as statistical, proximity-based, and clustering-based methods.
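To make the density-based idea concrete, the following is a minimal sketch of a Local Outlier Factor computation in plain Python. It is an illustration under stated assumptions, not the method of [15] as published: the function name `lof_scores` and the parameter `k` are hypothetical, each point uses exactly its k nearest neighbours (ties broken by index), all points are assumed to be distinct, and the whole dataset is held in memory, which a true stream setting would not allow.

```python
import math


def lof_scores(points, k=2):
    """Return a simplified Local Outlier Factor score for every point.

    Assumptions (for illustration only): points are distinct tuples of
    numbers, and each point uses exactly its k nearest neighbours.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # k nearest neighbours of each point as (distance, index), nearest first.
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append(dists[:k])

    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_dist(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach_sum = sum(max(k_dist[j], d) for d, j in neighbours[i])
        lrd.append(k / reach_sum)

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for _, j in neighbours[i]) / k
        for i in range(n)
    ]


# Four points on a unit square plus one far-away point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(sample, k=2)
```

Scores close to 1 indicate a point whose local density matches that of its neighbours; scores well above 1 indicate a point in a sparser region, which is a candidate outlier. In this sample the isolated point (5, 5) receives the highest score.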

Disadvantage
Model-based approaches require building a model, which is often an expensive and difficult task that needs the expertise of a domain expert.
4.2 Connectedness [17]
In application domains where objects are linked (social networks, biological networks), objects with fewer links are considered potential anomalies.
Disadvantage
Connectedness approaches are only defined for datasets with linkage information.
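The connectedness idea can be sketched as a simple link count over an undirected graph. This is only an illustrative sketch, not the procedure of [17]: the function name `low_degree_nodes`, the threshold parameter `max_links`, and the edge-list input format are all assumptions introduced here.

```python
from collections import defaultdict


def low_degree_nodes(edges, nodes=(), max_links=1):
    """Return, sorted, the nodes that have at most `max_links` links.

    `edges` is an iterable of (u, v) pairs of an undirected graph.
    `nodes` optionally lists nodes that may have no links at all.
    Both the input format and the threshold are illustrative assumptions.
    """
    degree = defaultdict(int)
    for node in nodes:
        degree[node] += 0  # register isolated nodes with zero links
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d <= max_links)


# "d" hangs off the triangle a-b-c by a single link; "e" has no links.
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
suspects = low_degree_nodes(links, nodes=["e"], max_links=1)
```

Here `suspects` contains the weakly linked nodes "d" and "e". The sketch also makes the stated disadvantage visible: without the `links` input there is nothing to count, so the approach does not apply to data that carries no linkage information.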
4.3 Density-Based [18]
Objects in low-density regions of space are marked and treated as …
