Abstract- Outlier detection is an active area of research in the data mining community. Finding outliers in a collection of patterns is a well-known problem in data mining. As a branch of data mining, outlier detection has many applications in data stream analysis and requires more attention. An outlier is a pattern that is dissimilar to the rest of the patterns in the data set. Detecting outliers and analyzing large data sets can lead to the discovery of unexpected knowledge in areas such as fraud detection, telecommunications, web logs and web documents. This paper aims to clarify the problem of detecting outliers over data streams and to describe specific techniques used for detecting outliers in streaming data.

INTRODUCTION

Data mining extracts hidden and useful information from data, discovering knowledge that is valid, previously unknown, useful and of high quality. Outlier detection is an important data mining task with many applications, and it deserves more attention from the data mining community. It is also an important part of data pre-processing, a stage required when processing and mining data from many application fields, such as industrial processes, transportation, ecology, public safety and climatology. Outliers are data that can be considered anomalous for several reasons. Outlier detection techniques are used, for instance, to minimize the influence of outliers on the final model, or as a preliminary pre-processing stage before the information conveyed by a signal is processed. On the other hand, in …

Statistical Outlier Detection

Statistical outlier detection assumes that all data points have been generated by some statistical distribution and estimates the parameters of that distribution. In this approach, outliers are points that have a low probability of being generated by the overall distribution. Statistical outlier detection is also known as the parametric approach: the detection model is formulated by fitting the available data points to a reference distribution. Yamanishi et al. [1] proposed a Gaussian mixture model in which each data point is assigned a score, and points with a high score are declared outliers. The authors of [2] proposed detecting outliers based on the general pattern within the data points, combining a Gaussian mixture model with a supervised method.

Depth-based outlier detection [3] is a variant of statistical outlier detection. It searches for outliers at the border of the data space but is independent of statistical distributions. These techniques are generally suited to quantitative real-valued data sets or quantitative ordinal data distributions. In this approach, each data object is represented as a point in an n-dimensional space and is assigned a depth. The points are organized into convex hull layers according to their depth, and outliers are identified as the points with shallow depth values. Outliers are …
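Both ideas in this section can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of [1], [2] or [3]. The parametric part fits a single 2-D Gaussian (a simplification, not the Gaussian mixture model of Yamanishi et al.) and scores each point by its squared Mahalanobis distance, which ranks points the same way as their negative log-density, so a high score means a low probability. The depth part computes convex hull peeling depth in two dimensions, one classical notion of depth. The function names, the 0.99 quantile and the toy data are hypothetical choices made for illustration.

```python
import math

# Illustrative sketch only; all function names here are hypothetical.


def gaussian_scores(points):
    """Squared Mahalanobis distance of each 2-D point under one fitted Gaussian."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Maximum-likelihood covariance estimates (divide by n).
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance is singular (points are collinear)")
    scores = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T Sigma^-1 d using the closed-form inverse of a 2x2 matrix.
        scores.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores


def gaussian_outliers(points, p=0.99):
    """Indices whose score exceeds the p-quantile of chi-square with 2 degrees of freedom."""
    # With 2 degrees of freedom the chi-square CDF is 1 - exp(-t/2),
    # so the quantile has the closed form -2 * ln(1 - p).
    threshold = -2.0 * math.log(1.0 - p)
    return [i for i, s in enumerate(gaussian_scores(points)) if s > threshold]


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_members(points):
    """Distinct points on the convex hull boundary, collinear boundary points included."""
    uniq = sorted(set(points))
    if len(uniq) <= 2:
        return set(uniq)

    def chain(seq):
        # Monotone-chain pass; popping only on clockwise turns keeps collinear points.
        out = []
        for p in seq:
            while len(out) >= 2 and _cross(out[-2], out[-1], p) < 0:
                out.pop()
            out.append(p)
        return out

    # Lower chain (left to right) plus upper chain (right to left).
    return set(chain(uniq)) | set(chain(reversed(uniq)))


def peeling_depth(points):
    """Convex hull peeling depth: 1 for the outermost layer, 2 for the next, and so on."""
    remaining = set(points)
    depth = {}
    layer = 1
    while remaining:
        hull = _hull_members(remaining)
        for p in hull:
            depth[p] = layer
        remaining -= hull  # peel the current layer and repeat on what is left
        layer += 1
    return [depth[p] for p in points]


if __name__ == "__main__":
    # Hypothetical toy data: a 5x5 grid of inliers plus one distant point (index 25).
    data = [(x, y) for x in range(5) for y in range(5)] + [(20, 20)]
    print(gaussian_outliers(data))  # [25]
    print(peeling_depth(data)[25])  # 1
```

On this toy data only the distant point exceeds the chi-square threshold. Peeling also gives it depth 1, but the nine grid points on the lower and left borders share that outermost layer: shallow depth marks candidates at the border of the data space, so a depth cutoff alone can flag legitimate boundary points too. Fitting the mean and covariance on data that still contains the outliers inflates the covariance, so with small samples an outlier can mask itself; robust estimators are one common remedy.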
