Abstract- Outlier detection is an active research area in the data mining community. Finding outliers in a collection of patterns is a well-known problem in data mining. Outlier detection, as a branch of data mining, has many applications in data stream analysis and deserves more attention. An outlier is a pattern that is dissimilar with respect to the rest of the patterns in the data set. Detecting outliers while analyzing large data sets can lead to the discovery of unexpected knowledge in areas such as fraud detection, telecommunications, web logs, and web documents.
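As a concrete illustration of the dissimilarity-based definition above, the following minimal sketch flags values lying far from the sample mean; the threshold, the toy readings, and the use of a simple z-score rule are assumptions chosen only for illustration and are not part of the study itself.

import statistics

def zscore_outliers(values, threshold=3.0):
    # Return values lying more than threshold standard deviations from
    # the mean, a simple distance-based notion of "dissimilar".
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Toy example: one obviously dissimilar pattern among routine values.
readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(zscore_outliers(readings, threshold=2.0))  # -> [95]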
In this case it can be differentiated into an independent file format (represented by a raw dd image) and a specific, vendor-based file format (represented by the E01 format from EnCase). The second type is the digital evidence generated from the live acquisition process, represented by the pcap extension as the output of the live data capture performed by the Wireshark application, and the third is digital evidence in the form of multimedia files (audio, video, image, text). Meanwhile, forensic analysis will involve an enormous amount of metadata generated from various types of user and system activities. However, this study is limited to metadata information directly related to the management of digital evidence for the chain of custody. A simple test was conducted to find out whether the proposed Pseudo Metadata concept fulfils the digital chain of custody requirements.
Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
B.2 Introduction
The growing popularity and development of data mining technologies bring serious threats to the security of individuals' sensitive information.
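To make the steps listed above concrete, the short sketch below runs a minimal preprocess-model-evaluate loop with scikit-learn; the choice of the Iris data set, the scaling step, and the decision-tree model are assumptions made purely for illustration and are not prescribed by the text.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Data preprocessing and modelling bundled in one pipeline,
# followed by a simple evaluation step on held-out data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
model.fit(X_train, y_train)

print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))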
A. Group Assignment
a. Discuss the two data mining methodologies
The process of going through massive sets of data looking for unsuspected patterns that can provide us with advantageous information is known as data mining. Data mining makes it possible to predict future events or to group populations of people with similar characteristics. The Cross Industry Standard Process for Data Mining (CRISP-DM) is a six-phase model of the entire data mining process (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment) which is commonly used across industries for a wide array of data mining projects and provides a structured approach to planning a data mining project.
Storing and using large data is not an issue, but extracting the appropriate information from that data is quite a difficult job. The analysis of the collected data is made possible by many data mining techniques. In data mining we find relations and patterns between sets of items in large relational databases, which can help in predicting and improving the performance of a system. Relations between the data are found with a well-known approach, association rule mining. Many association rules are found that relate the dependency of data items on each other.
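To show what such rules look like in practice, the following minimal sketch computes support and confidence for candidate rules over a toy transaction list; the transactions, the thresholds, and the restriction to single-item antecedents and consequents are illustrative assumptions, not the full Apriori algorithm.

from itertools import permutations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    # Fraction of transactions containing every item in itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate one-item -> one-item rules and keep the strong ones.
items = set().union(*transactions)
for a, b in permutations(items, 2):
    sup = support({a, b})
    conf = sup / support({a}) if support({a}) else 0.0
    if sup >= 0.4 and conf >= 0.7:
        print(f"{a} -> {b}  support={sup:.2f}  confidence={conf:.2f}")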
Sample Extraction
The second step involves sample extraction, i.e., features that are extracted from both the original and the stego images. A feature that can be extracted and used to solve our problem is the Huffman coding table. To extract this kind of information from images the program JPEGSnoop can be used, which works on extended information in image, video and text files. JPEGSnoop is able to extract information such as:
• Quality of the image
• EXIF information
• RGB histogram
• Huffman coding tables
Huffman coding was designed by David Huffman in 1952. It has two properties: it produces a code of minimal length, and it is a prefix code and is therefore uniquely decodable.
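Since Huffman coding is central to the feature described above, the following self-contained sketch builds a Huffman code for a toy string and prints the resulting prefix codes; the input text is an illustrative assumption, and the sketch is not tied to the JPEG-specific tables that JPEGSnoop extracts.

import heapq
from collections import Counter

def huffman_codes(text):
    # Build a prefix code of minimal expected length for the given text.
    freq = Counter(text)
    # Each heap entry: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w1, i1, c1 = heapq.heappop(heap)
        w2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, min(i1, i2), merged))
    return heap[0][2]

codes = huffman_codes("abracadabra")
for sym in sorted(codes):
    print(sym, codes[sym])

Because it is a prefix code, no codeword is the prefix of another, which is what makes the encoded bit stream uniquely decodable.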
The disadvantage is that the reconstructed fingerprint tends to contain spurious minutiae in high-curvature regions. This paper proposes a technique which directly reconstructs a grayscale image from the minutiae template. Its major disadvantages are that many spurious minutiae appear and that only a partial fingerprint can be generated. This paper explains an algorithm which can be used to reconstruct the skeleton image from the minutiae template, which is then converted to a grayscale image. This method also generates many spurious minutiae. This paper describes a fingerprint synthesis technique that is based on the 2D FM model.
This analysis is also used to assist the user in constructing test data by describing the subset of the input domain that causes a given path to be executed (a small illustrative sketch follows at the end of this passage). Issues related to this symbolic evaluation might be:
• The output may be extremely complex and hard to recognize manually as a proper formula.
• Evaluation might become complex because a program variable (such as an array element) may have another variable embedded in it.
• It might also be difficult to verify the correctness of the evaluator.
Structured walkthroughs are another technique used for testing. Walkthroughs involve:
• Certain classes of inputs are selected and then the actions of the system for that class of data are traced through the design.
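The sketch below illustrates the idea behind symbolic evaluation for test-data construction: each execution path of a tiny function is described by a path condition over the inputs, and one concrete input is chosen per condition. The toy function, the hand-written conditions, and the chosen inputs are assumptions for illustration only; a real symbolic evaluator derives such conditions automatically.

def classify(x, y):
    # Tiny program with two branches; each execution path is determined
    # by a condition over the inputs (the "path condition").
    if x > 0:
        if y > x:
            return "A"      # path condition: x > 0 and y > x
        return "B"          # path condition: x > 0 and y <= x
    return "C"              # path condition: x <= 0

# Each path condition describes the subset of the input domain that
# drives execution down that path.
paths = {
    "A": lambda x, y: x > 0 and y > x,
    "B": lambda x, y: x > 0 and y <= x,
    "C": lambda x, y: x <= 0,
}

# Constructing test data: pick one concrete input satisfying each condition.
tests = {"A": (1, 2), "B": (3, 1), "C": (-1, 0)}
for label, (x, y) in tests.items():
    assert paths[label](x, y) and classify(x, y) == label
    print(f"path {label}: input {(x, y)} satisfies its path condition")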
Association Rule Learning (dependency modeling) is a method that describes associated features in the data, searching for relationships between variables. As an example, Web pages that are accessed together can be identified by association analysis. Anomaly Detection (outlier/change/deviation detection) identifies anomalous or outlier data records which cause errors, or which might be of interest and require further investigation. Another class is Clustering, which is the task of discovering groups and structures in the data that are in some respect "similar" or "dissimilar", without using known structures in the data. The last class, Summarization, attempts to provide a more compact representation of the data set, including visualization and report generation.
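As a small illustration of the clustering task just described, the sketch below groups a handful of two-dimensional points with k-means; the toy points, the choice of two clusters, and the use of scikit-learn are assumptions made for the example only.

import numpy as np
from sklearn.cluster import KMeans

# Two visually separated groups of points, with no labels supplied.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [7.9, 8.1], [8.1, 7.8]])

# Discover group structure without using any known labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_)
print("cluster centres:", km.cluster_centers_)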
4. Data mining: This involves analyzing the dataset and extracting data patterns using various data mining algorithms such as classification, regression, association and clustering.
5. Pattern evaluation and knowledge discovery: A systematic determination of the truly interesting patterns representing knowledge is done using criteria governed by a set of standards, as sketched below.
6.
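To make the pattern evaluation step concrete, the following minimal sketch scores a candidate association rule with support, confidence, and lift; the transactions and the chosen rule are illustrative assumptions, and lift is used here simply as one common interestingness criterion.

transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"butter"},
    {"butter", "eggs"},
]

def support(itemset):
    # Fraction of transactions containing every item in itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Evaluate the candidate rule {bread} -> {milk}.
antecedent, consequent = {"bread"}, {"milk"}
sup = support(antecedent | consequent)
conf = sup / support(antecedent)
lift = conf / support(consequent)

# lift > 1 suggests the rule is more interesting than chance co-occurrence.
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")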