Introduction To Cluster Analysis


Cluster analysis is an exploratory analysis that tries to identify structures within the data. It is also called segmentation analysis or taxonomy analysis. More specifically, it tries to identify homogeneous groups of cases (observations, participants, or respondents) when the grouping is not known in advance. Because it is exploratory, it does not make any distinction between dependent and independent variables. The different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale (interval or ratio) data.

Cluster analysis is often one step in a sequence of analyses: factor analysis first, then cluster analysis, and finally discriminant analysis.

It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and in how they find clusters efficiently. SPSS offers three methods for cluster analysis: K-Means Cluster, Hierarchical Cluster, and Two-Step …

With K-means cluster, the researcher must define the number of clusters in advance. This is useful for testing different models with different assumed numbers of clusters (for example, in customer segmentation). Hierarchical cluster is the most common method. It takes time to calculate, but it generates a series of models with cluster solutions from 1 (all cases in one cluster) to n (each case is its own cluster). Hierarchical cluster also works with variables as opposed to cases; it can cluster variables together in a manner somewhat similar to factor analysis. In addition, hierarchical cluster analysis can handle nominal, ordinal, and scale data; however, it is not recommended to mix different levels of measurement.
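The contrast between the two methods can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS implementation: the function names, the toy data, the choice of seeding k-means with the first k cases, and the use of average linkage with Euclidean distance are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean(points):
    """Component-wise mean (centroid) of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iters=100):
    """Plain k-means; the analyst fixes k before the run."""
    centers = list(points[:k])  # assumption: the first k cases seed the centers
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers


def hierarchical(points):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns {number_of_clusters: clusters}, one entry for every solution
    from n clusters down to 1; each cluster is a list of case indices.
    """
    clusters = [[i] for i in range(len(points))]
    solutions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
        solutions[len(clusters)] = [c[:] for c in clusters]
    return solutions


# Hypothetical toy data: six cases measured on two scale variables.
points = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
          (1.2, 0.8), (5.2, 4.9), (9.3, 1.2)]

labels, centers = kmeans(points, k=3)  # k chosen in advance
solutions = hierarchical(points)       # every solution from 6 clusters to 1
print(labels)
print(solutions[3])
```

K-means returns one solution for the k that was requested, so comparing models means rerunning it with different values of k. The hierarchical routine runs once and stores every solution, but it compares all pairs of clusters at each step, which is why it becomes slow as the number of cases grows.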

Two-step cluster analysis is more of a tool than a single analysis. It identifies the groupings by running a pre-clustering step first and then applying hierarchical methods to the pre-clusters. Because it uses a quick cluster algorithm up front, it can handle large data sets that would take a long time to compute with hierarchical cluster methods. In this respect, it combines the best of both approaches. Two-step clustering can also handle scale and ordinal data in the same model, and it automatically selects the number of clusters, a task normally assigned to the researcher in the two other …
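The two-stage idea can be illustrated with a short Python sketch. It is a simplified stand-in and not SPSS's actual procedure: the single-pass distance threshold used for pre-clustering, the centroid linkage used in the second stage, the function names, and the toy data are all assumptions. Automatic selection of the number of clusters is left out, so the target number k is passed in by hand.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def precluster(points, threshold):
    """Stage 1: one pass over the cases.

    Each case joins the nearest existing pre-cluster if its centroid lies
    within `threshold` (a hypothetical tuning value); otherwise the case
    opens a new pre-cluster. Groups hold case indices.
    """
    groups = []
    for i, p in enumerate(points):
        best, best_d = None, threshold
        for g in groups:
            d = dist(p, centroid([points[j] for j in g]))
            if d <= best_d:
                best, best_d = g, d
        if best is None:
            groups.append([i])
        else:
            best.append(i)
    return groups


def merge_preclusters(points, groups, k):
    """Stage 2: agglomerate the pre-clusters until k clusters remain.

    Only pre-cluster centroids are compared, so the cost depends on the
    number of pre-clusters rather than the number of cases.
    """
    groups = [g[:] for g in groups]  # do not modify the caller's lists
    while len(groups) > k:
        cents = [centroid([points[j] for j in g]) for g in groups]
        pairs = [(dist(cents[a], cents[b]), a, b)
                 for a in range(len(groups))
                 for b in range(a + 1, len(groups))]
        _, a, b = min(pairs)  # closest pair of pre-clusters
        groups[a] += groups[b]
        del groups[b]
    return groups


# Hypothetical toy data: eight cases on two scale variables.
points = [(1.0, 1.0), (1.1, 1.0), (2.0, 1.2), (2.1, 1.1),
          (8.0, 8.0), (8.1, 8.2), (9.0, 8.1), (9.2, 8.0)]

pre = precluster(points, threshold=0.5)
final = merge_preclusters(points, pre, k=2)
print(pre)
print(final)
```

The saving comes from the second stage: the expensive pairwise comparison runs over a handful of pre-clusters instead of over every case, which is the reason a two-stage approach scales to data sets that plain hierarchical clustering handles slowly.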
