Multidimensional Scaling and Cluster Analysis - Cluster Analysis


All scientific fields have the need to cluster or group similar objects. Botanists group plants, historians group events, and chemists group elements and phenomena. It should be no surprise that when marketing managers attempt to become more scientific they should find a need for procedures that will group objects. Actually, the practical applications in marketing for cluster analysis are far too numerous to describe; however, it is possible to suggest by example the scope of this basic technique.
One goal of marketing managers is to identify similar segments so that marketing programs can be developed and tailored to each segment. Thus, it is useful to cluster customers. We might cluster them on the basis of the product benefits they seek. Thus, students could be grouped on the basis of the benefits they seek from a college. We might group customers by their lifestyles. The result could be a group that likes outdoor activities, another that enjoys entertainment, and a third that is into cooking and gardening. Each segment may have distinct product needs and may respond differently to advertising approaches.
We might want to cluster brands or products to determine which brands are regarded as similar and therefore competitive. Brands or products also might be grouped with respect to usage. If two brands or products are found to be bought by the same group of people, a tie-in promotion might be possible.
If a test-market experiment is planned, it might be useful to identify similar cities so that different marketing programs could be compared by trying them in different cities. To identify similar cities we might cluster them on the basis of variables that could contaminate the test, such as size or ethnic composition.
In marketing media decisions, it often is helpful to know which media have similar audiences and which appeal to different audiences.


An Example
A very simple clustering approach, termed quick clustering, will serve to introduce many of the concepts of cluster analysis.7 In a study of gasoline brands, respondents were asked to rate each of 11 brands along a scale from seven (very favorable) to one (unfavorable). Table 18-1 shows the correlations between the brands.
In quick clustering, the highest entry (whether positive or negative) in each column is circled. Then the highest number in all the columns is noted (0.64), and the two brands involved, the independents Martin and Owens, form the first cluster. If another brand in either the Martin row or the Owens row has a circled coefficient, it is added to this first cluster. Clark, also an independent but one that sells only premium gasoline, qualifies (Owens-Martin-Clark).
The procedure then repeats itself with Owens, Martin, and Clark omitted from the analysis. The highest remaining circled correlation is then 0.48, which pairs Phillips and D-X. In the Phillips row, we find another circled number representing the Shell-Phillips correlation. Thus our second cluster is D-X, Phillips, and Shell. Proceeding in this manner, the two remaining clusters of Mobil-Skelly-Texaco and Standard-Gulf are developed. Thus, four clusters emerge from the process.
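To make the mechanics concrete, the sketch below implements the quick-clustering logic just described. The function name and structure are our own, and the routine assumes the Table 18-1 correlations are supplied as a NumPy array; it is an illustration of the procedure, not the original program.

```python
import numpy as np

def quick_cluster(corr, labels):
    """Greedy "quick clustering" of objects from a similarity matrix.

    corr   : symmetric (n x n) array of pairwise correlations
    labels : names of the n objects
    """
    corr = np.asarray(corr, dtype=float).copy()
    np.fill_diagonal(corr, -np.inf)                # ignore self-correlations
    remaining = set(range(len(labels)))
    clusters = []
    while len(remaining) > 1:
        idx = sorted(remaining)
        sub = corr[np.ix_(idx, idx)]
        # "Circle" the highest entry in each remaining column.
        circled = {idx[j]: idx[int(np.argmax(sub[:, j]))]
                   for j in range(len(idx))}
        # Seed a cluster with the highest remaining pair overall.
        i, j = np.unravel_index(int(np.argmax(sub)), sub.shape)
        cluster = {idx[i], idx[j]}
        # Add any object whose circled coefficient points into the cluster.
        changed = True
        while changed:
            changed = False
            for obj, partner in circled.items():
                if obj not in cluster and partner in cluster:
                    cluster.add(obj)
                    changed = True
        clusters.append([labels[k] for k in sorted(cluster)])
        remaining -= cluster
    if remaining:                                  # leftover singleton, if any
        clusters.append([labels[k] for k in remaining])
    return clusters
```

Applied to the Table 18-1 correlations, this routine should trace out the same four groupings described above: Owens-Martin-Clark, D-X-Phillips-Shell, Mobil-Skelly-Texaco, and Standard-Gulf.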
Figure 18-5 portrays graphically the quick-clustering process. It shows the merging of the brands and the correlation level at which the merging took place. Such a figure is useful because it provides some feel for the quality of the clustering, that is, the degree to which the brands within the cluster hang together. A simple listing of the four clusters would not provide any indication that Owens and Martin are much more closely linked to each other than to Clark. Figure 18-5 also provides other diagnostic information, as the following discussion will make clear.
Cluster analysis is based on some measure of the similarity or proximity of two objects. The clustering procedure itself must be guided by a criterion by which clusters are selected. Given a criterion, one of several methods to select clusters can be used. Each of these elements of cluster analysis will be considered next.


Measures of Similarity
In Table 18-1 the measure of similarity was the familiar correlation coefficient. Actually, any measure that reflects similarity can provide the basis for a clustering program. Clearly, the clustering will be only as good as the underlying measure.8

8A common measure of similarity is the simple straight-line (or Euclidean) distance. Symbolically, the distance between objects i and j measured on n variables is

d_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}

where x_{ik} is the value of object i on variable k.
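As a quick illustration of this distance measure (the values here are made up):

```python
import numpy as np

def euclidean_distance(x_i, x_j):
    """Straight-line distance between two objects rated on the same variables."""
    x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)
    return np.sqrt(np.sum((x_i - x_j) ** 2))

# Two objects rated on three attributes (illustrative numbers only)
print(euclidean_distance([7, 3, 5], [4, 4, 6]))  # sqrt(9 + 1 + 1), about 3.317
```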

Clustering Criterion

In the quick-clustering example, the clustering was based on the pairwise correlations. More typically, there exists some overall measure of the quality of the clustering that guides the program. It could be the average measure of similarity within the clusters. Thus, in the Owens-Martin-Clark cluster, it would be the average of the three involved correlations: 0.64, 0.36, and 0.30. Another measure might be the average similarity within clusters divided by some measure of the average similarity between objects in different clusters. The clustering program attempts to find sets of clusters that yield a high value of the clustering criterion.
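A sketch of such a criterion, with names of our own choosing and assuming the similarities are held in a matrix:

```python
import itertools
import numpy as np

def cluster_criterion(corr, clusters):
    """Average within-cluster similarity, and that average divided by the
    average between-cluster similarity (a higher ratio means tighter clusters)."""
    within = [corr[i, j]
              for c in clusters for i, j in itertools.combinations(c, 2)]
    between = [corr[i, j]
               for a, b in itertools.combinations(clusters, 2)
               for i in a for j in b]
    return np.mean(within), np.mean(within) / np.mean(between)
```

For the Owens-Martin-Clark cluster alone, the within-cluster part of this measure is (0.64 + 0.36 + 0.30) / 3, or about 0.43.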


Clustering Method
There are two approaches to clustering, a hierarchical approach and a nonhierarchical approach. Hierarchical clustering can start with all objects in one cluster and divide and subdivide them until all objects are in their own single-object clusters. This is called the "top-down" approach. The "bottom-up" approach (as illustrated by quick clustering), in contrast, starts with each object in its own (single-object) cluster and systematically combines clusters until all objects are in one cluster. In either case, the result is an elegant hierarchical arrangement of clusters, as shown at the top of Figure 18-6. Once an object is associated with another object in a cluster, it remains clustered with that object. The quick-clustering approach produced a hierarchical structure, although it was incomplete because it stopped at four clusters and had no mechanism to combine the final four. The reader should attempt to think of such a mechanism.
A nonhierarchical clustering program will differ only in that it will permit objects to leave one cluster and join another as clusters are being formed, if the clustering criterion will be improved by doing so. The bottom half of Figure 18-6 shows the progress of a nonhierarchical clustering program. Note that object 2, for example, is in a cluster with object 1 when there are five clusters, but when there are only four, it is combined with object 3.
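As a sketch of how the two approaches look in practice, the fragment below runs one of each on a hypothetical data matrix. SciPy's average-linkage routine and scikit-learn's KMeans are our choices for illustration; the chapter does not name particular programs, and KMeans is simply one common nonhierarchical method in which objects are reassigned as the clusters take shape.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(11, 4))   # 11 hypothetical brands rated on 4 attributes

# Hierarchical ("bottom-up"): every brand starts in its own cluster,
# and clusters are merged step by step, never broken apart.
Z = linkage(X, method="average")
hier_labels = fcluster(Z, t=4, criterion="maxclust")   # stop at 4 clusters

# Nonhierarchical: objects may leave one cluster and join another
# whenever doing so improves the criterion (here, within-cluster variance).
km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
```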
Each approach has advantages. Hierarchical clustering is relatively easy to read and interpret. The output has the logical structure that theoretically always should exist. Its disadvantage is that it is relatively unstable and unreliable. The first combination or separation of objects, which may be based on a small difference in the criterion, will constrain the rest of the analysis. In the quick-clustering example, the second cluster was based on the 0.48 correlation between Phillips and D-X. Yet, if the correlation between Phillips and Mobil had been only slightly higher, they would have formed the second cluster and the whole analysis would have been different. In doing hierarchical clustering, it is sound practice to split the sample into at least two groups and to do two independent clustering runs to see whether similar clusters emerge in both. If the runs are entirely different, there is obvious cause for caution.
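The split-sample check lends itself to a short sketch. Here we assume the raw data are a respondents-by-brands ratings matrix like the one behind Table 18-1; the function name, and the use of the adjusted Rand index to compare the two partitions, are our own choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

def split_half_check(ratings, n_clusters=4, seed=0):
    """Cluster the brands twice, once per random half of the respondents,
    and report how closely the two brand partitions agree (1.0 = identical)."""
    rng = np.random.default_rng(seed)
    halves = np.array_split(rng.permutation(ratings.shape[0]), 2)
    labelings = []
    for half in halves:
        corr = np.corrcoef(ratings[half].T)      # brand-by-brand correlations
        dist = 1.0 - corr                        # turn similarity into distance
        condensed = dist[np.triu_indices_from(dist, k=1)]
        Z = linkage(condensed, method="average")
        labelings.append(fcluster(Z, t=n_clusters, criterion="maxclust"))
    return adjusted_rand_score(labelings[0], labelings[1])
```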
The advantage of nonhierarchical clustering is that it tends to be more reliable; that is, split-sample runs will tend to look more similar than they do with hierarchical clustering. If the program makes a close decision early in the analysis that subsequently proves to be wrong with respect to the clustering criterion, the decision can be remedied by moving objects from cluster to cluster. The major disadvantage is that the resulting series of clusters, as illustrated in the bottom half of Figure 18-6, is usually a mess and very difficult to interpret. The messy look is sometimes good in that the analyst does not get any false sense of order when none exists, but the fact remains that the output can be very difficult to work with.

Number of Clusters

A central question in cluster analysis is the determination of the appropriate number of clusters. There are several possible approaches. First, the analyst can specify the number of clusters in advance, perhaps because theoretical and logical reasons make that number known, or for practical reasons that derive from the planned use of the clusters. Second, the analyst can specify the level of clustering with respect to the clustering criterion. If the criterion is easily interpretable, such as the average within-cluster similarity, it might be reasonable to establish a certain level that would dictate the number of clusters.
A third approach is to determine the number of clusters from the pattern of clusters generated by the program. For example, we might look at Figure 18-5 and conclude that there are ten clusters, with only Owens and Martin being combined. Or the correlation of 0.4, which specifies six clusters, might seem like a logical place to break the analysis. The analyst, in looking at the cluster pattern, might look for clusters that are stable over a relatively large range of the clustering criterion.
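One way to look for that kind of stability is to cut a hierarchical solution at a series of levels and watch where the cluster count plateaus. A sketch, again on hypothetical data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = rng.normal(size=(11, 4))          # hypothetical brand profiles
Z = linkage(X, method="average")

# Sweep the cutoff level; a cluster count that persists over a wide
# range of cutoffs is a candidate "natural" number of clusters.
for t in np.linspace(Z[:, 2].min(), Z[:, 2].max(), 8):
    k = fcluster(Z, t=t, criterion="distance").max()
    print(f"cutoff {t:.2f}: {k} clusters")
```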
Whatever approach is used, it usually is useful to look at the total cluster pattern, such as those illustrated in Figures 18-5 and 18-6. They can provide a feel for the quality of the clustering and for the number of clusters that emerge at various levels of the clustering criterion. Usually more than one clustering level is relevant.


Cluster Analysis—A Summary
Application Cluster analysis is used to group variables, objects, or people. For example, people can be grouped into segments.


Input The input is any valid measure of similarity between objects, such as correlations. It also is possible to input the number of clusters or the level of clustering.

Output The output is a grouping of objects into clusters. Usually a series of such groupings is provided, such as those portrayed in Figures 18-5 and 18-6. Associated with each set of clusters will be the value of the clustering criterion. Some programs also output diagnostic information associated with each object. For example, the distances from each object to the center of its cluster and to the center of the next closest cluster may be provided. This information can help determine in more depth the cohesion of each cluster and the strength of association between an object and its cluster.
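These object-level diagnostics are easy to compute when the objects sit in a measurement space. A sketch, assuming a data matrix X and an array of cluster labels (the function name is ours):

```python
import numpy as np

def center_distances(X, labels):
    """For each object, return the distance to its own cluster center
    and to the center of the next closest cluster."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    centers = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
    result = []
    for x, own_label in zip(X, labels):
        d = {c: np.linalg.norm(x - ctr) for c, ctr in centers.items()}
        own = d.pop(own_label)
        result.append((own, min(d.values())))   # assumes at least 2 clusters
    return result
```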


Key Assumptions The most important assumption is that the basic measure of similarity on which the clustering is based is a valid measure of the similarity between objects. A second major assumption is that there is theoretical justification for structuring the objects into clusters. As with other multivariate techniques, there should be theory and logic guiding and underlying cluster analysis.

Limitations It is usually difficult to evaluate the quality of the clustering. There are no standard statistical tests to ensure that the output does not represent pure randomness. The value of the criterion measure, the reasonableness of the output, the appearance of a natural hierarchy (when a nonhierarchical method is used), and the split-sample reliability tests all provide useful information. However, it is still difficult to know exactly which clusters are very similar and which objects are difficult to assign. It is usually difficult to select a clustering criterion and program on any basis other than availability.

