15.4 Hierarchical clustering

We can visualize clusters calculated using hierarchical methods using dendrograms. These are awesome tree-based visualizations, similar to the visualizations created for decision tree and random forest models (leaves, nodes, stems, roots).

PMA6 Figure 16.7
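
As a minimal sketch of how such a dendrogram is produced, the following Python code (scipy and matplotlib are an assumed toolchain here, and the data is made up) computes an agglomerative merge history and draws the tree:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))       # 10 observations, p = 2 (toy data)

Z = linkage(X, method="complete")  # agglomerative merge history
dendrogram(Z)                      # draw the tree
plt.ylabel("height (merge distance)")
plt.show()
```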

There are two main approaches to linking records into clusters:

Agglomerative is a bottom-up approach. This begins with \(N\) clusters (each observation is its own cluster), and at each step the two clusters that are the most similar are combined, until all observations are in a single cluster. Good at identifying small clusters. AKA AGNES (agglomerative nesting); a bare-bones sketch of this loop appears below.

Divisive is a top-down approach. This begins with one cluster containing all the observations, and splits off the cases that are most dissimilar to the rest until each observation is in its own cluster. Good at identifying large clusters. Not that common, so it will not be discussed further. AKA DIANA (divisive analysis).
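
To make the bottom-up process concrete, here is a bare-bones Python sketch of the agglomerative loop, using single linkage as the merge rule. This is a naive illustration of the logic, not how library implementations actually work:

```python
import numpy as np
from scipy.spatial.distance import cdist

def agnes_demo(X):
    """Start with every observation as its own cluster, then repeatedly
    merge the two closest clusters until only one cluster remains."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: closest pair of points across clusters
                d = cdist(X[clusters[a]], X[clusters[b]]).min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        print(f"merge {clusters[a]} + {clusters[b]} at height {d:.2f}")
        clusters[a] += clusters.pop(b)

X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [3.2, 2.9]])  # toy data
agnes_demo(X)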

15.4.1 Linkages

At each step in agglomerative clustering, the two clusters with the smallest linkage distance between them are combined. This linkage distance can be computed using several linkage methods.

Centroid linkage (PMA6 Fig 16.5).

  • The closest two records are combined, and their centroid is calculated. The centroid is the length-\(p\) vector of variable means; for our 2-dimensional example in the last section this would be \((\bar{p}, \bar{q})\).
    • This centroid point/vector takes the place of the original records in the data set.
  • Then the next closest set of points are combined, and their centroid calculated.
  • The centroid need not be an actual measured observation in the data set.
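
A tiny numeric sketch of a centroid merge (the two records are made up):

```python
import numpy as np

a = np.array([1.0, 4.0])   # first record  (hypothetical values)
b = np.array([2.0, 6.0])   # second record (hypothetical values)

# The centroid is the element-wise mean: a vector of length p.
centroid = (a + b) / 2     # array([1.5, 5.0])

# (1.5, 5.0) replaces the two original records in later steps, and
# need not coincide with any actual observation in the data.
```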

Single Linkage

  • At each step, the two clusters that contain the closest pair of points (where the two points are not yet in the same cluster as each other) are combined.
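
As a sketch, the single-linkage distance between two clusters is the minimum over all cross-cluster pairwise distances (toy clusters below):

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [0.0, 1.0]])   # cluster A (toy points)
B = np.array([[2.0, 0.0], [5.0, 5.0]])   # cluster B (toy points)

# Single linkage: the closest pair of points, one from each cluster.
d_single = cdist(A, B).min()             # 2.0 here
```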

Complete Linkage

  • Calculate all pairwise distances between points in each cluster.
  • The distance between clusters is the maximum value of these pairwise distances, i.e. the pair of points, one from each cluster, that are furthest away from each other.
  • At each step, the two clusters that are separated by the shortest maximum distance are combined.
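
A matching sketch for complete linkage, which takes the maximum of the cross-cluster pairwise distances (same toy clusters as above):

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [0.0, 1.0]])   # cluster A (toy points)
B = np.array([[2.0, 0.0], [5.0, 5.0]])   # cluster B (toy points)

# Complete linkage: the furthest pair of points, one from each cluster.
d_complete = cdist(A, B).max()           # sqrt(50) ~ 7.07 here
```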

Ward’s D

  • Minimizes the within-cluster variance.
  • Within-cluster variance is calculated as the weighted squared distance between cluster centers.
    • For the initial step, this is the squared Euclidean distance between all pairs of points.
  • At each step the pair of clusters that lead to the smallest increase in the total within-cluster variance are merged.
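
As a sketch of the merge cost Ward's method minimizes: the increase in total within-cluster variance from merging clusters \(A\) and \(B\) reduces to a weighted squared distance between their centroids, \(\frac{n_A n_B}{n_A + n_B}\left\lVert \bar{x}_A - \bar{x}_B \right\rVert^2\):

```python
import numpy as np

def ward_cost(A, B):
    """Increase in the total within-cluster sum of squares caused by
    merging clusters A and B (rows = observations): a weighted squared
    Euclidean distance between the two centroids."""
    n_a, n_b = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return n_a * n_b / (n_a + n_b) * diff @ diff

A = np.array([[0.0, 0.0], [0.0, 1.0]])   # toy clusters
B = np.array([[2.0, 0.0], [5.0, 5.0]])
print(ward_cost(A, B))   # at each step, merge the pair minimizing this
```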

15.4.2 Comparing Linkage methods

In these plots, “height” is a measure of distance (dissimilarity): the height at which two clusters are merged. The longer the “stem”, the greater the distance between the clusters being joined.
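
Plots like these can be generated along the following lines (a sketch: X below is random stand-in data; the actual plots would use the standardized PMA6 companies data with the ticker symbols as labels):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 4))   # stand-in for the standardized data

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, method in zip(axes.ravel(),
                      ["centroid", "single", "complete", "ward"]):
    # each panel shows one linkage method's dendrogram
    dendrogram(linkage(X, method=method), ax=ax)
    ax.set_title(method)
plt.tight_layout()
plt.show()
```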

Things I picked out from these plots:

  • hum is the last company to be added to any cluster, regardless of the method used.
  • rei also seems to stand alone as being different from the others.
  • Ward’s method seems to create more distinct clusters compared to the other methods.
  • Companies that are grouped together under all methods:
    • nme, hca, ami
    • uk, ald, roh
    • sf, dia, dow
    • gra, hpc, acy, and sometimes mtc.