15.6 Choosing K

This section is under construction. I’m not overly happy with getting different results with the same gap statistic.

15.6.2 Elbow method

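Similar to reading a scree plot, plot the within-cluster variance against the number of clusters \(k\) and choose the \(k\) at the “elbow”, the point after which adding another cluster no longer gives a large drop. The within-cluster variance keeps shrinking as \(k\) grows (it reaches zero when every observation is its own cluster), so simply minimizing it would always pick the largest \(k\).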

There is no real “elbow” here, but \(k=7\) is where I’d say the slope changes.
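
As a rough illustration of the elbow idea, here is a minimal Python sketch. It is not the code behind the plot discussed above: the toy data (three simulated blobs), the pure-Python `kmeans_wss` helper, and the k-means++ style seeding are all assumptions made for the example. It prints the within-cluster sum of squares (WSS) for each \(k\) and the drop gained by adding one more cluster; the elbow is where that drop becomes small.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_sq_dist(p, centers):
    """Squared distance from p to its closest center."""
    return min(sq_dist(p, c) for c in centers)

def kmeans_wss(points, k, rng, n_init=5, n_iter=50):
    """Smallest total within-cluster sum of squares over n_init k-means runs."""
    best = float("inf")
    for _ in range(n_init):
        # k-means++ style seeding: later centers favor points far from earlier ones.
        centers = [rng.choice(points)]
        while len(centers) < k:
            weights = [nearest_sq_dist(p, centers) for p in points]
            if sum(weights) == 0:  # every point already coincides with a center
                centers.append(rng.choice(points))
            else:
                centers.append(rng.choices(points, weights=weights, k=1)[0])
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest center.
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                clusters[j].append(p)
            # Update step: move each center to its cluster mean
            # (an empty cluster keeps its previous center).
            new_centers = [
                tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
                for j, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        best = min(best, sum(nearest_sq_dist(p, centers) for p in points))
    return best

# Hypothetical toy data: three well-separated 2-D blobs of 30 points each.
rng = random.Random(42)
blob_centers = [(0, 0), (10, 0), (5, 9)]
points = [(rng.gauss(cx, 1), rng.gauss(cy, 1))
          for cx, cy in blob_centers for _ in range(30)]

wss = {k: kmeans_wss(points, k, rng) for k in range(1, 8)}
for k in sorted(wss):
    drop = wss[k - 1] - wss[k] if k > 1 else float("nan")
    print(f"k={k}  WSS={wss[k]:9.1f}  drop from k-1={drop:9.1f}")
```

Looking at the drop column rather than the raw WSS makes the judgment a little less subjective, but when the curve bends gradually the choice is still a judgment call.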

15.6.3 Gap statistic

  • This can be used for both hierarchical and non-hierarchical clustering.
  • Compares the total within-cluster variation for each \(k\) with its expected value under a null reference distribution that has no clustering.
  • See Tibshirani et al. for more details.
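
The comparison described in the bullets above can be sketched in Python as follows. With \(W_k\) the pooled within-cluster sum of squares and \(W^*_{kb}\) the same quantity on the \(b\)-th of \(B\) reference data sets, the statistic is \(\mathrm{Gap}(k) = \frac{1}{B}\sum_b \log W^*_{kb} - \log W_k\), and the usual rule picks the smallest \(k\) with \(\mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}\), where \(s_k = \mathrm{sd}_k\sqrt{1 + 1/B}\). Everything else here is an assumption made for the example: the toy data, the helper names, k-means as the clustering method, and a reference distribution that is uniform over the bounding box of the data. Because the reference sets are simulated, the result changes from run to run unless the seed is fixed, which this sketch does through the `seed` argument.

```python
import math
import random
import statistics

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_sq_dist(p, centers):
    return min(sq_dist(p, c) for c in centers)

def kmeans_wss(points, k, rng, n_init=3, n_iter=30):
    """Smallest within-cluster sum of squares over n_init k-means runs."""
    best = float("inf")
    for _ in range(n_init):
        # k-means++ style seeding.
        centers = [rng.choice(points)]
        while len(centers) < k:
            w = [nearest_sq_dist(p, centers) for p in points]
            pick = rng.choices(points, weights=w, k=1)[0] if sum(w) > 0 else rng.choice(points)
            centers.append(pick)
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                clusters[j].append(p)
            new_centers = [
                tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
                for j, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        best = min(best, sum(nearest_sq_dist(p, centers) for p in points))
    return best

def gap_statistic(points, k_max, n_ref=10, seed=0):
    """Return ({k: Gap(k)}, {k: s_k}) for k = 1..k_max."""
    rng = random.Random(seed)  # fixed seed -> reproducible reference sets
    dims = list(zip(*points))
    lo, hi = [min(d) for d in dims], [max(d) for d in dims]
    # Assumption: null reference data are uniform over the data's bounding box.
    refs = [[tuple(rng.uniform(l, h) for l, h in zip(lo, hi)) for _ in points]
            for _ in range(n_ref)]
    gaps, s = {}, {}
    for k in range(1, k_max + 1):
        # log() requires WSS > 0, i.e. fewer clusters than distinct points.
        log_w = math.log(kmeans_wss(points, k, rng))
        ref_log_w = [math.log(kmeans_wss(ref, k, rng)) for ref in refs]
        gaps[k] = statistics.fmean(ref_log_w) - log_w
        s[k] = statistics.pstdev(ref_log_w) * math.sqrt(1 + 1 / n_ref)
    return gaps, s

def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; falls back to the largest k."""
    ks = sorted(gaps)
    for k in ks[:-1]:
        if gaps[k] >= gaps[k + 1] - s[k + 1]:
            return k
    return ks[-1]

# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
data_rng = random.Random(7)
points = [(data_rng.gauss(cx, 1), data_rng.gauss(cy, 1))
          for cx, cy in [(0, 0), (10, 0), (5, 9)] for _ in range(20)]

gaps, s = gap_statistic(points, k_max=5)
for k in sorted(gaps):
    print(f"k={k}  gap={gaps[k]: .3f}  s={s[k]:.3f}")
print("chosen k:", choose_k(gaps, s))
```

Changing `seed` or `n_ref` and rerunning shows how much the chosen \(k\) depends on the simulated reference sets; a larger `n_ref` should make the estimate more stable at the cost of more computation.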