## 15.1 When is cluster analysis used?

• Biology: species taxonomy. Living things are classified into arbitrary groups based on observed characteristics.
• Types of depression (Andreasen and Grove, 1982; Das-Munshi et al., 2008)
• Properties of dwarf galaxies (Chattopadhyay et al., 2012)
• An important tool in data mining and marketing research. Consumers can be clustered based on their purchasing choices.
• State-level demographics: how do states cluster on measures such as health indicators and racial and socioeconomic disparities?
• Fraud detection in insurance claims
• Credit scoring in banking
• Organizing information held in text documents
• Fantasy football: which other players have characteristics similar to a star player?

### 15.1.1 Data used in this chapter

Super old financial performance data from Forbes (1981) for three different industries. Bonus points to anyone who can get me updated data.

These variables are described in detail in Sections 9.3 and 16.3 of PMA6. You can download the tab-delimited [data file from this link] and the [codebook from this link].
```r
chem <- read.table("data/Cluster.txt", sep="\t", header=TRUE)
```

### 15.1.2 Packages used in this chapter

```r
library(factoextra)  # any function that starts with fviz_
library(tidyr)       # for pivot_wider
## library(ggplot2)
## library(gridExtra)   # for multi-panel ggplot layouts
## library(dplyr)
## library(cluster)     # for the gap statistic
## library(kableExtra)  # side-by-side tables
```