Unsupervised Learning — Hierarchical Clustering
Unsupervised learning is set apart from supervised learning by the absence of labelled data. Because the data carries no labels, the model must discover patterns on its own. Examples include clustering, anomaly detection, and neural networks.
Hierarchical Clustering
Clustering is the most common form of unsupervised learning. It involves finding hidden patterns or groups in the data. Hierarchical clustering builds a multilevel hierarchy of clusters by constructing a cluster tree, which is represented as a dendrogram.
There are two general types of hierarchical clustering:
- Agglomerative, or “bottom-up”: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy (see the sketch after this list).
- Divisive, or “top-down”: all observations start in one cluster, which is split recursively as one moves down the hierarchy until no further meaningful distinctions can be found.
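As an illustration of the agglomerative approach, the sketch below builds a cluster tree with SciPy's hierarchical clustering tools. The toy data, the "ward" linkage method, and the two-cluster cut are assumptions made for the example, not part of the original text.

```python
# A minimal sketch of agglomerative ("bottom-up") hierarchical clustering
# using SciPy; the sample data and parameter choices are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D observations: two loose groups.
rng = np.random.default_rng(seed=0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(5, 2)),   # group near the origin
    rng.normal(loc=5.0, scale=0.5, size=(5, 2)),   # group near (5, 5)
])

# Build the cluster tree: each observation starts as its own cluster,
# and the closest pair of clusters is merged at every step.
Z = linkage(X, method="ward")

# The linkage matrix Z encodes the dendrogram; cut it to obtain flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # two flat clusters, e.g. [1 1 1 1 1 2 2 2 2 2]

# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree if matplotlib is available.
```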
Clustering Techniques
To decide which clusters to merge (agglomerative) or which to split (divisive), the algorithm needs a measure of dissimilarity between distinct sets of observations. This is usually derived from a measure of distance between pairs of observations, combined with a rule for extending those pairwise distances to whole sets.
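To make set-to-set dissimilarity concrete, the following sketch computes pairwise Euclidean distances and compares a few common linkage criteria; the example data and the particular criteria shown are assumptions made for illustration.

```python
# A hedged sketch of how set-to-set dissimilarity is built from pairwise
# distances; the arrays and linkage choices below are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# Pairwise distances between observations (condensed form).
D = pdist(X, metric="euclidean")
print(squareform(D))   # full 4x4 distance matrix

# The linkage criterion turns pairwise distances into cluster-to-cluster
# dissimilarity: "single" uses the closest pair, "complete" the farthest,
# "average" the mean over all pairs.
for method in ("single", "complete", "average"):
    Z = linkage(D, method=method)
    print(method, Z[-1, 2])   # dissimilarity at the final merge
```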
Several distance metrics are used depending on the algorithm and the data: non-numeric data calls for different distance measures than numeric data.
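As a rough illustration of how the metric changes with the data type, the sketch below uses Euclidean distance for numeric rows and Hamming or Jaccard distance for binary-encoded categorical rows; the toy records and metric choices are assumptions, not prescriptions.

```python
# A brief sketch contrasting distance metrics for numeric and non-numeric
# (here, binary-encoded categorical) data; the toy records are illustrative.
import numpy as np
from scipy.spatial.distance import pdist

# Numeric observations: Euclidean distance is a natural choice.
numeric = np.array([[1.0, 2.0], [2.0, 4.0], [10.0, 12.0]])
print(pdist(numeric, metric="euclidean"))

# Categorical observations encoded as binary indicators: Hamming counts the
# fraction of mismatched attributes, Jaccard ignores shared absences.
categorical = np.array([[1, 0, 1, 1],
                        [1, 1, 1, 0],
                        [0, 0, 0, 1]], dtype=bool)
print(pdist(categorical, metric="hamming"))
print(pdist(categorical, metric="jaccard"))
```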