Unsupervised Learning — Hierarchical Clustering

Mark Subra
2 min read · Sep 12, 2020


Unsupervised learning differs from supervised learning in that the data carries no labels. The model is given unlabelled data and left to discover patterns on its own. Examples include clustering, anomaly detection, and some neural networks, such as autoencoders.

Hierarchical Clustering

Clustering is the most common type of unsupervised learning. It involves finding hidden patterns or groups in the data. Hierarchical clustering builds a multilevel hierarchy of clusters, called a cluster tree, which is usually drawn as a dendrogram.

Example of hierarchical clustering visualized

There are two general types of hierarchical clustering:

  • Agglomerative or “bottom-up”: each observation starts in its own cluster, then pairs of clusters are merged as the algorithm moves up the hierarchy.
  • Divisive or “top-down”: all observations start in one cluster, which is then split repeatedly until the algorithm can no longer find any more distinctions.

Agglomerative vs divisive hierarchical clustering example
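
To make the bottom-up idea concrete, here is a minimal from-scratch sketch using only the Python standard library. The function name `agglomerative`, the `n_clusters` stopping parameter, and the rule of merging the two clusters whose closest members are nearest are all assumptions made for illustration; real projects would more likely reach for a library implementation.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, n_clusters=1):
    """Naive bottom-up clustering (illustrative only, roughly O(n^3))."""
    # Every observation starts in its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history: (cluster_a, cluster_b, distance)

    while len(clusters) > n_clusters:
        # Assumed merge rule: pick the pair of clusters whose closest
        # members are nearest to each other.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)

        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    return clusters, merges


# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)
```

The `merges` list records the order and distance of each merge, which is the information a dendrogram displays. Running the loop down to `n_clusters=1` would produce the full tree.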

Clustering Techniques

To decide which clusters to merge (agglomerative) or which cluster to split (divisive), the algorithm needs a measure of dissimilarity between sets of observations. This is usually built from a measure of distance between pairs of observations.
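
As a rough illustration of how a set-level dissimilarity can be built from pairwise distances, the sketch below shows three common choices. The names single, complete, and average linkage are standard terms that the post itself does not use, and the helper functions and sample points are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def pairwise(cluster_a, cluster_b):
    """All distances between a point in cluster_a and a point in cluster_b."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Dissimilarity = distance between the closest pair of members.
    return min(pairwise(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Dissimilarity = distance between the farthest pair of members.
    return max(pairwise(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Dissimilarity = mean distance over all cross-cluster pairs.
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters: two vertical pairs of points, 3 units apart.
left = [(0.0, 0.0), (0.0, 2.0)]
right = [(3.0, 0.0), (3.0, 2.0)]

print(single_linkage(left, right))
print(complete_linkage(left, right))
print(average_linkage(left, right))
```

The three rules can disagree about which clusters are "closest", so the choice affects the shape of the resulting tree.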

Several distance metrics are available, and the right choice depends on the type of data: non-numeric data calls for different distance measures than numeric data.
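
The post does not name specific metrics, so the following sketch picks three widely used ones as examples: Euclidean and Manhattan distance for numeric data, and Hamming distance for categorical data. The function names and sample records are illustrative assumptions.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Fraction of positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Numeric observations.
print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))

# Non-numeric observations (hypothetical colour, size, shape records).
print(hamming(("red", "small", "round"), ("red", "large", "round")))
```

Subtracting category labels would be meaningless, which is why the categorical case counts mismatches instead of measuring a geometric distance.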
