PORTAL USER GUIDE
Hierarchical Clustering (Heat Map)
The Hierarchical Clustering (Heat Map) tool creates a heat map that displays the hierarchical cluster structure of the rows and columns of a data matrix. The heat map is a rectangular tiling in which each tile is filled according to a colour scale. It makes the matrix easier to inspect by bringing out patterns in the data.
The output of this tool is a chart that shows how similar the different areas are to one another when a range of variables is taken into account. Areas that are close to each other in the distance matrix are more similar.
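The idea of a distance matrix can be sketched in a few lines of Python. This is only an illustration, not the portal's implementation: the area names and values are hypothetical, and the euclidean helper is written here for the example.

```python
import math

# Hypothetical scores for three areas (illustrative values, not SEIFA data).
areas = {
    "Area A": [1000.0, 1100.0, 900.0],
    "Area B": [1003.0, 1104.0, 900.0],
    "Area C": [900.0, 950.0, 800.0],
}

def euclidean(p, q):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

names = list(areas)
# Square, symmetric matrix: entry [i][j] is the distance between area i and area j.
distance_matrix = [[euclidean(areas[r], areas[c]) for c in names] for r in names]

for name, row in zip(names, distance_matrix):
    print(name, [round(d, 1) for d in row])
```

In this made-up example, Area A and Area B have the smallest off-diagonal distance, so they would be read as the most similar pair.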
To illustrate the use of the Hierarchical Clustering (Heat Map) tool, we will use a dataset containing several variables that can be related to each other. In this case we are exploring relative socio-economic advantage and disadvantage in suburban Darwin. Prepare the context as follows:
- Select the Darwin Suburbs SA3 as your area (Australia → Northern Territory → Greater Darwin → Darwin → Darwin Suburbs).
- Select ABS – Socio-Economic Indexes for Areas (SEIFA) – The Index of Relative Socio-economic Advantage and Disadvantage (SA2) 2016 as your dataset, selecting all variables.
Once you have set up your data, open the Hierarchical Clustering (Heat Map) tool (Tools → Charts → Hierarchical Clustering (Heat Map)). The input fields are as follows:
- Dataset Input: The dataset containing the variables that you would like to run through the tool. Select ABS – Socio-Economic Indexes for Areas (SEIFA) – The Index of Relative Socio-economic Advantage and Disadvantage (SA2) 2016.
- Variables: Check the variables you would like to include in the analysis. Select IRSAD Score, Maximum score for SA1s in area, Minimum score for SA1s in area.
- Distance Metric: Distance measure to be used. Select euclidean.
  - euclidean: the “ordinary” straight-line distance between two points in Euclidean space.
  - maximum: the greatest distance along any single coordinate dimension, also known as chessboard distance.
  - manhattan: the distance between two points measured along axes at right angles, that is, the sum of the absolute differences in each coordinate.
  - canberra: a weighted form of the manhattan distance in which each coordinate's absolute difference is divided by the sum of the two absolute values before the terms are added together.
  - binary: treats each variable as either on (non-zero) or off (zero); the distance is the proportion of variables in which only one of the two observations is on, among the variables in which at least one is on (as defined for the option of the same name in R's dist function).
  - minkowski: a metric in a normed vector space that generalises both the euclidean and the manhattan distance.
- Cluster Metric: The agglomeration method (linkage rule) to be used. It is important to note that in every method used, the analysis is processed as a complete-link case. Select complete.
  - ward: the increase in the error sum of squares (ESS) that results from fusing two clusters.
  - single: the distance between the two closest points, one from each cluster.
  - complete: the distance between the two furthest points, one from each cluster.
  - average: the average of the distances between the two clusters, taking into account the number of points in each cluster.
  - mcquitty: the average of the distances between the two clusters, without taking into account the number of points in each cluster.
  - median: the distance between the median points of the two clusters.
  - centroid: the distance between the mid-points (centroids) of the two clusters.
- Observation Labels: The attribute that contains the identifying information for each row, to be printed on the chart. Select SA2 name 2016.
- Chart Title: Title of the chart to display. Type Hierarchical Clustering Heat Map.
- Matrix Label: Whether axis labels are printed for the matrix displayed in the chart. Tick the box.
- Greyscale: Produce the chart in a greyscale colour scheme. Untick the box.
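To make the Distance Metric and Cluster Metric options more concrete, the following Python sketch implements several of them by hand. It is an illustration under stated assumptions rather than the code the portal runs: all function names are made up for this example, the input values are arbitrary, and the handling of zero denominators in canberra is one possible choice among several.

```python
def manhattan(p, q):
    """Sum of absolute differences along each coordinate."""
    return sum(abs(a - b) for a, b in zip(p, q))

def maximum(p, q):
    """Greatest absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, power):
    """Generalised distance: power=1 gives manhattan, power=2 gives euclidean."""
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1.0 / power)

def canberra(p, q):
    """Each absolute difference is scaled by the sum of the absolute values.
    Terms with a zero denominator are skipped (an assumption of this sketch)."""
    return sum(
        abs(a - b) / (abs(a) + abs(b))
        for a, b in zip(p, q)
        if abs(a) + abs(b) > 0
    )

def cluster_distance(cluster_a, cluster_b, metric, rule):
    """Distance between two clusters under the single, complete or average rule."""
    pairwise = [metric(p, q) for p in cluster_a for q in cluster_b]
    if rule == "single":      # two closest points, one from each cluster
        return min(pairwise)
    if rule == "complete":    # two furthest points, one from each cluster
        return max(pairwise)
    if rule == "average":     # mean over all between-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unsupported rule: {rule}")

p, q = [0.0, 0.0], [3.0, 4.0]
print(manhattan(p, q), maximum(p, q), minkowski(p, q, 2), canberra(p, q))

left = [[0.0, 0.0], [1.0, 0.0]]
right = [[4.0, 0.0], [6.0, 0.0]]
for rule in ("single", "complete", "average"):
    print(rule, cluster_distance(left, right, manhattan, rule))
```

The same pair of clusters yields a different between-cluster distance under each rule, which is why the choice of Cluster Metric can change the order in which clusters are merged.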
The input parameters are summarised in the image below. Once you have entered them, click Run Tool.
Once you have run the tool, click the Display button that appears in the pop-up dialogue box. This should open a chart similar to the one shown below.
The output indicates that the SA2s that are most similar in terms of socio-economic advantage and disadvantage, based on their SEIFA IRSAD scores, are Wulagi and Wanguri.
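The link between "most similar" and the cluster structure can be sketched as a small agglomerative clustering routine in Python. This is a simplified illustration that assumes euclidean distance and the complete linkage rule, as selected above. It is not the portal's implementation, and the area names and values are hypothetical rather than actual SEIFA scores.

```python
import math

# Hypothetical observations (illustrative values, not SEIFA data).
points = {
    "Area A": [1000.0, 1100.0, 900.0],
    "Area B": [1003.0, 1104.0, 900.0],
    "Area C": [900.0, 950.0, 800.0],
    "Area D": [910.0, 990.0, 800.0],
}

def euclidean(p, q):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def complete_link(cluster_a, cluster_b):
    """Complete linkage: distance between the two furthest members."""
    return max(euclidean(points[a], points[b]) for a in cluster_a for b in cluster_b)

def agglomerate(labels):
    """Repeatedly merge the two closest clusters and return the merge history."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-link distance.
        height, i, j = min(
            (complete_link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = agglomerate(list(points))
for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 1))
```

The first merge in the history joins the pair of observations with the smallest distance. In the chart, this corresponds to the two areas joined at the lowest level of the dendrogram.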