PORTAL USER GUIDE
Hierarchical Clustering (Tree Chart)
The Hierarchical Clustering (Tree Chart) is a node-link diagram that gives a visual summary of the output of a Hierarchical Cluster Analysis. It is also commonly used to determine the number of clusters and to spot outliers. The tree chart displays the hierarchical structure implied by the similarity (distance) matrix, with clusters formed according to the chosen linkage rule. The result is a graph showing how similar the different areas are to one another when a range of variables is taken into account. Areas that are joined closer to the leaves of the tree (at a smaller merge height) are more similar to one another.
To illustrate the use of the Hierarchical Clustering (Tree Chart) tool, we will use a dataset with a number of variables in it that can be related to each other: Income, Inequality and Financial Stress across the Greater Hobart area. To do this:
- Select Greater Hobart as your area
- Select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011 as your dataset, selecting all variables.
Once you have done this, open the Hierarchical Clustering (Tree Chart) tool (Tools → Charts → Hierarchical Clustering (Tree Chart)) and enter the parameters listed below:
- Dataset Input: Select a dataset that contains the variables of interest. Select SA2 OECD Indicators: Income, Inequality and Financial Stress 2011.
- Variables: The variables used to measure how similar the areas are to one another. Select the following five attributes:
- Median Disposable Income (Synthetic Data)
- Gini Coefficient (Synthetic Data)
- Poverty Rate (Synthetic Data)
- % with no access to emergency money (Synthetic Data)
- % Can’t afford a night out (Synthetic Data)
- Distance Metric: Distance measure to be used. Select euclidean.
- euclidean: “ordinary” straight-line distance between two points in Euclidean space.
- maximum: greatest distance along any coordinate dimension, also known as chessboard distance.
- manhattan: the distance between two points measured along axes at right angles.
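- canberra: a weighted version of the manhattan distance in which each coordinate difference is divided by the sum of the absolute values of the two coordinates, so differences between small values count for more.
- binary: the values are treated as binary (non-zero values are "on", zeros are "off"); the distance is the proportion of variables in which only one of the two observations is "on", among those in which at least one is "on".
- minkowski: a metric in a normed vector space that can be considered a generalization of both the Euclidean distance and the Manhattan distance (with p = 1 it equals the Manhattan distance and with p = 2 the Euclidean distance).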
- Cluster Metric: The agglomeration method (linkage rule) to be used. Note that, whichever method is selected, the analysis is processed as a complete-link case. Select complete.
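- ward: calculates the increase in the error sum of squares (ESS) that would result from fusing two clusters, and fuses the pair with the smallest increase.
- single: the distance between two clusters is the distance between their two closest points, one from each cluster.
- complete: the distance between two clusters is the distance between their two furthest points, one from each cluster.
- average: the distance between two clusters is the average of all distances between pairs of points, one from each cluster, so each cluster's distances are weighted by the number of points in it.
- mcquitty: when two clusters are fused, the new cluster's distance to any other cluster is the simple average of the two fused clusters' distances to it, not considering the number of points in each cluster.
- median: the distance between cluster centres, where the centre of a newly fused cluster is the mid-point of the two fused clusters' centres, regardless of how many points each contains.
- centroid: the distance between the centroids (mean points) of the two clusters.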
- Observation Labels: Select a column containing values that can be used as labels for each case. Select SA2 Name.
- Chart Title: A title for your chart; if left blank, the default name is used. Leave this blank.
- Greyscale: Specify whether you would like your graph to be grey-scale (checked) or colour (unchecked). Untick this.
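The distance measures listed under Distance Metric can be sketched in Python for two observations given as equal-length lists of numbers. This is a minimal sketch using textbook definitions; the method names match those of R's `dist()` function, but it is an assumption that the portal computes them in exactly the same way (its handling of zeros and missing values, for example, may differ), and all function names here are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical helper functions: a sketch, not the portal's implementation.
Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def maximum(x: Vector, y: Vector) -> float:
    # Chessboard (Chebyshev) distance: largest difference on any one variable.
    return max(abs(a - b) for a, b in zip(x, y))


def manhattan(x: Vector, y: Vector) -> float:
    # City-block distance: sum of the absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x: Vector, y: Vector) -> float:
    # Each absolute difference is divided by |a| + |b|, so differences between
    # small values weigh more; terms where both values are 0 are skipped here.
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def binary(x: Vector, y: Vector) -> float:
    # Non-zero values are "on". Among variables where at least one value is on,
    # return the share where exactly one is on (0.0 if none are on, in this sketch).
    on_pairs = [(a != 0, b != 0) for a, b in zip(x, y) if a != 0 or b != 0]
    if not on_pairs:
        return 0.0
    return sum(1 for a_on, b_on in on_pairs if a_on != b_on) / len(on_pairs)


def minkowski(x: Vector, y: Vector, p: float = 2.0) -> float:
    # p-th root of the summed |differences|**p; p = 1 gives manhattan, p = 2 euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


if __name__ == "__main__":
    a, b = [0.0, 0.0], [3.0, 4.0]
    print(euclidean(a, b))  # 5.0
    print(manhattan(a, b))  # 7.0
    print(maximum(a, b))    # 4.0
```

For the same pair of points the measures can disagree on how far apart they are, which is why the choice of Distance Metric can change the shape of the tree.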
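The linkage rules listed under Cluster Metric differ only in how the distance from a newly fused cluster to each remaining cluster is recalculated. The plain-Python sketch below is a naive agglomerative clustering that uses the Lance–Williams update for four of the rules (single, complete, average and mcquitty); ward, median and centroid are left out because their updates are normally applied to squared Euclidean distances. The names `agglomerate` and `lance_williams` are hypothetical, and this is not the portal's implementation.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical sketch, not the portal's implementation.
Cluster = FrozenSet[int]
Merge = Tuple[Cluster, Cluster, float]


def lance_williams(method: str, d_ki: float, d_kj: float, n_i: int, n_j: int) -> float:
    """Distance from cluster k to the union of clusters i and j."""
    if method == "single":
        return min(d_ki, d_kj)          # closest pair of points
    if method == "complete":
        return max(d_ki, d_kj)          # furthest pair of points
    if method == "average":
        # Weighted by cluster size, so this equals the mean of all point-to-point distances.
        return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
    if method == "mcquitty":
        return (d_ki + d_kj) / 2        # simple mean, ignores cluster sizes
    raise ValueError(f"method not covered by this sketch: {method}")


def agglomerate(dist: Sequence[Sequence[float]], method: str = "complete") -> List[Merge]:
    """Repeatedly fuse the two closest clusters; return (cluster, cluster, height) tuples."""
    clusters: List[Cluster] = [frozenset([i]) for i in range(len(dist))]
    # Current between-cluster distances, keyed by the unordered pair of clusters.
    d: Dict[FrozenSet[Cluster], float] = {
        frozenset([clusters[i], clusters[j]]): dist[i][j]
        for i in range(len(dist))
        for j in range(i + 1, len(dist))
    }
    merges: List[Merge] = []
    while len(clusters) > 1:
        pair = min(d, key=d.__getitem__)   # closest pair of current clusters
        height = d.pop(pair)
        c_i, c_j = tuple(pair)
        clusters.remove(c_i)
        clusters.remove(c_j)
        merged = c_i | c_j
        for c_k in clusters:
            # Replace the distances to c_i and c_j by one distance to the fused cluster.
            d_ki = d.pop(frozenset([c_k, c_i]))
            d_kj = d.pop(frozenset([c_k, c_j]))
            d[frozenset([c_k, merged])] = lance_williams(method, d_ki, d_kj, len(c_i), len(c_j))
        clusters.append(merged)
        merges.append((c_i, c_j, height))
    return merges


if __name__ == "__main__":
    points = [0.0, 1.0, 5.0, 11.0]   # four hypothetical one-variable observations
    dist = [[abs(p - q) for q in points] for p in points]
    for method in ("single", "complete", "average", "mcquitty"):
        print(method, [h for _, _, h in agglomerate(dist, method)])
```

For these hypothetical points the merge heights are [1.0, 4.0, 6.0] for single, [1.0, 5.0, 11.0] for complete, [1.0, 4.5, 9.0] for average and [1.0, 4.5, 8.25] for mcquitty: the first fusion is the same under every rule, but the later heights depend on how the distances are updated.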
Once you have selected your parameters, click the Run Tool button.
Once you have run the tool, click the Display Output button in the pop-up dialogue box. This should open a chart like the one shown below.
The outputs indicate that the SA2s of Hobart are more similar to each other than any are to the SA2 of Mount Wellington, with respect to the variables selected above. They also show that the most similar SA2s are Kingston Beach – Blackmans Bay and Claremont.
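The interpretation above follows from the order and height at which areas join the tree: the most similar pair is fused first, at the lowest height, and an area that joins only in the final fusion stands apart from all the others. The Python sketch below applies these readings to a merge history written as (cluster, cluster, height) tuples sorted by height. The labels A to D and the heights are hypothetical, not the Hobart output, the function names are hypothetical, and choosing the number of clusters by cutting just before the largest jump in merge height is one common rule of thumb rather than something the portal is known to do.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical helpers; merges are assumed to be sorted by increasing height.
Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def cut_tree(labels: List[str], merges: List[Merge], k: int) -> List[Set[str]]:
    # Replay merges from the lowest height up, stopping when k clusters remain
    # (assumes 1 <= k <= len(labels)).
    clusters = [frozenset([label]) for label in labels]
    for a, b, _height in merges[: len(labels) - k]:
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return [set(c) for c in clusters]


def suggest_k(n: int, merges: List[Merge]) -> int:
    # Rule of thumb: stop just before the largest jump in merge height.
    heights = [h for _, _, h in merges]
    if len(heights) < 2:
        return 1
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    biggest = max(range(len(gaps)), key=gaps.__getitem__)
    return n - (biggest + 1)


def late_singletons(merges: List[Merge]) -> List[str]:
    # A single observation that only joins in the final merge is a likely outlier.
    a, b, _height = merges[-1]
    return [next(iter(c)) for c in (a, b) if len(c) == 1]


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]   # hypothetical areas, not real SA2 results
    merges = [
        (frozenset({"A"}), frozenset({"B"}), 1.0),
        (frozenset({"A", "B"}), frozenset({"C"}), 5.0),
        (frozenset({"A", "B", "C"}), frozenset({"D"}), 11.0),
    ]
    print("most similar pair:", sorted(merges[0][0] | merges[0][1]))
    k = suggest_k(len(labels), merges)
    print("suggested k:", k)                       # 2
    print("clusters:", cut_tree(labels, merges, k))
    print("outliers:", late_singletons(merges))    # ['D']
```

In this toy history the largest jump in height comes before the final fusion, so the rule of thumb suggests two clusters: {A, B, C} and the lone area D.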