Tutorial: Griffith University South Bank Campus

In this tutorial we will investigate a range of variables and concepts across the southern and western parts of the Brisbane metropolitan region, including Logan and Ipswich. Incorporating a range of datasets and using a variety of tools, we will view this study area through the following themes:

  • Aboriginal and Torres Strait Islander populations

  • Demographics and Socio-economics

  • Employment, Income and Education

  • Movement and Transport

  • Housing, Health, and Wellbeing


Setting Up Your Workspace

The first thing we need to do is set up a study area. For today’s tutorial, we will create a study area that includes the southern and western parts of the Brisbane metropolitan region.

There is already comprehensive documentation within the AURIN User Guides on how to select an area, so we will not go into great depth here. Nonetheless, we’ll include some screenshots of the process to guide you along the way.


  1. Open the Area Selection window, click the Search tab and search for Brisbane – South – SA4 (2016) – 303
  2. Once selected, click Done
  3. On the right of the Bounding Box line, click the spanner and click Select Current Map View
  4. Drag the sides of the active bounding box so that it encompasses Logan, Beenleigh, Yatala and Ipswich, roughly matching the image below.
  5. Click the Bounding Box line to lock in your area

Search Area Selection

 

Select Current Map View Option

 

Bounding Box Modification

 

Final Area Selection


 

Theme One: Aboriginal and Torres Strait Islander Population datasets

Dataset: SA1-based B01 Selected Person Characteristics by Sex as at 2011-08-11
Variables:
  • SA1 Code
  • Indigenous Persons Total Persons
  • Total Persons

Dataset: SA1-P09 Country of Birth by Sex-Census 2016*
Variables:
  • SA1 2016 11-digit identifier
  • Australia Persons
  • Canada Persons
  • China excl SARs and Taiwan Persons
  • Croatia Persons
  • Egypt Persons
  • Fiji Persons
  • Germany Persons
  • Greece Persons
  • Hong Kong SAR of China Persons
  • India Persons
  • Indonesia Persons
  • Iran Persons
  • Iraq Persons
  • Ireland Persons
  • Italy Persons
  • Japan Persons
  • Korea Republic of South Persons
  • Lebanon Persons
  • Malaysia Persons
  • Malta Persons
  • Netherlands Persons
  • New Zealand Persons
  • Pakistan Persons
  • Philippines Persons
  • Poland Persons
  • Singapore Persons
  • South Africa Persons
  • Sri Lanka Persons
  • Thailand Persons
  • The Former Yugoslav Republic of Macedonia Persons
  • Turkey Persons
  • United Kingdom Channel Islands and Isle of Man Persons
  • United States of America Persons
  • Vietnam Persons
  • Zimbabwe Persons

*This dataset is also required for Theme Two: Demographics and Socio-economics


Theme Two: Demographics and Socio-economics datasets

Dataset: SA1-P09 Country of Birth by Sex-Census 2016*
Variables:
  • SA1 2016 11-digit identifier
  • Australia Persons
  • Canada Persons
  • China excl SARs and Taiwan Persons
  • Croatia Persons
  • Egypt Persons
  • Fiji Persons
  • Germany Persons
  • Greece Persons
  • Hong Kong SAR of China Persons
  • India Persons
  • Indonesia Persons
  • Iran Persons
  • Iraq Persons
  • Ireland Persons
  • Italy Persons
  • Japan Persons
  • Korea Republic of South Persons
  • Lebanon Persons
  • Malaysia Persons
  • Malta Persons
  • Netherlands Persons
  • New Zealand Persons
  • Pakistan Persons
  • Philippines Persons
  • Poland Persons
  • Singapore Persons
  • South Africa Persons
  • Sri Lanka Persons
  • Thailand Persons
  • The Former Yugoslav Republic of Macedonia Persons
  • Turkey Persons
  • United Kingdom Channel Islands and Isle of Man Persons
  • United States of America Persons
  • Vietnam Persons
  • Zimbabwe Persons
  • Total Persons

Dataset: ABS - Index of Household Advantage and Disadvantage (IHAD) (SA1) 2016**
Variables:
  • SA1 11-Digit Code 2016
  • Percentage Of Households In The IHAD (a): Quartile 1
  • Percentage Of Households In The IHAD (a): Quartile 2
  • Percentage Of Households In The IHAD (a): Quartile 3
  • Percentage Of Households In The IHAD (a): Quartile 4

Dataset: SA1 SEIFA 2016 - The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)
Variables:
  • SA1 11-digit code 2016
  • Index Score
  • Usual resident population

Dataset: SA1-G01 Selected Person Characteristics by Sex-Census 2016
Variables:
  • SA1 11-digit code 2016
  • Persons aged 15-19 years
  • Persons aged 20 - 24 years
  • Persons aged 65-74 years
  • Persons aged 75 - 84 years
  • Persons aged 85 years and over
  • Total Persons

*This dataset is also required for Theme One: Aboriginal and Torres Strait Islander Populations

**This dataset is also required for Theme Four: Movement and Transport


Theme Three: Employment, Income, and Education

Dataset: SA1-G43b Labour Force Status by Age by Sex-Census 2016
Variables:
  • SA1 2016 11-digit identifier
  • Persons Total unemployed Total
  • Persons Total employed Total
  • Persons Not in the labour force Total
  • Persons Total Total

Dataset: SA1-P02 Selected Medians and Averages-Census 2016*
Variables:
  • SA1 2016 11-digit identifier
  • Median total household income weekly
  • Median rent weekly
  • Median mortgage repayment monthly

Dataset: SA3 Aggregated Population & Dwelling Counts 2016 Census for Australia**
Variables:
  • SA3 Code
  • Total Usual Resident Population 2016

Dataset: School Profile (point) 2008-2016**
Variables:
  • Unique Identifier
  • Geometry Field
  • Calendar Year: 2016
  • School Sector: Government
  • School Type: Combined, Primary, Secondary

*This dataset is also required for Theme Four: Movement and Transport

**These datasets require you to switch to a different area selection. See the Exercise for details.


Theme Four: Movement and Transport

Dataset: SA1-P44 Method of Travel to Work by Sex-Census 2016
Variables:
  • SA1 2016 11-digit identifier
  • One method Bicycle Persons
  • One method Bus Persons
  • One method Car as driver Persons
  • One method Car as passenger Persons
  • One method Ferry Persons
  • One method Train Persons
  • One method Walked only Persons
  • Total Persons

Dataset: OpenStreetMap - Lines (Australia) 2018*
Variables:
  • Unique Identifier
  • Geometry Field
  • Highway

Dataset: Department of Health - National Toilet Map - June 2018*
Variables:
  • ToiletID
  • Geometry Field

Dataset: MB Mesh Block 2016 Census for Australia*
Variables:
  • Mesh Block Code
  • Geometry Field
  • Mesh Block Category
  • Total Usual Resident Population

Dataset: SA1-P02 Selected Medians and Averages-Census 2016**
Variables:
  • SA1 2016 11-digit identifier
  • Median total household income weekly
  • Median rent weekly
  • Median mortgage repayment monthly

Dataset: VAMPIRE 2016 (SA1) for Australian Capital Cities
Variables:
  • SA1 Code 2016
  • VAMPIRE Score

Dataset: ABS - Index of Household Advantage and Disadvantage (IHAD) (SA1) 2016***
Variables:
  • SA1 11-Digit Code 2016
  • Percentage Of Households In The IHAD (a): Quartile 1
  • Percentage Of Households In The IHAD (a): Quartile 2
  • Percentage Of Households In The IHAD (a): Quartile 3
  • Percentage Of Households In The IHAD (a): Quartile 4

*These datasets require you to switch to a different area selection. See the Exercise for details.

**This dataset is also required for Theme Three: Employment, Income and Education

***This dataset is also required for Theme Two: Demographics and Socio-economics


Theme Five: Housing, Health, and Wellbeing

Dataset: SA1-G32 Dwelling Structure-Census 2016
Variables:
  • SA1 2016 11-digit identifier
  • Occupied private dwellings Flat or apartment Total Dwellings
  • Occupied private dwellings Other dwelling Total Dwellings
  • One method Car as driver Persons
  • Occupied private dwellings Semi detached row or terrace house townhouse etc with Total Persons
  • Occupied private dwellings Separate house Dwellings
  • Occupied private dwellings Total occupied private dwellings Dwellings

Dataset: SA1-G33 Tenure and Landlord Type by Dwelling Structure-Census 2016
Variables:
  • SA1 2016 11-digit identifier
  • Rented State or territory housing authority Total
  • Rented Housing co operative community church group Total
  • Total Total

Dataset: SA1-G18 Core Activity Need for Assistance by Age by Sex-Census 2016
Variables:
  • SA1 2016 11-digit identifier
  • Persons Total Has need for assistance
  • Persons Total Total

 

Theme One: Aboriginal and Torres Strait Islander Populations

Exercise One: Calculating Aboriginal and Torres Strait Islander Population Proportions

In this first exercise, we will be calculating the proportion of the population within each SA1 in our study area that identified as Aboriginal or Torres Strait Islander at the 2016 Census.

For this exercise we will use the following datasets:

The following tools will be used in this exercise


Task: Create a choropleth map of your SA1 counts of Aboriginal and Torres Strait Islander people across your study area 


Your map should look something like the image below



We can see from this map that there appear to be high counts of Aboriginal and Torres Strait Islander Australians in some areas to the south and west of the Brisbane metropolitan region. However, these areas may simply have large populations, so they do not necessarily contain proportionally more Aboriginal and Torres Strait Islander Australians. To determine whether they do, we need to standardise the counts by total population size.


Task: Calculate the proportion of the population in each SA1 that identified as Aboriginal or Torres Strait Islander at the 2016 Census using the Generate tool.

Parameters for this are shown below

You should rename this dataset something meaningful


Generate Tool parameters


If you open the dataset, you will notice that one of the proportions is greater than 1 (shown below). This happens because the ABS randomly adjusts very small counts to protect confidentiality; in this SA1, the randomised total population count ended up lower than the randomised count of Aboriginal and Torres Strait Islander people. You will need to remove this SA1.



Task: Remove any rows that have an Aboriginal or Torres Strait Islander proportion greater than 1 from your dataset using the Dataset Attribute Filter tool.

Parameters for this are shown below.

Remember to rename your output dataset something meaningful


Dataset Attribute Filter tool parameters


Once you have filtered the dataset, open the new dataset and you will see that the spurious row has been removed (shown below). You are now ready to visualise your proportional data!
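For readers who want to check the logic outside the AURIN portal, the Generate and Dataset Attribute Filter steps amount to a division followed by a row filter. The sketch below is a minimal Python illustration only: the SA1 labels, counts and field names are hypothetical, and treating a zero total as "no proportion" (and dropping that row too) is our own assumption rather than documented AURIN behaviour.

```python
# Hypothetical SA1 records; field names and counts are illustrative only.
rows = [
    {"sa1": "A", "indigenous": 12, "total": 300},
    {"sa1": "B", "indigenous": 0, "total": 250},
    {"sa1": "C", "indigenous": 7, "total": 3},   # randomised small counts
    {"sa1": "D", "indigenous": 5, "total": 0},   # no usual residents
]


def add_proportion(rows):
    """Generate step: indigenous / total, or None when total is 0 (assumption)."""
    out = []
    for r in rows:
        prop = r["indigenous"] / r["total"] if r["total"] > 0 else None
        out.append({**r, "prop_indigenous": prop})
    return out


def keep_valid(rows):
    """Filter step: drop rows with no proportion or a proportion above 1."""
    return [r for r in rows
            if r["prop_indigenous"] is not None and r["prop_indigenous"] <= 1]


generated = add_proportion(rows)
filtered = keep_valid(generated)
print([(r["sa1"], round(r["prop_indigenous"], 3)) for r in filtered])
# → [('A', 0.04), ('B', 0.0)]
```

Row C shows the randomisation problem described above: a count larger than its own total gives a proportion above 1, which the filter removes.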



Task: Create a choropleth map of your SA1 population proportions of Aboriginal and Torres Strait Islander people across your study area 


Your map should look something like the map below


 


Key Research Questions

Where are higher proportions of Aboriginal and Torres Strait Islander Australians located in our study area?

Are there any differences between the counts and proportions of Aboriginal and Torres Strait Islander populations in the study area?

What other factors might we look at when examining these potential spatial patterns?


 

Exercise Two: Cluster Analysis of Aboriginal and Torres Strait Islander Populations

In this second exercise, we will be extending our analysis of the distribution of Aboriginal and Torres Strait Islander populations to determine whether or not we can quantify any clustering patterns.

For this exercise we will use the following datasets:

Specifically, we are going to be using the dataset that you generated and filtered.

The following tools will be used in this exercise

Our last exercise provided us with a potential indication of where Aboriginal and Torres Strait Islander people might tend to reside in our study area. 

What we now want to determine is whether or not there is any significant clustering of Aboriginal and Torres Strait Islander populations.

We are going to use two tools to determine this:

Moran’s I

Moran’s I is a summary statistic indicating how much a variable or attribute is clustered together (or spatially autocorrelated) across a study area. Moran’s I takes the value of a variable for a particular unit (in our case, SA1s), calculates the average value of that variable across all of the neighbouring units, and then checks whether the two are related. In this instance, it takes the proportion of Aboriginal and Torres Strait Islander people in a given SA1 and the average proportion across all of its neighbouring SA1s, and checks whether high proportions tend to be surrounded by high average proportions, and low by low. Moran’s I values generally fall between -1 and +1. Negative values mean that areas with high proportions tend to be surrounded by areas with lower proportions, while positive values indicate that high values are clustered with high, and low with low. Values close to zero indicate neither clustering nor repulsion.
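To make the calculation concrete, here is a minimal Python sketch of global Moran’s I using row-standardised contiguity weights. It is an illustration under stated assumptions, not the AURIN implementation: the four areas arranged in a line, their values and their neighbour lists are made up, the AURIN tool may weight neighbours differently, and the significance test the tool reports is omitted here.

```python
def morans_i(values, neighbours):
    """Global Moran's I with row-standardised contiguity weights.

    values:     one attribute value per area (e.g. a proportion per SA1)
    neighbours: neighbours[i] lists the indices of areas touching area i
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]      # deviations from the mean
    cross = 0.0                         # sum of w_ij * z_i * z_j
    s0 = 0.0                            # sum of all weights
    for i, nbrs in enumerate(neighbours):
        if not nbrs:
            continue                    # areas with no neighbours add nothing
        w = 1.0 / len(nbrs)             # each row of weights sums to 1
        for j in nbrs:
            cross += w * z[i] * z[j]
            s0 += w
    variance_sum = sum(zi * zi for zi in z)
    return (n / s0) * (cross / variance_sum)


# Hypothetical example: four areas in a row, each touching the next.
line = [[1], [0, 2], [1, 3], [2]]
clustered = morans_i([0.1, 0.2, 0.8, 0.9], line)    # lows together, highs together
alternating = morans_i([0.1, 0.9, 0.1, 0.9], line)  # every high next to a low
print(round(clustered, 2), round(alternating, 2))   # → 0.54 -1.0
```

The two cases bracket the interpretation above: grouping similar values gives a positive I, while alternating high and low values gives a strongly negative one.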

So let’s calculate our Moran’s I.

First, we need to turn our dataset into a special file format called a shapefile


Task: Using the Spatialise Aggregated Dataset tool, convert your Generated/Filtered dataset from Exercise 1 into a shapefile.

Parameters for this are shown below.

Remember to name your dataset something meaningful


Spatialise Aggregated Dataset Tool Parameters


Now that we’ve created our spatialised dataset, we need to make one more type of file: a spatial weights matrix.

Recall that Moran’s I is calculated by taking the average from ‘neighbouring’ units, but we have to specify what a neighbouring unit actually is, and believe it or not there are multiple ways of doing it.

We are going to create the simplest type – a contiguous spatial weights matrix
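Real SA1 boundaries are irregular polygons, so building their contiguity matrix requires a GIS library. As a simplified stand-in, the hypothetical sketch below treats each area as a cell in a regular grid to show what "contiguous" means under the two common rules: rook (shared edge) and queen (shared edge or corner). The rule the AURIN tool applies depends on the parameters you enter; this sketch does not assume either one. Row standardisation, shown at the end, scales each area’s weights so they sum to 1, which is what lets Moran’s I average over neighbours.

```python
def grid_contiguity(n_rows, n_cols, rule="queen"):
    """Neighbour lists for the cells of an n_rows x n_cols grid.

    rook:  cells sharing an edge are neighbours
    queen: cells sharing an edge or a corner are neighbours
    Cells are numbered row by row, starting at 0.
    """
    if rule == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    neighbours = []
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    nbrs.append(rr * n_cols + cc)
            neighbours.append(sorted(nbrs))
    return neighbours


def row_standardise(neighbours):
    """Map each area i to {j: w_ij} so that each area's weights sum to 1."""
    return {i: {j: 1.0 / len(nbrs) for j in nbrs}
            for i, nbrs in enumerate(neighbours) if nbrs}


rook = grid_contiguity(3, 3, "rook")
queen = grid_contiguity(3, 3, "queen")
weights = row_standardise(queen)
print(rook[4], queen[0])   # → [1, 3, 5, 7] [1, 3, 4]
```

Even on this tiny grid the centre cell has four rook neighbours but eight queen neighbours, which is why the choice of rule can change a Moran’s I result.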


Task: Create a Contiguous Spatial Weights Matrix for your spatialised dataset.

Enter the parameters as you see them below.

Try to avoid opening this dataset when you create it because it is quite large!

Remember to rename this something meaningful


Contiguous Spatial Weights Matrix Tool Parameters


Now we are finally ready to calculate our Moran’s I statistic for our study area


Task: Calculate the Moran’s I statistic for the proportions of Aboriginal and Torres Strait Islander Australians living in your study area’s SA1s

Enter the parameters as you see them below.


Moran’s I Parameter Input


Once your tool has run, click the Display button in the pop-up dialogue box. Your output should look something like the image below.



Your output will contain a number of rows, but you need only look at the top entry. You can see here that our Moran’s I estimate is 0.4147. This indicates that SA1s with high proportions of Aboriginal and Torres Strait Islanders are clustered to at least a moderate degree with other similar SA1s. 

Moran’s I is a summary statistic, but it doesn’t provide us much of a picture. Specifically, where are these clusters of high with high, and low with low? We want to find hot spots and cold spots of Aboriginal and Torres Strait Islander populations and look at them on a map.

To do this, we are going to run a Getis-Ord Local G analysis. Essentially, we will generate a mappable dataset that shows clusters of high proportions, clusters of low proportions, and areas where there is no clustering.
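The Getis-Ord statistic compares the total for each area and its neighbours against what you would expect if values were spread at random. Below is a minimal Python sketch of the Gi* form, which counts the area itself among its neighbours. It uses binary weights and labels each result High, Low or Non Significant after a Bonferroni correction. Several things here are our own assumptions: whether the AURIN tool uses Gi or Gi*, which weights it applies, and whether the "bon" prefix in its output columns refers to a Bonferroni correction. The data are hypothetical.

```python
import math
from statistics import NormalDist


def local_g_star(values, neighbours):
    """Getis-Ord Gi* z-scores with binary weights
    (w_ij = 1 for area i itself and each neighbour j of i, else 0)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i, nbrs in enumerate(neighbours):
        members = set(nbrs) | {i}        # Gi* includes the area itself
        w_sum = len(members)             # sum of w_ij (all ones)
        w_sq_sum = len(members)          # sum of w_ij squared (still ones)
        local_sum = sum(values[j] for j in members)
        num = local_sum - mean * w_sum
        den = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(num / den)
    return z_scores


def classify(z_scores, alpha=0.05):
    """Two-sided test at a Bonferroni-adjusted significance level."""
    adjusted = alpha / len(z_scores)
    critical = NormalDist().inv_cdf(1 - adjusted / 2)
    labels = []
    for z in z_scores:
        if z > critical:
            labels.append("High")
        elif z < -critical:
            labels.append("Low")
        else:
            labels.append("Non Significant")
    return labels


# Hypothetical strip of ten areas with a run of high values at one end.
values = [1, 1, 1, 1, 1, 1, 1, 9, 9, 9]
strip = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)]
         for i in range(len(values))]
z = local_g_star(values, strip)
labels = classify(z)
print(labels)
```

Only the area in the middle of the high run is flagged as a hot spot. The Bonferroni correction makes the test strict, so the edge areas of the run stay Non Significant.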


Task: Run the Getis-Ord Local G tool for the proportions of Aboriginal and Torres Strait Islander Australians living in your study area’s SA1s

Enter the parameters as you see them below.

Remember to rename this dataset something meaningful


Getis-Ord Local G Tool Parameters


 

Once the tool has finished running, open the dataset and have a look at the columns. The one we are particularly interested in is called bonmapname. Some rows will be labelled Non Significant, some High and some Low. We can map this variable to see the hot spots (High) and cold spots (Low) of Aboriginal and Torres Strait Islander populations in our study area.


Task: Create a choropleth of your bonmapgroup variable across your study area


Your map should look something like the map below



You can see that we have statistically significant hot spots of high Aboriginal and Torres Strait Islander populations along an east-west corridor from Durack to Ipswich in our study area, and one around the Logan city area. However, these clusters are separated by a cold spot centred on Parkinson, Calamvale and Stretton.

Exercise Three: Comparing Indigenous Populations to other groups

In this third exercise, we will be comparing the spatial distribution of Aboriginal and Torres Strait Islander populations with that of other ethnic groups.

The following tools will be used in this exercise

First of all, pick a nationality from your SA1-P09 Country of Birth by Sex-Census 2016 dataset that you would like to compare spatially to your Aboriginal and Torres Strait Islander population.

In this example, we will use Vietnam

Let’s start by looking at the spatial distribution of the proportions of our chosen nationality across our study area. You will need to calculate this proportion!


Task: Calculate the proportion of your nationality in your SA1s using the Generate tool.

Parameters for this are shown below.

Remember to rename this something meaningful.

You may also need to filter this dataset to remove proportions greater than 1

Create a choropleth of these proportions across your study area


What does your map tell you about the spatial distribution of your nationality in your study area? Do you think it is similar to the distribution of your Aboriginal and Torres Strait Islander population?



Now we are going to run a tool to determine how dissimilar the distributions of our two populations are. We are going to use the Dissimilarity Index.

The Dissimilarity Index is a measure of the evenness with which two groups are distributed across the geographic units that make up a larger area of study, measuring how similar or dissimilar two groups are with respect to their geographic spread within a larger region.

Dissimilarity values close to 1 indicate a high dissimilarity, and Dissimilarity values close to zero indicate low dissimilarity (high similarity) in the geographic spread of the two variables.
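The index described above is usually written D = 1/2 * sum over i of |a_i/A - b_i/B|. Here a_i and b_i are the two groups’ counts in area i, and A and B are their study-area totals. D can be read as the share of either group that would need to move for the two distributions to match. A minimal Python sketch follows, using hypothetical counts. We assume, but have not confirmed, that the AURIN tool uses this standard formula.

```python
def dissimilarity_index(group_a, group_b):
    """Index of dissimilarity for two lists of counts over the same areas."""
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a non-zero total count")
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(group_a, group_b))


# Hypothetical counts for three SA1s.
same_spread = dissimilarity_index([10, 20, 30], [1, 2, 3])
opposite = dissimilarity_index([30, 10, 0], [0, 10, 30])
print(round(same_spread, 2), round(opposite, 2))   # → 0.0 0.75
```

The function takes counts rather than percentages because the index compares each area’s share of each group’s total. This matches the instruction below to run the tool on population counts.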


Task: Merge your population count datasets together (you will run the calculations on population counts, not the percentages that you calculated for your populations).

Calculate the Dissimilarity Index for your two populations.

Enter the parameters as you see them below


Dissimilarity Index Parameters


Once your tool has run, click the Display button that pops up and have a look at the Dissimilarity Index value. 



This shows that there is considerable dissimilarity in the spatial distribution of Aboriginal and Torres Strait Islander and Vietnamese populations in our study area. What Dissimilarity Index value did you get?

Theme Two: Demographics and Socio-economics

Exercise One: Calculating Ethnic Diversity

In this first exercise, we will be calculating the ethnic diversity for SA1s in our study area.

For this exercise we will use the following datasets:

The following tools will be used in this exercise

The Diversity Index is a useful way of measuring the degree of specialisation or alternatively the degree of diversity across attributes within a spatial unit. This allows spatial units to be compared as to the mix of the characteristics being measured.

You can conceptually use the tool for a range of variables: socio-economic, economic, built form or demographic.

The diversity index for a particular region can range from 0 to 1. A score approaching 0 indicates an increasing degree of diversity, and a score approaching 1 indicates an increasing degree of specialisation (or homogeneity). Whether a region has a high or low diversity index is determined by comparing its score with the scores of all other regions.
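The tutorial does not give the formula behind the Diversity Index tool. One standard measure with exactly the behaviour described is the sum of squared category shares, also known as the Herfindahl-Hirschman or Simpson concentration index. It approaches 0 when people are spread over many categories and reaches 1 when everyone is in a single category. The Python sketch below uses that measure as an assumed stand-in, with hypothetical counts. It also returns 0 for an SA1 with no counts, to mimic the zero rows that are filtered out in the next step.

```python
def concentration_index(counts):
    """Sum of squared category shares.

    Returns 1.0 when all counts fall in one category, approaches 0 as
    counts spread evenly over many categories, and 0.0 for an empty row.
    """
    total = sum(counts)
    if total == 0:
        return 0.0        # no counts: nothing to measure
    return sum((c / total) ** 2 for c in counts)


# Hypothetical country-of-birth counts for three SA1s.
sa1_counts = {
    "specialised": [100, 0, 0, 0],
    "diverse": [25, 25, 25, 25],
    "empty": [0, 0, 0, 0],
}
scores = {name: concentration_index(c) for name, c in sa1_counts.items()}
kept = {name: s for name, s in scores.items() if s > 0}   # drop empty rows
print(scores)   # → {'specialised': 1.0, 'diverse': 0.25, 'empty': 0.0}
```

With k equally sized categories this measure bottoms out at 1/k, so a score "approaching 0" depends on having many categories, as the country-of-birth dataset does.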


Task: Calculate ethnic diversity in your study area by using the Diversity Index tool.

The parameters are shown below.

Remember to include the counts for every ethnic category (including Born Elsewhere), but don’t include Total Persons.

Remember to rename your dataset something meaningful.


Diversity Index Parameters


When your tool has finished running, open the dataset and have a look at the Diversity Index column. If you sort that column, you will see that a number of rows have 0 values, which means that no counts contributed to the calculation. These rows need to be removed.



Task: Using the Dataset Attribute Filter tool, filter your dataset so that you only have rows with a Diversity Index value greater than 0.

Parameters for this are shown below.

Remember to rename your dataset something meaningful


Dataset Attribute Filter Tool Parameters


We are now ready to visualise ethnic diversity across our area. 


Task: Create a choropleth of the Diversity Index across your study area. Given that lower Diversity Index values indicate higher levels of ethnic diversity, try reversing your palette in the parameter input so that your map makes sense



Key Research Questions:

Where is ethnic diversity in your study area concentrated?

Where is ethnic diversity lacking?

What are the potential relationships between ethnic diversity and socio-economic indicators?


 

Exercise Two: Examining Socio-Economic Indicators

In this second exercise, we will be examining the spatial distribution of socio-economic status across our study area. We will be comparing household-level and area-level metrics of advantage and disadvantage.

For this exercise we will use the following datasets:

The IHAD dataset contains four interesting columns – these are the proportions of households in each SA1 that are within each quartile of advantage or disadvantage for all households across Australia. Quartile 1 is most disadvantaged, Quartile 2 is middle disadvantaged, Quartile 3 is middle advantaged and Quartile 4 is most advantaged. Percentages higher than 25% indicate there are proportionally more households from that quartile in an SA1 compared to the country as a whole.

The SEIFA IRSAD dataset has one attribute – an index score – which characterises the whole SA1 as relatively more disadvantaged (scores less than 1000) or relatively more advantaged (scores greater than 1000).

For more information on these and other indicators, visit our tutorial on socio-economic indicators

The following tools will be used in this exercise

The first thing that we want to do is ‘eye-ball’ the distribution of socio-economic indicators across our study area


Task: Create a choropleth map for each of the four Quartiles in the IHAD dataset across your study area. Remember to choose different colour schemes, but choose the same number of classes and classifier system so that you can appropriately compare them. Create a choropleth of the IRSAD Index Score as well.


Your choropleths should look something like the images below


IHAD Quartile 1 Percentages


IHAD Quartile 2 Percentages


IHAD Quartile 3 Percentages


IHAD Quartile 4 Percentages


SEIFA IRSAD Index Score


Key Research Question: What is the spatial relationship, if any, between the IHAD quartile percentages? What about between quartile percentages and the SEIFA IRSAD Index Score?


We are now going to examine these relationships in a more statistical framework, using the Correlation Matrix Chart tool. Comprehensive documentation on correlation coefficient (r) values and the correlation matrix chart can be found here and here. We will need to begin by joining our datasets together.


Task: Merge your IHAD and SEIFA IRSAD datasets together.

Parameters for this are shown below.

Remember to rename your dataset something meaningful.

Run a Correlation Matrix Chart tool on the merged dataset.

Parameters for this are shown below


Merge Aggregated Datasets Parameter Input

 

Correlation Matrix Chart Parameter Input (1 of 2)

 

Correlation Matrix Chart Parameter Input (2 of 2)


Once your tool has run, click Display. This will open an image that should look something like the image below



The bottom left of the chart shows the correlation coefficient (r) values between the variables. The top right of the chart indicates what each relationship might look like as a ‘cloud’ of points on a scatter plot. Yellow indicates positive relationships and green indicates negative ones. The width of each ellipse indicates how strong (narrow) or weak (wide) the relationship is. We can see that the IHAD Quartiles 1 and 2 are moderately positively correlated, as are Quartiles 3 and 4. Quartile 1 and Quartile 4 are strongly negatively correlated. Quartiles 2 and 3 are very weakly negatively correlated. Quartile 1 is strongly negatively correlated with the SEIFA IRSAD Score, while Quartile 4 is strongly positively correlated with the SEIFA Index Score.
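The r values in the lower half of the chart are correlation coefficients. As a minimal sketch of what is being computed, the Python below builds a Pearson correlation matrix from hypothetical SA1 columns. The column names and values are illustrative only, and we assume the AURIN chart uses Pearson’s r rather than a rank-based coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise r for every pair of named columns."""
    return {a: {b: pearson_r(columns[a], columns[b]) for b in columns}
            for a in columns}


# Hypothetical values for four SA1s.
columns = {
    "ihad_q1": [40, 30, 20, 10],
    "ihad_q4": [5, 15, 25, 35],
    "irsad": [900, 950, 1020, 1080],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

The matrix is symmetric with 1s on the diagonal, which is why the chart only needs one half for the numbers and can use the other half for the ellipses.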


Key Research Question: What do you think Socio-economic Diversity in your area looks like?

What tools might you use to investigate this?

How do you think this might relate (or not) to the ethnic diversity patterns you uncovered in Exercise 1?


 

Exercise Three: Spatial Distribution of Age Groups

In this third exercise, we will be examining the spatial distribution of different age groups across our study area.

We will be using the following datasets

The following tools will be used in this exercise

The first thing we want to do is map the proportion of the population aged 15 to 24 years and the proportion aged 65 years and over. However, the dataset only gives population counts in smaller age ranges, so we will need to add some columns together and then calculate the proportions step by step (five sequential runs of the Generate tool):

  1. (15 to 19 + 20 to 24) = 15 to 24 Total
  2. (15 to 24 Total / Total Population) = 15 to 24 Proportion
  3. (65 to 74 + 75 to 84) = 65 to 84 Total
  4. (65 to 84 Total + 85 and Over) = 65 and Over Total
  5. (65 and Over Total / Total Population) = 65 and Over Proportion
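The five Generate steps are simple column arithmetic, and writing them out can help you document your workflow. The Python sketch below mirrors them for one hypothetical SA1. The field names and counts are invented for illustration. Note that the ABS Census five-year age bands run 15 to 19 and 20 to 24.

```python
# Hypothetical counts for one SA1; real values come from SA1-G01.
sa1 = {"age_15_19": 40, "age_20_24": 55, "age_65_74": 30,
       "age_75_84": 18, "age_85_over": 7, "total": 500}


def generate(record, new_field, formula):
    """Mimic one Generate step: return a copy with one derived field added."""
    out = dict(record)
    out[new_field] = formula(out)
    return out


# Each step may use fields created by the steps before it.
steps = [
    ("age_15_24_total", lambda r: r["age_15_19"] + r["age_20_24"]),
    ("age_15_24_prop", lambda r: r["age_15_24_total"] / r["total"]),
    ("age_65_84_total", lambda r: r["age_65_74"] + r["age_75_84"]),
    ("age_65_over_total", lambda r: r["age_65_84_total"] + r["age_85_over"]),
    ("age_65_over_prop", lambda r: r["age_65_over_total"] / r["total"]),
]

for field, formula in steps:
    sa1 = generate(sa1, field, formula)

print(sa1["age_15_24_prop"], sa1["age_65_over_prop"])   # → 0.19 0.11
```

Giving each intermediate field a descriptive name plays the same role as renaming each Generate output in AURIN: it keeps the chain of steps traceable.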

This involves a fair bit of “data wrangling” but this is a common component of any data processing, and it is good to get in the practice of documenting your work and good data management!


Task: Calculate the proportions of the 15 to 24 years and 65 years and over age groups for the SA1s in your study area, using the Generate tool in a step-wise manner.

Parameters for the first instance of this are shown below.

Remember to rename each step something meaningful so that you don’t lose track of your process!

Create a choropleth of these two proportions across your study area


What does your map tell you about the spatial distribution of your age groups across your study area?


Proportion of Population aged 15 to 24 years

Proportion of Population aged 65 and Over

 


Now we are going to run a tool to determine how dissimilar the distributions of our two populations are. We are going to use the Dissimilarity Index.

The Dissimilarity Index is a measure of the evenness with which two groups are distributed across the geographic units that make up a larger area of study, measuring how similar or dissimilar two groups are with respect to their geographic spread within a larger region.

Dissimilarity values close to 1 indicate a high dissimilarity, and Dissimilarity values close to zero indicate low dissimilarity (high similarity) in the geographic spread of the two variables.


Task: Merge your population count datasets together (you will run the calculations on population counts, not the percentages that you calculated for your populations).

Calculate the Dissimilarity Index for your two populations.

Enter the parameters as you see them below


Dissimilarity Index Parameters Input

 


This shows that there is some dissimilarity in the spatial distribution of the population aged 15 to 24 and the population aged 65 and over in our study area.

Theme Three: Employment, Income and Education

Exercise One: Labour Force Participation

In this first exercise, we will be calculating the proportion of the population who are employed, unemployed and not in the labour force.

The following tools will be used in this exercise

The first thing we need to do is calculate the proportions of the population that are employed, unemployed, or not in the labour force. It is important to draw a distinction between these last two. Unemployment counts individuals who are not employed but are looking for work (are in the labour force), while not in the labour force counts individuals who are not employed and are not looking for work (full time students, individuals on disability pension, etc)

We will need to use the Generate tool sequentially to calculate these proportions.


Task: Calculate the proportions of the population in the SA1s of your study area who are employed, unemployed and not in the labour force.

Use the Generate tool for each step.

Parameters for the first step are shown below.

Remember to use the output from the last workflow as your input for the next, so that you end up with the proportions in the same dataset. Create choropleths for each of the proportions across your study area.
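The distinction drawn above also explains why these proportions differ from the familiar unemployment rate. The unemployment rate divides the unemployed by the labour force (employed plus unemployed), not by the whole population. The Python sketch below computes both for one hypothetical SA1. We assume the "Persons Total Total" denominator may include people whose labour force status was not stated, so the three proportions need not sum to exactly 1.

```python
def labour_force_summary(employed, unemployed, not_in_lf, total):
    """Population shares plus the conventional unemployment rate.

    All arguments are counts for one SA1; `total` is the all-persons
    denominator (assumed to possibly include 'not stated').
    """
    if total <= 0:
        return None
    labour_force = employed + unemployed
    return {
        "prop_employed": employed / total,
        "prop_unemployed": unemployed / total,
        "prop_not_in_lf": not_in_lf / total,
        # The unemployment rate uses the labour force, not the population.
        "unemployment_rate": unemployed / labour_force if labour_force else None,
    }


# Hypothetical SA1: 20 of the 500 people did not state their status.
summary = labour_force_summary(employed=300, unemployed=20, not_in_lf=160, total=500)
print(summary)
```

Here only 4% of the population is unemployed, but the unemployment rate is 6.25%, because the 160 people not in the labour force are excluded from its denominator.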


Generate Tool Parameters


Your maps should look something like the images below


Proportion of Population Who are Employed


Proportion of Population who are Unemployed


Proportion of Population who are not in the Labour Force


We can see from these maps that there appears to be a spatial association between the proportion of the population who are unemployed and the proportion who are not in the labour force.

We can examine this relationship more closely by using the correlation tool to examine the strength and direction of these relationships.


Task: Run a correlation analysis on your dataset between the three variables that you have calculated.

Parameters for this are shown below


Correlation analysis parameter input


When you run the correlation tool, it produces an untidy tab delimited text file like the one shown below.



You can copy and paste this directly into Excel, to produce something that looks a little like the table below


Correlation values | Proportion Employed | Proportion Unemployed | Proportion not in Labour Force
Proportion Employed | 1 | -0.2201 | -0.7279
Proportion Unemployed | -0.2201 | 1 | 0.1236
Proportion not in Labour Force | -0.7279 | 0.1236 | 1

P values | Proportion Employed | Proportion Unemployed | Proportion not in Labour Force
Proportion Employed | NaN | 0 | 0
Proportion Unemployed | 0 | 1 | 0
Proportion not in Labour Force | 0 | 0 | 1

Our correlation analysis has shown that the proportion of employed people is moderately negatively correlated with the proportion of people who are unemployed, and strongly negatively correlated with proportion of people who are not in the labour force. The proportion of people who are unemployed is weakly (but still significantly) positively correlated with proportion of people who are not in the labour force.
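The p-values in the table test whether each r differs from zero. A common way to do this is the t-statistic t = r * sqrt((n - 2) / (1 - r^2)). The Python sketch below computes it and, as a simplification, approximates the two-sided p-value with the standard normal distribution, which is reasonable only when n is large. The sample sizes are hypothetical, since the number of SA1s in the study area is not stated. We do not claim this is exactly how the AURIN tool calculates its p-values.

```python
import math


def correlation_significance(r, n):
    """t-statistic for H0: rho = 0, with a large-sample normal p-value."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Two-sided p-value from the standard normal (large-n approximation).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# r from the table above, paired with hypothetical sample sizes.
t_big, p_big = correlation_significance(0.1236, n=2000)
t_small, p_small = correlation_significance(0.1236, n=20)
print(round(t_big, 2), p_big < 0.001, round(p_small, 2))
```

The same weak r is highly significant with thousands of areas but not with twenty. This is how a correlation of about 0.12 can be "weak but still significant".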


Extension Exercise: Can you calculate the proportion of the 15 to 24 year old population in each of these groups? Do these rates differ across your study area from those of the overall population?


 

Exercise Two: Income Patterns

In this second exercise, we will be examining the distribution of median income across our study area. We will use this data to learn how to create a Moran’s I scatterplot and undertake a spatial clustering analysis (Local Indicators of Spatial Autocorrelation – LISA).

For this exercise we will use the following datasets:

The following tools will be used in this exercise

Let’s start by mapping our median income variable across our study area.


Task: Create a choropleth of median total household weekly income for your study area


Your map should look something like the image below



We are going to undertake an analysis to identify the extent to which income levels are clustered in our study area, starting with a Moran’s I scatterplot. Moran’s I is explained more comprehensively below; you can also read about it here, along with information about clustering or spatial autocorrelation more broadly here.

Moran’s I

Moran’s I is a summary statistic indicating how much a variable or attribute is clustered together (or spatially autocorrelated) across a study area. Moran’s I takes the value of a variable for a particular unit (in our case, SA1s), calculates the average value of that variable across all of the neighbouring units, and then checks whether the two are related. In this instance, it takes the median income in an SA1 and the average median income across all of its neighbouring SA1s, and checks whether high incomes tend to be surrounded by high average incomes, and low by low. Moran’s I values generally fall between -1 and +1. Negative values mean that SA1s with high incomes tend to be surrounded by SA1s with lower incomes, while positive values indicate that high values are clustered with high, and low with low. Values close to zero indicate neither clustering nor repulsion.

You can illustrate your Moran’s I relationship on a scatterplot.

So let’s calculate our Moran’s I scatterplot.

First, we need to turn our dataset into a special file format called a shapefile


Task: Using the Spatialise Aggregated Dataset tool, convert your dataset into a shapefile.

Parameters for this are shown below.

Remember to name your dataset something meaningful.


Spatialise Aggregated Dataset Tool Parameters

 


Now that we’ve created our spatialised dataset, we need to make one more type of file: a spatial weights matrix.

Recall that Moran’s I is calculated by taking the average of ‘neighbouring’ units, but we have to specify what a neighbouring unit actually is, and, believe it or not, there are multiple ways of doing this.

We are going to create the simplest type – a contiguous spatial weights matrix
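AURIN builds the contiguity matrix from the real SA1 polygon boundaries. As a simplified, hypothetical stand-in, the sketch below uses a regular grid of square cells, where the two common contiguity rules are easy to see: rook neighbours share an edge, queen neighbours share an edge or a corner. It also row-standardises the weights so that each unit’s weights sum to 1, which turns the weighted sum of neighbours into the neighbour average described above.

```python
def grid_contiguity(n_rows, n_cols, rule="queen"):
    """Contiguity weights for a regular grid of square cells (a stand-in for polygons).

    rule="rook":  cells sharing an edge are neighbours (up to 4)
    rule="queen": cells sharing an edge or a corner are neighbours (up to 8)
    Cells are indexed row by row: index = r * n_cols + c.
    """
    if rule == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    weights = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs = {}
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:   # stay inside the grid
                    nbrs[rr * n_cols + cc] = 1.0
            weights[r * n_cols + c] = nbrs
    return weights


def row_standardise(weights):
    """Scale each unit's weights to sum to 1, so weighted sums become neighbour averages."""
    out = {}
    for i, nbrs in weights.items():
        total = sum(nbrs.values())
        out[i] = {j: w / total for j, w in nbrs.items()} if total else {}
    return out


# In a 3 x 3 grid the centre cell (index 4) has 8 queen neighbours but only 4 rook neighbours.
print(len(grid_contiguity(3, 3, "queen")[4]), len(grid_contiguity(3, 3, "rook")[4]))
```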


Task: Create a Contiguous Spatial Weights Matrix for your spatialised dataset.

Enter the parameters as you see them below.

Try to avoid opening this dataset when you create it because it is quite large!

Remember to rename this something meaningful


Contiguous Spatial Weights Matrix Tool Parameters


Now we are finally ready to generate a Moran’s I scatterplot for median total household weekly income in our study area.


Task: Generate a Moran’s I scatterplot for the median total household weekly income in your study area’s SA1s

Enter the parameters as you see them below.


Moran’s I Scatter Plot Parameter Input


Once your tool has run, click on the Display button in the pop-up dialogue box. Your output should look something like the image below.



Your output is a scatterplot divided into four quadrants. You can see that the trend line slopes very slightly upwards, with most points spread evenly throughout the graph. This, together with the Moran’s I value of 0.025, indicates that median income is only very weakly positively spatially autocorrelated (clustered) in our study area.
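The scatterplot and the Moran’s I value are closely linked. Each point plots a unit’s standardised value (x-axis) against the weighted average of its neighbours’ standardised values, the spatial lag (y-axis). When the weights are row-standardised and every unit has at least one neighbour, the slope of the fitted line equals global Moran’s I. The sketch below is a plain-Python illustration of that relationship under those assumptions, using a hypothetical four-unit example.

```python
def moran_scatter(values, weights):
    """Return (z, lag, slope) for a Moran's I scatterplot.

    Assumes `weights` ({i: {j: w_ij}}) is row-standardised and that every unit has
    at least one neighbour; under those assumptions the slope equals global Moran's I.
    """
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    z = [(v - mean) / sd for v in values]                    # x-axis: standardised value
    lag = [sum(w * z[j] for j, w in weights[i].items())      # y-axis: neighbour average
           for i in range(n)]
    # OLS slope of lag on z; z has mean 0, so the intercept term drops out
    slope = sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)
    return z, lag, slope


# Four hypothetical units in a row, with row-standardised weights.
w = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {2: 1.0}}
z, lag, slope = moran_scatter([1, 2, 3, 4], w)
print(round(slope, 3))   # 0.4
```

Points in the top-right and bottom-left quadrants push the slope up (high with high, low with low); points in the other two quadrants pull it down. A cloud spread evenly over all four quadrants, as in our output, gives a slope close to zero.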

This doesn’t mean that income doesn’t exhibit spatial clustering elsewhere, or even in our area, only that the methods we have chosen don’t detect a particularly strong pattern in our study area. This is why it is important to consider your choice of statistical tests within the context of your understanding of an area, together with the impact of spatial scale, or sample size, on your methods!

Moran’s I is a summary statistic, so it doesn’t give us much of a picture. Specifically, it doesn’t tell us where the clusters of high with high, and low with low, are located, or where high values sit next to low values and vice versa. We want to find the hot spots and cold spots of median income and look at them on the map.

To do this, we are going to run a Local Moran’s I analysis. Essentially, we are going to generate a dataset that we can map, showing clusters of high income, clusters of low income, areas where high income is surrounded by low income (and vice versa), and areas where there is no statistically significant clustering.


Task: Run the Local Moran’s I tool for median income in your study area’s SA1s

Enter the parameters as you see them below.

Remember to rename this dataset something meaningful


Local Moran’s I Parameter Input


Once the tool has finished running, two datasets will be created. We are not interested in the first one, Output: MoranI-WorkflowXXX. The second one, Output: LocalI-WorkflowXXX, is the one we want. Open the dataset and have a look at the columns. The one we are particularly interested in is called median_tot_hhd_inc_weekly_map_group_name. You will see that some rows are labelled Non Significant, and others High-High, Low-Low, High-Low or Low-High. We can map this variable and look at the High-High (hot spot) and Low-Low (cold spot) areas of median income in our study area. High-Low and Low-High areas are also interesting because they might point to areas of diversity.
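The labels in this column come from two ingredients: which quadrant of the Moran scatterplot a unit falls in, and whether its local Moran’s I is statistically significant. The sketch below shows one common way to produce such labels, using a conditional permutation test for significance. It is a plain-Python illustration under stated assumptions (row-standardised weights, every unit having at least one neighbour, values not all equal), not AURIN’s implementation; its permutation count, significance level and one-sided test are choices of this sketch that the tool may make differently.

```python
import random


def local_moran_labels(values, weights, permutations=999, alpha=0.05, seed=0):
    """Classify each unit as High-High, Low-Low, High-Low, Low-High or Non Significant.

    values:  attribute values, one per unit (e.g. SA1 median income)
    weights: row-standardised weights {i: {j: w_ij}}, every unit with >= 1 neighbour
    Significance: conditional permutation test, holding unit i's value fixed and
    drawing its "neighbours" at random from all the other units.
    """
    rng = random.Random(seed)                                # fixed seed: repeatable results
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                           # deviations from the mean
    m2 = sum(d * d for d in z) / n                           # variance (values not all equal)
    labels = []
    for i in range(n):
        ws = list(weights[i].values())
        lag = sum(w * z[j] for j, w in weights[i].items())   # weighted neighbour average
        local_i = z[i] * lag / m2                            # local Moran's I for unit i
        others = z[:i] + z[i + 1:]
        extreme = 0
        for _ in range(permutations):
            fake = rng.sample(others, len(ws))               # random pseudo-neighbours
            sim = z[i] * sum(w * s for w, s in zip(ws, fake)) / m2
            # count simulations at least as extreme, in the direction of the observed value
            extreme += (sim >= local_i) if local_i >= 0 else (sim <= local_i)
        p = (extreme + 1) / (permutations + 1)               # pseudo p-value
        if p > alpha:
            labels.append("Non Significant")
        else:
            own = "High" if z[i] > 0 else "Low"
            nbr = "High" if lag > 0 else "Low"
            labels.append(f"{own}-{nbr}")
    return labels
```

With only a handful of units, almost nothing can reach significance, which is one reason results on real data depend on having enough SA1s and on the weights matrix you choose.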


Task: Create a Choropleth of your median_tot_hhd_inc_weekly_map_group_name variable across your study area


Your map should look something like the map below



Dark green areas indicate clusters of low income surrounded by low income, while light green indicates low income surrounded by high. Light blue indicates clusters of high income surrounded by high income, while dark blue indicates areas of high income surrounded by low. Pink indicates areas with no statistically significant relationship.


Extension Exercise: Try changing some of the parameters when generating your spatial weights matrix and see whether they affect your results. For example, what happens if you choose rook instead of queen contiguity? Or second order instead of first order? What if you choose a completely different method of calculating your matrix (such as distance or distance decay)?
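If you try the distance-based options, it may help to know roughly what they do. The sketch below illustrates the two general ideas from hypothetical centroid coordinates: a distance band, where any unit within a threshold distance counts as a neighbour, and inverse-distance decay, where closer units get larger weights. AURIN’s own parameter names and defaults may differ; this is an illustration, not the tool.

```python
import math


def distance_weights(points, threshold=None, power=None):
    """Distance-based spatial weights from unit centroids.

    points:    list of (x, y) centroids in projected units (e.g. metres); assumed distinct
    threshold: if given, only units within this distance are neighbours (distance band)
    power:     if given, weight = 1 / d**power (inverse-distance decay); otherwise weight = 1
    """
    weights = {}
    for i, (xi, yi) in enumerate(points):
        nbrs = {}
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            if threshold is not None and d > threshold:
                continue                                     # outside the distance band
            nbrs[j] = 1.0 / d ** power if power else 1.0
        weights[i] = nbrs
    return weights


pts = [(0, 0), (1, 0), (3, 0)]
print(distance_weights(pts, threshold=1.5))   # the unit at x = 3 has no neighbours
print(distance_weights(pts, power=1)[0])      # nearer units get larger weights
```

Note that a distance band can leave isolated units with no neighbours at all, which is one way your choice of matrix can change the Moran’s I results.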


 

Exercise Three: Education Inequality

In this third exercise, we will be calculating equity of access to schools in the Greater Brisbane region. 


Task: You will need to change your area selection for this exercise to: Greater Brisbane (2016) gccsa_2016/3BRI



We will be using the following datasets in this analysis

The following tools will be used in this exercise

We are going to look at the spatial distribution of government schools across the Greater Brisbane area relative to population, and determine whether there is an inequity of access to government schools, that is, whether certain areas have greater access to government schools relative to their population size.

We are going to use the Gini coefficient to determine this. We have some comprehensive documentation on the Gini coefficient in the link above, so we won’t go into it in great detail here; suffice to say that it ranges from 0 to 1. Gini coefficients close to 0 indicate less inequality, that is, resources are more equally distributed across a population, while Gini coefficients closer to 1 indicate that resources are concentrated in a smaller part of the total population.
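The exact formula behind AURIN’s Gini Coefficient tool isn’t given here, so the sketch below uses one common formulation, the Lorenz-curve (Brown) formula, applied to schools relative to population. Areas are sorted by schools per person, and the coefficient measures how far the cumulative share of schools falls below the cumulative share of population. The input values are hypothetical, and the sketch assumes every area has a positive population.

```python
def gini(populations, resources):
    """Gini coefficient of a resource (e.g. schools) relative to population.

    Areas are sorted by resource per person, then
        G = 1 - sum_k (X_k - X_{k-1}) * (Y_k + Y_{k-1})
    where X and Y are the cumulative shares of population and resource.
    Assumes every area has a population greater than zero.
    """
    areas = sorted(zip(populations, resources), key=lambda pr: pr[1] / pr[0])
    total_pop = sum(populations)
    total_res = sum(resources)
    g = 1.0
    x_prev = y_prev = 0.0
    for pop, res in areas:
        x = x_prev + pop / total_pop                 # cumulative population share
        y = y_prev + res / total_res                 # cumulative resource share
        g -= (x - x_prev) * (y + y_prev)             # trapezoid under the Lorenz curve
        x_prev, y_prev = x, y
    return g


print(gini([1, 1, 1, 1], [1, 1, 1, 1]))   # 0.0: schools exactly in proportion to population
print(gini([50, 50], [0, 10]))            # 0.5: one of two equal areas has every school
```

With only a few areas the coefficient cannot reach 1 (for two areas it tops out at 0.5), which is worth keeping in mind for the extension exercise on spatial scale.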

First of all have a look at the spatial distribution of the schools across your study area.


Task: Create a map of the schools across your study area


Your map should look something like the map below



We want to count the number of schools in each of our spatial units. In this instance our spatial units are going to be SA3s, so let’s create a map of these to understand what they look like.


Task: Create a choropleth of your SA3s by population size


Your map should look like the image below



We would like to count the number of schools inside each of these SA3s. To do this, we must first create a shapefile from our SA3 dataset using the Spatialise Aggregated Dataset tool.


Task: Spatialise your SA3 Aggregated Population & Dwelling Counts 2016 Census for Australia dataset using the Spatialise Aggregated Dataset tool.

Enter the parameters as you see them below.

Remember to rename your dataset something meaningful


Spatialise Aggregated Dataset Parameters Input


Now we can count the number of schools in each of these SA3s


Task: Using the Count Points in Polygons tool, count the number of schools that are within each of the SA3s.

Parameters for this are shown below.

Remember to name your dataset something meaningful


Count Points in Polygon Parameter Input


If you open the resultant dataset, you can see the School_Count column, which holds the number of schools in each SA3.
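Counting points in polygons comes down to a point-in-polygon test for each school and each SA3. The sketch below uses the classic ray-casting test: a point is inside a polygon if a ray drawn from it crosses the boundary an odd number of times. It is a simplified illustration with hypothetical square "SA3s". It assumes simple polygons with no holes, and points lying exactly on a boundary may be assigned either way, so it is not a substitute for the Count Points in Polygons tool on real boundaries.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon` (a list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_polygons(points, polygons):
    """Return {polygon name: number of points inside}, e.g. schools per SA3."""
    return {name: sum(point_in_polygon(px, py, poly) for px, py in points)
            for name, poly in polygons.items()}


sa3s = {"A": [(0, 0), (1, 0), (1, 1), (0, 1)], "B": [(1, 0), (2, 0), (2, 1), (1, 1)]}
schools = [(0.5, 0.5), (1.5, 0.5), (1.2, 0.8), (3, 3)]
print(count_points_in_polygons(schools, sa3s))   # {'A': 1, 'B': 2}
```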



Now we are ready to calculate our Gini coefficient to see whether there is an inequity in access to government schools across our study area.


Task: Calculate the Gini Coefficient.

Enter your parameters as you see them below


Gini Coefficient Parameter input


Once your tool has run, click Display to see the output



We have a Gini coefficient of around 0.26. This indicates that there is some inequality in access to government schools, though it could be substantially higher!


Extension Exercise: What do you think would happen if you went to a larger or smaller spatial scale? Have a try, and see what impact it has on the Gini coefficient values – what do you think that means for the reliability of the index?


 

Theme Four: Transport and Movement


Exercise One: Journey to Work

In this first exercise, we will be calculating the proportions of people who travel to work by different modes

For this exercise we will use the following datasets:

The following tools will be used in this exercise


Task: Create choropleth maps of your journey to work counts for a couple of the different modes


Your maps should look something like the images below


Bicycle


Bus


Car (Driver)


Car (Passenger)


Ferry


Train


Walked


These maps show clearly different spatial patterns, but to interpret them correctly we need to convert the counts into proportions of the population, that is, standardise the counts by total population size.


Task: Calculate the proportions of population using each of the modes you have chosen using the Generate tool.

Parameters for this are shown below

You should rename these datasets something meaningful


Generate Tool parameters

 


If you open the resultant dataset, you might notice that some of the proportions are greater than 1. This is due to the randomisation of very low counts, which means that in some SA1s the total population count is lower than the randomised count of people who used that mode. You should remove those SA1s using the Dataset Attribute Filter tool.
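The Generate and Dataset Attribute Filter steps together amount to "divide, then drop impossible values". The sketch below shows that logic in plain Python on hypothetical rows; the column names (sa1, bus, total) are placeholders, not the real dataset’s attribute names. Skipping rows with a zero total is an added assumption to avoid dividing by zero.

```python
def mode_proportions(rows, mode_col, total_col):
    """Add a proportion column and drop rows where it exceeds 1.

    rows: list of dicts, one per SA1 (column names here are hypothetical)
    Rows with a zero total are skipped to avoid dividing by zero (an assumption).
    """
    kept = []
    for row in rows:
        if row[total_col] == 0:
            continue
        prop = row[mode_col] / row[total_col]
        if prop > 1:                      # spurious value caused by randomised small counts
            continue
        kept.append({**row, mode_col + "_prop": prop})
    return kept


rows = [{"sa1": "a", "bus": 10, "total": 100},
        {"sa1": "b", "bus": 4, "total": 3},      # randomised count exceeds the total
        {"sa1": "c", "bus": 0, "total": 0}]
print(mode_proportions(rows, "bus", "total"))   # only SA1 "a" survives, with bus_prop 0.1
```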


Task: Using the Dataset Attribute Filter tool, remove the rows where your mode’s proportion column is greater than 1


Dataset Attribute Filter tool parameters

 


Once you have filtered the dataset, open the new dataset and you will see that the spurious values have been removed. You are now ready to visualise your proportional data.


Task: Create a choropleth map for each of your selected modes. 


Your maps should look something like the maps below


 

Bicycle


Bus


Car (Driver)


Car (Passenger)


Ferry


Train


Walked


Key Research Questions

 

How do the different modes shift across the study area?

What are some of the variables which might be associated with these different patterns?

What difference did switching to proportions make for each of the modes? Which stayed similar and which changed drastically?


 

Exercise Two: Walkability Analyses

In this second exercise, we will be undertaking a neighbourhood walkability analysis


You will need to change your area selection for this exercise to: Springwood – Kingston Greater Brisbane (sa3_2016/31106)



For this exercise we will use the following datasets:

The following tools will be used in this exercise


First of all, let’s change our base map to something dark so that our datasets will show up (shown below).



We now want to ‘eyeball’ our street network dataset to see what it looks like – this is the network that we will be measuring our walkability along. To view this dataset, click the spanner next to the dataset and click Display on Map and accept the parameters (shown below)



Your dataset should look something like the map below



We can see from this that we have a large motorway through our area that obviously doesn’t represent the walkable network. We need to remove components of the dataset that are named motorway and motorway_link in the Highway column


Task: Using the Dataset Attribute Filter tool, filter your dataset so that you remove parts of the network that are named motorway or motorway_link. 

Parameters for this are shown below.

You will need to run the tool twice, using the output of the first step as the input for the second step

Remember to rename your dataset something meaningful


Dataset Attribute Filter Parameter Input


After each step, your map should look something like the maps below



You can see that the parts of the network around the motorway successively disappear

Now display your toilets on the map in the same way – this will show up like the image below



You might also want to have a look at what the land use categories of your meshblocks look like. To do this, create a choropleth of your meshblock dataset, choosing “meshblock Category” as your attribute. This will automatically choose Preclassified as your breaks. Select a Qualitative palette type, so that each land use gets a discrete colour. It should look something like the map below



Measuring Walkability around Points

We will be using the Walkability Index with Gross Density (Points) tool

This tool is a “sandwich with the lot” combining the three composite elements of the built urban form – connectivity, land use mix and population density – to provide you with a comprehensive walkability index for your areas in a single step.


Task: Run the Walkability Index with Gross Density (Points) tool for your datasets, calculating the walkability around each of the toilets in your study area.

Enter your parameters as shown below. 


Walkability Index with Gross Density (Points) Parameters Input


Once your tool has run, click on the Display button to bring up the output of the tool. This is a table with a large amount of information about each of the catchments around the toilets in the analysis (shown below). The columns are explained in some detail under the image.



  • Connectivity: The total number of connections per square kilometre
  • Area: The total area in square metres of each walking catchment
  • Connections: The total number of connections in each walking catchment
  • LUM_X: The total square metres of each land use falling within each walking catchment
  • LandUseMixMeasure: This is an ‘entropy measure’, measuring the extent to which there is an equal distribution of each land use within the catchments. Values of the land use mix range from 0 (the lowest mix) to 1 (the highest possible mix)
  • AverageDensity: The average population per hectare for each of the catchments.
  • XXX_ZScore: These are the scores for the three different components (connectivity, land use and average density) converted into Z scores, where the mean across the catchments is zero, and the numbers indicate how many standard deviations each score is above or below the mean. Essentially, the more positive the number, the better the relative score for that attribute, and the more negative the number, the worse the relative score for that attribute. We recommend that you make sure you have a relatively large number of observations (a minimum of 30) before using Z scores in any discussion, as they rely on robust mean and standard deviation calculations, which are less reliable at smaller sample sizes.
  • SumZScore: This is the final Walkability Index for your catchments, and represents the sum of the three component Z scores
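To see how these columns fit together, the sketch below recomputes a SumZScore-style index from hypothetical catchment measurements: connectivity per square kilometre, an entropy land use mix scaled to 0 to 1, population density per hectare, and the sum of the three Z scores. It assumes the catchments have already been built and measured. It uses natural-log entropy and the population standard deviation, which are choices of this sketch; AURIN’s tool may compute these details differently.

```python
import math
import statistics


def land_use_mix(areas):
    """Entropy land use mix: 0 = a single land use, 1 = an equal share of every land use.

    areas: square metres of each land use category in the catchment
    (k = number of categories considered, including those with zero area).
    """
    total = sum(areas)
    k = len(areas)
    if total == 0 or k < 2:
        return 0.0
    entropy = -sum((a / total) * math.log(a / total) for a in areas if a > 0)
    return entropy / math.log(k)


def z_scores(xs):
    """Standardise: (x - mean) / standard deviation (assumes the values are not all equal)."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def walkability(catchments):
    """Sum of the Z scores for connectivity, land use mix and density."""
    conn = [c["connections"] / (c["area_m2"] / 1e6) for c in catchments]   # per km²
    lum = [land_use_mix(c["land_use_m2"]) for c in catchments]
    dens = [c["population"] / (c["area_m2"] / 1e4) for c in catchments]    # persons per ha
    return [a + b + d for a, b, d in zip(z_scores(conn), z_scores(lum), z_scores(dens))]


catchments = [  # hypothetical catchments, each 1 km² in area
    {"connections": 50, "area_m2": 1e6, "land_use_m2": [1, 1, 1], "population": 500},
    {"connections": 20, "area_m2": 1e6, "land_use_m2": [3, 0, 0], "population": 100},
    {"connections": 35, "area_m2": 1e6, "land_use_m2": [2, 1, 0], "population": 300},
]
print(walkability(catchments))   # the first catchment scores highest on all three components
```

Because each component is standardised, a catchment’s index only says how walkable it is relative to the other catchments in the same analysis.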

We will now take a look at the distribution of the Walkability Index across our study areas.


Task: Create a choropleth of the SumZScore, choosing a Diverging palette type (such as Spectral or Red to Blue) with an odd number of classes so that the middle colour represents the mean values.


It should look something like the image below. If you hover over each of the toilets, you can see its individual attributes and determine which of the different components let down or improved its overall walkability index.



 

Exercise Three: Oil Vulnerability and Housing Costs

In this third exercise, we will be examining the spatial and statistical relationship between oil vulnerability, disadvantage, and housing costs

We will be using the following datasets in this analysis

The following tools will be used in this exercise

The first thing that we want to do is look at the spatial distribution of some of these variables across our study area. However, they have different spatial extents, because the VAMPIRE index only covers the UCL of Brisbane and doesn’t cover the whole of the country like IHAD or the selected medians datasets.

The VAMPIRE index is a measure of vulnerability to oil price shifts, taking into account both housing and transport variables. High VAMPIRE Index scores indicate higher relative oil vulnerability in an SA1, while low scores represent lower relative vulnerability.

We will need to successively merge our datasets so that they are filtered to only have the rows present in the VAMPIRE index. Once we have done that, we can map some of our variables
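Conceptually, each merge step here behaves like an inner join on the SA1 code: only SA1s present in both datasets are kept, so merging everything with the VAMPIRE dataset restricts the result to the area the VAMPIRE index covers. The sketch below illustrates that idea in plain Python; it is not the Merge Aggregated Datasets tool, and the column names (sa1, vampire, ihad_q1) are hypothetical.

```python
def inner_merge(left, right, key):
    """Keep only rows whose key appears in both datasets, combining their columns.

    left, right: lists of dicts (one per SA1); key: the shared SA1 code column.
    """
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


vampire = [{"sa1": "1", "vampire": 10}, {"sa1": "2", "vampire": 12}]
ihad = [{"sa1": "1", "ihad_q1": 0.3}, {"sa1": "3", "ihad_q1": 0.1}]
print(inner_merge(vampire, ihad, "sa1"))   # only SA1 "1" is in both datasets
```

To merge successively, pass the output of one call as the `left` input of the next, just as the task below feeds the output of the first merge into the second.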


Task: Merge your datasets successively using the Merge Aggregated Datasets tool.

Parameters for this are shown below.

Use the output of the first step as the input for the second step

Remember to rename your datasets something meaningful.

Create a choropleth for three of the variables: VAMPIRE Index, median monthly mortgage repayment, and the percentage of households in IHAD Quartile 1


Your maps should look like the images below.


VAMPIRE Index


 

Median Monthly Mortgage Repayments ($/Month)


IHAD Quartile 1 Percentages


Do you think there is a relationship between these variables? As a first pass, we are going to use the Scatterplot Matrix tool to look at the relationships between the distributions of values for the three variables. This tool produces a static image which can be put into a report.


Task: Create a Scatterplot Matrix using the Scatterplot Matrix Tool.

Parameters for this are shown below.


Scatterplot Matrix Parameter Inputs (1/2)

 

Scatterplot Matrix Parameter Inputs (2/2)


Once your tool has finished running, click the Display button to view your output. It should look something like the image below



Do any clear relationships jump out at you? It looks like we might need to dive deeper into the analytics to see whether there is a relationship between our variables.

To this end, we will run a correlation analysis on these variables to determine their correlation coefficient (r) values.
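For reference, the sketch below computes the Pearson correlation coefficient r and the t statistic commonly used to test whether r differs from zero (with n - 2 degrees of freedom). That the correlation tool reports Pearson’s r is an assumption of this sketch. Turning t into a p-value needs the t distribution, which Python’s standard library doesn’t provide; `scipy.stats.pearsonr` returns both r and a p-value if SciPy is available. With thousands of SA1s, even a modest r gives a very large t, which is why the p-values in the table below round to 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


print(pearson_r([1, 2, 3], [3, 2, 1]))   # -1.0: a perfect negative relationship
print(t_statistic(0.6, 18))              # about 3.0
```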


Task: Run a correlation analysis on your dataset between the three variables that you have calculated.

Parameters for this are shown below


Correlation Tool Parameter Input (1/2)

 

Correlation Tool Parameter Input (2/2)


When you run the correlation tool, it produces an untidy tab-delimited text file like the one shown below.



You can copy and paste this directly into Excel, to produce something that looks a little like the table below


Correlation Values | VAMPIRE Index | Med. Mthly Mort. ($/mon) | % IHAD Q 1
VAMPIRE Index | 1 | -0.1138 | -0.2746
Med. Mthly Mort. ($/mon) | -0.1138 | 1 | -0.6043
% IHAD Q 1 | -0.2746 | -0.6043 | 1

P Values | VAMPIRE Index | Med. Mthly Mort. ($/mon) | % IHAD Q 1
VAMPIRE Index | NaN | 0 | 0
Med. Mthly Mort. ($/mon) | 0 | NaN | 0
% IHAD Q 1 | 0 | 0 | NaN

Our correlation analysis has shown that the VAMPIRE index is negatively associated with both the percentage of households in IHAD Quartile 1 (r = -0.27) and median monthly mortgage repayments (r = -0.11). This means that as either of these increases, the VAMPIRE index tends to decrease, although both associations are weak.


Extension Exercise: How do we explain this particular pattern, which seems paradoxical? Have a look at your specific areas of disadvantage and their VAMPIRE scores


 

Theme Five: Housing, Health, and Wellbeing


Exercise One: Calculating Dwelling Diversity

In this first exercise, we will be calculating dwelling type diversity for SA1s in our study area.

For this exercise we will use the following datasets:

The following tools will be used in this exercise

The Diversity Index is a useful way of measuring the degree of specialisation or alternatively the degree of diversity across attributes within a spatial unit. This allows spatial units to be compared as to the mix of the characteristics being measured.

You can conceptually use the tool for a range of variables: socio-economic, economic, built form or demographic.

The diversity index for a particular region can range from 0 to 1, where a score approaching 0 indicates an increasing degree of diversity and a score approaching 1 indicates an increasing degree of specialisation (or homogeneousness). Determination of whether a region has a high or low diversity index is done by comparing the diversity index scores across all regions
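AURIN’s documentation defines the exact formula the Diversity Index tool uses. One index with the properties described above (approaching 0 for an even spread across many categories, equal to 1 when everything falls in one category) is the sum of squared category shares, a Herfindahl-Hirschman style measure. The sketch below assumes that formulation and uses hypothetical dwelling counts. Returning 0 when there are no counts is a convention chosen to mirror the zero rows described in this exercise.

```python
def diversity_index(counts):
    """Sum of squared category shares (Herfindahl-Hirschman style).

    Approaches 0 as counts spread evenly over many categories (more diverse) and
    equals 1 when every count falls in one category (fully specialised).
    Returns 0.0 when there are no counts at all (a convention of this sketch).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts)


# Hypothetical dwelling counts per category for three SA1s
print(diversity_index([40, 0, 0, 0]))    # 1.0: every dwelling is the same type
print(diversity_index([10, 10, 10, 10])) # 0.25: an even mix of four types
print(diversity_index([0, 0, 0, 0]))     # 0.0: no dwellings counted at all
```

Under this formulation, a real SA1 with at least one dwelling can never score exactly 0, which is why zero values are a reliable flag for rows with no contributing counts.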


Task: Calculate dwelling type diversity in your study area by using the Diversity Index tool.

The parameters are shown below.

Remember to include the dwelling categories, but don’t include Total Dwellings

Remember to rename your dataset something meaningful.


Diversity Index Parameters


When your tool has finished running, open the dataset and have a look at the Diversity Index column. If you sort that column, you will see that a number of the rows have 0 values, which means that there were no counts contributing to the calculations. These rows need to be removed.



Task: Using the Dataset Attribute Filter tool, filter your dataset so that you only have rows with a Diversity Index value greater than 0.

Parameters for this are shown below.

Remember to rename your dataset something meaningful


Dataset Attribute Filter Tool Parameters


We are now ready to visualise dwelling type diversity across our area. 


Task: Create a choropleth of the Diversity Index across your study area. Given that lower Diversity Index values indicate higher levels of dwelling type diversity, try reversing your palette in the parameter input so that your map makes sense



Key Research Questions:

Where is dwelling type diversity in your study area concentrated?

Where is dwelling diversity lacking?

What are the potential relationships between dwelling type diversity and socio-economic indicators? Age of housing stock? Land values?


 

Exercise Two: Tenure Type and Disability

In this second exercise, we will be examining the potential relationship between the proportion of households in social housing, and the proportion of the population who require assistance with their daily core activity.

For this exercise we will use the following datasets:

Specifically, we are going to be using the dataset that you generated and filtered.

The following tools will be used in this exercise

The first thing we need to do is calculate the proportions of households that are in social housing, and the proportions of the population that require assistance with their core activities.

We will need to use the Generate tool sequentially to calculate these proportions: 

  1. (Rent State or Territory + Rent Community or Church) = Rent Social Counts*
  2. (Rent Social Counts / Total Households) = Rent Social Prop*
  3. (Core Assistance Counts / Total Persons) = Core Assistance Prop

*The output dataset from step one needs to be used as the input dataset for step two

Then you will need to merge these into one dataset:

  1. Rent Social Prop + Core Assistance Prop = All Proportions
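The three Generate steps above can be written out for a single SA1 row as follows. This is only a sketch of the arithmetic, not the Generate tool; the column names are hypothetical placeholders for the real attribute names, and the sketch assumes the totals are greater than zero.

```python
def tenure_disability_props(row):
    """Apply the three Generate steps to one SA1 row (column names are hypothetical)."""
    rent_social = row["rent_state_territory"] + row["rent_community_church"]   # step 1
    rent_social_prop = rent_social / row["total_households"]                    # step 2
    core_assist_prop = row["core_assistance"] / row["total_persons"]            # step 3
    return {"sa1": row["sa1"],
            "rent_social_prop": rent_social_prop,
            "core_assistance_prop": core_assist_prop}


row = {"sa1": "x", "rent_state_territory": 8, "rent_community_church": 2,
       "total_households": 100, "core_assistance": 15, "total_persons": 300}
print(tenure_disability_props(row))   # rent_social_prop 0.1, core_assistance_prop 0.05
```

Returning both proportions in one row corresponds to the final merge step, which combines them into a single All Proportions dataset.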

Task: Calculate the proportions of households in social housing and the proportion of population requiring core assistance

Use the Generate tool for each step.

Parameters for the first step are shown below.

Remember to name your datasets appropriately.

Merge your datasets together.

Parameters for the merge are shown below

Create choropleths for each of the proportions across your study area.


Generate Tool Parameters

 

Merge Parameters Input


Your maps should look something like the images below


Proportion of Households in Social Housing


Proportion of Population requiring Core Assistance


We can see from these maps that there appears to be a potential association between the proportion of households in social housing and the proportion of the population requiring core assistance. However, let’s have a look at the relationship in a scatterplot.


Task: Create an interactive scatterplot of the proportion of households in social housing versus proportion of individuals requiring core assistance


Your scatterplot should look something like the image below. What do you think this relationship means?