Tutorial: Exploring the ABS data by region datasets

Introduction

This tutorial explores the ABS Data by Region data packages, which are available through the AURIN Workbench. These datasets cover a range of topics and variables, at a range of geographic levels, most of which are reported on an annual basis. This means that it is possible to map and understand at a fine level of detail some of the social and demographic change occurring in Australia. More detailed information on the Data by Region produce can be found here.

In this tutorial we will explore some of these variables, while using a range of the tools in the AURIN Portal. Each of the tools has its own User Guide which will be linked where that tool is used. We recommend that if you familiarise yourself with these user guides and tools with each tool so that you become more proficient in the tools.

Setting Up Your Workspace

Setting up your Workspace

Begin setting up your workspace by creating a new project as shown in the image below, before selecting your area and data (next tabs above)

Select the following area as your study area:

  • Greater Sydney Capital City Statistical Area 2016 (2GMEL_2016)

Select the following datasets and variables for your workspace. Entries in Orange indicate instances where you have to select a filter value

DatasetVariables
ABS – Data by Region – Economy & Industry (SA2) 2011-2017SA2 Code 2016
Year: 2016
SA2 Name 2016
Building Approvals Total Private Sector Dwelling Units No.
Building Approvals Private Sector Houses No.
ABS – Data by Region – Education & Employment (SA2) 2011-2017SA2 Code 2016
Year: 2016
Labour Force Statistics Unemployed No.
Labour Force Statistics Unemployment Rate %
ABS – Data by Region – Health & Disability (SA2) 2011-2016SA2 Code 2016
Year: 2016
Private Health Taxpayers Who Report Having Private Health Insurance No.
ABS - Data by Region - Income (Including Government Allowances) (SA2) 2011-2017SA2 Code 2016
Year: 2016
Selected Government Pensions & Allowances Age Pension – Centrelink No.
Selected Government Pensions & Allowances Carer Payment No.
Selected Government Pensions & Allowances Commonwealth Rent Assistance No.
Selected Government Pensions & Allowances Disability Support Pension No.
Selected Government Pensions & Allowances Family Tax Benefit B No.
Selected Government Pensions & Allowances Family Tax Benefit A No.
Selected Government Pensions & Allowances Newstart Allowance No.
Selected Government Pensions & Allowances Parenting Payment – Partnered No.
Selected Government Pensions & Allowances Parenting Payment – Single No.
Selected Government Pensions & Allowances Youth Allowance (Full Time Students/Apprentices) No.
Selected Government Pensions & Allowances Youth Allowance (Other) No.
Estimates Of Personal Income Employee Income Earners No.
Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No.
Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Median $
ABS – Data by Region – Family & Community (SA2) 2011-2016SA2 Code 2016
Year: 2016
Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income %
Household Stress Households With Rent Payments Greater Than Or Equal To 30% Of Household Income %
Tenure Type Rented %
Tenure Type Owned Outright %
Tenure Type Owned With A Mortgage %
ABS – Data by Region – Population & People (SA2) 2011-2017SA2 Code 2016
Year: 2016
Estimated Resident Population Persons Total No.
Estimated Resident Population Persons Working Age Population (15-64 Years) %
SA2 Aggregated Population & Dwelling Counts 2016 Census for AustraliaSA2 Code 2016
Total Dwelling Count 2016
Total Usual Resident Population 2016
SA2 Estimating Homelessness 2016SA2 Code 2016
Homeless Operational Group-All homeless persons
SA2-GO2 Selected Medians and Averages - Census 2016SA2 Code 2016
Median Mortgage repayment monthly
Median rent weekly
Median total household income weekly
Theme One: Economy, Industry, and Environment

Economy, Industry and Environment

Now that we have set up our workspace, and appropriately curated and prepared our data, we are ready to dive into some analysis! Our first interrogation of our data relates to housing and homelessness. Our final merged dataset contains a number of variables which relate to the housing space. These include

  • Building Approvals Total Private Sector Dwelling Units No.
  • Building Approvals Private Sector Houses No.
  • Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No.
  • Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Median $
  • Gross Capital Gains Reported by Taxpayers Gross Capital Gains – Amount $M
  • Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income %
  • Household Stress Households With Rent Payments Greater Than Or Equal To 30% Of Household Income %
  • Tenure Type Rented %
  • Tenure Type Owned Outright %
  • Tenure Type Owned With A Mortgage %
  • Total Dwelling Count 2016
  • Homeless Operational Group-All homeless persons
  • Median mortgage repayment weekly
  • Median rent weekly

Housing Stress and Income

The first element of our housing data that we will look at relates to housing stress and its relationship to other factors such as income.

To start of with we will make some maps


Create a Choropleth Map of the Tenure Type Owned With A Mortgage % Variable

Create a Choropleth Centroid Map of the Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % Variable


Your maps should look something like the images below



Key Research Question: Do you think there is a relationship between the proportion of households in an SA2 who are servicing a mortgage, and the proportion of households who are spending more than 30% of their income on servicing a mortgage?


In order to answer the above question, it is worthwhile first of all viewing the shape of this potential relationship with a scatterplot. A scatterplot allows a preliminary ‘eyeballing’ of the data before bringing more rigourous statistical tests to bear on the data


Create an Interactive Scatterplot of Tenure Type Owned With A Mortgage % (x axis) vs Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % (y axis)


Your scatterplot should look something like the image below



This suggests that there is a relationship, but on closer inspection of the variables that we have mapped, it is not particularly insightful: areas with more households servicing a mortgage have more households servicing mortgages with more than 30% of their income – as one goes up the other is likely to go up.

What we are potentially more interested in is understanding which areas have higher proportions of households spending more than 30% of their income on a mortgage as a proportion of the total households servicing a mortgage not as a proportion of total households.

We can generate this measure in the portal by using the Generate tool


Using the Generate tool, create a new dataset with a new variable that is Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % divided by Tenure Type Owned With A Mortgage %

Create a Choropleth Map of this new variable (making sure to switch off your other layers as well!)


Your map should look something like the images below



Key Research Question: What is the potential relationship between the proportion of mortgaged households spending more than 30% of their income on their mortgage and the average income and median mortgage repayments?


Now we are starting to ask some interesting questions: do areas with a greater proportion of mortgaged households spending more than 30% of their income on mortgage have higher or lower average mortgage payments, or higher or lower incomes?

To answer this question, first we want to map mortgage repayments on top of our proportion map.


Create a Choropleth Centroid Map of the Median Monthly Mortgage Repayment

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Monthly Mortgage Repayments

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Weekly Total Household Income


Your map and scatterplot should look something like the images below


Median Monthly Mortgage Repayments:

Median Weekly Household Income:


We can see that there is a datapoint which is an outlier, which could potentially skew the dataset for any future statistical analysis. We need to remove this point from our dataset


Using the Dataset Attribute Filter tool, create a new dataset that has this row removed (hint – only keep rows that have a value of the column you created less than 1!)


Now that we have filtered our dataset we are going to run a correlation analysis of our data – we want to understand whether or not there are positive or negative relationships between our variable that we created, and the average amount either spent on mortgages or the average income taken in by a household in areas.

Remember, a correlation value ranges from -1 to 1. The sign of the correlation tells us whether or not it it is positive or negative. Positive correlations mean that as one variable goes up, the other does too. Negative correlations mean that as one variable goes up, the other goes down. The further away a correlation value is from 0 to either -1 or 1 gives us an indication of how strong the correlation is. The closer to zero, the weaker the relationship, the closer to -1 or 1, the stronger the relationship is.


Using the Correlation tool, run a correlation matrix for the three variables: median monthly income, median weekly rent, and the proportion of mortgaged households spending more than 30% of their income servicing their mortgage.


The output of your analysis should look something like the image below



The problem with this is that it is a text editor. However, if you copy and paste these values into excel, you should be able to see the data structured in a more meaningful way, similar to the tables below.


Correlation R value and P Value Matrix

PropMS_MM0_median_mortgage_repay_monthlyM0_median_tot_hhd_inc_weekly
PropMS_M100
M0_median_mortgage_repay_monthly-0.300710
M0_median_tot_hhd_inc_weekly-0.47550.8651

The matrix is mirrored so you only need to look at the bottom left three values for the tables (in bold/blue) for the correlation values. These show that there are moderate negative relationships between the proportion of mortgaged households spending 30% or more of their incomes servicing their mortgages and the median mortgage repayments and incomes. The top right of the table represents the statistical significance of the relationships. These relationships are statistically significant (the P-Values are negligibly close to zero), so they are shown in green.


Key Take-Home Message #1: Areas with lower incomes, and with lower monthly mortgage repayments, nonetheless have higher proportions of households spending more than 30% of their total income servicing their mortgages


Capital Gains, Ownership, and Building Approvals

Now we are going to switch gears somewhat and start to look at other aspects of the housing landscape in Sydney, by looking at Capital Gains, Housing Ownership and Building Approvals

To start off with we will again begin by making some maps


Create a Choropleth Map of the Tenure Type Owned Outright % Variable

Create a Choropleth Centroid Map of the Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No


Your maps should look something like the images below



Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, and the number of taxpayers reporting Capital Gains on their tax return?


Create an Interactive Scatterplot of Proportion of Households Owning their dwelling Outright vs Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No.

Run a Correlation Matrix of these variables, also including the following variables:

> Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Median $

> Gross Capital Gains Reported by Taxpayers Gross Capital Gains – Amount $M

> Building Approvals Total Private Sector Dwelling Units No.

> Median total household income weekly ($)


Your Scatterplot should look something like the image below. Is this a positive or negative relationship, do you think?



The outcomes of the Correlation tool are shown below:


Correlation R Value and P Value Matrix

       
bld_approvals_tot_dwl_units_numM0_tenure_type_owned_outright_pr100M0_gross_capital_gains_rep_taxpayers_p_numM0_gross_capital_gains_rep_taxpayers_med_audM0_gross_capital_gains_rep_taxpayers_amount_audmM0_median_tot_hhd_inc_weekly
bld_approvals_tot_dwl_units_num100.07230.8440.63020.617
M0_tenure_type_owned_outright_pr100-0.2992100.37640.01140.0001
M0_gross_capital_gains_rep_taxpayers_p_num0.10440.234810.331900
M0_gross_capital_gains_rep_taxpayers_med_aud-0.0115-0.0515-0.0565100.042
M0_gross_capital_gains_rep_taxpayers_amount_audm0.0280.14660.70740.174910
M0_median_tot_hhd_inc_weekly0.02910.21850.61980.11810.4521

This table shows a number of the relationships between these variables

  1. suburbs with the highest number of building approvals, also had the lowest number of households which owned their home upright. There is also no relationship between the average household income of an area the number of building approvals.This makes sense from the perspective that growth areas, with large number of new dwellings being built, are usually where first home buyers purchase – those who are mortgaged rather than owning outright.
  2. Additionally, suburbs with the highest number of households owning their homes outright, had the highest number of taxpayers reporting gross capital gains.
  3. The total amount of capital gains in millions of dollars is also significantly associated with the proportion of households who own their home outright, as well as the median capital gains reported, and the number of taxpayers reporting capital gains (understandably).
  4. Median Income was strongly positively and significantly associated with the number of households which own their home outright, with the total gross capital gains reported in a suburb, the number of taxpayers reporting capital gains, and the median amount of gross capital gains per taxpayer.

Key Take-Home Message #2: Areas with higher income are associated with higher average home ownership rates, higher median capital gains values, and more taxpayers reporting capital gains

Homelessness, Housing Stress and Building Rates

Now we are going to change our analysis up and investigate the relationship between homelessness, housing stress, and building rates.

To start off with we will again begin by making some maps – this time, mapping our housing stress variable and our homelessness rates.


Create a Choropleth Map of the Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage

Create a Choropleth Centroid Map of the Estimated Total Number of Homeless People

Create a Static Scatterplot Chart of two variables


Your maps and static scatterplot should look something like the images below. Note that you can save the static scatterplot and load it into a document as a picture file.


 


Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, the number of homeless people in an area? Is there a relationship between these variables and the rate of new build of dwellings?


In order to answer these questions we need to consider two of the variables that we have included in our analysis. Both the number of homeless people and the number of new building approvals are raw counts, and may be dependent either on the total population size, or the total number of dwellings (these two are related as well). As a result, we may see a relationship between the number of the number of building approvals and the number of homeless people area, simply because both are related to larger population sizes, rather than being related to each other. As a result, it is important to standardise these numbers by a number reflecting population size. Consequently, we will divide the number of homeless people by the total population size in an area, and we will divide the number of new building approvals by the total number of dwellings in an area.


Using the Generate tool, create a new dataset with a new variable that is Total Number of Homeless People divided by Total Usual Resident Population

Using the Generate tool on this new dataset, create another new dataset with a new variable that isBuilding Approvals Total Private Sector Dwelling Units No divided by Total Dwellings


You should end up with two columns in a new dataset that look something like the far right columns below



If you map these variables, you will note that one of the SA2s – Banksmeadow – has a proportion of homeless people of approximately 26%. This is a spurious entry in our dataset and needs to be removed.


Using the Dataset Attribute Filter tool on this output dataset to remove Banksmeadow (hint: keep Homelessness rates below 0.1)


Now when you map these two new columns, it should look something like the image below. Do you think there is a relationship between the rate of homelessness and the rate of new building approvals? What about between the rate of homelessness and the proportion of mortgaged households spending more than 30% of their household income on their mortgage, and homeless, or building approval rates?



Previously, we have used the correlation tool to investigate the relationship between variables. Now we will use a chart tool to summarise these relationships


Create a Correlation Matrix Chart of these three variables (Proportion of Mortgaged Households spending more than 30% of income on mortgages, Building Approval Rates and Homelessness Rates)


Your output will look something like the image below. The ellipses in the top right indicate the strength and direction of the relationship, while the r values in the bottom left provide a parameter for the correlation value.



You can see a strong relationship between rate of homelessness and the proportion of mortgaged households spend more than 30% of income on mortgages. The rate of building approvals appears to have a relationship of some form with these mortgaged households, but no relationship with the rate of homelessness. The issue with the correlation matrix tool, is that, while it gives the strength and direction of the correlation, it does not provide an indicator or statistical significance – that is, the probability that you would get that particular value just by chance alone. For this we, we will also need to run the correlation tool, which we ran for the other two exercises


Run a Correlation Matrix of the same three variables, but also include the proportion of renting households spending more than 30% of their incomes on rent, that you created in Exercise One


The outputs of the Correlation can be seen in the table below. As with previous correlation outputs, the blue values in the bottom left of the table are the correlation r values, while the top right has the significance indicators (P Values), green being statistically significant, red being non-significant. You can see that the rate of homelessness is strongly and significantly positively associated with both proportions of mortgaged and renting households spending more than 30% of their income servicing housing costs – as these are with each other – while areas with higher rates of building approvals are only significantly positively associated with areas containing higher proportions of mortgaged – but not renting – households spending 30% or more of their income on housing costs

Correlation R Value and P Value Matrix


M30_MortHHR30_RentHHBApp_RateHL_Rate
M30_MortHH100.00240
R30_RentHH0.598
10.75840
BApp_Rate0.1755-0.017910.3037
HL_Rate0.17550.2820.05991

Key Take Home Message #3: Homelessness Rates are higher in areas with rental and mortgage stress


 

Theme Two: Employment and Education

Employment and Education

We are going to shift our perspective on Sydney now, leaving behind our treatment and interrogation of housing, property and homelessness to focus on health, employment and income. Our final merged dataset contains a number of variables which relate to these areas. These include

  • Labour Force Statistics Unemployed No
  • Labour Force Statistics Unemployment Rate %
  • Private Health Taxpayers Who Report Having Private Health Insurance No.
  • Selected Government Pensions & Allowances Age Pension – Centrelink No
  • Selected Government Pensions & Allowances Commonwealth Rent Assistance No
  • Selected Government Pensions & Allowances Disability Support Pension No.
  • Selected Government Pensions & Allowances Family Tax Benefit B No.
  • Selected Government Pensions & Allowances Family Tax Benefit A No.
  • Selected Government Pensions & Allowances Newstart Allowance No.
  • Selected Government Pensions & Allowances Parenting Payment – Partnered No.
  • Selected Government Pensions & Allowances Parenting Payment – Single No.
  • Selected Government Pensions & Allowances Youth Allowance (Full Time Students/Apprentices) No.
  • Selected Government Pensions & Allowances Youth Allowance (Other) No.
  • Estimates Of Personal Income Employee Income Earners No.

 

Employment

The first element of our housing data that we will look at relates to housing stress and its relationship to other factors such as income.

To start of with we will make some maps


Create a Choropleth Map of the Tenure Type Owned With A Mortgage % Variable

Create a Choropleth Centroid Map of the Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % Variable


Your maps should look something like the images below



Key Research Question: Do you think there is a relationship between the proportion of households in an SA2 who are servicing a mortgage, and the proportion of households who are spending more than 30% of their income on servicing a mortgage?


In order to answer the above question, it is worthwhile first of all viewing the shape of this potential relationship with a scatterplot. A scatterplot allows a preliminary ‘eyeballing’ of the data before bringing more rigourous statistical tests to bear on the data


Create an Interactive Scatterplot of Tenure Type Owned With A Mortgage % (x axis) vs Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % (y axis)


Your scatterplot should look something like the image below



This suggests that there is a relationship, but on closer inspection of the variables that we have mapped, it is not particularly insightful: areas with more households servicing a mortgage have more households servicing mortgages with more than 30% of their income – as one goes up the other is likely to go up.

What we are potentially more interested in is understanding which areas have higher proportions of households spending more than 30% of their income on a mortgage as a proportion of the total households servicing a mortgage not as a proportion of total households.

We can generate this measure in the portal by using the Generate tool


Using the Generate tool, create a new dataset with a new variable that is Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % divided by Tenure Type Owned With A Mortgage %

Create a Choropleth Map of this new variable (making sure to switch off your other layers as well!)


Your map should look something like the images below



Key Research Question: What is the potential relationship between the proportion of mortgaged households spending more than 30% of their income on their mortgage and the average income and median mortgage repayments?


Now we are starting to ask some interesting questions: do areas with a greater proportion of mortgaged households spending more than 30% of their income on mortgage have higher or lower average mortgage payments, or higher or lower incomes?

To answer this question, first we want to map mortgage repayments on top of our proportion map.


Create a Choropleth Centroid Map of the Median Monthly Mortgage Repayment

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Monthly Mortgage Repayments

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Weekly Total Household Income


Your map and scatterplot should look something like the images below


Median Monthly Mortgage Repayments:

Median Weekly Household Income:


We can see that there is a datapoint which is an outlier, which could potentially skew the dataset for any future statistical analysis. We need to remove this point from our dataset


Using the Dataset Attribute Filter tool, create a new dataset that has this row removed (hint – only keep rows that have a value of the column you created less than 1!)


Now that we have filtered our dataset we are going to run a correlation analysis of our data – we want to understand whether or not there are positive or negative relationships between our variable that we created, and the average amount either spent on mortgages or the average income taken in by a household in areas.

Remember, a correlation value ranges from -1 to 1. The sign of the correlation tells us whether or not it it is positive or negative. Positive correlations mean that as one variable goes up, the other does too. Negative correlations mean that as one variable goes up, the other goes down. The further away a correlation value is from 0 to either -1 or 1 gives us an indication of how strong the correlation is. The closer to zero, the weaker the relationship, the closer to -1 or 1, the stronger the relationship is.


Using the Correlation tool, run a correlation matrix for the three variables: median monthly income, median weekly rent, and the proportion of mortgaged households spending more than 30% of their income servicing their mortgage.


The output of your analysis should look something like the image below



The problem with this is that it is a text editor. However, if you copy and paste these values into excel, you should be able to see the data structured in a more meaningful way, similar to the tables below.


Correlation R value and P Value Matrix

PropMS_MM0_median_mortgage_repay_monthlyM0_median_tot_hhd_inc_weekly
PropMS_M100
M0_median_mortgage_repay_monthly-0.300710
M0_median_tot_hhd_inc_weekly-0.47550.8651

The matrix is mirrored so you only need to look at the bottom left three values for the tables (in bold/blue) for the correlation values. These show that there are moderate negative relationships between the proportion of mortgaged households spending 30% or more of their incomes servicing their mortgages and the median mortgage repayments and incomes. The top right of the table represents the statistical significance of the relationships. These relationships are statistically significant (the P-Values are negligibly close to zero), so they are shown in green.


Key Take-Home Message #1: Areas with lower incomes, and with lower monthly mortgage repayments, nonetheless have higher proportions of households spending more than 30% of their total income servicing their mortgages


Capital Gains, Ownership, and Building Approvals

Now we are going to switch gears somewhat and start to look at other aspects of the housing landscape in Sydney, by looking at Capital Gains, Housing Ownership and Building Approvals

To start off with we will again begin by making some maps


Create a Choropleth Map of the Tenure Type Owned Outright % Variable

Create a Choropleth Centroid Map of the Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No


Your maps should look something like the images below



Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, and the number of taxpayers reporting Capital Gains on their tax return?


Create an Interactive Scatterplot of Proportion of Households Owning their dwelling Outright vs Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No.

Run a Correlation Matrix of these variables, also including the following variables:

> Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Median $

> Gross Capital Gains Reported by Taxpayers Gross Capital Gains – Amount $M

> Building Approvals Total Private Sector Dwelling Units No.

> Median total household income weekly ($)


Your Scatterplot should look something like the image below. Is this a positive or negative relationship, do you think?



The outcomes of the Correlation tool are shown below:


Correlation R Value and P Value Matrix

       
bld_approvals_tot_dwl_units_numM0_tenure_type_owned_outright_pr100M0_gross_capital_gains_rep_taxpayers_p_numM0_gross_capital_gains_rep_taxpayers_med_audM0_gross_capital_gains_rep_taxpayers_amount_audmM0_median_tot_hhd_inc_weekly
bld_approvals_tot_dwl_units_num100.07230.8440.63020.617
M0_tenure_type_owned_outright_pr100-0.2992100.37640.01140.0001
M0_gross_capital_gains_rep_taxpayers_p_num0.10440.234810.331900
M0_gross_capital_gains_rep_taxpayers_med_aud-0.0115-0.0515-0.0565100.042
M0_gross_capital_gains_rep_taxpayers_amount_audm0.0280.14660.70740.174910
M0_median_tot_hhd_inc_weekly0.02910.21850.61980.11810.4521

This table shows a number of the relationships between these variables

  1. suburbs with the highest number of building approvals, also had the lowest number of households which owned their home upright. There is also no relationship between the average household income of an area the number of building approvals.This makes sense from the perspective that growth areas, with large number of new dwellings being built, are usually where first home buyers purchase – those who are mortgaged rather than owning outright.
  2. Additionally, suburbs with the highest number of households owning their homes outright, had the highest number of taxpayers reporting gross capital gains.
  3. The total amount of capital gains in millions of dollars is also significantly associated with the proportion of households who own their home outright, as well as the median capital gains reported, and the number of taxpayers reporting capital gains (understandably).
  4. Median Income was strongly positively and significantly associated with the number of households which own their home outright, with the total gross capital gains reported in a suburb, the number of taxpayers reporting capital gains, and the median amount of gross capital gains per taxpayer.

Key Take-Home Message #2: Areas with higher income are associated with higher average home ownership rates, higher median capital gains values, and more taxpayers reporting capital gains

Homelessness, Housing Stress and Building Rates

Now we are going to change our analysis up and investigate the relationship between homelessness, housing stress, and building rates.

To start off with we will again begin by making some maps – this time, mapping our housing stress variable and our homelessness rates.


Create a Choropleth Map of the Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage

Create a Choropleth Centroid Map of the Estimated Total Number of Homeless People

Create a Static Scatterplot Chart of two variables


Your maps and static scatterplot should look something like the images below. Note that you can save the static scatterplot and load it into a document as a picture file.


 


Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, the number of homeless people in an area? Is there a relationship between these variables and the rate of new build of dwellings?


In order to answer these questions we need to consider two of the variables that we have included in our analysis. Both the number of homeless people and the number of new building approvals are raw counts, and may be dependent either on the total population size, or the total number of dwellings (these two are related as well). As a result, we may see a relationship between the number of the number of building approvals and the number of homeless people area, simply because both are related to larger population sizes, rather than being related to each other. As a result, it is important to standardise these numbers by a number reflecting population size. Consequently, we will divide the number of homeless people by the total population size in an area, and we will divide the number of new building approvals by the total number of dwellings in an area.


Using the Generate tool, create a new dataset with a new variable that is Total Number of Homeless People divided by Total Usual Resident Population

Using the Generate tool on this new dataset, create another new dataset with a new variable that isBuilding Approvals Total Private Sector Dwelling Units No divided by Total Dwellings


You should end up with two columns in a new dataset that look something like the far right columns below



If you map these variables, you will note that one of the SA2s – Banksmeadow – has a proportion of homeless people of approximately 26%. This is a spurious entry in our dataset and needs to be removed.


Using the Dataset Attribute Filter tool on this output dataset to remove Banksmeadow (hint: keep Homelessness rates below 0.1)


Now when you map these two new columns, it should look something like the image below. Do you think there is a relationship between the rate of homelessness and the rate of new building approvals? What about between the rate of homelessness and the proportion of mortgaged households spending more than 30% of their household income on their mortgage, and homeless, or building approval rates?



Previously, we have used the correlation tool to investigate the relationship between variables. Now we will use a chart tool to summarise these relationships


Create a Correlation Matrix Chart of these three variables (Proportion of Mortgaged Households spending more than 30% of income on mortgages, Building Approval Rates and Homelessness Rates)


Your output will look something like the image below. The ellipses in the top right indicate the strength and direction of the relationship, while the r values in the bottom left provide a parameter for the correlation value.



You can see a strong relationship between rate of homelessness and the proportion of mortgaged households spend more than 30% of income on mortgages. The rate of building approvals appears to have a relationship of some form with these mortgaged households, but no relationship with the rate of homelessness. The issue with the correlation matrix tool, is that, while it gives the strength and direction of the correlation, it does not provide an indicator or statistical significance – that is, the probability that you would get that particular value just by chance alone. For this we, we will also need to run the correlation tool, which we ran for the other two exercises


Run a Correlation Matrix of the same three variables, but also include the proportion of renting households spending more than 30% of their incomes on rent, that you created in Exercise One


The outputs of the Correlation can be seen in the table below. As with previous correlation outputs, the blue values in the bottom left of the table are the correlation r values, while the top right has the significance indicators (P Values), green being statistically significant, red being non-significant. You can see that the rate of homelessness is strongly and significantly positively associated with both proportions of mortgaged and renting households spending more than 30% of their income servicing housing costs – as these are with each other – while areas with higher rates of building approvals are only significantly positively associated with areas containing higher proportions of mortgaged – but not renting – households spending 30% or more of their income on housing costs

Correlation R Value and P Value Matrix


M30_MortHHR30_RentHHBApp_RateHL_Rate
M30_MortHH100.00240
R30_RentHH0.598
10.75840
BApp_Rate0.1755-0.017910.3037
HL_Rate0.17550.2820.05991

Key Take Home Message #3: Homelessness Rates are higher in areas with rental and mortgage stress


 

Theme Three: Family and Community

Family and Community

We are going to shift our perspective on Sydney now, leaving behind our treatment and interrogation of housing, property and homelessness to focus on health, employment and income. Our final merged dataset contains a number of variables which relate to these areas. These include

  • Labour Force Statistics Unemployed No
  • Labour Force Statistics Unemployment Rate %
  • Private Health Taxpayers Who Report Having Private Health Insurance No.
  • Selected Government Pensions & Allowances Age Pension – Centrelink No
  • Selected Government Pensions & Allowances Commonwealth Rent Assistance No
  • Selected Government Pensions & Allowances Disability Support Pension No.
  • Selected Government Pensions & Allowances Family Tax Benefit B No.
  • Selected Government Pensions & Allowances Family Tax Benefit A No.
  • Selected Government Pensions & Allowances Newstart Allowance No.
  • Selected Government Pensions & Allowances Parenting Payment – Partnered No.
  • Selected Government Pensions & Allowances Parenting Payment – Single No.
  • Selected Government Pensions & Allowances Youth Allowance (Full Time Students/Apprentices) No.
  • Selected Government Pensions & Allowances Youth Allowance (Other) No.
  • Estimates Of Personal Income Employee Income Earners No.

 

Employment

The first element of our housing data that we will look at relates to housing stress and its relationship to other factors such as income.

To start of with we will make some maps


Create a Choropleth Map of the Tenure Type Owned With A Mortgage % Variable

Create a Choropleth Centroid Map of the Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % Variable


Your maps should look something like the images below



Key Research Question: Do you think there is a relationship between the proportion of households in an SA2 who are servicing a mortgage, and the proportion of households who are spending more than 30% of their income on servicing a mortgage?


In order to answer the above question, it is worthwhile first of all viewing the shape of this potential relationship with a scatterplot. A scatterplot allows a preliminary ‘eyeballing’ of the data before bringing more rigourous statistical tests to bear on the data


Create an Interactive Scatterplot of Tenure Type Owned With A Mortgage % (x axis) vs Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % (y axis)


Your scatterplot should look something like the image below



This suggests that there is a relationship, but on closer inspection of the variables that we have mapped, it is not particularly insightful: areas with more households servicing a mortgage have more households servicing mortgages with more than 30% of their income – as one goes up the other is likely to go up.

What we are potentially more interested in is understanding which areas have higher proportions of households spending more than 30% of their income on a mortgage as a proportion of the total households servicing a mortgage not as a proportion of total households.

We can generate this measure in the portal by using the Generate tool


Using the Generate tool, create a new dataset with a new variable that is Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % divided by Tenure Type Owned With A Mortgage %

Create a Choropleth Map of this new variable (making sure to switch off your other layers as well!)


Your map should look something like the images below



Key Research Question: What is the potential relationship between the proportion of mortgaged households spending more than 30% of their income on their mortgage and the average income and median mortgage repayments?


Now we are starting to ask some interesting questions: do areas with a greater proportion of mortgaged households spending more than 30% of their income on mortgage have higher or lower average mortgage payments, or higher or lower incomes?

To answer this question, first we want to map mortgage repayments on top of our proportion map.


Create a Choropleth Centroid Map of the Median Monthly Mortgage Repayment

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Monthly Mortgage Repayments

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Weekly Total Household Income


Your map and scatterplot should look something like the images below


Median Monthly Mortgage Repayments:

Median Weekly Household Income:


We can see that there is a datapoint which is an outlier, which could potentially skew the dataset for any future statistical analysis. We need to remove this point from our dataset


Using the Dataset Attribute Filter tool, create a new dataset that has this row removed (hint – only keep rows that have a value of the column you created less than 1!)


Now that we have filtered our dataset we are going to run a correlation analysis of our data – we want to understand whether or not there are positive or negative relationships between our variable that we created, and the average amount either spent on mortgages or the average income taken in by a household in areas.

Remember, a correlation value ranges from -1 to 1. The sign of the correlation tells us whether or not it it is positive or negative. Positive correlations mean that as one variable goes up, the other does too. Negative correlations mean that as one variable goes up, the other goes down. The further away a correlation value is from 0 to either -1 or 1 gives us an indication of how strong the correlation is. The closer to zero, the weaker the relationship, the closer to -1 or 1, the stronger the relationship is.


Using the Correlation tool, run a correlation matrix for the three variables: median monthly income, median weekly rent, and the proportion of mortgaged households spending more than 30% of their income servicing their mortgage.


The output of your analysis should look something like the image below



The problem with this is that it is a text editor. However, if you copy and paste these values into excel, you should be able to see the data structured in a more meaningful way, similar to the tables below.


Correlation R value and P Value Matrix

PropMS_MM0_median_mortgage_repay_monthlyM0_median_tot_hhd_inc_weekly
PropMS_M100
M0_median_mortgage_repay_monthly-0.300710
M0_median_tot_hhd_inc_weekly-0.47550.8651

The matrix is mirrored so you only need to look at the bottom left three values for the tables (in bold/blue) for the correlation values. These show that there are moderate negative relationships between the proportion of mortgaged households spending 30% or more of their incomes servicing their mortgages and the median mortgage repayments and incomes. The top right of the table represents the statistical significance of the relationships. These relationships are statistically significant (the P-Values are negligibly close to zero), so they are shown in green.


Key Take-Home Message #1: Areas with lower incomes, and with lower monthly mortgage repayments, nonetheless have higher proportions of households spending more than 30% of their total income servicing their mortgages


Capital Gains, Ownership, and Building Approvals

Now we are going to switch gears somewhat and start to look at other aspects of the housing landscape in Sydney, by looking at Capital Gains, Housing Ownership and Building Approvals

To start off with we will again begin by making some maps


Create a Choropleth Map of the Tenure Type Owned Outright % Variable

Create a Choropleth Centroid Map of the Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No


Your maps should look something like the images below



Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, and the number of taxpayers reporting Capital Gains on their tax return?


Create an Interactive Scatterplot of Proportion of Households Owning their dwelling Outright vs Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No.

Run a Correlation Matrix of these variables, also including the following variables:

> Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Median $

> Gross Capital Gains Reported by Taxpayers Gross Capital Gains – Amount $M

> Building Approvals Total Private Sector Dwelling Units No.

> Median total household income weekly ($)


Your Scatterplot should look something like the image below. Is this a positive or negative relationship, do you think?



The outcomes of the Correlation tool are shown below:


Correlation R Value and P Value Matrix

       
bld_approvals_tot_dwl_units_numM0_tenure_type_owned_outright_pr100M0_gross_capital_gains_rep_taxpayers_p_numM0_gross_capital_gains_rep_taxpayers_med_audM0_gross_capital_gains_rep_taxpayers_amount_audmM0_median_tot_hhd_inc_weekly
bld_approvals_tot_dwl_units_num100.07230.8440.63020.617
M0_tenure_type_owned_outright_pr100-0.2992100.37640.01140.0001
M0_gross_capital_gains_rep_taxpayers_p_num0.10440.234810.331900
M0_gross_capital_gains_rep_taxpayers_med_aud-0.0115-0.0515-0.0565100.042
M0_gross_capital_gains_rep_taxpayers_amount_audm0.0280.14660.70740.174910
M0_median_tot_hhd_inc_weekly0.02910.21850.61980.11810.4521

This table shows a number of the relationships between these variables

  1. suburbs with the highest number of building approvals, also had the lowest number of households which owned their home upright. There is also no relationship between the average household income of an area the number of building approvals.This makes sense from the perspective that growth areas, with large number of new dwellings being built, are usually where first home buyers purchase – those who are mortgaged rather than owning outright.
  2. Additionally, suburbs with the highest number of households owning their homes outright, had the highest number of taxpayers reporting gross capital gains.
  3. The total amount of capital gains in millions of dollars is also significantly associated with the proportion of households who own their home outright, as well as the median capital gains reported, and the number of taxpayers reporting capital gains (understandably).
  4. Median Income was strongly positively and significantly associated with the number of households which own their home outright, with the total gross capital gains reported in a suburb, the number of taxpayers reporting capital gains, and the median amount of gross capital gains per taxpayer.

Key Take-Home Message #2: Areas with higher income are associated with higher average home ownership rates, higher median capital gains values, and more taxpayers reporting capital gains

Homelessness, Housing Stress and Building Rates

Now we are going to change our analysis up and investigate the relationship between homelessness, housing stress, and building rates.

To start off with we will again begin by making some maps – this time, mapping our housing stress variable and our homelessness rates.


Create a Choropleth Map of the Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage

Create a Choropleth Centroid Map of the Estimated Total Number of Homeless People

Create a Static Scatterplot Chart of two variables


Your maps and static scatterplot should look something like the images below. Note that you can save the static scatterplot and load it into a document as a picture file.


 


Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, the number of homeless people in an area? Is there a relationship between these variables and the rate of new build of dwellings?


In order to answer these questions we need to consider two of the variables that we have included in our analysis. Both the number of homeless people and the number of new building approvals are raw counts, and may be dependent either on the total population size, or the total number of dwellings (these two are related as well). As a result, we may see a relationship between the number of the number of building approvals and the number of homeless people area, simply because both are related to larger population sizes, rather than being related to each other. As a result, it is important to standardise these numbers by a number reflecting population size. Consequently, we will divide the number of homeless people by the total population size in an area, and we will divide the number of new building approvals by the total number of dwellings in an area.


Using the Generate tool, create a new dataset with a new variable that is Total Number of Homeless People divided by Total Usual Resident Population

Using the Generate tool on this new dataset, create another new dataset with a new variable that isBuilding Approvals Total Private Sector Dwelling Units No divided by Total Dwellings


You should end up with two columns in a new dataset that look something like the far right columns below



If you map these variables, you will note that one of the SA2s – Banksmeadow – has a proportion of homeless people of approximately 26%. This is a spurious entry in our dataset and needs to be removed.


Using the Dataset Attribute Filter tool on this output dataset to remove Banksmeadow (hint: keep Homelessness rates below 0.1)


Now when you map these two new columns, it should look something like the image below. Do you think there is a relationship between the rate of homelessness and the rate of new building approvals? What about between the rate of homelessness and the proportion of mortgaged households spending more than 30% of their household income on their mortgage, and homeless, or building approval rates?



Previously, we have used the correlation tool to investigate the relationship between variables. Now we will use a chart tool to summarise these relationships


Create a Correlation Matrix Chart of these three variables (Proportion of Mortgaged Households spending more than 30% of income on mortgages, Building Approval Rates and Homelessness Rates)


Your output will look something like the image below. The ellipses in the top right indicate the strength and direction of the relationship, while the r values in the bottom left provide a parameter for the correlation value.



You can see a strong relationship between rate of homelessness and the proportion of mortgaged households spend more than 30% of income on mortgages. The rate of building approvals appears to have a relationship of some form with these mortgaged households, but no relationship with the rate of homelessness. The issue with the correlation matrix tool, is that, while it gives the strength and direction of the correlation, it does not provide an indicator or statistical significance – that is, the probability that you would get that particular value just by chance alone. For this we, we will also need to run the correlation tool, which we ran for the other two exercises


Run a Correlation Matrix of the same three variables, but also include the proportion of renting households spending more than 30% of their incomes on rent, that you created in Exercise One


The outputs of the Correlation can be seen in the table below. As with previous correlation outputs, the blue values in the bottom left of the table are the correlation r values, while the top right has the significance indicators (P Values), green being statistically significant, red being non-significant. You can see that the rate of homelessness is strongly and significantly positively associated with both proportions of mortgaged and renting households spending more than 30% of their income servicing housing costs – as these are with each other – while areas with higher rates of building approvals are only significantly positively associated with areas containing higher proportions of mortgaged – but not renting – households spending 30% or more of their income on housing costs

Correlation R Value and P Value Matrix


M30_MortHHR30_RentHHBApp_RateHL_Rate
M30_MortHH100.00240
R30_RentHH0.598
10.75840
BApp_Rate0.1755-0.017910.3037
HL_Rate0.17550.2820.05991

Key Take Home Message #3: Homelessness Rates are higher in areas with rental and mortgage stress


 

Theme Four: Health, Disability, Income

Health, Disability, and Income

We are going to shift our perspective on Sydney now, leaving behind our treatment and interrogation of housing, property and homelessness to focus on health, employment and income. Our final merged dataset contains a number of variables which relate to these areas. These include

  • Labour Force Statistics Unemployed No
  • Labour Force Statistics Unemployment Rate %
  • Private Health Taxpayers Who Report Having Private Health Insurance No.
  • Selected Government Pensions & Allowances Age Pension – Centrelink No
  • Selected Government Pensions & Allowances Commonwealth Rent Assistance No
  • Selected Government Pensions & Allowances Disability Support Pension No.
  • Selected Government Pensions & Allowances Family Tax Benefit B No.
  • Selected Government Pensions & Allowances Family Tax Benefit A No.
  • Selected Government Pensions & Allowances Newstart Allowance No.
  • Selected Government Pensions & Allowances Parenting Payment – Partnered No.
  • Selected Government Pensions & Allowances Parenting Payment – Single No.
  • Selected Government Pensions & Allowances Youth Allowance (Full Time Students/Apprentices) No.
  • Selected Government Pensions & Allowances Youth Allowance (Other) No.
  • Estimates Of Personal Income Employee Income Earners No.

 

Employment

The first element of our housing data that we will look at relates to housing stress and its relationship to other factors such as income.

To start of with we will make some maps


Create a Choropleth Map of the Tenure Type Owned With A Mortgage % Variable

Create a Choropleth Centroid Map of the Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % Variable


Your maps should look something like the images below



Key Research Question: Do you think there is a relationship between the proportion of households in an SA2 who are servicing a mortgage, and the proportion of households who are spending more than 30% of their income on servicing a mortgage?


In order to answer the above question, it is worthwhile first of all viewing the shape of this potential relationship with a scatterplot. A scatterplot allows a preliminary ‘eyeballing’ of the data before bringing more rigourous statistical tests to bear on the data


Create an Interactive Scatterplot of Tenure Type Owned With A Mortgage % (x axis) vs Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % (y axis)


Your scatterplot should look something like the image below



This suggests that there is a relationship, but on closer inspection of the variables that we have mapped, it is not particularly insightful: areas with more households servicing a mortgage have more households servicing mortgages with more than 30% of their income – as one goes up the other is likely to go up.

What we are potentially more interested in is understanding which areas have higher proportions of households spending more than 30% of their income on a mortgage as a proportion of the total households servicing a mortgage not as a proportion of total households.

We can generate this measure in the portal by using the Generate tool


Using the Generate tool, create a new dataset with a new variable that is Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % divided by Tenure Type Owned With A Mortgage %

Create a Choropleth Map of this new variable (making sure to switch off your other layers as well!)


Your map should look something like the images below



Key Research Question: What is the potential relationship between the proportion of mortgaged households spending more than 30% of their income on their mortgage and the average income and median mortgage repayments?


Now we are starting to ask some interesting questions: do areas with a greater proportion of mortgaged households spending more than 30% of their income on mortgage have higher or lower average mortgage payments, or higher or lower incomes?

To answer this question, first we want to map mortgage repayments on top of our proportion map.


Create a Choropleth Centroid Map of the Median Monthly Mortgage Repayment

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Monthly Mortgage Repayments

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Weekly Total Household Income


Your map and scatterplot should look something like the images below


Median Monthly Mortgage Repayments:

Median Weekly Household Income:


We can see that there is a datapoint which is an outlier, which could potentially skew the dataset for any future statistical analysis. We need to remove this point from our dataset


Using the Dataset Attribute Filter tool, create a new dataset that has this row removed (hint – only keep rows that have a value of the column you created less than 1!)


Now that we have filtered our dataset we are going to run a correlation analysis of our data – we want to understand whether or not there are positive or negative relationships between our variable that we created, and the average amount either spent on mortgages or the average income taken in by a household in areas.

Remember, a correlation value ranges from -1 to 1. The sign of the correlation tells us whether or not it it is positive or negative. Positive correlations mean that as one variable goes up, the other does too. Negative correlations mean that as one variable goes up, the other goes down. The further away a correlation value is from 0 to either -1 or 1 gives us an indication of how strong the correlation is. The closer to zero, the weaker the relationship, the closer to -1 or 1, the stronger the relationship is.


Using the Correlation tool, run a correlation matrix for the three variables: median monthly income, median weekly rent, and the proportion of mortgaged households spending more than 30% of their income servicing their mortgage.


The output of your analysis should look something like the image below



The problem with this is that it is a text editor. However, if you copy and paste these values into excel, you should be able to see the data structured in a more meaningful way, similar to the tables below.


Correlation R value and P Value Matrix

PropMS_MM0_median_mortgage_repay_monthlyM0_median_tot_hhd_inc_weekly
PropMS_M100
M0_median_mortgage_repay_monthly-0.300710
M0_median_tot_hhd_inc_weekly-0.47550.8651

The matrix is mirrored so you only need to look at the bottom left three values for the tables (in bold/blue) for the correlation values. These show that there are moderate negative relationships between the proportion of mortgaged households spending 30% or more of their incomes servicing their mortgages and the median mortgage repayments and incomes. The top right of the table represents the statistical significance of the relationships. These relationships are statistically significant (the P-Values are negligibly close to zero), so they are shown in green.


Key Take-Home Message #1: Areas with lower incomes, and with lower monthly mortgage repayments, nonetheless have higher proportions of households spending more than 30% of their total income servicing their mortgages


Capital Gains, Ownership, and Building Approvals

Now we are going to switch gears somewhat and start to look at other aspects of the housing landscape in Sydney, by looking at Capital Gains, Housing Ownership and Building Approvals

To start off with we will again begin by making some maps


Create a Choropleth Map of the Tenure Type Owned Outright % Variable

Create a Choropleth Centroid Map of the Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No


Your maps should look something like the images below



Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, and the number of taxpayers reporting Capital Gains on their tax return?


Create an Interactive Scatterplot of Proportion of Households Owning their dwelling Outright vs Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No.

Run a Correlation Matrix of these variables, also including the following variables:

> Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Median $

> Gross Capital Gains Reported by Taxpayers Gross Capital Gains – Amount $M

> Building Approvals Total Private Sector Dwelling Units No.

> Median total household income weekly ($)


Your Scatterplot should look something like the image below. Is this a positive or negative relationship, do you think?



The outcomes of the Correlation tool are shown below:


Correlation R Value and P Value Matrix

       
bld_approvals_tot_dwl_units_numM0_tenure_type_owned_outright_pr100M0_gross_capital_gains_rep_taxpayers_p_numM0_gross_capital_gains_rep_taxpayers_med_audM0_gross_capital_gains_rep_taxpayers_amount_audmM0_median_tot_hhd_inc_weekly
bld_approvals_tot_dwl_units_num100.07230.8440.63020.617
M0_tenure_type_owned_outright_pr100-0.2992100.37640.01140.0001
M0_gross_capital_gains_rep_taxpayers_p_num0.10440.234810.331900
M0_gross_capital_gains_rep_taxpayers_med_aud-0.0115-0.0515-0.0565100.042
M0_gross_capital_gains_rep_taxpayers_amount_audm0.0280.14660.70740.174910
M0_median_tot_hhd_inc_weekly0.02910.21850.61980.11810.4521

This table shows a number of the relationships between these variables

  1. suburbs with the highest number of building approvals, also had the lowest number of households which owned their home upright. There is also no relationship between the average household income of an area the number of building approvals.This makes sense from the perspective that growth areas, with large number of new dwellings being built, are usually where first home buyers purchase – those who are mortgaged rather than owning outright.
  2. Additionally, suburbs with the highest number of households owning their homes outright, had the highest number of taxpayers reporting gross capital gains.
  3. The total amount of capital gains in millions of dollars is also significantly associated with the proportion of households who own their home outright, as well as the median capital gains reported, and the number of taxpayers reporting capital gains (understandably).
  4. Median Income was strongly positively and significantly associated with the number of households which own their home outright, with the total gross capital gains reported in a suburb, the number of taxpayers reporting capital gains, and the median amount of gross capital gains per taxpayer.

Key Take-Home Message #2: Areas with higher income are associated with higher average home ownership rates, higher median capital gains values, and more taxpayers reporting capital gains

Homelessness, Housing Stress and Building Rates

Now we are going to change our analysis up and investigate the relationship between homelessness, housing stress, and building rates.

To start off with we will again begin by making some maps – this time, mapping our housing stress variable and our homelessness rates.


Create a Choropleth Map of the Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage

Create a Choropleth Centroid Map of the Estimated Total Number of Homeless People

Create a Static Scatterplot Chart of two variables


Your maps and static scatterplot should look something like the images below. Note that you can save the static scatterplot and load it into a document as a picture file.


 


Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, the number of homeless people in an area? Is there a relationship between these variables and the rate of new build of dwellings?


In order to answer these questions we need to consider two of the variables that we have included in our analysis. Both the number of homeless people and the number of new building approvals are raw counts, and may be dependent either on the total population size, or the total number of dwellings (these two are related as well). As a result, we may see a relationship between the number of the number of building approvals and the number of homeless people area, simply because both are related to larger population sizes, rather than being related to each other. As a result, it is important to standardise these numbers by a number reflecting population size. Consequently, we will divide the number of homeless people by the total population size in an area, and we will divide the number of new building approvals by the total number of dwellings in an area.


Using the Generate tool, create a new dataset with a new variable that is Total Number of Homeless People divided by Total Usual Resident Population

Using the Generate tool on this new dataset, create another new dataset with a new variable that isBuilding Approvals Total Private Sector Dwelling Units No divided by Total Dwellings


You should end up with two columns in a new dataset that look something like the far right columns below



If you map these variables, you will note that one of the SA2s – Banksmeadow – has a proportion of homeless people of approximately 26%. This is a spurious entry in our dataset and needs to be removed.


Using the Dataset Attribute Filter tool on this output dataset to remove Banksmeadow (hint: keep Homelessness rates below 0.1)


Now when you map these two new columns, it should look something like the image below. Do you think there is a relationship between the rate of homelessness and the rate of new building approvals? What about between the rate of homelessness and the proportion of mortgaged households spending more than 30% of their household income on their mortgage, and homeless, or building approval rates?



Previously, we have used the correlation tool to investigate the relationship between variables. Now we will use a chart tool to summarise these relationships


Create a Correlation Matrix Chart of these three variables (Proportion of Mortgaged Households spending more than 30% of income on mortgages, Building Approval Rates and Homelessness Rates)


Your output will look something like the image below. The ellipses in the top right indicate the strength and direction of the relationship, while the r values in the bottom left provide a parameter for the correlation value.



You can see a strong relationship between rate of homelessness and the proportion of mortgaged households spend more than 30% of income on mortgages. The rate of building approvals appears to have a relationship of some form with these mortgaged households, but no relationship with the rate of homelessness. The issue with the correlation matrix tool, is that, while it gives the strength and direction of the correlation, it does not provide an indicator or statistical significance – that is, the probability that you would get that particular value just by chance alone. For this we, we will also need to run the correlation tool, which we ran for the other two exercises


Run a Correlation Matrix of the same three variables, but also include the proportion of renting households spending more than 30% of their incomes on rent, that you created in Exercise One


The outputs of the Correlation can be seen in the table below. As with previous correlation outputs, the blue values in the bottom left of the table are the correlation r values, while the top right has the significance indicators (P Values), green being statistically significant, red being non-significant. You can see that the rate of homelessness is strongly and significantly positively associated with both proportions of mortgaged and renting households spending more than 30% of their income servicing housing costs – as these are with each other – while areas with higher rates of building approvals are only significantly positively associated with areas containing higher proportions of mortgaged – but not renting – households spending 30% or more of their income on housing costs

Correlation R Value and P Value Matrix


M30_MortHHR30_RentHHBApp_RateHL_Rate
M30_MortHH100.00240
R30_RentHH0.598
10.75840
BApp_Rate0.1755-0.017910.3037
HL_Rate0.17550.2820.05991

Key Take Home Message #3: Homelessness Rates are higher in areas with rental and mortgage stress


 

Theme Five: Populations and Demographics

Populations and Demographics

We are going to shift our perspective on Sydney now, leaving behind our treatment and interrogation of housing, property and homelessness to focus on health, employment and income. Our final merged dataset contains a number of variables which relate to these areas. These include

  • Labour Force Statistics Unemployed No
  • Labour Force Statistics Unemployment Rate %
  • Private Health Taxpayers Who Report Having Private Health Insurance No.
  • Selected Government Pensions & Allowances Age Pension – Centrelink No
  • Selected Government Pensions & Allowances Commonwealth Rent Assistance No
  • Selected Government Pensions & Allowances Disability Support Pension No.
  • Selected Government Pensions & Allowances Family Tax Benefit B No.
  • Selected Government Pensions & Allowances Family Tax Benefit A No.
  • Selected Government Pensions & Allowances Newstart Allowance No.
  • Selected Government Pensions & Allowances Parenting Payment – Partnered No.
  • Selected Government Pensions & Allowances Parenting Payment – Single No.
  • Selected Government Pensions & Allowances Youth Allowance (Full Time Students/Apprentices) No.
  • Selected Government Pensions & Allowances Youth Allowance (Other) No.
  • Estimates Of Personal Income Employee Income Earners No.

 

Employment

The first element of our housing data that we will look at relates to housing stress and its relationship to other factors such as income.

To start of with we will make some maps


Create a Choropleth Map of the Tenure Type Owned With A Mortgage % Variable

Create a Choropleth Centroid Map of the Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % Variable


Your maps should look something like the images below



Key Research Question: Do you think there is a relationship between the proportion of households in an SA2 who are servicing a mortgage, and the proportion of households who are spending more than 30% of their income on servicing a mortgage?


In order to answer the above question, it is worthwhile first of all viewing the shape of this potential relationship with a scatterplot. A scatterplot allows a preliminary ‘eyeballing’ of the data before bringing more rigourous statistical tests to bear on the data


Create an Interactive Scatterplot of Tenure Type Owned With A Mortgage % (x axis) vs Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % (y axis)


Your scatterplot should look something like the image below



This suggests that there is a relationship, but on closer inspection of the variables that we have mapped, it is not particularly insightful: areas with more households servicing a mortgage have more households servicing mortgages with more than 30% of their income – as one goes up the other is likely to go up.

What we are potentially more interested in is understanding which areas have higher proportions of households spending more than 30% of their income on a mortgage as a proportion of the total households servicing a mortgage not as a proportion of total households.

We can generate this measure in the portal by using the Generate tool


Using the Generate tool, create a new dataset with a new variable that is Household Stress Households With Mortgage Repayments Greater Than Or Equal To 30% Of Household Income % divided by Tenure Type Owned With A Mortgage %

Create a Choropleth Map of this new variable (making sure to switch off your other layers as well!)


Your map should look something like the images below



Key Research Question: What is the potential relationship between the proportion of mortgaged households spending more than 30% of their income on their mortgage and the average income and median mortgage repayments?


Now we are starting to ask some interesting questions: do areas with a greater proportion of mortgaged households spending more than 30% of their income on mortgage have higher or lower average mortgage payments, or higher or lower incomes?

To answer this question, first we want to map mortgage repayments on top of our proportion map.


Create a Choropleth Centroid Map of the Median Monthly Mortgage Repayment

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Monthly Mortgage Repayments

Create an Interactive Scatterplot of Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage vs Median Weekly Total Household Income


Your map and scatterplot should look something like the images below


Median Monthly Mortgage Repayments:

Median Weekly Household Income:


We can see that there is a datapoint which is an outlier, which could potentially skew the dataset for any future statistical analysis. We need to remove this point from our dataset


Using the Dataset Attribute Filter tool, create a new dataset that has this row removed (hint – only keep rows that have a value of the column you created less than 1!)


Now that we have filtered our dataset we are going to run a correlation analysis of our data – we want to understand whether or not there are positive or negative relationships between our variable that we created, and the average amount either spent on mortgages or the average income taken in by a household in areas.

Remember, a correlation value ranges from -1 to 1. The sign of the correlation tells us whether or not it it is positive or negative. Positive correlations mean that as one variable goes up, the other does too. Negative correlations mean that as one variable goes up, the other goes down. The further away a correlation value is from 0 to either -1 or 1 gives us an indication of how strong the correlation is. The closer to zero, the weaker the relationship, the closer to -1 or 1, the stronger the relationship is.


Using the Correlation tool, run a correlation matrix for the three variables: median monthly income, median weekly rent, and the proportion of mortgaged households spending more than 30% of their income servicing their mortgage.


The output of your analysis should look something like the image below



The problem with this is that it is a text editor. However, if you copy and paste these values into excel, you should be able to see the data structured in a more meaningful way, similar to the tables below.


Correlation R value and P Value Matrix

PropMS_MM0_median_mortgage_repay_monthlyM0_median_tot_hhd_inc_weekly
PropMS_M100
M0_median_mortgage_repay_monthly-0.300710
M0_median_tot_hhd_inc_weekly-0.47550.8651

The matrix is mirrored so you only need to look at the bottom left three values for the tables (in bold/blue) for the correlation values. These show that there are moderate negative relationships between the proportion of mortgaged households spending 30% or more of their incomes servicing their mortgages and the median mortgage repayments and incomes. The top right of the table represents the statistical significance of the relationships. These relationships are statistically significant (the P-Values are negligibly close to zero), so they are shown in green.


Key Take-Home Message #1: Areas with lower incomes, and with lower monthly mortgage repayments, nonetheless have higher proportions of households spending more than 30% of their total income servicing their mortgages


Capital Gains, Ownership, and Building Approvals

Now we are going to switch gears somewhat and start to look at other aspects of the housing landscape in Sydney, by looking at Capital Gains, Housing Ownership and Building Approvals

To start off with we will again begin by making some maps


Create a Choropleth Map of the Tenure Type Owned Outright % Variable

Create a Choropleth Centroid Map of the Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No


Your maps should look something like the images below



Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, and the number of taxpayers reporting Capital Gains on their tax return?


Create an Interactive Scatterplot of Proportion of Households Owning their dwelling Outright vs Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Persons No.

Run a Correlation Matrix of these variables, also including the following variables:

> Gross Capital Gains Reported By Taxpayers Gross Capital Gains – Median $

> Gross Capital Gains Reported by Taxpayers Gross Capital Gains – Amount $M

> Building Approvals Total Private Sector Dwelling Units No.

> Median total household income weekly ($)


Your Scatterplot should look something like the image below. Is this a positive or negative relationship, do you think?



The outcomes of the Correlation tool are shown below:


Correlation R Value and P Value Matrix

       
bld_approvals_tot_dwl_units_numM0_tenure_type_owned_outright_pr100M0_gross_capital_gains_rep_taxpayers_p_numM0_gross_capital_gains_rep_taxpayers_med_audM0_gross_capital_gains_rep_taxpayers_amount_audmM0_median_tot_hhd_inc_weekly
bld_approvals_tot_dwl_units_num100.07230.8440.63020.617
M0_tenure_type_owned_outright_pr100-0.2992100.37640.01140.0001
M0_gross_capital_gains_rep_taxpayers_p_num0.10440.234810.331900
M0_gross_capital_gains_rep_taxpayers_med_aud-0.0115-0.0515-0.0565100.042
M0_gross_capital_gains_rep_taxpayers_amount_audm0.0280.14660.70740.174910
M0_median_tot_hhd_inc_weekly0.02910.21850.61980.11810.4521

This table shows a number of the relationships between these variables

  1. suburbs with the highest number of building approvals, also had the lowest number of households which owned their home upright. There is also no relationship between the average household income of an area the number of building approvals.This makes sense from the perspective that growth areas, with large number of new dwellings being built, are usually where first home buyers purchase – those who are mortgaged rather than owning outright.
  2. Additionally, suburbs with the highest number of households owning their homes outright, had the highest number of taxpayers reporting gross capital gains.
  3. The total amount of capital gains in millions of dollars is also significantly associated with the proportion of households who own their home outright, as well as the median capital gains reported, and the number of taxpayers reporting capital gains (understandably).
  4. Median Income was strongly positively and significantly associated with the number of households which own their home outright, with the total gross capital gains reported in a suburb, the number of taxpayers reporting capital gains, and the median amount of gross capital gains per taxpayer.

Key Take-Home Message #2: Areas with higher income are associated with higher average home ownership rates, higher median capital gains values, and more taxpayers reporting capital gains

Homelessness, Housing Stress and Building Rates

Now we are going to change our analysis up and investigate the relationship between homelessness, housing stress, and building rates.

To start off with we will again begin by making some maps – this time, mapping our housing stress variable and our homelessness rates.


Create a Choropleth Map of the Proportion of Mortgaged Households Spending 30% or more of Income on Mortgage

Create a Choropleth Centroid Map of the Estimated Total Number of Homeless People

Create a Static Scatterplot Chart of two variables


Your maps and static scatterplot should look something like the images below. Note that you can save the static scatterplot and load it into a document as a picture file.


 


Key Research Question: What is the potential relationship between the proportion of households who own their dwelling outright, the number of homeless people in an area? Is there a relationship between these variables and the rate of new build of dwellings?


In order to answer these questions we need to consider two of the variables that we have included in our analysis. Both the number of homeless people and the number of new building approvals are raw counts, and may be dependent either on the total population size, or the total number of dwellings (these two are related as well). As a result, we may see a relationship between the number of the number of building approvals and the number of homeless people area, simply because both are related to larger population sizes, rather than being related to each other. As a result, it is important to standardise these numbers by a number reflecting population size. Consequently, we will divide the number of homeless people by the total population size in an area, and we will divide the number of new building approvals by the total number of dwellings in an area.


Using the Generate tool, create a new dataset with a new variable that is Total Number of Homeless People divided by Total Usual Resident Population

Using the Generate tool on this new dataset, create another new dataset with a new variable that isBuilding Approvals Total Private Sector Dwelling Units No divided by Total Dwellings


You should end up with two columns in a new dataset that look something like the far right columns below



If you map these variables, you will note that one of the SA2s – Banksmeadow – has a proportion of homeless people of approximately 26%. This is a spurious entry in our dataset and needs to be removed.


Using the Dataset Attribute Filter tool on this output dataset to remove Banksmeadow (hint: keep Homelessness rates below 0.1)


Now when you map these two new columns, it should look something like the image below. Do you think there is a relationship between the rate of homelessness and the rate of new building approvals? What about between the rate of homelessness and the proportion of mortgaged households spending more than 30% of their household income on their mortgage, and homeless, or building approval rates?



Previously, we have used the correlation tool to investigate the relationship between variables. Now we will use a chart tool to summarise these relationships


Create a Correlation Matrix Chart of these three variables (Proportion of Mortgaged Households spending more than 30% of income on mortgages, Building Approval Rates and Homelessness Rates)


Your output will look something like the image below. The ellipses in the top right indicate the strength and direction of the relationship, while the r values in the bottom left provide a parameter for the correlation value.



You can see a strong relationship between rate of homelessness and the proportion of mortgaged households spend more than 30% of income on mortgages. The rate of building approvals appears to have a relationship of some form with these mortgaged households, but no relationship with the rate of homelessness. The issue with the correlation matrix tool, is that, while it gives the strength and direction of the correlation, it does not provide an indicator or statistical significance – that is, the probability that you would get that particular value just by chance alone. For this we, we will also need to run the correlation tool, which we ran for the other two exercises


Run a Correlation Matrix of the same three variables, but also include the proportion of renting households spending more than 30% of their incomes on rent, that you created in Exercise One


The outputs of the Correlation can be seen in the table below. As with previous correlation outputs, the blue values in the bottom left of the table are the correlation r values, while the top right has the significance indicators (P Values), green being statistically significant, red being non-significant. You can see that the rate of homelessness is strongly and significantly positively associated with both proportions of mortgaged and renting households spending more than 30% of their income servicing housing costs – as these are with each other – while areas with higher rates of building approvals are only significantly positively associated with areas containing higher proportions of mortgaged – but not renting – households spending 30% or more of their income on housing costs

Correlation R Value and P Value Matrix


M30_MortHHR30_RentHHBApp_RateHL_Rate
M30_MortHH100.00240
R30_RentHH0.598
10.75840
BApp_Rate0.1755-0.017910.3037
HL_Rate0.17550.2820.05991

Key Take Home Message #3: Homelessness Rates are higher in areas with rental and mortgage stress


 

Theme Six: Examining Relationships Between Themes

Managing and preparing your datasets

Now that you’ve set up your project and shopped for some data, you’re ready to go. Before we jump into our analysis, we need to organise our data in a way that allows us to deploy some of the tools. In particular, we need to get all of our datasets into a single table, with each row being an SA2 in Sydney.

To do this, we will use the merge aggregated datasets tool. This tool only allows for one table to be added at a time, so we will need to use it iteratively, successively adding on a table at a time to make it larger than larger. 

One of the key things we will need to remember when using this tool is to be careful to name each step clearly and carefully. The first few steps of the process are explained below in the images below

Open the merge tool and enter your first two datasets, click Add & Run:



Once the tool has run, rename the output dataset something which reflects the datasets that went into producing it:



Now, re-open the merge tool by clicking on the small cogs next to the tool entry under the Analyse panel:



Enter your most recently created dataset (the one that you renamed) on the left hand side, and the next dataset to add on the right hand side:



Repeat these last few steps until you have a dataset which contains all of the datasets you shopped for in a single table At this stage, you may wish to delete the intermediate joined tables if it is creating too much clutter in your session – just make sure you have added everything in, and that you don’t delete the datasets you shopped for originally!