Tutorial: Exploring socio-economic indicators

In this tutorial we are going to explore a range of socio-economic indicators that can be used to investigate the distribution of advantage and disadvantage across Australia. A range of datasets and indicators can be used to describe how advantage and disadvantage are distributed spatially, and we will explore their similarities and differences.

We will also use a range of the tools in the AURIN workbench so that you can become familiar with some of the analytical components of the workbench, in addition to investigating how socio-economic advantage and disadvantage are structured in Australia.

An Introduction to Socio-Economic Indicators

You will be introduced to a range of socio-economic indicators as part of this tutorial.

First and foremost among these are the Socio-economic Indexes for Areas (SEIFA). These are produced by the Australian Bureau of Statistics after each census to characterise areas as relatively more or less advantaged or disadvantaged compared to other areas in Australia. There are four SEIFA indexes, each calculated at multiple spatial scales:

  • The Index of Relative Socio-economic Disadvantage (IRSD)
  • The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)
  • The Index of Education and Occupation (IEO)
  • The Index of Economic Resources (IER)

The production of these indexes is quite involved, so they won’t be described in great detail. Nonetheless, you can explore them further here.

In addition to SEIFA, two other datasets produced by the Australian Bureau of Statistics have been included here. 

The first of these is an experimental index produced in 2011, known as the Socio-economic Indexes for Individuals (SEIFI). This actually comprises two indexes:

  • The SEIFI Index of Relative Socio-economic Disadvantage (SEIFI IRSD)
  • The SEIFI Index of Relative Socio-economic Advantage and Disadvantage (SEIFI IRSAD)

In this analysis, based on 2006 census information, working-age individuals are categorised as either advantaged or disadvantaged. Then, for each geographic area, the proportion or percentage of individuals within certain groupings of advantage or disadvantage can be determined. Information on the development of the SEIFI indexes can be found here.

The second is an experimental index produced in 2019 from 2016 census data, known as the Index of Household Advantage and Disadvantage (IHAD).

In this index, households are categorised as either advantaged or disadvantaged. Then, for each geographic area, the percentage of households that fall within each national quartile of advantage or disadvantage can be determined.

These experimental indexes are world-leading, and highlight the ABS’s leading-edge position in understanding the complexity of socio-economic advantage and disadvantage. More information on the work undertaken by the ABS can be read here.

In addition to these datasets, we will also include some other important measures. These are described in more detail as they are introduced in the exercises.

Setting Up Your Workspaces


We will be using the AURIN Map for our first exercise, examining the impact of spatial scale on our results. The AURIN Map is a data viewer that anyone with an internet connection can use for free. For this exercise:

  • Go to the AURIN Map (www.map.aurin.org.au)
  • Accept the Terms of Use for the Map
  • Type Sydney into the search box at the top left of the window and select the first location that appears (Sydney, NSW)


  • Click the Add Data button and type IRSAD into the search box. Add the following two datasets:
    • ABS SEIFA IRSAD by SA1 2016
    • ABS SEIFA IRSAD by SA2 2016


Your map should end up looking like this:



 

We will be investigating socio-economic indicators as they relate to Greater Sydney.

We will be bringing in datasets from two census years (2006 and 2016), which means that our area selection will need to change as we load the datasets.

2016 Datasets

For our 2016 datasets, we will need to select the Greater Sydney Capital City Statistical Area.

For specific information on how to select your areas, have a look at our comprehensive area selection page.

Before you load the 2016 datasets, your portal should look something like the image below. Note the code in the top left of the Area Selection panel.



The datasets you will need to load with this area selection, and the variables to select from each, are:

  • SA2 SEIFA 2016 - The Index of Economic Resources (IER)
    • SA2 9-digit code 2016
    • SA2 name 2016
    • Index score
  • SA2 SEIFA 2016 - The Index of Education and Occupation (IEO)
    • SA2 9-digit code 2016
    • Index score
  • SA2 SEIFA 2016 - The Index of Relative Socio-economic Disadvantage (IRSD)
    • SA2 9-digit code 2016
    • Index score
  • SA2 SEIFA 2016 - The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)
    • SA2 9-digit code 2016
    • Index score
  • SA1 SEIFA 2016 - The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)
    • SA1 11-digit code 2016
    • SA1 7-digit code 2016
    • Index score
  • ABS - Index of Household Advantage and Disadvantage - Percentage of Households (SA1) 2016
    • 2016 Statistical Area Level 1 (SA1) 7-Digit Code
    • 2016 Statistical Area Level 1 (SA1) 11-Digit Code
    • Percentage Of Households In The IHAD (a): Quartile 1
    • Percentage Of Households In The IHAD (a): Quartile 2
    • Percentage Of Households In The IHAD (a): Quartile 3
    • Percentage Of Households In The IHAD (a): Quartile 4
    • Number Of Households In The IHAD (a): Quartile 1
    • Number Of Households In The IHAD (a): Quartile 2
    • Number Of Households In The IHAD (a): Quartile 3
    • Number Of Households In The IHAD (a): Quartile 4
  • ABS - Index of Household Advantage and Disadvantage - Percentage of Households (SA2) 2016
    • SA2 9-Digit Code 2016
    • Percentage Of Households In The IHAD (a): Quartile 1
    • Percentage Of Households In The IHAD (a): Quartile 2
    • Percentage Of Households In The IHAD (a): Quartile 3
    • Percentage Of Households In The IHAD (a): Quartile 4
    • Number Of Households In The IHAD (a): Quartile 1
    • Number Of Households In The IHAD (a): Quartile 2
    • Number Of Households In The IHAD (a): Quartile 3
    • Number Of Households In The IHAD (a): Quartile 4
  • NATSEM - Social and Economic Indicators - Synthetic Estimates SA2 2016
    • SA2 Code
    • Housing Stress (30/40 rule)
    • Median Household Income (synthetic)
    • Poverty Rate (proportion of people with equivalised disposable household income after housing costs is below half median equivalised disposable household income after housing costs)
    • Gini Coefficient
  • ABS - Data by Region - Education & Employment (SA2) 2011-2017
    • SA2 Code 2016
    • Year: 2016
    • Youth Engagement In Work/Study Fully Engaged %
    • Labour Force Statistics Unemployment Rate %
    • Highest Year Of School Completed - Persons Aged 15 Years And Over Completed Year 10 Or Equivalent %
  • ABS - Data by Region - Family & Community (SA2) 2011-2016
    • SA2 Code 2016
    • Year: 2016
    • Housing Suitability Dwellings With Extra Bedrooms Needed No.
  • ABS - Data by Region - Income (Including Government Allowances) (SA2) 2011-2017
    • SA2 Code 2016
    • Year: 2016
    • Selected Government Pensions & Allowances Newstart Allowance No.
    • Selected Government Pensions & Allowances Youth Allowance (Full Time Students/Apprentices) No.
    • Selected Government Pensions & Allowances Age Pension - Centrelink No.
    • Selected Government Pensions & Allowances Disability Support Pension No.
  • SA2 Aggregated Population & Dwelling Counts 2016 Census for Australia
    • SA2 Code
    • Total Dwelling Count 2016
    • Total Usual Resident Population 2016

Once you have loaded these datasets, we are ready to change our area selection.

2006 Datasets

For our 2006 datasets, we will need to select the Sydney Statistical Division.

For specific information on how to select your areas, have a look at our comprehensive area selection page.

Before you load the 2006 datasets, your portal should look something like the image below. Note the code in the top left of the Area Selection panel.



The datasets you will need to load with this area selection, and the variables to select from each, are:

  • CCD SEIFI 2006 - Index of Relative Socio-economic Disadvantage - 4 Groups
    • CD Code
    • Group 1
    • Group 2
    • Group 3
    • Group 4
    • SA2 9-digit code 2016
  • CCD SEIFI 2006 - Index of Relative Socio-economic Advantage and Disadvantage - 10 Groups
    • CD Code
    • Group 1
    • Group 2
    • Group 3
    • Group 4
    • Group 5
    • Group 6
    • Group 7
    • Group 8
    • Group 9
    • Group 10

Exercise One: Comparing the SEIFA Indexes

Our first exercise will involve creating choropleth maps of each of the four SEIFA indexes at SA2 level across Greater Sydney. Comprehensive documentation on how to create a choropleth map can be found here.

Choose a different colour scheme for each map, but make sure the maps match in their number of classes and their classification method (quantile, natural breaks, etc.). That way you can tell them apart and still compare them.
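To see why matching the classification method matters, here is a minimal sketch of quantile classification in Python. It assumes the common definition, in which each class holds roughly the same number of areas; the AURIN map's exact break rules may differ, and the function names are our own. Because quantile classes depend only on the rank order of the values, two indexes with different score ranges produce comparable maps when both use the same number of quantile classes.

```python
# Minimal sketch of quantile classification for a choropleth map.
# Assumption: each class holds roughly the same number of areas; this is
# not taken from the AURIN implementation.

def quantile_breaks(values, n_classes):
    """Return the upper bound of each class, in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes + 1):
        # Position of the last value in class k: ceil(k * n / n_classes) - 1
        idx = (k * n + n_classes - 1) // n_classes - 1
        breaks.append(ordered[idx])
    return breaks


def classify(value, breaks):
    """Return the 0-based class index of a value, given the class upper bounds."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


scores = list(range(1, 11))          # ten hypothetical index scores
breaks = quantile_breaks(scores, 5)  # five quantile classes
print(breaks)                        # [2, 4, 6, 8, 10]
print([classify(s, breaks) for s in scores])  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Each of the five classes ends up with two of the ten areas, whatever the actual score values are.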


Task 1.1: Create choropleth maps for each of the four SEIFA indexes at SA2 level (IEO, IER, IRSD, IRSAD)

Do you notice any overlap in the distribution of index scores? Any differences?


Your maps should look like the maps below:


Index of Economic Resources (IER)


Index of Education and Occupation (IEO)


Index of Relative Socio-economic Disadvantage (IRSD)


Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)


It looks like there might be substantial overlap between these indexes: areas in southern and western Sydney appear to have lower scores on all four indexes, while the north and east appear to have higher scores on all four. Is there one that stands out as looking slightly different?

To investigate this more fully, we are going to use scatterplots.

Before we do this, however, we need to merge our four datasets into a single dataset. We will use the Merge Aggregated Datasets tool. This tool takes two datasets and produces a single output dataset in which each row (in our case, each SA2) carries the variables from both input datasets. Because we can only join two datasets at a time, we will need to run this tool an additional two times to combine all four datasets.

It is important that you rename your datasets in a way that indicates the order in which you joined them. This is because, in each of the four datasets, the variable that we want to look at is called Index Score.
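The renaming matters because of how a merge combines columns. The Python sketch below joins two datasets on a shared area code and prefixes each column with the dataset's name, so two variables both called Index score stay distinguishable. The area codes, scores and function name are hypothetical, and the sketch keeps only areas present in both inputs (an inner join); we are assuming, not confirming, that the Merge Aggregated Datasets tool behaves this way.

```python
# Minimal sketch: join two area-level datasets on their area code and prefix
# column names so identically named variables do not overwrite each other.
# Area codes and scores below are hypothetical.

def merge_on_code(left, right, left_prefix="", right_prefix=""):
    """Inner-join two {area_code: {column: value}} mappings on area code."""

    def label(prefix, column):
        return f"{prefix} {column}" if prefix else column

    merged = {}
    for code in sorted(left.keys() & right.keys()):  # areas present in both
        row = {label(left_prefix, c): v for c, v in left[code].items()}
        row.update({label(right_prefix, c): v for c, v in right[code].items()})
        merged[code] = row
    return merged


ier = {"AREA_1": {"Index score": 1012}, "AREA_2": {"Index score": 968}}
ieo = {"AREA_1": {"Index score": 1105}, "AREA_3": {"Index score": 990}}

ier_ieo = merge_on_code(ier, ieo, "IER", "IEO")
print(ier_ieo)
# {'AREA_1': {'IER Index score': 1012, 'IEO Index score': 1105}}
```

Calling merge_on_code(ier, ieo) without prefixes would leave a single Index score column holding the IEO value, silently losing the IER score; that is exactly the confusion careful renaming avoids.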


Task 1.2: Merge your four datasets together

The first instance is shown below, including the renaming process – you will need to do this a total of three times.

Create a static scatterplot of two of your variables. Both will be called Index Score, so keep track of which ones you select as X and Y in your charts!



Enter the parameters for the static scatterplot chart, similar to the screenshot below.


Your scatterplots should look something like the images below:


IER vs IEO


IER vs IRSD


IER vs IRSAD


IEO vs IRSD


IEO vs IRSAD


IRSD vs IRSAD


We can see that the strength of the relationship between the indexes varies. In fact, we can measure the strength of these relationships by running the correlation tool. The outputs of that analysis are shown below. The correlation r values are shown in the bottom left of the table, while the top right shows their significance (p-values; in the workbench output, green = significant, red = non-significant).


     
| | IER Index Score | IEO Index Score | IRSD Index Score | IRSAD Index Score |
| IER Index Score | 1/NaN | 0 | 0 | 0 |
| IEO Index Score | 0.4143 | 1/NaN | 0 | 0 |
| IRSD Index Score | 0.778 | 0.8496 | 1/NaN | 0 |
| IRSAD Index Score | 0.6642 | 0.9444 | 0.9606 | 1/NaN |

All of the indexes are moderately to very strongly correlated with one another, and all of the correlations are significant. The IER is the least associated with the other indexes.
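The two halves of a correlation table like this one can be sketched in Python. The code below computes Pearson's r, which we assume is what the correlation tool reports, and a two-sided p-value for the hypothesis of zero correlation. The p-value uses a normal approximation in place of Student's t distribution with n - 2 degrees of freedom; that is reasonable for samples of a few hundred areas but is not an exact test. The function names are our own.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation.

    Converts r to the t statistic t = r * sqrt((n - 2) / (1 - r^2)) and, as a
    simplifying assumption, uses the standard normal in place of Student's t
    with n - 2 degrees of freedom (close for large n).
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0
print(approx_p_value(0.0, 300))         # 1.0
```

For example, approx_p_value(0.8, 300) is far below 0.0001, which is consistent with so many cells in the top-right half of these tables displaying as 0 when a few hundred areas are analysed.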

Exercise Two: IHAD and SEIFA in 2016

Now we are going to switch gears and spatial frames, and examine socio-economic indicators at a much finer spatial scale. We are going to include two indexes in our analysis:

  • The 2016 SA1 Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)
  • The 2016 SA1 Index of Household Advantage and Disadvantage (IHAD)

The latter of these is an experimental index which estimates the socio-economic status of individual households, and then groups households into quartiles: Quartile 1 (most disadvantaged), Quartile 2 (middle-disadvantaged), Quartile 3 (middle-advantaged) and Quartile 4 (most advantaged). Because each quartile contains 25% of households nationally, a neighbourhood with more than 25% of its households in a given quartile has more than the national average share for that quartile.

First of all, let's create some choropleths of the four quartiles across Sydney. Remember again to choose the same classifier and number of classes, but a different palette for each of the four quartiles. Also create a choropleth of the IRSAD Index Score.


Task 2.1: Create choropleth maps for each of the four IHAD Quartiles and the IRSAD Index Score


Your maps should look something like the maps below (we have zoomed in to see the detail):


IHAD Quartile 1 (Most Disadvantaged)


IHAD Quartile 2 (Middle-Disadvantaged)


IHAD Quartile 3 (Middle-Advantaged)


IHAD Quartile 4 (Most Advantaged)


SEIFA IRSAD Index Score


We can see that there are higher proportions of disadvantaged households in the south and west of Sydney, and higher proportions of advantaged households in the north and east. Interestingly, while middle-disadvantaged households appear to cluster in the south and west along with the most disadvantaged households, there appears to be less spatial patterning of middle-advantaged households. IRSAD scores also appear to be spatially associated with these patterns of advantage and disadvantage.

We can test this by first merging our datasets together, and then running a correlation on the variables.


Task 2.2: Merge your SA1 SEIFA IRSAD dataset with your SA1 IHAD dataset.

The parameters for this are shown below.

Don’t forget to rename your dataset after you’ve created it!

Run a correlation analysis on the merged dataset. The parameters for this are shown below.


MERGE PARAMETERS
 
CORRELATION PARAMETERS


The outputs of the correlation analysis are shown below. The bottom left of the table shows the strength and direction of each relationship (correlation r values), while the top right shows the statistical significance (p-values; green = significant, red = non-significant). There is a very strong relationship between the SA1 IRSAD score and Quartiles 1 and 4, and a moderate one for Quartiles 2 and 3. The dataset also appears to sort into Quartiles 1 and 2 on one side and Quartiles 3 and 4 on the other: the quartiles are positively associated with each other on their own end of the spectrum, although the relationship between Quartiles 1 and 2 (r = 0.21) is stronger than the relationship between Quartiles 3 and 4 (r = 0.10). However, there is no significant relationship between the proportions of Quartiles 2 and 3 (p = 0.62).


      
| | IRSAD Index Score | IHAD Quartile 1 | IHAD Quartile 2 | IHAD Quartile 3 | IHAD Quartile 4 |
| IRSAD Index Score | 1/NaN | 0 | 0 | 0 | 0 |
| IHAD Quartile 1 | -0.8654 | 1/NaN | 0 | 0 | 0 |
| IHAD Quartile 2 | -0.4615 | 0.2099 | 1/NaN | 0.6154 | 0 |
| IHAD Quartile 3 | 0.4039 | -0.5778 | -0.0049 | 1/NaN | 0 |
| IHAD Quartile 4 | 0.8511 | -0.7833 | -0.6625 | 0.1043 | 1/NaN |

Now we are going to look at the socio-economic diversity within SA1s: we want to identify SA1s where there is a mix of households from the IHAD quartiles. We are going to use the Diversity Index tool. The diversity index gives each area a score from 0 to 1, where more diverse areas have scores closer to 0 and less diverse areas have scores closer to 1.
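One index with exactly this behaviour is a Simpson-style concentration index: take each group's share of the area's total, square it, and add up the squares. If every household is in one group the result is 1, and the more evenly households are spread across groups, the lower it gets. We are assuming, not confirming, that the Diversity Index tool computes something of this form. Returning 0 for an empty area is our own convention, chosen because the tutorial reports zero values for SA1s with no households. The counts below are hypothetical.

```python
import math


def concentration_index(counts):
    """Sum of squared group shares: 1 = one group only, 1/k = perfectly even.

    Returns 0.0 for an area with no members (an assumed convention).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts)


# Hypothetical household counts in IHAD Quartiles 1-4 for three areas
print(concentration_index([120, 0, 0, 0]))    # 1.0  (no diversity)
print(concentration_index([30, 30, 30, 30]))  # 0.25 (as diverse as 4 groups allow)
print(concentration_index([0, 0, 0, 0]))      # 0.0  (empty area)

# Counts and percentages give the same shares, so the same index value
counts = [6, 2, 8, 4]
percentages = [30, 10, 40, 20]
print(math.isclose(concentration_index(counts), concentration_index(percentages)))  # True
```

Under this form the lowest possible value is 1/k for k groups: 0.25 with four quartiles, but 0.1 with ten groups. If the tool works this way, indexes built from different numbers of groups are not directly comparable, which is worth keeping in mind when comparing the 2006 and 2016 results in Exercise Three.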


Task 2.3: Run the Diversity Index tool on your original SA1 IHAD dataset, using the number of households in each of the quartiles, rather than the percentages. 

The parameters for this are shown below.

Remember to rename this dataset!

This dataset will contain a number of rows with a zero value: these are SA1s with zero household counts. We need to remove these rows. Use the Dataset Attribute Filter tool to keep only the rows in this dataset that have a Diversity Index greater than zero.

The parameters for this are shown below.

Remember to rename this dataset!

Create a choropleth map of this diversity index.


DIVERSITY INDEX PARAMETERS

DATASET ATTRIBUTE FILTER PARAMETERS
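The filter step itself is simple: keep the rows whose value is above zero. The reasoning matters, though. Because low diversity index values mean more diversity, a zero left in the dataset would be mapped and correlated as if it were the most diverse area of all, when it really means there were no households to measure. A minimal Python sketch, with hypothetical rows and our own function name:

```python
# Hypothetical diversity index results; AREA_2 had zero household counts.
rows = [
    {"code": "AREA_1", "diversity": 0.31},
    {"code": "AREA_2", "diversity": 0.0},
    {"code": "AREA_3", "diversity": 0.58},
]


def keep_positive(rows, column):
    """Return only the rows whose value in `column` is greater than zero."""
    return [row for row in rows if row[column] > 0]


filtered = keep_positive(rows, "diversity")
print([row["code"] for row in filtered])  # ['AREA_1', 'AREA_3']
```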


Your resultant choropleth should look something like the map below.



You can see that the most socio-economically diverse neighbourhoods occur in some (but not all) neighbourhoods in the west and south, but not in the north or east.

We can examine this pattern more closely by examining the correlation between the diversity index of SA1s and the percentage of households in each of the quartiles. (We would prepare our dataset, of course, by merging our filtered diversity index dataset with our original SA1 IHAD dataset, and then running a correlation analysis on these variables.) The output of that correlation analysis is shown below.


      
| | Diversity Index Score | IHAD Quartile 1 | IHAD Quartile 2 | IHAD Quartile 3 | IHAD Quartile 4 |
| Diversity Index Score | 1/NaN | 0 | 0 | 0 | 0 |
| IHAD Quartile 1 | 0.0777 | 1/NaN | 0 | 0 | 0 |
| IHAD Quartile 2 | -0.5698 | 0.2099 | 1/NaN | 0.6154 | 0 |
| IHAD Quartile 3 | -0.3865 | -0.5778 | -0.0049 | 1/NaN | 0 |
| IHAD Quartile 4 | 0.3637 | -0.7833 | -0.6625 | 0.1043 | 1/NaN |

We can see from this that there is a moderate negative relationship between the diversity index and the percentage of Quartile 2 households (r = -0.57), and a weaker negative relationship with the percentage of Quartile 3 households (r = -0.39). However, recall that higher values of the diversity index actually mean less diversity. So if the diversity index decreases as these quartiles increase, higher shares of Quartile 2 and 3 households are associated with more socio-economic diversity. By contrast, Quartile 4 percentages are moderately positively correlated with the diversity index (r = 0.36), meaning that there is less socio-economic diversity in areas with higher percentages of Quartile 4 households. Quartile 1 percentages had a very weak but still significant positive correlation with the diversity index (r = 0.08), meaning that there was slightly less socio-economic diversity in SA1s with higher percentages of Quartile 1 households.

 

Exercise Three: SEIFI and SEIFA in 2006

Our next analysis is going to stay at (roughly!) the same spatial scale, but we are going to wind the clock back ten years to the 2006 Census. The four variables we are going to examine are:

  • The 2006 CCD Index of Relative Socio-economic Disadvantage (IRSD)
  • The 2006 CCD Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)
  • The 2006 CCD Socio-Economic Indexes for Individuals (SEIFI) Index of Relative Socio-economic Disadvantage; and
  • The 2006 CCD Socio-Economic Indexes for Individuals (SEIFI) Index of Relative Socio-economic Advantage and Disadvantage

The first two of these are contained in a single CSV dataset, which you can download from here and will need to import into your portal session (the parameters are shown in the screenshots below).


Import Parameters


The latter two are also experimental indexes, which estimate the socio-economic status of working-age individuals. For the SEIFI IRSD, these individuals are grouped into four groups: Group 1 is the most disadvantaged 20% of Australians, Group 2 is the middle-disadvantaged 20%, Group 3 is the middle-advantaged 30%, and Group 4 is the most advantaged 30%. For the SEIFI IRSAD, individuals are grouped into ten groups, each one roughly representing a decile.

First of all, let's create some choropleths of the four SEIFI IRSD group percentages across Sydney. Remember again to choose the same classifier and number of classes, but a different palette for each of the four groups.

Also create choropleths of the 2006 SEIFA IRSD and IRSAD Index Scores.


Task 3.1: Create choropleth maps for each of the four SEIFI IRSD groups, and for the SEIFA IRSD and IRSAD Index Scores


Your maps should look something like the maps below (we have zoomed in to see the detail):


SEIFI IRSD Group 1 (Most Disadvantaged)


SEIFI IRSD Group 2 (Middle-Disadvantaged)


SEIFI IRSD Group 3 (Middle-Advantaged)


SEIFI IRSD Group 4 (Most Advantaged)

 


SEIFA IRSD Index Score


SEIFA IRSAD Index Score


As with our 2016 IHAD maps, we can see that in 2006 there were higher proportions of disadvantaged individuals in the south and west of Sydney, and higher proportions of advantaged individuals in the north and east. Similar to 2016, middle-disadvantaged individuals appear to cluster in the south and west along with the most disadvantaged. However, whereas the 2016 IHAD data showed much less apparent spatial patterning of the middle-advantaged quartile, Group 3 (its 2006 analogue) appears to exhibit stronger spatial clustering, in particular in the north and south of the metropolitan region. IRSD and IRSAD scores appear to follow these same spatial patterns.

Now we are going to look at the socio-economic diversity within CCDs: we want to identify CCDs where there is a mix of individuals from the 10 SEIFI IRSAD groups. Again, we are going to use the Diversity Index tool from the exercise above.


Task 3.2: Run the Diversity Index tool on your CCD SEIFI 10 Groups dataset. These are percentages rather than numbers, but it will have no impact on the results.

The parameters for this are shown below.

Remember to rename this dataset!

This dataset will contain a number of rows with a zero value: these are CCDs with zero percentages in every group. We need to remove these rows. Use the Dataset Attribute Filter tool to keep only the rows in this dataset that have a Diversity Index greater than zero.

The parameters for this are shown below.

Remember to rename this dataset!

Create a choropleth map of this diversity index.


DIVERSITY INDEX PARAMETERS
DATASET ATTRIBUTE FILTER PARAMETERS

Your resultant choropleth should look something like the map below.



You can see that the most socio-economically diverse neighbourhoods occur in the south and west of Sydney, with very few in the north and east. This pattern appears starker than the one in the diversity index we calculated for the IHAD quartiles. Also, examine the range of diversity index values: what difference do you notice compared with our 2016 values? Overall, there appears to be more socio-economic diversity in 2006, but do you think this is a correct conclusion? What might be the reason for these patterns? (Think about the number of groups we included in each analysis!)

We can examine this spatial pattern more closely by examining the correlation between the diversity index of CCDs and the SEIFA IRSD and IRSAD scores. (We would prepare our dataset, of course, by merging our filtered diversity index dataset with our imported CCD SEIFA dataset, and then running a correlation analysis on these variables.) The output of that correlation analysis is shown below.


    
| | 2006 SEIFA IRSD Score | 2006 SEIFA IRSAD Score | Diversity Index Score |
| 2006 SEIFA IRSD Score | 1/NaN | 0 | 0 |
| 2006 SEIFA IRSAD Score | 0.9505 | 1/NaN | 0 |
| Diversity Index Score | 0.1127 | 0.3096 | 1/NaN |

We can see from this that, as with the 2016 data, there is a very strong positive correlation between IRSD and IRSAD index scores. There is also a weak positive relationship between the diversity index and SEIFA IRSD scores, and a moderate positive relationship between the diversity index and SEIFA IRSAD scores. Again, recall that higher values of the diversity index actually mean less diversity. This means that as socio-economic advantage increases, there is a statistically significant drop in socio-economic diversity; conversely, as socio-economic disadvantage increases, socio-economic diversity increases.

Exercise Four: Combining SEIFA with other Socio-Economic Indicators

Now it’s time to turn our attention to the relationship between these socio-economic indexes and other indicators of socio-economic wellbeing, advantage and disadvantage. We will be shifting back to the SA2 level in Sydney, and combining some of our SEIFA datasets with other datasets.

This is where the fundamental value of the AURIN Workbench becomes apparent. In this analysis we are going to combine datasets from disparate domains on a common geography, and start to develop a much richer picture of the lived experience of socio-economic status.

We are going to include these variables in this exercise:

  • The 2016 SA2 Index of Relative Socio-economic Advantage and Disadvantage (IRSAD)
  • ABS Index of Household Advantage and Disadvantage – Percentage of Households (SA2) 2016
  • Poverty Rate (a)
  • Housing Stress (a)
  • Median Household Income (a)
  • Gini Coefficient (a)
  • Youth Engagement In Work/Study Fully Engaged % (b)
  • Labour Force Statistics Unemployment Rate % (b)
  • Highest Year Of School Completed – Persons Aged 15 Years And Over Completed Year 10 Or Equivalent % (b)
  • Housing Suitability Dwellings With Extra Bedrooms Needed No. (c)
  • Selected Government Pensions & Allowances Newstart Allowance No. (d)
  • Selected Government Pensions & Allowances Youth Allowance (Full Time Students/Apprentices) No. (d)
  • Selected Government Pensions & Allowances Age Pension – Centrelink No. (d)
  • Selected Government Pensions & Allowances Disability Support Pension No. (d)
  • Total Dwelling Count 2016 (e)
  • Total Usual Resident Population 2016 (e)

The variables marked (a) can be found in the NATSEM – Social and Economic Indicators Synthetic Estimates SA2 2016 dataset. The variables marked (b) can be found in the ABS – Data by Region – Education & Employment (SA2) 2011-2017 dataset. The variables marked (c) can be found in the ABS – Data by Region – Family & Community (SA2) 2011-2016 dataset. The variables marked (d) can be found in the ABS – Data by Region – Income (Including Government Allowances) (SA2) 2011-2017 dataset, while the variables marked (e) can be found in the SA2 Aggregated Population & Dwelling Counts 2016 Census for Australia dataset.

Please note that a little data management is needed before we launch into mapping and exploring these variables. Data management is an important part of any analysis, so you should become familiar with it!

Firstly, we need to convert some of our counts to proportions, to standardise them for population or dwelling counts. These are the counts marked (c) and (d) above, and they need to be standardised against the counts in (e).


Exercise 4.1: First, we need to merge our datasets.

(1) Merge ABS – Data by Region – Family & Community (SA2) 2011-2016 with SA2 Aggregated Population & Dwelling Counts 2016 Census for Australia

Parameters for this are shown below.

(2) Merge ABS – Data by Region – Income (Including Government Allowances) (SA2) 2011-2017 with SA2 Aggregated Population & Dwelling Counts 2016 Census for Australia.

Remember to rename your output datasets!

Next, we need to run the Generate tool to calculate proportions.

For the output of (1) above, divide the Housing Suitability Dwellings With Extra Bedrooms Needed No. variable (c) by the Total Dwelling Count 2016 variable.

Parameters for this are shown below.

For the output of (2) above, divide the count for each of the government pensions and allowances by the Total Usual Resident Population 2016 variable. You will need to do this sequentially for each variable, so remember to carefully name each output column and dataset!


Merge Parameters

 

Generate Parameters
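Each Generate step is a row-by-row division. The Python sketch below shows the calculation for several allowance counts in turn, with hypothetical column names and values. Returning None when the denominator is zero (for example, an SA2 with no usual residents) is our own choice to avoid dividing by zero; the Generate tool may handle that case differently.

```python
# Hypothetical SA2 rows after merging allowance counts with population counts.
rows = [
    {"code": "AREA_1", "newstart": 420, "age_pension": 1800, "population": 12000},
    {"code": "AREA_2", "newstart": 0, "age_pension": 0, "population": 0},
]


def add_proportion(rows, numerator, denominator, new_column):
    """Add new_column = numerator / denominator to every row (None if denominator is 0)."""
    for row in rows:
        denom = row[denominator]
        row[new_column] = row[numerator] / denom if denom else None
    return rows


# One division per allowance, each written to its own clearly named column
for count_column in ["newstart", "age_pension"]:
    add_proportion(rows, count_column, "population", f"prop_{count_column}")

print(rows[0]["prop_newstart"], rows[0]["prop_age_pension"])
print(rows[1]["prop_newstart"])  # None
```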

Your final output tables from these tools should look something like the images below.



We are now ready to merge all of our datasets together into one large dataset. This will allow us to undertake correlation analyses, or other statistical analyses.


Exercise 4.2: Sequentially merge your SA2 datasets together. Remember: you already merged all of your SEIFA datasets together in the first exercise, so you don’t need to redo that step. Also, you don’t need to add the ABS – Data by Region – Income (Including Government Allowances) (SA2) 2011-2017, the ABS – Data by Region – Family & Community (SA2) 2011-2016, or the SA2 Aggregated Population & Dwelling Counts 2016 Census for Australia datasets, because you have created new datasets from these; just add your output datasets from above!
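Sequential merging is a fold: merge the first two datasets, merge that result with the third, and so on, so n datasets need n - 1 merges. The Python sketch below, with hypothetical datasets and our own function name, assumes each merge keeps only areas present in both inputs, in which case an SA2 missing from any one dataset drops out of the final table. It also assumes column names are already unique, which is what careful renaming guarantees.

```python
from functools import reduce


def merge_pair(left, right):
    """Inner-join two {area_code: {column: value}} mappings on area code.

    Column names are assumed to be unique already; if not, the right-hand
    value would silently replace the left-hand one.
    """
    shared = sorted(left.keys() & right.keys())
    return {code: {**left[code], **right[code]} for code in shared}


seifa = {"AREA_1": {"IRSAD": 1050}, "AREA_2": {"IRSAD": 940}}
natsem = {"AREA_1": {"Gini": 0.41}, "AREA_2": {"Gini": 0.38}}
pensions = {"AREA_1": {"Prop Newstart": 0.02}, "AREA_3": {"Prop Newstart": 0.05}}

# Three datasets, two merges: ((seifa + natsem) + pensions)
master = reduce(merge_pair, [seifa, natsem, pensions])
print(master)  # {'AREA_1': {'IRSAD': 1050, 'Gini': 0.41, 'Prop Newstart': 0.02}}
```

AREA_2 is lost at the last step because it has no pensions row, so it is worth checking the row count of your final dataset against the number of SA2s you started with.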


We have named our final dataset: 

IER + IEO + IRSD + IRSAD + IHAD + NATSEM + Inc/Emp + Crowded + Prop_Pensions

Once you have created this dataset, you can delete the intermediate datasets to keep your data panel as uncluttered as possible – but make sure you have included all of the variables!

Now we are ready to do some mapping! 


Exercise 4.3: Create some choropleth and choropleth centroid maps of the various variables in the master dataset you have just created.

Be creative and insightful – what potential relationships might you pick up?


We have created four here, but there are 210 possible combinations of maps!


SEIFA IRSD Index Score vs. Proportion of Total Dwellings that are Crowded


SEIFA IRSAD Index Score vs. Gini Co-efficient (Income Inequality)

 


SEIFA IEO vs Percentage of 15-24 Year Olds Engaged in Work or Study


SEIFA IER vs Proportion of Population Receiving New Start Allowance


Now we are going to run a correlation analysis on all of our variables – the output is going to be large so it will require some careful, sensible management!


Exercise 4.4: Run a correlation analysis on the merged dataset. The parameters for this are shown below.


CORRELATION PARAMETERS


The outputs of the correlation analysis are shown in the table below. The bottom left contains the correlation r values, while the top right contains the significance (p-values; green = statistically significant, red = statistically non-significant).


                     
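| | IER Score | IEO Score | IRSD Score | IRSAD Score | IHAD Q1 % | IHAD Q2 % | IHAD Q3 % | IHAD Q4 % | Gini Co-efficient | % Housing Stress (30/40) | Median Income | Poverty Rate | % Youth Earn/Learn | % Unemployment | % Yr 10 Leaving | % Crowded Dwellings | % New Start | % Youth Allowance | % Age Pension | % Disability Pension |
| IER Score | 1/NaN | 0 | 0 | 0 | 0 | 0 | 0.0033 | 0 | 0.1224 | 0 | 0 | 0 | 0 | 0 | 0.0002 | 0 | 0 | 0 | 0.0218 | 0 |
| IEO Score | 0.4019 | 1/NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IRSD Score | 0.7766 | 0.8435 | 1/NaN | 0 | 0 | 0 | 0 | 0 | 0.0245 | 0 | 0 | 0 | 0 | 0 | 0.0013 | 0 | 0 | 0 | 0 | 0 |
| IRSAD Score | 0.6605 | 0.9419 | 0.9598 | 1/NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IHAD Q1 % | -0.7899 | -0.7298 | -0.9059 | -0.8837 | 1/NaN | 0 | 0 | 0 | 0.1839 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IHAD Q2 % | -0.5017 | -0.7974 | -0.6962 | -0.8118 | 0.6647 | 1/NaN | 0.0247 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IHAD Q3 % | 0.1755 | 0.3857 | 0.4447 | 0.4305 | -0.5922 | -0.1345 | 1/NaN | 0.0002 | 0.7169 | 0 | 0 | 0 | 0.1681 | 0 | 0.0069 | 0.5386 | 0 | 0.0002 | 0 | 0 |
| IHAD Q4 % | 0.7842 | 0.7842 | 0.8699 | 0.9013 | -0.8883 | -0.8887 | 0.2224 | 1/NaN | 0.0001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Gini Co-efficient | -0.0927 | 0.4631 | 0.1346 | 0.307 | -0.0798 | -0.4213 | -0.0218 | 0.2361 | 1/NaN | 0.9728 | 0 | 0.3257 | 0 | 0.5401 | 0 | 0.0001 | 0.0023 | 0.0524 | 0.0005 | 0.0004 |
| % Housing Stress (30/40) | -0.8058 | -0.593 | -0.8609 | -0.7547 | 0.7142 | 0.4887 | -0.2639 | -0.6896 | 0.002 | 1/NaN | 0 | 0 | 0 | 0 | 0.0138 | 0 | 0 | 0 | 0.6069 | 0 |
| Median Income | 0.7763 | 0.7483 | 0.8449 | 0.8792 | -0.8342 | -0.813 | 0.2429 | 0.9203 | 0.2962 | -0.7614 | 1/NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Poverty Rate | -0.7503 | -0.6616 | -0.9055 | -0.8034 | 0.7621 | 0.4986 | -0.3654 | -0.7006 | 0.059 | 0.9609 | -0.7405 | 1/NaN | 0 | 0 | 0.2816 | 0 | 0 | 0 | 0.0459 | 0 |
| % Youth Earn/Learn | 0.5197 | 0.5371 | 0.5529 | 0.6071 | -0.6089 | -0.6545 | 0.0828 | 0.7251 | 0.3172 | -0.3642 | 0.6134 | -0.3559 | 1/NaN | 0 | 0 | 0.2166 | 0 | 0.6095 | 0 | 0 |
| % Unemployment | -0.7778 | -0.6605 | -0.9025 | -0.8153 | 0.7641 | 0.4999 | -0.4 | -0.6921 | 0.0368 | 0.8844 | -0.7454 | 0.9082 | -0.358 | 1/NaN | 0.3202 | 0 | 0 | 0 | 0.005 | 0 |
| % Yr 10 Leaving | 0.2181 | -0.6251 | -0.1913 | -0.4363 | 0.2504 | 0.5839 | -0.1615 | -0.3806 | -0.6029 | -0.1473 | -0.3116 | -0.0647 | -0.409 | -0.0597 | 1/NaN | 0 | 0.0001 | 0 | 0 | 0 |
| % Crowded Dwellings | -0.7259 | -0.3462 | -0.6657 | -0.5088 | 0.4728 | 0.2516 | -0.037 | -0.4715 | 0.227 | 0.8196 | -0.5092 | 0.7936 | -0.0742 | 0.7632 | -0.4138 | 1/NaN | 0 | 0 | 0.0042 | 0 |
| % New Start | -0.7086 | -0.787 | -0.9247 | -0.8994 | 0.8507 | 0.6718 | -0.4305 | -0.8201 | -0.1819 | 0.7606 | -0.789 | 0.7966 | -0.627 | 0.8205 | 0.2355 | 0.5113 | 1/NaN | 0 | 0 | 0 |
| % Youth Allowance | -0.5487 | -0.3788 | -0.623 | -0.4956 | 0.452 | 0.2457 | -0.2183 | -0.396 | 0.1163 | 0.7178 | -0.4642 | 0.7372 | -0.0307 | 0.709 | -0.2772 | 0.6852 | 0.5574 | 1/NaN | 0.3502 | 0 |
| % Age Pension | -0.1373 | -0.4507 | -0.3392 | -0.4658 | 0.5573 | 0.4984 | -0.5224 | -0.4847 | -0.2057 | 0.0309 | -0.4821 | 0.1196 | -0.3072 | 0.1677 | 0.5727 | -0.1708 | 0.3626 | -0.0561 | 1/NaN | 0 |
| % Disability Pension | -0.6524 | -0.7153 | -0.8172 | -0.8288 | 0.8582 | 0.6433 | -0.4754 | -0.8011 | -0.2098 | 0.6012 | -0.7594 | 0.6474 | -0.7213 | 0.6696 | 0.3508 | 0.2702 | 0.8919 | 0.3625 | 0.5126 | 1/NaN |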

This table provides a large amount of information about the lived experience of socio-economic disadvantage. Some important take home relationships:

  • All of the SEIFA indexes are significantly negatively associated with housing stress, poverty rates, crowded dwellings, unemployment rates and the proportion of the population receiving each of the government pensions and allowances
  • All of the SEIFA indexes are significantly positively associated with income levels and the percentage of young people earning or learning
  • Housing stress and crowded dwellings are significantly positively associated
  • Income inequality (Gini coefficient) increases with increasing advantage for all of the SEIFA indexes except the Index of Economic Resources.
  • IHAD Quartile 1 proportions were moderately to strongly positively associated with housing stress, unemployment, crowded dwellings, and the proportion of the population on government allowances or pensions.

What other relationships can you detect in this table? Are they positive or negative? Strong or weak? Statistically significant?
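Reading the matrix takes some care, because each row appears to combine two statistics: the values to the left of the 1/NaN diagonal look like Pearson's r, and the values to the right look like the matching p-values (for example, Gini and housing stress have r = 0.002 in the housing stress row and p = 0.9728 in the Gini row). Assuming that layout, the minimal Python sketch below pulls both numbers for a chosen pair of variables. The three-variable excerpt is copied from the table; the helper name `r_and_p` is hypothetical.

```python
import math

NAN = math.nan

# Three-variable excerpt of the matrix above. Assumed layout: r below the
# diagonal, p-value above it, NaN on the diagonal.
variables = ["Gini Co-efficient", "% Housing Stress (30/40)", "Median Income"]
matrix = [
    [NAN,    0.9728,  0.0],
    [0.002,  NAN,     0.0],
    [0.2962, -0.7614, NAN],
]


def r_and_p(a, b):
    """Return (r, p) for two different variables from the combined r/p matrix."""
    i, j = variables.index(a), variables.index(b)
    if i == j:
        raise ValueError("choose two different variables")
    lo, hi = min(i, j), max(i, j)
    # r sits in the lower triangle, the p-value in the upper triangle.
    return matrix[hi][lo], matrix[lo][hi]


r, p = r_and_p("Median Income", "% Housing Stress (30/40)")
print(f"r = {r}, p = {p}")  # r = -0.7614, p = 0.0
```

A p-value printed as 0 in the table is simply too small to show at four decimal places.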

Exercise Five: Advantage, Disadvantage, and Inequality

In our penultimate analysis, we are going to look at the potential relationship between socio-economic diversity and income inequality.

Recall from the last exercise that the Gini coefficient in our dataset was not significantly associated with the IER, and was positively associated with the other three SEIFA indexes (IEO, IRSD, IRSAD). The Gini coefficient is a measure of income inequality, explained in some detail here. As the Gini coefficient increases, the amount of income inequality in an area increases. This means that as areas become more advantaged and less disadvantaged, income inequality appears to increase.
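For intuition about what the Gini coefficient measures, here is a minimal Python sketch of one standard formulation: the sum of absolute income differences over all pairs of people, divided by twice the number of people squared times the mean income. It returns 0 when everyone has the same income and approaches 1 as income concentrates in fewer hands. The function and the incomes are hypothetical; NATSEM's SA2 figures are synthetic estimates built with their own methodology, which this sketch does not reproduce.

```python
def gini(incomes):
    """Gini coefficient: sum of |x_i - x_j| over all ordered pairs / (2 * n^2 * mean)."""
    n = len(incomes)
    if n == 0:
        raise ValueError("need at least one income")
    mean = sum(incomes) / n
    if mean <= 0:
        raise ValueError("mean income must be positive")
    # O(n^2) pairwise comparison: fine for a small illustration.
    total_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_diff / (2 * n * n * mean)


equal_area = [50_000, 50_000, 50_000, 50_000]
unequal_area = [10_000, 20_000, 40_000, 130_000]
print(gini(equal_area))              # 0.0
print(round(gini(unequal_area), 3))  # 0.475
```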

What we are going to do now is examine the potential relationship between socio-economic diversity and income inequality – are more diverse areas more or less equal in terms of income?

We will now repeat, at SA2 level, what we did for our SA1-level analysis of the IHAD and SEIFI. This means that we need to run the diversity index tool on our SA2 IHAD dataset and then filter the result to remove zero values. Remember that low diversity index values mean higher socio-economic diversity. Once we have filtered the dataset, we need to merge it with the dataset containing the Gini coefficient data: NATSEM – Social and Economic Indicators Synthetic Estimates SA2 2016. We can then map the values together with a choropleth and choropleth centroid map, and run a correlation analysis.
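The exact formula behind the AURIN Diversity Index tool is not given here. As an assumption for illustration only, the Python sketch below uses a Simpson-style index, the sum of squared shares of households in each IHAD quartile, which behaves as described above: 0.25 when households are spread evenly across the four quartiles and 1.0 when they all fall in one quartile. The SA2 codes and counts are hypothetical, and returning 0.0 for an area with no households is a choice made in this sketch so that the zero-value filter removes it.

```python
def simpson_index(counts):
    """Sum of squared group shares: lower values mean a more even (diverse) mix.

    Returns 0.0 for an area with no households, so a zero-value filter drops it.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts)


# Hypothetical SA2 codes -> household counts in IHAD quartiles Q1..Q4.
ihad_counts = {
    "SA2_A": [250, 250, 250, 250],  # evenly spread: most diverse possible (0.25)
    "SA2_B": [900, 50, 30, 20],     # mostly Q1 households: much less diverse
    "SA2_C": [0, 0, 0, 0],          # no households: index 0.0
}

diversity = {code: simpson_index(c) for code, c in ihad_counts.items()}
# Mirror the "filter to remove zero values" step.
diversity = {code: d for code, d in diversity.items() if d > 0}

for code, d in diversity.items():
    print(code, round(d, 4))
```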


Exercise 5.1: Calculate a Diversity Index for your SA2 IHAD dataset, incorporating the numbers of households in each quartile.

Parameters for this are shown below.

Filter this dataset to remove zero values.

Parameters for this are shown below.

Merge this dataset with the NATSEM – Social and Economic Indicators Synthetic Estimates SA2 2016 dataset.

Parameters for this are shown below.

Create a choropleth and choropleth centroid map of the Diversity Index and the Gini coefficient.


Diversity Index Parameters
 
Dataset Attribute Filter Parameters
 
Merge Aggregated Datasets Parameters
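Outside the workbench, the filter and merge steps amount to dropping zero values and joining the two datasets on their shared SA2 code. The Python sketch below assumes the merge keeps only SA2s present in both datasets (an inner join); the codes and values are hypothetical placeholders, not AURIN output.

```python
# Hypothetical diversity index values keyed by SA2 code (0.0 = no households).
diversity = {"SA2_A": 0.25, "SA2_B": 0.8138, "SA2_C": 0.0}
# Hypothetical Gini coefficients keyed by the same kind of SA2 code.
gini_by_sa2 = {"SA2_A": 0.41, "SA2_B": 0.47, "SA2_D": 0.39}

# Filter step: drop zero diversity values.
filtered = {code: d for code, d in diversity.items() if d != 0}

# Merge step: inner join on SA2 code, keeping areas found in both datasets.
merged = {
    code: {"diversity_index": filtered[code], "gini": gini_by_sa2[code]}
    for code in sorted(filtered.keys() & gini_by_sa2.keys())
}

for code, row in merged.items():
    print(code, row["diversity_index"], row["gini"])
```

SA2_C is removed by the filter and SA2_D has no diversity value, so only SA2_A and SA2_B survive the merge.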


Your choropleth and choropleth centroid map should look something like the image below.



Is there a relationship between socio-economic diversity and income inequality? Let's run a correlation analysis to find out.


Exercise 5.2: Run a correlation analysis on the diversity index and the Gini coefficient

The parameters for this are shown below.


Correlation Analysis Parameters


Because we are only running the correlation analysis on two variables, there's no need to put a table in here. But we can see from the output that there is a modest (and statistically significant) positive correlation between the diversity index value and the Gini coefficient: r = 0.2207 (P < 0.005). Recall that as the diversity index value increases, socio-economic diversity actually decreases. This means that as socio-economic diversity in an area decreases, income inequality increases.
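To see what a correlation analysis computes, here is a minimal Python sketch of Pearson's r and the usual t-test of whether r differs from zero. It is a sketch under stated assumptions, not AURIN's implementation: the p-value uses a normal approximation to the t distribution with n − 2 degrees of freedom, which is only reasonable for large samples such as a national set of SA2s, and the data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Two-sided p-value for H0: no correlation.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and approximates the t distribution
    by a standard normal, so it is only trustworthy for large n.
    """
    if abs(r) >= 1:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical SA2 values: diversity index and Gini coefficient.
diversity_index = [0.26, 0.31, 0.35, 0.42, 0.48, 0.55]
gini = [0.40, 0.39, 0.43, 0.42, 0.46, 0.44]
print(round(pearson_r(diversity_index, gini), 4))
```

A positive r, as in the tutorial's output, means that higher diversity index values (that is, less diversity) tend to go with higher Gini coefficients.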

Exercise Six: The Impact of Spatial Scale

The last analysis we will undertake is a simple visualisation of the distribution of one of the SEIFA indexes – the IRSAD – at different spatial scales.

We will undertake this analysis in the AURIN Map.

The SEIFA indexes are calculated independently at each level of aggregation. This means that the index score at SA1 or SA2 depends on the census responses of all of the individuals within that particular boundary. A consequence of this is that at higher and higher levels of aggregation, more and more variation – spatial and social – is ‘smoothed’ out through averaging. 

This has consequences for our ability to detect and address pockets of disadvantage that may exist over small spatial scales.
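Because, as noted above, the SEIFA indexes are calculated independently at each level rather than by averaging the scores of smaller areas, the Python sketch below is only a toy illustration of the smoothing effect. It assumes six hypothetical SA1 scores inside one larger area and shows how a population-weighted average hides the most disadvantaged SA1.

```python
def population_weighted_mean(scores, populations):
    """Average scores, weighting each area by its population."""
    if len(scores) != len(populations) or not scores:
        raise ValueError("need matching, non-empty lists")
    total_pop = sum(populations)
    return sum(s * p for s, p in zip(scores, populations)) / total_pop


# Hypothetical SA1 index scores and populations within one larger area.
sa1_scores = [600, 950, 1080, 1120, 1150, 1160]
sa1_populations = [400, 350, 300, 320, 310, 330]

combined = population_weighted_mean(sa1_scores, sa1_populations)
print("lowest SA1:", min(sa1_scores))   # lowest SA1: 600
print("larger area:", round(combined))  # larger area: 992
```

The single aggregated value sits near 1000, and the SA1 scoring 600 no longer stands out.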

We will examine this in inner Sydney, in the suburb of Waterloo-Beaconsfield.

To do this, zoom to this inner-city suburb and turn on both the 2016 SA1 IRSAD and 2016 SA2 IRSAD datasets that you shopped for earlier. Now, using the opacity slider of the topmost layer, slide it back and forth to make the layer disappear and reappear (shown in the video below).

If you click on the SA1s (smaller areas), you will see their IRSAD scores (recall that the further a score falls below 1000, the more disadvantaged the area; the further it rises above 1000, the more advantaged the area). If you click on the north-western red-coloured SA1s, you will see that these are quite disadvantaged. In fact, one of them has an index score of 582. This SA1 is ranked 57 out of 57,523 for disadvantage, meaning it is in the bottom 0.1% of SA1s in the country. However, when you aggregate up to SA2, you’ll notice that the area of Waterloo-Beaconsfield has a score of 1079. It is ranked 1793 out of 2310 SA2s for disadvantage (517 out of 2310 SA2s for advantage). This puts it in the top 23% of SA2s for advantage.
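The percentages above follow directly from the quoted ranks. A quick Python check (the helper name is illustrative):

```python
def rank_percent(rank, total):
    """Rank expressed as a percentage of all ranked areas."""
    return 100 * rank / total


sa1 = rank_percent(57, 57_523)         # SA1 ranked 57th most disadvantaged
sa2 = rank_percent(2310 - 1793, 2310)  # SA2 advantage rank: 517 of 2310
print(f"SA1: bottom {sa1:.2f}%")       # SA1: bottom 0.10%
print(f"SA2: top {sa2:.1f}%")          # SA2: top 22.4%
```

22.4% sits within the top 23% quoted above.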

The information about the pockets of stark socio-economic disadvantage within the suburb is lost when we move from SA1 to SA2. This has consequences for decision making, resource allocation and capital investment.

This is an example of the Modifiable Areal Unit Problem.