PORTAL USER GUIDE
Join Dataset (Outer)
In order to be able to use a lot of the statistical tools comparing data from different sources, we need a way of “bringing them together”. Many of the tools require the variables we may be interested in modelling or comparing or computing need to be in the same dataset in order to be recognised.
The Join Dataset (Outer) tool does this by taking two tables and creating a join based on a specific key or variable. In the vast majority of instances, this key or variable will be the specific area codes such as the variable identifying an SLA, SA2 or LGA. It is therefore vital in these instances to remember that you can only join datasets that are at the same geographical level of aggregation.
In setting up a join we simply specify which table will be on the right side of the new table, which table will be on the left, and join them according to the area code. Unlike the Join Dataset (Inner) tool, this join will keep all additional rows in the left-hand table that may not have corresponding rows in the right-hand table. Refer to the Venn diagram below for the logical relationship of the datasets after the join.
For this worked example, we will combine two datasets from the Northern Territory and Australia.
Select the Northern Territory as your area.
Select ABS Census – B01 Selected Characteristics (SLA) 1991 and select the following variables:
- SLA Code
- SLA Name
- Australian Born, Female
- Australian Born, Male
We now need to change the area to get a dataset representing just the Northern Territory.
Select Greater Darwin as your area.
Select ABS Census – B11 Language Spoken At Home By Sex (SLA) 1991 and select all variables.
Once you have set up your data, open the Join Dataset (Outer) tool (Tools → Data Processing → Join Dataset (Outer)). The input fields are as follows:
- Left side: The dataset which is considered the primary (left) table, where all rows will persist. Select ABS Census – B01 Selected Characteristics (SLA) 1991, the dataset we chose for the Northern Territory.
- Left attribute: The column containing the key used to join both tables. This needs to exist in the right table, however, the naming of the column may be different. Select SLA Code.
- Right side: The dataset which we’re joining to the left one selected above. Select ABS Census – B11 Language Spoken At Home By Sex (SLA) 1991, the dataset we chose for Darwin.
- Left attribute: The column containing the key used to join both tables. Select SLA Code.
The input parameters are summarised in the image below, once complete click Run Tool.
Once the tool has run, click the Display button on the pop-up dialogue box that appears. This will open a window with the joined tables that should look like the image below.
You will see that the output dataset will contain the joined rows from Darwin, but maintain all rows from the Northern Territory even if there is no data available to join to it. This is useful if you still want to consider the areas outside of Darwin in your analysis.