# Correlation

In statistical analysis, one of the first things we want to do with data is explore the relationships between variables. One of the simplest ways to do this is to measure the correlation between them. Together with a scatter plot of the data, a correlation coefficient can give very rapid insight into how two variables are related, without any particularly complex analysis.

• Correlation coefficients are a numerical representation of the relationship between two variables, ranging from -1 to 1.
• Negative correlation coefficients indicate that as one value increases, the other decreases (and vice versa), while positive correlation coefficients indicate that as one value increases, the other increases (and vice versa).
• The closer the correlation coefficient is to -1 or 1, the stronger (tighter) the relationship between the two variables.
• A value of 0 indicates no relationship of the kind the coefficient measures (for example, no linear relationship for Pearson's r), and values close to 0 indicate only a very weak or loose one.
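To make the points above concrete, here is a minimal Python sketch of Pearson's correlation coefficient (the usual ‘r’), computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The function name `pearson_r` and the toy numbers are illustrative assumptions; this is not code taken from the AURIN portal.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (numerator of the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (numerators of the two variances)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))            # 1.0: perfect positive relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))            # -1.0: perfect negative relationship
print(round(pearson_r(x, [3, 1, 4, 1, 5]), 2))   # roughly 0.35: weak positive relationship
```

On Python 3.10 or later, the standard library's `statistics.correlation` returns the same Pearson value.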

The points above are illustrated in the figure below, showing five scatter plots and correlation coefficients (denoted as ‘r’):

If you have multiple variables and would like to correlate every pair of them (pairwise correlations), you can do this in the AURIN portal. It is important to understand that while a correlation matrix shows how strongly each variable varies with every other variable, this does not necessarily mean that a change in one is causing the change in the other: hence the well-worn phrase “correlation doesn’t imply causation”.
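Outside the portal, the pairwise idea can be sketched in a few lines of Python: compute r for every pair of variables and arrange the results in a symmetric matrix with 1.0 on the diagonal. The helper names and the counts below are made-up assumptions for illustration, not AURIN data or the portal's implementation.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of {variable name: values}."""
    names = list(columns)
    # A variable is always perfectly correlated with itself
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        r = pearson_r(columns[a], columns[b])
        matrix[a][b] = matrix[b][a] = r  # the matrix is symmetric
    return matrix

# Hypothetical counts for four areas (made-up numbers, not AURIN data)
data = {
    "bicycle": [12, 30, 8, 25],
    "bus": [40, 55, 20, 60],
    "car": [900, 700, 1100, 650],
}
matrix = correlation_matrix(data)
names = list(data)
# Print as a tab-delimited table
print("\t" + "\t".join(names))
for a in names:
    print(a + "\t" + "\t".join(f"{matrix[a][b]:.2f}" for b in names))
```

With k variables there are k(k - 1)/2 distinct pairs, which is why a matrix is a convenient way to present them.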

Many variables correlate with each other because both are driven by a third variable. Others correlate by chance alone (check out Spurious Correlations for some really interesting examples!), so be wary of drawing conclusions from correlations. Still, a statistically significant correlation can warrant further investigation.
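The ‘common cause’ situation described above is easy to simulate. In this sketch a hidden variable `z` drives both `x` and `y`, but neither `x` nor `y` appears in the formula for the other. The variable names, the noise level (0.5), the sample size and the seed are arbitrary assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

random.seed(1)  # fixed seed so the run is repeatable
n = 1000
# Hidden common cause (hypothetical, e.g. how many people live in each area)
z = [random.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus their own independent noise;
# x does not influence y and y does not influence x
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]
print(f"r(x, y) = {pearson_r(x, y):.2f}")
```

With these settings the theoretical correlation between `x` and `y` is 1 / (1 + 0.5²) = 0.8, so the printed value should land close to that even though there is no causal link between them.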

For a visual representation, refer to the Correlation Matrix tool.

### Set Up

For this worked example, we will compare the marital status of residents with their method of travel to work across the SA2 regions of Greater Brisbane, Queensland, to see whether any of these variables are correlated.

Select the Greater Brisbane GCCSA as your area.

Select ABS – Data by Region – Family & Community (SA2) 2011-2016 as your dataset, with the following variables:

• SA2 Code
• SA2 Name 2016
• Year (Filter Value: 2016)
• Method Of Travel To Work – Employed Persons Used One Method – Bicycle No.
• Method Of Travel To Work – Employed Persons Used One Method – Bus No.
• Method Of Travel To Work – Employed Persons Used One Method – Car (As Driver Or Passenger) No.
• Method Of Travel To Work – Employed Persons Used One Method – Motor Bike/Scooter No.
• Method Of Travel To Work – Employed Persons Used One Method – Other (Inc. Taxis) No.
• Method Of Travel To Work – Employed Persons Used One Method – Train Or Tram No.
• Method Of Travel To Work – Employed Persons Used One Method – Walked Only No.
• Social Marital Status & Registered Marital Status Married In A De Facto Marriage No.
• Social Marital Status & Registered Marital Status Married In A Registered Marriage No.
• Social Marital Status & Registered Marital Status Not Married No.

### Inputs

Once you have set up your data, open the Correlation tool (Tools → Statistical Analysis → Correlation). The input fields are as follows:

• Name: The name of the correlation tool’s results. You can keep the default name or change it to something you will recognise.
• Dataset Input: The dataset containing the variables to perform the analysis. Select ABS – Data by Region – Family & Community (SA2) 2011-2016.
• Variables: The variables we would like to test. We only want variables that are meaningful to correlate, so select all of the variables except SA2 Code, SA2 Name 2016 and Year:
  • Select Method Of Travel To Work – Employed Persons Used One Method – Bicycle No.
  • Select Method Of Travel To Work – Employed Persons Used One Method – Bus No.
  • Select Method Of Travel To Work – Employed Persons Used One Method – Car (As Driver Or Passenger) No.
  • Select Method Of Travel To Work – Employed Persons Used One Method – Motor Bike/Scooter No.
  • Select Method Of Travel To Work – Employed Persons Used One Method – Other (Inc. Taxis) No.
  • Select Method Of Travel To Work – Employed Persons Used One Method – Train Or Tram No.
  • Select Method Of Travel To Work – Employed Persons Used One Method – Walked Only No.
  • Select Social Marital Status & Registered Marital Status Married In A De Facto Marriage No.
  • Select Social Marital Status & Registered Marital Status Married In A Registered Marriage No.
  • Select Social Marital Status & Registered Marital Status Not Married No.
• Method: The correlation method. The Pearson method evaluates a linear relationship between variables, whilst the Spearman method evaluates a monotonic relationship. Select pearson.
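The difference between the two methods can be seen in a small sketch. Spearman's rho is Pearson's r computed on the ranks of the values (tied values share the average of their ranks), so any relationship that always increases scores 1 even when it is curved. The helper names and data are illustrative assumptions, not the portal's implementation.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: strength of the linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r on the ranks (monotonic relationship)."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # always increasing, but curved rather than straight
print(round(pearson_r(x, y), 2))     # roughly 0.94: strong, but not perfectly linear
print(round(spearman_rho(x, y), 2))  # 1.0: perfectly monotonic
```

Pearson is the natural choice when you expect a straight-line relationship; Spearman is more forgiving of curved relationships and of a few extreme values.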

The input parameters are summarised in the image below. Once complete, click Run Tool.

### Outputs

Once the tool has run, click the Display button on the pop-up dialogue box that appears. This will open a window with the outputs of your Correlation result, which should look like the image below.

The text output is tab-delimited and is easier to read if you copy or import it into a spreadsheet application. Applying conditional formatting (for example, a colour scale running from the minimum to the maximum value) makes the strongest correlations, and later the smallest p-values, easy to spot.
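If you prefer a script to a spreadsheet, a sketch like the one below reads a tab-delimited matrix and lists the strongest correlations by absolute value. It assumes a layout with variable names along the first row and first column; the real output layout and names may differ, and the numbers in `sample` are placeholders, not real results.

```python
import csv
import io

# Placeholder tab-delimited matrix (assumed layout; numbers are not real results)
sample = (
    "\tbicycle\tbus\tcar\n"
    "bicycle\t1\t0.62\t-0.41\n"
    "bus\t0.62\t1\t-0.55\n"
    "car\t-0.41\t-0.55\t1\n"
)

def parse_matrix(text):
    """Turn tab-delimited text into {row name: {column name: value}}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    names = rows[0][1:]  # skip the empty top-left cell
    matrix = {}
    for row in rows[1:]:
        if not row:
            continue  # ignore blank lines
        matrix[row[0]] = {name: float(v) for name, v in zip(names, row[1:])}
    return matrix

def strongest_pairs(matrix, top=3):
    """Distinct off-diagonal pairs, sorted by the size of r (sign ignored)."""
    seen = set()
    pairs = []
    for a, row in matrix.items():
        for b, r in row.items():
            key = frozenset((a, b))
            if a != b and key not in seen:
                seen.add(key)
                pairs.append((a, b, r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:top]

for a, b, r in strongest_pairs(parse_matrix(sample)):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Sorting by absolute value treats strong negative correlations as just as interesting as strong positive ones.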

The returned Correlation P-Matrix gives a p-value for each cell of the correlation matrix, indicating the statistical significance of that pair's correlation. Values less than 0.05 are conventionally considered statistically significant.
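To show how a p-value relates to r and the number of areas n, here is a sketch using the Fisher z-transformation, a large-sample approximation. The exact method AURIN uses is not stated here; the commonly reported exact test (for example in `scipy.stats.pearsonr`) uses a t distribution with n - 2 degrees of freedom, so its values will differ slightly. The function name `approx_p_value` is hypothetical.

```python
import math

def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (true rho = 0).

    Uses the Fisher z-transformation: under H0, atanh(r) * sqrt(n - 3) is
    approximately standard normal for reasonably large n.
    """
    if n <= 3:
        raise ValueError("need at least 4 observations")
    if abs(r) >= 1:
        return 0.0  # atanh is infinite at |r| = 1, where the p-value tends to 0
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

for r in (0.1, 0.3, 0.6):
    p = approx_p_value(r, n=50)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"r = {r:.1f}, n = 50 -> p = {p:.4f} ({verdict} at the 0.05 level)")
```

The same r becomes more significant as n grows, so with many areas even a weak correlation can have a small p-value.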

### Looking for Spatial Data?

You can browse the AURIN Data Discovery: