PORTAL USER GUIDE

Summary Statistics

The Summary Statistics tool produces the following statistics to give you insight into the information in your dataset:

  • Number of observations
  • Minimum, maximum and the range
  • Mean
  • Median
  • 25% and 75% quantiles
  • Total sum
  • Variable data type (Storage mode)
  • Standard error of the mean
  • Skew
  • Kurtosis
  • Trimmed means

Set Up

For this worked example, we will summarise the age of registered cars in the south-west region of Victoria in a 2016 dataset from the Australian Bureau of Statistics.

Select Warrnambool and South West as your area (Australia → Victoria → Rest of Victoria → Warrnambool and South West).

Select ABS – Data by Region – Economy & Industry (SA2) 2011-2018 as your dataset with the following variables:

  • SA2 Code
  • SA2 Name 2016
  • Year (Filter value: 2016)
  • Registered Motor Vehicles – Year of Manufacture 5 To 10 Years No.
  • Registered Motor Vehicles – Year of Manufacture Less Than 5 Years No.
  • Registered Motor Vehicles – Year of Manufacture Over 10 Years No.
  • Registered Motor Vehicles – Year of Manufacture Total Registered Motor Vehicles No.

Inputs

Once you have set up your data, open the Summary Statistics tool (Tools → Statistical Analysis → Summary Statistics). The input fields are as follows:

  • Dataset Input: The dataset containing the variables you would like to summarise. Select: ABS – Data by Region – Economy & Industry (SA2) 2011-2018
  • Variables: The variables you would like to summarise. Select:
    • Registered Motor Vehicles – Year of Manufacture 5 To 10 Years No.
    • Registered Motor Vehicles – Year of Manufacture Less Than 5 Years No.
    • Registered Motor Vehicles – Year of Manufacture Over 10 Years No.
    • Registered Motor Vehicles – Year of Manufacture Total Registered Motor Vehicles No.

The input parameters are summarised in the image below. Once complete, click Run Tool.

Outputs

Once the tool has run, click the Display Output button on the pop-up dialogue box that appears. This will open a window with the summaries of your dataset, which should look like the image below.

The output is tab-delimited and is easier to read once copied into a spreadsheet. A summary of the output variables is as follows:
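
As an alternative to a spreadsheet, tab-delimited text can also be read with a short script. The following is a minimal Python sketch and is not part of the portal: the sample text, its column names and its values are hypothetical placeholders, so substitute the text copied from your own output window.

```python
import csv
import io

# Hypothetical sample only: the real tool output will have different
# labels, columns and values.
sample_output = (
    "statistic\tvehicles_lt_5_years\tvehicles_over_10_years\n"
    "Min values\t120\t340\n"
    "Max\t980\t2150\n"
)


def parse_tab_delimited(text):
    """Return the rows of tab-delimited text as a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


rows = parse_tab_delimited(sample_output)
for row in rows:
    print(row)
```

Each row becomes a dictionary keyed by the header line, and all values are read as text, so convert them with int() or float() before doing arithmetic.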

  • Num rows: Number of rows (observations) in the data.
  • Num cols: Number of columns (variables/attributes) in the data.
  • Min values: The minimum value for a variable/attribute.
  • Means: Mean value(s) of the variables/attributes.
  • Medians: Median value(s) of the variables/attributes.
  • Quantiles: 1st (25%) and 3rd (75%) quartile values for each of the columns (variables/attributes).
  • Storage mode: The type of value that each variable/attribute is recorded as: logical, i.e. “TRUE” or “FALSE”; integer, e.g. 1, 2, -1, 7; double, e.g. 1.03472, 2.49227; and character, e.g. “green”, “red”.
  • Sd: Standard deviation for a variable/attribute.
  • Trimmed: Trimmed mean value for a variable/attribute, calculated after dropping a proportion of the observations from both ends of the sample. Comparing it with the mean can help you determine whether long tails or outliers affect the mean value. The trim amount in this tool is set to 20%.
  • Max: The maximum value for a variable/attribute.
  • Range: The range of values (Max – Min) for a variable/attribute.
  • Skew: A measure of how far the distribution of values for a variable/attribute is skewed to the left or right of the mean (asymmetry).
  • Kurtosis: A measure of how heavy the tails of the distribution of values are for a variable/attribute, often described as how peaked or flat the distribution is around the mean.
  • Mean se: The standard error of the mean for a variable/attribute.
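
To make the definitions above concrete, the sketch below computes the same kinds of statistics with the Python standard library. It is an illustration only and not the tool's own implementation: the function names are hypothetical, the quartile interpolation and the moment-based skew and kurtosis formulas are assumptions that may differ from those the tool uses, and the trim proportion is assumed to be removed from each end of the sorted values.

```python
import math
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from EACH end.

    Assumption: the proportion applies per end, not to the sample overall.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in the range [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations dropped per end
    return statistics.fmean(ordered[k:len(ordered) - k])


def summarise(values, trim=0.2):
    """Return descriptive statistics for one numeric variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Quartiles by linear interpolation; other conventions exist.
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    # Central moments used for skew and excess kurtosis (assumed formulas).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "median": statistics.median(values),
        "q25": q25,
        "q75": q75,
        "sum": sum(values),
        "sd": sd,
        "trimmed": trimmed_mean(values, trim),
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
        "mean_se": sd / math.sqrt(n),  # standard error of the mean
    }


# Made-up values with one large outlier, for illustration only.
stats = summarise([1, 2, 3, 4, 100])
for name, value in stats.items():
    print(name, value)
```

With these made-up values the mean is pulled far above the median by the single outlier, while the trimmed mean stays close to the median, which is the comparison the Trimmed entry describes.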

Mangiafico, S. S. (2016). R Handbook: Descriptive Statistics. Summary and Analysis of Extension Program Evaluation in R. https://rcompanion.org/handbook/C_02.html
