Theil Index

The Theil Index is a common measure of concentration and dispersion. It has historically been used to measure income inequality, but in effect can be used to measure the dispersion of any variable across regions relative to the whole (Akita and Kawamura, 2002). It is calculated as follows:

T = \sum_{i} \left(\frac{y_{i}}{Y}\right) \ln\left(\frac{y_{i}/Y}{n_{i}/N}\right)

where y_{i} is the count of the variable of interest (such as the number of unemployed people) in region i, Y is the total count of that variable across the entire study area, n_{i} is the total population count in region i, and N is the total population count across the entire study area.
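
As a rough illustration of the formula, the sketch below computes the raw index in Python from two lists of counts. The function name theil_index and the counts are made up for this example and are not taken from the tool itself. Regions where the variable of interest is zero are skipped, on the assumption that they contribute nothing to the sum (the limit of x ln(x) as x approaches zero is zero).

```python
import math

def theil_index(y, n):
    """Raw Theil index for counts y_i and populations n_i, one pair per region."""
    if len(y) != len(n):
        raise ValueError("y and n must have one entry per region")
    Y = sum(y)  # total count of the variable across the study area
    N = sum(n)  # total population across the study area
    total = 0.0
    for y_i, n_i in zip(y, n):
        if y_i == 0:
            continue  # assumption: a region with none of the variable adds 0
        share_y = y_i / Y  # region's share of the variable
        share_n = n_i / N  # region's share of the population
        total += share_y * math.log(share_y / share_n)
    return total

# Made-up counts for three regions (illustration only)
unemployed = [30, 50, 20]
labour_force = [300, 400, 300]
t = theil_index(unemployed, labour_force)
print(round(t, 4))
```

When every region's share of the variable equals its share of the population, each log term is ln(1) = 0, so the function returns zero.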

The resulting index can range from zero to the natural log (ln) of the number of categories, n (in this case, the number of areas within the study area; not to be confused with the population counts n_{i}). It is common to standardise the index so that it ranges between zero and one, which is done by dividing by ln(n). A Theil Index of zero indicates perfect equality: every region's share of the variable of interest matches its share of the population. Conversely, a standardised Theil Index value of one represents a state of perfect inequality, where one region has all of the variable of interest.
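
The two extreme cases can be checked by hand from the formula. The short working below is only a sketch: the second case assumes, for simplicity, that all n areas have equal populations (so n_{i}/N = 1/n), and that areas with y_{i} = 0 contribute nothing to the sum.

```latex
% Perfect equality: y_i / Y = n_i / N for every area i
T = \sum_{i} \frac{y_{i}}{Y} \ln(1) = 0

% All of the variable in a single area, assuming n equally populated areas
T = 1 \cdot \ln\left(\frac{1}{1/n}\right) = \ln(n), \qquad \frac{T}{\ln(n)} = 1
```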


To show the Theil Index tool in use, we will run it to calculate the coefficients for the distribution of female youth unemployment across Western Australia.

  • Select Western Australia as your area
  • Select NATSEM – Social and Economic Indicators – Unemployment Rate SA2 2016 as your dataset, and select the following variables:
    • Number of females in the labour force aged 15 – 24
    • Number of unemployed females aged 15 – 24 in the area
    • SA2 Code
    • SA2 Name


Once you have selected these, open the Theil Index parameter input window (Tools → Indices → Theil Index) and enter the parameters as in the image shown below (these are explained in more detail below the image).

  • Dataset input: This is the dataset that contains the values you would like to include in the Theil Index calculation. In this instance we select NATSEM – Social and Economic Indicators – Unemployment Rate SA2 2016.
  • Numerator: This is the column that contains the counts of the variable whose distribution across the study region you would like to measure. In this instance, we select Number of unemployed females aged 15 – 24 in the area.
  • Denominator: This is the column that contains the total counts of the sample population that you are taking the numerator from. In this instance, we select Number of females in the labour force aged 15 – 24.

Once you have entered your parameters, click Run Tool.


Once you have run the tool, click the Display Output button that appears in the pop-up dialogue box. This should open a text box like the one shown below, which contains the Theil Index values for your variable (raw and standardised). In this instance, we have a standardised value of 0.0069, which suggests low inequality in the distribution of female youth unemployment across WA.
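
To make the roles of the Numerator and Denominator columns concrete, the sketch below reproduces the same kind of calculation outside the tool. It is not the tool's own implementation: the function theil_from_rows, the column names, and the four rows of counts are hypothetical and made up for illustration. The standardised value is obtained by dividing the raw value by ln(n), as described above.

```python
import math

# Hypothetical rows and column names (illustration only, not real NATSEM data)
rows = [
    {"sa2_name": "Area A", "unemployed_f_15_24": 12, "labour_force_f_15_24": 150},
    {"sa2_name": "Area B", "unemployed_f_15_24": 8, "labour_force_f_15_24": 100},
    {"sa2_name": "Area C", "unemployed_f_15_24": 20, "labour_force_f_15_24": 160},
    {"sa2_name": "Area D", "unemployed_f_15_24": 10, "labour_force_f_15_24": 90},
]

def theil_from_rows(rows, numerator, denominator):
    """Return (raw, standardised) Theil index from the chosen columns."""
    if len(rows) < 2:
        raise ValueError("need at least two areas to standardise by ln(n)")
    y = [r[numerator] for r in rows]    # counts of the variable of interest
    n = [r[denominator] for r in rows]  # population counts
    Y, N = sum(y), sum(n)
    raw = sum(
        (y_i / Y) * math.log((y_i / Y) / (n_i / N))
        for y_i, n_i in zip(y, n)
        if y_i > 0  # assumption: areas with a zero count add nothing
    )
    standardised = raw / math.log(len(rows))
    return raw, standardised

raw, standardised = theil_from_rows(
    rows, "unemployed_f_15_24", "labour_force_f_15_24"
)
print(f"raw={raw:.4f}, standardised={standardised:.4f}")
```

Because the unemployment rates in these made-up rows are similar across areas, both values come out close to zero, which is the same pattern as the low standardised value reported for WA.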


Akita, T., & Kawamura, K. (2002). Regional income inequality in China and Indonesia: A comparative analysis
