During COVID-19, public health agencies relied heavily on data to understand where the virus was spreading and who was most at risk.
While robust data have long underpinned decision-making, the pandemic reinforced that statistical factors – such as how data are grouped, how geographical boundaries are defined, and how granular the data are – can all influence how we interpret the results. Small changes in these choices can lead to very different conclusions about where disease incidence appears highest and how confident we should be in those findings.
Dr Aiden Price is a senior research associate at the Queensland University of Technology and an Environmental Health Domain Specialist at AURIN. He said how researchers monitor disease outbreaks is just as important as what is being monitored in the first place, because all models are extrapolations of data.
“There’s a common quote used when describing statistical models, and it’s that ‘all models are wrong, but some are useful,’” Aiden said.
Spatial scale is one of the many factors that can influence data. Spatial modelling is a key tool used to explore disease spread, helping researchers examine how disease incidence varies from place to place, how it changes over time, and how it relates to factors such as population density, demographics, access to services, or environmental conditions.
Mapping disease spread by location helps researchers understand where and why disease clusters may be emerging. Depending on how robust those patterns are, this information can inform public health decision-making.
Understanding why spatial data resolution matters
In most public health studies, disease data are not analysed at the level of individual people. Instead, they are grouped into geographic areas – a process called aggregation. Aggregation protects privacy while revealing broader patterns that may not be visible at the individual level.
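As a rough illustration of aggregation, the sketch below counts hypothetical case records by area and converts the counts to rates per 1,000 residents. The area codes, populations and case records are invented for illustration and are not drawn from the Queensland study.

```python
from collections import Counter

# Hypothetical individual-level records: one area code per case (invented data).
case_areas = ["A", "A", "B", "A", "C", "B", "A"]

# Hypothetical resident populations for each area (invented data).
population = {"A": 400, "B": 500, "C": 250}

def aggregate_rates(case_areas, population, per=1000):
    """Count cases per area and express them as a rate per `per` residents."""
    counts = Counter(case_areas)
    return {area: per * counts.get(area, 0) / pop for area, pop in population.items()}

rates = aggregate_rates(case_areas, population)
print(rates)
```

Once the records are reduced to area-level rates, no individual case can be identified from the output, but the pattern across areas remains visible.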
However, not all aggregated spatial data are the same. Spatial data can have different resolutions, ranging from small areas with fine, granular information, such as Statistical Area Level 1 (SA1), to much larger geographical regions with coarser resolutions, such as Statistical Area Level 4 (SA4). Aiden described the Statistical Area Levels as a ‘recursive jigsaw,’ with many SA1s combining to form a single SA2, multiple SA2s forming an SA3, and so on.
A long-standing challenge in spatial analysis is the Modifiable Areal Unit Problem, often referred to as MAUP. MAUP describes how statistical results can change when the same data are grouped in different ways. In other words, the same data can tell a different story depending on their resolution and how the boundaries of geographic areas are drawn.
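A minimal sketch of this effect, using invented numbers rather than any real dataset: six hypothetical small units are grouped into three zones in two different ways, and the rate in each zone is recalculated. The unit values and both zonings are assumptions chosen only to make the contrast visible.

```python
# Hypothetical small units along a line: (cases, population). Invented data.
units = [(1, 100), (9, 100), (8, 100), (1, 100), (1, 100), (1, 100)]

def zone_rates(units, zones, per=100):
    """Return the rate per `per` residents for each zone (a list of unit indices)."""
    rates = []
    for zone in zones:
        cases = sum(units[i][0] for i in zone)
        pop = sum(units[i][1] for i in zone)
        rates.append(per * cases / pop)
    return rates

# Two ways of drawing three zones over the same six units.
zoning_a = [[0, 1], [2, 3], [4, 5]]
zoning_b = [[0], [1, 2], [3, 4, 5]]

rates_a = zone_rates(units, zoning_a)
rates_b = zone_rates(units, zoning_b)
print(rates_a, rates_b)
```

Under the first zoning the two adjacent high-count units are split across zones and the highest zone rate is 5 per 100. Under the second they fall in the same zone and the highest rate is 8.5 per 100, even though the underlying cases have not changed.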
“When we go into a spatial context, we have more considerations to make: how high are we aggregating? What are the shapes of the regions? Choosing an inappropriate spatial resolution can lead you to a completely different result,” Aiden explained.
Aiden said this does not mean one result is ‘wrong’ and the other is ‘right’. Rather, the different results reflect the fact that spatial patterns are sensitive to how areas are defined. Understanding and testing for this sensitivity is essential, especially when results are used to inform public health decisions.
Testing different scales to reveal a clearer picture of risk
To understand how to select appropriate spatial units, Aiden, along with a team of researchers, examined COVID-19 data across Queensland using different geographical resolutions, from SA1 to SA4.
The team found that coarser resolutions, such as SA4, tended to mask disease hotspots and produced higher levels of uncertainty. In contrast, finer scales, such as SA1 and SA2, offered greater precision and revealed clearer patterns.
Their analysis showed that SA2 was the optimal level of aggregation, striking a balance between spatial resolution, model stability, and interpretability.
“It’s about balance. There’s a lot of detail in SA2, with about 2000 SA2s across Australia. If you go up to SA3, there’s only about 500 nationally. In terms of modelling COVID-19, SA2 still gives a good amount of detail about pockets of high and low incidence, whereas SA3 takes it a step too far because it averages over a much larger area, and we start to lose important detail,” said Aiden.
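The masking effect can be sketched with a toy nested hierarchy. The example below is not the team's model and says nothing about model uncertainty. It only shows how a single high-incidence block is diluted as areas are rolled up into larger parents. The dotted codes, case counts and populations are all hypothetical.

```python
from collections import defaultdict

# Hypothetical nested codes "region.district.block" -> (cases, population).
# Invented data; block 1.1.1 is the only hotspot.
fine = {
    "1.1.1": (30, 500), "1.1.2": (2, 500),
    "1.2.1": (3, 500),  "1.2.2": (1, 500),
    "2.1.1": (2, 500),  "2.1.2": (2, 500),
}

def roll_up(areas, depth):
    """Sum cases and population into parent areas named by the first `depth` code parts."""
    totals = defaultdict(lambda: [0, 0])
    for code, (cases, pop) in areas.items():
        parent = ".".join(code.split(".")[:depth])
        totals[parent][0] += cases
        totals[parent][1] += pop
    return {code: (c, p) for code, (c, p) in totals.items()}

def max_rate(areas, per=1000):
    """Highest rate per `per` residents across all areas."""
    return max(per * c / p for c, p in areas.values())

# Depth 3 is the finest level, depth 1 the coarsest.
peak_by_depth = {depth: max_rate(roll_up(fine, depth)) for depth in (3, 2, 1)}
print(peak_by_depth)
```

In this toy data the peak rate falls from 60 per 1,000 at the finest level to 32 and then 18 as the hotspot is averaged with its quieter neighbours.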
Selecting the right spatial scales for statistical analyses ensures decision-makers know which communities are most at risk and where public health interventions will be most effective.
The role of AURIN in enabling stronger public health modelling
Research such as this is made possible through national research infrastructure like AURIN.
“AURIN supports collaborations and builds networks and expertise across the environmental health space. It’s targeted to make a difference, going beyond the typical research metrics and putting impact at the forefront to enable researchers to conduct more meaningful work,” said Aiden.
As future health challenges emerge, the ability to undertake reliable spatial analysis will help governments detect risks earlier, allocate resources more effectively, and tailor interventions to communities most in need. However, Aiden highlighted that building this understanding requires data custodians to be more open to the idea of data sharing in the first place.
“As we’ve learned with this COVID-19 case study, by having data available at as fine a resolution as possible, we’re enabling researchers to take that data and produce the highest quality research possible. Building that understanding only comes from transparency, openness and collaboration,” he said.

