The parkrun events examined in this study are all located within Greater London. This location was chosen because of the relatively high spatial coverage of both air quality monitoring stations and parkrun events. Furthermore, London often breaches European air quality limits, with poor UAQ contributing to an estimated 9,400 premature deaths, at a cost of between £1.4 and £3.7 billion per year [78–81]. Finishing times for participants of fifteen parkrun events (Fig. 1) from 2011–2016 were provided by the parkrun research board. These events were selected for their close proximity to Department for Environment, Food and Rural Affairs (DEFRA) monitoring stations (< 15 km), so that readings were as close to ‘at event’ conditions as possible given the high spatial variability of air quality [82–85]. The parkrun dataset contains the parkrun location, the event date, the individual run time of each participant on that date, and their gender and age group. The parkrun finishing time data was anonymised before research access was given, in accordance with the completed and agreed ethics procedures (ERN_17-1583).
For each parkrun event, the weekly mean finishing time was calculated and used for further analyses. This was done first for the complete participant list and then separately for male and female participants. It is important to note that, as parkrun has grown in popularity, average finishing times have continued to increase with rising participation levels. Therefore, the run times were decomposed and the remainder component was used for analysis against the explanatory variables of temperature, relative humidity, wind speed, O3, NO2 and PM2.5 (Fig. 2).
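The weekly averaging step can be illustrated with a minimal Python sketch. This is not the authors' code: the record layout, the example values and the function name weekly_means are hypothetical, and the ninety-minute cut-off is the one stated later in this section.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (event, date, gender, finishing time in minutes).
records = [
    ("Bushy", "2015-03-07", "M", 24.5),
    ("Bushy", "2015-03-07", "F", 29.5),
    ("Bushy", "2015-03-07", "M", 95.0),  # above the cut-off, so discarded
    ("Bushy", "2015-03-14", "F", 31.0),
    ("Bushy", "2015-03-14", "M", 27.0),
]

MAX_MINUTES = 90.0  # finishing times above this are treated as technical errors


def weekly_means(rows, gender=None, max_minutes=MAX_MINUTES):
    """Mean finishing time per (event, date), optionally for one gender."""
    grouped = defaultdict(list)
    for event, date, sex, minutes in rows:
        if minutes > max_minutes:
            continue  # drop implausible times before averaging
        if gender is not None and sex != gender:
            continue
        grouped[(event, date)].append(minutes)
    return {key: mean(times) for key, times in sorted(grouped.items())}


all_runners = weekly_means(records)            # complete participant list
male_only = weekly_means(records, gender="M")  # gender subset
```

Each returned dictionary holds one mean per event and date, which is the series that would then be passed to the decomposition.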
The removal of long term trend and seasonality was required because parkrun numbers vary over time, both as parkrun gains popularity and as participation changes over the course of the year. The decompose function in the R package ‘forecast’ was used to separate the seasonal, long term and random components of the data via an additive model. An additive model was used because the seasonal variation remains relatively constant despite the increase in participation.
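To make the additive model concrete, the following Python sketch implements a classical additive decomposition (centred moving-average trend, mean seasonal figure, remainder). It is an illustration only and not the R routine used in the study. The synthetic series and the period of 4 are assumptions chosen for brevity; weekly data would normally call for a period close to 52.

```python
def centred_moving_average(x, period):
    """Centred moving average; None where the window does not fit."""
    n = len(x)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = x[i - half:i + half + 1]
        if period % 2 == 1:
            trend[i] = sum(window) / period
        else:
            # Even period: the two end points carry half weight (2 x m average).
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def additive_decompose(x, period):
    """Split x into trend + seasonal + remainder (additive model)."""
    trend = centred_moving_average(x, period)
    detrended = [xi - ti if ti is not None else None for xi, ti in zip(x, trend)]
    # Average the detrended values at each position within the period.
    raw = []
    for pos in range(period):
        values = [d for d in detrended[pos::period] if d is not None]
        raw.append(sum(values) / len(values))
    centre = sum(raw) / period
    figure = [r - centre for r in raw]  # seasonal figure sums to zero
    seasonal = [figure[i % period] for i in range(len(x))]
    remainder = [
        xi - ti - si if ti is not None else None
        for xi, ti, si in zip(x, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Synthetic weekly means (minutes): slow upward trend plus a repeating pattern.
pattern = [1.0, -0.5, -1.0, 0.5]
series = [30.0 + 0.1 * i + pattern[i % 4] for i in range(12)]

trend, seasonal, remainder = additive_decompose(series, period=4)
```

Because the synthetic series contains only a linear trend and a fixed seasonal pattern, the remainder is effectively zero wherever the trend is defined. In real data the remainder is the component carried forward to the correlation and regression analyses.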
Meteorological data was obtained from the British Atmospheric Data Centre (BADC) using the Met Office Integrated Data Archive System (MIDAS). Seven stations (Fig. 1) were used due to their proximity to the parkrun events being examined. Observations were downloaded for 09:00 on Saturdays to match the starting time of parkrun events, so that the values used in analyses were as close as possible to the conditions the participants were exposed to. Air temperature, relative humidity and wind speed were downloaded. The worldmet package within R was used to import meteorological data for analyses [86]. This import was quality checked against the MIDAS datasets: temperature values were identical, but relative humidity varied by up to 2% in some cases, which is likely due to the formatting algorithms used in processing the data [87].
Air quality data for Greater London was retrieved from DEFRA Automatic Urban and Rural Network (AURN) background monitoring sites for 08:00 to 10:00 local time. This includes hourly readings for NO2, O3, PM2.5 and PM10. PM10 was subsequently removed from analyses due to its high correlation with PM2.5, while the latter was retained due to the greater association of smaller particles with deleterious health effects. Locations of the monitoring sites can be seen in Fig. 1. They were selected because they are urban background sites (i.e. not in direct proximity to roadsides and vehicular pollution), measure all or most of the above pollutants, and are close to parkrun events. The mean of the 08:00–10:00 air quality values was used for analysis to capture the air quality participants were potentially exposed to before and during the events.
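The morning averaging window can be sketched as follows. The readings and the function name morning_mean are hypothetical. Two assumptions are made that the text does not confirm: the window is taken to cover the hourly readings stamped 08:00, 09:00 and 10:00, and a date is dropped if any of those readings is missing.

```python
from statistics import mean

# Hypothetical hourly NO2 readings keyed by (date, hour); None marks a gap.
hourly_no2 = {
    ("2015-03-07", 7): 41.0,
    ("2015-03-07", 8): 38.0,
    ("2015-03-07", 9): 35.0,
    ("2015-03-07", 10): 32.0,
    ("2015-03-14", 8): 50.0,
    ("2015-03-14", 9): None,
    ("2015-03-14", 10): 44.0,
}


def morning_mean(readings, date, hours=(8, 9, 10)):
    """Mean of the readings in the morning window, or None if any is missing."""
    values = [readings.get((date, hour)) for hour in hours]
    if any(value is None for value in values):
        return None  # assumption: incomplete dates are excluded from analysis
    return mean(values)


complete_day = morning_mean(hourly_no2, "2015-03-07")
incomplete_day = morning_mean(hourly_no2, "2015-03-14")
```

The same function would be applied per pollutant and per monitoring site.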
Each parkrun site was paired with the closest DEFRA AURN and Met Office locations (Table 1). Although some are not optimally placed, they are indicative of the local air quality. Because not all measurement sites record all of the desired explanatory variables, some events have been analysed against a reduced time series, as dates containing missing data were removed from analysis. The same approach was applied to discrepancies in the meteorological data. Prior to analysis, parkrun finishing times over ninety minutes were discarded, as parkrun indicated that these result from technical issues [88].
Table 1
The analysed parkrun events and their corresponding air quality and meteorological monitoring locations, along with the distances between the sites.
parkrun | DEFRA AURN | Distance (km) | Meteorological Station | Distance (km) |
Bedfont | Harlington | 5.3 | Heathrow | 3.0 |
Brockwell | Westminster | 5.2 | St James Park | 6.1 |
Bromley | Eltham | 7.8 | Biggin Hill | 8.9 |
Bushy | Teddington | 1.4 | Hampton W WKS | 2.2 |
Crystal Palace | Eltham | 10.2 | Biggin Hill | 14.1 |
Finsbury | Haringey | 2.2 | St James Park | 7.3 |
Greenwich | Eltham | 0.6 | London City Airport | 4.3 |
Grovelands | Haringey | 5.3 | St James Park | 14.4 |
Hackney | Haringey | 7.2 | London City Airport | 7.8 |
Kingston | Teddington | 0.8 | Hampton W WKS | 4.9 |
Lloyd | Eltham | 14.3 | Biggin Hill | 9.5 |
Old Deer | Teddington | 4.4 | Kew Gardens | 0.9 |
Richmond | Teddington | 3.7 | Kew Gardens | 3.6 |
Roundshaw | Teddington | 17.0 | Kenley Airfield | 3.5 |
Wimbledon | Teddington | 6.9 | Kew Gardens | 7.8 |
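The pairing summarised in Table 1 can be illustrated with a nearest-station search. The text does not state how distances were measured, so the great-circle (haversine) distance used here is an assumption. The coordinates and the names station_x and station_y are hypothetical placeholders, not the real site locations.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest(site, stations):
    """Return (name, distance_km) of the station closest to site."""
    distances = (
        (name, haversine_km(*site, *coords)) for name, coords in stations.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Hypothetical (latitude, longitude) pairs for one event and two stations.
event_site = (51.50, -0.10)
stations = {
    "station_x": (51.50, -0.12),
    "station_y": (51.60, -0.10),
}

name, distance_km = nearest(event_site, stations)
```

Running the search once against the air quality sites and once against the meteorological stations would give the two distance columns of the table.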
Correlation analyses between the decomposed finishing times and the explanatory variables were performed for the whole data set as well as gender subsets, as used by Helou et al. [25]. Each of the parkrun events was also examined separately to determine whether certain locations were more influenced by the measured variables. Linear regression analysis, the common technique used in the aforementioned marathon studies [44], was used to compute the R2 value, which shows the percentage of variance in finishing times explained by the explanatory variables.
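For a single explanatory variable, the correlation coefficient and R2 are directly related, as the following Python sketch shows. The temperature and remainder values are invented for illustration and the function name simple_regression is hypothetical. Only the standard least-squares formulas are assumed.

```python
def simple_regression(x, y):
    """Least-squares fit of y on x; returns (r, slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R2 from the residual and total sums of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    return r, slope, intercept, r_squared


# Hypothetical values: air temperature (deg C) and remainder of finishing time (min).
temperature = [5.0, 10.0, 15.0, 20.0, 25.0]
remainder = [-0.4, -0.1, 0.0, 0.3, 0.2]

r, slope, intercept, r_squared = simple_regression(temperature, remainder)
```

With one predictor, r_squared equals the square of r, so the correlation and regression analyses give consistent measures of association.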
The influence of UAQ and meteorology on the average weekly parkrun finishing time was assessed by multiple linear regression analysis. For UAQ, the combined influence of NO2, O3 and PM2.5 on finishing times was considered; for meteorology, temperature, relative humidity and wind speed were used as the independent variables. This analysis method reintroduces a form of natural seasonality that is initially stripped from the time series. The stripping is done to remove the ‘slowing’ influence of New Year’s resolution runners and the general loss of physical fitness over the Christmas period, rather than leaving the long term trend and seasonality in from the beginning of analyses. It also allows for a more representative insight into real world processes and influences. Post-test analysis was also performed using the following diagnostics: Quantile-Quantile, Scale-Location, Fitted vs Residuals, Cook’s distance and ACF plots, and histograms of residuals (Fig. 3).
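A multiple linear regression of this form can be sketched in plain Python by solving the normal equations. This is an illustration under stated assumptions and not the authors' implementation. The pollutant values are invented, the response is generated from known coefficients (true_beta) so that the fit can be checked, and the helper names solve and fit_ols are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept; returns (coefficients, R2)."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical morning means per event date: (NO2, O3, PM2.5).
pollutants = [
    (30.0, 40.0, 10.0), (45.0, 35.0, 14.0), (25.0, 60.0, 8.0), (50.0, 30.0, 20.0),
    (35.0, 55.0, 12.0), (40.0, 45.0, 16.0), (28.0, 50.0, 9.0),
]
# Synthetic response built from known coefficients: intercept, NO2, O3, PM2.5.
true_beta = [0.5, 0.02, 0.01, -0.03]
response = [
    true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], row))
    for row in pollutants
]

beta, r_squared = fit_ols(pollutants, response)
```

Because the response contains no noise, the fitted coefficients reproduce true_beta and R2 is effectively 1. With real remainder values, R2 would instead give the share of variance explained jointly by the three pollutants, and the same function could be applied to the three meteorological variables.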
It should be noted that this research follows a time series rather than a space-time series analysis. Although the relationship between parkrun finishing times and the local air quality and meteorology could vary between locations, other factors would also need to be considered, such as differences between event surfaces and elevation profiles, which could otherwise lead to false conclusions. Controlling for these factors in a spatial analysis would prove challenging and would probably warrant a paper in its own right.