In general, the indoor concentration of an atmospheric pollutant, such as PM2.5, can be modeled with a single mass-balance equation (Eq. 1), in which changes in indoor concentration depend mainly on a) the infiltration of outdoor particles, b) the escape of indoor particles outdoors, c) the deposition of indoor particles and d) indoor emissions. Eq. 1 assumes uniform mixing of the air pollutants and neglects concentration changes due to gas-phase interactions or to differences in environmental parameters between the indoor and outdoor environments 13.

$$\frac{\text{d}{\text{C}}_{\text{i}\text{n}}\left(\text{t}\right)}{\text{d}\text{t}}=\text{p}{\alpha }{\text{C}}_{\text{o}\text{u}\text{t}}\left(\text{t}\right)-{\alpha }{\text{C}}_{\text{i}\text{n}}\left(\text{t}\right)-\text{k}{\text{C}}_{\text{i}\text{n}}\left(\text{t}\right)+\frac{\text{S}\left(\text{t}\right)}{\text{V}}$$

1

where dCin is the change in indoor PM2.5 concentration (µg m–3) during the time interval dt, Cin and Cout are the indoor and outdoor particle concentrations (µg m–3), t is the time (h), p is the penetration efficiency of particles (dimensionless), α is the air exchange rate (h–1), k is the deposition rate of particles (h–1), S is the indoor emission rate (µg h–1) and V is the volume of the building or the room (m3). Under the assumption that the instantaneous temporal change of the indoor concentration is substantially lower than the average indoor concentration over a considerable time interval, the left-hand side of Eq. 1 can be set to zero and the resulting steady-state form re-arranged as follows:

$${\text{C}}_{\text{i}\text{n}}=\frac{\text{p}{\alpha }}{{\alpha }+\text{k}}{\text{C}}_{\text{o}\text{u}\text{t}}+\frac{\text{S}}{\text{V}\left({\alpha }+\text{k}\right)}={\text{f}}_{\text{i}\text{n}\text{f}}{\text{C}}_{\text{o}\text{u}\text{t}}+{\text{C}}_{\text{i}\text{n}, \text{g}\text{e}\text{n}}$$

2

$$\frac{{\text{C}}_{\text{i}\text{n}}}{{\text{C}}_{\text{o}\text{u}\text{t}}}={\text{f}}_{\text{i}\text{n}\text{f}}+\frac{{\text{C}}_{\text{i}\text{n}, \text{g}\text{e}\text{n}}}{{\text{C}}_{\text{o}\text{u}\text{t}}}$$

3

with \({\text{f}}_{\text{i}\text{n}\text{f}}=\frac{\text{p}{\alpha }}{{\alpha }+\text{k}}\) the infiltration factor and \({\text{C}}_{\text{i}\text{n}, \text{g}\text{e}\text{n}}=\frac{\text{S}}{\text{V}\left({\alpha }+\text{k}\right)}\) the indoor concentration (µg m–3) generated solely by indoor activities.
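As a minimal numerical sketch of Eqs. 1 to 3, the snippet below evaluates the steady-state terms and checks them against a forward-Euler integration of Eq. 1 with constant inputs. All parameter values and both function names are hypothetical and chosen for illustration only; they are not taken from this study.

```python
def steady_state(p, alpha, k, source, volume, c_out):
    """Return (f_inf, c_in_gen, c_in) from the steady-state form (Eq. 2)."""
    loss = alpha + k                     # total first-order loss rate (h^-1)
    f_inf = p * alpha / loss             # infiltration factor (dimensionless)
    c_in_gen = source / (volume * loss)  # indoor-generated part (ug m^-3)
    return f_inf, c_in_gen, f_inf * c_out + c_in_gen


def integrate_mass_balance(p, alpha, k, source, volume, c_out, c0, hours, dt=0.01):
    """Forward-Euler integration of Eq. 1 with constant C_out and S."""
    c_in = c0
    for _ in range(int(round(hours / dt))):
        dcdt = p * alpha * c_out - alpha * c_in - k * c_in + source / volume
        c_in += dcdt * dt
    return c_in


# Hypothetical inputs (assumptions, not values from the study)
p, alpha, k = 0.8, 0.5, 0.2                    # -, h^-1, h^-1
source, volume, c_out = 140.0, 100.0, 20.0     # ug h^-1, m^3, ug m^-3

f_inf, c_in_gen, c_ss = steady_state(p, alpha, k, source, volume, c_out)
c_48h = integrate_mass_balance(p, alpha, k, source, volume, c_out, c0=0.0, hours=48)

# Eq. 3: the indoor-to-outdoor ratio is f_inf plus the indoor-generated share
io_ratio = c_ss / c_out
```

With these inputs the integrated concentration after 48 h is numerically indistinguishable from the steady-state value, and the indoor-to-outdoor ratio exceeds finf by Cin,gen/Cout, as Eq. 3 states.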

The indoor-to-outdoor concentration ratio converges to finf when the outdoor concentration is high relative to the indoor-generated concentration or during periods with minimal indoor contribution (Eq. 3). The latter requires detecting and removing the indoor-generated peaks before calculating the infiltration factor. The optimal approach to detecting such instances and/or periods is to record the indoor activities in detail. In that case, the peaks can also be assigned to the major activities that occurred in the indoor environment. If this kind of information is missing, machine learning and rule-based approaches can be evaluated for peak detection 15,17,18,24,28,37. Here, indoor emission events were identified through analysis of the indoor PM2.5 series using the Robust Extraction of Baseline Signal (REBS) methodology 38,39. REBS has been applied to detect local sources in time series of various air pollutants 39–41. According to Ruckstuhl et al. 39, the indoor time series can be decomposed as:

$${\text{C}}_{\text{i}\text{n}}\left(\text{t}\right)={\text{C}}_{\text{B}}\left(\text{t}\right)+{\text{C}}_{\text{R}}\left(\text{t}\right)+{\epsilon }$$

4

where CB(t) is the background concentration, CR(t) is the concentration due to indoor emissions and other contributions (e.g., outdoor concentration) and ε represents independent, normally distributed errors. The local emissions responsible for spikes in the time series can then be identified with the REBS method through a two-stage approach. First, the background concentration is estimated using local linear regression over a moving window of a specified duration. Then, any data points that exceed the background concentration by more than a designated threshold are classified as emissions:

$${\text{C}}_{\text{i}\text{n}}\left(\text{t}\right)>\widehat{{\text{C}}_{\text{B}}}\left(\text{t}\right)+{\beta }\times {\sigma }$$

5

where \(\widehat{{\text{C}}_{\text{B}}}\left(\text{t}\right)\) is the estimated background curve, σ is the standard deviation of the data falling below the background curve and β is a user-defined parameter that controls the width of the threshold band, with higher values placing the threshold further above the background. Here, β is set equal to 3, as initially proposed in 39. In this study, the REBS method was implemented using the *rfbaseline* function of the “IDPmisc” R package 42.
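The thresholding step of Eq. 5 can be illustrated with a short Python sketch. It is not the REBS algorithm and does not reproduce *rfbaseline*: the background is approximated here by a centred rolling median (a deliberately simple stand-in for the robust local regression), and σ is taken as the root-mean-square of the residuals below that baseline, which is an assumption about how the spread is measured. The function names and the synthetic series are hypothetical.

```python
from math import sqrt
from statistics import median


def rolling_median_baseline(series, half_window):
    """Stand-in background estimate: centred rolling median (NOT the REBS fit)."""
    n = len(series)
    return [
        median(series[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def flag_emissions(series, baseline, beta=3.0):
    """Apply Eq. 5: flag points where C_in(t) > baseline(t) + beta * sigma."""
    residuals = [c - b for c, b in zip(series, baseline)]
    below = [r for r in residuals if r < 0]
    if not below:
        raise ValueError("no points below the baseline; sigma is undefined")
    # Assumption: sigma is the RMS of the residuals that fall below the baseline.
    sigma = sqrt(sum(r * r for r in below) / len(below))
    return [r > beta * sigma for r in residuals], sigma


# Synthetic hourly series (illustrative): small periodic variation around
# 10 ug m^-3 with a three-hour indoor emission event added at hours 20-22.
c_in = [10 + ((7 * i) % 5 - 2) * 0.2 for i in range(48)]
for hour in (20, 21, 22):
    c_in[hour] += 15.0

baseline = rolling_median_baseline(c_in, half_window=6)
flags, sigma = flag_emissions(c_in, baseline, beta=3.0)
event_hours = [i for i, flagged in enumerate(flags) if flagged]
```

In this synthetic case only the three hours of the added event exceed the threshold; the background variation stays below β × σ.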

Once the peak emission events were detected, the finf at each timestamp was determined, assuming steady-state conditions after T hours (Section S1 of the supplementary material):

$${\text{f}}_{\text{i}\text{n}\text{f}}\left(\text{t}\right)=\frac{{⟨{\text{C}}_{\text{i}\text{n}}\left(\text{t}\right)⟩}_{\text{T}}}{{⟨{\text{C}}_{\text{o}\text{u}\text{t}}\left(\text{t}\right)⟩}_{\text{T}}}$$

6

with <Cin(t)>T and <Cout(t)>T the T-hour rolling averages of the indoor and outdoor PM2.5 concentrations, computed only from timestamps that were not identified as indoor emission events and only for windows with at least 30% data coverage. A steady state is reached when the instantaneous temporal change of the indoor concentration is significantly lower than the average indoor concentration over a considerable time span (Eq. 1). Following the methodology outlined in Section S1, the hourly concentration change is calculated as the difference between the current and the previous timestamp, ΔCin(t) = Cin(t) – Cin(t–1). Running averages over various temporal intervals T are then computed for the indoor concentration, <Cin(t)>T, and for the respective changes, <ΔCin(t)>T. As T increases, <ΔCin(t)>T is expected to become negligible relative to <Cin(t)>T (Fig. S6 and S7). Section S1 and Fig. S6 and S7 show that steady-state conditions are achieved after 48 hours. Based on this result, finf is derived through Eq. 6 using 48-hour running averages of indoor and outdoor PM2.5.
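A minimal sketch of the Eq. 6 calculation is given below, assuming hourly series stored as Python lists with `None` marking missing values. The trailing (rather than centred) window and the function names are assumptions made for illustration; the text specifies only the window length, the exclusion of emission events and the 30% coverage requirement.

```python
def rolling_mean(values, valid, window, min_coverage=0.3):
    """Trailing rolling mean over valid points only.

    Returns None where fewer than min_coverage * window points are usable.
    """
    means = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        kept = [values[j] for j in range(start, i + 1) if valid[j]]
        enough = bool(kept) and len(kept) >= min_coverage * window
        means.append(sum(kept) / len(kept) if enough else None)
    return means


def infiltration_factor(c_in, c_out, is_emission, window=48, min_coverage=0.3):
    """Eq. 6: ratio of T-hour rolling means, excluding flagged emission events."""
    # A timestamp is usable only if both series are present and it is not flagged.
    valid = [
        ci is not None and co is not None and not flagged
        for ci, co, flagged in zip(c_in, c_out, is_emission)
    ]
    mean_in = rolling_mean(c_in, valid, window, min_coverage)
    mean_out = rolling_mean(c_out, valid, window, min_coverage)
    return [
        m_in / m_out if m_in is not None and m_out else None
        for m_in, m_out in zip(mean_in, mean_out)
    ]


# Illustrative data: constant outdoor level, indoor level at half of it,
# plus a two-hour indoor emission event that has already been flagged.
c_out = [20.0] * 12
c_in = [10.0] * 12
is_emission = [False] * 12
for hour in (5, 6):
    c_in[hour] = 40.0
    is_emission[hour] = True

# A short 6-hour window is used only to keep the example small.
f_inf_series = infiltration_factor(c_in, c_out, is_emission, window=6)
```

Because the flagged hours are excluded from both rolling means, the emission event does not inflate the ratio; the first timestamp returns `None` because a single point does not meet the 30% coverage requirement of the 6-hour window.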