This section describes the data and methods used to investigate the relations visualized in the conceptual model. The data panel is described first, followed by an operationalisation of the concepts into variables. Then descriptive statistics of the data are presented, followed by a description of the research method. Finally, the section culminates in an operational model and the model specification and estimation.
3.1. Data & Sampling
To investigate the relationship between working from home and travel behaviour, we use data from the MPN, a longitudinal household panel that consists of a 3-day travel diary and a set of questionnaires (Hoogendoorn-Lanser et al., 2015). We use five waves of data from the MPN, spanning the years 2017 through 2021. Effectively, this gives us three waves of data collected before the COVID-19 pandemic and two waves collected during the COVID-19 pandemic.
Since we are interested in the relationship between work and travel behaviour, we only include respondents who worked at least 16 hours per week in each wave. Furthermore, we use a pure-stayer sample of respondents who fully participated in all five waves. The use of a pure-stayer sample allows us to compare the effect sizes found before and during the pandemic more directly, as they are based on data generated by the same people in both periods. The final pure-stayer sample consists of 1100 respondents. A possible drawback of a pure-stayer sample is non-random dropout, which could lead to a skewed data set. To check whether this is the case, Table 1 gives the distribution of socio-demographic characteristics for both the pure-stayer sample and the cross-sectional sample of the 2019 MPN wave.
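The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`id`, `wave`, `hours_worked`) and the function name are hypothetical and do not reflect the actual MPN data format or the processing used for this study.

```python
WAVES = (2017, 2018, 2019, 2020, 2021)
MIN_WEEKLY_HOURS = 16  # threshold stated in the text


def pure_stayers(records):
    """Return the ids of respondents who appear in every wave and
    worked at least MIN_WEEKLY_HOURS per week in each of them."""
    hours_by_person = {}
    for rec in records:
        hours_by_person.setdefault(rec["id"], {})[rec["wave"]] = rec["hours_worked"]
    return sorted(
        pid
        for pid, hours in hours_by_person.items()
        # A missing wave counts as 0 hours, so it fails the threshold.
        if all(hours.get(w, 0) >= MIN_WEEKLY_HOURS for w in WAVES)
    )


# Hypothetical toy data: "a" qualifies, "b" missed 2021, "c" worked 12 hours in 2020.
example = (
    [{"id": "a", "wave": w, "hours_worked": 36} for w in WAVES]
    + [{"id": "b", "wave": w, "hours_worked": 32} for w in WAVES[:4]]
    + [{"id": "c", "wave": w, "hours_worked": 12 if w == 2020 else 24} for w in WAVES]
)
print(pure_stayers(example))  # ['a']
```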
Table 1 Sample distribution of both the pure-stayer sample and participants of the 2019 wave
| Characteristic | Category | Pure-stayer sample (%) | 2019 wave sample (%) |
|---|---|---|---|
| Gender | Male | 52 | 50 |
| | Female | 48 | 50 |
| Age (years) | 18-29 | 8 | 15 |
| | 30-39 | 31 | 27 |
| | 40-49 | 25 | 26 |
| | 50-59 | 27 | 24 |
| | 60+ | 9 | 8 |
| Education | Low | 11 | 14 |
| | Medium | 38 | 40 |
| | High | 52 | 46 |
| Net personal income (€/month) | Low (< €2000) | 32 | 38 |
| | Modal (€2000 to €3000) | 46 | 43 |
| | High (> €3000) | 22 | 19 |
| Urban density (addresses/km²) | <500 | 7 | 9 |
| | 500-1000 | 20 | 21 |
| | 1000-1500 | 17 | 18 |
| | 1500-2500 | 33 | 32 |
| | >2500 | 23 | 20 |
| Household type | Single | 26 | 20 |
| | Adult household | 39 | 44 |
| | Kids < 13 | 29 | 28 |
| | Kids > 13 | 7 | 8 |
The comparison in Table 1 between the socio-demographics of the pure-stayer sample and the cross-sectional sample shows that dropout is indeed higher in some subgroups, leading to a slightly skewed sample. In particular, younger people (aged 18-29) seem to drop out more often: they make up 8% of the pure-stayer sample against 15% of the 2019 wave.
3.2. Operationalisation
The main interest of this research is to a) determine the direction and strength of the (causal) relations between working from home and travel behaviour and b) establish whether these effects changed during the COVID-19 pandemic. To find the bidirectional (causal) relations between working from home and travel demand, we need to include four variables within each wave.
First, we include the weekly number of hours worked from home and the weekly number of hours worked in total. These two variables allow us to effectively capture the effect of working more from home, whilst controlling for the effects of simply working more. Effectively, we need to separate increases in working from home that result from a simple increase in the total number of hours worked and increases that result from a shift from hours worked in a separate work location (for example an office) to hours worked from home. These variables are measured using a questionnaire, where respondents are prompted to answer how many hours on average they worked in total in the last few weeks. Afterwards, they distribute these hours over the location where they worked, giving us the weekly number of hours spent working from home.
For travel behaviour, we distinguish two types of travel. The first is commute travel, which we define as travel to and from the work location. The second is leisure travel, which we broadly define as all travel that is not made for work (commuting or business travel) or for educational purposes. We operationalise travel behaviour as the time spent travelling for commuting or leisure purposes during the 3-day observational period of the MPN. We use travel time rather than travel distance, since we assume that people base their decisions on changes in travel time. In other words, we expect that the time saved by commuting less is the most tangible benefit felt by people who work from home, and that it is this saved time they might spend on travelling for other purposes. This is also the main idea behind the theory of constant travel time budgets, to which our research connects in this way.
We extend the model beyond these four variables in two ways. First, we include the number of weekend days that fall within a respondent’s travel diary allotment. MPN respondents keep 3-day travel diaries, where the allotted weekdays are kept stable across waves. Some respondents have 1 or 2 weekend days among their allotted days, whereas others have none. We need to control for these differences to get more accurate between-person relations for time spent travelling. Second, we want to allow for some heterogeneity with respect to travel modes in the model. This adds information that is highly relevant to policy makers, road operators, and transit planners. To do so, we add the most often used commute mode in the last pre-pandemic wave (2019). This allows us to obtain information on the extent to which commute demand for the various modes has been affected by working from home because of the pandemic. The commute mode is operationalised using two dummy variables: one for people who commuted by public transport and one for people who commuted by car. The reference group consists of the people who commuted by other travel modes.
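The two time-invariant variables described above could be coded along the following lines. The sketch is illustrative only: the day names, the mode labels (`"public_transport"`, `"car"`) and the function names are assumptions, not the coding scheme of the MPN.

```python
WEEKEND = {"saturday", "sunday"}


def weekend_days(diary_days):
    """Count how many of the allotted diary days fall in the weekend."""
    return sum(day.lower() in WEEKEND for day in diary_days)


def commute_mode_dummies(main_mode_2019):
    """Dummy-code the most used 2019 commute mode.
    Any mode other than public transport or car is the reference group (0, 0)."""
    return {
        "commute_pt": int(main_mode_2019 == "public_transport"),
        "commute_car": int(main_mode_2019 == "car"),
    }


print(weekend_days(["Friday", "Saturday", "Sunday"]))  # 2
print(commute_mode_dummies("bicycle"))  # {'commute_pt': 0, 'commute_car': 0}
```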
3.3. Data description
Means and standard deviations for the four time-variant variables are given in Table 2.
Table 2 Means and standard deviations for the time-variant variables
| Variable | Statistic | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|
| Hours worked (hours/week) | Mean | 35 | 35 | 35 | 35 | 35 |
| | Median | 36 | 36 | 36 | 36 | 36 |
| | SD | 8.9 | 8.6 | 8.5 | 8.4 | 8.0 |
| Hours WFH (hours/week) | Mean | 3 | 3 | 3 | 12 | 10 |
| | Median | 0 | 0 | 0 | 4 | 3 |
| | SD | 6.3 | 6.8 | 6.5 | 15 | 13 |
| Commute travel time (hours/3 days) | Mean | 1.30 | 1.27 | 1.23 | 0.67 | 1.30 |
| | Median | 1.00 | 0.92 | 0.80 | 0.17 | 1.00 |
| | SD | 1.43 | 1.43 | 1.47 | 1.07 | 1.43 |
| Leisure travel time (hours/3 days) | Mean | 1.98 | 2.15 | 2.03 | 1.73 | 1.98 |
| | Median | 1.45 | 1.62 | 1.57 | 1.20 | 1.45 |
| | SD | 1.95 | 2.12 | 1.95 | 1.90 | 1.95 |
The means and standard deviations of all variables are stable across the three pre-pandemic years. This stability disappears when comparing both 2020 and 2021 to the years before the pandemic, except for the number of hours worked per week. The average number of hours worked from home increased substantially between 2019 and 2020, from 3 to 12 hours per week, before levelling off at an average of 10 hours per week in 2021. Both commute and leisure travel time decreased between 2019 and 2020: from 74 to 40 minutes per 3 days for commute travel, and from 122 to 104 minutes per 3 days for leisure travel. The decrease in commute travel time was thus more substantial than the decrease in leisure travel time.
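The minute values quoted above follow directly from the 2019 and 2020 means in Table 2. The short sketch below reproduces that conversion and adds the relative decrease as a rough check on the comparison between the two travel purposes; the dictionary and function names are illustrative.

```python
# Mean travel time in hours per 3 days, taken from Table 2.
MEAN_HOURS = {
    "commute": {2019: 1.23, 2020: 0.67},
    "leisure": {2019: 2.03, 2020: 1.73},
}


def to_minutes(hours):
    """Convert hours to whole minutes."""
    return round(hours * 60)


def relative_decrease(before, after):
    """Fraction by which the value dropped relative to the earlier value."""
    return (before - after) / before


for purpose, means in MEAN_HOURS.items():
    drop = relative_decrease(means[2019], means[2020])
    print(purpose, to_minutes(means[2019]), to_minutes(means[2020]), f"{drop:.0%}")
```

Under these figures the mean commute travel time fell by roughly 46%, against roughly 15% for leisure travel time.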
Correlations between the time-variant variables for both the years before and the years during the COVID-19 pandemic are given in Table 3.
Table 3 Correlations between time-variant variables
| | Hours worked | Hours WFH | Commute travel time | Leisure travel time |
|---|---|---|---|---|
| Hours worked | 1 | 0.31 | 0.13 | -0.041 |
| Hours WFH | 0.26 | 1 | -0.24 | 0.040 |
| Commute travel time | 0.20 | -0.068 | 1 | -0.11 |
| Leisure travel time | -0.035 | 0.057 | -0.18 | 1 |

Note: bivariate correlations in the lower-left triangle are from the pooled 2017-2019 data; correlations in the upper-right triangle are from the pooled 2020-2021 data.
As expected based on the conceptual model, the correlation between working from home and commute travel time is negative both before and during the COVID-19 pandemic. Its magnitude is much larger during the pandemic (-0.24) than before it (-0.068). The correlation between hours worked from home and leisure travel time is positive, again in line with the expectation set out in the conceptual model, and changed far less in magnitude (0.057 before and 0.040 during the pandemic). The correlations between commute and leisure travel time are negative, and the correlations between hours worked and hours worked from home are positive.
3.4. Research method
To determine the longitudinal relations between the variables, we use a random-intercept cross-lagged panel model (RI-CLPM), which is an extension of the cross-lagged panel model (CLPM; Finkel, 2011). The CLPM is an often-used model for panel data, intended to empirically test the bidirectional effects between multiple concepts that are measured over time. To do so, the CLPM specifies auto-regressive relationships, which are supposed to control for the stability of a variable over time. The cross-lagged relationships between the constructs are then supposed to represent the causal processes between the variables. As pointed out by Hamaker et al. (2015), this approach assumes that the value of each variable for every person varies over time around the same sample mean. This assumption is problematic, as most variables do in fact contain stable differences between individuals. For example, some individuals will persistently work more hours per week than others across all measurements, which is ignored by the CLPM.
Hamaker et al. (2015) therefore argue that researchers should not only control for temporal stability across the sample, but also for the time-invariant stability of each variable at the level of the individual. Doing so effectively separates within-person effects over time from constant between-person differences. This is achieved by including random intercepts, which account for the trait-like, time-invariant stability of the variables. The random intercepts thus capture the between-person differences, allowing the (auto-)regressive structure to specifically capture within-person effects. Figure 2 contains a graphical representation of both the base CLPM (in black) and the additions of the RI-CLPM (in dark red).
The resulting auto-regressive coefficients can then be interpreted as within-individual carry-over effects (Mulder & Hamaker, 2020). A positive effect indicates that a higher (or lower) than expected value at one observation is likely to be followed by a higher (or lower) than expected value at the next observation, where the expected value is based on the average, trait-like value of the respondent. Similarly, a positive cross-lagged effect indicates that an individual with a higher than expected value on one variable tends to have a higher than expected value on the other variable at the next measurement. These effects are directional, and since they represent within-individual changes, they can more plausibly be assumed to represent causal processes at the within-individual level than parameter estimates from CLPMs. The RI-CLPM also allows us to estimate the correlations between the random intercepts, which can be interpreted as the general between-person relations between the concepts associated with the random intercepts (Mulder & Hamaker, 2020).
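For reference, the decomposition behind the RI-CLPM can be written out for two variables $x$ and $y$; the model in this study applies the same structure to four variables. The notation below is a sketch in the spirit of Hamaker et al. (2015), and the symbols are chosen here for illustration: $\mu$ denotes the wave-specific sample means, $\kappa$ the random intercepts, starred terms the within-person deviations, $\alpha$ the auto-regressive and $\beta$ the cross-lagged parameters, and $u$ the residuals.

```latex
\begin{align}
  x_{it} &= \mu_{x,t} + \kappa_{x,i} + x^{*}_{it}, &
  y_{it} &= \mu_{y,t} + \kappa_{y,i} + y^{*}_{it}, \\
  x^{*}_{it} &= \alpha_{x}\, x^{*}_{i,t-1} + \beta_{x}\, y^{*}_{i,t-1} + u_{x,it}, &
  y^{*}_{it} &= \alpha_{y}\, y^{*}_{i,t-1} + \beta_{y}\, x^{*}_{i,t-1} + u_{y,it}.
\end{align}
```

Fixing the variances of $\kappa_{x,i}$ and $\kappa_{y,i}$ to zero removes the stable between-person component, so that the lagged structure applies to deviations from the wave means alone, which is the structure of the CLPM.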
We use two extensions to the model discussed by Mulder & Hamaker (2020), both relating to the incorporation of time-invariant predictors of the time-variant variables. Time-invariant variables are stable over time, and in this model specification we expect them to have affected the time-variant variables. The first extension is the use of the number of weekend days as a predictor of the random intercepts. Effectively, this controls the trait-like stability of the variables for the differences in our measuring instrument that allots a different number of weekend days to the respondents. Second, we regress the time-variant variables for the waves during the pandemic on the commuting mode used in the last wave before the pandemic. This gives us additional insight into the effects of working from home on the use of the different travel modes.
Finally, we test whether the within-person effects during the COVID-19 pandemic differ from those before the pandemic by imposing equality constraints on the effects before the pandemic (2017 -> 2018 -> 2019) and on the effects during the pandemic (2019 -> 2020 -> 2021). We effectively estimate two sets of effects, allowing a direct comparison between the effects before and during the pandemic.
3.5. Model specification and estimation procedure
A schematic overview of the variables and relationships is given in the model in Figure 3. For the sake of visual clarity, the random-intercept structure and the correlations between variables are omitted from this figure. The regression structure is simplified as well. For instance, the arrow from work to work between 2017 and 2018 (denoted with an a) contains four different effects (the cross-lagged and auto-regressive parameters between total hours worked and hours worked from home). These four effects are each kept equal across the two pre-pandemic wave pairs. In total, there are sixteen regression parameters for each wave pair.
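The count of sixteen regression parameters per wave pair, and the way the equality constraints reduce them to two sets, can be made explicit with a small enumeration. This is a sketch under the reading that every variable at one wave is regressed on all four variables at the previous wave; the variable names and the label format are hypothetical and are not the labels used in the estimated model.

```python
VARIABLES = ("hours_worked", "hours_wfh", "commute_time", "leisure_time")
WAVE_PAIRS = ((2017, 2018), (2018, 2019), (2019, 2020), (2020, 2021))


def period(pair):
    """Wave pairs ending in 2019 or earlier are pre-pandemic."""
    return "pre" if pair[1] <= 2019 else "during"


def lagged_parameters():
    """Map every (wave pair, predictor, outcome) path to a label.
    Paths that share a label are constrained to be equal."""
    return {
        (pair, predictor, outcome): f"{period(pair)}_{predictor}_to_{outcome}"
        for pair in WAVE_PAIRS
        for outcome in VARIABLES
        for predictor in VARIABLES
    }


params = lagged_parameters()
per_pair = sum(1 for (pair, _, _) in params if pair == (2017, 2018))
print(per_pair)                   # 16 paths per wave pair
print(len(set(params.values())))  # 32 distinct labels: 16 pre, 16 during
```

Under this reading, the four wave pairs contain 64 lagged paths in total, of which 32 are freely estimated once the within-period equality constraints are imposed.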
The models are estimated using the R package lavaan (Rosseel, 2012). First, a CLPM is estimated. The fit of this model is unsatisfactory according to most goodness-of-fit statistics for structural equation models as given in Hooper et al. (2008): the chi-square value is 1199 with 176 degrees of freedom (p < 0.001), the RMSEA is 0.073 and the CFI is 0.912. Following the estimation of the CLPM, an RI-CLPM is estimated. The fit of the RI-CLPM is much better. The chi-square value of the model is 338.9 with 154 degrees of freedom (p < 0.001). Strictly speaking, this means that the chi-square test rejects the hypothesis that the model-based covariance matrix fits the sample covariance matrix. However, the chi-square test is sensitive to large sample sizes, and the chi-square value divided by the degrees of freedom (approximately 2.2) indicates good model fit (Hooper et al., 2008). The RMSEA is 0.043, which also indicates good model fit. Finally, the CFI is 0.975, again indicating good model fit.
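The relative chi-square (chi-square divided by degrees of freedom) for both models can be recomputed from the values reported above. The snippet is a plain arithmetic check; the dictionary and function names are illustrative.

```python
# Chi-square values and degrees of freedom as reported in the text.
MODELS = {
    "CLPM": {"chi_square": 1199.0, "df": 176},
    "RI-CLPM": {"chi_square": 338.9, "df": 154},
}


def relative_chi_square(chi_square, df):
    """Chi-square statistic divided by its degrees of freedom."""
    return chi_square / df


for name, fit in MODELS.items():
    print(name, round(relative_chi_square(fit["chi_square"], fit["df"]), 2))
```

This gives a ratio of about 6.8 for the CLPM and about 2.2 for the RI-CLPM.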