The synthetic fused data were further analyzed to investigate the factors affecting commute mode choice behaviour in the GTA during the pandemic. Specifically, we take a subset of the travel diary data representing commuting trips from the core survey, the socio-demographic information of the respondents, and their attitudinal statements imputed from the satellite surveys. We use this information to jointly model the determinants of mode choices with respondents’ attitudes and risk perceptions (latent attributes derived from the imputed attitudinal variables) using a hybrid choice model (HCM) framework (42, 43).
Five major commute modes are observed in the model estimation data: car drive, car passenger, transit, walk, and bicycle. Although some commuting trips used other modes, they are too few to yield reasonable market shares, so these observations are omitted. Transportation level-of-service (LOS) attributes of the five modes are generated using a Google application programming interface (API) framework, namely the Tool for Incorporating Level of Service attributes (TILOS) (44). TILOS generates mode-specific travel time and distance information from trip origin and destination coordinates and departure time. Auto LOS relies on a mix of historical and real-time travel data. Transit LOS uses General Transit Feed Specification (GTFS) data. To generate auto travel costs, we employed cost matrices widely used for transportation planning by various government and public agencies in the study region (45). For transit fare, we used a calibrated Deterministic User Equilibrium traffic assignment model of the study area called the GTA model, which generates transit fares based on origin and destination traffic analysis zones and departure times.
The choice set for each individual was determined using feasibility constraints: one must have a driver’s license and a car to use the car drive mode; a total transit travel time over 120 min is considered infeasible for commuting; a distance over 3 km is considered infeasible for walking; and a distance over 10 km is considered infeasible for cycling. The feasibility constraints for transit, walk, and bicycle are based on the sample data. In the raw data, 95% of the commuting trips made by walking have a trip length of less than 3 km. Similarly, 10 km for bicycle trips and 120 min for transit trips correspond to the 95th and 98th percentile values, respectively. After removing the observations with infeasible mode choices, missing personal and household socioeconomic information, and missing level-of-service attributes, a final dataset of 3,319 commuting trips and commuters is obtained. In this dataset, car drive is the dominant commuting mode, with a 65% share. Car passenger accounts for 6.5% of daily commuting trips. The transit share is 19.5%, whereas walk and bicycle contribute 8% and 1% of daily commuting trips, respectively.
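The feasibility rules above can be sketched as a small availability function. This is a minimal illustration rather than the authors' code: the function and argument names are hypothetical, and treating car passenger as always available is an assumption, since no constraint is stated for that mode.

```python
def feasible_modes(has_license, has_car, transit_time_min, distance_km):
    """Return the set of commute modes considered feasible for one commuter."""
    # Assumption: no feasibility rule is stated for car passenger,
    # so it is treated as always available.
    modes = {"car_passenger"}
    if has_license and has_car:          # car drive needs a license and a car
        modes.add("car_drive")
    if transit_time_min <= 120:          # over 120 min is infeasible
        modes.add("transit")
    if distance_km <= 3:                 # over 3 km is infeasible for walking
        modes.add("walk")
    if distance_km <= 10:                # over 10 km is infeasible for cycling
        modes.add("bicycle")
    return modes
```

For example, a licensed car owner with a 45-min transit trip over 8 km would have every mode except walk in the choice set.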
6.1 Model specification
Factor analysis was conducted to identify latent factors based on the imputed attitudinal questions (referred to here as indicator variables). The most consistent findings were obtained using two factors, which loaded strongly onto eight statements (with loadings larger than 0.4) as shown in Table 4.
Table 4
Definition of latent attitudinal factors based on the fusion outputs
| Latent construct | Observed indicator | Factor loading |
|---|---|---|
| Perception of increased risk during the pandemic | I believe there are more risks associated with leaving my home than before the pandemic | 0.402 |
| | I believe there is more risk associated with using ride-sourcing services than before the pandemic | 0.400 |
| | I believe there is more risk associated with using taxi services than before the pandemic | 0.494 |
| | I believe there is more risk associated with carpooling than before the pandemic | 0.445 |
| | I believe there is more risk associated with using car-sharing services (e.g., Zipcar, Communauto) than before the pandemic | 0.436 |
| Concerns regarding the pandemic | I am concerned about the number of daily new cases in Ontario, Canada | 0.479 |
| | I am concerned about the emergence of the new variant of COVID-19 | 0.483 |
| | I am concerned about the mortality rate of the disease which is causing the pandemic | 0.445 |
Figure 5 shows the two components of the HCM – the discrete choice model and the latent variable model consisting of structural and measurement equations. In the HCM, the utility \(U\) of mode \(m\) for individual \(n\) is defined by the following function, which combines both observed and latent variables:
$$U_{m,n}=V_{m,n}+\epsilon_{m,n}=\beta_{m}X_{m,n}+\lambda_{l}\alpha_{l,n}+\epsilon_{m,n} \tag{3}$$
Here, \(V\) represents the systematic component of the utility function of the corresponding alternative,
\(X\) is a vector of observed variables (socio-demographic attributes and modal LOS attributes),
\(\beta\) is a vector of utility coefficients associated with the observed variables,
\(\alpha\) is a vector of latent variables (\(l=1,\dots ,L\) where \(L=2\)),
\(\lambda\) is a vector of coefficients associated with the latent variables,
\(\epsilon\) represents a random error to capture the unobserved component of the utility function of the corresponding alternative. The error is IID across alternatives and observations and follows a type I extreme value distribution.
The structural equation of the latent variable \({\alpha }_{l}\) for individual \(n\) is given by:
$$\alpha_{l,n}=\gamma_{l}Z_{l,n}+\eta_{l,n} \tag{4}$$
Where \(Z\) is a vector of socio-demographic characteristics of individual \(n\),
\(\gamma\) is a vector of estimated parameters, and
\(\eta\) is the random component of the latent variable, which follows a standard Normal distribution across individuals.
Thus, the likelihood of the observed choice \(m\) for individual \(n\), conditional on \(\beta\) and \({\alpha }_{n}\) is given by:
$$L_{C_{n}}\left(\beta,\alpha_{n}\right)=\frac{e^{V_{m^{*},n}}}{\sum_{m=1}^{5}e^{V_{m,n}}} \tag{5}$$
Where \({m}^{*}\) is the alternative chosen by individual \(n\). The two latent variables are also used to explain the value of the associated attitudinal questions, where we adopt ordered logit specifications. The likelihood of the ordered logit models is given by:
$$L_{I_{n}}\left(\tau,\zeta,\alpha_{n}\right)=\prod_{i=1}^{I}\left(\sum_{p=1}^{5}\left(I_{n,i}=p\right)\left[\frac{e^{\tau_{i,p}-\zeta_{i}\alpha_{n}}}{1+e^{\tau_{i,p}-\zeta_{i}\alpha_{n}}}-\frac{e^{\tau_{i,p-1}-\zeta_{i}\alpha_{n}}}{1+e^{\tau_{i,p-1}-\zeta_{i}\alpha_{n}}}\right]\right) \tag{6}$$
Where, \({\zeta }_{i}\) is an estimated parameter that measures the impact of \({\alpha }_{n}\) on the attitudinal indicator \({I}_{i}\), and \({\tau }_{i}\) is a vector of threshold parameters for this indicator. The term \(\left({I}_{n,i}=p\right)\) will be equal to 1 if and only if individual \(n\) answers with level \(p\) to indicator \({I}_{i}\), where \(p=1,\dots ,5\).
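As a minimal sketch of Equations 5 and 6, the two conditional likelihoods for one individual could be computed as follows. The function names are hypothetical and the `available` argument is an assumption (Equation 5 as printed sums over all five modes). The sketch also assumes one scalar latent variable per group of indicators, and pads the four thresholds with minus and plus infinity so that the five category probabilities sum to one.

```python
import math

def logistic(x):
    """Numerically stable logistic function e^x / (1 + e^x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mnl_likelihood(V, chosen, available):
    """Equation 5: logit probability of the chosen mode.

    V maps mode -> systematic utility (latent-variable term included).
    `available` restricts the denominator to the feasible choice set.
    """
    vmax = max(V[m] for m in available)            # guards against overflow
    denom = sum(math.exp(V[m] - vmax) for m in available)
    return math.exp(V[chosen] - vmax) / denom

def ordered_logit_likelihood(answers, tau, zeta, alpha):
    """Equation 6: product over indicators of P(I_i = observed level p)."""
    lik = 1.0
    for i, p in enumerate(answers):                # p takes values 1..5
        cuts = [-math.inf] + list(tau[i]) + [math.inf]
        upper = logistic(cuts[p] - zeta[i] * alpha)
        lower = logistic(cuts[p - 1] - zeta[i] * alpha)
        lik *= upper - lower
    return lik
```

With a positive \(\zeta_i\), raising `alpha` shifts probability mass towards the higher response levels, which is the behaviour the measurement model is meant to capture.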
The combined log-likelihood of the HCM is then given by:
$$LL\left(\gamma,\zeta,\tau,\beta\right)=\sum_{n=1}^{N}\log\int_{\eta_{n}}L_{C_{n}}\left(\beta,\alpha_{n}\right)L_{I_{n}}\left(\tau,\zeta,\alpha_{n}\right)\varphi\left(\eta_{n}\right)d\eta_{n} \tag{7}$$
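The integral over \(\eta_n\) has no closed form and is usually approximated by simulation. The sketch below shows one way this could be done with plain Monte Carlo draws. It is illustrative only: the source does not state the type or number of draws used, and `choice_lik` and `indicator_lik` are hypothetical callbacks standing in for the choice and measurement likelihoods.

```python
import math
import random

def simulated_log_likelihood(individuals, choice_lik, indicator_lik,
                             gamma, n_draws=500, seed=1):
    """Approximate the combined log-likelihood by averaging over draws of eta.

    individuals   : list of dicts, each with "Z" = one covariate vector
                    per latent variable
    choice_lik    : callable(person, alpha) -> choice likelihood
    indicator_lik : callable(person, alpha) -> measurement likelihood
    gamma         : one parameter vector per latent variable
    """
    rng = random.Random(seed)                      # fixed seed: repeatable draws
    ll = 0.0
    for person in individuals:
        total = 0.0
        for _ in range(n_draws):
            # Structural equation: alpha_l = gamma_l . Z_l + eta_l, eta_l ~ N(0, 1)
            alpha = [
                sum(g * z for g, z in zip(gamma[l], person["Z"][l]))
                + rng.gauss(0.0, 1.0)
                for l in range(len(gamma))
            ]
            total += choice_lik(person, alpha) * indicator_lik(person, alpha)
        ll += math.log(total / n_draws)            # log of the simulated integral
    return ll
```

Averaging the product of the two likelihoods before taking the logarithm matters: the latent variables enter both components, so the components cannot be integrated separately.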
Having imputed \(s\) instances of each attitudinal statement in the fusion process, we could estimate \(s\) hybrid choice models, which in turn allowed us to generate the distribution of each parameter in the model and have a detailed understanding of the extent of the effect of each mode choice attribute. For this, we first finalized the model specification using the nearest neighbour fusion result. We then re-estimated the model for all the 100 sets of imputed attitudinal statements to generate distributions of the parameters. All models were coded and estimated in Apollo v.0.1.0 (46).
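Once the \(s\) models have been estimated, the parameter distributions can be summarised by percentiles. The sketch below assumes the estimates are already collected as one dictionary per imputed data set; the function name is hypothetical, and the "inclusive" interpolation rule is an assumption because the source does not state which percentile definition was used.

```python
import statistics

def summarise_estimates(estimates):
    """Summarise each parameter across re-estimated models.

    estimates: list of dicts, one per imputed data set,
               each mapping parameter name -> point estimate.
    """
    summary = {}
    for name in estimates[0]:
        values = [run[name] for run in estimates]
        # n=20 returns the 19 cut points at 5%, 10%, ..., 95%
        cuts = statistics.quantiles(values, n=20, method="inclusive")
        summary[name] = {
            "p10": cuts[1],
            "p25": cuts[4],
            "median": cuts[9],
            "p75": cuts[14],
            "p90": cuts[17],
        }
    return summary
```

A narrow band between the 10th and 90th percentiles indicates that a parameter is insensitive to which imputed data set is used.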
6.2 Model estimation results and discussion
The results of the HCM estimation are shown in Table 5. The final model specification, developed using the NN fusion output, was selected by retaining variables with expected signs and statistical significance. A t-stat critical value of 1.96 (95% confidence level) is used as the threshold for retaining variables in the model. However, some parameters with absolute t-stat values lower than 1.96 are retained because the corresponding variables provide considerable insight into the behavioural process. In all the subsequent estimations (using the multiple imputation fusion outputs), the same attributes were found to be significant with similar signs (and somewhat similar magnitudes), highlighting the statistical robustness of the final specification reported in Table 5.
Table 5
Hybrid choice model of commuting modes estimated using fusion outputs
NN fusion = nearest neighbour fusion; MI fusion = multiple imputation fusion.
| Variable | Mode | NN fusion: Parameter | NN fusion: t-stat | MI fusion: 10th perc. | MI fusion: 25th perc. | MI fusion: Median | MI fusion: 75th perc. | MI fusion: 90th perc. |
|---|---|---|---|---|---|---|---|---|
| Choice model component | | | | | | | | |
| Alternative specific constant | Car drive | -12.351 | -3.428 | -8.583 | -8.235 | -7.479 | -6.582 | -6.093 |
| | Car passenger | 0.000 | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | Transit | 3.709 | 3.571 | 4.768 | 5.102 | 5.342 | 5.568 | 5.778 |
| | Walk | 3.595 | 5.016 | 4.010 | 4.088 | 4.263 | 4.493 | 4.604 |
| | Bicycle | 0.685 | 0.866 | 1.143 | 1.234 | 1.403 | 1.637 | 1.730 |
| In-vehicle travel time (min) | Car drive | -0.144 | -2.603 | -0.185 | -0.175 | -0.165 | -0.155 | -0.150 |
| | Car passenger | -0.376 | -8.312 | -0.421 | -0.416 | -0.413 | -0.408 | -0.400 |
| | Transit | -0.102 | -5.632 | -0.113 | -0.111 | -0.110 | -0.108 | -0.107 |
| (Access + egress) time (min) | Transit | -0.080 | -2.195 | -0.098 | -0.096 | -0.089 | -0.084 | -0.081 |
| Number of transfer(s) | Transit | -0.768 | -2.409 | -0.978 | -0.949 | -0.918 | -0.893 | -0.846 |
| Travel cost ($) | All motorized modes | -0.450 | -3.620 | -0.556 | -0.548 | -0.534 | -0.521 | -0.509 |
| Parking cost ($) | Car drive | -0.049 | -1.630 | -0.052 | -0.050 | -0.048 | -0.047 | -0.047 |
| Trip length (km) | Walk & Bicycle | -1.062 | -8.430 | -1.181 | -1.168 | -1.152 | -1.134 | -1.123 |
| Number of vehicles in household | Car drive | 6.015 | 5.557 | 4.518 | 4.584 | 4.836 | 4.898 | 5.088 |
| | Car passenger | 0.525 | 2.927 | 0.540 | 0.562 | 0.575 | 0.595 | 0.608 |
| Number of bicycles in household | Bicycle | 0.473 | 3.258 | 0.445 | 0.452 | 0.457 | 0.461 | 0.470 |
| Gender: Female | Bicycle | -1.384 | -2.926 | -1.487 | -1.467 | -1.449 | -1.436 | -1.422 |
| Increased risk perception | Car drive | 13.135 | 5.365 | 9.993 | 10.178 | 10.878 | 11.042 | 11.382 |
| | Car passenger | -2.996 | -5.447 | -3.534 | -3.465 | -3.313 | -3.196 | -3.107 |
| Pandemic concern | Transit | -3.351 | -8.927 | -3.895 | -3.837 | -3.741 | -3.684 | -3.638 |
| Structural model for latent variable "Increased risk perception" | | | | | | | | |
| Age | | 0.018 | 7.588 | 0.014 | 0.014 | 0.015 | 0.016 | 0.016 |
| Usual workplace during COVID: workplace | | 0.163 | 2.472 | 0.175 | 0.181 | 0.189 | 0.197 | 0.207 |
| Student | | -0.439 | -5.210 | -0.547 | -0.522 | -0.502 | -0.494 | -0.473 |
| Structural model for latent variable "Pandemic concern" | | | | | | | | |
| Age | | 0.009 | 2.508 | 0.010 | 0.012 | 0.013 | 0.015 | 0.016 |
| Student | | -0.190 | -1.409 | -0.186 | -0.121 | -0.091 | -0.049 | -0.019 |
| Household income below $60,000 | | -0.373 | -2.506 | -0.478 | -0.468 | -0.409 | -0.369 | -0.348 |
| Household has adult > 60 years old | | 0.629 | 5.506 | 0.550 | 0.568 | 0.625 | 0.663 | 0.697 |
| Measurement models for latent variable "Increased risk perception" (ordered logit) | | | | | | | | |
| Risk of being outside | | | | | | | | |
| Threshold 1 | | -2.984 | -22.695 | -3.241 | -3.163 | -3.086 | -3.025 | -2.979 |
| Threshold 2 | | -1.722 | -19.818 | -1.942 | -1.898 | -1.832 | -1.770 | -1.709 |
| Threshold 3 | | -0.168 | -2.266 | -0.475 | -0.433 | -0.375 | -0.340 | -0.294 |
| Threshold 4 | | 1.891 | 20.144 | 1.598 | 1.647 | 1.687 | 1.762 | 1.820 |
| Impact of latent variable | | 0.263 | 4.585 | 0.024 | 0.052 | 0.103 | 0.133 | 0.157 |
| Risk of ridesourcing | | | | | | | | |
| Threshold 1 | | -3.794 | -20.611 | -4.635 | -4.562 | -4.441 | -4.304 | -4.086 |
| Threshold 2 | | -2.262 | -22.533 | -2.735 | -2.613 | -2.564 | -2.489 | -2.424 |
| Threshold 3 | | -0.521 | -7.132 | -0.782 | -0.727 | -0.689 | -0.661 | -0.596 |
| Threshold 4 | | 1.317 | 15.912 | 0.979 | 1.033 | 1.097 | 1.166 | 1.213 |
| Impact of latent variable | | 0.223 | 3.809 | 0.033 | 0.058 | 0.113 | 0.171 | 0.216 |
| Risk of taxi | | | | | | | | |
| Threshold 1 | | -3.464 | -21.707 | -4.245 | -4.081 | -3.980 | -3.852 | -3.766 |
| Threshold 2 | | -2.029 | -21.628 | -2.611 | -2.509 | -2.418 | -2.376 | -2.321 |
| Threshold 3 | | -0.354 | -4.847 | -0.675 | -0.591 | -0.569 | -0.504 | -0.471 |
| Threshold 4 | | 1.254 | 15.204 | 0.988 | 1.024 | 1.073 | 1.130 | 1.152 |
| Impact of latent variable | | 0.232 | 4.038 | 0.014 | 0.037 | 0.078 | 0.115 | 0.150 |
| Risk of carpooling | | | | | | | | |
| Threshold 1 | | -3.549 | -20.977 | -4.225 | -4.092 | -3.988 | -3.888 | -3.794 |
| Threshold 2 | | -2.004 | -21.016 | -2.698 | -2.631 | -2.526 | -2.469 | -2.421 |
| Threshold 3 | | -0.317 | -4.192 | -0.831 | -0.813 | -0.770 | -0.725 | -0.690 |
| Threshold 4 | | 1.448 | 16.330 | 0.976 | 1.006 | 1.036 | 1.092 | 1.124 |
| Impact of latent variable | | 0.305 | 5.255 | 0.018 | 0.041 | 0.095 | 0.127 | 0.148 |
| Risk of car-sharing | | | | | | | | |
| Threshold 1 | | -2.990 | -22.239 | -4.138 | -4.006 | -3.841 | -3.767 | -3.655 |
| Threshold 2 | | -1.899 | -20.273 | -2.558 | -2.506 | -2.440 | -2.381 | -2.296 |
| Threshold 3 | | -0.321 | -4.145 | -0.758 | -0.722 | -0.661 | -0.604 | -0.566 |
| Threshold 4 | | 1.623 | 17.452 | 1.071 | 1.112 | 1.186 | 1.220 | 1.244 |
| Impact of latent variable | | 0.303 | 5.207 | 0.087 | 0.099 | 0.154 | 0.193 | 0.233 |
| Measurement models for latent variable "Pandemic concern" (ordered logit) | | | | | | | | |
| Concerned about daily new cases | | | | | | | | |
| Threshold 1 | | -2.615 | -20.524 | -2.561 | -2.462 | -2.398 | -2.308 | -2.285 |
| Threshold 2 | | -1.468 | -14.573 | -1.434 | -1.396 | -1.339 | -1.276 | -1.211 |
| Threshold 3 | | -0.416 | -4.441 | -0.482 | -0.422 | -0.393 | -0.349 | -0.291 |
| Threshold 4 | | 1.469 | 13.591 | 1.214 | 1.280 | 1.325 | 1.356 | 1.471 |
| Impact of latent variable | | 0.489 | 5.533 | 0.116 | 0.161 | 0.208 | 0.245 | 0.280 |
| Concerned about new variants | | | | | | | | |
| Threshold 1 | | -3.643 | -20.721 | -3.304 | -3.245 | -3.080 | -3.008 | -2.881 |
| Threshold 2 | | -1.749 | -17.512 | -1.761 | -1.705 | -1.653 | -1.617 | -1.533 |
| Threshold 3 | | -0.752 | -8.318 | -0.831 | -0.758 | -0.719 | -0.681 | -0.574 |
| Threshold 4 | | 0.832 | 8.509 | 0.585 | 0.658 | 0.689 | 0.736 | 0.833 |
| Impact of latent variable | | 0.429 | 5.343 | 0.087 | 0.151 | 0.200 | 0.253 | 0.333 |
| Concerned about mortality rate | | | | | | | | |
| Threshold 1 | | -3.147 | -20.616 | -3.031 | -2.940 | -2.814 | -2.731 | -2.661 |
| Threshold 2 | | -1.739 | -15.899 | -1.757 | -1.697 | -1.653 | -1.601 | -1.525 |
| Threshold 3 | | -0.483 | -4.828 | -0.504 | -0.453 | -0.409 | -0.374 | -0.323 |
| Threshold 4 | | 1.260 | 11.187 | 1.118 | 1.159 | 1.215 | 1.250 | 1.293 |
| Impact of latent variable | | 0.519 | 6.200 | 0.237 | 0.250 | 0.294 | 0.337 | 0.399 |
The specification of the choice model component shows that LOS attributes (travel cost, trip length, the different trip time components, and the number of transit transfers) have signs that match expectations. Among the transit travel time components, in-vehicle travel time carries more weight in commute mode choice decisions than walking (access and egress) time. This finding may be related to the high level of transit access coverage in the study area. In terms of personal and household attributes, the model shows that females are less likely to cycle than males. As expected, household vehicle ownership positively affects car use (car drive and car passenger), and household bicycle ownership positively affects bicycle use.
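As a worked illustration of the relative weights, the ratios below use the NN fusion point estimates for transit from Table 5 and assume the linear-in-parameters utility of Equation 3. They are point calculations only and ignore sampling uncertainty.

$$\frac{\beta_{\text{in-vehicle time}}}{\beta_{\text{access+egress time}}}=\frac{-0.102}{-0.080}\approx 1.28,\qquad \frac{\beta_{\text{transfer}}}{\beta_{\text{in-vehicle time}}}=\frac{-0.768}{-0.102}\approx 7.5\ \text{min}$$

Under these estimates, a minute of transit in-vehicle time weighs about 28% more than a minute of access or egress time, and one transfer carries roughly the same disutility as 7.5 min of in-vehicle time.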
Regarding the role of the latent attitudinal variables, the choice model shows a strong and positive effect of “increased risk perception” on the car drive mode. Given the high parameter value and significance of this latent variable, the decision to commute by car appears to be determined mainly by the increased risk perception associated with travelling during the pandemic rather than by traditional personal and modal attributes. Interestingly, the same latent variable shows a negative effect for the car passenger mode, which is understandable given that the car passenger mode in our model also includes ride sharing with non-household members. Thus, individuals with higher levels of increased risk perception are less likely to share rides with others. As for “pandemic concern”, individuals who are more concerned about the various aspects of the pandemic are less likely to choose transit as their commute mode. These findings are in line with previous studies (3, 5, 9).
The estimates of the parameters \(\gamma\) in the structural models help explain the influence of individuals’ socio-demographic characteristics on the latent variables. For example, older respondents and respondents who had to be physically present in their workplace during the pandemic have higher risk perceptions. Similarly, older respondents and respondents who live with senior household members (aged 60 years and above) have increased concern regarding the pandemic. These findings make intuitive sense, given that COVID-19 has been found to be more dangerous for the older population. In contrast, individuals whose household income is below $60,000 are less likely to be concerned about the pandemic than higher-income individuals. In terms of the measurement models, the \(\zeta\) estimates have the expected (positive) sign and are statistically significant, confirming the results of the factor analysis. Thus, more positive values of the risk perception and pandemic concern latent variables increase the probability of stronger agreement with the associated attitudinal statements.
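To make the structural model concrete, the sketch below evaluates the deterministic part of Equation 4 for "Increased risk perception" using the NN fusion estimates from Table 5. It assumes that age enters in years and that the workplace and student variables are 0/1 dummies; the source does not state the coding, and the two example respondents are hypothetical.

```python
# NN fusion estimates for "Increased risk perception" (Table 5)
GAMMA_RISK = {"age": 0.018, "works_at_workplace": 0.163, "student": -0.439}

def expected_latent(z, gamma=GAMMA_RISK):
    """Deterministic part of the structural equation: gamma * Z.

    The random term eta has mean zero, so this is the expected value
    of the latent variable for a respondent with characteristics z.
    """
    return sum(gamma[k] * z.get(k, 0.0) for k in gamma)

# Hypothetical respondents (age in years is an assumed coding)
worker = {"age": 40, "works_at_workplace": 1, "student": 0}
student = {"age": 20, "works_at_workplace": 0, "student": 1}

print(round(expected_latent(worker), 3), round(expected_latent(student), 3))
```

Under these assumptions the 40-year-old on-site worker has an expected risk perception of about 0.88, against about -0.08 for the 20-year-old student, matching the direction of the effects described above.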
Overall, the findings of the HCM provide important behavioural insights about the commute mode choice decisions during the pandemic that meet a priori expectations and are consistent with other studies in the literature. This also highlights that the fused data can be reliably used for much more complex and stable investigations than would be possible individually with either the core or the satellite survey data.