The models developed to associate passenger flows with the network centralities of the metro network (the infrastructure network) and the substitute network (alternative public transport options) confirm strong correlations among these measures and highlight the potential of network-based predictive models, as first suggested by Luo et al. (2020). Although correlation does not imply causation, it can be valuable for fast, cost-efficient and reliable predictions during the operation of a metro system. In particular, node degree, betweenness and closeness of the metro network, together with node strength of the substitute network, can be appropriate predictors of total passenger flows at stations (boardings, alightings and pass-throughs combined). Moreover, node centralities of the substitute network, such as strength and weighted betweenness, can be used as predictors of origin-destination (OD) passenger flows (boardings and alightings only). Interestingly, node centralities of the metro network do not appear to be adequate predictors of OD flows. This finding is in line with the literature suggesting that conventional centralities, such as betweenness, are not appropriate predictors of traffic flows (Kazerani & Winter, 2009; Gao et al., 2013; Ye et al., 2016). Therefore, more complex network formulations need to be considered when searching for centralities that can serve as OD flow predictors.
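The metro-network centralities named above can be computed directly from an adjacency list. The following is a minimal sketch, not the study's own code, assuming an unweighted, undirected metro graph; the network `METRO` and its stations A to E are hypothetical. Betweenness follows Brandes' algorithm and is left unnormalised (NetworkX's `betweenness_centrality` normalises by default), while closeness is taken as (n - 1) divided by the sum of shortest-path distances.

```python
from collections import deque

# Hypothetical toy metro network: stations as nodes, track links as undirected edges.
METRO = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}


def degree(graph):
    """Number of directly connected stations."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def bfs_distances(graph, source):
    """Hop distances from source to every reachable station."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph):
    """(n - 1) / sum of hop distances; assumes a connected graph."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}


def betweenness(graph):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)  # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


print(degree(METRO))
print(closeness(METRO))
print(betweenness(METRO))
```

In this toy tree, station B, which joins three branches, has the highest degree, closeness and betweenness (five station pairs route through it).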
However, the results of model application suggest that model performance differs between the two types of passenger flows. The variance of total passenger flows among metro stations is largely explained by centralities of the metro and substitute networks, and the evaluation metrics of predictive accuracy are satisfactory, indicating that network-based models can be used to predict total passenger flows. By contrast, the proposed centrality-based model explains substantially less of the variance of OD flows, and its evaluation metrics do not support satisfactory predictive accuracy. Evidently, factors other than network structure significantly affect the distribution of OD flows. Demographic characteristics, such as place of residence, work area and place of education, are essential factors affecting travelers' origins and destinations at the micro level, while population and building density, land use and similar variables affect total departures from and arrivals at metro stations at the macro level. For instance, He et al. (2019) suggested that, besides network structure, land use, socioeconomics and intermodal transport accessibility are significant determinants of metro ridership. Therefore, centrality measures cannot currently be used on their own for OD flow predictions; instead, they must be combined with other appropriate socio-economic variables.
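Because the section does not name its evaluation metrics, the sketch below assumes two common ones, the coefficient of determination (R², the share of variance explained) and root mean squared error (RMSE), and fits an ordinary least squares line with a single centrality as predictor. The study's models use several centralities at once, which would call for multiple regression; the station values here are hypothetical and serve only to show the computation.

```python
from statistics import mean


def fit_ols(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y, y_hat):
    """Share of the variance of y explained by the predictions."""
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean squared prediction error, in the units of y."""
    return (sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y)) ** 0.5


# Hypothetical stations: one centrality value and one daily total flow each.
strength = [12.0, 30.0, 18.0, 45.0, 25.0]
flows = [9100.0, 21800.0, 14200.0, 33500.0, 17900.0]

a, b = fit_ols(strength, flows)
predicted = [a + b * s for s in strength]
print(f"R2 = {r_squared(flows, predicted):.3f}, RMSE = {rmse(flows, predicted):.1f}")
```

A high R² with a low RMSE relative to typical station flows would correspond to the "satisfactory" case described for total flows; the OD case corresponds to a noticeably lower R².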
But which centralities are the most appropriate to begin with? The findings of this study suggest that, among the centralities of the metro and substitute networks, node strength and weighted betweenness centrality of the substitute network can be the most appropriate predictors of OD flows. In fact, the network of alternative public transport options, such as bus routes, can provide valuable insight into the volumes of departures from and arrivals at metro stations through a form of reverse engineering. Since bus route design, in terms of frequency, capacity and coverage, already incorporates the determinants of travel demand, the substitute network, which accounts for the performance of alternative routes, captures information about metro OD flows that is related to the same socio-economic determinants. This finding is in line with the literature highlighting the superiority of modified centrality measures over conventional ones (Ye et al., 2016; Senousi et al., 2022). It is further supported by the fact that weighted, rather than conventional, betweenness is significantly correlated with OD flows.
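To make the two substitute-network measures concrete, the sketch below computes node strength (the sum of a node's edge weights) and weighted betweenness (Brandes' algorithm with Dijkstra searches). It assumes, hypothetically, that each edge weight is a generalised travel cost (for example, minutes) of the best bus alternative between two stations, so the same weight is summed for strength and minimised along paths; if the weights instead measured frequency or capacity, they would typically be inverted before computing shortest paths. The network `SUBSTITUTE` is invented for illustration.

```python
import heapq

# Hypothetical substitute network: edge weight = assumed travel cost (minutes)
# of the best bus alternative between two metro stations (undirected).
SUBSTITUTE = {
    "A": {"B": 4.0, "C": 10.0},
    "B": {"A": 4.0, "C": 5.0},
    "C": {"A": 10.0, "B": 5.0, "D": 3.0},
    "D": {"C": 3.0},
}


def strength(graph):
    """Node strength: sum of the weights of a node's edges."""
    return {v: sum(nbrs.values()) for v, nbrs in graph.items()}


def _dijkstra_counts(graph, s):
    """Dijkstra from s that also counts shortest paths and records predecessors."""
    order, preds = [], {v: [] for v in graph}
    sigma = dict.fromkeys(graph, 0.0)
    sigma[s] = 1.0
    best, settled = {s: 0.0}, set()
    heap = [(0.0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry
        settled.add(v)
        order.append(v)
        for w, cost in graph[v].items():
            nd = d + cost
            if w not in best or nd < best[w]:
                best[w] = nd          # strictly shorter: reset path counts
                sigma[w] = sigma[v]
                preds[w] = [v]
                heapq.heappush(heap, (nd, w))
            elif nd == best[w] and w not in settled:
                sigma[w] += sigma[v]  # another equally short path
                preds[w].append(v)
    return order, preds, sigma


def weighted_betweenness(graph):
    """Unnormalised weighted betweenness (Brandes); assumes positive weights."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds, sigma = _dijkstra_counts(graph, s)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):  # farthest stations first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}


print(strength(SUBSTITUTE))
print(weighted_betweenness(SUBSTITUTE))
```

In this toy network the direct A-C link (cost 10) is longer than A-B-C (cost 9), so B gains weighted betweenness even though it would have none if every edge counted as one hop; this is one way weighted and conventional betweenness can diverge.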
By contrast, the findings suggest that total passenger flows can be effectively described by centralities of the metro and substitute networks. This measure differs from OD flows because it also includes pass-throughs at metro stations. To understand why total flows are highly correlated with network structure, one can imagine total passenger flows in a metro network as liquids in a system of tubes: just as the flow of a liquid is shaped by the layout of the tubes, passenger flows are shaped by the structure of the network itself. This analogy accounts for the strong association between centrality measures of the metro network and total passenger flows. As for node strength of the substitute network, it is not only related to total passenger flows but is also the most important predictor among the centralities. A reverse-engineering argument similar to the one proposed for OD flows can explain this association.
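The "liquids in tubes" view can be illustrated by loading a hypothetical OD matrix onto the network along shortest paths and counting, at each station, boardings, alightings and pass-throughs. This is a minimal sketch, assuming a connected, unweighted graph and all-or-nothing assignment of each OD volume to a single fewest-hop path; `LINE`, `OD` and the trip counts are invented.

```python
from collections import deque

# Hypothetical linear metro line A-B-C-D and daily OD trips between stations.
LINE = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
OD = {("A", "D"): 100, ("B", "C"): 40, ("D", "B"): 30}


def shortest_path(graph, origin, dest):
    """One fewest-hop path via BFS; assumes dest is reachable from origin."""
    prev = {origin: None}
    queue = deque([origin])
    while queue:
        v = queue.popleft()
        if v == dest:
            break
        for w in graph[v]:
            if w not in prev:
                prev[w] = v
                queue.append(w)
    path, node = [], dest
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def station_flows(graph, od):
    """Assign each OD volume to one shortest path (all-or-nothing)."""
    flows = {v: {"board": 0, "alight": 0, "pass": 0} for v in graph}
    for (origin, dest), trips in od.items():
        path = shortest_path(graph, origin, dest)
        flows[origin]["board"] += trips
        flows[dest]["alight"] += trips
        for v in path[1:-1]:          # intermediate stations only
            flows[v]["pass"] += trips
    return flows


flows = station_flows(LINE, OD)
for station, f in flows.items():
    od_flow = f["board"] + f["alight"]
    print(station, "OD flow:", od_flow, "total flow:", od_flow + f["pass"])
```

Here station C has an OD flow of only 40 but a total flow of 170, because 130 trips pass through it; in this toy line, the interior stations collect pass-throughs determined purely by their position in the network, which is the structural component that OD flows omit.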
In this study, more complex machine learning models (XGBoost) are also developed alongside regression models to evaluate the relative performance of the latter through a constructive comparison. The results suggest that both types of model can be used for this purpose, as well as for making predictions where appropriate. Although the XGBoost models are more accurate, the difference is not large enough to rule out statistical models, which may be more appropriate for small datasets and more convenient for researchers. In fact, with small datasets, more complex machine learning models cannot unfold their full potential, so statistical models can be equally reliable. Nevertheless, the proposed methodology can also be scaled up with more complex machine learning techniques.
Total passenger flows and OD flows are treated separately in this study. According to the results, different explanatory variables are significant for each type, and the models fit differently. This suggests that different mechanisms underlie the generation of each flow type. From a disruption management perspective, each flow type also carries distinct implications. On the one hand, rail track disruptions at or near metro stations would segment the network: both upstream and downstream passenger flows would be interrupted entirely, and the resulting segments would be unreachable from each other. In practice, any trip crossing the disruption would be abruptly terminated at that point. Evidently, all metro passengers of the disrupted station would be affected, since nobody could board, alight at, or pass through that station. On the other hand, station platform disruptions would affect only that station, that is, a single node of the network, while the rest of the network would remain unharmed. In such cases, only passengers wishing to board or alight at this station would be affected; passengers passing through the station would continue their trip freely, since the rail line would operate normally. Thus, total passenger flows would be affected in the first case, but only OD passenger flows in the second. For this reason, the determinants of each flow type must be studied separately, so that predictions can be tailored to the needs of the operator.
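The two disruption types can be contrasted with a small sketch. It assumes that a track disruption at a station removes that station node entirely (no boarding, alighting or passing through) and that trips still succeed if some other path connects their origin and destination, whereas a platform disruption only blocks boarding and alighting at the station. `LINE`, `OD`, `components_without` and `affected_trips` are hypothetical names and data, not part of the study.

```python
from collections import deque

# Hypothetical linear metro line A-B-C-D and daily OD trips between stations.
LINE = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
OD = {("A", "D"): 100, ("B", "C"): 40, ("D", "B"): 30}


def components_without(graph, closed):
    """Label connected components after removing the closed station."""
    label, comp = 0, {}
    for start in graph:
        if start == closed or start in comp:
            continue
        comp[start] = label
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w != closed and w not in comp:
                    comp[w] = label
                    queue.append(w)
        label += 1
    return comp


def affected_trips(graph, od, station, kind):
    """Trips that cannot be completed under a 'track' or 'platform' disruption."""
    if kind == "platform":
        # Trains still run through; only boarding/alighting at the station is lost.
        return sum(t for (o, d), t in od.items() if station in (o, d))
    if kind == "track":
        # The station node is removed; trips split across segments are lost too.
        comp = components_without(graph, station)
        return sum(t for (o, d), t in od.items()
                   if station in (o, d) or comp[o] != comp[d])
    raise ValueError(f"unknown disruption kind: {kind}")


print(affected_trips(LINE, OD, "B", "track"))
print(affected_trips(LINE, OD, "B", "platform"))
```

On this linear toy line, the track disruption at B affects 170 trips, that is, every boarding, alighting and pass-through at B, while the platform disruption affects 70, only B's boardings and alightings. On networks with loops, some pass-through trips could reroute, so the track figure may be lower than the station's total flow.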
The policy implications of this study mainly concern ridership estimation in cases of disruption. Through network theory, valuable information about metro system operation can be captured effectively. CNT-based models can be much faster and more economical for making reliable ridership estimations. Such models are expected to support public transport operators in daily operations management, especially when disruptions occur. In cases of disruption, ridership estimates are essential for assessing the potential impacts, as well as for designing, sizing and budgeting mitigation and contingency plans. For instance, platform disruptions at metro stations could be addressed by enhancing the capacity of bus lines serving the nearest operational stations, while rail track disruptions could be handled by bus-bridging services connecting the segmented parts of the metro network. These different ways of handling disruptions highlight the importance of treating each flow type separately.
Regarding the limitations of this study, the passenger flows considered in the analysis are daily flows representative of a workday. Their static nature is a limitation, since time-dependent passenger flow data would further enhance the analysis and would be expected to provide more valuable insights. The size of the available dataset is also a limitation, since larger datasets (covering longer time periods) would support more robust conclusions. Finally, daily data are appropriate for medium-term estimations, for example, for disruptions lasting 1–3 days. Disaggregating the data into smaller time slots would extend the applicability of the method to short-term estimations as well.