Assessing Preference Heterogeneity for Mobility-on-Demand Transit Service in Low-Income Communities: A Latent Segmentation Based Decision Tree Method

The future of public transit service is often envisioned as Mobility-on-Demand (MOD), i.e., a system that integrates fixed routes and shared on-demand shuttles. The MOD transit system has the potential to provide better transit service with higher efficiency and coverage. However, little research has focused on understanding traveler preferences for MOD transit and preference heterogeneity, especially among the disadvantaged population. This study addresses this gap by proposing a two-step method, called latent segmentation based decision tree (LSDT). This method first uses a latent class cluster analysis (LCCA) to extract profiles of travelers who have similar usage patterns for shared modes. Then, decision trees (DT) are adopted to reveal the associations between various factors and preferences for MOD transit across different clusters. We collected stated-preference data among two


Introduction
In recent years, many transit observers have envisioned the future of transit to be a Mobility-on-Demand (MOD) transit system that integrates fixed-route services with on-demand ridesharing (Maheo et al., 2019; Shen et al., 2018; Yan et al., 2019b). The MOD transit system may enhance transit operations by solving the first-/last-mile problems, filling in the gaps in the existing services, enhancing accessibility for under-served communities, increasing transit ridership, and cutting operational costs.
To better plan and implement the MOD transit system, it is essential to study traveler preferences for MOD transit and preference heterogeneity, especially among the disadvantaged populations (who are often low-income, less-educated, carless, elderly, etc.). These disadvantaged individuals are usually more transit-dependent, but are more likely to have low technological capability and lack access to smartphones or data plans (Pew Research Center, 2018). Therefore, it is imperative to study the needs of disadvantaged travelers to better inform policies and strategies. However, few published studies have focused on this topic. To fill this research gap, in this study we address the following research questions (RQs):
- RQ 1: What travel profiles can we extract from individuals living in low-income communities based on their current use of transit and ridehailing?
- RQ 2: What factors (e.g., demographic and socioeconomic characteristics, and built-environment variables) are associated with traveler preferences for MOD transit and how do these associations differ across traveler profiles?
To answer these questions, we adopt a latent segmentation based decision tree (LSDT) method. The LSDT method includes two steps, namely, (1) applying latent class cluster analysis (LCCA) to segment the market by using travelers' current bus and ridehailing usage as the indicators, and (2) probabilistically assigning travelers to each cluster (e.g., a traveler can be 20% in Cluster 1 and 80% in Cluster 2, if there are two clusters in total), and fitting a separate decision tree (DT) model to each cluster. Each step answers one of the RQs stated above.
Two-step methods like LSDT have been applied in the field of transportation to better account for heterogeneity. For example, a similar approach, i.e., LCCA plus DT, has been used to analyze travelers' heterogeneity when evaluating transit service quality (de Oña et al., 2016). Ding and Zhang (2016) applied hierarchical clustering analysis and multinomial logit models to analyze travel mode choice. Depaire et al. (2008) applied LCCA to identify clusters with homogeneous traffic crash patterns and then used multinomial logit to assess the risk factors of each cluster. Chang et al. (2019) and Liu and Fan (2020) also used a two-step method, i.e., LCCA plus mixed logit models, to investigate injury severity in traffic crashes. Prior research has shown that applying such two-step methods can reveal hidden relationships and generate richer insights for decision-makers (Chang et al., 2019; de Oña et al., 2016).
The first step of the proposed LSDT is to use LCCA to segment the entire sample into subgroups with similar characteristics. We use LCCA mainly because it is a probability-based parametric clustering technique that has been applied in the travel behavior literature to identify market segments and has shown its strength in analyzing heterogeneity (e.g., Kim et al., 2019; Wang et al., 2021). In a companion paper of this study, Wang et al. (2021) applied LCCA to residents from low-income neighborhoods in Michigan and identified three latent clusters based on their current usage of shared modes (including fixed-route public transit and ridehailing services) and their preferences for a proposed MOD transit system; the three clusters are shared-mode enthusiast, shared-mode opponent, and fixed-route transit loyalist. The results indicate varying MOD preferences among the three segments, which motivates us to further analyze the decision rules regarding MOD preferences in different segments. Therefore, in this paper, we use LCCA to segment people from low-income neighborhoods based on their current transit/ridehailing usage to answer RQ 1.
In the second step of the proposed LSDT, we use DT to conduct cluster-specific analysis, instead of using logit models as some previous work did (Chang et al., 2019; Depaire et al., 2008; Liu and Fan, 2020). The main reason is that most logit models have certain limitations due to their predefined assumptions, e.g., the assumption of the independence of irrelevant alternatives [IIA] for multinomial logit models and random parameter distributions for mixed logit models. Once the assumptions are violated, the estimation of the likelihood function will be erroneous (de Oña et al., 2016). In addition, logit models impose inflexible functional forms on the relationships between the input and response variables, which may not be accurate or even appropriate when the data contain strong nonlinearities and/or interactions. By contrast, DT models do not rely on these assumptions and have a flexible model structure that can capture nonlinearities and interactions. Moreover, DT models offer graphic representation and transparent interpretation for policy making (James et al., 2013). By integrating LCCA and DT, we will be able to extract key insights on what factors are associated with people's preferences for MOD transit and how these relationships vary across different traveler groups determined by their current shared mode usage (RQ 2).
The remainder of this paper is organized as follows. Section 2 provides a literature review on different models used to assess preference heterogeneity in travel behavior. Section 3 describes the study area and the data. Section 4 discusses the overall modeling framework and introduces the formulation of LCCA and DT. Section 5 presents the results. Section 6 synthesizes the findings, discusses the policy implications, concludes the paper with strengths and limitations of the study, and identifies future research directions.

Literature Review
Individuals are likely to react differently to a new MOD transit system due to preference heterogeneity (Bhat, 1997; Fu, 2020). Understanding and analyzing preference heterogeneity can help decision-makers develop better-targeted policies to meet the travel needs of all residents who live in low-income communities.
In the past several decades, mixed logit models have been widely utilized to assess preference heterogeneity (Train, 2009;Yan et al., 2019a). Despite having better model fit than simpler logit models (e.g., multinomial logit and ordered logit models), the mixed logit models have suffered from several drawbacks. Specifically, the mixed logit models rely on the mathematical assumptions about random parameter distributions and error term distributions (Walker and Ben-Akiva, 2002), but these assumptions could easily be violated in real-world applications. In addition, the mixed logit models require extensive work in model tuning and high computational costs. Moreover, some argued that the mixed logit models tend to become quite complex, which makes them less transparent for direct interpretation (Fu, 2020).
As an alternative to the mixed logit models, the latent class model (LCM), also known as the latent class choice model, has been developed to study preference heterogeneity (Shen, 2009). The LCM contains two sub-models, i.e., the class membership model and the choice model. More specifically, the LCM first separates the population into different segments with a class membership model, which maximizes within-segment homogeneity and between-segment heterogeneity; it then estimates segment-specific choice models to reveal the preference heterogeneity residing in the effects of explanatory variables (Kim and Mokhtarian, 2018). The LCM allows researchers to identify various population segments with distinctive preferences, and it has been widely applied to assess preference heterogeneity in travel behavior studies (Eldeeb and Mohamed, 2020; Fu, 2020; Kim and Mokhtarian, 2018; Oliva et al., 2018; Shen, 2009; Vij et al., 2013; Wen et al., 2012). For example, Vij et al. (2013) incorporated the influence of latent modal preferences on travel mode choice behavior by using LCM. Recently, Fu (2020) applied LCM to study how a traveler's habit moderates his/her mode choice for commuting trips.
However, the LCM only allows for one dependent variable when conducting the joint estimation for both the class membership model and the choice model. This limits real-world applications that require different dependent variables for the two models and/or need multiple dependent variables (also known as indicators) in the clustering analysis. A two-step method (i.e., a clustering step followed by a cluster-specific modeling step) can relax this constraint and has recently been used to model and interpret people's travel behavior (e.g., Ding and Zhang, 2016; de Oña et al., 2016). For instance, de Oña et al. (2016) integrated LCCA and DT to assess the perceived transit service quality and detect specific needs and requirements from different subgroups with unique traveler profiles.

Study Area and Data
This study investigates heterogeneous traveler preferences for a MOD transit system among low-income neighborhoods. We distributed a web-based survey in the city of Detroit and the Ypsilanti area, Michigan, both of which are low-resource communities in the region with a significant proportion of the population living in poverty. Participants were recruited from July to November 2018. We obtained a total of 497 and 534 completed responses from Ypsilanti and Detroit, respectively. After removing invalid responses and observations with missing values, a total of 825 (Ypsilanti: 410; Detroit: 415) responses were retained for further analysis. The survey collected data on travelers' stated preferences for MOD transit versus a fixed-route system, their current usage of shared mobility, their demographic and socioeconomic characteristics, and built-environment factors. More details of the survey design and distribution can be found in Yan et al. (2019b).
The descriptive statistics of the variables considered in this paper are summarized in Table 1. The last column of the table shows in which model(s) each variable is included. Note that MOD Transit Preference is the response variable for DT, while Ridehailing Usage Frequency and Bus Usage Frequency are the indicators for LCCA. Because Likert scale (i.e., ordinal) variables with five or more categories can usually be treated as continuous with little concern (Johnson and Creech, 1983; Norman, 2010; Rhemtulla et al., 2012; Sullivan and Artino Jr, 2013), we treat the Likert scale variable (i.e., MOD Transit Preference) as continuous and apply regression trees to interpret people's preferences for MOD transit across various population groups.

Modeling Framework
In this paper, we adopt a two-step latent segmentation based decision tree (LSDT) method, i.e., an integrated approach with LCCA and DT, to assess preference heterogeneity for MOD transit service in low-income communities. Figure 2 illustrates the overall modeling framework.
As shown in Figure 2, the first step is to collect the individual-level travel preference data using survey tools, which was covered in the previous section. Then, LCCA is applied to segment the dataset into K different clusters, each of which represents a distinctive traveler profile. In particular, we estimate the probabilities of an observation belonging to different latent classes and weight all the observations with the cluster-specific probabilities when training DT models for different clusters. Compared to directly splitting the dataset into subsets (i.e., deterministic classification), our method (probabilistic classification) enables DT to use the full dataset (i.e., full information) to train three cluster-level DT models, which differ from each other because different weights are applied. Moreover, probabilistic classification usually generates more homogeneous results and less noise within each cluster, which could lead to a clearer path of the decision rules. These cluster-specific DT models then allow us to analyze the heterogeneous traveler preferences for MOD transit in order to support more nuanced policy discussions and develop better-targeted policy intervention strategies for low-income neighborhoods.
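To make the contrast between probabilistic and deterministic assignment concrete, the following minimal Python sketch uses made-up membership probabilities and preference scores (none of these numbers come from the survey, and the function names are hypothetical). Under probabilistic assignment every respondent contributes to every cluster-level summary, weighted by membership probability; under deterministic assignment each respondent counts only in the modal cluster.

```python
# Hypothetical posterior membership probabilities for four respondents and
# K = 2 clusters, plus their 1-5 preference scores. Illustrative values only.
posteriors = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
    [0.5, 0.5],
]
scores = [5, 4, 2, 3]


def soft_cluster_mean(posteriors, scores, k):
    """Weighted mean of scores, using P(cluster k | respondent) as case weights."""
    weights = [p[k] for p in posteriors]
    return sum(w * y for w, y in zip(weights, scores)) / sum(weights)


def hard_cluster_mean(posteriors, scores, k):
    """Mean of scores over respondents whose modal cluster is k (ties go to the lowest index)."""
    members = [y for p, y in zip(posteriors, scores) if p.index(max(p)) == k]
    return sum(members) / len(members)


for k in range(2):
    print(k, round(soft_cluster_mean(posteriors, scores, k), 3),
          round(hard_cluster_mean(posteriors, scores, k), 3))
```

The weighted mean computed here corresponds to the fitted value that a case-weighted regression tree would report at its root node; the hard-assignment version discards the information carried by respondents who sit between clusters.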

Latent Class Cluster Analysis
Latent class cluster analysis (LCCA) is a probability-based clustering technique. Figure 2 presents the model framework of the simplified LCCA modified from Wang et al. (2021). The LCCA model contains two sub-models: the membership model and the measurement model. Specifically, the membership model uses active covariates $z$ to predict the latent class membership $k$, i.e., the latent shared mobility usage segment. In this simplified LCCA model, active covariates include demographic and socioeconomic traits (i.e., gender, age, race, education attainment), travel-related traits (i.e., vehicle ownership), and technology usage (i.e., smartphone and data plan ownership). Note that we retain covariates that relate to job accessibility as inactive covariates, which do not influence the latent class structure. Instead, we use the retained inactive covariates as inputs for DT to predict the MOD Transit Preference. In the measurement model, we use the latent variable $k$ to capture the association between the two observed ordinal indicators $y$: Ridehailing Usage Frequency and Fixed-Route Transit Usage Frequency. Under the local independence assumption, the two indicators are assumed to be mutually independent given Cluster $k$.
Eq. (1) represents the probability of observing the two indicators $y_i$ for individual $i$ given a set of observed covariates $z_i$. The unobserved latent class $k$, which has $K$ categories, intervenes between $y_i$ and $z_i$. Specifically, $P(k \mid z_i)$ is the probability from the membership model and $P(y_i \mid k)$ is the probability from the measurement model. Given the local independence assumption, the measurement-model probability can be written as the product of the probabilities of the two indicators, i.e., $\prod_{t=1}^{2} P(y_{it} \mid k)$.
Eq. (2) defines the probability of individual $i$ belonging to latent class $k$ given a set of observed covariates $z_i$, which is parameterized using the multinomial logit formula. For each latent class, we estimate an intercept $\gamma_{k0}$ and a set of parameters $\gamma_{kr}$ corresponding to the $R$ active covariates.
Eq. (3) defines the probability that the $t$th indicator of individual $i$ equals $m$ given the latent class $k$. Note that both indicators used in this study are ordinal variables. As such, the probability is parameterized using the adjacent-category logit formula. We estimate an intercept for each ordinal value $m$ and a parameter $\beta^{t}_{k}$ for each latent class. Here, $y^{t*}_{m}$ is the score assigned to level $m$ of the $t$th indicator.
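For reference, Eqs. (1) to (3) can be written out from the definitions above in the following standard form. This is a reconstruction rather than a verbatim copy: the symbols $z_{ir}$ (the $r$th active covariate of individual $i$), $\beta^{t}_{m0}$ (the level-specific intercept), and $M_t$ (the number of levels of indicator $t$) are assumed notation.

```latex
\begin{align}
P(y_i \mid z_i) &= \sum_{k=1}^{K} P(k \mid z_i)\, P(y_i \mid k)
                 = \sum_{k=1}^{K} P(k \mid z_i) \prod_{t=1}^{2} P(y_{it} \mid k) \tag{1} \\
P(k \mid z_i) &= \frac{\exp\left(\gamma_{k0} + \sum_{r=1}^{R} \gamma_{kr} z_{ir}\right)}
                      {\sum_{k'=1}^{K} \exp\left(\gamma_{k'0} + \sum_{r=1}^{R} \gamma_{k'r} z_{ir}\right)} \tag{2} \\
P(y_{it} = m \mid k) &= \frac{\exp\left(\beta^{t}_{m0} + \beta^{t}_{k}\, y^{t*}_{m}\right)}
                             {\sum_{m'=1}^{M_t} \exp\left(\beta^{t}_{m'0} + \beta^{t}_{k}\, y^{t*}_{m'}\right)} \tag{3}
\end{align}
```

In this form, the log-odds between adjacent levels $m$ and $m+1$ of an indicator depend on the class only through $\beta^{t}_{k}\,(y^{t*}_{m+1} - y^{t*}_{m})$, which is what makes Eq. (3) an adjacent-category logit.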
In this paper, we estimate the LCCA model by using Latent GOLD software (v.5.1). Our analysis yields three clusters, and the detailed results are covered in Subsect. 5.1.
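Latent GOLD carries out the actual estimation. Purely to illustrate how a fitted model of this kind maps covariates and indicators to class probabilities, the sketch below codes a multinomial logit membership model, an adjacent-category logit measurement model, and Bayes' rule for posterior membership. All parameter values, function names, and the two-class, three-level setup are hypothetical and are not the estimates reported in this paper.

```python
import math


def softmax(utilities):
    """Turn a list of utilities into probabilities that sum to one."""
    peak = max(utilities)
    exps = [math.exp(u - peak) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def membership_probs(z, gamma0, gamma):
    """Multinomial logit: P(k | z) from class intercepts and covariate effects."""
    return softmax([g0 + sum(g * x for g, x in zip(gk, z))
                    for g0, gk in zip(gamma0, gamma)])


def indicator_probs(intercepts, slope, level_scores):
    """Adjacent-category logit: P(y = m | k) for every level m of one indicator."""
    return softmax([b0 + slope * s for b0, s in zip(intercepts, level_scores)])


def posterior(z, y, gamma0, gamma, intercepts, slopes, level_scores):
    """P(k | z, y) via Bayes' rule, with indicators independent given the class."""
    prior = membership_probs(z, gamma0, gamma)
    joint = []
    for k, pk in enumerate(prior):
        lik = 1.0
        for t, level in enumerate(y):
            lik *= indicator_probs(intercepts[t], slopes[t][k], level_scores)[level]
        joint.append(pk * lik)
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical setup: 2 classes, 1 covariate, 2 indicators with 3 levels each.
gamma0 = [0.0, -0.5]              # class intercepts (class 0 as reference)
gamma = [[0.0], [1.0]]            # covariate effects per class
intercepts = [[0.0, -0.2, -0.8],  # level intercepts, indicator 0
              [0.0, 0.1, -0.4]]   # level intercepts, indicator 1
slopes = [[0.0, 1.2],             # class effects, indicator 0
          [0.0, -0.9]]            # class effects, indicator 1
level_scores = [0, 1, 2]          # scores assigned to the ordinal levels

post = posterior([1.0], [2, 0], gamma0, gamma, intercepts, slopes, level_scores)
print([round(p, 3) for p in post])
```

The posterior probabilities returned by a function like `posterior` play the role of the case weights that are later passed to the cluster-specific trees.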

Decision Trees
Decision trees (DT) can be used to tackle both regression and classification problems. In this paper, we treat the five-level MOD Transit Preference variable as continuous and fit regression trees to explain the heterogeneity in people's travel preferences. DT can automatically capture complex structure in high-dimensional data and is well known for its intelligible graphical representation and transparent interpretation. Although there are many methods to fit DT, the classification and regression trees (CART) algorithm is probably the most popular one for tree induction (Breiman et al., 1984). The following description focuses on the regression part of CART.
DT recursively partitions the feature space into sub-regions until some stopping rule is applied (Hastie et al., 2009). Suppose each observation is denoted by $(x_i, y_i)$ with case weight $w_i$. For a splitting variable $j$ and split point $s$, Eq. (4) defines the pair of half-planes $R_1(j, s)$ and $R_2(j, s)$. We then seek the splitting variable $j$ and split point $s$ that solve the minimization problem in Eq. (5). For any $j$ and $s$, the inner minimization is achieved by $\hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s))$ and $\hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s))$, where $\mathrm{ave}(\cdot)$ denotes a weighted average. After finding the best split, we partition the data into two regions and repeat the partition process within each region until a stopping criterion is reached. The resulting large tree is denoted by $T_0$. However, a very large tree tends to overfit the data, so we need to control the tree size to achieve the best out-of-sample performance. Therefore, the tree is often pruned by using cost-complexity pruning (Hastie et al., 2009). The cost-complexity criterion $C_\alpha(T)$ is given in Eq. (6), where $|T|$ is the number of terminal nodes (leaves) in tree $T$, $\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m)$, and $\alpha$ is the complexity parameter. Here, we aim to find, for each $\alpha \geq 0$, the subtree $T_\alpha \subseteq T_0$ that minimizes $C_\alpha(T)$. There is clearly a trade-off between tree size and goodness-of-fit to the data. We can select a value of $\alpha$ by using cross-validation, and then return to the entire dataset to output the subtree corresponding to $\alpha$.
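Following the regression-tree formulation in Hastie et al. (2009), Eqs. (4) to (6) can be written as below. This is a reconstruction from the surrounding definitions; placing the case weights $w_i$ inside the sums is an assumption, chosen because it is the form under which the weighted average is the minimizer of the inner problem.

```latex
\begin{align}
R_1(j, s) &= \{x \mid x_j \le s\}, \qquad R_2(j, s) = \{x \mid x_j > s\} \tag{4} \\
\min_{j,\, s} &\left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} w_i (y_i - c_1)^2
              + \min_{c_2} \sum_{x_i \in R_2(j, s)} w_i (y_i - c_2)^2 \right] \tag{5} \\
C_\alpha(T) &= \sum_{m=1}^{|T|} \sum_{x_i \in R_m} w_i (y_i - \hat{c}_m)^2 + \alpha |T| \tag{6}
\end{align}
```

With $\alpha = 0$ the criterion is minimized by the full tree $T_0$; larger values of $\alpha$ penalize each additional leaf and therefore favor smaller subtrees.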
A key output of DT is variable importance, which assesses the impacts of independent variables on the DT model's prediction. In our case of regression trees, variable importance is estimated by the decrease in node impurities from splitting on the variable, where node impurity is measured by residual sum of squares.
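As a rough illustration of the split search and of the impurity decrease that feeds variable importance, the following sketch finds the best split on a single numeric variable under case weights. It is a toy implementation written for this explanation, not the rpart code; the function names and data are hypothetical.

```python
def weighted_rss(ys, ws):
    """Weighted residual sum of squares around the weighted mean."""
    total_w = sum(ws)
    if total_w == 0:
        return 0.0
    mean = sum(w * y for y, w in zip(ys, ws)) / total_w
    return sum(w * (y - mean) ** 2 for y, w in zip(ys, ws))


def best_split(xs, ys, ws):
    """Return (split_point, impurity_decrease) for one numeric variable.

    Candidate split points are midpoints between consecutive distinct x values.
    The impurity decrease is the parent RSS minus the summed RSS of the two children.
    """
    parent = weighted_rss(ys, ws)
    values = sorted(set(xs))
    best = (None, 0.0)
    for lo, hi in zip(values, values[1:]):
        s = (lo + hi) / 2
        left = [(y, w) for x, y, w in zip(xs, ys, ws) if x <= s]
        right = [(y, w) for x, y, w in zip(xs, ys, ws) if x > s]
        child = weighted_rss(*zip(*left)) + weighted_rss(*zip(*right))
        if parent - child > best[1]:
            best = (s, parent - child)
    return best


# Toy data: preference falls once accessibility passes a threshold.
access = [1, 2, 3, 10, 11, 12]
pref = [4, 4, 4, 2, 2, 2]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

split, gain = best_split(access, pref, weights)
print(split, gain)
```

Accumulating such decreases over every node that splits on a given variable yields an importance score for that variable in the spirit of the measure described above; rpart's own measure additionally credits variables that act as surrogate splits.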
In this paper, we apply the CART algorithm by using the R package rpart (Therneau et al., 2015), and the trees are visualized using rpart.plot (Milborrow, 2020). We use grid search to tune the main hyperparameters of DT models, including minsplit (the minimum number of observations that must exist in a node in order for a split to be attempted), maxdepth (the maximum depth of any node of the final tree, with the root node counted as depth 0), and cp (complexity parameter). For the benchmark DT model (trained on the overall sample), minsplit = 20, maxdepth = 6, and cp = 0.013. For the DT model built for Cluster 1, minsplit = 14, maxdepth = 4, and cp = 0.017; for Cluster 2, minsplit = 18, maxdepth = 3, and cp = 0.010; for Cluster 3, minsplit = 18, maxdepth = 10, and cp = 0.014.

Latent Class Cluster Analysis
To select the optimal number of latent classes, we run the LCCA model with the number of clusters varying from 1 to 10. The Bayesian Information Criterion (BIC = 4561.57) indicates that the 3-cluster solution has the best model fit after penalizing model complexity; the solution is also readily interpretable. As such, we choose the 3-cluster LCCA as the final model. Table 2 presents the parameters and z-values of both the membership and measurement models of the 3-cluster LCCA solution. We name the clusters and develop their profiles based on the cluster-specific distributions of the indicators and covariates (see Table 3).
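The selection rule can be sketched as follows. The log-likelihoods and parameter counts below are invented for illustration and are not the Latent GOLD output; only the sample size of 825 comes from the text.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 * LL + (number of parameters) * ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical fits: number of clusters -> (log-likelihood, free parameters).
fits = {
    1: (-2400.0, 8),
    2: (-2250.0, 18),
    3: (-2180.0, 28),
    4: (-2165.0, 38),
}
n_obs = 825  # retained survey responses

scores = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best_k = min(scores, key=scores.get)
print(best_k)
```

Adding a cluster always raises the log-likelihood, so the penalty term is what stops the criterion from favoring ever larger models: the preferred solution is the one where the gain in fit no longer offsets the extra parameters.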
As shown in Table 3, Cluster 1 is the largest of the three clusters, comprising 50% of the full sample. Cluster 1 members have an average ridehailing usage frequency of 2.03, indicating that they used ridehailing services more than twice in the past week, which is the highest among the three clusters. Meanwhile, Cluster 1 members also have a relatively high fixed-route transit usage frequency (2.44). As such, we name Cluster 1 "shared-mode user." The shared-mode user cluster comprises a slightly larger proportion of males than the sample average (53% versus 48%). Among the three clusters, shared-mode users have the largest proportions of individuals who are younger than 40 years old (71%) and who hold college degrees (64%). They also have the highest household income. A large proportion of individuals from this cluster own a vehicle (88%), whereas 11% and 15% of individuals do not have a smartphone or a data plan, respectively.
Cluster 2 comprises 29% of respondents in the sample. Their average ridehailing and fixed-route transit usage frequencies are 0.26 and 0.39, respectively, which are among the lowest of the three clusters. Reflecting their low usage of shared modes, we name Cluster 2 "shared-mode non-user." The shared-mode non-user cluster contains more females than males (64% versus 36%). More than half of the individuals in this cluster have a college degree (54%). Moreover, shared-mode non-users have the highest proportions of vehicle owners (94%), smartphone owners (97%), and data plan owners (97%) among the three clusters.
Cluster 3 comprises 21% of respondents in the sample. Cluster 3 members have the lowest usage of ridehailing services (0.20) and the highest fixed-route transit usage frequency (2.98) among the three clusters. Thus, we name Cluster 3 "transit-only user." Compared to the other two clusters, the transit-only user cluster has the largest proportion of elderly people (60 years and above, 17%), and the largest proportion of the low-income group (63% of the individuals have a household income less than $25,000). Only 5% of individuals from the transit-only user cluster have college degrees and only 32% own vehicles, which are much lower than the corresponding shares in the other two clusters. The transit-only user cluster also has the highest proportions of individuals who do not have smartphones (21%) or data plans (23%) among all three clusters.

Decision Trees
As illustrated in Figures 3-6, four different regression trees have been generated. Specifically, Figure 3 is the DT for the overall sample of travelers; Figures 4-6 correspond to the traveler profiles of the three clusters. As listed in Table 1, MOD Transit Preference is selected as the response variable, while seven other variables are chosen as the independent variables. The selection of independent variables is mainly based upon the results from Yan et al. (2019b), which found these seven variables to be statistically significant when used to model people's stated preferences for the MOD transit service. We use mean absolute error (MAE) to measure the performance of the DT models. MAE is formally defined in Eq. (7), where $\hat{y}_i$ is the predicted value for observation $i$, $y_i$ is the true value for observation $i$, and $n$ is the number of observations in the testing set. The overall MAE estimate from the joint DT models for $K$ clusters is computed with Eq. (8), where $\hat{y}_{i,k}$ is the predicted value for observation $i$ from the DT model for Cluster $k$, and $p_{i,k}$ is the probability that observation $i$ belongs to Cluster $k$, with $\sum_{k=1}^{K} p_{i,k} = 1$. By using leave-one-out cross-validation and Eqs. (7) and (8), we estimate that the MAE of the DT model for the overall sample is 0.833, while the overall MAE from the joint DT models for the three clusters is 0.829. Hence, we find that by applying the proposed framework illustrated in Figure 2, the LSDT method shows similar (or even slightly better) predictive accuracy than the benchmark DT model.
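A minimal sketch of the two error measures follows, under one plausible reading of Eqs. (7) and (8): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|$ for a single model, and $\frac{1}{n}\sum_{i=1}^{n} \left|\sum_{k=1}^{K} p_{i,k}\,\hat{y}_{i,k} - y_i\right|$ for the joint models, i.e., the cluster-level predictions are blended by membership probability before the error is taken. Blending first is an assumption on our part (weighting each cluster's absolute error by $p_{i,k}$ is another possible reading), and the function names and data are hypothetical.

```python
def mae(predictions, targets):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - y) for p, y in zip(predictions, targets)) / len(targets)


def joint_mae(cluster_predictions, memberships, targets):
    """MAE of the probability-blended prediction sum_k p[i][k] * yhat[i][k]."""
    blended = [sum(p * yhat for p, yhat in zip(probs, preds))
               for probs, preds in zip(memberships, cluster_predictions)]
    return mae(blended, targets)


# Hypothetical held-out data: three respondents, two cluster-level trees.
targets = [4, 2, 5]
cluster_predictions = [[4.0, 3.0], [3.0, 2.0], [4.5, 3.5]]  # yhat[i][k]
memberships = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]           # p[i][k], rows sum to 1

print(joint_mae(cluster_predictions, memberships, targets))
```

In a leave-one-out setting, each entry of `cluster_predictions` would come from trees fitted without the corresponding respondent, so that both error measures are computed on held-out observations.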
For the fitted trees (see Figures 3-6), each box denotes a tree node, and the nodes at the bottom are called terminal nodes. In each node, we indicate the total number of observations belonging to this node, the corresponding percentage of observations in the node, and the average value (i.e., the fitted value) of the dependent variable (i.e., MOD Transit Preference) among all the observations in this node. The coloring of the node boxes is based on the fitted value: the darker the blue, the larger the fitted value. Under each node, the left branch indicates 'yes' to the condition listed there, while the right branch denotes 'no' to the condition.
In Figure 3, we show the DT built for the overall sample of travelers. The primary split for the overall sample is based on Job Accessibility by Transit, which is also the case for the three cluster-specific DT models. An important insight is that when job accessibility is very high (above 52k), travelers are in general more favorable toward fixed-route transit service; when job accessibility is below 52k, people are more open to MOD transit, but have much more complex decision rules. For example, Node 13 indicates that travelers with job accessibility less than 4,025 (much lower than the mean of job accessibility, i.e., 10,261), previous ridehailing experience, and a college degree are very supportive of MOD transit (with a fitted value of 4.4); they make up 22% of the overall sample. Therefore, we may conclude that people who have high job accessibility can travel to work easily by using the existing fixed-route transit services. In other words, fixed-route transit may have already met their travel demands; as such, they do not necessarily need MOD transit. In contrast, MOD transit can serve as an affordable alternative for people who currently have low job accessibility.
Figures 4 to 6 show the DT models for the three latent clusters. These three DT models take the entire sample as the input data (i.e., n = 825 at the top node), but different case weights (estimated from LCCA to represent each individual's probability of belonging to each cluster) are applied when fitting the models for different clusters. Note that, for some nodes, the number of observations may seem inconsistent with the percentage of observations in the node. Taking the DT for Cluster 1 as an example (see Figure 4), Node 2 consists of 473 observations and 45% of the sample, while Node 9 has 352 observations and 55% of the sample. This is because the percentage shown is a weighted percentage computed with the case weights (i.e., probabilities of belonging to different clusters) passed to the CART algorithm (Milborrow, 2020). Therefore, when interpreting the DT models, we mainly focus on the percentage of observations instead of the absolute observation counts.
Figure 4 illustrates the DT built for Cluster 1 "Shared-mode user." This tree is similar to the DT for the overall sample, but Ridehailing Experience is not included in this tree. In addition, Node 1 (i.e., the root node) of the Cluster 1 model has a fitted value of 4.1, which is larger than the fitted values for the root nodes of the other two cluster-specific DT models. These results suggest that shared-mode users are more open to different shared modes and have higher preferences for MOD transit. In addition, the DT for Cluster 1 uses College Degree for splitting, which is not included in the other two cluster-specific DT models. Shared-mode users who have a college degree and low job accessibility (i.e., Node 11) are very supportive of MOD transit. This is consistent with the existing findings that travelers who are more highly educated are more open to new mobility options (Lavieri and Bhat, 2019).
Moreover, this DT model also shows that shared-mode users who have better job accessibility but live outside the transit service area are more willing to adopt MOD transit. This finding indicates the potential of MOD transit to tackle the infamous first-/last-mile problem in the U.S.
Figure 5 shows the DT built for Cluster 2 "Shared-mode non-user." This tree is much simpler than the DT models for the overall sample and the other two clusters. Only two variables, namely, Job Accessibility by Transit and Ridehailing Experience, are included in the model. As shown in Node 5, the majority (i.e., 65% of the sample) of the shared-mode non-users are approximately neutral when comparing fixed-route with MOD transit.
Figure 6 represents the DT built for Cluster 3 "Transit-only user." This tree is the most complicated one among the three cluster-specific DT models. Six different variables show up in this tree, in comparison to four included in the overall sample tree, three in the Cluster 1 tree, and two in the Cluster 2 tree. An important observation is that with relatively higher job accessibility (more than 19k), transit-only users have a higher preference for fixed-route over MOD transit. In contrast, according to the other two cluster-specific DT models, the threshold of job accessibility is much higher for choosing fixed-route over MOD transit (i.e., the fitted values of MOD Transit Preference less than 3): 52k for shared-mode users who also live within the transit service area (Node 4 in Figure 4) and 52k for shared-mode non-users (Node 2 in Figure 5). Compared to the other two types of travelers, who would choose fixed-route only if the job accessibility is exceptional, transit-only users tend to stick to fixed-route transit when the job accessibility is merely acceptable.
However, for the DT model built for the overall sample, the job accessibility threshold is 52k (Node 2 in Figure 3), which demonstrates that the proposed LSDT method can generate richer insights than a single DT could. We also find that with relatively lower job accessibility (less than 19k), transit-only users who have access to personal vehicles have a relatively lower preference for MOD transit than those who have no access to personal vehicles. But the difference is small, i.e., 3.3 for car owners versus 3.9 for carless people. Among those carless transit-only users, male travelers are more accepting of MOD transit than females. This observation is consistent with the results in Yan et al. (2019b), which also found that females might have safety concerns regarding the new MOD transit service. Besides, among those female transit-only users, despite acceptable job accessibility, having no data plan could lead to low acceptance of MOD transit (Node 12), which shows the importance of addressing the digital divide when deploying the new MOD transit system.
There exist several seemingly unreasonable nodes in the trees, i.e., Nodes 8 and 9 in Figure 3, Nodes 6 and 7 in Figure 4, and Nodes 5 and 6 in Figure 6. These nodes all share the same problem: with job accessibility below certain thresholds, travelers are less likely to choose MOD transit. As the DT models are purely data-driven without relying on any predefined assumptions, these anomalies are usually caused by noise or bias in the data and by overfitting of the DT models.
In Figure 7, we present the relative variable importance plots (scaled to sum up to 100% for each DT model) for the four DT models. We find that Job Accessibility by Transit is the most important variable for all four DT models. Thus, it seems that traveler preferences are mostly shaped by the destinations accessible via transit. This finding is consistent with the notion that accessibility, rather than mobility, represents people's basic need for transportation (Levine et al., 2019). On the other hand, whether living within or outside the transit service area shapes the preferences of "shared-mode users" (Cluster 1), indicating the importance of last-mile transit connectivity. Ridehailing Experience is the second-most important variable for the overall sample tree and the Cluster 2 tree. This indicates that for shared-mode non-users, having used ridehailing at least once in the past week is an important indicator to gauge traveler preferences for MOD transit. Moreover, Car Ownership and Male are important variables in the decision tree model for Cluster 3 ("Transit-only user"), but they play little role in the other models. According to the population profiles shown in Table 3, the vast majority of individuals in Cluster 1 and Cluster 2 have access to a data plan and own a personal vehicle. This lack of variability may explain why these variables are not important predictors of MOD transit preference in any model except the Cluster 3 model.
This finding further verifies the importance of fitting cluster-specific models, as an all-sample model may suppress the heterogeneous preferences across population segments. Interestingly, a lower preference for MOD transit exists among females in Cluster 3 but not among those in the other two clusters. A possible explanation is that some females in Cluster 3 might have had unpleasant experiences with, or hold negative perceptions of, ridehailing.
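The scaling behind relative variable importance (each tree's importances summing to 100%) can be sketched as follows. This is a minimal sketch that assumes one common impurity-based definition, in which each split's impurity decrease is weighted by the number of observations reaching the node. This section does not state which importance measure was used, and the variable names and numbers below are made up for illustration.

```python
from collections import defaultdict


def relative_importance(splits):
    """Scale per-variable split gains so that they sum to 100.

    `splits` is an iterable of (variable, n_samples_at_node, impurity_decrease)
    tuples, one per internal node of a fitted tree.
    """
    totals = defaultdict(float)
    for variable, n_samples, decrease in splits:
        # Weight each node's gain by how many observations reach that node.
        totals[variable] += n_samples * decrease
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {variable: 0.0 for variable in totals}
    return {variable: 100.0 * gain / grand_total for variable, gain in totals.items()}


# Hypothetical split records for one tree (values are made up for illustration).
example_splits = [
    ("job_accessibility", 100, 0.30),
    ("ridehailing_experience", 60, 0.20),
    ("job_accessibility", 40, 0.10),
]
importance = relative_importance(example_splits)
for variable, share in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{variable}: {share:.1f}%")
```

Because the shares are normalized within each tree, they are comparable across variables in one model but not directly across models fitted to different clusters.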

Discussion and Conclusion
According to the results presented in the previous section, we find that the LSDT method can generate much richer insights than a single DT model fitted for the overall sample. In particular, when combining the results from LCCA and the cluster-specific DTs, we can attach each traveler class profile to the corresponding decision rules for choosing between MOD transit and fixed-route services.
For example, the LCCA results for Cluster 3 suggest that the travelers in this cluster are the most vulnerable (i.e., the cluster has a larger proportion of older, low-income, carless, and technology-illiterate people than the other two clusters) yet the most dependent on public transit services. Investigating their decision rules shown in Figure 6, we find that people who currently enjoy very high job accessibility by transit want to stick with the fixed-route services. These people are very likely to live in the downtown area (Yan et al., 2019b), so we may want to keep running the fixed-route service in the downtown region, especially between major corridors. On the other hand, among people who have relatively lower accessibility and no access to personal vehicles, females are more reluctant to choose MOD transit than males due to safety concerns (Yan et al., 2019b). To successfully serve low-income neighborhoods located in lower-density areas, we therefore need to come up with innovative strategies to improve the safety of on-demand shuttles. One possible solution is for the on-demand shuttles to drop travelers off at a virtual stop located in the central area of the community, instead of at their doorsteps, in order to reduce the concerns of female travelers.
In contrast, the LCCA results show that Cluster 1 has the largest proportion of individuals who are technology-savvy, are younger than 40, hold college degrees, have high household income, and own a vehicle, and that these travelers use public transit and ridehailing services frequently. The DT model for Cluster 1 shows that this group is generally supportive of the MOD transit service, with only 4% of them somewhat inclined toward fixed-route transit (Node 4 in Figure 4). Hence, when MOD transit starts to operate, we probably will not lose much transit ridership among this small group, whereas the remaining majority are likely to use MOD transit to substitute for their fixed-route transit trips and, potentially, some of their ridehailing trips.
According to the LCCA results, travelers in Cluster 2 have the highest rates of owning personal vehicles, smartphones, and data plans, and they do not use public transit or ridehailing much in their daily life. The decision rules for Cluster 2 are quite simple and show that these people are generally neutral toward MOD transit, with one exception: individuals who currently have very low job accessibility (Node 6 in Figure 5) show high potential to adopt MOD transit in the future. Therefore, one insight is that when designing the MOD transit system, we need to expand the service area of the existing transit system and provide on-demand shuttles to fill the transit gaps left by the existing fixed-route services.
To summarize, the insights gained here can help transit agencies and transportation planners and engineers to design an inclusive MOD transit system with higher efficiency and effectiveness. They can also leverage our research findings to develop better-targeted strategies to promote MOD transit usage in low-income communities.
This study has some limitations. First, there is some sampling bias in the data collected in Ypsilanti, Michigan. Unlike in the Detroit data collection, we did not conduct in-person recruitment in Ypsilanti, so parts of the low-income population were under-represented in our sample. Second, DT models can be sensitive to small perturbations in the data, which can lead to unstable model structures.
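The sensitivity of tree structures to small perturbations can be illustrated by refitting the first split on bootstrap resamples and observing how much the chosen threshold moves. The following is a sketch under stated assumptions: the data are synthetic, the split criterion (squared error on a single variable) is chosen for simplicity, and the function names are hypothetical. It is not the procedure used in the study.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(xs, ys):
    """Single split on xs that minimizes the total squared error of ys."""
    pairs = sorted(zip(xs, ys))
    best_cut, best_err = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # resampling creates duplicates; no cut between them
        err = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if err < best_err:
            best_cut, best_err = (lo + hi) / 2, err
    return best_cut


def bootstrap_cuts(xs, ys, n_boot=20, seed=0):
    """Refit the first split on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    cuts = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        cut = best_threshold([xs[i] for i in idx], [ys[i] for i in idx])
        if cut is not None:
            cuts.append(cut)
    return cuts


# Synthetic data (an assumption for illustration): accessibility in thousands
# and a noisy rating that drops once accessibility exceeds 19.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 80) for _ in range(60)]
ys = [(4.0 if x < 19 else 3.0) + data_rng.gauss(0, 0.8) for x in xs]

cuts = bootstrap_cuts(xs, ys)
print("first-split threshold ranges from", round(min(cuts), 1), "to", round(max(cuts), 1))
```

A wide range of thresholds across resamples indicates that the first split, and hence everything below it in the tree, depends heavily on which observations happen to be in the sample.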
Future work should increase in-person recruitment among low-income communities to obtain a less biased sample for analysis. In addition, a model distillation approach could be considered to generate more stable DT models for interpretation (Zhou et al., 2018). Lastly, we want to emphasize that we do not advocate fully relying on the proposed method to make policy intervention decisions; instead, we suggest comparing the outputs from different approaches (i.e., our proposed method, logit models, and machine-learning methods) to generate more comprehensive results and insights for decision-making (Zhao et al., 2020).

References
Zhao X, Yan X, Yu A, Van Hentenryck P (2020) Prediction and behavioral analysis of travel mode choice: A comparison of machine learning and logit models. Travel Behaviour and Society 20:22-35
Zhou Y, Zhou Z, Hooker G (2018) Approximation trees: Statistical stability in model distillation. arXiv preprint arXiv:1808.07573