ALF-Score++ - Transferability of a Predictive Network-Based Walkability Scoring System

Walkability is an important measure with strong ties to our health, yet notable gaps remain in the literature. Our previous work proposed new approaches to address existing limitations. This paper explores new ways of achieving transferability using transfer learning. Road networks, POIs, and road-related characteristics grow and change over time. Moreover, calculating walkability for all locations in all cities is very time-consuming. Transferability enables the reuse of already-learned knowledge for continued learning, reducing training time, resource consumption, and the number of training labels required, while improving prediction accuracy. We propose ALF-Score++, which reuses trained models to generate transferable models capable of predicting walkability scores for cities not seen in the process. We trained transfer-learned models for St. John's NL and Montréal QC and used them to predict walkability scores for Kingston ON and Vancouver BC. An MAE of 13.87 units (on a 0-100 scale) was achieved for transfer learning using MLP, and 4.56 units for direct training (random forest) on personalized clusters.


Introduction
Background

Ensuring ALF-Score's pipeline does not engage in repeated wasteful activities is one of the sub-objectives of this research. This is particularly important since road networks can vary in size, with some cities being very small (e.g., with a population of a few hundred) while others could be very large and dense (e.g., Tokyo, Japan, with a population of over 37 million people in just one city). Table 1 shows a list of various cities alongside their network size, number of POIs, population and total land area. Processing data from St. John's, NL as opposed to data from Toronto, ON will have significantly different resource requirements and time consumption due to the change in the size of the city, which introduces an extended set of complexities into the network. If the algorithms are not optimized, this difference in requirements may render the research infeasible. In this research we have experimented with all cities mentioned in Table 1; however, we will only highlight the results for Kingston ON, Vancouver BC, and Montréal QC.

Table 1. List of road networks for various cities with their network and POI sizes that have been experimented with in this research. For brevity, in this paper we mostly focus on the 3 cities of Kingston, Vancouver and Montréal. Nodes and edges are extracted from road networks. Population density and total land area information are excerpted from Wikipedia.

Transfer learning is the process of re-utilizing the knowledge learned from one task in other tasks. Many machine learning approaches focus on solving a single task at hand, but the development of approaches that help with transfer learning has become a very popular focus in recent years 15 . As with most real-world problems, specifically in machine learning, collecting labelled data is a time-consuming, expensive 16 and difficult task.
Transfer learning uses the knowledge learned from previous problems to solve new but related problems 17 . As a result of this approach, transfer learning can help reduce training time, resources and the required labeled data 18 , as well as improve overall accuracy. Weiss et al. 19 provide a much more formal definition of transfer learning as the following: "given a source domain D S with a corresponding source task T S and a target domain D T with a corresponding task T T , transfer learning is the process of improving the target predictive function by using the related knowledge from D S and T S ". In some cases, knowledge learned for one task can even be reused for other tasks without any more learning 21 . There are a few approaches to transfer learning, including feature extraction, training generalized models, and the use of existing pre-trained models. When it comes to feature extraction, determining the best representation for the problem at hand is a key task which, if done properly, can often lead to much better and more accurate results. Carefully selected features can often lead to a powerful and well-generalized model that can be applied to various related problems. Furthermore, using already available pre-trained models is yet another very popular option. In fact, there are numerous pre-trained models available online that provide ready weights for many popular tasks such as classifying certain types of images, object detection and object tracking. It is important to highlight that this approach only requires access to a previously trained model, and not the entire dataset. Additionally, another approach to solving a task using transfer learning, where there is not enough data available and no pre-trained models can be found, is to take the previous approach a step further and train models that are designed for another but similar task for which there is an abundance of data. These models can then act as a starting point to address the original task.
To highlight the difference from the previous approach: in this technique, to solve task A we do our own training on a similar task B. Once we are satisfied with the model, we can transfer and reuse this knowledge. Goodfellow in his book 21 further discusses two extreme forms of transfer learning, namely: 1) one-shot learning, in which only one labeled example of the transfer task is given, and 2) zero-shot learning, in which no labeled example is given.
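Weiss et al.'s definition quoted above is often written compactly. The following is a standard formalization (the symbol f_T(·) for the target predictive function is our addition; it does not appear verbatim in the excerpt):

```latex
\text{Given } (\mathcal{D}_S, \mathcal{T}_S) \text{ and } (\mathcal{D}_T, \mathcal{T}_T),\;
\text{transfer learning improves } f_T(\cdot)
\text{ using knowledge from } \mathcal{D}_S \text{ and } \mathcal{T}_S,\;
\text{where } \mathcal{D}_S \neq \mathcal{D}_T \text{ or } \mathcal{T}_S \neq \mathcal{T}_T.
```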

As an overview of the data used in this research, the general structure of our road network and feature set remains the same as the one described in our paper ALF-Score 13 . We collected a small set of user opinion data containing 1,050 user entries.

In this research, we were able to successfully achieve transferability for ALF-Score++. First, using the newly collected user opinion data for the city of Montréal QC, we were able to achieve a consistency of 99.6% during the GLEPO processing stage.

Various feature combinations and machine learning techniques were experimented with. Although ALF-Score++ agrees with Can-ALE in assigning higher walkability scores to the city center, the first major differentiator between the two is that in Can-ALE higher walkability is given to the central and highly populated areas of the city center, whereas ALF-Score++, while ranking the central region with higher walkability, recognizes the core as slightly less walkable compared to locations surrounding the core of the city center. Specifically, ALF-Score++ favours waterfront walkways and paths as more walkable, as opposed to Can-ALE. For instance, the area near Leon's Centre on Ontario Street is known to be a walkable area and is ranked with high walkability through ALF-Score's zero-user-input approach, whereas it is ranked with a significantly lower walkability score by Can-ALE.

Additionally, ALF-Score captured a cluster of greener/more walkable spots close to student housing and living quarters near Queen's University. While this area is popular among many students, faculty and other members of the public, Can-ALE was unable to capture it due to its area-based structure and lower spatial resolution. Moreover, we observed various other areas that ALF-Score++ ranked as walkable whereas Can-ALE failed to capture their actual walkability due to its lower spatial resolution.

To generate new models that incorporate new data, one does not need to rerun the entire process on the entire data sets. All that is required is to transfer the knowledge from previously trained models (which can be transfer-learned models themselves) and only run a smaller transfer learning task on the newly collected data. As we went through the predictive process, a variation was observed between the performance of shallow and deep models.

We believe ALF-Score and its various extensions, such as ALF-Score+ and ALF-Score++, can be very beneficial and act as powerful tools for many people from various backgrounds working in different domains. Although ALF-Score can produce results specific to various parameters, such as demographics, to provide personalized walkability scores, ALF-Score's pipeline takes a generalized approach instead, allowing issues that may not be related to walking or walkability to be addressed using this method: for instance, bikeability, school friendliness, transit friendliness, or even POI specialties based on different demographics and perceptions. Moreover, the pipeline may be capable of handling a wide variety of features as well as other types of networks instead of road networks, for example subway networks. At its core, ALF-Score requires a vector of user ground truth labels alongside a list of features. ALF-Score uses our dedicated web-tool to collect the ground truth labels and processes them through GLEPO to convert relative labels to absolute scores within a small group of users. However, ALF-Score's pipeline follows a black-box design and works with any compatible input data regardless of how it was prepared. The ground truth data can be processed according to individual researchers' needs, and this step can be bypassed in the pipeline.

Although walkability scores generated by ALF-Score and its extensions rely on road network data, the generalization offered by their pipeline can be further distilled beyond road networks. Road network data is treated like any other feature and can be replaced with an appropriate feature based on the issue at hand and the research requirements. We genuinely believe that ALF-Score opens the door to many possibilities well beyond the scope covered in this research.

The transfer learning process will then include features and user opinion from a second city. The output will be a more generalized model capable of transferring its knowledge to cities never seen during its training process. The personalization process utilizes this transfer learning approach to perform the same task on each separate profile cluster, resulting in multiple models capable of predicting personalized walkability scores for cities seen or never seen by the algorithm.

To prepare the map database, the first step is to gather the feature set, which includes various information such as POI, road embedding and road network data. The POI data is available freely through OpenStreetMap (OSM) 25 . We utilized Overpass to query each location's type as well as its coordinates.

As each POI is represented by a node on the road, we assign a value to 10 separate distance ranges, representing the number of POIs of a specific category within a specific distance range of a specific node. Based on 53 amenity categories, we can produce a POI feature list containing 530 feature columns and n rows, one for each unique node in the road network.
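A minimal sketch of this feature construction follows. The function and variable names are ours, not from the ALF-Score codebase, and straight-line distances stand in for whatever distance measure the pipeline actually uses:

```python
import numpy as np

def poi_feature_matrix(node_xy, poi_xy, poi_cat, categories, ranges):
    """Count POIs of each category within each distance range of every node.

    node_xy: (n, 2) node coordinates; poi_xy: (m, 2) POI coordinates;
    poi_cat: m category names; categories: all category names;
    ranges: sorted upper distance bounds, one per range (e.g. 10 bins).
    Returns an (n, len(categories) * len(ranges)) count matrix.
    """
    feats = np.zeros((len(node_xy), len(categories) * len(ranges)), dtype=int)
    cat_idx = {c: i for i, c in enumerate(categories)}
    for p, cat in zip(poi_xy, poi_cat):
        d = np.linalg.norm(node_xy - p, axis=1)  # distance from every node to this POI
        bins = np.searchsorted(ranges, d)        # which distance range each falls in
        for node, b in enumerate(bins):
            if b < len(ranges):                  # ignore POIs beyond the largest range
                feats[node, cat_idx[cat] * len(ranges) + b] += 1
    return feats
```

With 53 amenity categories and 10 distance ranges, this yields the 530 feature columns described above.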
forest), while the maximum depth of the trees was not limited. Most other parameters, such as the number of jobs to run in parallel, the number of features to consider when looking for the best split, and bootstrap sampling, were set to scikit-learn 37 defaults. Random forest is an ensemble approach. Ensemble learners aim to use multiple weak learners to build a strong learner that performs very well, taking a divide-and-conquer approach. Random forest uses the standard decision tree as its weak learner; multiple such trees then form a forest, which performs better as a group. Table 2 shows the difference in error between a random forest using 100 weak learners and a single decision tree; the random forest performs significantly better. There are two mechanisms in scikit-learn that, although not specifically labeled as transfer learning approaches, are geared toward transferring previously learned knowledge: warm_start and partial_fit (the latter is available on some scikit-learn estimators, though not on random forest itself). Warm start aims to fit an estimator repeatedly over the same data set but with varying parameters.

Using this approach, one can explore various parameters to improve performance while reusing the model learned from previous parameters, saving computing resources and time. Warm start is typically used for fine-tuning model parameters. Partial fit, on the other hand, aims to provide an online machine learning approach, maintaining fixed model parameters between calls while allowing new data in every call; each chunk of new data is called a mini-batch. Online machine learning is a method used to update the predictor sequentially as new data becomes available. This is the opposite of batch learning, where the training data set never changes.

Furthermore, MLP was used as a way to utilize deep learning, specifically as a doorway to transfer learning. In this paper, we work with transfer learning under the assumption that previously trained models of a similar task are available (through ALF-Score). The first step to initiate the transfer learning process is to import three sets of data: 1) previously trained MLP models, 2) GIS features such as POI, centrality and embedding features associated with the new city, and 3) user data such as user opinion and demographics associated with the new city. After successful import of the data, the usual data processing and preparation steps need to be taken, such as dealing with incomplete entries and processing features through one-hot encoding, where applicable. In this research we use TensorFlow 38 to facilitate the MLP training and transfer learning processes.
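These preparation steps might look like the following sketch; the column names and values are illustrative only, not the actual ALF-Score schema:

```python
import pandas as pd

# Hypothetical user-opinion records for the new city.
users = pd.DataFrame({
    "age_group": ["18-25", "26-40", None, "41-65"],
    "opinion":   [72.0, 55.0, 80.0, None],
})

users = users.dropna()                                # drop incomplete entries
users = pd.get_dummies(users, columns=["age_group"])  # one-hot encode categories
```

After this step, each categorical column is replaced by one binary column per category, ready to be fed to the MLP.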

TensorFlow is "a free and open-source software library for machine learning and artificial intelligence" that enables us to apply our transfer learning process. In our approach, we only import the models previously trained through this approach. ALF-Score uses various combinations of dense layers and numbers of neurons; Table 6 shows a brief set of example settings we have experimented with. To transfer a model generated/imported as above, the first step is to create a new Sequential model and copy the desired hidden layers from the original model over to the new model, excluding the output layer. We also need to ensure all transferred layers are frozen by setting them as non-trainable so the algorithm will not modify them. Next, we add a dense output layer to the new model with units set to 1 and the activation function set to linear.

Finally, we set the loss function to mean_absolute_error, the optimizer to adam and the metric to mean_squared_error, and compile and fit the new model. After a few iterations/epochs, we can try to unfreeze the reused hidden layers to allow back-propagation to modify and fine-tune them, and re-evaluate the performance. It is also suggested 21 to reduce the learning rate when these layers are unfrozen, to avoid large changes to weights that are already fine-tuned. A good rule of thumb is to train the model for the new task for a few epochs while the reused layers are frozen, then unfreeze the reused layers and continue to train,
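The recipe above can be sketched with Keras roughly as follows. The base architecture and data here are stand-ins; in ALF-Score++ the base model would be imported from disk rather than built in place:

```python
import numpy as np
from tensorflow import keras

# Stand-in for a previously trained ALF-Score MLP (layer sizes illustrative).
base = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),  # original output layer, to be discarded
])
base.build((None, 10))

# Copy the hidden layers (excluding the output layer) and freeze them.
model = keras.Sequential()
for layer in base.layers[:-1]:
    layer.trainable = False
    model.add(layer)

# Fresh linear output unit for the new city's walkability scores.
model.add(keras.layers.Dense(1, activation="linear"))

model.compile(loss="mean_absolute_error", optimizer="adam",
              metrics=["mean_squared_error"])

X = np.random.rand(64, 10).astype("float32")
y = np.random.rand(64).astype("float32")
model.fit(X, y, epochs=2, verbose=0)  # train with reused layers frozen

# Unfreeze and fine-tune with a reduced learning rate, then recompile.
for layer in model.layers[:-1]:
    layer.trainable = True
model.compile(loss="mean_absolute_error",
              optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              metrics=["mean_squared_error"])
model.fit(X, y, epochs=2, verbose=0)
```

Recompiling after changing trainable flags is required for the change to take effect; the lowered learning rate during fine-tuning follows the suggestion discussed above.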