Urban transportation systems play a critical role in shaping the quality of life in cities, and understanding travel behaviors is essential for effective transportation planning and policy development. Traditionally, travel surveys have served as the primary data source for studying travel behaviors, providing valuable insights into commuting patterns and travel preferences (Nakamya et al., 2007). Typically, travel surveys gather individuals’ socio-economic and demographic information, as well as a journey diary for a specific day that records the start and end locations and times, mode of travel, companions, and purpose of each trip (Hong et al., 2021). However, these surveys have several limitations that can affect the accuracy and comprehensiveness of the data collected. One is reporting bias: respondents may fail to recall, or may misreport, some of their travel activities, leading to underreporting or misrepresentation of certain trips (Clarke et al., 1981). Non-response bias is another concern, as certain groups of people may be less likely to participate in the survey, potentially skewing the representation of travel behaviors (Richardson et al., 1996). Moreover, travel surveys often cover a limited timeframe, capturing data for only a specific day or short period (Stopher et al., 2008), and they may rely on relatively small samples of participants (Stopher et al., 2011), which further limits how well they represent travel behaviors. To overcome these limitations and obtain a more comprehensive and precise understanding of travel behaviors, there is a growing need to explore alternative data sources that can complement and enhance the insights derived from travel surveys.
In recent years, travel data collection and analysis have been transformed by the rise of mobile devices and new data collection technologies. With the widespread adoption of GPS-enabled smartphones and other mobile devices, a valuable resource known as Mobile Device Location Data (MDLD) has become available (Yang et al., 2023). This data source captures the movements and locations of individuals with unprecedented precision and continuity throughout their daily routines (Hu et al., 2023). In contrast to conventional travel surveys, which rely on sporadic, self-reported data, MDLD records individuals' locations and movement trajectories directly from their devices (Ratti et al., 2006). This continuous stream of high-resolution location data provides a detailed picture of travel behaviors, including origin and destination points, travel routes, and durations spent at specific sites. Over extended data collection periods, MDLD reveals the temporal dynamics of travel patterns, such as differences between weekdays and weekends, peak and off-peak hours, and even seasonal variations in travel habits (Bachir et al., 2019). Additionally, MDLD exposes recurring travel behaviors, including individuals' habitual routes, preferred modes of transportation, and frequently visited destinations (Ashbrook & Starner, 2003).
Despite its promise, MDLD has its own limitations. One of the main challenges is the lack of explicit demographic information, which limits its standalone applicability for comprehensive travel behavior analysis (Rojas IV et al., 2016). Demographic factors play a crucial role in shaping travel patterns and preferences. For instance, age and employment status may influence the frequency and purpose of trips, with younger individuals and workers likely to travel differently from retirees or students (Su et al., 2020). Similarly, household size and income level can affect the choice of transportation mode and travel distance (Amoh-Gyimah & Aidoo, 2013). By combining MDLD with traditional travel survey datasets such as the Regional Travel Survey (RTS), researchers can bridge this gap and create a more robust and comprehensive dataset. Current research on and applications of integrating survey data with location data follow two main directions. The first is the GPS-based travel survey, in which participants are asked to use wearable GPS devices (Hawkins & Stopher, 2004) or smartphone apps (Safi et al., 2015) to record their travel activities. A significant advantage of GPS-based travel surveys is that participants' demographic information is still self-reported, which makes it more reliable and detailed. However, GPS-based surveys are prone to sampling bias because they depend on participants' willingness to take part, much like traditional travel surveys (Stopher & Greaves, 2007). The second direction applies population synthesis to location data, wherein socio-demographics in the location data are matched to the marginal control totals in aggregated census data (Janzen, 2017). However, this approach requires a sub-sample of location data with known demographics, which is often scarce and challenging to obtain (Bwambale et al., 2021).
This study proposes a novel approach to enhance travel behavior analysis through the integration of MDLD and RTS datasets based on record linkage algorithms. Two distinct data linkage approaches are utilized to connect individuals with similar travel behavior across datasets. The resulting data panels offer comprehensive and accurate representations of travel behaviors over multiple days, capturing not only peak-hour commutes but also off-peak travel and non-home-related trips. Additionally, they include representative population demographics, enhancing the overall richness and reliability of the dataset.
The primary objectives of this study are as follows:
- To integrate MDLD and RTS datasets using two data linkage approaches: one based on a probabilistic method and the other on similarity-based techniques.
- To construct data panels that offer a longitudinal perspective on individuals' travel behaviors, overcoming the limitations of traditional survey methods.
- To evaluate the effectiveness of each data linkage approach in capturing accurate travel patterns and compare the characteristics of the integrated data panels with the original RTS and MDLD datasets.
- To explore the representativeness of the data panels and discuss the implications of the findings for transportation planning and policy development.
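To make the idea of similarity-based linkage concrete, the following is a minimal illustrative sketch and not the method used in this study. It assumes each survey respondent and each device has already been summarized as a vector of travel features scaled to a common range; the feature choice, the cosine similarity measure, the identifiers, and the `threshold` value are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile carries no information to match on
    return dot / (norm_a * norm_b)


def link_by_similarity(survey_profiles, device_profiles, threshold=0.9):
    """Link each survey respondent to the most similar device profile.

    Both arguments map an ID to a feature vector. A respondent is left
    unlinked when no device reaches the (hypothetical) threshold. Several
    respondents may link to the same device; no one-to-one constraint
    is enforced in this sketch.
    """
    links = {}
    for person_id, person_vec in survey_profiles.items():
        best_id, best_score = None, threshold
        for device_id, device_vec in device_profiles.items():
            score = cosine_similarity(person_vec, device_vec)
            if score >= best_score:
                best_id, best_score = device_id, score
        if best_id is not None:
            links[person_id] = (best_id, best_score)
    return links


# Hypothetical, pre-scaled features:
# [trips per day, mean trip distance, share of trips in the morning peak]
survey = {"rts_1": [0.8, 0.2, 0.6], "rts_2": [0.1, 0.9, 0.3]}
devices = {
    "dev_a": [0.1, 0.8, 0.3],
    "dev_b": [0.7, 0.2, 0.5],
    "dev_c": [0.5, 0.5, 0.5],
}
links = link_by_similarity(survey, devices)
```

A probabilistic approach would instead weigh agreement and disagreement on each attribute to estimate the likelihood that two records belong to similar travelers; the sketch above only illustrates the similarity-based idea of picking the nearest profile above a cutoff.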
In the following sections, we present the data used for data linkage, the methodology employed for panel construction, the results of the linking process, and a comprehensive analysis of the data panels, discussing the implications and potential applications of our findings for transportation planning and policy development.