Understanding the protection of privacy when counting subway travelers through anonymization

Abstract

Public transportation, especially in large cities, is critical for livability. Counting passengers as they travel between stations is crucial to establishing and maintaining effective transportation systems. Various information and communication technologies, such as GPS, Bluetooth, and Wi-Fi, have been used to measure people’s movements automatically. In public transportation applications, the automated fare collection (AFC) system has been widely adopted as a convenient method for measuring passenger flows, mainly because it is relatively easy to identify cards uniquely and, in turn, to follow the movements of their holders. However, deploying such technologies raises serious concerns about privacy infringement, to the extent that the European Union’s General Data Protection Regulation (GDPR) forbids their straightforward deployment for measuring pedestrian dynamics unless explicit consent has been provided. As a result, privacy-preservation techniques (e.g., anonymization) must be used when deploying such systems. Against this backdrop, we investigate to what extent a recently developed anonymization technique, known as detection k-anonymity, can be adapted to count public transportation travelers while preserving privacy. In a case study, we tested our methods with data from Beijing subway trips. The results identify scenarios in which detection k-anonymity can be applied effectively and scenarios in which it cannot. Because the detection k-anonymity parameters are related in a complicated way, choosing proper parameter values can be difficult, and poor choices lead to inaccurate results. With detection k-anonymity, travelers between two locations can be counted with high accuracy. However, counting travelers across more than two locations yields less accurate results.

Publication
Computers, Environment and Urban Systems, 110. https://doi.org/10.1016/j.compenvurbsys.2024.102091
Mingshu Wang
Reader in Geospatial Data Science