State of the Map 2019


Client-side route planning: preprocessing the OpenStreetMap road network for Routable Tiles

Harm Delva

Travelers have high expectations of their route planners. We explore how preprocessing techniques applied to Linked Open Data derived from OSM (Routable Tiles) can provide satisfying performance for client-side route planning.

Travelers have higher expectations than current route planning providers can fulfill, yet new solutions struggle to break through. Matching the user experience of existing applications is already challenging without the large-scale infrastructure most of them have at their disposal; integrating additional datasets such as the road network, public transportation schedules, or even real-time air quality data is an even more laborious endeavour.
The W3C and OGC recommend Linked Open Data as a best practice for publishing interoperable geospatial datasets. Instead of relying on proprietary data formats or monolithic CSV files, Linked Open Data uses the RDF data model as a framework for existing domain models. Every data element, and even the relations between them, receives a Uniform Resource Identifier (URI). Data publishers can reuse these identifiers to unambiguously refer to resources on the Web, thus making individual datasets more interoperable. The ultimate goal is automated data integration, giving even clients the power to execute queries. Client-side querying differs from traditional approaches but provides some advantages: (i) it takes the load off the service provider, (ii) the data can be cached for subsequent queries, and (iii) the user leaks less personal data.
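To make this concrete, the following sketch models OSM elements as linked data resources in TypeScript. The "@id" URIs are the canonical OSM element URLs; the property names (geo:lat, osm:hasNodes, ...) are illustrative stand-ins for the terms the actual Routable Tiles vocabulary defines.

    // Sketch: OSM elements as linked data resources. Every node and way is
    // identified by a dereferenceable URI, so any dataset can refer to the
    // same element unambiguously.

    interface RdfResource {
      "@id": string; // the URI that globally identifies this resource
    }

    interface OsmNode extends RdfResource {
      "@type": "osm:Node";
      "geo:lat": number;
      "geo:long": number;
    }

    interface OsmWay extends RdfResource {
      "@type": "osm:Way";
      "osm:hasNodes": string[]; // ordered node URIs forming the geometry
      "osm:highway"?: string;   // road classification tag, if any
    }

    // OSM node 1234 keeps the same identity in every dataset that uses it.
    const node: OsmNode = {
      "@id": "http://www.openstreetmap.org/node/1234",
      "@type": "osm:Node",
      "geo:lat": 51.0543,
      "geo:long": 3.7174,
    };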

The OSM road network has recently been published as routable Linked Open Data, following an approach similar to vector tiles (http://pieter.pm/demo-paper-routable-tiles/). However, executing route planning queries on the client is still an unsolved problem. Long-distance queries require large amounts of data, and downloading it all takes a long time. It also makes caching less effective: caches have a fixed capacity, and writing to a full cache evicts other data. Moreover, even when all the data ingredients are there, processing them may still take a long time. State-of-the-art route planning algorithms achieve better query execution times by using auxiliary data computed in a preprocessing phase. The biggest bottleneck in client-side querying, however, is bandwidth; downloading more data to improve query times would ultimately make querying even slower. Client-side route planning thus requires a different approach to match the quality of service of existing services.
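As an illustration, a minimal client-side fetch-and-cache sketch in TypeScript follows. It uses the standard slippy-map formula to convert a WGS84 coordinate into tile indices; the endpoint URL and the fixed zoom level of 14 are assumptions based on the demo paper linked above.

    // Sketch: fetching Routable Tiles with a simple in-memory cache.

    const ZOOM = 14; // assumed fixed zoom level of the published tiles

    // Standard slippy-map conversion from WGS84 coordinates to tile indices.
    function tileCoordinates(lat: number, lon: number, zoom = ZOOM) {
      const n = 2 ** zoom;
      const x = Math.floor(((lon + 180) / 360) * n);
      const latRad = (lat * Math.PI) / 180;
      const y = Math.floor(
        ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
      );
      return { x, y };
    }

    const cache = new Map<string, Promise<unknown>>();

    function fetchTile(lat: number, lon: number): Promise<unknown> {
      const { x, y } = tileCoordinates(lat, lon);
      const url = `https://tiles.openplanner.team/planet/${ZOOM}/${x}/${y}`;
      // Memoize by URL: subsequent queries over the same area reuse the
      // download instead of consuming bandwidth again.
      if (!cache.has(url)) {
        cache.set(url, fetch(url).then((res) => res.json()));
      }
      return cache.get(url)!;
    }

Note that the unbounded Map makes the fixed-capacity problem mentioned above visible: a real client would use an LRU cache so that long-distance queries do not evict everything else.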

We explore several ways of preprocessing the tiles to improve the user-perceived performance of query evaluation. As a first step, we compute how to efficiently traverse pedestrian areas. Only the boundary edges of these areas are defined in OSM, which means that most routing engines route along those edges, yielding suboptimal paths. Secondly, we identify which nodes and ways are actually needed to cross an individual tile, filtering out the elements that only serve local traffic. Queries then only need the full tiles around the departure and arrival locations. Finally, a hierarchical network graph is computed using an adaptation of the contraction hierarchies algorithm. Each step yields a different view of the tile data, and the results are published as Linked Open Data, in accordance with the W3C and OGC best practices.
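To illustrate the last step, the sketch below shows the node-contraction operation at the core of contraction hierarchies, assuming an undirected weighted graph. It is simplified: production implementations run a witness search (a bounded Dijkstra) before adding each shortcut, whereas this version only compares against an existing direct edge and may therefore insert superfluous shortcuts, while still preserving shortest-path distances.

    // Sketch: contracting a node while preserving shortest paths.

    type NodeId = string;

    class Graph {
      // adjacency map: node -> (neighbor -> edge weight), undirected
      private adj = new Map<NodeId, Map<NodeId, number>>();

      addEdge(a: NodeId, b: NodeId, weight: number): void {
        if (!this.adj.has(a)) this.adj.set(a, new Map());
        if (!this.adj.has(b)) this.adj.set(b, new Map());
        // keep only the cheapest edge between a pair of nodes
        const current = this.adj.get(a)!.get(b);
        if (current === undefined || weight < current) {
          this.adj.get(a)!.set(b, weight);
          this.adj.get(b)!.set(a, weight);
        }
      }

      weight(a: NodeId, b: NodeId): number | undefined {
        return this.adj.get(a)?.get(b);
      }

      // Contract v: for every pair of its neighbors (u, w), preserve the
      // shortest path that ran through v by inserting a shortcut u-w,
      // then remove v from the active graph.
      contract(v: NodeId): number {
        const nbrMap = this.adj.get(v);
        if (!nbrMap) return 0;
        const neighbors = [...nbrMap.entries()];
        let shortcuts = 0;
        for (let i = 0; i < neighbors.length; i++) {
          for (let j = i + 1; j < neighbors.length; j++) {
            const [u, wu] = neighbors[i];
            const [w, ww] = neighbors[j];
            const viaV = wu + ww;
            const direct = this.weight(u, w);
            if (direct === undefined || viaV < direct) {
              this.addEdge(u, w, viaV); // shortcut stands in for u-v-w
              shortcuts++;
            }
          }
        }
        // detach v and its incident edges from the active graph
        for (const [u] of neighbors) this.adj.get(u)!.delete(v);
        this.adj.delete(v);
        return shortcuts;
      }
    }

Contracting unimportant nodes first (dead ends, local streets) leaves a small core graph, which is what makes long-distance queries fast.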

We integrated the results into a route planner for public transportation, where the road network is used to route to and from public transport stops. We found that short-distance one-to-many queries, such as finding the closest nearby station, that initially took around 400 ms now complete in around 260 ms, with the first results presented after 140 ms. The difference grows over long distances: queries that used to take minutes to complete can now be answered in seconds. More importantly, downloading and querying the road network is no longer the main bottleneck. The majority of the querying time is currently spent on parsing the data files, which seems like a tooling issue that should resolve itself as the Linked Data ecosystem matures.