Playlist "State of the Map 2021"

Towards a framework for measuring local data contribution in OpenStreetMap

Maxwell Owusu

OpenStreetMap (OSM) constitutes a new open geographic database and offers several possibilities of adding local knowledge. While the importance of local knowledge is largely acknowledged in the OSM community, relatively few scientific studies have evaluated it. This study presents a framework to measure local data contribution in OSM and applies it to three case studies. The results highlight a framework for measuring local data in OSM as well as the distinct mapping stories of local OSM communities.

OpenStreetMap (OSM) has proven to be a valuable source of spatial data for many applications, including humanitarian aid. Information on buildings and roads, which can be provided by remote mapping, is of highest concern for many humanitarian applications. However, further information that can only be mapped on the ground is of high importance for finer-scale humanitarian action. Road surface information, type of material, and information on the use of a building (health site, school, ...) are highly relevant. OSM offers several possibilities of adding local knowledge [1]. Recent work deals with analyzing and classifying data production in OSM [2], and intrinsic analysis has gained popularity as an indicator for measuring the quality of OSM data [3–6]. Nevertheless, relatively few scientific studies have touched on "local knowledge" and local data in OSM in sufficient detail.
The question of how much local knowledge is added and what kind of local data is added remains unanswered. Addressing this question is important since only local knowledge provides access to the plethora of contextual information that is necessary for many purposes. The term "local knowledge" is often debated in the OSM community due to its ambiguity. Consequently, it is hardly taken into account by researchers when evaluating OSM [1]. This study presents a metric to measure local data contributions in OSM and analyzes temporal patterns of local contributions in three case studies. The aim of the metric is to identify archetypes of places representing a variety of contextual information.
Firstly, we evaluated Rebecca Firth's framework on OSM contribution types, which focuses on the humanitarian context (see Twitter post: https://t.co/rDaSraiVZF). Secondly, we discussed with local community working groups how to measure local data contributions ("What exactly are local OSM data to you?"). The outcome of the community discussion provided valuable information to design a generalized workflow for measuring local data contribution in OSM. Subsequently, we identified aspects on which the local communities agreed with respect to their perception of local data. Based on these first insights, we developed a classification schema for measuring local data in OSM that is "fit-for-purpose" for local OSM communities. This schema consists of four main levels, each with assigned OSM tags that can be used as indicators for that level. Thirdly, we explored the temporal evolution of local data in OSM for three unique regions. Mapping activities in these regions are influenced by local mapping organizations: (i) Ramani Huria in Dar es Salaam, Tanzania, focusing on flood resilience; (ii) Crowd2MAP, mainly operating in the Mara region, Tanzania, and focusing on identifying features that can support girls and women at risk of female genital mutilation; and (iii) the Power mapping project by Youth Mappers in Koindugu, Sierra Leone, focusing on mapping electrical grid infrastructure. We used the ohsome API to access the full history of OSM. For each region and localness level, we determined per month the density and the ratio (the sum of all OSM tags divided by the number of OSM elements).
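The monthly density query and the tags-per-element ratio described above can be sketched as follows. This is a minimal illustration, not the study's actual workflow: the bounding box, the filter string, and the function names are assumptions, and the response layout (a `result` list of `timestamp`/`value` entries) follows the public ohsome API documentation as we understand it. The tag and element counts passed to `tags_per_element` are assumed to have been obtained separately.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public ohsome API endpoint for element counts per square kilometre.
OHSOME_DENSITY_URL = "https://api.ohsome.org/v1/elements/count/density"


def build_density_query(bboxes, osm_filter, start, end):
    """Return request parameters for a density time series in monthly steps."""
    return {
        "bboxes": bboxes,              # "lon_min,lat_min,lon_max,lat_max"
        "filter": osm_filter,          # ohsome filter expression
        "time": f"{start}/{end}/P1M",  # ISO 8601 interval with a one-month period
    }


def fetch_monthly_density(params, url=OHSOME_DENSITY_URL):
    """POST the query and return (timestamp, value) pairs. Needs network access."""
    body = urlencode(params).encode("utf-8")
    with urlopen(url, data=body) as response:
        payload = json.load(response)
    return [(row["timestamp"], row["value"]) for row in payload["result"]]


def tags_per_element(tag_counts, element_counts):
    """Ratio of tags to elements per month; None where no elements exist."""
    ratios = {}
    for month, n_elements in element_counts.items():
        n_tags = tag_counts.get(month, 0)
        ratios[month] = n_tags / n_elements if n_elements else None
    return ratios


# Illustrative only: a rough box around Dar es Salaam and a level 1 style filter.
example_params = build_density_query(
    "39.0,-7.0,39.5,-6.6", "building=*", "2015-01-01", "2021-01-01"
)
```

Calling `fetch_monthly_density(example_params)` would send the request; it is kept separate from the parameter builder so the query can be inspected or logged before any network traffic occurs.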
The outcome of the community discussion showed that local mappers/editors had different perceptions of local knowledge. The type of local data produced depends on (1) the context within which the data is produced and (2) the character/interest of the individual mapping it. However, the local data produced could be broadly categorized as "core" or "specific". The "core" category consisted of the objects that cut across almost all projects or activities (e.g., buildings, roads, place names and administrative boundaries), whereas the "specific" category consisted of special elements mapped as a result of a particular interest or aim of the project (e.g., culverts, drains, access types, parking types).
To develop a metric for local data analysis, we classified OSM data into four main levels. Level 1 consists of objects that can be derived easily by remote mapping from satellite images, such as roads and buildings (this is information that does not require local knowledge). Level 2 focuses on place names and administrative boundaries, which are frequently imported. Level 3 focuses on the presence of general (e.g., residential and commercial) or specific amenities (e.g., school, clinic, and point of interest). Level 4 focuses on micro-data that provides further contextual information about an object (e.g., road:maxspeed, surface condition). Levels 1 and 2 mainly fall into the "core" category, whereas levels 3 and 4 mainly belong to the "specific" category (which will vary across different regions). Our results show that the number of features in OSM decreased from level 1 to level 4. The ratio between level 1 and level 4 could be used as an indicator of how widely local information is present in OSM at a specific location. It can thereby provide insights into the quality of the OSM data and its fitness-for-purpose for applications that need information beyond the existence of highways or buildings.
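As a rough sketch of how such a four-level schema could be applied, the snippet below assigns elements to levels by their tag keys and derives a level 4 to level 1 ratio. The key lists are illustrative assumptions based on the examples in the text, not the tag assignment used in the study, and the direction of the ratio (level 4 divided by level 1) is likewise an assumption.

```python
# Hypothetical tag keys per localness level, loosely following the examples
# in the text; the study's actual tag assignment is not reproduced here.
LEVEL_KEYS = {
    1: {"building", "highway"},           # remotely mappable objects
    2: {"place", "boundary"},             # place names, administrative boundaries
    3: {"amenity", "shop", "landuse"},    # general or specific amenities
    4: {"surface", "maxspeed", "access"}, # micro-data about an object
}


def levels_of(tags):
    """Return the set of levels for which the element carries at least one key."""
    return {level for level, keys in LEVEL_KEYS.items() if keys & tags.keys()}


def category(level):
    """Levels 1 and 2 are treated as "core", 3 and 4 as "specific" (a simplification)."""
    return "core" if level in (1, 2) else "specific"


def count_by_level(elements):
    """Count how many elements contribute to each level."""
    counts = {level: 0 for level in LEVEL_KEYS}
    for tags in elements:
        for level in levels_of(tags):
            counts[level] += 1
    return counts


def localness_ratio(counts):
    """Level 4 count divided by level 1 count; None if level 1 is empty."""
    return counts[4] / counts[1] if counts[1] else None


# Toy elements, invented purely for illustration.
toy_elements = [
    {"building": "yes"},
    {"highway": "residential", "surface": "unpaved"},
    {"place": "village", "name": "Example"},
    {"amenity": "school", "building": "yes"},
]
toy_counts = count_by_level(toy_elements)
```

An element can contribute to several levels at once (a road with a `surface` tag counts for levels 1 and 4), which matches the idea that level 4 adds context to objects that already exist at lower levels.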
From the temporal analysis, we observed that the number of features in OSM decreased from level 1 to level 4. The ratio between level 1 and level 4 shows how widely local information is present in OSM at a specific location. It thereby provides insight into the quality of the OSM data and its fitness-for-purpose for applications that need information beyond the existence of highways or buildings. Most of the mapping in the selected regions started in 2015. Looking more closely at the objects mapped, each selected region shows unique characteristics that are largely shaped by the interests of contributors/organizations. Mapping patterns differ clearly between regions with respect to the development of tags. For example, there was a high amount of local data on waterways, drainage, and solid waste in Dar es Salaam and very little in the Mara region and Koindugu. This reveals the distinct mapping stories of individuals or organizations. Our results further show that there is no common path from level 1 to level 2 to level 3 among the different regions. In Dar es Salaam, mapping of features of the three levels happened more or less simultaneously. Mapping in the Mara region focused first on place names (level 2) and then on amenities (level 3) as well as buildings and roads (level 1). For Koindugu, mapping of level 2 started as early as 2011 and was followed by mapping of buildings and roads (level 1) in 2014 and amenities (level 3) from 2017 on.
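One simple way to compare the order in which levels were mapped across regions is to find the first month in which each level's series exceeds a threshold and sort the levels by that month. The sketch below assumes monthly counts keyed by "YYYY-MM" strings; the function names, the threshold, and the toy numbers are illustrative assumptions (the years merely echo the Koindugu sequence described above, and the counts are invented).

```python
def first_active_month(series, threshold=0):
    """Return the earliest month whose count exceeds the threshold, else None."""
    for month in sorted(series):  # "YYYY-MM" strings sort chronologically
        if series[month] > threshold:
            return month
    return None


def mapping_order(series_by_level, threshold=0):
    """Return levels ordered by the month in which mapping first became active."""
    starts = {
        level: first_active_month(series, threshold)
        for level, series in series_by_level.items()
    }
    active = [(start, level) for level, start in starts.items() if start is not None]
    return [level for start, level in sorted(active)]


# Toy monthly counts, invented for illustration only.
toy_region = {
    1: {"2011-01": 0, "2014-01": 5, "2017-01": 9},
    2: {"2011-01": 2, "2014-01": 3, "2017-01": 3},
    3: {"2011-01": 0, "2014-01": 0, "2017-01": 1},
}
toy_order = mapping_order(toy_region)
```

Levels that start in the same month are ordered by level number, so a region where everything began at once yields `[1, 2, 3]` by construction; the start months themselves should be inspected before reading a sequence into the result.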
The classification schema helps to conceptualize a metric to measure the localness of OSM data at different levels of detail. This metric can easily be used to group OSM data into the categories "core" and "specific". By analyzing the temporal patterns, we found that the contribution of local data was highly unequal and largely depended on the interest of the mapper(s). The research sheds light on the richness of contextual information in OSM and provides an indication of data quality. In future research, we would like to extend the results presented here by including more regions and more perspectives from local OSM communities. By doing so, we hope to be able to extend the definition of local data by considering the editors' local knowledge as well.