Towards understanding the quality of OpenStreetMap contributions: Results of an intrinsic quality assessment of data for Mozambique

Aphiwe Madubedube

Contributors of OpenStreetMap data for Mozambique, a country in Southern Africa, were classified into four distinct groups. The most active group included 25% of all contributors, most of them long-term contributors, and most features were last edited by members of this group. One can therefore conclude that the quality of the data is likely to be good; however, the data lacks completeness and the number of edits per feature is low. Even though no absolute statements about data quality can be made, the analysis provides valuable insight into the quality of the data and can inform efforts to improve it further.

OpenStreetMap (OSM) has made it possible for any volunteer to contribute geographic information, regardless of their level of experience or skills. Since the task of creating geographic information is no longer exclusively performed by trained professionals, data quality can be a concern. Uncertainty about the quality of data contributed by volunteers has been cited as a hindrance to its use (Mooney and Morgan, 2015). As the number of OSM contributors continues to grow, it becomes important to understand their characteristics and the kind of data they contribute. Data quality can be assessed extrinsically, i.e. against reference datasets, or intrinsically, i.e. by analysing the data itself. OSM data quality has often been assessed extrinsically (Girres and Touya, 2010; Haklay, 2010; Neis et al, 2012; Helbich et al, 2012; Mooney and Corcoran, 2012; Fan et al, 2014; Dorn et al, 2015). However, such reference data is not always available, and intrinsic assessment methods have therefore been employed (Anderson et al, 2018; Barron et al, 2014). For example, analysing contributors and their contributions can answer questions such as: What kind of contributors (e.g. experienced vs newcomers) have worked on the data in the area? In which areas should the data be validated or updated (e.g. where data has been contributed by newer contributors or by older, non-recurring contributors)?

In this study, contributors and their contributions to OSM in Mozambique, a country in Southern Africa, were analysed in order to gain insight into the quality of the data. We chose Mozambique because in 2019 it received a significant amount of attention in the OSM community following the flooding and damage caused by cyclones Idai and Kenneth. The OSM contributors were characterised in three steps: 1) OSM history data, containing information about the contributors and their contributions, was downloaded; 2) using cluster analysis, OSM contributors were classified according to their contribution characteristics; 3) based on the classification, the contributors of OSM data in Mozambique were characterised in order to gain insight into the quality of the data.
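Step 2 can be illustrated with a small sketch. The abstract does not name the clustering algorithm or the exact feature set, so the k-means implementation, the two features (total edits and active mapping days), the log scaling and the contributor names below are all assumptions made for illustration, not the method used in the study.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical contributors: (total edits, active mapping days).
contributors = {
    "user_a": (1200, 90),
    "user_b": (950, 70),
    "user_c": (3, 1),
    "user_d": (5, 1),
}

# Contribution counts are heavily skewed, so cluster on log-scaled values.
features = [(math.log1p(e), math.log1p(d)) for e, d in contributors.values()]
centroids, labels = kmeans(features, k=2)
groups = dict(zip(contributors, labels))
```

With real data, the number of clusters would be chosen from the data itself (the study arrived at four classes); k=2 is used here only to keep the toy example readable.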

OSM history data provides a record of all the edits (or changes) performed on OSM features. Each edit increments the feature's version number, and each version of a feature is associated with a contributor (called 'user'). The cluster analysis revealed four distinct classes of contributors. The most active class comprised 2,552 volunteers (25% of all contributors in the area) and had, on average, the highest numbers of changesets, total contributions, node contributions, way contributions, and ways and nodes for which they were the last user to make a modification. These volunteers are 'older' contributors who have sustained their contributions in the Mozambique area over a long period of time, with the first contributions dating back around 15 years. Compared to the other classes, they have mapped on more days and their average number of edits per feature is higher. Nevertheless, 99.6% of buildings and 84% of ways in the data had been edited twice at most. Buildings are almost entirely concentrated in the centre of Mozambique, in the areas for which mapathons were conducted. By comparison, the number of edits per feature is much higher in some European countries.
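The per-contributor measures and the share of rarely edited features described above can be derived directly from history records. The following is a minimal sketch, assuming the history has already been flattened into (feature id, version, user, date) tuples; the record layout, the sample values and the function name are hypothetical. It relies on the property stated above that each edit increments the version number, so the highest version of a feature equals its number of edits.

```python
from collections import defaultdict

# Hypothetical flattened history records: (feature_id, version, user, date).
history = [
    ("way/1", 1, "user_a", "2019-03-20"),
    ("way/1", 2, "user_b", "2019-03-21"),
    ("way/1", 3, "user_a", "2019-04-02"),
    ("way/2", 1, "user_c", "2019-03-20"),
    ("node/7", 1, "user_a", "2019-03-20"),
    ("node/7", 2, "user_a", "2019-04-02"),
]


def summarise(records):
    """Return per-user statistics and the share of features with at most two versions."""
    edits = defaultdict(int)   # user -> number of versions created
    days = defaultdict(set)    # user -> distinct dates with at least one edit
    latest = {}                # feature_id -> (highest version, user who made it)
    for feature_id, version, user, date in records:
        edits[user] += 1
        days[user].add(date)
        if feature_id not in latest or version > latest[feature_id][0]:
            latest[feature_id] = (version, user)

    # Count the features for which each user made the most recent edit.
    last_edited = defaultdict(int)
    for _, user in latest.values():
        last_edited[user] += 1

    per_user = {
        user: {
            "edits": edits[user],
            "active_days": len(days[user]),
            "last_edited": last_edited[user],
        }
        for user in edits
    }
    at_most_two = sum(1 for version, _ in latest.values() if version <= 2)
    return per_user, at_most_two / len(latest)


per_user, share_at_most_two = summarise(history)
```

In this toy dataset two of the three features have at most two versions; the same ratio, computed separately for buildings and ways, corresponds to the kind of figure reported above.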

In line with the results of other OSM contribution analyses (Neis and Zipf, 2012), most of the data generated in Mozambique has been contributed by a small group of active contributors who have dedicated a significant amount of time to this. Studies have suggested that such contributors are more likely to be experienced and knowledgeable about the project (Bégin et al, 2013; Budhathoki and Haythornthwaite, 2013; Barron et al, 2014; Yang et al, 2016) and are therefore more likely to produce data of good quality (Barron et al, 2014; Yang et al, 2016; Anderson et al, 2018). Even though no absolute statements can be made about the quality of the OSM data for Mozambique, analysing contributors and their contributions provides valuable insight into the quality of the data and can inform efforts to improve it further. The results of this study show how one can gain a better understanding of the community that contributes data in a specific area by inspecting history data. Intrinsic methods for evaluating data quality should not seek to replace ground truthing or the use of reference datasets (i.e. extrinsic methods), but rather to complement them by providing alternative ways to gain insight into data quality when reference data is not available.