A proposal for a QGIS Plugin for Spatio-temporal analysis of OSM data quality: the case study for the city of Salvador, Brazil

Elias Nasr Naim Elias


This talk presents a proposal for a QGIS plugin for the spatio-temporal analysis of OSM data quality, applied to an area of Brazil.

Developing methodologies to evaluate geospatial data quality is one of the most important aspects to consider when obtaining such data. In developing countries such as Brazil, the lack of investment in maintaining topographic mapping, especially at large scales, is a recurrent challenge for the National Mapping Agencies (ANM) [1]. For example, studies reveal that some areas of Brazil have never been mapped and that topographic mapping at the 1:25,000 scale covers nearly 5% of the country's extent [2].
Technological advances have enabled a range of methodologies for obtaining geospatial data [3]. One example is Volunteered Geographic Information (VGI) [4], in which information can be updated faster and at lower cost than in traditional topographic mapping structures [5]. A successful case of VGI is the OpenStreetMap (OSM) platform, which has seen growth in the number of contributors and of contributions (mapped features). To understand the behaviour of OSM features and their potential for integration with topographic mapping, studies worldwide have evaluated their quality through extrinsic [6, 7] or intrinsic [8] aspects. Some studies have combined the two, such as [9], which evaluated the positional precision of OSM in combination with the editing history. In addition, recent research has focused on understanding spatial and temporal aspects of events in OSM contributions [10], as well as on developing add-ons for evaluating data quality, such as [11], which developed a QGIS toolbox to evaluate parameters of the intrinsic quality of OSM features.
The literature identifies data heterogeneity as one of the main challenges for integration processes, since quality may vary according to the study area, the indicator used, or even over time within the same region. In this context, to understand how well OSM resources fit the topographic mapping, it is crucial to connect aspects of data quality and heterogeneity. Studies such as [1] argue that, depending on the quality obtained, resources resulting from VGI may be used to integrate data, detect changes, or report errors. Classifying OSM resources according to their usability in a given region therefore becomes essential, especially in developing countries like Brazil. Moreover, research on the quality, heterogeneity, and contribution patterns of OSM is still not widespread in developing countries [12].
Despite the importance of classifying OSM features according to their usability in a given region, few researchers explore the quality, heterogeneity, and contribution patterns of OSM in Brazil. We therefore hypothesise that understanding the extrinsic and intrinsic quality of OSM features, in relation to the spatio-temporal aspects of contributions in developing countries, will support decision-making about how the dynamics of feature insertion influence quality.
Thus, the objective of this research is to evaluate the extrinsic quality of OSM features for the municipality of Salvador, Bahia, in the northeast region of Brazil. To that end, we investigated indicators of positional accuracy, thematic accuracy, and completeness, the visualisation of data heterogeneity, and the analysis of the editing history. To evaluate extrinsic quality, the OSM features were compared with the country's topographic mapping, namely the Cartographic and Cadastral System of the County of Salvador (SICAD, 2006) and features from the Urban Development Company of the State of Bahia (CONDER).
Positional and thematic accuracy were analysed through feature sampling procedures. Completeness was analysed by comparing the total number of available features. The categories verified were road system features and religious, educational, and health buildings. We divided the municipality of Salvador into sub-regions to identify different local patterns of quality in the analysis of thematic accuracy and completeness. Data heterogeneity is visualised through a plugin developed for QGIS, which performs the planimetric positional evaluation of point and line features. The statistical procedures in the plugin are based on the Brazilian legislation for evaluating geospatial data quality [13] and on the double buffer method proposed by [14]. The plugin is available in the online repository https://github.com/eliasnaim/AcuraciaPosicional_PEC-PCD. Even though the final results are expressed in terms of the Brazilian legislation, the procedures can be replicated elsewhere to obtain the discrepancies and make subsequent adjustments. We used the OHSOME Application Programming Interface (API) to identify patterns in the OSM editing history. By adapting scripts provided by researchers linked to OHSOME, it was possible to identify aspects of OSM contributions between 2008 and 2020. We also tested the generation of regression curves and calculated the number of daily contributions to identify these patterns. These verifications were carried out within a bounding rectangle of 5x5 km in the study area. The rectangle was positioned, through visual analysis, over an area with a larger quantity of OSM features.
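The planimetric evaluation of point features described above can be illustrated with a minimal sketch. It is not the plugin's code: the function names, the sample coordinates, and the tolerance values are hypothetical, and the acceptance rule assumes a criterion in which a given fraction of the discrepancies must fall below a tolerance and the RMSE must fall below a standard error. Coordinates are assumed to be in a projected system with metre units.

```python
import math


def planimetric_discrepancies(test_points, reference_points):
    """Horizontal distance between each pair of homologous points (same order)."""
    if len(test_points) != len(reference_points):
        raise ValueError("Both lists must contain the same number of points")
    return [
        math.hypot(xt - xr, yt - yr)
        for (xt, yt), (xr, yr) in zip(test_points, reference_points)
    ]


def rmse(values):
    """Root mean square of a list of discrepancies."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def meets_tolerance(discrepancies, tolerance, standard_error, fraction=0.90):
    """Assumed rule: enough discrepancies within tolerance and RMSE within standard error."""
    within = sum(1 for d in discrepancies if d <= tolerance)
    share_ok = within / len(discrepancies) >= fraction
    return share_ok and rmse(discrepancies) <= standard_error


# Hypothetical coordinates in metres, invented purely for illustration.
osm_points = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
reference_points = [(0.0, 0.0), (0.0, 0.0), (10.0, 11.0)]

d = planimetric_discrepancies(osm_points, reference_points)
print("discrepancies (m):", d)
print("RMSE (m):", round(rmse(d), 3))
print("accepted:", meets_tolerance(d, tolerance=6.0, standard_error=3.0))
```

In practice the tolerance and standard error would be derived from the target map scale and accuracy class defined in the applicable standard, and line features would need a different procedure, such as the double buffer method cited in the text.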
The evaluation of extrinsic quality highlighted the variability of the results obtained in [15]. In the analysis of positional accuracy, the scale found varied from 1:20,000 to 1:30,000, while the discrepancies between the mapped coordinates and the reference ones ranged from 0.12 m to 10.27 m. In the analysis of completeness, the road system reached 82%, while the other features ranged from 29% to 46%. In the analysis of thematic accuracy, the primary source of errors was the absence of names in the edits. In the analysis of the historical growth in represented features, we observed a near-linear function, with an R² value of 0.94. This supports the initial premise that contribution patterns can be modelled and associated with the saturation level of the quantity of elements added in a particular area. In addition, collaboration patterns can be affected by different variables: in 2016, more than 800 features were added in a short period. Such episodes may be related to events such as data imports or mapathons.
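As an illustration of how a near-linear growth pattern and its R² value can be checked, the sketch below fits an ordinary least-squares line to a cumulative feature count. It is a generic example rather than the scripts used in the study; the function name and the sample series are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical series invented for illustration: years since the first edit
# and cumulative number of features within the analysis rectangle.
years = [0, 1, 2, 3, 4, 5]
cumulative_features = [10, 120, 190, 330, 410, 500]

slope, intercept, r_squared = fit_line(years, cumulative_features)
print("features per year:", round(slope, 1))
print("R squared:", round(r_squared, 3))
```

A sudden jump in the series, such as a bulk import, would show up as a large residual against the fitted line, which is one simple way to flag unusual contribution events.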
Developing add-ons for evaluating OSM data quality, covering everything from the statistical procedures to the visualisation of data heterogeneity, will assist decision-making about data quality.
The magnitude of the discrepancies did not present patterns, and it may vary according to the editing period and the database used for the contributions. We noted the relevance of identifying aspects of quality and heterogeneity in OSM contributions.
For Brazil, identifying these characteristics may indicate numerically the potential for integrating these data with the authoritative mapping. It will also help estimate the influence of unusual agents on contributions, such as data imports. Further studies are recommended to identify the causes of the different growth patterns and to automate the quality procedures.