Damien Graux and Thibaud Michel
During the past decades, the European Commission has invested billions in research through various programmes, such as H2020. In this study, we exhaustively review all the open H2020 deliverables to analyse how these public European projects rely on OpenStreetMap.
Since 1984, the European Commission has been supporting research through successive programmes. Most recently, from 2014 to 2020, the EU invested approximately 80 billion euros in its eighth programme, named Horizon 2020 [1]. Among various focuses such as the excellence of science or industrial secondments, H2020 emphasised an open access policy for all research results [2]. Moreover, H2020 projects were strongly encouraged to use open source software and tools.
In practice, all research domains were eligible for support under the H2020 programme, and the scopes of the projects therefore range from computer science to philology, by way of agriculture. Technically, as these projects almost always involve several partners from multiple institutions located in several European member states, there is often a need to deal with data coming from different places. More generally, geo-data are often used to tag information, which may be research data, meeting locations, partner addresses, etc.
In such a context, where open source tools are recommended by the European Commission, we analyse the presence of OpenStreetMap in H2020 projects. We also review the presence of other geographic services such as Google, Bing and Baidu maps, in order to better understand how researchers tend to choose one over the other.
Thanks to the open access policy, participants of the H2020 projects had to make their results available. To do so, they submitted their various types of materials to the European portal, which then offers them publicly. As a consequence, for each project, one can access the articles (through DOIs), the blog posts, the slide decks, the deliverables, etc. In our study, we decided to focus on the deliverables, as they are directly accessible on the EC portal and are the common reports written by the partners to describe their approaches. Indeed, these deliverables (usually written on a regular basis during the project) report on the findings and the methodology set up to achieve the project’s goals, and their authors explain their architectural choices in depth, including the tools used. Cartographic services, if involved at some stage in the project, are therefore likely to be mentioned in these documents either as acronyms (e.g. OSM) or as website references (e.g. https://www.openstreetmap.org/).
In order to obtain the deliverables together with the projects’ information, we combined two European sources to gather all the facets we wanted to cover: CORDIS [3] and Data.Europa [4]. From CORDIS, we extracted various high-level information about the projects themselves: their names and acronyms, their durations, and the specific European funding calls they answered and were financed by. This latter category is useful for a finer-grained understanding of the domains which are prone to involve cartographic services. Next, Data.Europa was used to download the deliverables themselves, which required several days of computing resources.
Overall, during the course of the H2020 programme, 33636 projects were funded by the European Commission. Depending on their type of action, not all of them produced open deliverables (available on the Europa platform). In fact, a large part of these projects did not have deliverables per se but rather articles or web posts. We counted 25157 projects without deliverables, which restricted our study to the remaining 8479 projects. From these, we listed a total of 92612 distinct deliverables to be analysed, representing more than 260GB.
Technically, once all these deliverables were downloaded, we searched them for various terms to determine whether cartographic services are mentioned in the text. We therefore set up several regex rules (e.g. ‘open.?street.?map’ or ‘[^a-z0-9]osm[^a-z0-9]’) which were run over the 92000+ deliverables. This allowed us to systematically count all the occurrences of the considered cartographic solutions. In the end, we found that 1840 deliverables (from 651 projects) mention OpenStreetMap. More precisely, across all the H2020 deliverables, there are approximately 18600 mentions of OSM, 2800 of GoogleMaps, 226 of BingMaps and 4 of BaiduMaps. Empirically, we notice that 1) roughly one order of magnitude separates the occurrences of each cartographic service from the next and 2) OpenStreetMap is by far the most represented solution and thereby the one on which public European researchers rely the most. Contextually, it is also interesting to note that not all the deliverables mentioning “point of interest” (1796 of them) refer to a cartographic service.
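As an illustration, the counting step could be sketched as follows in Python. This is a minimal sketch and not the scripts shared in the repository: the function name `count_osm_mentions`, the lowercasing of the text and the padding with spaces are assumptions made for the example; only the two patterns come from the text above.

```python
import re

# The two example patterns quoted in the text.
OSM_PATTERNS = [
    re.compile(r"open.?street.?map"),
    re.compile(r"[^a-z0-9]osm[^a-z0-9]"),
]


def count_osm_mentions(text: str) -> int:
    """Count OSM mentions in the extracted text of one deliverable.

    Assumptions of this sketch: the text is lowercased so that the
    lowercase patterns also catch "OSM" and "OpenStreetMap", and it is
    padded with spaces so that an acronym at the very start or end of
    the text still has a delimiter on both sides.
    """
    padded = f" {text.lower()} "
    return sum(len(pattern.findall(padded)) for pattern in OSM_PATTERNS)


sample = "We use OSM and OpenStreetMap (https://www.openstreetmap.org/)."
print(count_osm_mentions(sample))
```

Because the second pattern consumes the delimiter on each side of the acronym, two acronyms separated by a single character are counted once; a lookaround-based variant would avoid this, at the cost of departing from the pattern as quoted.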
Moreover, we analysed the co-occurrence cases, where different cartographic providers are jointly mentioned within a single deliverable. Notably, there are not that many. Indeed, only 59 deliverables mention both OSM and BingMaps, out of the 226 occurrences of the latter, and only 291 deliverables mention both OSM and GoogleMaps, out of the 2800 occurrences of GoogleMaps. Besides, only 39 deliverables mention OSM, GoogleMaps and BingMaps together. Such figures suggest that once a group of researchers has chosen a cartographic solution, they tend to stick to it and do not try to compare it with others.
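Such co-occurrence counts reduce to set intersections once each service is associated with the set of deliverables mentioning it. The following Python sketch shows the idea; the function name `co_occurrences`, the dictionary layout and the deliverable identifiers are hypothetical and serve only as an example.

```python
def co_occurrences(mentions: dict[str, set[str]], *services: str) -> set[str]:
    """Return the deliverables that mention every one of the given services.

    `mentions` maps a service name to the set of identifiers of the
    deliverables in which it was found (a hypothetical data layout).
    """
    sets = [mentions.get(service, set()) for service in services]
    return set.intersection(*sets) if sets else set()


# Toy data with made-up deliverable identifiers.
mentions = {
    "osm": {"D1", "D2", "D3"},
    "googlemaps": {"D2", "D3", "D4"},
    "bingmaps": {"D3"},
}

print(len(co_occurrences(mentions, "osm", "googlemaps")))
print(len(co_occurrences(mentions, "osm", "googlemaps", "bingmaps")))
```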
Furthermore, regarding OpenSeaMap, we counted 312 mentions in 27 deliverables, 20 of which mention both OSM and OpenSeaMap, showing how closely connected the two initiatives are.
In this study, we systematically analysed all the available H2020 deliverables, searching for references to cartographic services, with a specific focus on OpenStreetMap. Our results show that OSM is the most used cartographic service in European H2020 projects in terms of mentions in the deliverables’ texts, followed by GoogleMaps with one order of magnitude fewer mentions. It is worth noting that the projects involving OSM were backed by almost 4 billion euros of public money.
Based on these first results, we plan to extend our scope of analysis along three axes. First, we think it could be worth also reviewing the other types of project results, such as the articles or the software source code bases. Second, we hope our approach paves the way for similar reviews of publicly funded initiatives, and we plan to apply our scripts to other European funding programmes. Third, additional cartographic services, such as ApplePlans or other OSM-related initiatives like OpenCycleMap, could be integrated into our pipelines in order to extend the covered scope.
Finally, for reproducibility purposes, we share on a public GitHub repository [5] all the scripts necessary to download the deliverables and generate the statistics. Furthermore, https://dgraux.github.io/OSM-in-H2020 provides additional detailed analyses together with visualisations, which we hope will help the community better understand the impact of OSM within the public European research landscape.