Detecting informal settlements via topological analysis
We outline methods for a) extracting the geometry of street blocks in urban centres using
OSM and remote sensing data, b) generating approximate cadastral maps of a
block given contained building footprints, and c) quantifying residents’ ability to navigate within
blocks through topological analysis of cadastral maps. This topological metric, termed “spatial
accessibility” and denoted k = 1, 2, 3, ..., determines whether areas of a city are informal
settlements, as blocks where k > 2 contain cadastral parcels without direct access to formal road
networks. We analysed 1 terabyte of OpenStreetMap data for 120 low and middle income (LMIC) countries.
Determinations of whether a neighbourhood is underserviced depends greatly on local concerns, however topological analysis of building footprints in relation streetlevel access offers a pathway to near-universal criteria for determining whether a neighbourhood is potentially a slum or urban informal settlement. With a topological approach, it is possible to determine the number of building parcels a streetblock inhabitant must cross to access a potential services bearing street, i.e. one that enables for example access to emergency services, sanitation and piped water. Focusing on topological invariants allows us to analyse cities without respect to the specific morphology of their street network [1]. To create a global index of these under-serviced neighbourhoods, we extracted open-source data on building footprints and street networks and applied a topological analysis to each extracted street block to characterise the level of spatial accessibility. Using these techniques, we analysed 1 terabyte of OpenStreetMap (OSM) data to create an index of street block geometry, cadastral maps, and spatial accessibility calculations for 120 Low and Middle Income Countries (LIMCs) in the Global South. These results highlight the developing world’s most spatially inaccessible communities, enabling prioritisation of infrastructure investment and further study.
The primary data universe for our work involves three entities: 1) administrative boundary polygons, 2) building footprint polygons, and 3) street network line-string collections. The administrative data boundaries were sourced from the Database of Global Administrative Areas, while building footprints and street networks were sourced from OSM. Since OSM does not provide direct downloads, the data under consideration was obtained from a mirror of OSM hosted by GeoFabrik. This data reflects a snapshot of OSM as of July 2019.
Starting with the initial road network, the geometry of the street network can be extracted via the well-studied process of polygonization. In polygonization, the self-intersections of the street network are determined in order to define the block geometry. Most GIS software packages offer a polygonization feature (e.g. ST_MakePolygon from PostGIS). In implementation, however, we found a set-theoretic approach to be more stable and performant [2].
By buffering the one-dimensional line-strings comprising the street network with a small buffer radius, we obtain two-dimensional polygons capturing the outline of the street network. We then find the set-theoretic difference between the administrative boundary polygons and the buffered road-network polygon; this renders the negative area between the road network geometry as a collection of polygons. These negative areas are precisely the street blocks we are trying to extract.
With each building’s access to the formal road network - or lack thereof - analysed, we can also provide suggestions to the local community about ways to connect each building to the existing road network while respecting existing building footprints. This process, known as re-blocking, is in general, NP-hard. In contrast to previous algorithmic re-blocking efforts involving stochastic graph search [3], we approach the problem using Steiner tree approximations to the optimal road network.
To frame the universal access network as a Steiner tree problem, we segmented the building parcel boundaries to create “non-terminal” nodes at each segment boundary, and create a “terminal” node for each building parcel lacking direct street access by placing a node on the non-street parcel boundary closest to the building. We then construct a complete subgraph of all the terminal nodes, where the weight of each edge is the Euclidean distance between each pair of terminal nodes; each existing road segment has zero weight. Solving the minimum-spanning tree problem on this subgraph, and then recovering the original path segments, gives us the optimal new road network.
In the short future, we hope to repeat our analysis on denser sources of data on footprints and street network data from other providers (such as private satellite imagery companies). We expect that while OpenStreetMap has good coverage in some areas, we can use data in which footprints are automatically extracted from satellite imagery to supplement areas where coverage may be lacking. Additionally, we hope to re-block more neighbourhoods and cities and investigate the connections between an urban centre’s infrastructural topology and the amount of new infrastructure needed to be built to universally connect each building to the formal road network. Finally, we expect that our generated dataset could be used to study development and health outcomes, especially around questions of tenure security and emergency service provision.