A Gentle Indroduction to Stream Processing
While “software is eating the world”, those who are able to best manage the huge mass of data will emerge out on the top.
The batch processing model has been faithfully serving us for decades. However, it might have reached the end of its usefulness for all but some very specific use-cases. As the pace of businesses increases, most of the time, decision makers prefer slightly wrong data sooner, than 100% accurate data later. Stream processing - or data streaming - exactly matches this usage: instead of managing the entire bulk of data, manage pieces of them as soon as they become available.
In this talk, Nicholas Fränkel defines the context in which the old batch processing model was born, the reasons that are behind the new stream processing one, how they compare, what are their pros and cons, and a list of existing technologies implementing the latter with their most prominent characteristics. He concludes by describing in detail one possible use-case of data streaming that is not possible with batches: display in (near) real-time all trains in Switzerland and their position on a map. He goes through the all the requirements and the design and, using an OpenData endpoint and the Hazelcast platform, gives a working demo implementation of it.