Pitfalls for performance: Typical latencies to keep in mind
"Less computation implies faster code": Sounds reasonable at a first glance,
but a closer inspection considering memory or disk latency numbers
of modern computer architectures quickly reveals this to be a fallacy.
In this talk I will discuss some ballpark latency numbers
every programmer should know and hint at implications
for designing algorithms in the context of high-performance computing
as well as scientific simulations.
In recent years a number of algorithmic approaches have appeared
in high-performance computing, which are no longer based on storing
intermediate results on disk or in memory, but which explicitly avoid
any kind of storage operations as much as possible. Often these
methods willingly pay the price of recomputing parts of the intermediate
results over and over. As counter-intuitive as it sounds at first,
this may reduce runtime provided that recomputation is faster than
typical latencies required for operating on the storage medium.
A key aspect to keep in mind in this development is typical
latency numbers for access into storage. For example
modern hardware typically allows to perform on the order of 1000
floating point operations per load from main memory.
Recent trends in memory band width and CPU design suggest
this number to increase in the future. In this sense
having a feel about latencies helps to judge, whether
storing data is worth it and not actually more expensive than
recomputing it.
In this talk I will discuss typical latency numbers
and in the light of this especially hint at approaches
to solving linear algebra problems,
which avoid access to main memory.
Most notably I will discuss so-called matrix-free or
contraction-based methods.
No knowledge of numerical linear algebra is required
to follow the talk, but naturally there will be maths
and some formulae, so beware!
### Attachments
- __[Slides](https://michael-herbst.com/talks/2018.09.08_mrmcd18_pitfalls_performance_latencies.pdf)__