The Myth of Exascale and the New Paradigm of Parallel Programming

The IT industry is facing the exascale challenge. That’s the challenge of building a supercomputer with 1 exaflops of scientific computing power (64-bit floating point), that is, one that performs a quintillion (10^18) calculations per second.

One thing is already clear: the days when these expensive computers had to be operated with programs developed specifically for them are over.

High performance computing (HPC) uses very large computers to solve some of the world’s toughest computing problems.

These “supercomputers” are used wherever the real-world problem is too large, too small, too fast, too slow or too complex for the resulting questions to be answered directly by experiment.

Many scientific breakthroughs are based on theoretical models, which are then simulated on the supercomputers.

In addition to theory building and experiments, simulation has long since established itself as the third pillar of science – in astrophysics, medicine, energy research, materials and life sciences, and climate research. The daily weather forecast, for example, comes from a supercomputer.

Economies, research institutes and companies that actively use modeling and simulation are among those that will, over time, gain competitive advantages and derive economic benefit from them.

As a result, there is currently a global scramble to invest in large, state-of-the-art HPC systems.

The exascale race

A computer rendering of the exascale computer Aurora, scheduled for delivery to Argonne National Lab in 2022

Many countries, above all China and the USA, but also Japan and the EU, are therefore investing heavily in HPC.

Because it’s so prestigious, there’s a real race, much like the moon landing, to get the very first exascale computer up and running.

Currently, even the best supercomputers reach only 20 to 50 percent of exascale performance.

The Japanese supercomputer Fugaku is currently the world’s top performer with 415.53 petaflops.

China wants to overtake the Japanese with the Tianhe-3, which is supposed to break the exaflops limit.

The big question is whether China will be the first to do so, because Intel also plays a leading role – as the main supplier for the Aurora system.

Aurora will be one of the first US exascale supercomputers: it is to be installed at Argonne National Lab in 2021 and is expected to be online by early 2022 at the latest, with at least 1 exaflops of performance.

Aurora will run on next-generation Intel Xeon processors and Xe GPUs and is expected to cost around $500 million.

It remains to be seen whether Aurora will be the world’s first exascale computer; the race seems wide open – and the outcome remains exciting.

Exorbitant increase in compute cores in supercomputers

One consequence of this development is the exorbitant increase in processor cores in supercomputers – the larger systems today have more than 100,000 cores, and Fugaku even has more than seven million.

Specifically, the current Exascale Challenge consists of taking the step from a good 10^15 flops (petaflops) today to the next milestone at 10^18 flops – i.e. 1 exaflops.

The following properties characterize the future exascale computers:

A large number of processors with low power consumption and many (possibly millions of) cores

Numerical accelerators – such as GPUs or FPGAs – with direct access to the same memory as the processors

Faster, larger and energy-saving memory modules with advanced interfaces

Novel high-bandwidth, low-latency network topologies connecting compute nodes and storage modules

While general-purpose processors continue to form the backbone of all computing infrastructure, accelerators are becoming more mainstream. The reason is obvious:

Graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) offer clear advantages for certain workload classes, especially in promising applications from areas such as artificial intelligence, analytics and big data.

The Exascale Challenge – also a software problem

All these developments make it clear that not only the hardware, but also the software and especially the applications represent the actual exascale challenge.

Such systems with more than a million CPU cores, GPUs and other accelerators need applications that are broken down into parts (threads/processes) that can be executed in parallel, i.e. that run at the same time – “concurrently”.
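
As a minimal illustration (a generic sketch, not code from the article), the following C++ program splits a simple reduction into chunks that separate threads process concurrently; the problem size and chunk count are arbitrary.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1'000'000;                        // arbitrary problem size
    const unsigned chunks =
        std::max(1u, std::thread::hardware_concurrency());  // one part per hardware thread

    std::vector<double> data(n, 1.0);
    std::vector<double> partial(chunks, 0.0);                // one partial result per thread

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < chunks; ++t) {
        workers.emplace_back([&, t] {
            // each thread reduces its own slice of the data, concurrently
            const std::size_t begin = t * n / chunks;
            const std::size_t end = (t + 1) * n / chunks;
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto &w : workers) w.join();                        // wait for all parallel parts

    const double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    return total == static_cast<double>(n) ? 0 : 1;
}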

Because every application type places different demands on computing power, the optimal computer architecture also varies.

For example, route planners have completely different requirements in terms of processing speed, interfaces and hardware features than autonomous driving, weather forecasting or a financial risk analysis.

To enable all these applications to use the best fitting hardware components in an exascale computer without proprietary software stacks, Intel developed oneAPI.

This is a standards-based solution that promises both portability and a programming interface close to the hardware – in other words, top performance.

As described on the following pages, oneAPI relies on the programming language Data Parallel C++ (DPC++ for short), which abstracts away from the hardware, but also on OpenMP (Open Multi-Processing) in version 5.0.
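
For illustration, here is a minimal DPC++/SYCL-style sketch of a data-parallel vector addition. It follows the SYCL 2020 interface (the header name and default device selection may differ between toolchain versions) and is a generic example, not code from Intel’s documentation.

#include <sycl/sycl.hpp>   // SYCL 2020 header shipped with recent DPC++ toolchains
#include <cstddef>
#include <vector>

int main() {
    constexpr std::size_t n = 1024;              // arbitrary example size
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q;                               // selects a default device (CPU, GPU, ...)
    {
        // buffers hand the host data over to the SYCL runtime
        sycl::buffer<float> bufA(a), bufB(b), bufC(c);

        q.submit([&](sycl::handler &h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only);

            // one work-item per element: the basic data-parallel pattern
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }   // buffers go out of scope here and the results are copied back into c

    return (c[0] == 3.0f) ? 0 : 1;
}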

This is an API for shared memory programming in C++, C and Fortran on multiprocessor computers, developed jointly by various hardware and compiler manufacturers.

Thanks to OpenMP, existing HPC application code can also be offloaded to a GPU, even if it was originally written for CPU-based systems.
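
To make that concrete, here is a hypothetical sketch of how a plain CPU loop (a saxpy kernel; the names and sizes are invented for the example) could be offloaded with OpenMP 5.0 target directives:

#include <cstddef>

// y = a * x + y: originally a plain CPU loop
void saxpy(std::size_t n, float a, const float *x, float *y) {
    // the map clauses copy x to the device and copy y back when the loop is done;
    // if no accelerator is present, the loop simply runs on the host
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}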

Exascale users can continue to use existing HPC applications with oneAPI, but also benefit from everything that is developed in the oneAPI ecosystem for AI or analytics.
