Intel and Argonne National Lab on ‘exascale’ and their new Aurora supercomputer

The scale of supercomputing has grown almost too large to comprehend, with millions of compute units performing calculations at rates requiring, for the first time, the exa prefix, denoting quintillions of calculations per second.
How was this accomplished? With careful planning… and a lot of wires, say two people close to the project.

Having noted the news earlier this year that Intel and Argonne National Lab were planning to take the wrapper off a new exascale computer called Aurora (one of several being built in the U.S.), I recently got a chance to talk with Trish Damkroger, head of Intel's Extreme Computing Organization, and Rick Stevens, Argonne's associate lab director for computing, environment and life sciences. The two discussed the technical details of the system at the Supercomputing conference in Denver, where most of the people who can truly say they understand this type of work probably already were.
So while you can read in industry journals and the press release about the nuts and bolts of the system, including Intel's new Xe architecture and the Ponte Vecchio general-purpose compute chip, I tried to get a little more of the big picture from the two.

Intel and Cray are building a $500 million 'exascale' supercomputer for Argonne National Lab

It should surprise no one that this is a project long in the making, but you might not guess exactly how long: more than a decade.
Part of the challenge, then, was to establish computing hardware that was leagues beyond what was possible at the time. "Exascale was first being started in 2007. At that time we hadn't even hit the petascale target yet, so we were planning like three to four magnitudes out," said Stevens. "At that time, if we had exascale, it would have required a gigawatt of power, which is obviously not realistic. So a big part of reaching exascale has been reducing power draw."

Intel's supercomputing-focused Xe architecture is based on a 7-nanometer process, pushing the very edge of Newtonian physics; much smaller and quantum effects start coming into play.
But the smaller the gates, the less power they take, and microscopic savings add up quickly when you're talking billions and trillions of them. That merely exposes another problem, though: if you increase the power of a processor by 1,000x, you run into a memory bottleneck.
The system may be able to think fast, but if it can't access and store data equally fast, there's no point. "By having exascale-level computing but not exabyte-level bandwidth, you end up with a very lopsided system," said Stevens.
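To put rough numbers on what "lopsided" means here, consider a quick back-of-the-envelope sketch. The bytes-per-FLOP ratios below are my own illustrative assumptions, not Aurora's specifications; the point is only that at exaflop rates, even a modest appetite for data translates into petabytes or exabytes per second of aggregate bandwidth.

    // Rough, illustrative arithmetic only: how much aggregate memory bandwidth
    // an exaflop-class machine would need at a few assumed bytes-per-FLOP ratios.
    // The ratios are hypothetical examples, not Aurora's actual machine balance.
    #include <cstdio>

    int main() {
        const double exaflops = 1e18;                      // 1 exaFLOP/s = 10^18 operations per second
        const double bytes_per_flop[] = {0.01, 0.1, 1.0};  // assumed balance ratios

        for (double ratio : bytes_per_flop) {
            double bytes_per_second = exaflops * ratio;
            printf("%.2f bytes/FLOP -> %.0f PB/s of aggregate bandwidth\n",
                   ratio, bytes_per_second / 1e15);
        }
        return 0;
    }

Even at a hundredth of a byte per operation, that's tens of petabytes every second; at one byte per operation it's a full exabyte per second.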
And once you clear both those obstacles, you run into a third: what's called concurrency.
High-performance computing is as much about synchronizing a task across huge numbers of computing units as it is about making those units as powerful as possible.
The machine operates as a whole, and as such every part must communicate with every other part, which becomes something of a problem as you scale up. "These systems have many thousands of nodes, and the nodes have hundreds of cores, and the cores have thousands of computation units, so there's, like, billion-way concurrency," Stevens explained. "Dealing with that is the core of the architecture."
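Just to show how those factors multiply, here is the arithmetic with hypothetical round numbers of my own choosing, not the actual Aurora configuration:

    // Illustrative only: how node/core/unit counts multiply into billion-way
    // concurrency. The figures below are hypothetical round numbers, not the
    // real Aurora node counts.
    #include <cstdio>
    #include <cstdint>

    int main() {
        const std::uint64_t nodes          = 10000;  // "many thousands of nodes"
        const std::uint64_t cores_per_node = 100;    // "hundreds of cores"
        const std::uint64_t units_per_core = 1000;   // "thousands of computation units"

        std::uint64_t concurrent_units = nodes * cores_per_node * units_per_core;
        printf("~%llu concurrent computation units (about a billion)\n",
               (unsigned long long)concurrent_units);
        return 0;
    }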
How they did it, I, being utterly unfamiliar with the vagaries of high-performance computing architecture design, would not even attempt to explain.
But they seem to have done it, as these exascale systems are coming online. The solution, I'll only venture to say, is essentially a major advance on the networking side. The level of sustained bandwidth between all these nodes and units is staggering.

Making exascale accessible

While even in 2007 you could predict that we'd eventually reach such low-power processes and improved memory bandwidth, other trends would have been nearly impossible to predict: the exploding demand for AI and machine learning, for example.
Back then it wasn't even a consideration, and now it would be folly to create any kind of high-performance computing system that wasn't at least partially optimized for machine learning problems. "By 2023 we expect AI workloads to be a third of the overall HPC server market," said Damkroger.
"This AI-HPC convergence is bringing those two workloads together to solve problems faster and provide greater insight." To that end, the architecture of the Aurora system is built to be flexible while retaining the ability to accelerate certain common operations, for instance the type of matrix calculations that make up a great deal of many machine learning tasks.
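For the curious, the "matrix calculations" in question boil down to multiply-accumulate loops like the deliberately naive sketch below; real systems run heavily tuned, hardware-specific versions of the same pattern, so treat this as a generic illustration rather than anything Aurora-specific.

    // Naive matrix multiply (C = A * B for n x n matrices): the multiply-accumulate
    // pattern that dense machine-learning workloads spend most of their time in.
    // Deliberately unoptimized; production code uses tuned BLAS/oneMKL-style kernels.
    #include <cstdio>
    #include <vector>

    void matmul(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, int n) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                float sum = 0.0f;
                for (int k = 0; k < n; ++k)
                    sum += A[i * n + k] * B[k * n + j];  // multiply-accumulate
                C[i * n + j] = sum;
            }
    }

    int main() {
        const int n = 2;
        std::vector<float> A = {1, 2, 3, 4}, B = {5, 6, 7, 8}, C(4);
        matmul(A, B, C, n);
        printf("%g %g / %g %g\n", C[0], C[1], C[2], C[3]);  // 19 22 / 43 50
        return 0;
    }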
"But it's not just about performance; it has to be about programmability," she continued. "One of the big challenges of an exascale machine is being able to write software to use that machine. oneAPI is going to be a unified programming model. It's based on an open standard, Data Parallel C++, and that's key for promoting use in the community."
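For a flavor of what that programming model looks like in practice, here is a minimal SYCL-2020-style kernel of the kind Data Parallel C++ is designed to compile. It is a generic illustration of the queue-and-kernel pattern, not code from the Aurora software stack.

    // Minimal SYCL 2020 / Data Parallel C++-style example: double every element
    // of an array on whatever device the default queue selects (GPU if available).
    // A generic illustration of the programming model, not Aurora-specific code.
    #include <sycl/sycl.hpp>
    #include <cstdio>

    int main() {
        constexpr size_t n = 1024;
        sycl::queue q;                                // picks a default device

        float* x = sycl::malloc_shared<float>(n, q);  // unified shared memory
        for (size_t i = 0; i < n; ++i) x[i] = float(i);

        // Launch n work-items; each one doubles a single element.
        q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
            x[i] *= 2.0f;
        }).wait();

        printf("x[10] = %f\n", x[10]);                // expect 20.0
        sycl::free(x, q);
        return 0;
    }

The same source can, at least in principle, target CPUs or GPUs, which is the portability argument Damkroger is making.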
Summit, as of this writing the most powerful single computing system in the world, is very dissimilar to many of the systems developers are used to working on.
If the creators of a new supercomputer want it to have broad appeal, they need to bring it as close as possible to operating like a "normal" computer. "It's something of a challenge to bring x86-based packages to Summit," Stevens noted.
"The big advantage for us is that, because we have x86 nodes and Intel GPUs, this thing is basically going to run every piece of software that exists. It'll run standard software, Linux software, literally millions of apps."

I asked about the costs involved, since with a system like this it's something of a mystery how a half-billion-dollar budget gets broken down.
Really, I just thought it would be interesting to know how much of it went to, say, RAM versus processing cores, or how many miles of wire they had to run. Though both Stevens and Damkroger declined to comment, the former did note that "the backlink bandwidth on this machine is many times the total of the entire internet, and that does cost something." Make of that what you will.

Aurora, unlike its cousin El Capitan at Lawrence Livermore National Lab, will not be used for weapons development.

$600M Cray supercomputer will tower above the rest — to build better nukes

"Argonne is a science lab, and it's open, not classified, science," said Stevens.
"Our machine is a national user resource; we have people using it from all over the country. A large amount of time is allocated via a process that is peer reviewed and priced to accommodate the most interesting projects. About two-thirds is that, and the other third is Department of Energy stuff, but still unclassified problems."

Initial work will be in climate science, chemistry, and data science, with 15 teams between them signed up for major projects to be run on Aurora; details to be announced soon.