Supercomputing 19 – Rumours from the Trade Show Floor

SC19 was red hot this year as the race to exascale computing moved into top gear. Not even the snow on the final afternoon dampened the collective ‘exascale enthusiasm’. SC19 is our industry’s exhibition pinnacle and, as usual, the weekend before the show opens on Monday evening is packed with training sessions, briefings and industry updates covering everything from the latest HPC and AI product releases and tools to tours of nearby supercomputing centers.

With my green Icelandic data center heritage, I was drawn to the tour of the National Renewable Energy Laboratory (NREL), about an hour west of Denver and set against the beautiful foothills of the Rocky Mountains.

Their supercomputer is located in the building on stilts on the right side of the picture. After some illuminating briefings about their wind and geothermal energy research, we got practical, first-hand insight into wind turbine blade design through a virtual reality visualisation of the airflow around the blades and their interaction with each other. It was really fascinating stuff.

Thereafter we toured the Eagle supercomputer, which is 100% water-cooled, including its NVIDIA V100 GPUs. This interested me because NVIDIA only warranties the V100 for air cooling. NREL were unfazed by this detail, commenting that as a national laboratory they need to push the boundaries both of science and of what their computers can do. True to their renewable energy DNA, they also use the heat from the supercomputer to warm buildings across the campus, creating a nice circular heat economy:

Anyway, back to SC19, and my favorite exascale announcement of the show was the launch of the Graphcore IPU. It’s available in Dell DSS 8440 servers and on the cloud, and it has great hardware specs:

Graphcore’s new Colossus GC2 chip holds 1216 IPU-Cores™. Each IPU core runs at 100 GFlops and can run seven threads. The GC2 chip carries 300MB of in-processor memory with an aggregate 30TB/s of memory bandwidth, and each core supports low-precision floating-point arithmetic with fully parallel/concurrent execution. The GC2 chip packs 23.6B transistors!

Each GC2 chip supports 80 IPU-Links™ for connecting to other GC2 chips, with 2.5Tbps of chip-to-chip bandwidth, and includes a PCIe Gen 4 x16 link (31.5GB/s) to the host processors. Additionally, it supports up to 8TB/s of on-chip IPU-Exchange™ bandwidth for IPU-to-IPU communication within the chip.
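Taking the published numbers above at face value, the per-chip arithmetic is easy to sanity-check (a back-of-envelope sketch, not an official Graphcore figure):

```python
# Back-of-envelope check of the GC2 numbers quoted above.
IPU_CORES = 1216          # IPU-Cores per GC2 chip
GFLOPS_PER_CORE = 100     # per-core peak, as quoted
MEM_MB = 300              # on-chip memory

# Aggregate peak arithmetic across all cores, in TFlops.
peak_tflops = IPU_CORES * GFLOPS_PER_CORE / 1000

# Rough share of the on-chip memory per core, in KB.
mem_kb_per_core = MEM_MB * 1024 / IPU_CORES

print(f"Peak per chip: {peak_tflops:.1f} TFlops")       # 121.6 TFlops
print(f"On-chip memory per core: {mem_kb_per_core:.0f} KB")  # ~253 KB
```

Roughly 121.6 TFlops of peak compute and a quarter-megabyte of local memory per core, which is what makes keeping the whole model on chip plausible.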

The architectural secret sauce is having enough memory on the GC2 chip for the whole model to stay resident in chip memory. The full-mesh, high-speed connectivity helps too. This all results in fabulous benchmarks like this one for convolution training:

This device is particularly relevant to machine vision, natural language processing, autonomous vehicles and security applications, where FP64 is infrequently used. So far, nobody has announced a supercomputer using IPUs, but I would expect to see that happen next year at ISC20 or SC20 – stay tuned!

As a sprog engineer, I learned about Direct Memory Access (DMA) hardware acceleration in early microprocessors. Given a source memory address, a destination memory address and the size of the transfer, special hardware moves the data without a program having to copy it a byte or word at a time on the CPU. Some recent evolutions of this theme have been making a big HPC impact.
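The idea can be sketched in a few lines, using a bytearray as a stand-in for physical memory (a conceptual model only – real DMA engines are programmed through device registers, and the function names here are invented for illustration):

```python
# Conceptual model of DMA: the CPU hands over a descriptor (source address,
# destination address, length) and the engine moves the whole block, instead
# of the CPU copying a byte or word at a time.

memory = bytearray(64 * 1024)  # stand-in for physical RAM

def cpu_copy(src: int, dst: int, size: int) -> None:
    """What the CPU does without DMA: one load/store per byte."""
    for i in range(size):
        memory[dst + i] = memory[src + i]

def dma_transfer(src: int, dst: int, size: int) -> None:
    """What a DMA engine does: a single block move, no CPU loop."""
    memory[dst:dst + size] = memory[src:src + size]

memory[0:5] = b"hello"
dma_transfer(src=0, dst=1024, size=5)
print(memory[1024:1029])  # b'hello'
```

The two functions produce identical results; the difference is who does the work, which is exactly the distinction RDMA and GPUDirect extend across networks and devices.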

Infiniband networking has long been the go-to connectivity for linking multiple compute nodes, and it uses a similar concept, Remote Direct Memory Access (RDMA), across the network. At SC19, NVIDIA CEO Jensen Huang’s annual keynote announced Magnum IO, a suite of software optimized to eliminate storage and input/output bottlenecks using GPUDirect – a GPU DMA technology.

Magnum IO, using GPUDirect, delivers up to 20x faster data processing for multi-server, multi-GPU computing nodes when working with massive datasets to carry out complex financial analysis, climate modeling and other HPC workloads.

Yellowbrick Data, a Palo Alto-based storage company recently out of stealth mode, also leverages DMA technology, this time using an updated BIOS to move data directly from NVMe storage to the CPU cache, with amazing results: “Plow through data 10-100x faster with Yellowbrick. Ad-hoc workloads on Yellowbrick run faster than heavily tuned, indexed queries on other data warehouses.” We’re pleased to have one of the first Yellowbrick nodes in our Icelandic data center, nestled among a myriad of other HPC kit.

Flash memory and clever DMA producing outstanding storage performance

Intel have been catching up in the accelerated computing market. Intel's Nervana AI NNP-I chips and boards were visible everywhere, and the company announced its new Xe line of GPUs – the top product, named Ponte Vecchio, will be used in Aurora, the first planned U.S. exascale computer.

An Intel gem not yet ready for prime time was their neuromorphic accelerator, a potential next step beyond DNN techniques. This advancement may exploit spiking neural networks (SNNs), a novel model that arranges artificial neurons to emulate the natural neural networks found in biological brains. Each “neuron” in an SNN can fire independently of the others and, in doing so, sends pulsed signals to other neurons in the network that directly change those neurons’ electrical states. By encoding information within the signals themselves and their timing, SNNs simulate natural learning processes, dynamically remapping the synapses between artificial neurons in response to stimuli.
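A minimal leaky integrate-and-fire model captures the “fire independently, send pulses” behaviour described above (a textbook toy, not Intel’s Loihi implementation; all parameter values here are arbitrary):

```python
# Toy leaky integrate-and-fire (LIF) neuron: the membrane potential leaks
# toward zero each step, incoming spikes add charge, and crossing a
# threshold emits a spike downstream and resets the neuron.

def lif_run(input_spikes, threshold=1.0, leak=0.9, weight=0.4):
    """Return the time steps at which the neuron fires."""
    potential = 0.0
    fired_at = []
    for t, spike in enumerate(input_spikes):
        potential = potential * leak + (weight if spike else 0.0)
        if potential >= threshold:
            fired_at.append(t)   # emit a pulse to downstream neurons
            potential = 0.0      # reset after firing
    return fired_at

# A steady input train makes the neuron fire periodically.
print(lif_run([1] * 10))  # [2, 5, 8]
```

Timing carries the information: the same neuron fed a sparser spike train fires later or not at all, which is the encoding principle SNNs exploit.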

Loihi: A Neuromorphic Manycore Processor with On-Chip Learning

Intel Labs is making Loihi-based systems available to the global research community. If only I had more time to experiment with one of these puppies!

Irrespective of your future HPC/AI plans, with or without DMA or SNNs, Verne Global has your colocation bases covered with free-air cooling and very affordable power.

Bob Fletcher, VP of Artificial Intelligence, Verne Global


Bob, a veteran of the telecommunications and technology industries, is Verne Global's VP of Strategy. He has a keen interest in HPC and the continuing evolution of AI and deep neural networks (DNN). He's based in Boston, Massachusetts.

