Verne Global’s recent announcement of its hpcDIRECT HPC-as-a-service (HPCaaS) platform, with its custom HPC configuration option, caused me to reflect on what the optimum HPC cluster configuration really is. I’m a real bleeding-edge technology advocate, so I pay attention when someone gives me a well-reasoned alternative approach. Let's review what's bleeding-edge in the HPC space...
CPUs - Intel Xeon processors are dominant in servers. They are based on normal desktop-grade CPUs but add advanced features such as support for ECC memory, higher core counts (4 to 28), support for larger amounts of RAM, and larger cache memory. The Xeon Phi emphasises more cores (up to 72) with higher memory bandwidth and is designed for parallel computing applications.
GPUs - Nvidia’s GPUs currently lead the market, and their latest, the V100, has 5,120 CUDA cores.
Techniques - Deep Neural Networks (DNN) are now the basis for many innovative products from language translation to machine vision. Virtual Reality (VR) and augmented reality are migrating from animated movie and gaming domains into mainstream business and engineering applications. No longer limited to their monitors, users can don their VR headset and wander through their data and designs with amazing results.
Now let’s look at some tools which have extensive utilisation and fail to match this bleeding-edge bar:
Computational fluid dynamics (CFD) - CFD is used to design anything that operates in a fluid from aircraft to shower curtains. Often only the solver component of the applications benefits from serious parallelism of compute and a few common applications are reputed not to really benefit from more than 8 CPU cores.
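One way to see why extra cores stop paying off when only the solver runs in parallel is Amdahl's law. The sketch below is purely illustrative: the 80% parallel fraction is my own assumption, not a measured figure for any particular CFD package, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks; only the parallel part divides by cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumption for illustration: 80% of run time is the parallel solver.
    for cores in (1, 4, 8, 16, 28):
        print(f"{cores:>2} cores -> {amdahl_speedup(0.80, cores):.2f}x")
```

Under that assumption, 8 cores give roughly a 3.3x speedup and 28 cores only about 4.4x, with a hard ceiling of 5x however many cores are added.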
Astrophysics - I had a fabulous couple of days at the Intel Xeon Phi User Group (IXPUG) meeting in Cambridge this spring. Being a rogue, I asked one of the black hole professors what language they wrote their modelling solution in. Surprise, surprise, it was Fortran, the same format-sensitive language I first programmed with punch cards back in university, long before PCs. Apparently, there are no academic brownie points for rewriting a couple of million lines of Fortran in something more contemporary, especially given how efficient Fortran already is.
Materials science - Gaussian is one of the most progressive of the materials science applications with respect to exploiting the latest technology. Here is some of their current technical insight – basically tread carefully:
- Efficiency is lost when threads are moved from one CPU to another, thereby invalidating the cache and causing other overhead
- As long as sufficient memory is available, and threads are tied to specific cores, then parallel efficiency on large molecules is good for up to 64 or more cores
- Hyperthreading is not useful for Gaussian since it effectively divides resources such as memory bandwidth among threads on the same physical CPU
- Gaussian 16 can use NVIDIA K40 and K80 GPUs under Linux
- When using GPUs, each GPU must be controlled by a specific CPU. The controlling CPU should be as physically close as possible to the GPU it is controlling. GPUs cannot share controlling CPUs. Note that CPUs used as GPU controllers cannot be used as compute nodes as well.
Integrating the latest bleeding-edge CPU and GPU technology with HPC applications is a challenge and, at a minimum, needs a thoughtful compute configuration.
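To make the Gaussian guidance above concrete, here is a minimal sketch of a core allocation plan: one dedicated controlling core per GPU, taken from the socket the GPU is attached to, with the remaining cores left for compute threads. The socket layout, the `CorePlan` type and the `plan_cores` function are all hypothetical names of my own and are not part of Gaussian or any scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CorePlan:
    gpu_controllers: dict[int, int]  # GPU index -> dedicated controlling core
    compute_cores: list[int]         # cores left over for compute threads


def plan_cores(cores_by_socket: dict[int, list[int]],
               gpu_socket: dict[int, int]) -> CorePlan:
    """Reserve one core per GPU on the GPU's own socket; the rest compute."""
    # Copy the lists so the caller's topology description is not mutated.
    free = {socket: list(cores) for socket, cores in cores_by_socket.items()}
    controllers: dict[int, int] = {}
    for gpu, socket in sorted(gpu_socket.items()):
        if not free.get(socket):
            raise ValueError(f"no free core on socket {socket} for GPU {gpu}")
        # Controllers are never shared and never reused for compute.
        controllers[gpu] = free[socket].pop(0)
    compute = sorted(core for cores in free.values() for core in cores)
    return CorePlan(controllers, compute)


if __name__ == "__main__":
    # Assumed layout: two 4-core sockets, one GPU attached to each.
    plan = plan_cores({0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}, {0: 0, 1: 1})
    print(plan.gpu_controllers)  # {0: 0, 1: 4}
    print(plan.compute_cores)    # [1, 2, 3, 5, 6, 7]
```

On Linux, a list like `compute_cores` could then be handed to something such as Python's `os.sched_setaffinity` so that the scheduler does not migrate the workers. How a given application actually takes its core list is specific to that application.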
A successful F1 race team focuses on matching its hardware to its application. As a major user of one of the most popular commercial CFD applications, the team matches its CPUs to the multi-core capabilities of the solver software, which is currently closer to 8 cores than the bleeding-edge 28. They prepare a total cost of ownership (TCO) spreadsheet covering multi-core CPU return on investment, storage area network, networking, and colocation versus in-house data center hosting.
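A TCO comparison of this kind can be sketched in a few lines. Everything below is hypothetical: the prices, the three-year horizon, the 8-core solver limit and the `NodeOption` fields are placeholders of mine, and a real spreadsheet would also cover storage, networking, licences and running several jobs per node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeOption:
    name: str
    cores: int
    purchase_cost: float   # per node, arbitrary currency units (assumed)
    annual_hosting: float  # power, cooling and space per node per year (assumed)


def tco(option: NodeOption, years: int) -> float:
    """Purchase price plus hosting over the ownership period."""
    return option.purchase_cost + option.annual_hosting * years


def cost_per_useful_core(option: NodeOption, years: int,
                         solver_core_limit: int) -> float:
    """TCO divided by the cores a single solver run can actually exploit."""
    useful_cores = min(option.cores, solver_core_limit)
    return tco(option, years) / useful_cores


if __name__ == "__main__":
    options = [
        NodeOption("8-core node", 8, 4000.0, 1000.0),
        NodeOption("28-core node", 28, 12000.0, 1500.0),
    ]
    for opt in options:
        cost = cost_per_useful_core(opt, years=3, solver_core_limit=8)
        print(f"{opt.name}: {cost:.2f} per useful core over 3 years")
```

With these made-up figures the 8-core node comes out far cheaper per core the solver can use, which is the shape of argument the spreadsheet is meant to test against real quotes.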
The Stephen Hawking Centre for Theoretical Cosmology focuses on the problem. They exploit and extend the legacy astrophysics modeling technology, utilising their local academic HPC cluster with Xeon Phi parallel computing capabilities. They focus only on optimising the programs which provide the best parallelisation option and bypass the remainder.
DeepL’s natural language translation utilises the latest and greatest technology. They exploit the most powerful CPUs and GPUs, and they extensively researched the best-value colocation environment to operate them in: Iceland.
Now you can match your compute task to an HPC cluster funded from operational expense with Verne Global’s hpcDIRECT, your custom bare metal compute environment. Take the drama out of getting a custom HPC compute environment for your HPC or AI training application, and exploit Iceland's free air cooling and inexpensive green power. Verne Global provides the option of bleeding-edge technology, or something less so but more appropriate, with a great TCO and award-winning customer support.