DevOps: Stacking up to be a Common Theme at ISC18

HPC Insights

Yesterday was a good day to be in Frankfurt. All of the majors in the supercomputing universe descended upon the Messe Frankfurt to begin ISC18 with a series of training seminars. For my morning session, I chose Getting Started with Containers on HPC through Singularity, put on by the team at Sylabs, Inc. I have been tracking the progress of Singularity in the HPC community since before Sylabs was founded by CEO Greg Kurtzer in an effort to bring the technology of root-secure containers into the realm of enterprise-supported software. I was excited to hear about the progress that Sylabs has made and to see where the future of containers lies for the broader HPC community. If I were forced to sum up the tutorial in a single portmanteau, it would be DevOps. After this session, it is clear to me that the world of DevOps that has been created in the cloud-native universe is on a collision course with HPC. And the future of science says that it can’t happen soon enough.

The session started with Sylabs software engineer Arango Gutierrez as presenter. Based on his GitHub profile, I would bet he is over the moon this morning after Colombia’s 3-0 whitewashing of Poland in the World Cup group stages last night. Gutierrez ran us through a tutorial on deploying Singularity on a laptop. The tutorial included a how-to on deploying containers, importing Docker containers into Singularity, and the simplicity of executing shell commands both inside and from outside of the container. We learned what makes Singularity different from the more traditional containers, such as Docker, that are used for web-scale deployments of microservices. Singularity is focused on providing a logical sequence from on-laptop development, to simulation on a testing cluster, and finally into runtime production on the main supercomputer cluster. Singularity allows users to ‘BYOE’ - bring your own environment - while giving the operators of the main cluster confidence that a Singularity container will not allow a running program to access the namespace of the cluster as root. This security requirement is paramount for the operators of HPC clusters that are accessed by many ‘untrusted’ users. And the security options are ever increasing, with very interesting tools that will allow scientists and developers to cryptographically sign their containers so that they only run in the certified and intended environments. I would recommend that anyone interested in the benefits of containers run through Gutierrez’s Singularity tutorial, and according to the Sylabs team, they shouldn’t be shy about using the burgeoning Singularity community for support through the process. There is even a public Slack channel where the community is actively discussing the next phase of container solutions for HPC.
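For readers who want a feel for the workflow, the basic commands from the tutorial look roughly like the sketch below. The image name and version are illustrative, and the exact file extension and flags vary between Singularity releases (older versions produce `.simg` rather than `.sif` files):

```
# Pull a container image straight from Docker Hub
singularity pull docker://ubuntu:18.04

# Run a single command inside the container, from the host shell
singularity exec ubuntu_18.04.sif cat /etc/os-release

# Or drop into an interactive shell inside the container
singularity shell ubuntu_18.04.sif
```

The `exec` form is what makes Singularity containers feel like ordinary command-line programs, which matters later when they are handed to a batch scheduler.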

As with any good session, we then went on to hear from an end user. Andrew Younge, Senior Computer Scientist at Sandia National Laboratories, showed very specifically how Singularity is used to ensure that critical running time on the Sandia supercomputers is not wasted, by taking advantage of the DevOps practices that Singularity enables. Younge started his presentation by talking about what an organisation like Sandia National Laboratories looks for in an HPC container. Spoiler alert - it isn’t the same thing that Google is looking for when it deploys billions of containers to deliver web-scale services. For the HPC community, here are the most important elements needed when adopting a container-based approach to HPC computing:

  • We want version control integration. This ensures that teams are able to collaborate efficiently without breaking the application by adding incompatible libraries and other features. Version control integration allows for continuous integration and push-button deployments.
  • We require minimal overhead on the underlying system. Younge said that a maximum of 1-2% overhead for functionality is acceptable and has been demonstrated by Singularity on a variety of HPC clusters.
  • "We don't need Microservices". We don't want to pack 10-20 containers on a single node.
  • We don't want to allow root operations. Allowing untrusted applications and untrusted users to access the underlying infrastructure is a greater security risk than you might think, when you consider the types of applications that are running in these supercomputing environments.
  • We don't want to deal with the concept of opening ports. Port mapping doesn't typically apply from an HPC perspective.
  • And ultimately, we want to support development on laptops. As with any organisation, Sandia and other HPC communities want the scientists to be able to execute science, not become bogged down in complex operational requirements.
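As a concrete illustration of the BYOE idea behind these requirements, a minimal Singularity definition file might look like the sketch below. The base image, package list, and application name are all hypothetical - the point is that the scientist declares the entire environment in one file:

```
Bootstrap: docker
From: ubuntu:18.04

%post
    # Install the toolchain and libraries the application expects
    apt-get update && apt-get install -y build-essential libopenmpi-dev

%environment
    export LC_ALL=C

%runscript
    # Entry point when the container is run directly
    exec /usr/local/bin/my_app "$@"
```

Because the environment travels with the container, the same file works on the laptop, the testing cluster, and the production machine.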

Younge outlined the process that an application developer uses in order to leverage containers for development and submission of simulations into the supercomputing environment:

  1. Scientists and application developers code and build their applications within a Singularity container on their own laptop, using the libraries that will be available within their target system.
  2. Once fully functional on the laptop, the container image is pushed to a private container registry service, hosted on Gitlab for example.
  3. From a larger testing cluster, the container image is pulled from the repository and submitted as a job. This step ensures that appropriate measures have been taken ahead of jumping into the often long queue on the main supercomputers. And since Singularity containers are command-line executable, they can be very easily slotted into the scheduler associated with the larger testing cluster.
  4. Younge noted that at this point, it is possible to port the container into an EC2 testing environment if a testing cluster isn’t available.
  5. Finally, after testing is completed, the container is pulled into the main supercomputing queue using the same private registry and executed in turn.

Younge finished his presentation by showing us how IMB and HPCG testing show that a Singularity container environment is able to achieve very high levels of performance (e.g. 99.4% of native for an MPI application using CrayMPI libraries) when benchmarked against natively built applications. He went on to announce that Singularity will in fact be the primary DevOps tool used for Astra, the upcoming ARM-based supercomputer that is the first of a potential series of advanced architecture prototype platforms that will be deployed as part of the U.S. Department of Energy’s (DOE) National Nuclear Security Administration’s (NNSA) Vanguard Project. And the eventual goal of that project is to support ARM-based exascale systems at the agency. So you could say that Singularity is empowering DevOps into the exascale age.

To round out the four-hour session, Christian Kniep of Docker quickly ran through further ways that the DevOps power Docker has brought to web-scale is ready to empower the HPC community. And with the ability to directly import Docker work into Singularity, the huge base of tools already built for Docker will be a great opportunity for the HPC community to embrace the efficiencies of containers. Kniep even showed us how simple it is to tie acceleration tools such as the NVIDIA CUDA libraries into Docker configurations with single lines of code. Kniep used a term I hadn’t heard before, 'SciOps', when describing the benefits that containers and continuous integration can bring to the scientific community.
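As an example of the kind of one-liner this enables: with the nvidia-docker2 runtime installed, a GPU-enabled container can be launched roughly like this (the image tag is illustrative, and newer Docker releases use `--gpus all` in place of the runtime flag):

```
docker run --runtime=nvidia nvidia/cuda:9.2-base nvidia-smi
```

The container pulls in the CUDA user-space libraries, while the host's NVIDIA driver is exposed by the runtime - no GPU setup inside the image itself.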

As a closing thought, I can’t help but feel that we aren’t at the end of the adoption of HPC DevOps; in fact, I believe that we are hardly even at the beginning. And so, it is perhaps difficult to predict how far HPC will go towards the tools of web-scale. It is agreed that today’s HPC simulation software will continue to be monolithic in its approach to containers, and the HPC community will not see specific benefit in incorporating concepts such as microservices that are used at web-scale. However, web-scale DevOps should be applied not only to the core simulation tools, but also to pre- and post-processing, the overall coding process, the breakdown of individual tasks, quality control, and ultimately the ability to roll out applications that are better optimised for their final production environment. The result: science is progressed in the most effective manner. DevOps in HPC is in its early stages, but I expect it is a portmanteau that will be heard many times over this week in Frankfurt.

Written by Tate Cantrell


Tate is Verne Global's CTO and is responsible for setting the technical direction for the company. You can follow Tate on Twitter: @tate8tech

