Deploying Performant Parallel Filesystems - Ansible and BeeGFS



BeeGFS is a parallel file system suitable for High Performance Computing (HPC), with a proven track record in the scalable storage solution space. In this blog post, hosted by Verne Global, we explore how the different components of BeeGFS fit together and how we have incorporated them into an Ansible role for a seamless storage cluster deployment experience.


With access to a high-performance InfiniBand fabric, Verne Global's hpcDIRECT users can take advantage of BeeGFS’s native RDMA support to create high-performance parallel filesystems as part of their HPC deployments, either on dedicated storage resources or hyperconverged with their compute nodes. This is a great way of making scratch space available to hpcDIRECT workloads.

Users looking to optimise the time to science can do this for their deployments through the hpcDIRECT portal. For those looking to get more hands-on, this guide will take you behind the scenes on how a BeeGFS storage cluster can be configured as part of the deployment of cloud-native HPC.

In this post we'll focus on some practical details for how to dynamically provision BeeGFS filesystems and/or clients running in cloud environments. There are actually no dependencies on OpenStack APIs here - although we do like to draw our Ansible inventory from Cluster-as-a-Service infrastructure and hpcDIRECT makes this possible.

BeeGFS is made up of several components, which will be familiar concepts to those working in the parallel file system space:

  • Management service: for registering and watching all other services
  • Storage service: for storing the distributed file contents
  • Metadata service: for storing access permissions and striping info
  • Client service: for mounting the file system to access stored data
  • Admon service (optional): for presenting administration and monitoring options through a graphical user interface

Introducing our Ansible role for BeeGFS

We have published an Ansible role on Ansible Galaxy which handles the end-to-end deployment of BeeGFS. It takes care of the details all the way from deploying the management, storage and metadata servers to setting up client nodes and mounting the filesystem. The role can be installed with the ansible-galaxy command-line tool.
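
As a sketch of the installation step, the role can be listed in a requirements file and installed from there. The role name stackhpc.beegfs and the file name requirements.yml are assumptions (neither is stated in this post), so check the role's page on Ansible Galaxy for the exact name before use.

```yaml
---
# requirements.yml (hypothetical file name)
# Install with: ansible-galaxy install -r requirements.yml
roles:
  # Assumed Galaxy name of the BeeGFS role; verify on Ansible Galaxy.
  - name: stackhpc.beegfs
```

Keeping the role in a requirements file makes the dependency explicit and repeatable; a single role can also be installed by passing its name directly to ansible-galaxy install.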


There is a README that describes the role parameters and example usage.

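An Ansible inventory is organised into groups, each representing a different role within the filesystem (or its clients). An example inventory file, inventory-beegfs, with two hosts bgfs1 and bgfs2 would place each host in the groups for the services it should run, with both hosts acting as servers and as clients.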
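
A minimal sketch of such an inventory, written here in Ansible's YAML inventory format. The group names (cluster_beegfs_mgmt and so on) are illustrative assumptions; use whichever group names your playbook refers to.

```yaml
# inventory-beegfs
# Group names are assumptions; only the host names come from the text.
all:
  children:
    cluster_beegfs_mgmt:      # management service
      hosts:
        bgfs1:
    cluster_beegfs_mds:       # metadata service
      hosts:
        bgfs1:
    cluster_beegfs_oss:       # storage service
      hosts:
        bgfs1:
        bgfs2:
    cluster_beegfs_client:    # client service
      hosts:
        bgfs1:
        bgfs2:
```

In this sketch both hosts act as storage servers and clients, while bgfs1 additionally runs the management and metadata services.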

Through controlling the membership of each inventory group, it is possible to create a variety of use cases and configurations. For example, client-only deployments, server-only deployments, or hyperconverged use cases in which the filesystem servers are also the clients (as above).

A minimal Ansible playbook to configure the cluster, which we shall refer to as beegfs.yml, applies the role to the hosts in these inventory groups.
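
A sketch of what such a playbook could contain. Only beegfs_state, beegfs_oss and beegfs_force_format are named in this post; the role name, the group names, the remaining variables (beegfs_enable, beegfs_mgmt_host, beegfs_client) and the shape of every value are assumptions to be checked against the role's README.

```yaml
---
# beegfs.yml
# Run with: ansible-playbook -i inventory-beegfs beegfs.yml
- hosts:
    - cluster_beegfs_mgmt
    - cluster_beegfs_mds
    - cluster_beegfs_oss
    - cluster_beegfs_client
  roles:
    - role: stackhpc.beegfs        # assumed role name
      beegfs_state: present        # "absent" tears the cluster down
      # Assumed variable: enable each service according to group membership.
      beegfs_enable:
        mgmt: "{{ inventory_hostname in groups['cluster_beegfs_mgmt'] }}"
        meta: "{{ inventory_hostname in groups['cluster_beegfs_mds'] }}"
        oss: "{{ inventory_hostname in groups['cluster_beegfs_oss'] }}"
        client: "{{ inventory_hostname in groups['cluster_beegfs_client'] }}"
      # Assumed variable: the host running the management service.
      beegfs_mgmt_host: "{{ groups['cluster_beegfs_mgmt'] | first }}"
      # Storage targets; the keys, device path and port are placeholders.
      beegfs_oss:
        - dev: /dev/sdb
          port: 8003
      # Assumed variable: where clients mount the filesystem.
      beegfs_client:
        path: /mnt/beegfs
        port: 8004
```

Deriving the enabled services from group membership means the same playbook serves every layout: changing which hosts sit in which inventory group is enough to change what gets deployed where.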

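To create a BeeGFS cluster spanning the two nodes defined in the inventory, run the playbook against it. The same playbook handles both the setup and the teardown of the BeeGFS storage cluster components, depending on whether the beegfs_state flag is set to present or absent, for example by passing -e beegfs_state=present or -e beegfs_state=absent to ansible-playbook.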

The playbook is designed to fail if the path specified for the BeeGFS storage service under beegfs_oss is already being used for another service. To override this behaviour, pass the extra option -e beegfs_force_format=yes. Be warned that this will cause data loss: it formats the disk if a block device is specified, and it also erases management and metadata server data if there is an existing BeeGFS deployment.


Highlights of the Ansible role for BeeGFS

  • The role is idempotent: it leaves state unchanged if the configuration has not changed since the previous deployment.
  • The tuning parameters recommended by the BeeGFS maintainers for optimal storage server performance are set automatically.
  • The role can be used to deploy both storage-as-a-service and hyperconverged architectures, simply through how roles are ascribed to hosts in the Ansible inventory. For example, the hyperconverged case has storage and client services running on the same nodes, while in the disaggregated case the clients run on separate nodes from the storage servers.
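
To illustrate the disaggregated case, here is a sketch of a YAML inventory in which servers and clients are disjoint sets of hosts. The host names (store1, store2, compute1, compute2) and the group names are hypothetical.

```yaml
# Disaggregated layout: storage hosts and client hosts do not overlap.
all:
  children:
    cluster_beegfs_mgmt:
      hosts:
        store1:
    cluster_beegfs_mds:
      hosts:
        store1:
    cluster_beegfs_oss:
      hosts:
        store1:
        store2:
    cluster_beegfs_client:    # clients only; no server services run here
      hosts:
        compute1:
        compute2:
```

Adding store1 and store2 to the client group as well would turn the same inventory into a hyperconverged layout.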


One point to be aware of: BeeGFS is sensitive to hostnames. It expects them to be consistent and permanent, and if a hostname changes, services refuse to start. This is worth being mindful of during the initial cluster setup.
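
One way to guard against this, offered as a suggestion rather than something the role is known to do, is to pin each node's hostname to its inventory name before BeeGFS is deployed. The sketch below uses Ansible's built-in hostname module and assumes that the inventory names are valid hostnames rather than IP addresses.

```yaml
---
# Hypothetical pre-deployment play; run it before the BeeGFS playbook.
- hosts: all
  become: true
  tasks:
    - name: Set a persistent hostname matching the inventory name
      ansible.builtin.hostname:
        name: "{{ inventory_hostname }}"
```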


Looking Ahead


The simplicity of BeeGFS deployment and configuration makes it a great fit for automated cloud-native deployments such as hpcDIRECT. We have seen a lot of potential in the performance of BeeGFS, and we hope to be publishing more details from our tests in a future post. Watch this space!


Written by Stig Telfer (Guest)


Stig is the CTO of StackHPC. He has a background in R&D working for various prominent technology companies, particularly in HPC and software-defined networking. Stig is also co-chair of the OpenStack Scientific Working Group.
