Industrial HPC Solutions: Bioinformatics



Combining biology, computer science, and mathematics, the science of bioinformatics has advanced rapidly in recent years. Thanks in part to HPC (high performance computing) and growing expertise in collecting and analysing ever-larger datasets, industries from agriculture to healthcare and beyond are experiencing the benefits of a bioinformatics revolution.

This article provides examples of bioinformatics in industry and describes the advances in applications and compute power that have raised the impact of bioinformatics to new, world-changing heights.


Examples of Bioinformatics in Industry

Bioinformatics may well be best known for its role in and contributions to the healthcare industry. Collaborative Efforts Produce Clinical Workflows for Fast Genetic Analysis (HPCwire, 6 May 2019) details recent innovations by Mayo Clinic: “With individualised medicine - one of the holy grails of modern healthcare - diagnosis and treatment of patients would rely in part on each individual’s specific DNA profile, enabling truly personalised care. But in order for genetic information to contribute meaningfully to patient care, DNA testing has to be affordable and efficient.” The goal is significant: “making DNA analysis a possibility for every patient. The first aim of the project focused on finding faster methods for clinical analysis of the whole human genome.”

The “whole human genome” - wow. If you wonder where HPC plays a role in bioinformatics, look no further than genomics-related workflows, which in Mayo Clinic’s case were initially focused on speeding up clinical testing. The initial result? “Ultimately, Mayo Clinic decided to adopt a new variant calling software that completes analysis 44 times faster than the traditional industry-standard pipeline - requiring just a few hours to process a whole genome, rather than days.” What do they call it? “‘Mayomics’. Mayo + genomics,” and there is much more on the horizon.
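To make this kind of genomics workflow concrete, below is a minimal sketch of a conventional short-read variant-calling pipeline (BWA-MEM alignment, samtools sorting and indexing, GATK HaplotypeCaller). It is an illustrative stand-in, not the Mayo Clinic “Mayomics” pipeline; the file names, sample labels, and thread count are placeholder assumptions.

```python
# Minimal sketch of a conventional whole-genome variant-calling pipeline:
# BWA-MEM alignment -> samtools sort/index -> GATK HaplotypeCaller.
# NOTE: an illustrative stand-in, not the "Mayomics" pipeline referenced
# above. Paths, sample names, and thread counts are placeholders.
import subprocess

REF = "reference.fasta"          # reference genome (pre-indexed with bwa index,
                                 # samtools faidx and gatk CreateSequenceDictionary)
READS_1 = "sample_R1.fastq.gz"   # forward reads (paired-end)
READS_2 = "sample_R2.fastq.gz"   # reverse reads (paired-end)
THREADS = 16                     # an HPC node would typically offer many more cores

def run(cmd: str) -> None:
    """Run a shell command, echoing it and failing loudly on a non-zero exit."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# 1. Align reads (with a read group, which GATK requires) and coordinate-sort.
read_group = r"@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA"
run(f"bwa mem -t {THREADS} -R '{read_group}' {REF} {READS_1} {READS_2} "
    f"| samtools sort -@ {THREADS} -o sample1.sorted.bam -")

# 2. Index the sorted BAM so variant callers can seek by genomic region.
run("samtools index sample1.sorted.bam")

# 3. Call variants across the whole genome, producing a compressed VCF.
run(f"gatk HaplotypeCaller -R {REF} -I sample1.sorted.bam -O sample1.vcf.gz")
```

On a cluster, stages like these are typically parallelised across samples or genomic regions by a workload manager, which is broadly where speed-ups of the kind described above come from.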

In agriculture - a perhaps lesser-known but important and far-reaching industry for bioinformatics - it now plays a key role. In Bioinformatics its role & applications in agriculture & other disciplines (Technology Times, 19 December 2017), a descriptive overview is provided: “Bioinformatics is a new field of science but it is making progress in every field of biotechnology very rapidly. As it has its application in the medicine by providing the genome information of various organisms, similarly the field of agriculture has also taken advantage of this field because microorganisms play an important role in agriculture and bioinformatics provides full genomic information of these organisms. The genome sequencing of the plants and animals has also provided benefits to agriculture.”

There are now dozens of examples of traditional and evolving bioinformatics converging with HPC in both healthcare and agriculture, with many large projects underway and results published regularly.


Modern Bioinformatics - The Compute Side

With hundreds of bioinformatics/genomics codes, supporting a wide range of workflows on the application side is one thing; supporting them on the compute side is another. Today, most major providers of hardware, processors, and storage have a bioinformatics/genomics focus.

Dell and Intel and their Dell Genomic Data Analysis Platform (GDAP) are one example of a partnership to address the many challenges and opportunities in this space. InsideHPC’s Guide to Genomics describes how their solution “is designed to achieve fast results with maximum efficiency. The solution is architected to solve a number of customer challenges, including the perception that implementation must be large-scale in nature, compliance, security and clinician uses”:

  • Data explosion – A single genome will produce between 200GB and 300GB of data. This data must be readily available to the computer systems that will need to decode it. Databanks are doubling in size every few months. (A rough sizing sketch follows this list.)
  • Big compute requirements – With massive amounts of data arriving in such a short time, the expectation is that results from computational analysis will arrive more quickly as well.
  • Cumbersome infrastructure – If a system is cobbled together as needs grow, its components are unlikely to be optimally matched. Old systems will have to be networked with newer systems, and a conglomeration of patches, storage mismatches, and similar issues will surely surface. The IT department will ultimately have to manage these incompatibilities and deal with a lack of expected cluster performance. In smaller organizations that do not have enough dedicated, skilled IT administrators, chaos and a lack of confidence in the computing systems will surely become an issue.
  • Software sprawl – Shareware, middleware, and favorite tools will make their way into the software stack, and a defined system will have to accommodate these specific applications and middleware.
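To put the first bullet’s numbers in perspective, here is a back-of-the-envelope sizing sketch. The 200-300GB-per-genome figure comes from the list above; the monthly genome count and network speed are assumptions chosen purely for illustration, not vendor specifications.

```python
# Back-of-the-envelope sizing for the "data explosion" bullet above:
# if each whole genome yields roughly 200-300 GB of raw data, how much
# storage and transfer time does a modest clinical workload imply?
# The throughput and network figures below are illustrative assumptions.

GENOMES_PER_MONTH = 500                              # assumed sequencing throughput
GB_PER_GENOME_LOW, GB_PER_GENOME_HIGH = 200, 300     # figure quoted in the list above
NETWORK_GBPS = 10                                    # assumed interconnect, gigabits/s

low_tb = GENOMES_PER_MONTH * GB_PER_GENOME_LOW / 1000    # terabytes per month
high_tb = GENOMES_PER_MONTH * GB_PER_GENOME_HIGH / 1000

# Time to move a single genome onto the compute nodes at line rate.
seconds_per_genome = (GB_PER_GENOME_HIGH * 8) / NETWORK_GBPS

print(f"Monthly raw data: {low_tb:.0f}-{high_tb:.0f} TB")
print(f"Transfer time per genome at {NETWORK_GBPS} Gbit/s: "
      f"~{seconds_per_genome / 60:.0f} minutes")
```

Even at this modest assumed throughput, the raw data lands in the hundreds of terabytes per month, which is why storage and interconnect design matter as much as raw compute.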

Dell and Intel describe this solution set as an “integrated genomic processing infrastructure. It is designed to meet the needs of researchers and clinicians, and includes all of the components necessary to reduce turnaround time from days to hours.” While this is but one of many examples, the sophistication of compute offerings specific to bioinformatics is expanding rapidly.


Summary

New bioinformatics applications are coming to bear with remarkable regularity, being leveraged to build workloads and use cases that demand HPC power and scale like few other domains. With compute evolving as well - including the widespread integration of GPUs to run more sophisticated workloads - this entire solution set, if you will, is having a significant impact on healthcare and agriculture.

The next five years will likely involve more, more, more - more applications helping to run more workloads in more areas, with jobs running on more powerful machines with greater processing power, more memory, and more storage, hopefully at lower cost.


Written by Brendan McGinty (Guest)


Brendan McGinty is Director of Industry for the National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign.

