Verne Global


18 January 2018

Building Deep Learning Systems

Written by Adam Nethersole

Adam is Senior Director of Marketing at Verne Global and has worked within the areas of sustainability and renewable energy for the last 15 years. You can follow him at: @AdamNethersole

Everywhere I look, it seems everyone is neck deep in the deep learning pool. There isn't an industry or a product line that isn't in some form or another being given the "deep learning makeover". To be fair, it is a technology that is helping to produce some amazing breakthroughs in machine perception fields such as computer vision, speech recognition, video analysis and other classification and learning tasks. However, there are significant technological challenges hindering the development of deep learning solutions.

To date, the best-known applications of deep learning technology are products like Amazon’s Alexa, the translation engine and market disruptor DeepL (which, I'm proud to say, uses Verne Global for a large portion of its compute) and Google Brain’s new image enhancement system. Meanwhile, deep learning startup scenes are thriving in Europe (especially the UK and Germany), the US and China. Despite the optimism around deep learning, whose total market value is expected to hit $36 billion by 2025, a number of issues need addressing if this progress is not to be held back.

The first phase of developing a deep neural network’s intelligence, called training, is the process of using a large data set to teach a neural network how to reach the correct result or conclusion through observation, repetition, and trial and error. Training an artificial neural network to have real-world value takes large amounts of data, hundreds or thousands of times more than humans need to learn the same information, according to Neil Lawrence, professor of machine learning at the University of Sheffield. Our VP of Strategy, Bob Fletcher, has also done a lot of analysis of this first stage, and his blogs are well worth a read.

Since researchers at Stanford and NVIDIA discovered that GPUs were dramatically more efficient at this training process, GPUs have been adopted widely throughout the machine learning field for this purpose. However, training a complex artificial neural network with GPUs often means building the densest compute pool possible. The specialised clusters used in deep learning require considerably more power per rack than general-purpose HPC clusters used for other types of data center workloads, often 30kW or more. In contrast, a standard data center rack often stays within the 3 to 5kW range and rarely exceeds 10-15kW, which is already considered a high power density.
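To make those per-rack figures a little more concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the densities quoted above (3, 5, 15 and 30kW per rack); the 300kW cluster size and the `racks_needed` helper are my own illustrative assumptions, not figures from any real deployment.

```python
import math


def racks_needed(total_kw: float, kw_per_rack: float) -> int:
    """Return how many racks are needed to host a cluster drawing total_kw,
    given that each rack can be supplied with (and cooled at) kw_per_rack."""
    if total_kw < 0 or kw_per_rack <= 0:
        raise ValueError("total_kw must be >= 0 and kw_per_rack must be > 0")
    return math.ceil(total_kw / kw_per_rack)


# Hypothetical cluster size, chosen purely for illustration.
CLUSTER_KW = 300.0

# Per-rack densities taken from the ranges quoted in the text.
DENSITIES_KW = {
    "standard rack, low end (3kW)": 3.0,
    "standard rack, typical (5kW)": 5.0,
    "high density rack (15kW)": 15.0,
    "deep learning rack (30kW)": 30.0,
}

for label, density in DENSITIES_KW.items():
    print(f"{label}: {racks_needed(CLUSTER_KW, density)} racks")
```

The point of the sketch is simply that the same compute load either spreads across many lightly loaded racks or concentrates into a handful of very hot ones, which is where the cooling challenge comes from.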

While GPUs are much more efficient than CPUs at processing deep learning workloads, and require less energy to do so, the high power density seen in deep learning systems can present other challenges, such as making the systems harder to cool. Cooling data centers with racks of such high power density often means employing multiple techniques simultaneously: optimising airflow management for the facility, installing aisle containment structures to increase cooling efficiency, spacing high density racks out to provide better ventilation, or using liquid cooling. Specialised computer room air conditioning (CRAC) units can also be effective, but add substantial power cost and can dramatically increase a data center’s carbon footprint.

To keep the total cost of ownership (TCO) of deep learning systems low, industry leaders like Google, Intel, and others are launching new chips and interconnect technologies to help maximise the efficiency of deep learning systems, while also controlling the expense of powering and cooling them. This has led to something of an arms race in the field of deep learning hardware.

For example, Intel spent $408 million to acquire the 48-person startup Nervana, which, in addition to producing an open source deep learning framework, also develops a specialised chip called the Nervana Engine, an application-specific integrated circuit (ASIC) that’s optimised for deep learning. Intel has stated that its goal is to develop the Nervana technology to speed up the training of a deep neural network one hundred times by 2020. Not long after acquiring Nervana, Intel made another pricey acquisition related to deep learning when it purchased Movidius, an Irish company that specialises in making chips geared toward computer vision applications, an area of the deep learning field in which Intel has an apparent interest.

Other companies have developed improved deep learning capability internally. Google recently released details about the next generation of the proprietary chip it has developed to support its TensorFlow machine-learning framework. According to Google, its new tensor processing unit (TPU), also called Cloud TPU, can not only speed up inference workloads by 15 to 30 times over a standard GPU/CPU combo, but can now also train deep neural networks (which previous generations could not), while providing 30 to 80 times higher TOPS/W performance than competitors. Google will use these new chips in proprietary systems like AlphaGo, which famously defeated Go world champion Lee Sedol in 2016, but is also giving away access to a 1,000 TPU cluster to encourage research based on the platform.
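For anyone wondering what a "30 to 80 times higher TOPS/W" claim would mean for the power bill, the arithmetic can be sketched as follows. One TOPS/W is one trillion operations per joule, so the energy for a fixed amount of work is simply operations divided by efficiency. The baseline of 1 TOPS/W and the workload size are placeholder assumptions of mine, chosen only to make the ratios easy to read; they are not measurements of any real chip.

```python
def energy_joules(total_ops: float, tops_per_watt: float) -> float:
    """Energy needed to execute total_ops operations on hardware rated at
    tops_per_watt. 1 TOPS/W equals 1e12 operations per joule."""
    if tops_per_watt <= 0:
        raise ValueError("tops_per_watt must be positive")
    return total_ops / (tops_per_watt * 1e12)


# Hypothetical values, for illustration only.
BASELINE_TOPS_PER_WATT = 1.0   # assumed efficiency of the comparison hardware
WORKLOAD_OPS = 1e18            # assumed size of a fixed workload

baseline = energy_joules(WORKLOAD_OPS, BASELINE_TOPS_PER_WATT)
print(f"baseline: {baseline:,.0f} J")

# The 30x and 80x factors are the range quoted in the text.
for factor in (30, 80):
    improved = energy_joules(WORKLOAD_OPS, BASELINE_TOPS_PER_WATT * factor)
    saving = 1 - improved / baseline
    print(f"{factor}x TOPS/W: {improved:,.0f} J ({saving:.1%} less energy)")
```

Whatever the real baseline turns out to be, the relationship is inversely proportional: a chip with 30 times the TOPS/W needs one thirtieth of the energy for the same number of operations.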

Improved chips and architecture are just one part of the effort to develop stronger, more efficient deep learning systems. To lower the cost of developing these systems, and to ensure that the optimism and excitement for deep learning and other artificial intelligence applications stays sustainable - a topic close to my heart - companies developing applications should explore a range of data center options, ideally renewably powered ones, just like DeepL did when they chose Verne Global.

Verne Global is proud to help companies in the deep learning field benefit from the low cost of power and 100% renewable energy available in Iceland. These key benefits, combined with HPC-optimised infrastructure and the additional advantage of free air cooling enabled by Iceland’s naturally mild climate, can help to greatly lower the TCO of deep neural network development and ensure that the trend toward machine intelligence stays as cost effective and as sustainable as possible.
