Over the last five years or more, Deep Neural Networks (DNNs) have been driving major advances in image recognition - which is bad news if robbing banks is your game. Anyway, before we focus on the robbers and their ill-gotten cash, let's look at the tech that's behind all of this (and if you're already familiar with DNNs, feel free to skip past this section)...
Neural networks are often 4 to 6 layers deep and can have millions of nodes, perhaps 12 million per layer, representing the pixels of a cell phone picture. Data is fed into the input layer and read out from the output layer, as shown below:
Each of the lines between the dots represents a weighted connection from one input. The value in the input-layer node is multiplied by that line's coefficient and added into the destination node, where it is summed with the products arriving along all the other lines. Sometimes every node in one layer connects to every node in the next layer, which is known as a fully connected layer. That is 12,000,000 × 12,000,000 (144,000,000,000,000) floating point multiply-and-add operations between each pair of layers, or about 864,000,000,000,000 for each image flowing through six such layers. This process is called neural network inference.
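For readers who like to see the arithmetic, here is a minimal Python sketch of one fully connected layer plus the operation count above. It is an illustration, not a production implementation: the function names are hypothetical, it counts one multiply per line as the unit of work, and it leaves out the bias term and non-linear activation that real layers normally add.

```python
def dense_forward(inputs, weights):
    """Forward pass through one fully connected layer.

    weights[j][i] is the coefficient on the line from input node i to
    output node j. Bias and activation are deliberately omitted.
    """
    outputs = []
    for row in weights:                    # one row per destination node
        total = 0.0
        for x, w in zip(inputs, row):      # multiply each input by its line's coefficient...
            total += x * w                 # ...and sum the products at the destination
        outputs.append(total)
    return outputs


def multiplies_per_image(nodes_per_layer, layers):
    """Multiplications for one image passing through `layers` fully connected
    layers, each joining nodes_per_layer inputs to nodes_per_layer outputs."""
    return nodes_per_layer * nodes_per_layer * layers


# A tiny 2-node layer: [1*0.5 + 2*0.25, 1*(-1) + 2*1]
print(dense_forward([1.0, 2.0], [[0.5, 0.25], [-1.0, 1.0]]))  # [1.0, 1.0]

# The 12-million-node example from the text, six layers deep
print(f"{multiplies_per_image(12_000_000, 6):,}")  # 864,000,000,000,000
```

Running that naive double loop over 12 million × 12 million coefficients in pure Python would take far too long, which is exactly why this work is handed to GPUs.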
Surprisingly, this is not the compute-intensive part of DNNs. The heavy lifting is training, which determines the coefficient for each of the arrows. It is standard practice to feed millions of images into the DNN and look for a desired result at the output, while also testing with known good images. A common test is to look for cats in millions of Facebook images. Each time an undesired output is found, the coefficients are adjusted. The coefficients are often adjusted continuously for weeks before the desired behaviour is learned, burning many megawatt-hours of energy in the process.
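The "adjust the coefficients whenever the output is wrong" loop can be sketched with the classic perceptron learning rule. To be clear, this is a swapped-in, much simpler technique: real DNNs are trained with backpropagation and gradient descent across many layers, and the tiny two-input AND dataset below is a made-up stand-in for millions of labelled images.

```python
def predict(weights, bias, features):
    """Weighted sum of the inputs, thresholded to a yes (1) / no (0) answer."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def train_perceptron(samples, epochs=100, learning_rate=1.0):
    """Perceptron rule: nudge the coefficients only when a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)  # 0 when correct
            if error:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no wrong answers: training is done
            break
    return weights, bias


# Toy labelled data (logical AND) standing in for "cat / not cat" images
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, f) for f, _ in samples])  # [0, 0, 0, 1]
```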
Now, enter stage left, our hapless bank robber. Recently some law enforcement agencies have been following Facebook and Google’s lead in image analysis. Their goal is to recognise bank robbery suspects, post-robbery, on various social media forums and then build a map of their social and professional associates and their common haunts and hangouts to facilitate their expedient questioning. Step one in this process is to train a deep neural network on bad-taste bank robbery images!
I just love seeing the pictures of folks surrounded by their stash of ill-gotten cash. Next, the system has to recognise the specific bank robbers and hunt through a multitude of social media content for leads. All very straightforward and well-proven technology. What could possibly go wrong?
Surprisingly, after months of DNN training, an Intel server-based compute cluster was still grinding away with no end in sight. The data center operator and the power company were truly delighted. With their budget in tatters, the law enforcement technical team did something radical and tried a different server architecture: IBM Power8. With Power8, things suddenly started to take shape, and after another month the image training was complete and the system could build a map of the bank robber’s associates and local haunts too.
There are no devils or fairies in high-performance computing, although until you find the root cause of a problem it sometimes feels that way, so the Power8 hardware must have been doing something better. Both the Intel and the Power8 clusters made extensive use of Nvidia’s P100 GPU to accelerate the neural network calculations. The big difference between them is that the Intel servers use PCI Express to transfer data to the P100 GPUs, whereas Power8 has a native implementation of Nvidia’s much higher-bandwidth NVLink interconnect.
The ability to move roughly three times as much data back and forth between the server CPU and the GPU over NVLink likely prevented the training from timing out on some I/O-intensive tasks and retrying them over and over again. This is the modern-day computing equivalent of mismatched garden hoses: feeding a 1” hose from a ½” hose is an object lesson in futility. I’ve tried it, and got very wet in the process. Perhaps the GPU and CPU simply needed to move more data, more quickly, than PCI Express allowed.
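To make the timeout hypothesis concrete, here is a deliberately simplified Python model. Everything in it is an assumption: roughly 16 GB/s per direction for a PCIe 3.0 x16 link, an NVLink figure that simply applies the "about 3 times" ratio above, and a hypothetical transfer that is abandoned and retried whenever it exceeds a fixed timeout. It is not a description of what the actual cluster did.

```python
# Assumed link speeds in GB/s, one direction. PCIe 3.0 x16 is roughly 16 GB/s;
# the NVLink value just applies the "about 3 times" ratio from the text.
PCIE_GB_PER_S = 16.0
NVLINK_GB_PER_S = 3 * PCIE_GB_PER_S


def transfer_seconds(gigabytes: float, link_gb_per_s: float) -> float:
    """Time to move a block of data across a link at its full bandwidth."""
    return gigabytes / link_gb_per_s


def time_spent(gigabytes, link_gb_per_s, timeout_s, max_attempts):
    """Hypothetical transfer that is abandoned and retried if it exceeds a timeout.

    Returns (succeeded, total_seconds_spent).
    """
    needed = transfer_seconds(gigabytes, link_gb_per_s)
    spent = 0.0
    for _ in range(max_attempts):
        if needed <= timeout_s:
            return True, spent + needed
        spent += timeout_s  # the attempt is cut off when the timeout fires
    return False, spent


print(time_spent(24, PCIE_GB_PER_S, timeout_s=1.0, max_attempts=5))    # (False, 5.0)
print(time_spent(24, NVLINK_GB_PER_S, timeout_s=1.0, max_attempts=5))  # (True, 0.5)
```

With a 24 GB transfer and a 1-second timeout, the narrower link needs 1.5 s per attempt and never finishes, while the wider one completes in 0.5 s: the garden-hose problem in miniature.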
Another interesting application of a similar technology (and again not good news for our now tiring and less-than-enthusiastic bank robber) is video analysis. I never worry about being lost in London; there are security cameras on almost every lamp-post around Westminster, on all the trains, buses and taxis, and on the Underground. In fact, a reported 25% of the world’s CCTV cameras are in London. Hold up a sign saying “I’m lost - help”, and a polite London policeman (or “Bobby”) will surely be right there to help.
So, in times of crisis or crime, it’s great to know that there is ample video coverage to find the culprits. The challenge is the extraordinary volume of video. There simply aren’t enough law enforcement staff to review all the footage for, say, a white male in jeans and a black t-shirt before the suspect has flown around the world. Once again, DNNs are being trained to rapidly review colossal amounts of video and pull out all the 5-minute clips showing the desired human attributes. I wonder what the next unexpected DNN training glitch will be?
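Pulling out the 5-minute clips can be as simple as bucketing the frames a classifier flags. The sketch below assumes a hypothetical per-frame classifier that has already produced (timestamp, matched) pairs; the DNN itself is not shown, and the function and parameter names are placeholders rather than any real system's API.

```python
CLIP_SECONDS = 5 * 60  # the 5-minute clip length mentioned above


def clips_to_review(detections, clip_seconds=CLIP_SECONDS):
    """Return sorted (start_s, end_s) windows that contain at least one match.

    detections: iterable of (timestamp_seconds, matched) pairs, e.g. the
    per-frame yes/no output of a hypothetical attribute classifier.
    """
    # Map each matching frame to the index of the clip window it falls in
    buckets = {int(t // clip_seconds) for t, matched in detections if matched}
    return [(b * clip_seconds, (b + 1) * clip_seconds) for b in sorted(buckets)]


frames = [(10.0, False), (125.0, True), (290.0, True), (1000.0, False), (1810.5, True)]
print(clips_to_review(frames))  # [(0, 300), (1800, 2100)]
```

Several hits inside the same window collapse into one clip, so a reviewer watches each 5-minute stretch at most once.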
So, sleep safe at home knowing DNNs are turbo-charging leading-edge image/video analysis capabilities and keeping Verne Global’s Icelandic data center humming. And skip bank robbing as a career option – it has no future after DNN machine vision...
Come and meet me! I'm looking forward to attending the excellent SC17 supercomputing conference in Denver, Colorado, from 12 to 17 November. If you are also heading there, feel free to get in touch (firstname.lastname@example.org). I'm always interested to meet fellow HPC and DNN enthusiasts.