I recently revisited Bristol, UK for an AI and HPC Meetup hosted at Graphcore’s HQ, where they design the wicked powerful IPU (Intelligence Processing Unit) AI accelerator. There were three excellent presentations, and the one by Helen Byrne, AI Research Engineer at Graphcore, about their 2020 research focus reminded me that my mathematics skills were well and truly rusty.

Graphcore’s hot 2020 research topics are:

- Arithmetic Efficiency
- Memory Efficient Training
- Optimization of Stochastic Learning
- Sparse Structures for Training of Overparametrized Models
- Neural Architecture Search
- Probabilistic Planning for Model-Based Reinforcement Learning
- Distributed Learning in Large-Scale Machines

The underpinning of most of these topics is the use of floating-point numbers. Traditionally, scientific and HPC applications use IEEE 754 double-precision floating-point numbers (FP-64), but there are myriad variations on this theme:

Floating-point formats:

- IEEE 754
  - 16-bit: Half (binary16)
  - 32-bit: Single (binary32), decimal32
  - 64-bit: Double (binary64), decimal64
  - 128-bit: Quadruple (binary128), decimal128
  - 256-bit: Octuple (binary256)
  - Extended-precision formats (40-bit or 80-bit)
- Other
  - Minifloat
  - bfloat16
  - Microsoft Binary Format
  - IBM floating-point architecture
  - Posit
  - G.711 8-bit floats

Historically IEEE 754 is upgraded every decade or so. I suspect that the advance of DNN training and the increasing use of sub 32-bit numbers will drive another refresh soon. Graphcore are researching the exploitation of a custom 8-bit floating point format.

Binary integer formats are limited to representing whole numbers within a fixed range. A 32-bit unsigned integer can represent 0 through 4,294,967,295 (2^{32} − 1), roughly 4G, while a 32-bit two’s complement integer, where the most significant bit acts as the positive/negative sign, can represent −2,147,483,648 (−2^{31}) through 2,147,483,647 (2^{31} − 1).
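These integer bounds can be checked in a couple of lines of Python (an illustrative sketch, since Python’s own integers are arbitrary precision):

```python
# 32-bit unsigned range: 0 .. 2**32 - 1
unsigned_max = 2**32 - 1      # 4,294,967,295

# 32-bit two's complement range: -2**31 .. 2**31 - 1
signed_min = -(2**31)         # -2,147,483,648
signed_max = 2**31 - 1        #  2,147,483,647

print(unsigned_max, signed_min, signed_max)
```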

Floating point numbers offer the ability to address a larger range of values, both larger and smaller, than the equivalent binary integers. An FP-32 number has a sign bit on the left (the most significant bit), followed by an 8-bit exponent and a 23-bit significand (historically called the mantissa).

Starting with the sign, its value is defined as (−1)^{sign}, so a sign bit of 0 gives a positive number. The exponent field could be interpreted as an 8-bit signed integer from −128 to 127 (two’s complement) or as an 8-bit unsigned integer from 0 to 255. IEEE 754 uses the unsigned form, shifted by a bias (offset) of 127, so a stored value of 127 represents an actual exponent of zero (i.e. for 2^{e − 127} to be one, e must be 127). Exponent fields of all 0s and all 1s are reserved for special numbers, so the resulting range is −126 to +127. Taking the example bit pattern 0 01111100 01000000000000000000000, the exponent field 01111100 = 124, so the unbiased exponent is 124 − 127 = −3, making the exponent component 2^{−3} = 1/8 = 0.125.

The significand in this example is the binary fraction 01000000000000000000000, which (with the implicit leading 1 of a normal number) expands as 1 + b_{1}/2^{1} + b_{2}/2^{2} + … = 1 + 0 × 1/2 + 1 × 1/4 + … = 1¼ = 1.25.

Hence the full FP-32 number is 1.25 * 0.125 = **0.15625**.
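The whole decoding can be sketched in Python. This is my own illustration of the IEEE 754 binary32 layout described above, cross-checked against the standard library’s `struct` module:

```python
import struct

def decode_fp32(bits: int) -> float:
    """Decode a 32-bit IEEE 754 pattern (normal numbers only)."""
    sign = (bits >> 31) & 1          # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 significand bits
    # Normal numbers carry an implicit leading 1 in the significand.
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

# 0 01111100 01000000000000000000000 -> 0x3E200000
bits = 0b0_01111100_01000000000000000000000
print(decode_fp32(bits))  # 0.15625

# Cross-check: reinterpret the same 4 bytes as a native big-endian float.
print(struct.unpack('>f', bits.to_bytes(4, 'big'))[0])  # 0.15625
```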

Some notable extremes of the FP-32 range are:

0 00000001 00000000000000000000000_{2} = 0080 0000_{16} = 2^{−126} ≈ 1.1754943508 × 10^{−38} – smallest positive normal number

0 11111110 11111111111111111111111_{2} = 7f7f ffff_{16} = 2^{127} × (2 − 2^{−23}) ≈ 3.4028234664 × 10^{38} – largest normal number

0 01111110 11111111111111111111111_{2} = 3f7f ffff_{16} = 1 − 2^{−24} ≈ 0.9999999404 – largest number less than one

This provides a vastly larger representable range than the 4,294,967,295 maximum available from 32-bit binary integers.
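These extreme values can be reproduced by reinterpreting the hex bit patterns above with Python’s `struct` module (a quick sketch):

```python
import struct

def fp32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

print(fp32(0x00800000))  # 2**-126            -> smallest positive normal
print(fp32(0x7F7FFFFF))  # (2 - 2**-23)*2**127 -> largest normal
print(fp32(0x3F7FFFFF))  # 1 - 2**-24          -> largest value below one
```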

My university labs drilled into me that inputting rubbish into an equation results in rubbish out of it. One particularly odd experiment involved measuring the area of a rectangle with a ruler – which at best you could estimate to ¼ of a mm. The grading rubric was simple: A – XXX.Ymm^{2}, B – XXX.YYmm^{2}, C – XXX.YYYmm^{2}, D – XXX.YYYYmm^{2}, E – XXX.YYYYYmm^{2}, … The students, lacking perspective on the accuracy of a ruler at a certain temperature and trusting their computers, got a fabulous common-sense lesson!

Similar logic drives the desire to use 8-bit floating-point notation for AI training on inherently imprecise data such as natural language and machine vision. An 8-bit floating-point number covers roughly the range −122,880 to +122,880, but with very coarse precision, since only 3 significand bits remain.
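That 122,880 figure can be derived in a couple of lines. Assuming a hypothetical 1-4-3 layout (1 sign bit, 4 exponent bits, 3 significand bits) whose largest usable exponent scale is 2^{16} – these format details are my assumption for illustration, not Graphcore’s published specification:

```python
# Largest significand with 3 fraction bits: 1.111_2 = 2 - 2**-3 = 1.875
sig_bits = 3
max_significand = 2 - 2**-sig_bits

# Assumed maximum exponent scale of 2**16 for this sketch format.
max_value = max_significand * 2**16
print(max_value)  # 122880.0
```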

Image file formats exploit many smart compression techniques for sub-sections of an image. However, assuming no compression, if we are training on images at pixel granularity, the number of colours per pixel drives the mathematical granularity and total image size. Here are the typical image colour densities:

| Bits per pixel | Number of colours |
| --- | --- |
| 1 bpp | 2 |
| 2 bpp | 4 |
| 3 bpp | 8 |
| 4 bpp | 16 |
| 5 bpp | 32 |
| 6 bpp | 64 |
| 7 bpp | 128 |
| 8 bpp | 256 |
| 10 bpp | 1,024 |
| 16 bpp | 65,536 |
| 24 bpp | 16,777,216 (16.7 million) |
| 32 bpp | 4,294,967,296 (~4.3 billion) |
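The colour counts in the table are simply 2 raised to the bit depth; a short loop confirms them:

```python
# Number of representable colours at a given bit depth is 2**bpp.
for bpp in (1, 2, 3, 4, 5, 6, 7, 8, 10, 16, 24, 32):
    print(f"{bpp:2d} bpp -> {2**bpp:,} colours")
```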

8-bit floating-point data is enough for images in the region of 65,536 colours per pixel, which is more than sufficient for DNN training. Here is a 65,536-colour image for reference:

It’s far more refined than needed for your average robot or autonomous vehicle, whose view more often looks like:

Rather than relying on intuition, Graphcore test their numerical precision theories:

Here you can see that there is an advantage to keeping FP-32 for the weight updates when training ResNet-32 on image data.

The word stochastic is jargon for random. A stochastic process is a system which evolves in time while undergoing chance fluctuations. Such processes are used extensively to guide DNN training. Here is an example of using stochastic rounding to simplify the rounding compute requirements:

In this case it improves the overall rounding accuracy aggregated over 100 iterations. When used in the earlier ResNet-32 training example, it facilitates the use of FP-16 with accuracy very similar to FP-32, without the additional compute requirement – **wicked cool!**
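To make the idea concrete, here is a toy simulation of stochastic versus round-to-nearest rounding (my own sketch, not Graphcore’s implementation). When each increment is smaller than half a unit in the last place of a coarse format, round-to-nearest discards it every single time, while stochastic rounding keeps it on average:

```python
import random

def round_nearest(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits."""
    scale = 2**frac_bits
    return round(x * scale) / scale

def round_stochastic(x: float, frac_bits: int) -> float:
    """Round up or down with probability proportional to proximity,
    so the result is unbiased in expectation."""
    scale = 2**frac_bits
    scaled = x * scale
    floor = int(scaled // 1)
    return (floor + (random.random() < scaled - floor)) / scale

random.seed(0)
inc, frac_bits, steps = 1 / 512, 4, 1000  # inc is below half an ulp of 2**-4

acc_nearest = acc_stochastic = 0.0
for _ in range(steps):
    acc_nearest = round_nearest(acc_nearest + inc, frac_bits)
    acc_stochastic = round_stochastic(acc_stochastic + inc, frac_bits)

print(acc_nearest)     # stuck at 0.0 -- every add rounds the increment away
print(acc_stochastic)  # close to the true sum, 1000/512 ~= 1.95
```

The true sum is 1000/512 ≈ 1.95; round-to-nearest never moves off zero, while the stochastic accumulator lands near the correct value.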


Clearly, understanding the science enables Graphcore to design a better hardware/software AI training environment and toolset. Similarly, having a data center knowledgeable about and supportive of AI and HPC technology will remove friction from your development process.

Bring your AI training projects to Iceland and take advantage of our green energy, free air cooling and fabulous pricing. Let’s chat about your next AI project and don’t miss our next AI and HPC meet-up.

Bob Fletcher, VP of Artificial Intelligence, Verne Global (Email: bob.fletcher@verneglobal.com)