Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors - 2018


Deep Learning has attracted more and more attention in recent years. Many hardware architectures have been proposed for efficient implementation of deep neural networks. The arithmetic unit, as a core processing part of the hardware design, determines the functionality of the whole design. In this paper, an efficient fixed/floating-point merged multiply-accumulate unit for Deep Learning processors is proposed. The proposed architecture supports 16-bit half-precision floating-point multiplication with 32-bit single-precision accumulation for the training operations of Deep Learning algorithms. In addition, within the same hardware, the proposed design supports two parallel 8-bit fixed-point multiplications and accumulates the products into a 32-bit fixed-point number. This enables higher throughput for the inference operations of Deep Learning algorithms. Compared to a half-precision multiply-accumulate unit (accumulating to single precision), the proposed design has only 4.6% area overhead. With the proposed multiply-accumulate unit, the Deep Learning processor can support both training and high-throughput inference.
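
The two operating modes described above can be illustrated with a small functional model. This is only a behavioral sketch in Python, not the proposed hardware design: the function names are hypothetical, the fixed-point mode is assumed to add both 8-bit products into a single 32-bit accumulator with two's-complement wraparound (the abstract does not say whether the unit wraps or saturates), and the floating-point mode rounds through double precision, so its result may differ from a true hardware datapath in rare rounding cases.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    # struct raises OverflowError if x is too large for half precision.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mac_fp16_fp32(acc: float, a: float, b: float) -> float:
    """Training mode (hypothetical name): fp16 x fp16, accumulated in fp32."""
    # The product of two fp16 values needs at most 22 significand bits,
    # so it is represented exactly in a Python float (double precision).
    product = to_fp16(a) * to_fp16(b)
    return to_fp32(to_fp32(acc) + product)


def wrap_int32(x: int) -> int:
    """Wrap an integer to signed 32-bit two's complement (assumed behavior)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mac_2x_int8(acc: int, a0: int, b0: int, a1: int, b1: int) -> int:
    """Inference mode (hypothetical name): two int8 products, one int32 sum."""
    for v in (a0, b0, a1, b1):
        if not -128 <= v <= 127:
            raise ValueError(f"operand {v} does not fit in signed 8 bits")
    return wrap_int32(acc + a0 * b0 + a1 * b1)


if __name__ == "__main__":
    print(mac_fp16_fp32(1.0, 0.5, 0.5))   # floating-point mode
    print(mac_2x_int8(10, 3, 4, -5, 6))   # fixed-point mode
```

In this model the fixed-point mode completes two multiplications per call where the floating-point mode completes one, which mirrors the higher inference throughput claimed for the merged unit.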

