Acceleration of Convolutional Neural Network Training using Field Programmable Gate Arrays
Date
2022-01
Publisher
Addis Ababa University
Abstract
Training convolutional neural networks (CNNs) often requires considerable
computational resources. In recent years, several studies have proposed CNN
inference and training accelerators, for which FPGAs have demonstrated good
performance and energy efficiency. Speeding up CNN processing demands
additional resources such as memory bandwidth, FPGA platform resources,
execution time, and power. Moreover, CNN training requires large datasets and
substantial computational power, and it is constrained by the need for improved
hardware acceleration to scale beyond existing data and model sizes.
In this study, we propose a procedure for energy-efficient CNN training on an
FPGA-based accelerator. We employ optimizations such as quantization, a common
model compression technique, to speed up the CNN training process.
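For illustration, the following is a minimal sketch of symmetric fixed-point quantization, one common form of the compression technique mentioned above; the 8-bit width, rounding scheme, and function names are assumptions made for the example, not the exact scheme used in this work:

    import numpy as np

    def quantize_symmetric(x: np.ndarray, bits: int = 8):
        # Map floats to signed fixed-point integers; return the
        # integer tensor plus the scale needed to dequantize.
        qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
        scale = float(np.max(np.abs(x))) / qmax
        if scale == 0.0:                        # all-zero tensor
            scale = 1.0
        q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    # Quantize a weight tensor and check the reconstruction error.
    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_symmetric(w)
    print("max error:", np.max(np.abs(w - dequantize(q, s))))

Replacing floating-point operands with narrow integers of this kind reduces both memory traffic and the cost of the multiply-accumulate units on the FPGA.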
Additionally, a gradient accumulation buffer is used to achieve maximum
operating efficiency while preserving the gradient-descent behavior of the
learning algorithm.
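A gradient accumulation buffer sums partial gradients from several mini-batches and applies a single averaged weight update per window, so the arithmetic units stay busy while the update rule remains plain gradient descent. Below is a minimal NumPy sketch; grad_fn, batches, and the learning-rate value are illustrative assumptions rather than the thesis's implementation:

    import numpy as np

    def train_with_accumulation(weights, batches, grad_fn,
                                lr=0.1, accum_steps=4):
        # Sum gradients from `accum_steps` mini-batches in a buffer,
        # then apply one averaged gradient-descent update.
        buffer = np.zeros_like(weights)
        for i, batch in enumerate(batches, start=1):
            buffer += grad_fn(weights, batch)       # accumulate
            if i % accum_steps == 0:
                weights -= lr * (buffer / accum_steps)
                buffer[:] = 0.0                     # clear buffer
        return weights

    # Toy usage: minimize ||w - t||^2; each "batch" yields the same
    # gradient 2*(w - t), so w converges toward t.
    t = np.ones(3)
    w = train_with_accumulation(np.zeros(3), range(100),
                                lambda w, _b: 2.0 * (w - t))
    print(w)  # close to [1, 1, 1]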
To validate our design, we implemented the AlexNet and VGG16 models on an FPGA
board as well as on a laptop CPU and GPU. Our design achieves 203.75 GOPS with
the AlexNet model and 196.50 GOPS with the VGG16 model on the Terasic DE1-SoC.
As far as we know, this outperforms existing FPGA-based accelerators. Compared
to the CPU and GPU, our design is 22.613X and 3.709X more energy efficient,
respectively.
Keywords
Field Programmable Gate Arrays, Convolutional Neural Networks