Title: Acceleration of Convolutional Neural Network Training using Field Programmable Gate Arrays
Author: Guta, Tesema
Advisor: Fitsum, Assamnew (PhD)
Issued: 2022-01
URI: http://etd.aau.edu.et/handle/123456789/30005
Language: en-US
Keywords: Field Programmable Gate Arrays; Convolutional Neural Network
Type: Thesis

Abstract: Training convolutional neural networks (CNNs) often requires a considerable amount of computational resources. In recent years, several studies have proposed CNN inference and training accelerators, a task for which FPGAs have demonstrated good performance and energy efficiency. Accelerating CNN processing demands additional resources such as memory bandwidth, FPGA platform resources, time, and power. Moreover, training a CNN requires large datasets and substantial computational power, and is constrained by the need for improved hardware acceleration to scale beyond existing data and model sizes. In this study, we propose a procedure for energy-efficient CNN training on an FPGA-based accelerator. We employ optimizations such as quantization, a common model compression technique, to speed up the CNN training process. In addition, a gradient accumulation buffer is used to maximize operating efficiency while preserving the gradient descent behaviour of the learning algorithm. To validate our design, we implemented the AlexNet and VGG16 models on an FPGA board as well as on a laptop CPU and GPU. Our design achieves 203.75 GOPS with the AlexNet model and 196.50 GOPS with the VGG16 model on the Terasic DE1-SoC, which, to the best of our knowledge, outperforms existing FPGA-based accelerators. Compared to the CPU and GPU, our design is 22.613X and 3.709X more energy efficient, respectively.
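The abstract names two training optimizations, quantization and a gradient accumulation buffer, without detailing them. As a software-level illustration only (not the thesis's FPGA design), the following PyTorch-style sketch shows how simulated fixed-point weight quantization and gradient accumulation over several mini-batches are commonly combined during CNN training; the `fake_quantize` helper, the accumulation depth, and the synthetic data loader are assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def fake_quantize(t: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Simulate symmetric fixed-point quantization of a tensor (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max().clamp(min=1e-8) / qmax
    return (t / scale).round().clamp(-qmax, qmax) * scale

# AlexNet is one of the two models evaluated in the thesis; the tiny random
# dataset below is a stand-in so the sketch runs end to end.
model = models.alexnet(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
accum_steps = 4  # assumed accumulation depth
train_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,)))
                for _ in range(8)]

optimizer.zero_grad()
for step, (images, labels) in enumerate(train_loader):
    # Run the forward pass with quantized weights while keeping full-precision
    # master copies, so the optimizer still updates the master weights.
    originals = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(fake_quantize(p))

    loss = criterion(model(images), labels) / accum_steps
    loss.backward()  # gradients accumulate in p.grad across accum_steps batches

    with torch.no_grad():
        for p, orig in zip(model.parameters(), originals):
            p.copy_(orig)  # restore the master weights

    if (step + 1) % accum_steps == 0:
        optimizer.step()       # apply the accumulated gradient once
        optimizer.zero_grad()  # clear the accumulation buffer
```

Dividing the loss by `accum_steps` keeps the effective update equivalent to averaging over the larger accumulated batch, which is the usual way a gradient accumulation buffer preserves standard gradient descent behaviour.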