MODEL OF AUTOMATED SYNTHESIS TOOL FOR HARDWARE ACCELERATORS OF CONVOLUTIONAL NEURAL NETWORKS FORPROGRAMMABLE LOGIC DEVICES

Victor A. Egiazarian, Sergei V. Bykovskii

2020 , VOLUME 20, NUMBER 4 ( july-august )

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

Publications

Editor-in-Chief

Nikiforov
Vladimir O.
D.Sc., Prof.

Partners

doi: 10.17586/2226-1494-2020-20-4-560-567

MODEL OF AUTOMATED SYNTHESIS TOOL FOR HARDWARE ACCELERATORS OF CONVOLUTIONAL NEURAL NETWORKS FORPROGRAMMABLE LOGIC DEVICES

V. A. Egiazarian, S. V. Bykovsky

Read the full article

Article in Russian

For citation:

Egiazarian V.A., Bykovskii S.V. Model of automated synthesis tool for hardware accelerators of convolutional neural networks for programmable logic devices. Scientiﬁc and Technical Journal of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 4, pp. 560–567 (in Russian). doi: 10.17586/2226-1494-2020-20-4-560-567

Abstract

Currently, more and more tasks on image processing and analysis are being solved using convolutional neural networks. Neural networks implemented using high-level programming languages, libraries and frameworks cannot be used in real-time systems, for example, for processing streaming video in cars, due to the low speed and energy efﬁciency of such implementations. The application of specialized hardware accelerators of neural networks is necessary for these tasks. The design of such accelerators is a complex iterative process requiring highly specialized knowledge and qualiﬁcation. This consideration makes the creation of automation tools for high-level synthesis of such computers a relevant issue. The purpose of this research is a tool development for the automated synthesis of neural network accelerators from a high-level speciﬁcation for programmable logic devices (FPGAs), which reduces the development time. A description of networks is used as a high-level speciﬁcation, which can be obtained using the TensorFlow framework. The several strategies have been researched for optimizing the structure of convolutional networks, methods for organizing the computational process and formats for representing data in neural networks and their effect on the characteristics of the resulting computer. It was shown that structure optimization of neural network fully connected layers on the example of solving the handwritten digit recognition problem from the MNIST set reduces the number of network parameters by 95 % with a loss of accuracy equal to 0.43 %, pipelining of calculations speeds up the calculation by 1.7 times, and parallelization of the computing process individual parts provides the acceleration by almost 20 times, although it requires 4-6 times more FPGA resources. Applying of ﬁxed-point numbers instead of ﬂoating-point numbers in calculations reduces the used FPGA resources by 1.7–2.8 times. The analysis of the obtained results is carried out and a model of an automated synthesis tool is proposed, which performs the indicated optimizations in automatic mode in order to meet the requirements for speed and resources used in the implementation of neural network accelerators on FPGA.

Keywords: convolutional neural networks, neural network accelerators, hardware accelerators, FPGA, CAD, high-level synthesis

References

1. Guan Y., Liang H., Xu N., Wang W., Shi S., Chen X., Sun G., Zhang W., Cong J. FP-DNN: An automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates. Proc. 25th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2017), 2017, pp. 152–159. doi: 10.1109/FCCM.2017.25

2. Venieris S.I., Bouganis C.S. FpgaConvNet: A Framework for mapping convolutional neural networks on FPGAs. Proc. 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2016), 2016, pp. 40–47. doi: 10.1109/FCCM.2016.22

3. Venieris S.I., Bouganis C.-S. Latency-driven design for FPGA-based convolutional neural networks. Proc. 27th International Conference on Field Programmable Logic and Applications (FPL 2017), 2017, pp. 8056828. doi: 10.23919/FPL.2017.8056828

4. Noronha D.H., Salehpour B., Wilton S.J.E. LeFlow: enabling flexible FPGA high-level synthesis of tensorflow deep neural networks. Proc. 5th International Workshop on FPGAs for Software Programmers (FSP 2018), co-located with International Conference on Field Programmable Logic and Applications (FPL 2018), 2018, pp. 46–53.

5. Lattner C., Adve V. LLVM: a compilation framework for lifelong program analysis & transformation. Proc. of the International Symposium on Code Generation and Optimization (CGO 2004), 2004, pp. 75–86. doi: 10.1109/CGO.2004.1281665

6. Zhang X., Wang J., Zhu C., Lin Y., Xiong J., Hwu W.-M., Chen D. DNNBuilder: an automated tool for building high-performance DNN hardware accelerators for FPGAs. Proc. 37th IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2018), 2018, pp. a56. doi: 10.1145/3240765.3240801

7. Zhang C., Sun G., Fang Z., Zhou P., Pan P., Cong J. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, vol. 38, no. 11, pp. 2072–2085. doi: 10.1109/TCAD.2017.2785257

8. Duarte J., Han S., Harris P., Jindariani S., Kreinar E., Kreis B., Ngadiuba J., Pierini M., Rivera R., Tran N., Wu Z. Fast inference of deep neural networks in FPGAs for particle physics. Journal of Instrumentation, 2018, vol. 13, no. 7, pp. P07027. doi: 10.1088/1748-0221/13/07/P07027

9. Cheng Y., Wang D., Zhou P., Zhang T. A survey of model compression and acceleration for deep neural networks. arXiv:1710.09282.

10. Rastegari M., Ordonez V., Redmon J., Farhadi A. XNOR-net: Imagenet classification using binary convolutional neural networks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, vol. 9908, pp. 525–542. doi: 10.1007/978-3-319-46493-0_32

11. Vanhoucke V., Senior A., Mao M.Z. Improving the speed of neural networks on CPUs. Proc. of the Deep Learning and Unsupervised Feature Learning Workshop (NIPS 2011), 2011.

12. Jang H., Park A., Jung K. Network implementation using CUDA and OpenMP. Proc. of the Digital Image Computing: Techniques and Applications (DICTA 2008), 2008, pp. 155–161. doi: 10.1109/DICTA.2008.82

13. Abadi M., Agarwal A., Barham P. et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.

14. Chollet F. et al. Keras. 2015. Available at: https://github.com/keras-team/keras (accessed: 01.02.2020).

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License