doi: 10.17586/2226-1494-2020-20-4-560-567


MODEL OF AUTOMATED SYNTHESIS TOOL FOR HARDWARE ACCELERATORS OF CONVOLUTIONAL NEURAL NETWORKS FOR PROGRAMMABLE LOGIC DEVICES

V. A. Egiazarian, S. V. Bykovsky


Article in Russian

For citation:
Egiazarian V.A., Bykovskii S.V. Model of automated synthesis tool for hardware accelerators of convolutional neural networks for programmable logic devices. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 4, pp. 560–567 (in Russian). doi: 10.17586/2226-1494-2020-20-4-560-567


Abstract
Convolutional neural networks are used to solve a growing number of image processing and analysis tasks. Neural networks implemented with high-level programming languages, libraries and frameworks cannot be used in real-time systems, for example, for processing streaming video in cars, because such implementations are too slow and not energy efficient enough. These tasks require specialized hardware accelerators for neural networks. Designing such accelerators is a complex iterative process that requires highly specialized knowledge and qualifications, which makes tools for automated high-level synthesis of these accelerators a relevant topic. The purpose of this research is to develop a tool for the automated synthesis of neural network accelerators for programmable logic devices (FPGAs) from a high-level specification, which reduces development time. The high-level specification is a network description that can be obtained with the TensorFlow framework. Several strategies were studied: optimizing the structure of convolutional networks, organizing the computational process, and choosing data representation formats in neural networks, together with their effect on the characteristics of the resulting accelerator. Using handwritten digit recognition on the MNIST data set as an example, it was shown that optimizing the structure of the fully connected layers reduces the number of network parameters by 95 % with an accuracy loss of 0.43 %, that pipelining speeds up the computation by a factor of 1.7, and that parallelizing individual parts of the computing process provides a speedup of almost 20 times, although it requires 4–6 times more FPGA resources. Using fixed-point numbers instead of floating-point numbers in the calculations reduces the FPGA resources used by a factor of 1.7–2.8.
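As an illustration of the fixed-point representation mentioned above, the following minimal Python sketch quantizes weights and inputs to signed fixed-point integers and computes a dot product using integer arithmetic only. The function names (`to_fixed`, `from_fixed`, `fixed_dot`) and the 16-bit word with 8 fractional bits are assumptions chosen for illustration; the abstract does not state which formats the authors used.

```python
def to_fixed(x: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize a real value to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits                    # 2**frac_bits steps per unit
    lo = -(1 << (total_bits - 1))             # most negative representable code
    hi = (1 << (total_bits - 1)) - 1          # most positive representable code
    return max(lo, min(hi, round(x * scale)))


def from_fixed(q: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer code back to a real value."""
    return q / (1 << frac_bits)


def fixed_dot(weights, inputs, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Dot product on integer codes only (hypothetical stand-in for a MAC unit)."""
    acc = 0  # Python ints are unbounded; real hardware would use a wide accumulator
    for w, x in zip(weights, inputs):
        acc += to_fixed(w, frac_bits, total_bits) * to_fixed(x, frac_bits, total_bits)
    # Each product carries 2 * frac_bits fractional bits; shift back to frac_bits.
    # The right shift rounds toward negative infinity.
    return acc >> frac_bits


weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, 4.0]
exact = sum(w * x for w, x in zip(weights, inputs))
approx = from_fixed(fixed_dot(weights, inputs))
print(exact, approx)
```

The values in this example are exactly representable with 8 fractional bits, so the fixed-point result matches the floating-point one; in general the two differ by a quantization error that depends on the chosen word length.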
The results are analyzed, and a model of an automated synthesis tool is proposed that applies these optimizations automatically in order to meet the speed and resource requirements for implementing neural network accelerators on FPGAs.
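The idea of choosing optimizations automatically to satisfy speed and resource requirements can be sketched as a small exhaustive search. This is not the authors' model: the function `choose_configuration`, the multiplicative combination of factors, the resource multiplier of 1.0 for pipelining, the midpoint of 5 for parallelization, and the baseline figures in the usage example are all assumptions made for illustration. Only the speedup factors (1.7 and about 20) and the resource ranges (4–6 times more, 1.7–2.8 times less) come from the abstract.

```python
from itertools import combinations

# (speedup, resource multiplier) per optimization.
# Speedups and resource ranges are taken from the abstract; the exact
# multipliers used here are assumptions picked from within those ranges.
OPTIMIZATIONS = {
    "pipelining": (1.7, 1.0),         # resource cost of pipelining is assumed
    "parallelization": (20.0, 5.0),   # midpoint of the reported 4-6x range
    "fixed_point": (1.0, 1 / 1.7),    # conservative end of the 1.7-2.8x saving
}


def choose_configuration(base_latency, base_resources, max_latency, max_resources):
    """Return (subset, latency, resources) for the cheapest feasible subset, or None."""
    best = None
    names = list(OPTIMIZATIONS)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            latency, resources = base_latency, base_resources
            for name in subset:
                speedup, resource_factor = OPTIMIZATIONS[name]
                latency /= speedup             # assumes speedups multiply
                resources *= resource_factor   # assumes resource factors multiply
            feasible = latency <= max_latency and resources <= max_resources
            if feasible and (best is None or resources < best[2]):
                best = (subset, latency, resources)
    return best


# Hypothetical baseline: 100 time units and 1000 resource units.
result = choose_configuration(100.0, 1000.0, max_latency=70.0, max_resources=1000.0)
print(result)
```

With these hypothetical figures, parallelization exceeds the resource budget, so the search settles on pipelining combined with fixed-point arithmetic. A real tool would replace the fixed multipliers with estimates obtained from synthesis reports.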

Keywords: convolutional neural networks, neural network accelerators, hardware accelerators, FPGA, CAD, high-level synthesis

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License