[1] Jorge Albericio, Patrick Judd, Tayler H. Hetherington, Tor M. Aamodt, Natalie D. Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. In International Symposium on Computer Architecture (ISCA). 1–13.
[2] Renzo Andri, Lukas Cavigelli, Davide Rossi, and Luca Benini. 2018. YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 1 (2018), 48–60.
[3] Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 269–284.
[4] Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 579–594.
[5] Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits 52, 1 (2017), 127–138.
[6] Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. 2014. DaDianNao: A Machine-Learning Supercomputer. In IEEE/ACM International Symposium on Microarchitecture (MICRO). 609–622.
[7] Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. In International Symposium on Computer Architecture (ISCA). 27–39.
[8] Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. 2015. ShiDianNao: Shifting Vision Processing Closer to the Sensor. In International Symposium on Computer Architecture (ISCA). 92–104.
[9] Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christoforos E. Kozyrakis. 2017. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 751–764.
[10] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In International Symposium on Computer Architecture (ISCA). 243–254.
[11] Norman P. Jouppi, Cliff Young, Nishant Patil, David A. Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In International Symposium on Computer Architecture (ISCA). 1–12.
[12] Patrick Judd, Jorge Albericio, Tayler H. Hetherington, Tor M. Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-Serial Deep Neural Network Computing. In IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–12.
[13] Duckhwan Kim, Jaeha Kung, Sek Chai, Sudhakar Yalamanchili, and Saibal Mukhopadhyay. 2016. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory. In International Symposium on Computer Architecture (ISCA).
[14] Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, and Lingli Wang. 2016. A High Performance FPGA-Based Accelerator for Large-Scale Convolutional Neural Networks. In International Conference on Field Programmable Logic and Applications (FPL). 1–9.
[15] Daofu Liu, Tianshi Chen, Shaoli Liu, Jinhong Zhou, Shengyuan Zhou, Olivier Temam, Xiaobing Feng, Xuehai Zhou, and Yunji Chen. 2015. PuDianNao: A Polyvalent Machine Learning Accelerator. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 369–381.
[16] Shaoli Liu, Zidong Du, Jinhua Tao, Dong Han, Tao Luo, Yuan Xie, Yunji Chen, and Tianshi Chen. 2016. Cambricon: An Instruction Set Architecture for Neural Networks. In International Symposium on Computer Architecture (ISCA). 393–405.
[17] Bert Moons, Roel Uytterhoeven, Wim Dehaene, and Marian Verhelst. 2017. 14.5 Envision: A 0.26-to-10TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable Convolutional Neural Network Processor in 28nm FDSOI. In IEEE International Solid-State Circuits Conference (ISSCC).
[18] Bert Moons and Marian Verhelst. 2016. A 0.3–2.6 TOPS/W Precision-Scalable Processor for Real-Time Large-Scale ConvNets. In IEEE Symposium on VLSI Circuits.
[19] Thierry Moreau, Tianqi Chen, Ziheng Jiang, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. VTA: An Open Hardware-Software Stack for Deep Learning. arXiv preprint arXiv:1807.04188 (2018).
[20] Preeti Ranjan Panda. 2001. SystemC: A Modeling Platform Supporting Multiple Design Abstractions. In International Symposium on System Synthesis (ISSS).
[21] Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel S. Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In International Symposium on Computer Architecture (ISCA). 27–40.
[22] Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, et al. 2016. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA). 26–35.
[23] Brandon Reagen, Paul N. Whatmough, Robert Adolf, Saketh Rama, Hyunkwang Lee, Sae Kyu Lee, José Miguel Hernández-Lobato, Gu-Yeon Wei, and David Brooks. 2016. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators. In International Symposium on Computer Architecture (ISCA). 267–278.
[24] Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. 2016. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. In International Symposium on Computer Architecture (ISCA). 14–26.
[25] Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kyung Kim, Chenkai Shao, Asit K. Mishra, and Hadi Esmaeilzadeh. 2016. From High-Level Deep Neural Models to FPGAs. In IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–12.
[26] Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. 2017. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning. In IEEE International Symposium on High Performance Computer Architecture (HPCA). 541–552.
[27] Xiaoyao Liang. 2019. Ascend AI Processor Architecture and Programming. Tsinghua University Press.
[28] S. Yin, P. Ouyang, J. Yang, T. Lu, X. Li, L. Liu, and S. Wei. 2018. An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS. In IEEE Symposium on VLSI Circuits. 37–38. https://doi.org/10.1109/VLSIC.2018.8502388
[29] Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An Accelerator for Sparse Neural Networks. In IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–12.
[30] Ritchie Zhao, Weinan Song, Wentao Zhang, Tianwei Xing, Jeng-Hau Lin, Mani B. Srivastava, Rajesh Gupta, and Zhiru Zhang. 2017. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs. In ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA). 15–24.