Public Lecture #5

"Efficient Deep Learning at Scale: Hardware and Software"



The rapid growth in the scale of modern neural network models generates ever-increasing demands for computing power in artificial intelligence (AI) systems. Many specialized computing devices have also been deployed in AI systems, forming truly application-driven heterogeneous computing platforms. This talk discusses the importance of hardware/software co-design in the development of AI computing systems. We first use resistive-memory-based neural network (NN) accelerators to illustrate the design philosophy of heterogeneous AI computing systems, and then present several hardware-friendly neural network model compression techniques. We also extend the discussion to distributed systems and briefly introduce the automation of the co-design flow, e.g., neural architecture search. A roadmap of our relevant research is given at the end of the talk.
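As a concrete flavor of the hardware-friendly compression mentioned above, the structured sparsity of reference [1] regularizes whole weight groups (e.g., filters or channels) with a group-lasso penalty, so that entire groups can be pruned in a way hardware can exploit. The sketch below is an illustrative assumption, not the paper's implementation; the function name and NumPy formulation are our own:

```python
import numpy as np

def group_lasso_penalty(weights, axis):
    """Group-lasso regularizer: sum of L2 norms over weight groups.

    For a 4D convolutional weight tensor of shape
    (out_channels, in_channels, kH, kW), summing over axis=(1, 2, 3)
    treats each output filter as one group; driving a group's norm to
    zero removes the whole filter, yielding structured sparsity.
    (Illustrative sketch only.)
    """
    group_sq_norms = np.sum(weights ** 2, axis=axis)
    return np.sum(np.sqrt(group_sq_norms))

# Usage: two filters with norms 3 and 4 give a penalty of 3 + 4 = 7.
w = np.array([3.0, 4.0]).reshape(2, 1, 1, 1)
penalty = group_lasso_penalty(w, axis=(1, 2, 3))  # 7.0
```

In training, this penalty would be added to the task loss with a regularization weight, so the optimizer trades accuracy against the number of surviving groups.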


  1. W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, “Learning Structured Sparsity in Deep Neural Networks,” Annual Conference on Neural Information Processing Systems (NIPS), Dec. 2016, pp. 1-9.
  2. W. Wen, C. Xu, F. Yan, C. Wu, Y. Wang, Y. Chen, and H. Li, “TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning,” Annual Conference on Neural Information Processing Systems (NIPS), Dec. 2017.
  3. L. Song, X. Qian, H. Li, and Y. Chen, “PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning,” International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2017, pp. 541-552.
  4. B. Yan, Q. Yang, W.-H. Chen, K.-T. Chang, J.-W. Su, C.-H. Hsu, S.-H. Li, H.-Y. Lee, S.-S. Sheu, M.-F. Chang, Q. Wu, Y. Chen, and H. Li, “RRAM-Based Spiking Nonvolatile Computing-In-Memory Processing Engine with Precision-Configurable in Situ Nonlinear Activation,” 2019 Symposium on VLSI Circuits, Jun. 2019, pp. T86-T87.
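The distributed-systems discussion connects to reference [2]: TernGrad reduces worker-to-server communication by quantizing each gradient component to three levels. The sketch below is a simplified, assumption-laden illustration (per-tensor scaling, stochastic rounding, no gradient clipping), not the paper's full scheme:

```python
import numpy as np

def ternarize(grad, rng):
    """TernGrad-style stochastic ternarization (simplified sketch).

    Each component is sent as s * sign(g) with probability |g| / s,
    and 0 otherwise, where s = max |g| over the tensor. Because
    E[output] = s * sign(g) * (|g| / s) = g, the quantized gradient is
    an unbiased estimator of the original, while needing only ~2 bits
    per component plus one shared scale.
    """
    s = np.max(np.abs(grad))
    if s == 0:
        return np.zeros_like(grad)
    keep_prob = np.abs(grad) / s
    mask = rng.random(grad.shape) < keep_prob
    return s * np.sign(grad) * mask

# Usage: every output component lies in {-s, 0, +s}.
rng = np.random.default_rng(0)
t = ternarize(np.array([0.5, -1.0, 0.0]), rng)
```

Averaging many such ternarized gradients across workers recovers the true gradient in expectation, which is what keeps SGD convergent despite the aggressive quantization.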