Public Lecture #6

"TinyML and Efficient Deep Learning"



Today’s AI is too big. Deep neural networks demand extraordinary amounts of data and computation, and therefore power, for both training and inference. This severely limits the practical deployment of AI on edge devices. We aim to improve the efficiency of neural network design. First, I’ll present MCUNet [1], which brings deep learning to IoT devices. MCUNet is a framework that jointly designs an efficient neural architecture (TinyNAS) and a lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers that have only 1 MB of Flash. Next, I will introduce the Once-for-All Network [2], an efficient neural architecture search approach that can elastically grow and shrink model capacity according to the target hardware's resource and latency constraints. Moving from inference to training, I’ll present TinyTL [3], which enables tiny transfer learning on-device and reduces the memory footprint by 7-13x. Finally, I will describe data-efficient GAN training techniques [4] that can generate photo-realistic images from only 100 training images, a task that previously required tens of thousands of images. We hope such TinyML techniques can make AI greener, faster, more efficient, and more sustainable.


  1. MCUNet: Tiny Deep Learning on IoT Devices (NeurIPS’20 spotlight)
  2. Once-for-All: Train One Network and Specialize it for Efficient Deployment (ICLR’20)
  3. Tiny Transfer Learning: Reduce Memory, not Parameters for Efficient On-Device Learning (NeurIPS’20)
  4. Differentiable Augmentation for Data-Efficient GAN Training (NeurIPS’20)