Hardware Accelerators for Machine Learning

This course provides in-depth coverage of the architectural techniques used to design accelerators for training and inference in machine learning systems. We start with classical ML algorithms, including linear regression and support vector machines, and then focus mainly on deep neural network (DNN) models such as convolutional and recurrent neural networks. The course explores acceleration techniques and hardware trade-offs for both training and inference of these models. We also examine how parameters including batch size, precision, sparsity, and compression affect the design-space trade-offs between efficiency and accuracy. The course features several guest lecturers from top groups in industry and academia.

Topics Include

  • Accelerator design for ML model inference and training
  • Linear algebra fundamentals and accelerating linear algebra
  • Neural networks: MLP and CNN inference
  • Evaluating performance, energy efficiency, parallelism, locality, memory hierarchy, and the roofline model
  • Generalization and regularization in training
  • Fast inference and training
  • Distributed training
  • Sparsity, low precision, and asynchronous training

You Will Learn

  • How to implement the core computational kernels used in ML using parallelism, locality, and low precision
  • How to design energy-efficient accelerators, making trade-offs between ML model parameters and hardware implementation techniques

Course Page
$5,200.00 (subject to change)
Online, instructor-led
10 weeks, 10-20 hrs/week
Stanford School of Engineering