Hardware Accelerators for Machine Learning

This course provides in-depth coverage of the architectural techniques used to design accelerators for training and inference in machine learning systems. We start with classical ML algorithms, including linear regression and support vector machines, and then focus mainly on DNN models such as convolutional and recurrent neural networks. The course explores acceleration and hardware trade-offs for both training and inference of these models. We also examine how parameters including batch size, precision, sparsity, and compression affect the design-space trade-offs between efficiency and accuracy. The course features several guest lecturers from top groups in industry and academia.

Topics Include

Accelerator design for ML model inference and training
Linear algebra fundamentals and accelerating linear algebra
Neural networks: MLP and CNN inference
Evaluating performance, energy efficiency, parallelism, locality, memory hierarchy, and the roofline model
Generalization and regularization of training
Fast inference/training
Distributed training
Sparsity, low precision, and asynchronous training

You Will Learn

How to implement the core computational kernels used in ML using parallelism, locality, and low precision
How to design energy-efficient accelerators, making trade-offs between ML model parameters and hardware implementation techniques

Price

$5,200.00 (subject to change)

Delivery

Online, instructor-led

Level

Advanced

Commitment

10 weeks, 10-20 hrs/week

School

Stanford School of Engineering

Language

English