
DOWNLOAD AAAI 2023 TUTORIAL SLIDES

“Large-Scale Deep Learning Optimization Techniques”

Get the slides

Large-Scale Deep Learning Optimization Techniques

Presented at the AAAI Conference on 7 February 2023 by James Demmel, Chief Strategy Officer, and Yang You, Founder, President, and Chairman of HPC-AI Tech.

Large transformer models deliver promising performance across a wide spectrum of AI applications, and this success has driven a recent surge of extremely large models. Training these models is costly because of the sheer volume of computation they require and the communication overhead of distributing that computation, so both academia and industry are scaling deep learning training to ever-larger clusters. At the same time, degraded generalization at large batch sizes, non-negligible communication overhead, and ever-growing model sizes keep many researchers and engineers from exploring large-scale AI models. In this tutorial, we aim to provide a clear sketch of the optimizations used in large-scale deep learning with regard to both model accuracy and training efficiency. We survey the most commonly used optimization algorithms: we recall the key ideas of gradient descent, introduce large-batch training optimization, discuss the debated generalization gap that arises in large-batch training, present second-order optimization, and finally review state-of-the-art strategies for reducing communication overhead and memory footprint.
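To make the large-batch training portion of this outline concrete, below is a minimal sketch of a layer-wise adaptive learning-rate update in the style of LARS, written in plain PyTorch. The class name SimpleLARS and its hyperparameter defaults are illustrative assumptions for this page only; they are not the ColossalAI implementation linked below.

```python
import torch
from torch.optim import Optimizer


class SimpleLARS(Optimizer):
    """SGD with a layer-wise trust ratio (LARS-style sketch); illustrative, not the ColossalAI API."""

    def __init__(self, params, lr=0.1, weight_decay=1e-4, trust_coeff=0.001):
        defaults = dict(lr=lr, weight_decay=weight_decay, trust_coeff=trust_coeff)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Fold weight decay into the gradient (L2 regularization), as in LARS.
                update = p.grad + group["weight_decay"] * p
                w_norm = torch.norm(p)
                u_norm = torch.norm(update)
                # The trust ratio scales each layer's step to its weight norm, which is
                # what keeps training stable when the global batch size is very large.
                if w_norm > 0 and u_norm > 0:
                    trust_ratio = (group["trust_coeff"] * w_norm / u_norm).item()
                else:
                    trust_ratio = 1.0
                p.add_(update, alpha=-group["lr"] * trust_ratio)


# Hypothetical usage with any torch.nn.Module named `model`:
#   optimizer = SimpleLARS(model.parameters(), lr=0.1)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```

LAMB, which is also covered in the tutorial, applies the same layer-wise trust-ratio idea on top of Adam-style moment estimates rather than plain SGD.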

The tutorial is self-contained; a basic understanding of deep learning is sufficient to follow most of the material. More details can be found at https://github.com/hpcaitech/ColossalAI/tree/main/colossalai/nn/optimizer.