TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition (PPoPP 2023 - Main Conference)

Who

Lizhi Xiang, Miao Yin, Chengming Zhang, Aravind Sukumaran-Rajam, Saday Sadayappan, Bo Yuan, Dingwen Tao

Track

PPoPP 2023 Main Conference

When

Tue 28 Feb 2023 13:50 - 14:10 at Montreal 4 - Session 5: Decompositions Chair(s): Milind Chabbi

Abstract

Tucker decomposition is one of the state-of-the-art CNN model compression techniques. However, although it reduces FLOPs substantially, we observe very limited inference time reduction with Tucker-compressed models using existing GPU software such as cuDNN. To this end, we propose an efficient end-to-end framework that can generate highly accurate and compact CNN models via Tucker decomposition, together with optimized inference code on GPUs. Specifically, we propose an ADMM-based training algorithm that can achieve highly accurate Tucker-format models. We also develop a high-performance kernel for Tucker-format convolutions and analytical performance models to guide the selection of execution parameters. We further propose a co-design framework to determine the proper Tucker ranks driven by practical inference time (rather than FLOPs). Our evaluation on five modern CNNs on an NVIDIA A100 GPU demonstrates that our compressed models with our optimized code achieve up to 2.21$\times$ speedup over cuDNN, 1.12$\times$ speedup over TVM, and 3.27$\times$ speedup over the original models using cuDNN, with at most 0.05% accuracy loss.
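To make the Tucker-format convolution mentioned in the abstract concrete, the following is a minimal NumPy sketch. It is not the authors' method: it uses a plain truncated HOSVD over the two channel modes (often called Tucker-2) instead of the ADMM-based training described above, and the function names, the kernel layout (C_out, C_in, kh, kw), and the stride-1, same-size-output MAC model are all assumptions made for illustration.

```python
import numpy as np


def tucker2_hosvd(kernel, rank_out, rank_in):
    """Truncated HOSVD on the channel modes of a conv kernel.

    kernel has the assumed layout (C_out, C_in, kh, kw).
    Returns (core, u_out, u_in) with core of shape (rank_out, rank_in, kh, kw).
    """
    c_out, c_in, _, _ = kernel.shape
    # Leading left singular vectors of the output-channel unfolding.
    u_out, _, _ = np.linalg.svd(kernel.reshape(c_out, -1), full_matrices=False)
    u_out = u_out[:, :rank_out]
    # Leading left singular vectors of the input-channel unfolding.
    unfold_in = np.moveaxis(kernel, 1, 0).reshape(c_in, -1)
    u_in, _, _ = np.linalg.svd(unfold_in, full_matrices=False)
    u_in = u_in[:, :rank_in]
    # Project the kernel onto both factor bases to obtain the core tensor.
    core = np.einsum("oihw,or,is->rshw", kernel, u_out, u_in)
    return core, u_out, u_in


def reconstruct(core, u_out, u_in):
    """Expand the Tucker-2 factors back to a dense (C_out, C_in, kh, kw) kernel."""
    return np.einsum("rshw,or,is->oihw", core, u_out, u_in)


def conv_macs(c_out, c_in, kh, kw, h, w):
    """Multiply-accumulates of a dense convolution (assumes stride 1, h x w output)."""
    return c_out * c_in * kh * kw * h * w


def tucker2_macs(c_out, c_in, kh, kw, h, w, rank_out, rank_in):
    """MACs of the three-stage form: 1x1 conv, kh x kw core conv, 1x1 conv."""
    return h * w * (c_in * rank_in + rank_in * rank_out * kh * kw + rank_out * c_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = rng.standard_normal((64, 64, 3, 3))
    core, u_out, u_in = tucker2_hosvd(k, rank_out=16, rank_in=16)
    approx = reconstruct(core, u_out, u_in)
    print("relative error:", np.linalg.norm(k - approx) / np.linalg.norm(k))
    print("dense MACs:   ", conv_macs(64, 64, 3, 3, 56, 56))
    print("Tucker-2 MACs:", tucker2_macs(64, 64, 3, 3, 56, 56, 16, 16))
```

Under these assumptions the arithmetic count drops by roughly 8.5$\times$ for the example layer, which is exactly the kind of FLOPs reduction that, per the abstract, does not automatically become an inference-time reduction on GPUs.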

Lizhi Xiang

University of Utah

Miao Yin

Rutgers University

Chengming Zhang

Indiana University

Aravind Sukumaran-Rajam

Saday Sadayappan

University of Utah, USA

Bo Yuan

Rutgers University

Dingwen Tao