DNN models have grown rapidly in size. To train a large model, pipeline parallelism-based frameworks partition the model across GPUs and slice each batch of data into multiple micro-batches. However, pipeline parallelism suffers from pipeline bubbles and low peak GPU utilization. Recent work tries to address these two issues, but fails to exploit the main benefit of vanilla pipeline parallelism, i.e., overlapping communication with computation. In this work, we employ an elastic averaging-based framework that uses elastic averaging to add multiple parallel pipelines. To help the framework exploit the advantage of pipeline parallelism while reducing the memory footprint, we propose a schedule called advance forward propagation. Moreover, since the number of parallel pipelines and the number of micro-batches are essential to the framework's performance, we propose a profiling-based tuning method to determine these settings automatically. We integrate these techniques into a prototype system, AvgPipe, built on PyTorch. Our experiments show that AvgPipe achieves an average speedup of 1.7x over state-of-the-art pipeline parallelism solutions.
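To make the micro-batch and bubble trade-off mentioned in the abstract concrete, the following is a minimal sketch rather than anything taken from AvgPipe. It assumes a simple synchronous, GPipe-style schedule in which every stage takes the same time per micro-batch, so the idle (bubble) fraction is (p - 1) / (m + p - 1) for p stages and m micro-batches. The function names `split_into_micro_batches` and `bubble_fraction` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def split_into_micro_batches(batch, num_micro_batches):
    """Slice a batch (any sliceable sequence) into contiguous micro-batches.

    Sizes differ by at most one element; earlier micro-batches get the extras.
    """
    if num_micro_batches <= 0:
        raise ValueError("num_micro_batches must be positive")
    size, remainder = divmod(len(batch), num_micro_batches)
    micro_batches = []
    start = 0
    for i in range(num_micro_batches):
        end = start + size + (1 if i < remainder else 0)
        micro_batches.append(batch[start:end])
        start = end
    return micro_batches


def bubble_fraction(num_stages, num_micro_batches):
    """Idle fraction of an idealized synchronous pipeline schedule.

    Assumption (not from the source): uniform per-stage time and a
    GPipe-style schedule, giving (p - 1) / (m + p - 1).
    """
    if num_stages <= 0 or num_micro_batches <= 0:
        raise ValueError("both arguments must be positive")
    return Fraction(num_stages - 1, num_micro_batches + num_stages - 1)


if __name__ == "__main__":
    print(split_into_micro_batches(list(range(10)), 4))
    for m in (4, 16, 64):
        print(m, float(bubble_fraction(4, m)))
```

Under these assumptions, more micro-batches shrink the bubble but also make each micro-batch smaller, which is one reason the number of micro-batches is a tuning knob rather than a value to maximize.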
Wed 1 Mar. Displayed time zone: Eastern Time (US & Canada)
10:00 - 11:40 | Session 7: Machine Learning | Main Conference at Montreal 4 | Chair(s): Milind Kulkarni (Purdue University)
10:00 | 20m | Talk | TGOpt: Redundancy-Aware Optimizations for Temporal Graph Attention Networks | Main Conference | Yufeng Wang (University of Illinois at Urbana-Champaign), Charith Mendis (University of Illinois at Urbana-Champaign)
10:20 | 20m | Talk | Dynamic N:M Fine-grained Structured Sparse Attention Mechanism | Main Conference | Zhaodong Chen (University of California, Santa Barbara), Zheng Qu (University of California, Santa Barbara), Yuying Quan (University of California, Santa Barbara), Liu Liu, Yufei Ding (UC Santa Barbara), Yuan Xie (UCSB)
10:40 | 20m | Talk | Elastic Averaging for Efficient Pipelined DNN Training | Main Conference | Zihao Chen (East China Normal University), Chen Xu (East China Normal University), Weining Qian (East China Normal University), Aoying Zhou (East China Normal University)
11:00 | 20m | Talk | DSP: Efficient GNN Training with Multiple GPUs | Main Conference | Zhenkun Cai (The Chinese University of Hong Kong), Qihui Zhou (The Chinese University of Hong Kong), Xiao Yan (Southern University of Science and Technology), Da Zheng (Amazon Web Services), Xiang Song (Amazon Web Services), Chenguang Zheng (The Chinese University of Hong Kong), James Cheng (The Chinese University of Hong Kong), George Karypis (Amazon Web Services)
11:20 | 20m | Talk | PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs | Main Conference