PPoPP 2023
Sat 25 February - Wed 1 March 2023 Montreal, Canada
Tue 28 Feb 2023 08:30 - 09:30 at Montreal 3-4-5 - PPoPP Keynote

Machine learning models are growing at an unprecedented pace, and training such models requires large distributed GPU systems with costly collective communication primitives such as AllReduce or AllToAll. Recent trends in large language models suggest that 30-70% of both training and inference time is spent on communication. Despite major leaps in hardware innovation for GPU communication, substantial software performance is still left on the table. This gap is largely due to the lack of a performant point-to-point (P2P) communication abstraction on GPUs. HPC applications running on CPUs use performant implementations of the MPI abstraction to control communication at a fine-grained level; however, no comparably performant abstraction exists on GPUs.
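To illustrate the kind of fine-grained control the abstract refers to: a collective such as AllReduce can be decomposed into a pipeline of P2P exchanges between ring neighbors (the standard ring algorithm, with a reduce-scatter phase followed by an all-gather phase). The sketch below is a single-process Python simulation of that decomposition, not an MPI, NCCL, or GPU implementation; the function name and the sequential "send" loop are illustrative assumptions.

```python
def ring_allreduce(buffers):
    """Simulate a ring AllReduce over n 'ranks' built purely from
    neighbor-to-neighbor P2P transfers (single-process model)."""
    n = len(buffers)                     # number of simulated ranks
    size = len(buffers[0])
    assert size % n == 0, "vector length must split evenly into n chunks"
    c = size // n                        # elements per chunk
    chunks = [list(b) for b in buffers]  # per-rank working copies

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n
    # to its ring neighbor r+1, which accumulates it. After n-1 steps,
    # rank r holds the fully reduced chunk (r + 1) mod n.
    for s in range(n - 1):
        for r in range(n):
            dst, idx = (r + 1) % n, (r - s) % n
            for i in range(idx * c, (idx + 1) * c):
                chunks[dst][i] += chunks[r][i]

    # Phase 2: all-gather. In step s, rank r forwards its fully reduced
    # chunk (r + 1 - s) mod n to r+1, which overwrites its stale copy.
    for s in range(n - 1):
        for r in range(n):
            dst, idx = (r + 1) % n, (r + 1 - s) % n
            for i in range(idx * c, (idx + 1) * c):
                chunks[dst][i] = chunks[r][i]

    return chunks

# Three ranks, each holding a length-3 vector; every rank ends up with
# the element-wise sum.
print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# -> [[12, 15, 18], [12, 15, 18], [12, 15, 18]]
```

Each of the 2(n-1) steps moves only one chunk per rank between neighbors, which is exactly the per-transfer control a P2P abstraction exposes and a monolithic collective call hides.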

In this talk, we will discuss how much performance headroom remains for collective communication on GPUs and the challenges in realizing it. We will also examine existing abstractions for communication primitives and how they limit the space of parallelization configurations for AI workloads. We will then look at the challenges of overlapping communication with computation on GPUs. We will conclude by proposing an abstraction for GPU communication and showing how effective it is in closing these performance gaps.

Tue 28 Feb

Displayed time zone: Eastern Time (US & Canada)

08:30 - 09:30
PPoPP Keynote at Montreal 3-4-5
GPU Communication Requires Rethinking Abstractions
Saeed Maleki, Microsoft Research