PPoPP 2023
Sat 25 February - Wed 1 March 2023 Montreal, Canada
Tue 28 Feb 2023 16:20 - 16:40 at Montreal 4 - Session 6: Kernels Chair(s): Martin Kong

This paper demonstrates that state-of-the-art proposals to compute convolutions on architectures with CPUs supporting SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. We first discuss how to adapt the state-of-the-art SIMD direct convolution to architectures using long SIMD instructions and analyze the implications of increasing the SIMD length on the algorithm formulation. Next, we propose two new algorithmic approaches: the Bounded Direct Convolution (BDC), which adapts the amount of computation exposed to mitigate cache misses, and the Multi-Block Direct Convolution (MBDC), which redefines the activation memory layout to improve the memory access pattern. We evaluate BDC, MBDC, the state-of-the-art technique, and a proprietary library on an architecture featuring CPUs with 16,384-bit SIMD registers using ResNet convolutions. Our results show that BDC and MBDC achieve respective speed-ups of 1.44× and 1.28× compared to the state-of-the-art technique for ResNet-101, and 1.83× and 1.63× compared to the proprietary library.
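For orientation, "direct convolution" means computing the convolution loops straight over the input tensor rather than first lowering it to a matrix multiplication. The sketch below is a minimal scalar baseline in Python and is not the authors' vectorized algorithm, BDC, or MBDC. The function names `simd_lanes` and `direct_conv2d`, the stride of 1, the absence of padding, and the 32-bit element width are all assumptions made for illustration; the abstract does not state the data type.

```python
def simd_lanes(register_bits, element_bits=32):
    """Number of elements of the given width that fit in one SIMD register.

    element_bits=32 is an assumption (single precision), not from the abstract.
    """
    return register_bits // element_bits


def direct_conv2d(inp, weights):
    """Naive scalar direct convolution (assumed: stride 1, no padding).

    inp:     nested lists shaped [C_in][H][W]
    weights: nested lists shaped [C_out][C_in][KH][KW]
    returns: nested lists shaped [C_out][H-KH+1][W-KW+1]
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    c_out, kh, kw = len(weights), len(weights[0][0]), len(weights[0][0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for co in range(c_out):              # output channels
        for ci in range(c_in):           # input channels
            for y in range(out_h):       # output rows
                for x in range(out_w):   # output columns
                    acc = 0.0
                    for ky in range(kh):
                        for kx in range(kw):
                            acc += inp[ci][y + ky][x + kx] * weights[co][ci][ky][kx]
                    out[co][y][x] += acc
    return out
```

Under the 32-bit assumption, `simd_lanes(16384)` gives 512 elements per register. That indicates how many independent outputs a vectorized formulation would need to keep in flight per instruction on such an architecture.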

Tue 28 Feb

Displayed time zone: Eastern Time (US & Canada)

15:40 - 16:40
Session 6: Kernels (Main Conference) at Montreal 4
Chair(s): Martin Kong The Ohio State University

15:40 (20m) Talk, Main Conference
iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures
Zhen Peng (William & Mary), Minjia Zhang (Microsoft Research), Kai Li (Kent State University), Ruoming Jin (Kent State University), Bin Ren (College of William & Mary)

16:00 (20m) Talk, Main Conference
WISE: Predicting the Performance of Sparse Matrix Vector Multiplication with Machine Learning
Serif Yesil (University of Illinois Urbana-Champaign), Azin Heidarshenas (University of Illinois Urbana-Champaign), Adam Morrison (Tel Aviv University), Josep Torrellas (University of Illinois at Urbana-Champaign)

16:20 (20m) Talk, Main Conference
Efficient Direct Convolution Using Long SIMD Instructions
Alexandre Santana (Barcelona Supercomputing Center), Adrià Armejach Sanosa (Barcelona Supercomputing Center & Universitat Politècnica de Catalunya), Marc Casas (Barcelona Supercomputing Center)