POSTER: AArch64 Atomics: Might they be harming your performance? (PPoPP 2023 - Main Conference) - PPoPP 2023

Sat 25 February - Wed 1 March 2023 Montreal, Canada

Who

Ricardo Jesus, Michele Weiland

Track

PPoPP 2023 Main Conference

Time Zone

The program is currently displayed in (GMT-05:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 26 Feb 2023 18:00 - 20:00 at Salon Ville-Marie - Reception and Poster Session

Abstract

Atomic operations are indivisible operations guaranteed to execute as a whole. One of the most important and widely used atomic operations is “compare-and-swap” (CAS), which allows threads to perform concurrent read-modify-write operations on the same memory location, free of data races. On recent Arm architectures, CAS operations can be implemented either directly via CAS instructions, or via load-linked/store-conditional (LL-SC) instruction pairs.
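As a minimal sketch of the read-modify-write pattern described above, the following C++ fragment builds an atomic add out of a CAS retry loop. The helper names `cas_add` and `run_counter` are hypothetical and chosen only for illustration; they do not come from the poster. How the compiler lowers `compare_exchange_weak` on AArch64 (a CAS instruction or an LL-SC pair) depends on the toolchain and target flags.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not from the poster): atomically add `delta` to
// `counter` using a CAS retry loop. Returns the number of failed attempts.
long cas_add(std::atomic<long>& counter, long delta) {
    long failures = 0;
    long expected = counter.load(std::memory_order_relaxed);
    // compare_exchange_weak stores expected + delta only if the counter still
    // holds `expected`. On failure it reloads `expected` with the current
    // value, so the loop simply retries with fresh data.
    while (!counter.compare_exchange_weak(expected, expected + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        ++failures;
    }
    return failures;
}

// Hypothetical driver: several threads update the same location concurrently.
// Returns the final counter value.
long run_counter(int num_threads, long iters_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters_per_thread] {
            for (long i = 0; i < iters_per_thread; ++i) {
                cas_add(counter, 1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Because every update goes through a successful CAS, no increment is lost: `run_counter(4, 10000)` yields 40000 regardless of how the threads interleave.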

In this work we explore the performance of the CAS and LL-SC approaches to implementing CAS operations on recent high-performance AArch64 CPUs, namely the A64FX, ThunderX2 (TX2), and Graviton3. We observe that these instructions can lead to fundamentally different performance profiles. On A64FX, for example, the newer CAS instructions (often preferred by compilers over the older LL-SC pairs) can lead to a quadratic increase in average time per successful CAS operation as the number of threads increases, whereas the older LL-SC pairs show the expected linear increase. For high thread counts, this translates into LL-SC being more than 20x faster than CAS. On TX2 and Graviton3, LL-SC can bring more conservative (but still significant) 2–3x speedups.
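The metric discussed above, average time per successful CAS operation as a function of thread count, can be approximated with a small microbenchmark. The sketch below is an assumption about how such a measurement might be set up, not the authors' benchmark; the function name `avg_ns_per_successful_cas` is hypothetical. With GCC, the instruction sequence is typically selected at compile time: `-march=armv8.1-a` enables the LSE CAS instructions, while `-march=armv8-a -mno-outline-atomics` keeps inline LL-SC loops. Treat these flags as an assumption to verify against your own toolchain and its generated assembly.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical microbenchmark (not the poster's code): all threads contend on
// one location, and each performs `ops_per_thread` successful CAS updates.
// Returns wall-clock nanoseconds divided by the total number of successes.
double avg_ns_per_successful_cas(int num_threads, long ops_per_thread) {
    std::atomic<long> shared{0};
    std::atomic<bool> go{false};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&shared, &go, ops_per_thread] {
            // Wait for the start signal so threads begin contending together.
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            for (long i = 0; i < ops_per_thread; ++i) {
                long expected = shared.load(std::memory_order_relaxed);
                // Retry until this thread's CAS succeeds; a failed attempt
                // refreshes `expected` with the current value.
                while (!shared.compare_exchange_strong(
                           expected, expected + 1,
                           std::memory_order_acq_rel,
                           std::memory_order_relaxed)) {
                }
            }
        });
    }

    auto start = std::chrono::steady_clock::now();
    go.store(true, std::memory_order_release);
    for (auto& w : workers) {
        w.join();
    }
    auto stop = std::chrono::steady_clock::now();

    double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    double total_ops = static_cast<double>(num_threads) *
                       static_cast<double>(ops_per_thread);
    return total_ns / total_ops;
}
```

Calling this function for increasing values of `num_threads`, once per build variant, gives a rough picture of how each approach scales under contention. Thread pinning, warm-up runs, and repeated trials are omitted here and would matter for any serious measurement.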

Ricardo Jesus

EPCC, The University of Edinburgh

Michele Weiland

EPCC, The University of Edinburgh

Session Program

Sun 26 Feb
Displayed time zone: Eastern Time (US & Canada)

	18:00 - 20:00	Reception and Poster Session, Main Conference, at Salon Ville-Marie

	18:00 2h Poster		POSTER: Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU Main Conference Muhammad Osama University of California, Davis, Duane Merrill NVIDIA Corporation, Cris Cecka NVIDIA Corporation, Michael Garland NVIDIA, John D. Owens University of California, Davis Pre-print
	18:00 2h Poster		POSTER: Unexpected Scaling in Path Copying Trees Main Conference Vitaly Aksenov Inria & ITMO University, Trevor Brown University of Toronto, Alexander Fedorov IST Austria, Ilya Kokorin ITMO University
	18:00 2h Poster		POSTER: Transactional Composition of Nonblocking Data Structures Main Conference Wentao Cai University of Rochester, Haosen Wen University of Rochester, Michael L. Scott University of Rochester
	18:00 2h Poster		POSTER: The ERA Theorem for Safe Memory Reclamation Main Conference Gali Sheffi Technion - Israel, Erez Petrank Technion
	18:00 2h Poster		POSTER: AArch64 Atomics: Might they be harming your performance? Main Conference Ricardo Jesus EPCC, The University of Edinburgh, Michele Weiland EPCC, The University of Edinburgh
	18:00 2h Poster		POSTER: Fast Parallel Exact Inference on Bayesian Networks Main Conference Jiantong Jiang The University of Western Australia, Zeyi Wen The Hong Kong University of Science and Technology (Guangzhou), Atif Mansoor The University of Western Australia, Ajmal Mian The University of Western Australia
	18:00 2h Poster		POSTER: High-Throughput GPU Random Walk with Fine-tuned Concurrent Query Processing Main Conference Cheng Xu Shanghai Jiao Tong University, Chao Li Shanghai Jiao Tong University, Pengyu Wang Shanghai Jiao Tong University, Xiaofeng Hou Hong Kong University of Science and Technology, Jing Wang Shanghai Jiao Tong University, Shixuan Sun National University of Singapore, Minyi Guo Shanghai Jiao Tong University, Hanqing Wu Alibaba Inc, Dongbai Chen Alibaba Inc, Xiangwen Liu Alibaba Inc
	18:00 2h Poster		POSTER: Efficient All-reduce for Distributed DNN Training in Optical Interconnect Systems Main Conference Fei Dai University of Otago, Yawen Chen University of Otago, Zhiyi Huang University of Otago, Haibo Zhang University of Otago, Fangfang Zhang Qilu University of Technology
	18:00 2h Poster		POSTER: CuPBoP: A framework to make CUDA portable Main Conference Ruobing Han Georgia Institute of Technology, Jun Chen Georgia Institute of Technology, Bhanu Garg Georgia Institute of Technology, Jeffrey Young Georgia Institute of Technology, Jaewoong Sim Seoul National University, Hyesoon Kim Georgia Tech
	18:00 2h Poster		POSTER: Generating Fast FFT Kernels on CPUs via FFT-Specific Intrinsics Main Conference Zhihao Li SKLP, Institute of Computing Technology, CAS, Haipeng Jia SKLP, Institute of Computing Technology, CAS, Yunquan Zhang SKLP, Institute of Computing Technology, CAS, Yuyan Sun Huawei Technologies Co., Ltd, Yiwei Zhang SKLP, Institute of Computing Technology, CAS, Tun Chen SKLP, Institute of Computing Technology, CAS
	18:00 2h Poster		POSTER: Learning to Parallelize in a Shared-Memory Environment with Transformers Main Conference Re'em Harel, Yuval Pinter, Gal Oren Technion - Israel Institute of Technology Pre-print