Boosting Performance and QoS for Concurrent GPU B+trees by Combining-based Synchronization
Concurrent B+trees have been widely used in many systems. With the scale of data requests increasing exponentially, the systems are facing tremendous performance pressure. GPU has shown its potential to accelerate concurrent B+trees performance. When many concurrent requests are processed, the conflicts should be detected and resolved. Prior methods guarantee the correctness of concurrent GPU B+trees through lock-based or software transactional memory (STM)-based approaches. However, these methods complicate the request processing logic, increase the number of memory accesses and bring execution path divergence. They lead to performance degradation and variance in response time increasing. Moreover, previous methods do not guarantee linearizability among concurrent requests. In this paper, we design a combined-based concurrency control framework, called Eirene, for GPU B+tree to reduce the overhead of conflict detection and resolution. First, a combining-based synchronization method is designed to combine and issue requests. It combines the requests with the same key, constructs their dependence, decides the issued request, and determines their return values. Since only one request for each key is issued, key conflicts are eliminated. Then, an optimistic STM method is used to reduce structure conflicts. The query and the update requests are partitioned into different kernels. For the update kernels, STM is involved only when the number of the retry reaches a threshold. Finally, a locality-aware warp reorganization optimization is proposed to improve memory behavior and reduce conflicts by exploiting the locality among requests. Evaluations on an NVIDIA A100 GPU show that Eirene is efficient (a throughput of 2.4 billion per second) and can guarantee linearizability. Compared to the state-of-the-art GPU B+tree, it can achieve a speedup of 7.43X and reduce the response time variance from 36% to 5%.
Mon 27 FebDisplayed time zone: Eastern Time (US & Canada) change
10:00 - 12:00 | |||
10:00 20mTalk | Boosting Performance and QoS for Concurrent GPU B+trees by Combining-based Synchronization Main Conference Weihua Zhang Fudan University, Chuanlei Zhao Fudan University, Lu Peng Tulane University, Yuzhe Lin Fudan University, Fengzhe Zhang Fudan University, Yunping Lu Fudan University | ||
10:20 20mTalk | The State-of-the-Art LCRQ Concurrent Queue Algorithm Does NOT Require CAS2 Main Conference | ||
10:40 20mTalk | Provably Good Randomized Strategies for Data Placement in Distributed Key-Value Stores Main Conference Zhe Wang Washington University in St. Louis, Jinhao Zhao Washington University in St. Louis, Kunal Agrawal Washington University in St. Louis, USA, Jing Li New Jersey Institute of Technology, He Liu Apple Inc., Meng Xu Apple Inc. | ||
11:00 20mTalk | 2PLSF: Two-Phase Locking with Starvation-Freedom Main Conference Pedro Ramalhete Cisco Systems, Andreia Correia University of Neuchâtel, Pascal Felber University of Neuchâtel | ||
11:20 20mTalk | Provably Fast and Space-Efficient Parallel Biconnectivity Main Conference Xiaojun Dong University of California, Riverside, Letong Wang University of California, Riverside, Yan Gu UC Riverside, Yihan Sun University of California, Riverside | ||
11:40 20mTalk | Practically and Theoretically Efficient Garbage Collection for Multiversioning Main Conference Yuanhao Wei Carnegie Mellon University, USA, Guy E. Blelloch Carnegie Mellon University, USA, Panagiota Fatourou FORTH ICS and University of Crete, Greece, Eric Ruppert York University |