ASE 2025
HybridSIMD: A Super C++ SIMD Library with Integrated Auto-tuning Capabilities
Haolin Pan, Xulin Zhou, Mingjie Xing, Yanjun Wu
Abstract
Single Instruction, Multiple Data (SIMD) technology is crucial for computational efficiency in High-Performance Computing (HPC). C++ SIMD libraries abstract away low-level complexity, but their proliferation has fragmented the ecosystem, creating significant performance and usability challenges for developers. To overcome these library-level limitations, this paper introduces a collaborative concept for SIMD library design. We present HybridSIMD, a C++ library that embodies this principle. It resolves fragmentation through a unified interface and an operator-level collaborative back-end that draws on the collective strengths of existing libraries. A built-in auto-tuning engine with a hierarchical search strategy automatically navigates the large optimization space this collaborative approach creates, delivering high performance without manual intervention. Experimental results on six real-world HPC benchmarks across AVX2, AVX512, and NEON architectures demonstrate HybridSIMD's effectiveness: the highest speedups achieved are 185.34× on AVX2, 97.80× on AVX512, and 71.32× on NEON, showing that it resolves fragmentation while delivering state-of-the-art performance. Our artifact is available at https://github.com/Panhaolin2001/HybridSIMD.
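To make the idea of operator-level selection concrete, the following is a minimal sketch and not HybridSIMD's actual API. All names (`Candidate`, `pick_best`, `time_candidate`, `backend_a`, `backend_b`) are hypothetical, and the two kernels are plain C++ stand-ins for implementations that different SIMD libraries would supply. The sketch covers only a flat choice for a single operator; it does not attempt to reproduce the hierarchical search strategy, whose details the abstract does not give.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not taken from HybridSIMD.
using AddKernel =
    std::function<void(const float*, const float*, float*, std::size_t)>;

struct Candidate {
    std::string backend;  // label for the library assumed to supply this operator
    AddKernel run;        // that library's implementation of element-wise add
};

// Stand-in for one library's element-wise add (simple loop).
inline void add_plain(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Stand-in for another library's add (4-way unrolled, with a scalar tail).
inline void add_unrolled4(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// Returns the index of the candidate with the lowest cost.
// The cost function is supplied by the caller; returns 0 for an empty list.
template <typename CostFn>
std::size_t pick_best(const std::vector<Candidate>& cands, CostFn cost) {
    std::size_t best = 0;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < cands.size(); ++i) {
        const double c = cost(cands[i]);
        if (c < best_cost) {
            best_cost = c;
            best = i;
        }
    }
    return best;
}

// One possible cost: wall-clock seconds for `reps` runs on n-element inputs.
inline double time_candidate(const Candidate& c, std::size_t n, int reps) {
    std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n, 0.0f);
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) c.run(a.data(), b.data(), out.data(), n);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// The candidate set for the "add" operator in this sketch.
inline std::vector<Candidate> default_candidates() {
    return {{"backend_a", add_plain}, {"backend_b", add_unrolled4}};
}
```

A caller would run `pick_best(default_candidates(), ...)` once per operator, passing a lambda that wraps `time_candidate`, and then dispatch every later call of that operator to the winning candidate. Keeping the cost function as a parameter separates the selection logic from the measurement, so the same selection can be driven by timings, hardware counters, or a model.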