Small SIMD Matrices for CERN High Throughput Computing
System tracking is an old problem that has been heavily optimized over the years. In High Energy Physics, however, many small systems are tracked in real time using Kalman filtering, and no existing implementation satisfies those constraints. In this paper, we present a code generator used to speed up the Cholesky factorization and the Kalman filter for small matrices. The generator is easy to use and produces portable, heavily optimized code. We focus on current SIMD architectures (SSE, AVX, AVX512, Neon, SVE, Altivec and VSX). Our Cholesky factorization outperforms existing libraries: it is 3x to 10x faster than the MKL. The Kalman filter is also faster than existing implementations and achieves 4e9 iterations/s on a 2x24-core Intel Xeon.
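To make the idea of factorizing many small matrices at once more concrete, here is a minimal sketch of a batched Cholesky factorization in C++. It is not the generator's output and not the authors' method: the `BatchSoA` container, the `cholesky_batch` function and the structure-of-arrays layout are assumptions chosen for illustration. Each matrix element is stored contiguously across the batch, so the innermost loops run over independent systems and are candidates for compiler auto-vectorization.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical batch container (an assumption, not from the paper):
// element (i, j) of every matrix in the batch is stored contiguously,
// so the innermost loops iterate over independent systems.
template <std::size_t N>
struct BatchSoA {
    std::size_t batch;
    std::array<std::vector<float>, N * N> e;

    explicit BatchSoA(std::size_t b) : batch(b) {
        for (auto& v : e) v.assign(b, 0.0f);
    }

    // Pointer to element (i, j) for all systems in the batch.
    float* at(std::size_t i, std::size_t j) { return e[i * N + j].data(); }
};

// In-place Cholesky factorization A = L * L^T for every matrix in the batch.
// Only the lower triangle is read and written; inputs are assumed to be
// symmetric positive definite.
template <std::size_t N>
void cholesky_batch(BatchSoA<N>& a) {
    const std::size_t B = a.batch;
    for (std::size_t j = 0; j < N; ++j) {
        // Diagonal term: L_jj = sqrt(A_jj - sum_k L_jk^2)
        float* ajj = a.at(j, j);
        for (std::size_t k = 0; k < j; ++k) {
            const float* ljk = a.at(j, k);
            for (std::size_t b = 0; b < B; ++b) ajj[b] -= ljk[b] * ljk[b];
        }
        for (std::size_t b = 0; b < B; ++b) ajj[b] = std::sqrt(ajj[b]);

        // Below the diagonal: L_ij = (A_ij - sum_k L_ik * L_jk) / L_jj
        for (std::size_t i = j + 1; i < N; ++i) {
            float* aij = a.at(i, j);
            for (std::size_t k = 0; k < j; ++k) {
                const float* lik = a.at(i, k);
                const float* ljk = a.at(j, k);
                for (std::size_t b = 0; b < B; ++b) aij[b] -= lik[b] * ljk[b];
            }
            for (std::size_t b = 0; b < B; ++b) aij[b] /= ajj[b];
        }
    }
}
```

Because `N` is a template parameter, the loops over `i`, `j` and `k` have compile-time bounds and can be fully unrolled by the compiler, which is one plausible reason a generator-based approach suits small matrices. Whether the batch loops actually vectorize depends on the compiler and flags used.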
CERN SIMD slides (CERN-SIMD.pdf) | 1.43 MiB
Sat 24 Feb. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
10:30 - 12:00
10:30 | 30m | Talk | Small SIMD Matrices for CERN High Throughput Computing | WPMVP
11:00 | 30m | Talk | SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors | WPMVP | Christopher Rodrigues (Huawei America Research Lab), Amarin Phaosawasdi (Huawei America Research Lab), Peng Wu (Huawei America Research Lab)
11:30 | 30m | Talk | MIPP: a Portable C++ SIMD Wrapper and its use for Error Correction Coding in 5G Standard | WPMVP | Adrien Cassagne (INRIA), Olivier Aumage, Denis Barthou, Camille Leroux (INRIA), Christophe Jégo (IMS Lab - Institut Polytechnique de Bordeaux)