Vectorization of a spectral finite-element numerical kernel (Application)
In this paper, we present an optimized implementation of a finite-element method (FEM) numerical kernel for SIMD vectorization. A typical application is the modeling of seismic wave propagation. In this case, the element-level computations are generally based on nested loops whose memory accesses are non-contiguous. Moreover, the back-and-forth transfers between the element level and the global level (e.g., during the assembly phase) are a serious obstacle both to automatic vectorization by compilers and to efficient data reuse in the cache hierarchy. This is particularly true when the problem under study relies on an unstructured mesh.
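The gather, element-level compute, and scatter (assembly) pattern described above can be sketched in scalar C++ as follows. This is a minimal illustration under stated assumptions, not the EFISPEC code: the names `ibool`, `NGLL`, `D`, and `internal_forces_scalar` are hypothetical, the element is a 1D three-node element, and the dense operator `D` is a placeholder for the real 3D elastodynamic internal-force computation. The indirect reads and writes through `ibool` are the non-contiguous accesses, and the `+=` into shared global nodes is what prevents a compiler from safely processing several elements at once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical size: 3 nodes per element (1D, degree-2 spectral element).
constexpr std::size_t NGLL = 3;

// Hypothetical local-to-global connectivity:
// ibool[e][k] is the global index of local node k of element e.
using Connectivity = std::vector<std::array<std::size_t, NGLL>>;

// Gather -> element-level compute -> scatter (assembly), one element at a time.
// D is a placeholder local operator, not the real internal-force computation.
void internal_forces_scalar(const Connectivity& ibool,
                            const std::array<std::array<double, NGLL>, NGLL>& D,
                            const std::vector<double>& u_glob,
                            std::vector<double>& f_glob) {
  for (const auto& nodes : ibool) {
    // Gather: indirect, non-contiguous reads from the global array.
    std::array<double, NGLL> u_loc{};
    for (std::size_t k = 0; k < NGLL; ++k) {
      assert(nodes[k] < u_glob.size());
      u_loc[k] = u_glob[nodes[k]];
    }
    // Element-level compute: small dense nested loops.
    std::array<double, NGLL> f_loc{};
    for (std::size_t i = 0; i < NGLL; ++i)
      for (std::size_t j = 0; j < NGLL; ++j)
        f_loc[i] += D[i][j] * u_loc[j];
    // Scatter (assembly): a node shared by neighbouring elements receives
    // contributions from each of them, so these indirect += writes would
    // conflict if several elements were processed in SIMD lanes at once.
    for (std::size_t k = 0; k < NGLL; ++k)
      f_glob[nodes[k]] += f_loc[k];
  }
}
```

With two elements `{0,1,2}` and `{2,3,4}` and `D` set to the identity, global node 2 accumulates contributions from both elements, which is exactly the write dependency that blocks element-wise vectorization.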
The experiments we have carried out on EFISPEC, a code that implements the spectral finite-element method to solve the elastodynamic equations, show that its intra-node performance may be further improved. We note that standard compilers such as GNU GCC, Clang, and Intel ICC are unable to vectorize this code automatically, even when the nested loops are reorganized or SIMD pragmas are added.
We extracted the internal-force computation kernel, which accounts for 90% of the execution time, into a proxy application written in C++. Because of the irregular memory access pattern, we introduce a dedicated strategy to extract maximum performance from the SIMD units. Experiments are carried out on Intel Broadwell and Skylake platforms, which offer AVX2 and AVX-512 SIMD units, respectively. We believe our vectorization approach is generic enough that it could be adapted to other codes.
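The abstract does not detail the dedicated strategy itself. Purely as an illustration, the sketch below shows one widely used way to expose SIMD parallelism in such kernels, which is not necessarily the authors' method: process a batch of `W` elements in lockstep, gather them into an interleaved `[node][lane]` buffer so that the innermost loop runs over elements with unit stride, and then scatter lane by lane so that shared nodes still accumulate correctly. The width `W`, the element size `NGLL`, the operator `D`, and the function name `internal_forces_batched` are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NGLL = 3;  // nodes per element (hypothetical 1D degree-2 element)
constexpr std::size_t W = 4;     // lanes per batch (hypothetical: 4 doubles for AVX2, 8 for AVX-512)

using Connectivity = std::vector<std::array<std::size_t, NGLL>>;
using LocalOp = std::array<std::array<double, NGLL>, NGLL>;

void internal_forces_batched(const Connectivity& ibool, const LocalOp& D,
                             const std::vector<double>& u_glob,
                             std::vector<double>& f_glob) {
  assert(f_glob.size() == u_glob.size());
  const std::size_t nelem = ibool.size();
  for (std::size_t e0 = 0; e0 < nelem; e0 += W) {
    const std::size_t nlanes = (nelem - e0 < W) ? nelem - e0 : W;
    // Gather W elements into an interleaved [node][lane] buffer;
    // lanes beyond nlanes (last, partial batch) stay zero.
    double u_loc[NGLL][W] = {};
    for (std::size_t l = 0; l < nlanes; ++l)
      for (std::size_t k = 0; k < NGLL; ++k)
        u_loc[k][l] = u_glob[ibool[e0 + l][k]];
    // Compute: the innermost loop over lanes is unit-stride and branch-free,
    // so it maps directly onto W-wide SIMD registers. The pragma is only a hint
    // (honoured with -fopenmp or -fopenmp-simd, otherwise ignored).
    double f_loc[NGLL][W] = {};
    for (std::size_t i = 0; i < NGLL; ++i)
      for (std::size_t j = 0; j < NGLL; ++j) {
#pragma omp simd
        for (std::size_t l = 0; l < W; ++l)
          f_loc[i][l] += D[i][j] * u_loc[j][l];
      }
    // Scatter lane by lane (scalar) so that elements of the same batch that
    // share a global node accumulate into it instead of overwriting each other.
    for (std::size_t l = 0; l < nlanes; ++l)
      for (std::size_t k = 0; k < NGLL; ++k)
        f_glob[ibool[e0 + l][k]] += f_loc[k][l];
  }
}
```

The batch width `W` is the main tuning knob in this design. The serial scatter keeps assembly correct but can become a bottleneck; mesh colouring, where each batch contains only elements that share no global nodes, is a common alternative.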