Vectorization of a spectral finite-element numerical kernel (Application) (WPMVP 2018 - Workshop on Programming Models for SIMD/Vector Processing)

Who

Sylvain Jubertie, Fabrice Dupros, Florent De Martin

Track

WPMVP 2018


When

Sat 24 Feb 2018 09:15 - 09:45 (GMT+01:00) at Europa 5 - WPMVP 2018 Session 1

Abstract

In this paper, we present an optimized implementation of a finite-element method (FEM) numerical kernel for SIMD vectorization. A typical application is the modeling of seismic wave propagation. In this case, the computations at the element level are generally based on nested loops with non-contiguous memory accesses. Moreover, the back and forth between the element level and the global level (e.g., the assembly phase) is a serious obstacle both to automatic vectorization by compilers and to efficient data reuse in the cache hierarchy. This is particularly true when the problem under study relies on an unstructured mesh.
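To make this access pattern concrete, here is a minimal, hypothetical C++ sketch of a spectral-element internal-force loop; it is not the EFISPEC code or the authors' proxy application. It assumes 5 Gauss-Lobatto-Legendre points per direction (`NGLL`), a local-to-global index table `ibool`, a 1D derivative matrix `hprime`, a single displacement component, and only the derivative along one reference direction; all of these names are illustrative. The gather reads `u` through `ibool`, and the scatter-add (assembly) writes `f` through it as well. Because neighboring elements share global nodes, two iterations of the element loop may update the same entry of `f`, which is one reason a compiler cannot safely vectorize across elements here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical basis size: NGLL points per direction, NPTS per hexahedral element.
constexpr std::size_t NGLL = 5;
constexpr std::size_t NPTS = NGLL * NGLL * NGLL;

// Toy internal-force kernel: gather, local derivative along xi, scatter-add.
// Local point (i, j, k) is stored at index (k * NGLL + j) * NGLL + i.
void internal_forces(const std::vector<std::size_t>& ibool,  // nelem * NPTS local-to-global map
                     const std::vector<float>& hprime,       // NGLL x NGLL derivative matrix
                     const std::vector<float>& u,            // global field (one component)
                     std::vector<float>& f,                  // global force accumulator
                     std::size_t nelem) {
  assert(ibool.size() >= nelem * NPTS);
  assert(hprime.size() >= NGLL * NGLL);
  float ul[NPTS], dl[NPTS];
  for (std::size_t e = 0; e < nelem; ++e) {
    const std::size_t* idx = &ibool[e * NPTS];
    // Gather: indirect, non-contiguous reads from the global array.
    for (std::size_t p = 0; p < NPTS; ++p) ul[p] = u[idx[p]];
    // Local derivative: dl(i,j,k) = sum_l hprime(i,l) * ul(l,j,k).
    for (std::size_t k = 0; k < NGLL; ++k)
      for (std::size_t j = 0; j < NGLL; ++j)
        for (std::size_t i = 0; i < NGLL; ++i) {
          float s = 0.0f;
          for (std::size_t l = 0; l < NGLL; ++l)
            s += hprime[i * NGLL + l] * ul[(k * NGLL + j) * NGLL + l];
          dl[(k * NGLL + j) * NGLL + i] = s;
        }
    // Scatter-add (assembly): shared nodes receive contributions from several elements.
    for (std::size_t p = 0; p < NPTS; ++p) f[idx[p]] += dl[p];
  }
}
```

With an identity `hprime` and `u` set to 1, each entry of `f` simply counts how many elements share that node, which makes the aliasing in the assembly step easy to observe.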

The experiments we have carried out on the EFISPEC code, which implements the spectral finite-element method to solve the elastodynamic equations, show that its intra-node performance can be further improved. We emphasize that standard compilers such as GNU GCC, Clang and Intel ICC are unable to vectorize this code automatically, even when the nested loops are reorganized or SIMD pragmas are added.

We have extracted the internal forces computation kernel, which accounts for 90% of the execution time, into a proxy application written in C++. Because of the irregular memory access pattern, we introduce a dedicated strategy to get the most performance out of the SIMD units. Experiments are carried out on Intel Broadwell and Skylake platforms, which offer AVX2 and AVX-512 SIMD units, respectively. We believe that our vectorization approach may be generic enough to be adapted to other codes.
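The abstract does not detail the authors' strategy. As a hedged illustration only, one common way to expose SIMD parallelism in such kernels is to vectorize across elements: a pack of `W` elements is gathered into a lane-interleaved buffer so that the innermost loop runs over elements with unit stride. The sketch assumes `W = 8` single-precision lanes (one 256-bit AVX2 register; 16 would fill an AVX-512 register), `nelem` a multiple of `W`, and the same hypothetical `NGLL`, `ibool` and `hprime` names, computing the derivative along one direction only. The assembly stays scalar because elements in one pack may share global nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NGLL = 5;
constexpr std::size_t NPTS = NGLL * NGLL * NGLL;
constexpr std::size_t W = 8;  // hypothetical lane count: 8 floats per 256-bit AVX2 register

// Derivative along xi for a pack of W elements stored lane-interleaved:
// ul[p * W + w] is local point p of element w. The w loops are unit-stride
// and independent, so they map naturally onto SIMD lanes.
void dxi_pack(const float* hprime, const float* ul, float* dl) {
  for (std::size_t k = 0; k < NGLL; ++k)
    for (std::size_t j = 0; j < NGLL; ++j)
      for (std::size_t i = 0; i < NGLL; ++i) {
        float s[W] = {};
        for (std::size_t l = 0; l < NGLL; ++l) {
          const float h = hprime[i * NGLL + l];
          const float* src = &ul[((k * NGLL + j) * NGLL + l) * W];
#pragma omp simd
          for (std::size_t w = 0; w < W; ++w) s[w] += h * src[w];
        }
        float* dst = &dl[((k * NGLL + j) * NGLL + i) * W];
        for (std::size_t w = 0; w < W; ++w) dst[w] = s[w];
      }
}

void internal_forces_packed(const std::vector<std::size_t>& ibool,
                            const std::vector<float>& hprime,
                            const std::vector<float>& u, std::vector<float>& f,
                            std::size_t nelem) {
  assert(nelem % W == 0);  // simplification: no remainder loop
  assert(ibool.size() >= nelem * NPTS);
  std::vector<float> ul(NPTS * W), dl(NPTS * W);
  for (std::size_t e0 = 0; e0 < nelem; e0 += W) {
    // Gather W elements and transpose them into the lane-interleaved layout.
    for (std::size_t w = 0; w < W; ++w)
      for (std::size_t p = 0; p < NPTS; ++p)
        ul[p * W + w] = u[ibool[(e0 + w) * NPTS + p]];
    dxi_pack(hprime.data(), ul.data(), dl.data());
    // Scalar scatter-add: lanes of the same pack may update the same node.
    for (std::size_t w = 0; w < W; ++w)
      for (std::size_t p = 0; p < NPTS; ++p)
        f[ibool[(e0 + w) * NPTS + p]] += dl[p * W + w];
  }
}
```

The `#pragma omp simd` hint is honored only with `-fopenmp` or `-fopenmp-simd` and is otherwise ignored. The transposing gather is the price paid for contiguous inner loops, so whether this layout pays off on a given platform has to be measured.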

Sylvain Jubertie

Laboratoire d'Informatique Fondamentale d'Orleans

Fabrice Dupros

BRGM

Florent De Martin