Iterative Stencil Loops

Iterative stencil loops (ISLs), or stencil computations, are a class of numerical data processing solutions that update array elements according to a fixed pattern, called a stencil. They are most commonly found in computer simulations, e.g. for computational fluid dynamics in the context of scientific and engineering applications. Other notable examples include solving partial differential equations, the Jacobi kernel, the Gauss–Seidel method, image processing and cellular automata. The regular structure of the arrays sets stencil techniques apart from other modeling methods such as the finite element method. Most finite difference codes that operate on regular grids can be formulated as ISLs.

Definition
ISLs perform a sequence of sweeps (called timesteps) through a given array. More formally, we may define ISLs as a 5-tuple $(I, S, S_0, s, T)$ with the following meaning:
• $I = \prod_{i=1}^k [0, \ldots, n_i]$ is the index set. It defines the topology of the array.
• $S$ is the (not necessarily finite) set of states. At any given timestep, each cell takes on one of these states.
• $S_0\colon \mathbb{Z}^k \to S$ defines the initial state of the system at time 0.
• $s \in \prod_{i=1}^l \mathbb{Z}^k$ is the stencil itself and describes the actual shape of the neighborhood. There are $l$ elements in the stencil.
• $T\colon S^l \to S$ is the transition function. It determines a cell's new state from the states of its neighbors.

Since $I$ is a $k$-dimensional integer interval, the array always has the topology of a finite regular grid. The array is also called the simulation space, and individual cells are identified by their index $c \in I$. The stencil is an ordered set of $l$ relative coordinates. We can now obtain for each cell $c$ the tuple of its neighbors' indices $I_c$:

$$I_c = \{j \mid \exists x \in s: j = c + x\}$$

Their states are given by mapping the tuple $I_c$ to the corresponding tuple of states $N_i(c)$, where $N_i\colon I \to S^l$ is defined as follows:

$$N_i(c) = (s_1, \ldots, s_l) \text{ with } s_j = S_i(I_c(j))$$

This is all we need to define the system's state for the following timesteps $S_{i+1}\colon \mathbb{Z}^k \to S$ with $i \in \mathbb{N}$:

$$S_{i+1}(c) = \begin{cases} T(N_i(c)), & c \in I \\ S_i(c), & c \in \mathbb{Z}^k \setminus I \end{cases}$$

Note that $S_i$ is defined on $\mathbb{Z}^k$ and not just on $I$, because the boundary conditions need to be set, too.

Sometimes the elements of $I_c$ are defined by a vector addition modulo the extent of the simulation space in each dimension, which realizes a toroidal topology:

$$I_c = \{j \mid \exists x \in s: j = ((c + x) \bmod (n_1 + 1, \ldots, n_k + 1))\}$$

This can be useful for implementing periodic boundary conditions, which simplify certain physical models.
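The definition maps almost line by line onto code. The following Python sketch is one possible reading of it, not a reference implementation. Its assumptions:

- The helper names `index_set` and `isl_step` are hypothetical.
- States are restricted to floats.
- $S_i$ on $I$ is stored in a dictionary keyed by index tuples.
- Cells outside $I$ never change, so their values come from a `boundary` function that plays the role of $S_0$ on $\mathbb{Z}^k \setminus I$.
- Setting `periodic=True` wraps each coordinate modulo $n_d + 1$, the extent of dimension $d$, which gives the toroidal variant.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence, Tuple

Index = Tuple[int, ...]
State = Dict[Index, float]  # S_i restricted to the index set I (assumed to cover all of I)


def index_set(n: Sequence[int]) -> Iterator[Index]:
    """Enumerate I = [0, ..., n_1] x ... x [0, ..., n_k] (hypothetical helper)."""
    return product(*(range(nd + 1) for nd in n))


def isl_step(
    values: State,
    boundary: Callable[[Index], float],
    n: Sequence[int],
    stencil: Sequence[Index],
    T: Callable[[Tuple[float, ...]], float],
    periodic: bool = False,
) -> State:
    """Compute S_{i+1} on I from S_i (hypothetical helper).

    values   -- S_i on I
    boundary -- S_0 for cells outside I; they are never updated, so S_i = S_0 there
    stencil  -- the ordered relative offsets that make up s
    T        -- transition function S^l -> S
    periodic -- wrap indices modulo n_d + 1 (toroidal topology)
    """

    def S_i(j: Index) -> float:
        # Inside I read the current state, outside I fall back to the fixed boundary.
        return values[j] if j in values else boundary(j)

    new_values: State = {}
    for c in index_set(n):
        if periodic:
            I_c = [tuple((cd + xd) % (nd + 1) for cd, xd, nd in zip(c, x, n)) for x in stencil]
        else:
            I_c = [tuple(cd + xd for cd, xd in zip(c, x)) for x in stencil]
        N_c = tuple(S_i(j) for j in I_c)  # the tuple of neighbor states N_i(c)
        new_values[c] = T(N_c)            # S_{i+1}(c) = T(N_i(c)) for c in I
    return new_values
```

Here is a small example. Take a 1D array with `n = (4,)`, stencil `((-1,), (1,))`, an averaging `T`, initial state 0 and boundary value 1. One call to `isl_step` yields `{(0,): 0.5, (1,): 0.0, (2,): 0.0, (3,): 0.0, (4,): 0.5}`, because only the two outermost cells see a boundary neighbor.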
Example: 2D Jacobi iteration
To illustrate the formal definition, let us look at how a two-dimensional Jacobi iteration can be defined. The update function computes the arithmetic mean of a cell's four neighbors. We start from an initial solution of 0. The left and right boundaries are fixed at 1, while the upper and lower boundaries are set to 0. After a sufficient number of iterations, the system converges towards a saddle shape.

\begin{align}
I &= [0, \ldots, 99]^2 \\
S &= \mathbb{R} \\
S_0 &\colon \mathbb{Z}^2 \to \mathbb{R} \\
S_0((x, y)) &= \begin{cases} 1, & x < 0 \\ 0, & 0 \le x < 100 \\ 1, & x \ge 100 \end{cases} \\
s &= ((0, -1), (-1, 0), (1, 0), (0, 1)) \\
T &\colon \mathbb{R}^4 \to \mathbb{R} \\
T((x_1, x_2, x_3, x_4)) &= 0.25 \cdot (x_1 + x_2 + x_3 + x_4)
\end{align}

Figure: 2D Jacobi iteration on a $100^2$ array, showing the states $S_{0}$, $S_{200}$, $S_{400}$, $S_{600}$, $S_{800}$ and $S_{1000}$.
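A direct, unoptimized Python transcription of this example might look as follows. It is a sketch that rests on a few assumptions:

- The grid is a list of lists indexed `[x][y]`.
- The grid is padded by one layer of cells holding the values of $S_0$ outside $I$. These padding cells are never updated.
- The helper names are hypothetical.

Two buffers are used because a Jacobi sweep must read only values of $S_i$ while it writes $S_{i+1}$.

```python
from typing import List

Grid = List[List[float]]


def initial_state(n: int) -> Grid:
    """S_0 on [-1, n]^2: 1 where x < 0 or x >= n, otherwise 0; the padding is the boundary."""
    return [[1.0 if x < 0 or x >= n else 0.0 for y in range(-1, n + 1)]
            for x in range(-1, n + 1)]


def jacobi_step(u: Grid) -> Grid:
    """One Jacobi sweep: T is the mean of the four von Neumann neighbors."""
    size = len(u)
    new = [row[:] for row in u]  # copying keeps the fixed boundary cells unchanged
    for x in range(1, size - 1):
        for y in range(1, size - 1):
            # Offsets in the order of s: (0,-1), (-1,0), (1,0), (0,1); reads only the old grid u.
            new[x][y] = 0.25 * (u[x][y - 1] + u[x - 1][y] + u[x + 1][y] + u[x][y + 1])
    return new


def jacobi_2d(n: int = 100, steps: int = 1000) -> Grid:
    """Return S_steps restricted to I = [0, ..., n-1]^2."""
    u = initial_state(n)
    for _ in range(steps):
        u = jacobi_step(u)
    return [row[1:-1] for row in u[1:-1]]
```

`jacobi_2d(100, 1000)` corresponds to the last snapshot in the figure, although pure Python is slow at that size. On a small grid the saddle appears quickly. By symmetry, the converged values satisfy $u(x, y) + u(y, x) = 1$, so on an odd-sized grid the center value tends to 0.5, cells next to the left boundary approach 1 and cells next to the lower boundary approach 0.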
Stencils
The shape of the neighborhood used during the updates depends on the application itself. The most common stencils are the 2D or 3D versions of the von Neumann neighborhood and the Moore neighborhood. The example above uses a 2D von Neumann stencil, while lattice Boltzmann method (LBM) codes generally use its 3D variant. Conway's Game of Life uses the 2D Moore neighborhood. That said, other stencils, such as a 25-point stencil for seismic wave propagation, are used as well.
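The two common neighborhood shapes can be generated for any dimension $k$ as lists of relative offsets, i.e. as the stencil $s$ of the formal definition. In this Python sketch the function names are hypothetical:

- The von Neumann neighborhood of range $r$ holds the offsets with Manhattan distance at most $r$.
- The Moore neighborhood of range $r$ holds the offsets with Chebyshev distance at most $r$.
- The center is excluded by default, which gives 4 and 8 neighbors in 2D and 6 and 26 in 3D.

A Game of Life generation on a torus then needs only the Moore offsets and the usual rule. A dead cell with exactly 3 live neighbors is born, and a live cell with 2 or 3 survives.

```python
from itertools import product
from typing import List, Set, Tuple

Offset = Tuple[int, ...]


def von_neumann(k: int, r: int = 1, include_center: bool = False) -> List[Offset]:
    """Offsets with Manhattan distance <= r (2k neighbors for r = 1)."""
    return [x for x in product(range(-r, r + 1), repeat=k)
            if sum(abs(d) for d in x) <= r and (include_center or any(x))]


def moore(k: int, r: int = 1, include_center: bool = False) -> List[Offset]:
    """Offsets with Chebyshev distance <= r (3^k - 1 neighbors for r = 1)."""
    return [x for x in product(range(-r, r + 1), repeat=k) if include_center or any(x)]


def life_step(alive: Set[Tuple[int, int]], width: int, height: int) -> Set[Tuple[int, int]]:
    """One Game of Life generation on a width x height torus using the 2D Moore stencil."""
    stencil = moore(2)
    new = set()
    for x in range(width):
        for y in range(height):
            # Count live neighbors, wrapping around the edges (periodic boundaries).
            count = sum(((x + dx) % width, (y + dy) % height) in alive for dx, dy in stencil)
            if count == 3 or (count == 2 and (x, y) in alive):
                new.add((x, y))
    return new
```

For example, a vertical "blinker" `{(1, 0), (1, 1), (1, 2)}` on a 5 by 5 torus becomes the horizontal line `{(0, 1), (1, 1), (2, 1)}` after one `life_step`.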
Implementation issues
Many simulation codes can be formulated naturally as ISLs. Since computing time and memory consumption grow linearly with the number of array elements, parallel implementations of ISLs are of paramount importance to research. This is challenging because the computations are tightly coupled (each cell update depends on neighboring cells) and most ISLs are memory bound (i.e. the ratio of memory accesses to calculations is high). Virtually all current parallel architectures have been explored for executing ISLs efficiently; at the time of writing, GPGPUs have proven to be the most efficient.
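The tight coupling becomes concrete once the array is split among several workers. Before every sweep, each piece needs fresh copies of the cells just beyond its edges, which belong to its neighbors. These copies are usually called a ghost zone or halo. The following serial Python sketch imitates such a decomposition for a 1D, 3-point Jacobi stencil. Its assumptions:

- The helper names are hypothetical.
- The interior size is divisible by the number of patches.
- In a real code the patches would live on separate processes and the exchange would use message passing (e.g. MPI). Here everything runs in one process.

Every cell is computed with the same arithmetic in both runs. As a result, the decomposed run matches the undecomposed one exactly, provided the halo is exchanged before each sweep.

```python
from typing import List

Array = List[float]


def sweep(u: Array) -> Array:
    """One 3-point Jacobi sweep; u[0] and u[-1] (boundary or ghost cells) are left as they are."""
    return [u[0]] + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)] + [u[-1]]


def run_global(u: Array, steps: int) -> Array:
    """Reference run: sweep the whole array, including its two fixed boundary cells."""
    for _ in range(steps):
        u = sweep(u)
    return u


def exchange_halos(patches: List[Array], left_bc: float, right_bc: float) -> None:
    """Fill each patch's two ghost cells from its neighbors' outermost interior cells."""
    for p, patch in enumerate(patches):
        patch[0] = patches[p - 1][-2] if p > 0 else left_bc
        patch[-1] = patches[p + 1][1] if p < len(patches) - 1 else right_bc


def run_decomposed(u: Array, parts: int, steps: int) -> Array:
    """Split the interior of u into `parts` patches, each padded with one ghost cell per side."""
    left_bc, right_bc = u[0], u[-1]
    interior = u[1:-1]
    size = len(interior) // parts  # assumption: len(interior) is divisible by parts
    patches = [[0.0] + interior[p * size:(p + 1) * size] + [0.0] for p in range(parts)]
    for _ in range(steps):
        exchange_halos(patches, left_bc, right_bc)  # the coupling: required before every sweep
        patches = [sweep(patch) for patch in patches]
    # Gather the patch interiors back into one array with the original boundary cells.
    return [left_bc] + [x for patch in patches for x in patch[1:-1]] + [right_bc]
```

Moving `exchange_halos` out of the loop, or calling it only every few steps, makes the results diverge from `run_global`. That is why the exchange, and the communication it implies, is part of every timestep.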
Libraries
Due to both the importance of ISLs to computer simulations and their high computational requirements, there are a number of efforts that aim at creating reusable libraries to support scientists in performing stencil-based computations. The libraries are mostly concerned with parallelization, but may also tackle other challenges, such as IO, steering and checkpointing. They may be classified by their APIs.

Patch-based libraries
This is a traditional design. The library manages a set of n-dimensional scalar arrays, which the user program may access to perform updates. The library handles the synchronization of the boundaries (dubbed ghost zone or halo). The advantage of this interface is that the user program may loop over the arrays, which makes it easy to integrate legacy code. The disadvantage is that the library cannot handle cache blocking (as this has to be done within the loops) or the wrapping of API calls for accelerators (e.g. via CUDA or OpenCL). Implementations include Cactus, a physics problem solving environment, and waLBerla.

Cell-based libraries
These libraries move the interface to updating single simulation cells: only the current cell and its neighbors are exposed, e.g. via getter/setter methods. The advantage of this approach is that the library can tightly control which cells are updated in which order. This is useful not only for implementing cache blocking, but also for running the same user code on multi-core CPUs and GPUs. This approach requires the user to recompile the source code together with the library. Otherwise, every cell update would require a function call, which would seriously impair performance. This is only feasible with techniques such as class templates or metaprogramming, which is also why this design is only found in newer libraries. Examples are Physis and LibGeoDecomp.

See also
• Advanced Simulation Library
• Finite difference method
• Computer simulation
• Five-point stencil
• Compact stencil
• Non-compact stencil
• Stencil jumping
• Stencil (numerical analysis)
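To illustrate the cell-based library design, here is a Python sketch with a hypothetical interface. It does not reproduce the API of Physis, LibGeoDecomp or any other library. The user supplies a function that sees only a neighborhood accessor `hood(dx, dy)`. The library function decides the traversal order, here square tiles as a simple form of cache blocking. Because the update reads only the previous state, every tile size yields the same result. Python calls `update` once per cell, which is exactly the per-cell call overhead that compiled libraries avoid by inlining the user code through templates or metaprogramming.

```python
from typing import Callable, List

Grid = List[List[float]]
Neighborhood = Callable[[int, int], float]  # hood(dx, dy) -> previous state of cell (x+dx, y+dy)


def jacobi_cell(hood: Neighborhood) -> float:
    """User code (hypothetical): sees only the current cell's neighborhood, never the arrays."""
    return 0.25 * (hood(0, -1) + hood(-1, 0) + hood(1, 0) + hood(0, 1))


def run_tiled(grid: Grid, update: Callable[[Neighborhood], float], tile: int) -> Grid:
    """Library code (hypothetical): one timestep, traversing interior cells tile by tile.

    The outermost rows and columns of `grid` are fixed boundary cells and are copied unchanged.
    """
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for tx in range(1, h - 1, tile):
        for ty in range(1, w - 1, tile):
            # Finish one tile completely before moving on to the next one.
            for x in range(tx, min(tx + tile, h - 1)):
                for y in range(ty, min(ty + tile, w - 1)):
                    new[x][y] = update(lambda dx, dy, x=x, y=y: grid[x + dx][y + dy])
    return new
```

Because the traversal order belongs to the library, it could swap the tiled loops for another schedule, or for a different back end, without any change to `jacobi_cell`.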