Gather/scatter (vector addressing)

Gather/scatter is a type of memory addressing that collects (gathers) data from, or stores (scatters) data to, multiple arbitrary memory indices in a single operation. Examples of its use include sparse linear algebra operations, sorting algorithms, fast Fourier transforms, and some computational graph theory problems. It is the vector equivalent of register indirect addressing, with gather involving indexed reads and scatter involving indexed writes. Vector processors have hardware support for gather and scatter operations, as do many input/output systems, allowing large data sets to be transferred to main memory more rapidly.

Definitions

Gather

A sparsely populated vector y (with dimension M) holding N non-empty elements can be represented by two densely populated vectors of length N: x, containing the non-empty elements of y, and idx, giving the index in y at which each element of x is located. The gather of y into x, denoted x \leftarrow y|_x, assigns x(i)=y(idx(i)), with idx having already been calculated. Assuming no pointer aliasing between x[], y[] and idx[], a C implementation is

    for (i = 0; i < N; ++i) x[i] = y[idx[i]];

Scatter

The sparse scatter, denoted y|_x \leftarrow x, is the reverse operation. It copies the values of x into the corresponding locations in the sparsely populated vector y, i.e. y(idx(i))=x(i).

    for (i = 0; i < N; ++i) y[idx[i]] = x[i];
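The two definitions above can be written as a pair of small C functions. This is a minimal sketch, not a reference implementation: the function names, the use of double elements and size_t indices, the extra parameter m (the dimension of y) and the debug bounds check are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: x[i] = y[idx[i]] for i in [0, n).
 * m is the dimension of y and is used only for the debug bounds check.
 * restrict states the no-aliasing assumption from the text. */
static void gather(size_t n, size_t m, double *restrict x,
                   const double *restrict y, const size_t *restrict idx)
{
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < m);   /* every index must lie inside y */
        x[i] = y[idx[i]];
    }
}

/* Scatter: y[idx[i]] = x[i] for i in [0, n).
 * If idx holds a repeated index, this sequential loop keeps the
 * value written last. */
static void scatter(size_t n, size_t m, double *restrict y,
                    const double *restrict x, const size_t *restrict idx)
{
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < m);   /* every index must lie inside y */
        y[idx[i]] = x[i];
    }
}
```

A typical round trip would gather the non-empty elements of y into x, operate on the dense vector x, and then scatter the results back to the same positions in y using the same idx.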

Support

Scatter/gather units were part of most vector computers, notably the Cray X-MP and its successors. Their purpose was to make efficient use of the limited resource of the vector registers. For instance, the Cray-1 had eight 64-word vector registers, so data containing values that had no effect on the outcome, like zeros in an addition, occupied valuable space that could have been put to better use. By gathering non-zero values into the registers and scattering the results back out, the registers could be used much more efficiently, leading to higher performance. However, the Cray-1 vector memory reference instructions could only access memory with a constant stride, which allowed fast access to contiguous data (stride 1) or to data separated by some other constant increment. The introduction of gather and scatter instructions in the X-MP eliminated this restriction. This basic layout was widely copied in later supercomputer designs, especially in the variety of models from Japan.

As microprocessor design improved during the 1990s, commodity CPUs began to add vector processing units. At first these tended to be simple, sometimes overlaying the CPU's general-purpose registers, but over time they evolved into increasingly powerful systems that met and then surpassed the units in high-end supercomputers. By this time, scatter/gather instructions had been added to many of these designs.

x86-64 CPUs that support the AVX2 instruction set can gather 32-bit and 64-bit elements with memory offsets from a base address. A second register determines whether each particular element is loaded, and faults that would arise from invalid memory accesses by masked-out elements are suppressed. The AVX-512 instruction set also contains (potentially masked) scatter operations. The ARM instruction set's Scalable Vector Extension includes gather and scatter operations on 8-, 16-, 32- and 64-bit elements. InfiniBand has hardware support for gather/scatter.
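The masked gather behaviour described for AVX2 can be modelled in scalar C to make the semantics concrete. The sketch below is not the hardware instruction or a compiler intrinsic: the function name, the one-byte-per-lane mask representation and the non-negative index assumption are illustrative choices. It shows only the property the text describes, namely that a masked-out lane performs no memory access (so an out-of-range index in that lane cannot fault) and leaves the destination element unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked gather over `lanes` 32-bit elements.
 * Active lane (mask[lane] != 0): dst[lane] = base[index[lane]].
 * Inactive lane: no load is performed and dst[lane] keeps its old value. */
static void masked_gather_i32(size_t lanes, int32_t *dst,
                              const int32_t *base, const int32_t *index,
                              const uint8_t *mask)
{
    for (size_t lane = 0; lane < lanes; ++lane) {
        if (mask[lane]) {
            assert(index[lane] >= 0);  /* assumption of this sketch only */
            dst[lane] = base[index[lane]];
        }
        /* masked-out lanes never dereference base[index[lane]] */
    }
}
```

With a four-element base array, an index vector whose second entry points far outside the array, and a mask that disables only that second lane, the call completes safely and the second destination element retains whatever value it held before.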
Without instruction-level gather/scatter, efficient implementations may need to be tuned for optimal performance, for example with prefetching; libraries such as Open MPI may provide such primitives.
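As an illustration of the prefetching approach mentioned above, the following sketch adds a software prefetch to a plain gather loop. It relies on __builtin_prefetch, a GCC and Clang builtin, and falls back to a no-op on other compilers. The function name, the PREFETCH_READ macro and the PREFETCH_DISTANCE value are hypothetical; a suitable distance depends on the machine and would have to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
/* Hint: read access (0), low temporal locality (1). */
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 1)
#else
#define PREFETCH_READ(addr) ((void)0)
#endif

/* Hypothetical look-ahead, in elements; tune per machine. */
enum { PREFETCH_DISTANCE = 16 };

/* Gather x[i] = y[idx[i]] while requesting the cache line that will be
 * needed PREFETCH_DISTANCE iterations later. The result is identical to
 * the plain gather loop; only the memory access timing may differ. */
static void gather_prefetch(size_t n, double *restrict x,
                            const double *restrict y,
                            const size_t *restrict idx)
{
    assert(n == 0 || (x != NULL && y != NULL && idx != NULL));
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(&y[idx[i + PREFETCH_DISTANCE]]);
        x[i] = y[idx[i]];
    }
}
```

Because idx is read sequentially, the index needed for the prefetch is cheap to obtain; the irregular access is the load from y, which is the one the hint targets. Whether this helps at all depends on the access pattern and the hardware prefetchers, so it should be treated as something to benchmark rather than a guaranteed improvement.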

Source: Wikipedia ↗
