The simplest way to understand SIMT is to imagine a multi-core (MIMD) system in which each core has its own register file, its own ALUs (both SIMD and scalar) and its own data cache. A standard multi-core system has multiple independent instruction caches and decoders, as well as multiple independent Program Counter registers. In a SIMT system, by contrast, instructions are synchronously broadcast to all SIMT cores from a single unit with a single instruction cache and a single instruction decoder, which reads instructions using a single Program Counter.

The key difference between SIMT and SIMD lanes is that each Processing Unit in the SIMT array has its own local memory and may have a completely different Stack Pointer (and thus perform computations on completely different data sets), whereas the ALUs in SIMD lanes know nothing about memory per se and have no register file. This is illustrated by the
ILLIAC IV. Each SIMT core was termed a processing element (PE), and each PE had its own separate memory (PEM). Each PE had an "index register" which held an address into its PEM. In the ILLIAC IV, the Burroughs B6500 primarily handled I/O but also sent instructions to the Control Unit (CU), which then broadcast them to the PEs. In its role as an I/O processor, the B6500 also had access to all PEMs.

Each PE may also be made active or inactive. An inactive PE does not execute the instruction broadcast to it by the Control Unit; instead it sits idle until activated. Each PE can therefore be said to be predicated.

Also important to note is the difference between SIMT and
SPMD (Single Program, Multiple Data). SPMD, like standard multi-core systems, has multiple Program Counters, whereas SIMT has only one: in the (single) Control Unit.

==History==