AMX was introduced by Intel in June 2020 and first supported in the Sapphire Rapids microarchitecture for Xeon servers, released in January 2023. It introduced 2-dimensional registers called tiles, upon which accelerators can perform operations. It is intended as an extensible architecture; the first accelerator implemented is called the tile matrix multiply unit (TMUL).

Revision 46 of Intel Architecture Instruction Set Extensions and Future Features, published in September 2022, documented a new AMX-FP16 extension, which adds support for half-precision floating-point numbers. Revision 48, from March 2023, documented AMX-COMPLEX, which adds support for half-precision floating-point complex numbers. Both extensions are available in the Granite Rapids set of server processors (with AMX-COMPLEX support available only in Granite Rapids-D).
== Tile matrix multiply unit ==
The TMUL unit supports BF16 and INT8 input types. AMX-FP16 and AMX-COMPLEX add support for real and complex FP16 numbers. The register file consists of 8 tiles, each with 16 rows of 64 bytes (32 BF16/FP16 or 64 INT8 elements). The only supported operation is matrix multiply and accumulate (MMA), which extends the fused multiply–add (FMA) operation for scalar values to matrix operands: C_{nm} = C_{nm} + \sum_{j=1}^J A_{nj}B_{jm}. The maximal input sizes are 16 \times J for A and J \times 16 for B, where J is 64 for INT8 and 32 for BF16. The matrix multiplication requires 256J multiplications and 256J additions, thus performing 512J operations in 16 cycles.

== Software support ==
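Software that needs to check results, or to run on a machine without AMX, can model the TMUL multiply-accumulate in portable code. The following is a minimal sketch in Python and not Intel's implementation: the names `max_inner_dim`, `tile_mma` and `operation_count` are hypothetical, the tiles are held as plain lists of integers, and neither the hardware memory layout of the tiles nor accumulator overflow is modeled.

```python
# Reference model of a TMUL-style matrix multiply-accumulate, C += A x B.
# Assumption: plain Python integers, so accumulator overflow is not modeled.

TILE_ROWS = 16        # rows per tile
TILE_ROW_BYTES = 64   # bytes per tile row


def max_inner_dim(element_bytes):
    """Largest inner dimension J that fits in one 64-byte tile row."""
    return TILE_ROW_BYTES // element_bytes


def tile_mma(c, a, b):
    """Update c in place with c[n][m] += sum over j of a[n][j] * b[j][m]."""
    inner = len(b)
    for n in range(len(a)):
        for m in range(len(b[0])):
            c[n][m] += sum(a[n][j] * b[j][m] for j in range(inner))
    return c


def operation_count(j):
    """Multiplications plus additions for a 16 x J by J x 16 product."""
    multiplies = TILE_ROWS * TILE_ROWS * j
    additions = TILE_ROWS * TILE_ROWS * j
    return multiplies + additions


# INT8 example: A is 16 x 64, B is 64 x 16, C is a 16 x 16 accumulator.
J = max_inner_dim(1)
A = [[(n + j) % 7 - 3 for j in range(J)] for n in range(TILE_ROWS)]
B = [[(j * m) % 5 - 2 for m in range(TILE_ROWS)] for j in range(J)]
C = [[1] * TILE_ROWS for _ in range(TILE_ROWS)]
tile_mma(C, A, B)
```

With one-byte elements `max_inner_dim` gives J = 64 and with two-byte elements J = 32, matching the INT8 and BF16 limits given for the TMUL, and `operation_count(J)` reproduces the 512J figure.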