Advanced Vector Extensions

AVX uses sixteen YMM registers to perform a single instruction on multiple pieces of data (see SIMD). Each YMM register can hold and do simultaneous operations (math) on: • eight 32-bit single-precision floating-point numbers or • four 64-bit double-precision floating-point numbers. The width of the SIMD registers is increased from 128 bits to 256 bits, and renamed from XMM0–XMM7 to YMM0–YMM7 (in x86-64 mode, from XMM0–XMM15 to YMM0–YMM15). The legacy SSE instructions can still be utilized via the VEX prefix to operate on the lower 128 bits of the YMM registers. AVX introduces a three-operand SIMD instruction format called VEX coding scheme, where the destination register is distinct from the two source operands. For example, an SSE instruction using the conventional two-operand form can now use a non-destructive three-operand form , preserving both source operands. Originally, AVX's three-operand format was limited to the instructions with SIMD operands (YMM), and did not include instructions with general purpose registers (e.g. EAX). It was later used for coding new instructions on general purpose registers in later extensions, such as BMI. VEX coding is also used for instructions operating on the k0-k7 mask registers that were introduced with AVX-512. The alignment requirement of SIMD memory operands is relaxed. Unlike their non-VEX coded counterparts, most VEX coded vector instructions no longer require their memory operands to be aligned to the vector size. Notably, the VMOVDQA instruction still requires its memory operand to be aligned. The new VEX coding scheme introduces a new set of code prefixes that extends the opcode space, allows instructions to have more than two operands, and allows SIMD vector registers to be longer than 128 bits. The VEX prefix can also be used on the legacy SSE instructions giving them a three-operand form, and making them interact more efficiently with AVX instructions without the need for VZEROUPPER and VZEROALL. The AVX instructions support both 128-bit and 256-bit SIMD. The 128-bit versions can be useful to improve old code without needing to widen the vectorization, and avoid the penalty of going from SSE to AVX; they are also faster on some early AMD implementations of AVX. This mode is sometimes known as AVX-128. Compared to SSE series, AVX further enhanced performance for digital media playback, web browsing, asymmetric encryption, and others. New instructions These AVX instructions are in addition to the ones that are 256-bit extensions of the legacy 128-bit SSE instructions; most are usable on both 128-bit and 256-bit operands. CPUs with AVX • Intel • Sandy Bridge processors (Q1 2011) and newer, except models branded as Celeron and Pentium. • Pentium and Celeron branded processors starting with Tiger Lake (Q3 2020) and newer. • AMD: • Bulldozer processors (Q4 2011) and newer, including Jaguar and Puma. Issues regarding compatibility between future Intel and AMD processors are discussed under XOP instruction set. • VIA: • Nano QuadCore • Eden X4 • Zhaoxin: • WuDaoKou-based processors (KX-5000 and KH-20000) Compiler and assembler support • Absoft supports with - flag. • The Free Pascal compiler supports AVX and AVX2 with the -CfAVX and -CfAVX2 switches from version 2.7.1. • RAD studio (v11.0 Alexandria) supports AVX2 and AVX-512. • The GNU Assembler (GAS) inline assembly functions support these instructions (accessible via GCC), as do Intel primitives and the Intel inline assembler (closely compatible to GAS, although more general in its handling of local references within inline code). GAS supports AVX starting with binutils version 2.19. • GCC starting with version 4.6 (although there was a 4.3 branch with certain support) and the Intel Compiler Suite starting with version 11.1 support AVX. • The Open64 compiler version 4.5.1 supports AVX with - flag. • PathScale supports via the - flag. • The Vector Pascal compiler supports AVX via the - flag. • The Visual Studio 2010/2012 compiler supports AVX via intrinsic and switch. • NASM starting with version 2.03 and newer. There were numerous bug fixes and updates related to AVX in version 2.04. • Other assemblers such as MASM VS2010 version, YASM, FASM and JWASM. Operating system support AVX adds new register-state through the 256-bit wide YMM register file, so explicit operating system support is required to properly save and restore AVX's expanded registers between context switches. The following operating system versions support AVX: • DragonFly BSD: support added in early 2013. • FreeBSD: support added in a patch submitted on January 21, 2012, which was included in the 9.1 stable release. • Linux: supported since kernel version 2.6.30, released on June 9, 2009. • macOS: support added in 10.6.8 (Snow Leopard) update released on June 23, 2011. In fact, macOS Ventura does not support x86 processors without the AVX2 instruction set. • OpenBSD: support added on March 21, 2015. • Solaris: supported in Solaris 10 Update 10 and Solaris 11. • Windows: supported since Windows 7 SP1 and Windows Server 2008 R2 SP1. • Windows Server 2008 R2 SP1 with Hyper-V requires a hotfix to support AMD AVX (Opteron 6200 and 4200 series) processors, KB2568088 • Windows XP and Windows Server 2003 do not support AVX in both kernel drivers and user applications. == Advanced Vector Extensions 2 ==