x87 Early x86 processors could be extended with
floating-point hardware in the form of a series of floating-point
numerical co-processors with names like
8087, 80287 and 80387, abbreviated x87. The coprocessor was also known as the NPX (Numeric Processor eXtension), an apt name since the coprocessors, while used mainly for floating-point calculations, also performed integer operations on both binary and decimal formats. With very few exceptions, the 80486 and subsequent x86 processors integrated this x87 functionality on chip, which made the x87 instructions a
de facto integral part of the x86 instruction set. Each x87 register, known as ST(0) through ST(7), is 80 bits wide and stores numbers in the
IEEE floating-point standard double extended precision format. These registers are organized as a stack with ST(0) as the top. This was done to conserve opcode space. As a result, the registers are randomly accessible only as one operand of a register-to-register instruction; ST(0) must always be one of the two operands, either the source or the destination, regardless of whether the other operand is ST(x) or a memory operand. However, random access to the stack registers can be obtained through an instruction which exchanges any specified ST(x) with ST(0).
The operations include arithmetic and transcendental functions, including trigonometric and exponential functions, and instructions that load common constants (such as 0; 1; e, the base of the natural logarithm; log2(10); and log10(2)) into one of the stack registers.
While the integer ability is often overlooked, the x87 can operate on larger integers with a single instruction than the 8086, 80286, 80386, or any x86 CPU without 64-bit extensions can, and repeated integer calculations even on small values (e.g., 16-bit) can be accelerated by executing integer instructions on the x86 CPU and the x87 in parallel. (The x86 CPU keeps running while the x87 coprocessor calculates, and the x87 sets a signal to the x86 when it is finished or interrupts the x86 if it needs attention because of an error.)
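The stack-relative addressing described above can be sketched as a toy model in C. This is only an illustration of the addressing rules, not an emulator: the type `X87Stack` and the functions `st`, `x87_push`, `x87_fxch` and `x87_fadd_to_top` are hypothetical names invented here, and `long double` is assumed (not guaranteed) to stand in for the 80-bit format.

```c
#include <assert.h>

/* Toy model of the x87 register stack (illustrative names, not a real API). */
typedef struct {
    long double reg[8]; /* eight physical registers; long double is the 80-bit
                           format on many, but not all, x86 compilers */
    unsigned top;       /* index of the physical register currently ST(0) */
} X87Stack;

/* Resolve ST(i) relative to the current top of stack. */
long double *st(X87Stack *s, unsigned i) {
    assert(i < 8);
    return &s->reg[(s->top + i) & 7];
}

/* Load-like push: move the top down by one slot, then write the new ST(0). */
void x87_push(X87Stack *s, long double v) {
    s->top = (s->top - 1) & 7;
    *st(s, 0) = v;
}

/* Exchange ST(0) with ST(i); this is how any register can reach the top. */
void x87_fxch(X87Stack *s, unsigned i) {
    long double t = *st(s, 0);
    *st(s, 0) = *st(s, i);
    *st(s, i) = t;
}

/* Register-to-register add with ST(0) as destination: ST(0) += ST(i). */
void x87_fadd_to_top(X87Stack *s, unsigned i) {
    *st(s, 0) += *st(s, i);
}
```

Pushing 1, 2 and 4 leaves 4 in ST(0) and 1 in ST(2); exchanging ST(0) with ST(2) brings 1 to the top, where it can serve as the mandatory ST(0) operand of the next operation.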
PAE The
Physical Address Extension (PAE) was first added in the Intel
Pentium Pro, and later by
AMD in the Athlon processors, to allow up to 64 GB of RAM to be addressed. Without PAE, physical RAM in 32-bit protected mode is usually limited to 4
GB. PAE defines a different page table structure with wider page table entries and a third level of page table, allowing additional bits of physical address. Although the initial implementations on 32-bit processors theoretically supported up to 64 GB of RAM, chipset and other platform limitations often restricted what could actually be used.
x86-64 processors define page table structures that theoretically allow up to 52 bits of physical address, although again, chipset and other platform concerns (like the number of DIMM slots available, and the maximum RAM possible per DIMM) prevent such a large physical address space from being realized. On x86-64 processors PAE mode must be active before the switch to
long mode, and must remain active while
long mode is active, so while in long mode there is no "non-PAE" mode. PAE mode does not affect the width of linear or virtual addresses.
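As a rough illustration of the three-level structure, the sketch below splits a 32-bit linear address the way 32-bit PAE paging with 4 KB pages is commonly documented (2 + 9 + 9 + 12 bits), and computes the size of an address space from its width. The bit split and the 36-bit figure behind the 64 GB limit are stated here as assumptions rather than taken from the text above; `PaeIndices`, `pae_split` and `phys_space_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Indices selected by a 32-bit linear address under PAE, 4 KB pages
   (assumed layout; names are illustrative). */
typedef struct {
    unsigned pdpt_index; /* bits 31..30: one of 4 top-level entries   */
    unsigned pd_index;   /* bits 29..21: one of 512 directory entries */
    unsigned pt_index;   /* bits 20..12: one of 512 table entries     */
    unsigned offset;     /* bits 11..0 : byte offset within the page  */
} PaeIndices;

PaeIndices pae_split(uint32_t linear) {
    PaeIndices ix;
    ix.pdpt_index = (linear >> 30) & 0x3;
    ix.pd_index   = (linear >> 21) & 0x1FF;
    ix.pt_index   = (linear >> 12) & 0x1FF;
    ix.offset     = linear & 0xFFF;
    return ix;
}

/* Bytes addressable with the given number of physical address bits. */
uint64_t phys_space_bytes(unsigned bits) {
    assert(bits < 64);
    return (uint64_t)1 << bits;
}
```

With these definitions, `phys_space_bytes(32)` is 4 GB and `phys_space_bytes(36)` is 64 GB, while the linear address passed to `pae_split` stays 32 bits wide in both cases, matching the point that PAE does not widen linear addresses.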
MMX MMX is a
SIMD instruction set designed by Intel and introduced in 1997 for the
Pentium MMX microprocessor. The MMX instruction set was developed from a similar concept first used on the
Intel i860. It is supported on most subsequent IA-32 processors by Intel and other vendors. MMX is typically used for video processing (in multimedia applications, for instance). MMX added 8 new registers to the architecture, known as MM0 through MM7 (henceforth referred to as
MMn). In reality, these new registers were just aliases for the existing x87 FPU stack registers. Hence, anything that was done to the floating-point stack would also affect the MMX registers. Unlike the FP stack, these MMn registers were fixed, not relative, and therefore they were randomly accessible. Although the instruction set did not adopt the stack-like semantics, the aliasing meant that existing operating systems could still correctly save and restore the register state when multitasking without modifications.
3DNow! The introduction of AMD's 3DNow! technology coincided with the rise of
3D entertainment applications, and the technology was designed to improve the CPU's
vector processing performance in graphics-intensive applications. 3D video game developers and 3D graphics hardware vendors use 3DNow! to enhance their performance on AMD's
K6 and
Athlon series of processors. 3DNow! was designed to be the natural evolution of MMX from integers to floating point. As such, it uses exactly the same register naming convention as MMX, that is MM0 through MM7. The only difference is that instead of packing integers into these registers, two
single-precision floating-point numbers are packed into each register. The advantage of aliasing the FPU registers is that the same instructions and data structures used to save the state of the FPU registers can also be used to save 3DNow! register states. Thus operating systems that would otherwise not know about the 3DNow! registers require no special modifications.
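The packing of two single-precision values into one 64-bit register can be modeled in portable C. This is a scalar sketch of the data layout only and does not use the real instructions; `Mm64`, `mm_pack_floats`, `mm_unpack_floats` and `mm_add_floats` are hypothetical names, and `float` is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 64-bit MMn register. */
typedef struct { uint64_t bits; } Mm64;

/* Store two 32-bit floats side by side in the 64-bit value. */
Mm64 mm_pack_floats(float lo, float hi) {
    float lanes[2] = { lo, hi };
    Mm64 r;
    assert(sizeof lanes == sizeof r.bits); /* assumes 32-bit float */
    memcpy(&r.bits, lanes, sizeof r.bits);
    return r;
}

/* Recover the two floats from the 64-bit value. */
void mm_unpack_floats(Mm64 r, float out[2]) {
    memcpy(out, &r.bits, sizeof r.bits);
}

/* Lane-wise addition, modeling what a packed single-precision add computes. */
Mm64 mm_add_floats(Mm64 a, Mm64 b) {
    float x[2], y[2];
    mm_unpack_floats(a, x);
    mm_unpack_floats(b, y);
    return mm_pack_floats(x[0] + y[0], x[1] + y[1]);
}
```

Because the whole state is one 64-bit value per register, anything that already saves and restores those 64 bits preserves the packed floats as well, which is the property the aliasing relies on.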
SSE In 1999, Intel introduced the Streaming SIMD Extensions (SSE)
instruction set, following in 2000 with SSE2. The first addition allowed offloading of basic floating-point operations from the x87 stack and the second made MMX almost obsolete and allowed the instructions to be realistically targeted by conventional compilers. Introduced in 2004 along with the
Prescott revision of the
Pentium 4 processor, SSE3 added specific memory and
thread-handling instructions to boost the performance of Intel's
HyperThreading technology. AMD licensed the SSE3 instruction set and implemented most of the SSE3 instructions for its revision E and later Athlon 64 processors. The Athlon 64 does not support HyperThreading and lacks those SSE3 instructions used only for HyperThreading. SSE discarded all legacy connections to the FPU stack. This also meant that this instruction set discarded all legacy connections to previous generations of SIMD instruction sets like MMX. But it freed the designers up, allowing them to use larger registers, not limited by the size of the FPU registers. The designers created eight 128-bit registers, named XMM0 through XMM7. (In
AMD64, the number of SSE XMM registers has been increased from 8 to 16.) However, the downside was that operating systems had to be aware of this new set of instructions in order to save their register states. So Intel made SSE opt-in: the instructions stay disabled until the operating system sets the OSFXSR flag in control register CR4, signaling that it saves and restores the XMM state (using FXSAVE and FXRSTOR). An OS that is aware of SSE sets the flag, whereas an unaware OS leaves it clear and the SSE instructions remain unavailable. SSE is a SIMD instruction set that works only on floating-point values, like 3DNow!. However, unlike 3DNow! it severs all legacy connection to the FPU stack. Because it has larger registers than 3DNow!, SSE can pack twice the number of
single precision floats into its registers. The original SSE was limited to only single-precision numbers, like 3DNow!. SSE2 introduced the capability to pack
double precision numbers too, which 3DNow! could not do, since a double precision number is 64 bits in size, the full size of a single 3DNow! MMn register. At 128 bits, the SSE XMMn registers could pack two double precision floats into one register. Thus SSE2 is much more suitable for scientific calculations than either SSE1 or 3DNow!, which were limited to only single precision. SSE3 does not introduce any additional registers.
x86-64 While IA-64 was incompatible with x86, the Itanium processor did provide
emulation abilities for translating x86 instructions into IA-64, but this affected the performance of x86 programs so badly that it was rarely, if ever, actually useful to users: programmers had to rewrite x86 programs for the IA-64 architecture, or their performance on Itanium would be orders of magnitude worse than on a true x86 processor. The market rejected the Itanium processor since it broke
backward compatibility and preferred to continue using x86 chips, and very few programs were rewritten for IA-64. AMD decided to take another path toward 64-bit memory addressing, making sure backward compatibility would not suffer. In April 2003, AMD released the first x86 processor with 64-bit general-purpose registers, the
Opteron, capable of addressing much more than 4
GB of virtual memory using the new
x86-64 extension (also known as AMD64 or x64). The 64-bit extensions to the x86 architecture were enabled only in the newly introduced
long mode, so 32-bit and 16-bit applications and operating systems could simply continue using an AMD64 processor in protected or other modes, without even the slightest sacrifice of performance and with full compatibility back to the original instructions of the 16-bit Intel 8086. The market responded positively, adopting the 64-bit AMD processors for both high-performance applications and business or home computers. Seeing the market reject the incompatible Itanium processor and Microsoft support AMD64, Intel had to respond and introduced its own x86-64 processor, the
Prescott Pentium 4, in July 2004. As a result, the Itanium processor with its IA-64 instruction set is rarely used and x86, through its x86-64 incarnation, is still the dominant CPU architecture in non-embedded computers. x86-64 also introduced the
NX bit, which offers some protection against security bugs caused by
buffer overruns. As a result of AMD's 64-bit contribution to the x86 lineage and its subsequent acceptance by Intel, the 64-bit RISC architectures ceased to be a threat to the x86 ecosystem and almost disappeared from the workstation market. x86-64 began to be utilized in powerful
supercomputers (in its
AMD Opteron and
Intel Xeon incarnations), a market which was previously the natural habitat for 64-bit RISC designs (such as the
IBM Power microprocessors or
SPARC processors). The great leap toward 64-bit computing and the maintenance of backward compatibility with 32-bit and 16-bit software enabled the x86 architecture to become an extremely flexible platform today, with x86 chips being utilized from small low-power systems (for example,
Intel Quark and
Intel Atom) to fast gaming desktop computers (for example,
Intel Core i7 and
AMD FX/
Ryzen), and even dominating large supercomputing
clusters, effectively leaving only the
ARM 32-bit and 64-bit RISC architecture as a competitor in the
smartphone and
tablet market.
AMD-V and VT-x Prior to 2005, x86 architecture processors were unable to meet the
Popek and Goldberg virtualization requirements – a set of conditions for efficient virtualization created in 1974 by
Gerald J. Popek and
Robert P. Goldberg. However, both proprietary and open-source
x86 virtualization hypervisor products were developed using
software-based virtualization. Proprietary systems include
Hyper-V,
Parallels Workstation,
VMware ESX,
VMware Workstation,
VMware Workstation Player and
Windows Virtual PC, while
free and open-source systems include
QEMU,
Kernel-based Virtual Machine,
VirtualBox, and
Xen. The introduction of the AMD-V and Intel VT-x instruction sets in 2005 allowed x86 processors to meet the Popek and Goldberg virtualization requirements.
AES-NI The Advanced Encryption Standard New Instructions (AES-NI) instruction set extension is designed to accelerate
AES encryption and decryption operations. It was first proposed by Intel in 2008.
AVX The Advanced Vector Extensions (AVX) doubled the width of the SSE registers, creating the 256-bit YMM registers. It also introduced the VEX coding scheme to accommodate the larger registers, plus a few instructions to permute elements. AVX2 did not introduce extra registers, but was notable for the addition of masking,
gather, and shuffle instructions. AVX-512 features yet another expansion, to 32 512-bit ZMM registers, and a new EVEX scheme. Unlike its predecessors, each of which was a single monolithic extension, it is divided into many subsets that specific models of CPUs can choose to implement.
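The effect of each widening is easiest to see as a lane count: register width divided by element width. The sketch below also gives a scalar model of what a gather computes (each output lane loads from its own index), ignoring masking. It is plain C rather than real vector code, and `lanes`, `gather_f32` and the `*_BITS` constants are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Register widths in bits for the SSE, AVX and AVX-512 register files. */
enum { XMM_BITS = 128, YMM_BITS = 256, ZMM_BITS = 512 };

/* Number of elements of the given width that fit in one register. */
unsigned lanes(unsigned register_bits, unsigned element_bits) {
    assert(element_bits != 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}

/* Scalar model of a gather: lane i loads base[index[i]].
   A hardware gather does this for all lanes in a single instruction. */
void gather_f32(const float *base, const int *index, float *out, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
        out[i] = base[index[i]];
    }
}
```

For example, `lanes(XMM_BITS, 32)` is 4 and `lanes(XMM_BITS, 64)` is 2, while a ZMM register holds 16 single-precision or 8 double-precision values.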
APX The Advanced Performance Extensions (APX) are extensions to double the number of general-purpose registers from 16 to 32 and add new features to improve general-purpose performance. These extensions have been called "generational" and "the biggest x86 addition since 64 bits". Intel contributed APX support to
GNU Compiler Collection (GCC) 14, and Microsoft Visual Studio 2026 has also added APX support. According to the architecture specification, the main features of APX are:
• 16 additional general-purpose registers R16-R31, called the Extended GPRs (EGPRs)
• Three-operand instruction formats for many integer instructions
• New conditional instructions for loads, stores, and comparisons, along with variants of common instructions that do not modify flags
• Optimized register save/restore operations
• A 64-bit absolute direct jump instruction
APX cannot be invoked by 32-bit operating systems or 32-bit applications. Extended GPRs for general-purpose instructions are encoded using a 2-byte REX2 prefix, while new instructions and extended operands for existing AVX/AVX2/AVX-512 instructions are encoded with an extended EVEX prefix, which has four variants used for different groups of instructions.
==See also==