X86-64

History AMD64 (also variously referred to by AMD in their literature and documentation as "AMD 64-bit Technology" and "AMD x86-64 Architecture") was created as an alternative to the radically different IA-64 architecture designed by Intel and Hewlett-Packard, which was backward-incompatible with IA-32, the 32-bit extension of the x86 architecture. AMD originally announced AMD64 in 1999 with a full specification available in August 2000. As AMD was never invited to be a contributing party for the IA-64 architecture and any kind of licensing seemed unlikely, the AMD64 architecture was positioned by AMD from the beginning as an evolutionary way to add 64-bit computing capabilities to the existing x86 architecture while supporting legacy 32-bit x86 code, as opposed to Intel's approach of creating an entirely new, completely x86-incompatible 64-bit architecture with IA-64. The first AMD64-based processor, the Opteron, was released in April 2003. Implementations AMD's processors implementing the AMD64 architecture include Opteron, Athlon 64, Athlon 64 X2, Athlon 64 FX, Athlon II (followed by "X2", "X3", or "X4" to indicate the number of cores, and XLT models), Turion 64, Turion 64 X2, Sempron ("Palermo" E6 stepping and all "Manila" models), Phenom (followed by "X3" or "X4" to indicate the number of cores), Phenom II (followed by "X2", "X3", "X4" or "X6" to indicate the number of cores), FX, Fusion/APU and Ryzen/Epyc. Architectural features The primary defining characteristic of AMD64 is the availability of 64-bit general-purpose processor registers (for example, ), 64-bit integer arithmetic and logical operations, and 64-bit virtual addresses. The designers took the opportunity to make other improvements as well. Notable changes in the 64-bit extensions include: ; 64-bit integer capability : All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc., can operate directly on 64-bit integers. Pushes and pops on the stack default to 8-byte strides, and pointers are 8 bytes wide. ; Additional registers : In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight (i.e. , , , , , , , ) in x86 to 16 (i.e. , , , , , , , , , , , , , , , ). It is therefore possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants; arguments for small and fast subroutines may also be passed in registers to a greater extent. : AMD64 still has fewer registers than many RISC instruction sets (e.g. Power ISA has 32 GPRs; 64-bit ARM, RISC-V I, SPARC, Alpha, MIPS, and PA-RISC have 31) or VLIW-like machines such as the IA-64 (which has 128 registers). However, an AMD64 implementation may have far more internal registers than the number of architectural registers exposed by the instruction set (see register renaming). (For example, AMD Zen cores have 168 64-bit integer and 160 128-bit vector floating-point physical internal registers.) ; Additional XMM (SSE) registers : Similarly, the number of 128-bit XMM registers (used for Streaming SIMD instructions) is also increased from 8 to 16. : The traditional x87 FPU register stack is not included in the register file size extension in 64-bit mode, compared with the XMM registers used by SSE2, which did get extended. The x87 register stack is not a simple register file although it does allow direct access to individual registers by low cost exchange operations. ; Larger virtual address space : The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations. This is compared to just 4 GiB (232 bytes) for the x86. : This means that very large files can be operated on by mapping the entire file into the process address space (which is often much faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space. ; Larger physical address space : The original implementation of the AMD64 architecture implemented 40-bit physical addresses and so could address up to 1 TiB (240 bytes) of RAM. and therefore can address up to 256 TiB (248 bytes) of RAM. The architecture permits extending this to 52 bits in the future (limited by the page table entry format); or 4 GiB of RAM without PAE mode. SSE3 instructions and later Streaming SIMD Extensions instruction sets are not standard features of the architecture. ; No-Execute bit : The No-Execute bit or NX bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged "no execute" will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via "buffer overrun" or "unchecked buffer" attacks. A similar feature has been available on x86 processors since the 80286 as an attribute of segment descriptors; however, this works only on an entire segment at a time. : Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of zero and (in their 32-bit implementation) a size of 4 GiB. AMD was the first x86-family vendor to implement no-execute in linear addressing mode. The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used. ; Removal of older features : A few "system programming" features of the x86 architecture were either unused or underused in modern operating systems and are either not available on AMD64 in long (64-bit and compatibility) mode, or exist only in limited form. These include segmented addressing (although the FS and GS segments are retained in vestigial form for use as extra-base pointers to operating system structures), Windows did not support the entire 48-bit address space until Windows 8.1, which was released in October 2013. Further extensions may allow full 64-bit virtual address space and physical memory with 12-bit page table descriptors and 16- or 21-bit memory offsets for 64 KiB and 2 MiB page allocation sizes; the page table entry would be expanded to 128 bits to support additional hardware flags for page size and virtual address space size. Operating system limits The operating system can also limit the virtual address space. Details, where applicable, are given in the "Operating system compatibility and characteristics" section. Physical address space details Current AMD64 processors support a physical address space of up to 248 bytes of RAM, or 256 TiB. The operating system may place additional limits on the amount of RAM that is usable or supported. Details on this point are given in the "Operating system compatibility and characteristics" section of this article. Operating modes The architecture has two primary modes of operation: long mode and legacy mode. Long mode Long mode is the architecture's intended primary mode of operation; it is a combination of the processor's native 64-bit mode and a combined 32-bit and 16-bit compatibility mode. It is used by 64-bit operating systems. Under a 64-bit operating system, 64-bit programs run under 64-bit mode, and 32-bit and 16-bit protected mode applications (that do not need to use either real mode or virtual 8086 mode in order to execute at any time) run under compatibility mode. Real-mode programs and programs that use virtual 8086 mode at any time cannot be run in long mode unless those modes are emulated in software. However, such programs may be started from an operating system running in long mode on processors supporting VT-x or AMD-V by creating a virtual processor running in the desired mode. Since the basic instruction set is the same, there is almost no performance penalty for executing protected mode x86 code. This is unlike Intel's IA-64, where differences in the underlying instruction set mean that running 32-bit code must be done either in emulation of x86 (making the process slower) or with a dedicated x86 coprocessor. However, on the x86-64 platform, many x86 applications could benefit from a 64-bit recompile, due to the additional registers in 64-bit code and guaranteed SSE2-based FPU support, which a compiler can use for optimization. However, applications that regularly handle integers wider than 32 bits, such as cryptographic algorithms, will need a rewrite of the code handling the huge integers in order to take advantage of the 64-bit registers. Legacy mode Legacy mode is the mode that the processor is in when it is not in long mode. In this mode, the processor acts like an older x86 processor, and only 16-bit and 32-bit code can be executed. Legacy mode allows for a maximum of 32 bit virtual addressing which limits the virtual address space to 4 GiB. 64-bit programs cannot be run from legacy mode. Protected mode Protected mode is made into a submode of legacy mode. It is the submode that 32-bit operating systems and 16-bit protected mode operating systems operate in when running on an x86-64 CPU. Real mode Real mode is the initial mode of operation when the processor is initialized, and is a submode of legacy mode. It is backwards compatible with the original Intel 8086 and Intel 8088 processors. Real mode is primarily used today by operating system bootloaders, which are required by the architecture to configure virtual memory details before transitioning to higher modes. This mode is also used by any operating system that needs to communicate with the system firmware with a traditional BIOS-style interface. == Intel 64 ==