Characteristics of the microarchitecture include the following:

• Form factors
  • Socket AM2+ with DDR2 for the 65 nm Phenom and Athlon 7000 Series
  • Socket AM3 with either DDR2 or DDR3 for Semprons and the 45 nm Phenom II and Athlon II Series. They can also be used on AM3+ motherboards with DDR3. Note that, while all K10 Phenom processors are backwards compatible with Socket AM2+ and Socket AM2, some 45 nm Phenom II processors are only available for Socket AM2+. Lynx processors use neither AM2+ nor AM3.
  • Socket FM1 with DDR3 for Lynx processors
  • Socket F with DDR2, or DDR3 with Shanghai and later Opteron processors
• Instruction set additions and extensions
  • New bit-manipulation instructions (ABM): Leading Zero Count (LZCNT) and Population Count (POPCNT)
  • New SSE instructions, named SSE4a: combined mask-shift instructions (EXTRQ/INSERTQ) and scalar streaming store instructions (MOVNTSD/MOVNTSS). These instructions are not found in Intel's SSE4.
  • Support for unaligned SSE load-operation instructions (which formerly required 16-byte alignment)
• Execution pipeline enhancements
  • 128-bit wide SSE units
  • Wider L1 data cache interface allowing for two 128-bit loads per cycle (as opposed to two 64-bit loads per cycle with K8)
  • Lower integer divide latency
  • 512-entry indirect branch predictor, a larger return stack (size doubled from K8) and a larger branch target buffer
  • Side-Band Stack Optimizer, dedicated to performing increments and decrements of the stack pointer register
  • Fastpathed CALL and RET-Imm instructions (formerly microcoded), as well as MOVs from SIMD registers to general-purpose registers
• Integration of new technologies onto the CPU die:
  • Four processor cores (quad-core)
  • Split power planes for the CPU cores and the memory controller/northbridge for more effective power management, first dubbed Dynamic Independent Core Engagement (D.I.C.E.) by AMD and now known as Enhanced PowerNow! (also dubbed Independent Dynamic Core Technology), allowing the cores and northbridge (integrated memory controller) to scale power consumption up or down independently
  • Shutting down portions of the circuits in a core when not under load, named "CoolCore" Technology
• Improvements in the memory subsystem:
  • Improvements in access latency:
    • Support for re-ordering loads ahead of other loads and stores
    • More aggressive instruction prefetching: 32-byte instruction prefetch, as opposed to 16 bytes in K8
    • DRAM prefetcher for buffering reads
    • Buffered burst writeback to RAM in order to reduce contention
  • Changes in memory hierarchy:
    • Prefetch directly into L1 cache, as opposed to L2 cache with the K8 family
    • 32-way set associative L3 victim cache sized at least 2 MB, shared between processing cores on a single die (each with 512 KB of independent exclusive L2 cache), with a sharing-aware replacement policy
    • Extensible L3 cache design, with 6 MB planned for the 45 nm process node, with the chips codenamed Shanghai
  • Changes in address space management:
    • Two independent 64-bit memory controllers, each with its own physical address space; this provides an opportunity to better utilize the available bandwidth when random memory accesses occur in heavily multi-threaded environments. This approach is in contrast to the previous "interleaved" design, where the two 64-bit data channels were bound to a single common address space.
    • Larger translation lookaside buffers (TLBs); support for 1 GB page entries and a new 128-entry 2 MB page TLB
    • 48-bit memory addressing to allow for 256 TB memory subsystems
    • Memory mirroring (alternatively mapped DIMM addressing), data poisoning support and Enhanced RAS
    • AMD-V Nested Paging for improved MMU virtualization, claimed to decrease world switch time by 25%
• Improvements in system interconnect:
  • HyperTransport retry support
  • Support for HyperTransport 3.0, with HyperTransport link unganging, which creates 8 point-to-point links per socket
• Platform-level enhancements with additional functionality:
  • Five p-states allowing for automatic clock rate modulation
  • Increased clock gating
  • Official support for coprocessors via HTX slots and vacant CPU sockets through HyperTransport (the Torrenza initiative)

==Feature tables==