ARM Cortex-A72

• Pipelined processor with a deeply out-of-order, speculative-issue, 3-way superscalar execution pipeline
• DSP and NEON SIMD extensions are mandatory per core
• VFPv4 floating-point unit onboard (per core)
• Hardware virtualization support
• Thumb-2 instruction set encoding reduces the size of 32-bit programs with little impact on performance
• TrustZone security extensions
• Program Trace Macrocell and CoreSight Design Kit for unobtrusive tracing of instruction execution
• 32 KiB data (2-way set-associative) + 48 KiB instruction (3-way set-associative) L1 cache per core
• Integrated low-latency level-2 (16-way set-associative) cache controller, 512 KiB to 4 MiB configurable size per cluster
• 48-entry fully associative L1 instruction translation lookaside buffer (TLB) with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
• 32-entry fully associative L1 data TLB with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
• 1024-entry, 4-way set-associative unified L2 TLB per core, supports hit-under-miss
• Sophisticated branch prediction algorithm that significantly increases performance and reduces energy from misprediction and speculation
• Early IC tag: 3-way L1 cache at direct-mapped power
• Regionalized TLB and μBTB tagging
• Small-offset branch-target optimizations
• Suppression of superfluous branch predictor accesses

==Chips==