With this design generation, ARM moved from a von Neumann architecture (Princeton architecture) to a modified (that is, split-cache) Harvard architecture with separate instruction and data buses (and caches), significantly increasing its potential speed. Most silicon chips integrating these cores package them as modified Harvard architecture chips, combining the two address buses on the other side of separated CPU caches and tightly coupled memories. There are two subfamilies, implementing different ARM architecture versions.
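
The effect of separating the instruction and data caches can be illustrated with a toy simulation. The sketch below is not a model of any real ARM9 core: the `DirectMappedCache` class, the cache sizes, and the access trace are all assumptions chosen for illustration. It replays the same trace through one unified cache and through two split caches of the same total capacity, and counts misses.

```python
# Toy model: direct-mapped caches, one word per line. Sizes and the access
# trace are hypothetical; they do not describe any real ARM9 core.

class DirectMappedCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines
        self.misses = 0

    def access(self, address):
        index = address % self.num_lines
        tag = address // self.num_lines
        if self.tags[index] != tag:
            # Miss: the line currently at this index is evicted.
            self.misses += 1
            self.tags[index] = tag


def run(trace, icache, dcache):
    """Replay (kind, address) pairs; kind is 'I' (fetch) or 'D' (data)."""
    for kind, address in trace:
        (icache if kind == "I" else dcache).access(address)


# A loop alternating one instruction fetch and one data access whose
# addresses map to the same cache index (0x100 % 8 == 0x200 % 8 == 0).
trace = [("I", 0x100), ("D", 0x200)] * 10

unified = DirectMappedCache(8)
run(trace, unified, unified)      # both streams share one cache

icache, dcache = DirectMappedCache(4), DirectMappedCache(4)
run(trace, icache, dcache)        # same total capacity, split in two

unified_misses = unified.misses
split_misses = icache.misses + dcache.misses
print(unified_misses, split_misses)   # prints "20 2"
```

In this contrived trace, every access in the unified cache evicts the other stream's line, while each split cache misses only once. Real workloads are far less adversarial, so the gap is normally much smaller.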
==Differences from ARM7 cores==
Key improvements over ARM7 cores, enabled by spending more transistors, include:
• Clock frequency improvements. Shifting from a three-stage instruction pipeline to a five-stage one lets the clock speed be approximately doubled on the same silicon fabrication process.
• Cycle count improvements. Many unmodified ARM7 binaries were measured as taking about 30% fewer cycles to execute on ARM9 cores. Key contributors include:
  • Faster loads and stores; many instructions now cost just one cycle. This is helped by both the modified Harvard architecture (reducing bus and cache contention) and the new pipeline stages.
  • Exposing pipeline interlocks, enabling compiler optimizations to reduce blockage between stages.

Additionally, some ARM9 cores incorporate "Enhanced DSP" instructions, such as a multiply-accumulate, to support more efficient implementations of digital signal processing algorithms.

Switching from a von Neumann architecture entailed using a non-unified cache, so that instruction fetches do not evict data (and vice versa). ARM9 cores have separate instruction and data bus signals, which chip designers use in various ways. In most cases they connect at least part of the address space in von Neumann style, used for both instructions and data, usually to an AHB interconnect connecting to a DRAM interface and an External Bus Interface usable with NOR flash memory. Such hybrids are no longer pure Harvard architecture processors.
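
As a rough worked example, the two headline figures quoted for ARM9 over ARM7 (clock speed approximately doubled, about 30% fewer cycles) can be combined into a single run-time estimate. This is a back-of-the-envelope sketch: it assumes both gains apply in full to the same program and that the memory system keeps pace with the faster clock, neither of which is guaranteed. The function name `relative_run_time` is made up for illustration.

```python
# Figures taken from the text above; both are approximate.
clock_ratio = 2.0    # ARM9 clock / ARM7 clock on the same process
cycle_ratio = 0.70   # ARM9 cycles / ARM7 cycles for the same binary


def relative_run_time(cycle_ratio, clock_ratio):
    """Run time = cycles / frequency, relative to the ARM7 baseline."""
    return cycle_ratio / clock_ratio


time_ratio = relative_run_time(cycle_ratio, clock_ratio)
speedup = 1.0 / time_ratio
print(f"relative run time: {time_ratio:.2f}, speedup: {speedup:.2f}x")
```

Under these assumptions the same binary would take about 35% of its ARM7 run time, a speedup of just under 3x. The figure is an optimistic bound, not a measured result.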
==ARM license==
ARM Holdings neither manufactures nor sells CPU devices based on its own designs, but rather licenses the processor architecture to interested parties. ARM offers a variety of licensing terms, varying in cost and deliverables. To all licensees, ARM provides an integratable hardware description of the ARM core, as well as a complete software development toolset and the right to sell manufactured silicon containing the ARM CPU. This model of licensed CPU core design is called an intellectual property (IP) core.
==Silicon customization==
Integrated device manufacturers (IDMs) receive the ARM processor IP as synthesizable RTL (written in Verilog). In this form, they can perform architectural-level optimizations and extensions. This allows the manufacturer to achieve custom design goals, such as higher clock speed, very low power consumption, instruction set extensions, optimizations for size, and debug support. To determine which components have been included in a particular ARM CPU chip, consult the manufacturer datasheet and related documentation.

==Cores==