The 29k evolved from the same Berkeley RISC design that also led to the Sun SPARC, Intel i960, ARM and RISC-V. One design element used in some of the Berkeley RISC-derived designs is the concept of register windows, a technique that significantly speeds up procedure calls. The idea is to use a large set of
registers as a stack, loading local data into a set of registers during a call and marking them "dead" when the procedure returns. Values returned from a routine would be placed in the "global page", which on the SPARC (for instance) is a set of eight global registers. The competing early RISC design from Stanford University, the Stanford MIPS, also examined this concept but concluded that improved compilers could make more efficient use of general-purpose registers than a hard-wired window could. In the original Berkeley design, SPARC, and i960, the windows were fixed in size: a routine using only one local variable would still consume eight registers on the SPARC, wasting this expensive resource. The 29k differed from these earlier designs by using a variable window size, so in the same example only two registers would be used, one for the local variable and another for the return address. It also added more registers: the same 128 registers for the procedure stack, plus another 64 for global access. In comparison, the SPARC had 128 registers in total, and its global set was a standard window of eight. This change resulted in much better register use in the 29000 under a wide variety of workloads. The 29k also extended the register window stack with an in-memory (and, in theory, in-cache) stack. When the register stack filled, the contents belonging to earlier calls would be spilled off the end of the register stack into memory and restored as required when those routines returned. Overall, the 29k's register usage was considerably more advanced than that of competing designs based on the Berkeley concepts.

Another difference from the Berkeley design is that the 29k made little use of condition codes. Although the 29k generated the
standard NZVC flags after arithmetic and logical operations, they were used only by the add-with-carry and subtract-with-carry instructions. Conditional branches could only test the most significant (sign) bit of a general-purpose register, which could be set by one of a series of compare instructions (such as "signed greater than" or "equal"). Any register could be used for this purpose, allowing conditions to be saved easily at the expense of complicating some code.

The 29000 did not include any branch prediction, so there was a delay whenever a branch was taken, and the architecture has a single branch delay slot. To reduce the instruction fetch latency on taken branches, a Branch Target Cache (512 bytes on the 29000 and 1024 bytes on the 29050) stored sets of four or two sequential instructions found at each branch target address. These could be run immediately while the fetch buffer was refilled with new instructions from memory.

Support for virtual address translation followed a similar approach to that of the MIPS architecture. A 64-entry
translation lookaside buffer (TLB) retained mappings from virtual to physical addresses. When an untranslated address was encountered, the resulting TLB "miss" caused the processor to trap to a software routine responsible for supplying an appropriate mapping to physical memory. Whereas MIPS used a Random register to select the TLB entry to be replaced on a TLB miss, the 29000 provided a dedicated LRU (least recently used) register. Some products in the 29000 family provided only 16 TLB entries, so that part of the silicon could be dedicated to on-chip peripherals. To compensate, the maximum page size supported by a mapping was increased from 8 KB to 16 MB.

==Versions==