The POWER1 is a multi-chip CPU built from separate chips that are connected to each other by buses. The POWER1 consists of an
instruction-cache unit (ICU), a
fixed-point unit (FXU), a
floating point unit (FPU), a number of
data-cache units (DCU), a
storage-control unit (SCU) and an
I/O unit. Because the design is modular, IBM was able to create two configurations simply by varying the number of DCUs: RIOS-1 and RIOS.9. The RIOS-1 configuration has four DCUs, the intended number, and was clocked at up to 40 MHz, whereas the RIOS.9 configuration has two DCUs and was clocked at lower frequencies. The chips are mounted on the “CPU planar”, a
printed circuit board (PCB), using through-hole technology. Due to the large number of chips with wide
buses, the PCB has eight planes for routing wires, four for power and ground and four for signals. There are two signal planes on each side of the board, while the four power and ground planes are in the center. The chips that make up the POWER1 are fabricated in a 1.0 μm
CMOS process with three layers of interconnect. The chips are packaged in
ceramic pin grid array (CPGA) packages that can have up to 300 pins and dissipate a maximum of 4
W of heat each. The total number of
transistors in the POWER1, assuming a RIOS-1 configuration, is 6.9 million, with 2.04 million used for logic and 4.86 million used for memory. The die area of all the chips combined is 1,284 mm². The total number of signal pins is 1,464.
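The RIOS-1 totals can be cross-checked against the per-chip figures quoted in the chip descriptions of this article. The following is a minimal sketch of that arithmetic, not a sourced breakdown: the names `CHIPS`, `RIOS_1_COUNTS` and `tally` are illustrative, the per-chip figures are approximate, and the article gives no transistor count for the FPU, so the remainder is only what the listed chips leave unattributed.

```python
# Transistor counts in thousands, as (logic, memory), taken from the
# per-chip figures quoted in this article. The FPU is omitted because
# the article gives no figure for it.
CHIPS = {
    "ICU": (200, 550),
    "FXU": (250, 250),
    "DCU": (175, 950),  # per DCU
    "SCU": (230, 0),
    "I/O": (300, 200),
}

# RIOS-1 uses four DCUs; every other chip appears once.
RIOS_1_COUNTS = {"DCU": 4}

# Totals quoted for the RIOS-1 configuration (2.04 and 4.86 million).
TOTAL_LOGIC = 2040
TOTAL_MEMORY = 4860


def tally(chips, counts):
    """Sum logic and memory transistors over all listed chips."""
    logic = sum(l * counts.get(name, 1) for name, (l, _) in chips.items())
    memory = sum(m * counts.get(name, 1) for name, (_, m) in chips.items())
    return logic, memory


logic, memory = tally(CHIPS, RIOS_1_COUNTS)

# Whatever the listed chips do not account for (approximate, since the
# per-chip figures are themselves rounded).
unattributed = (TOTAL_LOGIC - logic, TOTAL_MEMORY - memory)

print(f"listed chips: {logic}k logic, {memory}k memory")
print(f"unattributed: {unattributed[0]}k logic, {unattributed[1]}k memory")
```

Because the per-chip numbers are rounded, the unattributed remainder should be read as a consistency check rather than as a transistor count for any particular chip.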
== Chips ==
=== Instruction-cache unit (ICU) ===
The ICU contains the instruction cache, referred to as the "I-cache" by IBM, and the
branch processing unit (BPU). The BPU contains the
program counter, the condition code register and a loop register. The ICU contains 0.75 million transistors with 0.2 million used for logic and 0.55 million used for
SRAM. The ICU
die measures approximately 160 mm² (12.7 × 12.7 mm). The BPU was capable of dispatching multiple instructions (up to four simultaneously and out of order) to the fixed- and floating-point instruction queues while it was executing a program flow control instruction. Speculative
branches were also supported by using a prediction bit in the branch instructions, with the results discarded before being saved if the branch was not taken. The alternate instruction would be buffered and discarded if the branch was taken. Consequently,
subroutine calls and
interrupts are handled without incurring branch penalties. The condition code register has eight field sets, with the first two reserved for fixed- and floating-point instructions and the seventh for
vector instructions. The rest of the fields could be used by other instructions. The loop register is a counter for "decrement and branch on zero" loops with no branch penalty, a feature similar to those found in some
DSPs such as the TMS320C30.
=== Fixed-point unit (FXU) ===
The FXU is responsible for decoding and executing all fixed-point instructions and floating-point load and store instructions. For execution, the FXU contains the POWER1's fixed-point register file, an arithmetic logic unit (ALU) for general instructions, and a dedicated fixed-point multiply and divide unit. It also contains instruction buffers that receive both fixed- and floating-point instructions from the ICU, passing on the floating-point instructions to the FPU, and a 128-entry two-way set-associative D-TLB for address translation. The FXU contains approximately 0.5 million transistors, with 0.25 million used for logic and 0.25 million used for memory, on a die measuring approximately 160 mm².
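A 128-entry two-way set-associative TLB has 128 / 2 = 64 sets, each holding two translations. The following sketch illustrates that organisation in general terms only. It is not a model of the POWER1's actual D-TLB: indexing by the low bits of the virtual page number, the first-in first-out replacement policy, and the names `TwoWayTLB`, `lookup` and `insert` are all assumptions made for illustration.

```python
ENTRIES = 128            # from the text
WAYS = 2                 # from the text: two-way set-associative
SETS = ENTRIES // WAYS   # 64 sets


class TwoWayTLB:
    """Toy two-way set-associative TLB (illustrative, not the POWER1 design)."""

    def __init__(self):
        # Each set holds up to WAYS (virtual page, physical page) pairs,
        # oldest first.
        self.sets = [[] for _ in range(SETS)]

    def _set_for(self, vpn):
        # Assumption: the set index is the low bits of the virtual page number.
        return self.sets[vpn % SETS]

    def lookup(self, vpn):
        """Return the cached physical page number, or None on a TLB miss."""
        for tag, ppn in self._set_for(vpn):
            if tag == vpn:
                return ppn
        return None

    def insert(self, vpn, ppn):
        """Install a translation, evicting the oldest entry of a full set."""
        ways = self._set_for(vpn)
        ways[:] = [(t, p) for t, p in ways if t != vpn]  # drop a stale copy
        if len(ways) == WAYS:
            ways.pop(0)  # assumed FIFO replacement
        ways.append((vpn, ppn))


tlb = TwoWayTLB()
for vpn in (0, 64, 128):      # all three map to set 0
    tlb.insert(vpn, ppn=vpn + 1000)
```

In this sketch the third insertion into the same set evicts the first, which is the behaviour that distinguishes a two-way set-associative structure from a fully associative one of the same size.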
=== Floating-point unit (FPU) ===
The POWER1's floating-point unit executes floating-point instructions issued by the ICU. The FPU is
pipelined and can execute
single precision (32-bit) and
double precision (64-bit) instructions. It is capable of performing
multiply-add instructions, which contributed to the POWER1's high floating-point performance. A multiply followed by an add is common in technical and scientific floating-point code; most processors cannot execute the pair in one cycle, whereas the POWER1 can. Use of
fused multiply–add also means that the data is only rounded once, improving the precision of the result slightly. The floating-point register file is also located on the FPU chip. It contains 32 64-bit floating-point registers, six rename registers and two registers that are used by divide instructions.
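The effect of rounding only once can be illustrated numerically. The sketch below does not model the POWER1 hardware; it emulates a fused multiply-add in Python by computing the product and sum exactly with rational arithmetic and rounding a single time, then compares that with the ordinary two-step computation. The function names and operand values are chosen purely for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate a*b + c with a single rounding to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding yet
    return float(exact)                              # one rounding


def multiply_then_add(a, b, c):
    """Ordinary evaluation: the product is rounded, then the sum is rounded."""
    return a * b + c


# Operands chosen so that the exact product, 1 - 2**-60, is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)
unfused = multiply_then_add(a, b, c)

print(fused)    # the small negative residual survives
print(unfused)  # the residual is lost when the product is rounded
```

With these operands the separately rounded product loses the residual entirely, while the single-rounding result retains it exactly.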
=== Data-cache unit (DCU) ===
The POWER1 has a 64 KB
data cache implemented through four identical data-cache units (DCU), each containing 16 KB of data cache. The cache and the buses that connect the DCU to the other chips are ECC protected. The DCUs also provide the interface to the memory. If two DCUs are present (RIOS.9 configuration), the memory bus is 64 bits wide, and if four DCUs are present (RIOS-1 configuration), the memory bus is 128 bits wide. The memory interface portion of the DCUs provides three features that improve the reliability and availability of the memory:
memory scrubbing,
ECC and
bit steering. Each DCU contains approximately 1.125 million transistors, with 0.175 million used for logic and 0.95 million used for SRAM, on a die measuring approximately 130 mm² (11.3 × 11.3 mm).
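The way both cache capacity and memory bus width scale with the number of DCUs can be written out as a short sketch. The 16 KB per DCU figure comes from the text; the 32 bits of memory bus per DCU is inferred from the two stated widths (64 bits for two DCUs, 128 bits for four), and the resulting 32 KB data cache for RIOS.9 is likewise an inference rather than a figure given here. The class and constant names are illustrative.

```python
from dataclasses import dataclass

DCU_CACHE_KB = 16         # from the text: each DCU holds 16 KB of data cache
DCU_MEMORY_BUS_BITS = 32  # inferred: 64 bits / 2 DCUs == 128 bits / 4 DCUs


@dataclass(frozen=True)
class Power1Config:
    """A POWER1 configuration, distinguished only by its DCU count."""
    name: str
    dcu_count: int

    @property
    def data_cache_kb(self):
        return self.dcu_count * DCU_CACHE_KB

    @property
    def memory_bus_bits(self):
        return self.dcu_count * DCU_MEMORY_BUS_BITS


RIOS_1 = Power1Config("RIOS-1", dcu_count=4)
RIOS_9 = Power1Config("RIOS.9", dcu_count=2)

for cfg in (RIOS_1, RIOS_9):
    print(f"{cfg.name}: {cfg.data_cache_kb} KB data cache, "
          f"{cfg.memory_bus_bits}-bit memory bus")
```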
=== Storage-control unit (SCU) ===
The POWER1 is controlled by the SCU chip. All communication between the ICU, FXU and DCU chips, as well as the memory and I/O devices, is arbitrated by the SCU. Although the DCUs provide the means to perform memory scrubbing, it is the SCU that controls the process. The SCU contains approximately 0.23 million transistors, all of them for logic, on a die measuring approximately 130 mm².
=== I/O unit ===
The POWER1's I/O interfaces are implemented by the I/O unit, which contains an I/O channel controller (IOCC) and two
serial link adapters (SLAs). The IOCC implements the
Micro Channel interface and controls both I/O and
DMA transactions between the Micro Channel adapters and the system memory. The two SLAs each implement a serial
fibre-optic link, intended to connect RS/6000 systems together. The optical links were not supported at the time of the RS/6000's release. The I/O unit contains approximately 0.5 million transistors, with 0.3 million used for logic and 0.2 million used for memory, on a die measuring approximately 160 mm².
== See also ==