While many designs achieved the aim of higher throughput at lower cost and also allowed high-level language constructs to be expressed by fewer instructions, it was observed that this was not always the case. For instance, low-end versions of complex architectures (i.e. using less hardware) could lead to situations where it was possible to improve performance by not using a complex instruction (such as a procedure call or enter instruction) but instead using a sequence of simpler instructions.

One reason for this was that architects (microcode writers) sometimes "over-designed" assembly language instructions, including features that could not be implemented efficiently on the basic hardware available. There could, for instance, be "side effects" (beyond the conventional flags), such as the setting of a register or memory location that was perhaps seldom used; if this was done via ordinary (non-duplicated) internal buses, or even the external bus, it would demand extra cycles every time, and thus be quite inefficient.

Even in balanced high-performance designs, highly encoded and (relatively) high-level instructions could be complicated to decode and execute efficiently within a limited transistor budget. Such architectures therefore required a great deal of work on the part of the processor designer in cases where a simpler, but (typically) slower, solution based on decode tables and/or microcode sequencing was not appropriate. At a time when transistors and other components were a limited resource, this also left fewer components and less opportunity for other types of performance optimizations.
===The RISC idea===
The circuitry that performs the actions defined by the microcode in many (but not all) CISC processors is, in itself, a processor which in many ways is reminiscent in structure of very early CPU designs. In the early 1970s, this gave rise to ideas to return to simpler processor designs in order to make it more feasible to cope without (then relatively large and expensive) ROM tables and/or PLA structures for sequencing and/or decoding.

An early (retroactively) RISC-labeled processor (IBM 801, IBM's Watson Research Center, mid-1970s) was a tightly pipelined simple machine originally intended to be used as an internal microcode kernel, or engine, in CISC designs, but it also became the processor that introduced the RISC idea to a somewhat larger audience. Simplicity and regularity in the visible instruction set as well would make it easier to implement overlapping processor stages (pipelining) at the machine code level (i.e. the level seen by compilers). However, pipelining at that level was already used in some high-performance CISC "supercomputers" in order to reduce the instruction cycle time (despite the complications of implementing it within the limited component count and wiring complexity feasible at the time). Internal microcode execution in CISC processors, on the other hand, could be more or less pipelined depending on the particular design, and therefore more or less akin to the basic structure of RISC processors.

The CDC 6600 supercomputer, first delivered in 1965, has also been retroactively described as RISC. It had a load–store architecture which allowed up to five loads and two stores to be in progress simultaneously under programmer control. It also had multiple function units which could operate at the same time.
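
The effect of overlapping processor stages on total execution time can be illustrated with a minimal sketch. It assumes an idealized pipeline with no stalls or hazards, where every stage takes one cycle and one instruction enters per cycle; the function names and the five-stage, 100-instruction figures are illustrative assumptions and do not describe the 801, the CDC 6600, or any other real machine.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction completes every stage
    before the next instruction is allowed to start."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed by an ideal pipeline: the first instruction takes
    n_stages cycles, then one further instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload: 100 instructions, 5 stages
    print(unpipelined_cycles(n, k))  # 500
    print(pipelined_cycles(n, k))    # 104
```

Real designs fall short of this ideal because of branches, data dependencies, and memory latency, which is part of why regular, simple instructions make pipelining easier to implement.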
===Superscalar===
In a more modern context, the complex variable-length encoding used by some of the typical CISC architectures makes it complicated, but still feasible, to build a superscalar implementation of a CISC programming model directly; the in-order superscalar original Pentium and the out-of-order superscalar Cyrix 6x86 are well-known examples of this. The frequent memory accesses for operands of a typical CISC machine may limit the instruction-level parallelism that can be extracted from the code, although this is strongly mitigated by the fast cache structures used in modern designs, as well as by other measures. Due to inherently compact and semantically rich instructions, the average amount of work performed per machine code unit (i.e. per byte or bit) is higher for a CISC than a RISC processor, which may give it a significant advantage in a modern cache-based implementation.

Transistors for logic, PLAs, and microcode are no longer scarce resources; only large high-speed cache memories are limited by the maximum number of transistors today. Although complex, the transistor count of CISC decoders does not grow exponentially like the total number of transistors per processor (the majority typically used for caches). Together with better tools and enhanced technologies, this has led to new implementations of highly encoded and variable-length designs without load–store limitations (i.e. non-RISC). This governs re-implementations of older architectures such as the ubiquitous x86 (see below) as well as new designs for microcontrollers for embedded systems, and similar uses.

The superscalar complexity in the case of modern x86 was solved by converting instructions into one or more micro-operations and dynamically issuing those micro-operations, i.e. indirect and dynamic superscalar execution; the Pentium Pro and AMD K5 are early examples of this. It allows a fairly simple superscalar design to be located after the (fairly complex) decoders (and buffers), giving, so to speak, the best of both worlds in many respects. This technique is also used in IBM z196 and later z/Architecture microprocessors.
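
The conversion of instructions into micro-operations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoder of any real processor: the `Instr` type, the `crack` function, the temporary registers `t0` and `t1`, and the textual micro-operation format are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instr:
    """Hypothetical two-operand instruction meaning: dst = dst <op> src.
    An operand written in square brackets, e.g. "[x]", denotes memory."""
    op: str
    dst: str
    src: str


def is_mem(operand: str) -> bool:
    """Return True if the operand refers to a memory location."""
    return operand.startswith("[") and operand.endswith("]")


def crack(instr: Instr) -> List[str]:
    """Split one instruction into load-store style micro-operations,
    so that only explicit loads and stores touch memory."""
    uops: List[str] = []
    src = instr.src
    if is_mem(src):
        # Fetch a memory source into a temporary register first.
        uops.append(f"load t0, {src}")
        src = "t0"
    if is_mem(instr.dst):
        # Read-modify-write on memory becomes load, operate, store.
        uops.append(f"load t1, {instr.dst}")
        uops.append(f"{instr.op} t1, t1, {src}")
        uops.append(f"store {instr.dst}, t1")
    else:
        uops.append(f"{instr.op} {instr.dst}, {instr.dst}, {src}")
    return uops


if __name__ == "__main__":
    for i in (Instr("add", "r1", "r2"),
              Instr("add", "r1", "[x]"),
              Instr("add", "[x]", "r1")):
        print(i, "->", crack(i))
```

In this sketch a register-to-register instruction maps to a single micro-operation, while an instruction with a memory destination expands to three, which a simpler superscalar back end could then schedule independently.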
==CISC and RISC terms==
By the mid-1980s the computer industry's consensus was that RISC was more efficient than CISC. Digital Equipment Corporation estimated that RISC had a price/performance ratio at most half that of CISC. Two possible responses from CISC vendors were:
• Improve CISC as much as possible until reaching the current architecture's limits. This was the approach chosen for IBM mainframes and x86.
• Move to RISC as fast as possible. Sun Microsystems chose this by moving from the Motorola 68000 series to SPARC.
Intel was successful in improving x86 to match RISC's performance.

The terms CISC and RISC have become less meaningful with the continued evolution of both CISC and RISC designs and implementations. The first highly (or tightly) pipelined x86 implementations, the 486 designs from Intel, AMD, Cyrix, and IBM, supported every instruction that their predecessors did, but achieved maximum efficiency only on a fairly simple x86 subset that was only a little more than a typical RISC instruction set (i.e., without typical RISC load–store limits).

As CISC became a catch-all term meaning anything that is not a load–store (RISC) architecture, it is not the number of instructions, nor the complexity of the implementation or of the instructions, that defines CISC, but the fact that arithmetic instructions also perform memory accesses. Compared to a small 8-bit CISC processor, a RISC floating-point instruction is complex. CISC does not even need to have complex addressing modes; 32- or 64-bit RISC processors may well have more complex addressing modes than small 8-bit CISC processors.

A
PDP-10, a PDP-8, an x86 processor, an Intel 4004, a Motorola 68000-series processor, an IBM Z mainframe, a Burroughs B5000, a VAX, a Zilog Z80000, and a MOS Technology 6502 all vary widely in the number, sizes, and formats of instructions, the number, types, and sizes of registers, and the available data types. Some have hardware support for operations like scanning for a substring, arbitrary-precision BCD arithmetic, or transcendental functions, while others have only 8-bit addition and subtraction. But they are all in the CISC category, because they have "load-operate" instructions that load and/or store memory contents within the same instructions that perform the actual calculations. For instance, the PDP-8, having only 8 fixed-length instructions and no microcode at all, is a CISC because of how the instructions work; PowerPC, which has over 230 instructions (more than some VAXes), and several implementations of which have complex internals such as register renaming and a reorder buffer, is a RISC; and Minimal CISC has 8 instructions, but is clearly a CISC because it combines memory access and computation in the same instructions.
==See also==