The ASC was based around a single high-speed shared memory, which was accessed by the CPU and eight
I/O channel controllers, in an organization similar to
Seymour Cray's groundbreaking
CDC 6600. Memory was accessed solely under the control of the memory control unit (MCU). The MCU was a two-way, 256-bit per channel parallel network that could support up to eight independent processors, with a ninth channel for accessing "main memory" (referred to as "extended memory"). The MCU also acted as a
cache controller, offering high-speed access to a
semiconductor-based memory for the eight processor ports, and handling all communications to the 24-bit address space in main memory. The MCU was designed to operate asynchronously, allowing it to work at a variety of speeds and scale across a number of performance points. For instance, main memory could be constructed out of slower but less expensive
core memory, although this was not used in practice. At the fastest, it could sustain transfer rates of 80 million 32-bit words per second per port, for a total transfer rate of 640 million words per second. This was well beyond the capabilities of even the fastest memories of the era. The CPU had a 60 ns clock cycle (16.67 MHz clock frequency) and its logic was built from 20-
gate emitter-coupled logic integrated circuits originally developed by TI for the
ILLIAC IV supercomputer. The CPU had an extremely advanced architecture and organization for its era, supporting
microcoded arithmetic and mathematical instructions that operated on scalars, vectors, or matrices. The vector processing facilities had a memory-to-memory architecture; where the vector operands were read from, and the resulting vector written to, memory. The CPU could have one, two, or four vector lanes, allowing the CPU to produce one to four vector results every cycle, depending on the number of vector lanes installed. The vector lanes were also used for scalar instructions, and each lane could keep up to 12 scalar instructions in-flight simultaneously. The CPU, with four lanes, allowed up to 36 instructions in total across the entire CPU. The processor had forty-eight 32-bit registers, a huge number for the time. 16 of the registers were used for addressing, 16 for scalar operations, 8 for index offsets, and 8 for specifying the various parameters for vector instructions. Data was moved between the registers and memory by load/store instructions, which could transfer from 4–64 bits (two registers) at a time. Most
vector processors tended to be memory bandwidth-limited, that is, they could process data faster than they could get it from memory. This remains a major problem on modern SIMD designs as well, which is why considerable effort has been put into increasing memory throughput in modern computer designs (although largely unsuccessfully). In the ASC this was improved somewhat with a lookahead unit that predicted upcoming memory accesses and loaded them into the scalar registers invisibly, using a memory interface in the CPU called the memory buffer unit (MBU). The "Peripheral Processor" was a separate system dedicated entirely to quickly running the
operating system and programs running within it, as well as feeding data to the CPU. The PP was built out of eight "virtual processors" (VPs), which were designed to handle instructions and basic integer arithmetic only. Each VP had its own
program counter and registers, and the system could thus run eight programs at the same time, limited only by memory accesses. Keeping eight programs running allowed the system to shuffle execution of programs on the CPU depending on what data was available on the memory bus at that time, minimizing "dead time" where the CPU had to wait for data from the memory. The PP also included a set of sixty-four 32-bit communications registers (CRs). The CRs stored the state required for communication between the various parts of the ASC: the CPU, VPs, and
channel controllers. The ASC instruction set include a bit-reverse instruction that was intended to speed up the calculation of
fast Fourier transforms (FFTs). By the time the ASC was in production, better FFT algorithms had been developed that did not require this operation. TI offered a bounty to the first person to come up with a valid use for this instruction, but was never collected. ==Market reception==