Control units use many methods to keep a pipeline full and avoid stalls. For example, even simple control units can assume that a backwards branch, to a lower-numbered, earlier instruction, is a loop, and will be repeated. So, a control unit with this design will always fill the pipeline with the backwards branch path. If a
compiler can detect the most frequently-taken direction of a branch, the compiler can just produce instructions so that the most frequently taken branch is the preferred direction of branch. In a like way, a control unit might get hints from the compiler: Some computers have instructions that can encode hints from the compiler about the direction of branch. Some control units do
branch prediction: A control unit keeps an electronic list of the recent branches, encoded by the address of the branch instruction. This list has a few bits for each branch to remember the direction that was taken most recently. Some control units can do
speculative execution, in which a computer might have two or more pipelines, calculate both directions of a branch, and then discard the calculations of the unused direction. Results from memory can become available at unpredictable times because very fast computers
cache memory. That is, they copy limited amounts of memory data into very fast memory. The CPU must be designed to process at the very fast speed of the cache memory. Therefore, the CPU might stall when it must access main memory directly. In modern PCs, main memory is as much as three hundred times slower than cache. To help this, out-of-order CPUs and control units were developed to process data as it becomes available. (See next section) But what if all the calculations are complete, but the CPU is still stalled, waiting for main memory? Then, a control unit can switch to an
alternative thread of execution whose data has been fetched while the thread was idle. A thread has its own program counter, a stream of instructions and a separate set of registers. Designers vary the number of threads depending on current memory technologies and the type of computer. Typical computers such as PCs and smart phones usually have control units with a few threads, just enough to keep busy with affordable memory systems. Database computers often have about twice as many threads, to keep their much larger memories busy. Graphic processing units (GPUs) usually have hundreds or thousands of threads, because they have hundreds or thousands of execution units doing repetitive graphic calculations. When a control unit permits
threads, the software also has to be designed to handle them. In general-purpose CPUs like PCs and smartphones, the threads are usually made to look very like normal time-sliced processes. At most, the operating system might need some awareness of them. In GPUs, the thread scheduling usually cannot be hidden from the application software, and is often controlled with a specialized subroutine library. == Out of order control units ==