Branch predictor

Static branch prediction Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction. Early implementations of SPARC and MIPS (two of the first commercial RISC architectures) used single-direction static branch prediction: they always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken, does the instruction pointer get set to a non-sequential address. Both CPUs fetch instructions in one cycle and evaluate branches in the decode stage. As a result, the branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any taken branch. Both architectures define branch delay slots in order to utilize these fetched instructions. A more-advanced form of static prediction assumes that backward branches will be taken and forward branches will not. (A backward branch is one that has a target address that is lower than its own address.) This rule increases prediction accuracy in loops, which can be constructed with a backward branch at the end that is mostly taken, or a forward branch at the beginning that is mostly not taken. With processors that use this prediction method, ordering the instructions can maximize branch-prediction accuracy. The RISC-V ISA recommends that software written for RISC-V harts (hardware threads), or generated to run on RISC-V harts, be optimized with the assumption that backward branches are taken and forward branches are not. (Even when the processor incorporates a more-advanced predictor that diminishes the relative value of static prediction.) Some processors accept branch-prediction hints that override the static prediction. The Intel Pentium 4 and Pentium 4E accept branch-prediction hints as prefixes. In the presence of dynamic prediction, static prediction offers almost no benefit, and overriding the static prediction helps even less. Intel's subsequent Pentium M and Core2 processors ignore branch-prediction hint prefixes. No x86 manufacturer has re-introduced prediction hints. Dynamic branch predictors naturally fall back on static branch prediction when they have no cached information (such as the first time a given branch is encountered). Both the Motorola MPC7450 (G4e) and the Intel Pentium 4 fall back on static prediction. In static prediction, all decisions are made at compile time, before the execution of the program. Dynamic branch prediction Dynamic branch prediction The predictor table is indexed with the instruction address bits, so that the processor can fetch a prediction for every instruction before the instruction is decoded. Two-level predictor The Two-Level Branch Predictor, also referred to as Correlation-Based Branch Predictor, uses a two-dimensional table of counters, also called "Pattern History Table". The table entries are two-bit counters. Two-level adaptive predictor If an if statement is executed three times, the decision made on the third execution might depend upon whether the previous two were taken or not. In such scenarios, a two-level adaptive predictor works more efficiently than a saturation counter. Conditional jumps that are taken every second time or have some other regularly recurring pattern are not predicted well by the saturating counter. A two-level adaptive predictor remembers the history of the last n occurrences of the branch and uses one saturating counter for each of the possible 2n history patterns. This method is illustrated in figure 3. Consider the example of n = 2. This means that the last two occurrences of the branch are stored in a two-bit shift register. This branch history register can have four different binary values, 00, 01, 10, and 11, where zero means "not taken" and one means "taken". A pattern history table contains four entries per branch, one for each of the 22 = 4 possible branch histories, and each entry in the table contains a two-bit saturating counter of the same type as in figure 2 for each branch. The branch history register is used for choosing which of the four saturating counters to use. If the history is 00, then the first counter is used; if the history is 11, then the last of the four counters is used. Assume, for example, that a conditional jump is taken every third time. The branch sequence is 001001001... In this case, entry number 00 in the pattern history table will go to state "strongly taken", indicating that after two zeroes comes a one. Entry number 01 will go to state "strongly not taken", indicating that after 01 comes a zero. The same is the case with entry number 10, while entry number 11 is never used because there are never two consecutive ones. The general rule for a two-level adaptive predictor with an n-bit history is that it can predict any repetitive sequence with any period if all n-bit sub-sequences are different. Since the initial publication in 1991, this method has become very popular. Variants of this prediction method are used in most modern microprocessors. Two-level neural predictor A two-level branch predictor where the second level is replaced with a neural network has been proposed. Local branch prediction A local branch predictor has a separate history buffer for each conditional jump instruction. It may use a two-level adaptive predictor. The history buffer is separate for each conditional jump instruction, while the pattern history table may be separate as well or it may be shared between all conditional jumps. The Intel Pentium MMX, Pentium II, and Pentium III have local branch predictors with a local 4-bit history and a local pattern history table with 16 entries for each conditional jump. On the SPEC'89 benchmarks, very large local predictors saturate at 97.1% correct. combines the local and global prediction principles by concatenating local and global branch histories, possibly with some bits from the program counter as well. Tests indicate that the VIA Nano processor may be using this technique. Hybrid predictor A hybrid predictor, also called combined predictor, implements more than one prediction mechanism. The final prediction is based either on a meta-predictor that remembers which of the predictors has made the best predictions in the past, or a majority vote function based on an odd number of different predictors. Scott McFarling proposed combined branch prediction in his 1993 paper. Newer processors from Intel and AMD can predict indirect branches by using a two-level adaptive predictor. This kind of instruction contributes more than one bit to the history buffer. The zEC12 and later z/Architecture processors from IBM support a instruction that can preload the branch predictor entry for a given instruction with a branch target address constructed by adding the contents of a general-purpose register to an immediate displacement value. Processors without this mechanism will simply predict an indirect jump to go to the same target as it did last time. Neural branch prediction Machine learning for branch prediction using LVQ and multilayer perceptrons, called "neural branch prediction", was proposed by Lucian Vintan (Lucian Blaga University of Sibiu). One year later he developed the perceptron branch predictor. The neural branch predictor research was developed much further by Daniel Jimenez. In 2001, The main advantage of the neural predictor is its ability to exploit long histories while requiring only linear resource growth. Classical predictors require exponential resource growth. Jimenez reports a global improvement of 5.7% over a McFarling-style hybrid predictor. He also used a gshare/perceptron overriding hybrid predictors.). Intel already implements this idea in one of the IA-64's simulators (2003). The AMD Ryzen multi-core processor's Infinity Fabric and the Samsung Exynos processors include perceptron-based neural branch predictors. ==History==