=== Loop optimizations ===
Loop optimizations act on the statements that make up a loop, such as a for loop; an example is loop-invariant code motion. Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.
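For instance, loop-invariant code motion hoists a computation whose value does not change between iterations out of the loop body so that it is evaluated only once. A minimal before/after sketch (the function names are illustrative, not from any particular compiler):

```python
import math

def scale_all(xs, factor_deg):
    # Before: math.radians(factor_deg) is loop-invariant,
    # yet it is recomputed on every iteration.
    out = []
    for x in xs:
        out.append(x * math.radians(factor_deg))
    return out

def scale_all_hoisted(xs, factor_deg):
    # After loop-invariant code motion: the invariant
    # computation is hoisted out of the loop and done once.
    factor = math.radians(factor_deg)
    out = []
    for x in xs:
        out.append(x * factor)
    return out
```

Both versions compute the same result; the hoisted form simply performs the invariant work once instead of once per element.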
=== Data-flow optimizations ===
Data-flow optimizations, based on data-flow analysis, primarily depend on how certain properties of data are propagated along control edges in the control-flow graph. Some of these include:
;Common subexpression elimination: In the expression (a + b) - (a + b)/4, the "common subexpression" is the duplicated (a + b). Compilers implementing this technique recognize that the value of (a + b) does not change between the two uses, and so calculate it only once. It is implemented by most modern compilers.
;Induction variable recognition and elimination: See the discussion above about induction variable analysis.
;Alias classification and pointer analysis: In the presence of pointers, it is difficult to make any optimizations at all, since potentially any variable may have been changed when a memory location is assigned to. By specifying which pointers can alias which variables, unrelated pointers can be ignored.
;Dead-store elimination: Removal of assignments to variables that are not subsequently read, either because the lifetime of the variable ends or because a subsequent assignment will overwrite the first value.
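The common-subexpression example above, (a + b) - (a + b)/4, can be sketched as a source-level before/after transformation (in practice the compiler performs this on its intermediate representation, not on the source text):

```python
def expr(a, b):
    # Before: (a + b) is computed twice.
    return (a + b) - (a + b) / 4

def expr_cse(a, b):
    # After common subexpression elimination: the shared
    # subexpression is computed once and reused.
    t = a + b
    return t - t / 4
```

The transformation is valid here because nothing between the two uses can change the value of (a + b).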
=== SSA-based optimizations ===
These optimizations are intended to be done after transforming the program into a special form called static single-assignment form, in which every variable is assigned in only one place. Although some of them function without SSA, they are most effective with it. Many optimizations listed in other sections, such as register allocation, also benefit from SSA with no special changes.
;Global value numbering: GVN eliminates redundancy by constructing a value graph of the program and then determining which values are computed by equivalent expressions. GVN can identify some redundancy that common subexpression elimination cannot, and vice versa.
;Sparse conditional constant propagation: Combines constant propagation, constant folding, and dead-code elimination, and improves upon what is possible by running them separately. This optimization symbolically executes the program, simultaneously propagating constant values and eliminating the portions of the control-flow graph that this makes unreachable.
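A source-level illustration of the combined effect described above: once x is known to be a constant, the branch condition folds to a constant, the unreachable arm is deleted, and the final value itself becomes a constant. (Real compilers perform this on the SSA form of the intermediate representation; the functions here are only an illustrative before/after pair.)

```python
def before():
    # Before optimization: x is a constant, so the branch
    # condition is decidable at compile time.
    x = 4
    if x > 0:
        y = x * 2
    else:
        y = -1      # unreachable once x is known to be 4
    return y

def after():
    # After sparse conditional constant propagation: x and the
    # condition are propagated and folded, the dead branch is
    # removed, and the whole function reduces to a constant.
    return 8
```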
=== Code generator optimizations ===
;Register allocation: The most frequently used variables should be kept in processor registers for the fastest access. To find which variables to put in registers, an interference graph is created. Each variable is a vertex, and when two variables are used at the same time (have intersecting live ranges) there is an edge between them. This graph is colored using, for example, Chaitin's algorithm, with the same number of colors as there are registers. If the coloring fails, one variable is "spilled" to memory and the coloring is retried.
;Instruction selection: Most architectures, particularly CISC architectures and those with many addressing modes, offer several different ways of performing a particular operation, using entirely different sequences of instructions. The job of the instruction selector is to choose well, overall, which instructions to use to implement the operators in the low-level intermediate representation. For example, on many processors in the 68000 family and on the x86 architecture, complex addressing modes can be used in statements like lea 25(a1,d5*4), a0, allowing a single instruction to perform a significant amount of arithmetic with less storage.
;Instruction scheduling: Instruction scheduling is an important optimization for modern pipelined processors, which avoids stalls or bubbles in the pipeline by clustering instructions with no dependencies together, while being careful to preserve the original semantics.
;Rematerialization: Rematerialization recalculates a value instead of loading it from memory, eliminating a memory access. This is performed in tandem with register allocation to avoid spills.
;Code factoring: If several sequences of code are identical, or can be parameterized or reordered to be identical, they can be replaced with calls to a shared subroutine. This can often share code for subroutine set-up and sometimes tail recursion.
;Trampolines: Many CPUs have smaller subroutine call instructions to access low memory. A compiler can save space by using these small calls in the main body of code. Jump instructions in low memory can access the routines at any address. This multiplies the space savings from code factoring. The problem of determining an optimal set of macros that minimizes the space required by a given code segment is known to be NP-complete.
;Reduction of cache collisions: For example, by disrupting alignment within a page.
;Stack-height reduction: Rearrange an expression tree to minimize the resources needed for expression evaluation.
;Test reordering: If we have two tests that together form the condition for something, we can first deal with the simpler tests (e.g., comparing a variable to something) and only then with the complex tests (e.g., those that require a function call). This technique complements lazy evaluation, but can be used only when the tests are not dependent on one another. Short-circuiting semantics can make this difficult.
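The interference-graph coloring described under register allocation above can be sketched as follows. This is a simplified first-fit coloring, not Chaitin's full simplify-and-spill algorithm, and the variable names are illustrative:

```python
def color_interference_graph(edges, num_registers):
    """Greedy sketch of graph-coloring register allocation.

    edges: (u, v) pairs of variables whose live ranges overlap.
    Returns {variable: register index}, where None means the
    variable is spilled to memory because no register is free.
    """
    # Build the adjacency sets of the interference graph.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    assignment = {}
    for var in sorted(neighbors):  # deterministic visiting order
        # Registers already taken by interfering neighbors.
        used = {assignment.get(n) for n in neighbors[var]}
        free = [r for r in range(num_registers) if r not in used]
        # Take the first free register, or spill if none remain.
        assignment[var] = free[0] if free else None
    return assignment
```

With two registers, a chain a–b–c colors with just two colors (a and c may share a register because their live ranges do not overlap), while a triangle a–b–c needs three and forces a spill.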
=== Interprocedural optimizations ===
Interprocedural optimization works on the entire program, across procedure and file boundaries. It works closely with its intraprocedural counterparts, carried out with the cooperation of a local part and a global part. Typical interprocedural optimizations are procedure inlining, interprocedural dead-code elimination, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations. Interprocedural analyses include alias analysis, array access analysis, and the construction of a call graph.

Interprocedural optimization is common in modern commercial compilers from SGI, Intel, Microsoft, and Sun Microsystems. For a long time, the open-source GCC was criticized for a lack of powerful interprocedural analysis and optimizations, though this is now improving. Another open-source compiler with a full analysis and optimization infrastructure is Open64.

Due to the extra time and space required by interprocedural analysis, most compilers do not perform it by default. Users must use compiler options explicitly to tell the compiler to enable interprocedural analysis and other expensive optimizations.

== Practical considerations ==