Tail recursion is important to some
high-level languages, especially
functional and
logic languages and members of the
Lisp family. In these languages, tail recursion is the most commonly used way (and sometimes the only way available) of implementing iteration. The language specification of Scheme requires that tail calls are to be optimized so as not to grow the stack. Tail calls can be made explicitly in
Perl, with a variant of the "goto" statement that takes a function name: goto &NAME; However, for language implementations which store function arguments and local variables on a
call stack (which is the default implementation for many languages, at least on systems with a
hardware stack, such as the
x86), implementing generalized tail-call optimization (including mutual tail recursion) presents an issue: if the size of the callee's activation record is different from that of the caller, then additional cleanup or resizing of the stack frame may be required. For these cases, optimizing tail recursion remains trivial, but general tail-call optimization may be harder to implement efficiently. For example, in the
Java virtual machine (JVM), tail-recursive calls can be eliminated (as this reuses the existing call stack), but general tail calls cannot be (as this changes the call stack). As a result, functional languages such as
Scala that target the JVM can efficiently implement direct tail recursion, but not mutual tail recursion. The
GCC,
LLVM/Clang, and
Intel compiler suites perform tail-call optimization for
C and other languages at higher optimization levels or when the -foptimize-sibling-calls option is passed. Though the given language syntax may not explicitly support it, the compiler can make this optimization whenever it can determine that the return types for the caller and callee are equivalent, and that the argument types passed to both function are either the same, or require the same amount of total storage space on the call stack. Various implementation methods are available.
In assembly Tail calls are often optimized by
interpreters and
compilers of
functional programming and
logic programming languages to more efficient forms of
iteration. For example,
Scheme programmers commonly express
while loops as calls to procedures in tail position and rely on the Scheme compiler or interpreter to substitute the tail calls with more efficient
jump instructions. For compilers generating assembly directly, tail-call elimination is easy: it suffices to replace a call opcode with a jump one, after fixing parameters on the stack. From a compiler's perspective, the first example above is initially translated into pseudo-
assembly language (in fact, this is valid
x86 assembly): foo: call B call A ret Tail-call elimination replaces the last two lines with a single jump instruction: foo: call B jmp A After subroutine A completes, it will then return directly to the return address of foo, omitting the unnecessary ret statement. Typically, the subroutines being called need to be supplied with
parameters. The generated code thus needs to make sure that the
call frame for A is properly set up before jumping to the tail-called subroutine. For instance, on
platforms where the
call stack does not just contain the
return address, but also the parameters for the subroutine, the compiler may need to emit instructions to adjust the call stack. On such a platform, for the code:
function foo(data1, data2) B(data1)
return A(data2) (where data1 and data2 are parameters) a compiler might translate that as: foo: mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register. push reg ; put data1 on stack where B expects it call B ; B uses data1 pop ; remove data1 from stack mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register. push reg ; put data2 on stack where A expects it call A ; A uses data2 pop ; remove data2 from stack. ret A tail-call optimizer could then change the code to: foo: mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register. push reg ; put data1 on stack where B expects it call B ; B uses data1 pop ; remove data1 from stack mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register. mov [sp+data1],reg ; put data2 where A expects it jmp A ; A uses data2 and returns immediately to caller. This code is more efficient both in terms of execution speed and use of stack space.
Through trampolining Since many
Scheme compilers use
C as an intermediate target code, the tail recursion must be encoded in C without growing the stack, even if the C compiler does not optimize tail calls. Many implementations achieve this by using a device known as a
trampoline, a piece of code that repeatedly calls functions. All functions are entered via the trampoline. When a function has to tail-call another, instead of calling it directly and then returning the result, it returns the address of the function to be called and the call parameters back to the trampoline (from which it was called itself), and the trampoline takes care of calling this function next with the specified parameters. This ensures that the C stack does not grow and iteration can continue indefinitely. It is possible to implement trampolines using
higher-order functions in languages that support them, such as
Groovy,
Visual Basic .NET and
C#. Using a trampoline for all function calls is rather more expensive than the normal C function call, so at least one Scheme compiler,
Chicken, uses a technique first described by
Henry Baker from an unpublished suggestion by
Andrew Appel, in which normal C calls are used but the stack size is checked before every call. When the stack reaches its maximum permitted size, objects on the stack are
garbage-collected using the
Cheney algorithm by moving all live data into a separate heap. Following this, the stack is unwound ("popped") and the program resumes from the state saved just before the garbage collection. Baker says "Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building." The garbage collection ensures that mutual tail recursion can continue indefinitely. However, this approach requires that no C function call ever returns, since there is no guarantee that its caller's stack frame still exists; therefore, it involves a much more dramatic internal rewriting of the program code:
continuation-passing style. ==Relation to the while statement==