When one function A calls another function B during a typical program's execution, the local state of B (including
parameters and
local variables) must be stored somewhere. In a
leaf function, this can be stored in registers, but if B calls another function C which itself uses those registers, they must be preserved somewhere in memory so B can proceed after C returns. In most compiled programs, this local state is stored on the
call stack in a
data structure called a
stack frame or
activation record. This stack frame is pushed, or allocated, when A calls B, and popped, or deallocated, when B returns to A. The upwards funarg problem arises when the return value of B is itself a function F which refers to B's state (say, some local variable V) after B has returned. A may later call F, which still needs V to be allocated. Therefore, the stack frame containing B's state variables must not be deallocated when B returns, violating the
stack-based function call paradigm. One solution to the upwards funarg problem is to simply allocate all activation records from the
heap instead of the stack and rely on some form of
garbage collection or
reference counting to deallocate them when they are no longer needed. Managing activation records on the heap has historically been perceived as less efficient than stack allocation (a perception that later work has partially contradicted) and as imposing significant implementation complexity. Most functions in typical programs (less so for programs in
functional programming languages) do not create upwards funargs, which adds to concerns about the potential overhead of allocating every record on the heap. Furthermore, this approach is difficult to apply in languages that do not support garbage collection. Some efficiency-minded compilers employ a hybrid approach in which the activation records for a function are allocated from the stack if the
compiler is able to deduce, through
static program analysis, that the function creates no upwards funargs. Otherwise, the activation records are allocated from the heap. Another solution is to simply copy the value of the variables into the closure at the time the closure is created. This will cause a different behavior in the case of mutable variables, because the state will no longer be shared between closures. But if it is known that the variables are constant, then this approach will be equivalent. The
ML languages take this approach, since variables in those languages are bound to values; a binding, once made, cannot be changed.
Java also takes this approach with respect to anonymous classes (and lambdas since Java 8), in that it only allows one to refer to variables in the enclosing scope that are effectively final (i.e. constant). Some languages allow the programmer to explicitly choose between the two behaviors.
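Java's effectively-final rule can be seen in a short sketch (the makeAdder helper is illustrative, not from the article): the returned lambda is an upwards funarg, and Java resolves the problem by copying the value of the captured local into the closure object.

```java
import java.util.function.IntUnaryOperator;

public class Capture {
    // The returned lambda refers to the parameter n after makeAdder has
    // returned. Java permits this because n is effectively final: its value
    // is copied into the closure object, so no stack frame must outlive
    // the call.
    static IntUnaryOperator makeAdder(int n) {
        // n++;  // uncommenting this would make n no longer effectively
        //       // final, and the lambda below would fail to compile
        return x -> x + n;
    }

    public static void main(String[] args) {
        IntUnaryOperator addFive = makeAdder(5);   // makeAdder has returned
        System.out.println(addFive.applyAsInt(2)); // prints 7
    }
}
```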
PHP 5.3's anonymous functions require one to specify which variables to include in the closure with the use () clause; if a variable is listed by reference (with a & prefix), the closure holds a reference to the original variable; otherwise, it captures the variable's value. In Apple's
Blocks anonymous functions, local variables are captured by value by default; to share state between closures, or between a closure and the enclosing scope, a variable must be declared with the __block modifier, in which case it is allocated on the heap.
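The two capture behaviors described above can be contrasted in Java (a sketch; the heap-allocated int array stands in for a __block-style shared variable, and all names are illustrative):

```java
import java.util.function.IntSupplier;

public class CaptureModes {
    // By-value capture: the closure snapshots x at creation time, so
    // later changes in the enclosing scope cannot affect it.
    static IntSupplier captureValue(int x) {
        return () -> x;  // x's value is copied into the closure
    }

    // Shared state: a heap-allocated cell (loosely analogous to a
    // __block variable in Apple's Blocks) lets the closure and the
    // enclosing code observe each other's updates.
    static int[] cell = {1};
    static IntSupplier shared = () -> cell[0];

    public static void main(String[] args) {
        IntSupplier snap = captureValue(1);
        System.out.println(snap.getAsInt());   // prints 1
        cell[0] = 2;                           // update the shared cell
        System.out.println(shared.getAsInt()); // prints 2
    }
}
```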
==Example==
The following
Haskell-like
pseudocode defines
function composition:

 compose f g = λx → f (g x)

λ is the operator for constructing a new function, which in this case has one argument, x, and returns the result of first applying g to x and then applying f to that result. This λ function carries the functions f and g (or pointers to them) as internal state. The problem arises in this case if the compose function allocates the parameter variables f and g on the stack. When compose returns, the stack frame containing f and g is discarded, and when the internal function λx later attempts to access g, it reads a discarded memory area.

==Downwards funarg problem==