First-class function

In this section, we compare how particular programming idioms are handled in a functional language with first-class functions (Haskell) compared to an imperative language where functions are second-class citizens (C). Higher-order functions: passing functions as arguments In languages where functions are first-class citizens, functions can be passed as arguments to other functions in the same way as other values (a function taking another function as argument is called a higher-order function). In the language Haskell: map :: (a -> b) -> [a] -> [b] map f [] = [] map f (x:xs) = f x : map f xs Languages where functions are not first-class often still allow one to write higher-order functions through the use of features such as function pointers or delegates. In the language C: void map(int (*f)(int), int x[], size_t n) { for (int i = 0; i There are a number of differences between the two approaches that are not directly related to the support of first-class functions. The Haskell sample operates on lists, while the C sample operates on arrays. Both are the most natural compound data structures in the respective languages and making the C sample operate on linked lists would have made it unnecessarily complex. This also accounts for the fact that the C function needs an additional parameter (giving the size of the array.) The C function updates the array in-place, returning no value, whereas in Haskell data structures are persistent (a new list is returned while the old is left intact.) The Haskell sample uses recursion to traverse the list, while the C sample uses iteration. Again, this is the most natural way to express this function in both languages, but the Haskell sample could easily have been expressed in terms of a fold and the C sample in terms of recursion. Finally, the Haskell function has a polymorphic type, as this is not supported by C we have fixed all type variables to the type constant int. Anonymous and nested functions In languages supporting anonymous functions, we can pass such a function as an argument to a higher-order function: main = map (\x -> 3 * x + 1) [1, 2, 3, 4, 5] In a language which does not support anonymous functions, we have to bind it to a name instead: int f(int x) { return 3 * x + 1; } int main() { int list[] = {1, 2, 3, 4, 5}; map(f, list, 5); } Non-local variables and closures Once we have anonymous or nested functions, it becomes natural for them to refer to variables outside of their body (called non-local variables): main = let a = 3 b = 1 in map (\x -> a * x + b) [1, 2, 3, 4, 5] If functions are represented with bare function pointers, we can not know anymore how the value that is outside of the function's body should be passed to it, and because of that a closure needs to be built manually. Therefore we can not speak of "first-class" functions here. typedef struct { int (*f)(int, int, int); int a; int b; } Closure; void map(Closure* closure, int x[], size_t n) { for (int i = 0; i f)(closure->a, closure->b, x[i]); } } int f(int a, int b, int x) { return a * x + b; } void main() { int l[] = {1, 2, 3, 4, 5}; int a = 3; int b = 1; Closure closure = {f, a, b}; map(&closure, l, 5); } Also note that the map is now specialized to functions referring to two ints outside of their environment. This can be set up more generally, but requires more boilerplate code. If f would have been a nested function we would still have run into the same problem and this is the reason they are not supported in C. Higher-order functions: returning functions as results When returning a function, we are in fact returning its closure. In the C example any local variables captured by the closure will go out of scope once we return from the function that builds the closure. Forcing the closure at a later point will result in undefined behaviour, possibly corrupting the stack. This is known as the upwards funarg problem. Assigning functions to variables Assigning functions to variables and storing them inside (global) data structures potentially suffers from the same difficulties as returning functions. f :: Integer] -> [Integer f = let a = 3 b = 1 in [map (\x -> a * x + b), map (\x -> b * x + a)] Equality of functions As one can test most literals and values for equality, it is natural to ask whether a programming language can support testing functions for equality. On further inspection, this question appears more difficult and one has to distinguish between several types of function equality: ; Extensional equality: Two functions f and g are considered extensionally equal if they agree on their outputs for all inputs (∀x. f(x) = g(x)). Under this definition of equality, for example, any two implementations of a stable sorting algorithm, such as insertion sort and merge sort, would be considered equal. Deciding on extensional equality is undecidable in general and even for functions with finite domains often intractable. For this reason no programming language implements function equality as extensional equality. ; Intensional equality: Under intensional equality, two functions f and g are considered equal if they have the same "internal structure". This kind of equality could be implemented in interpreted languages by comparing the source code of the function bodies (such as in Interpreted Lisp 1.5) or the object code in compiled languages. Intensional equality implies extensional equality (assuming the functions are deterministic and have no hidden inputs, such as the program counter or a mutable global variable.) ; Reference equality: Given the impracticality of implementing extensional and intensional equality, most languages supporting testing functions for equality use reference equality. All functions or closures are assigned a unique identifier (usually the address of the function body or the closure) and equality is decided based on equality of the identifier. Two separately defined, but otherwise identical function definitions will be considered unequal. Referential equality implies intensional and extensional equality. Referential equality breaks referential transparency and is therefore not supported in pure languages, such as Haskell. == Type theory ==