Early speculations
In 1861, historical linguist Max Müller published a list of speculative theories concerning the origins of spoken language:
• Bow-wow. The bow-wow, or cuckoo, theory, which Müller attributed to the German philosopher Johann Gottfried Herder, saw early words as imitations of the cries of beasts and birds.
• Pooh-pooh. The pooh-pooh theory saw the first words as emotional interjections and exclamations triggered by pain, pleasure, surprise, etc.
• Ding-dong. Müller suggested what he called the ding-dong theory, which states that all things have a vibrating natural resonance, echoed somehow by humans in their earliest words.
• Yo-he-ho. The yo-he-ho theory claims that language emerged from collective rhythmic labor; that is, the attempt to synchronize muscular efforts resulting in sounds such as heave alternating with sounds such as ho.
• Ta-ta. The ta-ta theory did not feature in Max Müller's list, having been proposed in 1930 by Sir Richard Paget. According to the ta-ta theory, humans made the earliest words by tongue movements that mimicked manual gestures, rendering them audible.
Most scholars today consider all such theories not so much wrong (they occasionally offer peripheral insights) as naïve and irrelevant. The problem with these theories is that they rest on the assumption that once early humans had discovered a workable mechanism for linking sounds with meanings, language would automatically have evolved.
Much earlier, medieval Muslim scholars developed theories on the origin of language. Their theories were of five general types:
• Naturalist: There is a natural relationship between expressions and the things they signify. Language thus emerged from a natural human inclination to imitate the sounds of nature.
• Conventionalist: Language is a social convention. The names of things are arbitrary inventions of humans.
• Revelationist: Language was gifted to humans by God, and it was thus God, and not humans, who named everything.
• Revelationist-Conventionalist: God revealed to humans a core base of language (enabling humans to communicate with each other) and then humans invented the rest of language.
• Non-Committal: The view that conventionalist and revelationist theories are equally plausible.
Problems of reliability and deception
From the perspective of signalling theory, the main obstacle to the evolution of language-like communication in nature is not a mechanistic one. Rather, it is the fact that symbols (arbitrary associations of sounds or other perceptible forms with corresponding meanings) are unreliable and may well be false. The problem of reliability was not recognized at all by Darwin, Müller or the other early evolutionary theorists. Animal vocal signals are, for the most part, intrinsically reliable. When a cat purrs, the signal constitutes direct evidence of the animal's contented state. The signal is trusted, not because the cat is inclined to be honest, but because it just cannot fake that sound. Primate vocal calls may be slightly more manipulable, but they remain reliable for the same reason: they are hard to fake. Primate social intelligence is "
Machiavellian"; that is,
self-serving and unconstrained by moral scruples. Monkeys, apes and particularly humans often attempt to deceive each other, while at the same time remaining constantly on guard against falling victim to deception themselves. Paradoxically, it is theorized that primates' resistance to deception is what blocks the evolution of their signalling systems along language-like lines. Language is ruled out because the best way to guard against being deceived is to ignore all signals except those that are instantly verifiable. Words automatically fail this test. A peculiar feature of language is
displaced reference, which means reference to topics outside the currently perceptible situation. This property prevents utterances from being corroborated in the immediate "here" and "now". For this reason, language presupposes relatively high levels of mutual trust in order to become established over time as an
evolutionarily stable strategy. This stability is born of a longstanding mutual trust and is what grants language its authority. A theory of the origins of language must therefore explain why humans could begin
trusting cheap signals in ways that other animals apparently cannot.
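The trust requirement described above can be illustrated with a toy calculation. The sketch below is not taken from the signalling-theory literature; it is a minimal illustration under stated assumptions: a listener who acts on an unverifiable signal gains a hypothetical `benefit` when the signal is honest and loses a hypothetical `cost` when it is false, and `p_honest` is the assumed share of honest signals. All names and numbers are illustrative.

```python
def expected_gain_from_trusting(p_honest, benefit, cost):
    """Expected payoff to a listener who acts on an unverifiable signal.

    p_honest, benefit and cost are illustrative assumptions, not
    quantities defined in the text above.
    """
    return p_honest * benefit - (1.0 - p_honest) * cost


def trust_threshold(benefit, cost):
    """Honesty rate at which acting on the signal exactly breaks even."""
    return cost / (benefit + cost)


def listener_should_trust(p_honest, benefit, cost):
    """True when trusting beats ignoring the signal (payoff of ignoring = 0)."""
    return expected_gain_from_trusting(p_honest, benefit, cost) > 0


if __name__ == "__main__":
    # Being deceived is assumed to cost three times what honest information gains.
    print(trust_threshold(benefit=1.0, cost=3.0))
    print(listener_should_trust(0.5, benefit=1.0, cost=3.0))
    print(listener_should_trust(0.9, benefit=1.0, cost=3.0))
```

Under these assumptions, ignoring cheap signals is the better policy until the share of honest signals is high, which is one way to read the claim that words require relatively high levels of mutual trust before they can become established.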
The "mother tongues" hypothesis
The "mother tongues" hypothesis was proposed in 2004 as a possible solution to this problem.
W. Tecumseh Fitch suggested that the Darwinian principle of "
kin selection" (the convergence of genetic interests between relatives) might be part of the answer. Fitch suggests that languages were originally "mother tongues". If language evolved initially for communication between mothers and their own biological offspring, extending later to include adult relatives as well, the interests of speakers and listeners would have tended to coincide. Fitch argues that shared genetic interests would have led to sufficient trust and cooperation for intrinsically unreliable signals (words) to become accepted as trustworthy and so begin evolving for the first time. Critics of this theory point out that kin selection is not unique to humans. So even if one accepts Fitch's initial premises, the extension of the posited "mother tongue" networks from close relatives to more distant relatives remains unexplained.
Ulbæk invokes another Darwinian principle, "reciprocal altruism", to explain the unusually high levels of intentional honesty necessary for language to evolve. "Reciprocal altruism" can be expressed as the principle that "if you scratch my back, I'll scratch yours". In linguistic terms, it would mean that "if you speak truthfully to me, I'll speak truthfully to you". Ordinary Darwinian reciprocal altruism, Ulbæk points out, is a relationship established between frequently interacting individuals. For language to prevail across an entire community, however, the necessary reciprocity would have needed to be enforced universally instead of being left to individual choice. Ulbæk concludes that for language to evolve, society as a whole must have been subject to moral regulation. Critics point out that this theory fails to explain when, how, why or by whom "obligatory reciprocal altruism" could possibly have been enforced.
The gossip and grooming hypothesis
According to Robin Dunbar in his book Grooming, Gossip and the Evolution of Language, gossip does for group-living humans what manual grooming does for other primates: it allows individuals to service their relationships and so maintain their alliances on the basis of the principle "if you scratch my back, I'll scratch yours". Dunbar argues that as humans began living in increasingly larger social groups, the task of manually grooming all one's friends and acquaintances became so time-consuming as to be unaffordable. In response to this problem, humans developed "a cheap and ultra-efficient form of grooming": vocal grooming. To keep allies happy, one now needs only to "groom" them with low-cost vocal sounds, servicing multiple allies simultaneously while keeping both hands free for other tasks. Vocal grooming then evolved gradually into vocal language, initially in the form of "gossip". Critics of this theory point out that the efficiency of "vocal grooming" (the fact that words are so cheap) would have undermined its capacity to signal commitment of the kind conveyed by time-consuming and costly manual grooming. A further criticism is that the theory does nothing to explain the crucial transition from vocal grooming (the production of pleasing but meaningless sounds) to the cognitive complexities of syntactical speech.
Ritual/speech coevolution
The ritual/speech coevolution theory was originally proposed by social anthropologist
Roy Rappaport before being elaborated by anthropologists such as Chris Knight, Jerome Lewis, Nick Enfield, Camilla Power and Ian Watts. Cognitive scientist and robotics engineer
Luc Steels is another prominent supporter of this general approach, as is biological anthropologist and neuroscientist
Terrence Deacon. A more recent champion of the approach is the Chomskyan specialist in
linguistic syntax, Cedric Boeckx. These scholars argue that there can be no such thing as a "theory of the origins of language". This is because language is not a separate adaptation, but an internal aspect of something much wider—namely, the entire domain known to anthropologists as human
symbolic culture. Attempts to explain language independently of this wider context have failed, say these scientists, because they are addressing a problem with no solution. Language would not work outside its necessary environment of confidence-building social mechanisms and institutions. For example, it would not work for a nonhuman ape communicating with others of its kind in the wild. Not even the cleverest nonhuman ape could make language work under such conditions. Advocates of this school of thought point out that words are cheap. Should an especially clever nonhuman ape, or even a group of articulate nonhuman apes, try to use words in the wild, they would carry no conviction. The primate vocalizations that do carry conviction—those they actually use—are unlike words, in that they are emotionally expressive, intrinsically meaningful, and reliable because they are relatively costly and hard to fake. Oral and gestural languages consist of pattern-making whose cost is essentially zero. As pure social conventions, signals of this kind cannot evolve in a Darwinian social world—they are a theoretical impossibility. Being intrinsically unreliable, language works only if one can build up a reputation for trustworthiness within a certain kind of society—namely, one where symbolic cultural facts (sometimes called "institutional facts") can be established and maintained through collective social endorsement. In any hunter-gatherer society, the basic mechanism for establishing trust in symbolic cultural facts is collective
ritual. Therefore, the task facing researchers into the origins of language is more multidisciplinary than is usually supposed. It involves addressing the evolutionary emergence of human ritual, kinship, religion and symbolic culture taken as a whole, with language an important but subsidiary component.
In a 2023 article, Cedric Boeckx discusses the symbolic status of linguistic utterances. In philosophical terms, they are "institutional facts": fictions that are granted factual status within human social institutions. From this standpoint, according to Boeckx, linguistic utterances are symbolic to the extent that they are patent falsehoods serving as guides to communicative intentions. "They are communicatively useful untruths, as it were." Chomsky's own theory is that language emerged in an instant and in perfect form, prompting his critics, in turn, to retort that only something that does not exist (a theoretical construct or convenient scientific fiction) could possibly emerge in such a miraculous way.
The purpose of the test used in the study by Uomini et al. was to focus on the planning aspect of Acheulean tool making and on cued word generation in language (an example of cued word generation would be trying to list all words beginning with a given letter). The development of language alongside tool use has been theorized by multiple individuals; however, until recently, there has been little empirical data to support these hypotheses. In the results of that study, evidence for the use of the same brain areas was found when looking at cued word generation and Acheulean tool use. The relationship between tool use and language production lies in working and planning memory respectively, and was found to be similar across a variety of participants, providing further evidence that these areas of the brain are shared.
In later theory, especially in functional linguistics, the primacy of communication is emphasised over psychological needs. The exact way language evolved is, however, not considered vital to the study of languages.
Structural linguist Ferdinand de Saussure abandoned
evolutionary linguistics after having come to the firm conclusion that it would not be able to provide any further revolutionary insight after the completion of the major works in
historical linguistics by the end of the 19th century. Saussure was particularly sceptical of the attempts of
August Schleicher and other Darwinian linguists to access prehistorical languages through series of reconstructions of
proto-languages. Saussure's solution to the problem of language evolution involves dividing
theoretical linguistics in two. Evolutionary and historical linguistics are renamed as
diachronic linguistics. It is the study of
language change, but it has only limited explanatory power due to the inadequacy of all of the reliable research material that could ever be made available.
Synchronic linguistics, in contrast, aims to widen scientists' understanding of language through a study of a given contemporary or historical language stage as a system in its own right. Although Saussure put much focus on diachronic linguistics, later structuralists who equated structuralism with synchronic analysis were sometimes criticised for ahistoricism. According to
structural anthropologist Claude Lévi-Strauss, language and meaning—in opposition to "knowledge, which develops slowly and progressively"—must have appeared in an instant. Structuralism, as first introduced to
sociology by
Émile Durkheim, is nonetheless a type of humanistic evolutionary theory which explains diversification as necessitated by growing complexity. There was a shift of focus to functional explanation after Saussure's death. Functional structuralists including the
Prague Circle linguists and
André Martinet explained the growth and maintenance of structures as being necessitated by their functions. Thus, in this theory, language appeared rather suddenly within the history of human evolution. Chomsky, writing with computational linguist and computer scientist Robert C. Berwick, suggests that this scenario is completely compatible with modern biology. They note that "none of the recent accounts of human language evolution seem to have completely grasped the shift from conventional Darwinism to its fully
stochastic modern version—specifically, that there are stochastic effects not only due to sampling like directionless drift, but also due to directed stochastic variation in fitness, migration, and heritability—indeed, all the "forces" that affect individual or gene frequencies... All this can affect evolutionary outcomes—outcomes that as far as we can make out are not brought out in recent books on the evolution of language, yet would arise immediately in the case of any new genetic or individual innovation, precisely the kind of scenario likely to be in play when talking about language's emergence." Citing evolutionary geneticist
Svante Pääbo, they concur that a substantial difference must have occurred to differentiate
Homo sapiens from
Neanderthals to "prompt the relentless spread of our species, who had never crossed open water, up and out of Africa and then on across the entire planet in just a few tens of thousands of years.... What we do not see is any kind of 'gradualism' in new tool technologies or innovations like fire, shelters, or figurative art." Berwick and Chomsky therefore suggest language emerged approximately between 200,000 years ago and 60,000 years ago (between the appearance of the first anatomically modern humans in southern Africa and the last exodus from Africa respectively). "That leaves us with about 130,000 years, or approximately 5,000–6,000 generations of time for evolutionary change. This is not 'overnight in one generation' as some have (incorrectly) inferred—but neither is it on the scale of geological eons. It's time enough—within the ballpark for what Nilsson and Pelger (1994) estimated as the time required for the
full evolution of a vertebrate eye from a single cell, even without the invocation of any 'evo-devo' effects." The single-mutation theory of language evolution has been directly questioned on different grounds. A formal analysis of the probability of such a mutation taking place and going to fixation in the species has concluded that such a scenario is unlikely, with multiple mutations with more moderate fitness effects being more probable. Another criticism questions the logic of the argument for a single mutation and argues that from the formal simplicity of
Merge, the capacity Berwick and Chomsky deem the core property of human language that emerged suddenly, one cannot derive the (number of) evolutionary steps that led to it.
The Romulus and Remus hypothesis
The Romulus and Remus hypothesis, proposed by neuroscientist
Andrey Vyshedskiy, seeks to address the question as to why the modern speech apparatus originated over 500,000 years before the earliest signs of modern human imagination. This hypothesis proposes that there were two phases that led to modern recursive language. The phenomenon of
recursion occurs across multiple linguistic domains, arguably most prominently in
syntax and
morphology. By nesting a structure such as a sentence or a word within a structure of the same kind, recursion enables the generation of potentially (countably) infinite new variations of that structure. For example, the base sentence [Peter likes apples.] can be nested in irrealis clauses to produce [Mary said [Peter likes apples.]], [Paul believed [Mary said [Peter likes apples.]]] and so forth.
The first phase includes the slow development of non-recursive language with a large vocabulary along with the modern speech apparatus, which includes changes to the hyoid bone, increased voluntary control of the muscles of the diaphragm, and the evolution of the FOXP2 gene, as well as other changes by 600,000 years ago. Then, the second phase was a rapid
Chomskian single step, consisting of three distinct events that happened in quick succession around 70,000 years ago and allowed the shift from non-recursive to recursive language in early hominins.
• A genetic mutation that slowed down the prefrontal synthesis (PFS) critical period of at least two children that lived together.
• This allowed these children to create recursive elements of language such as spatial prepositions.
• Then this merged with their parents' non-recursive language to create recursive language.
It is not enough for children to have a modern prefrontal cortex (PFC) to allow the development of PFS; the children must also be mentally stimulated and have recursive elements already in their language to acquire PFS. Since their parents would not have invented these elements yet, the children would have had to do it themselves, which is a common occurrence among young children that live together, in a process called cryptophasia. This means that delayed PFC development would have allowed more time to acquire PFS and develop recursive elements. Delayed PFC development also comes with negative consequences, such as a longer period of reliance on one's parents to survive and lower survival rates. For modern language to have occurred, PFC delay had to have an immense survival benefit in later life, such as PFS ability. This suggests that the mutation that caused PFC delay and the development of recursive language and PFS occurred simultaneously, which lines up with evidence of a genetic bottleneck around 70,000 years ago. This could have been the result of a few individuals who developed PFS and recursive language which gave them significant competitive advantage over all other humans at the time.
Research has found strong support for the idea that
oral communication and sign language depend on similar neural structures. Patients who used sign language, and who suffered from a left-hemisphere lesion, showed the same disorders with their sign language as vocal patients did with their oral language. Other researchers found that the same left-hemisphere brain regions were active during sign language as during the use of vocal or written language. Primate gesture is at least partially genetic: different nonhuman apes will perform gestures characteristic of their species, even if they have never seen another ape perform that gesture. For example, gorillas beat their breasts. This shows that gestures are an intrinsic and important part of primate communication, which supports the idea that language evolved from gesture. Further evidence suggests that gesture and language are linked. In humans, manually gesturing has an effect on concurrent vocalizations, thus creating certain natural vocal associations of manual efforts. Chimpanzees move their mouths when performing fine motor tasks. These mechanisms may have played an evolutionary role in enabling the development of intentional vocal communication as a supplement to gestural communication. Voice modulation could have been prompted by preexisting manual actions. This addresses the idea that gestures quickly change in humans from a sole means of communication (from a very young age) to a supplemental and predictive behavior that is used despite the ability to communicate verbally. This too serves as a parallel to the idea that gestures developed first and language subsequently built upon them.
Two possible scenarios have been proposed for the development of language, one of which supports the gestural theory:
• Language developed from the calls of human ancestors.
• Language was derived from gesture.
The first perspective, that language evolved from the calls of human ancestors, seems logical because both humans and animals make sounds or cries. One evolutionary reason to refute this is that, anatomically, the centre that controls calls in monkeys and other animals is located in a completely different part of the brain than in humans. In monkeys, this centre is located in the depths of the brain related to emotions. In the human system, it is located in an area unrelated to emotion. Humans can communicate simply to communicate, without emotions. So, anatomically, this scenario does not work. This favours the second scenario, that language was derived from gesture (humans communicated by gesture first and sound was attached later).
The important question for gestural theories is why there was a shift to vocalization. Various explanations have been proposed:
• Human ancestors started to use more and more tools, meaning that their hands were occupied and could no longer be used for gesturing.
• Manual gesturing requires that speakers and listeners be visible to one another. In many situations, they might need to communicate, even without visual contact, for example after nightfall or when foliage obstructs visibility.
• A composite hypothesis holds that early language took the form of part gestural and part vocal
mimesis (imitative 'song-and-dance'), combining modalities because all signals (like those of nonhuman apes and monkeys) still needed to be costly in order to be intrinsically convincing. In that event, each multi-media display would have needed not just to disambiguate an intended meaning but also to inspire confidence in the signal's reliability. The suggestion is that only once community-wide contractual understandings had come into force could trust in communicative intentions be automatically assumed, at last allowing
Homo sapiens to shift to a more efficient default format. Since vocal distinctive features (sound contrasts) are ideal for this purpose, it was only at this point—when intrinsically persuasive body-language was no longer required to convey each message—that the decisive shift from manual gesture to the current primary reliance on
spoken language occurred. A comparable hypothesis states that in 'articulate' language, gesture and vocalisation are intrinsically linked, as language evolved from equally intrinsically linked dance and song. There are also a great number of
sign languages still in existence, commonly associated with Deaf communities. These sign languages are equal in complexity, sophistication, and expressive power, to any oral language. The cognitive functions are similar and the parts of the brain used are similar. The main difference is that the "phonemes" are produced on the outside of the body, articulated with hands, body, and facial expression, rather than inside the body articulated with tongue, teeth, lips, and breathing. (Compare the
motor theory of speech perception.) Critics of gestural theory note that it is difficult to name serious reasons why the initial pitch-based vocal communication (which is present in primates) would be abandoned in favor of the much less effective non-vocal, gestural communication. However,
Michael Corballis has pointed out that primate vocal communication (such as alarm calls) is thought not to be under conscious control, unlike hand movement, and is therefore not a credible precursor to human language; primate vocalization is instead homologous to, and continued in, involuntary reflexes (connected with basic human emotions) such as screams or laughter (that these can be faked does not disprove the existence of genuine involuntary responses to fear or surprise).
Tool-use associated sound in the evolution of language
Proponents of the motor theory of language evolution have primarily focused on the visual domain and communication through observation of movements. The
Tool-use sound hypothesis suggests that the production and perception of sound also contributed substantially, particularly
incidental sound of locomotion (
ISOL) and
tool-use sound (
TUS). Human bipedalism resulted in rhythmic and more predictable
ISOL. That may have stimulated the evolution of musical abilities, auditory working memory, and abilities to produce complex vocalizations, and to mimic natural sounds. Since the human brain proficiently extracts information about objects and events from the sounds they produce,
TUS, and mimicry of
TUS, might have achieved an iconic function. The prevalence of sound symbolism in many extant languages supports this idea. Self-produced
TUS activates multimodal brain processing (
motor neurons, hearing,
proprioception, touch, vision), and
TUS stimulates primate audiovisual mirror neurons, which is likely to stimulate the development of association chains. Tool use and auditory gestures involve motor-processing of the forelimbs, which is associated with the evolution of vertebrate vocal communication. The production, perception, and mimicry of
TUS may have resulted in a limited number of vocalizations or protowords that were associated with tool use. This hypothesis is supported by some
cytoarchitectonic homologies between monkey premotor area F5 and human Broca's area. Rates of vocabulary expansion are linked to the ability of children to vocally mirror non-words and so to acquire new word pronunciations. Such speech repetition occurs automatically, quickly and separately in the brain from
speech perception. Moreover, such vocal imitation can occur without comprehension such as in
speech shadowing and
echolalia. Further evidence for this link comes from a 2010 study in which the brain activity of two participants was measured using fMRI while they gestured words to each other in a game of
charades—a modality that some have suggested might represent the evolutionary precursor of human language. Analysis of the data using
Granger Causality revealed that the mirror-neuron system of the observer indeed reflects the pattern of activity in the motor system of the sender, supporting the idea that the motor concept associated with the words is indeed transmitted from one brain to another using the mirror system. Not all linguists agree with the above arguments, however. In particular, supporters of Noam Chomsky argue against the possibility that the mirror neuron system can play any role in the hierarchical recursive structures essential to syntax.
Putting-down-the-baby theory
According to
Dean Falk's "putting-down-the-baby" theory, vocal interactions between early hominid mothers and infants began a sequence of events that led, eventually, to human ancestors' earliest words. The basic idea is that evolving human mothers, unlike their counterparts in other primates, could not move around and forage with their infants clinging onto their backs. Loss of fur in the human case left infants with no means of clinging on. Frequently, therefore, mothers had to put their babies down. As a result, these babies needed to be reassured that they were not being abandoned. Mothers responded by developing 'motherese'—an infant-directed communicative system embracing facial expressions, body language, touching, patting, caressing, laughter, tickling, and emotionally expressive contact calls. The argument is that language developed out of this interaction.
From-where-to-what theory
The "from where to what" model is a language evolution model that is derived primarily from the organization of language processing in the brain into two structures: the auditory dorsal stream and the auditory ventral stream. It hypothesizes seven stages of language evolution (see illustration). Speech originated for the purpose of exchanging contact calls between mothers and their offspring to find one another in the event they became separated (illustration part 1). The contact calls could be modified with intonations in order to express either a higher or lower level of distress (illustration part 2). The use of two types of contact calls enabled the first question-answer conversation. In this scenario, the child would emit a low-level distress call to express a desire to interact with an object, and the mother would respond with either another low-level distress call (to express approval of the interaction) or a high-level distress call (to express disapproval) (illustration part 3). Over time, the improved use of intonations and vocal control led to the invention of unique calls (phonemes) associated with distinct objects (illustration part 4). At first, children learned the calls (phonemes) from their parents by imitating their lip-movements (illustration part 5). Eventually, infants were able to encode into long-term memory all the calls (phonemes). Consequently, mimicry via lip-reading was limited to infancy and older children learned new calls through mimicry without lip-reading (illustration part 6). Once individuals became capable of producing a sequence of calls, this allowed multi-syllabic words, which increased the size of their vocabulary (illustration part 7). The use of words, composed of sequences of syllables, provided the infrastructure for communicating with sequences of words (i.e. sentences). The theory's name is derived from the two auditory streams, which are both found in the brains of humans and other primates. The auditory ventral stream is responsible for sound recognition, and so it is referred to as the auditory
what stream. In primates, the auditory dorsal stream is responsible for
sound localization, and thus it is called the auditory
where stream. Only in humans (in the left hemisphere) is it also responsible for other processes associated with language use and acquisition, such as speech repetition and production, integration of phonemes with their lip movements, perception and production of intonations, phonological
long-term memory (long-term memory storage of the sounds of words), and phonological working memory (the temporary storage of the sounds of words). Some evidence also indicates a role in recognizing others by their voices. The emergence of each of these functions in the auditory dorsal stream represents an intermediate stage in the evolution of language. A contact call origin for human language is consistent with animal studies: as with human language, contact call discrimination in monkeys is lateralised to the left hemisphere. Knocking out language-related genes (such as FOXP2 and SRPX2) in mice also resulted in the pups no longer emitting contact calls when separated from their mothers. The model is also supported by its ability to explain unique human phenomena, such as the use of intonations when converting words into commands and questions, the tendency of infants to mimic vocalizations during the first year of life (and its disappearance later on) and the protruding and visible
human lips, which are not found in other apes. This theory could be considered an elaboration of the putting-down-the-baby theory of language evolution.
Grammaticalization theory
"Grammaticalization" is a continuous historical process in which free-standing words develop into grammatical appendages, while these in turn become ever more specialized and grammatical. An initially "incorrect" usage, in becoming accepted, leads to
unforeseen consequences, triggering knock-on effects and extended sequences of change. Paradoxically, grammar evolves because, in the final analysis, humans care less about grammatical niceties than about making themselves understood. If this is how grammar evolves today, according to this school of thought, similar principles at work can be legitimately inferred among distant human ancestors, when grammar itself was first being established. In order to reconstruct the evolutionary transition from early language to languages with complex grammars, it is necessary to know which hypothetical sequences are plausible and which are not. In order to convey abstract ideas, the first recourse of speakers is to fall back on immediately recognizable concrete imagery, very often deploying
metaphors rooted in shared bodily experience. A familiar example is the use of concrete terms such as "belly" or "back" to convey abstract meanings such as "inside" or "behind". Equally metaphorical is the strategy of representing temporal patterns on the model of spatial ones. For example, in the English sentence "Exams are approaching," the word "approaching" literally describes movement in space, but here it is used to describe an event that will happen soon. From such examples it can be seen why grammaticalization is consistently unidirectional—from concrete to abstract meaning, not the other way around. Creativity drives grammatical change. Creativity and reliability are incompatible demands; for "
Machiavellian" primates as for animals generally, the overriding pressure is to demonstrate reliability. If humans escape these constraints, it is because in their case, listeners are primarily interested in mental states. To focus on mental states is to accept fictions—inhabitants of the imagination—as potentially informative and interesting. An example is metaphor: a metaphor is, literally, a false statement. In
Romeo and Juliet, Romeo declares "Juliet is the sun!" Juliet is a woman, not a ball of plasma in the sky, but human listeners are not (or not usually) pedants insistent on point-by-point factual accuracy. They want to know what the speaker has in mind. Grammaticalization is essentially based on metaphor. To outlaw its use would be to stop grammar from evolving and, by the same token, to exclude all possibility of expressing abstract thought. One criticism of this account is that while grammaticalization theory might explain language change today, it does not satisfactorily address the more difficult challenge of explaining the initial transition from primate-style communication to language as it is known today. Rather, the theory assumes that language already exists. As
Bernd Heine and
Tania Kuteva acknowledge: "Grammaticalisation requires a linguistic system that is used regularly and frequently within a community of speakers and is passed on from one group of speakers to another". When such animals view their reflection (
mirror test), they recognize themselves and exhibit
self-consciousness. Notably, humans evolved in a quite different environment from that of these animals. Human survival became easier with the development of tools, shelter, and fire, thus facilitating further advancement of social interaction, self-expression, and tool-making, for example for hunting and gathering. Increasing brain size allowed advanced provisioning and tools, and the technological advances of the Palaeolithic era, which built upon the earlier evolutionary innovations of bipedalism and hand versatility, allowed the development of human language.
Self-domesticated ape theory
According to a study investigating the song differences between
white-rumped munias and their domesticated counterpart (
Bengalese finch), the wild munias use a highly stereotyped song sequence, whereas the domesticated ones sing a highly unconstrained song. In wild finches, song syntax is subject to female preference—
sexual selection—and remains relatively fixed. However, in the Bengalese finch, natural selection is replaced by breeding, in this case for colorful plumage, and thus, decoupled from selective pressures, stereotyped song syntax is allowed to drift. It is replaced, supposedly within 1000 generations, by a variable and learned sequence. Wild finches, moreover, are thought incapable of learning song sequences from other finches. In the field of
bird vocalization, brains capable of producing only an innate song have very simple neural pathways: the primary forebrain motor centre, called the robust nucleus of
arcopallium, connects to midbrain vocal outputs, which in turn project to brainstem motor nuclei. By contrast, in brains capable of learning songs, the arcopallium receives input from numerous additional forebrain regions, including those involved in learning and social experience. Control over song generation has become less constrained, more distributed, and more flexible.

== Speech and language for communication ==