This work aims at a theoretical reconstruction of the genesis of articulate speech and the main abilities of consciousness, based on evolutionary regularities and socio-psychological mechanisms. The basic concepts of the conceptual apparatus of the reconstruction are presented: “abilities”, “attitudes”, “interiorization”, “interactive ritual”, “niches”, “social orders”, group and individual “concerns”, “communicative concerns”, and “supporting structures”, including “magic wands” with a special potential for flexibility and multifunctionality. In ontogeny, human attitudes and abilities are formed through the mechanisms of interiorization (according to L.S. Vygotsky) and interactive ritual (according to E. Durkheim, E. Goffman, R. Collins). To reconstruct the formation of human traits in anthropogenesis is to represent a regular stepwise transformation of the initial ingredients found in the most ancient hominids, which probably resembled the features of apes. Particular attention is paid to pre-rituals that form the internal and behavioral attitudes of apes, as well as to their abilities for sign communication and for learning new signs. It is shown what sequence of challenges and responses, new concerns and supporting structures led hominids to the formation of joint intentionality (M. Tomasello), self-domestication (D.K. Belyaev, R. Wrangham), normative rituals, the first group rules and internal normative attitudes (C. Lovejoy, D. Dor et al.). This complex phenomenon of normativity, the corresponding social orders and renewed communicative concerns became the main drivers of the development of articulate speech and of the abilities of consciousness closely associated with it.

Keywords: the origin of language, cognitive evolution, articulate speech, consciousness, attitudes, abilities, social orders, normativity, rituals, joint intentionality, communicative concerns

Received: 18.08.2021


For citation: Rozov N.S. Formation of speech and consciousness in anthropogenesis: evolutionary drivers and socio-psychological mechanisms. Kul'turno-istoricheskaya psikhologiya = Cultural-Historical Psychology, 2022. Vol. 18, no. 4, pp. 111–118. DOI: 10.17759/chp.2022180411. (In Russ., abstr. in Engl.)

The Problem of the Language Rubicon and Vygotsky's Ideas on the Genesis of Speech and Thinking

The origin of articulate speech, language, and consciousness is still among the "eternal problems" of philosophical and scientific inquiry, so the continuing growth of research in this field is not surprising (see the review in the book [3]). The most mysterious part remains the "linguistic Rubicon": the breakthrough from the sign system of animal communication to the beginnings of articulate speech, after which the further stages of its development no longer seem so surprising.

L.S. Vygotsky attached great importance to W. Koehler's observation of the gap between chimpanzees' sign communication and their capacity for practical thinking. He noted the same gap in the early stage of child development, where the two lines subsequently connect. L.S. Vygotsky captured this similarity in significant theoretical conclusions:

"Apes display human-like intelligence in some respects (rudiments of tool use) and human-like speech in quite other respects (phonetics of speech, emotional function, and rudiments of the social function of speech). Apes do not exhibit the relationship characteristic of humans: a close connection between thinking and speech. The two are not connected in chimpanzees" [4, p. 757].

On the one hand, this thesis indicates that the potential ingredients for the formation of speech communication were already present in the predecessors of hominids (in the era of their separation from the anthropoids); on the other hand, there was a very high barrier (the "language Rubicon") that took hominids several million years to overcome, while their closest relatives remained without articulate speech.

Conceptual Tools as Key Concepts of Reconstruction

Attitudes are internal structures that regulate an individual's psyche and behavior (L. Lange, W. Thomas, F. Znaniecki, D. Uznadze [12]).

Abilities (including condensed, automated skills) are operational properties of attitudes, acquired through repetition, practice, and training. We are interested in how the abilities to articulate and recognize speech formed during anthropogenesis; the development of consciousness is likewise understood here as a layering of special abilities to focus attention and to operate with various kinds of mental representations. These abilities arise and grow as individuals practice, which means that the individuals already have attitudes: a disposition toward repetition and training. But where do the attitudes themselves come from?

Interiorization –  transformation of external social interactions into "higher psychological functions" (L.S. Vygotsky, A.R. Luria), in other words, into mental structures that control behavior, i.e., into the very attitudes [4; 5].

An interactive ritual is an interaction of two or more individuals in a "here and now" situation, with a common focus of attention, automatic reactions of each participant to the manifestations of the others, synchronized actions and psychophysiological rhythms, and a common emotional arousal of one modality or another. Full-fledged, successful rituals lead to the formation and strengthening of social relations (solidarity with comrades-in-arms, deference to leaders and authorities, estrangement from the rejected), as well as of feelings and beliefs regarding sacred objects.

Hereafter, for simplicity, we will understand "ritual" in this broad sociopsychological (and micro-sociological) sense. Thus, interiorization is the core of ritual as a kind of "social machine for the production of attitudes".

Attitudes are formed and strengthened in rituals through the mechanism of positive reinforcement [10]; the interaction of participants and the complex cognitive and emotional processes in rituals here play the role of operant trials.

A concern is an objective need, or a need analogous to a function in evolutionary biology but without rigid attachment to an organism; rather, it characterizes the dynamic interaction of a living system with its environment. Along with individual concerns, there are group concerns (cf. "social needs"). Besides natural (and later techno-natural) niches there are social niches, or social orders: systems of typical interactions and relations with a particular configuration of positions and corresponding patterns of behavior, with differing access to one another, to benefits and resources, and in the long run with norms, institutions, and practices. The renewal of orders in response to certain challenges and concerns (usually basic ones, in the spheres of sustenance, security, status and prestige, sexuality, and parenthood) confronts individuals and groups with new challenges and concerns (superstructural ones, above all in the spheres of social relations, communication, and technology).

Concerns are initially expressed through challenges-threats and challenges-opportunities; promising responses become behavioral strategies and practices, which lead to the formation of supporting structures [24].

Such structures (adaptations in the broad sense, together with related elements, restrictions, connections, and processes) can be of very different kinds: an organ or a property of an organ (for example, of the brain or the larynx), innate predispositions (including those for mastering speech), and the genetic mechanisms of their formation. An attitude and the corresponding type of behavior, a social practice, a social institution, an element, rule, or construct of language, and an ability of consciousness are likewise regarded as structures that support group and individual concerns.

Some flexible structures with especially great potential for modification and multifunctionality are called magic wands. The skillful hand, the brain, the larynx, tools, rituals, attitudes, signs and meanings, the abilities to use them, and patterns transmitted across generations all fall into this class of structures. Prehistory and history have unfolded such major magic wands as language, consciousness, technology, intergenerational transmission, mythologies, thinking, cognition, and art.

To reconstruct the intermediate stages of evolution, we should also take into account that no structure can emerge "out of nothing": there are always some initial ingredients whose combination and modification constitute the structure's formation.

So, abilities (including the abilities of speech and consciousness) are formed together with attitudes, which, through interiorization and rituals, take shape as structures supporting concerns. The concerns in turn are shaped by the encompassing (techno)natural niches and social orders, the latter being themselves structures that support the concern of survival in these niches. At the beginning of the causal chain lies a renewal, including the "construction", of niches and orders [20].

Children's Learning and Social Control of Speech "Correctness"

Thanks to the works of L.S. Vygotsky and J. Piaget, it became clear what an exceptional role interaction with adults plays in a child's cognitive development. The transfer of patterns from adults to children was and remains the basis of cultural transmission. In this process, patterns are gradually freed from the optional behavioral "additives" peculiar to a particular generation.

Let us note that all components of interactive ritual are present in each interaction between an adult and a child acquiring speech: a common focus of attention (on what the child does and pronounces), the emotion of solidarity, explicit expressions of support from elders, approval of correct pronunciation, instant correction of mistakes, the child's attempts to correct articulation, and again positive reinforcement for success. Bonds of solidarity and the joy of inclusion in rituals of shared success in mutual understanding were and remain a powerful motive for mastering a sign system new to the child and, with it, the whole complex of behavioral norms.

Animal Pre-Consciousness and Peculiarities of Apes' Behavior

There is no doubt that animals with sharp eyes, sensitive ears and noses perceive the world around them and behave quite adequately with respect to different objects, according to their "meanings". Moreover, the most developed animals have the ability to retain the "meanings" of objects that have disappeared from the field of vision. The field of sensory (visual, auditory and olfactory) attention of animals can be rightly considered an evolutionary stage of pre-consciousness, or zero stage of development of consciousness.

Some important features of the psyche and behavior of apes, especially chimpanzees and bonobos, are probably similar to the features of the most ancient hominids that became the ingredients of future sapient (characteristically human) structures. Along with the complex system of sign communication and the emotional pre-rituals discussed below, we should point out the inclination and high ability of apes to imitate actions, their good trainability and teachability, including in mastering new actions and signs, and their developed practical thinking [4; 5].

Pre-Rituals in the Animal World

The most vivid analogs of ritual actions in primates and other higher mammals living in groups are fights between males (demonstrative or bloody, including among apes). Such fights "imprint" into the psyche of the winner and the loser the corresponding structures, or attitudes, which determine their behavior until a new fight with a rival [6].

Thus, hierarchical relations of dominance are established in the group, along with priority access to prey, to females, and to hunting in the territory. When the weakened head of the group loses a fight to a stronger rival, both of them develop mental structures regulating their further behavior, which can be expressed approximately as follows: "now I am the defeated one, nothing here is mine any longer, and I will have to leave" or "now I am the leader, all this territory and all the females are mine, and I will not let anyone in".

Other examples of pre-rituals are the establishing and maintaining of relationships of acceptance, friendship ("solidarity"), sexual partnership, and parenthood through grooming, touching, and the exchange of certain sounds [6]. "Mating games" and "courtship" among mammals and birds play the same role in pair formation, since subsequent partner and parental behavior (albeit mostly instinctive) is directed specifically at the partner fixed in the partnership pre-rituals.

In apes, positive reinforcement is not limited to a treat, as in trained animals, or to momentary access to a female after winning a fight with a rival during the rutting period, as in many herbivores. Important motivators for chimpanzees and bonobos, and very significant for our argument, are feelings and emotions, usually related to the level of social (group) membership and to the attitude of significant others.

The Animal "Language" is a Part of the "Episodic Mind"

Many social animals, including apes, communicate with each other quite effectively by means of differentiated sound signals, the so-called "animal language" or "animal communication system" [13]. Simple meanings of signals (the appearance of a dangerous predator, threat, an agreement to obey, invitation to play, be friends, become a sexual partner, etc.) are conveyed by simple sounds.

The "language" of those monkeys whose way of life requires different group behavior, adequate to different external threats or to different situations within the group, turns out to be rather complex, with many differentiated and easily recognizable sounds. This conclusion has received solid empirical support thanks to a technique that combines tape recordings of different sounds with subsequent video recording of the monkeys' behavior. Such is the "episodic mind" according to M. Donald [17].

It should be assumed that our common ancestor with the apes had approximately the same level of development of the sign system; otherwise we would have to regard the communication of chimpanzees and bonobos as degraded, for which there is no reason.

Ability to Learn New Signs

The laryngeal anatomy of apes strongly limits their ability to produce clearly distinguishable sounds and cohesive combinations of sounds (words). Bonobos have nevertheless been trained to use graphemes (signs denoting objects and actions). The most talented and famous of them, Kanzi, learned several hundred such signs, communicating with experimenters by pressing keys, after which the signs appeared on a screen [21].

Kanzi made ample use of combinations of these "words", i.e., protophrases, displaying the learned graphemes (usually denoting something tasty, as well as "give" and "want to eat") in no particular order. M. Donald rightly observed that such successes result not only, and not so much, from the innate biological abilities of the apes themselves as from a culture of signs, meanings, and attempts at human communication brought in from outside by the experimenters [17, p. 29].

Differences in Attention Structure and Ritual Behavior

A curious peculiarity was revealed by M. Tomasello: apes never point out anything to each other, not even meaningful things (for example, a treat or a toy). They usually draw attention to themselves by their behavior and sometimes show by their movements what they are going to do [25, p. 129].

The inability to point is common among wild animals. Dogs can use their muzzle or bark to show people where something is (e.g., a shot duck). This ability is probably due to the long evolution of dogs in the human cultural space. However, dogs cannot point out an object to other dogs and keep their joint attention on it for a long time.

Specially trained chimpanzees can point out a treat to an experimenter in order to get it, but there is no evidence that they do anything like that in their natural environment. This may seem a small detail. However, it is our ability to jointly focus attention on the same object that underlies full-fledged human interactive rituals, so this "detail" turns out to be quite significant.

The ability of humans to point, to understand pointing gestures, and to respond to them adequately is one of the specific features of our species.

Even before they have mastered speech, at 12–14 months of age, children already respond quite confidently to pointing gestures and are able to point on their own initiative.

It is precisely because animals lack clear pointing gestures that they have no full-fledged joint intentionality, in which attention is focused not on themselves, not on the situation of their interaction, and not on a practical goal (as in hunting), but on an object of common interest and on their mutual communication [11; 14; 25]. Accordingly, animal pre-rituals involve no separate symbols whose meaning is autonomous from a specific emotional situation, no ability to keep a joint focus of attention on an object for a long time, and no shared subjective reality (at least, none that can be ascertained).

The Descent of Hominids to the Ground and the Transition of Dominance to Egalitarian Coalitions

On descending to the ground, early hominids found themselves in a highly competitive niche of gatherers and, alas, inferior scavengers, with the need to stick together for protection against formidable predators and to bring food from afar to the females left with children at the camp [2; 22].

Judging by the known morphological traits, self-domestication took place: large jaws disappeared, cranial crests and ridges diminished, and sexual dimorphism (the difference in size and strength between males and females) decreased. It is reasonable to believe that this process included not only anatomical but also significant social and cognitive transformations towards equality and intragroup solidarity [1; 7; 26].

There are several explanations of egalitarianism that complement rather than contradict each other: group violence at a distance (stoning), the appearance of lethal weapons (choppers), coalitions of mothers against aggressors to protect their children and themselves from violence, and negative sexual selection against aggressive alpha males.

Hominids drove competing scavengers away from their prey with stones [2], but it was also necessary to drive predators away from the camp to protect small children, so females were able to throw stones as well as males. Females were also the least tolerant of internecine clashes and fights, because the winning males became a threat to the children of the defeated, and the females themselves were threatened with sexual violence [7; 19; 23].

The version of the group stoning of the strongest opponents [15] is supplemented here by a quite plausible alliance between mothers and a group of relatively weak males, who together confronted large bullies and rapists.

The rudimentary choppers used to cut up carcasses were a new weapon that could not only severely injure but also kill. Since confrontations among higher mammals more often take the form of a display of threats (who will be afraid of whom) than of a real fight, single aggressors were more likely to yield to a coalition of weaker males, all the more so when it was supported by females.

Finally, even without sure victories in skirmishes, females allied against aggressive males avoided mating with them by all possible means. It is possible that precisely this factor of negative sexual selection became the most effective one for hominid self-domestication, both in morphology (gracilization) and in behavioral traits (orientation towards group membership and solidarity rather than personal dominance through violence and intimidation).

In group life, the main outcome of these processes, which took a very long time (approximately from 8–7 to 2.5–1.5 million years ago), was the transition of dominance from aggressive alpha males to solidary and relatively egalitarian coalitions. Whether males (as in most known hunter-gatherer groups) or females (as in bonobo groups) predominated was less important.

The Ultra-Micro Level: From Self-training to Normative Rituals

From millions of years, let us move on to minutes and seconds: to the main social actions in "here and now" situations that shaped the psyche and behavior of the participants. Hominids were certainly no less sensitive and intelligent than apes, so they learned to react adequately to the mood of their fellow tribesmen. The members of the dominant coalition, and then the rest of the group, responded to inappropriate behavior by expressing in unison a common emotion of disapproval through facial expressions, postures, and certain audible signals, reinforced by the threat of group punishment. Behavior that was solidary and useful for the group (generosity in sharing, protection of the weak, arrangement of a camp, making a convenient tool) was, on the contrary, encouraged, likewise in unison and with a special vocalization.

Already in these actions joint intentionality began to form: group cognitive involvement in a situation or in someone's behavior, with a common focus of attention maintained and a common emotion experienced [14; 25, p. 305]. What took place is best described as self-training. Indeed, the group systematically censured and ridiculed those who resorted to violence, rudeness, greed, or cowardice, and expressed approval of those who were skillful, fortunate, generous, and ready to help fellow tribesmen.

Over time, group control came to be exercised through soft but perfectly recognizable signs of approval (smiles, patting) and disapproval (scoffing, an angry face, contemptuous glances), which are still used in all human communities today.

Sound signals by no means disappeared. Concerted loud yelling, previously used to frighten an intruder, was no longer necessary. A soft but clearly discernible signal uttered by just one member of the dominant coalition, accompanied by the appropriate intonation and facial expression, was already quite sufficient.

Shift of the Regulative Instance from the Other Person to the Sign

It is extremely important that, over time, the standard audible signals signifying disapproval or approval acquired their regulatory power over individual behavior through interiorization. The mechanism of interiorization itself is already present in animals' pre-rituals: after a fight the loser acquires the attitude of subordination, and the winner the attitude of domination (see above). At the same time, the whole emotional situation, with fear of the victor or triumph over the defeated opponent, is "imprinted" in the animal's psyche. What is interiorized here is the whole situation together with its emotion.

Now consider what happens in the process of training. A well-trained dog "understands" the commands "lie down!", "sit!", "speak!", "okay!", "no!", "heel!", etc., and obeys them even if someone else (resembling the master in some way) pronounces these words with an authoritative intonation. There is no interiorization here, because social interaction is still required for "correct" behavior. The recognition and execution of multiple commands by a dog or a circus animal can be described traditionally in terms of conditioned reflexes according to I. Pavlov or in terms of operant conditioning according to B. Skinner [10], but it can also be described as a formed attitude according to D. Uznadze [12]: a readiness and ability to respond with a learned action to clearly pronounced words spoken by a person.

The notion of "self-training" is significant here because each hominid, as an object of "training", was to some extent also its subject: it learned and was able not only to recognize but also to utter the same sound signals that signified group approval or disapproval of someone's actions or of what was going on in general.

Let us present the three stages of hominid self-training almost exactly according to the stages of formation of mental structures according to L.S. Vygotsky [4]:

1) repetitive situations with a concerted expression of group approval or disapproval of someone's action, accompanied by a specific, well-recognized sound signal;

2) the participant, being alone and wishing to do something disapproved (for example, to eat the obtained food instead of taking it to the camp and sharing it), himself loudly utters the signals of disapproval, imagining the displeasure of his tribesmen, and refrains from the violation; or, on the contrary, he does not want to do something approved (to go for prey to a dangerous place, to cross a cold river, to share the obtained food), but loudly utters encouraging signals and overcomes himself;

3) the same as in item 2, but the sounds are pronounced "in the mind", i.e., the attitude is interiorized and attached to the sign.

Thus, the widespread ability of animals to form attitudes in pre-rituals without interiorization of signs is here combined with the ability to interiorize sound signals and to obey them through inner speech. Adherents of psychoanalysis have every right to see here the birth of the instance of the "Super-Ego".

Let us consider how the new moral feelings, shame and pride, differ from the animal emotions that resemble them in outward expression. In its visible signs (a downcast look, a lowered head and shoulders, a hunched posture) shame is indeed related to the more ancient experience of subordination and oppression. However, shame is a more complex, superstructural feeling, since it includes not only the recognition of one's failure and weakness but also the recognition that a certain rule has been violated. What matters most is no longer the fear of punishment but the experience of the inappropriateness of one's behavior.

Pride, expressed similarly by humans, apes, and probably ancient hominids (upright posture, raised head, straightened shoulders, direct gaze, burning eyes), means not only, and not so much, victory in a fight and an advantage in strength as the very "moral force" described by E. Durkheim, i.e., a sense of the legitimacy of one's behavior and of the justification of one's high standing in the group, that is, prestige [18, p. 176].

Normativity and New Communicative Concerns

With the emergence of normativity, the basic spheres and types of concerns (security, sustenance, prestige, sexuality, parenthood) remained, but the social conditions and permissible ways of achieving the corresponding interests and goals steadily grew more complicated. This forced hominids to search for new answers, and to do so in the same chosen "rut" of coordinating actions through communication [8].

Normativity became a magic wand, generating new rules and relations, and hence new social orders. In this new environment, the communication and recognition of more and more signs became critical to individual and group concerns. The more signals appeared, the more criteria for distinguishing and recognizing them were required [8, p. 24]. This was achieved by distinguishing syllables and then phonemes, thanks to which syncretic signals turned into proto-syllables (still tied to situations). Distinguishing syllables, phonemes, and their meanings became a lexical magic wand, i.e., a mechanism for generating multiple proto-words.

Thus pre-speech appeared as a stage of language evolution between the communicative system of animals (as well as infant cooing in ontogenesis) and protolanguage (with an established phonetic structure and an ordering of words that are autonomous from the situation and semantically related to each other).

Pre-speech is characterized by a rudimentary differentiation of proto-syllables (early pre-speech) and a differentiation of syllables with basic phonemes (late pre-speech), by the use of proto-words (with situational, unrelated meanings), and by reactive proto-phrases (sets of proto-words pronounced in response to a situation without a meaningful order).

Normative attitudes and the ability to follow rules and to control how they are followed served as the basis for articulatory standardization, without which the recognition of pronounced combinations of sounds is impossible.

In the absence of the overwhelming power of alpha males, a renewed need emerged to coordinate proposals for decision-making. An entirely new communicative concern, that of persuasion, appeared, which involves mutual understanding, and here it was no longer possible to make do with facial expressions and gestures alone [8, p. 23].

The struggle for dominance through intimidation and violence was replaced by competition for leadership and prestige through the mobilization of support based on correct behavior.

The seemingly "natural" order of speaking, where one speaks and the rest remain silent, is a particular norm that was developed and consolidated, probably by transposing the rule of strict order of access to food at communal meals.

The prohibition of sexual violence increased the importance of courtship and flirtation, which also began to be carried out through vocal communication.

Among the first abilities of consciousness associated with increased linguistic complexity were:

  • The ability to jointly focus attention on what is emotionally meaningful, disapproved or approved by the group of fellow tribesmen, in the situation at hand (initially in normative rituals);
  • The ability to discern relations of belonging and to follow the appropriate rules of access both to goods and resources (tasty food, tools, places) and to fellow tribesmen (especially in the sexual sphere);
  • The ability to distinguish more complex and diverse rules and to recognize others' and one's own actions that fall under them; to feel shame, pride, anger, or respect when comparing actions with norms and moral attitudes;
  • The ability to fix attention on "tomorrow" and "yesterday", and then to keep track of the succession of days and to orient oneself in them (probably in connection with the need to maintain the fire and to prepare fuel).

The reconstruction of these complex processes in connection with the renewal of techno-natural niches, social orders, and communicative concerns requires a separate exposition.


