lip-reading
Lip-reading is commonly understood as the attempt to understand speech by watching the lip and mouth movements of someone speaking when the normal accompanying speech sounds cannot be heard. This has led to the general assumption that lip-reading is, by and large, a preoccupation of the deaf and hard-of-hearing. But psychological and psychophysical studies, together with modern brain imaging techniques, have shown that there is more to lip-reading than meets the proverbial eye.
In 1880, the World Congress of Educators in Milan resolved that deaf children everywhere should be taught to lip-read. The decision was predicated on the belief that a speaking face provides sufficient linguistic information to permit speech comprehension, and that practice is all that is necessary to effect skilled performance. In recent years, however, it has become apparent that, for a child who becomes deaf before learning to speak, speech recognition through lip-reading alone is an ability only slowly mastered, often accompanied by delays in acquiring an understanding of language and by impaired communication skills later in life. And yet sighted infants with normal hearing learn to speak more quickly than blind children, probably because they can also see the movements of the faces and lips of other speakers. So how can such apparently contradictory observations be resolved, and how should we properly conceive of the information conveyed visually? A clue is provided by a detailed analysis of how lip-reading is possible at all.
The ability to extract verbal information from the speaker's face relies on the fact that the configuration of the visible articulators, primarily the lips, teeth, and tongue, shapes the resonances of the vocal tract to modulate the emitted sound. Visual speech cues permit the discrimination of the place of articulation of certain consonants (e.g. ‘p’ can be distinguished from ‘v’), as well as the identity of different vowels. However, some parts of the acoustic speech signal are generated by movements within the oral cavity that cannot be seen and are consequently indistinguishable visually (e.g. visual perception of ‘ga’ and of ‘ka’ are virtually identical). Thus, visible speech cues provide some, but not all, of the linguistic information that acoustic signals offer, necessitating a certain amount of ‘guesswork’ to fill in the gaps when sound is absent. In fact it turns out that these visual cues provide information about precisely those parts of speech that are most difficult to discriminate by ear alone. Lip-reading, therefore, directly complements auditory speech perception: speech comprehension is very much an audio-visual activity. To conceive of lip-reading as a cognitive ability in isolation from auditory speech perception is to misunderstand the nature of its contribution to human communication. Lip-reading, it seems, is useful to all sighted people, including those with normal hearing.
From early infancy, human beings are predisposed to put together what they see and what they hear. Even at the age of 4 months, infants presented with two video displays of a person articulating different vowels, while listening to a tape recording of one of the vowels, attend selectively to the video of the vowel that can be heard. The early development of this capacity to link sound and sight may relate to the fact that, especially for infants, heard speech is usually accompanied by the sight of the speaker. By the time we reach maturity, the visual information emanating from a speaker's mouth and face during normal conversation plays a significant role in influencing the perception and understanding of spoken language. This is particularly apparent in noisy surroundings, such as a crowded party, where seeing a speaker can improve intelligibility to a degree equivalent to increasing the intensity of their voice by about 15 decibels. Nevertheless, we are usually unaware of the influence of visual cues on heard speech. They become apparent only when they contradict the auditory information. This happens, for example, when watching a poorly dubbed movie: the late or early onset of the speaker's lip movements hampers our ability to hear what is otherwise a clean auditory signal. The potency of the influence of vision on what is heard was graphically demonstrated by psychologists Harry McGurk and John MacDonald in the 1970s. They artificially induced a mismatch between the auditory and visual channels by dubbing the sound of a spoken syllable onto a videotape of someone mouthing a different syllable, and demonstrated that the seen syllable reliably influenced what viewers heard (even if they knew exactly what was going on). For example, pairing an auditory ‘ba’ with a visual ‘ga’ generally induced the perception of something intermediate, typically ‘da’. Instructing subjects to attend solely to the auditory signal made no difference to their report, as long as their eyes were open. This suggests that the visual processing of speech is a significant, and perhaps mandatory, part of the speech process when the speaker is in sight. But how do these visual speech cues help us to hear what is being said?
In 1997 a team of scientists used the then-novel brain imaging technique of functional magnetic resonance imaging to try to discover where in the brain the sight of someone speaking might influence the perception of what they are saying. When people with normal hearing looked at a video of someone speaking, with the sound turned off, areas in the occipital lobe of the cerebral cortex, long known to be involved in processing visual information, became active. But, surprisingly, so did cortical areas that are normally activated by listening to speech (including the primary auditory region, thought to be involved in rather simple processing of sound). This visual activation of auditory areas even occurred when the video showed someone silently mouthing incomprehensible nonsense, but did not happen if the face was simply moving without the lips being opened, suggesting that visual signals are sent to the auditory cortex whenever they look like vocalizations, but that this process happens relatively automatically, without actual recognition of speech. These findings point to a physiological explanation for the subjective experience that you can almost hear what's being said when you watch someone speaking in a silent movie. If the effect of visual speech cues broadly equates to turning up the volume knob, small wonder that George Bush famously told US voters: ‘Read my lips’!
Gemma Calvert
See also cerebral cortex; deafness; hearing; imaging techniques; language; sensory integration.