Audio-visual Influence on Speech Perception

Author: Lena Quinto

Published: 2007

Total Pages: 60

ISBN-13: 9780494403983

The importance of visual cues in speech perception is illustrated by the McGurk effect, whereby incongruent visual cues affect the perception of speech sounds. It is unclear whether similar effects occur for sung materials. In Experiment 1, participants heard sequences of syllables (la-la-la-ba, la-la-la-ga) that were spoken or sung. Sung stimuli were ascending triads (do-mi-so) that returned to the tonic (do). Incongruent stimuli were created by combining an auditory /ba/ with a visual /ga/. Participants reported the final syllable. Results revealed overwhelming auditory dominance for both spoken and sung conditions. In Experiment 2, background noise was added to increase attention to visual cues. Auditory dominance prevailed in quiet, but visual dominance prevailed in noise. In Experiment 3, the target syllable was presented in isolation. As before, participants exhibited auditory dominance, but they had greater difficulty detecting sung syllables than spoken syllables presented in isolation. The contributions of visual and auditory cues from the preceding context are discussed.


Visual and Auditory Factors Facilitating Multimodal Speech Perception

Author: Pamela Ver Hulst

Published: 2006

Abstract: Speech perception is often described as a unimodal process, when in reality it involves the integration of multiple sensory modalities, specifically vision and hearing. Individuals use visual information to fill in missing pieces of auditory information when hearing has been compromised, such as with a hearing loss. However, individuals use visual cues even when auditory cues are perfect, and cannot ignore the integration that occurs between auditory and visual inputs when listening to speech. It is well known that individuals differ in their ability to integrate auditory and visual speech information, and likewise that some individuals produce clearer speech signals than others, either auditorily or visually. Clark (2005) found that some talkers in a study of the McGurk effect produced much stronger 'integration effects' than other talkers. One possible underlying mechanism of auditory + visual integration is the substantial redundancy found in the auditory speech signal. But how much redundancy is necessary for effective integration? And what auditory and visual characteristics make a good integration talker? The present study examined these questions by comparing the auditory intelligibility, visual intelligibility, and degree of integration for speech sounds that were highly reduced in auditory redundancy, produced by seven different talkers. Participant performance was examined under four conditions: 1) degraded auditory only, 2) visual only, 3) degraded auditory + visual, and 4) non-degraded auditory + visual. Results indicate across-talker differences in auditory and auditory + visual intelligibility. Degrading the auditory stimulus did not affect the overall amount of McGurk-type integration, but did influence the type of McGurk integration observed.


Audiovisual Speech Recognition: Correspondence between Brain and Behavior

Author: Nicholas Altieri

Publisher: Frontiers E-books

Published: 2014-07-09

Total Pages: 102

ISBN-13: 2889192512

Perceptual processes mediating recognition, including the recognition of objects and spoken words, are inherently multisensory. This is true in spite of the fact that sensory inputs are segregated in early stages of neuro-sensory encoding. In face-to-face communication, for example, auditory information is processed in the cochlea, encoded in the auditory sensory nerve, and processed in lower cortical areas. Eventually, these “sounds” are processed in higher cortical pathways such as the auditory cortex, where they are perceived as speech. Likewise, visual information obtained from observing a talker’s articulators is encoded in lower visual pathways. Subsequently, this information undergoes processing in the visual cortex prior to the extraction of articulatory gestures in higher cortical areas associated with speech and language. As language perception unfolds, information garnered from visual articulators interacts with language processing in multiple brain regions. This occurs via visual projections to auditory, language, and multisensory brain regions. The association of auditory and visual speech signals makes the speech signal a highly “configural” percept. An important direction for the field is thus to provide ways to measure the extent to which visual speech information influences auditory processing, and likewise, to assess how the unisensory components of the signal combine to form a configural/integrated percept. Numerous behavioral measures such as accuracy (e.g., percent correct, susceptibility to the “McGurk Effect”) and reaction time (RT) have been employed to assess multisensory integration ability in speech perception. On the other hand, neural measures such as fMRI, EEG, and MEG have been employed to examine the locus and/or time-course of integration. The purpose of this Research Topic is to find converging behavioral and neural assessments of audiovisual integration in speech perception. A further aim is to investigate speech recognition ability in normal-hearing, hearing-impaired, and aging populations. As such, the purpose is to obtain neural measures from EEG as well as fMRI that shed light on the neural bases of multisensory processes, while connecting them to model-based measures of reaction time and accuracy in the behavioral domain. In doing so, we endeavor to gain a more thorough description of the neural bases and mechanisms underlying integration in higher-order processes such as speech and language recognition.
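
The abstract above names the two behavioral measures most often used to quantify integration: accuracy (including susceptibility to the McGurk effect on incongruent trials) and reaction time. The short Python sketch below shows how such measures might be tallied from a table of trial records. The trial data, field layout, and response coding are illustrative assumptions, not data or code from any of the studies described here.

```python
# Hedged sketch: tallying McGurk susceptibility and mean RT from hypothetical trials.
from statistics import mean

# Each trial: (condition, auditory syllable, visual syllable, response, RT in ms).
# These records are invented for illustration only.
trials = [
    ("congruent",   "ba", "ba", "ba", 612),
    ("congruent",   "ga", "ga", "ga", 598),
    ("incongruent", "ba", "ga", "da", 701),  # fused "McGurk" percept
    ("incongruent", "ba", "ga", "ba", 665),  # auditory-dominant response
]

congruent = [t for t in trials if t[0] == "congruent"]
incongruent = [t for t in trials if t[0] == "incongruent"]

# Percent correct on congruent trials (response matches the presented syllable).
accuracy = mean(t[3] == t[1] for t in congruent)

# McGurk susceptibility: proportion of incongruent trials where the response
# departs from the auditory token (fused or visually captured percepts).
mcgurk_susceptibility = mean(t[3] != t[1] for t in incongruent)

# Mean reaction time per condition.
mean_rt = {c: mean(t[4] for t in trials if t[0] == c)
           for c in ("congruent", "incongruent")}

print(f"accuracy={accuracy:.2f}, McGurk={mcgurk_susceptibility:.2f}, RT={mean_rt}")
```

In practice, response coding would further distinguish fused percepts (e.g., /da/) from visually captured ones (e.g., /ga/), which is the distinction between the amount and the type of McGurk integration mentioned in several of the abstracts on this page.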


Speech Perception By Ear and Eye

Author: Dominic W. Massaro

Publisher: Psychology Press

Published: 2014-01-02

Total Pages: 356

ISBN-13: 131776045X

First published in 1987. This book is about the processing of information. The central domain of interest is face-to-face communication, in which the speaker makes available both audible and visible characteristics to the perceiver. Articulation by the speaker creates changes in atmospheric pressure for hearing and provides tongue, lip, jaw, and facial movements for seeing. These characteristics must be processed by the perceiver to recover the message conveyed by the speaker. The speaker and perceiver must share a language to make communication possible; some internal representation is necessarily functional for the perceiver to recover the message of the speaker.


Auditory and Visual Characteristics of Individual Talkers in Multimodal Speech Perception

Author: Corinne D. Anderson

Published: 2007

Abstract: When people think about understanding speech, they primarily think about perceiving speech auditorily (via hearing); however, there are actually two key components to speech perception: auditory and visual. Speech perception is a multimodal process, i.e., one that combines more than one sense, involving the integration of auditory information and visual cues. Visual cues can supplement missing auditory information; for example, when auditory information is compromised, such as in noisy environments, seeing a talker's face can help a listener understand speech. Interestingly, auditory and visual integration occurs all of the time, even when the auditory and visual signals are perfectly intelligible. The role that visual cues play in speech perception is evidenced in a phenomenon known as the McGurk effect, which demonstrates how auditory and visual cues are integrated (McGurk and MacDonald, 1976). Previous studies of audiovisual speech perception suggest that there are several factors affecting auditory and visual integration. One factor is the characteristics of the auditory and visual signals, i.e., how much information is necessary in each signal for listeners to optimally integrate auditory and visual cues. A second factor is the auditory and visual characteristics of individual talkers, e.g., visible cues such as mouth opening or acoustic cues such as speech clarity that might facilitate integration. A third factor is characteristics of the individual listener, such as central auditory or visual abilities, that might facilitate greater or lesser degrees of integration (Grant and Seitz, 1998). The present study focused on the second factor, looking at both auditory and visual talker characteristics and their effect on listeners' auditory and visual integration. Preliminary results of this study show considerable variability across talkers in the auditory-only condition, suggesting that different talkers have different degrees of auditory intelligibility. Interestingly, there were also substantial differences in the amount of audiovisual integration produced by different talkers that were not highly correlated with auditory intelligibility, suggesting that the talkers with optimal auditory intelligibility are not the same talkers who facilitate optimal audiovisual integration.


Audiovisual Speech Perception with Degraded Auditory Cues

Author: Elizabeth Anderson

Published: 2006

Abstract: Speech perception, although generally assumed to be a primarily auditory process, also depends on visual cues. Audio and visual signals are used together not only when signals are compromised, such as in a noisy environment, but also when the signals are completely intelligible. McGurk and MacDonald (1976) demonstrated the integration of these cues in a paradigm known today as the McGurk effect. One possible underlying explanation for the McGurk effect is the substantial redundancy in the auditory speech signal. An unanswered question concerns the circumstances that promote optimal perception of auditory and visual signals: is integration improved when one or both signals contain some ambiguity, or is a certain degree of redundancy necessary for integration to occur? If so, how much redundancy is necessary for optimal integration? The present study began to examine the amount of redundancy necessary for optimal auditory + visual integration. The audio portions of speech recordings were degraded using a software program that reduced the speech signals to four spectral bands, effectively reducing the redundancy of the auditory signal. Participant performance was explored under four conditions: 1) degraded auditory only, 2) visual only, 3) degraded auditory + visual, and 4) non-degraded auditory + visual, to assess the degree of integration when the redundancy of the auditory signal is reduced. Integration was determined by: 1) comparing the percent of integration in the degraded and non-degraded auditory + visual conditions to the degraded-auditory-only and visual-only conditions, and 2) recording the percent of degraded auditory + visual McGurk responses. Results indicate that reducing the redundancy of the auditory signal has no significant effect on auditory + visual integration, suggesting that the amount of redundancy in the auditory signal does not influence the degree of multimodal integration.
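
The abstract does not name the software used to reduce the recordings to four spectral bands, but the description matches a standard noise-vocoding procedure: split the signal into a few frequency bands, keep only each band's temporal envelope, and use those envelopes to modulate band-limited noise. The Python sketch below is a minimal illustration of that general technique under assumed parameters (band edges, filter order, noise carrier); it is not the program used in the study.

```python
# Minimal noise-vocoder sketch (assumed parameters, not the study's software).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(signal: np.ndarray, fs: int, n_bands: int = 4) -> np.ndarray:
    """Replace each band's fine spectral structure with envelope-modulated noise."""
    # Assumed logarithmically spaced band edges between 100 Hz and 8 kHz (or Nyquist).
    edges = np.geomspace(100.0, min(8000.0, fs / 2 - 1), n_bands + 1)
    carrier = np.random.default_rng(0).standard_normal(len(signal))  # broadband noise
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)          # speech restricted to this band
        envelope = np.abs(hilbert(band))         # temporal envelope of the band
        noise_band = sosfiltfilt(sos, carrier)   # noise restricted to the same band
        out += envelope * noise_band             # envelope carries the speech information
    return out / (np.max(np.abs(out)) + 1e-12)   # normalize to avoid clipping
```

For a 16 kHz recording, vocode(x, 16000, n_bands=4) would return a four-band noise-vocoded version. Many vocoder implementations also low-pass filter the envelopes (e.g., at a few hundred Hz) before modulation; that step is omitted here for brevity.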


Hearing Eye II

Author: Douglas Burnham

Publisher: Psychology Press

Published: 2013-10-28

Total Pages: 318

ISBN-13: 1135471967

This volume outlines some of the developments in practical and theoretical research into speechreading (lipreading) that have taken place since the publication of the original "Hearing by Eye". It comprises 15 chapters by international researchers in psychology, psycholinguistics, experimental and clinical speech science, and computer engineering. It answers theoretical questions (what are the mechanisms by which heard and seen speech combine?) and practical ones (what makes a good speechreader? can machines be programmed to recognize seen and seen-and-heard speech?). The book is written in a non-technical way and starts to articulate a behaviourally-based but cross-disciplinary programme of research in understanding how natural language can be delivered by different modalities.


Perceiving Talking Faces

Author: Dominic W. Massaro

Publisher: MIT Press

Published: 1998

Total Pages: 524

ISBN-13: 9780262133371

This book discusses the author's experiments on the use of multiple cues in speech perception and other areas, and unifies the results through a logical model of perception.


Speech Perception, Production and Acquisition

Author: Huei‐Mei Liu

Publisher: Springer Nature

Published: 2020-09-14

Total Pages: 277

ISBN-13: 9811576068

This book addresses important issues of speech processing and language learning in Chinese. It highlights perception and production of speech in healthy and clinical populations and in children and adults. The book provides diverse perspectives and reviews of cutting-edge research from past decades on how Chinese speech is processed and learned, and each chapter discusses future research directions. With these unique features and its broad coverage of topics, this book appeals not only to scholars and students who study speech perception in preverbal infants and in children and adults learning Chinese, but also to teachers with interests in pedagogical applications in teaching Chinese as a Second Language.