Visible Speech

The experts below are selected from a list of 7,071 experts worldwide, ranked by the ideXlab platform.

Aslı Özyürek - One of the best experts on this subject based on the ideXlab platform.

  • Aging and working memory modulate the ability to benefit from Visible Speech and iconic gestures during Speech-in-noise comprehension
    Psychological Research, 2020
    Co-Authors: Louise Schubotz, Linda Drijvers, Judith Holler, Aslı Özyürek
    Abstract:

    When comprehending Speech-in-noise (SiN), younger and older adults benefit from seeing the speaker’s mouth, i.e. Visible Speech. Younger adults additionally benefit from manual iconic co-Speech gestures. Here, we investigate to what extent younger and older adults benefit from perceiving both visual articulators while comprehending SiN, and whether this is modulated by working memory and inhibitory control. Twenty-eight younger and 28 older adults performed a word recognition task in three visual contexts: mouth blurred (Speech-only), Visible Speech, or Visible Speech + iconic gesture. The Speech signal was either clear or embedded in multitalker babble. Additionally, there were two visual-only conditions (Visible Speech, Visible Speech + gesture). Accuracy levels for both age groups were higher when both visual articulators were present compared to either one or none. However, older adults received a significantly smaller benefit than younger adults, although they performed equally well in Speech-only and visual-only word recognition. Individual differences in verbal working memory and inhibitory control partly accounted for age-related performance differences. To conclude, perceiving iconic gestures in addition to Visible Speech improves younger and older adults’ comprehension of SiN. Yet, the ability to benefit from this additional visual information is modulated by age and verbal working memory. Future research will have to show whether these findings extend beyond the single word level.
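    The benefit described above is, at its core, an accuracy contrast between viewing conditions in noise. Purely as an illustration, the sketch below shows one way such per-participant benefit scores could be computed and related to a verbal working-memory measure; the column names, data layout, and correlation step are assumptions, not the authors' analysis pipeline.

```python
# Hypothetical sketch (not the authors' code): per-participant accuracy gains from
# adding visual articulators in noise, related to a verbal working-memory score.
# Column names and data layout are assumptions made for illustration.
import pandas as pd

def visual_benefit_scores(trials: pd.DataFrame) -> pd.DataFrame:
    """trials: one row per trial with columns participant, condition
    ('speech_only', 'visible_speech', 'speech_gesture'), noise ('clear',
    'babble'), correct (0 or 1)."""
    noisy = trials[trials["noise"] == "babble"]
    acc = (noisy.groupby(["participant", "condition"])["correct"]
                .mean()
                .unstack("condition"))
    # Gain from seeing the mouth, and the extra gain from an added iconic gesture.
    acc["visible_speech_benefit"] = acc["visible_speech"] - acc["speech_only"]
    acc["gesture_benefit"] = acc["speech_gesture"] - acc["visible_speech"]
    return acc.reset_index()

def correlate_with_wm(benefits: pd.DataFrame, wm: pd.Series) -> float:
    """Correlate the extra gesture benefit with a working-memory score indexed
    by participant."""
    merged = benefits.set_index("participant").join(wm.rename("wm"))
    return merged["gesture_benefit"].corr(merged["wm"])
```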

  • degree of language experience modulates visual attention to Visible Speech and iconic gestures during clear and degraded Speech comprehension
    Cognitive Science, 2019
    Co-Authors: Linda Drijvers, Julija Vaitonyte, Aslı Özyürek
    Abstract:

    Visual information conveyed by iconic hand gestures and Visible Speech can enhance Speech comprehension under adverse listening conditions for both native and non-native listeners. However, how a listener allocates visual attention to these articulators during Speech comprehension is unknown. We used eye-tracking to investigate whether and how native and highly proficient non-native listeners of Dutch allocated overt eye gaze to Visible Speech and gestures during clear and degraded Speech comprehension. Participants watched video clips of an actress uttering a clear or degraded (6-band noise-vocoded) action verb while performing a gesture or not, and were asked to indicate the word they heard in a cued-recall task. Gestural enhancement was the largest (i.e., a relative reduction in reaction time cost) when Speech was degraded for all listeners, but it was stronger for native listeners. Both native and non-native listeners mostly gazed at the face during comprehension, but non-native listeners gazed more often at gestures than native listeners. However, only native but not non-native listeners' gaze allocation to gestures predicted gestural benefit during degraded Speech comprehension. We conclude that non-native listeners might gaze at gesture more as it might be more challenging for non-native listeners to resolve the degraded auditory cues and couple those cues to phonological information that is conveyed by Visible Speech. This diminished phonological knowledge might hinder the use of semantic information that is conveyed by gestures for non-native compared to native listeners. Our results demonstrate that the degree of language experience impacts overt visual attention to visual articulators, resulting in different visual benefits for native versus non-native listeners.
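    The gestural enhancement measure above is described as a relative reduction in reaction-time cost under degraded speech. As an illustration only, and under assumed column names and trial structure (this is not the authors' analysis), it could be computed roughly as follows, together with a simple check of whether gaze allocation to gestures predicts it.

```python
# Illustrative sketch only: gestural enhancement as a relative reduction in the
# reaction-time cost of degradation, plus a simple linear fit of that enhancement
# on the proportion of fixations to the gesture region. Names are assumptions.
import numpy as np
import pandas as pd

def gestural_enhancement(rt: pd.DataFrame) -> pd.Series:
    """rt: one row per trial with columns participant, clarity
    ('clear'/'degraded'), gesture (True/False), reaction_time (ms)."""
    means = (rt.groupby(["participant", "clarity", "gesture"])["reaction_time"]
               .mean()
               .unstack(["clarity", "gesture"]))
    cost_without = means[("degraded", False)] - means[("clear", False)]
    cost_with = means[("degraded", True)] - means[("clear", True)]
    # Relative reduction of the degradation cost when a gesture is present.
    return (cost_without - cost_with) / cost_without

def gaze_slope(enhancement: pd.Series, gaze_to_gesture: pd.Series) -> float:
    """Slope of enhancement regressed on the proportion of gaze to gestures."""
    x, y = gaze_to_gesture.align(enhancement, join="inner")
    slope, _intercept = np.polyfit(x.to_numpy(), y.to_numpy(), 1)
    return float(slope)
```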

  • Non-native Listeners Benefit Less from Gestures and Visible Speech than Native Listeners During Degraded Speech Comprehension
    Language and Speech, 2019
    Co-Authors: Linda Drijvers, Aslı Özyürek
    Abstract:

    Native listeners benefit from both Visible Speech and iconic gestures to enhance degraded Speech comprehension (Drijvers & Ozyurek, 2017). We tested how highly proficient non-native listeners benefit from these visual articulators compared to native listeners. We presented videos of an actress uttering a verb in clear, moderately, or severely degraded Speech, while her lips were blurred, Visible, or Visible and accompanied by a gesture. Our results revealed that unlike native listeners, non-native listeners were less likely to benefit from the combined enhancement of Visible Speech and gestures, especially since the benefit from Visible Speech was minimal when the signal quality was not sufficient.

  • visual context enhanced: the joint contribution of iconic gestures and Visible Speech to degraded Speech comprehension
    Journal of Speech Language and Hearing Research, 2017
    Co-Authors: Linda Drijvers, Aslı Özyürek
    Abstract:

    Purpose This study investigated whether and to what extent iconic co-Speech gestures contribute to information from Visible Speech to enhance degraded Speech comprehension at different levels of no...

Linda Drijvers - One of the best experts on this subject based on the ideXlab platform.

  • Aging and working memory modulate the ability to benefit from Visible Speech and iconic gestures during Speech-in-noise comprehension
    Psychological Research, 2020
    Co-Authors: Louise Schubotz, Linda Drijvers, Judith Holler, Aslı Özyürek
    Abstract:

    When comprehending Speech-in-noise (SiN), younger and older adults benefit from seeing the speaker’s mouth, i.e. Visible Speech. Younger adults additionally benefit from manual iconic co-Speech gestures. Here, we investigate to what extent younger and older adults benefit from perceiving both visual articulators while comprehending SiN, and whether this is modulated by working memory and inhibitory control. Twenty-eight younger and 28 older adults performed a word recognition task in three visual contexts: mouth blurred (Speech-only), Visible Speech, or Visible Speech + iconic gesture. The Speech signal was either clear or embedded in multitalker babble. Additionally, there were two visual-only conditions (Visible Speech, Visible Speech + gesture). Accuracy levels for both age groups were higher when both visual articulators were present compared to either one or none. However, older adults received a significantly smaller benefit than younger adults, although they performed equally well in Speech-only and visual-only word recognition. Individual differences in verbal working memory and inhibitory control partly accounted for age-related performance differences. To conclude, perceiving iconic gestures in addition to Visible Speech improves younger and older adults’ comprehension of SiN. Yet, the ability to benefit from this additional visual information is modulated by age and verbal working memory. Future research will have to show whether these findings extend beyond the single word level.

  • degree of language experience modulates visual attention to Visible Speech and iconic gestures during clear and degraded Speech comprehension
    Cognitive Science, 2019
    Co-Authors: Linda Drijvers, Julija Vaitonyte, Aslı Özyürek
    Abstract:

    Visual information conveyed by iconic hand gestures and Visible Speech can enhance Speech comprehension under adverse listening conditions for both native and non-native listeners. However, how a listener allocates visual attention to these articulators during Speech comprehension is unknown. We used eye-tracking to investigate whether and how native and highly proficient non-native listeners of Dutch allocated overt eye gaze to Visible Speech and gestures during clear and degraded Speech comprehension. Participants watched video clips of an actress uttering a clear or degraded (6-band noise-vocoded) action verb while performing a gesture or not, and were asked to indicate the word they heard in a cued-recall task. Gestural enhancement was the largest (i.e., a relative reduction in reaction time cost) when Speech was degraded for all listeners, but it was stronger for native listeners. Both native and non-native listeners mostly gazed at the face during comprehension, but non-native listeners gazed more often at gestures than native listeners. However, only native but not non-native listeners' gaze allocation to gestures predicted gestural benefit during degraded Speech comprehension. We conclude that non-native listeners might gaze at gesture more as it might be more challenging for non-native listeners to resolve the degraded auditory cues and couple those cues to phonological information that is conveyed by Visible Speech. This diminished phonological knowledge might hinder the use of semantic information that is conveyed by gestures for non-native compared to native listeners. Our results demonstrate that the degree of language experience impacts overt visual attention to visual articulators, resulting in different visual benefits for native versus non-native listeners.

  • Non-native Listeners Benefit Less from Gestures and Visible Speech than Native Listeners During Degraded Speech Comprehension
    Language and Speech, 2019
    Co-Authors: Linda Drijvers, Aslı Özyürek
    Abstract:

    Native listeners benefit from both Visible Speech and iconic gestures to enhance degraded Speech comprehension (Drijvers & Ozyurek, 2017). We tested how highly proficient non-native listeners benefit from these visual articulators compared to native listeners. We presented videos of an actress uttering a verb in clear, moderately, or severely degraded Speech, while her lips were blurred, Visible, or Visible and accompanied by a gesture. Our results revealed that unlike native listeners, non-native listeners were less likely to benefit from the combined enhancement of Visible Speech and gestures, especially since the benefit from Visible Speech was minimal when the signal quality was not sufficient.

  • visual context enhanced: the joint contribution of iconic gestures and Visible Speech to degraded Speech comprehension
    Journal of Speech Language and Hearing Research, 2017
    Co-Authors: Linda Drijvers, Aslı Özyürek
    Abstract:

    Purpose This study investigated whether and to what extent iconic co-Speech gestures contribute to information from Visible Speech to enhance degraded Speech comprehension at different levels of no...

Barbara Wise - One of the best experts on this subject based on the ideXlab platform.

  • accurate Visible Speech synthesis based on concatenating variable length motion capture data
    IEEE Transactions on Visualization and Computer Graphics, 2006
    Co-Authors: Ronald A Cole, Bryan L Pellom, Wayne H Ward, Barbara Wise
    Abstract:

    We present a novel approach to synthesizing accurate Visible Speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long-distance coarticulation effects in Visible Speech, a large-scale corpus that covers the most common syllables in English was collected, annotated, and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end Visible Speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third-grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the Visible Speech produced by the animation system, both subjective and objective evaluations are conducted. The evaluation results show that the proposed approach is accurate and powerful for Visible Speech synthesis.
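    As a rough sketch of the general unit-selection idea described above (choosing and concatenating variable-length units so that a combined target and join cost is minimal), the following dynamic-programming routine is illustrative only; the Unit structure and the cost functions are placeholders, not the paper's actual algorithm or cost definitions.

```python
# Generic unit-selection sketch (illustrative, not the paper's algorithm): pick one
# candidate motion-capture unit per target label so that the summed target and
# join costs are minimal, using Viterbi-style dynamic programming.
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

@dataclass
class Unit:
    label: str                                   # syllable or diviseme this unit realizes
    frames: list = field(default_factory=list)   # motion-capture frames (placeholder)

def select_units(
    targets: Sequence[str],
    candidates: Callable[[str], List[Unit]],
    target_cost: Callable[[str, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> List[Unit]:
    """Return the lowest-cost sequence of units, one unit per target label."""
    # Each entry is (accumulated cost, path of chosen units so far).
    best = [(target_cost(targets[0], u), [u]) for u in candidates(targets[0])]
    for label in targets[1:]:
        extended = []
        for unit in candidates(label):
            # Cheapest existing path when extended by this candidate unit.
            cost, path = min(
                ((c + join_cost(p[-1], unit), p) for c, p in best),
                key=lambda item: item[0],
            )
            extended.append((cost + target_cost(label, unit), path + [unit]))
        best = extended
    return min(best, key=lambda item: item[0])[1]
```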

  • accurate automatic Visible Speech synthesis of arbitrary 3d models based on concatenation of diviseme motion capture data
    Computer Animation and Virtual Worlds, 2004
    Co-Authors: Ronald A Cole, Bryan L Pellom, Wayne H Ward, Barbara Wise
    Abstract:

    We present a technique for accurate automatic Visible Speech synthesis from textual input. When provided with a Speech waveform and the text of a spoken sentence, the system produces accurate Visible Speech synchronized with the audio signal. To develop the system, we collected motion capture data from a speaker's face during production of a set of words containing all diviseme sequences in English. The motion capture points from the speaker's face are retargeted to the vertices of the polygons of a 3D face model. When synthesizing a new utterance, the system locates the required sequence of divisemes, shrinks or expands each diviseme based on the desired phoneme segment durations in the target utterance, then moves the polygons in the regions of the lips and lower face to correspond to the spatial coordinates of the motion capture data. The motion mapping is realized by a key-shape mapping function learned by a set of viseme examples in the source and target faces. A well-posed numerical algorithm estimates the shape blending coefficients. Time warping and motion vector blending at the juncture of two divisemes and the algorithm to search the optimal concatenated Visible Speech are also developed to provide the final concatenative motion sequence. Copyright © 2004 John Wiley & Sons, Ltd.
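    The key-shape mapping and time warping described above can be pictured with a minimal numerical sketch. The version below uses plain least squares as a stand-in for the paper's well-posed estimation algorithm and uniform linear interpolation as a stand-in for its time warping; the array shapes and function names are assumptions made for illustration.

```python
# Minimal key-shape blending and time-warping sketch (illustrative stand-ins for
# the techniques described above, not the authors' implementation).
import numpy as np

def blend_weights(source_keys: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Express a captured source-face frame as a blend of source viseme key shapes.
    source_keys: (n_keys, n_coords) stacked key shapes; frame: (n_coords,)."""
    weights, *_ = np.linalg.lstsq(source_keys.T, frame, rcond=None)
    return weights

def retarget_frame(weights: np.ndarray, target_keys: np.ndarray) -> np.ndarray:
    """Apply the source blend weights to the target face's key shapes.
    target_keys: (n_keys, n_coords); returns the retargeted frame (n_coords,)."""
    return weights @ target_keys

def time_warp(frames: np.ndarray, target_len: int) -> np.ndarray:
    """Uniformly stretch or shrink a diviseme's frames (shape (n_frames, n_coords))
    to a desired number of frames by per-coordinate linear interpolation."""
    src = np.linspace(0.0, 1.0, num=frames.shape[0])
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.stack(
        [np.interp(dst, src, frames[:, d]) for d in range(frames.shape[1])],
        axis=1,
    )
```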

Laura A Thompson - One of the best experts on this subject based on the ideXlab platform.

  • reliance on Visible Speech cues during multimodal language processing: individual and age differences
    Experimental Aging Research, 2007
    Co-Authors: Laura A Thompson, E Garcia, D Malloy
    Abstract:

    The current study demonstrates that when a strong inhibition process is invoked during multimodal (auditory-visual) language understanding, older adults perform worse than younger adults, Visible Speech does not benefit language-processing performance, and individual differences in measures of working memory for language do not predict performance. In contrast, in a task that does not invoke inhibition, adult age differences in performance are not obtained, Visible Speech benefits language performance, and individual differences in working memory predict performance. The results support a framework for investigating multimodal language processing that incorporates assumptions about general information processing, individual differences in working memory capacity, and adult cognitive aging.

  • attention resources and Visible Speech encoding in older and younger adults
    Experimental Aging Research, 2004
    Co-Authors: Laura A Thompson, Daniel M Malloy
    Abstract:

    Two experiments investigated adult age differences in the distribution of attention across a speaker's face during auditory-visual language processing. Dots were superimposed on the faces of speakers for 17-ms presentations, and participants reported the spatial locations of the dots. In Experiment 1, older adults showed relatively better detection performance at the mouth area than at the eye area, compared with younger adults. In Experiment 2, in the absence of audible language, neither age group focused differentially on the mouth area. The results are interpreted in light of Massaro's (1998, Perceiving talking faces: From Speech perception to a behavioral principle. Cambridge, MA: MIT Press) theoretical framework for understanding auditory-visual Speech perception. It is claimed that older adults' greater reliance on Visible Speech is due to a reallocation of resources away from the eyes and toward the mouth area of the face.

  • some limits on encoding Visible Speech and gestures using a dichotic shadowing task
    The Journals of Gerontology, Series B: Psychological Sciences and Social Sciences, 1999
    Co-Authors: Laura A Thompson, Felipe A Guzman
    Abstract:

    Visible Speech and gestures are two forms of available language information that can be used by listeners to help them understand the speaker's meaning. Previous research has shown that older adults are particularly dependent on Visible Speech, yet seem to profit less than younger adults from the speaker's gestures. To understand how Visible Speech and gestures are used when listening becomes difficult, the authors conducted an experiment with a dichotic shadowing task. The experiment examined how accurately participants could shadow the right- or left-ear input when instructed to attend selectively to a particular ear and whether performance benefited from visual input. The results indicate that older adults' shadowing performance was unaffected by Visible Speech and gestures. Younger adults did benefit from both Visible Speech and gestures. Thus, under extremely attention-demanding listening conditions, older adults are unable to use a compensatory mechanism for encoding visual language.

  • Visible Speech improves human language understanding: implications for Speech processing systems
    Artificial Intelligence Review, 1995
    Co-Authors: Laura A Thompson, William C Ogden
    Abstract:

    Evidence from the study of human language understanding is presented suggesting that our ability to perceive Visible Speech can greatly influence our ability to understand and remember spoken language. A view of the speaker’s face can greatly aid in the perception of ambiguous or noisy Speech and can aid cognitive processing of Speech leading to better understanding and recall. Some of these effects have been replicated using computer synthesized visual and auditory Speech. Thus, it appears that when giving an interface a voice, it may be best to give it a face too.

  • encoding and memory for Visible Speech and gestures: a comparison between young and older adults
    Psychology and Aging, 1995
    Co-Authors: Laura A Thompson
    Abstract:

    Two experiments explored whether older adults have developed a strategy of compensating for slower speeds of language processing and hearing loss by relying more on the visual modality. Experiment 1 examined the influence of visual articulatory movements of the face (Visible Speech) in auditory-visual syllable classification in young adults and older adults. Older adults showed a significantly greater influence of Visible Speech. Experiment 2 examined immediate recall in three spoken-language sentence conditions: Speech alone, with Visible Speech, or with both Visible Speech and iconic gestures. Sentences also varied in meaningfulness and Speech rate. In the older adult group, recall was better for sentences containing Visible Speech compared with the Speech-alone sentences in the meaningful sentence condition. Older adults' recall showed no overall benefit from the presence of gestures. Young adults' recall on meaningful sentences was not higher in the Visible Speech condition than in the Speech-alone condition, whereas recall was significantly higher with the addition of iconic gestures. In the anomalous sentence condition, both young and older adults showed an advantage in recall from the presence of Visible Speech. The experiments provide converging evidence for older adults' greater reliance on Visible Speech while processing visual-spoken language.

Lawrence D Rosenblum - One of the best experts on this subject based on the ideXlab platform.

  • visibility of Speech articulation enhances auditory phonetic convergence
    Attention Perception & Psychophysics, 2016
    Co-Authors: James W Dias, Lawrence D Rosenblum
    Abstract:

    Talkers automatically imitate aspects of perceived Speech, a phenomenon known as phonetic convergence. Talkers have previously been found to converge to auditory and visual Speech information. Furthermore, talkers converge more to the Speech of a conversational partner who is seen and heard, relative to one who is just heard (Dias & Rosenblum Perception, 40, 1457-1466, 2011). A question raised by this finding is what visual information facilitates the enhancement effect. In the following experiments, we investigated the possible contributions of Visible Speech articulation to visual enhancement of phonetic convergence within the noninteractive context of a shadowing task. In Experiment 1, we examined the influence of the visibility of a talker on phonetic convergence when shadowing auditory Speech either in the clear or in low-level auditory noise. The results suggest that visual Speech can compensate for convergence that is reduced by auditory noise masking. Experiment 2 further established the visibility of articulatory mouth movements as being important to the visual enhancement of phonetic convergence. Furthermore, the word frequency and phonological neighborhood density characteristics of the words shadowed were found to significantly predict phonetic convergence in both experiments. Consistent with previous findings (e.g., Goldinger Psychological Review, 105, 251-279, 1998), phonetic convergence was greater when shadowing low-frequency words. Convergence was also found to be greater for low-density words, contrasting with previous predictions of the effect of phonological neighborhood density on auditory phonetic convergence (e.g., Pardo, Jordan, Mallari, Scanlon, & Lewandowski Journal of Memory and Language, 69, 183-195, 2013). Implications of the results for a gestural account of phonetic convergence are discussed.

  • hearing a face: cross-modal speaker matching using isolated Visible Speech
    Attention Perception & Psychophysics, 2006
    Co-Authors: Lawrence D Rosenblum, Nicolas M Smith, Sarah M Nichols, Steven Hale, Joanne Lee
    Abstract:

    An experiment was performed to test whether cross-modal speaker matches could be made using isolated Visible Speech movement information. Visible Speech movements were isolated using a point-light technique. In five conditions, subjects were asked to match a voice to one of two (unimodal) speaking point-light faces on the basis of speaker identity. Two of these conditions were designed to maintain the idiosyncratic Speech dynamics of the speakers, whereas three of the conditions deleted or distorted the dynamics in various ways. Some of these conditions also equated video frames across dynamically correct and distorted movements. The results revealed generally better matching performance in the conditions that maintained the correct Speech dynamics than in those conditions that did not, despite containing exactly the same video frames. The results suggest that Visible Speech movements themselves can support cross-modal speaker matching.

  • an audiovisual test of kinematic primitives for visual Speech perception
    Journal of Experimental Psychology: Human Perception and Performance, 1996
    Co-Authors: Lawrence D Rosenblum, Helena M Saldana
    Abstract:

    Isolated kinematic properties of Visible Speech can provide information for lip reading. Kinematic facial information is isolated by darkening an actor's face and attaching dots to various articulators so that only moving dots can be seen with no facial features present. To test the salience of these images, the authors conducted experiments to determine whether the images could visually influence the perception of discrepant auditory syllables. Results showed that these images can influence auditory Speech independently of participants' knowledge of the stimuli. In other experiments, single frozen frames of Visible syllables were presented with discrepant auditory syllables to test the salience of static facial features. Although the influence of the kinematic stimuli was perceptual, any influence of the static featural stimuli was likely based on participants' misunderstanding or postperceptual response bias.