Speech Acquisition

The Experts below are selected from a list of 327 Experts worldwide, ranked by the ideXlab platform

Jingdong Chen - One of the best experts on this subject based on the ideXlab platform.

  • Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems
    2010
    Co-Authors: Yiteng Huang, Jingdong Chen, Shaoyan Chen
    Abstract:

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant Speech Acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., Speech) and noise in the spatial and temporal domains. As a result, automatic Speech recognition (ASR) accuracy can be improved to a level at which crewmembers would find the Speech interface useful. The developed Speech human/machine interface will enable both crewmember usability and operational efficiency: it offers a fast rate of data/text entry in a small, lightweight package, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, Speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise; when it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone-array Speech-processing technologies, the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone-array and HMM model-adaptation techniques and by using Speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed; they can help real-time ASR system designers select appropriate tasks in the face of computational-resource constraints.
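
    As a concrete illustration of the beamforming/multichannel noise-reduction front end named above, here is a minimal delay-and-sum beamformer sketch. It is not the authors' implementation; the function name, the uniform-linear-array geometry, and the far-field/known-steering-angle assumptions are all ours.

    ```python
    # Minimal delay-and-sum beamformer sketch (illustrative, not the authors'
    # implementation). Assumes a uniform linear array, a far-field source,
    # and a known steering angle.
    import numpy as np

    def delay_and_sum(frames, mic_spacing, angle_rad, fs, c=343.0):
        """Steer a uniform linear array and average the channels.

        frames      : (n_mics, n_samples) array of microphone signals
        mic_spacing : spacing between adjacent microphones in meters
        angle_rad   : steering angle relative to broadside, in radians
        fs          : sampling rate in Hz
        c           : speed of sound in m/s
        """
        n_mics, n_samples = frames.shape
        # Plane-wave time delay at each microphone for the steering direction.
        delays = np.arange(n_mics) * mic_spacing * np.sin(angle_rad) / c
        # Compensate the delays as linear phase shifts in the frequency domain.
        spectra = np.fft.rfft(frames, axis=1)
        freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
        aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
        # Coherent averaging reinforces Speech from the look direction while
        # averaging down spatially diffuse noise.
        return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
    ```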

  • Speech Acquisition and enhancement in a reverberant cocktail party like environment
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006
    Co-Authors: Yiteng Huang, Jacob Benesty, Jingdong Chen
    Abstract:

    Developing a successful multi-microphone Speech Acquisition system in a reverberant, cocktail-party-like environment is a very challenging problem, since both interfering sources and reverberation need to be well controlled. In this paper, we propose an algorithm based on blind SIMO (single-input multiple-output) channel identification. We first blindly identify the channels from the interfering sources to all the microphones. Then we extract the Speech signal of interest. Finally, Speech dereverberation is performed using the MINT (multiple-input/output inverse theorem) method. Simulations with acoustic impulse responses measured in the varechoic chamber at Bell Labs are carried out to verify the proposed algorithm.
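
    The MINT step can be made concrete with a short sketch. Assuming two measured room impulse responses of equal length with no common zeros, the inverse filters are the least-squares solution of a stacked convolution-matrix system; the following is a minimal illustration under those assumptions, not the paper's implementation.

    ```python
    # Minimal MINT sketch (multiple-input/output inverse theorem): find
    # filters g1, g2 such that h1*g1 + h2*g2 approximates a unit impulse,
    # where h1, h2 are two measured room impulse responses of equal length
    # with no common zeros. Illustrative only; helper names are ours.
    import numpy as np
    from scipy.linalg import toeplitz

    def convolution_matrix(h, n_taps):
        """Matrix C such that C @ g == np.convolve(h, g) for len(g) == n_taps."""
        col = np.concatenate([h, np.zeros(n_taps - 1)])
        row = np.zeros(n_taps)
        row[0] = h[0]
        return toeplitz(col, row)

    def mint_inverse_filters(h1, h2, n_taps):
        # Stack [C1 C2] and solve [C1 C2] @ [g1; g2] = delta in the
        # least-squares sense (assumes len(h1) == len(h2)).
        C = np.hstack([convolution_matrix(h1, n_taps),
                       convolution_matrix(h2, n_taps)])
        d = np.zeros(C.shape[0])
        d[0] = 1.0                      # target: a unit impulse
        g, *_ = np.linalg.lstsq(C, d, rcond=None)
        return g[:n_taps], g[n_taps:]

    # Dereverberated output: filter each microphone signal and sum, e.g.
    # y = np.convolve(x1, g1) + np.convolve(x2, g2)
    ```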

Yiteng Huang - One of the best experts on this subject based on the ideXlab platform.

  • Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems
    2010
    Co-Authors: Yiteng Huang, Jingdong Chen, Shaoyan Chen
    Abstract:

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant Speech Acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., Speech) and noise in the spatial and temporal domains. As a result, automatic Speech recognition (ASR) accuracy can be improved to a level at which crewmembers would find the Speech interface useful. The developed Speech human/machine interface will enable both crewmember usability and operational efficiency: it offers a fast rate of data/text entry in a small, lightweight package, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, Speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise; when it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone-array Speech-processing technologies, the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone-array and HMM model-adaptation techniques and by using Speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed; they can help real-time ASR system designers select appropriate tasks in the face of computational-resource constraints.

  • Speech Acquisition and enhancement in a reverberant cocktail party like environment
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006
    Co-Authors: Yiteng Huang, Jacob Benesty, Jingdong Chen
    Abstract:

    Developing a successful multi-microphone Speech Acquisition system in a reverberant, cocktail-party-like environment is a very challenging problem, since both interfering sources and reverberation need to be well controlled. In this paper, we propose an algorithm based on blind SIMO (single-input multiple-output) channel identification. We first blindly identify the channels from the interfering sources to all the microphones. Then we extract the Speech signal of interest. Finally, Speech dereverberation is performed using the MINT (multiple-input/output inverse theorem) method. Simulations with acoustic impulse responses measured in the varechoic chamber at Bell Labs are carried out to verify the proposed algorithm.

Darren Moore - One of the best experts on this subject based on the ideXlab platform.

  • Speech Acquisition in meetings with an audio-visual sensor array
    IEEE International Conference on Multimedia and Expo (ICME), 2005
    Co-Authors: Iain Mccowan, Maganti Hari Krishna, Darren Moore, Daniel Gatica-perez, Sileye Ba
    Abstract:

    Close-talk headset microphones have traditionally been used for Speech Acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational settings like meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free operation mode. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and Speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for Speech enhancement. We compare and discuss the features of the resulting Speech signal with respect to those obtained from single close-talking and table-top microphones.
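
    The tracker-to-beamformer hand-off described above can be sketched as follows. This is a hypothetical interface under our own assumptions (known array geometry, a 3-D position estimate from the audio-visual tracker), not the paper's code; the resulting per-microphone delays would steer a filter-and-sum beamformer at the tracked speaker.

    ```python
    # Hypothetical tracker-to-beamformer hand-off (our assumption, not the
    # paper's code): convert a tracked 3-D speaker position into
    # per-microphone steering delays for a near-field beamformer.
    import numpy as np

    def steering_delays(speaker_pos, mic_positions, c=343.0):
        """Relative propagation delays (seconds) from the speaker to each mic.

        speaker_pos   : (3,) position estimate from the audio-visual tracker
        mic_positions : (n_mics, 3) table-top array geometry in meters
        """
        dists = np.linalg.norm(mic_positions - np.asarray(speaker_pos), axis=1)
        delays = dists / c
        # Reference all delays to the closest microphone so they are
        # non-negative.
        return delays - delays.min()
    ```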

Jacob Benesty - One of the best experts on this subject based on the ideXlab platform.

  • Speech Acquisition and enhancement in a reverberant cocktail party like environment
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006
    Co-Authors: Yiteng Huang, Jacob Benesty, Jingdong Chen
    Abstract:

    Developing a successful multi-microphone Speech Acquisition system in a reverberant, cocktail-party-like environment is a very challenging problem, since both interfering sources and reverberation need to be well controlled. In this paper, we propose an algorithm based on blind SIMO (single-input multiple-output) channel identification. We first blindly identify the channels from the interfering sources to all the microphones. Then we extract the Speech signal of interest. Finally, Speech dereverberation is performed using the MINT (multiple-input/output inverse theorem) method. Simulations with acoustic impulse responses measured in the varechoic chamber at Bell Labs are carried out to verify the proposed algorithm.

Christopher A. Moore - One of the best experts on this subject based on the ideXlab platform.

  • Distinct developmental profiles in typical Speech Acquisition
    Journal of Neurophysiology, 2012
    Co-Authors: Jennell Vick, Thomas F. Campbell, Jordan R. Green, Lawrence D. Shriberg, Hervé Abdi, Heather Leavy Rusiewicz, Lakshmi Venkatesh, Christopher A. Moore
    Abstract:

    Three- to five-year-old children produce Speech that is characterized by a high level of variability within and across individuals. This variability, which is manifest in Speech movements, acoustic...

  • Imitation of contrastive lexical stress in children with Speech delay
    The Journal of the Acoustical Society of America, 2005
    Co-Authors: Jennell Vick, Christopher A. Moore
    Abstract:

    This study examined the relationship between acoustic correlates of stress in trochaic (strong-weak), spondaic (strong-strong), and iambic (weak-strong) nonword bisyllables produced by children (3;0-5;0) with normal Speech Acquisition and children with Speech delay. Ratios comparing the acoustic measures (vowel duration, rms, and f0) of the first syllable to the second syllable were calculated to evaluate the extent to which each phonetic parameter was used to mark stress. In addition, a calculation of the variability of jaw movement in each bisyllable was made. Finally, perceptual judgments of accuracy of stress production were made. Analysis of perceptual judgments indicated a robust difference between groups: while both groups of children produced errors in imitating the contrastive lexical stress models (~40%), the children with normal Speech Acquisition tended to produce trochaic forms in substitution for other stress types, whereas children with Speech delay showed no preference for trochees. The relationship between segmental acoustic parameters, kinematic variability, and the ratings of stress by trained listeners will be presented.
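
    The syllable-ratio measures described in the abstract are straightforward to compute. As a hedged illustration (a hypothetical helper, not the study's analysis code), the sketch below derives the duration, RMS, and f0 ratios from hand-segmented syllable boundaries and a precomputed f0 contour.

    ```python
    # Illustrative sketch of the first-to-second-syllable stress ratios
    # (duration, RMS, f0). Hypothetical helper, not the study's analysis
    # code; assumes hand-segmented syllable boundaries and a precomputed
    # f0 contour where 0 marks unvoiced frames.
    import numpy as np

    def stress_ratios(signal, fs, syl1, syl2, f0_track, f0_times):
        """syl1, syl2 are (start_s, end_s) boundaries; f0_track is in Hz."""
        def segment(bounds):
            t0, t1 = bounds
            return signal[int(t0 * fs):int(t1 * fs)]

        def mean_f0(bounds):
            t0, t1 = bounds
            voiced = (f0_times >= t0) & (f0_times < t1) & (f0_track > 0)
            return f0_track[voiced].mean()

        def rms(x):
            return np.sqrt(np.mean(x ** 2))

        dur_ratio = (syl1[1] - syl1[0]) / (syl2[1] - syl2[0])
        rms_ratio = rms(segment(syl1)) / rms(segment(syl2))
        f0_ratio = mean_f0(syl1) / mean_f0(syl2)
        # Ratios > 1 on these measures mark the first syllable as stressed
        # (trochaic); ratios < 1 point toward iambic stress.
        return dur_ratio, rms_ratio, f0_ratio
    ```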