The Experts below were selected from a list of 24,804 Experts worldwide ranked by the ideXlab platform.
James Glass - One of the best experts on this subject based on the ideXlab platform.
-
Speech feature denoising and dereverberation via deep Autoencoders for noisy reverberant speech recognition
ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2014
Co-Authors: Xue Feng, Yaodong Zhang, James Glass
Abstract: Denoising Autoencoders (DAs) have shown success in generating robust features for images, but there has been limited work applying DAs to speech. In this paper we present a deep denoising autoencoder (DDA) framework that can produce robust speech features for noisy reverberant speech recognition. The DDA is first pre-trained as restricted Boltzmann machines (RBMs) in an unsupervised fashion. It is then unrolled into Autoencoders and fine-tuned with corresponding clean speech features to learn a nonlinear mapping from noisy to clean features. Acoustic models are re-trained using the reconstructed features from the DDA, and speech recognition is performed. The proposed approach is evaluated on the CHiME-WSJ0 corpus and shows a 16-25% absolute improvement in recognition accuracy under various SNRs.
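The abstract above describes fine-tuning a denoising autoencoder to map noisy features to clean ones. A minimal NumPy sketch of that core idea follows: a single-hidden-layer denoising autoencoder trained on synthetic feature frames, omitting the RBM pre-training and layer stacking the paper uses. All dimensions, data, and hyperparameters here are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for clean speech feature frames (e.g. MFCCs) and their
# noisy counterparts; dimensions are illustrative, not from the paper.
n_frames, n_dim, n_hidden = 200, 8, 16
clean = rng.standard_normal((n_frames, n_dim))
noisy = clean + 0.3 * rng.standard_normal((n_frames, n_dim))

# One hidden layer only; the paper stacks several layers and
# pre-trains them as RBMs before this fine-tuning step.
W1 = 0.1 * rng.standard_normal((n_dim, n_hidden)); b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, n_dim)); b2 = np.zeros(n_dim)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def denoise(x):
    return sigmoid(x @ W1 + b1) @ W2 + b2

mse_init = np.mean((denoise(noisy) - clean) ** 2)

lr = 0.05
for _ in range(500):
    h = sigmoid(noisy @ W1 + b1)   # encode the noisy frames
    err = (h @ W2 + b2) - clean    # fine-tune toward the *clean* targets
    # Backpropagation for the mean squared error loss.
    gW2 = h.T @ err / n_frames
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)
    gW1 = noisy.T @ dh / n_frames
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse_final = np.mean((denoise(noisy) - clean) ** 2)
```

In the paper's pipeline, the output of such a network replaces the noisy features before acoustic-model retraining; here the sketch only shows the noisy-to-clean regression objective itself.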
Jiwen Lu - One of the best experts on this subject based on the ideXlab platform.
-
Single sample face recognition via learning deep supervised Autoencoders
IEEE Transactions on Information Forensics and Security, 2015
Co-Authors: Yuting Zhang, Jiwen Lu
Abstract: This paper targets learning a robust image representation for face recognition with a single training sample per person. Motivated by the success of deep learning in image representation, we propose a supervised autoencoder, a new type of building block for deep architectures. Two features distinguish our supervised autoencoder from a standard autoencoder. First, we enforce that faces with variations are mapped to the canonical face of the person, for example, a frontal face with neutral expression and normal illumination; second, we enforce that features corresponding to the same person are similar. As a result, our supervised autoencoder extracts features that are robust to variations in illumination, expression, occlusion, and pose, and facilitates face recognition. We stack such supervised Autoencoders to obtain a deep architecture and use it to extract features for image representation. Experimental results on the AR, Extended Yale B, CMU-PIE, and Multi-PIE data sets demonstrate that, coupled with the commonly used sparse representation-based classification, our stacked supervised Autoencoders-based face representation significantly outperforms commonly used image representations in single sample per person face recognition, and it achieves higher recognition accuracy than other deep learning models, including the deep Lambertian network, despite much less training data and without any domain information. Moreover, the supervised autoencoder can also be used for face verification, which further demonstrates its effectiveness for face representation.
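The two constraints described above can be expressed as a two-term training objective: reconstruction toward the canonical face, plus a penalty on feature spread within each person. The NumPy sketch below computes such a loss for a single-layer encoder/decoder. All names, dimensions, data, and the exact form of the similarity term are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: each "face" is a flat feature vector. canonical[p] is
# person p's canonical face; variants[p] are e.g. lit/occluded versions.
# Dimensions are illustrative, not from the paper.
n_person, n_var, n_dim, n_hid = 5, 4, 20, 10
canonical = rng.standard_normal((n_person, n_dim))
variants = canonical[:, None, :] + 0.2 * rng.standard_normal((n_person, n_var, n_dim))

W_enc = 0.1 * rng.standard_normal((n_dim, n_hid))
W_dec = 0.1 * rng.standard_normal((n_hid, n_dim))

def encode(x):
    return np.tanh(x @ W_enc)

def supervised_ae_loss(lam=0.5):
    x = variants.reshape(-1, n_dim)       # all variant faces, person-major order
    h = encode(x)                         # hidden features
    recon = h @ W_dec
    # Term 1: variants must reconstruct the *canonical* face, not themselves.
    targets = np.repeat(canonical, n_var, axis=0)
    recon_loss = np.mean((recon - targets) ** 2)
    # Term 2: features of the same person should be similar
    # (spread of each person's feature vectors around their mean).
    h_by_person = h.reshape(n_person, n_var, n_hid)
    sim_loss = np.mean((h_by_person - h_by_person.mean(axis=1, keepdims=True)) ** 2)
    return recon_loss + lam * sim_loss
```

Minimizing this loss over `W_enc` and `W_dec` (by any gradient method) would pull same-person features together while forcing the decoder to recover the canonical face, which is the intuition behind the robustness claims in the abstract.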