Autoencoders

The experts below are selected from a list of 24,804 experts worldwide, ranked by the ideXlab platform.

James Glass - One of the best experts on this subject based on the ideXlab platform.

  • Speech feature denoising and dereverberation via deep Autoencoders for noisy reverberant speech recognition
    ICASSP IEEE International Conference on Acoustics Speech and Signal Processing - Proceedings, 2014
    Co-Authors: Xue Feng, Yaodong Zhang, James Glass
    Abstract:

    Denoising Autoencoders (DAs) have shown success in generating robust features for images, but there has been limited work in applying DAs to speech. In this paper we present a deep denoising autoencoder (DDA) framework that can produce robust speech features for noisy reverberant speech recognition. The DDA is first pre-trained as restricted Boltzmann machines (RBMs) in an unsupervised fashion. It is then unrolled into Autoencoders and fine-tuned with the corresponding clean speech features to learn a nonlinear mapping from noisy to clean features. Acoustic models are re-trained using the reconstructed features from the DDA, and speech recognition is performed. The proposed approach is evaluated on the CHiME-WSJ0 corpus and shows a 16-25% absolute improvement in recognition accuracy under various SNRs.
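The core idea in the abstract, training the autoencoder on noisy inputs against clean targets so it learns a noisy-to-clean mapping, can be sketched with a minimal single-hidden-layer denoising autoencoder. This is an illustrative NumPy toy on synthetic low-dimensional "features", not the paper's RBM-pretrained deep network; all dimensions, the learning rate, and the synthetic data model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for speech features (assumption): clean features lie
# on a low-dimensional subspace; noisy features add Gaussian corruption.
n, d, k, h_dim = 256, 16, 4, 8
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d)) / np.sqrt(k)
clean = latent @ mixing
noisy = clean + 0.5 * rng.normal(size=(n, d))

# One-hidden-layer denoising autoencoder: tanh encoder, linear decoder.
W1 = 0.3 * rng.normal(size=(d, h_dim)); b1 = np.zeros(h_dim)
W2 = 0.3 * rng.normal(size=(h_dim, d)); b2 = np.zeros(d)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
for _ in range(2000):
    h, recon = forward(noisy)
    err = recon - clean                    # error against CLEAN targets
    gW2 = h.T @ err / n                    # backprop of mean squared error
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)     # tanh derivative
    gW1 = noisy.T @ dh / n
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, denoised = forward(noisy)
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_denoised = float(np.mean((denoised - clean) ** 2))
```

The detail the sketch preserves is that reconstruction error is measured against the clean features rather than the network's own input, which is what turns a plain autoencoder into a denoising front end for the recognizer.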

Jiwen Lu - One of the best experts on this subject based on the ideXlab platform.

  • Single sample face recognition via learning deep supervised Autoencoders
    IEEE Transactions on Information Forensics and Security, 2015
    Co-Authors: Yuting Zhang, Jiwen Lu
    Abstract:

    This paper targets learning robust image representations for face recognition with a single training sample per person. Motivated by the success of deep learning in image representation, we propose a supervised autoencoder, which is a new type of building block for deep architectures. Two features distinguish our supervised autoencoder from a standard autoencoder. First, we enforce faces with variations to be mapped to the canonical face of the person, for example, a frontal face with neutral expression and normal illumination. Second, we enforce features corresponding to the same person to be similar. As a result, our supervised autoencoder extracts features that are robust to variations in illumination, expression, occlusion, and pose, and facilitates face recognition. We stack such supervised Autoencoders to obtain a deep architecture and use it to extract features for image representation. Experimental results on the AR, Extended Yale B, CMU-PIE, and Multi-PIE data sets demonstrate that, coupled with the commonly used sparse representation-based classification, our stacked supervised Autoencoders-based face representation significantly outperforms commonly used image representations in single sample per person face recognition, and it achieves higher recognition accuracy than other deep learning models, including the deep Lambertian network, despite using much less training data and no domain information. Moreover, the supervised autoencoder can also be used for face verification, which further demonstrates its effectiveness for face representation.
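The two constraints the abstract describes, reconstructing the person's canonical face from every variant and keeping same-person features close, can be written as a two-term loss. The sketch below is a hypothetical NumPy rendering, not the authors' implementation: the toy dimensions, the weight `lam`, and the mean-based similarity penalty are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions: 64-pixel "faces", 16 hidden features. `lam`, the weight
# on the similarity term, is an illustrative choice, not the paper's value.
d_in, d_hid, lam = 64, 16, 0.1

W_enc = 0.1 * rng.normal(size=(d_in, d_hid))
W_dec = 0.1 * rng.normal(size=(d_hid, d_in))

def encode(x):
    return np.tanh(x @ W_enc)

def supervised_ae_loss(variant_faces, canonical_face):
    """Loss over several variant images of ONE person.
    Term 1: every variant must reconstruct the person's canonical face.
    Term 2: the variants' hidden features are pulled toward each other."""
    h = encode(variant_faces)                     # (n, d_hid) features
    recon = h @ W_dec                             # (n, d_in) reconstructions
    recon_loss = np.mean((recon - canonical_face) ** 2)
    scatter = np.mean((h - h.mean(axis=0)) ** 2)  # within-person feature spread
    return recon_loss + lam * scatter

canonical = rng.normal(size=d_in)                 # e.g. frontal, neutral, well-lit
variants = canonical + 0.5 * rng.normal(size=(5, d_in))  # pose/lighting variants
loss = float(supervised_ae_loss(variants, canonical))
```

Minimizing the first term alone would already map variants toward the canonical image; the second term additionally collapses each person's feature cluster, which is what makes the learned features usable with only one gallery sample per identity.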