Ladder Networks

The Experts below are selected from a list of 192 Experts worldwide, as ranked by the ideXlab platform.

Carlos Busso - One of the best experts on this subject based on the ideXlab platform.

  • Semi-supervised Speech Emotion Recognition with Ladder Networks
    IEEE Transactions on Audio, Speech, and Language Processing, 2020
    Co-Authors: Srinivas Parthasarathy, Carlos Busso
    Abstract:

    Speech emotion recognition (SER) systems find applications in various fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. For example, systems that show superior performance on certain databases perform poorly when tested on other corpora. This problem can be addressed by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. Another approach is to increase the generalization of the models. An effective way to achieve this goal is to regularize the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labels for the auxiliary tasks (gender, speaker identity, age, or other emotional descriptors), which are expensive to collect for emotion recognition. This study proposes the use of Ladder Networks for emotion recognition, which rely on an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. This auxiliary task does not require labels, so it is possible to train the framework in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, achieving superior performance compared to fully supervised single-task learning (STL) and MTL baselines. We implement the approach with either sentence-level or frame-level features, demonstrating its flexibility. Additionally, the generalization of the Ladder Networks is evaluated in cross-corpus settings using sentence-level features, obtaining notable improvements. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
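
    As a rough illustration of the setup described above, the sketch below (PyTorch) trains a small network on two objectives at once: a supervised regression loss for the emotional attributes on labeled batches, and an unsupervised input-reconstruction (denoising) loss on both labeled and unlabeled batches. Layer sizes, the noise level, and the reconstruction weight are illustrative assumptions, not values from the paper.

```python
# Minimal semi-supervised sketch of the ladder-style setup described above.
# Dimensions, noise level, and loss weight are assumptions for illustration.
import torch
import torch.nn as nn

class LadderSER(nn.Module):
    def __init__(self, in_dim=88, hidden=256, n_attributes=3, noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        self.enc1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_attributes)      # arousal/valence/dominance
        self.dec = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))  # denoising decoder

    def forward(self, x):
        x_noisy = x + self.noise_std * torch.randn_like(x)  # corrupt the input
        h = self.enc2(self.enc1(x_noisy))                   # noisy encoder path
        y_hat = self.head(h)                                # primary regression task
        x_rec = self.dec(h)                                 # auxiliary reconstruction
        return y_hat, x_rec

model = LadderSER()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train_step(x_lab, y_lab, x_unlab, recon_weight=0.1):
    """One semi-supervised step: supervised loss on labeled data plus
    unsupervised reconstruction loss on labeled and unlabeled data."""
    y_hat, x_rec_lab = model(x_lab)
    _, x_rec_unlab = model(x_unlab)
    loss = mse(y_hat, y_lab) \
         + recon_weight * (mse(x_rec_lab, x_lab) + mse(x_rec_unlab, x_unlab))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random tensors standing in for acoustic features.
loss = train_step(torch.randn(16, 88), torch.rand(16, 3), torch.randn(64, 88))
```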

  • Semi-supervised Speech Emotion Recognition with Ladder Networks
    arXiv: Audio and Speech Processing, 2019
    Co-Authors: Srinivas Parthasarathy, Carlos Busso
    Abstract:

    Speech emotion recognition (SER) systems find applications in various fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. This problem can be addressed by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. Another approach is to increase the generalization of the models. An effective way to achieve this goal is to regularize the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labels for the auxiliary tasks (gender, speaker identity, age, or other emotional descriptors), which are expensive to collect for emotion recognition. This study proposes the use of Ladder Networks for emotion recognition, which rely on an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. This auxiliary task does not require labels, so it is possible to train the framework in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, achieving superior performance compared to fully supervised single-task learning (STL) and MTL baselines. The approach is implemented with several acoustic feature sets, showing that Ladder Networks generalize significantly better in cross-corpus settings. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
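
    Both the journal and arXiv versions above report relative gains in the concordance correlation coefficient (CCC). For reference, the snippet below computes the standard CCC definition with NumPy; the toy inputs are placeholders.

```python
# Standard concordance correlation coefficient (CCC), shown with NumPy.
import numpy as np

def ccc(y_true, y_pred):
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# A relative gain of 3.0% in CCC means ccc_new / ccc_baseline - 1 = 0.03.
print(ccc([0.1, 0.4, 0.7], [0.2, 0.5, 0.6]))
```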

  • Ladder Networks for Emotion Recognition Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
    arXiv: Audio and Speech Processing, 2018
    Co-Authors: Srinivas Parthasarathy, Carlos Busso
    Abstract:

    Recognizing emotions using a few attribute dimensions such as arousal, valence, and dominance provides the flexibility to effectively represent a complex range of emotional behaviors. Conventional methods to learn these emotional descriptors primarily rely on separate models to recognize each attribute. Recent work has shown that learning these attributes together regularizes the models, leading to better feature representations. This study explores new forms of regularization by adding an unsupervised auxiliary task that reconstructs hidden layer representations. This auxiliary task requires the denoising of hidden representations at every layer of an autoencoder. The framework relies on Ladder Networks, which use skip connections between encoder and decoder layers to learn powerful representations of emotional dimensions. The results show that Ladder Networks improve the performance of the system compared to baselines that learn each attribute individually and to conventional denoising autoencoders. Furthermore, the unsupervised auxiliary task has promising potential to be used in a semi-supervised setting, where few labeled sentences are available.
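
    The distinguishing piece of this architecture is that the decoder receives lateral (skip) connections from every encoder layer and is asked to denoise each hidden representation, not only the input. A hedged sketch is given below; the simple averaging combinator, layer sizes, and noise level are assumptions for illustration (the original Ladder Network uses batch normalization and a learned combinator function).

```python
# Sketch of a ladder-style model: a corrupted encoder, a clean encoder that
# provides denoising targets, and a decoder with lateral (skip) connections
# and a denoising cost at every layer. Sizes and combinator are assumptions.
import torch
import torch.nn as nn

class MiniLadder(nn.Module):
    def __init__(self, dims=(88, 128, 64), noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        self.enc = nn.ModuleList(nn.Linear(dims[i], dims[i + 1])
                                 for i in range(len(dims) - 1))
        self.dec = nn.ModuleList(nn.Linear(dims[i + 1], dims[i])
                                 for i in reversed(range(len(dims) - 1)))
        self.head = nn.Linear(dims[-1], 3)  # arousal, valence, dominance

    def encode(self, x, noisy):
        zs = [x + self.noise_std * torch.randn_like(x) if noisy else x]
        for layer in self.enc:
            z = torch.relu(layer(zs[-1]))
            if noisy:
                z = z + self.noise_std * torch.randn_like(z)
            zs.append(z)
        return zs

    def forward(self, x):
        clean = self.encode(x, noisy=False)   # clean path: denoising targets
        noisy = self.encode(x, noisy=True)    # corrupted encoder path
        y_hat = self.head(noisy[-1])          # supervised prediction
        # Decoder walks back down, mixing the top-down signal with the
        # lateral (skip) connection from the matching noisy encoder layer.
        denoise_cost = 0.0
        u = noisy[-1]
        for i, layer in enumerate(self.dec):
            target_idx = len(clean) - 2 - i
            u = layer(u)                              # top-down signal
            z_hat = 0.5 * (u + noisy[target_idx])     # naive combinator (assumption)
            denoise_cost = denoise_cost + ((z_hat - clean[target_idx]) ** 2).mean()
            u = z_hat
        return y_hat, denoise_cost

# Usage: total loss = supervised loss on y_hat + weight * denoise_cost.
y_hat, denoise_cost = MiniLadder()(torch.randn(8, 88))
```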

Srinivas Parthasarathy - One of the best experts on this subject based on the ideXlab platform.

  • Semi-supervised Speech Emotion Recognition with Ladder Networks
    IEEE Transactions on Audio, Speech, and Language Processing, 2020
    Co-Authors: Srinivas Parthasarathy, Carlos Busso
    Abstract: identical to the entry of the same title listed under Carlos Busso above.

  • Semi-supervised Speech Emotion Recognition with Ladder Networks
    arXiv: Audio and Speech Processing, 2019
    Co-Authors: Srinivas Parthasarathy, Carlos Busso
    Abstract: identical to the entry of the same title listed under Carlos Busso above.

  • Ladder Networks for Emotion Recognition Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
    arXiv: Audio and Speech Processing, 2018
    Co-Authors: Srinivas Parthasarathy, Carlos Busso
    Abstract: identical to the entry of the same title listed under Carlos Busso above.

Xiao-ning Song - One of the best experts on this subject based on the ideXlab platform.

  • Graph Regularized Variational Ladder Networks for Semi-Supervised Learning
    IEEE Access, 2020
    Co-Authors: Xiao-ning Song
    Abstract:

    To tackle the problem of semi-supervised learning (SSL), we propose a new autoencoder-based deep model. The Ladder Network (LN) is an autoencoder-based method for representation learning that has been successfully applied to unsupervised and semi-supervised learning. However, it ignores the manifold information of high-dimensional data and often yields uninformative features that are difficult to use in subsequent tasks such as prediction and recognition. To address these issues, we propose Graph Regularized Variational Ladder Networks (GRVLN), which exploit the manifold structure of the data both explicitly and implicitly. Our contributions are twofold: (1) graph regularization is applied to all decoder layers, which explicitly promotes manifold learning via graph Laplacian matrices; (2) a variational autoencoder is used as the backbone in place of a traditional autoencoder in the encoder layers, implicitly learning the manifold structure of the data distribution. Compared with Ladder Networks and other autoencoder-based methods, GRVLN achieves superior performance on semi-supervised classification tasks. Experimental results show that our method also performs comparably with state-of-the-art methods on several benchmark data sets.
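
    To make the "explicit" part concrete, the hedged sketch below (PyTorch) builds a k-nearest-neighbour affinity graph over a batch, forms the graph Laplacian L = D - W, and adds the penalty trace(H^T L H) so that inputs that are close on the data manifold keep close latent codes. The Gaussian affinity, the value of k, and the penalty weight are illustrative assumptions, not the paper's settings.

```python
# Graph Laplacian regularizer over a batch of inputs x with latent codes h.
# All hyperparameters here (k, sigma, weight) are assumptions.
import torch

def graph_laplacian(x, k=5, sigma=1.0):
    d2 = torch.cdist(x, x) ** 2                       # pairwise squared distances
    w = torch.exp(-d2 / (2 * sigma ** 2))             # Gaussian affinities
    topk = torch.topk(w, k + 1, dim=1).indices        # keep k nearest neighbours (+ self)
    mask = torch.zeros_like(w).scatter_(1, topk, 1.0)
    w = 0.5 * (w * mask + (w * mask).t())             # sparsify and symmetrize
    return torch.diag(w.sum(1)) - w                   # L = D - W

def graph_penalty(h, x, weight=1e-3):
    """Manifold regularizer to add to the (variational) autoencoder loss."""
    L = graph_laplacian(x)
    return weight * torch.trace(h.t() @ L @ h)

# Usage sketch: total_loss = vae_loss + graph_penalty(latent_codes, inputs)
x, h = torch.randn(32, 88), torch.randn(32, 16)
print(graph_penalty(h, x).item())
```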

Mingyue Niu - One of the best experts on this subject based on the ideXlab platform.

  • Correction to: Semi-supervised Ladder Networks for Speech Emotion Recognition
    International Journal of Automation and Computing, 2019
    Co-Authors: Jianhua Tao, Jian Huang, Zheng Lian, Mingyue Niu
    Abstract:

    The article Semi-supervised Ladder Networks for Speech Emotion Recognition written by Jian-Hua Tao, Jian Huang, Ya Li, Zheng Lian and Ming-Yue Niu, was originally published on vol. 16, no. 4 of International Journal of Automation and Computing without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication. Therefore, the copyright of the article has been changed to © The Author(s) 2019 and the article is forthwith distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

  • Semi-supervised Ladder Networks for Speech Emotion Recognition
    International Journal of Automation and Computing, 2019
    Co-Authors: Jianhua Tao, Jian Huang, Zheng Lian, Mingyue Niu
    Abstract:

    As a major component of speech signal processing, speech emotion recognition has become increasingly essential to understanding human communication. Benefitting from deep learning, many researchers have proposed various unsupervised models to extract effective emotional features and supervised models to train emotion recognition systems. In this paper, we utilize semi-supervised Ladder Networks for speech emotion recognition. The model is trained by minimizing the supervised loss together with an auxiliary unsupervised cost function. The addition of the unsupervised auxiliary task provides powerful discriminative representations of the input features and also acts as a regularizer for the supervised emotion task. We also compare the Ladder Network with other classical autoencoder structures. The experiments were conducted on the interactive emotional dyadic motion capture (IEMOCAP) database, and the results reveal that the proposed method achieves superior performance with a small amount of labelled data and outperforms the other methods.
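
    The objective described here matches the standard Ladder Network cost (Rasmus et al., 2015): a supervised term computed on labelled examples plus a weighted, layer-wise denoising term computed on all examples. Written out, with layer weights lambda_l and layer widths m_l as hyperparameters:

```latex
C \;=\;
\underbrace{-\frac{1}{N}\sum_{n=1}^{N}\log P\bigl(\tilde{y}_n = t_n \mid x_n\bigr)}_{\text{supervised cost (labelled data)}}
\;+\;
\underbrace{\sum_{l=0}^{L}\frac{\lambda_l}{N\,m_l}\sum_{n=1}^{N}\bigl\lVert z^{(l)}(n) - \hat{z}^{(l)}(n)\bigr\rVert^{2}}_{\text{layer-wise denoising cost (all data)}}
```

    Setting the weights of all hidden layers to zero and keeping only the input-layer term roughly recovers the plain denoising autoencoder used as a comparison in these works.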

  • Speech Emotion Recognition Using Semi-supervised Learning with Ladder Networks
    2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), 2018
    Co-Authors: Jian Huang, Jianhua Tao, Zheng Lian, Mingyue Niu, Ya Li, Jiangyan Yi
    Abstract:

    As a major branch of speech processing, speech emotion recognition has drawn much attention from researchers. Prior works have proposed a variety of models and feature sets for training a system. In this paper, we propose to use semi-supervised learning with Ladder Networks to generate robust feature representations for speech emotion recognition. In our method, the input to the Ladder Network is the normalized static acoustic features, which are mapped to high-level hidden representations. The model is trained to simultaneously minimize the sum of supervised and unsupervised cost functions by back-propagation. The extracted hidden representations are then used as emotional features in an SVM model for speech emotion recognition. The experimental results on the IEMOCAP database show 2.6% higher performance than a denoising autoencoder and 5.3% higher than the static acoustic features.
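
    The last step, feeding the learned hidden representations to an SVM, is sketched below. The encoder here is a stand-in for a trained Ladder Network encoder, and the feature dimensions, label set, and SVM settings are assumptions for illustration.

```python
# Using learned hidden representations as SVM features (hedged sketch).
import torch
import torch.nn as nn
from sklearn.svm import SVC

# Stand-in encoder; in the actual pipeline this would be the trained
# Ladder Network encoder.
encoder = nn.Sequential(nn.Linear(88, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU())

x_train = torch.randn(200, 88)            # normalized static acoustic features (toy)
y_train = torch.randint(0, 4, (200,))     # e.g. 4 categorical emotions (assumption)
x_test = torch.randn(50, 88)

with torch.no_grad():
    h_train = encoder(x_train).numpy()    # extracted hidden representations
    h_test = encoder(x_test).numpy()

clf = SVC(kernel="rbf", C=1.0)            # SVM trained on the learned features
clf.fit(h_train, y_train.numpy())
predictions = clf.predict(h_test)
```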

C A Zukowski - One of the best experts on this subject based on the ideXlab platform.