The Experts below are selected from a list of 192 Experts worldwide, ranked by the ideXlab platform.
Carlos Busso - One of the best experts on this subject based on the ideXlab platform.
-
Semi-supervised Speech Emotion Recognition with Ladder Networks
IEEE Transactions on Audio, Speech and Language Processing, 2020. Co-Authors: Srinivas Parthasarathy, Carlos Busso. Abstract: Speech emotion recognition (SER) systems find applications in fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. For example, systems that show superior performance on certain databases perform poorly when tested on other corpora. This problem can be addressed by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. An alternative is to increase the generalization of the models. An effective way to achieve this goal is to regularize the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labels that are expensive to collect for emotion recognition (gender, speaker identity, age, or other emotional descriptors). This study proposes the use of Ladder Networks for emotion recognition, which rely on an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. Because this auxiliary task does not require labels, the framework can be trained in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, achieving superior performance compared to fully supervised single-task learning (STL) and MTL baselines. We implement the approach with sentence-level or frame-level features, demonstrating its flexibility. Additionally, the generalization of the Ladder Networks is evaluated in cross-corpus settings using sentence-level features, obtaining substantial improvements.
Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
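The concordance correlation coefficient reported above is a standard agreement metric (Lin, 1989) that combines correlation with penalties for mean and variance mismatch. A minimal sketch of the computation, using made-up toy vectors rather than the paper's data:

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient: 2*cov / (var_x + var_y + (mean_x - mean_y)^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()           # population variances
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

# Perfect agreement gives CCC = 1; a constant shift lowers it even
# though Pearson correlation would remain 1.
print(ccc([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(ccc([1, 2, 3], [2, 3, 4]))        # < 1 despite perfect correlation
```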
-
Semi-supervised Speech Emotion Recognition with Ladder Networks
arXiv: Audio and Speech Processing, 2019. Co-Authors: Srinivas Parthasarathy, Carlos Busso. Abstract: Speech emotion recognition (SER) systems find applications in fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. This problem can be addressed by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. An alternative is to increase the generalization of the models. An effective way to achieve this goal is to regularize the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labels that are expensive to collect for emotion recognition (gender, speaker identity, age, or other emotional descriptors). This study proposes the use of Ladder Networks for emotion recognition, which rely on an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. Because this auxiliary task does not require labels, the framework can be trained in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, achieving superior performance compared to fully supervised single-task learning (STL) and MTL baselines. The approach is implemented with several acoustic features, showing that Ladder Networks generalize significantly better in cross-corpus settings. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
-
Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
arXiv: Audio and Speech Processing, 2018. Co-Authors: Srinivas Parthasarathy, Carlos Busso. Abstract: Recognizing emotions using a few attribute dimensions such as arousal, valence, and dominance provides the flexibility to effectively represent a complex range of emotional behaviors. Conventional methods to learn these emotional descriptors primarily rely on separate models to recognize each attribute. Recent work has shown that learning these attributes together regularizes the models, leading to better feature representations. This study explores new forms of regularization by adding unsupervised auxiliary tasks that reconstruct hidden layer representations. This auxiliary task requires the denoising of hidden representations at every layer of an autoencoder. The framework relies on Ladder Networks, which use skip connections between encoder and decoder layers to learn powerful representations of emotional dimensions. The results show that Ladder Networks improve the performance of the system compared to baselines that learn each attribute individually, and to conventional denoising autoencoders. Furthermore, the unsupervised auxiliary tasks show promising potential in a semi-supervised setting, where few labeled sentences are available.
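The combined objective these abstracts describe, a supervised attribute regression plus a label-free denoising cost with a skip connection, can be sketched in a toy single-layer form. Everything here (the dimensions, the random untrained weights, the mean-based regressor, the additive combinator) is illustrative only; real Ladder Networks use trained deep encoders, batch normalization, per-layer denoising costs, and a learned combinator function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-hidden-layer "ladder": sizes are illustrative only.
W_enc = rng.standard_normal((20, 8)) * 0.1   # encoder weights
W_dec = rng.standard_normal((8, 8)) * 0.1    # decoder weights
noise = 0.3

def relu(a):
    return np.maximum(a, 0.0)

def ladder_costs(x, y_true, w_sup=1.0, w_rec=0.5):
    # Noisy encoder path: corrupt the input before encoding.
    z_noisy = relu((x + noise * rng.standard_normal(x.shape)) @ W_enc)
    y_pred = z_noisy.mean(axis=1)       # stand-in for the attribute regressor
    # Clean encoder path provides the reconstruction targets.
    z_clean = relu(x @ W_enc)
    # Decoder denoises the noisy activations; the skip connection feeds
    # z_noisy directly into the combinator (here: a plain sum).
    z_hat = relu(z_noisy @ W_dec) + z_noisy
    sup_cost = np.mean((y_pred - y_true) ** 2)   # needs labels
    rec_cost = np.mean((z_hat - z_clean) ** 2)   # label-free
    return w_sup * sup_cost + w_rec * rec_cost

x = rng.standard_normal((4, 20))        # 4 utterances, 20-dim features
y = rng.uniform(-1, 1, size=4)          # e.g., arousal labels in [-1, 1]
print(ladder_costs(x, y))               # scalar training cost
```

For an unlabeled batch, only the reconstruction term contributes (set `w_sup=0`), which is exactly what makes semi-supervised training with abundant unlabeled data possible.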
Srinivas Parthasarathy - One of the best experts on this subject based on the ideXlab platform.
-
Semi-supervised Speech Emotion Recognition with Ladder Networks
IEEE Transactions on Audio, Speech and Language Processing, 2020. Co-Authors: Srinivas Parthasarathy, Carlos Busso. Abstract identical to the entry of the same title listed under Carlos Busso above.
-
Semi-supervised Speech Emotion Recognition with Ladder Networks
arXiv: Audio and Speech Processing, 2019. Co-Authors: Srinivas Parthasarathy, Carlos Busso. Abstract identical to the entry of the same title listed under Carlos Busso above.
-
Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
arXiv: Audio and Speech Processing, 2018. Co-Authors: Srinivas Parthasarathy, Carlos Busso. Abstract identical to the entry of the same title listed under Carlos Busso above.
Xiao-ning Song - One of the best experts on this subject based on the ideXlab platform.
-
Graph Regularized Variational Ladder Networks for Semi-Supervised Learning
IEEE Access, 2020. Co-Authors: Xiao-ning Song. Abstract: To tackle the problem of semi-supervised learning (SSL), we propose a new autoencoder-based deep model. The Ladder Network (LN) is an autoencoder-based method for representation learning that has been successfully applied to unsupervised and semi-supervised learning. However, it ignores the manifold information of high-dimensional data and often learns uninformative features that are difficult to use in subsequent tasks such as prediction and recognition. To address these issues, we propose Graph Regularized Variational Ladder Networks (GRVLN), which explicitly and implicitly exploit the manifold structure of the data. Our contributions are twofold: (1) graph regularization is applied to all decoder layers, explicitly promoting manifold learning via the graph Laplacian matrix; (2) a variational autoencoder is used as the backbone of the encoder layers, instead of a traditional autoencoder, to implicitly learn the manifold structure of the data distribution. Compared with Ladder Networks and other autoencoder-based methods, GRVLN achieves superior performance in semi-supervised classification tasks. Experimental results show that our method also performs comparably with state-of-the-art methods on several benchmark data sets.
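The graph regularization in contribution (1) is typically the trace form tr(ZᵀLZ) with Laplacian L = D − W, which equals a weighted sum of squared distances between the embeddings of connected samples. A small sketch with a hypothetical three-sample adjacency matrix (not the paper's actual graph construction):

```python
import numpy as np

def graph_laplacian_penalty(Z, W):
    """Manifold regularizer tr(Z^T L Z) with L = D - W: penalizes
    embeddings Z that differ for strongly connected samples."""
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # (unnormalized) graph Laplacian
    return np.trace(Z.T @ L @ Z)

# Hypothetical 3-sample toy graph: samples 0 and 1 are neighbors.
W = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 0.]])
Z_smooth = np.array([[1., 0.], [1., 0.], [5., 5.]])  # neighbors agree
Z_rough  = np.array([[1., 0.], [9., 0.], [5., 5.]])  # neighbors disagree
print(graph_laplacian_penalty(Z_smooth, W))  # 0.0
print(graph_laplacian_penalty(Z_rough, W))   # much larger
```

Adding this penalty to a decoder layer's loss pulls the embeddings of graph-connected samples together, which is how the regularizer "explicitly promotes manifold learning".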
Mingyue Niu - One of the best experts on this subject based on the ideXlab platform.
-
Correction to: Semi-supervised Ladder Networks for Speech Emotion Recognition
International Journal of Automation and Computing, 2019. Co-Authors: Jianhua Tao, Jian Huang, Zheng Lian, Mingyue Niu. Abstract: The article "Semi-supervised Ladder Networks for Speech Emotion Recognition", written by Jian-Hua Tao, Jian Huang, Ya Li, Zheng Lian and Ming-Yue Niu, was originally published in vol. 16, no. 4 of International Journal of Automation and Computing without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication. Therefore, the copyright of the article has been changed to © The Author(s) 2019 and the article is forthwith distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
-
Semi-supervised Ladder Networks for Speech Emotion Recognition
International Journal of Automation and Computing, 2019. Co-Authors: Jianhua Tao, Jian Huang, Zheng Lian, Mingyue Niu. Abstract: As a major component of speech signal processing, speech emotion recognition has become increasingly essential to understanding human communication. Benefitting from deep learning, many researchers have proposed various unsupervised models to extract effective emotional features and supervised models to train emotion recognition systems. In this paper, we utilize semi-supervised Ladder Networks for speech emotion recognition. The model is trained by minimizing the supervised loss together with an auxiliary unsupervised cost function. The unsupervised auxiliary task yields powerful discriminative representations of the input features and also acts as a regularizer for the supervised emotion task. We also compare the Ladder Network with other classical autoencoder structures. The experiments were conducted on the interactive emotional dyadic motion capture (IEMOCAP) database, and the results reveal that the proposed method achieves superior performance with a small amount of labelled data and outperforms the other methods.
-
Speech Emotion Recognition Using Semi-supervised Learning with Ladder Networks
2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), 2018. Co-Authors: Jian Huang, Jianhua Tao, Zheng Lian, Mingyue Niu, Ya Li, Jiangyan Yi. Abstract: As a major branch of speech processing, speech emotion recognition has drawn much attention from researchers. Prior works have proposed a variety of models and feature sets for training such systems. In this paper, we propose to use semi-supervised learning with Ladder Networks to generate robust feature representations for speech emotion recognition. In our method, the input of the Ladder Network is the normalized static acoustic features, which are mapped to high-level hidden representations. The model is trained to simultaneously minimize the sum of the supervised and unsupervised cost functions by back-propagation. The extracted hidden representations are then used as emotional features in an SVM model for speech emotion recognition. The experimental results, obtained on the IEMOCAP database, show 2.6% higher performance than a denoising autoencoder and 5.3% higher than the static acoustic features.
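The last stage of this pipeline, feeding learned hidden representations into an SVM, can be sketched with scikit-learn. The random features and toy labels below are stand-ins for the Ladder Network outputs and emotion categories; only the classifier stage mirrors the paper:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Stand-ins for the learned hidden representations: in the paper these
# come from the trained Ladder Network, not from random noise.
n, dim = 200, 16
X = rng.standard_normal((n, dim))
# Toy labels correlated with one feature dimension, so there is
# something learnable; real labels would be emotion categories.
y = (X[:, 0] + 0.1 * rng.standard_normal(n) > 0).astype(int)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:150], y[:150])                 # train on 150 samples
acc = clf.score(X[150:], y[150:])         # evaluate on the held-out 50
print(f"held-out accuracy: {acc:.2f}")
```

Standardizing the representations before the RBF-kernel SVM is a common choice; the paper's exact kernel and hyperparameters are not specified here.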
C A Zukowski - One of the best experts on this subject based on the ideXlab platform.
-
Delay-time Bounds and Waveform Bounds for RLCG Ladder Networks
Annual Simulation Symposium, 1994. Co-Authors: Yingwen Bai, C A Zukowski. Abstract: We propose a current-voltage relaxation method to obtain and refine the waveform bounds for RLCG Ladder Networks. In addition to the known upper bounds, we find the lower bounds, and combine the two to find delay-time bounds for RLCG Ladder Networks.
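The paper's relaxation-based bounds are not reproduced here, but the classic Elmore delay gives a quick first-order delay estimate for the RC special case of such ladders (inductance and conductance ignored). A sketch, with hypothetical per-section values:

```python
def elmore_delay(R, C):
    """First-order (Elmore) delay estimate for an RC ladder driven at
    node 0: for each capacitor, add its capacitance times the total
    resistance between it and the driver. This is a classic estimate,
    not the relaxation bounds of the paper above."""
    delay, r_upstream = 0.0, 0.0
    for r, c in zip(R, C):
        r_upstream += r          # resistance accumulated from the driver
        delay += r_upstream * c  # this capacitor's contribution
    return delay

# Two-section ladder: 1 kOhm / 1 pF per section.
R = [1e3, 1e3]       # ohms
C = [1e-12, 1e-12]   # farads
print(elmore_delay(R, C))  # ~3e-09 s: 1k*1p + (1k+1k)*1p
```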