Variation Problem

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 23106 Experts worldwide ranked by ideXlab platform

Honggoo Kang - One of the best experts on this subject based on the ideXlab platform.

  • Improved time-frequency trajectory excitation modeling for a statistical parametric speech synthesis system
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
    Co-Authors: Eunwoo Song, Honggoo Kang
    Abstract:

    This paper proposes an improved time-frequency trajectory excitation (TFTE) modeling method for a statistical parametric speech synthesis system. The proposed approach overcomes the dimensional variation problem of the training process caused by the inherent nature of the pitch-dependent analysis paradigm. By reducing the redundancies of the parameters using predicted average block coefficients (PABC), the proposed algorithm efficiently models excitation, even if its dimension is varied. Objective and subjective test results verify that the proposed algorithm provides not only robustness to the training process but also naturalness to the synthesized speech.

Eunwoo Song - One of the best experts on this subject based on the ideXlab platform.

  • Improved time-frequency trajectory excitation modeling for a statistical parametric speech synthesis system
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
    Co-Authors: Eunwoo Song, Honggoo Kang
    Abstract:

    This paper proposes an improved time-frequency trajectory excitation (TFTE) modeling method for a statistical parametric speech synthesis system. The proposed approach overcomes the dimensional variation problem of the training process caused by the inherent nature of the pitch-dependent analysis paradigm. By reducing the redundancies of the parameters using predicted average block coefficients (PABC), the proposed algorithm efficiently models excitation, even if its dimension is varied. Objective and subjective test results verify that the proposed algorithm provides not only robustness to the training process but also naturalness to the synthesized speech.

Bjorn Schuller - One of the best experts on this subject based on the ideXlab platform.

  • A closed-form solution to the graph total variation problem for continuous emotion profiling in noisy environment
    Speech Communication, 2018
    Co-Authors: Shaoling Jing, Lijiang Chen, Maria Colomba Comes, Arianna Mencattini, Grazia Raguso, Fabien Ringeval, Bjorn Schuller, Corrado Di Natale, Eugenio Martinelli
    Abstract:

    Time-continuous emotion estimation (e.g., arousal and valence) from spontaneous speech expressions has recently drawn increasing commercial attention. However, real-life applications of emotion recognition technology must cope with challenging conditions, such as noise from recording devices and background environments. In this work, we introduce a novel personalized emotion prediction model validated in different noisy environments. It relies on a three-level noise reduction algorithm: (i) data downsampling, (ii) feature synchronization, and (iii) a modified version of graph total variation. The approach has been validated on the widely used RECOLA database with different types of noise, including convolutive and additive noise with different SNRs. The process of feature synchronization improves the concordance correlation coefficient (CCC) absolute values by 0.271 on average for arousal and 0.137 for valence. The proposed denoising approach further improves the values by 0.101 for arousal and 0.086 for valence. Finally, the proposed model considerably improves the CCC values on raw data and all types of noisy data and outperforms the standard denoising methods.

  • Towards intoxicated speech recognition
    2017 International Joint Conference on Neural Networks (IJCNN), 2017
    Co-Authors: Zixing Zhang, Felix Weninger, Martin Wöllmer, Bjorn Schuller
    Abstract:

    In a real-life scenario, the acoustic characteristics of speech often suffer from the variations induced by diverse environmental noises and different speakers. To overcome the speaker-related speech variation problem for Automatic Speech Recognition (ASR), many speaker adaptation techniques have been proposed and studied. Almost all of these studies, however, only considered the speakers' long-term traits, such as age, gender, and dialect. Speakers' short-term states, for example, affect and intoxication, are largely ignored. In this study, we address one particular speaker state, alcohol intoxication, which has rarely been studied in the context of ASR. To do this, empirical experiments are performed on a publicly available database used for the INTERSPEECH 2011 Speaker State Challenge, Intoxication Sub-Challenge. The experimental results show that the intoxicated state of the speaker indeed degrades the performance of ASR systems by a large margin for all of the three considered speech styles (spontaneous speech, tongue twisters, command & control). In addition, this paper further shows that multi-condition training can notably improve the acoustic model.
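
The emotion-profiling abstract above reports its gains in terms of the concordance correlation coefficient (CCC). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The function name `concordance_cc` and the use of population (biased) variances are assumptions made for illustration; the abstract does not state how the authors computed the score.

```python
from statistics import fmean


def concordance_cc(x, y):
    """Concordance correlation coefficient of two equal-length sequences.

    Illustrative helper (hypothetical name), using population variances.
    """
    x, y = list(x), list(y)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x, mean_y = fmean(x), fmean(y)
    # Population variances and covariance (divide by n, not n - 1).
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])

    # The squared mean difference penalizes a constant offset,
    # which plain Pearson correlation would ignore.
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov_xy / denom


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0]
    print(concordance_cc(gold, [1.0, 2.0, 3.0]))  # perfect agreement
    print(concordance_cc(gold, [2.0, 3.0, 4.0]))  # same shape, shifted by 1
```

Unlike Pearson correlation, the score drops when a prediction follows the right trend but is offset from the reference, which is why it is commonly used for time-continuous arousal and valence traces.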

Ashley Feinsinger - One of the best experts on this subject based on the ideXlab platform.

  • The Variation Problem
    Philosophical Studies, 2020
    Co-Authors: Ashley Feinsinger
    Abstract:

    It is often assumed that two linguistic agents can come to understand one another in part because they use the same words. That is, many philosophical theories of communication posit an intersubjective same-word relation. However, giving an account of this relation is complicated by what I call “The Variation Problem”, a problem resulting from the fact that the same word can be pronounced differently. In this paper, I first argue that previous models of the same-word relation, including Kaplanian and Chomskyan models, fail to escape The Variation Problem. I then propose a new model on which the same-word relation is grounded in a particular kind of social relation that holds between the speaker and the audience. On this model, using the same word requires not that agents make the same sounds, but that they coordinate their internal linguistic representations.

E. Filho - One of the best experts on this subject based on the ideXlab platform.

  • An Approach to Improve Accuracy Rate of On-line Signature Verification Systems of Different Sizes
    Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 2007
    Co-Authors: R. Araujo, G. Cavalcanti, E. Filho
    Abstract:

    This paper discusses the problem of size variation in on-line signature verification systems. The main idea of the article is to investigate the influence of size variation on the feature extraction techniques and how this distortion can affect the final classification performance of the systems. To measure this performance, a new classification approach based on the work of Kholmatov and Yanikoglu is proposed. In addition, a feature selection technique was applied to the description of the patterns in order to overcome the size variation problem. All the experiments were performed on a database constructed with signatures of three different sizes and skilled forgeries. This kind of study plays an important role in the implementation of systems that use different signature sources.