Speech Enhancement

The experts below are selected from a list of 18,696 experts worldwide, ranked by the ideXlab platform.

Radu Horaud - One of the best experts on this subject based on the ideXlab platform.

  • Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders
    Institute of Electrical and Electronics Engineers (IEEE), 2020
    Co-Authors: Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
    Abstract:

    Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. One advantage of this generative approach is that it does not require pairs of clean and noisy speech signals at training. In this paper, we propose audio-visual variants of VAEs for single-channel and speaker-independent speech enhancement. We develop a conditional VAE (CVAE) where the audio speech generative process is conditioned on visual information of the lip region. At test time, the audio-visual speech generative model is combined with a noise model based on nonnegative matrix factorization, and speech enhancement relies on a Monte Carlo expectation-maximization algorithm. Experiments are conducted with the recently published NTCD-TIMIT dataset. The results confirm that the proposed audio-visual CVAE effectively fuses audio and visual information and improves the speech enhancement performance compared with the audio-only VAE model, especially when the speech signal is highly corrupted by noise. We also show that the proposed unsupervised audio-visual speech enhancement approach outperforms a state-of-the-art supervised deep learning method.
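    To make the conditioning scheme concrete, below is a minimal sketch of an audio-visual CVAE in PyTorch, where both the encoder and the decoder receive a visual embedding of the lip region alongside the audio features. All layer sizes, the fusion-by-concatenation choice, and the class and argument names are illustrative assumptions, not the authors' exact architecture; the NMF noise model and the Monte Carlo EM enhancement step are not shown.

```python
# Minimal audio-visual conditional VAE sketch (illustrative assumptions only).
import torch
import torch.nn as nn

class AVCVAE(nn.Module):
    def __init__(self, audio_dim=513, visual_dim=64, latent_dim=32, hidden=128):
        super().__init__()
        # Encoder q(z | audio, visual): audio spectrum frame + lip embedding.
        self.enc = nn.Sequential(nn.Linear(audio_dim + visual_dim, hidden), nn.Tanh())
        self.enc_mu = nn.Linear(hidden, latent_dim)
        self.enc_logvar = nn.Linear(hidden, latent_dim)
        # Decoder p(audio | z, visual): generation conditioned on the visual input.
        self.dec = nn.Sequential(nn.Linear(latent_dim + visual_dim, hidden), nn.Tanh())
        self.dec_out = nn.Linear(hidden, audio_dim)

    def forward(self, audio, visual):
        h = self.enc(torch.cat([audio, visual], dim=-1))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        recon = self.dec_out(self.dec(torch.cat([z, visual], dim=-1)))
        return recon, mu, logvar
```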

  • A Recurrent Variational Autoencoder for Speech Enhancement
    Institute of Electrical and Electronics Engineers (IEEE), 2020
    Co-Authors: Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
    Abstract:

    This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces a posterior temporal dynamic over the latent variables, which is shown to improve the speech enhancement results.
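    The distinctive step here is adapting the encoder to each noisy utterance while the learned speech prior stays fixed. The sketch below illustrates that idea in PyTorch with a simple negative-ELBO objective; the function and tensor names, the Gaussian stand-in likelihood, and the assumption that the encoder returns a mean and log-variance are illustrative, not the paper's exact variational EM formulation.

```python
# Test-time encoder fine-tuning sketch (illustrative, not the paper's exact objective).
import torch

def finetune_encoder(encoder, decoder, noisy_spec, steps=100, lr=1e-3):
    for p in decoder.parameters():
        p.requires_grad_(False)          # the generative speech model stays fixed
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(steps):
        mu, logvar = encoder(noisy_spec)                 # recurrent q(z | noisy speech)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = decoder(z)
        kl = 0.5 * torch.sum(mu**2 + logvar.exp() - logvar - 1.0)
        nll = torch.sum((noisy_spec - recon) ** 2)       # Gaussian stand-in likelihood
        loss = nll + kl                                  # negative ELBO
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder
```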

Joan Serra - One of the best experts on this subject based on the ideXlab platform.

  • SEGAN: Speech Enhancement generative adversarial network
    Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2017
    Co-Authors: Santiago Pascual, Antonio Bonafonte, Joan Serra
    Abstract:

    Current speech enhancement techniques operate in the spectral domain and/or exploit some higher-level features. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are increasingly being used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of generative adversarial networks for speech enhancement. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm the viability of the proposed model, and both objective and subjective evaluations confirm its effectiveness. With that, we open the exploration of generative architectures for speech enhancement, which may progressively incorporate further speech-centric design choices to improve their performance.
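    The following sketch shows one adversarial training step at the waveform level in the spirit of SEGAN, with the discriminator conditioned on the noisy input and an L1 term pulling the generator toward the clean signal. The least-squares GAN losses and the l1_weight value follow the commonly described recipe, but the function names, tensor shapes, and optimizer handling are simplified assumptions.

```python
# One SEGAN-style training step (simplified sketch; shapes assumed (batch, 1, samples)).
import torch
import torch.nn.functional as F

def segan_step(G, D, opt_g, opt_d, clean, noisy, l1_weight=100.0):
    enhanced = G(noisy)                          # generator maps noisy -> enhanced waveform
    # Discriminator: real pair (clean, noisy) vs. fake pair (enhanced, noisy).
    d_real = D(torch.cat([clean, noisy], dim=1))
    d_fake = D(torch.cat([enhanced.detach(), noisy], dim=1))
    d_loss = torch.mean((d_real - 1.0) ** 2) + torch.mean(d_fake ** 2)  # LSGAN losses
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator: fool the discriminator while staying close to the clean waveform.
    g_adv = torch.mean((D(torch.cat([enhanced, noisy], dim=1)) - 1.0) ** 2)
    g_loss = g_adv + l1_weight * F.l1_loss(enhanced, clean)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```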

Philipos C. Loizou - One of the best experts on this subject based on the ideXlab platform.

  • Speech Enhancement: Theory and Practice
    2007
    Co-Authors: Philipos C. Loizou
    Abstract:

    With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve them. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at improving speech intelligibility.

    Organized into four parts, the book begins with a review of the fundamentals needed to understand and design better speech enhancement algorithms. The second part describes all the major enhancement algorithms and, because these require an estimate of the noise spectrum, also covers noise estimation algorithms. The third part looks at the measures used to assess the performance of speech enhancement methods, in terms of speech quality and intelligibility, and evaluates and compares several of the algorithms. The fourth part presents binary mask algorithms for improving speech intelligibility under ideal conditions and suggests steps that can be taken to realize their full potential under realistic conditions.

    New in this edition:
    - Updates in every chapter
    - A new chapter on objective speech intelligibility measures
    - A new chapter on algorithms for improving speech intelligibility
    - Real-world noise recordings (on accompanying CD)
    - MATLAB code for the implementation of intelligibility measures (on accompanying CD)
    - MATLAB and C/C++ code for the implementation of algorithms to improve speech intelligibility (on accompanying CD)

    Clear and concise, this book explores how human listeners compensate for acoustic noise in noisy environments. Written by a pioneer in speech enhancement and noise reduction in cochlear implants, it is an essential resource for anyone who wants to implement or incorporate the latest speech enhancement algorithms to improve the quality and intelligibility of speech degraded by noise. The accompanying CD provides MATLAB implementations of representative speech enhancement algorithms as well as speech and noise databases for the evaluation of enhancement algorithms.
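    As a taste of the algorithm family the book surveys, here is a bare-bones magnitude spectral subtraction pass in Python. The book's reference implementations are in MATLAB; this paraphrase, including the frame sizes, the number of leading noise-only frames, and the spectral floor value, is an illustrative assumption rather than the book's code.

```python
# Basic magnitude spectral subtraction sketch (parameters are illustrative).
import numpy as np

def spectral_subtraction(noisy, frame=512, hop=256, noise_frames=6, floor=0.002):
    win = np.hanning(frame)
    # Estimate the noise magnitude spectrum from leading frames assumed speech-free.
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(win * noisy[i * hop:i * hop + frame]))
         for i in range(noise_frames)], axis=0)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame, hop):
        spec = np.fft.rfft(win * noisy[start:start + frame])
        mag = np.abs(spec) - noise_mag             # subtract the noise estimate
        mag = np.maximum(mag, floor * noise_mag)   # spectral floor limits musical noise
        # Resynthesize with the noisy phase and overlap-add.
        out[start:start + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
    return out
```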

  • Subjective comparison and evaluation of Speech Enhancement algorithms
    Speech Communication, 2007
    Co-Authors: Philipos C. Loizou
    Abstract:

    Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years has been elusive due to the lack of a common speech database, differences in the types of noise used, and differences in testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for the evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc., using the ITU-T P.835 methodology, which is designed to evaluate speech quality along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports the results of the subjective tests.
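    To show the bookkeeping behind a P.835-style evaluation, the toy sketch below averages per-trial listener ratings along the three reported dimensions. The scale names (SIG for signal distortion, BAK for background noise intrusiveness, OVRL for overall quality) follow the methodology; the algorithm names and scores are made up purely for illustration.

```python
# Toy aggregation of P.835-style ratings (data are fabricated for illustration).
from collections import defaultdict
from statistics import mean

ratings = [
    # (algorithm, SIG, BAK, OVRL), each on a 1-5 scale
    ("wiener", 3.8, 3.2, 3.5),
    ("wiener", 4.0, 3.0, 3.4),
    ("subspace", 3.1, 3.6, 3.2),
    ("subspace", 3.3, 3.4, 3.1),
]

scores = defaultdict(lambda: ([], [], []))
for algo, sig, bak, ovrl in ratings:
    for bucket, value in zip(scores[algo], (sig, bak, ovrl)):
        bucket.append(value)

for algo, (sig, bak, ovrl) in scores.items():
    print(f"{algo:10s} SIG={mean(sig):.2f} BAK={mean(bak):.2f} OVRL={mean(ovrl):.2f}")
```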

  • Subjective Comparison of Speech Enhancement Algorithms
    International Conference on Acoustics, Speech and Signal Processing, 2006
    Co-Authors: Philipos C. Loizou
    Abstract:

    We report on the development of a noisy speech corpus suitable for the evaluation of speech enhancement algorithms. This corpus is used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener algorithms. The subjective evaluation was performed by Dynastat, Inc., using the ITU-T P.835 methodology, which is designed to evaluate speech quality along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports the results of the subjective tests.

Jianhueng Chen - One of the best experts on this subject based on the ideXlab platform.

  • Spectro-Temporal Subband Wiener Filter for Speech Enhancement
    International Conference on Acoustics, Speech and Signal Processing, 2012
    Co-Authors: Jianhueng Chen
    Abstract:

    In this paper, we propose a single-channel speech enhancement algorithm that applies the conventional Wiener filter in the spectro-temporal modulation domain. The multi-resolution spectro-temporal analysis and synthesis framework for Fourier spectrograms [12] is extended to an analysis-modification-synthesis (AMS) framework for speech enhancement. Compared with conventional speech enhancement algorithms, namely a Wiener filter and an extended minimum mean-square error (MMSE) algorithm, the proposed method outperforms them by a large margin in white noise and by a smaller margin in babble noise, according to both objective and subjective evaluations.
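    A full multi-resolution spectro-temporal implementation is involved, but the core operation, a Wiener gain applied to modulation-domain coefficients, can be sketched with a 2-D FFT of the spectrogram as a crude stand-in for the paper's analysis-synthesis framework. The function name and the simple power-subtraction estimate of the clean signal below are simplifying assumptions.

```python
# Wiener gain in a (crude) modulation domain: 2-D FFT over (frequency x time).
import numpy as np

def modulation_wiener(noisy_spec, noise_spec_est):
    # Move the noisy and estimated-noise magnitude spectrograms to the modulation domain.
    Y = np.fft.fft2(noisy_spec)
    N = np.fft.fft2(noise_spec_est)
    signal_power = np.maximum(np.abs(Y) ** 2 - np.abs(N) ** 2, 1e-10)
    gain = signal_power / (signal_power + np.abs(N) ** 2)   # Wiener gain
    enhanced = np.fft.ifft2(gain * Y).real
    return np.maximum(enhanced, 0.0)                        # keep magnitudes nonnegative
```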