Binary Mask

The experts below are selected from a list of 6021 experts worldwide ranked by the ideXlab platform.

Mike Brookes - One of the best experts on this subject based on the ideXlab platform.

  • Improving the perceptual quality of ideal Binary Masked speech
    2017 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2017
    Co-Authors: Leo Lightburn, Enzo De Sena, Alastair Moore, Patrick A. Naylor, Mike Brookes
    Abstract:

    It is known that applying a time-frequency Binary Mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a Binary Mask that combines the intelligibility gains of conventional Binary Masking with the perceptual quality gains of a classical speech enhancer. The Binary Mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the Mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal Binary Mask, we show that the proposed method results in a higher predicted quality than other methods of applying a Binary Mask whilst preserving the improvements in predicted intelligibility.

  • SOBM - a Binary Mask for noisy speech that optimises an objective intelligibility metric
    2015 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2015
    Co-Authors: Leo Lightburn, Mike Brookes
    Abstract:

    It is known that the intelligibility of noisy speech can be improved by applying a Binary-valued gain Mask to a time-frequency representation of the speech. We present the SOBM, an oracle Binary Mask that maximises STOI, an objective speech intelligibility metric. We show how to determine the SOBM for a deterministic noise signal and also for a stochastic noise signal with a known power spectrum. We demonstrate that applying the SOBM to noisy speech results in a higher predicted intelligibility than is obtained with other Masks and show that the stochastic version is robust to mismatch errors in SNR and noise spectrum.

Christopher J. Rozell - One of the best experts on this subject based on the ideXlab platform.

  • Cochlear implant speech intelligibility outcomes with structured and unstructured Binary Mask errors
    Journal of the Acoustical Society of America, 2016
    Co-Authors: Abigail A. Kressner, Adam Westermann, Jörg M. Buchholz, Christopher J. Rozell
    Abstract:

    It has been shown that intelligibility can be improved for cochlear implant (CI) recipients with the ideal Binary Mask (IBM). In realistic scenarios where prior information is unavailable, however, the IBM must be estimated, and these estimations will inevitably contain errors. Although the effects of both unstructured and structured Binary Mask errors have been investigated with normal-hearing (NH) listeners, they have not been investigated with CI recipients. This study assesses these effects with CI recipients using Masks that have been generated systematically with a statistical model. The results demonstrate that clustering of Mask errors substantially decreases the tolerance of errors, that incorrectly removing target-dominated regions can be as detrimental to intelligibility as incorrectly adding interferer-dominated regions, and that the individual tolerances of the different types of errors can change when both are present. These trends follow those of NH listeners. However, analysis with a mixed effects model suggests that CI recipients tend to be less tolerant than NH listeners to Mask errors in most conditions, at least with respect to the testing methods in each of the studies. This study clearly demonstrates that structure influences the tolerance of errors and therefore should be considered when analyzing Binary-Masking algorithms.

  • A novel Binary Mask estimator based on sparse approximation
    2013 IEEE International Conference on Acoustics Speech and Signal Processing, 2013
    Co-Authors: Abigail A. Kressner, David V. Anderson, Christopher J. Rozell
    Abstract:

    While most single-channel noise reduction algorithms fail to improve speech intelligibility, the ideal Binary Mask (IBM) has demonstrated substantial intelligibility improvements. However, this approach exploits oracle knowledge. The main objective of this paper is to introduce a novel Binary Mask estimator based on a simple sparse approximation algorithm. Our approach does not require oracle knowledge and instead uses knowledge of speech structure.

  • Causal Binary Mask estimation for speech enhancement using sparsity constraints
    Journal of the Acoustical Society of America, 2013
    Co-Authors: Abigail A. Kressner, David V. Anderson, Christopher J. Rozell
    Abstract:

    While most single-channel noise reduction algorithms fail to improve speech intelligibility, the ideal Binary Mask (IBM) has demonstrated substantial intelligibility improvements for both normal- and impaired-hearing listeners. However, this approach exploits oracle knowledge of the target and interferer signals to preserve only the time-frequency regions that are target-dominated. Single-channel noise suppression algorithms trying to approximate the IBM using locally estimated signal-to-noise ratios without oracle knowledge have had limited success. Thought of in another way, the IBM exploits the disjoint placement of the target and interferer in time and frequency to create a time-frequency signal representation that is more sparse (i.e., has fewer non-zeros). In recent work (submitted to ICASSP 2013) we have introduced a novel time-frequency Masking algorithm based on a sparse approximation algorithm from the signal processing literature. However, the algorithm employs a non-causal estimator. The present work introduces an improved de-noising algorithm that uses more realistic frame-based (causal) computations to estimate a Binary Mask.
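
The two error types discussed in the cochlear implant study above (target-dominated units incorrectly removed, interferer-dominated units incorrectly added) can be illustrated with a minimal sketch. It covers only the unstructured case, where each time-frequency unit is flipped independently; the statistical model the authors used to generate clustered (structured) errors is not described in the abstract and is not reproduced here. The function name `add_mask_errors`, its parameters, and the random placeholder mask are assumptions made for illustration.

```python
import numpy as np

def add_mask_errors(mask, p_miss, p_false_alarm, rng):
    """Flip entries of a 0/1 mask independently (unstructured errors only).

    p_miss:        probability that a unit equal to 1 is set to 0
                   (a target-dominated unit is incorrectly removed)
    p_false_alarm: probability that a unit equal to 0 is set to 1
                   (an interferer-dominated unit is incorrectly added)
    Returns the corrupted mask and the number of errors of each type.
    """
    mask = np.asarray(mask, dtype=bool)
    u = rng.random(mask.shape)                  # one uniform draw per T-F unit
    miss = mask & (u < p_miss)                  # ones that will be dropped
    false_alarm = ~mask & (u < p_false_alarm)   # zeros that will be added
    noisy = mask.copy()
    noisy[miss] = False
    noisy[false_alarm] = True
    return noisy.astype(int), int(miss.sum()), int(false_alarm.sum())

# Hypothetical usage: a random placeholder stands in for a real ideal mask
# (rows = frequency channels, columns = time frames).
rng = np.random.default_rng(0)
ideal = (rng.random((64, 100)) < 0.3).astype(int)
noisy, n_miss, n_fa = add_mask_errors(ideal, p_miss=0.2, p_false_alarm=0.1, rng=rng)
print(n_miss, n_fa)
```

Keeping the two probabilities separate makes it possible to vary one error type while holding the other fixed, which mirrors the comparison the abstract describes.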

Philipos C Loizou - One of the best experts on this subject based on the ideXlab platform.

  • Binary Mask estimation for improved speech intelligibility in reverberant environments
    Conference of the International Speech Communication Association, 2012
    Co-Authors: Oldooz Hazrati, Philipos C Loizou
    Abstract:

    A blind (non-ideal) time-frequency (T-F) Masking technique is proposed for suppressing reverberation. A Binary Mask is estimated at each T-F unit by extracting a single variance-based feature from the reverberant signal and comparing its value against an adaptive threshold. The performance of the estimated Binary Mask is evaluated using intelligibility listening tests with hearing impaired listeners in four moderate to highly reverberant conditions. Results indicated that the proposed T-F Masking technique yielded significant improvements in intelligibility even in highly reverberant conditions (T60 = 1.0 s). This improvement was attributed to the recovery of the vowel/consonant boundaries which are severely smeared in reverberation.

  • A new Binary Mask based on noise constraints for improved speech intelligibility
    Conference of the International Speech Communication Association, 2010
    Co-Authors: Philipos C Loizou
    Abstract:

    It has been shown that large gains in speech intelligibility can be obtained by using the Binary Mask approach which retains the time-frequency (T-F) units of the mixture signal that are stronger than the interfering noise (Masker) (i.e., SNR>0 dB), and removes the T-F units where the interfering noise dominates. In this paper, we introduce a new Binary Mask for improving speech intelligibility based on noise distortion constraints. A Binary Mask is designed to retain noise overestimated T-F units while discarding noise underestimated T-F units. Listening tests were conducted to evaluate the new Binary Mask in terms of intelligibility. Results from the listening tests indicated that large gains in intelligibility can be achieved by the application of the proposed Binary Mask to noise-corrupted speech even at extremely low SNR levels (-10 dB). Index Terms: speech intelligibility, noise estimation, speech enhancement

  • Improving Speech Intelligibility in Noise Using a Binary Mask That Is Based on Magnitude Spectrum Constraints
    IEEE Signal Processing Letters, 2010
    Co-Authors: Philipos C Loizou
    Abstract:

    A new Binary Mask is introduced for improving speech intelligibility based on magnitude spectrum constraints. The proposed Binary Mask is designed to retain time-frequency (T-F) units of the mixture signal satisfying a magnitude constraint while discarding T-F units violating the constraint. Motivated by prior intelligibility studies of speech synthesized using the ideal Binary Mask, an algorithm is proposed that decomposes the input signal into T-F units and makes Binary decisions, based on a Bayesian classifier, as to whether each T-F unit satisfies the magnitude constraint or not. Speech corrupted at low signal-to-noise (SNR) levels (-5 and 0 dB) using different types of Maskers is synthesized by this algorithm and presented to normal-hearing listeners for identification. Results indicated substantial improvements in intelligibility over that attained by human listeners with unprocessed stimuli.

Deliang Wang - One of the best experts on this subject based on the ideXlab platform.

  • The role of Binary Mask patterns in automatic speech recognition in background noise
    Journal of the Acoustical Society of America, 2013
    Co-Authors: Arun Narayanan, Deliang Wang
    Abstract:

    Processing noisy signals using the ideal Binary Mask improves automatic speech recognition (ASR) performance. This paper presents the first study that investigates the role of Binary Mask patterns in ASR under various noises, signal-to-noise ratios (SNRs), and vocabulary sizes. Binary Masks are computed either by comparing the SNR within a time-frequency unit of a mixture signal with a local criterion (LC), or by comparing the local target energy with the long-term average spectral energy of speech. ASR results show that (1) akin to human speech recognition, Binary Masking significantly improves ASR performance even when the SNR is as low as −60 dB; (2) the ASR performance profiles are qualitatively similar to those obtained in human intelligibility experiments; (3) the difference between the LC and mixture SNR is more correlated to the recognition accuracy than LC; (4) LC at which the performance peaks is lower than 0 dB, which is the threshold that maximizes the SNR gain of processed signals. This broad agreement with human performance is rather surprising. The results also indicate that maximizing the SNR gain is probably not an appropriate goal for improving either human or machine recognition of noisy speech.

  • On the role of Binary Mask pattern in automatic speech recognition
    Conference of the International Speech Communication Association, 2012
    Co-Authors: Arun Narayanan, Deliang Wang
    Abstract:

    Processing noisy signals using the ideal Binary Mask has been shown to improve automatic speech recognition (ASR) performance. In this paper, we present the first study that investigates the role of Mask patterns in ASR under varying signal-to-noise ratios (SNR), noise conditions and Mask definitions. Binary Masks are typically computed either by comparing the local SNR within a time-frequency unit of a mixture signal with a threshold termed the local criterion (LC), or by comparing the local target energy with the long-term average energy of speech. Results show that: (i) Akin to human speech recognition, Binary Masking can significantly improve ASR even when the mixture SNR is as low as -60 dB. (ii) The difference between the LC and the mixture SNR is more correlated to the recognition accuracy than LC. (iii) The performance profiles in ASR are qualitatively similar to those obtained for human speech recognition. (iv) The LC at which the peak performance is obtained is lower than 0 dB, which is the optimal threshold as far as the SNR gain of processed signals is concerned. This indicates that maximizing SNR gain may not be the optimal criterion to improve either human or machine recognition of noisy speech.

  • Role of Mask pattern in intelligibility of ideal Binary Masked noisy speech
    Journal of the Acoustical Society of America, 2009
    Co-Authors: Ulrik Kjems, Jesper Bunsow Boldt, Michael Syskind Pedersen, Thomas Lunner, Deliang Wang
    Abstract:

    Intelligibility of ideal Binary Masked noisy speech was measured on a group of normal hearing individuals across mixture signal to noise ratio (SNR) levels, Masker types, and local criteria for forming the Binary Mask. The Binary Mask is computed from time-frequency decompositions of target and Masker signals using two different schemes: an ideal Binary Mask computed by thresholding the local SNR within time-frequency units and a target Binary Mask computed by comparing the local target energy against the long-term average speech spectrum. By depicting intelligibility scores as a function of the difference between mixture SNR and local SNR threshold, alignment of the performance curves is obtained for a large range of mixture SNR levels. Large intelligibility benefits are obtained for both sparse and dense Binary Masks. When an ideal Mask is dense with many ones, the effect of changing mixture SNR level while fixing the Mask is significant, whereas for more sparse Masks the effect is small or insignificant.

  • Estimation of the ideal Binary Mask using directional systems
    The 11th International Workshop on Acoustic Echo and Noise Control (IWAENC), Seattle, WA, USA, 2008
    Co-Authors: Jesper Bunsow Boldt, Ulrik Kjems, Michael Syskind Pedersen, Thomas Lunner, Deliang Wang
    Abstract:

    The ideal Binary Mask is often seen as a goal for time-frequency Masking algorithms trying to increase speech intelligibility, but the required availability of the unmixed signals makes it difficult to calculate the ideal Binary Mask in any real-life applications. In this paper we derive the theory and the requirements to enable calculations of the ideal Binary Mask using a directional system without the availability of the unmixed signals. The proposed method has a low complexity and is verified using computer simulation in both ideal and non-ideal setups showing promising results. Index Terms: Time-Frequency Masking, Directional systems, Ideal Binary Mask, Speech Intelligibility, Sound separation
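
Several abstracts in this section describe two ways of forming a mask: thresholding the local SNR in each time-frequency unit against a local criterion (LC), and comparing the local target energy against a long-term average speech spectrum. A rough sketch of both follows. It assumes the target and masker energies are already available as arrays with rows as frequency channels and columns as time frames, and it approximates the long-term average spectrum by the time average of the target itself, which may differ from the reference spectrum used in the papers. The function names and example values are illustrative assumptions.

```python
import numpy as np

EPS = np.finfo(float).tiny  # guards against log(0) and division by zero

def ideal_binary_mask(target_energy, masker_energy, lc_db=0.0):
    """1 where the local SNR in dB exceeds the local criterion, else 0."""
    local_snr_db = 10.0 * np.log10((target_energy + EPS) / (masker_energy + EPS))
    return (local_snr_db > lc_db).astype(int)

def target_binary_mask(target_energy, lc_db=0.0):
    """1 where the target energy exceeds its per-channel long-term average by lc_db.

    Assumption: the long-term average spectrum is taken as the mean of the
    target energy over time frames (axis 1).
    """
    long_term = target_energy.mean(axis=1, keepdims=True)
    rel_db = 10.0 * np.log10((target_energy + EPS) / (long_term + EPS))
    return (rel_db > lc_db).astype(int)

# Toy energies: 2 frequency channels x 3 time frames.
target = np.array([[4.0, 1.0, 0.25],
                   [1.0, 9.0, 1.0]])
masker = np.ones_like(target)

ibm = ideal_binary_mask(target, masker, lc_db=0.0)         # [[1, 0, 0], [0, 1, 0]]
ibm_low = ideal_binary_mask(target, masker, lc_db=-6.0)    # [[1, 1, 0], [1, 1, 1]]
tbm = target_binary_mask(target, lc_db=0.0)                # [[1, 0, 0], [0, 1, 0]]
print(ibm, ibm_low, tbm, sep="\n")
```

Lowering the LC below 0 dB retains more units and gives a denser mask, which is the regime the abstracts above report as favourable for recognition.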

Richard C. Hendriks - One of the best experts on this subject based on the ideXlab platform.

  • Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions
    IEEE Transactions on Audio Speech and Language Processing, 2012
    Co-Authors: Jesper Jensen, Richard C. Hendriks
    Abstract:

    Recently, Binary Mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A Binary gain function is applied to time-frequency tiles of the noisy observation in order to suppress noise dominated and retain target dominated time-frequency regions. When implemented using discrete Fourier transform (DFT) techniques, the Binary Mask techniques can be seen as a special case of the broader class of DFT-based speech enhancement algorithms, for which the applied gain function is not constrained to be Binary. In this context, we develop and compare Binary Mask techniques to state-of-the-art continuous gain techniques. We derive spectral magnitude minimum mean-square error Binary gain estimators; the Binary gain estimators turn out to be simple functions of the continuous gain estimators. We show that the optimal Binary estimators are closely related to a range of existing, heuristically developed, Binary gain estimators. The derived Binary gain estimators perform better than existing Binary gain estimators in simulation experiments with speech signals contaminated by several different noise sources as measured by speech quality and intelligibility measures. However, even the best Binary Mask method is significantly outperformed by state-of-the-art continuous gain estimators. The instrumental intelligibility results are confirmed in an intelligibility listening test.

  • ICASSP - Spectral magnitude minimum mean-square error Binary Masks for DFT based speech enhancement
    2011 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2011
    Co-Authors: Jesper Jensen, Richard C. Hendriks
    Abstract:

    Originally, ideal Binary Mask (idbm) techniques have been used as a tool for studying aspects of the auditory system. More recently, idbm techniques have been adapted to the practical problem of retrieving a target speech signal from a noisy observation. In this practical setting, the Binary Mask techniques show similarities with existing DFT based speech enhancement techniques. In this context, we derive single-channel, Binary Mask estimators which minimize the spectral magnitude mean-square error. We show in simulation experiments with natural speech and noise signals that the proposed estimators perform significantly better than existing Binary Mask estimators. However, even the best of the proposed estimators is clearly outperformed by non-Binary estimators, both in terms of speech quality and intelligibility.
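
The remark in the first abstract of this section, that the binary gain estimators turn out to be simple functions of the continuous gain estimators, can be made plausible with a short worked derivation. This is a sketch under simplifying assumptions, not necessarily the estimator derived in the papers: the gain is restricted to the two values 0 and 1, $A$ denotes the clean spectral magnitude, $R > 0$ the noisy spectral magnitude, and $G_c = \mathbb{E}[A \mid R]/R$ the continuous magnitude MMSE gain. All symbols are introduced here for illustration.

```latex
\begin{aligned}
g^\star &= \arg\min_{g \in \{0,1\}} \mathbb{E}\!\left[(A - gR)^2 \mid R\right], \\
\text{cost}(g = 0) &= \mathbb{E}[A^2 \mid R], \\
\text{cost}(g = 1) &= \mathbb{E}[A^2 \mid R] - 2R\,\mathbb{E}[A \mid R] + R^2, \\
g^\star = 1 &\iff R^2 - 2R\,\mathbb{E}[A \mid R] < 0
            \iff G_c = \frac{\mathbb{E}[A \mid R]}{R} > \tfrac{1}{2}.
\end{aligned}
```

Under these assumptions the binary decision reduces to thresholding the continuous gain at one half, so any continuous magnitude estimator yields a corresponding binary mask.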