Speaker Identification

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies


The Experts below are selected from a list of 12390 Experts worldwide ranked by ideXlab platform

De Liang Wang - One of the best experts on this subject based on the ideXlab platform.

  • robust Speaker Identification in noisy and reverberant conditions
    International Conference on Acoustics Speech and Signal Processing, 2014
    Co-Authors: Xiaojia Zhao, Yuxuan Wang, De Liang Wang
    Abstract:

    Robustness of Speaker recognition systems is crucial for real-world applications, which typically contain both additive noise and room reverberation. However, the combined effects of additive noise and convolutive reverberation have rarely been studied in Speaker Identification (SID). This paper addresses this issue in two phases. We first remove background noise through binary masking using a deep neural network classifier. Then we perform robust SID with Speaker models trained in selected reverberant conditions, using bounded marginalization and direct masking. Evaluation results show that the proposed system substantially improves SID performance over related systems across a wide range of reverberation times and signal-to-noise ratios.
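
The first phase described above, removing noise by classifying each time-frequency (T-F) unit as speech- or noise-dominant and applying direct masking, can be illustrated with a small sketch. The paper trains a deep neural network to estimate the mask from the noisy signal; here, as a stand-in, the ideal binary mask is computed from known speech and noise components, so the function names and the local-SNR criterion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each T-F unit speech-dominant (1.0) or noise-dominant (0.0)
    by comparing the local SNR to a criterion in dB. In the paper a DNN
    classifier estimates this mask from the noisy input alone; using the
    ideal mask here is a simplifying assumption."""
    local_snr_db = 10.0 * np.log10(speech_power / (noise_power + 1e-12))
    return (local_snr_db > lc_db).astype(float)

def direct_masking(noisy_power, mask):
    """Direct masking: zero out the noise-dominant T-F units."""
    return noisy_power * mask

# Toy (frequency x time) power spectrograms standing in for real audio.
rng = np.random.default_rng(0)
speech = rng.gamma(2.0, 1.0, size=(64, 100))
noise = rng.gamma(2.0, 0.3, size=(64, 100))
mask = ideal_binary_mask(speech, noise)
masked = direct_masking(speech + noise, mask)
```

The masked spectrogram then feeds the recognizer; bounded marginalization (illustrated later in this section) is the alternative way of consuming the same mask.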

  • Robust Speaker Identification in noisy and reverberant conditions
    ICASSP IEEE International Conference on Acoustics Speech and Signal Processing - Proceedings, 2014
    Co-Authors: Xiaojia Zhao, Yuxuan Wang, De Liang Wang
    Abstract:

    Robustness of Speaker recognition systems is crucial for real-world applications, which typically contain both additive noise and room reverberation. However, the combined effects of additive noise and convolutive reverberation have rarely been studied in Speaker Identification (SID). This paper addresses this issue in two phases. We first remove background noise through binary masking using a deep neural network classifier. Then we perform robust SID with Speaker models trained in selected reverberant conditions, on the basis of bounded marginalization and direct masking. Evaluation results show that the proposed system substantially improves SID performance over related systems across a wide range of reverberation times and signal-to-noise ratios.

  • analyzing noise robustness of mfcc and gfcc features in Speaker Identification
    International Conference on Acoustics Speech and Signal Processing, 2013
    Co-Authors: Xiaojia Zhao, De Liang Wang
    Abstract:

    Automatic Speaker recognition can achieve a high level of performance in matched training and testing conditions. However, such performance drops significantly in mismatched noisy conditions. Recent research indicates that a new Speaker feature, gammatone frequency cepstral coefficients (GFCC), exhibits superior noise robustness to commonly used mel-frequency cepstral coefficients (MFCC). To gain a deep understanding of the intrinsic robustness of GFCC relative to MFCC, we design Speaker Identification experiments to systematically analyze their differences and similarities. This study reveals that the nonlinear rectification primarily accounts for the noise robustness differences. Moreover, this study suggests how to enhance MFCC robustness, and how to further improve GFCC robustness by adopting a different time-frequency representation.
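
The finding above centers on the nonlinear rectification step: MFCC applies a log to filterbank energies, while GFCC applies a cubic root. The sketch below isolates just that step on a toy vector of filterbank channel energies, followed by a DCT-II to produce cepstral coefficients; the function name and the 64-channel toy input are illustrative assumptions, not the study's pipeline.

```python
import numpy as np

def cepstra(filterbank_energies, n_coeffs=13, rectify="log"):
    """Cepstral coefficients from filterbank channel energies.
    MFCC-style features use log rectification; GFCC-style features use
    cubic-root rectification, which the study identifies as the main
    source of the robustness difference. A DCT-II then decorrelates
    the rectified energies."""
    e = np.asarray(filterbank_energies, dtype=float)
    if rectify == "log":
        r = np.log(e + 1e-10)   # MFCC-style compression
    elif rectify == "cubic":
        r = np.cbrt(e)          # GFCC-style compression
    else:
        raise ValueError(rectify)
    n = len(r)
    k = np.arange(n)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), k + 0.5) / n)
    return basis @ r            # DCT-II projection

energies = np.linspace(0.1, 4.0, 64)   # toy 64-channel energies
mfcc_like = cepstra(energies, rectify="log")
gfcc_like = cepstra(energies, rectify="cubic")
```

Intuitively, the cubic root compresses low-energy channels far less aggressively than the log, so small additive-noise perturbations to weak channels distort the rectified values less.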

  • CASA-Based robust Speaker Identification
    IEEE Transactions on Audio Speech and Language Processing, 2012
    Co-Authors: Xiaojia Zhao, Yang Shao, De Liang Wang
    Abstract:

    Conventional Speaker recognition systems perform poorly under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time–frequency mask. We investigate CASA for robust Speaker Identification. We first introduce a novel Speaker feature, gammatone frequency cepstral coefficient (GFCC), based on an auditory periphery model, and show that this feature captures Speaker characteristics and performs substantially better than conventional Speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize corrupted components indicated by a CASA mask. We find that both reconstruction and marginalization are effective. We further combine the two methods into a single system based on their complementary advantages, and this system achieves significant performance improvements over related systems under a wide range of signal-to-noise ratios.
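
The marginalization side of the reconstruct-or-marginalize choice above can be sketched for a single frame under a diagonal Gaussian speaker model. Reliable T-F units (mask = 1) are scored with the ordinary density; unreliable units are integrated over the bounded range [0, observation], since clean speech energy cannot exceed the noisy observation. The function names are illustrative; the paper's models are mixtures rather than a single Gaussian.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Standard normal CDF evaluated at (x - mu) / sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def bounded_marginal_loglik(obs, mask, mu, sigma):
    """Missing-data log-likelihood of one frame under a diagonal
    Gaussian. mask[i] = 1 marks a reliable component; mask[i] = 0 marks
    a corrupted component, which is marginalized over [0, obs[i]]."""
    ll = 0.0
    for x, m, u, s in zip(obs, mask, mu, sigma):
        if m:   # reliable: evaluate the Gaussian density directly
            ll += -0.5 * np.log(2 * np.pi * s**2) - 0.5 * ((x - u) / s)**2
        else:   # unreliable: integrate the density from 0 to the observation
            p = norm_cdf(x, u, s) - norm_cdf(0.0, u, s)
            ll += np.log(max(p, 1e-12))
    return ll

mu = np.zeros(4)
sigma = np.ones(4)
obs = np.array([0.5, 2.0, 0.3, 1.0])
mask = np.array([1, 0, 1, 0])   # components 1 and 3 deemed corrupted
ll = bounded_marginal_loglik(obs, mask, mu, sigma)
```

Reconstruction, by contrast, replaces the corrupted components with estimates and scores the full feature vector conventionally, which is why the two methods have complementary strengths.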

  • robust Speaker Identification using auditory features and computational auditory scene analysis
    International Conference on Acoustics Speech and Signal Processing, 2008
    Co-Authors: Yang Shao, De Liang Wang
    Abstract:

    The performance of Speaker recognition systems drops significantly under noisy conditions. To improve robustness, we have recently proposed novel auditory features and a robust Speaker recognition system using a front-end based on computational auditory scene analysis. In this paper, we further study the auditory features by exploring different feature dimensions and incorporating dynamic features. In addition, we evaluate the features and robust recognition in a Speaker Identification task in a number of noisy conditions. We find that one of the auditory features performs substantially better than a conventional Speaker feature. Furthermore, our recognition system achieves significant performance improvements compared with an advanced front-end in a wide range of signal-to-noise conditions.
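
"Dynamic features" in the abstract above refers to delta coefficients appended to the static features. A common regression formula over a window of +/-W frames is sketched below; the window width and edge-padding choice are conventional defaults, not details taken from the paper.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta (dynamic) features over a +/-width frame
    window: delta_t = sum_w w * (x[t+w] - x[t-w]) / (2 * sum_w w^2).
    Edges are handled by repeating the first and last frames."""
    f = np.asarray(features, dtype=float)          # (frames, dims)
    padded = np.pad(f, ((width, width), (0, 0)), mode="edge")
    num = sum(w * (padded[width + w: len(f) + width + w]
                   - padded[width - w: len(f) + width - w])
              for w in range(1, width + 1))
    den = 2 * sum(w * w for w in range(1, width + 1))
    return num / den

# A linearly increasing toy feature track has slope 1 per frame,
# so interior delta values come out to 1.
statics = np.arange(10.0).reshape(10, 1)
dynamics = delta(statics)
```

The deltas are typically concatenated with the statics, doubling the feature dimension, which is one of the dimension choices the paper explores.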

Xiaojia Zhao - One of the best experts on this subject based on the ideXlab platform.

  • robust Speaker Identification in noisy and reverberant conditions
    International Conference on Acoustics Speech and Signal Processing, 2014
    Co-Authors: Xiaojia Zhao, Yuxuan Wang, De Liang Wang
    Abstract:

    Robustness of Speaker recognition systems is crucial for real-world applications, which typically contain both additive noise and room reverberation. However, the combined effects of additive noise and convolutive reverberation have rarely been studied in Speaker Identification (SID). This paper addresses this issue in two phases. We first remove background noise through binary masking using a deep neural network classifier. Then we perform robust SID with Speaker models trained in selected reverberant conditions, using bounded marginalization and direct masking. Evaluation results show that the proposed system substantially improves SID performance over related systems across a wide range of reverberation times and signal-to-noise ratios.

  • Robust Speaker Identification in noisy and reverberant conditions
    ICASSP IEEE International Conference on Acoustics Speech and Signal Processing - Proceedings, 2014
    Co-Authors: Xiaojia Zhao, Yuxuan Wang, De Liang Wang
    Abstract:

    Robustness of Speaker recognition systems is crucial for real-world applications, which typically contain both additive noise and room reverberation. However, the combined effects of additive noise and convolutive reverberation have rarely been studied in Speaker Identification (SID). This paper addresses this issue in two phases. We first remove background noise through binary masking using a deep neural network classifier. Then we perform robust SID with Speaker models trained in selected reverberant conditions, on the basis of bounded marginalization and direct masking. Evaluation results show that the proposed system substantially improves SID performance over related systems across a wide range of reverberation times and signal-to-noise ratios.

  • analyzing noise robustness of mfcc and gfcc features in Speaker Identification
    International Conference on Acoustics Speech and Signal Processing, 2013
    Co-Authors: Xiaojia Zhao, De Liang Wang
    Abstract:

    Automatic Speaker recognition can achieve a high level of performance in matched training and testing conditions. However, such performance drops significantly in mismatched noisy conditions. Recent research indicates that a new Speaker feature, gammatone frequency cepstral coefficients (GFCC), exhibits superior noise robustness to commonly used mel-frequency cepstral coefficients (MFCC). To gain a deep understanding of the intrinsic robustness of GFCC relative to MFCC, we design Speaker Identification experiments to systematically analyze their differences and similarities. This study reveals that the nonlinear rectification primarily accounts for the noise robustness differences. Moreover, this study suggests how to enhance MFCC robustness, and how to further improve GFCC robustness by adopting a different time-frequency representation.

  • CASA-Based robust Speaker Identification
    IEEE Transactions on Audio Speech and Language Processing, 2012
    Co-Authors: Xiaojia Zhao, Yang Shao, De Liang Wang
    Abstract:

    Conventional Speaker recognition systems perform poorly under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time–frequency mask. We investigate CASA for robust Speaker Identification. We first introduce a novel Speaker feature, gammatone frequency cepstral coefficient (GFCC), based on an auditory periphery model, and show that this feature captures Speaker characteristics and performs substantially better than conventional Speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize corrupted components indicated by a CASA mask. We find that both reconstruction and marginalization are effective. We further combine the two methods into a single system based on their complementary advantages, and this system achieves significant performance improvements over related systems under a wide range of signal-to-noise ratios.

Yang Shao - One of the best experts on this subject based on the ideXlab platform.

  • CASA-Based robust Speaker Identification
    IEEE Transactions on Audio Speech and Language Processing, 2012
    Co-Authors: Xiaojia Zhao, Yang Shao, De Liang Wang
    Abstract:

    Conventional Speaker recognition systems perform poorly under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time–frequency mask. We investigate CASA for robust Speaker Identification. We first introduce a novel Speaker feature, gammatone frequency cepstral coefficient (GFCC), based on an auditory periphery model, and show that this feature captures Speaker characteristics and performs substantially better than conventional Speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize corrupted components indicated by a CASA mask. We find that both reconstruction and marginalization are effective. We further combine the two methods into a single system based on their complementary advantages, and this system achieves significant performance improvements over related systems under a wide range of signal-to-noise ratios.

  • robust Speaker Identification using auditory features and computational auditory scene analysis
    International Conference on Acoustics Speech and Signal Processing, 2008
    Co-Authors: Yang Shao, De Liang Wang
    Abstract:

    The performance of Speaker recognition systems drops significantly under noisy conditions. To improve robustness, we have recently proposed novel auditory features and a robust Speaker recognition system using a front-end based on computational auditory scene analysis. In this paper, we further study the auditory features by exploring different feature dimensions and incorporating dynamic features. In addition, we evaluate the features and robust recognition in a Speaker Identification task in a number of noisy conditions. We find that one of the auditory features performs substantially better than a conventional Speaker feature. Furthermore, our recognition system achieves significant performance improvements compared with an advanced front-end in a wide range of signal-to-noise conditions.

  • incorporating auditory feature uncertainties in robust Speaker Identification
    International Conference on Acoustics Speech and Signal Processing, 2007
    Co-Authors: Yang Shao, Soundararajan Srinivasan, De Liang Wang
    Abstract:

    Conventional Speaker recognition systems perform poorly under noisy conditions. Recent research suggests that binary time-frequency (T-F) masks are a promising front-end for robust Speaker recognition. In this paper, we propose novel auditory features based on an auditory periphery model, and show that these features capture significant Speaker characteristics. Additionally, we estimate uncertainties of the auditory features based on binary T-F masks, and calculate Speaker likelihood scores using uncertainty decoding. Our approach achieves substantial performance improvement in a Speaker Identification task compared with a state-of-the-art robust front-end in a wide range of signal-to-noise conditions.
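
The core idea of uncertainty decoding, as used above, is that each feature dimension's estimation variance is added to the model variance when scoring, so uncertain dimensions contribute a flatter, less committal likelihood. A minimal sketch for a diagonal Gaussian follows; the function name is illustrative, and in the paper the per-dimension uncertainties are derived from the binary T-F mask rather than supplied directly.

```python
import numpy as np

def uncertainty_decode_loglik(obs, obs_var, mu, model_var):
    """Log-likelihood of an observation under a diagonal Gaussian when
    the observation itself carries estimation variance obs_var: the
    effective variance per dimension is model_var + obs_var, so highly
    uncertain dimensions are effectively down-weighted."""
    var = model_var + obs_var
    return float(np.sum(-0.5 * np.log(2 * np.pi * var)
                        - 0.5 * (obs - mu) ** 2 / var))

obs = np.zeros(3)
mu = np.zeros(3)
certain = uncertainty_decode_loglik(obs, np.zeros(3), mu, np.ones(3))
uncertain = uncertainty_decode_loglik(obs, np.ones(3), mu, np.ones(3))
```

Note that a perfectly matching observation scores lower once uncertainty is added, because the broadened density is flatter everywhere; the benefit appears when comparing competing speaker models, whose scores are pulled together on unreliable dimensions.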

Yuxuan Wang - One of the best experts on this subject based on the ideXlab platform.

  • robust Speaker Identification in noisy and reverberant conditions
    International Conference on Acoustics Speech and Signal Processing, 2014
    Co-Authors: Xiaojia Zhao, Yuxuan Wang, De Liang Wang
    Abstract:

    Robustness of Speaker recognition systems is crucial for real-world applications, which typically contain both additive noise and room reverberation. However, the combined effects of additive noise and convolutive reverberation have been rarely studied in Speaker Identification (SID). This paper addresses this issue in two phases. We first remove background noise through binary masking using a deep neural network classifier. Then we perform robust SID with Speaker models trained in selected reverberant conditions, using bounded marginalization and direct masking. Evaluation results show that the proposed system substantially improves SID performance over related systems in a wide range of reverberation time and signal-to-noise ratios.

  • Robust Speaker Identification in noisy and reverberant conditions
    ICASSP IEEE International Conference on Acoustics Speech and Signal Processing - Proceedings, 2014
    Co-Authors: Xiaojia Zhao, Yuxuan Wang, De Liang Wang
    Abstract:

    Robustness of Speaker recognition systems is crucial for real-world applications, which typically contain both additive noise and room reverberation. However, the combined effects of additive noise and convolutive reverberation have been rarely studied in Speaker Identification (SID). This paper addresses this issue in two phases. We first remove background noise through binary masking using a deep neural network classifier. Then we perform robust SID with Speaker models trained in selected reverberant conditions, on the basis of bounded marginalization and direct masking. Evaluation results show that the proposed system substantially improves SID performance over related systems in a wide range of reverberation time and signal-to-noise ratios.

Michael Schmidt - One of the best experts on this subject based on the ideXlab platform.

  • Text-Independent Speaker Identification
    IEEE Signal Processing Magazine, 1994
    Co-Authors: Herbert Gish, Michael Schmidt
    Abstract:

    We describe current approaches to text-independent Speaker Identification based on probabilistic modeling techniques. The probabilistic approaches have largely supplanted methods based on comparisons of long-term feature averages. The probabilistic approaches have an important and basic dichotomy into nonparametric and parametric probability models. Nonparametric models have the advantage of being potentially more accurate (though possibly more fragile), while parametric models offer computational efficiencies and the ability to characterize the effects of the environment through their parameters. A robust Speaker-Identification system is presented that was able to deal with various forms of anomalies that are localized in time, such as spurious noise events and crosstalk. It is based on a segmental approach in which normalized segment scores formed the basic input for a variety of robust procedures. Experimental results are presented, illustrating the advantages and disadvantages of the different procedures. We show the role that cross-validation can play in determining how to weight the different sources of information when combining them into a single score. Finally, we explore a Bayesian approach to measuring confidence in the decisions made, which enabled us to reject the consideration of certain tests in order to achieve an improved, predicted performance level on the tests that were retained.
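
The parametric side of the dichotomy above is most commonly realized as a diagonal-covariance Gaussian mixture model (GMM) per speaker, with identification by the highest average frame score. The sketch below shows that scoring step on toy data; the single-component models, speaker names, and parameters are all illustrative assumptions, not material from the paper.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    frames: (T, D); weights: (M,); means, variances: (M, D)."""
    diff = frames[:, None, :] - means[None, :, :]              # (T, M, D)
    comp = (-0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
            - 0.5 * np.sum(diff**2 / variances, axis=2))       # (T, M)
    comp += np.log(weights)
    m = comp.max(axis=1, keepdims=True)                        # log-sum-exp
    return m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))

def identify(frames, speaker_models):
    """Pick the speaker whose model gives the highest mean frame score.
    Averaging over frames (or, as in the segmental approach, over
    segment scores) suppresses anomalies localized in time."""
    scores = {name: gmm_loglik(frames, *params).mean()
              for name, params in speaker_models.items()}
    return max(scores, key=scores.get)

# Two hypothetical one-component "GMMs" in a 2-D feature space.
models = {
    "alice": (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "bob": (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]])),
}
rng = np.random.default_rng(1)
frames = rng.normal(5.0, 1.0, size=(50, 2))   # test frames near "bob"
who = identify(frames, models)
```

The robust procedures and Bayesian confidence measure described in the abstract then operate on such scores, for example by trimming segments whose normalized score is anomalous before averaging.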