Linear Transformation


The experts below were selected from a list of 321 experts worldwide, as ranked by the ideXlab platform.

Hermann Ney - One of the best experts on this subject based on the ideXlab platform.

  • Vocal tract normalization equals Linear Transformation in cepstral space
    IEEE Transactions on Speech and Audio Processing, 2005
    Co-Authors: Michael Pitz, Sirko Molau, Ralf Schlüter, Hermann Ney
    Abstract:

    Vocal tract normalization (VTN) is a widely used speaker normalization technique that reduces the effect of varying vocal tract lengths and thereby improves the recognition accuracy of automatic speech recognition systems. We show that VTN results in a Linear Transformation in the cepstral domain, so that these two techniques, which so far have been considered independent approaches to speaker normalization, are in fact closely related. We are now able to compute the Jacobian determinant of the Transformation matrix, which allows the probability distributions used in speaker-normalized automatic speech recognition to be normalized properly. We also show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR), which explains previous experimental findings that the improvements obtained by VTN and subsequent MLLR are not always additive. For three typical warping functions we calculate the Transformation matrix analytically and show that the matrices are diagonally dominant and can therefore be approximated well by quindiagonal matrices.
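
    In symbols (a schematic sketch in our own notation, not the paper's): writing c for the cepstrum vector of a frame and A(α) for the warping-dependent Transformation matrix, the two central claims above amount to the warped cepstrum being a matrix product, with the Jacobian determinant entering the normalized acoustic score as an additive log term:

    ```latex
    % VTN with warping factor \alpha acts linearly on the cepstrum:
    \tilde{c} \;=\; A(\alpha)\, c
    % Change of variables: normalizing the score requires |\det A(\alpha)|
    % (sign and direction conventions vary with how the warp is applied):
    \log p_{\alpha}(c) \;=\; \log p\bigl(A(\alpha)\, c\bigr) \;+\; \log\bigl|\det A(\alpha)\bigr|
    ```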

  • Implementing frequency warping and VTLN through Linear Transformation of conventional MFCC
    Conference of the International Speech Communication Association, 2005
    Co-Authors: S Umesh, Andras Zolnay, Hermann Ney
    Abstract:

    In this paper, we show that frequency warping (including VTLN) can be implemented through a Linear Transformation of conventional MFCC. Unlike the continuous-domain approach of Pitz and Ney [1], we determine the relation between frequency warping and the Linear Transformation directly in the discrete domain. The advantage of this approach is that it can be applied to any frequency warping and is not limited to cases where an analytical closed-form solution can be found. The proposed method exploits the idea of bandlimited interpolation (in the frequency domain) to perform the necessary frequency warping, and it yields exact results as long as the cepstral coefficients are quefrency-limited. This quefrency-limitedness highlights the importance of the filter-bank smoothing of the spectra, which has been ignored in [1, 2]. Furthermore, unlike [1], since we operate in the discrete domain, we can apply the usual discrete cosine transform (i.e., DCT-II) to the logarithm of the filter-bank output to obtain conventional MFCC features. Using the proposed method, we can therefore Linearly transform conventional MFCC to perform VTLN, with no recomputation of the warped features. We provide experimental results in support of this approach.
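
    As a concrete illustration, here is a minimal numpy sketch of the construction as we read it. The assumptions are ours, not the paper's: a piecewise-linear warping function, a cosine (inverse-DCT) basis standing in for the bandlimited interpolation of the quefrency-limited log filter-bank output, and illustrative names throughout (make_vtln_matrix, warp_piecewise_linear); warp direction conventions also vary between implementations.

    ```python
    import numpy as np

    def dct2_matrix(n_cep, n_mel):
        """DCT-II basis taking n_mel log filter-bank outputs to n_cep cepstra
        (the scaling convention here is ours)."""
        k = np.arange(n_cep)[:, None]
        m = np.arange(n_mel)[None, :]
        return np.cos(np.pi * k * (m + 0.5) / n_mel)

    def warp_piecewise_linear(u, alpha, u0=0.875):
        """Piecewise-linear warp of normalized frequency u in [0, 1]: slope
        alpha below the knee u0, then a segment reaching 1 at 1 (illustrative)."""
        u = np.asarray(u, dtype=float)
        knee = alpha * u0
        out = np.where(u < u0, alpha * u,
                       knee + (1.0 - knee) * (u - u0) / (1.0 - u0))
        return np.clip(out, 0.0, 1.0)

    def make_vtln_matrix(n_cep, n_mel, alpha):
        """Build T with  c_warped ~= T @ c,  exact when the cepstra are
        quefrency-limited to n_cep coefficients."""
        C = dct2_matrix(n_cep, n_mel)                # log filter-bank -> cepstrum
        centres = (np.arange(n_mel) + 0.5) / n_mel   # channel centres in [0, 1]
        warped = warp_piecewise_linear(centres, alpha)
        k = np.arange(n_cep)[None, :]
        # Inverse-DCT (cosine) basis evaluated at the *warped* channel positions;
        # this is the interpolation step, exact for quefrency-limited cepstra:
        B = np.cos(np.pi * k * warped[:, None])      # shape (n_mel, n_cep)
        B[:, 0] *= 1.0 / n_mel
        B[:, 1:] *= 2.0 / n_mel
        return C @ B                                 # shape (n_cep, n_cep)

    # Usage: warp one conventional MFCC frame without rerunning the front end.
    rng = np.random.default_rng(0)
    c = rng.standard_normal(13)                      # stand-in for an MFCC frame
    T = make_vtln_matrix(n_cep=13, n_mel=23, alpha=1.1)
    c_warped = T @ c
    ```

    With alpha = 1 the warp is the identity and T reduces to the identity matrix by the orthogonality of the discrete cosines, which is a quick sanity check on the construction.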

  • Vocal tract normalization as Linear Transformation of MFCC
    Conference of the International Speech Communication Association, 2003
    Co-Authors: Michael Pitz, Hermann Ney
    Abstract:

    We have shown previously that vocal tract normalization (VTN) results in a Linear Transformation in the cepstral domain. In this paper we show that Mel-frequency warping can equally well be integrated into the framework of VTN as a Linear Transformation on the cepstrum. We give examples of Transformation matrices that obtain VTN-warped Mel-frequency cepstral coefficients (VTN-MFCC) as a Linear Transformation of the original MFCC, and we discuss the effect of Mel-frequency warping on the Jacobian determinant of the Transformation matrix. Finally, we show that there is a strong interdependence between VTN and Maximum Likelihood Linear Regression (MLLR) in the case of Gaussian emission probabilities.
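
    One way to picture the integration (again in our own notation): the VTN warp ψ_α and the Mel scale compose into a single frequency map, and the composite still acts on the cepstrum through one matrix, whose Jacobian determinant now also carries the Mel contribution:

    ```latex
    \mathrm{mel}(f) \;=\; 2595\,\log_{10}\!\Bigl(1 + \tfrac{f}{700}\Bigr),
    \qquad
    \tilde{\omega} \;=\; \bigl(\mathrm{mel}\circ\psi_{\alpha}\bigr)(\omega),
    \qquad
    \tilde{c} \;=\; A_{\mathrm{mel}}(\alpha)\, c
    ```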

Michael Pitz - One of the best experts on this subject based on the ideXlab platform.

  • Vocal tract normalization equals Linear Transformation in cepstral space
    IEEE Transactions on Speech and Audio Processing, 2005
    Co-Authors: Michael Pitz, Sirko Molau, Ralf Schlüter, Hermann Ney

  • Vocal tract normalization as Linear Transformation of MFCC
    Conference of the International Speech Communication Association, 2003
    Co-Authors: Michael Pitz, Hermann Ney

Eng Siong Chng - One of the best experts on this subject based on the ideXlab platform.

  • Generalization of temporal filter and Linear Transformation for robust speech recognition
    2014 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2014
    Co-Authors: Duc Hoang Ha Nguyen, Xiong Xiao, Eng Siong Chng
    Abstract:

    Temporal filtering of feature trajectories and Linear Transformation of feature vectors are two effective ways to compensate speech features for robust speech recognition in noisy and reverberant environments. In previous studies the two methods were usually applied in sequence, so the interaction between them was not optimized. In this paper, we propose a generalized transform that integrates the temporal filter and the Linear Transformation into a single process. The transform parameters are optimized to minimize an approximated Kullback-Leibler divergence between the distribution of the compensated features and the distribution represented by a clean reference model. The proposed method is evaluated on the Aurora-5 clean-condition training task. The experiments show that the generalized transform significantly outperforms a simple cascade of temporal filtering and Linear Transformation: in the speaker-based feature adaptation scheme, word accuracy improves from 81.55% (cascade) to 83.99% (generalized) in the office environment, and from 72.09% to 76.04% in the living-room environment.
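
    A minimal sketch of the idea as we read it, with stand-ins of our own choosing: a diagonal-Gaussian clean reference in place of the paper's reference model, an empirical Gaussian fitted to the compensated features, and scipy's generic optimizer in place of whatever update rule the paper derives.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def generalized_transform(params, X, n_taps):
        """y_t = sum_tau A[tau] @ x_{t-tau}: temporal filtering and Linear
        Transformation fused into one set of per-tap matrices (illustrative)."""
        T, D = X.shape
        A = params.reshape(n_taps, D, D)
        Xpad = np.vstack([np.repeat(X[:1], n_taps - 1, axis=0), X])  # edge-pad history
        Y = np.zeros_like(X)
        for tau in range(n_taps):
            Y += Xpad[n_taps - 1 - tau : n_taps - 1 - tau + T] @ A[tau].T
        return Y

    def kl_objective(params, X, mu_ref, var_ref, n_taps):
        """Approximate KL(N(mu, var) || N(mu_ref, var_ref)) between diagonal
        Gaussians: the first fitted to the compensated features, the second a
        crude stand-in for the clean reference model."""
        Y = generalized_transform(params, X, n_taps)
        mu, var = Y.mean(axis=0), Y.var(axis=0) + 1e-8
        return 0.5 * np.sum(np.log(var_ref / var)
                            + (var + (mu - mu_ref) ** 2) / var_ref - 1.0)

    # Toy usage: 4-dim features, 3 filter taps, identity initialization for tap 0.
    rng = np.random.default_rng(0)
    D, L, T = 4, 3, 500
    X = 2.0 * rng.standard_normal((T, D)) + 1.0   # stand-in for "noisy" features
    mu_ref, var_ref = np.zeros(D), np.ones(D)     # "clean" reference Gaussian
    A0 = np.zeros((L, D, D))
    A0[0] = np.eye(D)
    res = minimize(kl_objective, A0.ravel(), args=(X, mu_ref, var_ref, L),
                   method="L-BFGS-B")
    Y = generalized_transform(res.x, X, L)        # compensated feature trajectory
    ```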

Duc Hoang Ha Nguyen - One of the best experts on this subject based on the ideXlab platform.

  • Generalization of temporal filter and Linear Transformation for robust speech recognition
    2014 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2014
    Co-Authors: Duc Hoang Ha Nguyen, Xiong Xiao, Eng Siong Chng

Hynek Hermansky - One of the best experts on this subject based on the ideXlab platform.

  • Feature extraction using non-Linear Transformation for robust speech recognition on the Aurora database
    International Conference on Acoustics Speech and Signal Processing, 2000
    Co-Authors: Sangita Sharma, Daniel P W Ellis, Sachin S Kajarekar, Pratibha Jain, Hynek Hermansky
    Abstract:

    We evaluate the performance of several feature sets on the Aurora task as defined by ETSI. We show that, after a non-Linear Transformation, a number of features can be used effectively in an HMM-based recognition system. The non-Linear Transformation is computed using a neural network that is discriminatively trained on phonetically labeled (force-aligned) training data. A combination of the non-Linearly transformed PLP (perceptual Linear prediction), MSG (modulation-filtered spectrogram) and TRAP (temporal pattern) features yields a 63% improvement in error rate compared with baseline Mel-frequency cepstral coefficient features. The non-Linearly transformed RASTA-like features, with system parameters scaled down to satisfy the ETSI-imposed memory and latency constraints, still yield a 40% improvement in error rate.
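
    To make the recipe concrete, here is a toy sketch in the spirit of the tandem approach described above; everything specific in it is our assumption rather than the paper's system (sklearn's MLPClassifier as the discriminatively trained network, random stand-in frames, and log posteriors plus PCA decorrelation as the transformed features handed to a Gaussian HMM back end):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    # Stand-ins for force-aligned training frames: acoustic vectors + phone labels.
    n_frames, n_dims, n_phones = 2000, 39, 10
    X = rng.standard_normal((n_frames, n_dims))
    y = rng.integers(0, n_phones, size=n_frames)

    # Discriminatively trained network mapping each frame to phone posteriors.
    net = MLPClassifier(hidden_layer_sizes=(100,), max_iter=200, random_state=0)
    net.fit(X, y)

    # The non-Linear Transformation: log phone posteriors, decorrelated with PCA
    # so a conventional HMM with Gaussian emissions can model them.
    log_posteriors = np.log(net.predict_proba(X) + 1e-10)
    features = PCA(n_components=n_phones - 1).fit_transform(log_posteriors)
    ```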