Spectral Peak

The experts below are selected from a list of 21,708 experts worldwide ranked by the ideXlab platform.

Abeer Alwan - One of the best experts on this subject based on the ideXlab platform.

  • Speaker Adaptation With Limited Data Using Regression-Tree-Based Spectral Peak Alignment
    IEEE Transactions on Audio Speech and Language Processing, 2007
    Co-Authors: Shizhen Wang, Abeer Alwan
    Abstract:

    Spectral mismatch between training and testing utterances can cause significant degradation in the performance of automatic speech recognition (ASR) systems. Speaker adaptation and speaker normalization techniques are usually applied to address this issue. One way to reduce spectral mismatch is to reshape the spectrum by aligning corresponding formant peaks. There are various levels of mismatch in formant structures. In this paper, regression-tree-based phoneme- and state-level spectral peak alignment is proposed for rapid speaker adaptation using linearization of the vocal tract length normalization (VTLN) technique. This method is investigated in a maximum-likelihood linear regression (MLLR)-like framework, taking advantage of both the efficiency of frequency warping (VTLN) and the reliability of statistical estimations (MLLR). Two different regression classes are investigated: one based on phonetic classes (using combined knowledge and data-driven techniques) and the other based on Gaussian mixture classes. Compared to MLLR, VTLN, and global peak alignment, improved performance can be obtained for both supervised and unsupervised adaptations for both medium vocabulary (the RM1 database) and connected digits recognition (the TIDIGITS database) tasks. Performance improvements are largest with limited adaptation data which is often the case for ASR applications, and these improvements are shown to be statistically significant.
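The idea of reshaping a spectrum by aligning a formant peak can be illustrated with a minimal sketch. It covers only a single global linear warp, not the regression-tree, MLLR-like method of the paper. The function names, the synthetic Gaussian-shaped spectra, and the choice of one warp factor derived from the largest peak are all assumptions made for illustration.

```python
import math

def peak_index(spectrum):
    """Return the index of the largest bin in the spectrum."""
    return max(range(len(spectrum)), key=lambda i: spectrum[i])

def warp_spectrum(spectrum, alpha):
    """Linear frequency warp: warped[k] = spectrum[k / alpha], linearly interpolated."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        src = k / alpha                      # source position on the original axis
        lo = int(src)
        if lo >= n - 1:                      # past the last bin: hold the edge value
            warped.append(spectrum[-1])
        else:
            frac = src - lo
            warped.append((1 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return warped

def align_peak(spectrum, reference):
    """Choose alpha so the largest peak of `spectrum` lands on that of `reference`."""
    src_peak = peak_index(spectrum)
    if src_peak == 0:
        raise ValueError("peak at bin 0 cannot be aligned by a linear warp")
    alpha = peak_index(reference) / src_peak
    return alpha, warp_spectrum(spectrum, alpha)

# Hypothetical data: one formant-like bump at bin 20 (speaker) and bin 25 (reference).
speaker = [math.exp(-((k - 20) / 4.0) ** 2) for k in range(64)]
reference = [math.exp(-((k - 25) / 4.0) ** 2) for k in range(64)]
alpha, aligned = align_peak(speaker, reference)
```

A single warp factor can match only one peak exactly. That limitation is one motivation for estimating alignments per phonetic or mixture class, as the abstract describes.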

  • Rapid speaker adaptation using regression-tree based spectral peak alignment
    INTERSPEECH, 2006
    Co-Authors: Shizhen Wang, Abeer Alwan
    Abstract:

    In this paper, regression-tree based spectral peak alignment is proposed for rapid speaker adaptation using the linearization of VTLN. Two different regression classes are investigated: phonetic classes (using combined knowledge and data-driven techniques) and mixture classes. Compared to MLLR and VTLN, improved performance can be obtained for both supervised and unsupervised adaptations on both medium vocabulary and connected digits recognition tasks. To further improve the performance, MLLR was integrated into this regression-tree based peak alignment. Experimental results show that the performance improvements can be achieved even with limited adaptation data.

H. Pien - One of the best experts on this subject based on the ideXlab platform.

  • High-resolution biosensor spectral peak shift estimation
    IEEE Transactions on Signal Processing, 2005
    Co-Authors: W.C. Karl, H. Pien
    Abstract:

    In this paper, we present a maximum likelihood (ML) approach to high-resolution estimation of the shifts of a spectral signal. This spectral signal arises in application of optically based resonant biosensors, where high resolution in the estimation of signal shift is synonymous with high sensitivity to biological interactions. For the particular sensor of interest, the underlying signal is nonuniformly sampled and exhibits Poisson amplitude statistics. Shift estimation accuracies orders of magnitude finer than the sample spacing are sought. The new ML-based formulation leads to a solution approach different from typical resonance shift estimation methods based on polynomial fitting and peak (or null) estimation and tracking.
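As a rough illustration of ML shift estimation under Poisson statistics, the sketch below maximizes the Poisson log-likelihood over a grid of candidate shifts that is much finer than the sample spacing. The Lorentzian-plus-background template, the sample positions, the grid search, and every function name are hypothetical. The abstract does not specify the sensor model or the optimizer, and the demo uses noise-free expected counts so that the result is deterministic.

```python
import math

def template(x):
    """Hypothetical resonance shape: Lorentzian peak on a constant background."""
    return 50.0 + 400.0 / (1.0 + (x / 0.8) ** 2)

def poisson_loglik(counts, positions, shift):
    """Poisson log-likelihood of the counts for a given shift (constant terms dropped)."""
    total = 0.0
    for y, x in zip(counts, positions):
        lam = template(x - shift)            # expected count at this sample
        total += y * math.log(lam) - lam
    return total

def ml_shift(counts, positions, lo=-0.5, hi=0.5, step=0.001):
    """Grid-search ML estimate of the shift; `step` sets the attainable resolution."""
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda d: poisson_loglik(counts, positions, d))

# Nonuniform sample positions with an average spacing of about 0.5.
positions = [-5.0 + 0.5 * i + 0.1 * math.sin(i) for i in range(21)]
true_shift = 0.137
# Noise-free expected counts stand in for measured photon counts in this demo.
counts = [template(x - true_shift) for x in positions]
estimate = ml_shift(counts, positions)
```

With real Poisson-distributed counts the estimate would scatter around the true shift. A continuous optimizer could replace the grid search when finer resolution is needed.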

  • High-resolution biosensor spectral peak shift estimation
    The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003
    Co-Authors: W.C. Karl, H. Pien
    Abstract:

    In this work we present a maximum likelihood (ML) approach to high-resolution estimation of the shifts of a spectral signal. This spectral signal arises in application of optically-based resonant biosensors, where high-resolution in the estimation of signal shift is synonymous with high-sensitivity to biological interactions. The underlying signal is nonuniformly sampled and exhibits Poisson noise statistics. Shift estimation accuracies orders of magnitude finer than the sample spacing are sought. The ML-based formulation leads to a solution approach different from typical resonance shift estimation methods based on polynomial fitting and peak (or null) estimation and tracking.

T Brunelli - One of the best experts on this subject based on the ideXlab platform.

  • Speech perception abilities of adult and pediatric Nucleus implant recipients using the Spectral Peak (SPEAK) coding strategy
    Otolaryngology-Head and Neck Surgery, 1997
    Co-Authors: S Staller, C Menapace, E Domico, D Mills, R C Dowell, A Geers, S Pijl, S Hasenstab, M Justus, T Brunelli
    Abstract:

    A series of 73 postlinguistically deafened adults and 34 prelinguistically deafened children were evaluated with the Spectral Peak (SPEAK) coding strategy of the Nucleus 22-channel cochlear implant. The adults who received consecutive implants demonstrated rapid acquisition of open-set speech recognition skills in the initial postoperative period. Group mean sentence recognition improved to 53.5% (n = 52) after 2 weeks, 62.1% (n = 55) after 1 month, 69.8% (n = 57) after 3 months, and 74.4% (n = 42) after 6 months of use. At the 6-month evaluation interval, 43% of subjects scored greater than 90% on sound-alone sentence recognition in quiet and only one patient (2.4%) scored less than 10%. Mean monosyllabic word recognition was 35.6% after 6 months of use. The 34 prelinguistically deafened children were converted from the Multipeak strategy to Spectral Peak strategy at four large pediatric implant centers. After 6 months of using the new coding strategy, the children demonstrated significant improvements in their speech perception abilities. (Otolaryngol Head Neck Surg 1997;117:236-42.)

M. Magimai-Doss - One of the best experts on this subject based on the ideXlab platform.

  • HMM/ANN based spectral peak location estimation for noise robust speech recognition
    Proceedings of ICASSP '05, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005
    Co-Authors: S. Ikbal, H. Bourlard, M. Magimai-Doss
    Abstract:

    In this paper, we present an HMM/ANN based algorithm to estimate the spectral peak locations. This algorithm makes use of distinct time-frequency (TF) patterns in the spectrogram for estimating the peak locations. Such a use of TF patterns is expected to impose temporal constraints during the peak estimation task, thereby yielding a smoother estimate of the peaks over time. Additionally, the algorithm uses an ergodic topology for the HMM/ANN, thus allowing an estimation of a varying number of peak locations over time. The usefulness of the proposed algorithm is evaluated in the framework of a recently introduced noise robust feature called the spectro-temporal activity pattern (STAP) feature. Interestingly, the recently introduced phase autocorrelation (PAC) spectrum, with enhanced spectral peaks and smoothed spectral valleys, turns out to be more appropriate for this algorithm than the regular spectrum.