Excitation Source

The Experts below are selected from a list of 324 Experts worldwide ranked by ideXlab platform

B. Yegnanarayana - One of the best experts on this subject based on the ideXlab platform.

  • INTERSPEECH - Effect of Tongue Tip Trilling on the Glottal Excitation Source.
    2020
    Co-Authors: Vinay Kumar Mittal, N. Dhananjaya, B. Yegnanarayana
    Abstract:

    Recent studies have indicated that tongue tip trilling changes the glottal Excitation Source characteristics in addition to the vocal tract resonances. In this paper we study the significance of the changing vocal tract system and the associated glottal Excitation Source characteristics due to trilling, from a perceptual point of view. These studies are made by generating speech signals that retain either the features of the vocal tract system or those of the glottal Excitation Source of trill sounds. Experiments are conducted to understand the perceptual significance of the Excitation Source characteristics in the production of different trill sounds. Speech sounds of sustained trill and approximant pair, and apical trills produced at four different places of articulation, are considered. Features of the vocal tract system are extracted using linear prediction analysis, and those of the Source by zero frequency filtering.

  • INTERSPEECH - Tracking A Moving Speaker using Excitation Source Information
    2020
    Co-Authors: Vikas C. Raykar, B. Yegnanarayana, Ramani Duraiswami, S. R. Mahadeva Prasanna
    Abstract:

    Microphone arrays are widely used to detect, locate, and track a stationary or moving speaker. The first step is to estimate the time delay between the speech signals received by a pair of microphones. Conventional methods like generalized cross-correlation are based on the spectral content of the vocal tract system in the speech signal. The spectral content of the speech signal is affected due to degradations in the speech signal caused by noise and reverberation. However, features corresponding to the Excitation Source of speech are less affected by such degradations. This paper proposes a novel method to estimate the time delays using the Excitation Source information in speech. The estimated delays are used to get the position of the moving speaker. The proposed method is compared with the spectrum-based approach using real data from a microphone array setup.

  • INTERSPEECH - Enhancement of reverberant speech using Excitation Source information.
    2020
    Co-Authors: M. Chaitanya, S. R. Mahadeva Prasanna, B. Yegnanarayana
    Abstract:

    This paper proposes a method for the enhancement of reverberant speech using the knowledge of the Excitation Source of speech production. The degradation level in the reverberant speech is measured in terms of Speech-to-Reverberation component Ratio (SRR). From perception and processing point of view high SRR regions are important. Hence the proposed method identifies and enhances the speech in high SRR regions. The high SRR regions are identified using the Hilbert envelope of the Linear Prediction (LP) residual, which contains information about the Excitation Source of speech production. The Hilbert envelope of the LP residual derived from the reverberant speech is processed by the covariance analysis to derive the weight function. The LP residual of the reverberant speech is multiplied with the weight function to enhance the Excitations of speech in the high SRR regions. The speech signal synthesized from the modified LP residual is found to be less reverberant.

  • ISCSLP - A sparse representation of the Excitation Source characteristics of nonnormal speech sounds
    2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2016
    Co-Authors: Vinay Kumar Mittal, B. Yegnanarayana
    Abstract:

    The impulse-sequence representation of the Excitation Source information in normal speech signals has been explored for speech coding. Such a representation, if it can be developed for paralinguistic and emotional speech sounds, would help in their acoustic analyses. This paper proposes a sparse representation of the Excitation Source characteristics of nonnormal speech sounds, in terms of a time-domain sequence of impulses or impulse-like pulses. Using a recently proposed modified zero-frequency filtering method, an impulse sequence representation is obtained for the Excitation Source characteristics of nonnormal sounds in three categories: emotional speech, paralinguistic sounds and expressive voices. The effectiveness of the proposed representation is validated by an analysis-synthesis approach and perceptual evaluation of Noh voices. This representation can potentially help achieve a significant reduction in the signal storage and processing requirements. It can also be helpful in speech coding of nonnormal speech sounds.

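Several of the abstracts above use the linear prediction (LP) residual as a stand-in for the Excitation Source. As a rough illustration only, the sketch below computes an LP residual with the autocorrelation method and the Levinson-Durbin recursion. It is not taken from any of the listed papers: the function names, the LP order and the toy two-pole "voiced" signal are all assumptions made for this example.

```python
# Minimal sketch (assumed names throughout): LP residual by the autocorrelation method.

def autocorr(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for the predictor
    s_hat[n] = sum_k a[k] * s[n - k], k = 1..order.
    Returns (a[1..order], final prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lp_residual(s, order):
    """Return (predictor coefficients, residual), residual = s - prediction."""
    coeffs, _ = levinson_durbin(autocorr(s, order), order)
    res = []
    for n in range(len(s)):
        pred = sum(coeffs[k] * s[n - 1 - k] for k in range(order) if n - 1 - k >= 0)
        res.append(s[n] - pred)
    return coeffs, res


def synth_voiced(n_samples=400, period=50):
    """Toy signal (an assumption for the demo): impulse train through a 2-pole filter."""
    s = []
    for n in range(n_samples):
        e = 1.0 if n % period == 0 else 0.0
        s1 = s[n - 1] if n >= 1 else 0.0
        s2 = s[n - 2] if n >= 2 else 0.0
        s.append(1.3 * s1 - 0.8 * s2 + e)
    return s


if __name__ == "__main__":
    signal = synth_voiced()
    coeffs, residual = lp_residual(signal, order=2)
    peak = max(range(len(residual)), key=lambda n: abs(residual[n]))
    print("estimated predictor coefficients:", coeffs)
    print("largest residual sample at index:", peak)
```

On this toy signal the residual is close to the original impulse train, which is the sense in which the LP residual is said to carry Excitation Source information. Real speech would need framing, windowing and a higher LP order.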

S. Mahadeva R. Prasanna - One of the best experts on this subject based on the ideXlab platform.

  • Analysis of Excitation Source Characteristics for Shouted and Normal Speech Classification
    2020 National Conference on Communications (NCC), 2020
    Co-Authors: Shikha Baghel, S. Mahadeva R. Prasanna, Prithwijit Guha
    Abstract:

    The present work is aimed at analysing the Excitation Source characteristics of normal and shouted speech. In this context, we analyze the Differenced Electroglottogram (DEGG) signal corresponding to different vowels. This work proposes two novel Excitation Source features that are estimated from DEGG signal. These features are (a) Open Phase Triangle Area (OPTA) and (b) Flatness of Glottal Cycle (FoGC). OPTA captures the effect of open phase duration and slope of DEGG signal. FoGC measures the change in Source characteristics due to strength of Excitation (SoE) and pitch period. A practical issue in using the proposed features is the unavailability of DEGG signal in most speech processing applications. To overcome this problem, the integrated linear prediction residual (ILPR) signal estimated from speech is considered as an approximation of DEGG. We show that the proposed features can be computed from ILPR signal in the absence of DEGG. It is observed that the proposed features (estimated from either DEGG or ILPR) are successful in discriminating shouted from normal speech.

  • Excitation Source Feature for Discriminating Shouted and Normal Speech
    2018 International Conference on Signal Processing and Communications (SPCOM), 2018
    Co-Authors: Shikha Baghel, S. Mahadeva R. Prasanna, Prithwijit Guha
    Abstract:

    Dynamics of shouted speech production significantly vary from that of normal speech. These variations can be analyzed from Excitation Source information by using differenced electroglottogram (DEGG) signal. This work has two contributions. First, the proposal of a novel Glottal Open Phase Tilt (GOPT) feature derived from DEGG signal for discrimination of shouted and normal speech. Second, the construction of a database of speech and corresponding EGG signals for performance analysis of the proposed feature. In case of shouting, vocal folds vibrate faster and close abruptly. This leads to relative proximity of glottal opening and the following closing instances. This motivated the proposal of tilt feature for discriminating shouted from normal speech. The proposed feature is also extracted from ILPR signals that are known to approximate DEGG signals. Experiments on the collected dataset have provided shouted speech detection rate of 90.9% for DEGG and 76.37% for ILPR signals.

  • Speaker change detection using Excitation Source and vocal tract system information
    2015 Twenty First National Conference on Communications (NCC), 2015
    Co-Authors: Mousmita Sarma, Sree Nilendra Gadre, Biswajit Dev Sarma, S. Mahadeva R. Prasanna
    Abstract:

    The speaker change information in speech is due to both vocal tract and Excitation Source information. In this work, the Excitation Source information is extracted by computing cepstral features from the zero frequency filtered speech (ZFFS) signal. The vocal tract system information is extracted by computing cepstral features from the speech signal. The speaker change evidences obtained from these two feature sets are combined, and it is observed that they contain complementary information for speaker change detection. The popular distance metric based algorithms, Bayesian Information Criterion (BIC) and Kullback-Leibler Divergence (KLD), are used to detect the speaker change evidences. The Miss Detection Rate (MDR) of the BIC based algorithm using cepstral features obtained from speech is 24.18% and from ZFFS is 25.92%. When the two sets of evidences are combined, the MDR reduces to 15.89%. Similarly, the individual MDRs of the KLD based algorithm from speech and ZFFS are 32.24% and 45.17%, respectively, whereas the combination reduces the MDR to 19.67%. Experiments are also performed with noisy speech signals, and a similar reduction of MDR is observed. This demonstrates the usefulness of cepstral features from the Excitation Source signal for reducing MDR.

  • Analysis of Excitation Source information in emotional speech
    Conference of the International Speech Communication Association, 2010
    Co-Authors: S. Mahadeva R. Prasanna, D. Govind
    Abstract:

    The objective of this work is to analyze the effect of emotions on the Excitation Source of speech production. The neutral, angry, happy, boredom and fear emotions are considered for the study. Initially the electroglottogram (EGG) and its derivative signals are compared across different emotions. The mean, standard deviation and contour of instantaneous pitch, and strength of Excitation parameters are derived by processing the derivative of the EGG and also speech using the zero-frequency filtering (ZFF) approach. The comparative study of these features across different emotions reveals that the effect of emotions on the Excitation Source is distinct and significant. The comparative study of the parameters from the derivative of EGG and the speech waveform indicates that both cases have the same trend and range, implying that either of them may be used. The computed parameters are found to be effective in the prosodic modification task. Index Terms: Source, emotion, pitch, strength.

  • Non-parametric vector quantization of Excitation Source information for speaker recognition
    TENCON 2008 - 2008 IEEE Region 10 Conference, 2008
    Co-Authors: Debadatta Pati, S. Mahadeva R. Prasanna
    Abstract:

    The objective of this work is to demonstrate the feasibility of Excitation Source information obtained by non-parametric vector quantization (VQ) for speaker recognition task. Linear prediction (LP) residual is used as the representation of Excitation Source information. The LP residual is subjected to non-parametric VQ during training. The codebooks are built for different codebook sizes. The testing of these codebooks using the LP residual of testing speech data indeed demonstrates that a codebook of sufficiently large size uniquely represents the speaker and provides appreciable performance. The speaker recognition system built using conventional Mel frequency cepstral coefficients (MFCCs) representing vocal tract information combines well with the proposed speaker recognition system using Excitation Source information to provide improved performance. On a set of randomly chosen 30 speakers from the TIMIT database, the proposed system provides 75%, MFCC based system provides 95% and the combined one provides 98.33%.
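The speaker change detection abstract above scores candidate change points with the Bayesian Information Criterion. The sketch below shows one common form of the ΔBIC test, simplified to diagonal covariances so that it runs in plain Python. The simplification, the function names, the variance floor and the `penalty_weight` parameter are assumptions for illustration and are not details given in the abstract.

```python
import math

# Minimal sketch (assumed names): delta-BIC for one candidate change point,
# with each segment modelled by a diagonal-covariance Gaussian.

def _diag_logdet(frames, floor=1e-6):
    """Log-determinant of the diagonal ML covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for j in range(dim):
        column = [f[j] for f in frames]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        total += math.log(max(var, floor))   # floor avoids log(0) on constant data
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """Positive values favour two models (a change at `split`) over one."""
    left, right = frames[:split], frames[split:]
    n, n1, n2 = len(frames), len(left), len(right)
    dim = len(frames[0])
    gain = 0.5 * (n * _diag_logdet(frames)
                  - n1 * _diag_logdet(left)
                  - n2 * _diag_logdet(right))
    # The extra model adds `dim` means and `dim` variances.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


if __name__ == "__main__":
    same = [[-1.0], [1.0]] * 100                       # one homogeneous segment
    changed = [[-1.0], [1.0]] * 50 + [[4.0], [6.0]] * 50  # mean shift halfway
    print("no change:", delta_bic(same, 100))
    print("change   :", delta_bic(changed, 100))
```

In a detector of the kind the abstract describes, the feature vectors would be cepstral frames from speech or from the ZFFS signal, and the split point would be swept across a sliding window.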

S. R. Mahadeva Prasanna - One of the best experts on this subject based on the ideXlab platform.

  • Speaker Recognition from Excitation Source Perspective
    Iete Technical Review, 2020
    Co-Authors: Debadatta Pati, S. R. Mahadeva Prasanna
    Abstract:

    This paper gives a survey of different explorations carried out using speaker information present in the Excitation Source of speech for speaker recognition. The paper begins with an overview of the speaker recognition task. This is followed by a discussion on different speaker information present in speech, feature extraction methods, and types of Excitation Sources for speech production. Detailed descriptions on different explorations to exploit the speaker information in the Excitation Source are then given. These include methods based on pitch contour, jitter, shimmer, glottal flow derivative, linear prediction (LP) residual, LP residual phase, LP residual cepstrum, harmonic structure of the LP residual spectrum, and time frequency analysis of LP residual. A comparative study of all these methods is then carried out to highlight their merits and demerits. The paper is concluded by mentioning a future direction for speaker recognition from Excitation Source perspective.

  • INTERSPEECH - Tracking A Moving Speaker using Excitation Source Information
    2020
    Co-Authors: Vikas C. Raykar, B. Yegnanarayana, Ramani Duraiswami, S. R. Mahadeva Prasanna
    Abstract:

    Microphone arrays are widely used to detect, locate, and track a stationary or moving speaker. The first step is to estimate the time delay between the speech signals received by a pair of microphones. Conventional methods like generalized cross-correlation are based on the spectral content of the vocal tract system in the speech signal. The spectral content of the speech signal is affected due to degradations in the speech signal caused by noise and reverberation. However, features corresponding to the Excitation Source of speech are less affected by such degradations. This paper proposes a novel method to estimate the time delays using the Excitation Source information in speech. The estimated delays are used to get the position of the moving speaker. The proposed method is compared with the spectrum-based approach using real data from a microphone array setup.

  • INTERSPEECH - Enhancement of reverberant speech using Excitation Source information.
    2020
    Co-Authors: M. Chaitanya, S. R. Mahadeva Prasanna, B. Yegnanarayana
    Abstract:

    This paper proposes a method for the enhancement of reverberant speech using the knowledge of the Excitation Source of speech production. The degradation level in the reverberant speech is measured in terms of Speech-to-Reverberation component Ratio (SRR). From perception and processing point of view high SRR regions are important. Hence the proposed method identifies and enhances the speech in high SRR regions. The high SRR regions are identified using the Hilbert envelope of the Linear Prediction (LP) residual, which contains information about the Excitation Source of speech production. The Hilbert envelope of the LP residual derived from the reverberant speech is processed by the covariance analysis to derive the weight function. The LP residual of the reverberant speech is multiplied with the weight function to enhance the Excitations of speech in the high SRR regions. The speech signal synthesized from the modified LP residual is found to be less reverberant.

  • INTERSPEECH - Analysis of Excitation Source Information in Emotional Speech
    2020
    Co-Authors: S. R. Mahadeva Prasanna, D. Govind
    Abstract:

    The objective of this work is to analyze the effect of emotions on the Excitation Source of speech production. The neutral, angry, happy, boredom and fear emotions are considered for the study. Initially the electroglottogram (EGG) and its derivative signals are compared across different emotions. The mean, standard deviation and contour of instantaneous pitch, and strength of Excitation parameters are derived by processing the derivative of the EGG and also speech using the zero-frequency filtering (ZFF) approach. The comparative study of these features across different emotions reveals that the effect of emotions on the Excitation Source is distinct and significant. The comparative study of the parameters from the derivative of EGG and the speech waveform indicates that both cases have the same trend and range, implying that either of them may be used. The computed parameters are found to be effective in the prosodic modification task. Index Terms: Source, emotion, pitch, strength.

  • Exploration of Excitation Source information for shouted and normal speech classification
    Journal of the Acoustical Society of America, 2020
    Co-Authors: Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
    Abstract:

    Discrimination between shouted and normal speech is an essential prerequisite for many speech processing applications. Existing works have established that Excitation Source information plays a significant role in shouted speech production. In speech processing literature, various features have been proposed to model different aspects of the Excitation Source. The principal contribution of this work is to explore three such features, Discrete Cosine Transform of Integrated Linear Prediction Residual (DCT-ILPR), Mel-Power Difference of Spectrum in Sub-bands (MPDSS), and Residual Mel-Frequency Cepstral Coefficient (RMFCC), for shouted and normal speech classification. The DCT-ILPR feature represents the shape of the glottal cycle, MPDSS estimates the periodicity of the Excitation Source spectrum, and RMFCC characterizes smoothed spectral information of the Excitation Source. The authors have also contributed a dataset containing shouted and normal speech. This work is evaluated on three datasets and benchmarked against three baseline methods. Deep neural networks are used to study the classification performance of individual features and their combinations. The generalization performance of features (and combinations) is also investigated. Fusion of Excitation Source features with Mel-Frequency Cepstral Coefficients (MFCC) provides the best performance compared to other combinations. Noise analysis shows that adding Excitation features with MFCC+ Δ Δ provides a more robust classification system.
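The survey abstract at the top of this list names jitter and shimmer among the Excitation Source measures used for speaker recognition. As a small illustration, the sketch below computes the widely used "local" variants: the mean absolute difference between consecutive values, divided by the mean value. The function names are assumptions, and the pitch periods and peak amplitudes are assumed to have been measured already by some other means.

```python
# Minimal sketch (assumed names): local jitter and local shimmer.

def relative_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(values[i + 1] - values[i]) for i in range(len(values) - 1)]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Cycle-to-cycle variation of the pitch period (periods in seconds)."""
    return relative_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude of each glottal cycle."""
    return relative_perturbation(amplitudes)


if __name__ == "__main__":
    periods = [0.010, 0.011, 0.010, 0.011]      # hypothetical pitch periods
    amplitudes = [1.00, 0.90, 1.00, 0.90]       # hypothetical peak amplitudes
    print("jitter :", local_jitter(periods))
    print("shimmer:", local_shimmer(amplitudes))
```

Both values are dimensionless ratios and are often reported as percentages.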

Suryakanth V. Gangashetty - One of the best experts on this subject based on the ideXlab platform.

  • Analysis of laughter and speech-laugh signals using Excitation Source information
    2014 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2014
    Co-Authors: Sri Harsha Dumpala, Suryakanth V. Gangashetty, Karthik Venkat Sridaran, B. Yegnanarayana
    Abstract:

    Speech-laugh is a speech-synchronous form of laughter that often occurs in natural conversation. However, there are deviations in features of speech-laugh when compared with laughter and neutral speech individually. The objective of this study is to analyse the Excitation Source features to capture the deviations between laughter and speech-laughs in voiced regions. The features used in this analysis are based on instantaneous fundamental frequency and strength of Excitation (β) at epochs. Modified zero frequency filtering (ZFF) method is used to extract the features. Kullback-Leibler (KL) distances obtained show that there are deviations in Excitation Source features which can be exploited to develop a method to discriminate speech-laughs from laughter. Experimental results show that features used are robust and speaker independent in discriminating speech-laughs from laughter. Results showing deviations of laughter and speech-laughs from neutral speech were also presented.

  • Excitation Source features for discrimination of anger and happy emotions
    Proceedings of the Annual Conference of the International Speech Communication Association INTERSPEECH, 2014
    Co-Authors: P. Gangamohan, Suryakanth V. Gangashetty, Sudarsana Reddy Kadiri, B. Yegnanarayana
    Abstract:

    Studies on the emotion recognition task indicate that there is confusion in discrimination among higher activation states like 'anger' and 'happy'. In this study, features related to Excitation Source of speech are examined for discriminating 'anger' and 'happy' emotions. The objective is to explore the features which are independent of lexical content, language, channel and speaker. The features like strength of Excitation from zero frequency filtering method and spectral band magnitude energies from short-time spectral analysis are used. Experimental results show that these features can discriminate 'anger' and 'happy' emotion states to a good extent.
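The laughter study above compares feature distributions using Kullback-Leibler (KL) distances. The abstract does not say how the distributions are modelled, so the sketch below makes a loud assumption: each set of feature values (for example, instantaneous fundamental frequency at epochs) is fitted with a one-dimensional Gaussian, and the symmetric KL distance between the two fits is returned. All names and the sample values are hypothetical.

```python
import math

# Minimal sketch (assumed names): symmetric KL distance between 1-D Gaussian fits.

def gaussian_fit(values):
    """Maximum-likelihood mean and variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def kl_gauss(p, q):
    """KL(p || q) for 1-D Gaussians given as (mean, variance) pairs."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)


def symmetric_kl(a, b):
    """Symmetric KL distance between Gaussian fits of two sample sets."""
    p, q = gaussian_fit(a), gaussian_fit(b)
    return kl_gauss(p, q) + kl_gauss(q, p)


if __name__ == "__main__":
    f0_speech_laugh = [100.0, 110.0, 120.0, 130.0]   # hypothetical F0 values in Hz
    f0_laughter = [150.0, 160.0, 170.0, 180.0]       # hypothetical F0 values in Hz
    print("symmetric KL:", symmetric_kl(f0_speech_laugh, f0_laughter))
```

A larger distance indicates that the two classes are easier to separate on that feature. A histogram-based estimate would be an alternative when the Gaussian assumption is poor.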

Jian Wang - One of the best experts on this subject based on the ideXlab platform.

  • Design of a wideband Excitation Source for fast bioimpedance spectroscopy
    Measurement Science and Technology, 2011
    Co-Authors: Yuxiang Yang, Minhang Kang, Yong Lu, Jian Wang
    Abstract:

    Multi-frequency-one-time (MFOT) measurement of bioimpedance spectroscopy (BIS) can greatly reduce measurement time and capture the transient physiological status of a living body compared with the traditional one-frequency-one-time (OFOT) measurement technology, and a wideband Excitation Source mixed with multiple frequencies is a crucial part of MFOT measurement of BIS. This communication describes a design of a wideband Excitation Source. Firstly, a multi-frequency mixed (MFM) signal containing seven primary harmonics is synthesized based on Walsh functions. It is a periodic rectangular signal, and 68.9% of its energy is homogeneously distributed over its seven 2^n-th primary harmonics. Then the MFM signal is generated by a field programmable gate array (FPGA), and a unipolar-to-bipolar convertor (UBC) is designed to convert the unipolar signal into a bipolar signal. Finally, the bipolar MFM signal is driven by a voltage-controlled current Source (VCCS). A 2R-1C series model is adopted as the load of the VCCS, and the simulated voltage response on the load is obtained based on the theoretical analysis. Experiments show that the practical waveform on the load matches well with the theoretical analysis, which indicates that the VCCS performs well on the MFM signal. The design of the wideband Excitation Source establishes a good foundation for fast measurement of BIS.