Voice activity detection

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 4458 Experts worldwide ranked by ideXlab platform

Li Hua - One of the best experts on this subject based on the ideXlab platform.

  • Voice Activity Detection Based on Improved Discrete Wavelet Transform
    Computer Simulation, 2009
    Co-Authors: Li Hua
    Abstract:

    In speech processing systems, voice activity detection is very important. To improve its performance, a new voice activity detection algorithm based on an improved discrete wavelet transform is presented. First, wavelet coefficients are derived by applying the discrete wavelet transform to the input speech. The Teager energy operator (TEO) of the corresponding wavelet coefficients is then smoothed to extract two effective parameters, namely the power ratio and the power difference. Finally, the decision is made with an adaptive threshold. The algorithm is simulated in MATLAB, and the experimental results show that it copes well with low-SNR environments and outperforms detection methods based on cepstrum distance and spectral entropy.

  • Voice Activity Detection Based on Wavelet Packet Transform and Adaptive Threshold
    Computer Simulation, 2009
    Co-Authors: Li Hua
    Abstract:

    Voice activity detection is a crucial part of speech processing. Some traditional detection methods are ineffective at low SNR. To improve performance and robustness, a new voice activity detection algorithm based on the wavelet packet transform and an adaptive threshold is proposed for white-noise environments. The critical sub-band signals are obtained by the wavelet packet transform, the TEO of the corresponding wavelet coefficients is then smoothed to enhance the discrimination of speech components, and finally the adaptive threshold decision is carried out. The experimental results show that the new algorithm can efficiently distinguish the beginning and end points of voice segments in very low SNR environments.

  • Voice Activity Detection Based on Teager Energy Operator
    Journal of Chongqing Institute of Technology, 2007
    Co-Authors: Li Hua
    Abstract:

    Because most voice activity detection methods perform poorly at very low SNR, a new voice activity detection algorithm based on the Teager energy operator (TEO) is presented. It combines speech enhancement with the Teager energy operator to make adaptive threshold decisions. The experimental results show that the new algorithm adapts its threshold to the environment, is simple and efficient, and performs well and stably in very low SNR environments.
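
The three abstracts above all rely on a smoothed Teager energy operator followed by an adaptive threshold. As a rough illustration only, the sketch below applies the standard discrete TEO, psi[n] = x[n]^2 - x[n-1]*x[n+1], directly to a waveform. The moving-average smoother, the `detect` helper, and the "mean plus k standard deviations of a leading noise-only stretch" threshold rule are assumptions made for this example; the abstracts do not specify the authors' smoothing or threshold rule, and the wavelet and speech enhancement stages are omitted.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def smooth(values, width=5):
    """Centered moving average; near the edges it averages the samples available."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def detect(x, noise_len=50, k=3.0):
    """Flag samples whose smoothed TEO exceeds an adaptive threshold.

    Assumed rule (not from the papers): the threshold is the mean plus
    k standard deviations of the smoothed TEO over the first noise_len
    samples, which are taken to contain no speech.
    """
    teo = smooth(teager_energy(x))
    ref = teo[:noise_len]
    mean = sum(ref) / len(ref)
    std = math.sqrt(sum((v - mean) ** 2 for v in ref) / len(ref))
    threshold = mean + k * std
    return [v > threshold for v in teo]
```

For a pure tone A*sin(w*n) the operator returns the constant A^2 * sin(w)^2, so it tracks both amplitude and frequency, which is one reason it is used to separate speech from broadband noise.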

Israel Cohen - One of the best experts on this subject based on the ideXlab platform.

  • Kernel-Based Sensor Fusion with Application to Audio-Visual Voice Activity Detection
    IEEE Transactions on Signal Processing, 2016
    Co-Authors: David Dov, Ronen Talmon, Israel Cohen
    Abstract:

    In this paper, we address the problem of multiple view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, by relying particularly on a product of kernels constructed separately for each view. From a graph theory point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for the selection of the kernel bandwidth, a parameter, which, as we show, has important implications on the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed to the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.

  • audio visual voice activity detection using diffusion maps
    IEEE Transactions on Audio Speech and Language Processing, 2015
    Co-Authors: David Dov, Ronen Talmon, Israel Cohen
    Abstract:

    The performance of traditional voice activity detectors significantly deteriorates in the presence of highly nonstationary noise and transient interferences. One solution is to incorporate a video signal, which is invariant to the acoustic environment. Although several voice activity detectors based on the video signal were recently presented, only a few detectors based on both the audio and the video signals exist in the literature to date. In this paper, we present an audio-visual voice activity detector and show that the incorporation of both audio and video signals is highly beneficial for voice activity detection. The algorithm is based on a supervised learning procedure, and a labeled training data set is considered. The algorithm comprises a feature extraction procedure, where the features are designed to separate speech from nonspeech frames. Diffusion maps is applied separately and similarly to the features of each modality and builds a low-dimensional representation. Using the new representation, we propose a measure for voice activity which is based on a supervised learning procedure and the variability between adjacent frames in time. The measures of the two modalities are merged to provide voice activity detection based on both the audio and the video signals. Experimental results demonstrate the improved performance of the proposed algorithm compared to state-of-the-art detectors.

  • IWAENC - voice activity detection in transient noise environment using Laplacian pyramid algorithm
    2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), 2014
    Co-Authors: Nurit Spingarn, Saman Mousazadeh, Israel Cohen
    Abstract:

    Voice activity detection (VAD) has attracted significant research efforts in the last two decades. Despite much progress in designing voice activity detectors, voice activity detection in the presence of transient noise and low SNR is a challenging problem. In this paper, we propose a new VAD algorithm based on supervised learning. Our method employs the Laplacian pyramid algorithm as a tool for function extension. We estimate the likelihood ratio function of unlabeled data by extending the likelihood ratios obtained from the labeled data. Simulation results demonstrate the advantages of the proposed method in transient noise environments over conventional statistical methods.

  • voice activity detection in presence of transients using the scattering transform
    2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI), 2014
    Co-Authors: David Dov, Israel Cohen
    Abstract:

    Voice activity detection in the presence of highly non-stationary noise and transient interferences is an open problem. State-of-the-art voice activity detectors which are based on statistical models usually assume that noise is slowly varying with respect to speech. This assumption does not hold for transient interferences, which are short-time interruptions, and the performance of these detectors significantly deteriorates. In this paper, we propose a supervised learning algorithm for voice activity detection which is designed to perform in the presence of transients. We consider a labeled training set which comprises speech, background noise and transients, and propose a continuous measure for voice activity based on the Support Vector Machine (SVM) classifier. The measure of voice activity is constructed in a feature domain, where the features are based on the scattering transform, include noise estimation, and are designed to separate speech and non-speech frames. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art detectors for different types of background noise, and in particular accurately classifies frames which contain transient interferences.

  • Voice Activity Detection in Presence of Transient Noise Using Spectral Clustering
    IEEE Transactions on Audio Speech and Language Processing, 2013
    Co-Authors: Saman Mousazadeh, Israel Cohen
    Abstract:

    Voice activity detection (VAD) has attracted significant research efforts in the last two decades. Despite much progress in designing voice activity detectors, VAD in the presence of transient noise is a challenging problem. In this paper, we develop a novel VAD algorithm based on spectral clustering methods. We propose a VAD technique which is a supervised learning algorithm. This algorithm divides the input signal into two separate clusters (i.e., speech presence and speech absence frames). We use labeled data in order to adjust the parameters of the kernel used in spectral clustering methods for computing the similarity matrix. The parameters obtained in the training stage, together with the eigenvectors of the normalized Laplacian of the similarity matrix and a Gaussian mixture model (GMM), are utilized to compute the likelihood ratio needed for voice activity detection. Simulation results demonstrate the advantage of the proposed method compared to conventional statistical model-based VAD algorithms in the presence of transient noise.
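
Several of the abstracts above build a kernel-based similarity matrix over frames and, in the spectral clustering case, work with its normalized Laplacian. The sketch below shows only those two generic building blocks: a Gaussian kernel with bandwidth `sigma` and the symmetric normalized Laplacian I - D^(-1/2) W D^(-1/2). The function names and the choice of a plain Gaussian kernel on raw feature vectors are assumptions for illustration; the papers' trained kernel parameters, bandwidth selection, eigen-decomposition, and GMM stages are not reproduced here.

```python
import math

def gaussian_affinity(features, sigma):
    """Similarity matrix W[i][j] = exp(-||f_i - f_j||^2 / (2 * sigma^2)).

    features is a list of equal-length feature vectors, one per frame.
    sigma is the kernel bandwidth (a free parameter in this sketch).
    """
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w

def normalized_laplacian(w):
    """Symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2)."""
    n = len(w)
    deg = [sum(row) for row in w]  # node degrees (row sums of W)
    return [
        [(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(deg[i] * deg[j])
         for j in range(n)]
        for i in range(n)
    ]
```

A small bandwidth makes frames from different clusters nearly disconnected in W, while a large one connects everything, which is why the choice of this parameter matters for robustness.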

Mohammad Mehdi Homayounpour - One of the best experts on this subject based on the ideXlab platform.

  • a new approach for robust realtime voice activity detection using spectral pattern
    International Conference on Acoustics Speech and Signal Processing, 2010
    Co-Authors: Mohammad Hossein Moattar, Mohammad Mehdi Homayounpour, Nima Khademi Kalantari
    Abstract:

    In this paper a voice activity detection approach is proposed which applies a voting algorithm to decide on the existence of speech in an audio signal. For this purpose, the proposed approach uses three different short-time features along with the pattern of spectral peaks of every frame. The spectral peak pattern is appropriate for determining vowel sounds in a speech signal even in the presence of noise. Therefore this measure is applicable to voice activity detection, in which the vowels characterize the speech signal. Experiments show that incorporating this measure along with our recently proposed approach for VAD improves the results of the algorithm considerably while imposing little computational overhead. The proposed approach is evaluated on different datasets with various noises and SNR levels, and satisfactory results are achieved.

  • EUSIPCO - A simple but efficient real-time voice activity detection algorithm
    2009
    Co-Authors: Mohammad Hossein Moattar, Mohammad Mehdi Homayounpour
    Abstract:

    Voice activity detection (VAD) is a very important front-end processing step in all speech and audio processing applications. The performance of most if not all speech/audio processing methods is crucially dependent on the performance of voice activity detection. An ideal voice activity detector needs to be independent of application area and noise condition and to require the least parameter tuning in real applications. In this paper a nearly ideal VAD algorithm is proposed which is both easy to implement and noise robust compared to some previous methods. The proposed method uses short-term features such as Spectral Flatness (SF) and Short-term Energy. This makes the method appropriate for online processing tasks. The proposed method was evaluated on several speech corpora with additive noise and is compared with some of the most recently proposed algorithms. The experiments show satisfactory performance in various noise conditions.

  • A simple but efficient real-time voice activity detection algorithm
    European Signal Processing Conference, 2009
    Co-Authors: Mohammad Hossein Moattar, Mohammad Mehdi Homayounpour
    Abstract:

    Voice activity detection (VAD) is a very important front-end processing step in all speech and audio processing applications. The performance of most if not all speech/audio processing methods is crucially dependent on the performance of voice activity detection. An ideal voice activity detector needs to be independent of application area and noise condition and to require the least parameter tuning in real applications. In this paper a nearly ideal VAD algorithm is proposed which is both easy to implement and noise robust compared to some previous methods. The proposed method uses short-term features such as Spectral Flatness (SF) and Short-term Energy. This makes the method appropriate for online processing tasks. The proposed method was evaluated on several speech corpora with additive noise and is compared with some of the most recently proposed algorithms. The experiments show satisfactory performance in various noise conditions.
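
The two short-term features named in these abstracts, spectral flatness and short-term energy, can be computed per frame as in the following sketch. It uses the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum; the direct DFT, the `eps` guard, and the function names are assumptions for this example, and the papers' decision logic and thresholds are not shown.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of a real frame via a direct DFT (bins 0..N/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum. eps avoids log(0) on empty bins.
    """
    p = [v + eps for v in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic

def short_term_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)
```

Voiced speech frames tend to combine high energy with low flatness, whereas stationary background noise tends toward high flatness, so the two features complement each other.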

J W Pitton - One of the best experts on this subject based on the ideXlab platform.

  • voice activity detection using subband noncircularity
    International Conference on Acoustics Speech and Signal Processing, 2015
    Co-Authors: Scott Wisdom, Greg Okopal, Les Atlas, J W Pitton
    Abstract:

    Many voice activity detection (VAD) systems use the magnitude of complex-valued spectral representations. However, using only the magnitude often does not fully characterize the statistical behavior of the complex values. We present two novel methods for performing VAD on single- and dual-channel audio that do completely account for the second-order statistical behavior of complex data. Our methods exploit the second-order noncircularity (also known as impropriety) of complex subbands of speech and noise. Since speech tends to be more improper than noise, higher impropriety suggests speech activity. Our single-channel method is blind in the sense that it is unsupervised and, unlike many VAD systems, does not rely on non-speech periods for noise parameter estimation. Our methods achieve improved performance over other state-of-the-art magnitude-based VADs on the QUT-NOISE-TIMIT corpus, which indicates that impropriety is a compelling new feature for voice activity detection.

  • ICASSP - voice activity detection using subband noncircularity
    2015 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2015
    Co-Authors: Scott Wisdom, Greg Okopal, Les Atlas, J W Pitton
    Abstract:

    Many voice activity detection (VAD) systems use the magnitude of complex-valued spectral representations. However, using only the magnitude often does not fully characterize the statistical behavior of the complex values. We present two novel methods for performing VAD on single- and dual-channel audio that do completely account for the second-order statistical behavior of complex data. Our methods exploit the second-order noncircularity (also known as impropriety) of complex subbands of speech and noise. Since speech tends to be more improper than noise, higher impropriety suggests speech activity. Our single-channel method is blind in the sense that it is unsupervised and, unlike many VAD systems, does not rely on non-speech periods for noise parameter estimation. Our methods achieve improved performance over other state-of-the-art magnitude-based VADs on the QUT-NOISE-TIMIT corpus, which indicates that impropriety is a compelling new feature for voice activity detection.
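
Second-order noncircularity (impropriety) of a complex random variable is commonly summarized by the circularity coefficient |E[z^2]| / E[|z|^2], computed on zero-mean data; it is 0 for proper (circular) data and 1 for maximally improper data. The sketch below estimates that coefficient from a list of complex subband samples. The function name and the sample-mean estimator are assumptions for illustration, and the detection statistics used in the paper are not reproduced.

```python
def circularity_coefficient(z):
    """Estimate |E[z^2]| / E[|z|^2] for complex samples z after mean removal.

    Returns a value in [0, 1]: near 0 for proper (circular) data,
    near 1 for strongly improper (noncircular) data.
    """
    n = len(z)
    mean = sum(z) / n
    centered = [v - mean for v in z]
    variance = sum(abs(v) ** 2 for v in centered) / n          # E[|z|^2]
    pseudo_variance = sum(v * v for v in centered) / n         # E[z^2]
    if variance == 0:
        return 0.0
    return abs(pseudo_variance) / variance
```

Purely real samples give a coefficient of 1, while samples spread evenly around a circle in the complex plane give a coefficient close to 0.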

Mohammad Hossein Moattar - One of the best experts on this subject based on the ideXlab platform.

  • a new approach for robust realtime voice activity detection using spectral pattern
    International Conference on Acoustics Speech and Signal Processing, 2010
    Co-Authors: Mohammad Hossein Moattar, Mohammad Mehdi Homayounpour, Nima Khademi Kalantari
    Abstract:

    In this paper a voice activity detection approach is proposed which applies a voting algorithm to decide on the existence of speech in an audio signal. For this purpose, the proposed approach uses three different short-time features along with the pattern of spectral peaks of every frame. The spectral peak pattern is appropriate for determining vowel sounds in a speech signal even in the presence of noise. Therefore this measure is applicable to voice activity detection, in which the vowels characterize the speech signal. Experiments show that incorporating this measure along with our recently proposed approach for VAD improves the results of the algorithm considerably while imposing little computational overhead. The proposed approach is evaluated on different datasets with various noises and SNR levels, and satisfactory results are achieved.

  • EUSIPCO - A simple but efficient real-time voice activity detection algorithm
    2009
    Co-Authors: Mohammad Hossein Moattar, Mohammad Mehdi Homayounpour
    Abstract:

    Voice activity detection (VAD) is a very important front-end processing step in all speech and audio processing applications. The performance of most if not all speech/audio processing methods is crucially dependent on the performance of voice activity detection. An ideal voice activity detector needs to be independent of application area and noise condition and to require the least parameter tuning in real applications. In this paper a nearly ideal VAD algorithm is proposed which is both easy to implement and noise robust compared to some previous methods. The proposed method uses short-term features such as Spectral Flatness (SF) and Short-term Energy. This makes the method appropriate for online processing tasks. The proposed method was evaluated on several speech corpora with additive noise and is compared with some of the most recently proposed algorithms. The experiments show satisfactory performance in various noise conditions.

  • A simple but efficient real-time voice activity detection algorithm
    European Signal Processing Conference, 2009
    Co-Authors: Mohammad Hossein Moattar, Mohammad Mehdi Homayounpour
    Abstract:

    Voice activity detection (VAD) is a very important front-end processing step in all speech and audio processing applications. The performance of most if not all speech/audio processing methods is crucially dependent on the performance of voice activity detection. An ideal voice activity detector needs to be independent of application area and noise condition and to require the least parameter tuning in real applications. In this paper a nearly ideal VAD algorithm is proposed which is both easy to implement and noise robust compared to some previous methods. The proposed method uses short-term features such as Spectral Flatness (SF) and Short-term Energy. This makes the method appropriate for online processing tasks. The proposed method was evaluated on several speech corpora with additive noise and is compared with some of the most recently proposed algorithms. The experiments show satisfactory performance in various noise conditions.