Audio Stream

The Experts below are selected from a list of 9,561 Experts worldwide, ranked by the ideXlab platform

Elizabeth A Croft - One of the best experts on this subject based on the ideXlab platform.

  • Galvanic skin response-derived bookmarking of an Audio Stream
    Human Factors in Computing Systems, 2011
    Co-Authors: Matthew K X J Pan, Gordon Jihshiang Chang, Gokhan H Himmetoglu, Ajung Moon, Thomas W Hazelton, Karon E Maclean, Elizabeth A Croft
    Abstract:

    We demonstrate a novel interaction paradigm driven by implicit, low-attention user control, accomplished by monitoring a user's physiological state. We have designed and prototyped this interaction for a first use case of bookmarking an Audio Stream, to holistically explore the implicit interaction concept. A listener's galvanic skin response (GSR) is monitored for orienting responses (ORs) to external interruptions; our research prototype then automatically bookmarks the media so that the user can attend to the interruption, then resume listening from the point at which he or she was interrupted.
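
    As a rough illustration of the idea above, here is a minimal sketch in which an orienting response is taken to be a sharp rise in skin conductance over a short window. The sampling rate, threshold, and function names are all assumptions, not the authors' implementation.

```python
# Hypothetical sketch: watch a galvanic skin response (GSR) signal for
# orienting responses (ORs) and bookmark the playback position when one
# occurs. Threshold, window, and sample rate are invented for the demo.

def detect_orienting_responses(gsr, rise_threshold=0.05, window=4):
    """Flag sample indices where conductance rises sharply.

    gsr    -- list of skin-conductance samples (microsiemens)
    window -- number of samples over which the rise is measured
    """
    bookmarks = []
    for i in range(window, len(gsr)):
        if gsr[i] - gsr[i - window] > rise_threshold:
            bookmarks.append(i)          # OR detected at sample i
    return bookmarks

def samples_to_seconds(indices, sample_rate_hz=10):
    return [i / sample_rate_hz for i in indices]

# A flat signal with one abrupt rise starting around sample 50:
signal = [2.00] * 50 + [2.00 + 0.02 * k for k in range(1, 11)] + [2.20] * 40
marks = detect_orienting_responses(signal)
print(samples_to_seconds(marks[:1]))  # time of the first bookmark, in seconds
```

    In a real system the bookmark would be set a second or two *before* the detected response, since the OR lags the interruption itself.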

Hong-jiang Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Highlight sound effects detection in Audio Stream
    2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003
    Co-Authors: Rui Cai, Hong-jiang Zhang, Lie Lu, Lian-hong Cai
    Abstract:

    This paper addresses the problem of highlight sound effect detection in an Audio Stream, which is very useful for video summarization and highlight extraction. Unlike research on Audio segmentation and classification, this task only locates highlight sound effects in the Audio Stream. An extensible framework is proposed; in the current system three sound effects are considered: laughter, applause and cheer, which are tied to highlight events in entertainment, sports, meetings and home videos. HMMs are used to model these sound effects, and a method based on log-likelihood scores is used to make the final decision. A sound effect attention model is also proposed to extend the general Audio attention model for highlight extraction and video summarization. Evaluations on a 2-hour Audio database showed very encouraging results.
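
    The score-and-threshold decision rule can be sketched as follows. The paper scores windows of Audio features with one HMM per sound effect; in this stand-in each HMM is replaced by a single Gaussian over a 1-D feature so the sketch stays short and self-contained. All means, feature values and the threshold are made up.

```python
# Simplified stand-in for the detection pipeline: per-class log-likelihood
# scoring followed by a threshold on the best score. The real system uses
# HMMs over feature sequences; here each class is one Gaussian.
import math

CLASS_MODELS = {                      # (mean, std) of a 1-D feature
    "laughter": (0.8, 0.1),
    "applause": (0.3, 0.1),
    "cheer":    (0.55, 0.1),
}

def log_likelihood(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean)**2 / (2 * std**2)

def detect_highlights(features, threshold=-1.0):
    """Label each frame with the best-scoring effect, or None."""
    labels = []
    for x in features:
        scores = {c: log_likelihood(x, m, s) for c, (m, s) in CLASS_MODELS.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] > threshold else None)
    return labels

frames = [0.79, 0.31, 0.05]
print(detect_highlights(frames))  # ['laughter', 'applause', None]
```

    The threshold is what makes this *detection* rather than classification: frames that no model explains well are rejected instead of being forced into a class.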

  • A robust Audio classification and segmentation method
    ACM Multimedia, 2001
    Co-Authors: Hao Jiang, Hong-jiang Zhang
    Abstract:

    In this paper, we present a robust algorithm for Audio classification that is capable of segmenting and classifying an Audio Stream into speech, music, environment sound and silence. Audio classification is processed in two steps, which makes it suitable for different applications. The first step of the classification is speech and non-speech discrimination; in this step, a novel algorithm based on KNN and LSP VQ is presented. The second step further divides the non-speech class into music, environment sounds and silence with a rule-based classification scheme. Some new features, such as the noise frame ratio and band periodicity, are introduced and discussed in detail. Our experiments in the context of video structure parsing have shown that the algorithm produces very satisfactory results.
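
    The two-step scheme can be sketched as a KNN speech/non-speech discriminator followed by rules over simple features. The training points, feature values and rule thresholds below are invented; the paper's actual features (LSP vectors, noise frame ratio, band periodicity) are richer than this.

```python
# Sketch of the two-step classification: step 1 is a KNN speech test,
# step 2 applies rules (energy, band periodicity) to non-speech frames.
from collections import Counter

TRAIN = [((0.9, 0.2), "speech"), ((0.8, 0.3), "speech"),
         ((0.2, 0.9), "non-speech"), ((0.1, 0.8), "non-speech")]

def knn(x, k=3):
    """Majority vote among the k nearest labelled training points."""
    nearest = sorted(TRAIN, key=lambda p: sum((a - b)**2 for a, b in zip(x, p[0])))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

def classify(feature_vec, energy, band_periodicity):
    """Step 1: KNN speech test; step 2: rules over the non-speech class."""
    if knn(feature_vec) == "speech":
        return "speech"
    if energy < 0.05:                 # near-silent frame
        return "silence"
    if band_periodicity > 0.6:        # strongly periodic -> music
        return "music"
    return "environment sound"

print(classify((0.15, 0.85), energy=0.4, band_periodicity=0.7))  # music
```

    Splitting the decision this way lets an application that only needs speech/non-speech stop after step 1, which is presumably why the authors call the design suitable for different applications.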

Shigeki Sagayama - One of the best experts on this subject based on the ideXlab platform.

  • Audio Stream segregation of multi-pitch music signal based on time-space clustering using Gaussian kernel 2-dimensional model
    International Conference on Acoustics Speech and Signal Processing, 2005
    Co-Authors: Hirokazu Kameoka, Takuya Nishimoto, Shigeki Sagayama
    Abstract:

    The paper describes a novel approach for Audio Stream segregation of a multi-pitch music signal. We propose a parameter-constrained time-frequency spectrum model expressing both a harmonic spectral structure and a temporal curve of the power envelope with Gaussian kernels. MAP estimation of the model parameters using the EM algorithm provides fundamental frequency, onset and offset time, spectral envelope and power envelope of every underlying Audio Stream. Our proposed method showed high accuracy in a pitch name estimation task of several pieces of real music performance data.
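
    A toy version of the harmonic part of the model: one Audio Stream is represented by Gaussian kernels placed at integer multiples of a fundamental frequency F0. The amplitudes and kernel widths here are assumptions; the paper fits such parameters (plus a temporal power envelope) by MAP estimation with the EM algorithm over a full spectrogram.

```python
# Toy harmonic spectral model: a Gaussian-kernel comb over frequency.
import math

def harmonic_model(freq_hz, f0_hz, n_harmonics=4, sigma_hz=10.0):
    """Spectral energy at freq_hz under a Gaussian-kernel harmonic comb."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        amp = 1.0 / h                         # decaying harmonic amplitudes
        total += amp * math.exp(-((freq_hz - h * f0_hz)**2) / (2 * sigma_hz**2))
    return total

# Energy concentrates at the harmonics of a 220 Hz fundamental:
print(round(harmonic_model(220.0, 220.0), 3))   # at F0
print(round(harmonic_model(440.0, 220.0), 3))   # at 2*F0 (half amplitude)
print(round(harmonic_model(330.0, 220.0), 3))   # between harmonics: ~0
```

    Segregation then amounts to explaining each observed time-frequency point as a mixture of such combs, one per Stream, with EM resolving which Stream each point belongs to.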

Matthew K X J Pan - One of the best experts on this subject based on the ideXlab platform.

  • Galvanic skin response-derived bookmarking of an Audio Stream
    Human Factors in Computing Systems, 2011
    Co-Authors: Matthew K X J Pan, Gordon Jihshiang Chang, Gokhan H Himmetoglu, Ajung Moon, Thomas W Hazelton, Karon E Maclean, Elizabeth A Croft
    Abstract:

    We demonstrate a novel interaction paradigm driven by implicit, low-attention user control, accomplished by monitoring a user's physiological state. We have designed and prototyped this interaction for a first use case of bookmarking an Audio Stream, to holistically explore the implicit interaction concept. A listener's galvanic skin response (GSR) is monitored for orienting responses (ORs) to external interruptions; our research prototype then automatically bookmarks the media so that the user can attend to the interruption, then resume listening from the point at which he or she was interrupted.

Claude Barras - One of the best experts on this subject based on the ideXlab platform.

  • Neural speech turn segmentation and affinity propagation for speaker diarization
    2018
    Co-Authors: Ruiqing Yin, Hervé Bredin, Claude Barras
    Abstract:

    Speaker diarization is the task of determining "who speaks when" in an Audio Stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNNs) for SAD and SCD, we propose to address re-segmentation with long short-term memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular hierarchical agglomerative clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French broadcast dataset ETAPE, where we reach state-of-the-art performance.
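
    The clustering stage can be sketched as grouping speech-turn embeddings by similarity. The paper uses affinity propagation over neural embeddings; this stand-in uses greedy cosine-similarity clustering so it runs without any ML dependency. The embeddings and the similarity threshold are invented.

```python
# Stand-in for the speech-turn clustering step: turns whose embeddings
# point in nearly the same direction are grouped under one speaker label.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster_turns(embeddings, threshold=0.9):
    """Assign each turn to the first cluster whose seed is similar enough."""
    seeds, labels = [], []
    for e in embeddings:
        for idx, s in enumerate(seeds):
            if cosine(e, s) >= threshold:
                labels.append(idx)
                break
        else:                          # no similar cluster -> new speaker
            seeds.append(e)
            labels.append(len(seeds) - 1)
    return labels

turns = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99)]
print(cluster_turns(turns))  # [0, 0, 1, 1] -- two speakers
```

    Unlike HAC, affinity propagation chooses the number of clusters (speakers) automatically by electing exemplars; this greedy version approximates that behaviour with a fixed threshold instead.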

  • Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks
    2017
    Co-Authors: Ruiqing Yin, Hervé Bredin, Claude Barras
    Abstract:

    Speaker change detection is an important step in a speaker diarization system. It aims at finding speaker change points in the Audio Stream. In this paper, it is treated as a sequence labeling task and addressed with bidirectional long short-term memory networks (Bi-LSTMs). The system is trained and evaluated on the Broadcast TV subset of the ETAPE database. The results show that the proposed model brings a clear improvement over conventional methods based on BIC and Gaussian divergence. For instance, in comparison to Gaussian divergence, it produces speech turns that are 19.5% longer on average, with the same level of purity.
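
    The BIC baseline mentioned above can be sketched for 1-D features: a change point between two windows is accepted when modelling them as two Gaussians beats one Gaussian by more than a complexity penalty. The feature values and the penalty weight below are invented; real systems apply this to multivariate cepstral features.

```python
# Delta-BIC change detection between two adjacent feature windows.
import math

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def delta_bic(left, right, penalty_weight=1.0):
    """Positive value -> the two-model split is preferred (change point)."""
    both = left + right
    n, n1, n2 = len(both), len(left), len(right)
    gain = (n / 2) * math.log(variance(both)) \
         - (n1 / 2) * math.log(variance(left)) \
         - (n2 / 2) * math.log(variance(right))
    penalty = penalty_weight * 0.5 * 2 * math.log(n)  # 2 extra parameters
    return gain - penalty

same = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
shifted = [5.0, 5.1, 4.9, 5.0, 5.05, 4.95]
print(delta_bic(same, shifted) > 0)   # True: clear change
print(delta_bic(same, same) > 0)      # False: no change
```

    A Bi-LSTM replaces this hand-built statistic with a learned per-frame change probability, which is what lets it trade off purity against turn length more favourably.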

  • Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker Diarization
    2017
    Co-Authors: Guillaume Wisniewksi, Hervé Bredin, Grégory Gelly, Claude Barras
    Abstract:

    Real-time speaker diarization has many potential applications, including public security, biometrics and forensics. It can also significantly speed up the indexing of increasingly large multimedia archives. In this paper, we address the issue of low-latency speaker diarization, which consists in continuously detecting new or reoccurring speakers within an Audio Stream and determining when each speaker is active with a low latency (e.g. every second). This is in contrast with most existing approaches to speaker diarization, which rely on multiple passes over the complete Audio recording. The proposed approach combines speaker turn neural embeddings with an incremental structure prediction approach inspired by state-of-the-art natural language processing models for part-of-speech tagging and dependency parsing. It can therefore leverage both information describing the utterance and the inherent temporal structure of interactions between speakers to learn, in a supervised framework, to identify speakers. Experiments on the ETAPE broadcast news benchmark validate the approach.
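
    One way to read "low-latency" here: each incoming turn embedding is either assigned to the closest known speaker or opens a new one, without ever revisiting past Audio. The distance metric, threshold and running-mean update below are assumptions, not the paper's learned structure-prediction model.

```python
# Minimal online diarizer: greedy nearest-speaker assignment with a
# "new speaker" decision, processing one turn embedding at a time.
import math

class OnlineDiarizer:
    def __init__(self, max_dist=0.5):
        self.centroids = []   # one running-mean embedding per speaker
        self.counts = []
        self.max_dist = max_dist

    def observe(self, emb):
        """Return a speaker id for one embedding, creating ids as needed."""
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = math.dist(emb, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.max_dist:
            self.centroids.append(list(emb))       # open a new speaker
            self.counts.append(1)
            return len(self.centroids) - 1
        n = self.counts[best] + 1                  # update running mean
        self.centroids[best] = [(c * (n - 1) + e) / n
                                for c, e in zip(self.centroids[best], emb)]
        self.counts[best] = n
        return best

d = OnlineDiarizer()
stream = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (0.05, 0.05), (2.1, 1.9)]
print([d.observe(e) for e in stream])  # [0, 0, 1, 0, 1]
```

    The greedy decision is irrevocable, which is exactly the weakness the paper's incremental structure prediction (beam-style search over assignment sequences) is designed to mitigate.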