Audio Stream

The Experts below are selected from a list of 9,561 Experts worldwide, ranked by the ideXlab platform

Elizabeth A Croft - One of the best experts on this subject based on the ideXlab platform.

  • Galvanic skin response-derived bookmarking of an Audio Stream
    Human Factors in Computing Systems, 2011
    Co-Authors: Matthew K X J Pan, Gordon Jihshiang Chang, Gokhan H Himmetoglu, Ajung Moon, Thomas W Hazelton, Karon E Maclean, Elizabeth A Croft
    Abstract:

    We demonstrate a novel interaction paradigm driven by implicit, low-attention user control, accomplished by monitoring a user's physiological state. We have designed and prototyped this interaction for a first use case of bookmarking an Audio Stream, to holistically explore the implicit interaction concept. A listener's galvanic skin response (GSR) is monitored for orienting responses (ORs) to external interruptions; our research prototype then automatically bookmarks the media so that the user can attend to the interruption, then resume listening from the point at which he or she was interrupted.
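
    As a rough illustration of the idea above, here is a minimal sketch in which an orienting response is taken to be a sharp rise in skin conductance over a short window. The sampling rate, threshold, and function names are all assumptions, not the authors' implementation.

```python
# Hypothetical sketch: watch a galvanic skin response (GSR) signal for
# orienting responses (ORs) and bookmark the playback position when one
# occurs. Threshold, window, and sample rate are invented for the demo.

def detect_orienting_responses(gsr, rise_threshold=0.05, window=4):
    """Flag sample indices where conductance rises sharply.

    gsr    -- list of skin-conductance samples (microsiemens)
    window -- number of samples over which the rise is measured
    """
    bookmarks = []
    for i in range(window, len(gsr)):
        if gsr[i] - gsr[i - window] > rise_threshold:
            bookmarks.append(i)          # OR detected at sample i
    return bookmarks

def samples_to_seconds(indices, sample_rate_hz=10):
    return [i / sample_rate_hz for i in indices]

# A flat signal with one abrupt rise starting around sample 50:
signal = [2.00] * 50 + [2.00 + 0.02 * k for k in range(1, 11)] + [2.20] * 40
marks = detect_orienting_responses(signal)
print(samples_to_seconds(marks[:1]))  # time of the first bookmark, in seconds
```

    In a real system the bookmark would be set a second or two *before* the detected response, since the OR lags the interruption itself.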

Hong-jiang Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Highlight sound effects detection in Audio Stream
    2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003
    Co-Authors: Rui Cai, Hong-jiang Zhang, Lie Lu, Lian-hong Cai
    Abstract:

    This paper addresses the problem of highlight sound effect detection in an Audio Stream, which is very useful for video summarization and highlight extraction. Unlike research on Audio segmentation and classification, this task only locates highlight sound effects in the Audio Stream. An extensible framework is proposed; in the current system three sound effects are considered: laughter, applause and cheer, which are tied to highlight events in entertainment, sports, meetings and home videos. HMMs are used to model these sound effects, and a method based on log-likelihood scores is used to make the final decision. A sound effect attention model is also proposed to extend the general Audio attention model for highlight extraction and video summarization. Evaluations on a 2-hour Audio database showed very encouraging results.
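
    The score-and-threshold decision rule can be sketched as follows. The paper scores windows of Audio features with one HMM per sound effect; in this stand-in each HMM is replaced by a single Gaussian over a 1-D feature so the sketch stays short and self-contained. All means, feature values and the threshold are made up.

```python
# Simplified stand-in for the detection pipeline: per-class log-likelihood
# scoring followed by a threshold on the best score. The real system uses
# HMMs over feature sequences; here each class is one Gaussian.
import math

CLASS_MODELS = {                      # (mean, std) of a 1-D feature
    "laughter": (0.8, 0.1),
    "applause": (0.3, 0.1),
    "cheer":    (0.55, 0.1),
}

def log_likelihood(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean)**2 / (2 * std**2)

def detect_highlights(features, threshold=-1.0):
    """Label each frame with the best-scoring effect, or None."""
    labels = []
    for x in features:
        scores = {c: log_likelihood(x, m, s) for c, (m, s) in CLASS_MODELS.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] > threshold else None)
    return labels

frames = [0.79, 0.31, 0.05]
print(detect_highlights(frames))  # ['laughter', 'applause', None]
```

    The threshold is what makes this *detection* rather than classification: frames that no model explains well are rejected instead of being forced into a class.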

  • A robust Audio classification and segmentation method
    ACM Multimedia, 2001
    Co-Authors: Hao Jiang, Hong-jiang Zhang
    Abstract:

    In this paper, we present a robust algorithm for Audio classification that is capable of segmenting and classifying an Audio Stream into speech, music, environment sound and silence. Audio classification is processed in two steps, which makes it suitable for different applications. The first step of the classification is speech and non-speech discrimination; in this step, a novel algorithm based on KNN and LSP VQ is presented. The second step further divides the non-speech class into music, environment sounds and silence with a rule-based classification scheme. Some new features, such as the noise frame ratio and band periodicity, are introduced and discussed in detail. Our experiments in the context of video structure parsing have shown that the algorithm produces very satisfactory results.
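
    The two-step scheme can be sketched as a KNN speech/non-speech discriminator followed by rules over simple features. The training points, feature values and rule thresholds below are invented; the paper's actual features (LSP vectors, noise frame ratio, band periodicity) are richer than this.

```python
# Sketch of the two-step classification: step 1 is a KNN speech test,
# step 2 applies rules (energy, band periodicity) to non-speech frames.
from collections import Counter

TRAIN = [((0.9, 0.2), "speech"), ((0.8, 0.3), "speech"),
         ((0.2, 0.9), "non-speech"), ((0.1, 0.8), "non-speech")]

def knn(x, k=3):
    """Majority vote among the k nearest labelled training points."""
    nearest = sorted(TRAIN, key=lambda p: sum((a - b)**2 for a, b in zip(x, p[0])))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

def classify(feature_vec, energy, band_periodicity):
    """Step 1: KNN speech test; step 2: rules over the non-speech class."""
    if knn(feature_vec) == "speech":
        return "speech"
    if energy < 0.05:                 # near-silent frame
        return "silence"
    if band_periodicity > 0.6:        # strongly periodic -> music
        return "music"
    return "environment sound"

print(classify((0.15, 0.85), energy=0.4, band_periodicity=0.7))  # music
```

    Splitting the decision this way lets an application that only needs speech/non-speech stop after step 1, which is presumably why the authors call the design suitable for different applications.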

Shigeki Sagayama - One of the best experts on this subject based on the ideXlab platform.

  • Audio Stream segregation of multi-pitch music signal based on time-space clustering using Gaussian kernel 2-dimensional model
    International Conference on Acoustics Speech and Signal Processing, 2005
    Co-Authors: Hirokazu Kameoka, Takuya Nishimoto, Shigeki Sagayama
    Abstract:

    The paper describes a novel approach for Audio Stream segregation of a multi-pitch music signal. We propose a parameter-constrained time-frequency spectrum model expressing both a harmonic spectral structure and a temporal curve of the power envelope with Gaussian kernels. MAP estimation of the model parameters using the EM algorithm provides fundamental frequency, onset and offset time, spectral envelope and power envelope of every underlying Audio Stream. Our proposed method showed high accuracy in a pitch name estimation task of several pieces of real music performance data.
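
    A toy version of the harmonic part of the model: one Audio Stream is represented by Gaussian kernels placed at integer multiples of a fundamental frequency F0. The amplitudes and kernel widths here are assumptions; the paper fits such parameters (plus a temporal power envelope) by MAP estimation with the EM algorithm over a full spectrogram.

```python
# Toy harmonic spectral model: a Gaussian-kernel comb over frequency.
import math

def harmonic_model(freq_hz, f0_hz, n_harmonics=4, sigma_hz=10.0):
    """Spectral energy at freq_hz under a Gaussian-kernel harmonic comb."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        amp = 1.0 / h                         # decaying harmonic amplitudes
        total += amp * math.exp(-((freq_hz - h * f0_hz)**2) / (2 * sigma_hz**2))
    return total

# Energy concentrates at the harmonics of a 220 Hz fundamental:
print(round(harmonic_model(220.0, 220.0), 3))   # at F0
print(round(harmonic_model(440.0, 220.0), 3))   # at 2*F0 (half amplitude)
print(round(harmonic_model(330.0, 220.0), 3))   # between harmonics: ~0
```

    Segregation then amounts to explaining each observed time-frequency point as a mixture of such combs, one per Stream, with EM resolving which Stream each point belongs to.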

Matthew K X J Pan - One of the best experts on this subject based on the ideXlab platform.

  • Galvanic skin response-derived bookmarking of an Audio Stream
    Human Factors in Computing Systems, 2011
    Co-Authors: Matthew K X J Pan, Gordon Jihshiang Chang, Gokhan H Himmetoglu, Ajung Moon, Thomas W Hazelton, Karon E Maclean, Elizabeth A Croft
    Abstract:

    We demonstrate a novel interaction paradigm driven by implicit, low-attention user control, accomplished by monitoring a user's physiological state. We have designed and prototyped this interaction for a first use case of bookmarking an Audio Stream, to holistically explore the implicit interaction concept. A listener's galvanic skin response (GSR) is monitored for orienting responses (ORs) to external interruptions; our research prototype then automatically bookmarks the media so that the user can attend to the interruption, then resume listening from the point at which he or she was interrupted.

Claude Barras - One of the best experts on this subject based on the ideXlab platform.

  • Neural speech turn segmentation and affinity propagation for speaker diarization
    2018
    Co-Authors: Ruiqing Yin, Hervé Bredin, Claude Barras
    Abstract:

    Speaker diarization is the task of determining "who speaks when" in an Audio Stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNNs) for SAD and SCD, we propose to address re-segmentation with long short-term memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular hierarchical agglomerative clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French broadcast dataset ETAPE, where we reach state-of-the-art performance.
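
    The clustering stage can be sketched as grouping speech-turn embeddings by similarity. The paper uses affinity propagation over neural embeddings; this stand-in uses greedy cosine-similarity clustering so it runs without any ML dependency. The embeddings and the similarity threshold are invented.

```python
# Stand-in for the speech-turn clustering step: turns whose embeddings
# point in nearly the same direction are grouped under one speaker label.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster_turns(embeddings, threshold=0.9):
    """Assign each turn to the first cluster whose seed is similar enough."""
    seeds, labels = [], []
    for e in embeddings:
        for idx, s in enumerate(seeds):
            if cosine(e, s) >= threshold:
                labels.append(idx)
                break
        else:                          # no similar cluster -> new speaker
            seeds.append(e)
            labels.append(len(seeds) - 1)
    return labels

turns = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99)]
print(cluster_turns(turns))  # [0, 0, 1, 1] -- two speakers
```

    Unlike HAC, affinity propagation chooses the number of clusters (speakers) automatically by electing exemplars; this greedy version approximates that behaviour with a fixed threshold instead.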

  • Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks
    2017
    Co-Authors: Ruiqing Yin, Hervé Bredin, Claude Barras
    Abstract:

    Speaker change detection is an important step in a speaker diarization system. It aims at finding speaker change points in the Audio Stream. In this paper, it is treated as a sequence labeling task and addressed with bidirectional long short-term memory networks (Bi-LSTMs). The system is trained and evaluated on the Broadcast TV subset of the ETAPE database. The results show that the proposed model brings a clear improvement over conventional methods based on BIC and Gaussian divergence. For instance, in comparison to Gaussian divergence, it produces speech turns that are 19.5% longer on average, with the same level of purity.
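
    The BIC baseline mentioned above can be sketched for 1-D features: a change point between two windows is accepted when modelling them as two Gaussians beats one Gaussian by more than a complexity penalty. The feature values and the penalty weight below are invented; real systems apply this to multivariate cepstral features.

```python
# Delta-BIC change detection between two adjacent feature windows.
import math

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def delta_bic(left, right, penalty_weight=1.0):
    """Positive value -> the two-model split is preferred (change point)."""
    both = left + right
    n, n1, n2 = len(both), len(left), len(right)
    gain = (n / 2) * math.log(variance(both)) \
         - (n1 / 2) * math.log(variance(left)) \
         - (n2 / 2) * math.log(variance(right))
    penalty = penalty_weight * 0.5 * 2 * math.log(n)  # 2 extra parameters
    return gain - penalty

same = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
shifted = [5.0, 5.1, 4.9, 5.0, 5.05, 4.95]
print(delta_bic(same, shifted) > 0)   # True: clear change
print(delta_bic(same, same) > 0)      # False: no change
```

    A Bi-LSTM replaces this hand-built statistic with a learned per-frame change probability, which is what lets it trade off purity against turn length more favourably.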

  • Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker Diarization
    2017
    Co-Authors: Guillaume Wisniewksi, Hervé Bredin, Grégory Gelly, Claude Barras
    Abstract:

    Real-time speaker diarization has many potential applications, including public security, biometrics and forensics. It can also significantly speed up the indexing of increasingly large multimedia archives. In this paper, we address the issue of low-latency speaker diarization, which consists in continuously detecting new or reoccurring speakers within an Audio Stream and determining when each speaker is active with a low latency (e.g. every second). This is in contrast with most existing approaches to speaker diarization, which rely on multiple passes over the complete Audio recording. The proposed approach combines speaker turn neural embeddings with an incremental structure prediction approach inspired by state-of-the-art natural language processing models for part-of-speech tagging and dependency parsing. It can therefore leverage both information describing the utterance and the inherent temporal structure of interactions between speakers to learn, in a supervised framework, to identify speakers. Experiments on the ETAPE broadcast news benchmark validate the approach.
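
    One way to read "low-latency" here: each incoming turn embedding is either assigned to the closest known speaker or opens a new one, without ever revisiting past Audio. The distance metric, threshold and running-mean update below are assumptions, not the paper's learned structure-prediction model.

```python
# Minimal online diarizer: greedy nearest-speaker assignment with a
# "new speaker" decision, processing one turn embedding at a time.
import math

class OnlineDiarizer:
    def __init__(self, max_dist=0.5):
        self.centroids = []   # one running-mean embedding per speaker
        self.counts = []
        self.max_dist = max_dist

    def observe(self, emb):
        """Return a speaker id for one embedding, creating ids as needed."""
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = math.dist(emb, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.max_dist:
            self.centroids.append(list(emb))       # open a new speaker
            self.counts.append(1)
            return len(self.centroids) - 1
        n = self.counts[best] + 1                  # update running mean
        self.centroids[best] = [(c * (n - 1) + e) / n
                                for c, e in zip(self.centroids[best], emb)]
        self.counts[best] = n
        return best

d = OnlineDiarizer()
stream = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (0.05, 0.05), (2.1, 1.9)]
print([d.observe(e) for e in stream])  # [0, 0, 1, 0, 1]
```

    The greedy decision is irrevocable, which is exactly the weakness the paper's incremental structure prediction (beam-style search over assignment sequences) is designed to mitigate.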