The experts below are selected from a list of 29,358 experts worldwide, ranked by the ideXlab platform.
Masataka Goto - One of the best experts on this subject based on the ideXlab platform.
-
LyricSynchronizer: automatic synchronization system between musical audio signals and lyrics
IEEE Journal of Selected Topics in Signal Processing, 2011
Co-Authors: Hiromasa Fujihara, Masataka Goto, Jun Ogata, Hiroshi G. Okuno
Abstract: This paper describes a system that can automatically synchronize polyphonic musical audio signals with their corresponding lyrics. Although methods for synchronizing monophonic speech signals with corresponding text transcriptions by using Viterbi alignment techniques have been proposed, these methods cannot be applied to vocals in CD recordings because vocals are often overlapped by accompaniment sounds. In addition to a conventional method for reducing the influence of the accompaniment sounds, we therefore developed four methods to overcome this problem: a method for detecting vocal sections, a method for constructing robust phoneme networks, a method for detecting fricative sounds, and a method for adapting a speech-recognizer phone model to segregated vocal signals. We report experimental results for each of these methods and also describe our music playback interface, which uses our system to synchronize music and lyrics.
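As a rough sketch of the Viterbi alignment idea the abstract builds on (not the authors' system, which additionally handles accompaniment reduction, vocal-section detection, and phone-model adaptation), a minimal left-to-right forced alignment of a phoneme sequence to audio frames might look as follows; the per-frame log-likelihoods are hypothetical inputs that a real system would obtain from an acoustic model:

```python
import math

def viterbi_align(log_probs, n_phonemes):
    """Align a left-to-right phoneme sequence to frames.

    log_probs[t][p]: log-likelihood of phoneme p at frame t.
    Allowed transitions: stay in phoneme p, or advance to p+1.
    Returns, for each frame, the index of the aligned phoneme
    (the path is forced to end in the last phoneme).
    """
    T = len(log_probs)
    NEG = -math.inf
    # dp[t][p] = best cumulative score ending at frame t in phoneme p
    dp = [[NEG] * n_phonemes for _ in range(T)]
    back = [[0] * n_phonemes for _ in range(T)]
    dp[0][0] = log_probs[0][0]
    for t in range(1, T):
        for p in range(n_phonemes):
            stay = dp[t - 1][p]
            advance = dp[t - 1][p - 1] if p > 0 else NEG
            if stay >= advance:
                dp[t][p], back[t][p] = stay, p
            else:
                dp[t][p], back[t][p] = advance, p - 1
            dp[t][p] += log_probs[t][p]
    # Backtrack from the final phoneme at the last frame
    path = [n_phonemes - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With two phonemes and four frames whose likelihoods favor the first phoneme early and the second late, the alignment places the phoneme boundary between frames 1 and 2.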
-
Drum sound recognition for polyphonic audio signals by adaptation and matching of spectrogram templates with harmonic structure suppression
IEEE Transactions on Audio, Speech, and Language Processing, 2007
Co-Authors: Kazuyoshi Yoshii, Masataka Goto, Hiroshi G. Okuno
Abstract: This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure, originally designed to detect onsets in drums-only signals. There are, however, two main problems. First, the appropriate templates are unknown for each song. Second, detecting drum-sound onsets is harder in sound mixtures that include various sounds other than drums. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First, an initial template of each drum sound, called a seed template, is prepared. The template-adaptation method adapts it to the actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to harmonic sounds overlapping with drum sounds, the harmonic-structure-suppression method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that the template-adaptation and harmonic-structure-suppression methods improved recognition accuracy, achieving 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
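The template-matching step can be illustrated with a toy sketch. The distance function below is a simplified stand-in for Goto's measure (whose exact form is not reproduced here): only bins where the observed power falls below the template add to the distance, so extra non-drum energy in the mixture is not penalized. Spectrograms are plain lists of frames (lists of per-bin powers); all values are hypothetical:

```python
def template_distance(template, segment):
    """Simplified stand-in for Goto's distance measure: accumulate the
    power deficit in bins where the observed segment falls below the
    template, and ignore bins with surplus (non-drum) energy."""
    d = 0.0
    for t_frame, s_frame in zip(template, segment):
        for t_val, s_val in zip(t_frame, s_frame):
            if s_val < t_val:
                d += t_val - s_val
    return d

def detect_onsets(spectrogram, template, threshold):
    """Slide the template over the song spectrogram and report the
    starting frame indices whose distance falls below the threshold."""
    n = len(template)
    onsets = []
    for i in range(len(spectrogram) - n + 1):
        if template_distance(template, spectrogram[i:i + n]) < threshold:
            onsets.append(i)
    return onsets
```

A frame that contains the template's drum energy plus extra harmonic energy still matches, which is the property that motivates this asymmetric distance.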
-
A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals
Speech Communication, 2004
Co-Authors: Masataka Goto
Abstract: In this paper, we describe the concept of music scene description and address the problem of detecting melody and bass lines in real-world audio signals containing the sounds of various instruments. Most previous pitch-estimation methods have had difficulty dealing with such complex music signals because they were designed for mixtures of only a few sounds. To enable estimation of the fundamental frequency (F0) of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the unreliable fundamental component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. This method estimates the relative dominance of every possible F0 (represented as a probability density function of the F0) by using MAP (maximum a posteriori probability) estimation, and considers the F0's temporal continuity by using a multiple-agent architecture. Experimental results with a set of ten music excerpts from compact-disc recordings showed that a real-time system implementing this method was able to detect the melody and bass lines about 80% of the time they were present.
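PreFEst proper models an F0 probability density with MAP estimation and tracks it with multiple agents; as a drastically simplified illustration of the underlying idea (score each F0 candidate by the harmonics that support it, rather than by its fundamental alone), a toy harmonic-summation salience might look like this. The peak dictionary, candidate list, and weighting are all hypothetical:

```python
def f0_salience(spectrum_peaks, candidate_f0s, n_harmonics=4, tol=3.0):
    """Toy predominant-F0 picker: each candidate F0 is scored by the
    summed, harmonically weighted power found near its first few
    harmonics, and the top-scoring candidate wins.
    spectrum_peaks: dict mapping peak frequency (Hz) -> power."""
    best_f0, best_score = None, -1.0
    for f0 in candidate_f0s:
        score = 0.0
        for h in range(1, n_harmonics + 1):
            target = h * f0
            for freq, power in spectrum_peaks.items():
                if abs(freq - target) <= tol:
                    score += power / h  # weight lower harmonics more
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0
```

Note how a candidate can win even if its fundamental peak is weak or absent, as long as its harmonics carry enough energy; that robustness to a missing fundamental is the property the abstract emphasizes.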
-
Real-time beat tracking for drumless audio signals: chord change detection for musical decisions
Speech Communication, 1999
Co-Authors: Masataka Goto, Yoichi Muraoka
Abstract: This paper describes a real-time beat-tracking system that detects a hierarchical beat structure in musical audio signals without drum sounds. Most previous systems dealt with MIDI signals and had difficulty applying musical heuristics, in real time, to audio signals containing the sounds of various instruments, and in tracking beats above the quarter-note level. Our system not only tracks beats at the quarter-note level but also detects the beat structure at the half-note and measure levels. To make musical decisions about the audio signals, we propose a method of detecting chord changes that does not require chord names to be identified. The method enables the system to track beats at different rhythmic levels (for example, to find the beginnings of half notes and measures) and to select the best of various hypotheses about beat positions. Experimental results show that the proposed method effectively detected the beat structure in real-world audio signals sampled from compact discs of popular music.
-
Musical understanding at the beat level: real-time beat tracking for audio signals
Computational Auditory Scene Analysis, 1998
Co-Authors: Masataka Goto, Yoichi Muraoka
Abstract: This paper presents the main issues in, and our solutions to, the problem of understanding musical audio signals at the beat level, issues that are common to more general auditory scene analysis. Previous beat-tracking systems were not able to work in realistic acoustic environments. We built a real-time beat-tracking system that processes audio signals containing the sounds of various instruments. The main features of our solutions are: (1) to handle ambiguous situations, our system manages multiple agents that maintain multiple hypotheses of beats; (2) our system makes context-dependent decisions by leveraging musical knowledge represented as drum patterns; (3) all processes are performed based on how reliable the detected events and hypotheses are, since it is impossible to handle realistic, complex signals without mistakes; (4) frequency-analysis parameters are dynamically adjusted through interaction between low-level and high-level processing. In our experiment using music on commercially distributed compact discs, our system correctly tracked beats in 40 of 42 popular songs in which drums maintain the beat.
Bruno Torrésani - One of the best experts on this subject based on the ideXlab platform.
-
Random models for sparse signals expansion on unions of bases with application to audio signals
IEEE Transactions on Signal Processing, 2008
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: A new approach for signal expansion with respect to hybrid dictionaries, based upon probabilistic modeling, is proposed and studied, with emphasis on audio signal processing applications. The signal is modeled as a sparse linear combination of waveforms, taken from the union of two orthonormal bases, with random coefficients. The behavior of the analysis coefficients, namely the inner products of the signal with all basis functions, is studied in detail, which shows that these coefficients may generally be classified into two categories: significant coefficients versus insignificant coefficients. Conditions ensuring the feasibility of such a classification are given. When the classification is possible, it leads to efficient estimation algorithms that may in turn be used for denoising or coding purposes. The proposed approach is illustrated by numerical experiments on audio signals, using MDCT bases.
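The analysis-coefficient classification the abstract describes can be sketched in miniature. The paper works with MDCT bases and a probabilistic classification; the toy below substitutes an orthonormal DCT-II plus the standard (Dirac) basis as the union, and a plain magnitude threshold as the classifier, so all of those choices are simplifying assumptions:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis, returned as a list of basis vectors."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def significant_coefficients(signal, threshold):
    """Analysis coefficients of the signal against a union of two
    orthonormal bases (DCT atoms for tonal content, Dirac atoms for
    transients), split into significant vs insignificant by a simple
    magnitude threshold. Returns (basis, index, value) triples for the
    significant ones."""
    coeffs = []
    for k, row in enumerate(dct_matrix(len(signal))):
        # inner product with each DCT atom
        coeffs.append(("dct", k, sum(r * s for r, s in zip(row, signal))))
    for i, s in enumerate(signal):
        # inner product with each Dirac atom is just the sample itself
        coeffs.append(("dirac", i, s))
    return [c for c in coeffs if abs(c[2]) > threshold]
```

A constant signal concentrates all its energy in the DC atom of the DCT, so only that single coefficient survives the threshold while every Dirac coefficient is classified as insignificant, which is the sparsity behavior the model is after.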
-
A study of Bernoulli and structured random waveform models for audio signals
2005
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: The empirical pdf of wavelet or MDCT coefficients of audio signals generally features a sharp peak at the origin, together with heavy tails. We show that such features may be reproduced if audio signals are modelled as sparse series of waveforms, randomly taken from a union of two significantly different orthonormal bases. In this context we obtain estimates for the behavior of “observed” coefficients, and numerical results on audio signals. Unlike more classical approaches involving optimization algorithms, our approach thus relies on an explicit model. This allows us to analyze mathematical properties of such signals and corresponding estimators, and to derive simple estimation algorithms.
Matthieu Kowalski - One of the best experts on this subject based on the ideXlab platform.
-
Drum extraction in single-channel audio signals using multi-layer non-negative matrix factor deconvolution
2017
Co-Authors: Clément Laroche, Hélène Papadopoulos, Matthieu Kowalski, Gael Richard
Abstract: In this paper, we propose a supervised multi-layer factorization method designed for harmonic/percussive source separation and drum extraction. Our method decomposes the audio signal into sparse orthogonal components that capture the harmonic content, while the drums are represented by an extension of non-negative matrix factorization that exploits time-frequency dictionaries to take non-stationary drum sounds into account. The drum dictionaries represent various real drum hits, so the decomposition is more physically meaningful and allows for a better interpretation of the results. Experiments on real music data for a harmonic/percussive source separation task show that our method outperforms other state-of-the-art algorithms. Finally, our method is very robust to non-stationary harmonic sources, which are usually poorly decomposed by existing methods.
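The paper's multi-layer non-negative matrix factor deconvolution is considerably more elaborate, but its basic building block, NMF with multiplicative updates, can be sketched in a few lines. The pure-Python implementation below (squared-error objective, random initialization) is a generic textbook version, not the authors' method, and is only practical for tiny matrices:

```python
import random

def nmf(V, rank, n_iter=200):
    """Plain non-negative matrix factorization V ~ W.H with Lee-Seung
    multiplicative updates minimizing squared error. V is a list of
    rows of non-negative numbers."""
    random.seed(0)  # deterministic toy initialization
    n, m = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[random.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    eps = 1e-9
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        WtV = matmul(transpose(W), V)
        WtWH = matmul(transpose(W), matmul(W, H))
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps)
              for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(matmul(W, H), transpose(H))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps)
              for j in range(rank)] for i in range(n)]
    return W, H
```

In a drum-extraction setting, V would be a magnitude spectrogram and the columns of W would be (or be initialized from) drum spectral templates; the deconvolutive extension used in the paper replaces each single-column template with a short time-frequency patch.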
-
Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors
Journal of the Acoustical Society of America, 2013
Co-Authors: Hélène Papadopoulos, Matthieu Kowalski
Abstract: This paper investigates the use of musical priors for the sparse expansion of music audio signals on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both the transient and tonal components of a music audio signal. More specifically, chord and metrical-structure information is used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. A denoising task is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality, measured by the signal-to-noise ratio, is competitive with state-of-the-art approaches and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals.
-
Random models for sparse signals expansion on unions of bases with application to audio signals
IEEE Transactions on Signal Processing, 2008
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: A new approach for signal expansion with respect to hybrid dictionaries, based upon probabilistic modeling, is proposed and studied, with emphasis on audio signal processing applications. The signal is modeled as a sparse linear combination of waveforms, taken from the union of two orthonormal bases, with random coefficients. The behavior of the analysis coefficients, namely the inner products of the signal with all basis functions, is studied in detail, which shows that these coefficients may generally be classified into two categories: significant coefficients versus insignificant coefficients. Conditions ensuring the feasibility of such a classification are given. When the classification is possible, it leads to efficient estimation algorithms that may in turn be used for denoising or coding purposes. The proposed approach is illustrated by numerical experiments on audio signals, using MDCT bases.
-
A study of Bernoulli and structured random waveform models for audio signals
2005
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: The empirical pdf of wavelet or MDCT coefficients of audio signals generally features a sharp peak at the origin, together with heavy tails. We show that such features may be reproduced if audio signals are modelled as sparse series of waveforms, randomly taken from a union of two significantly different orthonormal bases. In this context we obtain estimates for the behavior of “observed” coefficients, and numerical results on audio signals. Unlike more classical approaches involving optimization algorithms, our approach thus relies on an explicit model. This allows us to analyze mathematical properties of such signals and corresponding estimators, and to derive simple estimation algorithms.
Mériem Jaidane - One of the best experts on this subject based on the ideXlab platform.
-
Audio watermarking: a way to stationnarize audio signals
IEEE Transactions on Signal Processing, 2005
Co-Authors: Sonia Djaziri-Larbi, Mériem Jaidane
Abstract: Audio watermarking is typically used as a multimedia copyright-protection tool or as a system that embeds metadata in audio signals. In this paper, watermarking is instead viewed as a preprocessing step for subsequent audio processing systems: the watermark signal conveys no information; rather, it is used to modify the statistical characteristics of an audio signal, in particular its nonstationarity. The watermark is embedded in order to stationnarize the host signal: because the embedded watermark is piecewise stationary, it modifies the stationarity of the original audio signal. In some audio processing fields, this can be used to improve performance that is very sensitive to time-varying signal statistics. This paper presents an analysis of the impact of perceptual watermarking on the stationarity of audio signals. The study is based on stationarity indices, which measure variations in the spectral characteristics of signals using time-frequency representations. Simulation results with two kinds of signals, artificial signals and audio signals (speech and music), are presented. A comparison of stationarity indices between watermarked and original audio signals shows a significant stationarity enhancement for the watermarked signal, especially for transient attacks.
Index Terms: perceptual audio watermarking, stationarity indices, time-frequency representations.
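The paper's stationarity indices are built on time-frequency representations; a much cruder stand-in conveys the idea of measuring how much a signal's short-term statistics drift over time. The frame length and the choice of per-frame energy as the tracked statistic are simplifying assumptions:

```python
def stationarity_index(signal, frame_len):
    """Toy stationarity measure: frame the signal, compute per-frame
    energy, and return the variance of those energies around their
    mean. 0 means perfectly stationary in this crude sense; larger
    values mean stronger time variation of the signal statistics."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    mean = sum(energies) / len(energies)
    return sum((e - mean) ** 2 for e in energies) / len(energies)
```

A steady sinusoid-like signal scores 0, while a signal whose energy jumps mid-stream (a "transient attack" in the abstract's terms) scores high; embedding a piecewise-stationary watermark would, per the paper's argument, pull such an index down.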
-
Watermarking influence on the stationarity of audio signals
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003
Co-Authors: Sonia Djaziri-Larbi, Mériem Jaidane
Abstract: This paper presents an analysis of the perceptual impact of watermarking on the stationarity of audio signals. Because the embedded watermark is piecewise stationary, it modifies the stationarity of the original audio signal. The study is based on stationarity indices, which measure variations in the spectral characteristics of signals using time-frequency representations. Simulation results with two kinds of signals, test signals and audio signals (speech and music), are presented. A comparison of stationarity indices between watermarked and original audio signals shows a significant stationarity enhancement for the watermarked signal, especially for transient attacks.
IEEE Transactions on Signal Processing, 2005Co-Authors: Sonia Djaziri Larbi, Mériem Jaïdane-saïdaneAbstract:Audio watermarking is usually used as a multimedia copyright protection tool or as a system that embed metadata in Audio Signals. In this paper, watermarking is viewed as a preprocessing step for further Audio processing systems: the watermark signal conveys no information, rather it is used to modify the statistical characteristics of an Audio signal, in particular its nonstationarity. The embedded watermark is then added in order to stationnarize the host signal. Indeed, the embedded watermark is piecewise stationary, thus it modifies the stationarity of the original Audio signal. In some Audio processing fields, this fact can be used to improve performances that are very sensitive to time-variant signal statistics. This work presents an analysis of the perceptual watermarking impact on the stationarity of Audio Signals. The study is based on stationarity indices, which represent a measure of variations in spectral characteristics of Signals, using time-frequency representations. Simulation results with two kinds of Signals, artificial Signals and Audio Signals (speech and music), are presented. Stationarity indices comparison between watermarked and original Audio Signals shows a significant stationarity enhancement of the watermarked signal, especially for transient attacks.