The experts below are selected from a list of 29,358 experts worldwide, ranked by the ideXlab platform.
Masataka Goto - One of the best experts on this subject based on the ideXlab platform.
-
LyricSynchronizer: automatic synchronization system between musical audio signals and lyrics
IEEE Journal of Selected Topics in Signal Processing, 2011
Co-Authors: Hiromasa Fujihara, Masataka Goto, Jun Ogata, Hiroshi G. Okuno
Abstract: This paper describes a system that can automatically synchronize polyphonic musical audio signals with their corresponding lyrics. Although methods for synchronizing monophonic speech signals with corresponding text transcriptions by using Viterbi alignment techniques have been proposed, these methods cannot be applied to vocals in CD recordings because vocals are often overlapped by accompaniment sounds. In addition to a conventional method for reducing the influence of the accompaniment sounds, we therefore developed four methods to overcome this problem: a method for detecting vocal sections, a method for constructing robust phoneme networks, a method for detecting fricative sounds, and a method for adapting a speech-recognizer phone model to segregated vocal signals. We report experimental results for each of these methods and also describe our music playback interface, which uses our system to synchronize music and lyrics.
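As a rough sketch of the Viterbi alignment idea the abstract builds on (not the authors' system, which additionally handles accompaniment reduction, vocal-section detection, and phone-model adaptation), a minimal left-to-right forced alignment of a phoneme sequence to audio frames might look as follows; the per-frame log-likelihoods are hypothetical inputs that a real system would obtain from an acoustic model:

```python
import math

def viterbi_align(log_probs, n_phonemes):
    """Align a left-to-right phoneme sequence to frames.

    log_probs[t][p]: log-likelihood of phoneme p at frame t.
    Allowed transitions: stay in phoneme p, or advance to p+1.
    Returns, for each frame, the index of the aligned phoneme
    (the path is forced to end in the last phoneme).
    """
    T = len(log_probs)
    NEG = -math.inf
    # dp[t][p] = best cumulative score ending at frame t in phoneme p
    dp = [[NEG] * n_phonemes for _ in range(T)]
    back = [[0] * n_phonemes for _ in range(T)]
    dp[0][0] = log_probs[0][0]
    for t in range(1, T):
        for p in range(n_phonemes):
            stay = dp[t - 1][p]
            advance = dp[t - 1][p - 1] if p > 0 else NEG
            if stay >= advance:
                dp[t][p], back[t][p] = stay, p
            else:
                dp[t][p], back[t][p] = advance, p - 1
            dp[t][p] += log_probs[t][p]
    # Backtrack from the final phoneme at the last frame
    path = [n_phonemes - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With two phonemes and four frames whose likelihoods favor the first phoneme early and the second late, the alignment places the phoneme boundary between frames 1 and 2.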
-
Drum sound recognition for polyphonic audio signals by adaptation and matching of spectrogram templates with harmonic structure suppression
IEEE Transactions on Audio, Speech, and Language Processing, 2007
Co-Authors: Kazuyoshi Yoshii, Masataka Goto, Hiroshi G. Okuno
Abstract: This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure, originally designed to detect onsets in drums-only signals. There are, however, two main problems. First, the appropriate templates are unknown for each song. Second, detecting drum-sound onsets is harder in sound mixtures that include various sounds other than drums. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First, an initial template of each drum sound, called a seed template, is prepared. The template-adaptation method adapts it to the actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to harmonic sounds overlapping with drum sounds, the harmonic-structure-suppression method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that the template-adaptation and harmonic-structure-suppression methods improved recognition accuracy, achieving 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
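The template-matching step can be illustrated with a toy sketch. The distance function below is a simplified stand-in for Goto's measure (whose exact form is not reproduced here): only bins where the observed power falls below the template add to the distance, so extra non-drum energy in the mixture is not penalized. Spectrograms are plain lists of frames (lists of per-bin powers); all values are hypothetical:

```python
def template_distance(template, segment):
    """Simplified stand-in for Goto's distance measure: accumulate the
    power deficit in bins where the observed segment falls below the
    template, and ignore bins with surplus (non-drum) energy."""
    d = 0.0
    for t_frame, s_frame in zip(template, segment):
        for t_val, s_val in zip(t_frame, s_frame):
            if s_val < t_val:
                d += t_val - s_val
    return d

def detect_onsets(spectrogram, template, threshold):
    """Slide the template over the song spectrogram and report the
    starting frame indices whose distance falls below the threshold."""
    n = len(template)
    onsets = []
    for i in range(len(spectrogram) - n + 1):
        if template_distance(template, spectrogram[i:i + n]) < threshold:
            onsets.append(i)
    return onsets
```

A frame that contains the template's drum energy plus extra harmonic energy still matches, which is the property that motivates this asymmetric distance.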
-
A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals
Speech Communication, 2004
Co-Authors: Masataka Goto
Abstract: In this paper, we describe the concept of music scene description and address the problem of detecting melody and bass lines in real-world audio signals containing the sounds of various instruments. Most previous pitch-estimation methods have had difficulty dealing with such complex music signals because they were designed for mixtures of only a few sounds. To enable estimation of the fundamental frequency (F0) of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the unreliable fundamental component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. This method estimates the relative dominance of every possible F0 (represented as a probability density function of the F0) by using MAP (maximum a posteriori probability) estimation, and considers the F0's temporal continuity by using a multiple-agent architecture. Experimental results with a set of ten music excerpts from compact-disc recordings showed that a real-time system implementing this method was able to detect the melody and bass lines about 80% of the time they were present.
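PreFEst proper models an F0 probability density with MAP estimation and tracks it with multiple agents; as a drastically simplified illustration of the underlying idea (score each F0 candidate by the harmonics that support it, rather than by its fundamental alone), a toy harmonic-summation salience might look like this. The peak dictionary, candidate list, and weighting are all hypothetical:

```python
def f0_salience(spectrum_peaks, candidate_f0s, n_harmonics=4, tol=3.0):
    """Toy predominant-F0 picker: each candidate F0 is scored by the
    summed, harmonically weighted power found near its first few
    harmonics, and the top-scoring candidate wins.
    spectrum_peaks: dict mapping peak frequency (Hz) -> power."""
    best_f0, best_score = None, -1.0
    for f0 in candidate_f0s:
        score = 0.0
        for h in range(1, n_harmonics + 1):
            target = h * f0
            for freq, power in spectrum_peaks.items():
                if abs(freq - target) <= tol:
                    score += power / h  # weight lower harmonics more
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0
```

Note how a candidate can win even if its fundamental peak is weak or absent, as long as its harmonics carry enough energy; that robustness to a missing fundamental is the property the abstract emphasizes.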
-
Real-time beat tracking for drumless audio signals: chord change detection for musical decisions
Speech Communication, 1999
Co-Authors: Masataka Goto, Yoichi Muraoka
Abstract: This paper describes a real-time beat-tracking system that detects a hierarchical beat structure in musical audio signals without drum sounds. Most previous systems dealt with MIDI signals and had difficulty applying musical heuristics, in real time, to audio signals containing the sounds of various instruments, and in tracking beats above the quarter-note level. Our system not only tracks beats at the quarter-note level but also detects the beat structure at the half-note and measure levels. To make musical decisions about the audio signals, we propose a method of detecting chord changes that does not require chord names to be identified. The method enables the system to track beats at different rhythmic levels (for example, to find the beginnings of half notes and measures) and to select the best of various hypotheses about beat positions. Experimental results show that the proposed method effectively detected the beat structure in real-world audio signals sampled from compact discs of popular music.
-
Musical understanding at the beat level: real-time beat tracking for audio signals
Computational Auditory Scene Analysis, 1998
Co-Authors: Masataka Goto, Yoichi Muraoka
Abstract: This paper presents the main issues in, and our solutions to, the problem of understanding musical audio signals at the beat level, issues that are common to more general auditory scene analysis. Previous beat-tracking systems were not able to work in realistic acoustic environments. We built a real-time beat-tracking system that processes audio signals containing the sounds of various instruments. The main features of our solutions are: (1) to handle ambiguous situations, our system manages multiple agents that maintain multiple hypotheses of beats; (2) our system makes context-dependent decisions by leveraging musical knowledge represented as drum patterns; (3) all processes are performed based on how reliable the detected events and hypotheses are, since it is impossible to handle realistic, complex signals without mistakes; (4) frequency-analysis parameters are dynamically adjusted through interaction between low-level and high-level processing. In our experiment using music on commercially distributed compact discs, our system correctly tracked beats in 40 of 42 popular songs in which drums maintain the beat.
Bruno Torrésani - One of the best experts on this subject based on the ideXlab platform.
-
Random models for sparse signals expansion on unions of bases with application to audio signals
IEEE Transactions on Signal Processing, 2008
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: A new approach for signal expansion with respect to hybrid dictionaries, based upon probabilistic modeling, is proposed and studied, with emphasis on audio signal processing applications. The signal is modeled as a sparse linear combination of waveforms, taken from the union of two orthonormal bases, with random coefficients. The behavior of the analysis coefficients, namely the inner products of the signal with all basis functions, is studied in detail, which shows that these coefficients may generally be classified into two categories: significant coefficients versus insignificant coefficients. Conditions ensuring the feasibility of such a classification are given. When the classification is possible, it leads to efficient estimation algorithms that may in turn be used for denoising or coding purposes. The proposed approach is illustrated by numerical experiments on audio signals, using MDCT bases.
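The analysis-coefficient classification the abstract describes can be sketched in miniature. The paper works with MDCT bases and a probabilistic classification; the toy below substitutes an orthonormal DCT-II plus the standard (Dirac) basis as the union, and a plain magnitude threshold as the classifier, so all of those choices are simplifying assumptions:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis, returned as a list of basis vectors."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def significant_coefficients(signal, threshold):
    """Analysis coefficients of the signal against a union of two
    orthonormal bases (DCT atoms for tonal content, Dirac atoms for
    transients), split into significant vs insignificant by a simple
    magnitude threshold. Returns (basis, index, value) triples for the
    significant ones."""
    coeffs = []
    for k, row in enumerate(dct_matrix(len(signal))):
        # inner product with each DCT atom
        coeffs.append(("dct", k, sum(r * s for r, s in zip(row, signal))))
    for i, s in enumerate(signal):
        # inner product with each Dirac atom is just the sample itself
        coeffs.append(("dirac", i, s))
    return [c for c in coeffs if abs(c[2]) > threshold]
```

A constant signal concentrates all its energy in the DC atom of the DCT, so only that single coefficient survives the threshold while every Dirac coefficient is classified as insignificant, which is the sparsity behavior the model is after.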
-
A study of Bernoulli and structured random waveform models for audio signals
2005
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: The empirical pdf of wavelet or MDCT coefficients of audio signals generally features a sharp peak at the origin, together with heavy tails. We show that such features may be reproduced if audio signals are modelled as sparse series of waveforms, randomly taken from a union of two significantly different orthonormal bases. In this context we obtain estimates for the behavior of “observed” coefficients, and numerical results on audio signals. Unlike more classical approaches involving optimization algorithms, our approach thus relies on an explicit model. This allows us to analyze mathematical properties of such signals and corresponding estimators, and to derive simple estimation algorithms.
Matthieu Kowalski - One of the best experts on this subject based on the ideXlab platform.
-
Drum extraction in single-channel audio signals using multi-layer non-negative matrix factor deconvolution
2017
Co-Authors: Clément Laroche, Hélène Papadopoulos, Matthieu Kowalski, Gael Richard
Abstract: In this paper, we propose a supervised multi-layer factorization method designed for harmonic/percussive source separation and drum extraction. Our method decomposes the audio signal into sparse orthogonal components that capture the harmonic content, while the drums are represented by an extension of non-negative matrix factorization that exploits time-frequency dictionaries to take non-stationary drum sounds into account. The drum dictionaries represent various real drum hits, so the decomposition is more physically meaningful and allows for a better interpretation of the results. Experiments on real music data for a harmonic/percussive source separation task show that our method outperforms other state-of-the-art algorithms. Finally, our method is very robust to non-stationary harmonic sources, which are usually poorly decomposed by existing methods.
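The paper's multi-layer non-negative matrix factor deconvolution is considerably more elaborate, but its basic building block, NMF with multiplicative updates, can be sketched in a few lines. The pure-Python implementation below (squared-error objective, random initialization) is a generic textbook version, not the authors' method, and is only practical for tiny matrices:

```python
import random

def nmf(V, rank, n_iter=200):
    """Plain non-negative matrix factorization V ~ W.H with Lee-Seung
    multiplicative updates minimizing squared error. V is a list of
    rows of non-negative numbers."""
    random.seed(0)  # deterministic toy initialization
    n, m = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[random.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    eps = 1e-9
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        WtV = matmul(transpose(W), V)
        WtWH = matmul(transpose(W), matmul(W, H))
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps)
              for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(matmul(W, H), transpose(H))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps)
              for j in range(rank)] for i in range(n)]
    return W, H
```

In a drum-extraction setting, V would be a magnitude spectrogram and the columns of W would be (or be initialized from) drum spectral templates; the deconvolutive extension used in the paper replaces each single-column template with a short time-frequency patch.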
-
Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors
Journal of the Acoustical Society of America, 2013
Co-Authors: Hélène Papadopoulos, Matthieu Kowalski
Abstract: This paper investigates the use of musical priors for the sparse expansion of music audio signals on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both the transient and tonal components of a music audio signal. More specifically, chord and metrical-structure information is used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. A denoising task is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality, measured by the signal-to-noise ratio, is competitive with state-of-the-art approaches and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals.
-
Random models for sparse signals expansion on unions of bases with application to audio signals
IEEE Transactions on Signal Processing, 2008
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: A new approach for signal expansion with respect to hybrid dictionaries, based upon probabilistic modeling, is proposed and studied, with emphasis on audio signal processing applications. The signal is modeled as a sparse linear combination of waveforms, taken from the union of two orthonormal bases, with random coefficients. The behavior of the analysis coefficients, namely the inner products of the signal with all basis functions, is studied in detail, which shows that these coefficients may generally be classified into two categories: significant coefficients versus insignificant coefficients. Conditions ensuring the feasibility of such a classification are given. When the classification is possible, it leads to efficient estimation algorithms that may in turn be used for denoising or coding purposes. The proposed approach is illustrated by numerical experiments on audio signals, using MDCT bases.
-
A study of Bernoulli and structured random waveform models for audio signals
2005
Co-Authors: Matthieu Kowalski, Bruno Torrésani
Abstract: The empirical pdf of wavelet or MDCT coefficients of audio signals generally features a sharp peak at the origin, together with heavy tails. We show that such features may be reproduced if audio signals are modelled as sparse series of waveforms, randomly taken from a union of two significantly different orthonormal bases. In this context we obtain estimates for the behavior of “observed” coefficients, and numerical results on audio signals. Unlike more classical approaches involving optimization algorithms, our approach thus relies on an explicit model. This allows us to analyze mathematical properties of such signals and corresponding estimators, and to derive simple estimation algorithms.
Mériem Jaidane - One of the best experts on this subject based on the ideXlab platform.
-
Audio watermarking: a way to stationnarize audio signals
IEEE Transactions on Signal Processing, 2005
Co-Authors: Sonia Djaziri-Larbi, Mériem Jaidane
Abstract: Audio watermarking is typically used as a multimedia copyright-protection tool or as a system that embeds metadata in audio signals. In this paper, watermarking is instead viewed as a preprocessing step for subsequent audio processing systems: the watermark signal conveys no information; rather, it is used to modify the statistical characteristics of an audio signal, in particular its nonstationarity. The watermark is embedded in order to stationnarize the host signal: because the embedded watermark is piecewise stationary, it modifies the stationarity of the original audio signal. In some audio processing fields, this can be used to improve performance that is very sensitive to time-varying signal statistics. This paper presents an analysis of the impact of perceptual watermarking on the stationarity of audio signals. The study is based on stationarity indices, which measure variations in the spectral characteristics of signals using time-frequency representations. Simulation results with two kinds of signals, artificial signals and audio signals (speech and music), are presented. A comparison of stationarity indices between watermarked and original audio signals shows a significant stationarity enhancement for the watermarked signal, especially for transient attacks.
Index Terms: perceptual audio watermarking, stationarity indices, time-frequency representations.
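The paper's stationarity indices are built on time-frequency representations; a much cruder stand-in conveys the idea of measuring how much a signal's short-term statistics drift over time. The frame length and the choice of per-frame energy as the tracked statistic are simplifying assumptions:

```python
def stationarity_index(signal, frame_len):
    """Toy stationarity measure: frame the signal, compute per-frame
    energy, and return the variance of those energies around their
    mean. 0 means perfectly stationary in this crude sense; larger
    values mean stronger time variation of the signal statistics."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    mean = sum(energies) / len(energies)
    return sum((e - mean) ** 2 for e in energies) / len(energies)
```

A steady sinusoid-like signal scores 0, while a signal whose energy jumps mid-stream (a "transient attack" in the abstract's terms) scores high; embedding a piecewise-stationary watermark would, per the paper's argument, pull such an index down.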
-
Watermarking influence on the stationarity of audio signals
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003
Co-Authors: Sonia Djaziri-Larbi, Mériem Jaidane
Abstract: This paper presents an analysis of the perceptual impact of watermarking on the stationarity of audio signals. Because the embedded watermark is piecewise stationary, it modifies the stationarity of the original audio signal. The study is based on stationarity indices, which measure variations in the spectral characteristics of signals using time-frequency representations. Simulation results with two kinds of signals, test signals and audio signals (speech and music), are presented. A comparison of stationarity indices between watermarked and original audio signals shows a significant stationarity enhancement for the watermarked signal, especially for transient attacks.
IEEE Transactions on Signal Processing, 2005Co-Authors: Sonia Djaziri Larbi, Mériem Jaïdane-saïdaneAbstract:Audio watermarking is usually used as a multimedia copyright protection tool or as a system that embed metadata in Audio Signals. In this paper, watermarking is viewed as a preprocessing step for further Audio processing systems: the watermark signal conveys no information, rather it is used to modify the statistical characteristics of an Audio signal, in particular its nonstationarity. The embedded watermark is then added in order to stationnarize the host signal. Indeed, the embedded watermark is piecewise stationary, thus it modifies the stationarity of the original Audio signal. In some Audio processing fields, this fact can be used to improve performances that are very sensitive to time-variant signal statistics. This work presents an analysis of the perceptual watermarking impact on the stationarity of Audio Signals. The study is based on stationarity indices, which represent a measure of variations in spectral characteristics of Signals, using time-frequency representations. Simulation results with two kinds of Signals, artificial Signals and Audio Signals (speech and music), are presented. Stationarity indices comparison between watermarked and original Audio Signals shows a significant stationarity enhancement of the watermarked signal, especially for transient attacks.