Audio Signals


Masataka Goto

  • LyricSynchronizer: automatic synchronization system between musical Audio Signals and lyrics
    IEEE Journal of Selected Topics in Signal Processing, 2011
    Co-Authors: Hiromasa Fujihara, Masataka Goto, Jun Ogata, Hiroshi G Okuno
    Abstract:

    This paper describes a system that can automatically synchronize polyphonic musical audio signals with their corresponding lyrics. Although methods for synchronizing monophonic speech signals and corresponding text transcriptions by using Viterbi alignment techniques have been proposed, these methods cannot be applied to vocals in CD recordings because vocals are often overlapped by accompaniment sounds. In addition to a conventional method for reducing the influence of the accompaniment sounds, we therefore developed four methods to overcome this problem: a method for detecting vocal sections, a method for constructing robust phoneme networks, a method for detecting fricative sounds, and a method for adapting a speech-recognizer phone model to segregated vocal signals. We then report experimental results for each of these methods and also describe our music playback interface that utilizes our system for synchronizing music and lyrics.
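
    The alignment step can be pictured with a generic left-to-right Viterbi pass. The sketch below is a minimal stand-in, not the LyricSynchronizer system itself: it assumes per-frame log-likelihood scores for each phoneme of the lyrics (here random toy values) and recovers the monotonic frame-to-phoneme alignment.

      # Generic left-to-right Viterbi forced alignment (toy sketch; the scores
      # here are hypothetical, not the LyricSynchronizer internals).
      import numpy as np

      def viterbi_align(log_lik):
          """log_lik: (n_frames, n_phones) score of each lyric phoneme per frame.
          Returns the aligned phoneme index for every frame, under a
          left-to-right topology (stay in a phoneme or advance to the next)."""
          T, S = log_lik.shape
          cost = np.full((T, S), -np.inf)
          back = np.zeros((T, S), dtype=int)
          cost[0, 0] = log_lik[0, 0]            # alignment starts at phoneme 0
          for t in range(1, T):
              for s in range(S):
                  stay = cost[t - 1, s]
                  move = cost[t - 1, s - 1] if s > 0 else -np.inf
                  cost[t, s] = max(stay, move) + log_lik[t, s]
                  back[t, s] = s if stay >= move else s - 1
          path = [S - 1]                        # alignment ends at the last phoneme
          for t in range(T - 1, 0, -1):
              path.append(back[t, path[-1]])
          return path[::-1]

      # Toy usage: 3 phonemes, 10 frames of synthetic scores.
      rng = np.random.default_rng(0)
      print(viterbi_align(rng.normal(size=(10, 3))))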

  • Drum sound recognition for polyphonic Audio Signals by adaptation and matching of spectrogram templates with harmonic structure suppression
    IEEE Transactions on Audio Speech and Language Processing, 2007
    Co-Authors: Kazuyoshi Yoshii, Masataka Goto, Hiroshi G Okuno
    Abstract:

    This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure originally designed to detect the onsets in drums-only signals. However, there are two main problems. The first problem is that appropriate templates are unknown for each song. The second problem is that it is more difficult to detect drum-sound onsets in sound mixtures including various sounds other than drum sounds. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First of all, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
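
    The matching core can be sketched as a sliding comparison between a drum template and every segment of the song's power spectrogram. The code below is a hedged simplification: it swaps Goto's distance measure for a plain mean squared log-spectral distance and omits the paper's template adaptation and harmonic-structure suppression; template is assumed to be a log-power spectrogram excerpt of matching height.

      # Sliding template matching on a log-power spectrogram. A simplified
      # stand-in for the paper's method: the distance here is a placeholder,
      # not Goto's measure, and no adaptation/suppression is performed.
      import numpy as np
      from scipy import signal

      def drum_onset_scores(audio, template, fs=44100, n_fft=2048, hop=441):
          _, _, S = signal.stft(audio, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
          P = np.log1p(np.abs(S) ** 2)          # (freq_bins, frames)
          Tw = template.shape[1]                # template width in frames
          scores = np.empty(P.shape[1] - Tw)
          for i in range(len(scores)):
              seg = P[:, i:i + Tw]
              scores[i] = np.mean((seg - template) ** 2)
          return scores                         # low score = candidate drum onset

      # Usage sketch (template cut from a drums-only recording, same n_fft):
      # onset_frames = np.where(scores < threshold)[0]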

  • A real-time music scene description system: predominant-F0 estimation for detecting melody and bass lines in real-world Audio Signals
    Speech Communication, 2004
    Co-Authors: Masataka Goto
    Abstract:

    In this paper, we describe the concept of music scene description and address the problem of detecting melody and bass lines in real-world audio signals containing the sounds of various instruments. Most previous pitch-estimation methods have had difficulty dealing with such complex music signals because these methods were designed to deal with mixtures of only a few sounds. To enable estimation of the fundamental frequency (F0) of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the unreliable fundamental component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. This method estimates the relative dominance of every possible F0 (represented as a probability density function of the F0) by using MAP (maximum a posteriori probability) estimation and considers the F0's temporal continuity by using a multiple-agent architecture. Experimental results with a set of ten music excerpts from compact-disc recordings showed that a real-time system implementing this method was able to detect melody and bass lines about 80% of the time they existed.
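
    A rough feel for "a probability density over candidate F0s" can be had with a plain harmonic-salience score, sketched below. This is only a crude stand-in: PreFEst fits weighted tone models by MAP/EM and tracks the density over time with agents, none of which is modeled here, and the frequency ranges in the usage comments are merely illustrative.

      # Harmonic-salience F0 scoring, normalized into a density over F0.
      # Hypothetical simplification, not the PreFEst estimator itself.
      import numpy as np

      def f0_salience(spectrum, fs, f0_grid, n_harmonics=8):
          """spectrum: magnitude spectrum of one frame (rfft bins);
          f0_grid: candidate F0 values in Hz."""
          n_bins = len(spectrum)
          hz_per_bin = fs / (2.0 * (n_bins - 1))
          sal = np.zeros(len(f0_grid))
          for i, f0 in enumerate(f0_grid):
              for h in range(1, n_harmonics + 1):
                  b = int(round(h * f0 / hz_per_bin))
                  if b < n_bins:
                      sal[i] += spectrum[b] / h   # weight higher harmonics down
          return sal / sal.sum()                  # F0 probability density

      # Intentionally limited ranges, in the spirit of the paper:
      # melody_pdf = f0_salience(mag, fs, np.arange(261.0, 4000.0, 5.0))
      # bass_pdf   = f0_salience(mag, fs, np.arange(32.7, 261.0, 1.0))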

  • Real-time beat tracking for drumless Audio Signals: chord change detection for musical decisions
    Speech Communication, 1999
    Co-Authors: Masataka Goto, Yoichi Muraoka
    Abstract:

    This paper describes a real-time beat-tracking system that detects a hierarchical beat structure in musical audio signals without drum sounds. Most previous systems have dealt with MIDI signals and had difficulty in applying musical heuristics, in real time, to audio signals containing sounds of various instruments, and in tracking beats above the quarter-note level. Our system not only tracks beats at the quarter-note level but also detects beat structure at the half-note and measure levels. To make musical decisions about the audio signals, we propose a method of detecting chord changes that does not require chord names to be identified. The method enables the system to track beats at different rhythmic levels – for example, to find the beginnings of half notes and measures – and to select the best of various hypotheses about beat positions. Experimental results show that the proposed method was effective in detecting the beat structure in real-world audio signals sampled from compact discs of popular music.
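
    Detecting a chord change without naming the chord can be approximated by comparing pitch-class (chroma) profiles of adjacent beat-length slices, as in the hypothetical sketch below; the paper's own features differ, but the idea of flagging large inter-slice spectral change is the same.

      # Chord-change scoring without chord identification: distance between
      # chroma profiles of two adjacent slices (a hedged stand-in for the
      # paper's method, assuming candidate beat slices are already given).
      import numpy as np

      def chroma(frame_mag, fs, n_fft, fmin=55.0, fmax=1760.0):
          pcp = np.zeros(12)
          for b in range(1, len(frame_mag)):
              f = b * fs / n_fft
              if fmin <= f <= fmax:
                  pc = int(round(12 * np.log2(f / 440.0))) % 12
                  pcp[pc] += frame_mag[b]
          return pcp / (np.linalg.norm(pcp) + 1e-9)

      def chord_change_score(mag_a, mag_b, fs, n_fft):
          """1 - cosine similarity of the two chroma vectors; large values
          suggest a chord change at the boundary between the slices."""
          c_a = chroma(mag_a, fs, n_fft)
          c_b = chroma(mag_b, fs, n_fft)
          return 1.0 - float(np.dot(c_a, c_b))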

  • Musical understanding at the beat level: real-time beat tracking for Audio Signals
    Computational auditory scene analysis, 1998
    Co-Authors: Masataka Goto, Yoichi Muraoka
    Abstract:

    This paper presents the main issues and our solutions to the problem of understanding musical audio signals at the beat level, issues which are common to more general auditory scene analysis. Previous beat tracking systems were not able to work in realistic acoustic environments. We built a real-time beat tracking system that processes audio signals that contain sounds of various instruments. The main features of our solutions are: (1) To handle ambiguous situations, our system manages multiple agents that maintain multiple hypotheses of beats. (2) Our system makes a context-dependent decision by leveraging musical knowledge represented as drum patterns. (3) All processes are performed based on how reliable detected events and hypotheses are, since it is impossible to handle realistic complex signals without mistakes. (4) Frequency-analysis parameters are dynamically adjusted by interaction between low-level and high-level processing. In our experiment using music on commercially distributed compact discs, our system correctly tracked beats in 40 out of 42 popular songs in which drums maintain the beat.
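
    Point (1), keeping multiple scored beat hypotheses alive, can be illustrated with a toy agent pool. The class below is hypothetical and much simpler than the paper's agents (no interaction, no drum-pattern knowledge): each agent holds one (period, phase) hypothesis and a reliability score based on how well detected onsets match its predicted beats.

      # Minimal multiple-hypothesis bookkeeping, loosely in the spirit of the
      # multi-agent architecture; a hedged toy sketch, not the real system.
      import numpy as np

      class BeatAgent:
          def __init__(self, period, phase):
              self.period, self.phase, self.reliability = period, phase, 0.0

          def update(self, onset_times, tol=0.05):
              """Score this hypothesis by the fraction of predicted beats that
              land within tol seconds of some detected onset."""
              preds = np.arange(self.phase, max(onset_times), self.period)
              hits = sum(np.min(np.abs(onset_times - p)) < tol for p in preds)
              self.reliability = hits / max(len(preds), 1)

      onsets = np.array([0.0, 0.5, 1.0, 1.52, 2.0, 2.49, 3.0])
      agents = [BeatAgent(p, 0.0) for p in (0.4, 0.5, 0.6)]
      for a in agents:
          a.update(onsets)
      best = max(agents, key=lambda a: a.reliability)
      print(best.period, best.reliability)   # the 0.5 s hypothesis should win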

Bruno Torrésani

  • Random models for sparse Signals expansion on unions of bases with application to Audio Signals
    IEEE Transactions on Signal Processing, 2008
    Co-Authors: Matthieu Kowalski, Bruno Torrésani
    Abstract:

    A new approach for signal expansion with respect to hybrid dictionaries, based upon probabilistic modeling, is proposed and studied, with emphasis on audio signal processing applications. The signal is modeled as a sparse linear combination of waveforms, taken from the union of two orthonormal bases, with random coefficients. The behavior of the analysis coefficients, namely inner products of the signal with all basis functions, is studied in detail, which shows that these coefficients may generally be classified into two categories: significant coefficients versus insignificant coefficients. Conditions ensuring the feasibility of such a classification are given. When the classification is possible, it leads to efficient estimation algorithms that may in turn be used for denoising or coding purposes. The proposed approach is illustrated by numerical experiments on audio signals, using MDCT bases.
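
    The model is easy to reproduce in miniature. The sketch below uses the Dirac basis and an orthonormal DCT as stand-ins for the paper's union of two orthonormal bases (the actual experiments use MDCT bases): a signal is synthesized from a few random atoms of each basis, and the analysis coefficients then split into a handful of significant values over a background of small ones.

      # Toy union-of-bases sparse model: Dirac + orthonormal DCT stand-ins.
      import numpy as np
      from scipy.fft import dct, idct

      rng = np.random.default_rng(1)
      N, k = 1024, 10                     # signal length, active atoms per basis

      # Sparse random coefficients (random support, Gaussian amplitudes).
      c_dirac = np.zeros(N)
      c_dirac[rng.choice(N, k, replace=False)] = rng.normal(3, 1, k)
      c_dct = np.zeros(N)
      c_dct[rng.choice(N, k, replace=False)] = rng.normal(3, 1, k)

      x = c_dirac + idct(c_dct, norm='ortho')      # synthesis from the union

      # Analysis coefficients against each basis; most are small, a few large.
      for name, a in (('dirac', x), ('dct', dct(x, norm='ortho'))):
          big = int(np.sum(np.abs(a) > 1.0))       # crude significance threshold
          print(name, 'significant coefficients:', big, 'of', N)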

  • A study of Bernoulli and structured random waveform models for Audio Signals
    2005
    Co-Authors: Matthieu Kowalski, Bruno Torrésani
    Abstract:

    The empirical pdf of wavelet or MDCT coefficients of audio signals generally features a sharp peak at the origin, together with heavy tails. We show that such features may be reproduced if audio signals are modelled as sparse series of waveforms, randomly taken from a union of two significantly different orthonormal bases. In this context we obtain estimates for the behavior of “observed” coefficients, and numerical results on audio signals. Unlike more classical approaches involving optimization algorithms, our approach thus relies on an explicit model. This allows us to analyze mathematical properties of such signals and corresponding estimators, and derive simple estimation algorithms.
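
    The claimed pdf shape (sharp peak plus heavy tails) can be checked numerically on a toy Bernoulli-Gaussian version of the model. In the sketch below, an orthonormal DCT stands in for the MDCT; the excess kurtosis of the observed coefficients is strongly positive for such peaked, heavy-tailed distributions and zero for a Gaussian.

      # Bernoulli-Gaussian toy model: sparse coefficients, leptokurtic pdf.
      import numpy as np
      from scipy.fft import dct, idct
      from scipy.stats import kurtosis

      rng = np.random.default_rng(2)
      N, p = 4096, 0.01                   # length, Bernoulli activation rate
      mask = rng.random(N) < p
      c = np.where(mask, rng.normal(0, 5, N), 0.0)   # Bernoulli-Gaussian coefficients
      x = idct(c, norm='ortho') + 0.01 * rng.normal(size=N)  # plus slight noise

      a = dct(x, norm='ortho')            # "observed" analysis coefficients
      print('excess kurtosis:', kurtosis(a))  # large positive: peak + heavy tails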

Matthieu Kowalski

  • Drum extraction in single-channel Audio Signals using multi-layer non-negative matrix factor deconvolution
    2017
    Co-Authors: Clément Laroche, Hélène Papadopoulos, Matthieu Kowalski, Gael Richard
    Abstract:

    In this paper, we propose a supervised multilayer factorization method designed for harmonic/percussive source separation and drum extraction. Our method decomposes the audio signal into sparse orthogonal components which capture the harmonic content, while the drums are represented by an extension of non-negative matrix factorization which is able to exploit time-frequency dictionaries to take into account non-stationary drum sounds. The drum dictionaries represent various real drum hits, giving the decomposition more physical meaning and allowing for a better interpretation of the results. Experiments on real music data for a harmonic/percussive source separation task show that our method outperforms other state-of-the-art algorithms. Finally, our method is very robust to non-stationary harmonic sources that are usually poorly decomposed by existing methods.
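
    The percussive layer rests on non-negative factorization of the spectrogram. The sketch below is a deliberately reduced stand-in: plain KL-NMF with multiplicative updates, initialized from drum templates, rather than the paper's multi-layer non-negative matrix factor deconvolution with its orthogonality-constrained harmonic layer.

      # Plain KL-NMF with multiplicative updates; a hedged simplification of
      # the paper's multi-layer NMFD (no spectro-temporal atoms, no layers).
      import numpy as np

      def kl_nmf(V, W0, n_iter=100, eps=1e-9):
          """V: (F, N) magnitude spectrogram; W0: (F, K) initial drum templates.
          Returns W, H with V ~ W @ H under the KL divergence."""
          W = W0.copy()
          H = np.full((W.shape[1], V.shape[1]), 0.1)
          for _ in range(n_iter):
              WH = W @ H + eps
              H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
              WH = W @ H + eps
              W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
          return W, H

      # Usage sketch: V from an STFT; W0 columns = bass drum / snare / hi-hat
      # templates. The drum layer is W @ H, and the activations H give onsets.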

  • Sparse and structured decomposition of Audio Signals on hybrid dictionaries using musical priors
    Journal of the Acoustical Society of America, 2013
    Co-Authors: Hélène Papadopoulos, Matthieu Kowalski
    Abstract:

    This paper investigates the use of musical priors for sparse expansion of music audio signals on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both transient and tonal components of a music audio signal. More specifically, chord and metrical structure information are used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. A denoising task is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality, measured by the signal-to-noise ratio, is competitive with state-of-the-art approaches, and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and in terms of interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals.
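
    The dual-resolution idea, stripped of all musical priors, can be sketched as two thresholded transforms: long analysis windows for the tonal layer and short ones for the transient layer. The function below is a bare-bones hypothetical illustration using non-overlapping orthonormal DCT frames in place of the paper's bases.

      # Two-layer (tonal + transient) decomposition by dual-resolution
      # coefficient thresholding; a hedged sketch, without structured priors.
      import numpy as np
      from scipy.fft import dct, idct

      def layer(x, win, thr):
          """Frame x with non-overlapping windows of length win, hard-threshold
          the orthonormal DCT coefficients, and resynthesize that layer."""
          n = len(x) - len(x) % win
          frames = x[:n].reshape(-1, win)
          C = dct(frames, norm='ortho', axis=1)
          C[np.abs(C) < thr] = 0.0
          out = np.zeros_like(x)
          out[:n] = idct(C, norm='ortho', axis=1).ravel()
          return out

      # tonal     = layer(x, 4096, thr_tonal)        # long windows: partials
      # transient = layer(x - tonal, 256, thr_trans) # short windows: attacks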

Mériem Jaidane

  • Audio watermarking: a way to stationnarize Audio Signals
    IEEE Transactions on Signal Processing, 2005
    Co-Authors: Sonia Djaziri-larbi, Mériem Jaidane
    Abstract:

    Audio watermarking is usually used as a multimedia copyright protection tool or as a system that embeds metadata in audio signals. In this paper, watermarking is viewed as a preprocessing step for further audio processing systems: the watermark signal conveys no information; rather, it is used to modify the statistical characteristics of an audio signal, in particular its nonstationarity. The watermark is added in order to stationarize the host signal: since the embedded watermark is piecewise stationary, it modifies the stationarity of the original audio signal. In some audio processing fields, this fact can be used to improve performance that is very sensitive to time-variant signal statistics. This paper presents an analysis of the impact of perceptual watermarking on the stationarity of audio signals. The study is based on stationarity indices, which represent a measure of variations in the spectral characteristics of signals, using time-frequency representations. Simulation results with two kinds of signals, artificial signals and audio signals (speech and music), are presented. A comparison of stationarity indices between watermarked and original audio signals shows a significant stationarity enhancement of the watermarked signal, especially for transient attacks.
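
    A stationarity index can be approximated, very roughly, as the average variation between successive normalized short-time spectra. The sketch below is a hypothetical spectral-flux-style index, much cruder than the time-frequency indices the paper uses, but the comparison logic is the same: compute it on the original and on the watermarked signal and compare.

      # Crude stationarity index: mean distance between successive normalized
      # short-time spectra. A hypothetical stand-in for the paper's indices.
      import numpy as np
      from scipy import signal

      def stationarity_index(x, fs, nperseg=1024):
          _, _, S = signal.stft(x, fs=fs, nperseg=nperseg)
          P = np.abs(S)
          P /= P.sum(axis=0, keepdims=True) + 1e-12   # unit-sum frames
          flux = np.linalg.norm(np.diff(P, axis=1), axis=0)
          return float(np.mean(flux))                 # larger = less stationary

      # Comparison sketch (the paper reports lower variation indices, i.e.
      # enhanced stationarity, for the watermarked signal):
      # stationarity_index(x, fs) vs. stationarity_index(x + watermark, fs)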

  • Watermarking influence on the stationarity of Audio Signals
    2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003
    Co-Authors: Sonia Djaziri Larbi, Mériem Jaidane
    Abstract:

    The paper presents an analysis of the perceptual impact of watermarking on the stationarity of audio signals. Indeed, the embedded watermark is piecewise stationary, so it modifies the stationarity of the original audio signal. This study is based on stationarity indices, which represent a measure of the variations in the spectral characteristics of signals, using time-frequency representations. Simulation results with two kinds of signals, test signals and audio signals (speech and music), are presented. A comparison of stationarity indices between watermarked and original audio signals shows a significant stationarity enhancement of the watermarked signal, especially for transient attacks.