Audio Signals - Explore the Science & Experts | ideXlab



Audio Signals

The Experts below are selected from a list of 29,358 Experts worldwide, ranked by the ideXlab platform


Masataka Goto – One of the best experts on this subject based on the ideXlab platform.

  • LyricSynchronizer: automatic synchronization system between musical Audio Signals and lyrics
    IEEE Journal of Selected Topics in Signal Processing, 2011
    Co-Authors: Hiromasa Fujihara, Masataka Goto, Jun Ogata, Hiroshi G Okuno

    Abstract:

    This paper describes a system that can automatically synchronize polyphonic musical Audio Signals with their corresponding lyrics. Although methods for synchronizing monophonic speech Signals and corresponding text transcriptions by using Viterbi alignment techniques have been proposed, these methods cannot be applied to vocals in CD recordings because vocals are often overlapped by accompaniment sounds. In addition to a conventional method for reducing the influence of the accompaniment sounds, we therefore developed four methods to overcome this problem: a method for detecting vocal sections, a method for constructing robust phoneme networks, a method for detecting fricative sounds, and a method for adapting a speech-recognizer phone model to segregated vocal Signals. We then report experimental results for each of these methods and also describe our music playback interface that utilizes our system for synchronizing music and lyrics.
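    The Viterbi-alignment idea the abstract builds on can be illustrated with a toy forced alignment: given per-frame log-likelihoods for a left-to-right phoneme sequence, dynamic programming finds the best monotonic assignment of frames to phonemes. The likelihood matrix and topology here are illustrative only; the paper's actual system uses adapted phone models on segregated vocal signals.

    ```python
    import numpy as np

    def forced_align(loglik):
        """Viterbi forced alignment. loglik[t, s] is the log-likelihood of
        state (phoneme) s at frame t; states must be visited left to right,
        starting in state 0 and ending in the last state."""
        T, S = loglik.shape
        score = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        score[0, 0] = loglik[0, 0]
        for t in range(1, T):
            for s in range(S):
                stay = score[t - 1, s]
                move = score[t - 1, s - 1] if s > 0 else -np.inf
                if move > stay:
                    score[t, s] = move + loglik[t, s]
                    back[t, s] = s - 1
                else:
                    score[t, s] = stay + loglik[t, s]
                    back[t, s] = s
        # backtrack from the final state
        path = [S - 1]
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1]
    ```

    On a 6-frame, 3-phoneme toy matrix whose best path is obvious, the function recovers the block assignment `[0, 0, 1, 1, 2, 2]`.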


  • Drum sound recognition for polyphonic Audio Signals by adaptation and matching of spectrogram templates with harmonic structure suppression
    IEEE Transactions on Audio Speech and Language Processing, 2007
    Co-Authors: Kazuyoshi Yoshii, Masataka Goto, Hiroshi G Okuno

    Abstract:

    This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic Audio Signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto’s distance measure originally designed to detect onsets in drums-only Signals. However, there are two main problems. The first problem is that appropriate templates are unknown for each song. The second problem is that it is more difficult to detect drum-sound onsets in sound mixtures that include various sounds other than drum sounds. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First of all, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
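    The matching step can be sketched as sliding a drum-sound spectrogram template over the song spectrogram and flagging frames where the distance is small. Plain Euclidean distance stands in for Goto's distance measure here, and the threshold is a made-up illustration parameter.

    ```python
    import numpy as np

    def match_template(spec, template, threshold):
        """Slide a drum template over a song power spectrogram
        (both freq x time) and return candidate onset frames where
        the distance to the template falls below a threshold."""
        n_frames = spec.shape[1] - template.shape[1] + 1
        onsets = []
        for t in range(n_frames):
            seg = spec[:, t:t + template.shape[1]]
            if np.linalg.norm(seg - template) < threshold:
                onsets.append(t)
        return onsets
    ```

    The template-adaptation step of the paper would refine `template` from the song itself before this matching pass.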


  • A real-time music scene description system: predominant-F0 estimation for detecting melody and bass lines in real-world Audio Signals
    Speech Communication, 2004
    Co-Authors: Masataka Goto

    Abstract:

    In this paper, we describe the concept of music scene description and address the problem of detecting melody and bass lines in real-world Audio Signals containing the sounds of various instruments. Most previous pitch-estimation methods have had difficulty dealing with such complex music Signals because these methods were designed to deal with mixtures of only a few sounds. To enable estimation of the fundamental frequency (F0) of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the unreliable fundamental component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. This method estimates the relative dominance of every possible F0 (represented as a probability density function of the F0) by using MAP (maximum a posteriori probability) estimation and considers the F0’s temporal continuity by using a multiple-agent architecture. Experimental results with a set of ten music excerpts from compact-disc recordings showed that a real-time system implementing this method was able to detect melody and bass lines about 80% of the time they existed.
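    A drastically simplified version of the predominant-F0 idea scores each candidate F0 by the spectral magnitude found at its first few harmonics and picks the strongest. The real PreFEst instead estimates a full probability density over F0 by MAP estimation and tracks it with multiple agents; only the "F0 supported by its harmonics" intuition is kept here.

    ```python
    import numpy as np

    def predominant_f0(spectrum, freqs, candidates, n_harmonics=5):
        """Toy predominant-F0 estimate: score each candidate F0 by the
        summed magnitude at the nearest bins of its first few harmonics
        and return the best-supported candidate."""
        def salience(f0):
            s = 0.0
            for h in range(1, n_harmonics + 1):
                idx = np.argmin(np.abs(freqs - h * f0))  # nearest bin
                s += spectrum[idx]
            return s
        return max(candidates, key=salience)
    ```

    Note that a candidate an octave below the true F0 also collects some harmonic support (every second harmonic matches), which is why the full method needs the probabilistic weighting and temporal tracking.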


Bruno Torrésani – One of the best experts on this subject based on the ideXlab platform.

  • Random models for sparse Signals expansion on unions of bases with application to Audio Signals
    IEEE Transactions on Signal Processing, 2008
    Co-Authors: Matthieu Kowalski, Bruno Torrésani

    Abstract:

    A new approach for signal expansion with respect to hybrid dictionaries, based upon probabilistic modeling, is proposed and studied, with emphasis on Audio signal processing applications. The signal is modeled as a sparse linear combination of waveforms, taken from the union of two orthonormal bases, with random coefficients. The behavior of the analysis coefficients, namely the inner products of the signal with all basis functions, is studied in detail, which shows that these coefficients may generally be classified into two categories: significant versus insignificant coefficients. Conditions ensuring the feasibility of such a classification are given. When the classification is possible, it leads to efficient estimation algorithms that may in turn be used for de-noising or coding purposes. The proposed approach is illustrated by numerical experiments on Audio Signals, using MDCT bases.
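    The significant/insignificant split of analysis coefficients can be reproduced in a few lines with a union of the Dirac (standard) basis and an orthonormal DCT basis: spikes synthesized in one basis leak only weakly into the other, so a simple threshold separates the two populations. The DCT stands in for the MDCT so the sketch stays self-contained, and the atom positions, amplitudes, and threshold are arbitrary illustration values.

    ```python
    import numpy as np
    from scipy.fft import dct, idct

    n = 256
    # sparse synthesis coefficients in each orthonormal basis:
    # two Diracs (transients) plus two DCT atoms (tonal components)
    x_spike = np.zeros(n)
    x_spike[[20, 90]] = [3.0, -2.0]
    c_tone = np.zeros(n)
    c_tone[[5, 40]] = [2.5, 2.0]
    signal = x_spike + idct(c_tone, norm="ortho")

    # analysis coefficients against both bases
    a_dirac = signal                   # inner products with Dirac atoms
    a_dct = dct(signal, norm="ortho")  # inner products with DCT atoms

    # significant vs. insignificant coefficients via a flat threshold:
    # cross-basis leakage is O(1/sqrt(n)), well below the atom amplitudes
    tau = 1.0
    sig_dirac = np.flatnonzero(np.abs(a_dirac) > tau)
    sig_dct = np.flatnonzero(np.abs(a_dct) > tau)
    ```

    Each basis's significant set recovers exactly the atoms synthesized in that basis, which is the classification the estimation algorithms of the paper rely on.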


  • A study of Bernoulli and structured random waveform models for Audio Signals
    , 2005
    Co-Authors: Matthieu Kowalski, Bruno Torrésani

    Abstract:

    The empirical pdf of wavelet or MDCT coefficients of Audio Signals generally features a sharp peak at the origin, together with heavy tails. We show that such features may be reproduced if Audio Signals are modelled as sparse series of waveforms, randomly taken from a union of two significantly different orthonormal bases. In this context we obtain estimates for the behavior of “observed” coefficients, and numerical results on Audio Signals. Unlike more classical approaches involving optimization algorithms, our approach thus relies on an explicit model. This allows us to analyze mathematical properties of such Signals and the corresponding estimators, and to derive simple estimation algorithms.
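    The "sharp peak plus heavy tails" behaviour falls out of even the simplest Bernoulli–Gaussian sketch of the sparse-series model: most coefficients are exactly zero, and the normalized fourth moment far exceeds the Gaussian value of 3 (roughly 3/p for activation probability p). The parameter values below are illustrative, not taken from the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    p = 0.05  # Bernoulli probability that a coefficient is "on"
    active = rng.random(n) < p
    coeffs = np.where(active, rng.normal(0.0, 1.0, n), 0.0)

    # normalized fourth moment (kurtosis); a Gaussian gives 3,
    # a sparse Bernoulli-Gaussian law gives about 3/p
    m2 = np.mean(coeffs ** 2)
    kurt = np.mean(coeffs ** 4) / m2 ** 2
    ```

    With p = 0.05 the kurtosis comes out around 60: a strongly peaked, heavy-tailed coefficient law, qualitatively matching the empirical pdfs described above.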



Matthieu Kowalski – One of the best experts on this subject based on the ideXlab platform.

  • Drum extraction in single channel Audio Signals using multi-layer non negative matrix factor deconvolution
    , 2017
    Co-Authors: Clément Laroche, Hélène Papadopoulos, Matthieu Kowalski, Gael Richard

    Abstract:

    In this paper, we propose a supervised multilayer factorization method designed for harmonic/percussive source separation and drum extraction. Our method decomposes the Audio Signals into sparse orthogonal components which capture the harmonic content, while the drum part is represented by an extension of non-negative matrix factorization that exploits time-frequency dictionaries to take non-stationary drum sounds into account. The drum dictionaries represent various real drum hits, which gives the decomposition more physical meaning and allows for a better interpretation of the results. Experiments on real music data for a harmonic/percussive source separation task show that our method outperforms other state-of-the-art algorithms. Finally, our method is very robust to non-stationary harmonic sources that are usually poorly decomposed by existing methods.
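    The non-negative matrix factorization building block that the method extends can be sketched with the classic multiplicative updates for the Euclidean cost. The paper's actual model is a multi-layer non-negative matrix factor deconvolution with fixed drum dictionaries, which this minimal version does not reproduce.

    ```python
    import numpy as np

    def nmf(V, rank, n_iter=300, seed=0):
        """Basic NMF via multiplicative updates (Euclidean cost):
        approximate a nonnegative matrix V (freq x time) as W @ H,
        with W holding spectral templates and H their activations."""
        rng = np.random.default_rng(seed)
        F, T = V.shape
        W = rng.random((F, rank)) + 1e-3
        H = rng.random((rank, T)) + 1e-3
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
            W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
        return W, H
    ```

    For drum extraction, the columns of `W` would be replaced or initialized by the drum dictionaries, and the extracted drum layer reconstructed from the corresponding rows of `H`.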


  • Sparse and structured decomposition of Audio Signals on hybrid dictionaries using musical priors
    Journal of the Acoustical Society of America, 2013
    Co-Authors: Hélène Papadopoulos, Matthieu Kowalski

    Abstract:

    This paper investigates the use of musical priors for sparse expansion of Audio Signals of music, on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both transient and tonal components of a music Audio signal. More specifically, chord and metrical structure information are used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. The denoising task application is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music Signals shows that the proposed approach provides results whose quality measured by the signal-to-noise ratio is competitive with state-of-the-art approaches, and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and in terms of interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music Audio Signals.
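    As a baseline for the denoising task used as the proof of concept, one can soft-threshold orthonormal DCT coefficients uniformly; the paper's contribution is precisely to structure this kind of shrinkage with chord and metrical-structure priors instead of a flat threshold. The signal, noise level, and threshold below are made-up illustration values.

    ```python
    import numpy as np
    from scipy.fft import dct, idct

    def denoise_dct(noisy, tau):
        """Baseline sparsity-based denoiser: soft-threshold the
        orthonormal DCT coefficients with a uniform threshold tau."""
        c = dct(noisy, norm="ortho")
        c = np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)
        return idct(c, norm="ortho")
    ```

    On a signal that is sparse in the DCT basis, the threshold removes most of the noise energy while only slightly shrinking the few large signal coefficients.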


  • Random models for sparse Signals expansion on unions of bases with application to Audio Signals
    IEEE Transactions on Signal Processing, 2008
    Co-Authors: Matthieu Kowalski, Bruno Torrésani

    Abstract:

    A new approach for signal expansion with respect to hybrid dictionaries, based upon probabilistic modeling, is proposed and studied, with emphasis on Audio signal processing applications. The signal is modeled as a sparse linear combination of waveforms, taken from the union of two orthonormal bases, with random coefficients. The behavior of the analysis coefficients, namely the inner products of the signal with all basis functions, is studied in detail, which shows that these coefficients may generally be classified into two categories: significant versus insignificant coefficients. Conditions ensuring the feasibility of such a classification are given. When the classification is possible, it leads to efficient estimation algorithms that may in turn be used for de-noising or coding purposes. The proposed approach is illustrated by numerical experiments on Audio Signals, using MDCT bases.
