Music Signal

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 21591 Experts worldwide ranked by ideXlab platform

Shigeki Sagayama - One of the best experts on this subject based on the ideXlab platform.

  • Infinite-state spectrum model for Music Signal analysis
    International Conference on Acoustics Speech and Signal Processing, 2011
    Co-Authors: Masahiro Nakano, Hirokazu Kameoka, Jonathan Le Roux, Nobutaka Ono, Shigeki Sagayama
    Abstract:

    This paper presents a nonparametric Bayesian extension of nonnegative matrix factorization (NMF) for Music Signal analysis. Instrument sounds often exhibit non-stationary spectral characteristics. We introduce infinite-state spectral bases into NMF to represent time-varying spectra in polyphonic Music Signals. We describe our extension of NMF with infinite-state spectral bases generated by the Dirichlet process in a statistical framework, derive an efficient optimization algorithm based on collapsed variational inference, and validate the framework on audio data.
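
The infinite-state extension above builds on ordinary NMF of a magnitude spectrogram. As a hedged illustration of that starting point only (finite static bases, multiplicative KL updates; the Dirichlet-process machinery and collapsed variational inference are not shown), a minimal NumPy sketch:

```python
import numpy as np

def nmf_kl(V, K=4, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF with KL-divergence multiplicative updates.

    V : (F, T) nonnegative magnitude spectrogram; returns W (F, K) spectral
    bases and H (K, T) activations with V ~= W @ H.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H

# Toy "spectrogram": two fixed spectral patterns with varying activations.
rng = np.random.default_rng(1)
true_W = np.abs(rng.normal(size=(32, 2)))
true_H = np.abs(rng.normal(size=(2, 40)))
V = true_W @ true_H
W, H = nmf_kl(V, K=2)
err = np.abs(V - W @ H).mean() / V.mean()
```

With infinite-state bases, each basis would itself switch between spectral states over time; here each column of W is a single static spectrum, which is exactly the limitation the paper addresses.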

  • Introduction to the Special Issue on Music Signal Processing
    IEEE Journal of Selected Topics in Signal Processing, 2011
    Co-Authors: Meinard Müller, Anssi Klapuri, Gael Richard, Daniel P. W. Ellis, Shigeki Sagayama
    Abstract:

    The 15 papers in this special issue are devoted to the emerging field of Music Signal processing.

  • Specmurt Analysis of Polyphonic Music Signals
    IEEE Transactions on Audio Speech and Language Processing, 2008
    Co-Authors: Shoichiro Saito, Hirokazu Kameoka, Takuya Nishimoto, K. Takahashi, Shigeki Sagayama
    Abstract:

    This paper introduces a new Music Signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of common harmonic structures linearly stretched along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic Music Signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic Music Signals and compared with manually annotated MIDI data.
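
The deconvolution at the heart of specmurt analysis can be sketched directly: on a log-frequency axis, the observed spectrum is (approximately) the circular convolution of the fundamental-frequency distribution with a common harmonic pattern, so dividing their Fourier transforms recovers the distribution. A minimal NumPy sketch on a synthetic spectrum with an assumed harmonic pattern (the 12-bins-per-octave grid and all amplitudes are illustrative):

```python
import numpy as np

def specmurt_deconvolve(v, h, eps=1e-8):
    """Recover the fundamental-frequency distribution u from an observed
    log-frequency spectrum v = u * h (circular convolution) by dividing
    in the Fourier ('specmurt') domain.
    """
    V = np.fft.fft(v)
    H = np.fft.fft(h, n=len(v))
    U = np.real(np.fft.ifft(V / (H + eps)))
    return np.clip(U, 0.0, None)  # the distribution is nonnegative

# Common harmonic structure on a log-frequency grid: harmonic k sits
# log2(k) octaves above the fundamental (here 12 bins per octave).
bins_per_oct, N = 12, 128
h = np.zeros(N)
for k, amp in [(1, 1.0), (2, 0.5), (3, 0.33), (4, 0.25)]:
    h[int(round(bins_per_oct * np.log2(k)))] += amp

# Two tones -> u has two peaks (bins 20 and 32, an octave apart).
u_true = np.zeros(N)
u_true[20] = 1.0
u_true[32] = 0.7
v = np.real(np.fft.ifft(np.fft.fft(u_true) * np.fft.fft(h)))
u_est = specmurt_deconvolve(v, h)
```

Real Music spectra do not share one exact harmonic pattern across all notes, which is why the paper also quasi-optimizes the common structure iteratively instead of fixing it as here.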

  • Audio stream segregation of multi-pitch Music Signal based on time-space clustering using Gaussian kernel 2-dimensional model
    International Conference on Acoustics Speech and Signal Processing, 2005
    Co-Authors: Hirokazu Kameoka, Takuya Nishimoto, Shigeki Sagayama
    Abstract:

    The paper describes a novel approach for audio stream segregation of a multi-pitch Music Signal. We propose a parameter-constrained time-frequency spectrum model expressing both a harmonic spectral structure and a temporal curve of the power envelope with Gaussian kernels. MAP estimation of the model parameters using the EM algorithm provides the fundamental frequency, onset and offset times, spectral envelope and power envelope of every underlying audio stream. Our proposed method showed high accuracy in a pitch name estimation task on several pieces of real Music performance data.
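
To illustrate the flavor of the Gaussian-kernel spectrum model, here is a heavily simplified, hypothetical sketch: a single short-time spectrum is matched against Gaussian kernels placed at harmonics of a candidate fundamental, with amplitudes fit by least squares and the fundamental chosen by grid search (the paper's actual model is two-dimensional in time-frequency and estimated by EM, not grid search):

```python
import numpy as np

def harmonic_gmm_fit(freqs, power, f0_grid, n_harm=5, sigma=8.0):
    """Score candidate fundamentals by matching the observed power spectrum
    against a sum of Gaussian kernels at the first n_harm harmonics."""
    best_f0, best_err = None, np.inf
    for f0 in f0_grid:
        # Design matrix: one Gaussian kernel per harmonic.
        G = np.stack([np.exp(-0.5 * ((freqs - k * f0) / sigma) ** 2)
                      for k in range(1, n_harm + 1)], axis=1)
        a, *_ = np.linalg.lstsq(G, power, rcond=None)
        a = np.clip(a, 0.0, None)  # nonnegative harmonic amplitudes
        err = np.sum((power - G @ a) ** 2)
        if err < best_err:
            best_f0, best_err = f0, err
    return best_f0

# Synthetic spectrum of a 220 Hz tone with 1/k-decaying harmonics.
freqs = np.arange(0.0, 2000.0, 4.0)
power = sum((1.0 / k) * np.exp(-0.5 * ((freqs - 220.0 * k) / 8.0) ** 2)
            for k in range(1, 6))
f0 = harmonic_gmm_fit(freqs, power, f0_grid=np.arange(100.0, 400.0, 1.0))
```

Grid search avoids the octave ambiguity here only because the whole harmonic comb must match; the paper's EM over all streams jointly is what makes this workable on real polyphony.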

Hirokazu Kameoka - One of the best experts on this subject based on the ideXlab platform.

  • ICASSP - Mondrian hidden Markov model for Music Signal processing
    2014 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2014
    Co-Authors: Masahiro Nakano, Hirokazu Kameoka, Yasunori Ohishi, Ryo Mukai, Kunio Kashino
    Abstract:

    This paper discusses a new extension of hidden Markov models that can capture clusters embedded in transitions between the hidden states. In our model, the state-transition matrices are viewed as representations of relational data reflecting a network structure between the hidden states. We specifically present a nonparametric Bayesian approach to the proposed state-space model whose network structure is represented by a Mondrian Process-based relational model. We show an application of the proposed model to Music Signal analysis through some experimental results.
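
The notion of "clusters embedded in transitions between the hidden states" can be illustrated without the Mondrian-process machinery: a hedged toy example in which a hand-built block-structured transition matrix makes a Markov chain dwell within state clusters (the paper infers such structure nonparametrically rather than assuming it):

```python
import numpy as np

def sample_states(A, n, seed=0):
    """Sample a length-n state sequence from row-stochastic matrix A."""
    rng = np.random.default_rng(seed)
    s = [0]
    for _ in range(n - 1):
        row = A[s[-1]]
        s.append(rng.choice(len(A), p=row / row.sum()))  # guard float drift
    return np.array(s)

# Block-structured transitions: states {0,1} and {2,3} form two clusters;
# within-cluster moves are far more likely than cross-cluster moves.
A = np.array([[0.45, 0.45, 0.05, 0.05],
              [0.45, 0.45, 0.05, 0.05],
              [0.05, 0.05, 0.45, 0.45],
              [0.05, 0.05, 0.45, 0.45]])
s = sample_states(A, 2000)
within = np.mean((s[:-1] < 2) == (s[1:] < 2))  # fraction of within-cluster moves
```

Viewing the transition matrix as relational data between states, as the paper does, amounts to discovering the block partition above from the sequence rather than building it in by hand.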

  • Infinite-state spectrum model for Music Signal analysis
    International Conference on Acoustics Speech and Signal Processing, 2011
    Co-Authors: Masahiro Nakano, Hirokazu Kameoka, Jonathan Le Roux, Nobutaka Ono, Shigeki Sagayama
    Abstract:

    This paper presents a nonparametric Bayesian extension of nonnegative matrix factorization (NMF) for Music Signal analysis. Instrument sounds often exhibit non-stationary spectral characteristics. We introduce infinite-state spectral bases into NMF to represent time-varying spectra in polyphonic Music Signals. We describe our extension of NMF with infinite-state spectral bases generated by the Dirichlet process in a statistical framework, derive an efficient optimization algorithm based on collapsed variational inference, and validate the framework on audio data.

  • Specmurt Analysis of Polyphonic Music Signals
    IEEE Transactions on Audio Speech and Language Processing, 2008
    Co-Authors: Shoichiro Saito, Hirokazu Kameoka, Takuya Nishimoto, K. Takahashi, Shigeki Sagayama
    Abstract:

    This paper introduces a new Music Signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of common harmonic structures linearly stretched along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic Music Signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic Music Signals and compared with manually annotated MIDI data.

  • Audio stream segregation of multi-pitch Music Signal based on time-space clustering using Gaussian kernel 2-dimensional model
    International Conference on Acoustics Speech and Signal Processing, 2005
    Co-Authors: Hirokazu Kameoka, Takuya Nishimoto, Shigeki Sagayama
    Abstract:

    The paper describes a novel approach for audio stream segregation of a multi-pitch Music Signal. We propose a parameter-constrained time-frequency spectrum model expressing both a harmonic spectral structure and a temporal curve of the power envelope with Gaussian kernels. MAP estimation of the model parameters using the EM algorithm provides the fundamental frequency, onset and offset times, spectral envelope and power envelope of every underlying audio stream. Our proposed method showed high accuracy in a pitch name estimation task on several pieces of real Music performance data.

Kazunobu Kondo - One of the best experts on this subject based on the ideXlab platform.

  • Music Signal separation using supervised NMF with all-pole-model-based discriminative basis deformation
    European Signal Processing Conference, 2016
    Co-Authors: Hiroaki Nakajima, Nobutaka Ono, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Norihiro Takamune, Shoichi Koyama, Kazunobu Kondo
    Abstract:

    In this paper, we address the Music Signal separation problem and propose a new supervised nonnegative matrix factorization (SNMF) algorithm employing the deformation of a spectral supervision basis trained in advance. Conventional SNMF suffers from the problem that separation accuracy is degraded by a mismatch between the trained basis and the spectrogram of the actual target sound in open data. To reduce this mismatch, we propose a new method with two features. First, we introduce a deformation with an all-pole model that is optimized to make the trained basis fit the spectrogram of the target Signal, even if the true target component is hidden in the observed mixture. Next, to avoid excess deformation, we limit the degrees of freedom of the deformation by performing discriminative training. Our experimental evaluation reveals that the proposed method outperforms conventional SNMFs.
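
As a hedged sketch of the supervised-NMF baseline this method extends (the trained target bases held fixed, a few free bases absorbing interference; the all-pole deformation and discriminative training are omitted):

```python
import numpy as np

def supervised_nmf(V, W_sup, K_free=2, n_iter=200, seed=0, eps=1e-9):
    """Supervised NMF (KL updates): the pre-trained target bases W_sup stay
    fixed; only their activations and a few free interference bases are
    learned. The paper's all-pole basis deformation is NOT modeled here.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Ks = W_sup.shape[1]
    W_free = rng.random((F, K_free)) + eps
    H = rng.random((Ks + K_free, T)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_sup, W_free])
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        WH = np.hstack([W_sup, W_free]) @ H + eps
        # Update only the free (interference) columns of the dictionary.
        W_free *= ((V / WH) @ H[Ks:].T) / (H[Ks:].sum(axis=1, keepdims=True).T + eps)
    return W_sup @ H[:Ks], W_free, H  # separated target spectrogram, ...

# Mixture = target (known bases) + interference (unknown bases).
rng = np.random.default_rng(3)
W_tgt = np.abs(rng.normal(size=(20, 2)))
V = W_tgt @ np.abs(rng.normal(size=(2, 30))) + \
    np.abs(rng.normal(size=(20, 2))) @ np.abs(rng.normal(size=(2, 30)))
target, W_free, H = supervised_nmf(V, W_tgt)
fit = np.abs(V - np.hstack([W_tgt, W_free]) @ H).mean() / V.mean()
```

The mismatch problem the paper targets appears exactly when W_sup was trained on different recordings than V; the deformation step would then bend W_sup toward the observed target spectra.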

  • Music Signal separation based on Bayesian spectral amplitude estimator with automatic target prior adaptation
    International Conference on Acoustics Speech and Signal Processing, 2014
    Co-Authors: Yuki Murota, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Shunsuke Nakai, Satoshi Nakamura, Kazunobu Kondo
    Abstract:

    In this paper, we propose a new approach for addressing Music Signal separation based on the generalized Bayesian estimator with automatic prior adaptation. This method consists of three parts, namely, the generalized MMSE-STSA estimator with a flexible target Signal prior, the NMF-based dynamic interference spectrogram estimator, and closed-form parameter estimation for the statistical model of the target Signal based on higher-order statistics. The statistical model parameter of the hidden target Signal can be detected automatically for optimal Bayesian estimation with online target-Signal prior adaptation. Our experimental evaluation shows the efficacy of the proposed method.
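
A drastically simplified, hypothetical stand-in for the estimator chain: given an interference spectrogram estimate (as NMF might supply), a Wiener-style spectral gain extracts the target. The paper's generalized MMSE-STSA estimator with adaptive priors is far richer than this fixed gain:

```python
import numpy as np

def wiener_gain_separate(mix_mag, interf_mag, eps=1e-9):
    """Apply a Wiener-style gain: estimated target power over mixture power.

    mix_mag, interf_mag : magnitude spectrograms of the mixture and of the
    (externally estimated) interference; returns the target magnitude estimate.
    """
    mix_pow = mix_mag ** 2
    target_pow = np.clip(mix_pow - interf_mag ** 2, 0.0, None)
    gain = target_pow / (mix_pow + eps)  # in [0, 1]
    return gain * mix_mag

# Toy example: interference occupies only the upper half of the bins.
mix = np.ones((8, 10))
interf = np.zeros((8, 10))
interf[4:] = 1.0
est = wiener_gain_separate(mix, interf)
```

The MMSE-STSA family replaces this plug-in gain with a statistically optimal one under an explicit target-amplitude prior, whose shape parameter is exactly what the paper adapts online.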

  • Robust Music Signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing
    International Symposium on Signal Processing and Information Technology, 2013
    Co-Authors: Daichi Kitamura, Hiroshi Saruwatari, Kiyohiro Shikano, Yu Takahashi, Kosuke Yagi, Kazunobu Kondo
    Abstract:

    In this paper, we address a monaural source separation problem and propose a new penalized supervised nonnegative matrix factorization (SNMF). Conventional SNMF often degrades the separation performance owing to the basis-sharing problem between supervised bases and nontarget bases. To solve this problem, we employ two types of penalty term, based on orthogonality and divergence maximization, in the cost function to force the nontarget bases to become as different as possible from the supervised bases. The experimental results confirm that the proposed method prevents the simultaneous generation of similar spectral patterns in the supervised bases and other bases, and increases the separation performance compared with the conventional method.
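
The orthogonality-penalty idea can be sketched with Euclidean-cost multiplicative updates: a term mu/2 * ||W_sup^T W_free||^2 in the cost pushes the free bases away from the supervised ones (a hedged illustration only; the paper's divergence-maximization penalty and exact cost are not reproduced):

```python
import numpy as np

def penalized_snmf(V, W_sup, K_free=2, mu=1.0, n_iter=300, seed=0, eps=1e-9):
    """Supervised NMF (Euclidean cost) with an orthogonality penalty
    mu/2 * ||W_sup^T W_free||^2 that discourages the free (nontarget)
    bases from resembling the trained target bases."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Ks = W_sup.shape[1]
    W_free = rng.random((F, K_free)) + eps
    H = rng.random((Ks + K_free, T)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_sup, W_free])
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        WH = np.hstack([W_sup, W_free]) @ H
        num = V @ H[Ks:].T
        # Penalty gradient mu * W_sup W_sup^T W_free enters the denominator.
        den = WH @ H[Ks:].T + mu * (W_sup @ (W_sup.T @ W_free)) + eps
        W_free *= num / den
    return W_sup @ H[:Ks], W_free, H

# Same data with and without the penalty: the penalty shrinks the overlap
# between trained and free bases.
rng = np.random.default_rng(4)
W_tgt = np.abs(rng.normal(size=(20, 2)))
V = W_tgt @ np.abs(rng.normal(size=(2, 30))) + \
    np.abs(rng.normal(size=(20, 2))) @ np.abs(rng.normal(size=(2, 30)))
_, Wf_pen, _ = penalized_snmf(V, W_tgt, mu=5.0)
_, Wf_none, _ = penalized_snmf(V, W_tgt, mu=0.0)
overlap_pen = np.linalg.norm(W_tgt.T @ Wf_pen)
overlap_none = np.linalg.norm(W_tgt.T @ Wf_none)
```

Placing the penalty gradient in the denominator keeps the update multiplicative and hence nonnegativity-preserving, which is the standard way to add penalties to NMF.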

  • Music Signal separation by supervised nonnegative matrix factorization with basis deformation
    International Conference on Digital Signal Processing, 2013
    Co-Authors: Daichi Kitamura, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi
    Abstract:

    In this paper, we address a Music Signal separation problem and propose a new supervised algorithm for real instrumental Signal separation that allows deformation of spectral supervision bases trained in advance. Nonnegative matrix factorization (NMF) is one of the techniques used for the separation of an audio mixture that consists of multiple instrumental sources. Conventional supervised NMF has the critical problem that a mismatch between the bases trained in advance and the target real sound reduces the accuracy of separation. To solve this problem, we propose a new advanced supervised NMF that allows deformation of the trained bases, with penalty terms that make the bases fit the target sound. The results of the experiment using real instruments show that the proposed method significantly improves the accuracy of separation compared with the conventional method.

Takuya Nishimoto - One of the best experts on this subject based on the ideXlab platform.

  • Specmurt Analysis of Polyphonic Music Signals
    IEEE Transactions on Audio Speech and Language Processing, 2008
    Co-Authors: Shoichiro Saito, Hirokazu Kameoka, Takuya Nishimoto, K. Takahashi, Shigeki Sagayama
    Abstract:

    This paper introduces a new Music Signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of common harmonic structures linearly stretched along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic Music Signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic Music Signals and compared with manually annotated MIDI data.

  • Audio stream segregation of multi-pitch Music Signal based on time-space clustering using Gaussian kernel 2-dimensional model
    International Conference on Acoustics Speech and Signal Processing, 2005
    Co-Authors: Hirokazu Kameoka, Takuya Nishimoto, Shigeki Sagayama
    Abstract:

    The paper describes a novel approach for audio stream segregation of a multi-pitch Music Signal. We propose a parameter-constrained time-frequency spectrum model expressing both a harmonic spectral structure and a temporal curve of the power envelope with Gaussian kernels. MAP estimation of the model parameters using the EM algorithm provides the fundamental frequency, onset and offset times, spectral envelope and power envelope of every underlying audio stream. Our proposed method showed high accuracy in a pitch name estimation task on several pieces of real Music performance data.

  • Specmurt anasylis: a piano-roll visualization of polyphonic Music Signal by deconvolution of log-frequency spectrum
    Conference of the International Speech Communication Association, 2004
    Co-Authors: Shigeki Sagayama, Hirokazu Kameoka, Keigo Takahashi, Takuya Nishimoto
    Abstract:

    In this paper, we propose a new Signal processing technique, “specmurt anasylis,” that provides a piano-roll-like visual display of multi-tone Signals (e.g., polyphonic Music). Specmurt is defined as the inverse Fourier transform of a linear spectrum with logarithmic frequency, unlike the familiar cepstrum, defined as the inverse Fourier transform of a logarithmic spectrum with linear frequency. We apply this technique to Music Signals, i.e., “frencyque anasylis” using specmurt “filreting” instead of quefrency “alanysis” using cepstrum liftering. Suppose that each sound contained in the multi-pitch Signal has exactly the same harmonic structure pattern (i.e., the energy ratio of harmonic components); then, in the logarithmic frequency domain, the overall shape of the multi-pitch spectrum is a superposition of the common spectral pattern with different degrees of parallel shift. The overall shape can be expressed as a convolution of a fundamental frequency pattern (degrees of parallel shift and power) and the common harmonic structure pattern. The fundamental frequency pattern is restored by dividing the inverse Fourier transform of a given log-frequency spectrum, i.e., the specmurt, by that of the common harmonic structure pattern. The proposed method was successfully tested on several pieces of Music recordings.

Dirk Slock - One of the best experts on this subject based on the ideXlab platform.

  • MMSP - Periodic Signal extraction with frequency-selective amplitude modulation and global time-warping for Music Signal decomposition
    2008 IEEE 10th Workshop on Multimedia Signal Processing, 2008
    Co-Authors: Mahdi Triki, Dirk Slock, Ahmed Triki
    Abstract:

    A key building block in Music transcription and indexing operations is the decomposition of Music Signals into notes. We model a note Signal as a periodic Signal with (slow) frequency-selective amplitude modulation and global time warping. Time-varying frequency-selective amplitude modulation allows the various harmonics of the periodic Signal to decay at different speeds. Time-warping allows for some limited global frequency modulation. The bandlimited variation of the frequency-selective amplitude modulation and of the global time warping gets expressed through a subsampled representation and parametrization of the corresponding Signals. Assuming additive white Gaussian noise, a maximum likelihood approach is proposed for the estimation of the model parameters and the optimization is performed in an iterative (cyclic) fashion that leads to a sequence of simple least-squares problems.
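
A hedged, stripped-down version of the cyclic least-squares idea: alternate between estimating one period of the waveform and a slowly varying amplitude envelope parameterized by a few knots (known integer period, no time warping, no frequency selectivity; all names are illustrative):

```python
import numpy as np

def extract_periodic(x, period, n_knots=8, n_iter=20):
    """Alternating least squares for x[t] ~= a[t] * p[t % period], with the
    amplitude a[t] linearly interpolated from n_knots values (a crude
    'subsampled representation' of the bandlimited envelope)."""
    n = len(x)
    phase = np.arange(n) % period
    knots = np.linspace(0, n - 1, n_knots)
    # Piecewise-linear basis: column j is the 'hat' function of knot j.
    Phi = np.zeros((n, n_knots))
    for j in range(n_knots):
        e = np.zeros(n_knots)
        e[j] = 1.0
        Phi[:, j] = np.interp(np.arange(n), knots, e)
    a = np.ones(n)
    p = np.zeros(period)
    for _ in range(n_iter):
        for k in range(period):  # p given a: per-phase least squares
            idx = phase == k
            p[k] = np.sum(a[idx] * x[idx]) / (np.sum(a[idx] ** 2) + 1e-12)
        G = Phi * p[phase][:, None]  # a given p: least squares on the knots
        coef, *_ = np.linalg.lstsq(G, x, rcond=None)
        a = np.clip(Phi @ coef, 0.0, None)  # envelope kept nonnegative
    return a, p

# Decaying 'note': a sine template repeated under an exponential envelope.
rng = np.random.default_rng(0)
period, n = 25, 500
tmpl = np.sin(2 * np.pi * np.arange(period) / period)
env = np.exp(-np.arange(n) / 300.0)
x = env * np.tile(tmpl, n // period) + 0.01 * rng.normal(size=n)
a, p = extract_periodic(x, period)
recon = a * p[np.arange(n) % period]
```

The paper additionally handles non-integer periods, per-harmonic (frequency-selective) envelopes, and global time warping, each of which adds another simple least-squares step to the cycle.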

  • Periodic Signal extraction with global amplitude and phase modulation for Music Signal decomposition
    International Conference on Acoustics Speech and Signal Processing, 2005
    Co-Authors: Mahdi Triki, Dirk Slock
    Abstract:

    A key building block in Music transcription and indexing operations is the decomposition of the Music Signal into notes. We model a note Signal as a periodic Signal with (slow) global variation of amplitude (reflecting attack, sustain, decay) and frequency (limited time warping). The bandlimited variation of global amplitude and frequency is expressed through a subsampled representation and parameterization of the corresponding Signals. Assuming additive white Gaussian noise, a maximum likelihood approach is proposed for the estimation of the model parameters and the optimization is performed in an iterative (cyclic) fashion that leads to a sequence of simple least-squares problems. Particular attention is paid to the estimation of the basic periodic Signal, which can have a non-integer period, and the estimation of the amplitude Signal with guaranteed positivity.
