Audio Feature

The experts below are selected from a list of 21,306 experts worldwide, ranked by the ideXlab platform.

Gert Lanckriet - One of the best experts on this subject based on the ideXlab platform.

  • Codebook-Based Audio Feature Representation for Music Information Retrieval
    2014
    Co-Authors: Yonatan Vaizman, Brian Mcfee, Gert Lanckriet
    Abstract:

    Digital music has become prolific on the web in recent decades. Automated recommendation systems are essential for users to discover music they love and for artists to reach an appropriate audience. When manual annotations and user preference data are lacking (e.g., for new artists), these systems must rely on content-based methods. Besides powerful machine learning tools for classification and retrieval, a key component of successful recommendation is the audio content representation. Good representations should capture informative musical patterns in the audio signal of songs. These representations should be concise, to enable efficient (low-storage, easy-to-index, fast-to-search) management of huge music repositories, and should also be easy and fast to compute, to enable real-time interaction with a user supplying new songs to the system. Before designing new audio features, we explore the use of traditional local features, adding a stage of encoding with a pre-computed codebook and a stage of pooling to obtain compact vectorial representations. We experiment with different encoding methods, namely the LASSO, vector quantization (VQ) and cosine similarity (CS). We evaluate the representations' quality in two music information retrieval applications: query-by-tag and query-by-example. Our results show that concise representations can be used for successful performance in both applications. We recommend using top-τ VQ encoding, which consistently performs well in both applications and requires much less computation time than the LASSO.
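
    A minimal sketch of the top-τ VQ encoding and mean pooling described above; the array shapes, codebook size and variable names are illustrative assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def top_tau_vq_encode(frames, codebook, tau=5):
        """frames: (n_frames, d) local features; codebook: (k, d) codewords."""
        # Euclidean distance from every frame to every codeword
        dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        codes = np.zeros((frames.shape[0], codebook.shape[0]))
        # activate the tau nearest codewords for each frame
        nearest = np.argsort(dists, axis=1)[:, :tau]
        codes[np.arange(frames.shape[0])[:, None], nearest] = 1.0 / tau
        return codes

    def song_representation(frames, codebook, tau=5):
        # mean pooling over time yields one compact vector per song
        return top_tau_vq_encode(frames, codebook, tau).mean(axis=0)

    # usage with random data standing in for MFCC frames and a learned codebook
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(1000, 13))    # e.g. 1000 MFCC frames
    codebook = rng.normal(size=(512, 13))   # e.g. 512 k-means codewords
    print(song_representation(frames, codebook).shape)  # (512,)
    ```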

  • Semantic Annotation and Retrieval of Music and Sound Effects
    2008
    Co-Authors: Douglas Turnbull, Luke Barrington, David Torres, Gert Lanckriet
    Abstract:

    We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation-maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
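
    A simplified sketch of the per-word GMM annotation idea: fit one Gaussian mixture over the audio features of all tracks labelled with a word, then annotate a new track with the words whose models give its frames the highest average likelihood. This uses standard EM from scikit-learn rather than the weighted mixture-hierarchies algorithm described above, and all names and shapes are assumptions.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_word_models(track_features, annotations, vocabulary, n_components=8):
        """track_features: list of (n_frames, d) arrays; annotations: list of word sets."""
        models = {}
        for word in vocabulary:
            # pool the frames of every track labelled with this word
            frames = [f for f, words in zip(track_features, annotations) if word in words]
            models[word] = GaussianMixture(n_components=n_components).fit(np.vstack(frames))
        return models

    def annotate(models, frames, n_words=5):
        # rank words by the average per-frame log-likelihood of the track
        scores = {w: m.score(frames) for w, m in models.items()}
        return sorted(scores, key=scores.get, reverse=True)[:n_words]

    # usage with random features standing in for MFCC frames of annotated tracks
    rng = np.random.default_rng(0)
    tracks = [rng.normal(size=(200, 13)) for _ in range(20)]
    tags = [{"rock"} if i % 2 else {"calm"} for i in range(20)]
    models = train_word_models(tracks, tags, vocabulary=["rock", "calm"], n_components=4)
    print(annotate(models, rng.normal(size=(200, 13)), n_words=2))
    ```

    The mixture-hierarchies EM in the paper instead combines per-track mixtures into a word-level model, which the abstract reports to be more scalable and to produce better density estimates than pooling all frames as done here.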

Benjamin Schrauwen - One of the best experts on this subject based on the ideXlab platform.

  • Multiscale Approaches to Music Audio Feature Learning
    2013
    Co-Authors: Sander Dieleman, Benjamin Schrauwen
    Abstract:

    Content-based music information retrieval tasks are typically solved with a two-stage approach: features are extracted from music audio signals and are then used as input to a regressor or classifier. These features can be engineered or learned from data. Although the former approach was dominant in the past, feature learning has started to receive more attention from the MIR community in recent years. Recent results in feature learning indicate that simple algorithms such as K-means can be very effective, sometimes surpassing more complicated approaches based on restricted Boltzmann machines, autoencoders or sparse coding. Furthermore, there has recently been increased interest in multiscale representations of music audio. Such representations are more versatile because music audio exhibits structure on multiple timescales, which are relevant to different MIR tasks to varying degrees. We develop and compare three approaches to multiscale audio feature learning using the spherical K-means algorithm. We evaluate them in an automatic tagging task and a similarity metric learning task on the Magnatagatune dataset.
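
    A minimal sketch of spherical K-means dictionary learning applied at several timescales, in the spirit of the approach above. Here a "timescale" is simply the number of consecutive spectrogram frames stacked into one patch; this is one possible multiscale scheme and an assumption, not the authors' exact setup.

    ```python
    import numpy as np

    def spherical_kmeans(X, k, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-9)
        D = X[rng.choice(len(X), k, replace=False)]        # initial unit-norm centroids
        for _ in range(n_iter):
            assign = (X @ D.T).argmax(axis=1)              # cosine-similarity assignment
            for j in range(k):
                members = X[assign == j]
                if len(members):
                    c = members.sum(axis=0)
                    D[j] = c / (np.linalg.norm(c) + 1e-9)  # re-normalise the centroid
        return D

    def multiscale_dictionaries(spectrogram, scales=(1, 2, 4), k=64):
        """spectrogram: (n_frames, n_bins); learn one dictionary per timescale."""
        dicts = {}
        for s in scales:
            n = (len(spectrogram) // s) * s
            patches = spectrogram[:n].reshape(-1, s * spectrogram.shape[1])
            dicts[s] = spherical_kmeans(patches, k)
        return dicts

    # usage: one codebook per scale from a random stand-in spectrogram
    spec = np.random.default_rng(1).random((600, 40))
    print({s: D.shape for s, D in multiscale_dictionaries(spec).items()})
    ```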

George Tzanetakis - One of the best experts on this subject based on the ideXlab platform.

  • Empirical Analysis of Track Selection and Ordering in Electronic Dance Music Using Audio Feature Extraction
    2013
    Co-Authors: Thor Kell, George Tzanetakis
    Abstract:

    Disc jockeys are in some ways the ultimate experts at selecting and playing recorded music for an audience, especially in the context of dance music. In this work, we empirically investigate factors affecting track selection and ordering using DJ-created mixes of electronic dance music. We use automatic content-based analysis and discuss the implications of our findings to playlist generation and ordering. Timbre appears to be an important factor when selecting tracks and ordering tracks, and track order itself matters, as shown by statistically significant differences in the transitions between the original order and a shuffled version. We also apply this analysis to ordering heuristics and suggest that the standard playlist generation model of returning tracks in order of decreasing similarity to the initial track may not be optimal, at least in the context of track ordering for electronic dance music.
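
    A hedged sketch of the kind of transition analysis described above: the mean feature distance between consecutive tracks in the original DJ order is compared against shuffled orders via an empirical permutation test. The random features and Euclidean distance are illustrative assumptions, not the paper's exact features or statistics.

    ```python
    import numpy as np

    def mean_transition_distance(features, order):
        ordered = features[order]
        return np.linalg.norm(np.diff(ordered, axis=0), axis=1).mean()

    def shuffle_test(features, n_shuffles=1000, seed=0):
        rng = np.random.default_rng(seed)
        original = mean_transition_distance(features, np.arange(len(features)))
        shuffled = [mean_transition_distance(features, rng.permutation(len(features)))
                    for _ in range(n_shuffles)]
        # one-sided empirical p-value: how often a shuffle is as smooth as the DJ order
        p = (np.sum(np.array(shuffled) <= original) + 1) / (n_shuffles + 1)
        return original, float(np.mean(shuffled)), p

    # usage with random timbre-like features for a 12-track mix
    feats = np.random.default_rng(1).normal(size=(12, 20))
    print(shuffle_test(feats))
    ```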

  • Distributed Audio Feature Extraction for Music
    2005
    Co-Authors: Stuart Bray, George Tzanetakis
    Abstract:

    One of the important challenges facing music information retrieval (MIR) of audio signals is scaling analysis algorithms to large collections. Typically, analysis of audio signals utilizes sophisticated signal processing and machine learning techniques that require significant computational resources. Therefore, audio MIR is an area where computational resources are a significant bottleneck. For example, the number of pieces utilized in the majority of existing work in audio MIR is at most a few thousand files. Computing audio features over thousands of files can sometimes take days of processing. In this paper, we describe how Marsyas-0.2, a free software framework for audio analysis and synthesis, can be used to rapidly implement efficient distributed audio analysis algorithms. The framework is based on a dataflow architecture, which facilitates partitioning of audio computations over multiple computers. Experimental results demonstrating the effectiveness of the proposed approach are presented.
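
    The following illustrates the partitioning idea only: a file list is split across worker processes so feature extraction runs in parallel. It uses Python's multiprocessing on a single machine and a placeholder feature function; it is not the Marsyas-0.2 dataflow implementation described above.

    ```python
    from multiprocessing import Pool
    import numpy as np

    def extract_features(path):
        # placeholder: a real extractor would load the audio at `path` and
        # compute spectral features; here we just return random values
        rng = np.random.default_rng(abs(hash(path)) % 2**32)
        return path, rng.normal(size=13)

    def extract_collection(paths, n_workers=4):
        # partition the file list across worker processes
        with Pool(n_workers) as pool:
            return dict(pool.map(extract_features, paths))

    if __name__ == "__main__":
        paths = [f"track_{i:04d}.wav" for i in range(100)]
        features = extract_collection(paths)
        print(len(features))  # 100
    ```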

Andreas Rauber - One of the best experts on this subject based on the ideXlab platform.

  • Evaluation of Feature Extractors and Psycho-Acoustic Transformations for Music Genre Classification
    2005
    Co-Authors: Thomas Lidy, Andreas Rauber
    Abstract:

    We present a study on the importance of psycho-acoustic transformations for effective audio feature calculation. From the results, both crucial and problematic parts of the algorithm for Rhythm Patterns feature extraction are identified. We furthermore introduce two new feature representations in this context: Statistical Spectrum Descriptors and Rhythm Histogram features. Evaluation of both the individual and combined feature sets is accomplished through a music genre classification task involving three reference audio collections. Results are compared to published measures on the same data sets. The experiments confirm that, in all settings, the inclusion of psycho-acoustic transformations significantly improves classification accuracy.
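
    A rough sketch of a Statistical Spectrum Descriptor as a set of per-band statistics over a psycho-acoustically transformed band spectrogram (mean, median, variance, skewness, kurtosis, minimum and maximum per band). The input is assumed to be a Bark-band/sone spectrogram computed elsewhere, and the statistics may differ in detail from the authors' definition.

    ```python
    import numpy as np
    from scipy.stats import skew, kurtosis

    def statistical_spectrum_descriptor(band_spec):
        """band_spec: (n_frames, n_bands) psycho-acoustically scaled spectrogram."""
        stats = [
            band_spec.mean(axis=0),
            np.median(band_spec, axis=0),
            band_spec.var(axis=0),
            skew(band_spec, axis=0),
            kurtosis(band_spec, axis=0),
            band_spec.min(axis=0),
            band_spec.max(axis=0),
        ]
        return np.concatenate(stats)   # one vector: 7 statistics x n_bands

    # usage with a random stand-in for a 24-band sone spectrogram
    ssd = statistical_spectrum_descriptor(np.random.default_rng(2).random((500, 24)))
    print(ssd.shape)  # (168,)
    ```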

Joonwhoan Lee - One of the best experts on this subject based on the ideXlab platform.

  • Audio Feature Reduction and Analysis for Automatic Music Genre Classification
    2014
    Co-Authors: Babu Kaji Baniya, Joonwhoan Lee
    Abstract:

    Multimedia database retrieval is growing at a fast rate, with a corresponding increase in the popularity of online retrieval systems. Large datasets pose major challenges for searching, retrieving, and organizing music content, so a robust automatic music genre classification method is needed to organize music data into classes according to viable information. Two fundamental components must be considered for genre classification: audio feature extraction and classifier design. In this paper, a diverse set of audio features is proposed to characterize the music content precisely. The feature sets belong to four different groups: dynamics, rhythm, spectral, and harmony. From these features, five statistical parameters are considered as representatives, including up to the 4th-order central moments of each feature and covariance components. The number of representative attributes is then controlled by the MRMR algorithm, which scores and ranks all feature attributes; only the highest-scoring attributes are used for genre classification. Moreover, we can visualize which audio features, and which of the statistical parameters derived from them, are important for genre classification. Among them, mel-frequency cepstral coefficients (MFCCs) receive higher scores than other feature attributes. Furthermore, MRMR does not transform the feature values, unlike principal component analysis (PCA). Classification accuracy obtained with the two dimensionality reduction methods is compared using a support vector machine (SVM): the MRMR-reduced feature set outperforms PCA, and the overall classification accuracy is also higher than that of existing state-of-the-art frame-based methods.
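
    A compact, greedy approximation of the MRMR-plus-SVM pipeline described above. Mutual information estimates relevance and absolute correlation stands in for redundancy; the exact MRMR criterion and the paper's feature set differ, so this is a sketch rather than a reproduction.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def mrmr_rank(X, y, n_select):
        relevance = mutual_info_classif(X, y, random_state=0)
        corr = np.abs(np.corrcoef(X, rowvar=False))
        selected = [int(np.argmax(relevance))]
        while len(selected) < n_select:
            remaining = [i for i in range(X.shape[1]) if i not in selected]
            # relevance minus mean redundancy against already selected features
            scores = [relevance[i] - corr[i, selected].mean() for i in remaining]
            selected.append(remaining[int(np.argmax(scores))])
        return selected

    # usage with synthetic data standing in for the audio feature statistics
    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 40))
    y = rng.integers(0, 4, size=200)            # four genre labels
    idx = mrmr_rank(X, y, n_select=10)
    acc = cross_val_score(SVC(), X[:, idx], y, cv=5).mean()
    print(idx, round(acc, 3))
    ```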