Semantic Context

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 112,458 Experts worldwide, ranked by the ideXlab platform

Chong Wah Ngo - One of the best experts on this subject based on the ideXlab platform.

  • Semantic Context modeling with maximal margin conditional random fields for automatic image annotation
    Computer Vision and Pattern Recognition, 2010
    Co-Authors: Yu Xiang, Xiangdong Zhou, Zuotao Liu, Tatseng Chua, Chong Wah Ngo
    Abstract:

    Context modeling for vision recognition and Automatic Image Annotation (AIA) has attracted increasing attention in recent years. Among the various sources of Contextual information, Semantic Context has been exploited in AIA with promising results. However, previous works either cast the problem as structural classification or adopted multi-layer modeling, and consequently suffer from problems of scalability or model efficiency. In this paper, we propose a novel discriminative Conditional Random Field (CRF) model for Semantic Context modeling in AIA, which is built over Semantic concepts and treats an image as a whole observation without segmentation. Our model captures the interactions between Semantic concepts at both the Semantic level and the visual level in an integrated manner. Specifically, we employ a graph structure to model Contextual relationships between Semantic concepts. The potential functions are designed as linear discriminative models, which enables us to propose a novel decoupled hinge loss function for maximal-margin parameter estimation. We train the model by solving a set of independent quadratic programming problems with our derived Contextual kernel. Experiments are conducted on the commonly used Corel and TRECVID benchmark data sets. The results show that, compared with state-of-the-art methods, our method achieves significant improvement in annotation performance.
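The decoupled hinge loss itself is not spelled out in the abstract. The sketch below illustrates, under assumed notation, what a max-margin loss for a single concept node with pairwise context terms could look like; the names `hinge_loss`, `co_occur`, and `neighbors` are hypothetical, not from the paper.

```python
def hinge_loss(w, x, y, neighbor_labels, co_occur):
    """Margin loss for one concept node in a CRF over semantic labels.

    w              -- linear weights of the concept's discriminative scorer
    x              -- image feature vector (the whole image, no segmentation)
    y              -- binary label in {-1, +1} for this concept
    neighbor_labels-- labels of concepts linked to this one in the graph
    co_occur       -- pairwise context weights keyed by (label, neighbor label)

    score = <w, x> plus the sum of pairwise context terms; the loss is the
    standard hinge max(0, 1 - y * score).
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    context = sum(co_occur.get((y, ny), 0.0) for ny in neighbor_labels)
    return max(0.0, 1.0 - y * (score + context))
```

In the paper's formulation each concept's parameters are estimated by an independent quadratic program; the decoupling shown here (one loss per concept, neighbors held fixed) is what makes that separation possible.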

  • Semantic Context transfer across heterogeneous sources for domain adaptive video search
    ACM Multimedia, 2009
    Co-Authors: Yugang Jiang, Chong Wah Ngo, Shihfu Chang
    Abstract:

    Automatic video search based on Semantic concept detectors has recently received significant attention. Since the number of available detectors is much smaller than the size of the human vocabulary, one major challenge is to select appropriate detectors to respond to user queries. In this paper, we propose a novel approach that leverages heterogeneous knowledge sources for domain adaptive video search. First, instead of utilizing WordNet as most existing works do, we exploit the Context information associated with Flickr images to estimate query-detector similarity. The resulting measure, named Flickr Context Similarity (FCS), reflects the co-occurrence statistics of words in image Context rather than in a textual corpus. Starting from an initial detector set determined by FCS, our approach transfers Semantic Context learned from the test data domain to adaptively refine the query-detector similarity. The Semantic Context transfer process provides an effective means to cope with the domain shift between an external knowledge source (e.g., Flickr Context) and the test data, which is a critical issue in video search. To the best of our knowledge, this work is the first to tackle the challenging issue of domain change in video search. Extensive experiments on 120 textual queries over TRECVID 2005-2008 data sets demonstrate the effectiveness of Semantic Context transfer for domain adaptive video search. Results also show that FCS is well suited to measuring query-detector similarity, producing better performance than various other popular measures.
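The abstract describes FCS only as a co-occurrence statistic over Flickr image context. As an illustration, a similarity of this flavor can be built from a Normalized Google Distance computed over tag occurrence counts; the counts `fx`, `fy`, `fxy`, corpus size `n`, and the exponential mapping below are assumptions, not necessarily the paper's exact formulation.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from occurrence counts over n documents:
    fx, fy are the counts of each word, fxy the count of co-occurrence."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def context_similarity(fx, fy, fxy, n):
    # Map the distance to a (0, 1] similarity score; identical usage
    # patterns (fx == fy == fxy) give distance 0 and similarity 1.
    return math.exp(-ngd(fx, fy, fxy, n))
```

The key point carried over from the abstract is the choice of corpus: counting over Flickr image tags rather than a textual corpus makes the statistic reflect visual co-occurrence.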

  • Near-duplicate keyframe retrieval with visual keywords and Semantic Context
    Proceedings of the 6th ACM international conference on Image and video retrieval - CIVR '07, 2007
    Co-Authors: Xiao Wu, Wan-lei Zhao, Chong Wah Ngo
    Abstract:

    Near-duplicate keyframes (NDK) play a unique role in large-scale video search and in news topic detection and tracking. In this paper, we propose a novel NDK retrieval approach that explores both visual and textual cues, from the visual vocabulary and the Semantic Context respectively. The vocabulary, which provides entries for visual keywords, is formed by clustering local keypoints. The Semantic Context is inferred from the speech transcript surrounding a keyframe. We examine the usefulness of visual keywords and Semantic Context, separately and jointly, using cosine similarity and language models. By linearly fusing both modalities, we report performance improvements over keypoint-matching techniques. While matching suffers from expensive computation due to the need for online nearest-neighbor search, our approach is effective and efficient enough for online video search.
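The cosine-plus-linear-fusion scheme in the abstract can be sketched in a few lines. The histograms are represented here as `Counter` objects over visual-word and context-word tokens, and `alpha` is a hypothetical fusion weight; the paper's actual vocabulary construction and weighting are not given in the abstract.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency histograms."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fused_score(visual_a, visual_b, text_a, text_b, alpha=0.5):
    # Linear fusion of the visual-keyword and semantic-context similarities.
    return alpha * cosine(visual_a, visual_b) + (1 - alpha) * cosine(text_a, text_b)
```

Because each keyframe reduces to a fixed histogram, scoring a candidate pair is a sparse dot product, which is what makes this approach cheaper than online nearest-neighbor keypoint matching.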

Wei-Ta Chu - One of the best experts on this subject based on the ideXlab platform.

  • Semantic Context detection using audio event fusion: Camera-ready version
    Eurasip Journal on Applied Signal Processing, 2006
    Co-Authors: Wei-Ta Chu, Wen-huang Cheng, Ja-ling Wu
    Abstract:

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish Semantic Context detection. Two levels of modeling, audio event and Semantic Context modeling, are devised to bridge the gap between physical audio features and Semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the Semantic Context level, generative (ergodic hidden Markov model) and discriminative (support vector machine (SVM)) approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.

  • Semantic Context detection based on hierarchical audio models
    Multimedia Information Retrieval, 2003
    Co-Authors: Wen-huang Cheng, Wei-Ta Chu
    Abstract:

    Semantic Context detection is one of the key techniques for facilitating efficient multimedia retrieval. A Semantic Context is a scene that completely represents a meaningful information segment to human beings. In this paper, we propose a novel hierarchical approach that models the statistical characteristics of several audio events, over a time series, to accomplish Semantic Context detection. The approach consists of two stages: audio event detection and Semantic Context detection. HMMs are used to model basic audio events, and event detection is performed in the first stage. Semantic Context detection is then achieved with Gaussian mixture models, which model the temporal correlations among several audio events. With this framework, we bridge the gap between low-level features and Semantic Contexts that extend over a time series. The experimental evaluations indicate that the approach is effective in detecting high-level Semantics.
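The first stage scores audio segments with per-event HMMs. A minimal forward-algorithm sketch for a discrete-observation HMM is given below as an illustration; the actual models operate on continuous audio features, so the discrete emission table here is a simplification.

```python
def hmm_likelihood(init, trans, emit, obs):
    """Forward algorithm: P(obs | model) for a discrete-observation HMM.

    init[i]     -- initial probability of state i
    trans[i][j] -- transition probability from state i to state j
    emit[i][o]  -- probability of emitting symbol o from state i
    obs         -- observed symbol sequence
    """
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)
```

In a two-stage system like the one described, each event model (e.g., gunshot, explosion) produces such a likelihood for a segment, and the second-stage GMM operates on the resulting sequence of event scores.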

  • ICME - A study of Semantic Context detection by using SVM and GMM approaches
    2004 IEEE International Conference on Multimedia and Expo (ICME), 2004
    Co-Authors: Wei-Ta Chu, Wen-huang Cheng, J. Yung-jen Hsu
    Abstract:

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. In this paper, we propose a hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish Semantic Context detection. Two stages, audio event modeling/testing and Semantic Context modeling/testing, are devised to bridge the Semantic gap between physical audio features and Semantic concepts. HMMs are used to model audio events, and SVMs and GMMs are used to fuse the characteristics of various audio events related to specific Semantic concepts. The experimental results show that the approach is effective in detecting Semantic Context. A comparison between the SVM- and GMM-based approaches is also presented.
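The GMM fusion stage can be sketched as a likelihood comparison of an event-confidence vector under one diagonal-covariance mixture per context. All parameters below are illustrative; the paper's actual feature vectors and training procedure are not specified in the abstract.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x_vec, components):
    """Log-likelihood of a vector under a diagonal-covariance GMM.
    components: list of (weight, means, variances) tuples."""
    per_comp = []
    for w, means, variances in components:
        ll = math.log(w) + sum(gaussian_logpdf(x, m, v)
                               for x, m, v in zip(x_vec, means, variances))
        per_comp.append(ll)
    m = max(per_comp)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in per_comp))

def classify(x_vec, gmm_gunplay, gmm_chase):
    # Pick whichever context model explains the event confidences better.
    if gmm_loglik(x_vec, gmm_gunplay) > gmm_loglik(x_vec, gmm_chase):
        return "gunplay"
    return "car_chase"
```

An SVM-based fusion stage would replace the two generative models with a single discriminative boundary over the same event-confidence vectors, which is the comparison the paper studies.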

  • MMM - Generative and Discriminative Modeling toward Semantic Context Detection in Audio Tracks
    11th International Multimedia Modelling Conference (MMM), 2005
    Co-Authors: Wei-Ta Chu, Wen-huang Cheng
    Abstract:

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish Semantic Context detection. Two stages, audio event modeling/testing and Semantic Context modeling/testing, are devised to bridge the Semantic gap between physical audio features and Semantic concepts. For the action movies on which this work focuses, hidden Markov models (HMMs) are used to model four representative audio events, i.e., gunshot, explosion, car-braking, and engine sounds. At the Semantic Context level, generative (ergodic hidden Markov model) and discriminative (support vector machine, SVM) approaches are investigated to fuse the characteristics and correlations among various audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and sketch a framework for Semantic indexing and retrieval. Moreover, the differences between the two fusion schemes are discussed as a reference for future research.

Wen-huang Cheng - One of the best experts on this subject based on the ideXlab platform.

  • Semantic Context detection using audio event fusion: Camera-ready version
    Eurasip Journal on Applied Signal Processing, 2006
    Co-Authors: Wei-Ta Chu, Wen-huang Cheng, Ja-ling Wu
    Abstract:

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish Semantic Context detection. Two levels of modeling, audio event and Semantic Context modeling, are devised to bridge the gap between physical audio features and Semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the Semantic Context level, generative (ergodic hidden Markov model) and discriminative (support vector machine (SVM)) approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.

  • Semantic Context detection based on hierarchical audio models
    Multimedia Information Retrieval, 2003
    Co-Authors: Wen-huang Cheng, Wei-Ta Chu
    Abstract:

    Semantic Context detection is one of the key techniques for facilitating efficient multimedia retrieval. A Semantic Context is a scene that completely represents a meaningful information segment to human beings. In this paper, we propose a novel hierarchical approach that models the statistical characteristics of several audio events, over a time series, to accomplish Semantic Context detection. The approach consists of two stages: audio event detection and Semantic Context detection. HMMs are used to model basic audio events, and event detection is performed in the first stage. Semantic Context detection is then achieved with Gaussian mixture models, which model the temporal correlations among several audio events. With this framework, we bridge the gap between low-level features and Semantic Contexts that extend over a time series. The experimental evaluations indicate that the approach is effective in detecting high-level Semantics.

  • ICME - A study of Semantic Context detection by using SVM and GMM approaches
    2004 IEEE International Conference on Multimedia and Expo (ICME), 2004
    Co-Authors: Wei-Ta Chu, Wen-huang Cheng, J. Yung-jen Hsu
    Abstract:

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. In this paper, we propose a hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish Semantic Context detection. Two stages, audio event modeling/testing and Semantic Context modeling/testing, are devised to bridge the Semantic gap between physical audio features and Semantic concepts. HMMs are used to model audio events, and SVMs and GMMs are used to fuse the characteristics of various audio events related to specific Semantic concepts. The experimental results show that the approach is effective in detecting Semantic Context. A comparison between the SVM- and GMM-based approaches is also presented.

  • MMM - Generative and Discriminative Modeling toward Semantic Context Detection in Audio Tracks
    11th International Multimedia Modelling Conference (MMM), 2005
    Co-Authors: Wei-Ta Chu, Wen-huang Cheng
    Abstract:

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish Semantic Context detection. Two stages, audio event modeling/testing and Semantic Context modeling/testing, are devised to bridge the Semantic gap between physical audio features and Semantic concepts. For the action movies on which this work focuses, hidden Markov models (HMMs) are used to model four representative audio events, i.e., gunshot, explosion, car-braking, and engine sounds. At the Semantic Context level, generative (ergodic hidden Markov model) and discriminative (support vector machine, SVM) approaches are investigated to fuse the characteristics and correlations among various audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and sketch a framework for Semantic indexing and retrieval. Moreover, the differences between the two fusion schemes are discussed as a reference for future research.

Georges Linares - One of the best experts on this subject based on the ideXlab platform.

  • Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition
    IEEE ACM Transactions on Audio Speech and Language Processing, 2017
    Co-Authors: Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linares
    Abstract:

    The diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However, PNs are important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the Semantic Context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and Semantic Context derived from Latent Dirichlet Allocation (LDA) topic models, continuous word vector representations, and the Neural Bag-of-Words (NBOW) model, which is capable of learning task-specific word and Context representations. We propose a Neural Bag-of-Weighted-Words (NBOW2) model which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives faster convergence during training. Second-pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed Context models.
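NBOW2's contribution is a learned per-word importance weight on top of the plain NBOW average. Assuming the weights and embeddings are already given (the learning procedure is omitted here, and all names are illustrative), a weighted-average document vector can be sketched as:

```python
def nbow2_vector(words, embeddings, weights):
    """Weighted average of word vectors. `weights` plays the role of
    NBOW2's learned per-word importance; words missing a weight fall
    back to 1.0, which reduces to the plain NBOW average."""
    in_vocab = [w for w in words if w in embeddings]
    total = sum(weights.get(w, 1.0) for w in in_vocab)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for w in in_vocab:
        a = weights.get(w, 1.0)
        for i, v in enumerate(embeddings[w]):
            vec[i] += a * v
    return [v / total for v in vec] if total else vec
```

The effect is that context words with high learned importance (e.g., words strongly tied to a proper name) dominate the document representation used for retrieval.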

  • Document Level Semantic Context for Retrieving OOV Proper Names
    2016
    Co-Authors: Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linares
    Abstract:

    Recognition of Proper Names (PNs) in speech is important for content-based indexing and browsing of audio-video data. However, many PNs are Out-Of-Vocabulary (OOV) words for the LVCSR systems used in these applications, due to the diachronic nature of the data. By exploiting the Semantic Context of the audio, relevant OOV PNs can be retrieved and the target PNs then recovered. To retrieve OOV PNs, we propose to represent their Context with document-level Semantic vectors, and show that this approach is able to handle OOV PNs that are less frequent in the training data. We study different representations, including Random Projections, LSA, LDA, Skip-gram, CBOW, and GloVe. A further evaluation of the recovery of target OOV PNs using a phonetic search shows that document-level Semantic Context is reliable for recovering OOV PNs.
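Whatever representation is chosen (Random Projections, LSA, LDA, or word embeddings), retrieval then reduces to ranking candidate PNs by the similarity between the document's semantic vector and each PN's context vector. A minimal sketch with hypothetical, already-computed vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_oov_pns(doc_vec, pn_context_vecs):
    """Rank candidate OOV proper names by cosine similarity between the
    document's semantic vector and each PN's context vector.
    pn_context_vecs: dict mapping PN string -> context vector."""
    return sorted(pn_context_vecs,
                  key=lambda pn: -cosine(doc_vec, pn_context_vecs[pn]))
```

The top-ranked PNs would then be added to the LVCSR vocabulary (or searched phonetically, as in the paper's evaluation) to recover the missed names.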

Yang Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Mandarin-Speaking Children's Speech Recognition: Developmental Changes in the Influences of Semantic Context and F0 Contours
    Frontiers in psychology, 2017
    Co-Authors: Hong Zhou, Linjun Zhang, Hua Shu, Meng Liang, Connie Qun Guan, Yang Zhang
    Abstract:

    The goal of this developmental speech perception study was to assess whether and how age group modulated the influences of high-level Semantic Context and low-level fundamental frequency (F0) contours on the recognition of Mandarin speech by elementary- and middle-school-aged children in quiet and interference backgrounds. The results revealed different patterns for Semantic and F0 information. On the one hand, age group significantly modulated the use of F0 contours, indicating that elementary school children relied more on natural F0 contours than middle school children during Mandarin speech recognition. On the other hand, there was no significant modulation effect of age group on Semantic Context, indicating that children of both age groups used Semantic Context to assist speech recognition to a similar extent. Furthermore, the significant modulation effect of age group on the interaction between F0 contours and Semantic Context revealed that younger children could not make better use of Semantic Context when recognizing speech with flat F0 contours compared with natural F0 contours, while older children could benefit from Semantic Context even when natural F0 contours were altered, confirming the important role of F0 contours in Mandarin speech recognition by elementary school children. These developmental changes in the effects of high-level Semantic and low-level F0 information on speech recognition might reflect differences in the auditory and cognitive resources associated with processing the two types of information in speech perception.

  • Use of Semantic Context and F0 contours by older listeners during Mandarin speech recognition in quiet and single-talker interference conditions
    The Journal of the Acoustical Society of America, 2017
    Co-Authors: Wei Jiang, Linjun Zhang, Hua Shu, Yang Zhang
    Abstract:

    This study followed up Wang, Shu, Zhang, Liu, and Zhang [(2013). J. Acoust. Soc. Am. 134(1), EL91-EL97] to investigate factors influencing older listeners' Mandarin speech recognition in quiet vs. single-talker interference. Listening condition significantly interacted with F0 contours but not with Semantic Context, revealing that natural F0 contours provided a benefit in the interference condition whereas Semantic Context contributed similarly to both conditions. Furthermore, the significant interaction between Semantic Context and F0 contours demonstrated the importance of Semantic Context when F0 was flattened. Together, findings from the two studies indicate that aging differentially affects tonal-language speakers' dependence on F0 contours and Semantic Context for speech perception in suboptimal conditions.

  • Effects of Semantic Context and Fundamental Frequency Contours on Mandarin Speech Recognition by Second Language Learners.
    Frontiers in psychology, 2016
    Co-Authors: Linjun Zhang, Hua Shu, Yang Zhang
    Abstract:

    Speech recognition by second language (L2) learners in optimal and suboptimal conditions has been examined extensively, with English as the target language in most previous studies. This study extended existing experimental protocols (Wang et al., 2013) to investigate Mandarin speech recognition by Japanese learners of Mandarin at two levels of proficiency (elementary vs. intermediate). The overall results showed that, in addition to L2 proficiency, Semantic Context, F0 contours, and listening condition all affected recognition performance on the Mandarin sentences. However, the effects of Semantic Context and F0 contours on L2 speech recognition diverged to some extent. Specifically, there was a significant modulation effect of listening condition on Semantic Context, indicating that L2 learners made use of Semantic Context less efficiently in the interfering background than in quiet. In contrast, no significant modulation effect of listening condition on F0 contours was found. Furthermore, there was a significant interaction between Semantic Context and F0 contours, indicating that Semantic Context becomes more important for L2 speech recognition when F0 information is degraded. None of these effects were found to be modulated by L2 proficiency. The discrepancy in the effects of Semantic Context and F0 contours on L2 speech recognition in the interfering background might be related to differences in the processing capacities required by the two types of information in adverse listening conditions.