Level Semantics

The experts below are selected from a list of 3,579 experts worldwide, ranked by the ideXlab platform.

Yu Zheng - One of the best experts on this subject based on the ideXlab platform.

  • Exploiting Mid-Level Semantics for Large-Scale Complex Video Classification
    IEEE Transactions on Multimedia, 2019
    Co-Authors: Ji Zhang, Yu Zheng
    Abstract:

    As the amount of available video data has grown substantially, automatic video classification has become an urgent yet challenging task. Most video classification methods focus on acquiring discriminative spatial visual features and motion patterns for video representation; deep learning methods in particular have achieved very good results on action recognition problems. However, the performance of most of these methods degrades drastically on more generic video classification tasks, where the video contents are much more complex. In this paper, the mid-level semantics of videos are therefore exploited to bridge the semantic gap between low-level features and high-level video semantics. Inspired by "term frequency-inverse document frequency" (TF-IDF), a word weighting method for the problem of text classification is introduced to the video domain. The visual objects in videos are regarded as the words in texts, and two new weighting methods are proposed to encode videos by weighting visual objects according to the characteristics of videos. In addition, the semantic similarities between video categories and visual objects are introduced from the text domain as privileged information to facilitate classifier training on the obtained semantic representations of videos. The proposed semantic encoding method (the semantic stream) is then fused with the popular two-stream CNN model to produce the final classification results. Experiments are conducted on two large-scale complex video datasets, CCV and ActivityNet, and the results validate the effectiveness of the proposed methods.
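
    The abstract sketches a TF-IDF-style encoding in which videos play the role of documents and detected objects play the role of words. The snippet below is a minimal Python sketch of that idea, assuming an object detector has already produced a list of labels per video; the two new weighting variants the paper proposes are not specified in the abstract, so plain smoothed TF-IDF is shown instead.

        import math
        from collections import Counter

        def tfidf_object_encoding(video_objects, vocab):
            """Encode each video as a TF-IDF-weighted histogram of its
            detected objects (videos as documents, objects as words)."""
            n_videos = len(video_objects)
            # Document frequency: in how many videos does each object appear?
            df = Counter()
            for objs in video_objects:
                df.update(set(objs))
            encodings = []
            for objs in video_objects:
                tf = Counter(objs)
                total = sum(tf.values()) or 1
                vec = []
                for obj in vocab:
                    idf = math.log((1 + n_videos) / (1 + df[obj])) + 1  # smoothed IDF
                    vec.append((tf[obj] / total) * idf)
                encodings.append(vec)
            return encodings

        # Toy usage: three "videos" described by their detected objects.
        videos = [["dog", "ball", "grass", "dog"],
                  ["car", "road", "car", "traffic_light"],
                  ["dog", "car", "road"]]
        vocab = ["dog", "ball", "grass", "car", "road", "traffic_light"]
        for vec in tfidf_object_encoding(videos, vocab):
            print([round(v, 3) for v in vec])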

  • From Text to Video: Exploiting Mid-Level Semantics for Large-Scale Video Classification
    2018 24th International Conference on Pattern Recognition (ICPR), 2018
    Co-Authors: Ji Zhang, Xiao Wang, Yu Zheng
    Abstract:

    Automatically classifying video data at large scale is an urgent yet challenging task. To bridge the semantic gap between low-level features and high-level video semantics, we propose a method to represent videos with their mid-level semantics. Inspired by the problem of text classification, we regard the visual objects in videos as the words in documents and adapt the TF-IDF word weighting method to encode videos by their visual objects. Some extensions of the proposed method are also made according to the characteristics of videos. We integrate the proposed semantic encoding method with the popular two-stream CNN model for video classification. Experiments are conducted on two large-scale video datasets, CCV and ActivityNet, and the experimental results validate the effectiveness of our method.
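
    The integration with the two-stream CNN is described only at a high level. One common realization is score-level (late) fusion, sketched below; the stream outputs and fusion weights are hypothetical stand-ins, with the weights typically tuned on a validation set.

        import numpy as np

        def fuse_streams(p_spatial, p_temporal, p_semantic, weights=(1.0, 1.0, 0.5)):
            """Late fusion: weighted average of per-class softmax scores.
            Each p_* is an (n_videos, n_classes) array from one stream."""
            w = np.asarray(weights, dtype=float)
            stacked = np.stack([p_spatial, p_temporal, p_semantic])  # (3, N, C)
            fused = np.tensordot(w, stacked, axes=1) / w.sum()       # (N, C)
            return fused.argmax(axis=1), fused

        # Toy usage: random scores for 4 videos over 10 classes.
        rng = np.random.default_rng(0)
        streams = [rng.dirichlet(np.ones(10), size=4) for _ in range(3)]
        labels, scores = fuse_streams(*streams)
        print(labels)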

Ji Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Exploiting Mid-Level Semantics for Large-Scale Complex Video Classification
    IEEE Transactions on Multimedia, 2019
    Co-Authors: Ji Zhang, Yu Zheng
    Abstract:

    As the amount of available video data has grown substantially, automatic video classification has become an urgent yet challenging task. Most video classification methods focus on acquiring discriminative spatial visual features and motion patterns for video representation; deep learning methods in particular have achieved very good results on action recognition problems. However, the performance of most of these methods degrades drastically on more generic video classification tasks, where the video contents are much more complex. In this paper, the mid-level semantics of videos are therefore exploited to bridge the semantic gap between low-level features and high-level video semantics. Inspired by "term frequency-inverse document frequency" (TF-IDF), a word weighting method for the problem of text classification is introduced to the video domain. The visual objects in videos are regarded as the words in texts, and two new weighting methods are proposed to encode videos by weighting visual objects according to the characteristics of videos. In addition, the semantic similarities between video categories and visual objects are introduced from the text domain as privileged information to facilitate classifier training on the obtained semantic representations of videos. The proposed semantic encoding method (the semantic stream) is then fused with the popular two-stream CNN model to produce the final classification results. Experiments are conducted on two large-scale complex video datasets, CCV and ActivityNet, and the results validate the effectiveness of the proposed methods.

  • From Text to Video: Exploiting Mid-Level Semantics for Large-Scale Video Classification
    2018 24th International Conference on Pattern Recognition (ICPR), 2018
    Co-Authors: Ji Zhang, Xiao Wang, Yu Zheng
    Abstract:

    Automatically classifying video data at large scale is an urgent yet challenging task. To bridge the semantic gap between low-level features and high-level video semantics, we propose a method to represent videos with their mid-level semantics. Inspired by the problem of text classification, we regard the visual objects in videos as the words in documents and adapt the TF-IDF word weighting method to encode videos by their visual objects. Some extensions of the proposed method are also made according to the characteristics of videos. We integrate the proposed semantic encoding method with the popular two-stream CNN model for video classification. Experiments are conducted on two large-scale video datasets, CCV and ActivityNet, and the experimental results validate the effectiveness of our method.

Zhao-xiang Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Sequence Level Semantics Aggregation for Video Object Detection
    2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
    Co-Authors: Haiping Wu, Yuntao Chen, Naiyan Wang, Zhao-xiang Zhang
    Abstract:

    Video object detection (VID) has been a rising research direction in recent years. A central issue of VID is the appearance degradation of video frames caused by fast motion. This problem is essentially ill-posed for a single frame, so aggregating features from other frames becomes a natural choice. Existing methods rely heavily on optical flow or recurrent neural networks for feature aggregation, but they place most of their emphasis on temporally nearby frames. In this work, we argue that aggregating features at the full-sequence level leads to more discriminative and robust features for video object detection. To achieve this goal, we devise a novel Sequence Level Semantics Aggregation (SELSA) module. We further demonstrate the close relationship between the proposed method and the classic spectral clustering method, providing a novel view for understanding the VID problem. We test the proposed method on the ImageNet VID and EPIC-KITCHENS datasets and achieve new state-of-the-art results. Our method does not need complicated post-processing such as Seq-NMS or tubelet rescoring, which keeps the pipeline simple and clean.
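
    The core operation the abstract describes, aggregating proposal features across the whole sequence by semantic similarity, can be illustrated as similarity-weighted feature averaging. The NumPy sketch below is only an illustration of that step; the actual SELSA module is a learned layer inside the detection network.

        import numpy as np

        def selsa_aggregate(feats, eps=1e-8):
            """Replace each proposal feature by a similarity-weighted average
            over ALL proposals in the sequence, so semantically similar
            proposals from any frame reinforce each other.

            feats: (n_proposals, d) RoI features pooled from every frame of
            the video, not just a temporal window around the current frame."""
            # Cosine similarity between every pair of proposals.
            normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
            sim = normed @ normed.T                       # (n, n)
            # A softmax over the sequence turns similarities into weights.
            w = np.exp(sim - sim.max(axis=1, keepdims=True))
            w /= w.sum(axis=1, keepdims=True)
            return w @ feats                              # (n, d)

        # Toy usage: 5 proposals with 16-dimensional features.
        rng = np.random.default_rng(0)
        print(selsa_aggregate(rng.standard_normal((5, 16))).shape)  # (5, 16)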

Raquel Urtasun - One of the best experts on this subject based on the ideXlab platform.

  • Understanding High-Level Semantics by Modeling Traffic Patterns
    2013 IEEE International Conference on Computer Vision, 2013
    Co-Authors: Hongyi Zhang, Andreas Geiger, Raquel Urtasun
    Abstract:

    In this paper, we are interested in understanding the semantics of outdoor scenes in the context of autonomous driving. Towards this goal, we propose a generative model of 3D urban scenes that is able to reason not only about the geometry and objects present in the scene, but also about high-level semantics in the form of traffic patterns. We found that a small number of patterns is sufficient to model the vast majority of traffic scenes, and we show how these patterns can be learned. As evidenced by our experiments, this high-level reasoning significantly improves overall scene estimation as well as vehicle-to-lane association when compared to state-of-the-art approaches.
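
    As a toy illustration of pattern-level reasoning, the sketch below scores a small, entirely hypothetical set of traffic patterns against noisy per-vehicle heading observations via Bayes' rule. The paper's actual generative model jointly reasons over scene geometry, objects, and lanes and is far richer than this.

        import math

        # Hypothetical pattern set: P(observed heading | pattern).
        PATTERNS = {
            "two_way_NS": {"N": 0.45, "S": 0.45, "E": 0.05, "W": 0.05},
            "four_way":   {"N": 0.25, "S": 0.25, "E": 0.25, "W": 0.25},
            "one_way_N":  {"N": 0.85, "S": 0.05, "E": 0.05, "W": 0.05},
        }
        PRIOR = {"two_way_NS": 0.5, "four_way": 0.3, "one_way_N": 0.2}

        def most_likely_pattern(headings):
            """argmax over patterns of log P(pattern) + sum_i log P(h_i | pattern)."""
            scores = {name: math.log(PRIOR[name]) + sum(math.log(lik[h]) for h in headings)
                      for name, lik in PATTERNS.items()}
            return max(scores, key=scores.get), scores

        best, _ = most_likely_pattern(["N", "N", "N", "N", "N", "N"])
        print(best)  # -> one_way_N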

Haiping Wu - One of the best experts on this subject based on the ideXlab platform.

  • Sequence Level Semantics Aggregation for Video Object Detection
    2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
    Co-Authors: Haiping Wu, Yuntao Chen, Naiyan Wang, Zhao-xiang Zhang
    Abstract:

    Video object detection (VID) has been a rising research direction in recent years. A central issue of VID is the appearance degradation of video frames caused by fast motion. This problem is essentially ill-posed for a single frame, so aggregating features from other frames becomes a natural choice. Existing methods rely heavily on optical flow or recurrent neural networks for feature aggregation, but they place most of their emphasis on temporally nearby frames. In this work, we argue that aggregating features at the full-sequence level leads to more discriminative and robust features for video object detection. To achieve this goal, we devise a novel Sequence Level Semantics Aggregation (SELSA) module. We further demonstrate the close relationship between the proposed method and the classic spectral clustering method, providing a novel view for understanding the VID problem. We test the proposed method on the ImageNet VID and EPIC-KITCHENS datasets and achieve new state-of-the-art results. Our method does not need complicated post-processing such as Seq-NMS or tubelet rescoring, which keeps the pipeline simple and clean.