Video Summarization

The Experts below are selected from a list of 4908 Experts worldwide, ranked by the ideXlab platform.

Naokazu Yokoya - One of the best experts on this subject based on the ideXlab platform.

  • ACCV (5) - Video Summarization Using Deep Semantic Features
    Computer Vision – ACCV 2016, 2017
    Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
    Abstract:

    This paper presents a Video Summarization technique for Internet Videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original Video requires understanding its content. Furthermore, the content of Internet Videos is very diverse, ranging from home Videos to documentaries, which makes Video Summarization much harder, as prior knowledge is almost never available. To tackle this problem, we propose to use deep Video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard Video Summarization techniques. For this, we design a deep neural network that maps both Videos and descriptions to a common semantic space and jointly train it on associated pairs of Videos and descriptions. To generate a Video summary, we extract the deep features from each segment of the original Video and apply a clustering-based Summarization technique to them. We evaluate our Video summaries using the SumMe dataset as well as baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a Video Summarization technique.
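
    To make the clustering step concrete, the following is a minimal Python sketch (not the authors' code) of clustering-based keyshot selection over precomputed per-segment deep features; the array segment_features and the choice of k-means are illustrative assumptions.

    # Illustrative sketch: clustering-based keyshot selection over precomputed
    # per-segment deep features. `segment_features` stands in for embeddings
    # from a hypothetical video/text embedding network.
    import numpy as np
    from sklearn.cluster import KMeans

    def select_keyshots(segment_features, n_keyshots=5, seed=0):
        """Cluster segment embeddings and return the index of the segment
        closest to each cluster centre, i.e. one representative per cluster."""
        km = KMeans(n_clusters=n_keyshots, n_init=10, random_state=seed)
        labels = km.fit_predict(segment_features)
        keyshots = []
        for c in range(n_keyshots):
            members = np.where(labels == c)[0]
            dists = np.linalg.norm(segment_features[members] - km.cluster_centers_[c], axis=1)
            keyshots.append(int(members[np.argmin(dists)]))
        return sorted(keyshots)

    # Toy usage with random "deep features" standing in for real embeddings.
    feats = np.random.RandomState(0).randn(40, 128)
    print(select_keyshots(feats, n_keyshots=5))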

  • Video Summarization using Deep Semantic Features
    arXiv: Computer Vision and Pattern Recognition, 2016
    Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
    Abstract:

    This paper presents a Video Summarization technique for Internet Videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original Video requires understanding its content. Furthermore, the content of Internet Videos is very diverse, ranging from home Videos to documentaries, which makes Video Summarization much harder, as prior knowledge is almost never available. To tackle this problem, we propose to use deep Video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard Video Summarization techniques. For this, we design a deep neural network that maps both Videos and descriptions to a common semantic space and jointly train it on associated pairs of Videos and descriptions. To generate a Video summary, we extract the deep features from each segment of the original Video and apply a clustering-based Summarization technique to them. We evaluate our Video summaries using the SumMe dataset as well as baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a Video Summarization technique.

  • ICME - Textual description-based Video Summarization for Video blogs
    2015 IEEE International Conference on Multimedia and Expo (ICME), 2015
    Co-Authors: Mayu Otani, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya
    Abstract:

    Recent popularization of camera devices, including action cams and smartphones, enables us to record Videos in everyday life and share them through the Internet. A Video blog is a recent approach for sharing Videos, in which users enjoy expressing themselves in blog posts with attractive Videos. Generating such Videos, however, requires users to review vast amounts of raw footage and edit it appropriately, which discourages many users from doing so. In this paper, we propose a novel Video Summarization method to help users create a Video blog post. Unlike typical Video Summarization methods, the proposed method utilizes the text written for the Video blog post and makes the Video summary consistent with the content of that text. For this, we perform Video Summarization by solving an optimization problem whose objective function involves the content similarity between the summarized Video and the text. Our user study with 20 participants demonstrated that the proposed method is better suited to creating Video blog posts than conventional Video Summarization methods.
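
    As a rough illustration of the optimization idea, the sketch below greedily selects segments whose pooled embedding best matches a text embedding under a fixed length budget; the shared embedding space, the greedy strategy, and the budget are simplifying assumptions, not the paper's formulation.

    # Illustrative sketch: pick video segments whose pooled embedding is most
    # similar to the blog-post text embedding, under a simple length budget.
    # Segment and text embeddings are assumed to live in a shared space.
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def text_guided_summary(segment_embs, text_emb, budget=5):
        """Greedily add the segment that most improves similarity between the
        mean embedding of the summary and the text embedding."""
        selected = []
        for _ in range(budget):
            best, best_score = None, -np.inf
            for i in range(len(segment_embs)):
                if i in selected:
                    continue
                pooled = segment_embs[selected + [i]].mean(axis=0)
                score = cosine(pooled, text_emb)
                if score > best_score:
                    best, best_score = i, score
            selected.append(best)
        return sorted(selected)

    rng = np.random.RandomState(1)
    segs, text = rng.randn(30, 64), rng.randn(64)
    print(text_guided_summary(segs, text, budget=4))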

Sheng-hua Zhong - One of the best experts on this subject based on the ideXlab platform.

  • Dynamic graph convolutional network for multi-Video Summarization
    Pattern Recognition, 2020
    Co-Authors: Sheng-hua Zhong, Yan Liu
    Abstract:

    Multi-Video Summarization is an effective tool for users to browse multiple Videos. In this paper, multi-Video Summarization is formulated as a graph analysis problem, and a dynamic graph convolutional network is proposed to measure the importance and relevance of each Video shot within its own Video as well as within the whole Video collection. Two strategies are proposed to address the inherent class imbalance of the Video Summarization task. Moreover, we propose a diversity regularization to encourage the model to generate a diverse summary. Extensive experiments are conducted, comparing our approach with state-of-the-art Video Summarization methods as well as traditional and recent graph models. Our method achieves state-of-the-art performance on two standard Video Summarization datasets. The results demonstrate the effectiveness of the proposed model in generating a representative summary for multiple Videos with good diversity.
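
    A minimal sketch of the graph-convolution idea, assuming per-shot feature vectors are available: build a shot-affinity graph from cosine similarities and apply one untrained, randomly initialized GCN layer followed by a linear scoring head. This is illustrative only and not the paper's model.

    # Illustrative sketch: one graph-convolution pass over a shot-affinity graph
    # built from shot features, followed by a linear importance score per shot.
    # All weights are random; real training is omitted.
    import numpy as np

    def shot_importance(shot_feats, hidden=32, seed=0):
        rng = np.random.RandomState(seed)
        n, d = shot_feats.shape
        # Affinity graph: non-negative cosine similarity between shots, with self-loops.
        norm = shot_feats / (np.linalg.norm(shot_feats, axis=1, keepdims=True) + 1e-8)
        A = np.maximum(norm @ norm.T, 0) + np.eye(n)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
        A_hat = D_inv_sqrt @ A @ D_inv_sqrt          # symmetric normalisation
        W1 = rng.randn(d, hidden) * 0.1              # graph-conv weights
        w2 = rng.randn(hidden) * 0.1                 # scoring weights
        H = np.maximum(A_hat @ shot_feats @ W1, 0)   # GCN layer + ReLU
        return H @ w2                                # unnormalised importance scores

    feats = np.random.RandomState(2).randn(20, 128)
    print(shot_importance(feats).round(3))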

  • Video Summarization via spatio-temporal deep architecture
    Neurocomputing, 2019
    Co-Authors: Sheng-hua Zhong, Jianmin Jiang
    Abstract:

    Video Summarization is of unprecedented importance in helping us overview the ever-growing amount of Video collections. In this paper, we propose a novel dynamic Video Summarization model based on a deep learning architecture. We are the first to address the imbalanced class distribution problem in Video Summarization. An over-sampling algorithm is used to balance the class distribution of the training data, and a novel two-stream deep architecture with cost-sensitive learning is proposed to handle the class imbalance during feature learning. In the spatial stream, RGB images represent the appearance of Video frames; in the temporal stream, multi-frame motion vectors within a deep learning framework are introduced for the first time to represent and extract temporal information from the input Video. The proposed method is evaluated on two standard Video Summarization datasets and a standard emotional dataset. Empirical validations for Video Summarization demonstrate that our model improves on existing and state-of-the-art methods. Moreover, the proposed method is able to highlight Video content with a high level of arousal in an affective computing task. In addition, the proposed frame-based model has another advantage: it automatically preserves the connection between consecutive frames, so that although the summary is constructed at the frame level, the final summary is comprised of informative and continuous segments rather than individual, separate frames.
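
    The class-balancing idea can be sketched as plain random over-sampling of the minority "summary" class before fitting a frame classifier; the concatenated two-stream features, the logistic-regression classifier, and the 10% positive rate below are illustrative assumptions, not the authors' architecture.

    # Illustrative sketch: random over-sampling of the minority "summary" class
    # before training a frame classifier. Features stand in for the spatial (RGB)
    # and temporal (motion-vector) streams, concatenated.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def oversample(X, y, seed=0):
        """Duplicate minority-class samples until both classes are equally frequent."""
        rng = np.random.RandomState(seed)
        pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
        minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
        extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        idx = np.concatenate([majority, minority, extra])
        return X[idx], y[idx]

    rng = np.random.RandomState(3)
    X = rng.randn(200, 64)                       # concatenated two-stream features
    y = (rng.rand(200) < 0.1).astype(int)        # ~10% of frames are "summary" frames
    Xb, yb = oversample(X, y)
    clf = LogisticRegression(max_iter=1000).fit(Xb, yb)
    print("balanced class counts:", np.bincount(yb))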

  • PCM (2) - Gaze Aware Deep Learning Model for Video Summarization
    Advances in Multimedia Information Processing – PCM 2018, 2018
    Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
    Abstract:

    Video Summarization is an ideal tool for skimming Videos. Previous computational models extract explicit information from the input Video, such as visual appearance, motion, or audio information, in order to generate informative summaries. Eye gaze information, an implicit cue, has proved useful for indicating important content and the viewer’s interest. In this paper, we propose a novel gaze-aware deep learning model for Video Summarization. In our model, the position and velocity of the observers’ raw eye movements are processed by a deep neural network to indicate the users’ preferences. Experiments on two widely used Video Summarization datasets show that our model is more proficient than state-of-the-art methods at summarizing Videos for both general and personal preferences. The results provide an innovative and improved algorithm for using gaze information in Video Summarization.
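
    A minimal sketch of how raw gaze could be turned into the position/velocity features mentioned in the abstract and fused with per-frame visual features; the array layouts and the simple finite-difference velocity are assumptions, and the downstream scoring network is omitted.

    # Illustrative sketch: per-frame gaze position/velocity features appended to
    # visual features, mirroring the idea of feeding eye-movement position and
    # velocity to the model. The gaze array layout is an assumption.
    import numpy as np

    def gaze_features(gaze_xy, fps_ratio=1):
        """gaze_xy: (n_frames, 2) gaze positions; returns (n_frames, 4) with
        position and finite-difference velocity per frame."""
        velocity = np.vstack([np.zeros((1, 2)), np.diff(gaze_xy, axis=0)]) * fps_ratio
        return np.hstack([gaze_xy, velocity])

    rng = np.random.RandomState(4)
    visual = rng.randn(50, 128)                  # per-frame visual features
    gaze = np.cumsum(rng.randn(50, 2), axis=0)   # synthetic gaze trajectory
    fused = np.hstack([visual, gaze_features(gaze)])
    print(fused.shape)                           # (50, 132): ready for a downstream scorer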

  • Foveated convolutional neural networks for Video Summarization
    Multimedia Tools and Applications, 2018
    Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
    Abstract:

    With the proliferation of Video data, Video Summarization is an ideal tool for users to browse Video content rapidly. In this paper, we propose novel foveated convolutional neural networks for dynamic Video Summarization. We are the first to integrate gaze information into a deep learning network for Video Summarization. Foveated images are constructed based on subjects’ eye movements to represent the spatial information of the input Video, while multi-frame motion vectors are stacked across several adjacent frames to convey motion cues. To evaluate the proposed method, experiments are conducted on two Video Summarization benchmark datasets. The experimental results validate the effectiveness of the gaze information for Video Summarization, despite the fact that the eye movements are collected from subjects other than those who generated the summaries. Empirical validations also demonstrate that our proposed foveated convolutional neural networks achieve state-of-the-art performance on these benchmark datasets.
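
    The foveation step can be approximated with a simple operator that keeps the region around the gaze point sharp and blurs the periphery; the radius, blur strength, and blending scheme below are illustrative assumptions rather than the authors' pipeline.

    # Illustrative sketch: a simple foveation operator that keeps the area around
    # the gaze point sharp and blurs the periphery.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def foveate(image, gaze_xy, fovea_radius=40, blur_sigma=6):
        """image: (H, W) or (H, W, C) array; gaze_xy: (x, y) in pixels."""
        h, w = image.shape[:2]
        yy, xx = np.mgrid[0:h, 0:w]
        dist = np.sqrt((xx - gaze_xy[0]) ** 2 + (yy - gaze_xy[1]) ** 2)
        weight = np.clip((dist - fovea_radius) / fovea_radius, 0, 1)  # 0 near fovea, 1 far
        if image.ndim == 3:
            weight = weight[..., None]
            blurred = gaussian_filter(image.astype(float), sigma=(blur_sigma, blur_sigma, 0))
        else:
            blurred = gaussian_filter(image.astype(float), sigma=blur_sigma)
        return (1 - weight) * image + weight * blurred

    frame = np.random.RandomState(5).rand(120, 160, 3)
    print(foveate(frame, gaze_xy=(80, 60)).shape)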

  • A novel clustering method for static Video Summarization
    Multimedia Tools and Applications, 2016
    Co-Authors: Sheng-hua Zhong, Jianmin Jiang, Yunyun Yang
    Abstract:

    Static Video Summarization is recognized as an effective way for users to quickly browse and comprehend large numbers of Videos. In this paper, we formulate static Video Summarization as a clustering problem. Inspired by the density-peaks-search clustering algorithm, we propose an effective clustering algorithm that integrates important properties of Video to gather similar frames into clusters. Finally, the centers of all clusters are collected as the static Video summary. Compared with existing clustering-based Video Summarization approaches, our work can detect highly relevant frames and generate representative clusters automatically. We evaluate the proposed work by comparing it with several state-of-the-art clustering-based Video Summarization methods and some classical clustering algorithms. The experimental results show that our proposed method has better performance and efficiency.
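
    A minimal density-peaks-style sketch, assuming per-frame feature vectors: frames with both high local density and a large distance to any denser frame are taken as keyframes. The cutoff heuristic and scoring below are simplifications, not the published algorithm.

    # Illustrative sketch: density-peaks-style keyframe selection over frame features.
    import numpy as np

    def density_peak_keyframes(frame_feats, n_keyframes=5, cutoff=None):
        D = np.linalg.norm(frame_feats[:, None] - frame_feats[None, :], axis=2)
        if cutoff is None:
            cutoff = np.percentile(D, 20)                    # heuristic cutoff distance
        rho = (D < cutoff).sum(axis=1) - 1                   # local density per frame
        delta = np.zeros(len(D))
        for i in range(len(D)):
            denser = np.where(rho > rho[i])[0]               # frames with higher density
            delta[i] = D[i, denser].min() if len(denser) else D[i].max()
        score = rho * delta                                  # density peaks score highest
        return sorted(np.argsort(score)[-n_keyframes:].tolist())

    feats = np.random.RandomState(6).randn(60, 32)
    print(density_peak_keyframes(feats, n_keyframes=5))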

Guoping Qiu - One of the best experts on this subject based on the ideXlab platform.

  • FrameRank: A Text Processing Approach to Video Summarization
    arXiv: Computation and Language, 2019
    Co-Authors: Zhuo Lei, Chao Zhang, Qian Zhang, Guoping Qiu
    Abstract:

    Video Summarization has been extensively studied in the past decades. However, user-generated Video Summarization is much less explored, since there is a lack of large-scale Video datasets in which human-generated Video summaries are unambiguously defined and annotated. Toward this end, we propose a user-generated Video Summarization dataset, UGSum52, that consists of 52 Videos (207 minutes). In constructing the dataset, because of the subjectivity of user-generated Video Summarization, we manually annotate 25 summaries for each Video, 1,300 summaries in total. To the best of our knowledge, it is currently the largest dataset for user-generated Video Summarization. Based on this dataset, we present FrameRank, an unsupervised Video Summarization method that employs a frame-level affinity graph to identify coherent and informative frames with which to summarize a Video. We use a Kullback-Leibler (KL) divergence-based graph to rank temporal segments according to the amount of semantic information contained in their frames. We illustrate the effectiveness of our method by applying it to three datasets, SumMe, TVSum, and UGSum52, and show that it achieves state-of-the-art results.
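
    A rough sketch of the ranking idea, assuming per-frame features can be treated as probability distributions: build a symmetric-KL-based affinity graph and rank frames with a PageRank-style power iteration. This is illustrative only and not the FrameRank implementation.

    # Illustrative sketch: frames are ranked on a KL-divergence-based affinity
    # graph with a power-iteration (PageRank-style) score. Frame features are
    # turned into probability distributions via a softmax.
    import numpy as np

    def kl(p, q):
        return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

    def frame_rank(frame_feats, damping=0.85, iters=50):
        e = np.exp(frame_feats - frame_feats.max(axis=1, keepdims=True))
        P = e / e.sum(axis=1, keepdims=True)      # each frame as a distribution
        n = len(P)
        W = np.zeros((n, n))                      # affinity: high when symmetric KL is low
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i, j] = np.exp(-(kl(P[i], P[j]) + kl(P[j], P[i])) / 2)
        T = W / W.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
        r = np.full(n, 1.0 / n)
        for _ in range(iters):
            r = (1 - damping) / n + damping * (T.T @ r)
        return r                                  # higher score = more central/informative frame

    feats = np.random.RandomState(7).randn(25, 16)
    print(frame_rank(feats).round(3))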

  • ICME - FrameRank: A Text Processing Approach to Video Summarization
    2019 IEEE International Conference on Multimedia and Expo (ICME), 2019
    Co-Authors: Zhuo Lei, Chao Zhang, Qian Zhang, Guoping Qiu
    Abstract:

    Video Summarization has been extensively studied in the past decades. However, user-generated Video Summarization is much less explored, since there is a lack of large-scale Video datasets in which human-generated Video summaries are unambiguously defined and annotated. Toward this end, we propose a user-generated Video Summarization dataset, UGSum52, that consists of 52 Videos (207 minutes). In constructing the dataset, because of the subjectivity of user-generated Video Summarization, we manually annotate 25 summaries for each Video, 1,300 summaries in total. To the best of our knowledge, it is currently the largest dataset for user-generated Video Summarization. Based on this dataset, we present FrameRank, an unsupervised Video Summarization method that employs a frame-level affinity graph to identify coherent and informative frames with which to summarize a Video. We use a Kullback-Leibler (KL) divergence-based graph to rank temporal segments according to the amount of semantic information contained in their frames. We illustrate the effectiveness of our method by applying it to three datasets, SumMe, TVSum, and UGSum52, and show that it achieves state-of-the-art results.

Jianmin Jiang - One of the best experts on this subject based on the ideXlab platform.

  • Video Summarization via spatio-temporal deep architecture
    Neurocomputing, 2019
    Co-Authors: Sheng-hua Zhong, Jianmin Jiang
    Abstract:

    Video Summarization is of unprecedented importance in helping us overview the ever-growing amount of Video collections. In this paper, we propose a novel dynamic Video Summarization model based on a deep learning architecture. We are the first to address the imbalanced class distribution problem in Video Summarization. An over-sampling algorithm is used to balance the class distribution of the training data, and a novel two-stream deep architecture with cost-sensitive learning is proposed to handle the class imbalance during feature learning. In the spatial stream, RGB images represent the appearance of Video frames; in the temporal stream, multi-frame motion vectors within a deep learning framework are introduced for the first time to represent and extract temporal information from the input Video. The proposed method is evaluated on two standard Video Summarization datasets and a standard emotional dataset. Empirical validations for Video Summarization demonstrate that our model improves on existing and state-of-the-art methods. Moreover, the proposed method is able to highlight Video content with a high level of arousal in an affective computing task. In addition, the proposed frame-based model has another advantage: it automatically preserves the connection between consecutive frames, so that although the summary is constructed at the frame level, the final summary is comprised of informative and continuous segments rather than individual, separate frames.

  • PCM (2) - Gaze Aware Deep Learning Model for Video Summarization
    Advances in Multimedia Information Processing – PCM 2018, 2018
    Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
    Abstract:

    Video Summarization is an ideal tool for skimming Videos. Previous computational models extract explicit information from the input Video, such as visual appearance, motion, or audio information, in order to generate informative summaries. Eye gaze information, an implicit cue, has proved useful for indicating important content and the viewer’s interest. In this paper, we propose a novel gaze-aware deep learning model for Video Summarization. In our model, the position and velocity of the observers’ raw eye movements are processed by a deep neural network to indicate the users’ preferences. Experiments on two widely used Video Summarization datasets show that our model is more proficient than state-of-the-art methods at summarizing Videos for both general and personal preferences. The results provide an innovative and improved algorithm for using gaze information in Video Summarization.

  • Foveated convolutional neural networks for Video Summarization
    Multimedia Tools and Applications, 2018
    Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
    Abstract:

    With the proliferation of Video data, Video Summarization is an ideal tool for users to browse Video content rapidly. In this paper, we propose novel foveated convolutional neural networks for dynamic Video Summarization. We are the first to integrate gaze information into a deep learning network for Video Summarization. Foveated images are constructed based on subjects’ eye movements to represent the spatial information of the input Video, while multi-frame motion vectors are stacked across several adjacent frames to convey motion cues. To evaluate the proposed method, experiments are conducted on two Video Summarization benchmark datasets. The experimental results validate the effectiveness of the gaze information for Video Summarization, despite the fact that the eye movements are collected from subjects other than those who generated the summaries. Empirical validations also demonstrate that our proposed foveated convolutional neural networks achieve state-of-the-art performance on these benchmark datasets.

  • A novel clustering method for static Video Summarization
    Multimedia Tools and Applications, 2016
    Co-Authors: Sheng-hua Zhong, Jianmin Jiang, Yunyun Yang
    Abstract:

    Static Video Summarization is recognized as an effective way for users to quickly browse and comprehend large numbers of Videos. In this paper, we formulate static Video Summarization as a clustering problem. Inspired by the density-peaks-search clustering algorithm, we propose an effective clustering algorithm that integrates important properties of Video to gather similar frames into clusters. Finally, the centers of all clusters are collected as the static Video summary. Compared with existing clustering-based Video Summarization approaches, our work can detect highly relevant frames and generate representative clusters automatically. We evaluate the proposed work by comparing it with several state-of-the-art clustering-based Video Summarization methods and some classical clustering algorithms. The experimental results show that our proposed method has better performance and efficiency.

Mayu Otani - One of the best experts on this subject based on the ideXlab platform.

  • ACCV (5) - Video Summarization Using Deep Semantic Features
    Computer Vision – ACCV 2016, 2017
    Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
    Abstract:

    This paper presents a Video Summarization technique for Internet Videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original Video requires understanding its content. Furthermore, the content of Internet Videos is very diverse, ranging from home Videos to documentaries, which makes Video Summarization much harder, as prior knowledge is almost never available. To tackle this problem, we propose to use deep Video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard Video Summarization techniques. For this, we design a deep neural network that maps both Videos and descriptions to a common semantic space and jointly train it on associated pairs of Videos and descriptions. To generate a Video summary, we extract the deep features from each segment of the original Video and apply a clustering-based Summarization technique to them. We evaluate our Video summaries using the SumMe dataset as well as baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a Video Summarization technique.

  • Video Summarization using Deep Semantic Features
    arXiv: Computer Vision and Pattern Recognition, 2016
    Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
    Abstract:

    This paper presents a Video Summarization technique for Internet Videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original Video requires understanding its content. Furthermore, the content of Internet Videos is very diverse, ranging from home Videos to documentaries, which makes Video Summarization much harder, as prior knowledge is almost never available. To tackle this problem, we propose to use deep Video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard Video Summarization techniques. For this, we design a deep neural network that maps both Videos and descriptions to a common semantic space and jointly train it on associated pairs of Videos and descriptions. To generate a Video summary, we extract the deep features from each segment of the original Video and apply a clustering-based Summarization technique to them. We evaluate our Video summaries using the SumMe dataset as well as baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a Video Summarization technique.

  • ICME - Textual description-based Video Summarization for Video blogs
    2015 IEEE International Conference on Multimedia and Expo (ICME), 2015
    Co-Authors: Mayu Otani, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya
    Abstract:

    Recent popularization of camera devices, including action cams and smartphones, enables us to record Videos in everyday life and share them through the Internet. A Video blog is a recent approach for sharing Videos, in which users enjoy expressing themselves in blog posts with attractive Videos. Generating such Videos, however, requires users to review vast amounts of raw footage and edit it appropriately, which discourages many users from doing so. In this paper, we propose a novel Video Summarization method to help users create a Video blog post. Unlike typical Video Summarization methods, the proposed method utilizes the text written for the Video blog post and makes the Video summary consistent with the content of that text. For this, we perform Video Summarization by solving an optimization problem whose objective function involves the content similarity between the summarized Video and the text. Our user study with 20 participants demonstrated that the proposed method is better suited to creating Video blog posts than conventional Video Summarization methods.