The experts below are selected from a list of 4,908 experts worldwide, ranked by the ideXlab platform.
Naokazu Yokoya - One of the best experts on this subject based on the ideXlab platform.
-
ACCV (5) - Video Summarization Using Deep Semantic Features
Computer Vision – ACCV 2016, 2017
Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
Abstract: This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization even harder because little prior knowledge is available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps both videos and descriptions to a common semantic space and jointly train it on associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries on the SumMe dataset against baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a video summarization technique.
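The clustering-based step described above can be sketched as a plain k-means over per-segment feature vectors, taking the segment nearest each cluster centre as a summary segment. This is a minimal illustration under assumed inputs, not the authors' implementation; the function name and the toy 2-D features are ours.

```python
import numpy as np

def summarize_by_clustering(segment_feats, k, iters=20, seed=0):
    """Cluster segment feature vectors with k-means and return the
    indices of the segments nearest each cluster centre as the summary."""
    rng = np.random.default_rng(seed)
    feats = np.asarray(segment_feats, dtype=float)
    # initialise centres from k distinct segments
    centres = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # assign each segment to its nearest centre
        d = np.linalg.norm(feats[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centre as the mean of its assigned segments
        for c in range(k):
            if (labels == c).any():
                centres[c] = feats[labels == c].mean(axis=0)
    # summary = segment closest to each converged centre
    d = np.linalg.norm(feats[:, None] - centres[None], axis=2)
    return sorted(set(int(d[:, c].argmin()) for c in range(k)))
```

With two well-separated groups of segments, the summary picks one representative from each.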
-
Video Summarization using Deep Semantic Features
arXiv: Computer Vision and Pattern Recognition, 2016
Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
Abstract: This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization even harder because little prior knowledge is available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps both videos and descriptions to a common semantic space and jointly train it on associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries on the SumMe dataset against baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a video summarization technique.
-
ICME - Textual description-based Video Summarization for Video blogs
2015 IEEE International Conference on Multimedia and Expo (ICME), 2015
Co-Authors: Mayu Otani, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya
Abstract: The recent popularization of camera devices, including action cams and smartphones, enables us to record videos in everyday life and share them over the Internet. The video blog is a recent approach to sharing videos, in which users enjoy expressing themselves in blog posts accompanied by attractive videos. Generating such videos, however, requires users to review vast amounts of raw video and edit it appropriately, which discourages them from doing so. In this paper, we propose a novel video summarization method that helps users create a video blog post. Unlike typical video summarization methods, the proposed method utilizes the text written for the video blog post and makes the video summary consistent with the content of that text. To do so, we perform video summarization by solving an optimization problem whose objective function involves the content similarity between the summarized video and the text. A user study with 20 participants demonstrates that the proposed method is better suited to creating video blog posts than conventional video summarization methods.
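The optimization described above, maximizing content similarity between the summary and the blog text under a length budget, can be approximated with a simple greedy sketch. The exact objective in the paper may differ; all names and the toy features here are illustrative.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def text_guided_summary(shot_feats, shot_lens, text_feat, budget):
    """Greedily add shots while the summary's mean feature moves
    closer to the blog-text embedding and the length budget allows."""
    shot_feats = [np.asarray(f, float) for f in shot_feats]
    text_feat = np.asarray(text_feat, float)
    chosen = []
    while True:
        # similarity of the current summary to the text (-1 if empty)
        cur = cos(np.mean([shot_feats[i] for i in chosen], axis=0),
                  text_feat) if chosen else -1.0
        best, best_score = None, cur
        used = sum(shot_lens[j] for j in chosen)
        for i in range(len(shot_feats)):
            if i in chosen or used + shot_lens[i] > budget:
                continue
            cand = np.mean([shot_feats[j] for j in chosen] + [shot_feats[i]], axis=0)
            s = cos(cand, text_feat)
            if s > best_score:
                best, best_score = i, s
        if best is None:          # no addition improves the objective
            return sorted(chosen)
        chosen.append(best)
```

Greedy selection stops as soon as no remaining shot moves the summary closer to the text.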
Sheng-hua Zhong - One of the best experts on this subject based on the ideXlab platform.
-
Dynamic graph convolutional network for multi-Video Summarization
Pattern Recognition, 2020
Co-Authors: Sheng-hua Zhong, Yan Liu
Abstract: Multi-video summarization is an effective tool for users browsing multiple videos. In this paper, multi-video summarization is formulated as a graph analysis problem, and a dynamic graph convolutional network is proposed to measure the importance and relevance of each video shot within its own video as well as within the whole video collection. Two strategies are proposed to address the inherent class imbalance of the video summarization task. Moreover, we propose a diversity regularization to encourage the model to generate diverse summaries. Extensive experiments compare the method against state-of-the-art video summarization methods as well as traditional and recent graph models. Our method achieves state-of-the-art performance on two standard video summarization datasets. The results demonstrate the effectiveness of the proposed model in generating a representative summary for multiple videos with good diversity.
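A diversity regularization of the kind mentioned above can be sketched as a penalty on pairwise cosine similarity between shots, weighted by their importance scores, so that a model that scores near-duplicate shots highly pays a cost. This is an assumed formulation for illustration, not the paper's exact term.

```python
import numpy as np

def diversity_penalty(feats, scores):
    """Pairwise-cosine diversity regulariser: large when shots with
    high importance scores are visually similar to each other."""
    f = np.asarray(feats, float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)  # unit-normalise
    sim = f @ f.T
    np.fill_diagonal(sim, 0.0)                        # ignore self-similarity
    s = np.asarray(scores, float)
    # score-weighted average pairwise similarity
    return float(s @ sim @ s) / max(len(s) * (len(s) - 1), 1)
```

Duplicated shots with high scores are maximally penalised; orthogonal shots cost nothing.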
-
Video Summarization via spatio-temporal deep architecture
Neurocomputing, 2019
Co-Authors: Sheng-hua Zhong, Jianmin Jiang
Abstract: Video summarization is of unprecedented importance for getting an overview of today's ever-growing video collections. In this paper, we propose a novel dynamic video summarization model based on a deep learning architecture. We are the first to address the imbalanced class distribution problem in video summarization. An over-sampling algorithm is used to balance the class distribution of the training data, and a novel two-stream deep architecture with cost-sensitive learning is proposed to handle the class imbalance during feature learning. In the spatial stream, RGB images represent the appearance of video frames; in the temporal stream, multi-frame motion vectors within a deep learning framework are introduced for the first time to represent and extract temporal information from the input video. The proposed method is evaluated on two standard video summarization datasets and a standard emotional dataset. Empirical validation on video summarization demonstrates that our model improves on existing state-of-the-art methods. Moreover, the proposed method is able to highlight video content with a high level of arousal in an affective computing task. The proposed frame-based model has a further advantage: it automatically preserves the connection between consecutive frames, so although the summary is constructed at the frame level, the final summary comprises informative, continuous segments rather than individual separate frames.
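Cost-sensitive learning for class imbalance, as used in the two-stream architecture above, typically reweights the loss inversely to class frequency. A minimal sketch of such weighting follows; the paper's exact cost matrix is not reproduced here, and these helpers are illustrative.

```python
import numpy as np

def inverse_freq_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    normalised so the average weight over samples is 1."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    w = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), w.tolist()))

def weighted_nll(probs, labels, weights):
    """Cost-sensitive negative log-likelihood: rare-class mistakes
    cost more than common-class mistakes."""
    return float(np.mean([-weights[y] * np.log(p[y])
                          for p, y in zip(probs, labels)]))
```

With labels [0, 0, 0, 1], the minority class 1 receives three times the weight of class 0.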
-
PCM (2) - Gaze Aware Deep Learning Model for Video Summarization
Advances in Multimedia Information Processing – PCM 2018, 2018
Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
Abstract: Video summarization is an ideal tool for skimming videos. Previous computational models extract explicit information from the input video, such as visual appearance, motion, or audio, to generate informative summaries. Eye gaze information, an implicit cue, has proved useful for indicating important content and the viewer's interest. In this paper, we propose a novel gaze-aware deep learning model for video summarization. In our model, the position and velocity of the observers' raw eye movements are processed by a deep neural network to indicate the users' preferences. Experiments on two widely used video summarization datasets show that our model outperforms state-of-the-art methods in summarizing video for both general and personal preferences. The results provide an innovative and improved algorithm for using gaze information in video summarization.
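The position-plus-velocity gaze representation described above can be sketched by stacking raw gaze coordinates with their finite-difference velocities. The paper's actual preprocessing may differ; this helper is only an assumed form.

```python
import numpy as np

def gaze_features(positions, dt):
    """Stack gaze position with finite-difference velocity, the kind
    of implicit per-frame signal a gaze-aware model could consume."""
    pos = np.asarray(positions, float)   # (T, 2) gaze x/y per frame
    vel = np.gradient(pos, dt, axis=0)   # velocity via central differences
    return np.hstack([pos, vel])         # (T, 4): [x, y, vx, vy]
```

For gaze moving one unit per frame horizontally, the velocity columns read 1 and 0.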
-
Foveated convolutional neural networks for Video Summarization
Multimedia Tools and Applications, 2018
Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
Abstract: With the proliferation of video data, video summarization is an ideal tool for users to browse video content rapidly. In this paper, we propose novel foveated convolutional neural networks for dynamic video summarization. We are the first to integrate gaze information into a deep learning network for video summarization. Foveated images are constructed based on subjects' eye movements to represent the spatial information of the input video, and multi-frame motion vectors are stacked across several adjacent frames to convey motion cues. To evaluate the proposed method, experiments are conducted on two video summarization benchmark datasets. The results validate the effectiveness of gaze information for video summarization, even though the eye movements were collected from subjects different from those who generated the reference summaries. Empirical validation also demonstrates that the proposed foveated convolutional neural networks achieve state-of-the-art performance on these benchmark datasets.
-
A novel clustering method for static Video Summarization
Multimedia Tools and Applications, 2016
Co-Authors: Sheng-hua Zhong, Jianmin Jiang, Yunyun Yang
Abstract: Static video summarization is recognized as an effective way for users to quickly browse and comprehend large numbers of videos. In this paper, we formulate static video summarization as a clustering problem. Inspired by the density-peaks search clustering algorithm, we propose an effective clustering algorithm that integrates important properties of video to gather similar frames into clusters. Finally, the centers of all clusters are collected as the static video summary. Compared with existing clustering-based video summarization approaches, our method detects highly relevant frames and generates representative clusters automatically. We evaluate the proposed method against several state-of-the-art clustering-based video summarization methods and some classical clustering algorithms. The experimental results show that our method achieves better performance and efficiency.
Guoping Qiu - One of the best experts on this subject based on the ideXlab platform.
-
FrameRank: A Text Processing Approach to Video Summarization
arXiv: Computation and Language, 2019
Co-Authors: Zhuo Lei, Chao Zhang, Qian Zhang, Guoping Qiu
Abstract: Video summarization has been extensively studied over the past decades. However, user-generated video summarization is much less explored, because large-scale video datasets with unambiguously defined and annotated human-generated summaries are lacking. To this end, we propose a user-generated video summarization dataset, UGSum52, consisting of 52 videos (207 minutes). Because of the subjectivity of user-generated video summarization, we manually annotate 25 summaries for each video, 1,300 summaries in total. To the best of our knowledge, it is currently the largest dataset for user-generated video summarization. Based on this dataset, we present FrameRank, an unsupervised video summarization method that employs a frame-to-frame affinity graph to identify coherent and informative frames for summarizing a video. We use a Kullback-Leibler (KL) divergence-based graph to rank temporal segments according to the amount of semantic information contained in their frames. We demonstrate the effectiveness of the method on three datasets, SumMe, TVSum, and UGSum52, and show that it achieves state-of-the-art results.
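The KL-divergence affinity-graph ranking can be sketched as a PageRank-style walk over frames whose edge weights come from a symmetrised KL similarity between frame histograms. This is our reading of the idea, not the authors' exact formulation; the function names are ours.

```python
import numpy as np

def kl(p, q):
    """KL divergence for strictly positive histograms."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def frame_rank(hists, damping=0.85, iters=50):
    """PageRank-style ranking over an affinity graph with
    symmetrised-KL edge weights between frame histograms."""
    n = len(hists)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # small symmetrised KL -> similar frames -> strong edge
                W[i, j] = np.exp(-(kl(hists[i], hists[j]) + kl(hists[j], hists[i])))
    W = W / W.sum(axis=1, keepdims=True)       # row-stochastic transitions
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (W.T @ r)
    return r / r.sum()
```

Frames similar to many others accumulate rank; an outlier frame ranks lower.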
-
ICME - FrameRank: A Text Processing Approach to Video Summarization
2019 IEEE International Conference on Multimedia and Expo (ICME), 2019
Co-Authors: Zhuo Lei, Chao Zhang, Qian Zhang, Guoping Qiu
Abstract: Video summarization has been extensively studied over the past decades. However, user-generated video summarization is much less explored, because large-scale video datasets with unambiguously defined and annotated human-generated summaries are lacking. To this end, we propose a user-generated video summarization dataset, UGSum52, consisting of 52 videos (207 minutes). Because of the subjectivity of user-generated video summarization, we manually annotate 25 summaries for each video, 1,300 summaries in total. To the best of our knowledge, it is currently the largest dataset for user-generated video summarization. Based on this dataset, we present FrameRank, an unsupervised video summarization method that employs a frame-to-frame affinity graph to identify coherent and informative frames for summarizing a video. We use a Kullback-Leibler (KL) divergence-based graph to rank temporal segments according to the amount of semantic information contained in their frames. We demonstrate the effectiveness of the method on three datasets, SumMe, TVSum, and UGSum52, and show that it achieves state-of-the-art results.
Jianmin Jiang - One of the best experts on this subject based on the ideXlab platform.
-
Video Summarization via spatio-temporal deep architecture
Neurocomputing, 2019
Co-Authors: Sheng-hua Zhong, Jianmin Jiang
Abstract: Video summarization is of unprecedented importance for getting an overview of today's ever-growing video collections. In this paper, we propose a novel dynamic video summarization model based on a deep learning architecture. We are the first to address the imbalanced class distribution problem in video summarization. An over-sampling algorithm is used to balance the class distribution of the training data, and a novel two-stream deep architecture with cost-sensitive learning is proposed to handle the class imbalance during feature learning. In the spatial stream, RGB images represent the appearance of video frames; in the temporal stream, multi-frame motion vectors within a deep learning framework are introduced for the first time to represent and extract temporal information from the input video. The proposed method is evaluated on two standard video summarization datasets and a standard emotional dataset. Empirical validation on video summarization demonstrates that our model improves on existing state-of-the-art methods. Moreover, the proposed method is able to highlight video content with a high level of arousal in an affective computing task. The proposed frame-based model has a further advantage: it automatically preserves the connection between consecutive frames, so although the summary is constructed at the frame level, the final summary comprises informative, continuous segments rather than individual separate frames.
-
PCM (2) - Gaze Aware Deep Learning Model for Video Summarization
Advances in Multimedia Information Processing – PCM 2018, 2018
Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
Abstract: Video summarization is an ideal tool for skimming videos. Previous computational models extract explicit information from the input video, such as visual appearance, motion, or audio, to generate informative summaries. Eye gaze information, an implicit cue, has proved useful for indicating important content and the viewer's interest. In this paper, we propose a novel gaze-aware deep learning model for video summarization. In our model, the position and velocity of the observers' raw eye movements are processed by a deep neural network to indicate the users' preferences. Experiments on two widely used video summarization datasets show that our model outperforms state-of-the-art methods in summarizing video for both general and personal preferences. The results provide an innovative and improved algorithm for using gaze information in video summarization.
-
Foveated convolutional neural networks for Video Summarization
Multimedia Tools and Applications, 2018
Co-Authors: Sheng-hua Zhong, Stephen J. Heinen, Jianmin Jiang
Abstract: With the proliferation of video data, video summarization is an ideal tool for users to browse video content rapidly. In this paper, we propose novel foveated convolutional neural networks for dynamic video summarization. We are the first to integrate gaze information into a deep learning network for video summarization. Foveated images are constructed based on subjects' eye movements to represent the spatial information of the input video, and multi-frame motion vectors are stacked across several adjacent frames to convey motion cues. To evaluate the proposed method, experiments are conducted on two video summarization benchmark datasets. The results validate the effectiveness of gaze information for video summarization, even though the eye movements were collected from subjects different from those who generated the reference summaries. Empirical validation also demonstrates that the proposed foveated convolutional neural networks achieve state-of-the-art performance on these benchmark datasets.
-
A novel clustering method for static Video Summarization
Multimedia Tools and Applications, 2016
Co-Authors: Sheng-hua Zhong, Jianmin Jiang, Yunyun Yang
Abstract: Static video summarization is recognized as an effective way for users to quickly browse and comprehend large numbers of videos. In this paper, we formulate static video summarization as a clustering problem. Inspired by the density-peaks search clustering algorithm, we propose an effective clustering algorithm that integrates important properties of video to gather similar frames into clusters. Finally, the centers of all clusters are collected as the static video summary. Compared with existing clustering-based video summarization approaches, our method detects highly relevant frames and generates representative clusters automatically. We evaluate the proposed method against several state-of-the-art clustering-based video summarization methods and some classical clustering algorithms. The experimental results show that our method achieves better performance and efficiency.
Mayu Otani - One of the best experts on this subject based on the ideXlab platform.
-
ACCV (5) - Video Summarization Using Deep Semantic Features
Computer Vision – ACCV 2016, 2017
Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
Abstract: This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization even harder because little prior knowledge is available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps both videos and descriptions to a common semantic space and jointly train it on associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries on the SumMe dataset against baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a video summarization technique.
-
Video Summarization using Deep Semantic Features
arXiv: Computer Vision and Pattern Recognition, 2016
Co-Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
Abstract: This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization even harder because little prior knowledge is available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps both videos and descriptions to a common semantic space and jointly train it on associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries on the SumMe dataset against baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features into a video summarization technique.
-
ICME - Textual description-based Video Summarization for Video blogs
2015 IEEE International Conference on Multimedia and Expo (ICME), 2015
Co-Authors: Mayu Otani, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya
Abstract: The recent popularization of camera devices, including action cams and smartphones, enables us to record videos in everyday life and share them over the Internet. The video blog is a recent approach to sharing videos, in which users enjoy expressing themselves in blog posts accompanied by attractive videos. Generating such videos, however, requires users to review vast amounts of raw video and edit it appropriately, which discourages them from doing so. In this paper, we propose a novel video summarization method that helps users create a video blog post. Unlike typical video summarization methods, the proposed method utilizes the text written for the video blog post and makes the video summary consistent with the content of that text. To do so, we perform video summarization by solving an optimization problem whose objective function involves the content similarity between the summarized video and the text. A user study with 20 participants demonstrates that the proposed method is better suited to creating video blog posts than conventional video summarization methods.