Learning Framework


The experts below are selected from a list of 585,522 experts worldwide, ranked by the ideXlab platform.

Fernando Pereira - one of the best experts on this subject, according to the ideXlab platform.

  • A Double-Deep Spatio-Angular Learning Framework for Light Field-Based Face Recognition
    IEEE Transactions on Circuits and Systems for Video Technology, 2020
    Co-Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira
    Abstract:

    Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-deep spatio-angular Learning Framework for light field-based face recognition, which is able to model both the intra-view/spatial and inter-view/angular information using two deep networks in sequence. This is a novel recognition Framework that has never been proposed in the literature for face recognition or any other visual recognition task. The proposed double-deep Learning Framework includes a long short-term memory (LSTM) recurrent network, whose inputs are VGG-Face descriptions, computed using a VGG-16 convolutional neural network (CNN). The VGG-Face spatial descriptions are extracted from a selected set of 2D sub-aperture (SA) images rendered from the light field image, corresponding to different observation angles. A sequence of the VGG-Face spatial descriptions is then analyzed by the LSTM network. A comprehensive set of experiments has been conducted using the IST-EURECOM light field face database, addressing varied and challenging recognition tasks. The results show that the proposed Framework achieves superior face recognition performance when compared to the state of the art.
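
    To make the two-stage pipeline concrete, the sketch below pairs a VGG-16 backbone (one description per sub-aperture view) with an LSTM over the view sequence, in PyTorch. The layer sizes, view count, class count, and the untrained ImageNet-style VGG-16 standing in for the pretrained VGG-Face weights are illustrative assumptions, not the authors' released implementation.

    ```python
    import torch
    import torch.nn as nn
    from torchvision.models import vgg16

    class DoubleDeepLF(nn.Module):
        def __init__(self, num_ids: int, feat_dim: int = 4096, hidden: int = 512):
            super().__init__()
            backbone = vgg16(weights=None)  # VGG-Face weights would be loaded here
            # Keep the layers up to the 4096-d fc7 activations as the descriptor.
            self.cnn = nn.Sequential(backbone.features, backbone.avgpool,
                                     nn.Flatten(), *backbone.classifier[:5])
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_ids)

        def forward(self, views: torch.Tensor) -> torch.Tensor:
            # views: (batch, num_views, 3, 224, 224) sub-aperture images.
            b, v = views.shape[:2]
            feats = self.cnn(views.flatten(0, 1)).view(b, v, -1)  # spatial step
            _, (h, _) = self.lstm(feats)                          # angular step
            return self.head(h[-1])                               # identity logits

    model = DoubleDeepLF(num_ids=100)
    logits = model(torch.randn(2, 9, 3, 224, 224))  # nine views per light field
    ```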

  • A Double-Deep Spatio-Angular Learning Framework for Light Field-Based Face Recognition
    arXiv: Computer Vision and Pattern Recognition, 2018
    Co-Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira
    Abstract:

    Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-deep spatio-angular Learning Framework for light field-based face recognition, which is able to learn both texture and angular dynamics in sequence using convolutional representations; this is a novel recognition Framework that has never been proposed before for either face recognition or any other visual recognition task. The proposed double-deep Learning Framework includes a long short-term memory (LSTM) recurrent network whose inputs are VGG-Face descriptions that are computed using a VGG-Very-Deep-16 convolutional neural network (CNN). The VGG-16 network uses different face viewpoints rendered from a full light field image, which are organised as a pseudo-video sequence. A comprehensive set of experiments has been conducted with the IST-EURECOM light field face database for varied and challenging recognition tasks. Results show that the proposed Framework achieves superior face recognition performance when compared to the state of the art.
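
    The pseudo-video organisation mentioned above reduces to unpacking the decoded light field into its 2D sub-aperture views and ordering them along a scan path, as in this NumPy sketch. The 9x9 view grid, raster-scan order, and array layout are assumptions for illustration; the paper selects a specific subset of viewpoints.

    ```python
    import numpy as np

    def pseudo_video(lf: np.ndarray) -> np.ndarray:
        """Unpack a decoded light field (u, v, h, w, c) into an ordered
        sequence of sub-aperture frames, one per observation angle."""
        u, v, h, w, c = lf.shape
        return lf.reshape(u * v, h, w, c)  # raster-scan view order (assumed)

    lf = np.random.rand(9, 9, 128, 128, 3)  # a 9x9-view light field
    frames = pseudo_video(lf)               # 81 "frames" for the CNN + LSTM
    ```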

Alireza Sepas-Moghaddam - one of the best experts on this subject, according to the ideXlab platform.

  • A Double-Deep Spatio-Angular Learning Framework for Light Field-Based Face Recognition
    IEEE Transactions on Circuits and Systems for Video Technology, 2020
    Co-Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira
    Abstract:

    Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-deep spatio-angular Learning Framework for light field-based face recognition, which is able to model both the intra-view/spatial and inter-view/angular information using two deep networks in sequence. This is a novel recognition Framework that has never been proposed in the literature for face recognition or any other visual recognition task. The proposed double-deep Learning Framework includes a long short-term memory (LSTM) recurrent network, whose inputs are VGG-Face descriptions, computed using a VGG-16 convolutional neural network (CNN). The VGG-Face spatial descriptions are extracted from a selected set of 2D sub-aperture (SA) images rendered from the light field image, corresponding to different observation angles. A sequence of the VGG-Face spatial descriptions is then analyzed by the LSTM network. A comprehensive set of experiments has been conducted using the IST-EURECOM light field face database, addressing varied and challenging recognition tasks. The results show that the proposed Framework achieves superior face recognition performance when compared to the state of the art.

  • A Double-Deep Spatio-Angular Learning Framework for Light Field-Based Face Recognition
    arXiv: Computer Vision and Pattern Recognition, 2018
    Co-Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira
    Abstract:

    Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-deep spatio-angular Learning Framework for light field-based face recognition, which is able to learn both texture and angular dynamics in sequence using convolutional representations; this is a novel recognition Framework that has never been proposed before for either face recognition or any other visual recognition task. The proposed double-deep Learning Framework includes a long short-term memory (LSTM) recurrent network whose inputs are VGG-Face descriptions that are computed using a VGG-Very-Deep-16 convolutional neural network (CNN). The VGG-16 network uses different face viewpoints rendered from a full light field image, which are organised as a pseudo-video sequence. A comprehensive set of experiments has been conducted with the IST-EURECOM light field face database for varied and challenging recognition tasks. Results show that the proposed Framework achieves superior face recognition performance when compared to the state of the art.

Xiangyang Xue - one of the best experts on this subject, according to the ideXlab platform.

  • Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
    IEEE Transactions on Multimedia, 2018
    Co-Authors: Yu-Gang Jiang, Jinhui Tang, Xiangyang Xue, Shih-Fu Chang
    Abstract:

    Videos are inherently multimodal. This paper studies the problem of exploiting the abundant multimodal clues for improved video classification performance. We introduce a novel hybrid deep Learning Framework that integrates useful clues from multiple modalities, including static spatial appearance information, motion patterns within a short time window, audio information, as well as long-range temporal dynamics. More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion, and audio signals to extract their corresponding features. We then employ a feature fusion network to derive a unified representation with an aim to capture the relationships among features. Furthermore, to exploit the long-range temporal dynamics in videos, we apply two long short-term memory (LSTM) networks with extracted appearance and motion features as inputs. Finally, we also propose refining the prediction scores by leveraging contextual relationships among video semantics. The hybrid deep Learning Framework is able to exploit a comprehensive set of multimodal features for video classification. Through an extensive set of experiments, we demonstrate that: 1) LSTM networks that model sequences in an explicitly recurrent manner are highly complementary to the CNN models; 2) the feature fusion network that produces a fused representation through modeling feature relationships outperforms a large set of alternative fusion strategies; and 3) the semantic context of video classes can help further refine the predictions for improved performance. Experimental results on two challenging benchmarks, the UCF-101 and the Columbia Consumer Videos (CCV), provide strong quantitative evidence that our Framework can produce promising results: 93.1% on the UCF-101 and 84.5% on the CCV, outperforming several competing methods with clear margins.
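
    As a rough illustration of how the modality-specific features could be combined, the PyTorch sketch below fuses clip-level appearance, motion, and audio features and averages the result with the predictions of two LSTM branches. The feature dimensions, the concatenation-plus-MLP fusion, and the simple score averaging are assumptions; the paper's fusion network and refinement step are more elaborate.

    ```python
    import torch
    import torch.nn as nn

    class HybridFusion(nn.Module):
        def __init__(self, dims=(2048, 2048, 128), hidden=512, num_classes=101):
            super().__init__()
            self.fuse = nn.Sequential(nn.Linear(sum(dims), hidden), nn.ReLU(),
                                      nn.Linear(hidden, num_classes))
            self.app_lstm = nn.LSTM(dims[0], hidden, batch_first=True)
            self.mot_lstm = nn.LSTM(dims[1], hidden, batch_first=True)
            self.temp_head = nn.Linear(2 * hidden, num_classes)

        def forward(self, app_seq, mot_seq, audio):
            # Clip-level fusion of the averaged per-frame CNN features plus audio.
            fused = self.fuse(torch.cat([app_seq.mean(1), mot_seq.mean(1), audio], -1))
            # Long-range temporal clues from the two LSTM branches.
            _, (ha, _) = self.app_lstm(app_seq)
            _, (hm, _) = self.mot_lstm(mot_seq)
            temporal = self.temp_head(torch.cat([ha[-1], hm[-1]], -1))
            return (fused + temporal) / 2  # simple late averaging of the two streams

    net = HybridFusion()
    scores = net(torch.randn(4, 25, 2048),   # appearance features, 25 frames
                 torch.randn(4, 25, 2048),   # motion features, 25 frames
                 torch.randn(4, 128))        # clip-level audio features
    ```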

  • Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
    arXiv: Multimedia, 2017
    Co-Authors: Yu-Gang Jiang, Jinhui Tang, Xiangyang Xue, Shih-Fu Chang
    Abstract:

    Videos are inherently multimodal. This paper studies the problem of how to fully exploit the abundant multimodal clues for improved video categorization. We introduce a hybrid deep Learning Framework that integrates useful clues from multiple modalities, including static spatial appearance information, motion patterns within a short time window, audio information, as well as long-range temporal dynamics. More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion, and audio signals to extract their corresponding features. We then employ a feature fusion network to derive a unified representation with an aim to capture the relationships among features. Furthermore, to exploit the long-range temporal dynamics in videos, we apply two Long Short-Term Memory (LSTM) networks with extracted appearance and motion features as inputs. Finally, we also propose refining the prediction scores by leveraging contextual relationships among video semantics. The hybrid deep Learning Framework is able to exploit a comprehensive set of multimodal features for video classification. Through an extensive set of experiments, we demonstrate that (1) LSTM networks that model sequences in an explicitly recurrent manner are highly complementary to CNN models; (2) the feature fusion network that produces a fused representation through modeling feature relationships outperforms alternative fusion strategies; and (3) the semantic context of video classes can help further refine the predictions for improved performance. Experimental results on two challenging benchmarks, the UCF-101 and the Columbia Consumer Videos (CCV), provide strong quantitative evidence that our Framework achieves promising results: 93.1% on the UCF-101 and 84.5% on the CCV, outperforming competing methods with clear margins.

  • Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification
    ACM Multimedia, 2015
    Co-Authors: Xi Wang, Yu-Gang Jiang, Xiangyang Xue
    Abstract:

    Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep Learning Framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNNs). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short-Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid Learning Framework that can model several important aspects of the video data. We also show that (1) combining the spatial and the short-term motion features in the regularized fusion network is better than direct classification and fusion using the CNN with a softmax layer, and (2) the sequence-based LSTM is highly complementary to the traditional classification strategy that ignores temporal frame order. Extensive experiments are conducted on two popular and challenging benchmarks, the UCF-101 Human Actions and the Columbia Consumer Videos (CCV). On both benchmarks, our Framework achieves very competitive performance: 91.3% on the UCF-101 and 83.5% on the CCV.
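
    A hedged sketch of the regularized fusion step, in PyTorch: the two CNN feature types are projected into a shared space and classified jointly, with a plain L2 penalty on the fusion weights standing in for the paper's learned feature-relationship regularizer. All dimensions and the penalty form are illustrative assumptions.

    ```python
    import torch
    import torch.nn as nn

    spatial = torch.randn(8, 4096)    # frame-CNN features (4096-d assumed)
    motion = torch.randn(8, 4096)     # optical-flow-CNN features (4096-d assumed)
    labels = torch.randint(0, 101, (8,))

    proj = nn.Linear(2 * 4096, 512)   # fusion layer over both feature types
    clf = nn.Linear(512, 101)         # 101 classes, as in UCF-101
    opt = torch.optim.SGD(list(proj.parameters()) + list(clf.parameters()), lr=1e-3)

    logits = clf(torch.relu(proj(torch.cat([spatial, motion], dim=1))))
    reg = sum((p ** 2).sum() for p in proj.parameters())   # stand-in regularizer
    loss = nn.functional.cross_entropy(logits, labels) + 1e-4 * reg
    loss.backward()
    opt.step()
    ```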

  • Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification
    arXiv: Computer Vision and Pattern Recognition, 2015
    Co-Authors: Xi Wang, Yu-Gang Jiang, Xiangyang Xue
    Abstract:

    Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep Learning Framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNNs). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short-Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid Learning Framework that can model several important aspects of the video data. We also show that (1) combining the spatial and the short-term motion features in the regularized fusion network is better than direct classification and fusion using the CNN with a softmax layer, and (2) the sequence-based LSTM is highly complementary to the traditional classification strategy that ignores temporal frame order. Extensive experiments are conducted on two popular and challenging benchmarks, the UCF-101 Human Actions and the Columbia Consumer Videos (CCV). On both benchmarks, our Framework achieves the best performance reported to date: 91.3% on the UCF-101 and 83.5% on the CCV.

Zhanxing Zhu - one of the best experts on this subject, according to the ideXlab platform.

  • Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting
    International Joint Conference on Artificial Intelligence, 2018
    Co-Authors: Haoteng Yin, Zhanxing Zhu
    Abstract:

    Timely, accurate traffic forecasting is crucial for urban traffic control and guidance. Due to the high nonlinearity and complexity of traffic flow, traditional methods cannot satisfy the requirements of mid- and long-term prediction tasks and often neglect spatial and temporal dependencies. In this paper, we propose a novel deep Learning Framework, Spatio-Temporal Graph Convolutional Networks (STGCN), to tackle the time series prediction problem in the traffic domain. Instead of applying regular convolutional and recurrent units, we formulate the problem on graphs and build the model with complete convolutional structures, which enables much faster training with fewer parameters. Experiments show that our model STGCN effectively captures comprehensive spatio-temporal correlations by modeling multi-scale traffic networks and consistently outperforms state-of-the-art baselines on various real-world traffic datasets.
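
    For intuition, the PyTorch sketch below implements one spatio-temporal block in the STGCN spirit: a gated 1-D temporal convolution followed by a first-order graph convolution over a renormalized adjacency matrix. Layer sizes, the single-block layout, and the random graph are illustrative assumptions rather than the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn

    class STBlock(nn.Module):
        def __init__(self, c_in: int, c_out: int, kernel: int = 3):
            super().__init__()
            self.tconv = nn.Conv2d(c_in, 2 * c_out, (kernel, 1))  # GLU halves channels
            self.theta = nn.Linear(c_out, c_out)                  # graph-conv weights

        def forward(self, x: torch.Tensor, A_hat: torch.Tensor) -> torch.Tensor:
            # x: (batch, channels, time, nodes); A_hat: (nodes, nodes) adjacency.
            x = nn.functional.glu(self.tconv(x), dim=1)           # gated temporal conv
            h = self.theta(x.permute(0, 2, 3, 1))                 # -> (b, t, n, c_out)
            h = torch.einsum("nm,btmc->btnc", A_hat, h)           # propagate over graph
            return h.permute(0, 3, 1, 2).relu()                   # back to (b, c, t, n)

    n = 20
    A = torch.eye(n) + (torch.rand(n, n) < 0.1).float()           # random graph + self-loops
    A = ((A + A.T) > 0).float()
    d = A.sum(1)
    A_hat = A / torch.sqrt(d[:, None] * d[None, :])               # D^-1/2 (A+I) D^-1/2

    block = STBlock(c_in=1, c_out=16)
    out = block(torch.randn(4, 1, 12, n), A_hat)                  # 12 past steps, 20 sensors
    ```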

  • Spatio-Temporal Graph Convolutional Neural Network: A Deep Learning Framework for Traffic Forecasting
    arXiv: Learning, 2017
    Co-Authors: Haoteng Yin, Zhanxing Zhu
    Abstract:

    Timely, accurate traffic forecasting is crucial for urban traffic control and guidance. Due to the high nonlinearity and complexity of traffic flow, traditional methods cannot satisfy the requirements of mid- and long-term prediction tasks and often neglect spatial and temporal dependencies. In this paper, we propose a novel deep Learning Framework, Spatio-Temporal Graph Convolutional Networks (STGCN), to tackle the time series prediction problem in the traffic domain. Instead of applying regular convolutional and recurrent units, we formulate the problem on graphs and build the model with complete convolutional structures, which enables much faster training with fewer parameters. Experiments show that our model STGCN effectively captures comprehensive spatio-temporal correlations by modeling multi-scale traffic networks and consistently outperforms state-of-the-art baselines on various real-world traffic datasets.

Paulo Lobato Correia - one of the best experts on this subject, according to the ideXlab platform.

  • A Double-Deep Spatio-Angular Learning Framework for Light Field-Based Face Recognition
    IEEE Transactions on Circuits and Systems for Video Technology, 2020
    Co-Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira
    Abstract:

    Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-deep spatio-angular Learning Framework for light field-based face recognition, which is able to model both the intra-view/spatial and inter-view/angular information using two deep networks in sequence. This is a novel recognition Framework that has never been proposed in the literature for face recognition or any other visual recognition task. The proposed double-deep Learning Framework includes a long short-term memory (LSTM) recurrent network, whose inputs are VGG-Face descriptions, computed using a VGG-16 convolutional neural network (CNN). The VGG-Face spatial descriptions are extracted from a selected set of 2D sub-aperture (SA) images rendered from the light field image, corresponding to different observation angles. A sequence of the VGG-Face spatial descriptions is then analyzed by the LSTM network. A comprehensive set of experiments has been conducted using the IST-EURECOM light field face database, addressing varied and challenging recognition tasks. The results show that the proposed Framework achieves superior face recognition performance when compared to the state of the art.

  • A Double-Deep Spatio-Angular Learning Framework for Light Field-Based Face Recognition
    arXiv: Computer Vision and Pattern Recognition, 2018
    Co-Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira
    Abstract:

    Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-deep spatio-angular Learning Framework for light field-based face recognition, which is able to learn both texture and angular dynamics in sequence using convolutional representations; this is a novel recognition Framework that has never been proposed before for either face recognition or any other visual recognition task. The proposed double-deep Learning Framework includes a long short-term memory (LSTM) recurrent network whose inputs are VGG-Face descriptions that are computed using a VGG-Very-Deep-16 convolutional neural network (CNN). The VGG-16 network uses different face viewpoints rendered from a full light field image, which are organised as a pseudo-video sequence. A comprehensive set of experiments has been conducted with the IST-EURECOM light field face database for varied and challenging recognition tasks. Results show that the proposed Framework achieves superior face recognition performance when compared to the state of the art.