Facial Animation

The experts below are selected from a list of 360 experts worldwide, ranked by the ideXlab platform.

Maja Pantic - One of the best experts on this subject based on the ideXlab platform.

  • Speech-Driven Facial Animation Using Polynomial Fusion of Features
    International Conference on Acoustics Speech and Signal Processing, 2020
    Co-Authors: Triantafyllos Kefalas, Konstantinos Vougioukas, Stavros Petridis, Yannis Panagakis, Jean Kossaifi, Maja Pantic
    Abstract:

    Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-dimensional representations and concatenating them, followed by a decoding step of the concatenated vector. This accounts for only first-order interactions of the features and ignores higher-order interactions. In this paper we propose a polynomial fusion layer that models the joint representation of the encodings by a higher-order polynomial, with the parameters modelled by a tensor decomposition. We demonstrate the suitability of this approach through experiments on generated videos evaluated on a range of metrics on video quality, audiovisual synchronisation and generation of blinks.
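
    As a rough illustration of the fusion idea above, the following NumPy sketch combines two encodings with first-order terms plus a second-order interaction whose weight tensor is stored in a low-rank (CP-style) factorization. All names, dimensions and the random initialisation are illustrative assumptions, not the paper's actual architecture or training setup.

      import numpy as np

      rng = np.random.default_rng(0)
      d_audio, d_id, d_out, rank = 128, 64, 256, 32   # illustrative sizes

      # Parameters of the fusion layer (random stand-ins for learned weights).
      W_a = 0.01 * rng.standard_normal((d_out, d_audio))  # first-order audio term
      W_z = 0.01 * rng.standard_normal((d_out, d_id))     # first-order identity term
      U = 0.01 * rng.standard_normal((rank, d_audio))     # audio factor matrix
      V = 0.01 * rng.standard_normal((rank, d_id))        # identity factor matrix
      W_r = 0.01 * rng.standard_normal((d_out, rank))     # output factor matrix

      def polynomial_fusion(a, z):
          """Fuse two encodings with first- and second-order terms.

          The second-order term is a bilinear interaction whose full weight
          tensor is replaced by a rank-constrained (CP-style) factorization,
          costing O(rank * (d_audio + d_id + d_out)) parameters instead of
          O(d_audio * d_id * d_out)."""
          first_order = W_a @ a + W_z @ z
          second_order = W_r @ ((U @ a) * (V @ z))  # elementwise product of projections
          return first_order + second_order

      fused = polynomial_fusion(rng.standard_normal(d_audio), rng.standard_normal(d_id))
      print(fused.shape)  # (256,)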

  • Realistic Speech-Driven Facial Animation with GANs
    International Journal of Computer Vision, 2019
    Co-Authors: Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
    Abstract:

    Speech-driven facial animation is the process that automatically synthesizes talking characters based on speech signals. The majority of work in this domain creates a mapping from audio features to visual features. This approach often requires post-processing using computer graphics techniques to produce realistic albeit subject-dependent results. We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features. Our method generates videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. Our temporal GAN uses 3 discriminators focused on achieving detailed frames, audio-visual synchronization, and realistic expressions. We quantify the contribution of each component in our model using an ablation study and we provide insights into the latent representation of the model. The generated videos are evaluated based on sharpness, reconstruction quality, lip-reading accuracy, synchronization as well as their ability to generate natural blinks.
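
    The combination of the three discriminators with a reconstruction term can be pictured as a single generator loss. The weights and exact loss forms below are illustrative assumptions; the paper's objective, network architectures and hyperparameters are not reproduced here.

      import numpy as np

      def g_adversarial(scores):
          # Non-saturating generator loss for discriminator outputs in (0, 1).
          return -np.mean(np.log(np.clip(scores, 1e-7, 1.0)))

      def generator_loss(frame_scores, sync_scores, sequence_scores,
                         generated, target,
                         w_frame=1.0, w_sync=0.8, w_seq=0.2, w_rec=10.0):
          """Combine the three discriminator signals with an L1 reconstruction term.

          frame_scores    : per-frame realism discriminator outputs
          sync_scores     : audio-visual synchronisation discriminator outputs
          sequence_scores : temporal discriminator outputs (natural expressions, blinks)
          """
          rec = np.mean(np.abs(generated - target))  # pixel-level L1 reconstruction
          return (w_frame * g_adversarial(frame_scores)
                  + w_sync * g_adversarial(sync_scores)
                  + w_seq * g_adversarial(sequence_scores)
                  + w_rec * rec)

      # Toy call with random stand-ins for discriminator outputs and video tensors.
      rng = np.random.default_rng(0)
      loss = generator_loss(rng.uniform(size=16), rng.uniform(size=16),
                            rng.uniform(size=4),
                            rng.random((16, 64, 64, 3)), rng.random((16, 64, 64, 3)))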

Kun Zhou - One of the best experts on this subject based on the ideXlab platform.

  • Warp-Guided GANs for Single-Photo Facial Animation
    ACM Transactions on Graphics, 2018
    Co-Authors: Jiahao Geng, Yanlin Weng, Tianjia Shao, Youyi Zheng, Kun Zhou
    Abstract:

    This paper introduces a novel method for realtime portrait animation in a single photo. Our method requires only a single portrait photo and a set of facial landmarks derived from a driving source (e.g., a photo or a video sequence), and generates an animated image with rich facial details. The core of our method is a warp-guided generative model that instantly fuses various fine facial details (e.g., creases and wrinkles), which are necessary to generate a high-fidelity facial expression, onto a pre-warped image. Our method factorizes out the nonlinear geometric transformations exhibited in facial expressions by lightweight 2D warps and leaves the appearance detail synthesis to conditional generative neural networks for high-fidelity facial animation generation. We show that such a factorization of geometric transformation and appearance synthesis largely helps the network better learn the high nonlinearity of the facial expression functions and also facilitates the design of the network architecture. Through extensive experiments on various portrait photos from the Internet, we show the significant efficacy of our method compared with prior art.
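
    A minimal sketch of the two-stage structure (landmark-driven pre-warp, then a learned refinement) is given below, using scikit-image for the 2D warp. The generator is left as a placeholder callable and the landmark data are random stand-ins; none of this reproduces the paper's networks.

      import numpy as np
      from skimage.transform import PiecewiseAffineTransform, warp

      def warp_portrait(image, src_landmarks, dst_landmarks):
          """Pre-warp the portrait so its landmarks move to the driving positions.

          skimage's warp() expects a transform that maps *output* coordinates back
          to *input* coordinates, hence the estimate from the driving (destination)
          landmarks to the source landmarks."""
          tform = PiecewiseAffineTransform()
          tform.estimate(dst_landmarks, src_landmarks)
          return warp(image, tform, preserve_range=True)

      def refine(prewarped, generator):
          """Placeholder for the conditional generator that synthesizes fine details
          (creases, wrinkles, inner mouth) on top of the pre-warped image."""
          return generator(prewarped)

      # Toy usage with random data; real landmarks would come from a face tracker
      # run on the source portrait and on the driving photo or video frame.
      rng = np.random.default_rng(1)
      portrait = rng.random((256, 256, 3))
      src = rng.uniform(10, 245, size=(68, 2))
      dst = src + rng.normal(scale=2.0, size=src.shape)
      animated = refine(warp_portrait(portrait, src, dst),
                        generator=lambda img: img)   # identity stand-in for the network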

  • Real-Time Facial Animation on Mobile Devices
    Graphical Models, 2014
    Co-Authors: Yanlin Weng, Chen Cao, Qiming Hou, Kun Zhou
    Abstract:

    We present a performance-based facial animation system capable of running on mobile devices at real-time frame rates. A key component of our system is a novel regression algorithm that accurately infers the facial motion parameters from 2D video frames of an ordinary web camera. Compared with the state-of-the-art facial shape regression algorithm [1], which takes a two-step procedure to track facial animations (i.e., first regressing the 3D positions of facial landmarks, and then computing the head poses and expression coefficients), we directly regress the head poses and expression coefficients. This one-step approach greatly reduces the dimension of the regression target and significantly improves the tracking performance while preserving the tracking accuracy. We further propose to collect training images of the user under different lighting environments, and make use of the data to learn a user-specific regressor, which can robustly handle lighting changes that frequently occur when using mobile devices.
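
    The dimensionality argument above can be made concrete with a small sketch: regressing pose and expression coefficients directly yields a much smaller target than regressing 3D landmark positions. The landmark and coefficient counts, the feature size and the single cascade step below are illustrative assumptions, not the paper's configuration.

      import numpy as np

      n_landmarks = 75       # illustrative landmark count
      n_expressions = 46     # illustrative blendshape coefficient count

      # Two-step target: 3D positions of all landmarks, later converted
      # into head pose and expression coefficients.
      two_step_dim = 3 * n_landmarks                 # 225 values per frame

      # One-step target: rotation, translation and expression weights
      # regressed directly from image features.
      one_step_dim = 3 + 3 + n_expressions           # 52 values per frame

      # One stage of a cascaded regressor is then a small matrix-vector
      # product applied to image features (random stand-ins below).
      rng = np.random.default_rng(0)
      features = rng.standard_normal(400)                   # e.g. pixel-difference features
      R = 0.01 * rng.standard_normal((one_step_dim, 400))   # learned regressor (stand-in)
      params = np.zeros(one_step_dim) + R @ features        # additive cascade update
      print(two_step_dim, one_step_dim)                     # 225 vs 52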

  • 3D Shape Regression for Real-Time Facial Animation
    International Conference on Computer Graphics and Interactive Techniques, 2013
    Co-Authors: Chen Cao, Yanlin Weng, Stephen Lin, Kun Zhou
    Abstract:

    We present a real-time performance-driven facial animation system based on 3D shape regression. In this system, the 3D positions of facial landmark points are inferred by a regressor from 2D video frames of an ordinary web camera. From these 3D points, the pose and expressions of the face are recovered by fitting a user-specific blendshape model to them. The main technical contribution of this work is the 3D regression algorithm that learns an accurate, user-specific face alignment model from an easily acquired set of training data, generated from images of the user performing a sequence of predefined facial poses and expressions. Experiments show that our system can accurately recover 3D face shapes even for fast motions, non-frontal faces, and exaggerated expressions. In addition, some capacity to handle partial occlusions and changing lighting conditions is demonstrated.
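
    The blendshape-fitting step described above can be sketched as a small regularized least-squares problem once the 3D landmark positions are known. The code below assumes pose-normalised landmarks and a simple box constraint on the weights; it is a simplification, not the system's actual solver, and all shapes and names are illustrative.

      import numpy as np

      def fit_blendshape_weights(landmarks_3d, neutral, deltas, l2_reg=1e-3):
          """Recover expression coefficients from regressed 3D landmarks.

          landmarks_3d : (L, 3) regressed landmark positions (pose removed)
          neutral      : (L, 3) landmarks of the neutral expression
          deltas       : (K, L, 3) per-blendshape landmark offsets
          Solves min_w || neutral + sum_k w_k * delta_k - landmarks ||^2
          with L2 regularisation, then clamps the weights to [0, 1]."""
          K = deltas.shape[0]
          A = deltas.reshape(K, -1).T                  # (3L, K) design matrix
          b = (landmarks_3d - neutral).ravel()         # (3L,) observed offsets
          w = np.linalg.solve(A.T @ A + l2_reg * np.eye(K), A.T @ b)
          return np.clip(w, 0.0, 1.0)

      # Toy check on synthetic data.
      rng = np.random.default_rng(0)
      neutral = rng.random((60, 3))
      deltas = 0.05 * rng.standard_normal((20, 60, 3))
      w_true = rng.random(20)
      observed = neutral + np.tensordot(w_true, deltas, axes=1)
      w_hat = fit_blendshape_weights(observed, neutral, deltas)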

Frédéric Pighin - One of the best experts on this subject based on the ideXlab platform.

  • Expressive Speech-Driven Facial Animation
    ACM Transactions on Graphics, 2005
    Co-Authors: Wen C. Tien, Petros Faloutsos, Frédéric Pighin
    Abstract:

    Speech-driven facial motion synthesis is a well-explored research topic. However, little has been done to model expressive visual behavior during speech. We address this issue using a machine learning approach that relies on a database of speech-related high-fidelity facial motions. From this training set, we derive a generative model of expressive facial motion that incorporates emotion control, while maintaining accurate lip-synching. The emotional content of the input speech can be manually specified by the user or automatically extracted from the audio signal using a Support Vector Machine classifier.
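
    The abstract mentions that the emotional content can be extracted automatically with a Support Vector Machine. A minimal scikit-learn sketch of such a classifier follows; the feature dimensionality, class set and random training data are placeholders, not the paper's audio features or labels.

      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      # Placeholder per-utterance audio features (e.g. statistics of MFCCs,
      # pitch and energy); a real system would compute these from the signal.
      rng = np.random.default_rng(0)
      X_train = rng.standard_normal((200, 40))
      y_train = rng.integers(0, 3, size=200)          # e.g. neutral / happy / angry

      emotion_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
      emotion_clf.fit(X_train, y_train)

      # The predicted label would then select or blend the expressive motion model.
      emotion = emotion_clf.predict(rng.standard_normal((1, 40)))[0]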

  • Learning Controls for Blend Shape Based Realistic Facial Animation
    Symposium on Computer Animation, 2003
    Co-Authors: Pushkar P Joshi, Wen C. Tien, Mathieu Desbrun, Frédéric Pighin
    Abstract:

    Blend shape animation is the method of choice for keyframe facial animation: a set of blend shapes (key facial expressions) is used to define a linear space of facial expressions. However, in order to capture a significant range of complexity of human expressions, blend shapes need to be segmented into smaller regions where key idiosyncrasies of the face being animated are present. Performing this segmentation by hand requires skill and a lot of time. In this paper, we propose an automatic, physically-motivated segmentation that learns the controls and parameters directly from the set of blend shapes. We show the usefulness and efficiency of this technique for both motion-capture animation and keyframing. We also provide a rendering algorithm to enhance the visual realism of a blend shape model.
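
    For reference, evaluating a blend shape model is a linear combination of key expressions, as in the short NumPy sketch below. The toy mesh, shape count and weights are illustrative; the paper's contribution (the automatic segmentation into regions and the learned controls) is not reproduced here.

      import numpy as np

      def blend(neutral, blendshapes, weights):
          """Evaluate a blend shape model: neutral mesh plus weighted deltas.

          neutral     : (V, 3) neutral face mesh
          blendshapes : (K, V, 3) key facial expressions
          weights     : (K,) animation controls, typically in [0, 1]"""
          deltas = blendshapes - neutral[None]                 # per-shape offsets
          return neutral + np.tensordot(weights, deltas, axes=1)

      # Toy usage: 1000 vertices, 5 key shapes.
      rng = np.random.default_rng(0)
      neutral = rng.random((1000, 3))
      shapes = neutral[None] + 0.05 * rng.standard_normal((5, 1000, 3))
      face = blend(neutral, shapes, np.array([0.4, 0.0, 0.8, 0.1, 0.0]))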

  • Resynthesizing Facial Animation Through 3D Model-Based Tracking
    International Conference on Computer Vision, 1999
    Co-Authors: Frédéric Pighin, Richard Szeliski, David Salesin
    Abstract:

    Given video footage of a person's face, we present new techniques to automatically recover the face position and the facial expression from each frame in the video sequence. A 3D face model is fitted to each frame using a continuous optimization technique. Our model is based on a set of 3D face models that are linearly combined using 3D morphing. Compared with previous techniques, our method has the advantages of directly fitting a realistic three-dimensional face model and of recovering parameters that can be used directly in an animation system. We also explore many applications, including performance-driven animation (applying the recovered position and expression of the face to a synthetic character to produce an animation that mimics the input video), relighting the face, varying the camera position, and adding facial ornaments such as tattoos and scars.
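
    The per-frame fitting can be pictured as a continuous optimization over pose and morphing weights. The toy below fits these parameters to 2D landmark observations with SciPy's least-squares solver; the actual paper minimizes an image-based error against each video frame, so this is only a structural sketch with made-up data.

      import numpy as np
      from scipy.optimize import least_squares
      from scipy.spatial.transform import Rotation

      def project(points, focal=500.0, center=128.0):
          # Simple pinhole projection onto the image plane.
          return focal * points[:, :2] / points[:, 2:3] + center

      def residuals(params, basis, observed_2d):
          """params = [rotation (rotvec, 3), translation (3), morph weights (K)]."""
          rot, t, w = params[:3], params[3:6], params[6:]
          shape = np.tensordot(w, basis, axes=1)            # linear 3D morph
          posed = Rotation.from_rotvec(rot).apply(shape) + t
          return (project(posed) - observed_2d).ravel()

      # Synthetic setup: 3 basis face models with 30 "landmark" vertices each.
      rng = np.random.default_rng(0)
      basis = rng.random((3, 30, 3)) + np.array([0.0, 0.0, 5.0])   # keep z > 0
      true = np.concatenate([[0.1, -0.05, 0.02], [0.0, 0.0, 1.0], [0.5, 0.3, 0.2]])
      observed = project(Rotation.from_rotvec(true[:3]).apply(
          np.tensordot(true[6:], basis, axes=1)) + true[3:6])

      x0 = np.concatenate([np.zeros(6), np.full(3, 1.0 / 3.0)])
      fit = least_squares(residuals, x0, args=(basis, observed))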

Konstantinos Vougioukas - One of the best experts on this subject based on the ideXlab platform.

  • Speech-Driven Facial Animation Using Polynomial Fusion of Features
    International Conference on Acoustics Speech and Signal Processing, 2020
    Co-Authors: Triantafyllos Kefalas, Konstantinos Vougioukas, Stavros Petridis, Yannis Panagakis, Jean Kossaifi, Maja Pantic
    Abstract:

    Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-dimensional representations and concatenating them, followed by a decoding step of the concatenated vector. This accounts for only first-order interactions of the features and ignores higher-order interactions. In this paper we propose a polynomial fusion layer that models the joint representation of the encodings by a higher-order polynomial, with the parameters modelled by a tensor decomposition. We demonstrate the suitability of this approach through experiments on generated videos evaluated on a range of metrics on video quality, audiovisual synchronisation and generation of blinks.

  • Realistic Speech-Driven Facial Animation with GANs
    International Journal of Computer Vision, 2019
    Co-Authors: Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
    Abstract:

    Speech-driven facial animation is the process that automatically synthesizes talking characters based on speech signals. The majority of work in this domain creates a mapping from audio features to visual features. This approach often requires post-processing using computer graphics techniques to produce realistic albeit subject-dependent results. We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features. Our method generates videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. Our temporal GAN uses 3 discriminators focused on achieving detailed frames, audio-visual synchronization, and realistic expressions. We quantify the contribution of each component in our model using an ablation study and we provide insights into the latent representation of the model. The generated videos are evaluated based on sharpness, reconstruction quality, lip-reading accuracy, synchronization as well as their ability to generate natural blinks.

Mark Pauly - One of the best experts on this subject based on the ideXlab platform.

  • Online Modeling for Realtime Facial Animation
    International Conference on Computer Graphics and Interactive Techniques, 2013
    Co-Authors: Sofien Bouaziz, Yangang Wang, Mark Pauly
    Abstract:

    We present a new algorithm for realtime face tracking on commodity RGB-D sensing devices. Our method requires no user-specific training or calibration, or any other form of manual assistance, thus enabling a range of new applications in performance-based facial animation and virtual interaction at the consumer level. The key novelty of our approach is an optimization algorithm that jointly solves for a detailed 3D expression model of the user and the corresponding dynamic tracking parameters. Realtime performance and robust computations are facilitated by a novel subspace parameterization of the dynamic facial expression space. We provide a detailed evaluation that shows that our approach significantly simplifies the performance capture workflow, while achieving accurate facial tracking for realtime applications.
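
    At a very high level, such a system alternates between tracking with the current user model and refining that model from accumulated observations. The sketch below shows only that alternating structure, with a naive least-squares tracker and a crude model update; the paper's joint optimization and subspace parameterization are not reproduced, and all data shapes are illustrative.

      import numpy as np

      def track_frame(depth_points, neutral, blendshapes, l2_reg=1e-3):
          """Per-frame blendshape weights from the current user model
          (regularised linear least squares, clamped to [0, 1])."""
          K = blendshapes.shape[0]
          A = (blendshapes - neutral[None]).reshape(K, -1).T
          b = (depth_points - neutral).ravel()
          w = np.linalg.solve(A.T @ A + l2_reg * np.eye(K), A.T @ b)
          return np.clip(w, 0.0, 1.0)

      def refine_model(frames, weights, neutral, blendshapes, step=0.1):
          """Crude stand-in for model refinement: nudge each blendshape toward
          the residual of the frames in which it was active."""
          updated = blendshapes.copy()
          for obs, w in zip(frames, weights):
              recon = neutral + np.tensordot(w, blendshapes - neutral[None], axes=1)
              for k, wk in enumerate(w):
                  updated[k] += step * wk * (obs - recon)
          return updated

      # Toy online loop over synthetic "depth" frames.
      rng = np.random.default_rng(0)
      neutral = rng.random((200, 3))
      model = neutral[None] + 0.05 * rng.standard_normal((8, 200, 3))
      frames = [neutral + 0.02 * rng.standard_normal((200, 3)) for _ in range(5)]
      weights = [track_frame(f, neutral, model) for f in frames]
      model = refine_model(frames, weights, neutral, model)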

  • Realtime Performance-Based Facial Animation
    ACM Transactions on Graphics, 2011
    Co-Authors: Thibaut Weise, Sofien Bouaziz, Hao Li, Mark Pauly
    Abstract:

    This paper presents a system for performance-based character animation that enables any user to control the facial expressions of a digital avatar in realtime. The user is recorded in a natural environment using a non-intrusive, commercially available 3D sensor. The simplicity of this acquisition device comes at the cost of high noise levels in the acquired data. To effectively map low-quality 2D images and 3D depth maps to realistic facial expressions, we introduce a novel face tracking algorithm that combines geometry and texture registration with pre-recorded animation priors in a single optimization. Formulated as a maximum a posteriori estimation in a reduced parameter space, our method implicitly exploits temporal coherence to stabilize the tracking. We demonstrate that compelling 3D facial dynamics can be reconstructed in realtime without the use of face markers, intrusive lighting, or complex scanning hardware. This makes our system easy to deploy and facilitates a range of new applications, e.g., in digital gameplay or social interactions.
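
    The MAP formulation can be sketched as a single per-frame objective: a registration (data) term plus a prior learned from pre-recorded animations, with a temporal-coherence term linking consecutive frames. The Gaussian prior, the specific terms and the dummy residual function below are assumptions for illustration, not the paper's exact model.

      import numpy as np

      def map_objective(x, x_prev, data_residuals, prior_mean, prior_prec,
                        smoothness=1.0):
          """Negative log-posterior for one frame's tracking parameters x.

          data_residuals(x) : geometry + texture registration residuals
          prior_mean/prec   : Gaussian prior over the reduced parameter space
          x_prev            : previous frame's solution (temporal coherence)"""
          data_term = np.sum(data_residuals(x) ** 2)
          d = x - prior_mean
          prior_term = d @ prior_prec @ d
          temporal_term = smoothness * np.sum((x - x_prev) ** 2)
          return data_term + prior_term + temporal_term

      # Toy evaluation with a stand-in residual function.
      rng = np.random.default_rng(0)
      dim = 30
      target = rng.standard_normal(dim)
      value = map_objective(np.zeros(dim), np.zeros(dim),
                            lambda x: x - target,        # dummy data term
                            prior_mean=np.zeros(dim), prior_prec=np.eye(dim))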