Image Translation

The Experts below are selected from a list of 40,986 Experts worldwide, ranked by the ideXlab platform

Nicu Sebe - One of the best experts on this subject based on the ideXlab platform.

  • AttentionGAN: Unpaired Image-to-Image Translation Using Attention-Guided Generative Adversarial Networks
    IEEE Transactions on Neural Networks and Learning Systems, 2021
    Co-Authors: Hao Tang, Philip H S Torr, Hong Liu, Nicu Sebe
    Abstract:

    State-of-the-art methods in Image-to-Image Translation are capable of learning a mapping from a source domain to a target domain with unpaired Image data. Though the existing methods have achieved promising results, they still produce visual artifacts, translating low-level information but not the high-level semantics of input Images. One possible reason is that generators do not have the ability to perceive the most discriminative parts between the source and target domains, thus producing generated Images of low quality. In this article, we propose a new Attention-Guided Generative Adversarial Network (AttentionGAN) for the unpaired Image-to-Image Translation task. AttentionGAN can identify the most discriminative foreground objects and minimize changes to the background. The attention-guided generators in AttentionGAN are able to produce attention masks, and then fuse the generation output with the attention masks to obtain high-quality target Images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions. Extensive experiments are conducted on several generative tasks with eight public datasets, demonstrating that the proposed method generates sharper and more realistic Images than existing competitive models. The code is available at https://github.com/Ha0Tang/AttentionGAN.
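
    A minimal PyTorch sketch of the attention-guided fusion described above, assuming a single foreground mask and illustrative layer sizes (the published model uses several masks and a full generator backbone; the class and variable names are ours):

      import torch
      import torch.nn as nn

      class AttentionGuidedHead(nn.Module):
          """Toy generator head: predicts an RGB content image plus one foreground
          attention mask, then fuses them with the source so the background is kept."""
          def __init__(self, feat_ch: int = 64):
              super().__init__()
              self.content_head = nn.Sequential(nn.Conv2d(feat_ch, 3, 7, padding=3), nn.Tanh())
              self.attention_head = nn.Sequential(nn.Conv2d(feat_ch, 1, 7, padding=3), nn.Sigmoid())

          def forward(self, features, source_image):
              content = self.content_head(features)        # translated foreground content
              mask = self.attention_head(features)         # attention values in (0, 1)
              # attended regions come from the generated content, the rest is copied from the input
              return mask * content + (1.0 - mask) * source_image, mask

      feats = torch.randn(1, 64, 128, 128)                 # stand-in backbone features
      src = torch.rand(1, 3, 128, 128) * 2 - 1             # source image scaled to [-1, 1]
      fake, attn = AttentionGuidedHead()(feats, src)
      print(fake.shape, attn.shape)                        # (1, 3, 128, 128) and (1, 1, 128, 128)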

  • GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Yahui Liu, Marco De Nadai, Jian Yao, Nicu Sebe, Bruno Lepri, Xavier Alameda-Pineda
    Abstract:

    Unsupervised Image-to-Image Translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training Images. Recent studies have shown remarkable success for multiple domains but they suffer from two main limitations: they are either built from several two-domain mappings that are required to be learned independently, or they generate low-diversity results, a problem known as mode collapse. To overcome these limitations, we propose a method named GMM-UNIT, which is based on a content-attribute disentangled representation where the attribute space is fitted with a Gaussian mixture model (GMM). Each GMM component represents a domain, and this simple assumption has two prominent advantages. First, it can be easily extended to most multi-domain and multi-modal Image-to-Image Translation tasks. Second, the continuous domain encoding allows for interpolation between domains and for extrapolation to unseen domains and Translations. Additionally, we show how GMM-UNIT can be constrained down to different methods in the literature, meaning that GMM-UNIT is a unifying framework for unsupervised Image-to-Image Translation.
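
    A minimal sketch of the attribute-space idea under the stated assumption of one Gaussian component per domain; the dimensions and function names are illustrative, not the paper's API:

      import torch

      # one Gaussian component per domain in a shared attribute space (parameters would be learned)
      num_domains, attr_dim = 3, 8
      mu = torch.randn(num_domains, attr_dim)              # per-domain component means
      log_sigma = torch.zeros(num_domains, attr_dim)       # per-domain log standard deviations

      def sample_attribute(domain: int, n: int = 1) -> torch.Tensor:
          """Draw diverse (multi-modal) attribute codes for one domain (reparameterization trick)."""
          eps = torch.randn(n, attr_dim)
          return mu[domain] + eps * log_sigma[domain].exp()

      def interpolate_domains(d0: int, d1: int, t: float) -> torch.Tensor:
          """Continuous domain encoding: blend two component means to move between domains."""
          return (1 - t) * mu[d0] + t * mu[d1]

      z_target = sample_attribute(domain=1, n=4)           # four codes -> four diverse outputs
      z_between = interpolate_domains(0, 2, t=0.5)         # a code "between" domains 0 and 2
      print(z_target.shape, z_between.shape)               # torch.Size([4, 8]) torch.Size([8])
      # a decoder G(content_code, z) would then render the translated image from these codes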

  • AttentionGAN: Unpaired Image-to-Image Translation Using Attention-Guided Generative Adversarial Networks
    arXiv: Computer Vision and Pattern Recognition, 2019
    Co-Authors: Hao Tang, Philip H S Torr, Hong Liu, Nicu Sebe
    Abstract:

    State-of-the-art methods in Image-to-Image Translation are capable of learning a mapping from a source domain to a target domain with unpaired Image data. Though the existing methods have achieved promising results, they still produce visual artifacts, translating low-level information but not the high-level semantics of input Images. One possible reason is that generators do not have the ability to perceive the most discriminative parts between the source and target domains, thus producing generated Images of low quality. In this paper, we propose a new Attention-Guided Generative Adversarial Network (AttentionGAN) for the unpaired Image-to-Image Translation task. AttentionGAN can identify the most discriminative foreground objects and minimize changes to the background. The attention-guided generators in AttentionGAN are able to produce attention masks, and then fuse the generation output with the attention masks to obtain high-quality target Images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions. Extensive experiments are conducted on several generative tasks with eight public datasets, demonstrating that the proposed method generates sharper and more realistic Images than existing competitive models. The code is available at this https URL.

  • Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
    Asian Conference on Computer Vision, 2018
    Co-Authors: Hao Tang, Wei Wang, Yan Yan, Nicu Sebe
    Abstract:

    State-of-the-art methods for Image-to-Image Translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another using unpaired Image data. However, these methods require training one specific model for every pair of Image domains, which limits scalability to more than two Image domains. In addition, the training stage of these methods commonly suffers from mode collapse, which degrades the quality of the generated Images. To tackle these issues, we propose a Dual Generator Generative Adversarial Network (G²GAN), a robust and scalable approach that performs unpaired Image-to-Image Translation for multiple domains using only dual generators within a single model. Moreover, we explore different optimization losses for better training of G²GAN, and thus achieve unpaired Image-to-Image Translation with higher consistency and better stability. Extensive experiments on six publicly available datasets with different scenarios, i.e., architectural buildings, seasons, landscapes, and human faces, demonstrate that the proposed G²GAN achieves superior model capacity and better generation performance compared with existing Image-to-Image Translation GAN models.
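
    A minimal sketch of the dual-generator round trip conditioned on a target-domain label, assuming a simple L1 cycle term; the tiny networks and names are illustrative, not the paper's architecture:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class LabelConditionedGenerator(nn.Module):
          """Toy generator that translates an image toward the domain given by a one-hot label."""
          def __init__(self, num_domains: int):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Conv2d(3 + num_domains, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
              )

          def forward(self, x, label):
              # broadcast the one-hot domain label to a spatial map and concatenate it with the image
              lab = label.view(label.size(0), -1, 1, 1).expand(-1, -1, x.size(2), x.size(3))
              return self.net(torch.cat([x, lab], dim=1))

      num_domains = 4
      G_forward = LabelConditionedGenerator(num_domains)   # source -> target
      G_backward = LabelConditionedGenerator(num_domains)  # target -> source

      x = torch.rand(2, 3, 64, 64) * 2 - 1
      src_label = F.one_hot(torch.tensor([0, 1]), num_domains).float()
      tgt_label = F.one_hot(torch.tensor([2, 3]), num_domains).float()

      fake = G_forward(x, tgt_label)                       # translate toward the target domains
      recon = G_backward(fake, src_label)                  # translate back toward the source domains
      cycle_loss = F.l1_loss(recon, x)                     # ties the round trip to the input
      print(float(cycle_loss))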

Alexei A Efros - One of the best experts on this subject based on the ideXlab platform.

  • Contrastive Learning for Unpaired Image-to-Image Translation
    European Conference on Computer Vision, 2020
    Co-Authors: Taesung Park, Alexei A Efros, Richard Zhang, Junyan Zhu
    Abstract:

    In Image-to-Image Translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so – maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature space, relative to other elements (other patches) in the dataset, referred to as negatives. We explore several critical design choices for making contrastive learning effective in the Image synthesis setting. Notably, we use a multilayer, patch-based approach, rather than operate on entire Images. Furthermore, we draw negatives from within the input Image itself, rather than from the rest of the dataset. We demonstrate that our framework enables one-sided Translation in the unpaired Image-to-Image Translation setting, while improving quality and reducing training time. In addition, our method can even be extended to the training setting where each “domain” is only a single Image.
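
    A minimal sketch of a patchwise InfoNCE objective in the spirit of the method described above, where the negatives are other patch locations of the same image; the function name and feature shapes are illustrative:

      import torch
      import torch.nn.functional as F

      def patch_nce_loss(feat_out: torch.Tensor, feat_in: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
          """Each output-patch feature should match the input-patch feature at the same location
          (the positive), with the other locations of the SAME image serving as negatives.
          feat_out, feat_in: (num_patches, dim) features from the output and input images."""
          feat_out = F.normalize(feat_out, dim=1)
          feat_in = F.normalize(feat_in, dim=1)
          logits = feat_out @ feat_in.t() / tau            # (N, N) similarities of all patch pairs
          targets = torch.arange(feat_out.size(0))         # the diagonal entries are the positives
          return F.cross_entropy(logits, targets)

      # toy usage: 256 patch features of dimension 128 sampled at the same locations of the
      # input and the translated output (in practice taken from several encoder layers)
      q, k = torch.randn(256, 128), torch.randn(256, 128)
      print(float(patch_nce_loss(q, k)))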

  • Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation
    International Conference on Computer Vision, 2019
    Co-Authors: Arnab Ghosh, Alexei A Efros, Richard Zhang, Puneet K Dokania, Oliver Wang, Philip H S Torr, Eli Shechtman
    Abstract:

    We propose an interactive GAN-based sketch-to-Image Translation method that helps novice users easily create Images of simple objects. The user starts with a sparse sketch and a desired object category, and the network then recommends its plausible completion(s) and shows a corresponding synthesized Image. This enables a feedback loop, where the user can edit the sketch based on the network's recommendations, while the network is able to better synthesize the Image that the user might have in mind. In order to use a single model for a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network.
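
    A minimal sketch of gating-based class conditioning as described above, assuming per-channel sigmoid gates predicted from a one-hot class label; the module name and sizes are illustrative:

      import torch
      import torch.nn as nn

      class ClassGatedBlock(nn.Module):
          """A one-hot class vector is mapped to per-channel gates in (0, 1) that modulate the
          block's features, so different classes use different channel subsets of one generator."""
          def __init__(self, channels: int, num_classes: int):
              super().__init__()
              self.conv = nn.Conv2d(channels, channels, 3, padding=1)
              self.gate = nn.Sequential(nn.Linear(num_classes, channels), nn.Sigmoid())

          def forward(self, x, class_onehot):
              g = self.gate(class_onehot).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
              return torch.relu(self.conv(x)) * g                       # gate the channels per class

      block = ClassGatedBlock(channels=64, num_classes=10)
      feats = torch.randn(2, 64, 32, 32)
      cls = torch.eye(10)[[3, 7]]                          # one-hot labels for classes 3 and 7
      print(block(feats, cls).shape)                       # torch.Size([2, 64, 32, 32])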

  • Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation
    arXiv: Computer Vision and Pattern Recognition, 2019
    Co-Authors: Arnab Ghosh, Alexei A Efros, Richard Zhang, Puneet K Dokania, Oliver Wang, Philip H S Torr, Eli Shechtman
    Abstract:

    We propose an interactive GAN-based sketch-to-Image Translation method that helps novice users create Images of simple objects. As the user starts to draw a sketch of a desired object type, the network interactively recommends plausible completions, and shows a corresponding synthesized Image to the user. This enables a feedback loop, where the user can edit their sketch based on the network's recommendations, visualizing both the completed shape and final rendered Image while they draw. In order to use a single trained model across a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network. Video available at our website: this https URL.

  • Toward Multimodal Image-to-Image Translation
    arXiv: Computer Vision and Pattern Recognition, 2017
    Co-Authors: Junyan Zhu, Alexei A Efros, Richard Zhang, Oliver Wang, Deepak Pathak, Trevor Darrell, Eli Shechtman
    Abstract:

    Many Image-to-Image Translation problems are ambiguous, as a single input Image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.
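
    A minimal sketch of the bijective consistency between the latent code and the output, with toy stand-ins for the generator G and encoder E (the real models are convolutional networks; the shapes here are illustrative):

      import torch
      import torch.nn.functional as F

      latent_dim = 8

      def G(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
          # stand-in generator: injects the latent code as a per-channel bias on the input
          return torch.tanh(x + z.view(z.size(0), -1, 1, 1))

      def E(y: torch.Tensor) -> torch.Tensor:
          # stand-in encoder: pools an output image back down to a latent code
          return y.mean(dim=(2, 3))

      x = torch.rand(4, latent_dim, 16, 16)                # toy "input" tensor
      y = torch.rand(4, latent_dim, 16, 16)                # toy "ground-truth output"
      z = torch.randn(4, latent_dim)                       # randomly sampled latent code

      # 1) latent regression: a sampled code should be recoverable from the generated output
      loss_latent = F.l1_loss(E(G(x, z)), z)
      # 2) reconstruction: decoding the code encoded from the real output should reproduce it
      loss_recon = F.l1_loss(G(x, E(y)), y)
      print(float(loss_latent), float(loss_recon))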

  • Image-to-Image Translation with Conditional Adversarial Networks
    Computer Vision and Pattern Recognition, 2017
    Co-Authors: Phillip Isola, Junyan Zhu, Tinghui Zhou, Alexei A Efros
    Abstract:

    We investigate conditional adversarial networks as a general-purpose solution to Image-to-Image Translation problems. These networks not only learn the mapping from input Image to output Image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing Images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
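
    A minimal sketch of a conditional adversarial objective with an added L1 term, in the spirit of the approach described above; the stand-in networks and the L1 weight are illustrative choices, not the exact published configuration:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      def generator_loss(G, D, x, y, l1_weight: float = 100.0):
          fake = G(x)
          pred_fake = D(torch.cat([x, fake], dim=1))       # D is conditioned on the input image
          adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
          return adv + l1_weight * F.l1_loss(fake, y)      # stay close to the paired ground truth

      def discriminator_loss(G, D, x, y):
          with torch.no_grad():
              fake = G(x)
          pred_real = D(torch.cat([x, y], dim=1))
          pred_fake = D(torch.cat([x, fake], dim=1))
          real_term = F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
          fake_term = F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake))
          return 0.5 * (real_term + fake_term)

      # toy stand-in networks just to show the call pattern
      G = nn.Conv2d(3, 3, 3, padding=1)
      D = nn.Conv2d(6, 1, 4, stride=2, padding=1)          # patch-style logits over (input, image) pairs
      x, y = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
      print(float(generator_loss(G, D, x, y)), float(discriminator_loss(G, D, x, y)))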

Jan Kautz - One of the best experts on this subject based on the ideXlab platform.

  • Few-Shot Unsupervised Image-to-Image Translation
    International Conference on Computer Vision, 2019
    Co-Authors: Mingyu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz
    Abstract:

    Unsupervised Image-to-Image Translation methods learn to map Images in a given class to an analogous Image in a different class, drawing on unstructured (non-registered) datasets of Images. While remarkably successful, current methods require access to many Images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised Image-to-Image Translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example Images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design. Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework. Our implementation and datasets are available at https://github.com/NVlabs/FUNIT
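
    A minimal sketch of the few-shot step described above: class codes of the few example Images of the unseen target class are averaged into one embedding that conditions the decoder. The tiny networks are stand-ins, and the published model injects the class code through adaptive instance normalization rather than the concatenation used here:

      import torch
      import torch.nn as nn

      content_encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
      class_encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())   # image -> class code
      decoder = nn.Conv2d(16 + 16, 3, 3, padding=1)

      def translate(source_img: torch.Tensor, few_shot_targets: torch.Tensor) -> torch.Tensor:
          content = content_encoder(source_img)                               # (1, 16, H, W)
          class_code = class_encoder(few_shot_targets).mean(0, keepdim=True)  # average over K examples
          class_map = class_code.view(1, -1, 1, 1).expand(-1, -1, content.size(2), content.size(3))
          return torch.tanh(decoder(torch.cat([content, class_map], dim=1)))

      src = torch.rand(1, 3, 64, 64)                       # an image of a source class
      targets = torch.rand(3, 3, 64, 64)                   # K = 3 examples of an unseen target class
      print(translate(src, targets).shape)                 # torch.Size([1, 3, 64, 64])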

  • Multimodal Unsupervised Image-to-Image Translation
    European Conference on Computer Vision, 2018
    Co-Authors: Xun Huang, Mingyu Liu, Serge Belongie, Jan Kautz
    Abstract:

    Unsupervised Image-to-Image Translation is an important and challenging problem in computer vision. Given an Image in the source domain, the goal is to learn the conditional distribution of corresponding Images in the target domain, without seeing any examples of corresponding Image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain Image. To address this limitation, we propose a Multimodal Unsupervised Image-to-Image Translation (MUNIT) framework. We assume that the Image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an Image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of Translation outputs by providing an example style Image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT.
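
    A minimal sketch of recombining a content code with a sampled style code via adaptive instance normalization, the mechanism MUNIT uses to inject style into its decoder; the sizes and helper names are illustrative:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      def adain(content_feat: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
          """Normalize each channel of the content feature, then rescale/shift with style statistics."""
          normalized = F.instance_norm(content_feat)
          return gamma.view(-1, gamma.size(1), 1, 1) * normalized + beta.view(-1, beta.size(1), 1, 1)

      style_dim, feat_ch = 8, 16
      style_to_params = nn.Linear(style_dim, 2 * feat_ch)  # MLP mapping a style code to (gamma, beta)

      content = torch.randn(2, feat_ch, 32, 32)            # content code of a source-domain image
      style = torch.randn(2, style_dim)                    # style code sampled from the target domain
      gamma, beta = style_to_params(style).chunk(2, dim=1)
      stylized = adain(content, gamma, beta)               # fed to the target-domain decoder
      print(stylized.shape)                                # torch.Size([2, 16, 32, 32])
      # resampling `style` gives diverse outputs; encoding an example image instead gives style control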

  • Multimodal Unsupervised Image-to-Image Translation
    arXiv: Computer Vision and Pattern Recognition, 2018
    Co-Authors: Xun Huang, Mingyu Liu, Serge Belongie, Jan Kautz
    Abstract:

    Unsupervised Image-to-Image Translation is an important and challenging problem in computer vision. Given an Image in the source domain, the goal is to learn the conditional distribution of corresponding Images in the target domain, without seeing any pairs of corresponding Images. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain Image. To address this limitation, we propose a Multimodal Unsupervised Image-to-Image Translation (MUNIT) framework. We assume that the Image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an Image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of Translation outputs by providing an example style Image. Code and pretrained models are available at this https URL.

  • Unsupervised Image-to-Image Translation Networks
    Neural Information Processing Systems, 2017
    Co-Authors: Mingyu Liu, Thomas M Breuel, Jan Kautz
    Abstract:

    Unsupervised Image-to-Image Translation aims at learning a joint distribution of Images in different domains by using Images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive at the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised Image-to-Image Translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high quality Image Translation results on various challenging unsupervised Image Translation tasks, including street scene Image Translation, animal Image Translation, and face Image Translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/unit.
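
    A minimal sketch of the shared-latent-space assumption: both domain encoders map into one latent space, and translation is "encode with the source encoder, decode with the target generator". The networks are tiny stand-ins with illustrative sizes:

      import torch
      import torch.nn as nn

      E1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())   # encoder of domain 1
      E2 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())   # encoder of domain 2
      G1 = nn.Sequential(nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())   # generator of domain 1
      G2 = nn.Sequential(nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())   # generator of domain 2

      x1 = torch.rand(1, 3, 64, 64)            # an image from domain 1 (e.g. a daytime street scene)
      z = E1(x1)                               # code in the shared latent space
      x1_to_2 = G2(z)                          # translation into domain 2 (e.g. night)
      x1_recon = G1(z)                         # reconstruction within domain 1 (VAE-style constraint)
      x2_to_1 = G1(E2(torch.rand(1, 3, 64, 64)))   # the symmetric direction uses the other encoder
      print(x1_to_2.shape, x1_recon.shape, x2_to_1.shape)
      # adversarial losses in each domain plus reconstruction/cycle terms train E1, E2, G1, G2 jointly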

Chen Change Loy - One of the best experts on this subject based on the ideXlab platform.

  • TSIT: A Simple and Versatile Framework for Image-to-Image Translation
    European Conference on Computer Vision, 2020
    Co-Authors: Liming Jiang, Mingyang Huang, Jianping Shi, Changxu Zhang, Chunxiao Liu, Chen Change Loy
    Abstract:

    We introduce a simple and versatile framework for Image-to-Image Translation. We unearth the importance of normalization layers, and provide a carefully designed two-stream generative model with newly proposed feature transformations in a coarse-to-fine fashion. This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network, permitting our method to scale to various tasks in both unsupervised and supervised settings. No additional constraints (e.g., cycle consistency) are needed, contributing to a very clean and simple method. Multi-modal Image synthesis with arbitrary style control is made possible. A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations. GitHub: https://github.com/EndlessSora/TSIT.
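
    A minimal sketch of a feature transformation in the spirit of the two-stream design described above: generator features are normalized and then modulated by scale/shift maps predicted from the matching-resolution feature of the other stream; layer sizes and names are illustrative:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class FeatureTransform(nn.Module):
          """Element-wise denormalization of one stream's features by maps computed from the other."""
          def __init__(self, gen_ch: int, stream_ch: int):
              super().__init__()
              self.to_gamma = nn.Conv2d(stream_ch, gen_ch, 3, padding=1)
              self.to_beta = nn.Conv2d(stream_ch, gen_ch, 3, padding=1)

          def forward(self, gen_feat, stream_feat):
              normalized = F.instance_norm(gen_feat)       # wash out the old feature statistics
              gamma = self.to_gamma(stream_feat)           # spatially varying scale
              beta = self.to_beta(stream_feat)             # spatially varying shift
              return normalized * (1 + gamma) + beta

      ft = FeatureTransform(gen_ch=64, stream_ch=32)
      gen_feat = torch.randn(1, 64, 32, 32)        # generator feature at one scale of the coarse-to-fine stack
      stream_feat = torch.randn(1, 32, 32, 32)     # content-stream feature at the same scale
      print(ft(gen_feat, stream_feat).shape)       # torch.Size([1, 64, 32, 32])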

  • TSIT: A Simple and Versatile Framework for Image-to-Image Translation
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Liming Jiang, Mingyang Huang, Jianping Shi, Changxu Zhang, Chunxiao Liu, Chen Change Loy
    Abstract:

    We introduce a simple and versatile framework for Image-to-Image Translation. We unearth the importance of normalization layers, and provide a carefully designed two-stream generative model with newly proposed feature transformations in a coarse-to-fine fashion. This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network, permitting our method to scale to various tasks in both unsupervised and supervised settings. No additional constraints (e.g., cycle consistency) are needed, contributing to a very clean and simple method. Multi-modal Image synthesis with arbitrary style control is made possible. A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.

  • TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
    Computer Vision and Pattern Recognition, 2019
    Co-Authors: Kaidi Cao, Chen Qian, Chen Change Loy
    Abstract:

    Unsupervised Image-to-Image Translation aims at learning a mapping between two visual domains. However, learning a Translation across large geometry variations always ends up with failure. In this work, we present a novel disentangle-and-translate framework to tackle the complex objects Image-to-Image Translation task. Instead of learning the mapping on the Image space directly, we disentangle Image space into a Cartesian product of the appearance and the geometry latent spaces. Specifically, we first introduce a geometry prior loss and a conditional VAE loss to encourage the network to learn independent but complementary representations. The Translation is then built on appearance and geometry space separately. Extensive experiments demonstrate the superior performance of our method to other state-of-the-art approaches, especially in the challenging near-rigid and non-rigid objects Translation tasks. In addition, by taking different exemplars as the appearance references, our method also supports multimodal Translation. Project page: https://wywu.github.io/projects/TGaGa/TGaGa.html
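
    A minimal sketch of the disentangle-and-translate idea, assuming global appearance and geometry codes and a toy decoder; all networks and names are illustrative stand-ins, not the paper's architecture:

      import torch
      import torch.nn as nn

      geom_dim, app_dim = 16, 32
      encode_geometry = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, geom_dim))
      encode_appearance = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, app_dim))
      translate_geometry = nn.Linear(geom_dim, geom_dim)   # geometry mapping across the two domains
      decoder = nn.Linear(geom_dim + app_dim, 3 * 32 * 32) # renders an image from the two codes

      def translate(x_src: torch.Tensor, appearance_exemplar: torch.Tensor) -> torch.Tensor:
          g = translate_geometry(encode_geometry(x_src))   # move the geometry into the target domain
          a = encode_appearance(appearance_exemplar)       # take appearance from a chosen exemplar
          return torch.tanh(decoder(torch.cat([g, a], dim=1))).view(-1, 3, 32, 32)

      cat_img = torch.rand(1, 3, 32, 32)                   # source image (e.g. a cat face)
      exemplar = torch.rand(1, 3, 32, 32)                  # appearance reference from the target domain
      print(translate(cat_img, exemplar).shape)            # torch.Size([1, 3, 32, 32])
      # using different exemplars yields the multimodal outputs mentioned above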

  • TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
    arXiv: Computer Vision and Pattern Recognition, 2019
    Co-Authors: Kaidi Cao, Chen Qian, Chen Change Loy
    Abstract:

    Unsupervised Image-to-Image Translation aims at learning a mapping between two visual domains. However, learning a Translation across large geometry variations always ends up with failure. In this work, we present a novel disentangle-and-translate framework to tackle the complex objects Image-to-Image Translation task. Instead of learning the mapping on the Image space directly, we disentangle Image space into a Cartesian product of the appearance and the geometry latent spaces. Specifically, we first introduce a geometry prior loss and a conditional VAE loss to encourage the network to learn independent but complementary representations. The Translation is then built on appearance and geometry space separately. Extensive experiments demonstrate the superior performance of our method to other state-of-the-art approaches, especially in the challenging near-rigid and non-rigid objects Translation tasks. In addition, by taking different exemplars as the appearance references, our method also supports multimodal Translation. Project page: this https URL

Junyan Zhu - One of the best experts on this subject based on the ideXlab platform.

  • Contrastive Learning for Unpaired Image-to-Image Translation
    European Conference on Computer Vision, 2020
    Co-Authors: Taesung Park, Alexei A Efros, Richard Zhang, Junyan Zhu
    Abstract:

    In Image-to-Image Translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so – maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature space, relative to other elements (other patches) in the dataset, referred to as negatives. We explore several critical design choices for making contrastive learning effective in the Image synthesis setting. Notably, we use a multilayer, patch-based approach, rather than operate on entire Images. Furthermore, we draw negatives from within the input Image itself, rather than from the rest of the dataset. We demonstrate that our framework enables one-sided Translation in the unpaired Image-to-Image Translation setting, while improving quality and reducing training time. In addition, our method can even be extended to the training setting where each “domain” is only a single Image.

  • Toward Multimodal Image-to-Image Translation
    arXiv: Computer Vision and Pattern Recognition, 2017
    Co-Authors: Junyan Zhu, Alexei A Efros, Richard Zhang, Oliver Wang, Deepak Pathak, Trevor Darrell, Eli Shechtman
    Abstract:

    Many Image-to-Image Translation problems are ambiguous, as a single input Image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.

  • Image-to-Image Translation with Conditional Adversarial Networks
    Computer Vision and Pattern Recognition, 2017
    Co-Authors: Phillip Isola, Junyan Zhu, Tinghui Zhou, Alexei A Efros
    Abstract:

    We investigate conditional adversarial networks as a general-purpose solution to Image-to-Image Translation problems. These networks not only learn the mapping from input Image to output Image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing Images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

  • Image-to-Image Translation with Conditional Adversarial Networks
    Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
    Co-Authors: Phillip Isola, Junyan Zhu, Tinghui Zhou, Alexei A Efros
    Abstract:

    We investigate conditional adversarial networks as a general-purpose solution to Image-to-Image Translation problems. These networks not only learn the mapping from input Image to output Image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing Images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

  • Toward Multimodal Image-to-Image Translation
    Neural Information Processing Systems, 2017
    Co-Authors: Junyan Zhu, Alexei A Efros, Richard Zhang, Oliver Wang, Deepak Pathak, Trevor Darrell, Eli Shechtman
    Abstract:

    Many Image-to-Image Translation problems are ambiguous, as a single input Image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.