Natural Language Expression

The experts below are selected from a list of 63 experts worldwide, ranked by the ideXlab platform.

Kwan-yee K. Wong - One of the best experts on this subject based on the ideXlab platform.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring expression datasets, however, fail to provide an ideal test bed for evaluating the reasoning ability of models, mainly because 1) their expressions typically describe only some simple distinctive properties of the object and 2) their images contain limited distracting information. To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features. First, we design a novel expression engine rendering various reasoning logics that can be flexibly combined with rich visual properties to generate expressions with varying compositionality. Second, to better exploit the full reasoning chain embodied in an expression, we propose a new test setting that adds distracting images containing objects sharing similar properties with the referent, thus minimising the success rate of reasoning-free cross-domain alignment. We evaluate several state-of-the-art REF models, but find that none of them achieves promising performance. A proposed modular hard-mining strategy performs best but still leaves substantial room for improvement.
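
    To make the "expression engine" concrete, here is a minimal Python sketch of how templated reasoning logics might be composed with visual attributes to yield expressions of varying compositional depth. The vocabularies, templates, and function names are illustrative assumptions, not the actual Cops-Ref implementation:

    import random

    # Hypothetical vocabularies for illustration -- not the Cops-Ref ones.
    ATTRIBUTES = {"size": ["small", "large"], "color": ["red", "blue", "green"]}
    RELATIONS = ["left of", "right of", "next to", "behind"]
    CATEGORIES = ["dog", "car", "chair", "person"]

    def describe(category):
        """Attach zero or more sampled attributes to an object category."""
        parts = [random.choice(values)
                 for values in ATTRIBUTES.values() if random.random() < 0.5]
        return " ".join(parts + [category])

    def generate_expression(depth):
        """Compose a referring expression with `depth` chained relational
        clauses. depth=0 gives a plain attribute description; each extra
        level lengthens the reasoning chain needed to ground the referent."""
        expr = "the " + describe(random.choice(CATEGORIES))
        for _ in range(depth):
            expr += " that is {} the {}".format(
                random.choice(RELATIONS), describe(random.choice(CATEGORIES)))
        return expr

    random.seed(0)
    for d in range(3):
        print("depth", d, ":", generate_expression(d))

    A real engine would additionally be grounded in scene annotations so that every generated expression picks out exactly one referent in the image; the sketch above only illustrates the compositional templating.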

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the CVPR abstract above, apart from one additional closing sentence: "We hope this new dataset and task can serve as a benchmark for deeper visual reasoning analysis and foster research on referring expression comprehension."
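
    The distractor test setting described above turns evaluation from localizing the referent within a single image into ranking candidate regions pooled from the true image and its distractor images. Below is a minimal sketch of that protocol; `score` stands for an assumed expression-region scoring model, not a real API:

    def evaluate_with_distractors(expression, target_regions,
                                  distractor_regions, gt_index, score):
        """Return True if the ground-truth region ranks first among
        candidates pooled from the target image AND the distractor
        images. `gt_index` indexes into `target_regions`, which are
        placed first in the pooled candidate list."""
        candidates = list(target_regions) + list(distractor_regions)
        scores = [score(expression, region) for region in candidates]
        return scores.index(max(scores)) == gt_index

    Because the distractors share attributes with the referent, a model that merely matches words to visual properties, without following the full reasoning chain, will often pick a look-alike from the wrong image; that is exactly the reasoning-free alignment failure the setting is designed to expose.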

Cheng Deng - One of the best experts on this subject based on the ideXlab platform.

  • Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Cheng Deng
    Abstract:

    Referring expression comprehension (REC) and segmentation (RES) are two highly related tasks, both aiming to identify the referent according to a natural language expression. In this paper, we propose a novel Multi-task Collaborative Network (MCN) to achieve joint learning of REC and RES for the first time. In MCN, RES helps REC achieve better language-vision alignment, while REC helps RES locate the referent more precisely. In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS). Specifically, CEM enables REC and RES to focus on similar visual regions by maximizing the consistency energy between the two tasks, while ASNLS suppresses the response of unrelated regions in RES based on the prediction of REC. To validate our model, we conduct extensive experiments on three benchmark datasets for REC and RES, i.e., RefCOCO, RefCOCO+ and RefCOCOg. The experimental results show significant performance gains of MCN over all existing methods, up to +7.13% for REC and +11.50% for RES over the previous state of the art, confirming the validity of our model for joint REC and RES learning.
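
    To make the two designs concrete, here is a minimal PyTorch-style sketch of the ideas as stated in the abstract: a consistency term that pushes the REC attention map and the RES mask toward the same regions, and a soft suppression that down-weights RES responses outside the REC-predicted box. The tensor shapes, the energy definition, and the decay factor are illustrative assumptions, not the paper's exact formulation:

    import torch

    def consistency_energy(rec_attn, res_mask):
        """One plausible consistency energy: the inner product of the two
        normalized spatial maps. Adding its negative to the joint loss
        (i.e., maximizing it) encourages the REC attention and the RES
        prediction to peak on the same regions.

        rec_attn, res_mask: (B, H, W) non-negative spatial maps."""
        eps = 1e-8
        a = rec_attn.flatten(1)
        m = res_mask.flatten(1)
        a = a / (a.sum(dim=1, keepdim=True) + eps)
        m = m / (m.sum(dim=1, keepdim=True) + eps)
        return (a * m).sum(dim=1).mean()

    def asnls(res_mask, rec_box, decay=0.1):
        """Adaptive soft suppression, sketched: rather than hard-zeroing
        responses outside the box predicted by REC, scale them down by a
        soft `decay` factor, which tolerates imperfect boxes.

        res_mask: (H, W) segmentation response map.
        rec_box: (x1, y1, x2, y2) in pixel coordinates."""
        h, w = res_mask.shape
        ys = torch.arange(h).unsqueeze(1).expand(h, w)
        xs = torch.arange(w).unsqueeze(0).expand(h, w)
        x1, y1, x2, y2 = rec_box
        inside = (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)
        return torch.where(inside, res_mask, res_mask * decay)

    A natural placement is to subtract the consistency energy from the training loss and to apply the suppression as post-processing on the RES output at inference, though the exact integration in MCN may differ.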

Zhenfang Chen - One of the best experts on this subject based on the ideXlab platform.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the CVPR abstract listed under Kwan-yee K. Wong above.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the arXiv abstract listed under Kwan-yee K. Wong above.

Gen Luo - One of the best experts on this subject based on the ideXlab platform.

  • Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Cheng Deng
    Abstract:

    Identical to the abstract listed under Cheng Deng above.

Peng Wang - One of the best experts on this subject based on the ideXlab platform.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the CVPR abstract listed under Kwan-yee K. Wong above.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the arXiv abstract listed under Kwan-yee K. Wong above.