Natural Language Expression

The experts below are selected from a list of 63 experts worldwide, ranked by the ideXlab platform.

Kwan-yee K. Wong - One of the best experts on this subject based on the ideXlab platform.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring expression datasets, however, fail to provide an ideal test bed for evaluating the reasoning ability of models, mainly because 1) their expressions typically describe only some simple distinctive properties of the object and 2) their images contain limited distracting information. To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features. First, we design a novel expression engine rendering various reasoning logics that can be flexibly combined with rich visual properties to generate expressions with varying compositionality. Second, to better exploit the full reasoning chain embodied in an expression, we propose a new test setting that adds distracting images containing objects sharing similar properties with the referent, thus minimising the success rate of reasoning-free cross-domain alignment. We evaluate several state-of-the-art REF models, but find that none of them achieves promising performance. A proposed modular hard-mining strategy performs best but still leaves substantial room for improvement.
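
    To make the "expression engine" concrete, here is a minimal Python sketch of how templated reasoning logics might be composed with visual attributes to yield expressions of varying compositional depth. The vocabularies, templates, and function names are illustrative assumptions, not the actual Cops-Ref implementation:

    import random

    # Hypothetical vocabularies for illustration -- not the Cops-Ref ones.
    ATTRIBUTES = {"size": ["small", "large"], "color": ["red", "blue", "green"]}
    RELATIONS = ["left of", "right of", "next to", "behind"]
    CATEGORIES = ["dog", "car", "chair", "person"]

    def describe(category):
        """Attach zero or more sampled attributes to an object category."""
        parts = [random.choice(values)
                 for values in ATTRIBUTES.values() if random.random() < 0.5]
        return " ".join(parts + [category])

    def generate_expression(depth):
        """Compose a referring expression with `depth` chained relational
        clauses. depth=0 gives a plain attribute description; each extra
        level lengthens the reasoning chain needed to ground the referent."""
        expr = "the " + describe(random.choice(CATEGORIES))
        for _ in range(depth):
            expr += " that is {} the {}".format(
                random.choice(RELATIONS), describe(random.choice(CATEGORIES)))
        return expr

    random.seed(0)
    for d in range(3):
        print("depth", d, ":", generate_expression(d))

    A real engine would additionally be grounded in scene annotations so that every generated expression picks out exactly one referent in the image; the sketch above only illustrates the compositional templating.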

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the CVPR abstract above, apart from one additional closing sentence: "We hope this new dataset and task can serve as a benchmark for deeper visual reasoning analysis and foster research on referring expression comprehension."
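
    The distractor test setting described above turns evaluation from localizing the referent within a single image into ranking candidate regions pooled from the true image and its distractor images. Below is a minimal sketch of that protocol; `score` stands for an assumed expression-region scoring model, not a real API:

    def evaluate_with_distractors(expression, target_regions,
                                  distractor_regions, gt_index, score):
        """Return True if the ground-truth region ranks first among
        candidates pooled from the target image AND the distractor
        images. `gt_index` indexes into `target_regions`, which are
        placed first in the pooled candidate list."""
        candidates = list(target_regions) + list(distractor_regions)
        scores = [score(expression, region) for region in candidates]
        return scores.index(max(scores)) == gt_index

    Because the distractors share attributes with the referent, a model that merely matches words to visual properties, without following the full reasoning chain, will often pick a look-alike from the wrong image; that is exactly the reasoning-free alignment failure the setting is designed to expose.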

Cheng Deng - One of the best experts on this subject based on the ideXlab platform.

  • Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Cheng Deng
    Abstract:

    Referring expression comprehension (REC) and segmentation (RES) are two highly related tasks, both aiming to identify the referent according to a natural language expression. In this paper, we propose a novel Multi-task Collaborative Network (MCN) to achieve joint learning of REC and RES for the first time. In MCN, RES helps REC achieve better language-vision alignment, while REC helps RES locate the referent more precisely. In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS). Specifically, CEM enables REC and RES to focus on similar visual regions by maximizing the consistency energy between the two tasks, while ASNLS suppresses the response of unrelated regions in RES based on the prediction of REC. To validate our model, we conduct extensive experiments on three benchmark datasets for REC and RES, i.e., RefCOCO, RefCOCO+ and RefCOCOg. The experimental results show significant performance gains of MCN over all existing methods, up to +7.13% for REC and +11.50% for RES over the previous state of the art, confirming the validity of our model for joint REC and RES learning.
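
    To make the two designs concrete, here is a minimal PyTorch-style sketch of the ideas as stated in the abstract: a consistency term that pushes the REC attention map and the RES mask toward the same regions, and a soft suppression that down-weights RES responses outside the REC-predicted box. The tensor shapes, the energy definition, and the decay factor are illustrative assumptions, not the paper's exact formulation:

    import torch

    def consistency_energy(rec_attn, res_mask):
        """One plausible consistency energy: the inner product of the two
        normalized spatial maps. Adding its negative to the joint loss
        (i.e., maximizing it) encourages the REC attention and the RES
        prediction to peak on the same regions.

        rec_attn, res_mask: (B, H, W) non-negative spatial maps."""
        eps = 1e-8
        a = rec_attn.flatten(1)
        m = res_mask.flatten(1)
        a = a / (a.sum(dim=1, keepdim=True) + eps)
        m = m / (m.sum(dim=1, keepdim=True) + eps)
        return (a * m).sum(dim=1).mean()

    def asnls(res_mask, rec_box, decay=0.1):
        """Adaptive soft suppression, sketched: rather than hard-zeroing
        responses outside the box predicted by REC, scale them down by a
        soft `decay` factor, which tolerates imperfect boxes.

        res_mask: (H, W) segmentation response map.
        rec_box: (x1, y1, x2, y2) in pixel coordinates."""
        h, w = res_mask.shape
        ys = torch.arange(h).unsqueeze(1).expand(h, w)
        xs = torch.arange(w).unsqueeze(0).expand(h, w)
        x1, y1, x2, y2 = rec_box
        inside = (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)
        return torch.where(inside, res_mask, res_mask * decay)

    A natural placement is to subtract the consistency energy from the training loss and to apply the suppression as post-processing on the RES output at inference, though the exact integration in MCN may differ.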

Zhenfang Chen - One of the best experts on this subject based on the ideXlab platform.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the CVPR abstract listed under Kwan-yee K. Wong above.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the arXiv abstract listed under Kwan-yee K. Wong above.

Gen Luo - One of the best experts on this subject based on the ideXlab platform.

  • Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Cheng Deng
    Abstract:

    Identical to the abstract listed under Cheng Deng above.

Peng Wang - One of the best experts on this subject based on the ideXlab platform.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the CVPR abstract listed under Kwan-yee K. Wong above.

  • Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
    arXiv: Computer Vision and Pattern Recognition, 2020
    Co-Authors: Zhenfang Chen, Peng Wang, Kwan-yee K. Wong
    Abstract:

    Identical to the arXiv abstract listed under Kwan-yee K. Wong above.