Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction (2312.13655v1)
Abstract: Language-enabled robots have been widely studied in recent years to enable natural human-robot interaction and teaming in a variety of real-world applications. Such robots must be able to comprehend referring expressions, identifying a particular object in visual perception using a set of referring attributes extracted from natural language. However, visual observations of an object may not be available when the object is referred to, and the numbers of objects and attributes may be unbounded in open worlds. To address these challenges, we implement an attribute-based compositional zero-shot learning method that uses a list of attributes to perform referring expression comprehension in open worlds. We evaluate the approach on two datasets, MIT-States and Clothing 16K. Preliminary experimental results show that our implemented approach allows a robot to correctly identify the objects referred to by human commands.
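To make the high-level description above concrete, the sketch below illustrates one common way attribute-object composition scoring can support zero-shot grounding of a referring expression. It is not the paper's implementation: the composition network, embedding dimensions, and the use of pretrained word vectors with a frozen image backbone are assumptions for illustration only.

```python
# A minimal sketch (not the authors' implementation) of attribute-object
# composition scoring for zero-shot referring expression grounding.
# Assumptions: image features come from a frozen backbone (e.g., ResNet-18),
# and attribute/object names are embedded with pretrained word vectors
# (e.g., GloVe); the layers and dimensions below are illustrative.

import torch
import torch.nn as nn


class CompositionScorer(nn.Module):
    """Scores how well an image matches an (attribute, object) composition."""

    def __init__(self, word_dim: int = 300, img_dim: int = 512, emb_dim: int = 256):
        super().__init__()
        # Map the concatenated [attribute; object] word vectors into a joint space.
        self.compose = nn.Sequential(
            nn.Linear(2 * word_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim)
        )
        # Map image features into the same joint space.
        self.project_img = nn.Linear(img_dim, emb_dim)

    def forward(self, img_feat, attr_vecs, obj_vecs):
        # img_feat: (B, img_dim); attr_vecs / obj_vecs: (P, word_dim) for P candidate pairs.
        pair_emb = self.compose(torch.cat([attr_vecs, obj_vecs], dim=-1))  # (P, emb_dim)
        img_emb = self.project_img(img_feat)                               # (B, emb_dim)
        # Cosine similarity between each image and each candidate composition.
        return nn.functional.normalize(img_emb, dim=-1) @ \
               nn.functional.normalize(pair_emb, dim=-1).T                 # (B, P)


# Usage: pick the candidate whose referred (attribute, object) pair,
# possibly unseen during training, scores highest against the image crop.
if __name__ == "__main__":
    scorer = CompositionScorer()
    img_feat = torch.randn(1, 512)   # feature of one detected object crop
    attr_vecs = torch.randn(3, 300)  # e.g., "red", "folded", "wet"
    obj_vecs = torch.randn(3, 300)   # e.g., "shirt", "towel", "apple"
    scores = scorer(img_feat, attr_vecs, obj_vecs)
    print(scores.argmax(dim=-1))     # index of the best-matching composition
```

Because the composition is computed from word vectors rather than learned per pair, unseen attribute-object combinations can still be scored at test time, which is the property the abstract relies on for open-world reference.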
- Efficient grounding of abstract spatial concepts for natural language interaction with robot manipulators. In Robotics: Science and Systems, 2016.
- Evaluating robot behavior in response to natural language. In Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, pages 197–198, 2018.
- Robots that use language. Annual Review of Control, Robotics, and Autonomous Systems, 3:25–55, 2020.
- N. Randall. A survey of robot-assisted language learning (RALL). ACM Transactions on Human-Robot Interaction (THRI), 9(1):1–36, 2019.
- Human-robot teaming in urban search and rescue. In Human Factors and Ergonomics Society Annual Meeting, volume 59, pages 250–254, 2015.
- Sorry Dave, I'm afraid I can't do that: Explaining unachievable robot tasks using natural language. In Robotics: Science and Systems, 2013.
- P. Gao and H. Zhang. Bayesian deep graph matching for correspondence identification in collaborative perception. In Robotics: Science and Systems (RSS), 2021.
- Open world compositional zero-shot learning. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
- C.-Y. Chen and K. Grauman. Inferring analogous attributes. In The IEEE Conference on Computer Vision and Pattern Recognition, 2014.
- From red wine to red tomato: Composition with context. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- T. Nagarajan and K. Grauman. Attributes as operators: Factorizing unseen attribute-object compositions. In The European Conference on Computer Vision, 2018.
- Symmetry and group in attribute-object compositions. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
- Constrained semi-supervised learning using attributes and comparative attributes. In The European Conference on Computer Vision, 2012.
- Learning graph embeddings for open world compositional zero-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
- Relation-aware compositional zero-shot learning for attribute-object pair recognition. arXiv, 2021.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv, 2018.
- Deep residual learning for image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv, 2020.
- GloVe: Global vectors for word representation. In The Conference on Empirical Methods in Natural Language Processing, 2014.
- Disentangling visual embeddings for attributes and objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
- Learning attention as disentangler for compositional zero-shot learning. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
- Discovering states and transformations in image collections. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015.
- Learning invariant visual representations for compositional zero-shot learning. In European Conference on Computer Vision, 2022.