
Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships (2403.17173v2)

Published 25 Mar 2024 in cs.CV

Abstract: Modeling and visualizing relationships between tasks or datasets is an important step towards solving various meta-tasks such as dataset discovery, multi-tasking, and transfer learning. However, many relationships, such as containment and transferability, are naturally asymmetric, and current approaches for representation and visualization (e.g., t-SNE) do not readily support this. We propose Task2Box, an approach that represents tasks using box embeddings (axis-aligned hyperrectangles in low-dimensional spaces), which can capture asymmetric relationships between tasks through volumetric overlaps. We show that Task2Box accurately predicts unseen hierarchical relationships between nodes in the ImageNet and iNaturalist datasets, as well as transferability between tasks in the Taskonomy benchmark. We also show that box embeddings estimated from task representations (e.g., CLIP, Task2Vec, or attribute-based) can be used to predict relationships between unseen tasks more accurately than classifiers trained on the same representations, and more accurately than handcrafted asymmetric distances (e.g., KL divergence). This suggests that low-dimensional box embeddings can effectively capture these task relationships and have the added advantage of being interpretable. We use the approach to visualize relationships among publicly available image classification datasets on Hugging Face, a popular dataset hosting platform.
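
To make the idea of asymmetry through volumetric overlap concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the hard (non-smoothed) volume, the toy boxes, and the choice of score (the fraction of one box's volume that lies inside another) are all assumptions made for illustration. The abstract only states that asymmetric relationships are captured through volumetric overlaps between axis-aligned boxes.

```python
from math import prod

# A box is a pair (lower_corner, upper_corner) of equal-length coordinate lists.
# All names and values here are hypothetical, chosen only for illustration.

def volume(lower, upper):
    """Volume of an axis-aligned box; zero if any side has non-positive length."""
    return prod(max(u - l, 0.0) for l, u in zip(lower, upper))

def intersection(box_a, box_b):
    """Axis-aligned intersection of two boxes (may be empty)."""
    (lower_a, upper_a), (lower_b, upper_b) = box_a, box_b
    lower = [max(x, y) for x, y in zip(lower_a, lower_b)]
    upper = [min(x, y) for x, y in zip(upper_a, upper_b)]
    return lower, upper

def overlap_score(box_a, box_b):
    """Fraction of box_a's volume that lies inside box_b.

    The score is asymmetric: overlap_score(a, b) generally differs from
    overlap_score(b, a), which is what lets boxes encode containment.
    """
    vol_a = volume(*box_a)
    if vol_a == 0.0:
        return 0.0
    return volume(*intersection(box_a, box_b)) / vol_a

# Toy 2-D example: a small box fully inside a larger one.
animals = ([0.0, 0.0], [4.0, 4.0])
dogs = ([1.0, 1.0], [2.0, 2.0])

print(overlap_score(dogs, animals))  # 1.0: dogs lies entirely within animals
print(overlap_score(animals, dogs))  # 0.0625: only 1/16 of animals is covered by dogs
```

In this toy setting the two directions give different scores for the same pair of boxes, which a symmetric distance between points cannot express. A trainable version would typically need a smoothed volume so that gradients exist when boxes do not overlap; that detail is outside what the abstract specifies.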

