OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation (2404.01409v1)

Published 1 Apr 2024 in cs.CV, cs.AI, and cs.MM

Abstract: In the realm of food computing, segmenting ingredients from images poses substantial challenges due to the large intra-class variance among instances of the same ingredient, the emergence of new ingredients, and the high annotation costs associated with large food segmentation datasets. Existing approaches primarily rely on a closed-vocabulary setting with static text embeddings. These methods often fall short in handling ingredients, particularly new and diverse ones. In response to these limitations, we introduce OVFoodSeg, a framework that adopts an open-vocabulary setting and enhances text embeddings with visual context. By integrating vision-language models (VLMs), our approach enriches text embeddings with image-specific information through two innovative modules, i.e., an image-to-text learner, FoodLearner, and an Image-Informed Text Encoder. The training process of OVFoodSeg is divided into two stages: the pre-training of FoodLearner and the subsequent learning phase for segmentation. The pre-training phase equips FoodLearner with the capability to align visual information with corresponding food-specific textual representations, while the second phase adapts both FoodLearner and the Image-Informed Text Encoder to the segmentation task. By addressing the deficiencies of previous models, OVFoodSeg demonstrates a significant improvement, achieving a 4.9% increase in mean Intersection over Union (mIoU) on the FoodSeg103 dataset and setting a new milestone for food image segmentation.
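The abstract reports its headline result in mean Intersection over Union (mIoU). As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes dataset-level mIoU in plain Python. It assumes per-pixel class ids flattened into one list per image and an ignore label of 255 for unlabeled pixels, a common segmentation convention; the `mean_iou` name and the input format are hypothetical. For each class c it accumulates IoU_c = TP_c / (TP_c + FP_c + FN_c) over all images, then averages over the classes that occur in either the predictions or the ground truth.

```python
from typing import List, Optional, Sequence


def mean_iou(
    preds: Sequence[Sequence[int]],
    targets: Sequence[Sequence[int]],
    num_classes: int,
    ignore_index: Optional[int] = 255,  # assumed label for unannotated pixels
) -> float:
    """Dataset-level mean IoU (a sketch; the input format is hypothetical).

    preds/targets: one flat list of per-pixel class ids per image.
    Predicted ids are assumed to lie in [0, num_classes).
    """
    intersection: List[int] = [0] * num_classes  # TP per class
    union: List[int] = [0] * num_classes         # TP + FP + FN per class
    for pred, target in zip(preds, targets):
        if len(pred) != len(target):
            raise ValueError("prediction and target must have the same number of pixels")
        for p, t in zip(pred, target):
            if t == ignore_index:
                continue  # unlabeled pixels count toward no class
            if p == t:
                intersection[t] += 1
                union[t] += 1
            else:
                union[t] += 1  # false negative for the true class
                union[p] += 1  # false positive for the predicted class
    # Classes absent from both predictions and ground truth are skipped,
    # mirroring the usual "nan-mean" convention.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# One 4-pixel image: the last pixel is unlabeled, so it is ignored.
print(mean_iou([[0, 0, 1, 1]], [[0, 1, 1, 255]], num_classes=2))  # 0.5
```

Accumulating counts over the whole set, rather than averaging per-image scores, keeps rare ingredients that appear in only a few images from dominating the mean; published benchmarks may differ in such details.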

Authors (4)
  1. Xiongwei Wu (16 papers)
  2. Sicheng Yu (13 papers)
  3. Ee-Peng Lim (57 papers)
  4. Chong-Wah Ngo (55 papers)
Citations (1)