Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection (2402.09242v1)

Published 14 Feb 2024 in cs.CV

Abstract: Food computing brings new perspectives to computer vision, such as vision-based food analysis for nutrition and health. As a fundamental task in food computing, food detection requires Zero-Shot Detection (ZSD) of novel, unseen food objects to support real-world scenarios such as intelligent kitchens and smart restaurants. We therefore first benchmark the task of Zero-Shot Food Detection (ZSFD) by introducing the FOWA dataset with rich attribute annotations. Unlike general ZSD, ZSFD faces fine-grained problems such as high inter-class similarity, which makes synthesized features inseparable. The complexity of food semantic attributes further makes it harder for current ZSD methods to distinguish food categories. To address these problems, we propose a novel framework, ZSFDet, that tackles fine-grained problems by exploiting the interaction between complex attributes. Specifically, ZSFDet models the correlation between food categories and attributes with multi-source graphs to provide prior knowledge for distinguishing fine-grained features. Within ZSFDet, a Knowledge-Enhanced Feature Synthesizer (KEFS) learns knowledge representations from multiple sources (e.g., ingredient correlations from a knowledge graph) via multi-source graph fusion. Conditioned on the fused semantic knowledge representation, the region-feature diffusion model in KEFS generates fine-grained features for training an effective zero-shot detector. Extensive evaluations demonstrate the superior performance of ZSFDet on FOWA and the widely used food dataset UECFOOD-256, with improvements of 1.8% and 3.7% ZSD mAP over the strong baseline RRFS. Further experiments on PASCAL VOC and MS COCO show that enhancing semantic knowledge also improves general ZSD performance. Code and dataset are available at https://github.com/LanceZPF/KEFS.
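
The abstract describes a two-stage pipeline: fuse knowledge from multiple graphs into a per-class representation, then synthesize region features for unseen classes conditioned on that representation. The PyTorch sketch below illustrates that flow only; the class names, dimensions, simple GCN-style fusion, and noise-conditioned generator (a stand-in for the paper's region-feature diffusion model) are illustrative assumptions, not the authors' implementation (see https://github.com/LanceZPF/KEFS for the official code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """One graph-convolution layer: propagate class features over a normalized adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (num_classes, num_classes) row-normalized adjacency from one knowledge source
        return F.leaky_relu(self.linear(adj @ x))


class MultiSourceGraphFusion(nn.Module):
    """Fuse per-class knowledge from several graphs (e.g., attribute and ingredient graphs)."""
    def __init__(self, sem_dim, hid_dim, num_sources):
        super().__init__()
        self.gcns = nn.ModuleList([GraphConv(sem_dim, hid_dim) for _ in range(num_sources)])
        self.fuse = nn.Linear(num_sources * hid_dim, hid_dim)

    def forward(self, class_embeddings, adjacencies):
        # class_embeddings: (num_classes, sem_dim) word/attribute vectors, one row per category
        per_source = [gcn(class_embeddings, adj) for gcn, adj in zip(self.gcns, adjacencies)]
        return self.fuse(torch.cat(per_source, dim=-1))  # (num_classes, hid_dim)


class ConditionalFeatureSynthesizer(nn.Module):
    """Generate region features conditioned on fused class knowledge plus noise
    (a simplified stand-in for the paper's region-feature diffusion model)."""
    def __init__(self, hid_dim, noise_dim, feat_dim):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(hid_dim + noise_dim, 1024),
            nn.LeakyReLU(),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, class_repr, samples_per_class):
        cond = class_repr.repeat_interleave(samples_per_class, dim=0)
        noise = torch.randn(cond.size(0), self.noise_dim)
        return self.net(torch.cat([cond, noise], dim=-1))


if __name__ == "__main__":
    num_classes, sem_dim = 20, 300  # toy sizes for illustration only
    fusion = MultiSourceGraphFusion(sem_dim, hid_dim=256, num_sources=2)
    synth = ConditionalFeatureSynthesizer(hid_dim=256, noise_dim=128, feat_dim=1024)

    class_emb = torch.randn(num_classes, sem_dim)             # e.g., word vectors plus attribute vectors
    adjs = [torch.eye(num_classes), torch.eye(num_classes)]   # placeholder knowledge-graph adjacencies
    synthetic_feats = synth(fusion(class_emb, adjs), samples_per_class=4)
    print(synthetic_feats.shape)  # torch.Size([80, 1024]); these would train the unseen-class classifier
```

In the paper the synthesized features replace real region features for unseen categories, so a standard detector head can be trained to recognize classes never seen at training time; the sketch above omits the diffusion-specific training objective and the detector itself.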
