From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios (2403.07403v1)

Published 12 Mar 2024 in cs.CV and cs.AI

Abstract: The precise recognition of food categories plays a pivotal role in intelligent health management and has attracted significant research attention in recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172, provide abundant food image resources that have catalyzed research in this field. Nevertheless, these datasets are well-curated from canteen scenarios and thus deviate from food appearances in daily life. This discrepancy makes it challenging to transfer classifiers trained on these canteen datasets to the broader scenarios people encounter in daily life. Toward this end, we present two new benchmarks, DailyFood-172 and DailyFood-16, specifically designed to curate food images from everyday meals. These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain. In addition, we propose a simple yet effective baseline method named Multi-Cluster Reference Learning (MCRL) to tackle the aforementioned domain gap. MCRL is motivated by the observation that food images in daily-life scenarios exhibit greater intra-class appearance variance than those in well-curated benchmarks. Notably, MCRL can be seamlessly coupled with existing approaches, yielding non-trivial performance enhancements. We hope our new benchmarks can inspire the community to explore the transferability of food recognition models trained on well-curated datasets toward practical real-life applications.
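
The abstract describes MCRL only at the level of intuition: rather than representing each class by a single prototype, it learns multiple per-class references to absorb the larger intra-class variance of daily-life food images. As a rough illustration of that idea (not the paper's actual algorithm), the sketch below builds several k-means reference centroids per class from source-domain features and pseudo-labels target-domain features by their nearest reference; every function name and design choice here is an assumption for illustration.

```python
# A minimal, hypothetical sketch of the multi-cluster idea behind MCRL.
# The paper's actual algorithm is not specified in this abstract, so the
# function names, the use of k-means, and the nearest-reference assignment
# below are illustrative assumptions only.
import numpy as np
from sklearn.cluster import KMeans

def build_class_references(features, labels, n_clusters=3):
    """Cluster each class's source-domain features into several reference
    centroids instead of a single class prototype, so one class can cover
    several distinct appearance modes."""
    references = {}
    for c in np.unique(labels):
        class_feats = features[labels == c]
        k = min(n_clusters, len(class_feats))  # guard against tiny classes
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(class_feats)
        references[c] = km.cluster_centers_  # shape: (k, feature_dim)
    return references

def assign_by_nearest_reference(target_features, references):
    """Pseudo-label each target-domain feature with the class that owns its
    nearest reference centroid (Euclidean distance here for simplicity)."""
    classes = sorted(references)
    all_refs = np.vstack([references[c] for c in classes])
    owners = np.concatenate([[c] * len(references[c]) for c in classes])
    # pairwise squared distances, shape (n_target, n_references)
    dists = ((target_features[:, None, :] - all_refs[None, :, :]) ** 2).sum(-1)
    return owners[dists.argmin(axis=1)]

# Toy usage with random vectors standing in for a backbone's embeddings.
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(200, 64))
source_labels = rng.integers(0, 5, size=200)
target_feats = rng.normal(size=(50, 64))
refs = build_class_references(source_feats, source_labels, n_clusters=3)
pseudo_labels = assign_by_nearest_reference(target_feats, refs)
print(pseudo_labels[:10])
```

How the real MCRL constructs its references and couples them with existing adaptation losses may differ; the sketch only illustrates why multiple references per class can absorb the larger intra-class variance of daily-life images.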

References (51)
  1. W. Min, Z. Wang, Y. Liu, M. Luo, L. Kang, X. Wei, X. Wei, and S. Jiang, “Large scale visual food recognition,” CoRR, vol. abs/2103.16107, 2021.
  2. Q. Thames, A. Karpur, W. Norris, F. Xia, L. Panait, T. Weyand, and J. Sim, “Nutrition5k: Towards automatic nutritional understanding of generic food,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 8903–8911.
  3. Y. Liang, J. Li, Q. Zhao, W. Rao, C. Zhang, and C. Wang, “Image segmentation and recognition for multi-class Chinese food,” in International Conference on Image Processing, 2022, pp. 3938–3942.
  4. J. Chen and C.-W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in ACM Multimedia, 2016, pp. 32–41.
  5. L. Bossard, M. Guillaumin, and L. Van Gool, “Food-101 – mining discriminative components with random forests,” in European Conference on Computer Vision. Springer, 2014, pp. 446–461.
  6. J. Chen, B. Zhu, C.-W. Ngo, T.-S. Chua, and Y.-G. Jiang, “A study of multi-task and region-wise deep learning for food ingredient recognition,” IEEE Transactions on Image Processing, vol. 30, pp. 1514–1526, 2020.
  7. E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, “Deep domain confusion: Maximizing for domain invariance,” arXiv preprint arXiv:1412.3474, 2014.
  8. M. Long, Y. Cao, J. Wang, and M. Jordan, “Learning transferable features with deep adaptation networks,” in International Conference on Machine Learning. PMLR, 2015, pp. 97–105.
  9. B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in ECCV Workshop. Springer, 2016, pp. 443–450.
  10. K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, “Domain separation networks,” Advances in Neural Information Processing Systems, vol. 29, 2016.
  11. M. Peng, Z. Li, and X. Juan, “Similarity-based domain adaptation network,” Neurocomputing, vol. 493, pp. 462–473, 2022.
  12. J. Huang, D. Guan, A. Xiao, S. Lu, and L. Shao, “Category contrast for unsupervised domain adaptation in visual tasks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 1203–1214.
  13. Y. Zhu, F. Zhuang, J. Wang, G. Ke, J. Chen, J. Bian, H. Xiong, and Q. He, “Deep subdomain adaptation network for image classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 4, pp. 1713–1722, 2020.
  14. C. Yu, J. Wang, Y. Chen, and M. Huang, “Transfer learning with dynamic adversarial adaptation network,” in IEEE International Conference on Data Mining (ICDM). IEEE, 2019, pp. 778–786.
  15. G. Kang, L. Jiang, Y. Yang, and A. G. Hauptmann, “Contrastive adaptation network for unsupervised domain adaptation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4893–4902.
  16. J. Wang and X.-L. Zhang, “Improving pseudo labels with intra-class similarity for unsupervised domain adaptation,” Pattern Recognition, vol. 138, p. 109379, 2023.
  17. Y. Du, Z. Tan, Q. Chen, X. Zhang, Y. Yao, and C. Wang, “Dual adversarial domain adaptation,” arXiv preprint arXiv:2001.00153, 2020.
  18. J. Wang, W. Feng, Y. Chen, H. Yu, M. Huang, and P. S. Yu, “Visual domain adaptation with manifold embedded distribution alignment,” in ACM Multimedia, 2018, pp. 402–410.
  19. M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, “Transfer feature learning with joint distribution adaptation,” in IEEE International Conference on Computer Vision, 2013, pp. 2200–2207.
  20. P. Kaur, K. Sikka, W. Wang, S. Belongie, and A. Divakaran, “FoodX-251: A dataset for fine-grained food classification,” arXiv preprint arXiv:1907.06167, 2019.
  21. A. Salvador, N. Hynes, Y. Aytar, J. Marin, F. Ofli, I. Weber, and A. Torralba, “Learning cross-modal embeddings for cooking recipes and food images,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3020–3028.
  22. M.-Y. Chen, Y.-H. Yang, C.-J. Ho, S.-H. Wang, S.-M. Liu, E. Chang, C.-H. Yeh, and M. Ouhyoung, “Automatic Chinese food identification and quantity estimation,” in SIGGRAPH Asia 2012 Technical Briefs, 2012, pp. 1–4.
  23. M. Puri, Z. Zhu, Q. Yu, A. Divakaran, and H. Sawhney, “Recognition and volume estimation of food intake using a mobile device,” in Workshop on Applications of Computer Vision, 2009, pp. 1–8.
  24. Y. Matsuda and K. Yanai, “Multiple-food recognition considering co-occurrence employing manifold ranking,” in International Conference on Pattern Recognition. IEEE, 2012, pp. 2017–2020.
  25. A. Şengür, Y. Akbulut, and Ü. Budak, “Food image classification with deep features,” in International Artificial Intelligence and Data Processing Symposium (IDAP). IEEE, 2019, pp. 1–6.
  26. B. Arslan, S. Memiş, E. B. Sönmez, and O. Z. Batur, “Fine-grained food classification methods on the UEC Food-100 database,” IEEE Transactions on Artificial Intelligence, vol. 3, no. 2, pp. 238–243, 2021.
  27. B. Zhu, C.-W. Ngo, and J.-j. Chen, “Cross-domain cross-modal food transfer,” in ACM Multimedia, 2020, pp. 3762–3770.
  28. Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2096–2030, 2016.
  29. B. Sun, J. Feng, and K. Saenko, “Return of frustratingly easy domain adaptation,” in AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016.
  30. M. Ghifary, W. B. Kleijn, and M. Zhang, “Domain adaptive neural networks for object recognition,” in Pacific Rim International Conference on Artificial Intelligence. Springer, 2014, pp. 898–904.
  31. M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Deep transfer learning with joint adaptation networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 2208–2217.
  32. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
  33. Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in International Conference on Machine Learning. PMLR, 2015, pp. 1180–1189.
  34. R. Gong, W. Li, Y. Chen, and L. V. Gool, “DLOW: Domain flow for adaptation and generalization,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2477–2486.
  35. J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, “CyCADA: Cycle-consistent adversarial domain adaptation,” in International Conference on Machine Learning. PMLR, 2018, pp. 1989–1998.
  36. X. Zhang, F. X. Yu, S.-F. Chang, and S. Wang, “Deep transfer network: Unsupervised domain adaptation,” arXiv preprint arXiv:1503.00591, 2015.
  37. Z. Pei, Z. Cao, M. Long, and J. Wang, “Multi-adversarial domain adaptation,” in AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
  38. D.-H. Lee et al., “Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks,” in Workshop on Challenges in Representation Learning, ICML, vol. 3, no. 2, 2013, p. 896.
  39. X. Gu, J. Sun, and Z. Xu, “Spherical space domain adaptation with robust pseudo-label loss,” in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 9101–9110.
  40. Y. Zhang, Y. Zhang, Y. Wei, K. Bai, Y. Song, and Q. Yang, “Fisher deep domain adaptation,” in International Conference on Data Mining. SIAM, 2020, pp. 469–477.
  41. Z. Zheng and Y. Yang, “Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation,” International Journal of Computer Vision, vol. 129, no. 4, pp. 1106–1120, 2021.
  42. A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 723–773, 2012.
  43. Z. Qin, D. Kim, and T. Gedeon, “Rethinking softmax with cross-entropy: Neural network classifier as mutual information estimator,” arXiv preprint arXiv:1911.10688, 2019.
  44. H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, “Deep hashing network for unsupervised domain adaptation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5018–5027.
  45. X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko, “VisDA: The visual domain adaptation challenge,” arXiv preprint arXiv:1710.06924, 2017.
  46. J. Na, D. Han, H. J. Chang, and W. Hwang, “Contrastive vicinal space for unsupervised domain adaptation,” in European Conference on Computer Vision. Springer, 2022, pp. 92–110.
  47. T. Xu, W. Chen, P. Wang, F. Wang, H. Li, and R. Jin, “CDTrans: Cross-domain transformer for unsupervised domain adaptation,” arXiv preprint arXiv:2109.06165, 2021.
  48. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, pp. 211–252, 2015.
  49. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  50. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
  51. H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning. PMLR, 2021, pp. 10347–10357.
