From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios (2403.07403v1)
Abstract: The precise recognition of food categories plays a pivotal role in intelligent health management and has attracted significant research attention in recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172, provide abundant food image resources that have catalyzed research in this field. Nevertheless, these datasets are well-curated from canteen scenarios and thus deviate from the appearance of food in daily life. This discrepancy makes it challenging to transfer classifiers trained on canteen datasets to the broader scenarios people encounter every day. To this end, we present two new benchmarks, namely DailyFood-172 and DailyFood-16, specifically designed to curate food images from everyday meals. These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain. In addition, we propose a simple yet effective baseline method named Multi-Cluster Reference Learning (MCRL) to tackle the aforementioned domain gap. MCRL is motivated by the observation that food images in daily-life scenarios exhibit greater intra-class appearance variance than those in well-curated benchmarks. Notably, MCRL can be seamlessly coupled with existing approaches, yielding non-trivial performance enhancements. We hope our new benchmarks will inspire the community to explore the transferability of food recognition models trained on well-curated datasets to practical real-life applications.
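The abstract only sketches MCRL at a high level, so the snippet below illustrates the multi-cluster idea it alludes to: representing each food class by several cluster centers ("references") rather than a single prototype, so that the high intra-class appearance variance of daily-life food images can be absorbed by different appearance modes. This is a minimal sketch under stated assumptions; the per-class k-means clustering, cosine-similarity assignment, and all function names are illustrative choices, not the authors' exact algorithm.

```python
# Hypothetical sketch of a multi-cluster reference scheme: several cluster
# centers per food class instead of one prototype. Clustering method and
# similarity measure are assumptions, not the paper's exact MCRL recipe.
import numpy as np
from sklearn.cluster import KMeans


def build_references(features, labels, clusters_per_class=3):
    """Cluster each class's features into several reference centers."""
    refs, ref_labels = [], []
    for c in np.unique(labels):
        feats_c = features[labels == c]
        k = min(clusters_per_class, len(feats_c))
        km = KMeans(n_clusters=k, n_init=10).fit(feats_c)
        refs.append(km.cluster_centers_)
        ref_labels.extend([c] * k)
    return np.vstack(refs), np.array(ref_labels)


def assign_by_nearest_reference(target_features, refs, ref_labels):
    """Label each target feature by its most similar reference (cosine)."""
    t = target_features / np.linalg.norm(target_features, axis=1, keepdims=True)
    r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    return ref_labels[np.argmax(t @ r.T, axis=1)]
```

In the paper's transfer setting, such references would presumably be built from features of the well-curated source domain and used to guide (e.g., pseudo-label) target images from DailyFood-172/16; since the sketch operates on generic feature vectors, it is compatible with any backbone, which is consistent with the abstract's claim that MCRL can be coupled with existing approaches.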
- W. Min, Z. Wang, Y. Liu, M. Luo, L. Kang, X. Wei, X. Wei, and S. Jiang, “Large scale visual food recognition,” arXiv preprint arXiv:2103.16107, 2021.
- Q. Thames, A. Karpur, W. Norris, F. Xia, L. Panait, T. Weyand, and J. Sim, “Nutrition5k: Towards automatic nutritional understanding of generic food,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 8903–8911.
- Y. Liang, J. Li, Q. Zhao, W. Rao, C. Zhang, and C. Wang, “Image segmentation and recognition for multi-class Chinese food,” in International Conference on Image Processing, 2022, pp. 3938–3942.
- J. Chen and C.-W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in ACM Multimedia, 2016, pp. 32–41.
- L. Bossard, M. Guillaumin, and L. Van Gool, “Food-101 – Mining discriminative components with random forests,” in European Conference on Computer Vision. Springer, 2014, pp. 446–461.
- J. Chen, B. Zhu, C.-W. Ngo, T.-S. Chua, and Y.-G. Jiang, “A study of multi-task and region-wise deep learning for food ingredient recognition,” IEEE Transactions on Image Processing, vol. 30, pp. 1514–1526, 2020.
- E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, “Deep domain confusion: Maximizing for domain invariance,” arXiv preprint arXiv:1412.3474, 2014.
- M. Long, Y. Cao, J. Wang, and M. Jordan, “Learning transferable features with deep adaptation networks,” in International Conference on Machine Learning. PMLR, 2015, pp. 97–105.
- B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in European Conference on Computer Vision Workshops. Springer, 2016, pp. 443–450.
- K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, “Domain separation networks,” Advances in Neural Information Processing Systems, vol. 29, 2016.
- M. Peng, Z. Li, and X. Juan, “Similarity-based domain adaptation network,” Neurocomputing, vol. 493, pp. 462–473, 2022.
- J. Huang, D. Guan, A. Xiao, S. Lu, and L. Shao, “Category contrast for unsupervised domain adaptation in visual tasks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 1203–1214.
- Y. Zhu, F. Zhuang, J. Wang, G. Ke, J. Chen, J. Bian, H. Xiong, and Q. He, “Deep subdomain adaptation network for image classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 4, pp. 1713–1722, 2021.
- C. Yu, J. Wang, Y. Chen, and M. Huang, “Transfer learning with dynamic adversarial adaptation network,” in IEEE International Conference on Data Mining (ICDM). IEEE, 2019, pp. 778–786.
- G. Kang, L. Jiang, Y. Yang, and A. G. Hauptmann, “Contrastive adaptation network for unsupervised domain adaptation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4893–4902.
- J. Wang and X.-L. Zhang, “Improving pseudo labels with intra-class similarity for unsupervised domain adaptation,” Pattern Recognition, vol. 138, p. 109379, 2023.
- Y. Du, Z. Tan, Q. Chen, X. Zhang, Y. Yao, and C. Wang, “Dual adversarial domain adaptation,” arXiv preprint arXiv:2001.00153, 2020.
- J. Wang, W. Feng, Y. Chen, H. Yu, M. Huang, and P. S. Yu, “Visual domain adaptation with manifold embedded distribution alignment,” in ACM Multimedia, 2018, pp. 402–410.
- M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, “Transfer feature learning with joint distribution adaptation,” in IEEE International Conference on Computer Vision, 2013, pp. 2200–2207.
- P. Kaur, K. Sikka, W. Wang, S. Belongie, and A. Divakaran, “FoodX-251: A dataset for fine-grained food classification,” arXiv preprint arXiv:1907.06167, 2019.
- A. Salvador, N. Hynes, Y. Aytar, J. Marin, F. Ofli, I. Weber, and A. Torralba, “Learning cross-modal embeddings for cooking recipes and food images,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3020–3028.
- M.-Y. Chen, Y.-H. Yang, C.-J. Ho, S.-H. Wang, S.-M. Liu, E. Chang, C.-H. Yeh, and M. Ouhyoung, “Automatic Chinese food identification and quantity estimation,” in SIGGRAPH Asia 2012 Technical Briefs, 2012, pp. 1–4.
- M. Puri, Z. Zhu, Q. Yu, A. Divakaran, and H. Sawhney, “Recognition and volume estimation of food intake using a mobile device,” in Workshop on Applications of Computer Vision, 2009, pp. 1–8.
- Y. Matsuda and K. Yanai, “Multiple-food recognition considering co-occurrence employing manifold ranking,” in International Conference on Pattern Recognition. IEEE, 2012, pp. 2017–2020.
- A. Şengür, Y. Akbulut, and Ü. Budak, “Food image classification with deep features,” in International Artificial Intelligence and Data Processing Symposium (IDAP). IEEE, 2019, pp. 1–6.
- B. Arslan, S. Memiş, E. B. Sönmez, and O. Z. Batur, “Fine-grained food classification methods on the UEC Food-100 database,” IEEE Transactions on Artificial Intelligence, vol. 3, no. 2, pp. 238–243, 2021.
- B. Zhu, C.-W. Ngo, and J.-J. Chen, “Cross-domain cross-modal food transfer,” in ACM Multimedia, 2020, pp. 3762–3770.
- Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2096–2130, 2016.
- B. Sun, J. Feng, and K. Saenko, “Return of frustratingly easy domain adaptation,” in AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016.
- M. Ghifary, W. B. Kleijn, and M. Zhang, “Domain adaptive neural networks for object recognition,” in Pacific Rim International Conference on Artificial Intelligence. Springer, 2014, pp. 898–904.
- M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Deep transfer learning with joint adaptation networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 2208–2217.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
- Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in International Conference on Machine Learning. PMLR, 2015, pp. 1180–1189.
- R. Gong, W. Li, Y. Chen, and L. Van Gool, “DLOW: Domain flow for adaptation and generalization,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2477–2486.
- J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, “CyCADA: Cycle-consistent adversarial domain adaptation,” in International Conference on Machine Learning. PMLR, 2018, pp. 1989–1998.
- X. Zhang, F. X. Yu, S.-F. Chang, and S. Wang, “Deep transfer network: Unsupervised domain adaptation,” arXiv preprint arXiv:1503.00591, 2015.
- Z. Pei, Z. Cao, M. Long, and J. Wang, “Multi-adversarial domain adaptation,” in AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
- D.-H. Lee et al., “Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks,” in ICML Workshop on Challenges in Representation Learning, vol. 3, no. 2, 2013, p. 896.
- X. Gu, J. Sun, and Z. Xu, “Spherical space domain adaptation with robust pseudo-label loss,” in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 9101–9110.
- Y. Zhang, Y. Zhang, Y. Wei, K. Bai, Y. Song, and Q. Yang, “Fisher deep domain adaptation,” in International Conference on Data Mining. SIAM, 2020, pp. 469–477.
- Z. Zheng and Y. Yang, “Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation,” International Journal of Computer Vision, vol. 129, no. 4, pp. 1106–1120, 2021.
- A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 723–773, 2012.
- Z. Qin, D. Kim, and T. Gedeon, “Rethinking softmax with cross-entropy: Neural network classifier as mutual information estimator,” arXiv preprint arXiv:1911.10688, 2019.
- H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, “Deep hashing network for unsupervised domain adaptation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5018–5027.
- X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko, “VisDA: The visual domain adaptation challenge,” arXiv preprint arXiv:1710.06924, 2017.
- J. Na, D. Han, H. J. Chang, and W. Hwang, “Contrastive vicinal space for unsupervised domain adaptation,” in European Conference on Computer Vision. Springer, 2022, pp. 92–110.
- T. Xu, W. Chen, P. Wang, F. Wang, H. Li, and R. Jin, “CDTrans: Cross-domain transformer for unsupervised domain adaptation,” arXiv preprint arXiv:2109.06165, 2021.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, pp. 211–252, 2015.
- D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
- H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning. PMLR, 2021, pp. 10347–10357.