FoodSAM: Any Food Segmentation (2308.05938v1)

Published 11 Aug 2023 in cs.CV and cs.AI

Abstract: In this paper, we explore the zero-shot capability of the Segment Anything Model (SAM) for food image segmentation. To address the lack of class-specific information in SAM-generated masks, we propose a novel framework, called FoodSAM. This innovative approach integrates the coarse semantic mask with SAM-generated masks to enhance semantic segmentation quality. Besides, we recognize that the ingredients in food can be supposed as independent individuals, which motivated us to perform instance segmentation on food images. Furthermore, FoodSAM extends its zero-shot capability to encompass panoptic segmentation by incorporating an object detector, which renders FoodSAM to effectively capture non-food object information. Drawing inspiration from the recent success of promptable segmentation, we also extend FoodSAM to promptable segmentation, supporting various prompt variants. Consequently, FoodSAM emerges as an all-encompassing solution capable of segmenting food items at multiple levels of granularity. Remarkably, this pioneering framework stands as the first-ever work to achieve instance, panoptic, and promptable segmentation on food images. Extensive experiments demonstrate the feasibility and impressing performance of FoodSAM, validating SAM's potential as a prominent and influential tool within the domain of food image segmentation. We release our code at https://github.com/jamesjg/FoodSAM.

An In-Depth Review of "FoodSAM: Any Food Segmentation"

The paper "FoodSAM: Any Food Segmentation" addresses a significant challenge in food computing: accurately segmenting food images into distinct components, such as individual ingredients and associated non-food objects. The task is difficult because food images combine diverse appearances, overlapping ingredients, and cluttered backgrounds. The authors introduce FoodSAM, a zero-shot framework that builds on the Segment Anything Model (SAM) to improve segmentation quality in this domain.

Main Contributions

In response to the limitations of current segmentation models in food image analysis, the authors propose the following key innovations in the FoodSAM framework:

Integration of SAM with Semantic Masking: SAM is known for its ability to generate high-quality category-agnostic masks. However, these masks lack specificity for particular classes, which is crucial for accurately identifying food items. FoodSAM enhances semantic segmentation by fusing SAM-generated masks with coarse semantic masks. This integration enriches the mask with meaningful class-specific information, thus improving segmentation accuracy.
Instance Segmentation for Food Ingredients: The proposed framework treats individual food ingredients as separate instances, an approach suited to the way ingredients appear at arbitrary and largely independent positions within a dish. This allows food items to be segmented at the level of individual ingredients.
Incorporation of Non-Food Object Detection: Food images often contain non-food objects, such as utensils and dining furniture, which contribute contextual information. FoodSAM employs object detection methodologies to capture these non-food elements, enabling comprehensive panoptic segmentation of the entire scene.
Support for Promptable Segmentation: By adopting recent advances in prompt-based segmentation, FoodSAM extends its zero-shot capabilities further. It supports various prompt types like points, boxes, and masks, allowing for interactive and flexible segmentation tasks.
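
To make the first contribution concrete, the sketch below shows one plausible way to fuse category-agnostic masks with a coarse semantic mask: each mask takes the most frequent semantic label among the pixels it covers. The majority-vote rule, the function name fuse_masks, and the list-of-lists mask layout are assumptions made for illustration; this is not the authors' released implementation.

```python
from collections import Counter


def fuse_masks(semantic, sam_masks):
    """Relabel a coarse semantic map using category-agnostic masks.

    semantic:  2D list of int class ids (the coarse semantic mask).
    sam_masks: list of 2D lists of bool, one per category-agnostic mask.
    Returns (enhanced, labels): the relabelled map and the label voted
    for each mask (None for an empty mask).

    Assumption: majority voting is used as the fusion rule.
    """
    enhanced = [row[:] for row in semantic]  # copy; the input is not modified
    labels = [None] * len(sam_masks)

    # Collect the (row, col) coordinates covered by each mask.
    pixel_lists = [
        [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
        for mask in sam_masks
    ]

    # Paint large masks first so that smaller, finer masks are applied last.
    order = sorted(range(len(sam_masks)),
                   key=lambda i: len(pixel_lists[i]), reverse=True)
    for i in order:
        pixels = pixel_lists[i]
        if not pixels:
            continue
        # Votes are always read from the original coarse map.
        votes = Counter(semantic[r][c] for r, c in pixels)
        label = votes.most_common(1)[0][0]
        labels[i] = label
        for r, c in pixels:
            enhanced[r][c] = label
    return enhanced, labels


if __name__ == "__main__":
    coarse = [[1, 1, 2, 2],
              [1, 2, 2, 2],
              [0, 0, 3, 3]]
    left = [[True, True, False, False],
            [True, True, False, False],
            [False, False, False, False]]
    fused, per_mask = fuse_masks(coarse, [left])
    print(fused, per_mask)
```

Under the same assumption, the per-mask labels returned here could also serve as a starting point for instance segmentation: every mask whose voted label is a food class can be read as one ingredient instance.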

Results and Implications

The authors evaluate FoodSAM on two benchmark datasets, UECFoodPix Complete and FoodSeg103. They report that FoodSAM outperforms existing methods on three standard metrics: mIoU (mean intersection over union), mAcc (mean per-class pixel accuracy), and aAcc (overall pixel accuracy). FoodSAM also compares favorably with other SAM-related approaches such as SEEM and RAM, particularly in fine-grained ingredient segmentation. The framework is further shown to handle panoptic segmentation by combining food and non-food segments in a single output.
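
For readers unfamiliar with the three metrics, the following minimal sketch computes them from flattened prediction and ground-truth label sequences using a confusion matrix. The function name segmentation_metrics and the choice to skip classes that never occur are assumptions for illustration; benchmark toolkits may handle absent classes and ignore labels differently.

```python
def segmentation_metrics(pred, target, num_classes):
    """Return (mIoU, mAcc, aAcc) for flat sequences of class ids.

    pred, target: equal-length sequences of ints in [0, num_classes).
    Classes with an empty union are skipped for mIoU, and classes absent
    from the ground truth are skipped for mAcc (a simplifying assumption).
    """
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(num_classes))

    ious, accs = [], []
    for k in range(num_classes):
        tp = confusion[k][k]
        gt_count = sum(confusion[k])                       # pixels truly of class k
        pred_count = sum(confusion[j][k] for j in range(num_classes))
        union = gt_count + pred_count - tp
        if union > 0:
            ious.append(tp / union)
        if gt_count > 0:
            accs.append(tp / gt_count)

    miou = sum(ious) / len(ious) if ious else 0.0
    macc = sum(accs) / len(accs) if accs else 0.0
    aacc = correct / total if total else 0.0
    return miou, macc, aacc


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(segmentation_metrics(pred, target, num_classes=3))
```

In this toy example the per-class IoU values are 1/3, 2/3, and 1/2, so mIoU is 0.5, while both mAcc and aAcc come out to 2/3.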

These results support the potential of SAM as a foundation model for complex vision tasks that require detailed, context-aware understanding. Combining SAM's category-agnostic masks with coarse semantic masks and object detection further shows that general-purpose vision models can be paired with domain-specific information to improve performance.

Future Directions

The FoodSAM framework represents a methodological advancement in food image segmentation. However, the paper also highlights areas for future research:

Broader Application Across Datasets: Future work could explore FoodSAM's applicability across more diverse food datasets, which may vary in complexity and cultural diversity.
Extension to Real-World Applications: Implementing FoodSAM in practical settings, such as dietary analysis or culinary research, could provide real-world validations and further optimize the framework.
Enhancements in Instance and Panoptic Segmentation: Improving the detail and accuracy of instance and panoptic segmentation in complex and crowded food scenes remains an ongoing challenge.

In summary, "FoodSAM: Any Food Segmentation" presents a robust approach to food image segmentation, leveraging the strengths of foundation models while addressing their limitations through intelligent integration with semantic data. This paper marks a meaningful step forward in computational food analysis, providing a blueprint for future advancements in the field.

Authors

Xing Lan
Jiayi Lyu
Hanyu Jiang
Kun Dong
Zehai Niu
Yi Zhang
Jian Xue

GitHub

GitHub - jamesjg/FoodSAM: FoodSAM: Any Food Segmentation (160 stars)