What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation (2306.15521v3)
Abstract: While semantic segmentation has improved tremendously in recent years, it still requires significant labeling effort and generalizes poorly to classes that were not present during training. To address this problem, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets from fields such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets according to it. We select a representative subset of 22 datasets and propose it as the MESS benchmark. We evaluate eight recently published models on the proposed MESS benchmark and analyze the characteristics that affect the performance of zero-shot transfer models. The toolkit is available at https://github.com/blumenstiel/MESS.