Retail-786k: a Large-Scale Dataset for Visual Entity Matching (2309.17164v2)
Abstract: Entity Matching (EM) is the task of learning to group objects by transferring semantic concepts from example groups (entities) to unseen data. Although image data is generally available in many EM problems, most current EM algorithms rely solely on (textual) metadata. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production-level use case in the retail domain. Using scanned advertisement leaflets collected over several years from different European retailers, we provide a total of ~786k manually annotated, high-resolution product images containing ~18k different individual retail products, which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Following a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem that cannot be adequately solved using standard image-based classification and retrieval algorithms. Instead, novel approaches that can transfer example-based visual equivalence classes to new data are needed to address the proposed problem. The aim of this paper is to provide a benchmark for such algorithms. Information about the dataset, evaluation code and download instructions are provided at https://www.retail-786k.org/.
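To make the phrase "each entity forms an equivalence class of comparable products" concrete, the following is a minimal sketch of how pairwise "comparable" judgments induce a partition into entities. It is an illustration only, not the annotation procedure used for the dataset: the product IDs, the pair list, and the function name `group_into_entities` are hypothetical.

```python
from collections import defaultdict


def group_into_entities(products, comparable_pairs):
    """Partition products into equivalence classes using union-find.

    Assumption: "comparable" is treated as reflexive, symmetric, and
    transitive, so the connected components of the pair graph are the entities.
    """
    parent = {p: p for p in products}

    def find(p):
        # Follow parent links to the root, halving the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in comparable_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    groups = defaultdict(list)
    for p in products:
        groups[find(p)].append(p)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(members) for members in groups.values())


# Hypothetical product IDs and comparability judgments (not from the dataset).
products = ["A1", "A2", "A3", "B1", "B2"]
comparable_pairs = [("A1", "A2"), ("A2", "A3"), ("B1", "B2")]

entities = group_into_entities(products, comparable_pairs)
print(entities)
```

In this toy setting, `A1` and `A3` end up in the same entity even though they were never compared directly, which is the transitivity that an equivalence class implies. The visual entity matching task described in the abstract is harder: the model must infer such groupings for unseen images from example groups, without being given the pairs.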
Authors: Bianca Lamm, Janis Keuper