SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning (2312.13100v1)
Abstract: Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by transferring knowledge from seen classes, relying on the inherent interactions between visual and semantic data. However, the discrepancy between well-prepared training data and unpredictable real-world test scenarios remains a significant challenge. This paper introduces a dual strategy to address this generalization gap. First, we incorporate semantic information through an innovative encoder. This encoder integrates class-specific semantic information by targeting the performance disparity, enhancing the produced features to enrich the semantic space for class-specific attributes. Second, we refine our generative capabilities using a novel compositional loss function. This approach generates discriminative classes, effectively classifying both seen and unseen classes. In addition, we extend the exploitation of the learned latent space by using controlled semantic inputs, ensuring the robustness of the model in varying environments. The resulting model outperforms state-of-the-art models in both generalization and diverse settings, notably without requiring hyperparameter tuning or domain-specific adaptations. We also propose a set of novel evaluation metrics to provide a more detailed assessment of the reliability and reproducibility of the results. The complete code is available at https://github.com/william-heyden/SEER-ZeroShotLearning/.