
Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition (2403.05428v3)

Published 8 Mar 2024 in cs.MM

Abstract: In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations depending on context, which calls for a comprehensive understanding of stickers and support for multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset, comprising a tag set of 461 tags and 13,571 sticker-tag pairs, designed to provide a deeper understanding of stickers. Recognizing multiple tags for stickers is particularly challenging because sticker tags are usually fine-grained and attribute-aware. Hence, we propose an Attentive Attribute-oriented Prompt Learning method, i.e., Att$^2$PL, to capture informative features of stickers in a fine-grained manner and better differentiate tags. Specifically, we first apply an Attribute-oriented Description Generation (ADG) module to obtain descriptions of stickers along four attributes. Then, a Local Re-attention (LoR) module is designed to weigh the importance of local information. Finally, we use prompt learning to guide the recognition process and adopt confidence penalty optimization to penalize overconfident output distributions. Extensive experiments show that our method achieves encouraging results on all commonly used metrics.
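The confidence penalty mentioned at the end of the abstract can be illustrated with a minimal sketch. The paper does not give its exact loss formulation here, so the snippet below assumes the common form of a confidence penalty for multi-label recognition: per-tag binary cross-entropy minus a $\beta$-weighted entropy term, so that low-entropy (overconfident) output distributions are penalized. The function names and the value of `beta` are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_entropy(p, eps=1e-12):
    """Per-label binary entropy H(p) = -p log p - (1-p) log(1-p)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def multilabel_loss_with_confidence_penalty(logits, targets, beta=0.1, eps=1e-12):
    """Mean BCE over tags minus a beta-weighted entropy bonus.

    Subtracting beta * H(p) rewards higher-entropy (less confident)
    predictions, i.e. it penalizes overconfident output distributions,
    which is the role the confidence penalty plays in the abstract.
    This is an assumed formulation, not the paper's exact loss.
    """
    p = np.clip(sigmoid(np.asarray(logits, dtype=float)), eps, 1.0 - eps)
    y = np.asarray(targets, dtype=float)
    bce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(np.mean(bce - beta * binary_entropy(p, eps)))
```

With `beta = 0` this reduces to plain binary cross-entropy; increasing `beta` lowers the loss for predictions that keep some uncertainty, discouraging near-0/1 outputs on ambiguous stickers.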

