
Social Media Ready Caption Generation for Brands (2401.01637v1)

Published 3 Jan 2024 in cs.CL

Abstract: Social media advertisements are key to brand marketing, aiming to attract consumers with captivating captions and pictures or logos. While previous research has focused on generating captions for general images, incorporating brand personalities into social media captioning remains unexplored. Brand personalities have been shown to affect consumers' behaviours and social interactions and are therefore a key aspect of marketing strategies. Current open-source multimodal LLMs are not directly suited for this task. Hence, we propose a pipeline solution to assist brands in creating engaging social media captions that align with both the image and the brand personality. Our architecture has two parts: (a) the first part is an image captioning model that takes an image the brand wants to post online and produces a plain English caption; (b) the second part takes the generated caption along with the target brand personality and outputs a catchy, personality-aligned social media caption. Along with brand personality, our system also gives users the flexibility to provide hashtags, Instagram handles, URLs, and named entities they want the caption to contain, making the captions more semantically related to the social media handles. Comparative evaluations against various baselines demonstrate the effectiveness of our approach, both qualitatively and quantitatively.
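
The two-part architecture described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `CaptionRequest`, `build_prompt`, `missing_elements`, and `generate_caption` are hypothetical, the prompt wording is invented for illustration, and the two models are passed in as plain callables so that any image captioner and any text generator could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CaptionRequest:
    """Inputs a brand might supply (hypothetical structure, not from the paper)."""
    image_path: str
    personality: str  # target brand personality, e.g. "excitement"
    hashtags: List[str] = field(default_factory=list)
    handles: List[str] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)

    def required_elements(self) -> List[str]:
        # Everything the user asked to appear in the final caption.
        return self.hashtags + self.handles + self.urls + self.entities


def build_prompt(plain_caption: str, req: CaptionRequest) -> str:
    """Assemble the stage-two instruction (prompt wording is an assumption)."""
    lines = [
        "Rewrite this image description as a catchy social media caption "
        f"with a {req.personality} brand personality.",
        f"Description: {plain_caption}",
    ]
    required = req.required_elements()
    if required:
        lines.append("Must include: " + ", ".join(required))
    return "\n".join(lines)


def missing_elements(caption: str, req: CaptionRequest) -> List[str]:
    """Return requested elements that do not appear in the caption."""
    lowered = caption.lower()
    return [e for e in req.required_elements() if e.lower() not in lowered]


def generate_caption(
    req: CaptionRequest,
    captioner: Callable[[str], str],  # stage one: image path -> plain caption
    rewriter: Callable[[str], str],   # stage two: prompt -> styled caption
) -> str:
    """Run the two stages in sequence."""
    plain_caption = captioner(req.image_path)
    return rewriter(build_prompt(plain_caption, req))


if __name__ == "__main__":
    # Stub models stand in for real ones so the sketch runs on its own.
    request = CaptionRequest(
        image_path="latte.jpg",
        personality="excitement",
        hashtags=["#MorningBrew"],
        handles=["@cafe_example"],
    )
    caption = generate_caption(
        request,
        captioner=lambda path: "a cup of coffee on a table",
        rewriter=lambda prompt: "Wake up and go! #MorningBrew @cafe_example",
    )
    print(caption)
    print("Missing:", missing_elements(caption, request))
```

Keeping the two stages as separate callables mirrors the pipeline split in the abstract, and a check such as `missing_elements` is one simple way a caller could verify that user-supplied hashtags, handles, URLs, and named entities made it into the output.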

Authors (4)
  1. Himanshu Maheshwari
  2. Koustava Goswami
  3. Apoorv Saxena
  4. Balaji Vasan Srinivasan