Can AI Outperform Human Experts in Creating Social Media Creatives? (2404.00018v1)

Published 19 Mar 2024 in cs.HC, cs.AI, and cs.SI

Abstract: Artificial Intelligence has outperformed human experts in functional tasks such as chess and Go (baduk). How about creative tasks? This paper evaluates AI's capability in the creative domain relative to human experts, an area in which little research has been conducted so far. We propose a novel Prompt-for-Prompt method that generates social media creatives via prompt augmentation by LLMs. We take the most popular Instagram posts (those with the most likes) from top brands' Instagram accounts as the basis for creating social media creatives. We give GPT-4 several prompt instructions with text descriptions to generate the most effective prompts for cutting-edge text-to-image generators: Midjourney, DALL-E 3, and Stable Diffusion. LLM-augmented prompts can boost AI's abilities for social media image creation by adding objectives, an engagement strategy, lighting, and brand consistency. We conduct an extensive human evaluation experiment and find that AI outperforms human experts, and that Midjourney is better than the other text-to-image generators. Surprisingly, and contrary to conventional wisdom in the social media industry, prompt instructions including the word "eye-catching" perform much worse than those including "natural". Regarding the type of creative, AI improves creatives featuring animals or products more than those featuring real people. AI also improves creatives with short text descriptions more than those with long ones, because shorter descriptions leave more room for AI to augment the prompt.
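The Prompt-for-Prompt idea described above can be sketched as a simple meta-prompt builder: a short creative brief is wrapped in an instruction that asks an LLM (e.g. GPT-4) to produce a richer text-to-image prompt. This is a minimal illustrative sketch; the instruction template and field names are assumptions, not the authors' exact prompts.

```python
def build_augmentation_instruction(brand: str,
                                   description: str,
                                   style_word: str = "natural") -> str:
    """Compose a meta-prompt asking an LLM to augment a creative brief
    into a detailed text-to-image prompt.

    The template below is hypothetical: it simply encodes the augmentation
    targets the paper names (objective, engagement strategy, lighting,
    brand consistency) plus a style keyword such as "natural".
    """
    return (
        "You write prompts for text-to-image generators such as "
        "Midjourney, DALL-E 3, and Stable Diffusion.\n"
        f"Brand: {brand}\n"
        f"Post description: {description}\n"
        "Augment this into one detailed image prompt that specifies the "
        "objective, an engagement strategy, lighting, and brand "
        f"consistency, in a {style_word} style."
    )


if __name__ == "__main__":
    # Example brief; the resulting string would be sent to the LLM,
    # and the LLM's reply passed on to a text-to-image generator.
    print(build_augmentation_instruction(
        brand="Nike",
        description="runner crossing a city bridge at sunrise",
    ))
```

Per the paper's finding, using `style_word="natural"` rather than `"eye-catching"` is the better default for engagement.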

Authors (3)
  1. Eunkyung Park (2 papers)
  2. Raymond K. Wong (2 papers)
  3. Junbum Kwon (1 paper)