Unboxing Engagement in YouTube Influencer Videos: An Attention-Based Approach
Abstract: Influencer marketing has become a widely used strategy for reaching customers. Despite growing interest among influencers and brand partners in predicting engagement with influencer videos, there has been little research on the relative importance of different video data modalities in predicting engagement. We analyze unstructured data from long-form YouTube influencer videos - spanning text, audio, and video images - using an interpretable deep learning framework that leverages model attention to video elements. This framework enables strong out-of-sample prediction, followed by ex-post interpretation using a novel approach that prunes spurious associations. Our prediction-based results reveal that "what is said" through words (text) is more important than "how it is said" through imagery (video images) or acoustics (audio) in predicting video engagement. Interpretation-based findings show that during the critical onset period of a video (first 30 seconds), auditory stimuli (e.g., brand mentions and music) are associated with sentiment expressed in verbal engagement (comments), while visual stimuli (e.g., video images of humans and packaged goods) are linked with sentiment expressed through non-verbal engagement (the thumbs-up/down ratio). We validate our approach through multiple methods, connect our findings to relevant theory, and discuss implications for influencers, brands, and agencies.
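To make the attention-based, multimodal setup described in the abstract concrete, the sketch below shows one generic way to fuse per-segment text, audio, and image features with attention pooling and predict a scalar engagement proxy. This is a minimal illustration only: the module names, feature dimensions, pooling scheme, and engagement target are assumptions for exposition, not the paper's actual architecture or data.

```python
# Minimal sketch of attention-based fusion over text, audio, and image features.
# All module names, dimensions, and the engagement target are hypothetical and
# illustrate the general technique named in the abstract, not the authors' model.
import torch
import torch.nn as nn


class ModalityAttentionFusion(nn.Module):
    """Attention-pools per-segment features within each modality, concatenates
    the pooled vectors, and predicts a scalar engagement proxy."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # One attention scorer applied to every time step within a modality.
        self.attn = nn.Linear(feat_dim, 1)
        self.head = nn.Sequential(
            nn.Linear(feat_dim * 3, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # e.g., a single engagement score
        )

    def pool(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) -> attention-weighted average over time.
        weights = torch.softmax(self.attn(x), dim=1)  # (batch, time, 1)
        return (weights * x).sum(dim=1)               # (batch, feat_dim)

    def forward(self, text, audio, image):
        fused = torch.cat(
            [self.pool(text), self.pool(audio), self.pool(image)], dim=-1
        )
        return self.head(fused)


if __name__ == "__main__":
    model = ModalityAttentionFusion()
    # Fake inputs: 4 videos, 30 one-second segments, 128-dim embeddings per modality.
    text = torch.randn(4, 30, 128)
    audio = torch.randn(4, 30, 128)
    image = torch.randn(4, 30, 128)
    print(model(text, audio, image).shape)  # torch.Size([4, 1])
```

In a design of this kind, the learned attention weights over time steps are what supports ex-post interpretation: inspecting which segments receive high weight in each modality is one common way to relate model attention back to specific video elements.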