- The paper demonstrates that food identity, derived from social media behavior, correlates with deeper consumer values and political stances.
- It uses normalized keyword entropy and a laterality index to measure and compare collective interest in the food left-wing and food right-wing groups.
- Retweet network and word association analyses reveal clear community segregation and distinct brand affinities, enabling targeted marketing insights.
This paper explores the concept of "food identity," suggesting that food preferences, categorized as "food left-wing" (preferring natural, organic, vegetarian options) and "food right-wing" (preferring fast food, convenience items), reflect broader personal values, consumer awareness, and even political leanings (You are what you eat: A social media study of food identity, 2018). The paper analyzes Japanese Twitter data to demonstrate these connections and their potential applications.
Data Collection and User Identification
- Data Source: Tweets were collected using the Twitter Search API over approximately one month (starting Dec 2016).
- Keyword Identification: Predefined lists of Japanese keywords associated with "food left-wing" (e.g., local production, slow food, vegetarian, organic) and "food right-wing" (e.g., fast food, junk food, frozen foods, convenience stores) were used (Table 1).
- Initial Dataset (Dataset 1): All tweets containing these keywords were collected (650k left-wing, 3.1M right-wing).
- User Filtering: Users who tweeted >= 30 times using keywords from only one category were identified as representative of that food identity. Users tweeting >= 30 times from both categories were excluded. This resulted in 1,233 food left-wing users and 5,010 food right-wing users.
- Timeline Dataset (Dataset 2): The complete timelines (all tweets, not just food-related) of these identified users were collected (3.6M left-wing tweets, 15.0M right-wing tweets).
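The user-filtering rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `(user_id, text)` input shape, and plain substring matching are assumptions, and the sketch reads the rule as "at least 30 keyword tweets in one category and fewer than 30 in the other."

```python
from collections import Counter

MIN_TWEETS = 30  # threshold stated in the summary


def classify_users(tweets, left_keywords, right_keywords, min_tweets=MIN_TWEETS):
    """Split users into food left-wing and food right-wing sets.

    tweets: iterable of (user_id, text) pairs (hypothetical input shape).
    Keyword matching is plain substring search, which is an assumption.
    """
    left_counts, right_counts = Counter(), Counter()
    for user, text in tweets:
        if any(k in text for k in left_keywords):
            left_counts[user] += 1
        if any(k in text for k in right_keywords):
            right_counts[user] += 1

    left_users, right_users = set(), set()
    for user in set(left_counts) | set(right_counts):
        n_left, n_right = left_counts[user], right_counts[user]
        if n_left >= min_tweets and n_right >= min_tweets:
            continue  # active in both categories: excluded
        if n_left >= min_tweets:
            left_users.add(user)
        elif n_right >= min_tweets:
            right_users.add(user)
    return left_users, right_users
```

Users below the threshold in both categories fall through both branches and are simply left unlabeled.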
Analysis Methods
- Measuring Collective Interest (Keyword Entropy & Laterality Index):
To measure how widely a topic (keyword) is discussed within each group, rather than just its raw frequency (which could be skewed by a few very active users), the paper uses normalized keyword entropy ($H_k$):
$$H_k = -\frac{\sum_{x \in U} P_k(x) \log_2 P_k(x)}{\log_2 N}$$
where $P_k(x)$ is the probability a tweet with keyword $k$ was made by user $x$, and $N$ is the total number of users in the group $U$. $H_k$ ranges from 0 to 1, with higher values indicating the keyword is used by a broader set of users within the group.
To directly compare interest between the two groups, the Laterality Index (LI) is calculated:
$$LI = \frac{H_k^R - H_k^L}{H_k^R + H_k^L}$$
where $H_k^R$ and $H_k^L$ are the keyword entropies for the right-wing and left-wing groups, respectively. LI ranges from -1 (stronger left-wing interest) to +1 (stronger right-wing interest).
Implementation Note: Bootstrapping (10 samples, 1000 resamples) was used to ensure statistical robustness of the entropy and LI calculations.
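A minimal sketch of the two measures, assuming the input for each keyword is simply the list of authors of the tweets that contain it (one entry per tweet). The function names and the handling of degenerate cases (empty input, zero denominator) are assumptions for illustration; the bootstrapping step is not reproduced here.

```python
import math
from collections import Counter


def keyword_entropy(authors, n_users):
    """Normalized keyword entropy for one group.

    authors: list of user ids, one per tweet containing the keyword.
    n_users: total number of users in the group (N in the formula).
    """
    if n_users < 2 or not authors:
        return 0.0  # assumption: treat degenerate cases as zero entropy
    counts = Counter(authors)
    total = len(authors)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n_users)


def laterality_index(h_right, h_left):
    """LI = (H_R - H_L) / (H_R + H_L); positive values lean right-wing."""
    denom = h_right + h_left
    if denom == 0:
        return 0.0  # assumption: LI is undefined here, report no preference
    return (h_right - h_left) / denom
```

A keyword tweeted once each by all four users of a four-user group gets an entropy of 1, while a keyword tweeted four times by a single user gets 0, which is the distinction from raw frequency that the measure is meant to capture.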
- Visualizing Consumer Awareness (Word Association Networks):
- To understand the contextual meaning and associations users have with certain terms, word embeddings (word2vec) were trained separately on the tweet texts (Dataset 2) for each group.
- Preprocessing: Tweets were tokenized using MeCab with the neologd dictionary, and stopwords (SlothLib), symbols, and URLs were removed.
- Word2Vec: The Gensim library was used to train word2vec models.
- Network Construction:
- Start with a seed word (e.g., "animal experiment").
- Find the words most similar to the seed word by cosine similarity of their vectors, keeping at most the top 20 and only those with similarity > 0.4.
- Create links between the seed word and these associated words.
- Repeat the process using the newly found words as seeds.
- Visualize the resulting network to show word associations specific to each group.
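The network-construction steps can be sketched in plain Python as below. In the paper the vectors come from word2vec models trained with Gensim; here `vectors` is a hypothetical dict of toy word vectors so the example stays self-contained. The `max_depth` limit on how many rounds of expansion to run is an assumption, since the summary does not say when the repetition stops.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def association_network(vectors, seed, top_n=20, min_sim=0.4, max_depth=2):
    """Expand a word association network outward from a seed word.

    vectors: dict mapping word -> vector (stand-in for a trained model).
    Returns a set of undirected edges stored as sorted (word, word) tuples.
    """
    edges = set()
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # assumption: stop expanding after max_depth rounds
        scored = [
            (cosine(vectors[word], vec), other)
            for other, vec in vectors.items()
            if other != word
        ]
        # Keep the top_n most similar words above the similarity threshold.
        neighbours = sorted((s for s in scored if s[0] > min_sim), reverse=True)[:top_n]
        for _, other in neighbours:
            edges.add(tuple(sorted((word, other))))
            if other not in visited:
                visited.add(other)
                queue.append((other, depth + 1))
    return edges
```

Running this once per group, on that group's own vectors, gives the two networks that are then compared visually.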
- Analyzing Social Interactions (Retweet Networks):
- To map information flow, a directed network was constructed where nodes are users and a directed edge A -> B exists if user B retweeted user A.
- The network included the identified left-wing/right-wing users plus other users involved in their retweets (found in Dataset 2).
- The structure of this network reveals how information spreads within and between the two food identity groups.
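As an illustration of the edge convention, the sketch below builds weighted directed edges (author -> retweeter) from `(retweeter, original_author)` pairs. The within-group share it computes is not a measure reported in the paper, which relies on visualization; it is added here only as one simple, assumed way to put a number on segregation. All names are hypothetical.

```python
from collections import Counter


def build_retweet_edges(retweets):
    """Build weighted directed edges A -> B, where user B retweeted user A.

    retweets: iterable of (retweeter, original_author) pairs.
    """
    edges = Counter()
    for retweeter, author in retweets:
        if retweeter != author:  # ignore self-retweets
            edges[(author, retweeter)] += 1
    return edges


def within_group_share(edges, group_of):
    """Fraction of retweets whose two endpoints belong to the same group.

    group_of: dict mapping user -> group label; users without a label
    (for example, outside accounts) are skipped.
    """
    same = cross = 0
    for (author, retweeter), weight in edges.items():
        g_author, g_retweeter = group_of.get(author), group_of.get(retweeter)
        if g_author is None or g_retweeter is None:
            continue
        if g_author == g_retweeter:
            same += weight
        else:
            cross += weight
    total = same + cross
    return same / total if total else 0.0
```

A share close to 1 would be consistent with the segregation the paper describes, although the value depends on how unlabeled users are treated.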
Key Findings
- Distinct Interests: The analysis confirmed distinct interests extending beyond food:
- Food Left-Wing: Higher collective interest (LI<0) in terms like "trans fatty acid," "GM (genetically modified)," "agrochemical-free," socio-environmental issues ("fair trade," "environmental protection," "animal experiment"), "Starbucks," "Apple," "Sony," and "Corona" beer. Association networks showed strong negative connotations around "animal experiment."
- Food Right-Wing: Higher collective interest (LI>0) in "instant noodles," "high calorie," politically conservative terms ("conservative," "Prime Minister Abe"), "IKEA," "Costco," "Red Bull," and most tech/auto brands (except those favored by the left).
- Brand Preferences: Using keyword entropy based on co-occurrence of brand names and positive words (like "want," "bought," "tasty," "cool"), the paper found distinct brand affinities aligning with the groups' profiles (e.g., Starbucks for left, IKEA/Costco/Red Bull for right).
- Network Segregation: The retweet network visualization (Fig. 7) showed clear segregation between the food left-wing (blue nodes) and food right-wing (red nodes) users, indicating they primarily consume and share information within their respective groups. The left-wing cluster appeared more densely interconnected. A news outlet (@livedoornews, yellow node) acted as a significant bridge, retweeting content relevant to both groups.
- Vocabulary Differences: TF-IDF analysis of popular keywords (Table 5) further highlighted distinct vocabularies, with left-wing users frequently using terms like "beauty," "health," "nature," and right-wing users using terms like "supermarket," "ice cream."
Practical Applications and Implications
- Proxy for User Understanding: Food identity, easily observable on social media, can serve as a useful proxy for understanding underlying values, lifestyle choices, and consumer preferences without needing explicit demographic data or surveys.
- Targeted Marketing: Instead of broad advertising, marketers can target campaigns based on inferred food identity. For example, promoting eco-friendly products or fair-trade coffee to users exhibiting left-wing food identity characteristics, or convenience-focused items and volume discounts to right-wing users.
- Social Science Research: Provides a method to quantify latent personal attributes using digital trace data, potentially complementing traditional survey methods.
- Information Diffusion Analysis: Identifying segregated communities and bridging nodes (like news outlets) helps understand how information (and misinformation) spreads within different online populations.
Limitations
- Social Media Bias: Twitter users may not represent the general population.
- Keyword Classification: Keyword-based identification can misclassify users (e.g., someone criticizing fast food might be labeled right-wing) and doesn't capture sarcasm or complex contexts. Future work could incorporate sentiment analysis or more advanced NLP techniques.
- Oversimplification: Reducing identity to a left-right spectrum is a simplification of complex, multi-dimensional attributes.
In essence, the paper combines keyword analysis, word embeddings, and network analysis of social media data to show that "you are what you eat" extends to online behavior, offering insights into consumer behavior and social dynamics.