AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media (2401.08825v1)

Published 16 Jan 2024 in cs.LG, cs.CL, and cs.CV

Abstract: Online reviews in the form of user-generated content (UGC) significantly impact consumer decision-making. However, the pervasive problem of fake content, both human-written and machine-generated, undermines UGC's reliability. Recent advances in LLMs may make it possible to fabricate fake content that is indistinguishable from authentic content, at a much lower cost. Leveraging OpenAI's GPT-4-Turbo and DALL-E-2 models, we craft AiGen-FoodReview, a multimodal dataset of 20,144 restaurant review-image pairs divided into authentic and machine-generated classes. We explore unimodal and multimodal detection models, achieving 99.80% multimodal accuracy with FLAVA. We use attributes from readability and photographic theories to score reviews and images, respectively, and demonstrate their utility as hand-crafted features in scalable and interpretable detection models with comparable performance. The paper contributes by open-sourcing the dataset and releasing fake review detectors, recommending its use in unimodal and multimodal fake review detection tasks, and evaluating linguistic and visual features in synthetic versus authentic data.
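
To make the idea of readability attributes as hand-crafted features concrete, here is a minimal Python sketch that scores a review with two widely used formulas, Flesch Reading Ease and the Automated Readability Index. The abstract does not say which readability attributes the authors compute or how, so the choice of these two scores, the function names `count_syllables` and `readability_features`, and the crude vowel-group syllable heuristic are all assumptions made for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per vowel group, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability_features(text: str) -> dict:
    """Return simple counts plus two readability scores for a piece of text."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sent = len(sentences)
    n_words = len(words)
    n_syll = sum(count_syllables(w) for w in words)
    # Count letters and digits only (apostrophes are ignored).
    n_chars = sum(ch.isalnum() for w in words for ch in w)

    words_per_sentence = n_words / n_sent
    return {
        "n_sentences": n_sent,
        "n_words": n_words,
        "n_syllables": n_syll,
        # Flesch Reading Ease: higher means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * (n_syll / n_words),
        # Automated Readability Index: approximates a US grade level.
        "automated_readability_index": 4.71 * (n_chars / n_words)
        + 0.5 * words_per_sentence
        - 21.43,
    }


if __name__ == "__main__":
    print(readability_features("The pasta was great. I loved it!"))
```

Scores like these could be fed, together with other features, into a small interpretable classifier; that is one plausible reading of "hand-crafted features", not a description of the paper's actual pipeline.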
