A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok (2403.19717v1)
Abstract: Mobile apps have embraced user privacy by moving their data processing to the user's smartphone. Advanced ML models, such as vision models, can now locally analyze user images to extract insights that drive several functionalities. Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. In Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.