
Examining the Challenges in Archiving Instagram (2401.02029v1)

Published 4 Jan 2024 in cs.DL

Abstract: To prevent the spread of disinformation on Instagram, we need to study the accounts and content of disinformation actors. However, Instagram often bans accounts that are responsible for spreading disinformation, which makes these accounts inaccessible on the live web. The only way to study the content of banned accounts is through public web archives such as the Internet Archive. Archiving Instagram pages, however, presents many problems. We focused on one in particular: many Wayback Machine mementos of Instagram pages redirect to the Instagram login page. In this study, we determined that mementos of Instagram account pages on the Wayback Machine began redirecting to the Instagram login page in August 2019. We also found that Instagram pages are poorly archived on Archive.today, Arquivo.pt, and Perma.cc, in terms of both quantity and quality. Moreover, all of our attempts to archive Katy Perry's Instagram account page on Archive.today, Arquivo.pt, and Conifer were unsuccessful. Although they are in the minority, replayable Instagram mementos exist in public archives and contain valuable data for studying disinformation on Instagram. With that in mind, we developed a Python script to scrape Instagram mementos. As of August 2023, the script can scrape Wayback Machine mementos of Instagram account pages captured between November 7, 2012 and June 8, 2018.
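
The abstract does not show the authors' script, so the following is only a minimal sketch of two checks such a scraper might plausibly need: whether a Wayback Machine memento falls inside the reported scrapeable window, and whether a replayed memento ended up at the Instagram login page. The function names, the window constants, and the substring test for the login path are assumptions made for illustration, not the authors' implementation. The sketch relies only on the standard Wayback Machine URI-M layout (`/web/<14-digit timestamp>/<original URL>`).

```python
from datetime import datetime
from urllib.parse import urlparse

# Window reported in the abstract (as of August 2023); treating the end date
# as inclusive through the end of the day is an assumption of this sketch.
WINDOW_START = datetime(2012, 11, 7)
WINDOW_END = datetime(2018, 6, 8, 23, 59, 59)


def parse_memento_datetime(urim):
    """Extract the capture datetime from a Wayback Machine URI-M (hypothetical helper)."""
    # The path looks like /web/<timestamp>/<original URL>.
    parts = urlparse(urim).path.split("/")
    # Keep only the 14 digits; this drops replay modifiers such as "id_".
    stamp = parts[2][:14]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def in_scrapeable_window(urim):
    """Return True if the memento was captured inside the reported window."""
    return WINDOW_START <= parse_memento_datetime(urim) <= WINDOW_END


def is_login_redirect(final_url):
    """Return True if the URL reached after redirects points at the Instagram login page.

    Assumption: a simple substring test on the login path is enough for a sketch.
    """
    return "instagram.com/accounts/login" in final_url


if __name__ == "__main__":
    urim = "https://web.archive.org/web/20170301120000/https://www.instagram.com/katyperry/"
    print(in_scrapeable_window(urim))
    print(is_login_redirect(urim))
```

In practice, `final_url` would be the URL reached after following redirects when a memento is replayed. A scraper could skip any memento for which `is_login_redirect` returns true, since those captures contain no account content.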

Authors (2)
  1. Rachel Zheng (5 papers)
  2. Michele C. Weigle (55 papers)
