
4TCT, A 4chan Text Collection Tool (2307.03556v1)

Published 7 Jul 2023 in cs.DL and cs.SI

Abstract: 4chan is a popular online imageboard which has been widely studied due to an observed concentration of far-right, antisemitic, racist, misogynistic, and otherwise hateful material posted to the site, as well as the emergence of political movements and the evolution of memes there, as discussed in Section 1.1. We have created a tool, developed in Python, which utilises the 4chan API to collect data from a selection of boards. This paper accompanies the release of the code via the GitHub repository: https://github.com/jhculb/4TCT. We believe this tool will be of use to academics studying 4chan, particularly sociological researchers, by giving them a means of collecting data from the site, and that it may contribute to GESIS' Digital Behavioural Data project.
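The abstract states that the tool collects board data through the 4chan API from Python. As a rough illustration of what that involves, and not 4TCT's own code, the sketch below assumes the public read-only JSON endpoints described in 4chan's API documentation: `threads.json`, which lists a board's live threads page by page, and `thread/<no>.json`, which returns one thread's posts. It also assumes the documented guideline of roughly one request per second. The function names, the `max_threads` cap and the User-Agent string are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request

API_ROOT = "https://a.4cdn.org"  # assumed root of 4chan's read-only JSON API


def threads_url(board: str) -> str:
    """URL listing every live thread on a board, grouped by index page."""
    return f"{API_ROOT}/{board}/threads.json"


def thread_url(board: str, thread_no: int) -> str:
    """URL for the full JSON of one thread, keyed by its opening post number."""
    return f"{API_ROOT}/{board}/thread/{thread_no}.json"


def thread_numbers(threads_json: list) -> list:
    """Flatten threads.json (a list of pages, each with a 'threads' list)
    into a flat list of thread numbers, preserving page order."""
    return [t["no"] for page in threads_json for t in page["threads"]]


def fetch_json(url: str, delay: float = 1.0):
    """Sleep first (to stay near one request per second), then fetch and
    decode a single JSON document."""
    time.sleep(delay)
    req = urllib.request.Request(url, headers={"User-Agent": "research-sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def collect_board(board: str, max_threads: int = 5) -> dict:
    """Download up to max_threads threads; returns {thread_no: list_of_posts}."""
    numbers = thread_numbers(fetch_json(threads_url(board)))
    collected = {}
    for no in numbers[:max_threads]:
        try:
            collected[no] = fetch_json(thread_url(board, no))["posts"]
        except urllib.error.HTTPError as err:
            # 404 means the thread was pruned or deleted after it was listed
            if err.code != 404:
                raise
    return collected
```

For example, `collect_board("pol", max_threads=3)` would return a dictionary mapping three thread numbers to their lists of post objects. Because threads on 4chan are ephemeral, a thread listed in `threads.json` can vanish before it is fetched; the sketch skips those 404 responses rather than aborting the whole collection run.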
