Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
156 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Dark Web Activity Classification Using Deep Learning (2306.07980v3)

Published 30 May 2023 in cs.IR and cs.CV

Abstract: In contemporary times, people rely heavily on the internet and search engines to obtain information, either directly or indirectly. However, the information accessible to users constitutes merely 4% of the overall information present on the internet, which is commonly known as the surface web. The remaining information that eludes search engines is called the deep web. The deep web encompasses deliberately hidden information, such as personal email accounts, social media accounts, online banking accounts, and other confidential data. The deep web contains several critical applications, including databases of universities, banks, and civil records, which are off-limits and illegal to access. The dark web is a subset of the deep web that provides an ideal platform for criminals and smugglers to engage in illicit activities, such as drug trafficking, weapon smuggling, selling stolen bank cards, and money laundering. In this article, we propose a search engine that employs deep learning to detect the titles of activities on the dark web. We focus on five categories of activities, including drug trading, weapon trading, selling stolen bank cards, selling fake IDs, and selling illegal currencies. Our aim is to extract relevant images from websites with a ".onion" extension and identify the titles of websites without images by extracting keywords from the text of the pages. Furthermore, we introduce a dataset of images called Darkoob, which we have gathered and used to evaluate our proposed method. Our experimental results demonstrate that the proposed method achieves an accuracy rate of 94% on the test dataset.

Summary

We haven't generated a summary for this paper yet.