50 Shades of Dark
Threat Intelligence Reveals Secrets: From the Surface to the Dark Web
Summary
There is a lot of talk about the dark web these days, not least about how cybercriminals use it to spread malware, leak
intellectual property, and publish user account credentials.
We decided to explore the surface, deep, and dark parts of the web to see what information is available and how it is
connected. What we found was that there really is no sharp border between them. Information tends to seep into the
surface web from its darker parts, and it is more appropriate to talk about one web, with different shades of darkness. The
logic behind this is that brokers of illicit information on the dark web need to market their products, and hence need to
post links to them on the surface web (Brian Krebs has noted the same [1]).
Using Recorded Future’s real-time threat intelligence we can identify paste sites and forums as primary nodes of
communication between the surface and dark web, and show how these are used to link to both TOR/Onion sites and
various download sites.
This connectivity allows us to harvest and analyze metadata (such as link patterns, activity levels, and topics) about the dark
web from the surface web, giving us access to valuable information for threat analysis.
[1] http://krebsonsecurity.com/2015/04/taking-down-fraud-sites-is-whac-a-mole/
@RecordedFuture www.recordedfuture.com
50 Shades of Dark: How the Surface Web Reveals What’s Happening on the Dark Web
Introduction
People talk about the dark web as a mysterious place, hard to find and inaccessible to normal internet users. In this
paper we argue that there is no sharp border between the surface web and the dark web, and that there are indeed
links from the former to the latter. Different parts of the web thus exhibit varying degrees of shadiness, and can be
characterized both by their actual content and by what they link to. Conceptually, we might distinguish three levels of
the web, each with different characteristics:
›› Surface web
»» Freely accessible
»» Indexed by Google, Bing, and others
»» Mostly open, but sometimes behind pay walls
»» Fairly stable, content is available from source for a long time
»» Language (mostly) suited for traditional natural language processing (NLP), and tools exist for extracting and analyzing data
›› Deep web
»» Often behind logins, but accessible to anyone registering
»» Database driven, and therefore not indexed by search engines
»» Sometimes by invitation only
»» Mostly un-indexed by search engines such as Google and Bing
›› Dark web
»» Not indexed or searchable by Google, Bing, etc.
»» Often on other networks such as TOR [2], Freenet [3], I2P [4], etc.
»» Frequently behind logins, accessible by invitation only
»» Sometimes uses special language such as slang or leetspeak, which is not easily analyzed by normal NLP tools
»» Volatile, with content that sometimes stays available for only a few minutes (in one study we conducted, more than 10% of
Pastebin posts were removed within 48 hours)
Information tends to seep out even from the darkest corners of the web, if only because that information has a value
that cannot be realized unless prospective buyers can find it. It therefore has to be marketed in some way.
Wikipedia lists three uses of the dark web [5] (or Darknet):
Clearly, our argument that information needs to be made accessible outside of the dark web to realize its (monetary)
value holds for both (2) and (3) in this list. The surface and deep web contain links to the dark web. How frequent is such
information?
[2] https://www.torproject.org/
[3] https://freenetproject.org/
[4] https://geti2p.net/en/
[5] http://en.wikipedia.org/wiki/Darknet_(overlay_network)
As an initial example, we used the TOR Uncensored Hidden Wiki index (http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page)
to manually locate a dubious reseller of credit cards (Premium Cards, http://slwc4j5wkn3yyo5j.onion/):
We then queried the Recorded Future index for the Onion link to Premium Cards, and indeed found 14 references from
the last 3.5 months:
These references all come from Pastebin. One of the pastes, for example, provides an index to several useful “Financial
Marketplaces”:
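The kind of lookup described above, finding surface web pastes that mention a given Onion address, can be approximated on any collection of harvested paste text. The following is a minimal sketch, not the Recorded Future index or API: the function names and the sample paste bodies are invented for illustration, and the pattern assumes the 16-character Onion addresses quoted in this paper.

```python
import re
from collections import Counter

# Assumption: addresses have the 16-character form quoted in this paper
# (base32 characters a-z and 2-7, followed by ".onion").
ONION_RE = re.compile(r"\b([a-z2-7]{16})\.onion\b", re.IGNORECASE)


def find_onion_hosts(text):
    """Return the lower-cased Onion hostnames mentioned in a text."""
    return [m.group(1).lower() + ".onion" for m in ONION_RE.finditer(text)]


def count_references(pastes, target_host):
    """Count the pastes that mention target_host at least once."""
    target = target_host.lower()
    return sum(1 for text in pastes if target in find_onion_hosts(text))


# Hypothetical paste bodies, invented for illustration only.
sample_pastes = [
    "Financial Marketplaces: Premium Cards http://slwc4j5wkn3yyo5j.onion/",
    "index mirror http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page",
    "nothing to see here, just https://example.com/",
]

hits = count_references(sample_pastes, "slwc4j5wkn3yyo5j.onion")
all_hosts = Counter(h for text in sample_pastes for h in find_onion_hosts(text))
print(hits, dict(all_hosts))
```

Counting pastes rather than raw matches keeps a single paste that repeats the same link from inflating the reference count.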
As a second example, we investigated whether illicit material was being marketed in sources that Recorded Future harvests.
Credit card information with CVVs is a good example of such material, and we focused on material published in 2015 and
written in Russian. This yielded a small but interesting set of references related to advertising content and advice on
how to obtain and use the stolen credit card information:
Being even more specific, we looked for CVVs of credit cards related to Israel:
Thus, there is no doubt that illicit material is being marketed not only on the dark web but also through other channels
such as paste sites and forums.
Some of this content is nefarious enough to get quickly removed, even from Pastebin:
In these cases, it is convenient that Recorded Future provides cached access to paste content we have harvested (NOTE:
this feature is available only to registered Recorded Future clients):
In some cases, malware can be found directly from the surface web. To download a Remote Access Trojan (RAT) like
DarkComet, just Google for instructions and download sites:
[6] These are not all cyber-related tweets for that time period, but a subset selected by Recorded Future filters.
To get a bigger picture of where DarkComet is being distributed and discussed, we extracted all links in documents related
to it for a three-month period, using the Recorded Future API, and visualized the resulting links using the open source
graph visualization tool Gephi [7]:
[7] http://gephi.github.io/
This graph illustrates the different kinds of sites where malware is mentioned or found:
›› General discussion forums (marked by yellow in the graph), including Facebook, Reddit, Twitter, and YouTube. Here, general discussions
about a piece of malware take place, and a lot of the traffic is related to security companies and general warnings about a new threat.
›› More specialized forums, where hackers ask questions about how to find, download, modify, and use a piece of malware. The
Aljyyosh.com site is a good example of such a site.
›› Repositories where malware can be found and downloaded. These are marked by red ovals and include download and content
distribution sites such as Dropbox, ge.tt, and Mediafire.
Social media sites and forums thus act as the marketing channels for the download sites where malware and related
services can be found.
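The link extraction behind such a graph can be sketched without access to the Recorded Future API. The example below is an illustration under stated assumptions, not the method used for the figure: the documents are invented (source URL, text) pairs, the helper names are hypothetical, and the output is a plain CSV edge list with Source, Target, and Weight columns, which we assume Gephi's spreadsheet import accepts.

```python
import csv
import io
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def host_of(url):
    """Return the hostname of a URL, or None if it has none."""
    # Strip punctuation that often trails a URL in running text.
    return urlparse(url.rstrip(".,;:)")).hostname


def link_edges(documents):
    """Count (source host, target host) pairs over (source_url, text) pairs."""
    edges = Counter()
    for source_url, text in documents:
        source = host_of(source_url)
        for url in URL_RE.findall(text):
            target = host_of(url)
            if source and target and source != target:
                edges[(source, target)] += 1
    return edges


def to_edge_csv(edges):
    """Serialize the edge counts as a Source,Target,Weight CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


# Hypothetical documents, invented for illustration only.
documents = [
    ("http://pastebin.com/abc123",
     "setup guide http://www.mediafire.com/file/x and http://www.youtube.com/watch?v=y"),
    ("http://pastebin.com/def456", "download: http://www.mediafire.com/file/z"),
]

edges = link_edges(documents)
print(to_edge_csv(edges))
```

Aggregating by hostname rather than by full URL keeps the graph small enough to read, at the cost of hiding which individual pages are linked.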
During 2014, the discussion was mostly active on sites related to cyber vulnerability conversations. In January 2015 the
discussion shifted over to social and mainstream media, mostly due to the discussions around the use of this malware
in connection with the Charlie Hebdo events. There was also an increase in mentions of DarkComet on Pastebin in late
November and December 2014. These mentions are small in number, but those that do exist are very instructive, as the
following screenshot illustrates:
Here are a few of the sites linked to from Pastebin — note that these are instructions for how to download and set up
DarkComet:
In addition to showing increased interest in DarkComet, the growing number of mentions also indicates usage migrating
from higher-risk threat actors to “garden variety” threat actors who source their malware tools from Pastebin.
Link Patterns
Next, we examined texts from paste sites and forums over a period of 3.5 months that contained both a reference to
malware and a link to some other site, and evaluated where each link was directed. Below are the top link targets. If we
compare this list with a list of popular file sharing sites for general content, such as
http://www.ebizmba.com/articles/file-sharing-websites, we see a mix of “general” file sharing sites and some clearly more
focused on shady material. We also note that some very popular file sharing sites, like Dropbox, are missing from the top
link list.
www.4shared.com             3469    3
www.mediafire.com           2463    2
rapidshareporns.com         1239
www.2shared.com             1206
uploading.com               1153
turbobit.net                898
ul.to                       824
rapidshare.com              709     14
www.easybytez.com           646
fileshare.club              547
hotfile.com                 547
www.jeuxvideo.com           329
bitshare.com                327
www.juanmata10.com          260
depositfiles.com            245
salefiles.com               220
www.example.com             215
www.netload.in              212
noc.yartv.ru                201
netload.in                  196
pastebin.com                179
extabit.com                 173
www.putlocker.com           163
www.exploit4arab.net        157
tech4all.criativin.com.br   156
www.youtube.com             154
github.com                  150
www.gov.ai                  147
www.owasp.org               143
filepost.com                140
hosting.risp.ru             138
pan.baidu.com               132
www.exploit-db.com          130
freakshare.com              120
www.zeustech.net            119
www.voxility.com            117
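A tally of this kind can be sketched as a count of link targets by hostname. The example below is an illustration only: the input URLs are invented and the function names are hypothetical. Note that the list above counts www.netload.in and netload.in separately; the sketch merges such variants by stripping a leading "www.", which is a design choice of the sketch rather than something the list itself does.

```python
from collections import Counter
from urllib.parse import urlparse


def normalize_host(url):
    """Return the hostname of a URL without a leading 'www.' ('' if none)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_link_targets(urls, n=10):
    """Return the n most frequently linked hosts as (host, count) pairs."""
    counts = Counter(normalize_host(u) for u in urls)
    counts.pop("", None)  # drop URLs that had no usable hostname
    return counts.most_common(n)


# Hypothetical link targets, invented for illustration only.
links = [
    "http://www.netload.in/a",
    "http://netload.in/b",
    "http://www.4shared.com/c",
    "http://www.4shared.com/d",
    "http://www.4shared.com/e",
]

ranking = top_link_targets(links)
print(ranking)
```

Merging hostname variants changes the ranking: in the list above, the two netload.in entries combined would total 408 links.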
Again, a majority of the link destinations are file sharing sites of different kinds, showing that discussions around
malware on these sites tend to be accompanied by links to places where other content can be downloaded. This graph
illustrates the link pattern, with Pastebin as the main source of links:
Conclusions
Although the surface, deep, and dark web differ clearly in terms of accessibility and tools, information exists on the
surface web and on the deep web that can be used to gain important understanding of what is happening on the dark web.
Simple marketing mechanics underlie this: when something needs to be sold, prospective customers need to be able to find
information about it quickly. The available information includes topics, link patterns, and activity levels.
As illustrated by the study of mentions of the DarkComet malware, sites such as Pastebin act as a marketing channel by
providing a fairly unregulated place for posting both instructions and links to download sites for malware. Using a threat
intelligence platform to monitor the activity on paste sites can therefore be a good way to get early warning signals for
increased use of certain kinds of malware and of stolen data or credentials.
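One simple way to turn paste site activity into an early warning signal is to compare each week's mention count with a trailing baseline. The sketch below is a generic heuristic under stated assumptions, not a description of how any threat intelligence platform works: the window length, the threshold factor, and the weekly counts are all invented for illustration.

```python
from statistics import mean


def spike_weeks(weekly_counts, baseline_weeks=4, factor=2.0):
    """Return indexes of weeks whose count exceeds factor times the mean
    of the preceding baseline_weeks weeks."""
    flagged = []
    for i in range(baseline_weeks, len(weekly_counts)):
        baseline = mean(weekly_counts[i - baseline_weeks:i])
        # Floor the baseline at 1 so near-silent periods do not flag
        # a week with only one or two mentions.
        if weekly_counts[i] > factor * max(baseline, 1):
            flagged.append(i)
    return flagged


# Hypothetical weekly mention counts for one malware name on paste sites.
counts = [2, 3, 2, 3, 2, 9, 12, 4]

flagged = spike_weeks(counts)
print(flagged)
```

Because the baseline trails the data, a sustained rise is flagged only for its first few weeks, after which the higher level becomes the new normal.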
Topics also tend to migrate over time, from the dark web to the surface web, and analyzing these patterns allows us to
understand when high-end malware tools are becoming commodity malware. Such a shift means that the volume of attacks using
the commodity malware will increase while the average skill level of attackers goes down, and that the highly skilled
attackers will have moved on to another tool.
Recorded Future, 363 Highland Avenue, Somerville, MA 02144 USA | © Recorded Future, Inc. All rights reserved. All trademarks remain property of their respective owners. | 04/17