
Computer Law & Security Review 31 (2015) 490-505


Malicious web pages: What if hosting providers could actually do something…

Huw Fryer b,*, Sophie Stalla-Bourdillon a, Tim Chown b

a Institute for Law and the Web, University of Southampton, UK
b ECS, Faculty of Physical Sciences and Engineering, University of Southampton, UK
* Corresponding author. ECS, Faculty of Physical Sciences and Engineering, University of Southampton, Highfield, Southampton, SO17 1BJ, UK. E-mail address: [email protected] (H. Fryer).

http://dx.doi.org/10.1016/j.clsr.2015.05.011

Abstract

Keywords: Web security; Drive-by download; Malware; E-Commerce directive; Immunities; Internet intermediaries; Hosting providers; ISP; Search engines

The growth in use of Internet based systems over the past 20 years has seen a corresponding growth in criminal information technologies infrastructures. While previous “worm” based attacks would push themselves onto vulnerable systems, a common form of attack is now that of drive-by download. In contrast to email or worm-based malware propagation, such drive-by attacks are stealthy as they are ‘invisible’ to the user when doing general Web browsing. They also increase the potential victim base for attackers since they allow a way through the user's firewall, as the user initiates the connection to the Web page from within their own network. This paper introduces some key terminology relating to drive-by downloads and assesses the state of the art in technologies which seek to prevent these attacks. This paper then suggests that a proactive approach to preventing compromise is required. The roles of different stakeholders are examined in terms of efficacy and legal implications, and it is concluded that Web hosting providers are best placed to deal with the problem, but that the system of liability exemption deriving from the E-Commerce Directive reduces the incentive for these actors to adopt appropriate security practices.

© 2015 Huw Fryer, Sophie Stalla-Bourdillon and Tim Chown. Published by Elsevier Ltd. All rights reserved.

1. Introduction

The ability of cyber criminals to compromise networked computer systems through the spread of malware allows the creation of significant criminal information technologies (IT) infrastructures or ‘botnets’. The systems comprising such infrastructures can be used to harvest credentials, typically through keylogging malware, or provide a cover for illegal activities by making victim computers perform criminal acts initiated by others, such as distributed denial of service (DDoS) attacks. A single compromise may result in an infected system that is used in multiple criminal activities, and the cumulative effect of these activities and the resources dedicated to prevention can be considerable.1 This paper explains how the phenomenon of drive-by downloads has evolved to become a significant threat to both Internet users and third party systems.

[1] See e.g. Ross Anderson and others, “Measuring the Cost of Cybercrime”, Proceedings (online) of the 11th Workshop on the Economics of Information Security (WEIS), Berlin, Germany (2012).

To effect a compromise via a drive-by, a criminal will create a malicious Web page which, when visited, attempts to

exploit vulnerabilities on the user's computer automatically. In contrast to email or worm-based malware propagation, such drive-by attacks are stealthy as they are ‘invisible’ to the user when doing general Web browsing. They also increase the potential victim base for attackers since they allow a way through the user's firewall, as the user initiates the connection to the Web page from within their own network. The phenomenon of drive-by downloads is not a new one, but it remains one of the significant threats to the security of the Web, with the prominent malware variants being distributed in this way.2

[2] Chris Grier and others, “Manufacturing Compromise: The Emergence of Exploit-as-a-Service”, Proceedings of the 2012 ACM conference on Computer and communications security (2012).

The perception that malware only resides on ‘suspect’ sites such as file sharing sites, or those carrying pornography, is now far from reality. Commonly, an attacker will seek to compromise an otherwise legitimate website and use that to distribute malware. They may also attempt to place malware on a cheap throwaway domain name, but it is harder for ISPs or authorities to take measures against a legitimate website, and it also increases the probability of a potential victim visiting it. Where the target is a website on a trending topic, the risk of exposure is even greater. With the rise of blogging and similar content creation, there is also a significant risk of vulnerabilities in common blogging platforms, such as WordPress, exposing visitors to such sites to potential drive-by malware.

This article provides a review of the existing strategies being used to mitigate this problem, and explains why they are not enough. We suggest that simple actions by Web intermediaries, in particular companies providing hosting services, could significantly impact upon the amount of malicious web pages, and force the criminals to use a smaller, more readily identifiable set of platforms to spread their malware. We conclude that laws excluding liability for intermediaries, such as the E-commerce Directive in the European Union, do not necessarily give an incentive to hosting providers to engage in such security practices, and legitimate use of the Web suffers as a result.

2. Background

Like any other technology, computers have turned out to have a significant amount of use by criminals as well as legitimate use. The problem has been more severe than with previous technology, due to the combination of two factors. Firstly, computers have increased the speed at which a task can be automated. Secondly, the Web has got rid of the majority of the geographic limitations towards finding more victims, so this automation can be put to good (or rather malicious) use.

An example of this automation in action comes from the volume of spam, which despite having reduced considerably from a high of 92.6%, still represents 75.2% of all emails.3 The main way that criminal groups are able to maintain infrastructure which can send this volume of spam, or perform other undesirable actions, is through the use of malicious software (malware). Malware takes over a victim's computer, and having done that can either attack the users directly, or recruit them into a botnet, i.e. a distributed network of computers which is of great value to an attacker. Targeting the users might include something as simple as altering search results to gain advertising revenue, or spying on their browsing habits to target adverts. More seriously, it can steal credentials to online banking; or render a user's computer unusable (e.g. through encrypting all their files) unless they pay a ransom. Distributed computing offers the opportunity to conduct distributed denial of service attacks; sending spam; and more recently mining bitcoins.4

[3] Trustwave, “Trustwave 2013 Global Security Report” (2013) <http://www2.trustwave.com/rs/trustwave/images/2013-Global-Security-Report.pdf> accessed July 22, 2014.
[4] Bitcoins are a virtual currency, a part of which relies on solving a “hard” mathematical problem, for which the miner is compensated. The power requirements for doing this are significant, so using a network of victim computers can save a considerable amount of money.

Over the years, the tactics that criminals have used to distribute malware have evolved, and now different strategies are required to combat them. This section provides some background on this evolution, up to the primary focus of the paper: that of “drive-by” downloads. The distinctions between different types of malware are often unhelpful, since a lot of them do not fit neatly into one category, and incorporate elements of different types of malware. The reason for the distinctions in this section is to emphasise the differences in propagation methods, and the differences in strategy which are required to combat them.

2.1. Exploitation vs social engineering

In order to work, malware needs to be able to run on a victim machine. One method to infect a victim is known as social engineering, which is to simply make the user voluntarily run the malicious code.5 This can be accomplished through the use of Trojan style malware. Like the name suggests, this is a reference to the Trojan horse from Greek legend, which was let into Troy and allowed the Greeks hiding within to sneak out and open the gates of the besieged city from the inside. In the context of security, this might comprise an application purporting to perform a certain task, whilst at the same time an application hidden within would simultaneously attempt to subvert the machine it was run on.

[5] In this context, code refers to the series of instructions written by the programmer which gets converted into “machine code” (a series of 0s and 1s) that the computer can understand.

Another method is to exploit a vulnerability on the machine. A vulnerability is a flaw, or bug, in a piece of software which amounts to a security weakness. Vulnerabilities will have a greater or lesser degree of severity, but the most serious are those which allow Remote Code Execution (RCE). These vulnerabilities allow an attacker to run their own code rather than the code intended by the application. This is done by confusing the program into accepting input as commands to be executed, rather than as data to be manipulated. An exploit is a piece of code which takes advantage of the vulnerability, in order to run the desired code. In traditional computer based applications, this will be done by corrupting

the memory, but in Web applications there are many other methods through which this can be achieved, and other less serious vulnerabilities which exist.6

[6] See infra, Section 4.1.

The two methods can also be used together. For example, a common attack vector is that of a malicious attachment sent in an email. Sending an executable file would simply be social engineering,7 and that would be used to infect the computer. For the most part though, in order for it to be a plausible reason for the user to open the attachment, a file not typically associated with being malicious (such as Word documents, or PDF files) can be used. As the victim opens the maliciously crafted file, a vulnerability in the file reading software is exploited and that is used to take over the computer. A typical example is an email purporting to be from a delivery company, with information in a PDF file on the location of a parcel sent to the victim.

[7] Mostly. It is possible to rely on the fact that Windows hides file extensions by default, so filename.doc.exe would display as .doc, whereas running it would cause the executable file to run.

2.2. Viruses

Early malware was mostly computer viruses, now incorrectly used as a layperson's term for all forms of malware. A virus was characterised by the fact that it would attach itself to a previously benign file, and then spread from file to file on the computer. The concept originally came from Von Neumann, and then Cohen analysed the properties of computer viruses in more detail.8 An early Masters thesis by Kraus considered a biological analogy in that code did not satisfy the requirements for being classed as alive, whereas a virus was simpler than most other organisms, had the ability to reproduce, and hence became a workable analogy.9 The virus would spread from file to file on a computer, and then transfer to different computers through the physical transfer of floppy disks between users. This was the most logical way for it to spread, since use of networks in general, and in particular the Web, was in its infancy.10

[8] Fred Cohen, “Computer Viruses: Theory and Experiments” (1987) 6 Computers & Security 22.
[9] Kraus, 1988 Masters thesis, translated by D Bilar and E Filiol, “On Self-Reproducing Computer Programs” (2009) 5 Journal in Computer Virology 9.
[10] Interestingly this approach is still used with memory sticks, in particular for sensitive machines which are not connected to the Internet to avoid malware; for example the Stuxnet malware used this method, Ralph Langner, “Stuxnet: Dissecting a Cyberwarfare Weapon” (2011) 9 Security & Privacy, IEEE 49. It was also a propagation method of the Conficker worm, see Phillip Porras, Hassen Saidi and Vinod Yegneswaran, “Conficker C Analysis” [2009] SRI International.

The nature of the spread of real world viruses, and actions to limit them, appeared to hold with computer viruses too,11 which led to a substantial body of work on the epidemiology of computer networks. Given the similarities between virtual and physical viruses, a direct analogy was drawn and strategies for minimising the spread came from that. Kephart & White were amongst the first to look at this,12 and other ideas like selective immunisation or quarantine were also tried with varying levels of success.13

[11] William H Murray, “The Application of Epidemiology to Computer Viruses” (1988) 7 Computers & Security 139.
[12] Jeffrey O Kephart and Steve R White, “Measuring and Modelling Computer Virus Prevalence”, Research in Security and Privacy, 1993. Proceedings., 1993 IEEE Computer Society Symposium on (1993).
[13] See e.g. Giuseppe Serazzi and Stefano Zanero, “Computer Virus Propagation Models”, Performance Tools and Applications to Networked Systems (Springer 2004), and Chenxi Wang, John C Knight and Matthew C Elder, “On Computer Viral Infection and the Effect of Immunization”, Computer Security Applications, 2000. ACSAC '00. 16th Annual Conference (2000).

2.3. Worms

Like a virus, a worm also self-propagates. The difference is that whilst a virus was constrained in having to go from file to file, a worm could install itself only once on a computer and then scan for other computers to push itself onto. Rather than relying on physical devices to propagate, it automates the process through using network or Internet connections. It required no intervention from the user; it would simply scan for vulnerable machines and exploit the same vulnerability on each one. As usage of the Web began to grow significantly in the early 2000s, worms were incredibly common and incredibly effective. Few of the computers connecting to the Internet had adequate security to deal with these attacks. Firewalls were not commonly installed, meaning that it was possible to push this malware without any barriers, and many computers were directly accessible on the Internet to an attacker.14 Operating system vendors also took time to adjust to the nature of the threat, in that their products were under such relentless attack. For example, it was not until 2003 that Microsoft introduced a regular patching cycle, and even then the update mechanism required users to opt in rather than running automatically, which meant that a lot of the time updates never happened.

[14] At a certain point, it was not possible for even vigilant users to configure their computers before becoming infected, Scott Granneman, “Infected in 20 Minutes” (The Register, 2004) <http://www.theregister.co.uk/2004/08/19/infected_in20_minutes/> accessed June 20, 2014.

This attack method has since fallen out of favour, and there are several possible reasons for this. The first is that operating system vendors have caught up with the threats and the hostile environment they have to work within, and have introduced additional security and updates into their products. As such, worm exploits, which attacked operating systems, are not so easy to find. Modern operating systems also have firewalls installed by default,15 which largely solves the problem of malware “pushing” itself onto a machine. Similarly, the depletion of IP addresses also led to the adoption of Network Address Translation (NAT) hardware, which enabled multiple computers on a local network to share the same IP address on the Internet, as is the case on most home networks. A side effect of this is that the NAT hardware will not accept unsolicited communications from the Internet, and this can block worm based attacks.

[15] Joe Davies (Microsoft), “New Networking Features in Microsoft Windows XP Service Pack 2” (2004) <http://technet.microsoft.com/en-us/library/bb877964.aspx> accessed June 20, 2014.

2.4. Drive-by downloads

Attackers reacted to the defences against worms by taking the opposite approach with drive-by downloads. A drive-by download waits for the user to come to it rather than attempting to force itself onto other machines (using a “pull” rather than “push” propagation mechanism). The process requires that a user visits a website which is under the control of an attacker. Once the website has been visited, malicious code on the website will attempt to subvert the user's browser, and take over the computer that way, since it is the browser which is used to access the website. This might be done through using JavaScript to corrupt the browser itself, or will use one of the plugins which the browser is running. Plugins are additional features added to the browser, often to play multimedia content, such as Flash, Adobe Reader or Java. Like with operating systems, these will also contain vulnerabilities, so the attacker will attempt to exploit these in the same way to get their malicious code to run.

This offers some significant advantages to an attacker over a worm-based attack. Firstly, any logs which exist of a user visiting a malicious website will be virtually indistinguishable from normal Web browsing, so the compromise has less chance of being discovered. Secondly, although a firewall can block attacks from the Internet, it has to let some traffic through in order to make Web browsing possible. Using a website can therefore offer the attacker a way through the user's firewall, and therefore increase the potential victim base.16

[16] Niels Provos and others, “The Ghost in the Browser: Analysis of Web-Based Malware”, Proceedings of the first conference on First Workshop on Hot Topics in Understanding Botnets (2007).

The phenomenon of drive-by downloads is not a new one, but it has remained a significant threat. Provos et al. performed a detailed analysis of drive-by attacks as early as March 2006-07,17 and also January-October 2007.18 They found 1.3% of results in Google search results were malicious; and 0.6% of the most popular 1 million URLs had, at some point, been used as malicious hosting. A typical attack will use a previously benign website which is compromised by the attacker to include malicious content. This is a separate part of the attack, before the victim browses to the website, and will use some weakness in the website or the server it is hosted on, commonly including out of date software; malicious advertising; or exploits using unchecked user data.19 Following the exploitation of the website, the content will then be changed to include malicious code, usually to redirect the victim to an attack website, which contains the code performing the exploit.20

[17] Ibid.
[18] Niels Provos and others, “All Your iFRAMEs Point to Us”, Proceedings of the 17th Conference on Security Symposium (USENIX Association 2008) <http://dl.acm.org/citation.cfm?id=1496711.1496712>.
[19] See infra Section 4.1.
[20] An <iframe> HTML tag allows an entire Web page to be embedded inside another one. Redirection is a common requirement for the Web, so there are simple ways of doing it, for example the JavaScript command: window.location = "www.evil.com".

There are benefits to an attacker in compromising a legitimate website as opposed to buying cheap throwaway domain names. The existing reputation of a legitimate website means that it is harder to shut down than a purely malicious site, and it also becomes more likely that a potential victim will visit it. A legitimate website will already have a certain amount of traffic, and this can be enhanced through taking advantage of trending topics. Major news events will lead to a lot of people searching for information about them, so controlling a website about major news events is an advantage. A recent example could be news about the missing MH 370 Malaysian Airlines flight. Moore et al. found evidence of this practice, with advert filled and malicious sites effectively exploiting trending terms on both Google and Twitter, generating considerable profits for criminals.21

[21] Tyler Moore, Nektarios Leontiadis and Nicolas Christin, “Fashion Crimes: Trending-Term Exploitation on the Web”, Proceedings of the 18th ACM conference on Computer and communications security (2011).

Finally, if a website with high reputation is compromised, it can be used to facilitate black hat search engine optimisation (SEO). The way the Google ranking system works is to assign weight to the links to a website based on the rank of the website which links to it, so a website with high reputation could significantly enhance the ranking of another website.

With older propagation methods, simple steps can be taken. It is well known that Trojan style malware can be run by opening untrusted executable files, so one can simply avoid running applications that are not known to be trustworthy, and appropriate usage of firewalls (or NATs with their implicit firewall function) can prevent the spread of worms. With drive-by downloads things are slightly more complicated, since there is constantly a risk with going to any website that the computer might become infected, so even a careful user's computer could become infected.

At a high level, there are three steps which need to happen for a successful drive-by download to occur:

1. A previously benign website needs to be taken over and redirect the user to an attack, or embed the attack code within the victim website;
2. A user with a vulnerable Web browser (or browser plugin) has to visit it;
3. The vulnerability within the browser gets successfully exploited by the malicious code.

This implies that in terms of combating the problem, there are at least two approaches available:

1. Accepting that websites will get compromised, and attempting to mitigate the damage to users' machines. This is a reactive approach, and includes early detection (a minimal sketch of such a check follows this list), minimising access, or preventing malicious code from executing.
2. Preventing the benign website from being taken over in the first place: a proactive approach which consists in giving website operators an incentive to secure their websites.
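To make the first, reactive approach concrete, the sketch below shows the kind of crude signature check that early-detection tools build on: it fetches a page and flags the most obvious markers of an injected redirector (a zero-sized iframe, a script-driven redirect, common obfuscation helpers). It is a minimal, hypothetical Python example; the patterns and the URL are illustrative assumptions, not a tested rule set, and the real systems surveyed in the next section combine far richer features with client honeypots or machine learning.

import re
import urllib.request

# Illustrative signatures of injected drive-by redirectors. Real detectors use
# many more features; these three are only meant to mirror the examples in the text.
SUSPICIOUS_PATTERNS = {
    "hidden iframe": re.compile(r"<iframe[^>]*(width|height)\s*=\s*['\"]?0", re.I),
    "script redirect": re.compile(r"window\.location\s*=", re.I),
    "obfuscation helper": re.compile(r"\b(eval|unescape|document\.write)\s*\(", re.I),
}

def scan_page(url):
    """Fetch a page and return the names of any suspicious markers it contains."""
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return [name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(html)]

if __name__ == "__main__":
    print(scan_page("http://example.com/"))  # prints [] for a clean page

Checks of this kind are cheap but easy for an attacker to sidestep by re-encoding the payload, which is one reason the literature discussed in the next section moves towards anomaly detection and machine-learning classifiers.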

As it will be explained in the following sections, the proactive approach is likely to be the most effective, or rather, it is currently underused, and can reduce the need for actions under the first approach.

3. Mitigating the effect of compromised websites: the reactive approach

The majority of the literature has focused on mitigating the effect of drive-by downloads on the clients, after the initial compromise (of the website) has already happened. This can be roughly split into two categories: pre-emptive approaches, which search through websites in advance of the victim visiting a site so as to warn them, and real-time approaches, which are methods of detecting whether a page is malicious at the time that the user visits the page and attempt to prevent any damage from occurring, although both use similar methods for detection.

The identification of a malicious page enables a sign to be placed to warn users from visiting it, or to prevent the execution of the malicious code if they visit it anyway.

To identify pages in advance, a client honeypot would typically be used. A client honeypot is an application which mimics vulnerable browsers and visits websites in order to induce them to attack. Depending on the level of information which the researcher wants, the site can either be classified as malicious (or not); or further details could be discovered, such as in what ways the attack site attempts to interact with the browser. Provos et al. used high interaction honeypots for their investigation into the level of malicious pages described earlier, which looked for any changes to the state of the machine; suspicious redirects, or suspicious downloads.22 Pages identified by Google (either through the use of client honeypots, or as part of their website scanning process) are presented with a warning when in the search results through the safe-browsing API, which is also used by browsers such as Google Chrome and Mozilla Firefox. In the event that a user tries to browse to a malicious page, they are given further prompts to attempt to prevent them from going onto the page.

[22] Provos and others, “All Your iFRAMEs Point to Us.” n18.

Despite general criticism in the literature about the effectiveness of browser warnings, a recent study by Akhawe and Felt suggested that these warnings are actually effective, with only between 9% and 23% of users going through malware or phishing warnings.23

[23] Devdatta Akhawe and Adrienne Porter Felt, “Alice in Warningland: A Large-Scale Field Study of Browser Security Warning Effectiveness”, Proceedings of the 22nd USENIX Security Symposium (2013) <https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_akhawe.pdf> accessed February 10, 2014.

Detection methods for malicious websites also involve trade-offs between accuracy and the processing time required for classification. Simple analysis can classify pages based on characteristics of known malware. These checks are simple and quick to do, but can easily be circumvented with minor changes in strategy by the attacker. Similar evasion exists in relation to malware, where slightly changing a few lines of code means that anti-virus software will not detect it.

More sophisticated methods analyse the characteristics of a page, search for characteristics which are out of the ordinary, and use those to assign a probability that a certain page is malicious. This is known as anomaly detection, and an example includes Cova et al., whose analysis viewed certain programming techniques or a large amount of redirects as suspicious.24 These techniques can be enhanced by using machine learning, where a program learns to recognise these malicious characteristics and adapts to new variants as it gets trained. Examples of applications using these techniques include Cujo,25 ZOZZLE,26 and SurfGuard.27

[24] Marco Cova, Christopher Kruegel and Giovanni Vigna, “Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code”, Proceedings of the 19th international conference on World wide web (2010).
[25] Konrad Rieck, Tammo Krueger and Andreas Dewald, “Cujo: Efficient Detection and Prevention of Drive-by-Download Attacks”, Proceedings of the 26th Annual Computer Security Applications Conference (2010).
[26] Charlie Curtsinger and others, “ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection”, USENIX Security Symposium (2011).
[27] V Sachin and NN Chiplunkar, “SurfGuard JavaScript Instrumentation-Based Defense against Drive-by Downloads”, Recent Advances in Computing and Software Systems (RACSS), 2012 International Conference on (2012).

These techniques are not confined to analysis of the page content. John et al. identified a drive-by campaign where pages on compromised websites suddenly increased in popularity in the search rankings, making use of black hat search engine optimisation (SEO) techniques. They found that this was a good way of identifying some malicious attacks, and whilst blocking based on these criteria would be easy to get around, it would require the popularity of the page to be reduced, making the pages less likely to be viewed.28 Zhang et al. sought to identify compromised Web pages through their links to attack servers hosting the malicious content, which, as attackers currently operate, is usually hosted on a different server. Through combining knowledge of IP addresses and domains related to those particular servers, a network of compromised pages could be identified.29

[28] John P John and others, “deSEO: Combating Search-Result Poisoning”, USENIX Security Symposium (2011).
[29] Junjie Zhang and others, “Arrow: Generating Signatures to Detect Drive-by Downloads”, Proceedings of the 20th international conference on World wide web (2011).

Rather than attempting to identify the malware from the characteristics of the server, it is also possible to identify it through actions on the user's computer. This has the advantage that there is no need for any knowledge about how the malicious code is constructed, but simply relies upon the observed effects on the browser or operating system after visiting the page. One way this can work is through relying on the fact that RCE requires memory corruption to occur, and then for the attacker's code (known as shellcode) to be executed. Egele et al. sought to identify shellcode in output from the Web page,30 whereas other techniques have been to examine the download of files after a page has been visited. If

it appears that they have been downloaded without consent, then that would suggest that they were unwanted at best and likely malware. In these cases the code to execute these programs could simply be ignored.31

[30] Manuel Egele, Engin Kirda and Christopher Kruegel, “Mitigating Drive-by Download Attacks: Challenges and Open Problems”, iNetSec 2009 - Open Research Problems in Network Security (Springer 2009).
[31] Fu-Hau Hsu and others, “BrowserGuard: A Behavior-Based Solution to Drive-by-Download Attacks” (2011) 29 Selected Areas in Communications, IEEE Journal on 1461.

That said, the use of client honeypots or pre-emptive detection in this way does have certain limitations. Firstly, the browser which is being simulated might not be the target of the malware, such as if a honeypot used a version of Internet Explorer when the malware targeted Mozilla Firefox. It may not attempt to execute in situations like that. Similarly, there is “IP centric” malware, which would only appear to users with certain IP addresses. An IP address can identify which network a user is coming from, and with security companies or search engines having known IP address ranges, then (depending on the level of control over the page) the malware could decline to activate if an adversary (i.e. an adversary to the person who has compromised the site, e.g. a search engine or security company) visited the site.

4. Preventing the compromise of legitimate websites: the proactive approach

4.1. Vulnerable and compromised websites

The previous section considered the problem from the clients' point of view: making sure they do not visit compromised websites, and seeking to minimise the damage if they do. The alternative approach is to prevent legitimate websites from becoming exploited in the first place. This section will describe the ways in which websites are exploited, and what can be done to prevent this from occurring. A malicious advertisement can be another way of making a drive-by attack possible. This is also an important problem, but is regarded as being out of the scope of this paper.32 The reason compromised websites were chosen was a statistic in the APWG report which states that somewhere close to 90% of phishing websites were otherwise legitimate, compromised websites.33 In addition, having access to a fully controlled website, rather than merely infecting visiting users, provides a whole range of uses to attackers, making the potential threat more serious.34

[32] This is one of the attack vectors identified by Provos et al. By placing advertisements on a website, the operator no longer has control of the content. Provos and others, “All Your iFRAMEs Point to Us.”
[33] Rod Rasmussen and Greg Aaron, “APWG Global Phishing Survey: Trends and Domain Name Use in 1H2013” (2013) <http://docs.apwg.org/reports/APWG_GlobalPhishingSurvey_1H2013.pdf> accessed June 20, 2014. Whilst phishing websites are not the same as websites distributing malware, this is indicative of current trends. Having compromised a website an attacker could use it in whichever way he chooses, whether that is as a phishing page or otherwise.
[34] Davide Canali, Davide Balzarotti and others, “Behind the Scenes of Online Attacks: An Analysis of Exploitation Behaviors on the Web”, Proceedings of the 20th Annual Network & Distributed System Security Symposium (2013).

Websites are made possible by the integration of a range of applications, and as such are vulnerable to attacks similar to those that conventional desktop applications are. The Web server is responsible for accepting the requests coming from the client, and responding with the content. Frequently, the server will interact with a database, which allows the display of dynamically created pages, and functions such as login to take place. These will both also be running on an operating system. The display of the website itself is done using HyperText Markup Language (HTML) and Cascading Style Sheets (CSS), which the user's Web browser understands and can display. Creating content using HTML is frequently automated so that novice operators can add content. This is done by a content management system (CMS), which will present the operator with a WYSIWYG (what you see is what you get) interface, and store the content in a database from which it can be retrieved and displayed when it is requested by a user.

This complicated interaction between different software makes websites particularly prone to attacks. There are traditional memory corruption attacks which can be used against the software running the infrastructure, such as the Web server or the database.35 Like with home computers, a website also offers a way through a firewall, since it accepts traffic from outside and interacts with the underlying infrastructure in order to respond to requests. The website, interacting with the infrastructure, offers another means of attack.

[35] See the discussion in Section 2.

These attacks will often rely on mixing data with application instructions, so websites which do not validate their user input to prevent this can be attacked in this way. Injection based attacks are an example of this, and would often target the database by figuring out what values the application uses to query the database and changing them to be commands.36 For example, a Web page might query the database with the condition: everything with a type value of 'product'. An attacker could simply modify the value to search for to be:

'; UPDATE Content SET Post = Post + '<iframe src="evil.com">' --

This appends a quotation mark which signifies the search should be for an empty string, and then adds an additional UPDATE command, which edits the content stored in the database, in this case embedding the malicious content at the end of each post. Adding -- at the end starts a comment, which tells the computer to ignore anything after it, and so excludes the rest of the intended command (the standard defence, parameterised queries, is sketched below).

[36] Other vectors include XPath, XML, or even commands in the server side language the website is written in. For example, PHP has a shell_exec function which allows operating system commands to be written and then performed, see “Shell_exec” <http://www.php.net//manual/en/function.shell-exec.php> accessed July 09, 2014.

Another attack would use JavaScript in a similar way. As was discussed previously, JavaScript is used as a means of corrupting the user's Web browser when they come and visit the website, to install malware on it. This would not place the website under the attacker's control, but instead would directly target the users who visited it. The effect is the same of course: it turns the benign page into a malicious one.
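The injection above succeeds only because the user-supplied value is pasted directly into the SQL text. The usual defence is input validation combined with parameterised (prepared) statements, which keep data and commands separate. The following is a minimal illustrative sketch in Python, using the built-in sqlite3 module and a hypothetical Content table mirroring the example; the same principle is available in the server-side languages the paper mentions, for instance through PDO prepared statements in PHP.

import sqlite3

def get_posts_unsafe(conn, post_type):
    # VULNERABLE: the value is concatenated into the SQL text, so on database
    # layers that accept stacked statements an input such as the injection in
    # the example above would run an extra UPDATE command.
    query = "SELECT Post FROM Content WHERE Type = '" + post_type + "'"
    return conn.execute(query).fetchall()

def get_posts_safe(conn, post_type):
    # SAFE: the placeholder keeps the value as data; it is never parsed as SQL,
    # so an injected UPDATE is treated as nothing more than an odd search term.
    return conn.execute("SELECT Post FROM Content WHERE Type = ?", (post_type,)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Content (Type TEXT, Post TEXT)")
    conn.execute("INSERT INTO Content VALUES ('product', 'hello')")
    malicious = "'; UPDATE Content SET Post = Post || '<iframe src=\"evil.com\">' --"
    print(get_posts_safe(conn, malicious))  # [] - no rows match, nothing is modified

With the placeholder version the injected UPDATE never runs: the whole malicious string is simply treated as an unmatched search value.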

Alternatively, JavaScript can be used to disguise the content on the page,37 or to redirect the user to a malicious page, e.g. for phishing.

[37] This is often used in “clickjacking”, where a Facebook “like” button or similar is hidden behind other content, and fools the user into propagating a link to the page.

The code in all the different layers of the Web server and application will, like any other software, have bugs which will occasionally turn out to be vulnerabilities. Well supported software will be updated, where the vulnerabilities will be fixed, but then the weakness in the server becomes that the updated software is not installed. This leaves the website in a vulnerable state which could be exploited.38 A website being vulnerable is more serious than a normal user being vulnerable, because the user can hide to some degree behind the protections offered by their ISP, or the fact that the probability is generally pretty low that they will visit an attack website. A website, by contrast, exists entirely so it can be found and so has none of this protection.

[38] There are other sorts of vulnerabilities specific to websites, and the interested reader can go to https://www.owasp.org/index.php/Top_10_2013-Top_10 for more detail.

Somewhat concerning is the apparent prevalence of vulnerabilities from the OWASP top 10 on the Web. WhiteHat Security's 2013 Global Security report found that 86% of websites had a “serious” vulnerability on their website, defined as the ability to compromise at least part of their site.39 Trustwave's Global Security Report 2013 found that 38% of domains used Password1 as a password, as well as many other vulnerabilities.40 Checkmarx analysed the source code of the most popular WordPress plugins, and found that 20% of them, and 70% of the most popular e-commerce plugins, contained similar, serious, vulnerabilities.41 Although these are all security vendors, possibly with their own interests in presenting the data in certain ways, the amount of data breaches which have occurred do lend credence to their figures and indicate the seriousness of the situation.

[39] WhiteHat Security, “Whitehat Website Security Statistics Report 2013” (2013) <https://www.whitehatsec.com/assets/WPstatsReport_052013.pdf> accessed June 20, 2014.
[40] Trustwave, n1.
[41] Checkmarx, “The Security State of WordPress' Top 50 Plugins” <http://www.checkmarx.com/white_papers/the-security-state-of-wordpress-top-50-plugins/> accessed July 09, 2014.

Vulnerable websites are at constant risk of compromise, and it is potentially simply a matter of time before the compromise happens. Like with home zombie computers, there is a wide range of uses a website can be put to, demonstrated by Canali & Balzarotti,42 who deployed web based honeypots to analyse exactly what an attacker would do following a successful exploitation.43 They used 500 websites with different characteristics, mostly based around vulnerable CMS software. In the event that the attacker could upload a shell, then they would do so on 46% of occasions, and then use that to log in on average after 3 and a half hours. Only 1.1% of attackers specifically sought to add a drive-by download onto the website, but nearly 49.4% attempted to gain more permanent control of the machine, and 27.7% tried to get the machine into an IRC botnet,44 which would also enable drive-by downloads alongside other more general malicious activity such as phishing, sending spam, or hosting illegal content. This demonstrates the variety of ways an attacker can use a compromised website as a means of facilitating drive-by downloads.

[42] Canali, Balzarotti and others, n29.
[43] A honeypot was the precursor to the client honeypot, and simply provides what appears to be a vulnerable server which an attacker might scan. They can have different functions such as detecting malicious IP addresses, viewing attack strategies, or altering the economics of attacking sites, Zhen Li, Qi Liao and Aaron Striegel, “Botnet Economics: Uncertainty Matters”, Workshop on the Economics of Information Security (WEIS) (2008).
[44] IRC stands for Internet Relay Chat, and used to be a common way for attackers to control botnets. Since many server administrators will simply block any network traffic connecting to this, it has become a lot rarer.

Automated scans are made by criminals to detect vulnerable websites and to seek to exploit them. One strategy is the use of search terms indicating the presence of vulnerable components, or a website which is already compromised. This was demonstrated by Clayton & Moore, who showed that there was a correlation between these search terms and the compromise of websites for phishing attacks.45 For example, searching for phpizabi 0.848 b c1 hgp1 would return websites powered by an old version of phpizabi, which contained a vulnerability allowing the upload of files to the server (vulnerability CVE-2008-0805).46 Websites which were already compromised might have an uploaded “shell”, which is a piece of functionality designed so that the attacker can perform tasks on the server. The phrase inurl:c99.php would locate websites which had any URL containing c99.php, a popular shell used by attackers, and would demonstrate that the site was already compromised.

[45] Tyler Moore and Richard Clayton, “Evil Searching: Compromise and Recompromise of Internet Hosts for Phishing”, Financial Cryptography and Data Security (Springer 2009).
[46] This search no longer returned any results as of 24 April 2014.

4.2. Proactive solutions

The main way we envisage the proactive approach being applied is through patching, and keeping the software up to the latest version. This section will analyse the process on both the client side and the server side, and what sort of strategy could be used to require different users to ensure it occurs.

On the client side, modern software has increasingly started to automatically upgrade in order to fix vulnerabilities. Microsoft's “patch Tuesday” provides monthly updates to Windows and other software on the second Tuesday of every month, and both Adobe and Oracle have quarterly update cycles for their products. Web browsers also update automatically: Microsoft Internet Explorer as part of patch Tuesday, Google Chrome updates silently, and Mozilla Firefox automatically searches for updates and installs the update when the browser is restarted. This means that vulnerabilities do not exist for very long,47 which suggests that people who are

Table 1 - Reproduced from Microsoft Security Report July-December 2012.

Exploit                   | Platform or technology | 1Q12      | 2Q12      | 3Q12      | 4Q12
Win32/Pdfjsc (a)          | Documents              | 1,430,448 | 1,217,348 | 1,187,265 | 2,757,703
Blacole                   | HTML/JavaScript        | 3,154,826 | 2,793,451 | 2,464,172 | 2,381,275
CVE-2012-1723 (a)         | Java                   | -         | -         | 110,529   | 1,430,501
Malicious IFrame          | HTML/JavaScript        | 950,347   | 812,470   | 567,014   | 1,017,351
CVE-2010-2568 (MS10-046)  | Operating system       | 726,797   | 783,013   | 791,520   | 1,001,053
CVE-2012-0507 (a)         | Java                   | 205,613   | 1,494,074 | 270,894   | 220,780
CVE-2011-3402 (MS11-087)  | Operating system       | 42        | 24        | 66        | 199,648
CVE-2011-3544 (a)         | Java                   | 1,358,266 | 803,053   | 149,487   | 116,441
ShellCode (a)             | Shell code             | 105,479   | 145,352   | 120,862   | 73,615
JS/Phoex                  | Java                   | 274,811   | 232,773   | 201,423   | 25,546

(a) Vulnerability also used by the Blacole kit; the totals for this vulnerability exclude Blacole detections.

falling victim to drive-by downloads are using very old versions of software from before the automatic updates were introduced, or have consciously decided to prevent the updates from running (e.g. if they are running a pirate version of the software).

[47] Three months could be regarded as a considerable amount of time, but in most cases the vulnerabilities are not publicly known due to responsible disclosure practices by security researchers. In the event that a problem is particularly serious, then a vendor will often release an out of band update to protect their users.

This is supported by the attacks reported for 2013 shown in Table 1, which is from Microsoft's annual report of the state of security.48

[48] Danielle Alyias (Microsoft) and others, “Microsoft Security Intelligence Report, Volume 14” (2013) <http://download.microsoft.com/download/E/0/F/E0F59BE7-E553-4888-9220-1C79CBD14B4F/Microsoft_Security_Intelligence_Report_Volume_14_English.pdf> accessed June 20, 2014.

A few points should be clarified about the data presented in this table. CVE numbers are unique identifiers for a particular, reported vulnerability, prefixed by CVE, the year it occurred, and then a unique number from that year. Blacole is an exploit kit, which identifies the versions of software being used by the users visiting a website, and then chooses an appropriate exploit based on that information. The persistence of some of the exploits over the course of the year demonstrates that they are still successful long after they are known about; otherwise they would not continue to be used. This suggests that any solutions which require installation of additional software to prevent the execution of attacks are unlikely to have success on their own, because the only people likely to install them are people who are already largely safe.49

[49] Zero day attacks (exploits which target a vulnerability before a fix is available) are generally used for targeted attacks, so are excluded from analysis in this paper.

On the server side, updates are more problematic, since any change brings with it the danger of breaking website functionality, and appropriate due diligence in preventing this takes manpower. Like with normal users, the low barrier to entry for running a website (e.g. using a CMS) means that the operators themselves are unaware of the risks of running outdated software, or possibly the work was outsourced to someone who no longer supports the website. Actively maintained CMSs like WordPress or Drupal are periodically updated to fix bugs or vulnerabilities, but the high volume of websites which use them (WordPress is said to run 20% of the Web) means that an attack against one website will work against many websites, making them an attractive target for attackers. It does appear to be a significant problem, since many websites continue to use out of date CMS software. An example of this is the Joomla CMS, which has 46% of websites using version 1.x,50 despite these versions no longer being officially supported.51

[50] w3techs.com, “Usage Statistics and Market Share of Joomla for Websites” (2014) <http://w3techs.com/technologies/details/cm-joomla/all/all> accessed July 09, 2014.
[51] Joomla, “What Version of Joomla! Should You Use?” <http://docs.joomla.org/What_version_of_Joomla!_should_you_use?#Joomla.21_CMS_versions> accessed July 09, 2014.

There are a few initiatives to try and combat this issue. WordPress recently introduced a feature which automatically provides security and maintenance updates for versions 3.7 and above.52 Whilst hosting providers will often send emails when a new version of a CMS is released, and keep the infrastructure up to date, Dutch hosting company Antagonist went a step further, providing a free vulnerability scanning service which attempts to fix vulnerabilities automatically.53

[52] WordPress, “Updating WordPress” (2013) <http://codex.wordpress.org/Updating_WordPress#Automatic_Background_Updates> accessed July 09, 2014.
[53] Wouter de Vries, “Hosting Provider Antagonist Automatically Fixes Vulnerabilities in Customers' Websites” (2012) <https://www.antagonist.nl/blog/2012/11/hosting-provider-antagonist-automatically-fixes-vulnerabilities-in-customers-websites/> accessed July 09, 2014.

5. Actors involved with proactive defence

One difficulty with drive-by downloads is that the people who cause the problem are generally not affected by the consequences. The compromised website will not be affected from a business point of view, since the purpose of a drive-by page is that it is not noticed. When it is picked up by a blacklist, it is possible to remove the immediate issue without fixing the problem itself. Even the victim of a drive-by attack will also often not lose out, since losses from bank fraud are often borne by the bank, but it is the botnet infrastructure which causes more problems to the Web. Fixing the problem costs Web users as a whole a considerable amount of money, but comparatively little to most individuals, an example of this being click fraud.54 Whilst this does increase the costs for consumers as companies need to recoup higher advertising costs, the cost is shared between

Fig. 1 - The process of a drive-by download.

everyone, and consequently it is not worth any individual investing in to prevent.

[54] Click fraud is the practice of repeatedly clicking on pay per click advertising links to generate revenue.

Adopting a proactive approach is defined as taking a preventative rather than a reactive measure, and different types of actors could help in the process. A drive-by download scenario is represented in Fig. 1. Of the entities represented in the scenario, the criminal/wrongdoer can be excluded from a framework of proactive defence, since by definition he operates outside of the law, most of the time from a jurisdiction with limited enforcement power (in our example Anarchania). That leaves six categories of actors: the user; the user's ISP; the search engine operator; the website operator; the hosting provider and the software vendor.

We however further exclude from the scope of our analysis the following three actors. First, the software vendor is set aside. While it is true that software vendors have a tendency to overlook vulnerabilities since they want to be “first to market”,55 we chose to exclude software vendors from our analysis for the following reasons:

[55] Ross Anderson, “Why Information Security Is Hard - an Economic Perspective”, Computer Security Applications Conference, 2001. ACSAC 2001. Proceedings 17th Annual (2001).

1. The major software vendors, e.g. Microsoft, Oracle and Adobe, now work very hard to ensure their software is secure, and have regular patching schedules to fix vulnerabilities. Liability on software developers would likely have a limited impact in any case, as demonstrated by the continued use of old vulnerabilities in Table 1.
2. On the Web, a large amount of software is open source, which means that it is the result of a complex chain of production and it is not always easy to allocate roles and responsibilities within this chain;
3. The technology does not currently exist to create bug free software. Even where reasonable care is taken to check for vulnerabilities, successfully checking every single line of code (out of millions) is not currently feasible.

Second, we also exclude the users. In many senses, it is the users' “fault” that their machines get compromised, from poor security practices. This might include use of outdated software (to the same effect as the websites), or declining to use any security software. On some occasions, users will actively choose to ignore warnings and view a page known to be malicious. Factors like this mean that many in the security industry have a low opinion of users' ability to look after themselves, and agree with McGraw & Felten's quip that “given a choice between dancing pigs and security, users will pick dancing pigs every time”.56 There is some literature suggesting that users be personally liable for the damage they cause through having unsecured machines.57 This, it is argued, would soon require users to invest properly in securing their machines, therefore reducing the amount of damage they can cause.

[56] Cited by Akhawe and Felt, n19.
[57] e.g. T Luis De Guzman, “Unleashing a Cure for the Botnet Zombie Plague: Cybertorts, Counterstrikes, and Privileges” (2009) 59 Cath. UL Rev. 527; Stephen E Henderson and Matthew E Yarbrough, “Suing the Insecure?: A Duty of Care in Cyberspace” (2002) 32 New Mexico Law Review 11.

This does not represent the whole problem, however. For example, Herley argued that many of the security warnings presented to users are protecting against theoretical rather than actual problems, and that adopting them would lead to a considerable loss in terms of time, outstripping the potential loss from an attack. He also pointed out that the “dancing pigs” comment is unfair, in that users are not actually offered

“security”, but rather are offered a set of complicated guidelines for managing risk.58 Adams & Sasse argue that it is these policies, which are incompatible with working practices, which cause bad security decisions, rather than the users themselves. When users can see the rationale behind security requirements with well-designed software, their security practices are good.59

Third, one might view website operators themselves as being responsible. Like users, they are frequently blameworthy for failing to take adequate security precautions and allowing their websites to become compromised. A requisite standard of security practice could be proposed, and in the event that the operator falls below that, then they could be liable. On the other hand, there are similar issues to open source software developers, in that it is frequently not possible to identify them. Even where they are identified, there is no guarantee that they will have the necessary resources to cover the costs arising from damages. In addition, requiring the individual operators to take responsibility for their websites could conceivably lead to a duplication of effort which it would likely be better for a single hosting provider to undertake. Finally, one of the advantages of the Web has been the low barrier to entry for the creation of content. This being the case, it cannot be assumed that the website owner will necessarily have the technical competence to retain security on their website. For all of these reasons we have also decided to exclude website operators from the groups potentially liable for drive-by downloads.

Some of the remaining categories of actors have contested definitions, so for the avoidance of any doubt the terms will be used in the following way. An ISP is the operator who physically provides access to the Internet. Hosting providers offer many different types of packages for people who wish to host a website, from managing everything including the CMS, like http://wordpress.com, to simply providing hardware and letting the customer do the rest. A service such as Dropbox, which offers the ability to store documents, is not included, because the usual intention is not to publish Web pages, even if it is possible to use it in social engineering attacks.

5.1. Intervention by ISPs

The intervention of an ISP on behalf of its customers has been considered in the past, and is generally regarded as an effective strategy in minimising the effect of users participating in botnets. Unfortunately, an ISP has got limited incentives to perform any security actions on behalf of its users because of the thin margins on which the market is based. The difficulty of enabling users to see and understand tangible benefits from extra security means providers are sceptical about marketing a more secure service. In addition, should an ISP choose to intervene, legal issues arise in terms of users' fundamental rights and liberties such as the right to freedom of expression, including the right to access information (Article 11 of ECHR and Article 11 of the European Charter); the right to private life and data protection (Article 8 of the ECHR and Articles 7 and 8 of the European Charter); and the right to conduct one's business (Article 16 of the European Charter).

The first way in which an ISP could intervene would be by identifying infected customers as their machines start to talk to botnet command and control (C & C) servers, and then by notifying them. This is possible because a customer's Internet traffic is all routed through the ISP, enabling them to see which servers they are communicating with. That a device is infected could also be noticed by other entities, for example those who observed the IP address as part of a denial of service attack against them. However, a likely consequence is that information about the user (i.e. personal data within the meaning of the data protection Directive60) would need to be retained.

[60] Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, Official Journal L 281, 23/11/1995 pp. 31-50, Article 2.

From a right to private life and data protection perspective, the processing, including the collection and retention, of the customer's traffic data, including IP addresses,61 could probably be justified. Under Article 6 of the E-Privacy Directive,62 providers of public communications networks and publicly available electronic communications providers such as ISPs can process traffic data63 for traffic management purposes. This should include the safeguarding of network security and fraud detection,64 and since these can be deemed as an ISP's legitimate interests, no consent from their users should be required.

[61] The Court of Justice of the European Union (CJEU) held in case C-70/10 Scarlet v Sabam of 24 November 2011 [2012] E.C.D.R. 4 (Sabam) that IP addresses, at least in the hands of ISPs, are personal data.
[62] Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector (Directive on privacy and electronic communications), Official Journal L 201, 31/07/2002 pp. 37-47, as amended.
[63] Traffic data is defined very broadly: “‘traffic data’ means any data processed for the purpose of the conveyance of a communication on an electronic communications network or for the billing thereof” (Article 2).
[64] See Recital 39 of the proposed general data protection Regulation. Proposal for regulation of the European Parliament and of the Council on the protection of individuals with regard to the

The justification of the processing of traffic data might be problematic if the data are transferred to third parties, even if it is for the very same purpose: safeguarding network security and fraud detection, since Article 6 only targets data controllers acting under the authority of ISPs. For example, there might be lists of IP addresses of known infected devices, such as those displayed on the website of the Honeynet Project65 or the lists maintained by Spamhaus or other similar companies about addresses which are known to have sent spam,66 or known C & C ZeuS servers which could in many cases be
rights and liberties such as the right to freedom of expression lation. Proposal for regulation of the European Parliament and of
including the right to access information (Article 11 of ECHR the Council on the protection of individuals with regard to the
processing of personal data and on the free movement of such
58
Cormac Herley, “So Long, and No Thanks for the Externalities: data (General Data Protection Regulation) COM(2012) 11 final.
65
The Rational Rejection of Security Advice by Users”, Proceedings of The map was formerly at http://map.honeycloud.net/, but is
the 2009 workshop on New security paradigms workshop (2009). currently not online, 22 July 2014.
59 66
Anne Adams and Martina Angela Sasse, “Users Are Not the Spamhaus, “The Spamhaus Block List” <http://www.
Enemy” (1999) 42 Communications of the ACM 40. spamhaus.org/sbl> accessed July 09, 2014.
500 c o m p u t e r l a w & s e c u r i t y r e v i e w 3 1 ( 2 0 1 5 ) 4 9 0 e5 0 5

ordinary hacked computers.67 Organisations like Team have worked to some extent, though neither has been a stand
Cymru68 also notify ISPs when they detect compromised out success. Microsoft's security report rates them as being
machines coming from the ISP's network. Here an argument better than the worldwide average, though nowhere near the
can be made that in the case of the sharing of personal data to best.76 Also, research by van Eeten demonstrated the difficulty
detect infections informed consent on the parts of the ISPs' faced by ISPs, finding that they succeeded in contacting
subscribers is needed. In any case it is important that the approximately only 10% of their infected customers.77
principles of data minimisation69 and limited duration70 of the At RSA 2010, Scott Charney from Microsoft Trustworthy
Data Protection Directive are complied with by the recipient of Computing suggested that quarantine was something which
the personal data. And information relating to the recipient of should be considered as a solution, applying to the whole
the data should be provided to the subscribers, who should be Internet.78 He argued that infected devices should be treated
able to exercise their rights (e.g. right to access, or like people with an infectious disease, and used a broader
rectification71). public health analogy of “collective defence” as a means of
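The blocklist-based detection just described can be illustrated with a short sketch. The feed path, flow-record layout and field names below are invented for the example rather than taken from any particular ISP or blocklist provider; real feeds such as the ZeuS Tracker or Spamhaus lists each have their own formats and licensing terms.

```python
# Sketch: match subscribers' outbound connections against a list of known
# C & C / infected-host IP addresses, so affected customers can be notified.
# File names and the CSV layout are illustrative only.
import csv

def load_blocklist(path):
    """Read one IP address per line; ignore blank lines and '#' comments."""
    with open(path) as fh:
        return {line.strip() for line in fh
                if line.strip() and not line.startswith("#")}

def subscribers_to_notify(flow_log_path, blocklist):
    """flow_log: CSV rows of (subscriber_id, destination_ip)."""
    flagged = {}
    with open(flow_log_path, newline="") as fh:
        for subscriber_id, dst_ip in csv.reader(fh):
            if dst_ip in blocklist:
                flagged.setdefault(subscriber_id, set()).add(dst_ip)
    return flagged

if __name__ == "__main__":
    cnc_ips = load_blocklist("cnc_blocklist.txt")   # hypothetical feed dump
    for sub, ips in subscribers_to_notify("flows.csv", cnc_ips).items():
        print(f"notify {sub}: contacted known C & C hosts {sorted(ips)}")
```

Even a minimal matcher of this kind makes the data protection point concrete: it only works if per-subscriber destination addresses are collected and retained for some period.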
65 The map was formerly at http://map.honeycloud.net/, but is currently not online, 22 July 2014.
66 Spamhaus, "The Spamhaus Block List" <http://www.spamhaus.org/sbl> accessed July 09, 2014.
67 "ZeuS Tracker :: ZeuS Blocklist" <https://zeustracker.abuse.ch/blocklist.php> accessed July 09, 2014. Following the approach taken by the Article 29 Data Protection Working Party, and the emphasis put on the means available to a reasonable range of actors including the data controllers, it could be argued that IP addresses are also personal data in the hands of organisations working in collaboration with ISPs, such as ZeuS Tracker, Spamhaus, and the Honeynet Alliance described in the preceding text.
68 http://www.team-cymru.org/.
69 Article 6(1)(c).
70 Article 6(1)(e).
71 See Article 12 of the Data Protection Directive.

There are many ways an ISP could choose to notify their customers that their machines have been compromised. These are listed in RFC 6561, written mostly by people affiliated to Comcast, an ISP based in the USA which has attempted to introduce such a system.72 No method is guaranteed to be 100% successful and each has its own set of limitations. For example, an email might be quick and possible to automate, but there is no guarantee that it would ever be read, whether because of spam filters, because users do not use that account, or because they simply ignore it.73 On the other hand, blocking Internet access would alert the user to the problem, but there may not be anything they could do about it, for instance if there was more than one machine on the network, or the infected device did not have any interface with which to solve the problem. Above all, blocking Internet access for notification purposes could be criticised on the ground of freedom of expression, including the freedom to access information, as explained below.

72 Jason Livingood and Nirmal Mody, "Recommendations for the Remediation of Bots in ISP Networks" (2012) <http://tools.ietf.org/html/rfc6561> accessed June 20, 2014.
73 It is also a bad idea to begin getting users into the habit of reading an email and following a link to "fix" the problem with their security, which is already a vector for attacks and for making profit, known as scareware.

There have been initiatives in a few countries to introduce notifications in a manner like this. In Australia, ISPs have voluntarily signed up to a standard which requires them to notify customers (up to and including quarantine),74 and the Dutch anti-botnet treaty covers 98% of the market.75 Both have worked to some extent, though neither has been a stand out success. Microsoft's security report rates them as being better than the worldwide average, though nowhere near the best.76 Also, research by van Eeten demonstrated the difficulty faced by ISPs, finding that they succeeded in contacting approximately only 10% of their infected customers.77

74 "Internet Service Providers Voluntary Code of Practice for Industry Self-Regulation in the Area of Cyber Security" (Internet Industry Association, 2010) <http://iia.net.au/userfiles/iiacybersecuritycode_implementation_dec2010.pdf> accessed June 20, 2014.
75 Donna Buenaventura, "Dutch ISPs Sign Anti-Botnet Treaty" (Spyware, viruses, & security forum, 2009) <http://forums.cnet.com/7726-6132_102-3138000.html> accessed June 20, 2014.
76 Dennis Batchelder (Microsoft) and others, "Microsoft Security Intelligence Report Volume 16: Regional Threat Assessment" (2014) <http://download.microsoft.com/download/7/2/B/72B5DE91-04F4-42F4-A587-9D08C55E0734/Microsoft_Security_Intelligence_Report_Volume_16_Regional_Threat_Assessment_English.pdf> accessed June 20, 2014.
77 Michel JG van Eeten and others, "Internet Service Providers and Botnet Mitigation: A Fact-Finding Study on the Dutch Market" <http://www.rijksoverheid.nl/bestanden/documenten-en-publicaties/rapporten/2011/01/13/internet-service-providers-and-botnet-mitigation/tud-isps-and-botnet-mitigation-in-nl-final-public-version-07jan2011.pdf>.

At RSA 2010, Scott Charney from Microsoft Trustworthy Computing suggested that quarantine was something which should be considered as a solution, applying to the whole Internet.78 He argued that infected devices should be treated like people with an infectious disease, and used a broader public health analogy of "collective defence" as a means of protecting the Internet.79 Posture checking of devices which attempt to connect to a network, and denying them access in the event that they are not regarded as sufficiently secure, is known as Network Access Control (NAC) or Network Endpoint Assessment (NEA), and is already used in corporate networks, particularly where employees might use their devices on other networks which may expose them to security threats.

78 Microsoft, "Scott Charney: RSA 2010 Keynote" (2010) <http://www.microsoft.com/en-us/news/exec/charney/2010/03-02rsa2010.aspx> accessed July 09, 2014.
79 Scott Charney, "Collective Defense: Applying the Public-Health Model to the Internet" (2012) 10 Security & Privacy, IEEE 54.

Assuming quarantining sanctions are indeed imposed on Internet users, the legal assessment gets more complex. In this case, Article 15 of the data protection Directive is of relevance, as it in principle attempts to protect data subjects from automated individual decisions without adequate safeguards.80 In addition, even if ISPs are private actors, because they act as necessary gateways between their subscribers and the whole Internet they are regulated differently from other Internet actors to make sure users' basic rights can be exercised.81 A right to access information (as well as to receive information), including information online, has also been recognised by judges (or through statute) at the national level82 and by the European Court of Human Rights (ECtHR).83 It could thus be argued that if the quarantining amounts to a suspension of Internet access, it requires a judicial proceeding to be implemented.

80 Article 15(1) reads as follows: "1. Member States shall grant the right to every person not to be subject to a decision which produces legal effects concerning him or significantly affects him and which is based solely on automated processing of data intended to evaluate certain personal aspects relating to him, such as his performance at work, creditworthiness, reliability, conduct, etc."
81 See e.g. the directives comprising the Telecoms rules and in particular Directive 2009/136/EC of the European Parliament and of the Council of 25 November 2009 amending Directive 2002/22/EC on universal service and users' rights relating to electronic communications networks and services, Directive 2002/58/EC concerning the processing of personal data and the protection of privacy in the electronic communications sector and Regulation (EC) No 2006/2004 on cooperation between national authorities responsible for the enforcement of consumer protection laws, Official Journal L 337, 18.12.2009, pp. 11–36.
82 See e.g. Conseil Constitutionnel Decision no. 2009-580 of June 10th 2009, at http://www.conseil-constitutionnel.fr/decision/2009/2009-580-dc/decision-n-2009-580-dc-du-10-juin-2009.42666.html. In Estonia a law was adopted in 2000 stating that Internet access was a human right. See also the situation in Finland in 2009.
83 See e.g. ECtHR Ahmet Yildirim v Turkey (Application no. 3111/10) of 18 December 2012 (Yildirim) at [53].

Another approach an ISP could take would be to physically prevent their users from visiting infected pages. There is precedent for ISPs blocking websites in the UK: Cleanfeed is reportedly used to prevent access to child pornography using the list maintained by the Internet Watch Foundation.84 Similarly, there have been a few court cases where ISPs have been handed injunctions requiring them to block access to websites encouraging copyright infringement, such as the Pirate Bay.85 A recent agreement was also reached between the government and major ISPs that they will require an "opt-in" from their subscribers before they will display (legal) pornographic websites.86 However, applying this blocking technique to drive-by downloads raises a host of legal and technical difficulties.

84 IWF, "Remit, Vision and Mission" <https://www.iwf.org.uk/about-iwf/remit-vision-and-mission> accessed July 09, 2014.
85 See e.g. Twentieth Century Fox v British Telecom [2011] EWHC 1981 (Ch) and Dramatico Entertainment Ltd v British Sky Broadcasting [2012] EWHC 268 (Ch).
86 BBC, "Online Pornography to Be Blocked by Default, PM Announces" (2013) <http://www.bbc.co.uk/news/uk-23401076> accessed July 22, 2014.

First, there is the usual issue of data protection and the right to private life, since this requires the ISP to monitor the customers' traffic in order to determine which websites they are visiting, so as to prevent access to malicious sites.

Second, it is true that preventing access to malicious domains is relatively trivial from a technical point of view. A domain name is required to identify a website, and in general Web use would not be possible without this. An ISP could use DNS to simply associate the domain name of a website with an error page.87 However, such a blocking technique is likely to amount to over-blocking and to become problematic from the perspective of freedom of expression.88 Consider, for example, a single page on Wikipedia being compromised: using DNS in this situation would then block the whole website of thousands of pages!
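A toy resolver override makes the over-blocking problem visible. The sketch below, with invented domain names and documentation-range addresses, applies a block at the only granularity DNS offers, the whole host name, so every page served under a blocked name is redirected to the error page, exactly the Wikipedia scenario just described. Production deployments would typically use resolver features such as response policy zones rather than code like this.

```python
# Conceptual sketch of DNS-level blocking: the resolver answers for whole host
# names, so redirecting one name to an "error page" address removes every page
# served under it. All names and addresses are documentation examples.
BLOCKED_DOMAINS = {"compromised-example.org"}
ERROR_PAGE_IP = "192.0.2.1"

REAL_RECORDS = {
    "compromised-example.org": "198.51.100.7",
    "unrelated-example.com": "203.0.113.9",
}

def resolve(hostname: str) -> str:
    """Address a customer receives from the ISP's (modified) resolver."""
    if hostname in BLOCKED_DOMAINS:
        return ERROR_PAGE_IP        # one compromised page blocks the whole site
    return REAL_RECORDS[hostname]

print(resolve("compromised-example.org"))   # 192.0.2.1 for every URL on the site
print(resolve("unrelated-example.com"))     # 203.0.113.9, unaffected
```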
87 This would probably not be possible for DNS servers which make use of the additional security of the DNSSEC protocol.
88 See e.g. the Yildirim case at [55], in which the Court considered that the blocking measure at stake, which blocked access to all Google Sites including the applicant's website, amounted to an interference by a public authority with the applicant's right to freedom of expression, "of which the freedom to receive and impart information and ideas is an integral part …". It is true, however, that this is a case concerning a negative obligation and not a positive obligation, but it could be argued that public authorities should have a duty to make sure ISPs do not systematically engage in over-blocking activities, and should thereby take the initiative to regulate their blocking practices.

Trying to be more accurate and only target malicious or vulnerable pages is trickier. Because the majority of websites which contain a drive-by download are legitimate websites, the nature of the sites can change from benign to malicious on a regular basis, requiring the ISP to continually check websites for malicious content. This is different from file sharing or pornographic websites, which will in general remain in the same category and change status less often. A central list could be maintained, but it is not clear who should be responsible for this, and what recourse a website would have if it were (wrongly or otherwise) put on it.89 Alternatively, it would potentially require every single ISP to check every single Web page on a regular basis to see if it had changed state from malicious to benign, or benign to malicious, which would generate a considerable amount of useless traffic and require a significant amount of resources.

89 See the interesting cases of e360 Insight, Inc v Spamhaus Project (2011) 658 F 3d 637; Zango, Inc v Kaspersky Lab, Inc (2009) 568 F 3d 1169.

At this stage, the freedom of website operators to conduct their business (as per Article 16 of the European Charter) is at stake, as well as the right to freedom of expression of both the users and the producers of content. The ability to conduct one's own business, in particular if one is the victim of a false positive in terms of compromised or vulnerable status, is likely to be jeopardised in a significant number of situations.

A final issue to note in relation to the blocking of websites by ISPs is that it is possible for a user to visit a website without any consequences. In particular, if someone has all their software fully patched they are unlikely to fall victim, since the majority of drive-bys use already known vulnerabilities. Additional precautions can be taken, such as running the browser in a sandbox, or from a virtual machine, which makes it a lot more difficult for malware to permanently affect the machine.90

90 In this context, a sandbox and a virtual machine perform a similar function. A sandbox prevents the execution of code outside of the area in which the specific application is running. A virtual machine is used to run another operating system from within an operating system, and its state can be saved, so in the event the machine becomes infected it is possible to revert to a known good state at the end of a session.

5.2. Intervention by search engines

Search engines provide listings of websites in order of their relevance to the search terms provided by the user. The most popular search engine is Google, with close to 90% of the search engine market in the UK. Their PageRank algorithm gives the rank of a website by working out the number of other Web pages linking to it, but also adds to the calculation a determination of the quality of the links to it.91 In order to do this, the search engine will send crawlers to every website it has listed and traverse the links on each Web page. A website will usually specifically register to be indexed in this way, or if there are enough links to a page then that can be enough for it to be included in the search engine results.
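The intuition behind PageRank, that a page matters if pages that matter link to it, can be shown with a small, self-contained sketch. This is the simplified textbook iteration with a damping factor, not Google's production algorithm, and the tiny link graph is invented for illustration.

```python
# Minimal PageRank sketch: a page's score depends on how many pages link to it
# and on the scores of those pages, damped so rank cannot grow without bound.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the set of pages it links out to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page) or pages   # dangling page: spread evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Invented four-page "web": the page that attracts the most links ranks highest.
demo_links = {
    "home":    {"about", "article"},
    "about":   {"home"},
    "article": {"home", "about"},
    "obscure": {"article"},
}
for page, score in sorted(pagerank(demo_links).items(), key=lambda kv: -kv[1]):
    print(f"{page:8s} {score:.3f}")
```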
91 Lawrence Page and others, "The PageRank Citation Ranking: Bringing Order to the Web."

A search engine does have an incentive to eliminate drive-by downloads, because they reduce the quality of the search results, and consequently damage the product it is attempting to sell.92 A lot of work is already done by these companies: for example, the results discussed earlier, presented by Provos et al.,93 are from Google. Google also provide a "Safe Browsing" service, which allows developers to check pages they are about to visit to see if a page is listed as being malicious. This is also integrated into some browsers, like Firefox and Chrome. An inevitable consequence of indexing almost all websites that exist on the Web is that search engines are in a position to see pages which are vulnerable, or which have already been compromised. This arguably places them in an ideal position to mitigate the damage which can occur from drive-by downloads.

A search engine could include elements of security posture as part of its ranking algorithms, to make it more difficult for users to find allegedly malicious websites in their search results. This was examined by Edwards et al. in an approach which they referred to as "depreferencing".94 The depreferencing approach was that the ranking of websites should take into account the likelihood that the websites were compromised, and lower their ranking based on how certain the classifier was of that classification. It is something which Google did do in a related area in 2011, where their algorithm was changed to limit the exposure of advert-filled websites.95 Higher exposure in trending topics and search results is a good thing for attackers,96 so to limit that exposure is likely to reduce the number of users infected. Edwards et al. do warn about depreferencing, though, that increasing tolerance of false positives could lead to the system being gamed, with business rivals seeking to plant malicious code on their rivals' websites.97
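One way to read the depreferencing idea is as a simple re-scoring step at query time. The sketch below is only an illustration of that idea, not the model used by Edwards et al.: it multiplies a page's relevance score by a penalty derived from an (assumed, externally supplied) probability that the page is compromised and the classifier's confidence in that estimate.

```python
# Illustrative depreferencing: demote results in proportion to how likely,
# and how confidently, they are believed to be compromised.
def depreferenced_score(relevance, p_compromised, confidence):
    """relevance: base ranking score; p_compromised, confidence: values in [0, 1]."""
    penalty = 1.0 - (p_compromised * confidence)
    return relevance * penalty

results = [
    # (url, relevance, estimated probability of compromise, classifier confidence)
    ("http://popular-but-hacked.example", 0.92, 0.80, 0.9),
    ("http://clean-site.example",        0.85, 0.05, 0.9),
    ("http://uncertain-site.example",    0.70, 0.60, 0.3),
]
ranked = sorted(results, key=lambda r: depreferenced_score(r[1], r[2], r[3]),
                reverse=True)
for url, *_ in ranked:
    print(url)   # the likely-compromised site drops below the clean one
```

The gaming risk Edwards et al. highlight would, in these terms, amount to an attacker pushing a rival's estimated probability of compromise upwards.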
92 Benjamin Edwards and others, "Beyond the Blacklist: Modeling Malware Spread and the Effect of Interventions", Proceedings of the 2012 Workshop on New Security Paradigms (ACM 2012) <http://dl.acm.org/citation.cfm?id=2413302> accessed February 09, 2014.
93 Provos and others, "The Ghost in the Browser: Analysis of Web-Based Malware", n16.
94 Benjamin Edwards and others, "Beyond the Blacklist", n92.
95 Moore, Leontiadis and Christin, n21.
96 Ibid.
97 Benjamin Edwards and others, "Beyond the Blacklist", n92.

There are, however, some limitations to the ability of a search engine to screen and delist malicious websites.

Like any analysis of malware, a search engine needs to choose between false positive and false negative errors. The consequences of falsely declaring a website to be malicious could potentially be serious for the site in question, so search engine providers need to be conservative about which pages they choose to classify as malicious in order to avoid mistakes.

On the other hand, one could argue that, as regards the freedom of expression of Internet users including content providers and the freedom of content providers to conduct their business, search engines should be treated differently from ISPs, since their activities are not essential to get access to the Internet and delisting is not tantamount to blocking.

However, erroneous declarations as regards the maliciousness of certain websites could be considered defamatory, as was illustrated in the case of e360 Insight v The Spamhaus Project.98 The Spamhaus Project had to fight a five year legal battle to defend against a defamation claim. The plaintiff, a "marketing company", claimed that Spamhaus defamed them by classifying them as spammers.99

98 e360 Insight, Inc v Spamhaus Project (2011) 658 F 3d 637. The Spamhaus Project is different from a search engine, though, in that it compiles lists of domains which are known to send spam, so its listing is a blacklist rather than a list of results. See also Zango, Inc v Kaspersky Lab, Inc (2009) 568 F 3d 1169.
99 Ibid.

In addition, if the search engine is in a dominant position on the market, then anything it does will be scrutinised for potential abuses of this position. This is particularly an issue for Google, who have a market share of around 90% in some countries.100 Given their dominance, by excluding a website from their results they severely impact that website's ability to participate in whichever market it aims to compete in. Even by using depreferencing, they risk accusations of bias; and indeed, Google has been investigated in several jurisdictions for unfairly placing its own sites higher than competitors'.

100 BBC, "Google's Market Share 'Dips below 90%' in UK" (2012) <http://www.bbc.co.uk/news/technology-20222085> accessed June 20, 2014.

In the USA the FTC investigated "whether Google manipulated its search algorithms and search results page in order to impede a competitive threat posed by vertical search engines". The investigation noted that enhancements made to Google's algorithms were done to improve the product, and even though they hurt rivals there was no anticompetitive action by Google.101 Google were also investigated by the EU Commission, and agreed to make changes to the display of their results rather than continuing with an adversarial procedure.102

101 FTC, "Statement of the Federal Trade Commission Regarding Google's Search Practices In the Matter of Google Inc. FTC File Number 111-0163" (2013) <http://www.ftc.gov/sites/default/files/documents/public_statements/statement-commission-regarding-googles-search-practices/130103brillgooglesearchstmt.pdf> accessed June 20, 2014.
102 Joaquin Almunia (European Commission), "Statement on the Google Investigation" (2014) <http://europa.eu/rapid/press-release_SPEECH-14-93_en.htm> accessed June 20, 2014.

In reality, though, the circumstances of blocking or 'depreferencing' a malicious Web page are different. The cases brought by the Commission and the FTC related to abuses against Google's competitors, rather than the mere effects of pursuing some action which affects markets in which Google is not a participant. Search engine intervention will only be effective, and thereby reduce the exposure to malicious or vulnerable websites for a significant number of people, if all search engines with a high market share implement it, so a dominant provider such as Google is an advantage from a practical point of view.

There are also some technical limitations to making search engines intervene, on top of the legal constraints. Displaying warnings or tweaking algorithms does not mean that the website cannot be accessed. As mentioned above, Akhawe found that between 9 and 23% of people ignored browser warnings about websites.103 Even at the lower bound of 9%, this is a not insignificant amount of traffic continuing to visit these websites.

103 Akhawe and Felt, n19.

In addition, it is possible for malware to hide from search engines, such as the IP-centric malware which was discussed earlier. This is also the case where the search engine cannot see parts of the website. There is a protocol which, though not officially recognised, is commonly followed, and consists of keeping a page called robots.txt in the main folder of a website (as in www.example.com/robots.txt) indicating which automated scanners are permitted to visit the website, and which pages of the website they are allowed to visit. A scanner identifies itself through its user agent, for example Googlebot or Bingbot, which identify Google or Bing, the two main search engines. On most occasions website operators will be keen to have these search engines visit their website, but in the event that they are not, then the search engines will follow the protocol and decline to visit certain pages.
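Python's standard library contains a parser for this convention, which makes it easy to see how a well-behaved crawler decides whether it may fetch a page. The domain and paths below are placeholders, and the robots.txt shown in the comment is an assumed example rather than one taken from any real site.

```python
# Checking the robots.txt convention the way a polite crawler would.
# Suppose http://www.example.com/robots.txt contained:
#
#   User-agent: Googlebot
#   Disallow: /private/
#
#   User-agent: *
#   Disallow: /
#
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("http://www.example.com/robots.txt")
parser.read()   # fetches and parses the file over the network

for agent, url in [
    ("Googlebot", "http://www.example.com/articles/1.html"),
    ("Googlebot", "http://www.example.com/private/admin.html"),
    ("SomeOtherBot", "http://www.example.com/articles/1.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

A page disallowed in this way simply never enters the index, which is exactly why it can still be used to host malware reached from emails or social media, as the next paragraph notes.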
Pages which the search engines do not see would not necessarily be high in the rankings, but could still be used to host malware linked to from emails or social media. As discussed earlier, phishing emails use compromised websites in some 90% of cases,104 and social media websites are becoming increasingly popular with attackers because, in part, of the higher conversion rate.105 Research has shown that some 8% of all URLs on Twitter are spam of some description. Facebook is also targeted through clickjacking, where a "like" button or similar is hidden on a (spam) page a person visits, which persuades the victim's friends to visit the page as well.106

104 Rasmussen and Aaron, n28.
105 See e.g. Moore, Leontiadis and Christin, n21, and Eva Zangerle and Günther Specht, "'Sorry, I Was Hacked': A Classification of Compromised Twitter Accounts."
106 Mohammad Reza Faghani, Ashraf Matrawy and Chung-Horng Lung, "A Study of Trojan Propagation in Online Social Networks", New Technologies, Mobility and Security (NTMS), 2012 5th International Conference on (2012).

5.3. Intervention by hosting providers

Typically, the responsibility for managing the website and running the Web server will be split. Often the people who are concerned with the content of the website will not have much technical knowledge. As such, it has been shown that, following the exploitation of one of these vulnerabilities and the subsequent compromise of the website, it can take a while for the operator of the website to notice the problems and fix them. StopBadware and Commtouch conducted a survey of website operators whose sites were compromised, finding that 63% of their sample did not know how their website got compromised, and only 6% were able to identify the fact that their website had been compromised. Almost half were only notified when they were faced with a browser warning screen. Regarding recovery, some 26% of the websites remained compromised, and only 46% were able to solve the problem themselves.107

107 StopBadware and CommTouch, "Compromised Websites: An Owner's Perspective" (2012) <https://www.stopbadware.org/files/compromised-websites-an-owners-perspective.pdf>.

Arguably, hosting providers are in a better position than website operators to discover the vulnerabilities, not least because of their technical expertise. There are different models which a hosting provider can offer to a customer, from simply renting out the hardware to fully managing the content management system (CMS). Often the model will be that the hosting provider provides the hardware and the operating system, and grants the customer permissions to host their website on that. In addition, the hardware and operating system are often shared between many customers. This model means that the hosting providers have a greater amount of privileges, and therefore control over running processes, and hence are in a better position to run software to monitor changes to the file system or uploads to the server. Firewalls can also be deployed, to protect both the infrastructure and the Web application itself. Web application firewalls seek to block malicious traffic characteristic of the vulnerabilities described above.

Particularly in a shared hosting environment, it appears far more efficient for a single entity (i.e. the hosting provider) to deal with security aspects. The tasks are largely the same, and can therefore be repeated across different accounts with little duplication of effort.

Whether hosting providers are currently doing a good enough job is open to question. From the responses received by StopBadware in their survey, there was general dissatisfaction at the lack of support from the hosting providers once websites become compromised.108 Some believed that the host was directly at fault, in that the host itself becoming compromised led to all the customers on the server being compromised as well. This was a motivation behind a study by Canali, Balzarotti and Francillon, who examined the ability of popular Web hosts to deal with attacks and compromises,109 and which appeared to show that the security protection on offer is inadequate. They created websites at a series of different providers, and simulated different types of attacks on them in order to see how the hosting providers responded. Whilst some providers were able to block or mitigate some of the attacks, none of them provided complete protection. For those which failed to mitigate these attacks, the researchers made an abuse report to the provider, which is a complaint to the hosting provider about inappropriate content on the websites. Of those, only 50% of hosting providers replied to the report, and only one chose to notify the website operator.

108 Ibid.
109 Davide Canali, Davide Balzarotti and Aurélien Francillon, "The Role of Web Hosting Providers in Detecting Compromised Websites", Proceedings of the 22nd International Conference on World Wide Web (2013).

The approach taken by Antagonist, a Web hosting provider performing vulnerability scans on the websites which they host, is a promising idea which could be more widely adopted. In the event that a website is found to be vulnerable, the operator should be notified and given an opportunity to fix it. This might be anything from simply upgrading the CMS to completely changing the code used to communicate with the database.110

110 In order to be effective, this would need to ignore robots.txt as described in Section 5.2, but this is something which could be included as part of the terms of service between the hosting provider and the website operator.
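The kind of provider-side monitoring described above, watching customer webroots for unexpected new or modified files, can be prototyped with nothing more than the standard library. The sketch below is a deliberately simple baseline-and-compare integrity check; the directory path is a placeholder, and a real deployment (Antagonist's or anyone else's) would combine it with upload scanning, vulnerability scanning and a Web application firewall.

```python
# Minimal file-integrity monitor for a hosted webroot: record a baseline of
# SHA-256 hashes, then report files that appear or change (e.g. injected scripts).
import hashlib
import json
from pathlib import Path

def snapshot(webroot):
    hashes = {}
    for path in Path(webroot).rglob("*"):
        if path.is_file():
            rel = str(path.relative_to(webroot))
            hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes

def compare(baseline, current):
    added = sorted(set(current) - set(baseline))
    changed = sorted(f for f in baseline if f in current and baseline[f] != current[f])
    return added, changed

if __name__ == "__main__":
    webroot = "/var/www/customer42"            # hypothetical customer account
    baseline_file = Path("baseline.json")
    if not baseline_file.exists():
        baseline_file.write_text(json.dumps(snapshot(webroot)))
    else:
        baseline = json.loads(baseline_file.read_text())
        added, changed = compare(baseline, snapshot(webroot))
        for f in added:
            print("new file:", f)
        for f in changed:
            print("modified:", f)
```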
In the event that the operator chooses not to respond, or does not make a fix quickly enough, then access to the site could be blocked. This could be done by DNS, which is the system for converting domain names (like www.example.com) to IP addresses the computer can understand (like 152.78.64.12). It is possible for someone to browse to a website using the IP address, but it would require that they knew it, and the site would not typically appear in search results, consequently minimising exposure to the page whilst allowing it to be fixed. Like search engines, hosting providers would need to be wary of false positives and false negatives, but they would not need to be concerned about defamation to the same extent, since they are in a contractual relationship with the websites they host.

There is also a danger that a system like this could cause complacency among website operators and actually increase the danger of compromise. A similar phenomenon has been observed on the client side, with Christin et al. finding a correlation between anti-virus installation and malware infections.111 Similarly, WhiteHat Security hypothesised that complacency was a reason for a higher level of vulnerability in websites which used static source code analysis tools compared to those which did not.112

Automated scans will not catch all vulnerabilities, of course, since some vulnerabilities will require very specific input to exploit them, but a reduction in vulnerabilities will naturally make the job of an attacker more difficult and therefore reduce the requirement for ISP or search engine monitoring.

Consider in economic terms the plight of an attacker in choosing which websites to attack. He will seek to maximise his profit, and will hence seek to gain the maximum amount of traffic for his effort. At present, given that vulnerabilities are known, and it is possible to fingerprint websites to see which software they are running, he can run his own automated scan and any vulnerabilities will be found. The homogeneity of the Web and software industry means that existing vulnerabilities will work for potentially thousands of websites, therefore minimising the work the attacker has to do. In the event that all known vulnerabilities are patched, the attacker is left with two choices. Firstly, he can visit websites manually and attempt to find the vulnerabilities which automatic scanners miss, losing the advantages of automation which computers offer. Secondly, he can attempt to find new vulnerabilities (zero-days) on his own and use an automated scanner to find sites vulnerable to them.113 These vulnerabilities take considerable resources to find, and their value decreases rapidly once they become known, and as such they tend to be used for targeted attacks rather than drive-by downloads.
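The fingerprinting step mentioned above is cheap for attackers and defenders alike, because many sites announce what they run. The sketch below, intended only for sites one operates or is authorised to test, looks at a few common give-aways: the Server and X-Powered-By response headers and the HTML "generator" meta tag. It is an illustration, not a complete fingerprinting tool, and the URL is a placeholder.

```python
# Rough fingerprinting of the software behind a website from headers and markup.
import re
import urllib.request

def fingerprint(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        headers = response.headers
        html = response.read(200_000).decode("utf-8", errors="replace")
    clues = {
        "server": headers.get("Server"),
        "x-powered-by": headers.get("X-Powered-By"),
    }
    match = re.search(
        r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)',
        html, re.I)
    clues["generator"] = match.group(1) if match else None
    return {k: v for k, v in clues.items() if v}

# Example (placeholder URL; only probe sites you are permitted to test):
print(fingerprint("https://www.example.com/"))
```

A version string such as an outdated CMS release is exactly the kind of signal a hosting provider could also use to warn the operator before an attacker's scanner finds the same thing.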
111 Nicolas Christin and others, "It's All about the Benjamins: An Empirical Study on Incentivizing Users to Ignore Security Advice" [2012] Financial Cryptography and Data Security 16.
112 WhiteHat Security, n34.
113 Whilst still competing against other attackers, further minimising their income.

Making hosting providers detect vulnerabilities could therefore have a significant effect on attackers who rely on scalable, indiscriminate attacks with a low yield (standard phishing emails), and on targeted attackers who are prepared to invest additional effort in order to obtain a higher pay-off (e.g. spear phishing someone with access to a corporate bank account). Automation is a key part of the scalability of attacks, and might suggest why the majority of people never get attacked despite bad security practices: they fall above the minimum "attack everyone" model, whilst falling below the required pay-off for a targeted attack.114 An example of a scalable attack like this was presented by Kanich et al., who discovered that sending 350 million spam emails sold only $2800 worth of pharmaceutical product. They suggested as a possibility, based on their results, that the attackers might only just be making a profit.115 This does not suggest that the whole of the cybercrime ecosystem is based on such narrow margins, but it does raise the interesting possibility that a slight increase in costs to the attacker could be enough to reduce the incentives for participation.

114 Cormac Herley, "The Plight of the Targeted Attacker in a World of Scale", WEIS (2010).
115 Chris Kanich and others, "Spamalytics: An Empirical Analysis of Spam Marketing Conversion", Proceedings of the 15th ACM Conference on Computer and Communications Security (2008).

The monitoring of hosted websites to detect malicious web pages should easily be justified for data protection purposes: in such a case the hosting provider would pursue a legitimate interest as per Article 7 of the Data Protection Directive. Ensuring the security of one's network is deemed to be a legitimate interest by Recital 39 of the General Data Protection Regulation. Besides, this is a distinct process from crawling the website: there is no need for the hosting provider to view any data stored on the server at all. It is possible to trivially identify the versions of software running on the server, and attacks could simply be performed in a "proof of concept" manner, e.g. attempting to inject the SQL command SELECT 'abc' WHERE …, which would literally select the characters 'abc' under a certain condition. If the vulnerability is successfully exploited, then at some point on the page these characters will appear.
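A harmless marker-based probe of this sort can be scripted in a few lines. The target URL and parameter below are placeholders, and the probe should only ever be pointed at sites the provider hosts or is otherwise authorised to test; it submits a split marker joined by a SQL concatenation operator and checks whether the joined form is reflected back, along the lines of the 'abc' example in the text.

```python
# Proof-of-concept injection probe: submit a split marker joined by the SQL
# string-concatenation operator and check whether the *joined* marker comes back.
# A plain echo of the input cannot produce the joined form; only evaluation by
# the database can. '||' is standard SQL concatenation; other engines differ.
# For use only against sites you host or are authorised to test.
import urllib.parse
import urllib.request

LEFT, RIGHT = "zq4x", "probe"      # joined form "zq4xprobe" is never sent literally

def looks_injectable(base_url, param):
    payload = f"{LEFT}'||'{RIGHT}"
    query = urllib.parse.urlencode({param: payload})
    with urllib.request.urlopen(f"{base_url}?{query}", timeout=10) as response:
        page = response.read().decode("utf-8", errors="replace")
    return (LEFT + RIGHT) in page

# Hypothetical search page on a hosted customer site:
if looks_injectable("https://customer-site.example/search.php", "q"):
    print("joined marker reflected: input may reach the database unescaped")
else:
    print("no evidence of injection (which is not proof of safety)")
```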
What could be slightly more problematic would be the decision to block access to an allegedly malicious website if the website operator does not implement the necessary security measures. Automated individual decisions under Article 15 of the data protection Directive would need to be explained to users, who should be given an opportunity to oppose the decision. What is more, the blocking of a website would also have repercussions for the exercise of the right to freedom of expression and the right to conduct one's business. This being said, because hosting providers are private actors and are not providers of publicly accessible electronic communications services, they should enjoy more freedom in terms of private ordering through the means of their terms of use. Besides, from the perspective of freedom of expression, restricting the activity of content providers might be more acceptable than restricting the activity of content consumers, those willing to access content.

Assuming hosting providers should bear a duty to take precautionary measures in relation to the security of their platform, and the security only, and in particular to scan for the presence of malicious web pages, it is nonetheless arguable that Article 15 of the E-Commerce Directive as it stands would impede the discovery of such a duty. Article 15(1) provides that "[m]ember States shall not impose a general obligation on providers, when providing the services covered by Articles 12, 13 and 14, to monitor the information which they transmit or store, nor a general obligation to actively seek facts or circumstances indicating illegal activity". But because the scanning would be implemented to discover malware and malware only, it could maybe be sustained that the monitoring would remain limited in scope and only amount to a special monitoring obligation.

Combining Article 15 and Article 14 of the E-Commerce Directive would however mean that the only way one could impose a duty to take precautionary measures upon hosting providers, which in principle should not be financially liable for the information put on their systems by their users,116 would be to issue an injunction against each of them. This is where it might be necessary to modify the scope of internet intermediaries' liability exemptions to allow for the development of best practices in the field of cybersecurity, while making sure that when these providers take reasonable measures but eventually make a mistake bona fide they are not held financially liable, unless they do not act promptly to correct the mistake.117

116 Article 14 provides that "Where an information society service is provided that consists of the storage of information provided by a recipient of the service, Member States shall ensure that the service provider is not liable for the information stored at the request of a recipient of the service" as long as they do not have actual knowledge of the illegal content they are hosting. To be more precise, the immunity is set forth in such a way that if hosting providers take the initiative to scan their platforms they are likely to lose their immunity. Therefore they are actually given an incentive to remain passive. See CJEU, 23 March 2010, C-236/08, C-237/08, C-238/08, Google France SARL and Google Inc. v Louis Vuitton Malletier SA, Viaticum SA, Luteciel SARL, CNRRH [2011] All E.R. (EC) 411 at [113] (Google v Vuitton); CJEU, 12 July 2011, C-324/09, L'Oréal SA et al v eBay International AG et al [2012] All E.R. (EC) 501 at [113] (L'Oréal v eBay). In L'Oréal v eBay the CJEU stated that "where a service provider, instead of confining itself to providing that service neutrally by a merely technical and automatic processing of the data provided by its customers plays an active role of such a kind as to give it knowledge of, or control over, those data" the liability exemption will not apply.
117 S. 230 of the US Communications Decency Act of 1996 does contain a good Samaritan provision for providers of interactive computer services taking down offensive content, but because this provision is coupled with a very broad immunity for third-party "speech", providers of interactive computer services are in fact given an incentive to remain passive. See 47 U.S.C. § 230.

6. Conclusions

We have presented a background to one of the major security issues faced by Web users today, that of drive-by downloads. This is different from previous types of attack, in that it relies on the user visiting the malicious page, rather than it being possible for the network simply to place a firewall and prevent attacks from coming in. Current approaches have focused on mitigating the damage of compromised websites by protecting users' machines, or minimising their exposure, e.g. through warning messages. Whilst good as a supplemental approach, this has been shown not to be enough. A more proactive approach is required. Such an approach cannot rely on the users themselves (be they content producers or content consumers), because in many cases it is not possible for them to protect themselves. In particular, many devices are constrained by either hardware or software and are hence unable to upgrade to the latest version. In addition, it is counter-intuitive for many users that they might need to do anything else after having installed the software.

The attention must be concentrated on the server side, in other words upon hosting providers, which are in a better position than ISPs running access networks or search engines to prevent the compromising of vulnerable websites, by identifying at an early stage the presence of vulnerabilities on the websites they host. In addition, from a legal perspective, the screening and eventually the blocking of compromised websites by hosting providers appears to be less problematic than the intervention of ISPs or search engines.

Unfortunately the system of immunities to be found in the E-Commerce Directive does not give hosting providers the right incentive. In order to benefit from Article 14 they are required to be neutral, which seems to mean that they cannot exercise any type of control over the content they host. Yet the systematic screening of the content hosted on hosting providers' platforms is likely to be characterised as active monitoring. It remains to be seen if there is a market for services supplied by hosting companies like Antagonist, a market which is likely to stay a niche, though, and be of interest only to computer-savvy users.