therefore needed. So when providing users with support, we need to think beyond simply telling them whether something is phishing or not and instead focus on helping them leverage their contextual knowledge of the situation, in conjunction with available data, to reach a decision.

When judging the safety of a URL, experts generally have more experience and data sources to draw from, but in the end they look for discrepancies between the data and their expectations [84]. They can collect the data using tools like WHOIS (ICANN’s domain lookup) to learn about the registered domain owner, or understand the implications of a link’s up-time and popularity. However, using such tools requires an impressive amount of both access to information and knowledge about how to interpret it. Each URL case also requires a slightly different set of information sources and knowledge, making it overly burdensome to train users to make such judgments on their own.

Our goal is to support end users so that they can engage in some of the informed reasoning experts currently use when they want to decide on a URL’s safety. More specifically, we want to take existing information sources, along with knowledge about how to interpret those sources, and use them to help end-user decision making. To do so, we started with a grid-based report structure inspired by the Privacy Nutrition Labels work by Kelly et al. [38]. The grid presents the user with information about the URL, drawn from existing research on the URL features that are likely to be the most useful to humans [5], and is annotated with explanations aimed at helping users interpret the information. We then iterated on its design with the assistance of 8 focus groups consisting of end users, security experts, and design experts to simplify the interface and improve the explanation of features. After we created a stable design, we analyzed what it would look like on 4640 URLs from two phishing datasets and two safe URL datasets. The goal was to determine if there was any redundant information on the report and also if the features we chose generally align with the safety state of the URLs. Finally, we ran a user study with 153 Prolific users to determine if they could correctly understand the report contents and make accurate safety judgments.

We found that focus group participants saw how such a report could be useful in cases where they were unsure about a communication. The later focus groups also found the report design useful and informative. Final versions of the report featured only showing relevant information, plus colors to help users know where to focus. In our analysis of how the report would look with real URLs, we found that for most URLs the report only needed to show about 7 of the 23 possible information rows, greatly limiting the user’s reading burden. The colors also tended to align with the URL being phishing or not. The online participants and focus group participants exhibited similar interpretations of the various report elements, suggesting that our richer focus group data was a good representation of what Prolific users also thought.

2 DECIDING IF A URL GOES WHERE THE USER THINKS IT GOES

Phishing communication often works by convincing the user that the message they received is from a legitimate group they want to interact with. Examples include their bank, their IT department, their email provider, their gaming account, or a lottery site they just won money on. Attackers are also often interested in account credentials, so it is useful for them to mimic existing services to trick users into entering their credentials or providing other sensitive data that the user might normally only give to a trusted party. One side effect of this approach is that the user has strong expectations of where they think they are going when they click on any links. The other side effect is that, except in very rare situations, the attacker does not have access to the company’s real URL and instead must set up a fake one, so the URL the user clicks on is owned by someone other than the group the user thinks they are interacting with.

In a review of URL phishing features used by humans and by automated systems, Althobaiti et al. [5] observed that the domain part of the URL is the most used feature in human-based detection because humans can compare it against their expectations. It is less useful for computers because the computer has to guess whether the URL matches the content of the communication. The problem is that while, in theory, combining the domain with contextual knowledge should make phishing easy for humans to detect, in practice people struggle to accurately parse URLs [2], making the comparison extra challenging.

2.1 Mouse over the link and look at the URL

One common piece of advice users are given is to mouse over links in communications and look at where they go [33, 54, 68]. This is good advice, especially in cases where the URL is very different from expectations, such as an email, supposedly from PayPal, containing a moonstone235432.net link. But the advice gets harder to follow if the attacker uses any of a wide range of tricks [22, 26, 49, 89].

Figure 1: Example URL along with its structure.

As Figure 1 shows, a URL is made up of many elements which impact its destination and can be easy for end users to confuse. For phishing, the most important element to look at is the host name, particularly the domain [12, 20, 39, 43, 45, 74, 90]. This part of the URL controls which server will be contacted to fetch the page: essentially, who controls the page. In order to divert the user to a page they control, the attacker must specify their own domain and use tricks to make it look legitimate. We detail some of the tricks here and refer the reader elsewhere for a more comprehensive overview [6, 22, 26, 64].

The simplest and oldest approach is to use a complicated-looking domain name, such as a raw IP address or hex or decimal characters, instead of the real one [26, 47]. A slightly more advanced approach is to pick a domain that looks visibly similar to the real one but is actually different [26, 49, 78, 89]. Even skilled security experts have difficulties with this kind of deception [17, 24]. For example, in so-called homograph attacks, English characters are substituted with identical-looking UTF8-encoded characters from different alphabets, such as páypal.com and paypal.com [22, 25, 69].

I Don’t Need an Expert! Making URL Phishing Features Human Comprehensible. CHI ’21, May 8–13, 2021, Yokohama, Japan

Another example of a look-alike attack is misspelling (typosquatting). A classic example is substituting characters like ‘vv’ for ‘w’ or capital ‘I’ for lowercase ‘L’, which look identical in a sans-serif font [69, 76]. These two types of look-alike attacks, while dangerous, are very popular and hard to detect with current industry anti-phishing tools [65, 77].

Another trick is to leverage users’ inability to differentiate between URL components [22]. For example, Albakry et al. [2] found that users cannot differentiate between a company name in the subdomain vs. the domain of a URL. Similarly, Reynolds et al. [69] found that users struggle to correctly parse URLs but have high self-confidence in their ability to interpret them, a dangerous combination that helps attackers. A common trick involves putting a brand name into an incorrect position, such as in the subdomain (e.g. amazon.evil.com), path (e.g. evil.com/amazon), search string (e.g. evil.com?amazon), or even username (e.g. amazon@evil.com). A similar trick is to swap out the top-level domain (TLD), such as amazon.evil instead of amazon.com [71], or to put a fake TLD into a subdomain (amazon.com.evil.com).

2.3 Redirects and Short URLs

While the domain shown in a clicked URL is often the same as the final destination URL, that is not always true. Organizations commonly do minor redirects such as adding ‘www’. Some may also redirect to their preferred brand, such as nyt.com redirecting to www.nytimes.com. More challenging are URLs that obscure the real URL completely, making the URL’s destination impossible to predict without assistance [13, 28]. Examples include URL shortening services (e.g. bit.ly) [8], QR codes, and URL rewriting by email servers (e.g. safelinks.protection.outlook.com). Thankfully, users do seem to be aware that they cannot predict the destination of shortened URLs [2].

3 RELATED WORK

Research on preventing phishing attacks has adopted three complementary approaches: automating phishing detection, educating users about phishing, and supporting users’ decisions with security indicators. A full review of automated phishing detection is outside the scope of this paper, though we review some of the features in Section 5. We present a brief overview of the other areas.
3.2 Phishing detection support

In phishing detection support, a computer assists the user by providing extra information or by comparing the URL to known labeled ones. These support systems can take several forms, e.g., browser warnings, chatbots, and toolbars. This collaborative approach is suggested to be complementary by Park et al. [61], who argue that by utilizing the complementary strengths of a human and an agent, we can achieve the results we desire.

Several existing tools provide phishing-detection support for users. Netcraft’s browser plugin [48] warns users about blacklisted webpages once they visit them, as well as clearly displaying the website’s country, site rank, hostname, and other facts to help users identify fraudulent URLs. SpoofStick presents the domain name in the browser toolbar to highlight cases where there is a legitimate-looking domain name in a wrong position [88]. Yang et al. also designed security warnings based on website traffic ranks [91]. The Faheem chat bot [6] provides basic facts about any given URL, including the existence of misspellings, non-ASCII characters, and redirection. Users can also ask the bot to elaborate on any term and receive a longer explanation. TORPEDO [82], a Thunderbird add-on, presents and highlights the domain of a URL linked in hypertext in an email. The add-on also disables the links for 3 seconds so users stop and think about the URL’s safety.

The above security indicators take a similar approach to our proposed report. We present the user with information about the URL prior to visiting it, under the assumption that, with support, they will have the ability to identify unexpected aspects of the link. Our work differs from existing solutions in that it focuses on how to express potentially complex URL and web hosting concepts to users in an easy-to-comprehend way. Existing solutions either focus on providing support to more technical users who may already have a strong lexicon of internet terms like “host”, “domain” and “hosting provider”, or they provide basic support that does not add much to the upfront user training. Our work is aimed at bringing this type of information to a broader audience.

4 DESIGN GOALS

From the above related work we can see that there are three large problems that need to be solved: 1) human judgment is needed to determine if a URL is safe because the human has contextual knowledge that is not available to the computer, 2) URLs are made up of a large number of components that are hard to parse correctly and contain information like certificates and redirects that require computer assistance to read, and finally, 3) there are many disparate data repositories that contain data pertinent to URL trustworthiness, e.g. DNS records of registration dates and phishing feed lists of known malicious URLs, which have a wide range of interfaces and locations, making them non-trivial to use.

Therefore, as mentioned in Section 2, to judge a URL correctly a large range of URL features is required. For humans, the most indicative feature is the domain, since they understand the context in which they see the URL and understand which organization’s domain they would expect [5]. But predicting the destination of URLs is non-trivial. So, to best assist users in this task, we drew inspiration from the privacy policy nutrition label work by Kelly et al. [38], where a large number of privacy policy elements were put into a food nutrition label like format. We thought that a similar approach might show important URL features to users in a consistent format that might allow them to learn over time. Thus, our goal is to develop a “URL nutrition label”, including framing URL information in a way that assists users in leveraging their contextual knowledge and expectations to judge if a given URL belongs to the organization they expect. We call our design a URL feature report. Our report aims to address the following key design goals:

A. Comprehensive. The report should include enough information to help users make an informed decision about the safety of almost all URLs, including the ones in Section 2. To avoid overloading users, the interface should also present only necessary information [60].

B. Support knowledge acquisition. Each phishing indicator needs an explanation that helps non-experts understand the information as well as supporting higher-level reasoning about it [16].

C. Promote confidence. Users need to have confidence in their final decision in order for the report to have its intended impact. Therefore, the report should support users in confidently making decisions on their own rather than blindly trusting recommendations. We aim to support users’ confidence by providing conceptual and procedural knowledge (know-how) when explaining the phishing indicators [7, 53].

D. Inspire Trust. The report should inspire users to trust it by regularly providing accurate information and explaining its recommendations in a way that a user can verify themselves. Building trust with users when they need help will also improve their acceptance of that help [41, 67].

E. Support comparisons. The report should allow users to compare the aspects of the report to their own understanding and, potentially, against reports of other URLs. Supporting comparisons makes it easier for people to use the report for their tasks; the consistent positioning of information also allows them to learn the location of data for faster future access [38]. For example, a user may bank with Skrill and see on the report that the domain is registered to an address on the Isle of Man; unsure whether that is correct, they could also ask for a report on Skrill’s main website to see if it is likewise registered on the Isle of Man.

5 DESIGNING THE INITIAL REPORT

Reading a URL and making an accurate judgment requires accessing a wide variety of URL facts as well as understanding what those facts mean. These facts are consistent between URLs. As part of our design goals, we focus on selecting the features that will help people most in making informed decisions about URL safety and on how to present them to users. For an initial list, we started with the findings of Althobaiti et al., who reviewed phishing features used in human-training and automated detection research [5]. We then narrowed the list down to features that had been shown to be robust and had the potential to be human-friendly. We also excluded features that were highly technical and could not be combined with contextual knowledge to make informed decisions, for example the DNS-based features [5].
Table 1: The thresholds for the features used in the URL report. ‘-’ means that the row will not be shown in that situation. Features were also added (⋆) and removed (⋆⋆) from the report due to design iteration changes.

| Group | Feature | Red | Yellow | Green |
|---|---|---|---|---|
| Facts (mostly neutral) | Domain | - | - | - |
| | Category ⋆ | Malicious | Web-host | - |
| | Registrar Location ⋆ | - | - | - |
| Facts | Domain Popularity | - | < 300K | < 150K |
| | PageRank | - | 0-3 | 4-10 |
| | Domain Age | < 3M | < 6M | ≥ 6M |
| | In Search Engine | No match | Partial match | Match |
| | Encryption ⋆⋆ | - | Unencrypted | - |
| Tricks | No. of External Domains | > 4 | 2-4 | - |
| | No. of Short URLs in Chain | - | > 1 | - |
| | Blacklisted in Chain | > 0 | - | - |
| | IP Address | 1 | - | - |
| | Non-standard Port | - | 1 | - |
| | No. of Subdomains | > 4 | 3-4 | - |
| | Credential in Host | 1 | - | - |
| | Has Unicode ‘%’ | 1 | - | - |
| | Hex Code in Host | 1 | - | - |
| | Non-ASCII | Mixed lang. | Non-ASCII | - |
| | Out-of-position TLD | A token | - | - |
| | Out-of-position Protocol | A token | - | - |
| | Out-of-position ‘www’ | A token | A sub-token | - |
| | Top Targeted in subdomain | A token | A sub-token | - |
| | Similarity to Top Targeted | - | 1 | - |
| | Similarity to Alexa Top 10k | - | 1 | - |

To present the features in a comparable, well-arranged format, as required by our design goals A and E, we split our initial design into four sections (see Figure 2).

5.1 Notice and reminder

At the top of the report we show the URL that was asked about, for reference. For URLs that redirect, we display both the requested URL and the one it would redirect to if the user clicked the link. We also check the URL against known malicious URLs and clearly state if it is already known to be safe or malicious.

PhishTank and Google Safe Browsing both provide lists of reported malicious URLs, approved by security communities, which could be used to automatically alert a user [30, 79]. On the other hand, the Extended Validation Certificate (EV certificate) is used to mark a URL as safe because it indicates that a site’s ownership has been verified by a certificate authority, which is sufficient evidence [51]. For other URLs, neither blacklisted nor with a verified owner, the safety is unknown. Thus, we avoid false positives and inspire trust in the report’s safety information (Goal D).

5.2 Facts

In this section, we provide more details about the website’s URL features to help users decide whether the domain indeed belongs to the expected institution or not. Each fact is presented with the fact name in bold on the left, followed by a short description and the value on the right (Goal B). This section has the most consistent structure; however, we only show relevant features to achieve Goal A of being comprehensive without overloading the users. Red text is used both to highlight potential issues and to provide guidance as to what the problem might be. For example, in the initial design, a Google PageRank of 0 is low, suggesting that the page is probably not apple.com.

The first and foremost indicative feature is the domain itself. If the user is able to detect that the domain is not what they expected, they are likely to succeed in avoiding the attack. We adopted the common advice to search for the company’s name in Google and look at the top few results. This works well because most modern search engines use popularity as an ordering metric [74].

Two more revealing components are the relative popularity of a website, which we determine using Alexa’s most popular domains [4, 26, 50, 91], and the PageRank of a webpage [4, 26]. These two popularity scales both imply how popular a website is, but we present both to users because they do not always agree, most commonly in web hosting situations. If the domain is a web host, the Alexa popularity is the same for all pages and subdomains under that domain, whereas the PageRank may differ between pages under one domain. Finally, we use the domain age from WHOIS records in our report, since users can efficiently compare it to the expected duration of the organization’s online presence.

Encryption is another hint about safety, indicating whether the connection with the server will be encrypted or not (https vs http). HTTPS adds encryption so users’ information remains protected from unauthorized access in transit. Unfortunately, encryption is not a highly reliable indicator of phishing websites [57], especially since the introduction of LetsEncrypt [19], which gives free encryption certificates to anyone. It is, however, a useful security aspect.

5.3 Tricks

There are many ways to manipulate a URL to look legitimate. In this section we aim to identify and point out these malicious tricks to users. Since the existence of tricks is very indicative of phishing, we check the URL for about 16 different tricks, many of them lexical. For example, we examine the URL for the existence of misspelling by comparing the domain with the top targeted domains on PhishTank and Alexa’s top 10,000 domains [77]. Each identified trick is then shown to the user as a row under the tricks section, along with both an explanation of the trick and evidence (Goal C). To limit the length of the report, only identified tricks are shown; if a URL has no tricks, we simply state that no tricks were found.

In addition to misspellings, we also look at mixed language use, i.e. the existence of characters from conflicting alphabets. While no longer popular [22], the use of an IP address or of hex or decimal characters often indicates a phishing URL. We also reverse IP addresses to the human-readable domain when possible.

Other tricks used by attackers to mislead users are also specified in our report, such as the number of subdomains, ‘@’ in the hostname, and out-of-position ‘http’, ‘https’, TLD, and ‘www’. We also use PhishTank’s top targeted brands to identify whether a targeted brand name is in the subdomain [10]. Additionally, we consider redirections, including multiple chained redirections. We determine the number of external domains [50], the number of shortened URLs [29], and blacklisted URLs in the chain [46], and flag them as suspicious if they exceed a threshold. The full list of tricks is shown in Table 1.
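The cut-offs in Table 1 can be read as simple rules mapping each feature's value to a severity color. The sketch below encodes a few of the rows (Domain Age, PageRank, No. of Subdomains, and Similarity to Top Targeted) as an illustration; the function names and the 0.8 similarity cut-off are our own assumptions, not taken from the paper.

```python
from difflib import SequenceMatcher

# A few Table 1 rows expressed as value -> color rules. Cells marked '-'
# in the table are simply not shown in the report; here we return
# "green" as the catch-all for readability.

def domain_age_color(age_months: int) -> str:
    # Domain Age: red < 3 months, yellow < 6 months, green >= 6 months.
    if age_months < 3:
        return "red"
    if age_months < 6:
        return "yellow"
    return "green"

def pagerank_color(rank: int) -> str:
    # PageRank: yellow for 0-3, green for 4-10 (no red threshold).
    return "yellow" if rank <= 3 else "green"

def subdomain_count_color(count: int) -> str:
    # No. of Subdomains: red > 4, yellow 3-4.
    if count > 4:
        return "red"
    if count in (3, 4):
        return "yellow"
    return "green"

def similarity_color(domain: str, top_targeted: list[str]) -> str:
    # Similarity to Top Targeted: yellow when the domain is close to,
    # but not exactly, a frequently phished domain. The 0.8 ratio is an
    # illustrative cut-off, not the paper's.
    for target in top_targeted:
        if domain != target and \
                SequenceMatcher(None, domain, target).ratio() > 0.8:
            return "yellow"
    return "green"

print(domain_age_color(2), subdomain_count_color(5),
      similarity_color("paypa1.com", ["paypal.com"]))
# → red red yellow
```

A report generator along these lines would evaluate every row this way and hide the rows whose Table 1 cell is ‘-’.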
Figure 2: Our initial design of the report which was shown to the first focus group (left) and the final design of the report (right).
Table 2: Focus groups including their participants’ expertise and group size.

| Group | Type | Size | Gender |
|---|---|---|---|
| G1 | HCI | 3 | 2F, 1M |
| G2 | Security | 2 | 1F, 1M |
| G3 | Security | 4 | 4M |
| G4 | Non-technical | 4 | 4F |
| G5 | Non-technical | 4 | 3F, 1M |
| G6 | Non-technical | 5 | 4F, 1M |
| G7 | Non-technical | 5 | 5F |
| G8 | Non-technical | 5 | 3F, 2M |

G1 and G2 were recruited from our University community. G3 was recruited from a local security workshop and contained security experts from industry. All three groups were unpaid and participated primarily out of interest in the project topic.

We recruited non-expert participants from The University of Edinburgh using various email lists, including students from art, psychology, and physics, while computer science and informatics were excluded. We chose this group because students are known for falling for these types of attacks, meaning that they represent the type of people our report should support. They also rely heavily on the Internet for their studies [36], making them vulnerable to malicious links [81]. They were compensated £10 for 90 minutes.

6.2 Procedure

We first provided a consent form and collected demographics via a paper survey. In the expert focus groups (G1-G3), we gave a 10 minute presentation on phishing, our motivation for the project, and common URL manipulation tricks. The presentation was provided to ensure all the experts were aware of the context, which allowed us to best leverage their expertise. For average users, we excluded the URL manipulation tricks part of the presentation because we wanted their normal reaction to the reports without prior knowledge of the tricks. As a warm-up for all groups, we asked participants to share a recent experience with phishing communications, including how they discovered that it was phishing. Doing so helped the participants better conceptualize what “phishing” meant, while also providing a set of concrete examples which were often referenced in later discussions.

After the initial discussion, we handed out two sheets of paper: an email containing a URL and the report about the URL. The email was provided so that participants would have the contextual information necessary to use the report. We used real non-malicious emails previously sent to the researchers as a starting point and replaced some of the existing URLs with malicious ones. Participants were told to imagine that they had received the email but were worried about it, so they entered the URL into an online report generator and got the provided report. They were first asked to use the report to decide on their own if the message was real or phishing. Meanwhile, they were encouraged to mark elements of the interface that they found helpful or confusing with the provided colored pens. Participants also had access to a range of co-design style materials including blank paper, stickers, colored pens, sticky notes, and scissors. After everyone finished, the researcher moderated a discussion about the report. This process was repeated with another 2-4 email and report combinations depending on time.

6.3 Outcomes

6.3.1 Overall Impressions. Participants generally liked the report, both content and design, and found themselves well supported in making a decision. Initially, they wished for a clear statement of whether the URL is safe or not. After we explained that most URLs cannot be definitively classified that way, they tended to understand, but the concept did not come naturally. A G6 member, for example, started with the strong view that a safe/unsafe presentation would be best, but after being presented with a URL from a real phishing email sent to most of the University population, he immediately identified it as phishing and recalled that his own anti-phishing tools had failed to identify it at the time. Our report showed him that the URL led to an organization located in South Africa, which is not an expected location for a Microsoft URL.

Users had mixed opinions about the interface and its long-term usability. Early groups found the interface overwhelming and very long, but the perception improved through iteration on the content and presentation. The last few groups found the interface appealing and were even interested in using it, either in their daily lives or as a tool when uncertain about a phishing message. They described the report as a useful tool for making a confident decision about the safety of a URL. A member of G5, for example, explained its usefulness as “Alongside with intuition, there is relevant support and information for me to make decision on whether to trust the website subsequently”. Similarly, a member of G7 explained: “I think for most people this would provide enough information to make informed decisions with a high level of confidence. Very interesting”.

They had varied feelings about the trustworthiness of auto-detection tools in general. A G6 participant stated: “I trust the machine a lot, but I will trust myself more. This interface will help me to educate myself”. This attitude is not only in line with the goal of supporting a user’s decision, but is also typical of phishing training, which teaches users to not completely rely on the severity level of the indicators and encourages them to also consider their expectations. Similarly, a participant of G7 said: “It would work for helping classify URLs to safe and not safe. It is important to educate users and not just trust the software of taking decisions”. Participants were also able to learn from the report itself, as a participant in G7 described: “I learned to prioritize the results”.

6.3.2 Visual Appearance and Interaction.

Symbols and Colors. Over the course of the focus groups, we adjusted the use and prominence of symbols and colors according to feedback. The first group, G1, found the colored symbols in the left column too small to read and were concerned that they would not be sufficiently obvious to readers and had unclear meaning. We therefore added descriptions such as “known issue” on a solid background color to make the meaning clear (see Figure 3). In the final design we removed these aids altogether and added a legend just below the summary so that it would be visible when needed.
With the new design, G2 was concerned that the colors might be Domain and Hostname Highlighting. At the top of the report,
inappropriate for color-blind users and G4 mentioned that differ- we highlight the URL domain to provide the domain information
ent cultures might interpret red and green differently. In Chinese together with summarizing facts. Our initial design did not include
culture, for example, red is considered a happy color. To handle domain highlighting since we list the domain as the first fact. How-
both issues, we converted to a color-blind friendly pallet and water- ever, after moving the manipulation tricks further to the top and
marked severity symbols to clarify the meaning [35]. The final adding a summary, the domain is not obvious enough. Therefore,
report was tested on the iOS grayscale display which produces we added domain highlighting at the top similar to how industry
a colorless version of the report and allows to evaluate how the tools use it. Using highlighting similar to web browsers also keeps it
choice of colors would be for a colorblind person. Later focus groups familiar for non-technical users as member of G1 suggested that the
had mixed opinions about the water-marked severity indicators. concepts of domains and subdomains is too technical and lay users
Some agreed that the symbols enhanced the meaning while others are unlikely to understand them. So the highlight in the domain
found them distracting. Therefore, we removed them from the final row will help the user learn about the domain aspects.
version and kept only the symbols in the legend.
Report Summary. As mentioned before, participants of almost all
groups suggested that we provide a clear binary answer of whether
the URL is safe or not.
When G1 understood that a binary answer is not possible for all
URLs, they instead suggested an overall score or severity bar for
URLs that could be easily used for judgment. Similarly, a participant
from G3 wanted some sort of classification such as maliciousness
percentages. We felt that a single overall score would mislead users
and not encourage them to read and learn from the presented
information. Instead, we tried to use a combination of clearly visible
colors and added a summary highlighting key issues to the top of
Figure 3: Both images were created by a participant in G1. the report to help users get the requested high-level sense safety.
The left image shows a URL with the domain, subdomain, Security groups G2 and G3 saw two versions of the report, one
and additional URL components highlighted. Below is the with and one without a summary. Both thought the summary was
proposed summary with Tricks, Popularity, and Age. In the a good idea and debated about which topics should be included.
right image, the suggested new report structure continues They liked that the summary told them which features to focus on
with a list of tricks followed by URL facts. first. Since both expert groups considered it beneficial, we added a
summary section at the top of the report for the remaining groups
and continued to iterate on its presentation.
Several iterations later, we settled on four summary boxes: used manipulation tricks, search result, domain age, and domain popularity. Manipulation tricks were chosen because their presence is a strong indicator of phishing [70] and their meaning was clear to most focus groups. The search results box indicates whether the URL appears in Google's top results when searched for. Both domain age and popularity are common features that made sense to users and were generally well understood in focus groups. Groups G2 and later considered the summary to be quite useful. Initially, they felt that such a summary definitely required the rest of the report for explanation. However, after reading the report, they quickly understood the meaning and had no difficulty using it when reading future reports.

Facts Order. Focus groups had several suggestions about how to adjust the presentation order of the rows. Members of G1 suggested we weight the presented features and present the most reliable features at the top. Given that we wanted to present features in a consistent order (Goal E), we instead located the strongest features at the top and put the tricks section above the facts.

G1 also suggested ordering the facts by color indicator, beginning with the most severe (red) at the top. Another suggestion was ordering them based on each feature's priority. We decided against these approaches because the relative value of facts depends on information only the user knows. For example, popularity is a valuable feature if its value is unexpected, such as when the user thinks the URL is an Apple domain but the popularity is low.

Another suggestion, from G6, was to remove non-critical (green) facts so as not to overwhelm the user, as we hide green rows for tricks. However, a G5 participant felt that green facts were easy to ignore if not needed. We decided against hiding green facts because doing so would make it harder to compare reports. It might also incorrectly make all URL reports look overly red and negative, leading users to incorrectly reject good URLs.

6.3.3 Report Content. In addition to the report interface, we iterated over the report components and wording. After each group, we incorporated suggestions to make the report more accessible and understandable.

Tricks. G1 found the tricks section very useful, especially the clarification of what is wrong, but felt the facts section did not adequately explain the meaning of the information. A G1 participant said: "If I show the tricks to my gran, she will say yeah cool, but if I show her the facts she wouldn't know what is going on". The tricks are indeed a stronger indication of a malicious link, potentially eliminating the need to look further [27]. Thus, as suggested by G1 in Figure 3, they should appear at the top, especially since for later groups key facts such as age already appeared in the summary. One of the comments from G6 recommended removing the tricks found in verified (safe) URLs because they would distract users; however, we decided that displaying tricks even for safe URLs will
I Don’t Need an Expert! Making URL Phishing Features Human Comprehensible CHI ’21, May 8–13, 2021, Yokohama, Japan
help users to learn to judge the features' importance for making informed judgments. For example, non-ASCII characters can occur even in safe URLs after the evolution of Internationalized Domain Names; thus, based on the context, users can use the feature to judge if a URL is safe or not. This decision supports our design goal B.

Location and Category. We added a location field to better support users in identifying any inconsistency between the domain location and the expected location. Usually, the stated physical location of malicious domain registrars differs from legitimate ones [70]. However, understanding the meaning of the location was challenging for focus group members, with some users interpreting location based on the trustworthiness of the country. For example, in G4 one of the participants stated: "Apple in Japan, so what? Japan is not questionable". She was confused and thought that the location referred to the server location rather than the location of the organization. We thus adjusted the description to clarify that it was the location of the domain owner.

A security expert from G3 commented that one of his common approaches to detecting phishing websites is to look up the URL's category on FortiGuard, which categorizes URLs into groups such as shopping or governmental organizations. This feature can be used to check whether a suspicious page has a similar category to the expected one [72]. Additionally, FortiGuard categorizes the full host name of provided URLs in case a domain includes different subdomains, such as WordPress.

Web Hosting. As mentioned previously, some popular domains host content for others. As a result, it is possible for the domain to be popular and registered a long time ago while the specific page or subdomain is malicious. A discrepancy between Domain Popularity and PageRank should highlight this situation to users, but focus group participants found the discrepancy confusing rather than helpful. So mid-way through the focus groups we started experimenting with wordings suggested by the participants to directly explain the issue. We tried several approaches, including dividing the facts into page and domain facts or placing a large warning at the top of the facts. In the final design, to determine which domains automatically offer web hosting services, we used the FortiGuard website categorization service [23]. Then, we hide the domain-only facts (location, age, and popularity) and in their place we state: "This domain hosts multiple sites, some are good and some may be problematic. Usually only small companies and personal websites are hosted by other domains." In general, the focus group participants liked this warning and felt that it was very important and useful for their decision making. G7 felt that they lacked direction on where to look after seeing it. They understood that there might be a problem but they were unsure how to distinguish between safe and malicious hosted sites. Conversely, a G6 participant commented how the warning helped him to be confident visiting personal pages since he expected them to be hosted on other sites.

Domain Popularity and PageRank. The domain popularity is drawn from Alexa and is an indicator of how often people visit the domain, with the most visited domain being ranked 1. The PageRank roughly indicates how often other pages link to this page [11]. Here, the most linked-to pages have a rank of 10. The two measures are naturally easy to confuse as they both deal with popularity. They also have inverted scales, with 1 being good for domain popularity and bad for PageRank. In the initial design, we tried to explain the difference in words. However, G1 suggested a visual range for the numbers instead to indicate clearly which value is problematic. To further emphasize the scale, we added colors to the range. G2 and G3 saw colored bars with raw values below and showed no difficulties reading them. G5, however, commented that the numbers looked like they were written in error due to the opposite directions of the popularity scale numbers. After iterating several approaches on later groups, we settled on removing the numbers entirely and simply showing "popular" and "not popular" as the ends of the ranges.

Differentiating between domain popularity and PageRank was challenging for all groups except the security ones (G2, G3). Domain popularity made the most sense, likely because it is roughly based on the number of people visiting the site, which is easy to explain and understand. The concept of PageRank, however, was much harder for participants to grasp even when verbally explained. Also, the "domain" versus "page" difference was subtle, leading to difficulties articulating why a page might have a different popularity from the domain.

Eliminating one or the other was also not an option, as our security groups explicitly mentioned how useful it was to include both since they are fundamentally different measures. For example, sites like WordPress have high domain popularity even if a hosted page has a low page rank. So it is possible for a very popular site to be hosting an unpopular malicious page. To reduce confusion, we iterated on the wording to improve section explanations. G8 in particular was shown several wording options and provided extensive feedback on how to express the concepts more clearly. However, even in the final design, the difference between domain popularity and PageRank is still hard to grasp quickly.

Encryption. Initially we thought that encryption would be useful information in the report. In the encryption component we stated if the connection was encrypted or not. But we were also concerned that users would equate encryption with owner validity, so we added, "This URL is encrypted but we couldn't verify the owner". G1 understood this concept and explained: "If this is an Apple URL, you would expect to have a verified owner".

However, we found that showing encryption information may mislead users. For example, a G2 participant marked a legitimate URL as phishing because she did not think a reputable company would use an HTTP connection. When we tried incorporating ownership information as well to provide a more complete view of encryption, participants just became more confused. A participant from G4 asked "why is it a good sign if you cannot verify the owner of the organization?" after seeing that the connection was encrypted (green) but the website owner could not be verified due to the information not being in the SSL/TLS certificate. Showing information that could mislead the user violates our design goals D of inspiring trust and C of confidence by showing correct information that will lead to the right decision. Therefore, in line with our design goal A to avoid overload, we removed encryption entirely from the report. The only exception is Extended Validation (EV) Certificates: TLS/SSL certificates which have gone through additional vetting of the owning organization's identity.
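The ownership signal discussed here can be read from a site's certificate subject. The sketch below is our own illustration, not the paper's implementation: it checks whether a certificate subject, in the nested-tuple layout returned by Python's `ssl.SSLSocket.getpeercert()`, names an owning organization — the field that EV and OV certificates carry and that basic domain-validated certificates usually lack (the example values are invented).

```python
def certificate_owner(subject):
    """Return the organization named in a certificate subject, if any.

    `subject` uses the layout of ssl.SSLSocket.getpeercert()["subject"]:
    a tuple of RDNs, each a tuple of (key, value) pairs. Domain-validated
    certificates typically carry only commonName, so this returns None
    for them -- the "couldn't verify the owner" case in the report.
    """
    for rdn in subject:
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None

# Illustrative subjects in getpeercert() layout (values are made up).
ev_like = ((("businessCategory", "Private Organization"),),
           (("organizationName", "Apple Inc."),),
           (("commonName", "www.apple.com"),))
dv_like = ((("commonName", "login-appleid.example"),),)
```

A real check would obtain `subject` from a live TLS handshake; the pure function keeps the owner-extraction logic testable on its own.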
CHI ’21, May 8–13, 2021, Yokohama, Japan Althobaiti, Meng and Vaniea, et al.
Figure 4: A G6 participant began designing a report showing "safe", "unsafe", "doubtful" and adding an option for "more details".

Both G3 and G6 suggested a tiered design where initially only limited data, like a score or the summary, is shown with an option to view the full report (See Figure 5), possibly limiting the type and detail of data depending on whether the user is a novice or advanced, e.g. an IT helpdesk employee. In G6, a participant suggested that they would like to see a high-level safety flag before seeing any reports (See Figure 4): "Give me three flags (green-safe, red-blacklisted, yellow-unsure), then give me the ability to drill down the summary and if I want more details, give me a link to the webpage". Adding the high-level estimate will not "overwhelm [them] with the details at a starting point" when used every day. This suggestion contributed to the "report summary only" design we evaluate in later sections.

7 FEATURES VALIDATION

While the features selected for the report are known to be strong phishing features, we still wanted to test the visual appearance of the report on known phishing and safe URLs. We analyzed 6877 URLs from four data sets, two phishing (PhishTank [58], OpenPhish [59]) and two safe (DMOZ [14], ParaCrawl [42]), to explore what the report could realistically look like for users. The sets of safe URLs contained 2615 URLs, of which 592 (23%) were excluded due to '4xx' and '5xx' response codes. For phishing URLs, we collected a total of 4262 URLs, of which 1645 (39%) were excluded due to unsuccessful response codes. Phishing URLs from OpenPhish and PhishTank were processed every two hours to extract features while the pages were still live.

To reduce load on the reader, reports only include data relevant to the particular URL. For example, tricks do not appear in every URL, so we hide them by default and only show them if they are present, such as the presence of non-ASCII characters. So while there are 23 possible rows, only 6.7 rows were shown on average (Min = 4, Max = 10), with phishing (Mean = 6.8) and safe (Mean = 6.5) URLs having similar row counts.

Tricks were rare for safe URLs, with only 77 (3.8%) showing one trick and the remainder having no tricks. Phishing URLs more commonly had tricks, with 868 (33.2%) containing between 1–3 tricks. The primary cause of tricks for safe URLs was similarity between the domain and one of the top 10,000 popular domains (30 URLs), a phenomenon already seen in previous research [78]. Thus,
we only show a yellow indicator for this feature to avoid false positives and hand the task of comparing and deciding whether this is expected in their context to the user.

Looking at the red color in the reports, 30.2% of safe URLs and 88.0% of phishing URLs had at least one red row. Of the safe URLs with a red row, 99% did not appear in Google's top 10 search results, causing the red. Search results may therefore vary a good bit, making them a difficult feature to interpret. However, it is still a good indicator of the illegitimacy of a page, which is why we decided to keep it. We also found that 23.4% of phishing URL reports had only green rows. Further examination showed that only three of them were compromised websites, while the rest were URLs that redirected to safe URLs; thus, the features referred to safe URLs which were indeed safe. This redirection tactic is used by attackers to serve advertisements and then send the user to the expected safe website [77]. Therefore, it is important for the report to urge users to visit the shown final link instead of the original one. We also considered using the color frequency in each report to predict whether a URL is safe or not. Applying linear regression, we found that the frequency of each color in the report significantly predicts whether a URL is safe or not (R² = 0.44, F(4635) = 1238, p < .001) with Red (β = −0.48, p < .001), Yellow (β = −0.17, p < .001), and Green (β = 0.11, p < .001).

Finally, we measured the features' redundancy to ensure we are not showing unnecessary features. We computed pair-wise correlations between features, using their severity color as presented to users as the feature's value, and found no correlation between any of them.

8 ONLINE STUDY

Focus groups are an excellent way to get rich feedback but a poor way to get a truly wide range of participants. To address that gap, we decided to use an online survey to test the clarity of report content as well as its ability to support users in making accurate safety judgments about a URL. We used a between-subjects experimental design where each participant saw one of: the full report, just the summary, or just the URL with domain highlighting.

8.1 Questionnaire Instrument

For all conditions, the survey started with informed consent. Participants were then asked how familiar they were with 13 website terms and 6 companies, followed by study instructions to not visit any of the links and only read them. A question then tested whether they had read the instructions and terminated the study if they failed it twice. To test their existing URL-reading skills, participants were then given three URLs and asked to choose which company those URLs lead to. The first URL has Google in the pathname, the second has Facebook in the subdomain part, and the third is for the New York Times, which uses an abbreviation of the brand name.

Participants were then shown 6 URLs. For each URL, they were told to imagine that they wanted to visit a particular company, given a brief description of that company, for example, "eBay, an auction and consumer to consumer sales website", and then asked if the URL "leads to a page owned by the above company or is it a malicious URL". In the domain highlighting condition the URL was domain highlighted in the question; for the other conditions a report was provided and participants were encouraged to use it when answering. Participants were then asked how confident they were in their decision, followed by a question about what most influenced their decision. After answering questions about all 6 URLs, all groups were asked a set of comprehension questions to make sure they did, or could, understand the content of the report or the report summary. The comprehension questions were multiple choice and asked the participant what they thought the different parts of the report meant. The full report and control groups were asked about the full report elements, and the report summary group was asked about the report summary elements. The answer options were drawn from common misunderstandings observed in the focus groups. The survey ended by collecting background information. We used a phishing susceptibility scale from Wright & Marett [87] with 5 subscales to test participants' computer self-efficacy, web experience, trust, risk beliefs, and suspicion of humanity. Also, we included basic demographics questions on age, gender, and highest degree obtained.

8.1.1 Study conditions. In the following, we describe the conditions and the questions that differed between them.

Domain highlighting: In this condition, we showed the full URL with the domain highlighted and asked participants if the URL leads to the given company name. Existing research already shows that users cannot read URLs unaided [2, 4, 6, 69]. Domain highlighting has already been adopted by several browsers, e.g. Safari, making it a state-of-the-art approach that has been shown to help users decide on URL safety [88]. Thus, we chose domain highlighting as the control condition. To determine what most influenced participants' safety judgments, they were asked to select up to three of: the domain, the protocol (https), the URL path and query strings, their prior knowledge, and their familiarity with the company's URL.

Full report: This condition is the longest. Before showing the 6 URLs to participants, we first showed them a fictitious report with obviously fake values and asked 10 questions in a random order about the features, e.g., "How old is this website?". Doing so gave participants some basic practice with the report and allowed us to test for any serious misunderstandings about how to use it.

To determine what most influenced the participants' safety judgments, they were shown a list of the different report elements along with "my own prior experience reading URLs" and asked to select up to three that most influenced their decision.

Finally, we asked a set of 7-point Likert assertion questions to measure the report's usefulness and satisfaction, loosely based on the SUS, such as "I can learn a lot about phishing using this report" and others drawn from focus group participants' opinions about the report. We ended with an optional free-text comment section.

Report summary: Many participants in our focus groups suggested showing the report summary when a user hovers over a link. The idea has merit, so we evaluate it here as a middle option between domain highlighting and showing the full report. Participants in this condition saw only the summary part of the report, with no option to see the full report.

To determine what influenced the participants' safety judgments most, they were shown a list of answers including the summary report boxes, elements of the URL, their own prior experience, and
the colors. After answering questions for all 6 URLs, they were asked about the meaning of each of the summary report elements, with multiple-choice answer options derived from common focus group misconceptions and an "other" option. Finally, they were asked the same 7-point Likert questions about the report usability as the Full Report condition.

We categorized URLs into 3 reading difficulty levels: (1) Parse and Match: any URL-knowledgeable person can find the domain and compare it to the brand name, (2) Domain Knowledge: a URL-knowledgeable person has to know which organization a domain belongs to before judging the URL, and (3) Misleading Flags: URLs have information that may mislead participants to misjudge them. For each category we have two organizations, one popular and one not popular based on the top targeted domains on PhishTank, and for each organization we have a phishing and a safe URL (see Table 3). With 6 organizations, we ended up with 12 URLs in total. For each condition, participants were divided into two groups, with every group being shown one link of each organization at random, six URLs in total.

The presented URLs are real-life URLs, with the phishing ones taken from our analyzed data set in Section 7. We made minor manipulations to control some variables. They have an approximately similar length, https protocol, as well as path and query strings. To reduce bias in the selected URLs, we ensured that the color indicators were in line with real observed color combinations from the data set. As we had abnormal false-positive search results, we included one safe URL with a red search result. For participants' safety, we selected phishing URLs that were no longer active. Additionally, in case they clicked on the links, we added a hyperlink which leads to a page belonging to the research group about the danger of clicking on these links.

8.2 Survey Results

Participants. We recruited participants from Prolific for a 30-minute study on phishing. The time estimate was based on a short pilot study. We limited participants to those with approval rates above 90% and native English speakers to avoid language issues. We then excluded those who did not answer the attention check questions accurately.

We had a total of 153 participants (domain highlighting = 51, report summary = 50, full report = 52); 63.4% were female. Participants had an average age of 31.89 years (σ = 9.9). Compensation was £3.5. The average time required to complete the survey was 18.26 minutes. For prior URL-reading skill, on average 1.6 of the questions were answered correctly, with only 14 (9%) answering all questions correctly and 15% not answering any URL correctly.

Accuracy of safety judgment. We found that participants in general were able to accurately judge URLs' safety. The average accuracy was highest for the full report (5.5/6, SD = .28), with the report summary also doing well (4.96/6, SD = .38) and the domain-highlight doing the worst (3.88/6, SD = .48). The false positive (FPR) and false negative (FNR) rates are also encouraging, with all participants more likely to incorrectly mark safe URLs as phishing: full report (FPR = .12, FNR = .05), report-summary (FPR = .23, FNR = .11), and domain-highlight (FPR = .44, FNR = .26).

We used an ANOVA, followed by Cohen's f for effect size, to test if the three conditions (domain-highlight, full report, and report summary) impacted the accuracy of participants' judgments, and we found a statistically significant impact of the condition on judgment accuracy (α = .01, p < .001, r = 0.29). We then computed follow-up t-tests and found a significant difference between all three pairs of conditions: domain-highlight and full report (p < 0.0001, d = 0.7), domain-highlight and report-summary (p < 0.0001, d = 0.4), and report-summary and full report (p < 0.001, d = 0.3).

We separately tested if any other variables impacted accuracy using ANOVA as well. These variables are the time spent on the question, the condition, the level of difficulty of each URL, the actual safety (malicious/trustworthy), participants' confidence in their answer, their familiarity with the company, the company the URL leads to, their prior knowledge of URL reading, and their phishing susceptibility factors. We found that the accuracy of users' judgments is significantly impacted by the condition, the URL safety, and the URL hardness level (α = .01, p < .001), with a large effect size for the condition (r = 0.31) and small for the other two (0.16 and 0.13). The remaining variables had no significant impact on judgment accuracy.

We tested URLs associated with 6 organizations, as shown in Table 3, where each organization had a phishing and a safe URL associated with it, resulting in 12 URLs tested. For all URLs, the full report has higher accuracy than domain highlighting. The summary report is slightly mixed, mostly sitting between domain highlighting and the full report, but occasionally showing more accuracy than the full report. The four "parse and match" URLs are theoretically the easiest to determine from only reading the URL string, which is mostly borne out by the high accuracy for even the domain highlighting condition. The exception, bestchange.ru, was incorrectly marked as phishing by the majority of participants in the domain-highlighting condition. For the "domain knowledge" URLs, a user has to know the correct domain of the organization to be able to accurately judge safety if unaided. Here the email.microsoftonline.com URL was the most challenging for all conditions. The bìttrêx.com URL was also challenging for the domain highlighting group, possibly because they were unsure if the non-ASCII character (Vietnamese) should be there or not. In the misleading URLs, Tripod was a confusing case where the safe URL positions the brand name in the subdomain while the phishing URL includes the brand name in the domain but actually is a hosting service for other websites. Similarly, the fb.me legitimate short URL confused many of the domain-highlight participants.

Comprehension of the report elements. Full report participants were able to provide a correct answer for 7.73 out of 10 report comprehension questions on average.

The most common error was in regard to the location feature, where 57.7% (30/52) of participants indicated that the location means the physical location of the server they were contacting rather than the self-reported location of the organization that registered the domain.

PageRank continued to be a source of confusion, with 34.62% (18/52) of participants providing an incorrect answer. They commonly confused PageRank with domain popularity, indicating that
Table 3: The URLs used in the online study. Each condition was divided into two groups, with each group seeing only one URL for each company.
URL Hardness Popularity Safety Group % of participants who accurately judged safety
G1 G2 Highlight Full report Summary
https://resolutioncenter.ebay.com/policies/?id=123 Parse and match Popular Safe X 81 100 81
https://itmurl.com/www.ebay.co.uk/item=30327559652 Phish X 96 96 88
https://www.bestchange.ru/exchangers/mkt=en&id=234 Unpopular Safe X 44 88 83
https://www.bestcnange.ru/exchangers/mkt=en&id=234 Phish X 92 100 92
https://email.microsoftonline.com/login/?mkt=en-GB Domain knowledge Popular Safe X 64 73 50
https://www.365onmicrosoft.com/login/?langua=en-GB Phish X 73 100 96
https://international.bittrex.com/account/?id=2423 Unpopular Safe X 73 85 65
https://international.bìttrêx.com/account/?id=2423 Phish X 56 96 92
https://fb.me/messages/t/788720331154519 Misleading flags Popular Safe X 15 92 85
https://l.facebook.com/l.php?u=https%3A%2F%2Fptop.only.wip.la%3A443%2Fhttp%2F67.23.238.165 Phish X 60 100 83
https://www.tripod.lycos.com/pricing/?plan=free-ad Unpopular Safe X 56 92 96
https://webmasterq.tripod.com/pricing?plan=free-ad Phish X 65 77 81
the value meant how popular the site was rather than the individual page. One option we are considering for future work is to hide PageRank when it is in alignment with the popularity (both high or both low) and only show it when it is different, with direct explanations of how the misalignment could be problematic.

All questions in the section had an option to indicate that the description was confusing. The web hosting element confused participants most, with 12% indicating that the description is unclear and 73% (38/52) of them answering it correctly. The result suggests that the new wording is mostly working, though there is some room for improvement.

In the summary group, participants were able to provide a correct answer for an average of 4.69 out of 6 questions. The most common error concerned the web hosting feature, with 57.7% (38/52) answering correctly.

Report usefulness and satisfaction. We asked participants to pick the report elements that they found most helpful after deciding about each URL, to get a sense of whether they were relying on a small set of features or using the whole report. Participants chose different information for different URLs. "Domain age" was the most influential feature for 4/12 (eBay and Bittrex safe URLs and eBay and Microsoft phish URLs), "Manipulation tricks" for 3/12 (Facebook, Bittrex, and BestChange phishing URLs), "Domain popularity" for 4/12 (BestChange, Microsoft, Tripod and Facebook safe URLs), and "Search result" for 1/12 (Tripod phishing URL). For the safe Microsoft URL, we displayed a warning that 'microsoftonline' is similar to 'Microsoft' and does not match Google's top search results; however, it was still not the most helpful feature. The fact that participants were looking at different elements for different URLs shows that they were making use of the full report, balancing and weighing features instead of just sticking to the one aspect that made the most sense to them. Participants indicated that they used prior knowledge, but it was not in the top three features for any of the URLs.

The self-reported answers for satisfaction indicate that the full report (Mean = 5.78, Median = 6, SD = 0.72) was preferred over the summary-report (Mean = 5.36, Median = 5.57, SD = 1.13). For the full report, participants found that the survey taught them about phishing, using the report would help them, they understood the report content, and they did not need to learn new skills to use it. However, in the summary group, users felt they needed to learn a lot of things before using the report.

9 LIMITATIONS

Our report aims to support users in deciding if potential phishing URLs are or are not safe to click on. Therefore, our work is limited to the types of information available to a user in advance of loading the page itself and does not include solutions that look at the safety of the resulting page, such as identifying compromised code or layouts that are visually similar to frequently targeted sites.

We endeavored to put together focus groups looking at HCI, Security, and non-technical students to get a range of opinions and experience. We also conducted multiple focus groups to offset some of their known issues, such as participants getting distracted by irrelevant topics or being influenced by a dominant peer's opinions. We also ensured that a moderator was present to keep the groups focused and on topic. Finally, we used an online survey to further verify our focus group findings at a larger scale.

Prolific, similar to other online micro-work sites, is known to have users who are more computer literate than the average internet user; they also tend to be more privacy aware [37], which may impact their knowledge of URL reading. However, recent studies suggest that online workers, including Prolific workers, still struggle with predicting where a URL will go [2, 69]. This type of user is also the type of person that less skilled people may go to for assistance [56], so supporting them well is likely to have a broader positive impact.

The phishing feature list we used is also not exhaustive. There are a wide variety of features used to detect phishing URLs, and many of them appear in only one or two papers. To mitigate this issue, we made use of existing reviews of the range and accuracy of features [5]. However, features that have mostly been tested in automated systems are not necessarily the best features possible for people. In this work we have started with known good features and narrowed in on those that best support people, but it is possible that other features exist that support people better but did not show up in our review.
Our code automatically queries several third-party APIs, such as PhishTank and Google Safe Browsing, to retrieve features not possible to extract from the URL itself. Using such APIs in a deployed system would be practical since they are continuously updated and thus require minimal to no ongoing maintenance to keep the report accurate. Other report elements, like the tricks and the explanations, may require more expert involvement to maintain over time, but the effort of doing so is not large. To further explore the potential of deployment, a master's student built a prototype of the report as a Chrome browser plugin for a thesis project [83]. Their system allowed a user to request a variation of our report for any URL they saw inside the browser. As the prototype was a proof-of-concept, it is not suitable for real-world testing, but it did demonstrate the feasibility of integrating such a report into common user tools, like browsers.

11 CONCLUSION

We have presented the design of a new URL feature report which assists users in deciding whether a URL is malicious or not. The reports are intended for users who are trying to judge a URL's safety as part of a primary task. To refine the report's design, we conducted 8 focus groups with experts in HCI, experts in security, and average users. Finally, we conducted a survey to measure the readability and effectiveness of the report. We found that participants could generally read the reports, understand phishing features, and use them to successfully decide if a URL is malicious or safe. However, some participants still had difficulty understanding the more complex concepts, such as PageRank and location.

ACKNOWLEDGMENTS

We thank Maria Wolters and the TULiPS lab members for their feedback and discussion on the design of the focus groups and the user study. This work was supported in part by the UKRI Centre for Doctoral Training in Natural Language Processing, funded by UKRI (grant EP/S022481/1) and the University of Edinburgh, as well as by a Google Research Award.

REFERENCES
[1] Sara Albakry and Kami Vaniea. 2018. Automatic Phishing Detection versus User Training, Is there a Middle Ground Using XAI?. In Proceedings of the SICSA Workshop on Reasoning, Learning and Explainability (CEUR Workshop Proceedings), Kyle Martin, Nirmalie Wiratunga, and Leslie S. Smith (Eds.), Vol. 2151. CEUR-WS.org, Aberdeen, Scotland, UK, 1–2. http://ceur-ws.org/Vol-2151/Paper_P2.pdf
[2] Sara Albakry, Kami Vaniea, and Maria K. Wolters. 2020. What is this URL's Destination? Empirical Evaluation of Users' URL Reading. In CHI '20: CHI Conference on Human Factors in Computing Systems, Regina Bernhaupt, Florian 'Floyd' Mueller, David Verweij, Josh Andres, Joanna McGrenere, Andy Cockburn, Ignacio Avellino, Alix Goguey, Pernille Bjørn, Shengdong Zhao, Briane Paul Samson, and Rafal Kocielnik (Eds.). ACM, Honolulu, HI, USA, 1–12. https://doi.org/10.1145/3313831.3376168
[3] Hazim Almuhimedi, Adrienne Porter Felt, Robert W. Reeder, and Sunny Consolvo. 2014. Your Reputation Precedes You: History, Reputation, and the Chrome Malware Warning. In Tenth Symposium on Usable Privacy and Security, SOUPS, Lorrie Faith Cranor, Lujo Bauer, and Robert Biddle (Eds.). USENIX Association, Menlo Park, CA, USA, 113–128.
[4] Mohamed Alsharnouby, Furkan Alaca, and Sonia Chiasson. 2015. Why phishing still works: User strategies for combating phishing attacks. International Journal of Human-Computer Studies 82 (2015), 69–82. https://doi.org/10.1016/j.ijhcs.2015.05.005
[5] Kholoud Althobaiti, Ghaidaa Rummani, and Kami Vaniea. 2019. A Review of Human- and Computer-Facing URL Phishing Features. In European Symposium on Security and Privacy Workshops, EuroS&P Workshops. IEEE, Stockholm, Sweden, 182–191. https://doi.org/10.1109/EuroSPW.2019.00027
[6] Kholoud Althobaiti, Kami Vaniea, and Serena Zheng. 2018. Faheem: Explaining URLs to people using a Slack bot. In 2018 Symposium on Digital Behaviour Intervention for Cyber Security (AISB 2018), April 5 2018. University of Liverpool, Liverpool, UK, 1–8. http://aisb2018.csc.liv.ac.uk/PROCEEDINGS%20AISB2018/Digital%20Behaviour%20Interventions%20for%20CyberSecurity%20-%20AISB2018.pdf
[7] Nalin Asanka Gamagedara Arachchilage and Steve Love. 2014. Security awareness of computer users: A phishing threat avoidance perspective. Comput. Hum. Behav. 38 (2014), 304–312. https://doi.org/10.1016/j.chb.2014.05.046
[8] Krishna Bhargrava, Douglas Brewer, and Kang Li. 2009. A study of URL redirection indicating spam. In Sixth Conference on Email and Anti-Spam, CEAS. Steve Sheng's Publications, California, USA, 1–4. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.536.2821
[9] Jim Blythe, L. Jean Camp, and Vaibhav Garg. 2011. Targeted risk communication for computer security. In Proceedings of the 16th International Conference on Intelligent User Interfaces, IUI. ACM, Palo Alto, CA, USA, 295–298. https://doi.org/10.1145/1943403.1943449
[10] Giovanni Bottazzi, Emiliano Casalicchio, Davide Cingolani, Fabio Marturana, and Marco Piu. 2015. MP-Shield: A Framework for Phishing Detection in Mobile Devices. In 15th International Conference on Computer and Information Technology, CIT; 14th International Conference on Ubiquitous Computing and Communications, IUCC; 13th International Conference on Dependable, Autonomic and Secure Computing, DASC; 13th International Conference on Pervasive Intelligence and Computing, PICom, Yulei Wu, Geyong Min, Nektarios Georgalas, Jia Hu, Luigi Atzori, Xiaolong Jin, Stephen A. Jarvis, Lei (Chris) Liu, and Ramón Agüero Calvo (Eds.). IEEE, Liverpool, United Kingdom, 1977–1983. https://doi.org/10.1109/CIT/IUCC/DASC/PICOM.2015.293
[11] Sergey Brin and Lawrence Page. 1998. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks 30, 1-7 (1998), 107–117. https://doi.org/10.1016/s0169-7552(98)00110-x
[12] Gamze Canova, Melanie Volkamer, Clemens Bergmann, and Benjamin Reinheimer. 2015. NoPhish App Evaluation: Lab and Retention Study. In Internet Society, 8 February 2015 (USEC '15), Vol. 453. The Internet Society, San Diego, CA, USA, 1–10. http://dx.doi.org/10.14722/usec.2015.23009
[13] Sidharth Chhabra, Anupama Aggarwal, Fabrício Benevenuto, and Ponnurangam Kumaraguru. 2011. Phi.sh/$oCiaL: the phishing landscape through short URLs. In The 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS. ACM, Perth, Australia, 92–101. https://doi.org/10.1145/2030376.2030387
[14] CMBuild. 2013. Archive of dmoz.org. (2013). https://dmoz-odp.org/Reference/ Accessed Dec. 2020.
[15] Lucian Constantin. 2019. Attackers Host Phishing Pages on Azure. (Mar. 2019). https://securityboulevard.com/2019/03/attackers-host-phishing-pages-on-azure/ Accessed Jun. 2019.
[16] Lorrie Faith Cranor. 2008. A Framework for Reasoning About the Human in the Loop. In Usability, Psychology, and Security, UPSEC'08, Elizabeth F. Churchill and Rachna Dhamija (Eds.). USENIX Association, San Francisco, CA, USA, 1–15. http://www.usenix.org/events/upsec08/tech/full%5Fpapers/cranor/cranor.pdf
[17] Rachna Dhamija, J. D. Tygar, and Marti A. Hearst. 2006. Why phishing works. In Proceedings of the 2006 Conference on Human Factors in Computing Systems, CHI, Rebecca E. Grinter, Tom Rodden, Paul M. Aoki, Edward Cutrell, Robin Jeffries, and Gary M. Olson (Eds.). ACM, Montréal, Québec, Canada, 581–590. https://doi.org/10.1145/1124772.1124861
[18] Hermann Ebbinghaus. 2013. Memory: a contribution to experimental psychology. Annals of Neurosciences 20, 4 (Oct. 2013), 155–156. https://doi.org/10.5214/ans.0972.7531.200408
[19] Let's Encrypt. 2019. Free SSL/TLS Certificates. (2019). https://letsencrypt.org/ Accessed Dec. 2020.
[20] J. Erkkila. 2011. Why we fall for phishing. In Proceedings of the 2011 CHI Conference on Human Factors in Computing Systems (CHI '11). ACM, Vancouver, BC, Canada, 1–8. https://juerkkil.iki.fi/files/writings/phishing
[21] FBI. 2020. 2019 Internet Crime Report, Data Reflects an Evolving Threat and the Importance of Reporting. Technical Report. The Federal Bureau of Investigation, Internet Crime Complaint Center. https://www.fbi.gov/news/stories/2019-internet-crime-report-released-021120 Accessed Aug. 2020.
[22] Matheesha Fernando and Nalin Asanka Gamagedara Arachchilage. 2020. Why Johnny can't rely on anti-phishing educational interventions to protect himself against contemporary phishing attacks? CoRR abs/2004.13262 (2020), 1–12. arXiv:cs.CR/2004.13262 https://arxiv.org/abs/2004.13262
[23] Fortinet. 2021. Web Filter Categories. (Jan. 9 2021). https://www.fortiguard.com/webfilter/categories Accessed Aug. 2020.
[24] Lorenzo Franceschi-Bicchierai. 2016. How Hackers Broke Into John Podesta and Colin Powell's Gmail Accounts. (2016). https://motherboard.vice.com/en%5Fus/article/mg7xjb/how-hackers-broke-into-john-podesta-and-colin-powells-gmail-accounts Accessed Aug. 2020.
[25] Evgeniy Gabrilovich and Alex Gontmakher. 2002. The homograph attack. Commun. ACM 45, 2 (2002), 128. https://doi.org/10.1145/503124.503156
[26] Sujata Garera, Niels Provos, Monica Chew, and Aviel D. Rubin. 2007. A Framework for Detection and Measurement of Phishing Attacks. In Proceedings of the 2007 ACM Workshop on Recurring Malcode (WORM '07). Association for Computing Machinery, New York, NY, USA, 1–8. https://doi.org/10.1145/1314389.1314391
[27] Dan J. Graham, Jacob L. Orquin, and Vivianne H.M. Visschers. 2012. Eye tracking and nutrition label use: A review of the literature and recommendations for label enhancement. Food Policy 37, 4 (2012), 378–382. https://doi.org/10.1016/j.foodpol.2012.03.004
[28] Chris Grier, Kurt Thomas, Vern Paxson, and Chao Michael Zhang. 2010. @spam: the underground on 140 characters or less. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS 2010, October 4-8, 2010. ACM, Chicago, Illinois, USA, 27–37. https://doi.org/10.1145/1866307.1866311
[29] Neha Gupta, Anupama Aggarwal, and Ponnurangam Kumaraguru. 2014. bit.ly/malicious: Deep dive into short URL based e-crime detection. In APWG Symposium on Electronic Crime Research, eCrime. IEEE, Birmingham, AL, USA, 14–24. https://doi.org/10.1109/ecrime.2014.6963161
[30] Srishti Gupta and Ponnurangam Kumaraguru. 2014. Emerging phishing trends and effectiveness of the anti-phishing landing page. In 2014 APWG Symposium on Electronic Crime Research, eCrime. IEEE, Birmingham, AL, USA, 36–47. https://doi.org/10.1109/ecrime.2014.6963163
[31] Masayuki Higashino. 2019. A Design of an Anti-Phishing Training System Collaborated with Multiple Organizations. In Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services, iiWAS 2019, December 2-4, 2019. ACM, Munich, Germany, 589–592. https://doi.org/10.1145/3366030.3366086
[32] FBI's Internet Crime Complaint Center (IC3). 2017. 2017 Internet Crime Report. Technical Report. The Federal Bureau of Investigation (FBI), Internet Crime Complaint Center. https://pdf.ic3.gov/2017%5FIC3Report.pdf Accessed Aug. 2020.
[33] Iulia Ion, Rob Reeder, and Sunny Consolvo. 2015. "...No one Can Hack My Mind": Comparing Expert and Non-Expert Security Practices. In Eleventh Symposium On Usable Privacy and Security, SOUPS, Lorrie Faith Cranor, Robert Biddle, and Sunny Consolvo (Eds.). USENIX Association, Ottawa, Canada, 327–346. https://www.usenix.org/conference/soups2015/proceedings/presentation/ion
[34] Daniel Jampen, Gürkan Gür, Thomas Sutter, and Bernhard Tellenbach. 2020. Don't click: towards an effective anti-phishing training. A comparative literature review. Human-centric Computing and Information Sciences 10 (2020), 33. https://doi.org/10.1186/s13673-020-00237-7
[35] Bernhard Jenny and Nathaniel Vaughn Kelso. 2007. Color Design for the Color Vision Impaired. Cartographic Perspectives 58 (2007), 61–67. https://doi.org/10.14714/CP58.270
[36] Joseph Johnson. 2019. UK: number of internet users who are students 2011-2019. (May 2019). https://www.statista.com/statistics/940040/number-of-student-internet-users-in-the-uk/
[37] Ruogu Kang, Stephanie Brown, Laura Dabbish, and Sara Kiesler. 2014. Privacy Attitudes of Mechanical Turk Workers and the U.S. Public. In 10th Symposium on Usable Privacy and Security, SOUPS, Lorrie Faith Cranor, Lujo Bauer, and Robert Biddle (Eds.). USENIX Association, Menlo Park, CA, USA, 37–49. https://www.usenix.org/conference/soups2014/proceedings/presentation/kang
[38] Patrick Gage Kelley, Joanna Bresee, Lorrie Faith Cranor, and Robert W. Reeder. 2009. A "nutrition label" for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security, SOUPS. ACM, Mountain View, California, USA, 1–12. https://doi.org/10.1145/1572532.1572538
[39] Timothy Kelley and Bennett I. Bertenthal. 2016. Attention and past behavior, not security knowledge, modulate users' decisions to login to insecure websites. Information & Computer Security 24, 2 (2016), 164–176. https://doi.org/10.1108/ics-01-2016-0002
[40] Mahmoud Khonji, Youssef Iraqi, and Andrew Jones. 2013. Phishing Detection: A Literature Survey. IEEE Communications Surveys & Tutorials 15, 4 (2013), 2091–2121. https://doi.org/10.1109/surv.2013.032213.00009
[41] Iacovos Kirlappos and Martina Angela Sasse. 2012. Security Education against Phishing: A Modest Proposal for a Major Rethink. IEEE Security and Privacy 10, 2 (2012), 24–32. https://doi.org/10.1109/MSP.2011.179
[42] Philipp Koehn, Huda Khayrallah, Kenneth Heafield, and Mikel L. Forcada. 2018. Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, WMT 2018, October 31 - November 1, 2018. Association for Computational Linguistics, Brussels, Belgium, 726–739. https://doi.org/10.18653/v1/w18-6453
[43] Ponnurangam Kumaraguru, Justin Cranshaw, Alessandro Acquisti, Lorrie Cranor, Jason Hong, Mary Ann Blair, and Theodore Pham. 2009. School of Phish: A Real-world Evaluation of Anti-phishing Training. In Proceedings of the 5th Symposium on Usable Privacy and Security (SOUPS '09). ACM, New York, NY, USA, Article 3, 12 pages. https://doi.org/10.1145/1572532.1572536
[44] Ponnurangam Kumaraguru, Yong Rhee, Alessandro Acquisti, Lorrie Faith Cranor, Jason I. Hong, and Elizabeth Nunge. 2007. Protecting people from phishing: the design and evaluation of an embedded training email system. In Proceedings of the 2007 Conference on Human Factors in Computing Systems, CHI, Mary Beth Rosson and David J. Gilmore (Eds.). ACM, San Jose, California, USA, 905–914. https://doi.org/10.1145/1240624.1240760
[45] Ponnurangam Kumaraguru, Steve Sheng, Alessandro Acquisti, Lorrie Faith Cranor, and Jason I. Hong. 2010. Teaching Johnny not to fall for phish. ACM Transactions on Internet Technology 10, 2 (2010), 7:1–7:31. https://doi.org/10.1145/1754393.1754396
[46] Sangho Lee and Jong Kim. 2013. WarningBird: A Near Real-Time Detection System for Suspicious URLs in Twitter Stream. IEEE Transactions on Dependable and Secure Computing 10, 3 (2013), 183–195. https://doi.org/10.1109/tdsc.2013.3
[47] Chunlin Liu, Lidong Wang, Bo Lang, and Yuan Zhou. 2018. Finding Effective Classifier for Malicious URL Detection. In Proceedings of the 2nd International Conference on Management Engineering, Software Engineering and Service Sciences (ICMSS 2018). Association for Computing Machinery, New York, NY, USA, 240–244. https://doi.org/10.1145/3180374.3181352
[48] Netcraft Ltd. 2019. Internet Security and Data Mining. (2019). https://www.netcraft.com/ Accessed Jun. 2020.
[49] Justin Ma, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker. 2009. Identifying suspicious URLs: an application of large-scale online learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, June 14-18, 2009 (ACM International Conference Proceeding Series), Andrea Pohoreckyj Danyluk, Léon Bottou, and Michael L. Littman (Eds.), Vol. 382. ACM, Montreal, Quebec, Canada, 681–688. https://doi.org/10.1145/1553374.1553462
[50] Samuel Marchal, Kalle Saari, Nidhi Singh, and N. Asokan. 2016. Know Your Phish: Novel Techniques for Detecting Phishing Sites and Their Targets. In 36th International Conference on Distributed Computing Systems, ICDCS. IEEE, Nara, Japan, 323–333. https://doi.org/10.1109/icdcs.2016.10
[51] Ulrike Meyer and Vincent Drury. 2019. Certified Phishing: Taking a Look at Public Key Certificates of Phishing Websites. In Fifteenth Symposium on Usable Privacy and Security, SOUPS. USENIX Association, Santa Clara, CA, USA, 210–223. https://www.usenix.org/conference/soups2019/presentation/drury
[52] Microsoft. 2018. Microsoft Security Intelligence Report, Volume 23. Technical Report. Microsoft. https://www.microsoft.com/en-us/security/intelligence-report Accessed Aug. 2018.
[53] Gaurav Misra, Nalin Asanka Gamagedara Arachchilage, and Shlomo Berkovsky. 2017. Phish Phinder: A Game Design Approach to Enhance User Confidence in Mitigating Phishing Attacks. In Eleventh International Symposium on Human Aspects of Information Security & Assurance, HAISA, Proceedings, Steven Furnell and Nathan L. Clarke (Eds.). University of Plymouth, Adelaide, Australia, 41–51. http://www.cscan.org/openaccess/?paperid=349
[54] Mattia Mossano, Kami Vaniea, Lukas Aldag, Reyhan Düzgün, Peter Mayer, and Melanie Volkamer. 2020. Analysis of publicly available anti-phishing webpages: contradicting information, lack of concrete advice and very narrow attack vector. In European Symposium on Security and Privacy Workshops, EuroS&P Workshops. IEEE, Genoa, Italy, 130–139. https://doi.org/10.1109/EuroSPW51379.2020.00026
[55] Rennie Naidoo. 2015. Analysing Urgency and Trust Cues Exploited in Phishing Scam Designs. In 10th International Conference on Cyber Warfare and Security, ICCWS. Academic Conferences International Limited, The University of Venda and The Council for Scientific and Industrial Research, South Africa, 216–222. search.proquest.com/conference-papers-proceedings/analysing-urgency-trust-cues-exploited-phishing/docview/1781336050/se-2?accountid=10673
[56] James Nicholson, Lynne M. Coventry, and Pam Briggs. 2018. Introducing the Cybersurvival Task: Assessing and Addressing Staff Beliefs about Effective Cyber Protection. In Fourteenth Symposium on Usable Privacy and Security, SOUPS, August 12-14, 2018. USENIX Association, Baltimore, MD, USA, 443–457. https://www.usenix.org/conference/soups2018/presentation/nicholson
[57] Adam Oest, Yeganeh Safaei, Adam Doupé, Gail-Joon Ahn, Brad Wardman, and Gary Warner. 2018. Inside a phisher's mind: Understanding the anti-phishing ecosystem through phishing kit analysis. In 2018 APWG Symposium on Electronic Crime Research, eCrime 2018, May 15-17, 2018. IEEE, San Diego, CA, USA, 1–12. https://doi.org/10.1109/ecrime.2018.8376206
[58] OpenDNS LLC. 2019. PhishTank: Join the fight against phishing. (2019). https://www.phishtank.com/ Accessed Dec. 2020.
[59] OpenPhish. 2019. OpenPhish: Phishing Intelligence. (2019). https://openphish.com Accessed Dec. 2020.
[60] Charles A. O'Reilly. 1980. Individuals and Information Overload in Organizations: Is More Necessarily Better? The Academy of Management Journal 23, 4 (1980), 684–696. http://www.jstor.org/stable/255556
[61] Gilchan Park, Lauren M. Stuart, Julia M. Taylor, and Victor Raskin. 2014. Comparing machine and human ability to detect phishing emails. In 2014 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2014, October 5-8, 2014. IEEE, San Diego, CA, USA, 2322–2327. https://doi.org/10.1109/smc.2014.6974273
[62] Cofense PhishMe. 2017. Enterprise Phishing Resiliency and Defense Report. Technical Report. PhishMe, Inc. https://cofense.com/wp-content/uploads/2017/11/Enterprise-Phishing-Resiliency-and-Defense-Report-2017.pdf Accessed Aug. 2020.
[63] Swapan Purkait. 2012. Phishing counter measures and their effectiveness - literature review. Information Management & Computer Security 20, 5 (2012), 382–420. https://doi.org/10.1108/09685221211286548
I Don’t Need an Expert! Making URL Phishing Features Human Comprehensible CHI ’21, May 8–13, 2021, Yokohama, Japan
[64] Issa Qabajeh, Fadi A. Thabtah, and Francisco Chiclana. 2018. A recent review of conventional vs. automated cybersecurity anti-phishing techniques. Computer Science Review 29 (2018), 44–55. https://doi.org/10.1016/j.cosrev.2018.05.003
[65] Florian Quinkert, Tobias Lauinger, William K. Robertson, Engin Kirda, and Thorsten Holz. 2019. It's Not what It Looks Like: Measuring Attacks and Defensive Registrations of Homograph Domains. In 7th Conference on Communications and Network Security, CNS 2019, June 10-12, 2019. IEEE, Washington, DC, USA, 259–267. https://doi.org/10.1109/cns.2019.8802671
[66] Elissa M. Redmiles, Amelia R. Malone, and Michelle L. Mazurek. 2016. I Think They're Trying to Tell Me Something: Advice Sources and Selection for Digital Security. In IEEE Symposium on Security and Privacy, SP. IEEE Computer Society, San Jose, CA, USA, 272–288. https://doi.org/10.1109/SP.2016.24
[67] Robert W. Reeder, Adrienne Porter Felt, Sunny Consolvo, Nathan Malkin, Christopher Thompson, and Serge Egelman. 2018. An Experience Sampling Study of User Reactions to Browser Warnings in the Field. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI, Regan L. Mandryk, Mark Hancock, Mark Perry, and Anna L. Cox (Eds.). ACM, Montreal, QC, Canada, 512. https://doi.org/10.1145/3173574.3174086
[68] Robert W. Reeder, Iulia Ion, and Sunny Consolvo. 2017. 152 Simple Steps to Stay Safe Online: Security Advice for Non-Tech-Savvy Users. IEEE Security & Privacy 15, 5 (2017), 55–64. https://doi.org/10.1109/msp.2017.3681050
[69] Joshua Reynolds, Deepak Kumar, Zane Ma, Rohan Subramanian, Meishan Wu, Martin Shelton, Joshua Mason, Emily Stark, and Michael Bailey. 2020. Measuring Identity Confusion with Uniform Resource Locators. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). ACM, Honolulu, HI, USA, 1–12. https://doi.org/10.1145/3313831.3376298
[70] Doyen Sahoo, Chenghao Liu, and Steven C. H. Hoi. 2019. Malicious URL Detection using Machine Learning: A Survey. (2019). arXiv:cs.LG/1701.07179 http://arxiv.org/abs/1701.07179
[71] Maria Sameen, Kyunghyun Han, and Seong Oun Hwang. 2020. PhishHaven - An Efficient Real-Time AI Phishing URLs Detection System. IEEE Access 8 (2020), 83425–83443. https://doi.org/10.1109/ACCESS.2020.2991403
[72] Nuttapong Sanglerdsinlapachai and Arnon Rungsawang. 2010. Using Domain Top-page Similarity Feature in Machine Learning-Based Web Phishing Detection. In Third International Conference on Knowledge Discovery and Data Mining, WKDD. IEEE, Phuket, Thailand, 187–190. https://doi.org/10.1109/wkdd.2010.108
[73] Tara Seals. 2017. Cost of user security training tops $290K per year. (2017). https://www.infosecurity-magazine.com/news/cost-of-user-security-training Accessed Nov. 2020.
[74] Steve Sheng, Bryant Magnien, Ponnurangam Kumaraguru, Alessandro Acquisti, Lorrie Faith Cranor, Jason I. Hong, and Elizabeth Nunge. 2007. Anti-Phishing Phil: the design and evaluation of a game that teaches people not to fall for phish. In Proceedings of the 3rd Symposium on Usable Privacy and Security, SOUPS 2007, July 18-20, 2007 (ACM International Conference Proceeding Series), Lorrie Faith Cranor (Ed.), Vol. 229. ACM, Pittsburgh, Pennsylvania, USA, 88–99. https://doi.org/10.1145/1280680.1280692
[75] Hossein Siadati, Sean Palka, Avi Siegel, and Damon McCoy. 2017. Measuring the Effectiveness of Embedded Phishing Exercises. In 10th USENIX Workshop on Cyber Security Experimentation and Test, CSET 2017, August 14, 2017. USENIX Association, Vancouver, BC, Canada, 8. https://www.usenix.org/conference/cset17/workshop-program/presentation/siadatii
[76] Gabor Szathmari. 2020. Why Outdated Anti-Phishing Advice Leaves You Exposed (Part 2). (Jul. 2020). https://blog.ironbastion.com.au/why-outdated-anti-phishing-advice-leaves-you-exposed-part-2/
[77] Janos Szurdi, Balazs Kocso, Gabor Cseh, Jonathan Spring, Márk Félegyházi, and Chris Kanich. 2014. The Long "Taile" of Typosquatting Domain Names. In Proceedings of the 23rd USENIX Security Symposium. USENIX Association, San Diego, CA, USA, 191–206. https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/szurdi
[78] Rashid Tahir, Ali Raza, Faizan Ahmad, Jehangir Kazi, Fareed Zaffar, Chris Kanich, and Matthew Caesar. 2018. It's All in the Name: Why Some URLs are More Vulnerable to Typosquatting. In Conference on Computer Communications, INFOCOM 2018, April 16-19, 2018. IEEE, Honolulu, HI, USA, 2618–2626. https://doi.org/10.1109/infocom.2018.8486271
[79] Nikolaos Tsalis, Nikos Virvilis, Alexios Mylonas, Theodore K. Apostolopoulos, and Dimitris Gritzalis. 2014. Browser Blacklists: The Utopia of Phishing Protection. In E-Business and Telecommunications - 11th International Joint Conference, ICETE, Revised Selected Papers (Communications in Computer and Information Science), Mohammad S. Obaidat, Andreas Holzinger, and Joaquim Filipe (Eds.), Vol. 554. Springer, Vienna, Austria, 278–293. https://doi.org/10.1007/978-3-319-25915-4_15
[80] Verizon. 2017. 2017 Data Breach Investigations Report. Technical Report. Verizon. https://www.verizonenterprise.com/resources/reports/rp%5FDBIR%5F2018%5FReport%5Fexecsummary%5Fen%5Fxg.pdf Accessed Jun. 2018.
[81] Verizon. 2019. 2019 Data Breach Investigations Report. Technical Report. Verizon. https://enterprise.verizon.com/resources/reports/2019-data-breach-investigations-report.pdf Accessed Jun. 2020.
[82] Melanie Volkamer, Karen Renaud, Benjamin Reinheimer, and Alexandra Kunz. 2017. User experiences of TORPEDO: TOoltip-poweRed Phishing Email DetectiOn. Computers & Security 71 (2017), 100–113. https://doi.org/10.1016/j.cose.2017.02.004
[83] Stephen Waddell. 2020. CatchPhish: A URL and Anti-Phishing Research Platform. Master's thesis. University of Edinburgh. https://groups.inf.ed.ac.uk/tulips/projects/19-20/waddell-2020.pdf
[84] Rick Wash. 2020. How Experts Detect Phishing Scam Emails. Proc. ACM Hum.-Comput. Interact. 4, CSCW2 (2020), 160:1–160:28. https://doi.org/10.1145/3415231
[85] Patrickson Weanquoi, Jaris Johnson, and Jinghua Zhang. 2017. Using a Game to Teach About Phishing. In Proceedings of the 18th Annual Conference on Information Technology Education and the 6th Annual Conference on Research in Information Technology, Stephen J. Zilora, Tom Ayers, and Daniel S. Bogaard (Eds.). ACM, Rochester, New York, USA, 75. https://doi.org/10.1145/3125659.3125669
[86] Emma J. Williams, Joanne Hinds, and Adam N. Joinson. 2018. Exploring susceptibility to phishing in the workplace. International Journal of Human-Computer Studies 120 (2018), 1–13. https://doi.org/10.1016/j.ijhcs.2018.06.004
[87] Ryan T. Wright and Kent Marett. 2010. The Influence of Experiential and Dispositional Factors in Phishing: An Empirical Investigation of the Deceived. Journal of Management Information Systems 27, 1 (2010), 273–303. http://www.jmis-web.org/articles/1038
[88] Min Wu, Robert C. Miller, and Simson L. Garfinkel. 2006. Do security toolbars actually prevent phishing attacks?. In Proceedings of the 2006 Conference on Human Factors in Computing Systems, CHI 2006, April 22-27, 2006. ACM, Montréal, Québec, Canada, 601–610. https://doi.org/10.1145/1124772.1124863
[89] Guang Xiang, Jason I. Hong, Carolyn Penstein Rosé, and Lorrie Faith Cranor. 2011. CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites. ACM Trans. Inf. Syst. Secur. 14, 2 (2011), 21:1–21:28. https://doi.org/10.1145/2019599.2019606
[90] Aiping Xiong, Robert W. Proctor, Weining Yang, and Ninghui Li. 2017. Is Domain Highlighting Actually Helpful in Identifying Phishing Web Pages? Human Factors 59, 4 (2017), 640–660. https://doi.org/10.1177/0018720816684064
[91] Jun Yang, Pengpeng Yang, Xiaohui Jin, and Qian Ma. 2017. Multi-Classification for Malicious URL Based on Improved Semi-Supervised Algorithm. In IEEE International Conference on Computational Science and Engineering, CSE 2017, and IEEE International Conference on Embedded and Ubiquitous Computing, EUC, Volume 1. IEEE Computer Society, Guangzhou, China, 143–150. https://doi.org/10.1109/CSE-EUC.2017.34