An Administrator’s Guide to Internet Password Research∗

Dinei Florêncio and Cormac Herley, Microsoft Research, Redmond, USA
Paul C. van Oorschot, Carleton University, Ottawa, Canada

∗ Proceedings of USENIX LISA’14, Nov. 9-14, 2014, Seattle, WA.

Abstract. The research literature on passwords is rich but little of it directly aids those charged with securing web-facing services or setting policies. With a view to improving this situation we examine questions of implementation choices, policy and administration using a combination of literature survey and first-principles reasoning to identify what works, what does not work, and what remains unknown. Some of our results are surprising. We find that offline attacks, the justification for great demands of user effort, occur in much more limited circumstances than is generally believed (and in only a minority of recently-reported breaches). We find that an enormous gap exists between the effort needed to withstand online and offline attacks, with probable safety occurring when a password can survive $10^{6}$ and $10^{14}$ guesses respectively. In this gap, eight orders of magnitude wide, there is little return on user effort: exceeding the online threshold but falling short of the offline one represents wasted effort. We find that guessing resistance above the online threshold is also wasted at sites that store passwords in plaintext or reversibly encrypted: there is no attack scenario where the extra effort protects the account.

1 Introduction

Despite the ubiquity of password-protected web sites, research guidance on the subject of running them is slight. Much of the password literature has become specialized, fragmented, or theoretical, and in places confusing and contradictory. Those who administer and set policies can hardly be blamed for being unenthusiastic about publications which document constantly improving attacks on password sites but are largely silent on the question of how they can be defended. Disappointingly little of the accumulating volume of password research directly addresses key everyday issues: what to do to protect web-services, given the realities of finite-resources, imperfect understanding of the threats, and considerable pushback from users.

Do password composition policies work? Does forced password expiration improve security? Do lockouts help protect a service? What do password meters accomplish? The most comprehensive document on these and other questions dates to 1985 [13]. The problem is not that no recent guidance is available; OWASP offers several documents [39, 45, 56]; blogs, trade magazines and industry analyst reports are full of tips, best practices and opinions. Discussions in online fora reliably generate passionate arguments, if little progress. However, much of the available guidance lacks supporting evidence.

We seek to establish what is supported by clear evidence and solid justification. Using a combination of literature survey and ground-up, first-principles reasoning, we identify what is known to work, what is known not to work, and what remains unknown. The end goal is a more useful view of what is known about the implementation, effectiveness, and impacts of choices made in deploying password-related mechanisms for access to services over the web. The target audience is those interested in the intersection of research literature, and the operation, administration and setting of policies for password-protected web-sites.

On the positive side, considerable progress in the last few years has followed from analysis of leaked plaintext datasets. This has provided new evidence challenging many long-held beliefs. Most current password practices reflect historical origins [18]. Some have evolved over time; others should have, but have not. Environments of use, platforms, and user bases have changed immensely. We summarize the literature useful to answer practical questions on the efficacy of policies governing password composition, expiration and account locking.

Some of our findings are surprising. Experts now recognize that traditional measures of strength bear little relation to how passwords withstand guessing, and
can no longer be considered useful; current password policies have not reflected this. We characterize circumstances allowing advanced offline guessing attacks to occur, and find them more limited than is generally realized. We identify an enormous gap between the guessing-resistance needed to withstand online and offline attacks, and note that it is growing. We highlight that strength above that needed to withstand online guessing is effectively wasted at sites that store passwords in plaintext or reversibly encrypted: there is no attack scenario where the extra strength protects the account from an intelligent adversary.

To dispense with a preliminary question: despite long-known shortcomings in both security and usability, passwords are highly unlikely to disappear. The many reasons include the difficulty of finding something better, user familiarity as an authentication front-end (passwords will likely persist as one factor within multi-dimensional frameworks), and the inertia of ubiquitous deployment [9, 30]. Thus the challenges of administering passwords will not fade quietly either, to the disappointment of those hoping that a replacement technology will remove the need to address hard issues.

2 Classifying accounts into categories

A common tactic to allegedly improve the security of password-protected sites is to ask users to expend more effort: choose “stronger” passwords, don’t re-use passwords across sites, deploy and administer anti-malware tools, ensure all software on user devices is patched up-to-date, and so on.

If we assume that users have a fixed time-effort budget for “password security” [4], then it is unwise to spend equally on all accounts: some are far more important than others, correspondingly implying greater impact upon account compromise. This motivates determining how to categorize accounts, a subject of surprisingly little focus in the literature. Different categories call for different levels of (password and other) security. Those who decide and administer policies should be aware of what category not only they see their site falling into, but what categories subsets of their users would see it falling into. Note also that some password-protected sites provide no direct security benefit to end-users, e.g., services used to collect data, or which compel email-address usernames to later contact users for marketing-related purposes. Thus views on account and password importance may differ between users and systems administrators or site operators (e.g., see [10]).1

1 Realistic systems administrators might self-categorize their site, asking: Do users see me as a “bugmenot.com” site? (cf. Table 1)

Criteria for categorizing accounts: A first attempt to categorize password-based accounts might be based on communication technology, e.g., grouping email passwords as one category. We find this unsuitable to our goals, as some email accounts are throw-aways from a “consequences” point of view, while others are critically important. Accounts may be categorized in many ways, based on different criteria. Our categorization (below) is based largely on potential consequences of account compromise, which we expect to be important characteristics in any such categorization:

• (personal) time loss/inconvenience
• (personal) privacy
• (personal) physical security
• (personal/business) financial implications
• (personal/business) reputational damage
• (personal/business) legal implications
• confidentiality of third-party data
• damage to resources (physical or electronic)

The above time loss/inconvenience may result from loss of invested effort or accumulated information, such as contact lists in email or social networking accounts. One lens to view this through is to ask: Would a user invest 10 minutes in trying to recover a lost account, or simply create a new such account from scratch? Another account-differentiating question is: Do users make any effort at all to remember the account password? Note also that the consequences of compromise of an account X may extend to accounts Y and Z (e.g., due to password re-use, email-based password recovery for other accounts, accounts used as single-sign-on gateways).

For use in what follows, and of independent interest, we categorize accounts as follows:

• don’t-care accounts (unlocked doors).
• low-consequence accounts (latched garden doors).
• medium-consequence accounts.
• high-consequence accounts (essential/critical).
• ultra-sensitive accounts (beyond passwords).

Details and examples of these categories are given in Table 1; we say little more in this paper about the book-end categories in this spectrum: don’t-care accounts are suitably named, and ultra-sensitive accounts are beyond scope. Within this paper, that leaves us to explore what password policies, user advice, implementation details, and security levels are suitable for accounts in three main
Category 0: Don’t-care
Description: Accounts whose compromise has no impact on users. A compromise of the account at any time would not bother users. Often one-use accounts with trivially weak passwords, or recreated from scratch if needed subsequently. Perhaps the site compels passwords, despite user not seeing any value therein.
Comments: The security community and users should recognize that for such accounts, there would be no technical objection to using the password “password”, knowing that it provides no security. Such accounts should be isolated from other categories to avoid cross-contamination, e.g., due to password re-use. Users should minimize security-related investments of time and effort; resources are better spent elsewhere. Possible strategies: re-using a single weak password for all such accounts, using distinct passwords written down on one sheet for easy access, and using publicly shared passwords (see: bugmenot.com).
Generic examples: One-time email accounts (e.g., used for one-off signup, then abandoned). Nuisance accounts for access to “free” news articles or other content.

Category 1: Low-consequence
Description: Accounts whose compromise has non-severe implications (minimal or easily repaired). Often infrequently used accounts, relatively low-impact if compromised.
Comments: Administrators and operators should be realistic in expectations of user commitment to such accounts. Some users may rely almost entirely on a password recovery feature, vs. remembering such account passwords. Users should recognize the place of these between Don’t-care and Medium-consequence accounts.
Generic examples: Social network accounts (infrequent users). Discussion group accounts (infrequent users). Online newspapers, streaming media accounts (credit card details not stored onsite).

Category 2: Medium-consequence
Description: Non-trivial consequences but limited, e.g., loss of little-used reputation account, or credit card details stored at online U.S. merchant (direct financial losses limited to $50).
Comments: User losses are more in time and effort, than large financial loss or confidentiality breached by document or data loss. User effort directed towards resisting online guessing attacks is well-spent. Unclear if the same holds true re: resisting determined offline guessing attacks. Note: many attack vectors are beyond user control (e.g., browser-based flaws, server compromises).
Generic examples: Email accounts (secondary). Online shopping sites (credit card details stored onsite). Social network accounts (casual users). Voice or text communication services accounts (e.g., Skype, MSN). Charge-up sites for stored value cards (credit card details stored onsite). Human resources sites giving employees semi-public information.

Category 3: High-consequence
Description: Critical or essential accounts related to primary employment, finance, or documents requiring high confidentiality. Compromises are not easily repaired or have major consequences/side effects.
Comments: Most important password discussion, attention and effort of both sysadmins and users should focus here. Often password protection for such accounts is best augmented by second-factor mechanisms (involving explicit user action) or other dimensions (invisible to user). Stakeholder priorities may differ: an account viewed lower consequence by a user may be categorized essential by their employer (e.g., remote access to a corporate database via a password).
Generic examples: Email accounts (primary, professional, recovery of other accounts). Major social network/reputational accounts (heavy users and celebrities). Online banking and financial accounts. SSH and VPN passwords for access to corporate networks. Access to corporate databases, including employee personal details.

Category ∞: Ultra-sensitive
Description: Account compromise may cause major, life-altering, irreversible damage. (Many individual users will have no such accounts.)
Comments: It is entirely unsuitable to rely on passwords alone for securing such accounts. Passwords, if used, should be augmented by (possibly multiple) additional mechanisms. The passwords themselves should not be expected to be tangibly different from those for high-consequence accounts (one might argue that weaker passwords suffice, given stronger supplementary authentication mechanisms).
Generic examples: Multi-million dollar irreversible banking transactions. Authorization to launch military weapons. Encryption of nation-state secrets.

Table 1: Categories of password-protected accounts, comments and examples. Accounts in the same category ideally have passwords of similar strength relative to guessing attacks. Ultra-sensitive accounts require that passwords be augmented by much more robust mechanisms.

categories of interest: low-consequence, medium-consequence, and high-consequence accounts.2

2 An alternate account categorization by Grosse and Upadhyay [28], based on value, has categories: throw-away, routine, spokesperson, sensitive and very-high-value transactions.

Use of single term “password” over-loaded? Our discussion of categories highlights that using the unqualified term password for protection that runs the gamut from don’t-care to high-consequence sites may mislead users. We should not be quick to express outrage on learning that password1 and 123456 are common on publicly-disclosed password lists from compromised sites, if these are don’t-care accounts in users’ eyes. Nor should it be surprising to find passwords stored cleartext on fantasy football sites. The use of the same term password across all account categories, together with a jumble of unscoped password advice to users, and an absence of discussion of different categories of accounts (and corresponding password requirements), likely contributes to lower overall security, including through cross-category re-use of passwords. We believe finer-grained terminology would better serve users here.

3 Guessing attacks and password storage

The enormous effort that has been spent on password strength and guessing-attacks might lead us to believe that the questions there are largely settled and things are well-understood. Unfortunately, we find that this is not the case. For a number of reasons, realistic studies of password behaviors are hard to conduct [21].

Until recently, published knowledge on in-the-wild password habits [7, 10, 57] was derived from a few small-scale sets of plaintext passwords [38], or studies without access to plaintext passwords [22]. Recent large-scale breaches have provided significant collections of plaintext passwords, allowing study of actual user choices. Tellingly, they reveal that many time-honored assumptions are false. Password “strength” measures, both long-used by academics and deeply embedded in criteria used by IT security auditors, are now known to correlate poorly with guessing resistance; policies currently enforced push users toward predictable strategies rather than randomness; e.g., evidence, as discussed below, shows password aging (forced expiration) achieves very little of its hoped-for improvement [63].

Here we explore what is known on measuring password strength, fundamentals of storing passwords, and suitable target thresholds for how many guesses a password should withstand.

Leaked datasets. Table 2 lists several recent leaks from prominent web-sites. These datasets reveal much about user password habits. The ethics of doing analysis on what is, in effect, stolen property generated some discussion when the Rockyou dataset became available in 2009. While none of the 32 million users gave permission for their passwords to be used, rough consensus now appears to be that use of these datasets imposes little or no additional harm, and their use to improve password security is acceptable; the datasets are also of course available to attackers.

Note that among Table 2 incidents, passwords were stored “properly” (see Section 3.4) salted and hashed in just two cases: Evernote and Gawker. Rockyou, Tianya and Cupid Media stored plaintext passwords; LinkedIn and eHarmony stored them hashed but unsalted; Adobe stored them reversibly encrypted. Section 3.2 and Figure 1 explain why offline guessing attacks (beyond rainbow table lookups) are a relevant threat only when the password file is properly salted and hashed.

3.1 Password strength: ideal vs. actual

Security analysis and evaluation would be much simpler if users chose passwords as random collections of characters. For example, if passwords were constrained to be $L$ characters long, and drawn from an alphabet of $C$ characters, then each of $C^{L}$ passwords would be equally likely. An attacker would have no better strategy than to guess at random and would have probability $C^{-L}$ of being correct on each guess. Even with relatively modest choices for $L$ and $C$ we can reduce the probability of success (per password guess) to $10^{-16}$ or so, putting it beyond reach of a year’s worth of effort at 10 million guess verifications per second.

Unfortunately, the reality is nowhere close to this. Datasets such as the 32 million plaintext Rockyou passwords [64] have revealed that user behavior still forms obvious clusters three decades after attention was first drawn to the problem [38]. Left to themselves users choose common words (e.g., password, monkey, princess), proper nouns (e.g., julie, snoopy), and predictable sequences (e.g., abcdefg, asdfgh, 123456; the latter by about 1% of Rockyou accounts).

This has greatly complicated the task of estimating passwords’ resistance to guessing. Simple estimation techniques work well for the ideal case of random collections of characters, but are completely unsuited for user-chosen passwords. For example, a misguided approach models the “entropy” of the password as $\log_2 C^{L} = L \cdot \log_2 C$ where $L$ is the length, and $C$ is the size of the alphabet from which the characters are drawn (e.g., lowercase only would have $C = 26$, lowercase and digits would have $C = 36$, lower-, uppercase and digits would have $C = 62$, etc.). The problems with this approach become obvious when we factor in user behavior: P@ssw0rd occurs 218 times in the Rock-
Site           Year   # Accounts   Hashed   Salted   Reversibly   Offline guessing attack beyond
                                                     encrypted    rainbow tables needed and possible
Rockyou [64]   2009   32m                                         N
Gawker         2010   1.3m         X        X                     Y
Tianya         2011   35m                                         N
eHarmony       2012   1.5m         X                              N
LinkedIn       2012   6.5m         X                              N
Evernote       2013   50m          X        X                     Y
Adobe          2013   150m                           X            N
Cupid Media    2013   42m                                         N

Table 2: Recent incidents of leaked password data files. For only two listed incidents (Evernote and Gawker) would an offline guessing attack be the simplest plausible way to exploit the leak. For each other incident, passwords were stored in such a way that either an easier attack would suffice, or offline guessing was impossible, as explained in Figure 1.
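
The naive measure $L \cdot \log_2 C$ from Section 3.1 is easy to reproduce numerically. The following is a minimal sketch, not the authors' tooling. It assumes an alphabet of 95 printable ASCII characters for a password mixing all character classes and 26 for lowercase-only; the function names and the choice of $L = 9$, $C = 62$ for the random-password case are illustrative assumptions.

```python
import math


def naive_entropy_bits(length: int, alphabet_size: int) -> float:
    """Return L * log2(C), the naive 'entropy' score from Section 3.1."""
    return length * math.log2(alphabet_size)


def random_guess_probability(length: int, alphabet_size: int) -> float:
    """Per-guess success probability C^(-L) against uniformly random passwords."""
    return float(alphabet_size) ** -length


# Assumed alphabet sizes: 95 printable ASCII characters for a password that
# mixes upper, lower, digits and symbols; 26 for lowercase letters only.
score_common = naive_entropy_bits(len("P@ssw0rd"), 95)  # 218 occurrences in Rockyou
score_unique = naive_entropy_bits(len("gunpyo"), 26)    # 1 occurrence in Rockyou

print(f"P@ssw0rd: {score_common:.1f} bits")  # P@ssw0rd: 52.6 bits
print(f"gunpyo:   {score_unique:.1f} bits")  # gunpyo:   28.2 bits

# Illustrative (assumed) "modest" choice of L = 9 and C = 62 for a truly
# random password: the per-guess success probability drops below 10^-16.
p_random = random_guess_probability(9, 62)
print(f"per-guess probability: {p_random:.1e}")
```

The frequently chosen password receives the higher score, which is exactly the inversion that makes this measure unsuitable for user-chosen passwords.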

you dataset but has a score of 52.6 under this measure, while gunpyo occurs only once and has score 28.2. Thus, a password that is far more common (thus more likely to be guessed) scores much higher than one that is unique: the opposite of what a useful metric would deliver. These are by no means exceptions; a test of how passwords hold up to guessing attacks using the JohntheRipper cracking tool [44] shows that $L \cdot \log_2 C$ correlates very poorly with guessing resistance [32, 60]. Similarly for NIST’s crude entropy approximation [13, 60]: many “high-entropy” passwords by its measure turn out to be easily guessed, and many scoring lower withstand attack quite well. Such naive measures dangerously and unreliably estimate how well passwords can resist attack.

The failure of traditional measures to predict guessability has led researchers to alternatives aiming to more closely reflect how well passwords withstand attack. One approach uses a cracking tool to estimate the number of guesses a password will survive. Tools widely used for this purpose include JohntheRipper [44], Hashcat, and its GPU-accelerated sister oclHashcat [42]; others are based on context-free grammars [60, 61]. These tools combine “dictionaries” with mangling rules intended to mimic common user strategies: replacing ‘s’ with ‘$’, ‘a’ with ‘@’, assuming a capital first letter and trailing digit where policy forces uppercase and digits, etc. Thus simple choices like P@ssw0rd are guessed very quickly.

Another approach models an optimal attacker with access to the actual password distribution χ (e.g., the Rockyou dataset), making guesses in likelihood order. This motivates partial guessing metrics [6] addressing the question: how much work must an optimal attacker do to break a fraction α of user accounts?

Bonneau’s α-guesswork gives the expected number of guesses per-account to achieve a success rate of α [7]. Optimal guessing gives dramatic improvements in skewed distributions arising from user-chosen secrets, but none in uniform distributions. For example, for the distribution $U_{L6}$ of random (equi-probable) length-6 lowercase passwords, all can be found in $26^{6} \approx 309$ million guesses per account, and 10% in 30.9 million guesses. In contrast, guessing in optimal order on the Rockyou distribution of 32 million passwords, an average of only 7,131 guesses per account breaks 10% of accounts. Thus, successfully guessing 10% of the Rockyou accounts is a factor of $30{,}900{,}000 / 7{,}131 \approx 4300$ easier than for a length-6 lowercase random distribution (even though the latter is weaker using the $L \cdot \log_2 C$ measure).

Thus, oversimplified “entropy-based” measures should not be relied upon to draw conclusions about guessing resistance; rather, their use should be strongly discouraged. The terms password strength and complexity are also confusing, encouraging optimization of such inappropriate metrics, or inclusion of certain character classes whether or not they help a password withstand attack. We will use instead the term guessing resistance for the attribute that we seek in a password.

3.2 Online and offline guessing

Determining how well a password withstands attack requires some bound on how many guesses the attacker can make and some estimate of the order in which he makes them. The most conservative assumption on guessing order is that the attacker knows the actual password distribution (see α-guesswork above); another approach assumes that he proceeds in the order dictated by an efficient cracker (see also above). We now review how the number of guesses an attacker can make depends on both: (a) the point at which he attacks; and (b) server-side details of how passwords are stored.

The main points to attack a password are: on the client, in the network, at a web-server’s public-facing part, and at the server backend. Attacks at the client (e.g., malware, phishing) or in the network (e.g., sniffing) do not generally involve guessing: the password is simply stolen; guess-resistance is irrelevant. Attacks involving guessing are thus at the server’s public-face and backend.

Attacks on the server’s public-face (generally called online attacks) are hard to avoid for a public site: by
design the server responds to authentication requests, checking (username, password) pairs, granting access when they match. An attacker guesses credential pairs and lets the server do the checking. Anyone with a browser can mount basic online guessing attacks on the publicly facing server, but of course the process is usually automated using scripts and guessing dictionaries.

Position    Password     Per-guess success probability
$10^{0}$    123456       $9.1 \times 10^{-3}$
$10^{1}$    abc123       $5.2 \times 10^{-4}$
$10^{2}$    princesa     $1.9 \times 10^{-4}$
$10^{3}$    cassandra    $4.2 \times 10^{-5}$
$10^{4}$    sandara      $5.3 \times 10^{-6}$
$10^{5}$    yahoo.co     $7.2 \times 10^{-7}$
$10^{6}$    musica17     $9.4 \times 10^{-8}$
$10^{7}$    tilynn06     $3.1 \times 10^{-8}$

Table 3: Passwords from the Rockyou dataset in position $10^{m}$ for $m = 0, 1, 2, \cdots, 7$. Observe that an online attacker who guesses in optimal order sees his per-guess success rate fall by five orders of magnitude if he persists to $10^{6}$ guesses.

Attacks on the backend are harder. Recommended practice has backends store not passwords but their salted hashes; recalculating these from user-entered passwords, the backend avoids storing plaintext passwords. (Sections 3.4 and 3.5 discuss password storage details.)

For an offline attack to improve an attacker’s lot over guessing online, three conditions must hold.

i) He must gain access to the system (or a backup) to get to the stored password file. Since the backend is designed to respond only one-at-a-time to requests from the public-facing server, this requires evading all backend defences. An attacker able to do this, and export the file of salted-hashes, can test guesses at the rate his hardware supports.

ii) He must go undetected in gaining password file access: if breach detection is timely, then in well-designed systems the administrator should be able to force system-wide password resets, greatly limiting attacker time to guess against the file. (Note: ability to quickly reset passwords requires nontrivial planning and resources which are beyond scope to discuss.)

iii) The file must be properly both salted and hashed. Otherwise, an offline attack is either not the best attack, or is not possible (as we explain next).

If the password file is accessed, and the access goes undetected, then four main possibilities exist in common practice (see Figure 1):

1) the file is plaintext. In this case, offline guessing is clearly unnecessary: the attacker simply reads all passwords, with nothing to guess [64].

2) the file is hashed but unsalted. Here the passwords cannot be directly read, but rainbow tables (see below) allow fast hash reversal for passwords within a fixed set for which a one-time pre-computation was done. For example, 90% of LinkedIn’s (hashed but unsalted) passwords were guessed in six days [26].

3) the file is both salted and hashed. Here an offline attack is both possible and necessary. For each guess against a user account the attacker must compute the salted hash and compare to the stored value. The fate of each password now depends on how many guesses it will withstand.

4) the file has been reversibly encrypted. This case has two paths: the attacker either gets the decryption key (Case 4A), or he does not (Case 4B).

In Case 4A, offline attack is again unneeded: decryption provides all passwords. In Case 4B, there is no effective offline attack: even if the password is 123456, the attacker has no way of verifying this from the encrypted file without the key (we assume that encryption uses a suitable algorithm, randomly chosen key of sufficient length, and standard cryptographic techniques, e.g., initialization, modes of operation, or random padding, to ensure different instances of the same plaintext password do not produce identical ciphertext [36]). Even having stolen the password file and exported it without detection, the attacker’s best option remains online guessing at the public-facing server; properly encrypted data should not be reversible, even for underlying plaintext (passwords here) that is far from random.

In summary, Figure 1 shows that offline guessing is a primary concern only in the narrow circumstance when all of the following apply: a leak occurs, goes undetected,3 and the passwords are suitably hashed and salted (cf. [8, 64]). In all other common cases, offline attack is either impossible (guessing at the public-facing server is better) or unneeded (the attacker gets passwords directly, with no guessing needed).

3 Or similarly, password reset capability is absent or unexercised.

Revisiting Table 2 in light of this breakdown, note that of the breaches listed, Evernote and Gawker were the only examples where an offline guessing attack was necessary; in all other cases a simpler attack sufficed, and thus guessing resistance (above that necessary to resist online attack) was largely irrelevant due to how passwords were stored.

Rainbow tables. To understand the importance of salting as well as hashing stored passwords, consider the attacker wishing to reverse a given hashed password. Starting with a list (dictionary) of N candidate passwords, pre-computing the hash of each, and stor-
Password file
    Does not leak: no offline attack
    Does leak
        Leak detected: offline attack ineffective (if system resets passwords)
        Leak undetected
            Plaintext: offline attack unneeded (plaintext available)
            Unsalted, hashed: rainbow table lookup (gets most)
            Salted, hashed: offline guessing attack
            Reversibly encrypted
                Decryption key doesn’t leak: offline attack not possible
                Decryption key leaks: offline attack unneeded (plaintext available)

Figure 1: Decision tree for guessing-related threats in common practice based on password file details. Offline guessing is a threat when the password file leaks, that fact goes undetected, and the passwords have been properly salted and hashed. In other cases, offline guessing is either unnecessary, not possible, or addressable by resetting system passwords. Absent a hardware security module (HSM), we expect that the “Decryption key doesn’t leak” branch is rarely populated; failure to prevent theft of a password file gives little confidence in ability to protect the decryption key.
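
The decision tree of Figure 1 can also be written as a short function, which makes it easy to check against the incidents of Table 2. This is an illustrative sketch only: the names `Storage`, `guessing_threat` and `TABLE_2` are our own, and the `key_leaked=False` default is an assumption rather than a statement about any listed incident.

```python
from enum import Enum


class Storage(Enum):
    """Storage scheme of the leaked password file (the four cases in the text)."""
    PLAINTEXT = "plaintext"
    UNSALTED_HASH = "unsalted, hashed"
    SALTED_HASH = "salted, hashed"
    REVERSIBLY_ENCRYPTED = "reversibly encrypted"


def guessing_threat(leaked: bool, detected: bool, storage: Storage,
                    key_leaked: bool = False) -> str:
    """Walk the Figure 1 decision tree and return the label of the leaf reached."""
    if not leaked:
        return "no offline attack"
    if detected:
        return "offline attack ineffective (if system resets passwords)"
    if storage is Storage.PLAINTEXT:
        return "offline attack unneeded (plaintext available)"
    if storage is Storage.UNSALTED_HASH:
        return "rainbow table lookup (gets most)"
    if storage is Storage.SALTED_HASH:
        return "offline guessing attack"
    # Remaining case: reversibly encrypted, so the outcome depends on the key.
    if key_leaked:
        return "offline attack unneeded (plaintext available)"
    return "offline attack not possible"


# Storage schemes of the Table 2 incidents, as described in the text.
TABLE_2 = {
    "Rockyou": Storage.PLAINTEXT,
    "Gawker": Storage.SALTED_HASH,
    "Tianya": Storage.PLAINTEXT,
    "eHarmony": Storage.UNSALTED_HASH,
    "LinkedIn": Storage.UNSALTED_HASH,
    "Evernote": Storage.SALTED_HASH,
    "Adobe": Storage.REVERSIBLY_ENCRYPTED,
    "Cupid Media": Storage.PLAINTEXT,
}

for site, storage in TABLE_2.items():
    # Treat every incident as an undetected leak (the left branch of the tree).
    print(f"{site:12s} {guessing_threat(True, False, storage)}")
```

Only the two salted-and-hashed entries reach the "offline guessing attack" leaf, matching the last column of Table 2; for Adobe the result is "N" under either value of `key_leaked`.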

ing each pair sorted by hash, allows per-instance reversal by simple table lookup after one-time order-N pre-computation (and order-N storage). To reduce storage, rainbow tables [43] use a series of functions to pre-compute repeatable sequences of password hashes called chains, storing only each chain's first and last plaintext. Per-instance computation later identifies hashes from the original fixed password list to a chain, allowing reversal in greater, but still reasonable, time than had all hashes been stored. Numerous rainbow table implementations and services are available.4 For rainbow tables targeting selected password compositions, Table 4 lists as reference points the targeted number of passwords, which gives a lower bound on pre-computation time (resolving expected "collisions" increases computation time).

Modifications [40] may allow tables for any efficiently enumerable password space, e.g., based on regular expressions for defined patterns of lower, upper, digits and special characters; this would extend attacks from (naive) brute-force spaces to "smart dictionaries" of similar size but containing higher-likelihood user passwords. We emphasize that offline attacks using pre-computations over fixed dictionaries, including rainbow tables, are defeated by proper salting, and require leaked password hashes.

Length   Character set          Full cardinality
12       lower                  26^12 = 2^56.4
10       lower, upper           52^10 = 2^57.0
9        any                    95^9  = 2^59.1
10       lower, upper, digit    62^10 = 2^59.5

Table 4: Number of elements targeted by various rainbow tables.

4 For example, see http://project-rainbowcrack.com or sourceforge.net/projects/ophcrack among others.

3.3 How many guesses must a password withstand?

Recall that the online attacker's guesses are checked by the backend server, while an offline attacker tests guesses on hardware that he controls. This constrains online attacks to far fewer guesses than is possible offline.

Online guessing (breadth-first). Consider the online attacker. For concreteness, assume a guessing campaign over a four-month period, sending one guess per second at a sustained rate, yielding about 10^7 guesses; we use this as a very loose upper bound on the number of online guesses any password might have to withstand. An attacker sending guesses at this rate against all accounts (a breadth-first attack) would likely simply overwhelm servers: e.g., it is unlikely that Facebook's servers could handle simultaneous authentication requests from all users. (In practice, only a tiny fraction authenticate

in any 1s period.) Second, if we assume that the average user attempts authentication k times/day and fails 5% of the time (due to typos, cached credentials after a password change, etc.) then a single attacker sending one guess per-account per-second would send about 86,000/k times more traffic and about 1.73×10^6/k times more fail events (86,400 failures per day against 0.05k) than the entire legitimate user population combined. Even if k = 100 (e.g., automated clients re-authenticating every 15 minutes) our single attacker would be sending a factor of 860 more requests and 1.73×10^4 more fails than the whole legitimate population. Malicious traffic at this volume against any server is hard to hide. A more realistic average of k = 1 makes the imbalance between malicious and benign traffic even more extreme. Thus 10^7 guesses per account seems entirely infeasible in a breadth-first online guessing campaign; 10^4 is more realistic.

Online guessing (depth-first). What about depth-first guessing: is 10^7 guesses against a single targeted account feasible? First, note that most individual accounts are not worthy of targeted effort. Using the Section 2 categories, low- and medium-consequence sites may have very few such accounts, while at high-consequence sites a majority might be worthy. Second, 10^7 guesses would imply neither a lockout policy (see Section 4.4) nor anything limiting the rate at which the server accepts login requests for an account. Third, as evident from Table 3, which tabulates the passwords in position 10^m from the Rockyou distribution for m = 0, 1, 2, ..., 7, an online attacker making guesses in optimal order and persisting to 10^6 guesses will experience five orders of magnitude reduction from his initial success rate. Finally, IP address blacklisting strategies may make sending 10^7 guesses to a single account infeasible (albeit public IP addresses fronting large numbers of client devices complicate this). Thus, the assumptions that would allow 10^7 online guesses against a single account are extreme, effectively requiring an absence of defensive effort. 10^6 seems a more realistic upper bound on how many online guesses a password must withstand in a depth-first attack (e.g., over 4 months). This view is corroborated by a 2010 study of password policies, which found that Amazon.com, Facebook, and Fidelity Investments (among many others) allow 6-digit PINs for authentication [23]. That these sites allow passwords which will not (in expectation) survive 10^6 guesses suggests that passwords that will survive this many guesses can be protected from online attacks (possibly aided by backend defenses). Figure 2 depicts our view: we gauge the online guessing risk to a password that will withstand only 10^2 guesses as extreme, one that will withstand 10^3 guesses as moderate, and one that will withstand 10^6 guesses as negligible. The left curve does not change as hardware improves.

Offline guessing. Now consider the offline attacker: using hardware under his control, he can test guesses at a rate far exceeding online attacks. Improvements in processing power over time make it possible that his new hardware computes guesses orders of magnitude faster than, say, 10-year-old authentication servers which process (online) login attempts. The task is also distributable, and can be done using a botnet or stolen cloud computing resources. An attacker might use thousands of machines each computing hashes thousands of times faster than a target site's backend server. Using a GPU able to compute 10 billion raw hashes/s or more [18, 26], a 4-month effort yields 10^17 guesses; 1,000 such machines allow 10^14 guesses on each of a million accounts, or 10^20 on a single account. All of this assumes no defensive iterated hashing, which Section 3.4 explores as a means to reduce such enormous offline guess numbers.

Given the lack of constraints, it is harder to bound the number of guesses, but it is safe to say that offline attacks can test many orders of magnitude more guesses than online attacks. Weir et al. [60] pursue cracking up to 10^11 guesses; a series of papers from CMU researchers investigate as far as 10^14 guesses [32, 35]. To be safe from offline guessing, we must assume a lower bound of at least 10^14, and more as hardware5 and cracking methods improve. This is illustrated in Figure 2.

5 Hardware advances can be partially counteracted by increased hash iteration counts per Section 3.4.

Online-offline gap. To summarize, a huge chasm separates online and offline guessing. Either an attacker sends guesses to a publicly-facing server (online) or guesses on hardware he controls (offline); there is no continuum of possibilities in between. The number of guesses that a password must withstand to expect to survive each attack differs enormously. A threshold of at most 10^6 guesses suffices for high probability of surviving online attacks, whereas at least 10^14 seems necessary for any confidence against a determined, well-resourced offline attack (though due to the uncertainty about the attacker's resources, the offline threshold is harder to estimate). These thresholds for probable safety differ by 8 orders of magnitude. The gap increases if the offline attack brings more distributed machines to bear, and as offline attacks and hardware improve; it decreases with hash iteration. Figure 2 conceptualizes the situation and Table 5 summarizes.

Next, consider the incremental benefit received in improving a password as a function of the number of guesses it can withstand (10^m). Improvement delivers enormous gain when m ≤ 3: the risk of online attack is falling sharply in this region, and safety (from online guessing) can be reached at about m = 6. By the time m = 6, this effect is gone; the risk of online attack is now minimal, but further password improvement buys little protection against offline attacks until m = 14 (where

the probability of offline guessing success starts to decline). Note the large gap or chasm where online guessing is a negligible threat but surviving offline guessing is still far off. In this gap, incrementally increasing the number of guesses the password will survive delivers little or no security benefit.

[Figure 2 plot: risk of being guessed (Low, Moderate, High, Extreme) against log10(#guesses a password withstands), from 0 to 21, with one curve for Online and one for Offline guessing and the region between them labelled "Online-offline chasm".]

Figure 2: Conceptualized risk from online and offline guessing as a function of the number of guesses a password will withstand over a 4-month campaign. In the region from 10^6 to about 10^14, improved guessing-resistance has little effect on outcome (online or offline).

For example, consider two passwords which withstand 10^6 and 10^12 guesses respectively. Unless we assume the offline attacker lacks motivation or resources (and gives up early), there is no apparent scenario in which the extra guess-resistance of the second password helps. For example, a password like tincan24 (which will survive more than a million guesses derived from the Rockyou distribution) and one like 7Qr&2M (which lives in a space that can be exhausted in (26 + 26 + 10 + 30)^6 = 92^6 < 10^12 guesses) fare the same: both will survive online guessing, but neither will survive offline attack. Equally, a 7-digit and a 13-digit random PIN have similar security properties for the same reason. If we assume additional guess-resistance comes at the cost of user effort [4, 25, 29], then the effort in the second case appears entirely wasted. In this case, an effort-conserving approach is to aim to withstand online attacks, but not put in the extra effort to withstand offline attacks. In fact there is evidence that many sites abandon the idea of relying on user effort as the defence against offline attacks; i.e., they appear to make little effort to force users to reach the higher threshold [23]. Sections 4.1 and 4.2 consider the efficacy of password composition policies and blacklists.

Frequency of online vs. offline attacks. Authoritative statistics on the relative frequency of online attacks compared to offline attacks do not exist. However it is clear that online attacks can be mounted immediately against public web sites (such attacks are more efficient when known-valid userids are obtained a priori [24]), while offline attacks require that the password hashes be available to the attacker (e.g., leaked file).

3.4 Storage and stretching of passwords

As Section 3.2 stated, storing salted hashes [48] of passwords beats most alternatives. In all other common storage options, advanced offline guessing attacks are either unnecessary (simpler attacks prevail) or impossible. Use of site-wide (global) salt defeats generic rainbow tables; per-account salts (even userid) and iteration counts, storable with hashes, provide further protection. Unfortunately, for various reasons, hashing is far from universal (e.g., perhaps 40% of sites do not hash [10]).

Guessing is resource-intensive: an offline attack may involve billions of guesses per-account, whereas a web-server verifies login attempts only on-demand as users seek to log in. Since early UNIX, this has been leveraged defensively by hash functions designed to be slow, by iteration: repeatedly computing the hash of the hash of the salted password. Such key stretching was formally studied by Kelsey et al. [33]; we reserve the term key strengthening for the idea, with related effect, of using a random suffix salt that verifiers must brute-force. Iterating 10^n times slows offline attack by n orders of magnitude; this configurable factor should be engineered to add negligible delay to users, while greatly increasing an offline attacker's work. Factors of 4,000–16,000 iterations already appear in practice. Our estimates for the number of guesses an offline attacker can send assumed no iteration; hash iteration narrows the online-offline chasm.

Salting also removes one form of "parallel attack": if two users have the same password, this will not be apparent and cannot be exploited to simplify attacks (assuming proper salting and hashing, e.g., salts of sufficient length, and passwords not truncated before hashing).

Practical instantiations [18] of key stretching (via so-called adaptive key derivation functions) include bcrypt [48], supported first in OpenBSD with 128-bit salt and configurable iteration count to acceptably adjust delay and server load on given platforms, and allow for hardware processing advances; the widely used standard PBKDF2 (part of PKCS #5 v2.0 and RFC 2898); and the newer scrypt [46] designed to protect against custom-hardware attacks (e.g., ASIC, FPGA, GPU).

Keyed hashing. Reversible encryption is one of the worst options for storing passwords if the decryption key leaks, but is among the best if a site can guarantee that it never leaks (even if the password file itself does). Justification for sites to store passwords reversibly encrypted is a need to support legacy protocols (see Section 3.5). Absent such legacy requirements, the best solution is salting and iterated hashing with a message authentication code (MAC) [37, 56] stored instead of a hash; password verification (and testing of guesses) is then impossible

without crypto key access. The difficulty of managing keys should not be understated; too often keys stored in software or a configuration file are found by attackers, explaining the common use of a one-way hash over reversible encryption. However, if the MAC of a salted, iterated password hash is all that is stored, then even if the MAC key leaks, security is equal to a salted iterated hash; and that risk falls away if a hardware security module (HSM) is used for MAC generation and verification.

3.5 Availability of passwords at the server

So salted hashes are a preferred means to store passwords, and (cf. Figure 1) an attacker who has access to the password file, and exports it undetected, still faces a computationally expensive offline attack. A site suffering this severe, undetected breach fares far better than one with plaintext or hashed-unsalted passwords, or reversibly encrypted passwords and a leaked decryption key. Nonetheless, many sites use a non-preferred means of storing passwords, e.g., there is a "Hall of Shame" of sites6 which mail forgotten passwords back to users and thus store them either plaintext or reversibly encrypted. While the practice is inadvisable for high-consequence sites, as Section 2 notes, one size clearly does not fit all.

6 See http://plaintextoffenders.com.

In addition to sites which mail back passwords, recent breaches clearly signal that storing plaintext passwords is not uncommon. In Table 2's list of recent server leaks, only two used salted hashes. Failure to store passwords as salted hashes may be due to confusion, failure to understand the advantages, or a conscious decision or software default related to legacy applications or protocols, as we explain next.

RADIUS (Remote Authentication Dial In User Service) is a networking protocol widely used to provide dial-in access to corporate and university networks. Early protocols that allowed client machines to authenticate, such as Password Authentication Protocol (PAP) and Challenge-Handshake Authentication Protocol (CHAP) over RADIUS, require passwords be available (to the server) in-the-clear or reversibly encrypted. Thus, sites that supported such clients must store passwords plaintext or reversibly encrypted. Support for protocols that supersede PAP and CHAP in commodity OSes began only circa 2000. Thus, many sites may have had to support such clients at least until a decade or so later.

Universities provide interesting examples. Recent papers by groups researching passwords make clear that several universities were, at least until recently, storing passwords reversibly encrypted or as unsalted hashes. Mazurek et al. (CMU) state [35]: "The university was using a legacy credential management system (since abandoned), which, to meet certain functional requirements, reversibly encrypted user passwords, rather than using salted, hashed records." Fahl et al. (Leibniz University) state [21]: "The IDM system stored up to five unique passwords per user using asymmetric cryptography, so it would be possible to decrypt the passwords to do a security analysis." Zhang et al. (UNC) state [63]: "The dataset we acquired contains 51,141 unsalted MD5 password hashes from 10,374 defunct ONYENs (used between 2004 and 2009), with 4 to 15 password hashes per ONYEN, i.e., the hashes of the passwords chosen for that ONYEN sequentially in time."7

7 An "ONYEN" is a userid ("Only Name You'll Ever Need") in the single-sign-on system studied.

Figure 1 makes clear that if the password is to be available at the backend (i.e., stored plaintext or reversibly encrypted) then an offline attack is either unnecessary or impossible. Thus, any resistance to guessing above and beyond that needed to withstand online attacks is wasted (in no scenario does the extra guessing resistance protect the account from competent attackers). Thus sites that impose restrictive password policies on their users while storing passwords plaintext or reversibly encrypted are squandering effort. An example appears to be a documented CMU policy [35]: passwords had to be at least length 8 and include lower, upper, special characters and digits. This policy appears designed to withstand an offline guessing attack which (since passwords were reversibly encrypted) had no possibility of occurring, and thus imposes usability cost without security benefit.

We do not know how common it is for sites to store passwords plaintext or reversibly encrypted. Large breaches, such as in Table 2, continue to make clear that plaintext is common among low- and medium-consequence sites. The data from CMU and Leibniz hint that far from being rare exceptions, reversible encryption of passwords may also be quite common. If true, this would imply that many sites with strict composition policies are engaged in a large-scale waste of user effort based on confused thinking about guessing resistance.

3.6 Other means to address offline attacks

Online guessing attacks seem an unavoidable reality for Internet assets protected by passwords, while offline attacks occur only in a limited set of circumstances. The guessing resistance needed to withstand these two types of attacks differs enormously (recall Section 3.3). Significant effort has been devoted to getting users to choose better passwords. If an online attacker can send at most 10^6 guesses per account, then it is relatively easy (e.g., password blacklists) to resist online guessing. Thus, getting users to choose passwords that will withstand over 10^6 guesses is an effort to withstand offline attacks, not online.

There are ways to address offline attacks that do not involve persuading users to choose better passwords. Figure 1 makes clear that if the file doesn't leak, or the leak is detected and existing passwords are immediately disabled, things are very different. Thus alternate approaches include those that protect the password file, or allow detection of leaks, neither of which requires changes in user behaviour.

Crescenzo et al. [15] give a method to preclude an offline attack, even if an attacker gains unrestricted access to the backend server. It hinges on the fact that an offline attacker must guess at a rate far exceeding the normal authentication requests from the user population (cf. Section 3.3). They introduce a novel hashing algorithm that requires randomly indexing into a large collection of random bits (e.g., 1 TByte). Ensuring that the only physical connection to the server with the random bits is matched to the expected rate of authentication requests from the user population guarantees that the information needed to compute the hashes can never be stolen. While the scheme is not standard, it illustrates that ingenious approaches to prevent password file leaks are possible (thereby eliminating the possibility of offline attacks).

Leaked password files can also be detected by spiking password files with honeywords: false passwords which are salted, hashed and indistinguishable from actual user passwords [31]. An offline attack which attempts authentication with a "successfully" guessed honeyword alerts administrators of a breached password file, signalling that in-place recovery plans should commence.

4 Password policies and system defences

4.1 Composition and length policies

Many approaches have been tried to force users to choose better passwords. The most common are policies with length and character-group composition requirements. Many sites require passwords of length at least 8, with at least three of four character types (lower- and upper-case, digits, special characters) so that each password meets a lower bound by the measure L · log2(C). However, as Section 3.1 explains, this naive entropy-motivated metric very poorly models password guessing-resistance [32, 60]. Users respond to composition policies with minimally compliant choices such as Pa$$w0rd and Snoopy2. Passwords scoring better by this metric are not guaranteed to fare better under guessing attacks. On examining this, Weir et al. [60] conclude "the entropy value doesn't tell the defender any useful information about how secure their password creation policy is."

Recent gains in understanding guess-resistance come largely from analysis of leaked datasets [7, 60]. Since it appears (including by examining the actual cleartext passwords) that none of Table 2's listed sites imposed strict composition policies on users, we cannot directly compare collections of passwords created with and without composition policies to see if the policy has a significant effect. However, Weir et al. [60] compare how subsets of the Rockyou dataset that comply with different composition policies fare on guessing resistance. (The exercise is instructive, but we must beware that a subset of passwords that comply with a policy is not necessarily representative of passwords created under that policy; cf. [32].) They found that passwords containing an uppercase character are little better at withstanding guessing than unrestricted passwords: 89% of the alpha strings containing uppercase were either all uppercase, or simply had the first character capitalized (cf. [35]). They conclude that forcing an uppercase character merely doubles the number of guesses an intelligent attacker would need. Fully 14% of passwords with uppercase characters did not survive 50,000 guesses, thus providing inadequate protection even against online attackers.

Including special characters helped more: of passwords incorporating one, the number that did not survive 50,000 guesses dropped to 7%. But common patterns revealed by their analysis (e.g., 28.5% had a single special character at the end) were not fully exploited by the guessing algorithm, so this survival rate is optimistic. Thus including special characters likewise does not protect robustly even against online attacks.

Kelley et al. [32] examine passwords created by 12,000 participants in a Mechanical Turk study under 8 different composition policies including: basic-length-8, basic-length-16, and length-8 mandating all of lower, upper, digits and special characters. They use a variety of cracking algorithms to evaluate guessing resistance of various passwords. Interestingly, while there is enormous variation between the fate of passwords created under different policies at high guess numbers (e.g., 58% of basic-length-8, but only 13% of basic-length-16 passwords were found after 10^13 guesses) there was less variation for numbers of guesses below 10^6. Also, in each of the policies tested, fewer than 10% of passwords fell within the first 10^6 guesses (our online threshold).

Mazurek et al. [35] examine 25,000 passwords (from a university single sign-on system) created under a policy requiring at least length-8 and mandating inclusion of lower, upper, special characters and digits, and checks against a dictionary. The cracking algorithms tested achieved minimal success until 10^7 guesses, but succeeded against about 48% of accounts by 10^14 guesses. Depending as they do on a single cracking algorithm, these must be considered the worst-case success rates for an attacker; it is quite possible that better tuning would greatly improve attack performance. In particular, it is not safe to assume that this policy ensures good survival

Attack                                                    Guesses   Recommended defenses
Online guessing, breadth-first                            10^4      Password blacklist; rate-limiting; account lock-out; recognition of known devices (e.g., by browser cookies, IP address recognition)
Online guessing, depth-first                              10^6      (same defenses as online breadth-first)
Offline guessing, breadth-first                           10^14     Iterated hashing; prevent leak of hashed-password file; keyed hash functions with Hardware Security Module support (Sections 3.4, 3.6)
Offline guessing, depth-first                             10^20     (same defenses as offline breadth-first)
Rainbow table lookup (using extensive pre-computation)    n/a       Salting; prevent leak of hashed-password file
Non-guessing (phishing, keylogging, network sniffing)     n/a       Beyond scope of this paper

Table 5: Selected attack types, number of per-account guesses expected in moderate attacks, and recommended defenses. We assume a 4-month
guessing campaign, and for offline guessing that the password file is salted and hashed (see Section 3.4). Rate-limiting includes delays and various
other techniques limiting login attempts over fixed time periods (see Section 4.4). Rainbow tables are explained in Section 3.2.
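
The raw arithmetic behind the guess counts of Section 3.3 can be reproduced with a few lines. This is a sketch under stated assumptions: the 122-day campaign length, the function names and the 10^4 iteration count in the last line are illustrative choices, while the rates (one online guess per second, 10 billion hashes/s per GPU, 1,000 machines, a million accounts) are the ones used in the text. Note that the online figures in Table 5 (10^4 and 10^6) are lower than the raw 10^7 bound because they also reflect the defensive arguments of Section 3.3.

```python
import math

SECONDS_PER_DAY = 86_400
CAMPAIGN_DAYS = 122  # assumed length of a "4-month" campaign


def campaign_guesses(guesses_per_second: float,
                     machines: int = 1,
                     accounts: int = 1,
                     iterations: int = 1) -> float:
    """Guesses per account over the campaign, spread evenly across accounts.

    `iterations` models iterated hashing: each guess costs that many hash
    computations, so the guess count shrinks by the same factor.
    """
    total_hashes = guesses_per_second * machines * SECONDS_PER_DAY * CAMPAIGN_DAYS
    return total_hashes / (accounts * iterations)


def order_of_magnitude(x: float) -> int:
    """Nearest power of ten, as used for the rounded figures in the text."""
    return round(math.log10(x))


online_bound = campaign_guesses(1)                                    # about 10^7
offline_one_gpu = campaign_guesses(1e10)                              # about 10^17
offline_depth = campaign_guesses(1e10, machines=1000)                 # about 10^20
offline_breadth = campaign_guesses(1e10, machines=1000, accounts=10**6)  # about 10^14
stretched = campaign_guesses(1e10, machines=1000, accounts=10**6,
                             iterations=10**4)                        # about 10^10
```

The last line shows the effect described in Section 3.4: an iteration count of 10^n removes n orders of magnitude from the offline figure and so narrows the chasm.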

up to 10^7 guesses, since most cracking algorithms optimize performance at high rather than low guess numbers.

In answering whether password policies work, we must first decide what it is we want of them. We use Section 3.3, which argued that safety from depth-first online guessing requires withstanding 10^6 guesses, while safety from offline guessing requires 10^14 or more. There are many tools that increase resistance to online guessing; some offer a simple way to protect against online guessing with lower usability impact than composition policies, e.g., password blacklists (see Section 4.2).

The above and further evidence suggest that composition policies are mediocre at protecting against offline guessing. For example, over 20% of CMU passwords were found in fewer than 10^11 guesses, and 48% after 10^14 [35]. While the stringent policy (minimum length eight and inclusion of all four character classes) has forced half of the population to cross the online-offline chasm, for practical purposes this is still failure: we expect most administrators would regard a site where half of the passwords are in an attacker's hands as being 100% compromised. Of policies studied by Kelley et al. [32] only one that required 16 characters gave over 80% survival rate at 10^14 guesses.

Thus, the ubiquity of composition policies (which we expect stems from historical use, zero direct system cost, and the ease of giving advice) is at odds with a relatively modest delivery: they help protect against online attacks, but alternatives seem better. Some policies increase guess-resistance more than others, but none delivers robust resistance against the level of guessing modern offline attacks can bring to bear. Given that no aspect of password security seems to incite comparable user animosity [1, 14, 41], and that this is exacerbated by the rise of mobile devices with soft keyboards, composition policies appear to offer very poor return on user effort.

4.2 Blacklists and proactive checking

Another method to avoid weak passwords is to use a blacklist of known bad choices which are forbidden, sometimes called proactive password checking [5, 55]. This can be complementary or an alternative to a composition policy. Microsoft banned common choices for Hotmail in 2011. In 2009, Twitter banned a list of 370 passwords, which account for (case insensitive) 5.2% of Rockyou accounts; simply blocking these popular passwords helps a significant fraction of users who would otherwise be at extreme risk of online guessing.

Also examining efficacy, using a blacklist of 50,000 words Weir et al. [60] found that over 99% of passwords withstood 4,000 guesses; 94% withstood 50,000. Thus, a simple blacklist apparently offers excellent protection against breadth-first online attacks and good improvement for depth-first online attacks.

Blacklists of a few thousand, even one million passwords, can be built by taking the commonest choices from leaked distributions. At 10^6 they may offer excellent protection against all online attacks. However, they do not offer much protection against offline attacks. Blacklists of size 10^14 appear impractical. A significant annoyance issue also increases with list size [35]: users may understand if a few thousand or even 10^6 of the most common choices are forbidden, but a list of 10^14 appears capricious and (in contrast to composition policies) it is not possible to give clear instructions on how to comply.

An advantage of blacklists is that they inconvenience only those most at risk: every user who chose one of Twitter's 370 black-words is highly vulnerable to online guessing. By contrast, forcing compliance with a composition policy inconveniences all users (including those with long lowercase passwords that resist offline guessing quite well [35]) and apparently delivers little.

There is a risk that a static blacklist lacks currency; band names and song lyrics can cause popularity surges that go unrepresented, e.g., the 16 times that justinbieber appears in the 2009 Rockyou dataset would likely be higher in 2014. Also, even if the top 10^6 passwords are banned, something else becomes the new most common password. The assumption is that banning the current most popular choices results in a distribution that is less skewed; this assumption does not seem strong, but has not been empirically verified. In one proposed password storage scheme that limits the popularity of any password [54], no more than T users of a site are allowed to have the same password (for a configurable

threshold T); this eliminates the currency problem and reduces the head-end password distribution skew.

4.3 Expiration policies (password aging)

Forced password change at regular intervals is another well-known recommendation, endorsed by NIST [13] and relatively common among enterprises and universities, albeit rarer among general web-sites. Of 75 sites examined in one study [23], 10 of 23 universities forced such a policy, while 4 of 10 government sites, 0 of 10 banks, and 0 of 27 general purpose sites did so.

The original justification for password aging was apparently to reduce the time an attacker had to guess a password. Expiration also limits the time that an attacker has to exploit an account. Ancillary benefits might be that it forces users into a different password selection strategy, e.g., if passwords expire every 90 days, it is less likely that users choose very popular problematic choices like "password" and "abcdefg" and more likely that they develop strategies for passwords that are more complex but which can be modified easily (e.g., incrementing a numeric substring). As a further potential benefit, it makes re-use between accounts less likely: whereas reusing a static password across accounts is easy and common [22], forced expiration imposes co-ordination overhead for passwords re-used across sites.

Reducing guessing time is relevant for offline attacks (an online guesser, as noted, gets far fewer attempts). So any benefit against guessing attacks is limited to cases where offline guessing is a factor, which Section 3.2 argues are far less common.

Reducing the time an attacker has to exploit an account is useful only if the original avenue of exploitation is closed, and no alternate (backdoor) access means has been installed. When the NIST guidelines were written, guessing was a principal means of getting a password. An attacker who had successfully guessed a password would be locked out by a password change; he would have to start guessing anew. Several factors suggest that this benefit is now diminished. First, offline password guessing is now only one avenue of attack; if the password is gained by keylogging-malware, a password change has little effect if the malware remains in place. Second, even if the attack is offline guessing, expiration turns out to be less effective than believed. Zhang et al. [63] recently found many new passwords very closely related to old after a forced reset; they were given access to expired passwords at UNC and allowed (under carefully controlled circumstances) to submit guesses for the new passwords. The results are startling: they guessed 17% of passwords in 5 tries or fewer, and 41% of accounts in under 3 seconds of offline attacking.

Thus, with forced expiration, new passwords appear to be highly predictable from old, and the gain is slight, for a policy competing with composition rules as most-hated by users. The benefits of forcing users to different strategies of choosing passwords, and making re-use harder may be more important. Given the severe usability burden, and associated support costs, expiration should probably be considered only for the top end of the high-consequence category.

4.4 Rate-limiting and lockout policies

A well-known approach to limiting the number of online attack guesses is to impose some kind of lockout policy, e.g., locking an account after three failed login attempts (or 10, for a more user-friendly tradeoff [12]). It might be locked for a certain period of time, or until the user takes an unlocking action (e.g., by phoning, or answering challenge questions). Locking for an hour after three failed attempts reduces the number of guesses an online attacker can make in a 4-month campaign to 3 × 24 × 365/3 = 8,760 (cf. Section 3.3). A related approach increasingly delays the system response after a small number of failed logins, to 1s, 2s, 4s and so on. Bonneau and Preibusch [10] found that in practice, very few sites block logins even after 100 failed logins (though the sites they studied were predominantly in the low and medium consequence categories). Secret questions (Section 4.6), if used, must similarly be throttled.

The two main problems with lockout policies are the resulting usability burden, and the denial of service vulnerability created. Usability is clearly an issue given that users forget passwords a great deal. The denial of service vulnerability is that a fixed lockout policy allows an attacker to lock selected users out of the site. Incentives may mean that this represents a greater problem for some categories of sites than others. An online auction user might lock out a rival as a deadline approaches; someone interested in mayhem might lock all users of an online brokerage out during trading hours.

Throttling online guessing while avoiding intentional service lockouts was explored by Pinkas and Sander [47] and extended by others [2, 59]. Login attempts can be restricted to devices a server has previously associated with successful logins for a given username, e.g., by browser cookies or IP address; login attempts from other devices (assumed to be potential online guessing machines) require both a password and a correctly-answered CAPTCHA. Through a clever protocol, legitimate users logging in from new devices see only a tiny fraction of CAPTCHAs (e.g., 5% of logins from a first-time device). The burden on online guessers is much larger, due to a vastly larger number of login attempts. The downside of this approach is CAPTCHA usability.
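The lockout arithmetic and the doubling delays described in Section 4.4 can be checked with a small Python sketch. It is illustrative only: the function names, the free_attempts parameter and the one-hour delay cap are assumptions introduced here, not part of any cited scheme.

```python
def guesses_under_lockout(attempts_per_lock: int, lock_hours: int,
                          campaign_hours: int) -> int:
    """Upper bound on online guesses under a fixed lockout policy.

    Simplifying assumption (as in the text): the attacker gets
    attempts_per_lock guesses in every lock period of lock_hours.
    """
    return attempts_per_lock * (campaign_hours // lock_hours)


def backoff_delay(failed_logins: int, free_attempts: int = 3,
                  cap_seconds: float = 3600.0) -> float:
    """Delay in seconds before the next attempt is processed.

    The first free_attempts failures (hypothetical default: 3) cost
    nothing; after that the delay doubles: 1s, 2s, 4s, ... up to
    cap_seconds (hypothetical default: one hour).
    """
    if failed_logins < free_attempts:
        return 0.0
    return min(2.0 ** (failed_logins - free_attempts), cap_seconds)


# A 4-month campaign is taken as one third of a 365-day year.
four_months_in_hours = 24 * 365 // 3
print(guesses_under_lockout(3, 1, four_months_in_hours))
print([backoff_delay(n) for n in range(6)])
```

Capping the delay is one way to limit the denial-of-service exposure noted above, at the price of giving a patient attacker a steady trickle of guesses.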

4.5 Password meter effectiveness

In addition to offering tips or advice on creating good passwords, many large sites employ password meters, purportedly measuring password strength, in an attempt to nudge users toward better passwords. They are generally implemented in JavaScript in the browser, severely limiting the complexity of the strength-estimation algorithm implemented; e.g., downloading a very large dictionary to check against is problematic. Thus many meters use flawed measures (see Section 3.1) which correlate poorly with guessing resistance. This also produces many incongruities, e.g., classifying Pa$$w0rd as “very strong” and gunpyo as “weak”. Of course, deficiencies in currently deployed meters do not necessarily imply that the general idea is flawed.

Among recent studies of the efficacy of meters, Ur et al. [58] examined the effect of various meters on 2,931 Mechanical Turk users, finding that significant increases in guessing-resistance were only achieved by very stringent meters. The presence of any meter did however provide some improvement even in resistance to online attacks (i.e., below 10^6 guesses). De Carnavalet and Mannan [17] compare several password meters in common use and find enormous inconsistencies: passwords being classified as strong by one are termed weak by another. Egelman et al. [20] explore whether telling users how their password fares relative to others might have a greater effect than giving an absolute measure. Those who saw a meter tended to choose stronger passwords than those who didn’t, but the type of meter did not make a significant difference. In a post-test survey 64% of participants admitted reusing a password from elsewhere; such users may have been influenced to re-use a different old password, but every old password is obviously beyond the reach of subsequent influences.

4.6 Backup questions & reset mechanisms

Reset mechanisms are essential at almost every password-protected site to handle forgotten passwords. For most cases, it can be assumed the user still has access to a secondary communication channel (e.g., an e-mail account or phone number on record), and the assumed security of that channel can be leveraged to provide the reset mechanism. A common practice is to e-mail back to the user either a reset link or temporary password. Sites that store passwords cleartext or reversibly encrypted can e-mail back that password itself if forgotten, but this exposes the password to third parties. Mannan et al. [34] propose to allow forgotten passwords to be restored securely; the server stores an encrypted copy of the password, with the decryption key known to a user recovery device (e.g., smartphone) but not the server.

Many sites use backup authentication questions (secret questions) instead of, or in conjunction with, emailing a reset link. The advantage of doing both is that an attacker who gains access to a user’s e-mail account could otherwise gain access to any site that e-mails reset links. Different categories of accounts (see Section 2) must approach this question differently. For high-consequence accounts, it seems that backup questions should be asked to further authenticate the user; for lower consequence accounts, the effort of setting up and typing backup questions must be taken into account.

When a secondary communication channel is unavailable (e.g., the site in question is the webmail provider itself, or a secondary communication channel was never set up, or is no longer available) backup questions are widely used. Unfortunately, plentiful evidence [49, 53] shows that typically in practice, the guessing-space of backup question answers is obviously too small, or involves questions whose answers can be looked up on the Internet for targeted or popular personalities. Several high-profile break-ins have exploited this fact.

Proposed authentication alternatives exist (e.g., [52]), but require more study. In summary, the implementation of password reset mechanisms is sensitive, fraught with dangers, and may require case-specific decisions.

4.7 Phishing

Guessing is but one means to get a password. Phishing rose to prominence around 2005 as a simple way to socially engineer users into divulging secrets. There are two varieties. Generic or scattershot attempts are generally delivered in large spam campaigns; spear phishing aims at specific individuals or organizations, possibly with target-specific lures to increase effectiveness.

Scattershot phishing generally exploits user confusion as to how to distinguish a legitimate web-site from a spoofed version [19]. The literature suggests many approaches to combat the problem, e.g., toolbars, tokens, two-factor schemes, user training. Few of these have enjoyed large-scale deployment. One that did, the SiteKey image to allow a user to verify a site, was found not to meet its design goals [51]: most users entered their password at a spoofed site even in the absence of the trust indicator. A toolbar indicator study reached a similarly pessimistic conclusion [62]. Equally, no evidence suggests any success from efforts to train users to tell good sites from bad simply by parsing the URL; the task itself is ill-defined [29]. In fact, much of the progress against scattershot phishing in recent years appears to have been by browser vendors, through better identification and blocking of phishing sites.

Spear phishing continues to be a major concern, especially for high-consequence sites. The March 2011

breach on RSA Security’s SecurID hardware tokens was reportedly such an attack (see http://www.wired.com/2011/08/how-rsa-got-hacked/). It is too early to say if approaches wherein administrators send periodic (defensive training) phishing emails to their own users lead to improved outcomes.

4.8 Re-using email address as username

Many sites (over 90% by one study [10]) encourage or force users to use an email address as username. This provides a point of contact (e.g., for password resets, or marketing), ensures unique usernames, and is memorable. However it also brings several security issues.

It encourages users (subconsciously or otherwise) to re-use the email password, thereby increasing the threats based on password re-use [16]. It can facilitate forms of phishing if users become habituated to entering their email passwords at low-value sites that use email addresses as usernames.

Re-using email addresses as usernames across sites also facilitates leaking information regarding registered users of those sites [50], although whether a given string is a valid username at a site can be extracted for non-email address usernames also [10, 11]. Preventing such leaks may be as much a privacy issue as a security issue.

5 Discussion and implications

5.1 System-side vs. client-side defences

Some password-related defences involve implementation choices between system-side and client-side mechanisms; some attacks can be addressed at either the server (at cost of engineering effort) or the client (often at cost of user effort). Table 6 summarizes costs and benefits of several measures that we have discussed, noting security benefit and usability cost.

We have seen little discussion in the literature of the available trade-offs (and implications on cost, security, usability, and system-wide efficiency with respect to total user effort) between implementing password-related functionality client-side vs. server-side. Ideally, all decisions on where to impose costs would be made explicitly and acknowledged. A danger is that costs offloaded to the user are often hard to measure, and therefore unmeasured; this does not make the cost zero, but makes it hard to distinguish from zero. It is a natural consequence that system-side costs, which are more directly visible and more easily measured, are under-utilized, at the expense of client-side mechanisms which offload (less visible, harder to measure) cognitive effort to end-users. For example, forcing users to choose passwords that will resist many guesses is a way of addressing the threat of offline attacks, and relies almost exclusively on user effort. Investing engineering time to better protect the password file, to ensure that leaks are likely to be detected, and to ensure that passwords are properly salted and hashed (or protected using an offline-resistant scheme such as discussed in Section 3.6) are alternatives dealing with the same problem that rely on server-side effort (engineering effort and/or operational time). Florêncio and Herley [23] found that sites where users do not have a choice (such as government and university sites) were more likely to address the offline threat with user effort, while sites that compete for users and traffic (such as retailers) were more likely to allow password policies that addressed the online threat only.

Scale is important in deciding how costs should be divided between the server and client sides; what is reasonable at one scale may be unacceptable at another. For example, many web-sites today have many more accounts than the largest systems of 30 years ago. A trade-off inconveniencing 200 users to save one systems administrator effort might be perfectly reasonable; however, the same trade-off involving 100 million users and 10 administrators is a very different proposition: the factor of 50,000 increase in the ratio of users to administrators means that decisions should be approached differently, especially in any environment where user time, energy, and effort is a limited resource. There is evidence that the larger web-sites take greater care than smaller ones to reduce the burden placed on users [23].

5.2 Take-away points

We now summarize some of the key findings, and make recommendations based on the analysis above.

Many different types of sites impose passwords on users; asset values related to these sites and associated accounts range widely, including different valuations between users of the same sites. Thus, despite little attention to date in the literature, recognizing different categories of accounts is important (cf. Table 1). User effort available for managing password portfolios is finite [3, 25, 27, 57]. Users should spend less effort on password management issues (e.g., choosing complex passwords) for don’t-care and lower consequence accounts, allowing more effort on higher consequence accounts. Password re-use across accounts in different categories is dangerous; a major concern is lower consequence sites compromising passwords re-used for high-consequence sites. While this seems an obvious concern, a first step is greater formal recognition of different categories of sites. We summarize this take-away point as:

T1: Recognizing different categories of web-sites is essential to responsibly allocating user password

Implementation aspect | Attacks stopped or slowed | User impact | Remarks
Password stored non-plaintext | Full compromise on server break-in alone | None | Recommended
Salting (global and per-account) | Pre-computation attacks (table lookup) | None | Recommended
Iterated hashing | Slows offline guessing proportionally | None | Recommended
MAC of iterated, salted hash | Precludes offline guessing (requires key) | None | Best option (key management)
Rate-limiting & lockout policies | Hugely reduces online guessing | Possible user lockout | Recommended
Blacklisting (proactive checking) | Eliminates most-probable passwords | Minor for small lists | Recommended
Length rules | Slows down naive brute force attacks | Cognitive burden | Recommended: length ≥ 8
Password meters | Nudges users to “less guessable” passwords | Depends on user choice | Marginal gain
Password aging (expiration) | Limits ongoing attacker access; indirectly ameliorates password re-use | Significant; annoying | Possibly more harm than good
Character-set rules | May slow down naive brute-force attacks | Cognitive burden. Slows entry on mobile devices | Often bad return on user effort

Table 6: Password-related implementation options. The majority of Remarks are relevant to medium-consequence accounts (see Table 1). It is strongly recommended that password storage details (e.g., salting, iterated hashing, MAC if used) are implemented by standard library tools.
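The storage rows of Table 6 (per-account salting, iterated hashing, and a MAC over the result) might be combined as in the following minimal Python sketch, which uses only the standard library in keeping with the caption's advice. The choice of PBKDF2-HMAC-SHA256 as the iterated hash, the iteration count, and the names make_record and verify are assumptions made for illustration; the key would in practice be held outside the password file (e.g., in an HSM), which this sketch does not model.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for the deployment


def make_record(password: str, mac_key: bytes) -> dict:
    """Build a stored record: per-account salt, iterated hash, then MAC."""
    salt = os.urandom(16)  # fresh random salt per account
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Only the keyed MAC of the hash is stored, so a leaked file
    # without mac_key gives nothing to test guesses against.
    tag = hmac.new(mac_key, digest, hashlib.sha256).digest()
    return {"salt": salt, "iterations": ITERATIONS, "tag": tag}


def verify(password: str, record: dict, mac_key: bytes) -> bool:
    """Recompute the tag for a login attempt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"),
        record["salt"], record["iterations"])
    tag = hmac.new(mac_key, digest, hashlib.sha256).digest()
    return hmac.compare_digest(tag, record["tag"])


key = os.urandom(32)  # hypothetical server-side secret key
record = make_record("correct horse", key)
print(verify("correct horse", record, key))
print(verify("wrong guess", record, key))
```

Storing the iteration count alongside the salt lets the work factor be raised later without invalidating existing records.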

management effort across sites. Users are best served by effort spent on higher consequence sites, and avoiding cross-category password re-use.

While naive “password strength” measures are widely used, simple to calculate, and have formed the basis for much of the analysis around passwords, simplistic metrics [13] based on Shannon entropy are poor measures of guessing-resistance (recall Section 3.1). Reasoning that uses naive metrics as a proxy for security is unsound and leads to unreliable conclusions. Policies, requirements and advice that seek to improve password security by “increasing entropy” should be disregarded.

T2: Crude entropy-based estimates are unsuitable for measuring password resistance to guessing attacks; their use should be discouraged.

While choosing passwords that will resist (online and/or offline) guessing has dominated the advice directed at users, it is worth emphasizing that the success rate of several attacks is unaffected by password choice.

T3: The success of threats such as client-side malware, phishing, and sniffing unencrypted wireless links is entirely unaffected by password choice.

Password policies and advice aim to have users choose passwords that will withstand guessing attacks. The threshold number of guesses to survive online and offline attacks differs enormously. The first threshold does not grow as hardware and cracking algorithms improve; the second gradually increases with time, only partially offset by adaptive password hashing functions (if used).

T4: Password guessing attacks are either online or offline. The guessing-resistance needed to survive the two differs enormously. Withstanding 10^6 guesses probably suffices for online; withstanding 10^14 or more guesses may be needed to resist determined, well-resourced offline attacks.

There is no continuum of guessing attack types: it is either online or offline, with nothing in between. There is a chasm between the threshold to withstand these two different types. There is little security benefit in exceeding the online threshold while failing to reach the offline one. Passwords that fail to completely cross this chasm waste effort since they do more than is necessary to withstand online attacks, but still succumb to offline attacks.

T5: Between the thresholds to resist online and offline attacks, incremental improvement in guess-resistance has little benefit.

Recall that rainbow table attacks are one form of offline attack, and require access to leaked password hashes.

T6: Rainbow table attacks can be effectively stopped by well-known salting methods, or by preventing the leakage of hashed password files.

Analysis of Fig. 1 shows that offline attacks are possible and necessary in only very limited circumstances which occur far less often than suggested from the attention given by the research literature. If the password file has not been properly salted and hashed, then user effort to withstand beyond 10^6 guesses is better spent elsewhere.

T7: Offline guessing attacks are a major concern only if the password file leaks, the leak goes undetected, and the file was properly salted and hashed (otherwise simpler attacks work, e.g., rainbow tables).

It follows that sites that store passwords in plaintext or reversibly encrypted, and impose strict password composition policies unnecessarily burden users: the policies offer zero benefit against intelligent attackers, as any increased guessing-resistance is irrelevant. The attacker either has direct access to a plaintext password, or if the key encrypting the hashed password does not also leak then the (plaintext) password hashes needed for the offline guessing attack are unavailable.

T8: For implementations with stored passwords available at the server (plaintext or reversibly encrypted), composition policies aiming to force resistance to offline guessing attacks are unjustifiable: no risk of offline guessing exists.

The threat of offline guessing attacks can essentially be eliminated if it can be ensured that password files do not leak, e.g., by keyed hash functions with HSM (hardware security module) support. Guessing attack risks then reduce to online guessing, which is addressable by known mechanisms such as throttling, recognizing known devices, and proactive checking to disallow too-popular passwords, all burdening users less than composition policies.

T9: Online attacks are a fact of life for public-facing servers. Offline attacks, by contrast, can be entirely avoided by ensuring the password file does not leak, or mitigated by detecting if it does leak and having a disaster-recovery plan to force a system-wide password reset in that case.

6 Concluding remarks

In concluding we summarize the case against consuming user effort in attempts to resist offline guessing attacks.

1. Honesty demands a clear acknowledgement that we don’t know how to do so: attempts to get users to choose passwords that will resist offline guessing, e.g., by composition policies, advice and strength meters, must largely be judged failures. Such measures may get some users across the online-offline chasm, but this helps little unless it is a critical mass; we assume most administrators would consider a site with half its passwords in an attacker’s hands to be fully rather than half compromised.

2. Failed attempts ensure a large-scale waste of user effort, since exceeding the online while falling short of the offline threshold delivers no security benefit.

3. The task gets harder every year: hardware advances help attackers more than defenders, increasing the number of guesses in offline attacks.

4. Zero-user-burden mechanisms largely or entirely eliminating offline attacks exist, but are little-used.

5. Demanding passwords that will withstand offline attack is a defense-in-depth approach necessary only when a site has failed both to protect the password file, and to detect the leak and respond suitably.

6. That large providers (e.g., Facebook, Fidelity, Amazon) allow 6-digit PINs demonstrates that it is possible to run first-tier properties without placing the burden of resisting offline attacks on users.

Preventing, detecting and recovering from offline attacks must be administrative priorities, if the burden is not to be met with user effort. It is of prime importance to ensure that password files do not leak (or have content such that leaks are harmless), that any leak can be quickly detected, and that an incident response plan allows system-wide forced password resets if and when needed. Next, and of arguably equal importance, is protecting against online attacks by limiting the number of online guesses that can be made (e.g., by throttling or lockouts) and precluding the most common passwords (e.g., by password blacklists). Salting and iterated hashing are of course expected, using standardized adaptive password hashing functions or related MACs.

Acknowledgements. We thank Michael Brogan and Nathan Dors (U. Washington) for helpful discussions, anonymous referees, and Furkan Alaca, Lujo Bauer, Kemal Bicakci, Joseph Bonneau, Bill Burr, Nicolas Christin, Simson Garfinkel, Peter Gutmann, M. Mannan, Fabian Monrose, and Julie Thorpe for detailed comments on a draft. The third author acknowledges an NSERC Discovery Grant and Canada Research Chair in Authentication and Computer Security.

References

[1] A. Adams and M. A. Sasse. Users Are Not the Enemy. C.ACM, 42(12), 1999.
[2] M. Alsaleh, M. Mannan, and P. C. van Oorschot. Revisiting defenses against large-scale online password guessing attacks. IEEE TDSC, 9(1):128–141, 2012.
[3] A. Beautement and A. Sasse. The economics of user effort in information security. Computer Fraud & Security, pages 8–12, October 2009.
[4] A. Beautement, M. Sasse, and M. Wonham. The Compliance Budget: Managing Security Behaviour in Organisations. In NSPW, 2008.
[5] F. Bergadano, B. Crispo, and G. Ruffo. High dictionary compression for proactive password checking. ACM Trans. Inf. Syst. Secur., 1(1):3–25, 1998.
[6] J. Bonneau. Guessing human-chosen secrets. University of Cambridge. Ph.D. thesis, May 2012.
[7] J. Bonneau. The science of guessing: analyzing an anonymized corpus of 70 million passwords. In Proc. IEEE Symp. on Security and Privacy, pages 538–552, 2012.
[8] J. Bonneau. Password cracking, part II: when does password cracking matter, Sept.4, 2012. https://www.lightbluetouchpaper.org.
[9] J. Bonneau, C. Herley, P. van Oorschot, and F. Stajano. The quest to replace passwords: A framework for comparative evaluation of web authentication schemes. In Proc. IEEE Symp. on Security and Privacy, 2012.
[10] J. Bonneau and S. Preibusch. The password thicket: Technical and market failures in human authentication on the web. In WEIS, 2010.
[11] A. Bortz and D. Boneh. Exposing private information by timing web applications. Proc. WWW, 2007.
[12] S. Brostoff and M. Sasse. “Ten strikes and you’re out”: Increasing the number of login attempts can improve password usability. CHI Workshop, 2003.
[13] W. Burr, D. F. Dodson, and W. Polk. Electronic Authentication Guideline. In NIST Special Pub 800-63, 2006.

[14] W. Cheswick. Rethinking passwords. ACM Queue, 10(12):50–56, 2012.
[15] G. D. Crescenzo, R. J. Lipton, and S. Walfish. Perfectly secure password protocols in the bounded retrieval model. In TCC, pages 225–244, 2006.
[16] A. Das, J. Bonneau, M. Caesar, N. Borisov, and X. Wang. The tangled web of password reuse. NDSS, 2014.
[17] X. de Carnavalet and M. Mannan. From very weak to very strong: Analyzing password-strength meters. In Proc. NDSS, 2014.
[18] S. Designer and S. Marechal. Password Security: Past, Present, Future (with strong bias towards password hashing), December 2012. Slide deck: http://www.openwall.com/presentations/Passwords12-The-Future-Of-Hashing/.
[19] R. Dhamija, J. D. Tygar, and M. Hearst. Why phishing works. In Proc. CHI, 2006.
[20] S. Egelman, A. Sotirakopoulos, I. Muslukhov, K. Beznosov, and C. Herley. Does my password go up to eleven? the impact of password meters on password selection. In Proc. CHI, 2013.
[21] S. Fahl, M. Harbach, Y. Acar, and M. Smith. On the ecological validity of a password study. In Proc. SOUPS. ACM, 2013.
[22] D. Florêncio and C. Herley. A Large-Scale Study of Web Password Habits. Proc. WWW, 2007.
[23] D. Florêncio and C. Herley. Where Do Security Policies Come From? Proc. SOUPS, 2010.
[24] D. Florêncio, C. Herley, and B. Coskun. Do Strong Web Passwords Accomplish Anything? Proc. Usenix Hot Topics in Security, 2007.
[25] D. Florêncio, C. Herley, and P. van Oorschot. Password portfolios and the finite-effort user: Sustainably managing large numbers of accounts. In Proc. USENIX Security, 2014.
[26] D. Goodin. Why passwords have never been weaker and crackers have never been stronger, 2012. Ars Technica, http://arstechnica.com/security/2012/08/passwords-under-assault/.
[27] B. Grawemeyer and H. Johnson. Using and managing multiple passwords: A week to a view. Interacting with Computers, 23(3):256–267, 2011.
[28] E. Grosse and M. Upadhyay. Authentication at scale. IEEE Security & Privacy, 11(1):15–22, 2013.
[29] C. Herley. So Long, And No Thanks for the Externalities: Rational Rejection of Security Advice by Users. Proc. NSPW, 2009.
[30] C. Herley and P. van Oorschot. A research agenda acknowledging the persistence of passwords. IEEE Security & Privacy, 10(1):28–36, 2012.
[31] A. Juels and R. L. Rivest. Honeywords: Making password-cracking detectable. In Proc. ACM CCS, pages 145–160, 2013.
[32] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez. Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms. In Proc. IEEE Symp. on Security and Privacy, 2012.
[33] J. Kelsey, B. Schneier, C. Hall, and D. Wagner. Secure applications of low-entropy keys. Proc. ISW’97, Springer LNCS, 1396:121–134, 1998.
[34] M. Mannan, D. Barrera, C. D. Brown, D. Lie, and P. C. van Oorschot. Mercury: Recovering forgotten passwords using personal devices. In Financial Cryptography, pages 315–330, 2011.
[35] M. L. Mazurek, S. Komanduri, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, P. G. Kelley, R. Shay, and B. Ur. Measuring password guessability for an entire university. In ACM CCS, 2013.
[36] A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.
[37] A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.
[38] R. Morris and K. Thompson. Password Security: A Case History. C.ACM, 22(11):594–597, 1979.
[39] A. Muller, M. Meucci, E. Keary, and D. Cuthbert, editors. OWASP Testing Guide 4.0. Section 4.5: Authentication Testing (accessed July 27, 2014), https://www.owasp.org/index.php/OWASP_Testing_Guide_v4_Table_of_Contents.
[40] A. Narayanan and V. Shmatikov. Fast dictionary attacks on passwords using time-space tradeoff. ACM CCS, 2005.
[41] D. Norman. The Way I See It: When security gets in the way. Interactions, 16(6):60–63, 2009.
[42] oclHashcat. http://www.hashcat.net/.
[43] P. Oechslin. Making a faster cryptanalytical time-memory trade-off. Advances in Cryptology - CRYPTO 2003, 2003.
[44] Openwall. http://www.openwall.com/john/.
[45] OWASP. Guide to Authentication. Accessed July 27, 2014, https://www.owasp.org/index.php/Guide_to_Authentication.
[46] C. Percival. Stronger key derivation via sequential memory-hard functions. In BSDCan, 2009.
[47] B. Pinkas and T. Sander. Securing Passwords Against Dictionary Attacks. ACM CCS, 2002.
[48] N. Provos and D. Mazieres. A future-adaptable password scheme. In USENIX Annual Technical Conference, FREENIX Track, pages 81–91, 1999.
[49] R. W. Reeder and S. Schechter. When the password doesn’t work: secondary authentication for websites. IEEE Security & Privacy, 9(2):43–49, 2011.
[50] P. F. Roberts. Leaky web sites provide trail of clues about corporate executives. ITworld.com, August 13, 2012.
[51] S. Schechter, R. Dhamija, A. Ozment, and I. Fischer. The emperor’s new security indicators: evaluation of website authentication and effect of role playing on usability studies. In Proc. IEEE Symp. on Security and Privacy, 2007.
[52] S. Schechter, S. Egelman, and R. Reeder. It’s not what you know, but who you know: a social approach to last-resort authentication. In Proc. CHI, 2009.
[53] S. E. Schechter, A. J. B. Brush, and S. Egelman. It’s no secret: Measuring the security and reliability of authentication via “secret” questions. In Proc. IEEE Symp. Security & Privacy, 2009.
[54] S. Schechter, C. Herley, and M. Mitzenmacher. Popularity is everything: A new approach to protecting passwords from statistical-guessing attacks. Proc. HotSec, 2010.
[55] E. H. Spafford. OPUS: Preventing weak password choices. Computers & Security, 11(3):273–278, 1992.
[56] J. Steven and J. Manico. Password Storage Cheat Sheet (OWASP). OWASP. Apr.7, 2014, https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet.
[57] E. Stobert and R. Biddle. The password life cycle: user behaviour in managing passwords. In Proc. SOUPS, 2014.
[58] B. Ur, P. G. Kelley, S. Komanduri, J. Lee, M. Maass, M. Mazurek, T. Passaro, R. Shay, T. Vidas, L. Bauer, et al. How does your password measure up? The effect of strength meters on password creation. In Proc. USENIX Security, 2012.
[59] P. van Oorschot and S. Stubblebine. On Countering Online Dictionary Attacks with Login Histories and Humans-in-the-Loop. ACM TISSEC, 9(3):235–258, 2006.
[60] M. Weir, S. Aggarwal, M. Collins, and H. Stern. Testing metrics for password creation policies by attacking large sets of revealed passwords. In Proc. ACM CCS, 2010.
[61] M. Weir, S. Aggarwal, B. de Medeiros, and B. Glodek. Password cracking using probabilistic context-free grammars. In Proc. IEEE Symp. on Security and Privacy, pages 391–405, 2009.
[62] M. Wu, R. Miller, and S. L. Garfinkel. Do Security Toolbars Actually Prevent Phishing Attacks. Proc. CHI, 2006.
[63] Y. Zhang, F. Monrose, and M. K. Reiter. The security of modern password expiration: An algorithmic framework and empirical analysis. In Proc. ACM CCS, 2010.
[64] E. Zwicky. Brute force and ignorance. ;login:, 35(2):51–52, April 2010. USENIX.
