Large Scale Study of Web Accessibility Metrics
Beatriz Martins · Carlos Duarte
https://doi.org/10.1007/s10209-022-00956-x
LONG PAPER
Abstract
Evaluating the accessibility of web resources is usually done by checking the conformance of the resource against a standard or set of guidelines (e.g., the WCAG 2.1). The result of the evaluation indicates which guidelines are respected (or not) by the resource. While this might hint at the accessibility level of a web resource, it is often complicated to compare the accessibility level of different resources, or of different versions of the same resource, from evaluation reports. Web accessibility metrics synthesize the accessibility level of a web resource into a quantifiable value. The wide number of available accessibility metrics makes it challenging to choose which ones to use. In this paper, we explore the relationship between web accessibility metrics. For that purpose, we investigated eleven web accessibility metrics. The metrics were computed from automated accessibility evaluations obtained using QualWeb. A set of nearly three million web pages was evaluated. By computing the metrics over this sample, it was possible to identify groups of metrics that offer similar results. Our analysis shows that there are metrics that behave similarly, which, when deciding what metrics to use, assists in picking the metric that is less resource intensive or for which it is easier to collect the inputs.
Keywords Web accessibility metrics · Large-scale accessibility evaluation · Automatic accessibility evaluation · QualWeb
subsequent releases of the same website to check for accessibility improvements. But it is also important for researchers, especially when conducting large-scale accessibility evaluations [5], comparing domains of activity [6, 7], geographical areas [8, 9], or user groups [10]. Recently, in the context of the European Web Accessibility Directive¹, the different European member-states reported the results of their accessibility monitoring activities. This large-scale accessibility monitoring exercise was marred by the difficulty of comparing the results reported by the different member-states. The use of a common metric would have mitigated this problem. Other potential benefits of accessibility metrics include support for ranking web pages, which can be relevant for retrieval systems, or providing criteria for adaptations in adaptive hypermedia systems [3].

Given the large number of web accessibility metrics available for researchers, auditors or practitioners to choose from, an important question emerges: which one(s) should be used? To help answer this question, it is important to understand how these web accessibility metrics relate to each other, whether it is possible to group them according to their similarities, and what the differences between each group are.

To identify existing relationships between web accessibility metrics, we computed eleven different web accessibility metrics over a set of nearly three million web pages. In this article we report the findings of this study. We begin by providing a background on web accessibility metrics and a review of 19 web accessibility metrics that were proposed in the literature. Then, we present the methodology, results and discussion of a study where we compared eleven of the 19 reviewed metrics. Afterwards, we present a second study, where we analyzed the validity of the eleven metrics by assessing how they rate a set of pages created to demonstrate good and bad accessibility practices. We finish with an analysis of the studies' limitations before concluding.

With this work we contribute the following:

• A review of existing web accessibility metrics, describing a total of 19 metrics applicable at page or website level;
• The results of computing a subset of eleven metrics over a sample of nearly three million web pages;
• An analysis that identifies relationships between metrics and determines groups of metrics that report similar outcomes.

¹ https://digital-strategy.ec.europa.eu/en/policies/web-accessibility.

2 Accessibility metrics

According to Vigo, Brajnik and Connor [3], web metrics measure properties of websites or web pages. These metrics can summarize results obtained from a guideline review based evaluation [11]. Additionally, Song, et al., [12] state that web accessibility metrics have the ability to measure the accessibility levels of websites.

Metrics should meet five different aspects [13]. They should:

1. be simple to understand;
2. be precisely defined;
3. be objective;
4. be cost-effective; and
5. provide information that supports meaningful interpretations.

Freire, et al., also mention that web accessibility metrics are important to understand, control and improve products and processes in companies [13]. Nevertheless, they affirm that it is not possible to define which metric is more effective, since it depends on the project in question and its needs.

Parmanto and Zeng [14] argue that an accessibility metric should be summarized into a quantitative score that provides a continuous range of values, so it is possible to understand how accessible or inaccessible the web content is. It is also important to guarantee that the range of the metric's values supports more fine-grained discrimination than accessible and inaccessible. Another property the authors ascribe to high-quality metrics is that they should consider the complexity of the websites. It would also be convenient if the accessibility metric were scalable enough to support large-scale accessibility evaluations.

In conclusion, metrics are useful to process and understand the results obtained from an accessibility evaluation. They can also help rank web pages or explore the accessibility level of web pages or websites. The computation of accessibility metrics can produce, as a result, qualitative or quantitative values.

2.1 Literature review

Before presenting the details of the identified web accessibility metrics, it is important to introduce concepts that help to understand how each metric behaves.

Some metrics use the barrier concept. A barrier is a condition caused by the website or web page that prevents the user from accessing the web content [15], i.e., a problem found in a certain website or web page that prevents the user from perceiving or interacting with the web content. Barriers can have different levels of severity.
Whenever an accessibility evaluation is performed, its outcomes vary according to the compliance with standards. Different outcomes are considered by different metrics, but they can be summarized into: (1) pass, which means that the web content fulfills a certain recommendation; (2) fail, which indicates that the web content does not meet the recommendation; (3) warning, an outcome produced by automated evaluation tools to represent those instances where the tool could not determine the conformance, or lack of conformance, with the recommendation, and the intervention of a human expert is required.

Besides the above aspects, it is important to note that some of the web accessibility metrics that have been reviewed verify the conformance with checkpoints, and these checkpoints are grouped into priority levels: priority 1, priority 2 or priority 3. The priority levels in some metrics have associated weights that vary from zero to one. This applies to metrics proposed before the introduction of WCAG 2.0. Metrics proposed after WCAG 2.0 typically verify conformance with success criteria grouped at conformance levels A, AA and AAA.

The score of a metric can be bounded or not. A bounded metric makes it easier to gauge where a score falls within the accessible-to-inaccessible continuum of values. Unbounded metrics, on the other hand, by not having a defined range of values, can make it harder to interpret whether a resource is accessible or inaccessible.

In the following, we present the metrics we found by searching the existing literature on web accessibility. For each metric, we describe the data it is based on, its output range, and any other considerations regarding its application (e.g., if it is applicable to web pages or web sites).

2.1.1 Failure-rate (FR)

The Failure Rate (FR) was developed by Sullivan and Matson in 2000 [16]. According to Vigo, et al., [17], this metric relates the actual points of failure with the potential points of failure. For instance, if a web page has ten images, all these images are potential barriers if they are not properly defined. If five out of these ten images do not have a proper alternative text, according to the accessibility evaluator, they are actual barriers.

A point of failure can be interpreted in two ways: as an accessibility problem or barrier that occurs on a web page's elements, preventing the interaction of a user with the web content; or as the elements that cause accessibility problems. According to the first interpretation, each element can have multiple points of failure, which allows us to count more accessibility problems and better estimate the accessibility level. Therefore, we decided to consider a point of failure as an accessibility problem that occurs on a web page. Consequently, the failure rate can be the ratio between the actual problems that were encountered in a web page and the potential barriers, i.e., all potential problems of a web page that can lead to accessibility issues if they are not properly designed.

Vigo and Brajnik [4] state that the failure rate quantitatively measures accessibility conformance, having a score from zero to one. A web page with a failure rate of zero is totally accessible, whereas a totally inaccessible web page has a failure rate score of one.

The simplicity of this metric can be explained by the fact that it does not consider the error nature, i.e., "whether checkpoints are automatic errors, warnings or generic problems" [18], and by the fact that it does not take into consideration the checkpoints' weights.

$$I_p = \frac{B_p}{P_p} \quad (1)$$

Equation 1 presents the formula for computing the Failure Rate metric, where $I_p$ is the Failure Rate final score, $B_p$ identifies the actual points of failure, and $P_p$ identifies the potential points of failure.
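To make the computation concrete, here is a minimal Python sketch of Equation 1; the list-based input format is our own illustrative assumption, not the report format of any particular evaluation tool.

```python
from typing import List

def failure_rate(actual: List[int], potential: List[int]) -> float:
    """Eq. 1: I_p = B_p / P_p, the ratio of actual to potential points of failure."""
    bp = sum(actual)     # B_p: points of failure found on the page
    pp = sum(potential)  # P_p: points where a failure could occur
    return bp / pp if pp else 0.0

# Ten images, five of them without proper alternative text -> FR = 0.5
print(failure_rate(actual=[5], potential=[10]))
```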
2.1.2 Unified web evaluation methodology (UWEM)

According to Sirithumgul, Suchato and Punyabukkana [10], UWEM 1.0 is an improved version of UWEM 0.5 [19] that was developed in 2006. It is based on user feedback rather than WCAG priority levels [12]. The final value of this metric represents the probability of finding a barrier in a website or web page that could prevent users from completing a certain task [11, 13, 20]. This metric also considers the potential problems and the barriers' weights. The UWEM formula is based on the product of the checkpoints' failure rates [20]. Its results are precise and accurate; however, it only takes into consideration two priority levels of the WCAG guidelines [21].

The formula can be interpreted as a web page score or a website score. If the website score is wanted, then the UWEM formula will be the sum of the UWEM scores of each web page divided by the total number of pages of that website, i.e., the arithmetic mean.

This formula's final score varies between zero and one, where zero means the web page is accessible and one means the web page is inaccessible.

$$UWEM = 1 - \prod_i \left(1 - \frac{B_i}{P_i} W_i\right) \quad (2)$$

Equation 2 presents the formula for computing the UWEM metric, where $B_i$ is the total of actual points of failure of a checkpoint $i$, $P_i$ is the total of potential points of failure of a checkpoint $i$, and $W_i$ identifies the severity of a certain barrier $i$ (this weight is calculated by simple heuristics, by combining the results of an automatic evaluation and manual testing, or from disabled users' feedback [22]).
2.1.3 A3

In 2006, Buhler, et al., proposed some changes to the UWEM 0.5 metric [22]. In particular, some probability properties were used, and some issues related to the complexity of the web page were aggregated. A3 is an improved aggregation formula based on UWEM [11, 13, 20]. Similar to UWEM, A3 also considers the failure rate, i.e., the ratio between the number of barriers (violations of a given checkpoint) and the total number of potential barriers. UWEM and A3 consider barrier weight coefficients based on the impact of each given barrier on the user [13].

This metric produces a small range of values, all between zero and one, where zero means the web page is accessible whereas one means the web page is inaccessible.

$$A3 = 1 - \prod_b \left(1 - F_b \frac{B_{pb}}{N_{pb}}\right)^{B_{pb}/B_p} \quad (3)$$

Equation 3 presents the formula for computing the A3 metric, where $B_{pb}$ is the total of actual points of failure of a checkpoint $b$ in page $p$, $b$ is the barrier (checkpoint violation), $B_p$ is the total of actual points of failure in page $p$, $N_{pb}$ is the total of potential points of failure of a checkpoint $b$ in page $p$, and $F_b$ identifies the severity of a certain barrier $b$ (this weight is calculated by simple heuristics, by combining the results of an automatic evaluation and manual testing, or from disabled users' feedback [22]).
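A minimal Python sketch of Equations 2 and 3 follows, assuming per-checkpoint failure counts and severity weights are already available; the uniform 0.05 weight mirrors the authors' experiment described next.

```python
from math import prod

def uwem(b, p, w):
    """Eq. 2: 1 minus the product of (1 - (B_i / P_i) * W_i) over checkpoints."""
    return 1 - prod(1 - (bi / pi) * wi for bi, pi, wi in zip(b, p, w) if pi)

def a3(b, n, f):
    """Eq. 3: like UWEM, but each factor is raised to B_pb / B_p."""
    bp = sum(b)  # total actual points of failure on the page
    if bp == 0:
        return 0.0
    return 1 - prod((1 - fb * bi / ni) ** (bi / bp)
                    for bi, ni, fb in zip(b, n, f) if ni)

failures  = [5, 2]        # actual points of failure per checkpoint
potential = [10, 8]       # potential points of failure per checkpoint
weights   = [0.05, 0.05]  # uniform checkpoint weight, as in the experiment
print(uwem(failures, potential, weights), a3(failures, potential, weights))
```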
The authors of this metric performed an experimental study to compare the results of A3 and UWEM and understand the differences between them. A checkpoint weight of 0.05 was used for all checkpoints, assuming that all of them would have the same importance. This experiment was conducted with a group of six disabled users that evaluated six web pages. After applying both metrics, the authors concluded that A3 outperformed UWEM in the experiment [11].

2.1.4 Web accessibility barriers (WAB)

The WAB metric was proposed by Hackett, et al., in 2003 [23]. Parmanto and Zeng proposed a new version of the WAB metric in 2005 [14]. It quantitatively measures the accessibility of a web site considering 25 WCAG 1.0 checkpoints (5 checkpoints in Priority 1, 13 checkpoints in Priority 2, and 7 checkpoints in Priority 3). It applies the concepts of potential problems and weights of the barriers. Barriers' weights are related to the relative importance of a given checkpoint. It takes into consideration the total number of pages of a certain website. The WAB formula is defined as the ratio between the sum of the failure rate of each checkpoint and the priority of that checkpoint [4]. The arithmetic mean of all pages of a website represents the metric score for that website. The Hackett and the Parmanto and Zeng formulas are represented in Equations 4 and 5, respectively.

The range of this metric's values is not bounded [18], as there is no limit for this metric's score. The only reference this metric has is that the higher its score, the worse the accessibility level of the website. Since this metric takes into consideration 25 WCAG checkpoints out of 65, it offers a guideline support of 38%. Nevertheless, according to Brajnik and Vigo [24], WAB is the best individual metric compared to A3, Page Measure (PM) and the Web Accessibility Quantitative Metric (WAQM), since it yields an accuracy rate of 96%.

$$WAB = \frac{1}{N_p} \sum_p \sum_c \frac{fr(p, c)}{priority_c} \quad (4)$$

Equation 4 presents the formula for computing the WAB by Hackett metric, where $fr(p, c)$ is the failure rate of a certain checkpoint $c$ in web page $p$, $priority_c$ identifies the priority level of the checkpoint $c$ (1, 2 or 3), and $N_p$ is the total number of web pages of a given website.

$$WAB = \frac{\sum_{j=1}^{T} \sum_{i=1}^{n} \left(\frac{b_{ij}}{B_{ij}}\right) W_i}{T} \quad (5)$$

Equation 5 presents the formula for computing the WAB by Parmanto and Zeng metric, where $b_{ij}$ is the number of actual violations of checkpoint $i$ in page $j$, $B_{ij}$ is the number of potential violations of checkpoint $i$ in page $j$, $n$ is the total number of checkpoints, $W_i$ identifies the weight of the checkpoint $i$, according to its priority level (this weight is calculated from experiments with users with different disabilities [11]), and $T$ is the total number of web pages of a given website.

Parmanto and Zeng [14] weighted the priority levels in the calculation of the WAB score. Priority 1 violations receive a higher weight, since web pages with this level of violations are more difficult to access by people with disabilities.

Ana Belén Martínez, Aquilino A. Juan, Darío Álvarez, and Ma del Carmen Suárez [21] went further and created a quantitative metric based on the WAB metric: WAB∗. The WAB∗ metric is based on WAB and has some UWEM-like extensions. It gets the WAB's precision of the accessibility score and uses more detailed checkpoints, as UWEM does. With all these tools, the authors could build a new metric, namely WAB∗. Martínez, et al. [21], point out the main problems and the main advantages of the WAB and UWEM metrics. For instance, WAB performs tests to evaluate checkpoints, yet it is not precise in the way it determines the number of potential
violations of each checkpoint. However, it specifies checkpoints for all three priorities. Concerning UWEM, this metric produces more precise results, although it only focuses on priority 1 and 2 checkpoints. Thus, these two metrics are merged into WAB∗. Consequently, WAB∗ has more precision in terms of the obtained results. In conclusion, this new metric considers 3 priority levels and has 36 checkpoints (25 WAB checkpoints + 11 UWEM checkpoints). This metric was tested by evaluating 30,600 web pages from banking sector websites. The results show that WAB∗ outperforms WAB and UWEM.
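The two WAB variants can be sketched as follows; the dictionary layout for per-page results is a hypothetical structure chosen for illustration.

```python
def wab_hackett(pages):
    """Eq. 4: average over pages of the failure rate of each checkpoint
    divided by its priority level (1, 2 or 3)."""
    return sum(fr / priority
               for page in pages
               for fr, priority in page.values()) / len(pages)

def wab_parmanto_zeng(pages):
    """Eq. 5: average over pages of (actual / potential) * weight per
    checkpoint; each page maps checkpoints to (actual, potential, weight)."""
    return sum((b / big_b) * w
               for page in pages
               for b, big_b, w in page.values() if big_b) / len(pages)

# Two pages; each checkpoint maps to (failure rate, priority) for Eq. 4
site = [{"alt-text": (0.5, 1), "contrast": (0.2, 2)},
        {"alt-text": (0.0, 1), "contrast": (0.4, 2)}]
print(wab_hackett(site))  # (0.5/1 + 0.2/2 + 0.0/1 + 0.4/2) / 2 = 0.4
```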
2.1.5 Overall accessibility metric (OAM)

In 2005, Bailey and Burd [25] proposed OAM. The calculated value considers the number of violations of a checkpoint and the weight of that checkpoint as the confidence level. This confidence level depends on how certain the checkpoint is. There are four confidence levels: certain checkpoints weigh 10, high-certainty checkpoints weigh 8, low-certainty checkpoints weigh 4, and the most uncertain checkpoints weigh 1. The higher the weight, the more the barrier is penalized.

This metric does not have a bounded range of values. The higher this metric's score, the more inaccessible the web page is.

$$OAM = \frac{\sum_c B_c W_c}{N_{attributes} + N_{elements}} \quad (6)$$

Equation 6 presents the formula for computing the OAM metric, where $B_c$ is the number of violations of checkpoint $c$, $W_c$ is the weight of the checkpoint $c$, $N_{attributes}$ is the number of HTML attributes on a given web page, and $N_{elements}$ is the number of elements on a given web page.

2.1.6 Page measure (PM)

Later, in 2007, Bailey and Burd [26] proposed Page Measure (PM). This metric "analyzes the correlations between the accessibility of web sites and the policies adopted by software companies regarding usage of CMS or maintenance strategies" [4]. It is similar to OAM (Overall Accessibility Metric); however, instead of using checkpoint weights, the checkpoint priority levels are considered. This metric does not have a bounded range of values. The higher this metric's score, the more inaccessible the web page is.

$$PM = \frac{\sum_c \frac{B_c}{priority_c}}{N_{attributes} + N_{elements}} \quad (7)$$

Equation 7 presents the formula for computing the PM metric, where $B_c$ is the number of violations of checkpoint $c$, $priority_c$ identifies the priority level of the checkpoint $c$ (1, 2 or 3), $N_{attributes}$ is the number of HTML attributes on a given web page, and $N_{elements}$ is the number of elements on a given web page.
2.1.7 SAMBA

Brajnik and Lomuscio proposed SAMBA [27], a semi-automatic method for measuring barriers of accessibility that combines automatic evaluations with human judgment.

SAMBA is based on WCAG 1.0. This method applies human judgment in the context of a Barrier Walkthrough analysis [27] to estimate aspects related to the automated tool errors and the severity of the barriers. The Barrier Walkthrough method is used for evaluating web accessibility [28] and is performed by experts. This manual approach contextualizes the accessibility barriers identified by experts within usage scenarios, and these barriers receive a severity score. The severity score of a barrier assumes a value from {0, 1, 2, 3} that corresponds to false positive (FP), minor, major or critical barriers.

This semi-automated approach [27] applies a set of sequential steps. Initially, automatic accessibility tools are used to identify the potential accessibility barriers, and the provided results are submitted to human judgment. Then, it is possible to statistically estimate the false positives and the severity of barriers for each website. Finally, barriers are grouped according to disability types, and it is possible to derive scores that represent non-accessibility.

This metric computes two accessibility indexes: the Raw Accessibility Index ($AI_r$) and the Weighted Accessibility Index ($AI_w$). Since $AI_w$ is based on confidence intervals manually computed by human experts, its result is represented by an interval $[AI_w, \overline{AI_w}]$. The confidence intervals express the minimum and the maximum percentages of a type of barriers (FP, minor, major or critical) for a specific disability (blind users, deaf users, among others) on a given website. For example, having the interval [6, 12] in column 'critical' and row 'blind' means that, in a given website, there are between 6% and 12% of critical barriers for blind users. The $AI_w$ index considers weights that are associated with minor and major severity levels. If both minor and major weights are equal to 1, $AI_w$ becomes unweighted ($AI_u$).

SAMBA has a limitation: it cannot cope with false negatives, i.e., problems that were not identified [4]. This means that, although human judgments are used to evaluate and validate the results obtained by the automated tools, they do not deal with the problem of false negatives, since the experts only verify the identified problems. For this reason, the actual issues that were not identified are not going to be analyzed by the experts, i.e., the problems that are not identified by the evaluation tools are not considered.
$$AI_r = \prod_d (1 - F \cdot \vec{D}_d)^2 \quad (8)$$

$$AI_w = \prod_d (1 - F \cdot \min\{1, H_d\})^2 \quad (9)$$

$$\overline{AI_w} = \prod_d (1 - F \cdot \overline{H}_d)^2 \quad (10)$$

$$F = \frac{\text{number of potential barriers}}{\text{number of HTML lines}} \quad (11)$$

$$H_d = \frac{f_{d,mnr}}{w_{mnr}} + \frac{f_{d,maj}}{w_{maj}} + f_{d,cri} \quad (12)$$

$$\overline{H}_d = \frac{\overline{f}_{d,mnr}}{w_{mnr}} + \frac{\overline{f}_{d,maj}}{w_{maj}} + \overline{f}_{d,cri} \quad (13)$$

In Eq. 8, $F$ is the barrier density of a website, $d$ is a disability type, and $\vec{D}$ is the disability vector of a website. In Eqs. 9 and 10, $H_d$ is the severity of the barriers of a disability type $d$. Equations 12 and 13 identify $f$ as the relative frequency, $mnr$ as a minor barrier, $maj$ as a major barrier, and $cri$ as a critical barrier.
2.1.8 Web accessibility evaluation metric (WAEM)

The Web Accessibility Evaluation Metric based on Partial User Experience Order [29] was proposed by Song et al. and intends to analyze data from the user experience of people with disabilities. To do so, the authors defined a formula that calculates the weighted accessibility score (Eq. 15) by using the pass rate (Eq. 14) of a certain checkpoint on a website.

Besides these formulas, this metric also considers users' experience evaluations through PUEXO pairs. PUEXO (Partial User EXperience Order) defines pairs of websites that establish a comparison in terms of user experience. For instance, the (a, b) pair indicates that a certain user had a better browsing experience on website a than on website b. The PUEXO pairs are then compared to the weighted accessibility scores of the websites in question, by Eq. 16.

Subsequently, the results of Eq. 16 and the users' evaluations are both used to calculate the optimal checkpoint weights (Eq. 17). Equation 17 is not, however, adequate, since the user experience is a subjective aspect. For this reason, the authors developed Eq. 18, where they make use of machine learning.

As seen in [29], "results demonstrate that WAEM really can better match the accessibility evaluation results with the user experience of people with disabilities on Web accessibility". Nevertheless, the user experience is a subjective problem and varies according to the user. This means that it is complicated to confirm a relationship between user experience and web accessibility, since different users can have different user experiences [29]. When using WAEM, the higher the weighted accessibility score, the more accessible the website is.

$$p = \frac{s}{h} \quad (14)$$

Equation 14 presents the formula for computing the Pass Rate, where $p$ is the pass rate of a checkpoint, $s$ is the number of pages of a website that pass the checkpoint, and $h$ is the total number of web pages of the website.

$$q_i = P_i w = \sum_{j=1}^{m} P_{i,j} w_j \quad (15)$$

Equation 15 presents the formula for computing the Weighted Accessibility Score, where $q_i$ is the weighted accessibility score of a website $i$, $P_{i,j}$ is the pass rate of a checkpoint $j$ on a website $i$, $m$ is the number of checkpoints, and $w_j$ is the weight of a checkpoint $j$, according to its priority level.

$$f((a, b), w, P) = \begin{cases} 1 & \text{if } P_a w > P_b w \\ 0 & \text{otherwise} \end{cases} \quad (16)$$

Equation 16 presents the formula for computing the function $f$, where $(a, b)$ is a PUEXO pair that represents an order identified by disabled users, $w$ is the set of checkpoints' weights, and $P$ is the matrix of the pass rates of all websites.

$$\arg\max_w \sum_{i=1}^{k} f(L_i, w, P) \quad \text{s.t.} \quad \sum_{j=1}^{m} w_j = 1; \; \forall i, w_i > 0 \quad (17)$$

Equation 17 presents the formula for computing the optimal checkpoint weight vector $w$, where $w$ is the set of checkpoints' weights, $L$ is the matrix that contains all pairs of websites, $i$ is the website, $j$ is the checkpoint, $m$ is the number of checkpoints, and $P$ is the matrix of pass rates.

$$\arg\min_w \sum_{i=1}^{k} e_i \quad \text{s.t.} \quad \sum_{j=1}^{m} w_j = 1; \; \forall i, e_i \ge 0, w_i > 0, P_{L_{i,1}} w + e_i > P_{L_{i,2}} w \quad (18)$$

Equation 18 presents the formula for computing the optimal checkpoint weight vector $w$, where $i$ is the website, $e$ is the error tolerance vector, $P$ is the matrix of pass rates, and $m$ is the number of checkpoints.
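The automatable part of WAEM (Eqs. 14-16) can be sketched with NumPy; the matrices below are toy data, and the weight optimization of Eqs. 17-18 is left out since it needs PUEXO pairs from user studies.

```python
import numpy as np

def pass_rates(page_results):
    """Eq. 14: per checkpoint, the share of a website's pages that pass it.
    `page_results` is a pages-by-checkpoints boolean matrix (True = pass)."""
    return page_results.mean(axis=0)

def weighted_scores(P, w):
    """Eq. 15: q = P w, one weighted accessibility score per website."""
    return P @ w

def puexo_agrees(pair, w, P):
    """Eq. 16: 1 if the computed scores rank the pair (a, b) as the user did."""
    a, b = pair
    return int(P[a] @ w > P[b] @ w)

P = np.array([[0.9, 0.4],    # pass rates of website 0
              [0.6, 0.8]])   # pass rates of website 1
w = np.array([0.7, 0.3])     # checkpoint weights summing to one
print(weighted_scores(P, w))       # [0.75 0.66]
print(puexo_agrees((0, 1), w, P))  # 1: website 0 also scores higher
```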
2.1.10 Barrier impact factor (BIF)

A score of zero indicates the absence of barriers. The higher this metric's score, the higher the impact of a certain barrier on a specific type of assistive technology/disability.

$$BIF(i) = \sum_{error} error(i) \times weight(i) \quad (22)$$

Equation 22 presents the formula for computing the BIF metric, where $BIF(i)$ is the barrier impact factor of an assistive technology $i$, $error(i)$ is the number of detected errors that affect the assistive technology $i$, and $weight(i)$ is the weight of assistive technology $i$ (1, 2 or 3).

2.1.11 Web accessibility quantitative metric (WAQM)

WAQM was proposed by Vigo, et al. [18], and overcomes some limitations of previous measures (i.e., lack of score normalization and consideration of manual tests). It considers the WCAG guidelines classified according to the 4 principles: Perceivable, Operable, Understandable and Robust [13]. This metric measures conformance using percentages [31], and it produces one score for each WCAG guideline in addition to an overall score. It considers the severity of checkpoint violations according to WCAG priorities, and it provides normalized results.

Unlike other metrics, WAQM also takes into account the problems that are identified as warnings by the accessibility evaluation tools [13]. It considers not only automatic tests but also manual tests.

According to Vigo, Arrue, Brajnik, Abascal and Lomuscio [18], this metric was proposed to overcome the drawbacks of the WAB and FR metrics, as they do not focus on specific user groups, cover fewer guidelines and do not consider expert manual evaluation results.

This metric is based on the sum of failure rates for groups of checkpoints, which are grouped according to their priority levels and their WCAG 2.0 principles (Perceivable, Operable, Understandable, Robust) [20]. The authors defined weights for each priority level: W1 = 0.8, W2 = 0.16 and W3 = 0.04 for checkpoints with priorities 1, 2 and 3, respectively.

Since WAQM was considered to be tool dependent, there was the need to see if it was possible to prove the opposite [18]. Therefore, Vigo, et al., in their study [18], wanted to obtain similar outcomes regardless of the evaluation tool being used. For this matter, the authors proposed a method to reach tool independence for every possible scenario. A total of 1363 web pages from 15 websites were evaluated against the WCAG guidelines, using the automated evaluation tools EvalAccess and LIFT. They used two different tools to understand the behavior of the WAQM metric when accessibility is measured by different tools. So, they tuned two WAQM parameters (a and b) to obtain independence. WAQM then proved to be tool independent when conducting large-scale accessibility evaluations with more than 1400 web pages [4].

WAQM's normalized values range from zero to one hundred, where the latter corresponds to the maximum accessibility level.

$$WAQM = \frac{1}{N} \sum_{x \in \{p,o,u,r\}} N_x \sum_{y \in \{e,w\}} \frac{N_{x,y} \sum_{z \in \{1,2,3\}} W_z A(x, y, z)}{N_x} \quad (23)$$

$$A(x, y, z) = \begin{cases} \frac{-100}{b} \left(\frac{B_{x,y,z}}{P_{x,y,z}}\right) + 100 & \text{if } \frac{B_{x,y,z}}{P_{x,y,z}} < \frac{a - 100}{a - 100/b} \\ -a \frac{B_{x,y,z}}{P_{x,y,z}} + a & \text{otherwise} \end{cases} \quad (24)$$

Equations 23 and 24 present the formulas for computing the WAQM metric, where $N$ is the total number of checkpoints, $N_x$ is the number of checkpoints from a specific principle $x$ ($x \in$ {Perceivable, Operable, Understandable, Robust}), $N_{x,y}$ is the number of checkpoints from a principle $x$ and type of test $y$ ($y \in$ {automatic, manual}), $W_z$ is the weight of the checkpoint, according to its priority level $z$, $B_{x,y,z}$ is the number of accessibility errors of a checkpoint of priority level $z$, type of test $y$ and principle $x$, $P_{x,y,z}$ is the number of test cases of a checkpoint of priority level $z$, type of test $y$ and principle $x$, $a$ is a variable that varies between 0 and 100, and $b$ is a variable that varies between 0 and 1.
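A minimal sketch of Eq. 24's piecewise mapping follows, using the a and b constants that Table 3 later lists; the default values and input shapes are illustrative assumptions.

```python
def waqm_a(failed: int, cases: int, a: float = 20.0, b: float = 0.3) -> float:
    """Eq. 24: map a failure rate B/P onto a 0-100 accessibility score using
    two line segments that meet at the breakpoint (a - 100) / (a - 100 / b)."""
    if cases == 0:
        return 100.0
    fr = failed / cases
    knee = (a - 100) / (a - 100 / b)
    return (-100 / b) * fr + 100 if fr < knee else -a * fr + a

print(waqm_a(0, 10), waqm_a(10, 10))  # 100.0 (no failures) and 0.0 (all fail)
```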
2.1.12 Navigability and listenability

Fukuda et al. [32] proposed two different web metrics. These metrics evaluate the usability of web pages for blind users.

Navigability evaluates the structure of the web page elements. It evaluates headings, intra-page links, labels, among other HTML elements of a certain web page. Listenability takes into consideration the alternative texts and denotes how properly built they are.

Each of these two metrics executes a set of calculations using the aDesigner (Accessibility Designer) engine. This approach provides a visualization of the Web's usability for blind users through colors and gradations [32].

2.1.13 Web interaction environments (WIE)

Lopes and Carriço proposed, in 2008, a metric that quantifies Web accessibility [33]. It calculates the proportion of checkpoints that are violated on a web page [4]. To do so, it considers a set of checkpoints and, for each of them, verifies whether a checkpoint $c$ is successfully evaluated or fails [33]. If it is successfully evaluated, then $v_c = 1$, otherwise $v_c = 0$.
This metric's values have a limited range from zero to one, where one means the web page in question is totally accessible and all checkpoints that were evaluated in that web page have passed.

$$WIE(p) = \frac{\sum v_c}{n} \quad (25)$$

Equation 25 presents the formula for computing the WIE metric, where $WIE(p)$ is this metric's final score for a page $p$, $v_c$ is a variable that assumes 1 if a checkpoint $c$ passes and 0 otherwise, and $n$ is the number of checkpoints.

2.1.14 Conservative, strict and optimistic

Conservative, Strict and Optimistic are the three web accessibility metrics defined by Lopes, Gomes and Carriço in 2010 [5]. These metrics are based on the results of a checkpoint evaluation of an HTML element: PASS, FAIL or WARN. For each checkpoint, a PASS result indicates that an HTML document's compliance is verified; a FAIL result specifies that an HTML document's compliance is not verified; and a WARN result specifies that it is impossible to verify the HTML document's compliance. The main difference between these three metrics resides in the way they consider WARN results. They all contemplate the number of PASS results and the number of applicable elements to evaluate the accessibility results of an automatic accessibility evaluation tool. The conservative metric considers WARN results as failures, the optimistic metric considers them as passes, and the strict metric does not consider them at all.

These three metrics' scores range from zero to one, where one means the web page in question is totally accessible.

$$rate_{conservative} = \frac{passed}{applicable} \quad (26)$$

$$rate_{optimistic} = \frac{passed + warned}{applicable} \quad (27)$$

$$rate_{strict} = \frac{passed}{applicable - warned} \quad (28)$$

Equations 26, 27 and 28 present the formulas for computing the Conservative, Optimistic and Strict metrics, respectively, where $passed$ is the number of passes, $applicable$ is the number of applicable elements, and $warned$ is the number of warnings.
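These four page metrics reduce to a few ratios over pass, warning and applicability counts, as the sketch below shows; the warning handling follows the prose definitions above.

```python
def wie(outcomes):
    """Eq. 25: proportion of evaluated checkpoints that pass."""
    return sum(1 for o in outcomes if o == "pass") / len(outcomes)

def conservative_optimistic_strict(passed, warned, applicable):
    """Eqs. 26-28: warnings counted as failures (conservative), as passes
    (optimistic), or excluded from the denominator (strict)."""
    conservative = passed / applicable
    optimistic = (passed + warned) / applicable
    strict = passed / (applicable - warned) if applicable > warned else 1.0
    return conservative, optimistic, strict

print(wie(["pass", "fail", "pass", "warn"]))     # 0.5
print(conservative_optimistic_strict(6, 2, 10))  # (0.6, 0.8, 0.75)
```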
2.1.15 eXaminator

According to Benavidez [34], this metric classifies specific situations that can be positive or negative. eXaminator presents a quantitative index that measures the accessibility of a web page. It uses WCAG 2.0 as a reference. It has two different modes to calculate the qualifications:

• Standard: eXaminator applies all tests. Some of the tests identify errors, while others qualify good practices;
• Strict: eXaminator applies only the set of most secure tests, i.e., the tests that have fewer possibilities of creating false positives or false negatives.

The author considers that not all tests have the same importance, i.e., they need different weights. This means that it is necessary to first weight the tests to make sure their relative weight reflects their differences from each other. The weight calculation is reflected in Eq. 29: it is the multiplication of the Confidence of the test by its Value. Both Value and Confidence vary between 0 and 1, meaning that the weight will always be a value between 0 and 1. The Value variable depends on the WCAG conformance levels: level A: V = 0.9; level AA: V = 0.5; level AAA: V = 0.1. The Confidence variable verifies, for each test, what procedures are applicable when running it and, for each procedure that cannot be verified, the confidence decreases by 0.1.

eXaminator uses a matrix with information about each test, in particular the Element (E), Situation (S), Note (N), Tolerance (T) and Fraction (F). The Element identifies one or a set of HTML elements, and the test is only applied if the element is present in the web page or if the element is 'all'. The Situation identifies one or a set of HTML elements that fulfill a certain condition. The Note is the initial qualification of the test that was applied to the first detected situation. It is an absolute value that varies between 1 and 10, where 10 means the test classification result is excellent. The Tolerance is the error tolerance threshold, i.e., it indicates the maximum number of errors that are allowed to happen in a specific situation. If the number of errors exceeds the Tolerance, the final test classification decreases by 1 point. Finally, the Fraction variable represents the quantity of errors that decreases the initial note by 1.

The final score of a web page is the ratio between the sum of all tests and the sum of their respective weights. This metric's result uses a scale from 1 to 10, where 1 represents a very bad accessibility level and 10 means otherwise.

$$P = C \times V \quad (29)$$

Equation 29 presents the formula for computing the Test Weight, where $P$ is the final weight score, $C$ is the confidence of the test, and $V$ is the value of the test.

Afterwards, there are three different tests that can be applied: True/False tests, tests of proportional type and tests of decreasing type [34].
Table 1 Summary of the reviewed web accessibility metrics

Metric | Ref. | Year | Level | Score range
Failure Rate (FR) | [16] | 2000 | Page level | 0-1, where 0 is totally accessible
Unified Web Evaluation Methodology (UWEM 1.0) | [19] | 2006 | Page/website level | 0-1, where 0 is totally accessible
A3 | [22] | 2006 | Page level | 0-1, where 0 is totally accessible
Web Accessibility Barriers (WAB) by Hackett | [23] | 2004 | Website level | The higher the score, the worse the accessibility level
Web Accessibility Barriers (WAB) by Parmanto and Zeng | [14] | 2005 | Website level | The higher the score, the worse the accessibility level
Web Accessibility Evaluation Metric (WAEM) | [29] | 2017 | Website level | N/A
Web Accessibility Quantitative Metric (WAQM) | [18] | 2007 | Page level | 0-100, where 100 is totally accessible
Web Interaction Environments (WIE) | [33] | 2008 | Page level | 0-1, where 1 is totally accessible
Conservative | [5] | 2010 | Page level | 0-1, where 1 is totally accessible
Strict | [5] | 2010 | Page level | 0-1, where 1 is totally accessible
Optimistic | [5] | 2010 | Page level | 0-1, where 1 is totally accessible
Overall Accessibility Metric (OAM) | [25] | 2005 | Page level | The higher the score, the worse the accessibility level
Page Measure (PM) | [26] | 2007 | Page level | The higher the score, the worse the accessibility level
SAMBA | [27] | 2007 | Page level | N/A
Reliability Aware Web Accessibility Experience Metric (RA-WAEM) | [12] | 2018 | Website level | N/A
Barrier Impact Factor (BIF) | [35] | 2011 | Assistive technologies/disability types | The higher the score, the higher the impact of the barrier on an assistive technology/disability
Navigability and Listenability | [32] | 2005 | Page level | N/A
Web Accessibility Barrier Severity (WABS) | [15] | 2017 | Accessibility barriers | 0-1, where the closer the score is to 0, the less severe the barrier is
eXaminator | [34] | 2012 | Page level | 1-10, where 1 is totally inaccessible
• WAEM and RA-WAEM – require user experience evaluations by users with disabilities for obtaining the PUEXO pairs; and
• BIF – requires classification of the barriers (errors in BIF) by assistive technology.

In summary, from a data collection perspective, the data required by most metrics should be easily accessible from accessibility evaluations. The exception is data that requires human intervention, be it from experts that classify outcomes of evaluations, or from user tests.

A different type of issue, not related to data collection, is raised by eXaminator. Tests in eXaminator need the definition of multiple parameters. Even though this needs to be done only once, these parameters are not available when data is collected through different tools or methodologies.

In what concerns the complexity of computing the metrics, only one metric group stands out. WAEM and RA-WAEM, by running a vector optimization procedure, require more computing resources than the other metrics. The time complexity of the other metrics is linear. From these, more computing resources are required by A3 and WABS, for computing exponentiation and square root operations, respectively.

3 Comparing accessibility metrics

To compare a subset of the metrics presented in the previous section, we planned a study based on a large-scale evaluation of web pages.

3.1 Methodology

In this section we present the methodology followed in the study. We introduce the automated evaluation tool we used and the data set that was evaluated. We then describe what metrics we were able to compare, based on the constraints of running a large-scale study, which prevented us from
comparing metrics that rely on human judgment. We also describe how the metrics were implemented, based on the results provided by the used tool. We conclude this section with a description of how we analyzed the data.

3.1.1 QualWeb and evaluation data set

To run a study based on a large number of accessibility evaluations of web pages, the only viable option is to conduct automated accessibility assessments [36]. To run those evaluations, we used QualWeb² [37].

QualWeb is an automated web accessibility engine. It performs a set of tests on a web page that check conformance with ACT-Rules³ and WCAG Techniques 2.1⁴. For each web page evaluated, we extracted from the QualWeb report the number of elements that passed, the number of elements that failed, and the number of warnings for each test. We also collected information about the test being applicable or not to the web page. This information is useful since we want to consider only applicable tests when computing the metrics. When an applicable test passes, it means that it has no failures nor warnings. If an applicable test returns no errors but has at least one warning, the test outcome is "warning". If the test has at least one element that fails, the test fails.
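The mapping from element counts to a single test outcome can be expressed directly; the function below is our own restatement of the rules just described, not QualWeb's API.

```python
def test_outcome(failed: int, warnings: int, applicable: bool) -> str:
    """Collapse per-test element counts into one outcome, as described above."""
    if not applicable:
        return "inapplicable"
    if failed > 0:
        return "fail"
    if warnings > 0:
        return "warning"
    return "pass"

print(test_outcome(failed=0, warnings=1, applicable=True))  # "warning"
```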
As previously mentioned, QualWeb has two types of tests: ACT-Rules tests, which test a web page against a set of community-approved checks; and WCAG techniques, which test a web page against the tool developer's interpretation of specific WCAG techniques. To ensure that only checks that correspond to a consensual interpretation of the WCAG are used, and to increase the validity of the results, we used only the outcomes of the ACT-Rules tests in this study. In our study we used the 0.6.1 version of QualWeb, which tested a total of 72 ACT-Rules.

QualWeb was used to evaluate a total of 2,884,498 web pages. The pages were obtained from CommonCrawl⁵. CommonCrawl is an open corpus of web crawl data, including metadata and source of billions of web pages since 2013. We used the most recent crawls to obtain the URLs of the pages. The pages were evaluated in the period from March 2021 to September 2021. The 2,884,498 pages correspond to a total of 166,311 websites, averaging 21 pages per website. The distribution of pages and websites per top-level domain is presented in Table 2.

Table 2 Number of web pages by top-level domain

Top-level domain | Number of web pages
.asia | 322,208
.au | 174,371
.com | 166,388
.org | 154,154
.pt | 139,807
.gov | 130,689
.info | 125,233
.uk | 122,543
.es | 120,085
.it | 120,085
.fr | 113,513
.de | 109,135
.us | 106,432
.news | 105,196
.eu | 102,995
.net | 102,671
.br | 100,353
.edu | 93,956

The evaluation found a total of 86,644,426 errors, averaging 30 errors per page and 521 errors per website. The highest number of errors on a single web page was 15,645 and the lowest was zero. The highest number of errors on a website was 878,776 and the lowest zero. The ACT-Rules violated on most pages were: ACT-R76 (Text has enhanced contrast), ACT-R37 (Text has minimum contrast) and ACT-R12 (Link has accessible name). The ACT-Rule with the highest number of errors was ACT-R76 (Text has enhanced contrast), with 33,109,298 errors.

² http://qualweb.di.fc.ul.pt/evaluator/.
³ https://act-rules.github.io/rules/.
⁴ https://www.w3.org/TR/WCAG21/.
⁵ https://commoncrawl.org/.

3.1.2 Applicable metrics

The analysis of the accessibility metrics shows that not all metrics can be studied with this data set. For instance, metrics that require human judgment cannot be considered, since it is not viable to produce expert judgment over such a large set of pages. Therefore, we needed to identify the ones that could be computed with our data. From the 19 presented metrics, we found that 11 metrics could be computed with our dataset composed from the ACT-Rules evaluation results. Most of the applicable metrics use WCAG 1.0, which considers checkpoints rather than success criteria. Since we used WCAG 2.1 in our accessibility evaluation, when computing the accessibility metrics we will refer to the checkpoints as success criteria. Each ACT-Rule has corresponding success criteria. Each success criterion has an associated principle and conformance level. Through these success criteria, it was possible to define which principle(s) and conformance level(s) characterize each test. As one test can have more than one success criterion, it can also have more than one principle or priority level. This information is required by some of the metrics.
Table 3 Constants used to compute the WAQM metric

Constant | Description | Value
Nall | Number of all tests | 72
N | Number of applicable tests | 51
Np | Number of Perceivable tests | 28
No | Number of Operable tests | 11
Nu | Number of Understandable tests | 7
Nr | Number of Robust tests | 11
Npe | Number of automatic Perceivable tests | 26
Noe | Number of automatic Operable tests | 11
Nue | Number of automatic Understandable tests | 6
Nre | Number of automatic Robust tests | 10
Npw | Number of manual Perceivable tests | 5
Now | Number of manual Operable tests | 4
Nuw | Number of manual Understandable tests | 1
Nrw | Number of manual Robust tests | 1
a | Constant | 20
b | Constant | 0.3

The FR metric is the simplest metric to compute. It requires the number of potential and actual problems. For each page, the sum of all elements that failed a test and the sum of all elements applicable to the test are computed. Having both totals for all the tests, it is possible to calculate the failure rate of the page. It is important to highlight that some tests might evaluate the same elements of the page. However, they evaluate different aspects, and so they cannot be counted only once, since we are considering the total number of failures and not the total number of elements that failed. For the remaining accessibility metrics that utilize the number of potential and actual points of failure for success criteria, the same logic was applied.

The WAQM metric is the most complex to compute. It considers the priority level and its weight, the type of the test, i.e., if it is manual or automatic, and the principle(s) of each test. WAQM is computed for each test, and its computation relies on a number of parameters. Table 3 presents the parameters we used. Parameters a and b are constants because "the tuning was not necessary because WAQM proved to be independent of the tool when conducting large-scale evaluations (approx. 1400 pages)" [4]. The other parameters were tuned to the QualWeb tool and the ACT-Rules it tests.

It was also possible to compute the UWEM and A3 metrics for each web page, since they both rely on the FR of each checkpoint of that page. Since both metrics are computed using a weight that is obtained from user feedback, we had to determine this weight according to the priority levels, due to time and resource constraints. UWEM already calculates a score for each website, by calculating its web pages' average score. Besides applying the UWEM metric to websites as Vigo, et al. define [17], we decided to additionally use another procedure to convert this metric into a website metric, as will be described further on.

WIE, Conservative, Optimistic and Strict are four simple metrics that can be easily applied with our data, as they only require the number of applied success criteria, the number of elements, the number of warnings, the number of fails and the number of passes.

In what concerns metrics that are applicable to websites, instead of web pages, we considered WAB by Parmanto and Zeng and WAB by Hackett. The two WAB formulas were applied, as one requires the priority level and the other one the weight of the priority level. Both formulas also calculate the failure rate and consider the total number of web pages a website contains.

Other metrics like SAMBA or eXaminator were not considered, either because of the lack of information in our data or because the metric is semi-automated, which means that it needs manual intervention. For instance, the indexes that are computed in SAMBA concern the disability type, and eXaminator considers information about HTML elements that are evaluated in each page. Yet, we could partially use the WAEM/RA-WAEM metric. Since both WAEM and RA-WAEM require users' intervention, as they evaluate pairs of websites, i.e., PUEXO pairs, to be compared to the results of the weighted accessibility score computation (Eq. 16), we could only consider the weighted accessibility score (Eq. 15) that can be automatically computed. This score is used in the WAEM and RA-WAEM metrics' process to classify a website and to compare the results with user classifications, and it considers the number of pages a success criterion passes in that website.

We did not consider the OAM nor the Page Measure metrics, since they both consider the number of HTML attributes in their formulas. We do not have that information from the QualWeb reports. Also, we did not consider the two metrics by Fukuda et al. [32], since we do not have information regarding the aspects both formulas take into consideration (alt attributes, reaching time of a given element, page size).

We could not apply BIF since it needs a table that relates the errors that were identified by the accessibility evaluation tool with the assistive technologies that are affected by these errors. For this reason, it is not viable to attend to all the errors of our 2.8 million web pages sample and identify which assistive technologies are affected by them.

WABS was not considered since it classifies the accessibility barriers based on their severity, which means that it refers to the severity of a barrier that was identified in a set of websites and their respective web pages. Thus, this metric is focused on a specific problem that hinders the user's interaction. The final result of this metric would be a list of barriers that were found in our web pages data set and their respective severity scores. For this reason, it is not possible to correlate metrics that evaluate the accessibility of web pages with metrics that evaluate accessibility barriers.
The following list summarizes what metrics were analyzed in our study.

• Web page metrics:
  – Failure Rate (FR);
  – Unified Web Evaluation Methodology (UWEM);
  – A3;
  – Web Accessibility Quantitative Metric (WAQM);
  – Web Interaction Environments (WIE);
  – Conservative;
  – Optimistic;
  – Strict.

• Website metrics:
  – Web Accessibility Barriers (WAB) by Hackett;
  – Web Accessibility Barriers (WAB) by Parmanto & Zeng;
  – Web Accessibility Evaluation Metric (WAEM).

3.1.3 Metrics comparison and analysis

With our goal being to understand the similarities between different accessibility metrics, we computed their correlation pairs. We tested the normality of the data using the Shapiro-Wilk and Kolmogorov-Smirnov tests. We found that our data did not follow a normal distribution. Therefore, we used the Spearman correlation in our analysis. Following the recommendations from Statstutor [38], absolute correlation values above 0.4 represent moderate or stronger correlation. In our analysis we considered two metrics to have similar results when they are at least moderately correlated.

It is important to take into consideration the fact that some metrics are applicable to websites whereas others are applicable to web pages. To be able to compare all metrics, the web page metrics were converted to website metrics via two different approaches:

1. Computing the metric score based on the sum of the evaluation results of all the pages of the website; and
2. Calculating the average of the metric score for all web pages of the website, similar to the UWEM strategy.

Besides analyzing the pairwise similarity obtained from the correlation, we used this information to cluster the correlation scores and find whether groups of metrics present similar behaviors in our data set. For this analysis we used hierarchical clustering [39], as sketched in the example below.
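A toy sketch of this pipeline follows: the two page-to-website aggregation approaches, Spearman correlation, and hierarchical clustering on the correlation scores. The data shapes, the failure-rate example metric and the linkage settings are illustrative assumptions; the paper does not specify them.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster

fr = lambda page: page["failed"] / page["applicable"]  # example page metric

def site_score_sum(pages):
    """Approach 1: pool the evaluation results of all pages, then score once."""
    return fr({"failed": sum(p["failed"] for p in pages),
               "applicable": sum(p["applicable"] for p in pages)})

def site_score_mean(pages):
    """Approach 2: score each page, then average (the UWEM strategy)."""
    return float(np.mean([fr(p) for p in pages]))

pages = [{"failed": 5, "applicable": 50}, {"failed": 30, "applicable": 60}]
print(site_score_sum(pages), site_score_mean(pages))  # 0.318..., 0.3

# Pairwise Spearman correlations between metric score vectors, then
# hierarchical clustering of the metrics using 1 - |rho| as the distance.
scores = np.random.rand(4, 1000)    # 4 metrics x 1000 pages (toy data)
rho, _ = spearmanr(scores, axis=1)  # 4 x 4 correlation matrix
condensed = (1 - np.abs(rho))[np.triu_indices(4, k=1)]
groups = fcluster(linkage(condensed), t=0.6, criterion="distance")
print(groups)                       # cluster label per metric
```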
3.2 Results

This section presents the results of the metrics comparison study. We begin by presenting an overview of the outcomes of each metric in the full set of web pages evaluated. We then examine the similarity between metrics across different contexts: metrics over web pages, and metrics over websites, exploring both previously introduced ways to compute a website metric from web page metrics.

3.2.1 Descriptive statistics

Table 4 presents descriptive statistics of the scores for all metrics that are applicable at page level.

Table 4 Descriptive statistics for web page metrics

Metric | Average | Standard deviation | Best score | Worst score | First quartile | Third quartile
FR | 0.0673 | 0.0780 | 0 | 1 | 0.0201 | 0.0856
A3 | 0.6657 | 0.3203 | 0 | 1 | 0.4077 | 0.9443
UWEM | 0.3842 | 0.3212 | 0 | 0.9997 | 0.1010 | 0.800
WAQM | 82.8626 | 19.1529 | 100 | 0 | 76.3289 | 95.6360
WIE | 0.5545 | 0.1561 | 1 | 0 | 0.4375 | 0.6667
Conservative | 0.3936 | 0.2316 | 1 | 0 | 0.2018 | 0.5556
Optimistic | 0.6015 | 0.2314 | 1 | 0 | 0.4453 | 0.7852
Strict | 0.4973 | 0.2692 | 1 | 0 | 0.2658 | 0.7209

Regarding the descriptive statistics for web page metrics, we can observe some worthwhile points. The FR metric average indicates a very optimistic perspective on the accessibility of the evaluated web content. Additionally, the standard deviation is very small, indicating that the web pages' scores do not vary much from the average. Also, WAQM presents a positive perspective about the accessibility of the Web, as the average is approximately 83, which is close to the score that expresses the highest accessibility level. The UWEM metric is slightly positive concerning accessibility, having an average of 0.38, given the fact that the lower the score, the more accessible the web page is. The WIE, Optimistic and Strict metrics present an intermediate accessibility level average. In contrast, the A3 and Conservative metrics report a negative perspective about the accessibility of the evaluated web pages, as their values are closer to the inaccessible reference.
Table 5 Descriptive statistics for website metrics, adding the evaluation results of all website pages

Metric | Average | Standard deviation | Best score | Worst score | First quartile | Third quartile
FR | 0.0832 | 0.0836 | 0 | 1 | 0.0296 | 0.1080
A3 | 0.8390 | 0.2713 | 0 | 1 | 0.8301 | 1.0000
UWEM | 0.4728 | 0.3294 | 0 | 0.9997 | 0.1715 | 0.8131
WAQM | 79.4362 | 21.7414 | 100 | 0 | 71.7592 | 94.8497
WIE | 0.5176 | 0.1662 | 1 | 0.04 | 0.400 | 0.6250
Conservative | 0.4327 | 0.2191 | 1 | 0.0006 | 0.2640 | 0.5799
Optimistic | 0.6366 | 0.1994 | 1 | 0.0006 | 0.5065 | 0.7857
Strict | 0.5390 | 0.2410 | 1 | 0.0006 | 0.3515 | 0.7273
WAB-H | 0.4742 | 0.6927 | 0 | 5.8333 | 0.0263 | 0.6875
WAB-PZ | 0.3053 | 0.4799 | 0 | 4.2 | 0.0133 | 0.400
WAEM | 4.1765 | 1.3273 | 8.68 | 0.0072 | 3.3353 | 5.1446
Table 6 Descriptive statistics for website metrics, considering the average of the website pages' metric scores

Metric | Average | Standard deviation | Best score | Worst score | First quartile | Third quartile
FR | 0.0859 | 0.0849 | 0 | 1 | 0.0312 | 0.1111
A3 | 0.6744 | 0.3007 | 0 | 1 | 0.4569 | 0.9255
UWEM | 0.4433 | 0.3208 | 0 | 0.9997 | 0.1562 | 0.800
WAQM | 77.9130 | 21.3167 | 100 | 0 | 70.1056 | 93.1619
WIE | 0.5715 | 0.1562 | 1 | 0.0588 | 0.4645 | 0.6797
Conservative | 0.4457 | 0.2182 | 1 | 0.00055 | 0.2777 | 0.5955
Optimistic | 0.6470 | 0.1970 | 1 | 0.00055 | 0.5178 | 0.7968
Strict | 0.5527 | 0.2384 | 1 | 0.00055 | 0.3688 | 0.7402
WAB-H | 0.4742 | 0.6927 | 0 | 5.8333 | 0.0263 | 0.6875
WAB-PZ | 0.3053 | 0.4799 | 0 | 4.2 | 0.0133 | 0.400
WAEM | 4.1765 | 1.3273 | 8.68 | 0.0072 | 3.3353 | 5.1446
Tables 5 and 6 present descriptive statistics of the scores of all metrics at the website level. Table 5 shows results where scores for page-level metrics were calculated by adding the evaluation results for all pages of the same website. Table 6 results were calculated by averaging the page metric results for all pages of the same website.

In relation to the descriptive statistics for website metrics, some metrics reported consistent behavior, while for others some differences to the web page metrics could be observed. The FR average paints a very optimistic picture of the accessibility of the evaluated web content, as was already observed in this metric's web page version. Also, WAQM still presents a more positive perspective about the accessibility of the Web, as stated in the descriptive analysis of the web page metrics. The UWEM metric slightly increased its average when compared with the same metric applied to web pages. Nevertheless, it still provides a positive perspective about accessibility. The averages of the Optimistic and Strict website metrics also increased, yet they do not show a clear difference compared to the web page metrics. The average of the website scores using the WIE metric decreased when the evaluation results for all pages of the same website were considered, compared to this metric's web page version. The same did not happen when considering the average of the page metric results for all pages of the same website, which instead shows an increase in its average. Conservative, as a website metric, also has a similarly negative perspective about web accessibility compared to the same metric applied to web pages. The A3 metric for websites, considering the average of the website pages' scores, did not show a noticeable difference in the average result compared to the A3 metric for web pages (around 0.67). Nevertheless, a considerable difference between the two website-level approaches was detected for the A3 metric: treating a website as a single web page yields an average of approximately 0.84. Interestingly, the WAB-H and WAB-PZ metrics reveal differences in their averages that might be explained by their worst scores. WAB-H evaluated a website that had a score of 5.8333, which represents the most inaccessible website. Yet, the WAB-PZ worst score was 4.2, which indicates that the accessibility issues are weighted less than in WAB-H. Since WAB-PZ, WAB-H and WAEM do not provide a limited range of values, it is more complicated to infer the accessibility level from their scores.
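Each row of Tables 5 and 6 is a set of standard descriptive statistics over the per-website scores. For reference, this is how such figures are obtained (illustrative UWEM scores, not the study data):

    import numpy as np

    # Illustrative website-level UWEM scores (not the study data).
    uwem = np.array([0.12, 0.47, 0.83, 0.05, 0.66, 0.31])

    print("average:        ", round(uwem.mean(), 4))
    print("std deviation:  ", round(uwem.std(ddof=1), 4))
    print("best score:     ", uwem.min())   # for UWEM, lower is better
    print("worst score:    ", uwem.max())
    print("first quartile: ", np.percentile(uwem, 25))
    print("third quartile: ", np.percentile(uwem, 75))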
As can be observed from Table 1, different metrics have scores ranging from 0 to 1, others from 0 to 100, and yet others are unbounded. Bounded ranges make results easier to interpret, since a score can be compared with the limits of the range. For example, one intuitively expects that a UWEM score close to zero represents an accessible page, while a score close to one represents an inaccessible page. With unbounded ranges, since there is only one limit, this comparison is not always possible. For example, a WAB score close to zero represents an accessible page, but what about a score of 1? Or 5? The data collected in this study, and presented in Tables 4, 5 and 6, provides not only a reference for the unbounded ranges (the extreme value for WAB-H was 5.83, for WAB-PZ it was 4.2, and for WAEM it was 8.68) but also gives an indication of how metric scores are distributed. This allows us to interpret values from the metrics more precisely. For example, a UWEM score of 0.5 probably represents a web page that is less accessible than an A3 score of 0.5: the first quartile for A3 is 0.4077 (considering web page metrics), which is much closer to 0.5 than the first quartile for UWEM (0.101).

Table 8 Spearman correlation scores for website metrics (moderate, strong and very strong correlation scores are displayed in bold)

          WAEM      WAB-H     WAB-PZ
WAEM      1
WAB-H     −0.1183   1
WAB-PZ    −0.1850   0.9858    1

3.2.2 Web page metrics

WIE increases with the accessibility level of the web page. A3 shows an opposite behavior. Similarly to A3, UWEM shows a moderate negative correlation with WIE (ρ = −0.4963). This behavior might be explained by the fact that both UWEM and A3 consider the number of elements that failed, while WIE considers the number of success criteria that passed in a page. If a page that fails all the tested success criteria also fails one element per test, the number of failed elements will be similar to the number of failed success criteria.

Interestingly, no other pairs of metrics are correlated. This means that FR and WAQM are not correlated to any other metric.
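For reference, the coefficients reported here and in Tables 8, 9 and 10 are plain Spearman rank correlations between the vectors of scores that each pair of metrics assigns to the same pages or sites. A minimal sketch, with illustrative scores rather than the study data (spearmanr is the SciPy routine for the method described in [38]):

    from itertools import combinations
    from scipy.stats import spearmanr  # rank correlation, cf. [38]

    # Illustrative page-level scores for the same five pages (not our data).
    scores = {
        "FR":  [0.02, 0.10, 0.08, 0.01, 0.15],
        "A3":  [0.10, 0.90, 0.60, 0.05, 0.95],
        "WIE": [0.80, 0.30, 0.50, 0.90, 0.25],
    }

    for a, b in combinations(scores, 2):
        rho, p_value = spearmanr(scores[a], scores[b])
        print(f"{a} vs {b}: rho = {rho:+.4f} (p = {p_value:.3f})")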
Table 9 Spearman correlation scores for website metrics, considering a domain as a web page (moderate, strong and very strong correlation scores are displayed in bold)

               FR        A3        UWEM      WAQM      WIE       Conservative   Optimistic   Strict    WAB-H     WAB-PZ    WAEM
FR             1
A3             0.1779    1
UWEM           0.4442    0.4131    1
WAQM           −0.5423   −0.2140   −0.7285   1
WIE            −0.1154   −0.5942   −0.3649   0.2485    1
Conservative   −0.0188   −0.3528   −0.0967   0.0333    0.4838    1
Optimistic     −0.1336   −0.3519   −0.1329   0.0775    0.4523    0.8759         1
Strict         −0.0984   −0.3730   −0.1283   0.0673    0.4898    0.9573         0.9704       1
WAB-H          0.4217    −0.2910   0.5698    −0.5675   −0.0233   −0.0024        −0.0556      −0.0378   1
WAB-PZ         0.4071    −0.2222   0.6457    −0.6249   −0.0525   −0.0098        −0.0591      −0.0434   0.9858    1
WAEM           −0.3125   −0.5252   −0.5057   0.4902    0.6566    0.2870         0.2764       0.3044    −0.1183   −0.1850   1
Table 10 Spearman correlation scores for website metrics, considering the average of the web pages' scores (moderate, strong and very strong correlation scores are displayed in bold)

               FR        A3        UWEM      WAQM      WIE       Conservative   Optimistic   Strict    WAB-H     WAB-PZ    WAEM
FR             1
A3             0.4568    1
UWEM           0.4914    0.8612    1
WAQM           −0.5606   −0.7167   −0.7917   1
WIE            −0.2018   −0.6053   −0.4411   0.3914    1
Conservative   0.0014    −0.3129   −0.0916   0.0177    0.5604    1
Optimistic     −0.1283   −0.3499   −0.1392   0.0748    0.5253    0.8740         1
Strict         −0.0910   −0.3567   −0.1332   0.0630    0.5685    0.9536         0.9718       1
WAB-H          0.4083    0.5310    0.6161    −0.5036   −0.2607   −0.0262        −0.0813      −0.0646   1
WAB-PZ         0.3992    0.5957    0.6846    −0.5665   −0.2798   −0.0291        −0.0816      −0.0666   0.9858    1
WAEM           −0.3390   −0.5563   −0.5077   0.5369    0.5968    0.2577         0.2583       0.2839    −0.1183   −0.1850   1
Table 9 presents the Spearman correlation coefficients for all metrics, with the metrics being computed by considering the domain as a page with the evaluation results of all the pages of the domain.

As would be expected, and matching the web page correlation scores, the Conservative metric has a very strong and positive correlation with Optimistic and Strict: ρ = 0.8759 and ρ = 0.9573, respectively. Also, as observed in the web page scores, the Strict and Optimistic metrics still have the same strong positive correlation (ρ = 0.9704). WAEM appears to have a strong positive correlation with WIE (ρ = 0.6566). A3 has a moderate positive correlation with UWEM (ρ = 0.4131) and is negatively correlated with WAEM (ρ = −0.5252) and WIE (ρ = −0.5942). UWEM has a moderate negative correlation with WAEM (ρ = −0.5057), while WAQM has a moderate positive correlation with WAEM (ρ = 0.4902). UWEM shares a strong negative correlation with WAQM (ρ = −0.7285) and positive correlations with both WAB-PZ (ρ = 0.6457) and WAB-H (ρ = 0.5698). In contrast to UWEM, WAQM has negative correlations with WAB-PZ (ρ = −0.6249) and WAB-H (ρ = −0.5675). FR shows a moderate positive correlation with UWEM (ρ = 0.4442), with WAB-H (ρ = 0.4217) and with WAB-PZ (ρ = 0.4071). It presents a moderate negative correlation with WAQM (ρ = −0.5423).

Average of the web pages' scores

Table 10 presents the Spearman correlation coefficients for all metrics, with the domain metric being computed by averaging the metrics of the pages belonging to the domain.

As expected, the Conservative metric has a very strong and positive correlation with Optimistic and Strict: ρ = 0.8740 and ρ = 0.9536, respectively. These values are very similar to the ones obtained when considering a domain as a web page. Also, as observed in the web page scores,
Fig. 2 Clusters of website metrics, interpreting a website as a web page

Fig. 3 Clusters of website metrics, calculating the average of the web pages' scores
modifying the metrics' behavior. The FR is still distant from the remaining metrics. WAQM and UWEM calculate the failure rate, which might justify the cluster they are grouped in. WAB-PZ and WAB-H had a very strong correlation, so it was expected that they would be part of the same cluster. They share similar formulas that only differ in the way they consider the success criterion weight: as the priority level or as the weight of the priority level. WIE may relate with WAEM, since they both consider when a checkpoint passes: WIE increments one every time a certain checkpoint passes on a website, while WAEM counts the number of pages where a checkpoint passes. For instance, if a certain checkpoint passes on a website that only has one web page, it will count as one for WIE and also for WAEM. The main difference between them is that WIE considers the total number of checkpoints, while WAEM not only considers the number of website pages but also the weight of the checkpoint.
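The difference in how WIE and WAEM count passing checkpoints can be made concrete with a toy sketch; the weights and the exact normalization below are illustrative assumptions, not the metrics' published formulas:

    # Toy data: for each checkpoint, the set of pages where it passes.
    passes = {
        "c1": {"p1", "p2"},  # passes on both pages
        "c2": {"p1"},        # passes on one page only
        "c3": set(),         # never passes
    }
    n_pages = 2
    weight = {"c1": 1.0, "c2": 2.0, "c3": 3.0}  # assumed checkpoint weights

    # WIE-style count: one increment per checkpoint that passes somewhere,
    # normalized by the total number of checkpoints.
    wie_like = sum(1 for ok in passes.values() if ok) / len(passes)

    # WAEM-style count: per checkpoint, the fraction of pages where it
    # passes, weighted by the checkpoint's weight.
    waem_like = sum(w * len(passes[c]) / n_pages for c, w in weight.items())

    print(f"WIE-like: {wie_like:.3f}  WAEM-like: {waem_like:.3f}")

On a website with a single page, both counts register a passing checkpoint exactly once, matching the example above; they diverge as soon as page counts and checkpoint weights differ.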
Regarding the average of the web pages' scores, we identified the following six clusters (Fig. 3; a sketch of the clustering procedure follows the list):

• Conservative, Strict and Optimistic;
• FR;
• A3, UWEM and WAQM;
• WIE;
• WAEM;
• WAB-H and WAB-PZ.
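The clusters in Figs. 2 and 3 come from hierarchical clustering [39] over the correlation results. A minimal sketch of the general technique, using made-up correlation values, an assumed average-linkage setting, and an arbitrary 0.5 distance threshold:

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    metrics = ["Conservative", "Strict", "Optimistic", "WAB-H", "WAB-PZ"]
    # Made-up absolute Spearman correlations (not the study's values).
    corr = np.array([
        [1.00, 0.95, 0.87, 0.03, 0.03],
        [0.95, 1.00, 0.97, 0.06, 0.07],
        [0.87, 0.97, 1.00, 0.08, 0.08],
        [0.03, 0.06, 0.08, 1.00, 0.99],
        [0.03, 0.07, 0.08, 0.99, 1.00],
    ])

    # Highly correlated metrics should be close: use 1 - |rho| as distance.
    distance = 1.0 - corr
    linked = linkage(squareform(distance), method="average")

    # Cut the dendrogram at a distance threshold to obtain flat clusters.
    for name, label in zip(metrics, fcluster(linked, t=0.5, criterion="distance")):
        print(f"{name:>12} -> cluster {label}")

With these toy values, Conservative, Strict and Optimistic fall into one cluster and WAB-H and WAB-PZ into another, mirroring the grouping pattern reported in the text.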
The above groups show that the Conservative, Strict and Optimistic metrics belong to the same cluster, as seen before. However, there are two main differences when comparing with the other website metrics approach: (1) A3 is now part of the UWEM and WAQM cluster; (2) WAEM and WIE do not belong to the same cluster, as their cluster distance is higher than the previously defined threshold.

Another interesting aspect is the fact that Conservative, Optimistic and Strict are always in the same independent group, in all the three approaches we have mentioned. This is perhaps because the number of warnings in the considered domains and web pages is not significant enough to change these metrics' results, since the only difference between these three metrics' formulas is the way warnings are considered.

In all metrics' clusters, the FR metric does not form a group with any other metric, even though some of them incorporate the failure rate in their formulas, indicating that their results do not correlate with the FR metric results. Thus, we can recognize that the metrics that integrate the FR consider other important information in their scope that makes them different from the FR. For instance, WAQM is more complex than FR, taking into account the principles, the type and the priority levels of success criteria. For this reason, when opting for one of these metrics, FR seems a more straightforward and easy choice, but it should be kept in mind that the other metrics may give more relevant information.

4 Metric validity

Our analysis allows detecting what metrics produce similar outcomes, but it does not reflect the validity of the outcomes. This is a result of the fact that the metrics were computed from a set of automated evaluation results. Automated evaluation tools are only capable of identifying a subset of the real accessibility problems in a web page. Therefore, a page that gets a good outcome on an automated accessibility evaluation might have undetected accessibility problems. This means that a metric computed on that evaluation might indicate an accessibility level that is better than reality.
Table 12 Web page metrics scores for assessing the metrics' validity (scores that do not reflect the accessibility of the web page are displayed in bold)

Columns: FR, A3, UWEM, WIE, Conservative, Optimistic, Strict, WAQM

Accessible
https://wsnet2.colostate.edu/cwis24/acns/web-accessibility/Example  0.00218  0  0  0.875  0.8297  0.9454  0.93827  98.823
https://www.w3.org/WAI/demos/bad/after/home.html  0.01954  0.02192  0.0057  0.6  0.3909  0.70684  0.5714  89.005
https://www.washington.edu/accesscomputing/AU/after.html  0.00998  0  0  0.7333  0.525  0.6367  0.5910  90.667
https://www.w3.org/WAI/demos/bad/after/template.html  0.0185  0.0192  0.0057  0.6  0.384  0.7159  0.5746  89.005

Inaccessible
https://wsnet2.colostate.edu/cwis24/acns/web-accessibility/Example/index-inaccessible.html  0.271  0.997  0.967  0.308  0.1050  0.2514  0.1230  33.2218
https://www.w3.org/WAI/demos/bad/before/home.html  0.1453  0.9998  0.939  0.267  0.1738  0.444  0.238  38.6422
https://www.washington.edu/accesscomputing/AU/before.html  0.0704  0.995  0.915  0.6154  0.7605  0.901  0.885  51.7320
https://www.w3.org/WAI/demos/bad/before/template.html  0.1448  0.999  0.9518  0.2667  0.1785  0.468  0.251  35.420
4.1 Methodology

To investigate what metrics might better reflect the actual accessibility level of web pages, we conducted a further analysis. Since it was not feasible to conduct manual assessments of the accessibility of the large data set, we compiled a small data set composed of web pages that are published online with the purpose of demonstrating good and bad accessibility practices. Table 11 presents the web pages we considered in our analysis. Two pairs of web pages were developed by universities, while the other two pairs are part of the Before and After Demonstration (BAD)6 published by the W3C Web Accessibility Initiative. They have been created mostly for educational purposes, providing instructors with ready-to-access examples of how web content should be designed to be accessible, but also how it could be designed in an inaccessible way. One limitation common to all the pages is their age. All of them were developed prior to the publication of WCAG 2.1. Therefore, some of the criteria introduced in WCAG 2.1 will not have been explicitly explored in
these examples. In our analysis, we computed the metrics' outcome for all the pages in Table 11, and analyzed the accessibility level they reported the pages to have.

6 https://www.w3.org/WAI/demos/bad/Overview.html.

4.2 Results

Table 12 presents the scores of each page-level metric for each of the pages used to assess the validity of the metrics. Table 13 presents the scores for website metrics. To avoid the canceling effect that a website with the same number of accessible and inaccessible pages would have on these metrics, we split each website into two websites: a good website with the accessible pages and a bad website with the inaccessible pages.

Table 13 Website metrics scores for assessing the metrics' validity (scores that do not reflect the accessibility of the website are displayed in bold)

                       WAEM    WAB-PZ   WAB-H
Accessible domains
wsnet2.colostate.edu   11.06   0.0008   0.007
www.w3.org             4.430   0.001    0.006
www.washington.edu     8.860   0.001    0.011
Inaccessible domains
wsnet2.colostate.edu   5.440   0.83     1.10
www.w3.org             1.920   0.406    0.521
www.washington.edu     3.840   0.811    1.042

4.2.1 Web page metrics

The results of this experiment show that the FR metric produces similar scores when evaluating accessible and inaccessible web pages. Since 1 means that the web page is completely inaccessible and 0 means otherwise, we were expecting values close to 1 for the inaccessible web pages and close to 0 for the accessible web pages. The FR scores for all the accessible web pages seem to be coherent and close to 0. However, all the inaccessible web pages also have low values, indicating a positive accessibility level.

WIE, Conservative, Optimistic and Strict metrics exhibit a score close to 1 for the same inaccessible web page, which means that this page is close to being completely accessible. Also, these metrics' scores for this particular inaccessible page are higher than some accessible pages' scores. This means that the inaccessible page is more accessible than some accessible pages, according to WIE, Conservative, Optimistic and Strict metrics' results. The remaining scores for these metrics seem to be coherent, except for the Conservative metric, which classifies two accessible web pages as inaccessible by showing scores close to some of the inaccessible pages' scores.

A3, UWEM and WAQM are the only three metrics that demonstrated coherent scores for all accessible and inaccessible web pages. The WAQM metric shows values close to 100 for all the accessible web pages. For the inaccessible pages, this metric varies from around 33 to 51, which is not close to 0, but still lower than the scores of the accessible pages. A3 and UWEM exhibit the correct behavior, as all the accessible pages' scores are close to 0 and the inaccessible pages' scores are close to 1. Nevertheless, A3 metric scores for inaccessible web pages are closer to 1 compared to UWEM metric scores for the same web pages.

4.2.2 Website metrics

Since the three website accessibility metrics do not have a range of scores limited by two values, the level of accessibility of a certain website becomes uncertain. The main conclusion we can take from the WAB metric is that the higher the score, the more inaccessible the website is. Nevertheless, it is also possible to detect that the accessible domains' scores are really close to 0, which indicates the domains are more accessible. In addition, and in contrast to the accessible domains, the inaccessible domains' scores are higher and close to 1.

By observing Tables 5 or 6, the accessible scores are in the first quartile, while the inaccessible scores belong to the third quartile, indicating that WAB-PZ and WAB-H may have an appropriate representation of the accessibility. Nevertheless, the authors of these two WAB metrics [14, 23] performed a study [40] where they refer to the meaning of the WAB scores' accessibility level. For instance, for those websites with a WAB score of 5.5 or less, the website is more accessible as it "has better conformance to the WCAG" [40]. Therefore, WAB scores higher than 5.5 indicate more accessibility barriers and so a worse accessibility level. Comparing this information to the scores obtained in our study, we can see that all websites' WAB scores vary between 0.0069 and 1.0998. These scores indicate that all websites (including the inaccessible domains) tend to have a small number of accessibility barriers.

As for the WAEM, the higher this metric's score, the more accessible the website is. Consequently, we cannot define whether a website is accessible or inaccessible. This metric seems to be the only one with incoherent results, as the www.w3.org accessible domain presents a lower score compared to the wsnet2.colostate.edu inaccessible domain.

4.3 Discussion

To define which metric is the most suitable option, it is important to analyze their results regarding the accessible and inaccessible web pages' and domains' evaluations.

FR seems to have coherent scores for all the accessible web pages. This means that all the accessible web pages have expected results. However, all the inaccessible web pages have low scores, indicating that these web pages are accessible when they are not.

WIE, Conservative, Strict and the Optimistic metrics always fail to assess the accessibility level of the inaccessible web page https://www.washington.edu/accesscomputing/AU/before.html, showing high scores that indicate the web page is accessible. Also, Conservative assigns a score below 0.5 to the https://www.w3.org/WAI/demos/bad/after/home.html accessible web page, which means that this page is not accessible.

Regarding the website metrics, it was possible to state that WAB-PZ and WAB-H seem to have an optimistic
behavior for inaccessible pages, considering these metrics' ranges defined by Hackett and Parmanto [40].

A3, UWEM and WAQM seem to have the expected behavior. Interestingly, these three metrics form a cluster in the analysis of domain accessibility based on the average of the scores of the web pages belonging to the domain. Hence, whenever there is the need to measure the accessibility level of a website, one of these three metrics can be considered, as they are all correlated in this specific approach. Regardless of the available resources, the UWEM metric is the least resource intensive.

Nevertheless, if we look deeper into the validity analysis results of each of these metrics, we can detect two important aspects that clarify which metric seems to have the best performance: (1) WAQM metric scores vary between 0 and 100, where 0 means the resource is totally inaccessible, and the scores of the inaccessible pages are not close to 0. Instead, they are above 33, which indicates that this metric is not that discriminating regarding those inaccessible web pages; (2) UWEM and A3 both have inaccessible and accessible scores close to 1 and 0, respectively, which indicates a correct and consistent behavior. Still, A3 metric scores for inaccessible pages can be more discriminating compared to UWEM, as they are all closer to 1.

In conclusion, although UWEM is less resource intensive, A3 provides more discriminating scores, being the most valid metric in this study.

5 Limitations

We acknowledge that the accessibility evaluation reports are the result of an automated tool and that this type of tool is limited in the scope of the accessibility problems it can test [36, 41]. Given that our main objective is to compare web accessibility metrics, and that all metrics compared were applied to the same dataset, we believe the impact of this limitation is not significant. However, for the part of the study that checks the validity of the metrics, this limitation can be significant, since it is probable that several accessibility problems in the web pages have not been identified.

Furthermore, in what relates to the validity study, we acknowledge the sample size is limited and the results presented are only indicative. A study with further web pages is needed to assess the generalizability of these findings.

6 Conclusion

We studied how eleven web accessibility metrics correlate with each other. The studied web accessibility metrics included FR, A3, UWEM, WAQM, WIE, Conservative, Optimistic, Strict, WAB-PZ, WAB-H and WAEM.

By analyzing the pairwise correlations we were able to identify groups of metrics. When considering the subset of metrics that are applicable at page level, we identified four clusters of distinct metrics. By looking at the full set of metrics applicable at site level, we identified a larger number of groups. This information is relevant when a decision between using one metric over another is needed. By knowing that the outcomes of two metrics are similar, it becomes possible to choose the one that is less resource intensive, or from which it is easier to obtain the data required to compute the metric, for instance.

Additionally, we ran an experiment with a small number of web pages with known levels of accessibility to assess the validity of the different metrics. Even though the set of pages was small, and the metrics were computed from the outcomes of an automated tool (i.e., not all accessibility problems were caught), we were able to identify which metrics were consistent with the expected levels of accessibility of the pages, and which were not. This information can also be relevant in assisting in deciding which metrics to employ.

Acknowledgements This work was supported by FCT through the LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020.

Funding The research leading to these results received funding from FCT through the LASIGE Research Unit under Grant Agreements ref. UIDB/00408/2020 and ref. UIDP/00408/2020.

Data availability statement The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declarations

Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1296843.1296853
28. Brajnik, G.: Web accessibility testing: When the method is the culprit. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) Computers Helping People with Special Needs, pp. 156–163. Springer, Berlin, Heidelberg (2006)
29. Song, S., Wang, C., Li, L., Yu, Z., Lin, X., Bu, J.: WAEM: A web accessibility evaluation metric based on partial user experience order. In: Proceedings of the 14th Web for All Conference on The Future of Accessible Work. W4A '17. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3058555.3058576
30. Battistelli, M., Mirri, S., Muratori, L.A., Salomoni, P.: Measuring Accessibility Barriers on Large Scale Sets of Pages. (2011). Accessed 28 December 2021. https://www.w3.org/WAI/RD/2011/metrics/paper2/
31. Vigo, M., Abascal, J., Aizpurua, A., Arrue, M.: Attaining Metric Validity and Reliability with the Web Accessibility Quantitative Metric. (2011). Accessed 28 December 2021. https://www.w3.org/WAI/RD/2011/metrics/paper6/
32. Fukuda, K., Saito, S., Takagi, H., Asakawa, C.: Proposing new metrics to evaluate web usability for the blind. In: CHI '05 Extended Abstracts on Human Factors in Computing Systems, pp. 1387–1390. Association for Computing Machinery, New York, NY, USA (2005). https://doi.org/10.1145/1056808.1056923
33. Lopes, R., Carriço, L.: The impact of accessibility assessment in macro scale universal usability studies of the web. In: Proceedings of the 2008 International Cross-Disciplinary Conference on Web Accessibility (W4A), pp. 5–14. Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1368044.1368048
34. Benavidez, C.: Libro Blanco de eXaminator (2012)
35. Mirri, S., Muratori, L.A., Salomoni, P.: Monitoring accessibility: Large scale evaluations at a geopolitical level. In: The Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility. ASSETS '11, pp. 163–170. Association for Computing Machinery, New York, NY, USA (2011). https://doi.org/10.1145/2049536.2049566
36. Lazar, J., Goldstein, D., Taylor, A.: Ensuring Digital Accessibility Through Process and Policy. Morgan Kaufmann (2015)
37. Fernandes, N., Costa, D., Neves, S., Duarte, C., Carriço, L.: Evaluating the accessibility of rich internet applications. In: Proceedings of the International Cross-Disciplinary Conference on Web Accessibility. W4A '12. Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2207016.2207019
38. Statstutor: Spearman's Correlation. (2021). Accessed 28 December 2021. https://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf
39. Wikipedia: Hierarchical Clustering. (2021). Accessed 28 December 2021. https://en.wikipedia.org/wiki/Hierarchical_clustering
40. Hackett, S., Parmanto, B.: Homepage not enough when evaluating web site accessibility. Internet Research (2009)
41. Abascal, J., Arrue, M., Valencia, X.: Tools for web accessibility evaluation. In: Yesilada, Y., Harper, S. (eds.) Web Accessibility: A Foundation for Research, pp. 479–503. Springer, London (2019). https://doi.org/10.1007/978-1-4471-7440-0_26

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.