Scanning Print To PDF: Opportunities and Obstacles For Screen Reader Accessibility
Scanning print to PDF opens a world of opportunity for sharing, using, and reusing resource materials. Here at Virginia Tech’s Newman Library, we’ve been able to bring previously unavailable publications to the web in PDF format, including out-of-print journals and historical documents. Making resources available online in an accessible format creates opportunities for patrons that were not there before. Patrons can have their own copy of a document at the touch of a button. After being rendered as an accessible PDF, resources that previously existed only in print take on new utility; they can be read aloud by a computer. This is a wonderful opportunity for all patrons, but especially for those with visual impairments.

How is it that PDF has remained so popular with
and flexibility provided by PDF format, it’s really no wonder that it continues to thrive.

The ability to use semi-automated processes to create PDF documents from printed materials has obvious time-saving advantages. With the right equipment, you can scan fifty to ninety printed pages per minute. However, merely scanning printed materials as images is not enough. While creating a digital image of text on a page is a great leap in preservation and “sharability,” a wide variety of vision issues may affect any of us at some point in our lives, rendering visually oriented materials difficult or impossible to use. Making PDF documents accessible to those with visual disabilities via screen reader technology is well within the reach of our current technical abilities. However, scanning print to PDF is not a panacea to
* Robert Browder is a digital publishing specialist with VT Publishing, a service of Virginia Tech Libraries. Since obtaining his
undergraduate degree in information science and systems in 2011 from Radford University, Browder has served in a variety of
technology and publishing roles. His work currently focuses on managing resources and workflows associated with the publication
of online open-access scholarly journals.
Accessibility, Technology, and Librarianship Heather Moorefield-Lang
PDF makes a suitable “pass-through” and preservation format to bring print into digital format while avoiding manual transcription processes.

Understanding Visual Disability

When we think about visual disability as a general term, we are addressing a community of conditions that have different causes but often share similar functional limitations. Visual impairment includes everything from complete blindness to conditions that merely require corrective lenses. Conditions like low vision, color-blindness, and corneal opacities each have their own limitations.

The World Health Organization groups moderate to severe vision impairment under the term low vision.2 The majority of conditions categorized as low vision can be improved with the use of corrective lenses. However, in the absence of corrective lenses, low vision can make it incredibly difficult for individuals to read and perform daily tasks.

Color-blindness results in perceptions of colors that differ from the way the majority of the population perceives them. Three forms of color blindness are currently documented: red appears as green, blue appears as yellow, and complete absence of color vision. According to the National Eye Institute, “As many as 8 percent of men and 0.5 percent of women with Northern European ancestry have the common form of red-green color blindness.”3 As you might imagine, color-blindness creates unique challenges for interpreting color-coded information.

Globally, most cases of blindness can fit within a few categories. Corneal opacities (CO), clouding of the cornea, are often the result of infections but can also result from injury. Age-related macular degenera-

scanning process is the heart of our print-to-PDF pipeline. Combined with a reliable optical character recognition (OCR) process, automated scanning provides extraordinary efficiency. Christy Stanley, Virginia Tech University Libraries scanning specialist, uses a process consisting of the following basic steps:

• Prepare for automated feed scanning
  ◦ Organization of materials
  ◦ Removal of spine for bound materials
• Scanning
  ◦ Loading and monitoring the scanner
• Adjust scanned pages
  ◦ Adjust for skew
  ◦ Crop pages to remove ragged edges
• Compile scanned pages into PDF documents
• Color balance adjustments
  ◦ Setting the text to black makes it much more legible for those with low vision and may improve the quality of OCR output.
• OCR
  ◦ This step can be done either at the end of the scanning process or at the beginning of the read order editing process.

We have a couple of Fujitsu scanners, the 6240-Z and the 6770. Both have an auto-feed tray and a flatbed. The 6770 will handle larger pages and will scan more pages per minute. There are lots of options for scanners made by familiar brands like Kodak, Canon, and HP that provide functionality similar to these. If you’re thinking about buying a scanner and your library is already invested in equipment from a particular vendor, it may make sense to get their stuff in the hope that all components will play together nicely. Most scanners come with software that may be helpful in building or refining the scanning
Library Technology Reports alatechsource.org May-June 2018
example of charts and graphs. Colors without appropriate contrast may render bar graphs and charts difficult to use. This situation must be considered carefully when scanning documents that contain graphics that use color to communicate. Alternative text (alt text) can be used to add meaning to images that have poor color contrast.

documents. Developing familiarity with this tool kit is a marathon, not a sprint. Consistent time investment in developing skills with this tool set will yield best results. Acrobat offers various levels of automation for different tasks that are helpful in creating accessible documents, including tagging, accessibility checking, alternative text, and reading order.
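
One way to make the color-contrast concern raised above concrete is to measure it. The sketch below is not part of the workflow described in this article; it assumes the contrast-ratio formula published in the W3C Web Content Accessibility Guidelines (WCAG), and the function names are hypothetical. It takes two colors sampled from a scanned chart (for example, a bar and its background) and reports their contrast ratio.

```python
# A minimal sketch, assuming the WCAG definition of contrast ratio.
# Function names are illustrative and not taken from any library.

def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an (R, G, B) color, each channel 0-255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: tuple[int, int, int],
                   color_b: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to about 21."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Example colors are made up for illustration.
    bar_color = (255, 165, 0)         # an orange bar
    page_background = (255, 255, 255)  # white page
    print(round(contrast_ratio(bar_color, page_background), 2))
```

WCAG suggests a ratio of at least 3:1 for meaningful graphical elements and 4.5:1 for normal-size text. When a scanned chart falls short of those thresholds, that is a reasonable signal that alt text describing the chart will be needed.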
assigning read order to a PDF document. But it cannot determine which elements add meaning to the work or the correct reading order for content found in complex layouts. Ideally, read order should be comparable to the way a human would read a text.

For example, let’s consider a typical page that contains a running header in the top right corner of the page with page number and several paragraphs of text in the body of the page. Even with just these few elements on the page, there is a possibility for improperly assigned read order to disrupt the flow of the text and its meaning. Let’s suppose that an automated read order assignment has defined the running header as the first element on the page and the paragraphs in the body text as the second, third, and fourth elements. At first glance this may seem fine, but what if the first sentence on the page is a continuation of the last sentence on the previous page? If the screen reader reads the running header first, it will break the flow of the sentence and possibly confuse or distort its meaning. This is a serious quality issue that can create problems for users of the resource. The solution to this problem is to manually edit the read order of the document. The correct action in this case would be to define all of the running headers with page numbers throughout the entire document as background and assign the first paragraph in the body text as the first element on the page. This approach maintains the flow and meaning of the content.

Tables

Tables are a special challenge for the print-to-PDF process. Simple tables are easy enough to tag and use with a screen reader. The simplest of tables can even be tagged as a figure and amply described with alt text. Larger tables are challenging and time-consuming to tag in PDF. I argue that even the most detailed tagged tables do not provide a comparable experience for those with visual impairments. Let’s take a moment to remember what a table is and what it is supposed to do. A table is a tool that creates a matrix that allows the user to explore data relationships in a two-dimensional format, columns and rows. The matrix functionality that makes a table such a valuable tool for presenting information can be severely diminished by representing it verbally. Trends and patterns that are obvious when using the table either visually or tactilely may be much more difficult to identify when attempting to explore the information verbally. The goal is to share information revealed by looking at the relationships of data organized within the matrix. HTML, braille, and tactile graphics are often better formats for this type of complex content.

What about Math?

While an OCR process can interpret characters and group them into words and sentences, generating a consistent screen reader experience for mathematical equations is a bit beyond what can reasonably be expected from the OCR process. If equations are included as part of a sentence, they may or may not come through reliably. In the case that equations are presented on their own apart from the text, they can be tagged as figures and have a verbal description added as alt text. This approach is especially helpful with complex multilevel equations that use special characters and symbols, such as Greek letters.

Tagging equations as figures and adding alt text is a reasonable way to treat equations in the print-to-PDF process; however, it raises another issue. The alt text must be meaningful. A subject specialist who understands how to correctly communicate the equation with a text description will be required.

Opportunity or Obstacle? It’s All about the Content

While scanning print to PDF is a great opportunity for preserving and sharing printed materials of all types over the web and is a great entry point for bringing print into the digital space, the ability to produce a PDF that is highly accessible for screen readers is not always straightforward and often requires excessive inputs of time and specialized skills. The complexity of content in the source material is the deciding factor in how well a PDF document can meet the rigorous demands of screen reader accessibility. PDF offers wonderful opportunities for plain text and images. We start to run into obstacles when documents contain more complex types of content like tables, graphs, and complex math equations. While all of these content types perform fine visually, developing the PDF document to the point that it provides a comparable experience delivered verbally via screen reader is time-consuming and often requires input of specialized skills.

To Scan or Not to Scan?

Yes, by all means, scan. But know your goals, know the limitations of a print-to-PDF scanning process, and set expectations accordingly. PDF readily satisfies goals for preservation and dissemination of visually accessible materials. PDF also performs well as a pass-through format to aid in avoiding manual transcription. PDF can sometimes satisfy the needs of accessible documents, depending on the types of content found in the document. Before a scanning project
begins, it is important to consider whether or not the complexity of the content can be faithfully communicated via PDF with screen reader technology and how much effort will be required to organize PDF documents for screen reader accessibility.

Notes

1. “PDF Format Becomes ISO Standard,” International Standards Organization, July 2, 2008, https://www.iso.org/news/2008/07/Ref1141.html.
2. “Vision Impairment and Blindness,” fact sheet, World Health Organization, last updated October 2017, www.who.int/mediacentre/factsheets/fs282/en/.
3. “Facts about Color Blindness,” National Eye Institute, National Institutes of Health, last updated February 2015, https://nei.nih.gov/health/color_blindness/facts_about.
4. “Priority Eye Diseases,” World Health Organization, accessed March 8, 2018, www.who.int/blindness/causes/priority/en/index8.html.
5. “Vision Impairment and Blindness.”