Chapter 5

Scanning Print to PDF


Opportunities and Obstacles for Screen Reader Accessibility
Robert Browder*

* Robert Browder is a digital publishing specialist with VT Publishing, a service of Virginia Tech Libraries. Since obtaining his undergraduate degree in information science and systems in 2011 from Radford University, Browder has served in a variety of technology and publishing roles. His work currently focuses on managing resources and workflows associated with the publication of online open-access scholarly journals.

Library Technology Reports  alatechsource.org  May-June 2018
Accessibility, Technology, and Librarianship  Heather Moorefield-Lang

Scanning print to PDF opens a world of opportunity for sharing, using, and reusing resource materials. Here at Virginia Tech’s Newman Library, we’ve been able to bring previously unavailable publications to the web in PDF format, including out-of-print journals and historical documents. Making resources available online in an accessible format creates opportunities for patrons that were not there before. Patrons can have their own copy of a document at the touch of a button. After being rendered as an accessible PDF, resources that previously existed only in print take on new utility; they can be read aloud by a computer. This is a wonderful opportunity for all patrons, but especially for those with visual impairments.

How is it that PDF has remained so popular with the emergence and maturity of other digital reading technologies? In 2008, following many years of practical use and popularity, Adobe Systems, creator of the PDF file format, released the file specification to the International Organization for Standardization (ISO) for management and expansion.1 Adobe did this in response to heavy use of the format by governments and public organizations. Releasing the specification to the ISO brought the PDF format into the world of “open technology” and cemented the confidence of public institutions. For the typical user, PDF provides a reading experience that is “near-book” by providing an application interface that creates a firm boundary from all the distraction that is the modern web browsing experience. With the combination of focus and flexibility provided by PDF format, it’s really no wonder that it continues to thrive.

The ability to use semi-automated processes to create PDF documents from printed materials has obvious time-saving advantages. With the right equipment, you can scan fifty to ninety printed pages per minute. However, merely scanning printed materials as images is not enough. While creating a digital image of text on a page is a great leap in preservation and “sharability,” a wide variety of vision issues may affect any of us at some point in our lives, rendering visually oriented materials difficult or impossible to use. Making PDF documents accessible to those with visual disabilities via screen reader technology is well within the reach of our current technical abilities. However, scanning print to PDF is not a panacea to create accessibility for all types of content. While it is perfect for some types of content, more complex types of content prove to be remarkably difficult and time-consuming to render screen-reader-accessible in PDF.

Scanning print to PDF presents unique opportunities and challenges. The source material to be scanned will determine how much effort is required to make a PDF accessible. For complex content like large tables, graphs, charts, and equations, HTML often provides better opportunities for accessibility and production efficiency than is possible with PDF. For simpler content, such as text and images that can be described with ease verbally, scanning print to PDF is often the most streamlined approach to creating an accessible resource from printed materials. In nearly all cases,
PDF makes a suitable “pass-through” and preservation format to bring print into digital format while avoiding manual transcription processes.

Understanding Visual Disability

When we think about visual disability as a general term, we are addressing a community of conditions that have different causes but often share similar functional limitations. Visual impairment includes everything from complete blindness to conditions that merely require corrective lenses. Conditions like low vision, color blindness, and corneal opacities each have their own limitations.

The World Health Organization groups moderate to severe vision impairment under the term low vision.2 The majority of conditions categorized as low vision can be improved with the use of corrective lenses. However, in the absence of corrective lenses, low vision can make it incredibly difficult for individuals to read and perform daily tasks.

Color blindness results in perceptions of colors that differ from the way the majority of the population perceives them. Three forms of color blindness are currently documented: red-green color blindness, blue-yellow color blindness, and the complete absence of color vision. According to the National Eye Institute, “As many as 8 percent of men and 0.5 percent of women with Northern European ancestry have the common form of red-green color blindness.”3 As you might imagine, color blindness creates unique challenges for interpreting color-coded information.

Globally, most cases of blindness can fit within a few categories. Corneal opacities (CO), clouding of the cornea, are often the result of infections but can also result from injury. Age-related macular degeneration is a progressive degeneration of a person’s main field of vision due to lesions of the retina. Glaucoma is caused by optic neuropathy, in which messages from the eye are either not conducted or poorly conducted to the brain. Cataract is a clouding of the lens that prevents light from entering the eye.4

The World Health Organization reports that 253 million people live with visual impairment of some kind.5 While creating accessible digital materials does not solve the root problem, it does make information available to those who otherwise would not have it. Consider the benefit you get from reading an article you are interested in and multiply it by 253 million. That’s real opportunity there.

The Scanning Process

Scanning print to PDF is a process that is used regularly at Virginia Tech’s University Libraries. The scanning process is the heart of our print-to-PDF pipeline. Combined with a reliable optical character recognition (OCR) process, automated scanning provides extraordinary efficiency. Christy Stanley, Virginia Tech University Libraries scanning specialist, uses a process consisting of the following basic steps:

• Prepare for automated feed scanning
  ◦ Organization of materials
  ◦ Removal of spine for bound materials
• Scanning
  ◦ Loading and monitoring the scanner
• Adjust scanned pages
  ◦ Adjust for skew
  ◦ Crop pages to remove ragged edges
• Compile scanned pages into PDF documents
• Color balance adjustments
  ◦ Setting the text to black makes it much more legible for those with low vision and may improve the quality of OCR output.
• OCR
  ◦ This step can be done either at the end of the scanning process or at the beginning of the read order editing process.

We have a couple of Fujitsu scanners, the 6240-Z and the 6770. Both have an auto-feed tray and a flatbed. The 6770 will handle larger pages and will scan more pages per minute. There are lots of options for scanners made by familiar brands like Kodak, Canon, and HP that provide functionality similar to these. If you’re thinking about buying a scanner and your library is already invested in equipment from a particular vendor, it may make sense to get their stuff in the hope that all components will play together nicely. Most scanners come with software that may be helpful in building or refining the scanning process.

After pages are scanned, post-processing can be achieved using vendor software that came with the scanner or an open-source tool like Scan Tailor. Post-processing allows the technician to straighten crooked pages, adjust the color balance, remove unsightly edges, and group a collection of scanned pages into a single PDF document.

The importance of adjusting the color balance of a document should not be ignored. Color balance adjustments can often increase color contrast of typography, yielding notable improvements in readability for those with low vision. OCR processes may also benefit from color balance adjustments.

Color can be an important part of any visual communication and can have a serious impact on accessibility. Color contrast is important for users with low vision or color blindness. Color is often used to convey meaning and communicate essential information. Nowhere do we see this more clearly than in the
example of charts and graphs. Colors without appropriate contrast may render bar graphs and charts difficult to use. This situation must be considered carefully when scanning documents that contain graphics that use color to communicate. Alternative text (alt text) can be used to add meaning to images that have poor color contrast.

Setting Expectations for Output of the Optical Character Recognition Process

OCR is a process that uses computer algorithms to analyze and identify letter shapes and words. OCR can be achieved with Adobe Acrobat or, in some cases, with software that came with the scanner. An OCR process adds machine-readable text (character encoding) to the document so that screen readers can read the document to users with visual impairments. OCR also allows users to copy and paste text from the document. While current OCR technologies do pretty well with recognizing standard fonts, OCR algorithms will be confounded by poor quality scans, decorative typefaces, and handwriting. So the output of an OCR process can be only as good as the input. Keep this in mind when setting expectations for print-to-PDF projects.

Testing the Output

After OCR, the document must be tested. A screen reader such as JAWS, NVDA, or VoiceOver will be very useful. Adobe Acrobat will also prove indispensable. Using screen readers allows us to know something about the experience the document will provide to those with visual impairments. Using Acrobat will provide us with a window into the technical organization of the scanned document.

As a first step, it is always useful to test the document with a screen reader. Listen carefully as the application reads the text to you. Pay attention and make note of any inconsistencies. Since this document is newly scanned, we can expect that some content, such as images or other graphics, will be skipped over by the screen reader. We may also find that content is not always read in the correct order. We may find that artifacts of the document, such as running headers or footers, are read by the application when they should not be. While the scanning and OCR processes have saved a great deal of time, we may find that the resulting screen reader output is intelligible but not intelligent. Human intervention is typically required to organize the document in such a way that its full context can be conveyed via screen reader.

Adobe Acrobat provides tools that can be used to analyze and edit the underlying structure of PDF documents. Developing familiarity with this tool kit is a marathon, not a sprint. Consistent time investment in developing skills with this tool set will yield the best results. Acrobat offers various levels of automation for different tasks that are helpful in creating accessible documents, including tagging, accessibility checking, alternative text, and reading order.

Tagging PDF Documents

One of the most important steps in creating accessible PDF documents is tagging. Tagging allows us to define different elements within the document. Common elements that need to be tagged are headings, paragraphs, and images. Screen readers use these tags to assist the reader in using and navigating the document. Acrobat provides automation for this task that is marginally helpful. The automated process will often tag artifacts that should be ignored by the screen reader. Quality will be improved with manual review and editing.

Alternative Text for Images, Figures, Graphs, and Charts

Alternative text is descriptive text that can be added to a document to replace images, figures, math, graphs, and charts when the document is read by a screen reader. Alt text fills in the blanks that otherwise result from unseen images. How well alt text fills in those blanks is another story. Ideally, alt text would be supplied by the author of a text, but in the case of scanning, this is usually impossible. Alt text must be created by someone who understands the context and content of the images. With simple images and figures, filling in the alt text is a simple task. With complex charts and graphs, creating alt text that reliably communicates the information becomes a specialty that may require a subject specialist.

While alt text is intended to create an experience that is comparable to interacting with the document visually, whether or not it actually does is, in many cases, debatable. HTML is often a better format for complex graphs and charts. Tactile graphics can create a truly comparable experience for the visually impaired.

Read Order Editing

Read order is the order in which a screen reader will read the contents of a PDF to a human listener. While a human reader will evaluate a page using visual cues, a screen reader needs to have the read order explicitly defined. Acrobat can automate the process of
assigning read order to a PDF document. But it cannot determine which elements add meaning to the work or the correct reading order for content found in complex layouts. Ideally, read order should be comparable to the way a human would read a text.

For example, let’s consider a typical page that contains a running header in the top right corner of the page with page number and several paragraphs of text in the body of the page. Even with just these few elements on the page, there is a possibility for improperly assigned read order to disrupt the flow of the text and its meaning. Let’s suppose that an automated read order assignment has defined the running header as the first element on the page and the paragraphs in the body text as the second, third, and fourth elements. At first glance this may seem fine, but what if the first sentence on the page is a continuation of the last sentence on the previous page? If the screen reader reads the running header first, it will break the flow of the sentence and possibly confuse or distort its meaning. This is a serious quality issue that can create problems for users of the resource. The solution to this problem is to manually edit the read order of the document. The correct action in this case would be to define all of the running headers with page numbers throughout the entire document as background and assign the first paragraph in the body text as the first element on the page. This approach maintains the flow and meaning of the content.

Tables

Tables are a special challenge for the print-to-PDF process. Simple tables are easy enough to tag and use with a screen reader. The simplest of tables can even be tagged as a figure and amply described with alt text. Larger tables are challenging and time-consuming to tag in PDF. I argue that even the most detailed tagged tables do not provide a comparable experience for those with visual impairments. Let’s take a moment to remember what a table is and what it is supposed to do. A table is a tool that creates a matrix that allows the user to explore data relationships in a two-dimensional format, columns and rows. The matrix functionality that makes a table such a valuable tool for presenting information can be severely diminished by representing it verbally. Trends and patterns that are obvious when using the table either visually or tactilely may be much more difficult to identify when attempting to explore the information verbally. The goal is to share information revealed by looking at the relationships of data organized within the matrix. HTML, braille, and tactile graphics are often better formats for this type of complex content.

What about Math?

While an OCR process can interpret characters and group them into words and sentences, generating a consistent screen reader experience for mathematical equations is a bit beyond what can reasonably be expected from the OCR process. If equations are included as part of a sentence, they may or may not come through reliably. In the case that equations are presented on their own apart from the text, they can be tagged as figures and have a verbal description added as alt text. This approach is especially helpful with complex multilevel equations that use special characters and symbols, such as Greek letters.

Tagging equations as figures and adding alt text is a reasonable way to treat equations in the print-to-PDF process; however, it raises another issue. The alt text must be meaningful. A subject specialist who understands how to correctly communicate the equation with a text description will be required.

Opportunity or Obstacle? It’s All about the Content

While scanning print to PDF is a great opportunity for preserving and sharing printed materials of all types over the web and is a great entry point for bringing print into the digital space, the ability to produce a PDF that is highly accessible for screen readers is not always straightforward and often requires excessive inputs of time and specialized skills. The complexity of content in the source material is the deciding factor in how well a PDF document can meet the rigorous demands of screen reader accessibility. PDF offers wonderful opportunities for plain text and images. We start to run into obstacles when documents contain more complex types of content like tables, graphs, and complex math equations. While all of these content types perform fine visually, developing the PDF document to the point that it provides a comparable experience delivered verbally via screen reader is time-consuming and often requires input of specialized skills.

To Scan or Not to Scan?

Yes, by all means, scan. But know your goals, know the limitations of a print-to-PDF scanning process, and set expectations accordingly. PDF readily satisfies goals for preservation and dissemination of visually accessible materials. PDF also performs well as a pass-through format to aid in avoiding manual transcription. PDF can sometimes satisfy the needs of accessible documents, depending on the types of content found in the document. Before a scanning project
begins, it is important to consider whether or not the complexity of the content can be faithfully communicated via PDF with screen reader technology and how much effort will be required to organize PDF documents for screen reader accessibility.

Notes

1. “PDF Format Becomes ISO Standard,” International Organization for Standardization, July 2, 2008, https://www.iso.org/news/2008/07/Ref1141.html.
2. “Vision Impairment and Blindness,” fact sheet, World Health Organization, last updated October 2017, www.who.int/mediacentre/factsheets/fs282/en/.
3. “Facts about Color Blindness,” National Eye Institute, National Institutes of Health, last updated February 2015, https://nei.nih.gov/health/color_blindness/facts_about.
4. “Priority Eye Diseases,” World Health Organization, accessed March 8, 2018, www.who.int/blindness/causes/priority/en/index8.html.
5. “Vision Impairment and Blindness.”
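
The running-header example from the Read Order Editing section can be made concrete with a small sketch. It is an illustrative model only: the `PageElement` class, its `role` values, and the `fix_read_order` and `spoken_text` functions are hypothetical names invented here, not part of Acrobat or of any PDF library. The sketch mimics the manual correction described in that section, in which running headers are defined as background so that a screen reader skips them and the body paragraphs are read first and in order.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PageElement:
    """One tagged region on a scanned page (hypothetical model)."""
    text: str
    role: str  # "running_header", "paragraph", or "background"


def fix_read_order(elements):
    """Re-tag running headers as background and move them after the body.

    Body elements keep their relative order, so a sentence that
    continues from the previous page is not interrupted.
    """
    body = [e for e in elements if e.role != "running_header"]
    background = [
        replace(e, role="background")
        for e in elements
        if e.role == "running_header"
    ]
    return body + background


def spoken_text(elements):
    """Join the text a screen reader would read: all non-background elements."""
    return " ".join(e.text for e in elements if e.role != "background")


# Automated read order (assumed for illustration): running header comes first.
auto_order = [
    PageElement("Library Technology Reports 25", "running_header"),
    PageElement("assigning read order to a PDF document.", "paragraph"),
    PageElement("For example, consider a typical page.", "paragraph"),
]

fixed = fix_read_order(auto_order)
print(spoken_text(auto_order))  # header interrupts the continued sentence
print(spoken_text(fixed))       # body text only, in reading order
```

In a real project this correction is made by hand in Acrobat's reading order tools; the sketch only shows why the header must be taken out of the spoken sequence rather than merely moved.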

Reproduced with permission of copyright owner. Further reproduction
prohibited without permission.
