Intro To Datacap - Lab Guide - v3

The document describes a lab exercise for configuring a Datacap solution using FastDoc. It provides steps to create a new application called FastApp using the Forms template. It then guides configuring document types, fingerprints, rulesets and testing the application before processing sample documents.

Uploaded by

Duy Nguyen Ho
Copyright
© All Rights Reserved
Exploring IBM Datacap 9.1.9 – A Solution Showcase
Lab exercises
Contents
LAB 1 CONFIGURING A SOLUTION WITH FASTDOC (STRUCTURED FORM DESIGN)
1.0 FASTDOC ADMIN
1.1 CREATE AN APPLICATION
1.2 ADD A NEW DOCUMENT TYPE
1.3 ADD A FINGERPRINT
1.4 CONFIGURE RULESETS
1.5 DESIGN TIME TESTING
1.6 PROCESS BATCHES
1.7 SUMMARY
LAB 2 CONFIGURING A SOLUTION WITH STUDIO (SEMI STRUCTURED FORM)
2.0 OVERVIEW
2.1 WHAT IS A SEMI STRUCTURED DOCUMENT?
2.2 RETEST FASTAPP
2.3 WHAT IS DATACAP STUDIO?
2.4 RULESETS, RULES, FUNCTIONS, AND ACTIONS
2.5 CREATE A DUPLICATE OF FASTAPP USING DATACAP STUDIO
2.6 LET’S TROUBLESHOOT TO FIND OUT WHY THE PREVIOUS BATCH WENT TO A FIXUP STEP
2.7 CREATE A CUSTOM CLASSIFICATION RULESET
LAB 3 CONFIGURING APPLICATION TO RUN ON DATACAP NAVIGATOR
3.0 IBM CONTENT NAVIGATOR ADMINISTRATION
3.1 CONFIGURING ICN
3.2 SETTING UP RULERUNNER TO RUN BACKGROUND PROCESSING
3.3 RETEST YOUR APPLICATION WITH DATACAP NAVIGATOR
LAB 4 EXPORTING TO FILENET CM
4.0 OVERVIEW
4.1 CREATE A NEW DOCUMENT CLASS IN FILENET VIA ACCE
4.2 SUMMARY
Lab 1 Configuring a solution with FastDoc
(Structured Form Design)

1.0 FastDoc Admin


The component that provides the rapid setup is called FastDoc. You can use FastDoc for rapid
application development with Datacap. If additional capabilities are needed, FastDoc can be extended
with Datacap Studio.

Applications that are created with FastDoc Admin can be executed in any of the Datacap clients
including IBM Content Navigator, Datacap Desktop thick client, Datacap Web Server, and in FastDoc
itself.

It is a general purpose design tool that can be used to design and configure the Datacap application.

1.0.1 A Capture Process

This is a diagram of a typical Datacap capture process created using Datacap FastDoc. Datacap includes
predefined templates of standardized workflows so that you can start with a predefined process and
adjust it to suit your needs.

1.0.2 VScan

One of the primary functions of Datacap is scanning documents. Datacap Desktop supports scanners that use
industry-standard TWAIN drivers or proprietary ISIS drivers. For this lab, we don’t have a scanner
attached, so in this section we use the VScan (import) feature to simulate scanning. Datacap lets you
select pre-scanned images from your computer’s files in a manner similar to scanning documents.
1.0.3 PageID

Once batches are scanned, they are routed to the next task in the job called PageID. This task enhances
the images to improve readability, identifies each of the pages and assembles them into documents. It
checks the results and if everything is OK, the batch goes to Profiler. If there is a problem, it is routed to
Fixup.

1.0.4 Fixup

Fixup is used to manually fix problems. With Fixup, you can manually classify, rescan, and assemble
documents.

1.0.5 Profiler

If you are using Optical Character Recognition or Intelligent Character Recognition (OCR or ICR), Profiler
reads data from the documents. With paper documents the results may not be completely accurate.
For example, pages may be damaged, users can enter invalid information on the paper or simply cross
out or erase information on the paper.

As a result, the system checks the data and flags fields that need human review. Documents with low-
confidence data or that fail validation checks are sent to the “Verify” task. Documents with high-
confidence are directly sent to “Export” task. Note: splitting batches is an optional feature; if you prefer,
the documents can remain together in one batch.
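
The routing decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Datacap code: the FieldResult class, the next_task function, and the 0.8 confidence threshold are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical threshold; the lab does not state the value Datacap uses.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float        # assumed scale: 0.0 (no confidence) to 1.0
    passed_validation: bool


def next_task(fields):
    """Return "Verify" if any field needs human review, otherwise "Export"."""
    for f in fields:
        if f.confidence < CONFIDENCE_THRESHOLD or not f.passed_validation:
            return "Verify"
    return "Export"


clean = [FieldResult("Zip_Code", "28010", 0.97, True)]
doubtful = [FieldResult("Zip_Code", "28O1O", 0.55, False)]
print(next_task(clean))
print(next_task(doubtful))
```

A single low-confidence or invalid field is enough to send the document to Verify, which matches the behaviour described for the Profiler task.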

1.0.6 Verify

The operator reviews each document to correct any problems. If a page has no problems, the system
can be configured to skip it, so that the user only reviews the pages that need attention. Once verified,
documents are sent to Export.

1.0.7 Export

Export sends data and documents to ECM systems like IBM FileNet, IBM Content Manager, CMOD,
SharePoint, databases, and via web services.
1.1 Create an application

1.1.1 Scenario

The scenario we will follow in our lab is a simple one. As part of a larger financing application, we want
to verify an applicant’s Date_of_Birth. We will do this by requesting the applicant submit a copy of their
most recent BENEFICIARY statement. This will then be used by our line of business application to
determine if the applicant’s Date_of_Birth meets our requirements.

1.1.2 Steps

FastDoc Admin includes a wizard to guide you through the initial steps. You can choose a pre-configured
template that matches the style of application that you need. There are two templates included with
Datacap: Forms and Learning. For our lab, we use the Forms template.

Forms – Use for more highly structured documents with a fixed layout such as account applications,
surveys, and request forms.

Learning – Use for less structured, more variable documents such as bank statements, utility bills,
correspondence, and bills of lading.
_1. Log on to FastDoc (admin).
Suggestion: You can pin this app to the taskbar

_2. Select the application TravelDocs from the list of applications and log in with the information below.
User : admin
Password : admin
Station :1
_3. Click the Application Wizard icon as shown below.

_4. Click Next in the below screen

_5. Select Create a new RRS application and click the Next button.

Note: a CMIS application allows you to get document properties from a content repository such as
FileNet and IBM Content Manager.
_6. Enter the Application Name as FastApp and the Application Template as FormTemplate, then click the
Next button.

_7. Click the Next button for the next two screens.

_8. Click Finish in the next screen then click the Close button.

You have created a new application named FastApp.


_9. Before you close FastDoc, it’s a good idea to pin the program to the taskbar so that it’s easier to
relaunch FastDoc.

_10. Re-launch FastDoc (admin).

You can now click the shortcut.

_11. Log in using the FastApp application.

Select the new application FastApp from the list of applications and log in with the information below.
Refer to Table 2 below for more details (if required).
User : admin
Password : admin
Station :1
The screen displays the list of tasks that the user can run. Typically, the user sees the Scan and Verify
tasks. You see all the tasks since you are logged in as an administrator.

_12. Review the following items in the table to familiarize yourself with some of the functions.

FastDoc icons: purpose and explanation

Purpose: Provides a list of buttons for processing documents.
Explanation: The user can click these buttons to process documents. A typical user might only see the
interactive tasks, such as scan and verify.

Purpose: Configure documents, pages and fields
Explanation: This is used to design and configure the system. This only shows in “admin” mode.
Clicking this icon displays the “document hierarchy”, the list of document types, pages and fields.

Purpose: Configure workflow
Explanation: This is used to design and configure the system. This only displays in “admin” mode.
Clicking this icon displays the list of jobs and the task processes in a flowchart-style format.

Purpose: Logout
Explanation: Logs out of FastDoc.

Purpose: Application Wizard
Explanation: Creates new applications.



_13. Click the Configure documents, pages, and fields button

This screen is only available in “admin” mode and is used to configure the types of documents, pages,
and fields and their settings and other attributes.

_14. Click Document.

The list now expands. The document type “Document” contains one page type, “Page”. Since we have
created a new application, the system lists some generic default values. The best practice is to leave
these items and add the items you need for your application.

To the right, there are three tabs: Settings, Ruleset, and Fingerprints.

This is where we enter the configuration options and test our settings. We will add our own document
and page types and configure them using these tabs.

Feel free to click and view these tabs, but don’t make any changes yet.

(If you accidentally change something, click the Reload button.)


_15. Click the Configure workflow icon

This is a graphical view of the capture process. On the left is a list of Jobs. The middle diagram is the
sequence of tasks for the selected job. The right is a palette of rulesets which can be assembled to form
a task.

Job: In this exercise we use the Demo_SingleTIFFs job and the default workflow that exists for this
job.

The tasks for the selected job are:

VScan Import images (virtual scan)

PageID Identify pages and create documents

Profiler Extract and validate data

Export Send documents and data to repositories and applications

_16. Click the grey Route boxes.

These expand to show conditional routes in the process. Here, they are used to optionally run jobs that
can fix problems and verify images.

_17. Observe the list of Rulesets on the right.

Additional rulesets can be added to a task by dragging from this list and dropping on a task. We don’t
need to do that at this point.
1.2 Add a new document type
In this section, you will configure a Datacap application to process a Beneficiary Designation document
shown below.

1.2.1 Add a Document Type

_1. Click the icon for Configuring Documents and Pages

_2. In the Batch Structure Pane, select FastApp at Batch level as shown below (1).

Then click Add Document to add a new Document Type.


_3. Specify the Document Type Claim_Document and

Enable the option Use rulesets from - Document as shown below.

Click Add to add this Document Type

1.2.2 Add Page Type

_1. Select Claim_Document in the Batch Structure pane and Click Add Page on the right side.

_2. Specify the Page Type Claim_Page and

Enable the option Use rulesets from – Page


_3. Set the Document Integrity Rule

In the Batch Structure Pane, select the Claim_Page and for the Parameters, specify the below values.
Minimum: 1
Maximum: 1
Order : 1

_4. Click Save

This specifies that the Claim page starts a new Claim document, and that a claim document must contain
one and only one Claim page.
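
To make the meaning of the Minimum, Maximum, and Order parameters concrete, here is a small Python sketch of an integrity check. It is not how Datacap implements the rule: PageRule and check_document are hypothetical names, and treating Order as the required position of a page type within the document is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PageRule:
    page_type: str
    minimum: int
    maximum: int
    order: int   # assumed meaning: relative position within the document


def check_document(page_types, rules):
    """Return a list of integrity problems; an empty list means the document passes."""
    problems = []
    for rule in rules:
        count = page_types.count(rule.page_type)
        if count < rule.minimum:
            problems.append(f"too few {rule.page_type} pages: {count}")
        if count > rule.maximum:
            problems.append(f"too many {rule.page_type} pages: {count}")
    # Pages must appear in non-decreasing Order (assumption, see above).
    order_of = {rule.page_type: rule.order for rule in rules}
    orders = [order_of[p] for p in page_types if p in order_of]
    if orders != sorted(orders):
        problems.append("pages out of order")
    return problems


claim_rules = [PageRule("Claim_Page", minimum=1, maximum=1, order=1)]
print(check_document(["Claim_Page"], claim_rules))
print(check_document(["Claim_Page", "Claim_Page"], claim_rules))
```

With Minimum and Maximum both set to 1, a document with exactly one Claim_Page passes, while a document with none or with two is flagged.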
1.2.3 Add Fields to the new page type

Next, add the following fields to the Claim_Page.

Employee_Name

SSN

Date_Of_Birth

Employer

Employer_Address

Group_Number

Home_Telephone_Number

Home_Address

City

State

Zip_Code

_1. Select the Claim_Page from the Batch Structure. Click Add Field button.
_2. Enter the field type Employee_Name.

_3. Click the Add button. The field is added to the Batch Structure list.

_4. Click the Save button at the top of the screen

_5. Add the remaining fields in this list by repeating steps 1 to 4
SSN
Date_Of_Birth
Employer
Employer_Address
Group_Number
Home_Telephone_Number
Home_Address
City
State
Zip_Code

When you are done, the batch structure should look like the one shown below.
1.3 Add a fingerprint
Fingerprints are sample images that the system can use to identify a page and to locate the data on the
page. In this lab we pick a sample image, identify it as a Claim page, and draw the zones or location of
each field. First, we add a fingerprint class which is a way to categorize and organize the fingerprints.

1.3.1 Add a Fingerprint Class

_1. Select the Fingerprints tab and Select <New> for the Fingerprint Class.

_2. Click Add

_3. Enter the name Claim_FP_Class and click Add

_4. Click Close


1.3.2 Add a Fingerprint

_1. Select the Fingerprint class Claim_FP_Class

_2. Click Add as shown below next to the Fingerprints box

_3. Go to TechX-Datacap Box folder to download the exercise images


https://ibm.box.com/s/z11bx9vhnkgdvbkckd82id8a8hbuv8yh
_4. Select the file demo_images01.tif from the Downloads folder

The fingerprint is added to the list in the Fingerprints box.

The image you selected displays in the image display. Image enhancement has also cleaned and
straightened the image.

_5. Select the fingerprint: 556 (Other)


Select the Page Type Claim_Page in the Page type drop-down menu.
When you change fingerprints, the save happens automatically so there is no need to click the Save
button.
Your Fingerprint should look as follows:

1.3.3 Drawing Zones

To extract the values from the form, draw a zone for each of the fields.

_1. In the Batch Structure select the Employee_Name field.

_2. In the Fingerprint Image pane, draw a box using the mouse pointer, considering the width and
height of the field.
Tip: If the image is too small, magnify it by clicking the magnifier icon or by using the scroll wheel
of your mouse. To move the zoomed image around, right-click it and drag with the mouse.

_3. Observe that the Zone Position / Coordinates are updated.
_4. Draw zones for all the fields (of Claim_Page) as shown below except for the field
“Employer_Address”.

Field Employer_Address is not present in the Form.

We will configure it in the next exercise by Custom Lookup using a custom web services interface.

_5. Save the changes by clicking the Save button as shown below. Fingerprint changes save
automatically so if you have already saved previously, the button is greyed out.
1.4 Configure Rulesets

1.4.1 Overview

Next we implement basic OCR recognition of our claim document. Most of the default settings are
sufficient to handle many situations and so only a few changes are needed.

When Datacap processes documents, it follows the process defined for each type of Job. A Job has a
series of tasks. Each task can contain a sequence of Rulesets. For each type of document, page, and field
you can set parameters that determine how the documents are processed.

The first settings we need to update are in the Ruleset, “Recognize Pages and Fields”. We need to tell
the system that we want to use OCR on our page and identify which fields need to be read.

Next we set the file location from which the system will import documents.

1.4.2 Configure OCR Recognition for the Claim_Page

_1. Select Claim_Page in the Batch Structure and Click Ruleset tab.
_2. Select the Ruleset Recognize Pages and Fields

_3. Check the following options as shown below (3).

Enable: Read Page

Enable: Load Zones for fields

Enable: Read Machine print on page

1.4.3 Configure the fields

Configure Ruleset as per the below steps

_4. Select the field Employee_Name in the Claim_Page under the Batch Structure

_5. Select the Ruleset Recognize Pages and Fields

_6. Enable the option Add page recognition text to the zone, as shown below (3), and click the
Save button

_7. Repeat steps 4 to 6 for each of the remaining fields except Employer_Address, so that ten fields in
total are configured.
_8. If you have not done so already, click Save.

1.5 Design Time Testing


FastDoc provides a test panel that allows you to verify that the document settings are effective. You can
load sample documents, execute the process tasks and view the test results all within the design tool.

_1. Ruleset configuration

Click the Ruleset Tab

The test panel is located on the right side of the screen.

_2. Select the file to test

Click the Add File button as shown below to add an input file.

Select the file demo_images01.tif and click Open.


The image displays in the test panel.

Start testing

_3. Click the Test button

Click OK and the following message displays


_4. Review the test results

The page has been correctly identified as Claim_Page and a Claim_Document document has been created.

How did it do this? It matched the fingerprint that we defined earlier. Note that the matching
fingerprint ID is 556.

Didn’t work for you? Check the Fingerprints tab and make sure fingerprint 556 is set to the Claim_Page
page type and not Other.

_5. Test the Profiler Task


Select the Profiler task.
Unselect the Routing Ruleset.
Click Test

Click OK and the following message displays


_6. Review the Test Results

Select the field Employee_Name in the batch structure and check the results in the test pane. Select
each of the fields in the batch hierarchy and test the results.

Didn’t work?

Make sure the PageID successfully identifies this as a Claim_Page

Check your RuleSet tab and make sure the settings are correct on the Recognize Pages and Fields ruleset
for the Claim_Page and each of the fields.
1.6 Process Batches

_1. Go to the Shortcuts Pane by clicking the icon for processing batches at the top-left side of the
screen.

1.6.1 VScan Task

_1. Make sure the test image has been added to C:\Datacap\FastApp\images\Input_SingleTIFFs

_2. Next, run the VScan task to import files from the folder by double-clicking the VScan shortcut as
shown below.
Note that if you were using a physical scanner, the paper would feed through the scanner. FastDoc
supports both TWAIN and ISIS scanners.

_3. Select Demo_SingleTIFFs Job as shown below.

The system loads the images from the Import folder into Datacap.

1.6.2 PageID

_1. If you don’t see the batch details, click on All.

_2. You should now see the batch details.


_3. Run the PageID task by double-clicking the PageID shortcut as shown below.

Once PageID is done processing, click on All and you should see your batch ready for the next task,
Profiler, as shown below.

1.6.3 Profiler Task

_1. Run the Profiler task by clicking the Profiler shortcut as shown below.
Note that if the pages are recognized correctly, with no low-confidence characters and no validation
errors, the batch is sent directly to Export instead of Verify.

1.6.4 Export Task

Once the Export task is executed, you can take a look at the output file generated.
1.6.5 Verify Task

_1. The default workflow is configured to send the batch directly to Export if there are no low-
confidence characters and no validation errors. What if you want all batches to go through the
verification step?
The following instructions show one way to alter the capture workflow.

_2. Launch Internet Explorer and select Datacap IIS Console. Then launch FastApp

_3. Select Administrator


_4. Click on Demo_SingleTIFFs then select New

_5. You will see that a New Task has been created. Change the name of the New Task to Verify and click
Apply

_6. You will see that the Description and Program have been updated automatically.
_7. We now need to move the Verify task up. Click Apply after the Verify task has been moved above the
Export task.

_8. One last step. Select Shortcuts.

_9. Make sure the Verify checkbox for Demo_SingleTIFFs is selected.


_10. Relaunch FastDoc and run the VScan, PageID, and Profiler tasks again (sections 1.6.1 to 1.6.3).
This time the batch should end at the Verify task.

_11. Now you can run the Verify task and examine the extracted data. You can try modifying the
extracted data before you submit the batch.
Execute the Export task and check that the exported data reflects the changes you made.

You can now log out from the FastApp application

1.7 Summary
This lab showed you how to build new solutions using Datacap and FastDoc. The Datacap application
was set up through simple configuration and required very little technical knowledge. FastDoc allows
you to get started very quickly with a new Datacap application. It creates a good foundation on
which to continue adding new capabilities with Datacap Studio later on.
Congratulations! You have now completed Configuring a solution with FastDoc, a low-code and no-code
tool. You have seen how easy and fast it is to create data capture applications using Datacap.
Lab 2 Configuring a solution with Studio (Semi Structured Form)

2.0 Overview
Virtually every company needs to be able to process invoices. Most companies have some form of A/P
software that is used to track, pay, report on, control payment of, and archive inbound invoices.
Information needs to be read from the invoice and entered into those systems. Manual processing is
costly and error-prone. Automation to provide straight-through processing is the goal.

However, invoices can come from dozens, if not hundreds of different sources. And two invoices from
the same company can have critical information located in different areas if the number of line items
differs from invoice to invoice.

2.1 What is a semi structured document?


Regardless of where the invoices come from, the metadata that Accounts Payable personnel want to
extract from them is the same, e.g. invoice number, purchase order number, and so on.

If we were to use the same methodology as in Lab 1, we would probably need to configure a large number
of fingerprints. In this lab, we guide you through classifying and extracting data without a
fingerprint. This technique is useful in many document capture scenarios.

Of course, there are other techniques in Datacap for configuring an application to auto-learn,
but that is not the intent of this lab.

Datacap includes many different ways to identify a document type, ranging from traditional manual
identification, barcodes, patch codes, and separator pages to more modern methods such as
fingerprints and natural language processing.
2.2 Retest FastApp
_1. Download demo_images02.tif. Compare demo_images01.tif and demo_images02.tif.
Are you able to spot any difference between the 2 images?
Identical?
Let’s test it!

_2. Add demo_images02.tif to C:\Datacap\FastApp\images\Input_SingleTIFFs


_3. Rerun the steps starting from 1.6.1.
The batch will likely end up in the Fixup step.
This means that an error has been detected in this batch; in this case, classification has failed.
So how do we troubleshoot?
There are a few ways to troubleshoot. We would normally examine the log file first, but some errors
can be detected quickly in Datacap Studio.
So let’s dive into Datacap Studio.

2.3 What is Datacap Studio?

Datacap Studio is a rich development environment which allows a user to easily develop,
modify, and test new Datacap applications without having to have programming or
development skills.

A Datacap application can be thought of as the processing rules for a batch of documents. The
batch can contain one or more document types, each with differing processing requirements.

When creating a Datacap application, you typically start by defining the document hierarchy.
The four elements of a document hierarchy are the batch, the documents in a batch, the pages
in a document, and the fields on a page.

The document hierarchy describes the structure of the batch. It describes:

• different types of documents that can occur in a batch

• structure of each document type, which includes the different types of pages that can
appear in a document.

• various fields that can occur on a given page
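
As an illustration of the four levels of the document hierarchy, the following Python sketch models the FastApp structure built in Lab 1 as nested data classes. The class names and the outline helper are made up for this example; Datacap stores the hierarchy in its own setup files, not in Python objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldType:
    name: str


@dataclass
class PageType:
    name: str
    fields: List[FieldType] = field(default_factory=list)


@dataclass
class DocumentType:
    name: str
    pages: List[PageType] = field(default_factory=list)


@dataclass
class Batch:
    name: str
    documents: List[DocumentType] = field(default_factory=list)


def outline(batch):
    """Return the hierarchy as indented lines: batch, document, page, field."""
    lines = [batch.name]
    for doc in batch.documents:
        lines.append("  " + doc.name)
        for page in doc.pages:
            lines.append("    " + page.name)
            for fld in page.fields:
                lines.append("      " + fld.name)
    return lines


# Field names taken from section 1.2.3 of Lab 1.
claim_fields = [FieldType(n) for n in (
    "Employee_Name", "SSN", "Date_Of_Birth", "Employer", "Employer_Address",
    "Group_Number", "Home_Telephone_Number", "Home_Address", "City", "State",
    "Zip_Code",
)]
fastapp = Batch("FastApp", [
    DocumentType("Claim_Document", [PageType("Claim_Page", claim_fields)]),
])
print("\n".join(outline(fastapp)))
```

Each level only knows about the level directly below it, which is the same nesting you see in the Batch Structure pane.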

Next you create the rulesets that are applied to different elements within the document
hierarchy. Rulesets are composed of rules which are, in turn, composed of predefined actions.

Rules can be “bound” to the different elements within a document hierarchy. You may have
rules that are only executed once (e.g. connecting to a lookup database when the batch is
opened). There may be rules that are executed once per document (for example, uploading a
document to an ECM repository). You could have rules that are only executed once per page
(for example, examining the page to see if it is a blank page). And finally, there could be rules
that are executed one or more times per field (for example, verifying that a field like SSN is in
your customer database).

You use the actions (which are reusable and located in the Actions library) to create functions.
A function can be thought of as a group of actions that work together to perform a specific
task. We will look at the details behind rulesets, rules, functions, and actions in more detail
shortly.

Let’s take a quick look at Datacap Studio.

Note that there are three tabs at the top of the Datacap Studio interface.

1. Rulemanager – this is where main configuration for your application is done

2. Zones – this is where you identify any fingerprints that you might use for classification
purposes, as well as any zones you might want to set up for OCR/ICR.

3. Test – this is a test environment for your application


We’ll focus for the moment on the Rulemanager page. Notice how the Rulemanager page
itself is divided into three main sections. Looking from left to right are the following:

On the far left is the Document hierarchy. This is where the structure of the batch is defined.
The illustration above is for the APT application, which does invoice processing. The batch
(called APT, meaning “Accounts Payable Transactions”) is made up of three possible document
types: Invoices, Separator Pages, and Other (a catch all for anything that cannot be classified).
The Invoice document type can have a Main Page, a Trailing Page, an Attachment Separator
page, and the Attachment itself. On the Main Page are many fields, such as the Vendor
Number and Invoice Total.

The middle section is where all of our rulesets are. A ruleset is made up of one or more rules.
Rules are “bound” to different elements within the document hierarchy. For example, we
might have a VScan rule which controls the scanning of the batch. You would use PageID and
ImageFix rules on individual pages. You might use an Export rule to export invoice data to a
line of business database.

The far right is where the Action library and Task profiles are managed. The Action library
(shown in the above illustration) is where all the reusable actions that come with Taskmaster
are organized. You simply click on actions to make use of them when creating your rules. The
Task profile (not shown in the above illustration) describes the order in which rules are applied
to the document hierarchy. For example, you would want to run Scanning rules first, before
running PageID, which would have to run before you could run Recognition rules.

We will examine all parts of the Datacap Studio in more detail as we go through this lab
exercise.

2.4 Rulesets, Rules, Functions, and Actions

The key to Datacap Taskmaster is the Rules paradigm. It is a unique method for configuring
Capture applications. It stresses efficient reusability and reduces, if not outright eliminates, the
need to do any custom scripting. We will take a moment to examine this important aspect of
Taskmaster configuration.

We’ve introduced the concept of rulesets, rules, functions, and actions. Let’s look at these more
closely.

Actions

Actions are our most basic elements and they perform very specific tasks. An action may
perform OCR, connect to a database, or return information about a field. In addition to
potentially returning information (e.g. the results of a SQL call) all actions will return a Boolean
value (true or false) indicating the success of the action. An example of an action is
“PDFDocumentToImage”. This action takes a PDF document and converts it to a multipage TIFF
image. The action returns true if the conversion completes successfully.
Functions

Functions are groups of actions. The actions within a function are executed in sequence until
one action returns false. If all actions return true, then the function returns true. Let’s look at an
example. Let’s say we’ve created a function that will be used to determine if a field is a proper
zip code. The function could look like this:

Function: Is_5_Digit_ZIP_Code
IsFieldPercentNumeric(100)
MinimumLength(5)
MaximumLength(5)
The first action returns true if 100% of the characters in the field are numeric. The second
action returns true if the field is at least 5 characters long. The third action returns true if the
field has a maximum length of 5 characters. The actions are executed one after the other as
long as all actions return true. So if the field value is “28010”, then all three actions will return
true and the function will return true. But let’s say the field value was “28O1O” (with capital
“ohs” instead of zeros). Then the first action would return false. The remainder of the actions
would not be executed, and the entire function would return false.
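
The stop-at-first-false behaviour of a function can be sketched in Python. The three helpers below are simplified stand-ins written for this example, not the real Datacap actions: the real actions operate on the current field in the batch, whereas these take the field value as a plain string.

```python
def is_field_percent_numeric(percent):
    """Stand-in action: true if at least `percent` % of the characters are digits."""
    def action(value):
        if not value:
            return False
        digits = sum(ch.isdigit() for ch in value)
        return digits * 100 >= percent * len(value)
    return action


def minimum_length(n):
    return lambda value: len(value) >= n


def maximum_length(n):
    return lambda value: len(value) <= n


def run_function(actions, value):
    """Run actions in order and stop at the first one that returns False (logical AND)."""
    return all(action(value) for action in actions)


is_5_digit_zip_code = [
    is_field_percent_numeric(100),
    minimum_length(5),
    maximum_length(5),
]

print(run_function(is_5_digit_zip_code, "28010"))
print(run_function(is_5_digit_zip_code, "28O1O"))
```

Because all() consumes the generator lazily, the length checks are never evaluated for "28O1O": the numeric check fails first and the function returns false immediately.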

Rules

Rules are a collection of functions. One big difference between rules and functions is that
functions execute actions until an action returns false, whereas rules execute functions until a
function returns true. Those familiar with programming can think of the logic associated with
functions as being equivalent to a series of logical AND conditions. The logic
associated with rules is equivalent to a series of logical OR conditions.

Let’s use an example to see why and how this works. Let’s build on our zip code example. We
could have a rule that looks like the following:

Rule: Is_A_Valid_ZIP_Code

Function1: Is_5_Digit_ZIP_Code
IsFieldPercentNumeric(100)
MinimumLength(5)
MaximumLength(5)

Function2: Is_9_digit_ZIP_Code
IsFieldPercentNumeric(90)
MinimumLength(10)
MaximumLength(10)
The second function would only run if one of the actions in the first function returned false
(remember, the functions in a rule behave like logical OR conditions), for example if the ZIP code
was longer than 5 characters or wasn’t completely (100%) numeric. The second function would return
true if the field is exactly 10 characters long and 9 out of its 10 characters are numeric. This means a
field value of something like 28010-8990 would return true. Given even more functions, we
would continue testing the field against possible ZIP code conditions until one of them returns
true. There’s no reason to continue testing the ZIP code once a function returns true, so we
stop.
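
As a rough illustration of the AND/OR relationship, the rule above can be modelled in Python as "any function in which all actions pass". The helper names are hypothetical, `length_between` merges the MinimumLength and MaximumLength actions for brevity, and the "at least N percent" reading of the numeric check is an assumption.

```python
from typing import Callable, List

Action = Callable[[str], bool]


def percent_numeric(percent: int) -> Action:
    """True when at least `percent`% of the characters are digits."""
    return lambda v: bool(v) and sum(c.isdigit() for c in v) * 100 >= percent * len(v)


def length_between(lo: int, hi: int) -> Action:
    """Stands in for a MinimumLength(lo) / MaximumLength(hi) pair."""
    return lambda v: lo <= len(v) <= hi


def run_function(actions: List[Action], value: str) -> bool:
    # all() stops at the first action that returns False (logical AND)
    return all(action(value) for action in actions)


def run_rule(functions: List[List[Action]], value: str) -> bool:
    # any() stops at the first function that returns True (logical OR)
    return any(run_function(actions, value) for actions in functions)


IS_A_VALID_ZIP_CODE = [
    [percent_numeric(100), length_between(5, 5)],    # Is_5_Digit_ZIP_Code
    [percent_numeric(90), length_between(10, 10)],   # Is_9_digit_ZIP_Code
]

for candidate in ("28010", "28010-8990", "2801"):
    print(candidate, run_rule(IS_A_VALID_ZIP_CODE, candidate))
```

The first two candidates pass (one through each function); the last one fails both functions, so the rule as a whole returns false.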

Rulesets

Rulesets are groups of related rules. For example, you might associate all the rules that validate your
OCR results into a single ruleset. An example of this might look like the following:
Ruleset: Validations
Rule1: Is_A_Valid_ZIP_Code
Rule2: Is_Date_Valid
Rule3: Customer_Number_In_Database

As we’ve noted before, rules are linked to different elements within the document hierarchy. Rulesets
are linked to tasks in the task profile. We’ll examine this in more detail when we look into workflow.
Troubleshooting

2.5 Create a duplicate of Fastapp using Datacap Studio


Start the Datacap Studio and as usual, you can pin this to your Taskbar.

Click Close to Exit the dialog


_1. Launch Datacap application Wizard

_2. Select Copy an application

_3. Select Fastapp from the list and rename it as FastApp2

_4. Click Next and Finish


_5. Click Close. Let’s have a quick overview of what Datacap Studio is.

2.6 Let’s troubleshoot to find out why the previous batch went to a FixUp step.

_1. Click on Connection wizard

_2. Select FastApp2

_3. Login to the app

User: admin

Password: admin

Click Finish
_4. Go to the Test tab

_5. Right click on Vscan and Select New. You will see a batch created

_6. Select ScanFromDisk_SingleTIFFs and click Process Rules.


Next click on Advance to move the batch to next step, PageID

_7. Click Process Rules. This time, instead of Advance, select Keep running.
_8. Let’s examine the batch. Click on the + to expand the batch and select TM000002.
We can see here that TM000001 has been classified as Claim_Page while TM000002 is still
unclassified.
Since both images look exactly alike, why is Datacap not able to classify TM000002?

_9. Click on Cancel to clear the batch


_10. Are the images really the same?
Let’s examine the images. Go back to the images and have a look at the properties of both images.
Although both images look “perfectly” alike, to a capture program they are different because their
image dimensions differ.
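
If you prefer to check the dimensions outside the file properties dialog, here is a minimal sketch that reads the width and height from the first image directory of a TIFF file using only the Python standard library. It is an illustration rather than part of Datacap; the function names are hypothetical, and it assumes a classic (non-BigTIFF) file.

```python
import struct
from typing import Tuple


def tiff_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) in pixels from the first image directory of a TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type = struct.unpack(order + "HH", data[start:start + 4])
        if tag in (256, 257):  # 256 = ImageWidth, 257 = ImageLength
            fmt = "H" if field_type == 3 else "I"  # 3 = SHORT, 4 = LONG
            size = struct.calcsize(order + fmt)
            (found[tag],) = struct.unpack(order + fmt, data[start + 8:start + 8 + size])
    if 256 not in found or 257 not in found:
        raise ValueError("width or height tag missing")
    return found[256], found[257]


def same_dimensions(first: bytes, second: bytes) -> bool:
    """True when both TIFF images have identical pixel dimensions."""
    return tiff_dimensions(first) == tiff_dimensions(second)
```

To use it, read both image files in binary mode (`open(path, "rb").read()`) and pass the bytes to `same_dimensions`.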

So what can we do for such a scenario?


In this lab, we will make use of Datacap Studio to handle this scenario. We could handle this with
FastDoc too, but in this lab we will guide you through building your own rules so that you can
handle more complex scenarios.

2.7 Create a custom Classification Ruleset

_1. Add a Ruleset


_2. Right Click on Identify Pages and select Copy.
Right Click on FastApp2 and select Paste.
Rename the Ruleset to my_Identify_Pages.

_3. Save the ruleset then Publish ruleset

_4. After saving, lock the ruleset again for modification.


_5. Functions that are greyed out have been disabled

_6. Remove the unwanted functions


_7. Your final ruleset should look like this.
Do remember to save and publish the ruleset.

_8. Expand OCR/SR function and remove rrCompare(“OCR.SR”,”@P.OCR_Engine”)

_9. Click on the action RecognizePageOCR_S() then click Sync Ruleset


_10. The ocr_sr action library will open.
Select the action Recognize and click Add to function

The Recognize action will be added to Function4a.


Remove the following actions from the function: rrCompare(OCR/SR,”@P.OCR_Engine) and
RecognizePageOCR_S().
Repeat this process for function 4b: OCR/A too.
Your final function should look like this.
Do remember to save and publish the ruleset.
_11. Add a new function and call it Create_CCO

We will need to add some actions to this function.


These are the actions to add: AnalyzeLayout, CreateCCOFromLayout and CreateTextFile

The final function should look like this.


Do remember to save and publish the ruleset.
_12. Let’s add this ruleset to the PageID task profile.
Click on Lock Task Profile

Select the ruleset my_Identify_Pages and Add ruleset to profile

Remove IdentifyPages from the PageID task profile


The final Task profile should look like this.
Do remember to save changes and unlock the Task profile.
_13. There is a bug with the Batch Level(Close) function: the function cannot be enabled.
No worries, let’s delete the greyed-out Batch Level(Close) function and create a new Batch
Level(Close) function, adding the ReleaseEngineOCR_A() action to it.

We need to attach this rule to the document hierarchy at the batch close level.


First select the rule Batch Level(Close) then select “and at the end of…” on the Properties tab and
tick Batches.
_14. We are almost done, so hang in there.
For this application, we will not be using fingerprints to classify images.
So let’s disable the Find Fingerprint function. Ensure that these are the only enabled functions.

_15. Next, let’s remove the previously created fingerprint.


Go to the Zones tab

Click on the Claim_Page

Click Remove selected

It should look like this after you remove the Claim_Page fingerprint.
2.7.1 Application run through

After removing the fingerprint, let us walk through a quick test of an application in Datacap.
Along the way, we will also get a glimpse of how some of the key mechanisms of
Datacap work.

_1. Click on the Test tab. At the same time, open Windows Explorer and navigate to
C:\Datacap\FastApp2\batches. If there are old batches in the directory, you can delete them so we can
start with an empty batch folder. I will place Studio and Explorer side by side.

_2. Right click on VScan and select New

_3. Click on Select task profile and select ScanFromDisk_SingleTIFFs

Click on the green button (Process rules for target object)
Click on Advance

_4. Navigate back to C:\Datacap\FastApp2\batches folder.


You will see a list of files created.

Right click on VScan.xml and select Edit with Notepad++
The information presented in this XML can be used if needed. For example, if you need to know
where an image originated, you can retrieve it from the ScanSrcPath tag.
If you have advanced the batch to PageID, you can take a look at PageID.xml before you
process the rules of this task.
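
As a hedged sketch of how such a value could be read programmatically, the Python snippet below pulls ScanSrcPath out of a small XML sample. The element layout in the sample (pages as `P` elements, each holding `V` variables named by an `n` attribute) and the file names are assumptions made for illustration; open your own VScan.xml in Notepad++ to confirm the actual structure before relying on it.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical sample: the layout and file names are assumptions for illustration only.
SAMPLE_VSCAN_XML = r"""
<B id="sample-batch">
  <P id="TM000001">
    <V n="ScanSrcPath">C:\Datacap\FastApp2\images\Input_SingleTIFFs\page1.tif</V>
  </P>
  <P id="TM000002">
    <V n="ScanSrcPath">C:\Datacap\FastApp2\images\Input_SingleTIFFs\page2.tif</V>
  </P>
</B>
"""


def scan_source_paths(xml_text: str) -> Dict[str, str]:
    """Map each page id to the value of its ScanSrcPath variable, if it has one."""
    root = ET.fromstring(xml_text.strip())
    paths = {}
    for page in root.iter("P"):
        for variable in page.findall("V"):
            if variable.get("n") == "ScanSrcPath":
                paths[page.get("id")] = variable.text
    return paths


for page_id, path in scan_source_paths(SAMPLE_VSCAN_XML).items():
    print(page_id, path)
```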

Now, let’s progress the batch to next step, PageID.

Click on the green button (Process rules for target object)

_5. Navigate back to C:\Datacap\FastApp2\batches folder.

Open up the batch folder that you are processing. Have a look at TM000001.tif and TM000001.tio.
Open the tio file with an image viewer. You will see that the tio file is the original image while the tif file is
the enhanced image.
_6. Have a look at the text file. This is the result of the full-page OCR process. This is a very
important step in a capture project. You will need to test your images to determine the
OCR accuracy you can expect. If the source image is dirty, techniques such as image
enhancement can be applied to remove noise for better OCR accuracy. There are instances
when the quality of the image is very poor; in such scenarios, there is not much that image
enhancement can do to improve the OCR results.

_7. You can disable OCR/SR and test the same image with OCR/A. Run the same steps as above and
compare the full-text OCR results. You will probably notice very little difference between
the 2 engines as this image is pretty clean. With a more “complex” image, you would
notice some differences. You will also notice the difference in processing speed between the 2
engines.

_8. Look at the classification of the 2 images. You will notice that both images are now unclassified
(Other).

_9. Setting up the classification rule

Let’s cancel this batch now so we can set up the classification rule to recognize this image.
_10. Click on the Rulemanager tab.

Click on the my_Identify_Pages ruleset and click Lock

_11. Modify Function5: Locate

Below is the set of actions that we will need to update with the following parameters.

The following shows in which library you can find each of the needed actions and where to enter
the parameters for the action.
- RegExFind in Locate library
- SetPageType in ApplicationObjects library
- rrSet in RuleRunnerLogic library
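
Conceptually, this classification step searches the full-page OCR text for a pattern and sets the page type when the pattern is found. The Python sketch below mimics that idea only; it is not how the Datacap actions are implemented, and the pattern and sample text are hypothetical (the real parameters are the ones you enter on the actions).

```python
import re


def classify_page(ocr_text: str, pattern: str, page_type: str, default: str = "Other") -> str:
    """Return `page_type` when the pattern occurs in the OCR text, otherwise `default`."""
    if re.search(pattern, ocr_text, flags=re.IGNORECASE):
        return page_type
    return default


# Hypothetical OCR text and pattern, for illustration only.
sample_text = "HEALTH CLAIM FORM\nEmployee Name\nMary Smith\n"
print(classify_page(sample_text, r"claim\s+form", "Claim_Page"))      # Claim_Page
print(classify_page("Unrelated page", r"claim\s+form", "Claim_Page"))  # Other
```

Because the decision depends on text rather than on the page layout, two scans with different dimensions can still be classified the same way.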
_12. Once you are done setting up the classification rule, run a batch through to check that your
classification is correct.
If you see a result similar to the one below, congratulations!
You have successfully set up the classification rule!
2.7.2 Extraction Rule

Let’s create a new ruleset called Field_Extraction

_1. Add a new ruleset

_2. Rename Ruleset1 to Field_Extraction and Rule1 to Employee_Name

_3. Add the following action from the Locate library to Function 1, enter the parameters for the action as
listed below, then save and publish the ruleset.

_4. Lock the ruleset, copy the Employee_Name rule and paste it 9 times
_5. Rename the rest of the rules with these names
- SSN
- Date_of_Birth
- Employer
- Group_Number
- Home_Telephone_Number
- Home_Address
- City
- State
- Zip_Code

This is what the final ruleset will look like

_6. Let’s take a look at the image for the fields we want to extract.
We can observe that for this image, the value is found below the key, i.e. the value Mary Smith
is below the key Employee Name.
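
To make the "value below the key" idea concrete, here is a small Python sketch that works on layout-preserving text lines: it finds the key, then reads the text at the same column on the next non-empty line. This is a simplified stand-in for what the Locate actions do on the OCR results, not their actual implementation; the function name and the Group Number value are made up for the example.

```python
import re
from typing import List, Optional


def value_below(lines: List[str], key: str) -> Optional[str]:
    """Return the text found under `key` on the next non-empty line, or None."""
    for index, line in enumerate(lines):
        column = line.find(key)
        if column == -1:
            continue
        for below in lines[index + 1:]:
            if below.strip():
                segment = below[column:].strip()
                if not segment:
                    return None
                # A run of two or more spaces is treated as the gap to the next value.
                return re.split(r"\s{2,}", segment, maxsplit=1)[0]
        return None
    return None


# Two keys side by side, each with its value directly underneath.
lines = [
    "Employee Name".ljust(24) + "Group Number",
    "Mary Smith".ljust(24) + "G-1001",
]
print(value_below(lines, "Employee Name"))  # Mary Smith
print(value_below(lines, "Group Number"))   # G-1001
```

Since the lookup is anchored on the key text rather than on fixed zone coordinates, it keeps working when the key moves to a different place on the page.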
_7. Let’s enter the key into the parameter of the RegExFind action corresponding to the function name.
_8. You will then attach the rule to the field.

To check if the rule had been attached to the field, expand the field to check.
_9. Repeat for the rest of the fields

_10. Now, we will add this ruleset to the Profiler task profile and remove RecognizePagesAndFields
and Routing
_11. We are ready to do a quick test. Check through both documents to make sure all the fields are
extracted correctly.
_12. The application that you have just built is able to classify and extract data intelligently without
creating any fingerprint.
What if the customer redesigns the form?
Can your application handle it?

You can download demo_images03.tif to test it.


_13. Voilà!
The application managed to capture all the fields even when the document format changed.
_14. Of course, in this scenario, the document is very clean and “structured”, plus the accuracy of
current recognition engines has improved significantly.
Thus the simple rules that we have built are able to classify and extract the data easily. You can study
some of the rules from the APT application, e.g. the rules to extract the invoice number.

_15. As you become more familiar with the concept of building rules, you will be able to build a
robust extraction ruleset.
2.7.3 Verification

Verification is the process of visually ensuring that the required data elements on the invoice were
located correctly and that recognition results are accurate. You can configure the Verification module to
display only the pages that contain “low confidence” recognition results. You can also configure it to
display all documents, regardless of confidence.
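
The two display modes can be thought of as a simple filter over the recognized pages. The Python sketch below is purely illustrative: the `Page` class, the 0 to 1 confidence scale and the 0.8 threshold are assumptions, not Datacap’s internal representation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    page_id: str
    field_confidence: Dict[str, float]  # field name -> confidence (hypothetical 0..1 scale)


def pages_to_verify(pages: List[Page], threshold: float = 0.8, show_all: bool = False) -> List[Page]:
    """Return every page when show_all is set, otherwise only pages with a low-confidence field."""
    if show_all:
        return list(pages)
    return [
        page for page in pages
        if any(conf < threshold for conf in page.field_confidence.values())
    ]


batch = [
    Page("TM000001", {"Employee_Name": 0.99, "SSN": 0.97}),
    Page("TM000002", {"Employee_Name": 0.95, "SSN": 0.42}),
]
print([p.page_id for p in pages_to_verify(batch)])                 # ['TM000002']
print([p.page_id for p in pages_to_verify(batch, show_all=True)])  # ['TM000001', 'TM000002']
```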

_1. Launch FastDoc to test your application from VScan all the way to Export.

_2. Review your Exported data file.

Summary

This lab showed how to use the Datacap APT (Accounts Payable) application to process invoices. This is a
fully functional solution built on top of Datacap. It will allow you to quickly start processing invoices and
learn new formats and invoice types as real batches are being processed. The application can extract
line item data and verify that the line items match the total on the invoice as well as verifying vendor
information from your own database. Datacap for Accounts Payable can greatly enhance the speed and
accuracy of your invoice processing.

Congratulations!! You’ve completed your first application with Datacap Studio. Let’s move on to the
next lab exercise.
Lab 3 Configuring Application to run on Datacap Navigator

3.0 IBM Content Navigator Administration


IBM® Content Navigator is a web client that provides users with a console for working with
content from multiple content servers. IBM Content Navigator also enables users to create
custom views of the content on the web client by creating teamspaces, which provide a focused
view of the relevant documents, folders, and searches that a team needs to complete their
tasks. IBM Content Navigator also includes a powerful API toolkit that you can use to extend the
web client and build custom applications.

This unified user interface integrates seamlessly with IBM Business Automation Workflow and
IBM Datacap too. Let’s add the Datacap application which we have just built to run on Datacap
Navigator.

3.1 Configuring ICN.


Launch ICN Admin: http://ibmdemo16:9080/navigator/?desktop=admin

User name: p8admin

Password: Password1

We will first set up a repository


Click on New Repository and select “Datacap Application”
Enter the following parameters and click Connect.
Once done, Save and Close

Next, we will need to add the repository to the Desktop.


We can either create a new Desktop or copy an existing Desktop.
I would typically copy an existing Desktop, in this case I will copy the desktop with the ID dcAll.
You can rename the Desktop to any name you want but I will stick with Fastapp for this lab.
Do Save and Close.

Do remove the TravelDocs repository.


Let’s test out the newly created Desktop.

Earlier we added Verify to the workflow for SinglepageTiff; now we will add it to Navigator
Job.
Click on the hamburger icon on the top left and select Datacap Admin Console.

Select Fastapp2 workflow and click Edit


Select Navigator Job and click Edit

Select Task and click New Task

Type Verify for the name and the rest of the information will automatically be populated.
Save and Close
Select Verify and click Move Up

Remember to Save and Close

Similar to what you have done previously, you need to enable the shortcut for the Verify task.
Move to the Permissions tab, then enable the Verify task for Navigator Job.
Remember to Save and Close

3.2 Setting up Rulerunner to run background processing

_3. Let’s run another batch through the system to make sure the new custom panel
works. First let’s change the Datacap server to run two rulerunner threads to run the
PageID and Profiler tasks automatically.

Start the Datacap Rulerunner Manager

Select Start -> All Programs -> IBM Datacap Services -> Datacap Rulerunner Manager.

Stop the service if it is already started.


_4. Switch to the Rulerunner Login tab.

Enter user admin, password admin and station ID 1 for the Datacap Authentication.

Click Connect.

Let’s first remove all the rulerunner threads.


Click on the Workflow: Job: Task tab.
This screen is divided into two sections.
The left is a list of all the applications on your system.
Right click on the thread and select Remove.
_5. The right is now an empty frame. Right click on the empty frame on the right and
select Threads/Add Thread or press <Ctrl><Shift>N

_6. In the left hand panel, click on the FastApp check box. This will expand all the
child nodes. Under the Beneficiary application’s Main Job, select:

PageID

Profiler

Export
What is important for this lab is adding the PageID, Profiler and Export tasks to the Rulerunner0 thread.
Click on Navigator Job on the left and drag it to the Rulerunner0 thread.

Note: You could also add others, including those under Web Job, but for the purposes of this
exercise we will only be using Main Job.

Change the Sleep For timer to 2 seconds. This is optional, but good for this demo so we
don’t have to wait for batches to process.

Click Save. If the configuration file does not exist, you may get a confirmation screen
asking you to create it. Click Yes.

_7. Click the Rulerunner tab. Start the service, then close the Rulerunner Manager.
3.3 Retest your application with Datacap Navigator.

Login to the Datacap Navigator desktop and Click Scan

Once you click Browse, the easiest route is to go to
C:\Datacap\FastApp2\images\Input_SingleTIFFs.
Select all 3 images, click Open then click Scan
Click Submit

Wait a while for the background task to finish executing.


You can click either on Start or Verify
Click through the different images from the Batch Structure.

Once done, you can submit the batch.


You can validate your export folder too.
Lab 4 Exporting to Filenet CM

4.0 Overview
For this lab, you will export the document to Filenet.

4.1 Create a new document class in Filenet via ACCE


Log in to ibmdemo16:9080/acce
user name: p8admin
Password: Password1
Go to Focus Corp Repository

4.1.1 Adding Property Template


Traverse down to Data Design – Property Template
Right click on New Property Template

We will create the same set of metadata as in your Claim_Page document.
Repeat for the rest of the Claim_Page’s metadata
We will now create Document class Claim_Page and add the relevant property template.
Let’s create a Demo folder for Datacap to connect to.
4.1.2 Configuring Datacap’s rules to export

Let’s add a create PDF ruleset to your application

Once installed, you should see it in your application’s rulesets


Right Click on the pre-compiled ruleset Create TIFF or PDF
Add this ruleset to Export Task Profile

Let’s add Filenet ruleset to application too.

Let’s configure the Filenet Export Ruleset.


Click on Settings

Enter the following information


First click on Fastapp2 on the Document Hierarchy
Filenet URL: http://localhost:9080/wsi/FNCEWS40MTOM
User ID: p8admin
Password: Password1
Storage object id: OS1
Parent Folder: Demo

Next we will setup Document name


Click on Claim_Document on the hierarchy.
Enter the information according to the image below
Next we will configure the metadata
Click on each of the fields on the Document Hierarchy.
Enter the symbolic name of the metadata as you defined it in Filenet.
PS. Exclude City and State
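
The mapping configured here amounts to a lookup from Datacap field names to Filenet symbolic names, skipping the excluded fields. The Python sketch below only illustrates that idea; the function name, the sample values and the symbolic names are hypothetical, so use the symbolic names you actually defined in ACCE.

```python
from typing import Dict, Iterable


def build_export_properties(
    fields: Dict[str, str],
    symbolic_names: Dict[str, str],
    excluded: Iterable[str] = ("City", "State"),
) -> Dict[str, str]:
    """Translate extracted field values into {symbolic name: value}, skipping excluded fields."""
    excluded = set(excluded)
    properties = {}
    for field_name, value in fields.items():
        if field_name in excluded:
            continue
        if field_name not in symbolic_names:
            raise KeyError(f"no symbolic name configured for field {field_name!r}")
        properties[symbolic_names[field_name]] = value
    return properties


# Hypothetical values and symbolic names, for illustration only.
extracted = {"Employee_Name": "Mary Smith", "City": "Sample City", "Zip_Code": "28010"}
names = {"Employee_Name": "Employee_Name", "Zip_Code": "Zip_Code"}
print(build_export_properties(extracted, names))
```

Raising an error for an unmapped field is a design choice in this sketch: a missing symbolic name is easier to fix at configuration time than after a failed export.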
Add Filenet Export ruleset to Export Task Profile

Do launch Datacap Application Manager to enter Filenet Password: Password1


There is a small bug in this environment.
We will need to download the Microsoft WSE 3.0.msi installer and install it.
Click Next and install just the runtime.

4.1.3 You can test your application directly from Datacap Studio
Go to Test tab, right click on VScan and create a batch.
Click on Advance and process rules for PageID and Profiler

You will not process any rules for Verify, just click Advance to Export.
Click on Process Rules for the Export Step

Once done, you can check the exported document in Content Navigator.
You can download the file to have a quick view.
A PDF/A document has been generated.

You can also take a quick look at the Document Properties.


4.2 Summary
Congratulations! You have successfully captured a set of images, classified them, extracted the metadata and
exported the document as PDF into Filenet Content Manager.
There is still a lot more that you can do with Datacap.
Have a read of the Application Development Guide if you would like to learn more.
