IBM QualityStage V11.5.x Standardizing Dat v2

The document discusses the standardization process in IBM QualityStage. It can identify new fields based on underlying data, such as setting flags. The standardize stage creates additional columns that can be used for blocking and matching. Phonetic codes like NYSIIS are used in matching. Classification overrides can modify rule sets and take precedence over classification tables. Rule sets can contain lookup tables that are called from pattern action files. The standardization process involves parsing free-form fields, assigning tokens to fields, and creating addressable output.

Uploaded by

Antonio Blanco
Copyright
© All Rights Reserved

IBM QualityStage V11.5.x Standardizing Data

1. The Standardize Stage identifies new fields based on underlying data. Which of the following are examples of this process?

Select one or more:

A. Setting a name-type flag to differentiate between an individual address and an organization address

B. Creating phonetic representations of data that can be used in the matching process

C. Creating transformation rules for matching and for creating the load file

D. Setting an address-type flag

The Standardize stage has only one output link. This link can send the raw input
and the standardized output to any other stage. The Standardize stage creates
multiple columns that you can send along with the input columns to the output
link. Any columns from the original input can be written to the output along with
additional data created by the Standardize stage based on the input data (such
as SOUNDEX or NYSIIS phonetic codes). The Match stage and other stages
can use the output from the Standardize stage: you can use any of the
additional data for blocking and matching columns in Match stages (more on this
in 1.8, “Match stage” on page 82).
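
To illustrate how a phonetic code can serve as a blocking or matching column, here is a minimal Python sketch of the standard American Soundex algorithm. It is not QualityStage's own implementation, and the function name `soundex` is only an illustrative choice; the point is that differently spelled names can collapse to the same four-character code.

```python
# Letter-to-digit groups of standard American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of name ("" if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")   # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = _CODES.get(ch, "")       # vowels and Y map to no digit
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]         # pad with zeros or truncate to 4 chars


# Two spellings that sound alike share one code, so they fall in the same block.
print(soundex("Robert"), soundex("Rupert"))
```

Records whose name columns produce the same code can then be compared with each other in a match pass, while records with different codes are never paired.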

2. Which of the following are phonetic codes used in Matching?

Select one or more:

A. NYSIIS

B. Marlboro

C. USSUBURBS

D. EU

E. Soundex

3. Which of the following are TRUE statements about Classification Overrides?

Select one or more:

A. Word Investigation Word Frequency reports are useful in implementing Classification Overrides

B. User Overrides cannot override or add token values

C. Classification overrides do not take precedence over a classification table

D. Classification overrides are available in domain pre-processor rule sets and in domain-specific rule sets

Classifications
The Classification Table is used in the standardization process to identify and classify key words
such as titles, street name, street type, and directions. The Classification Table includes the name
of the rule set and the classification legend. Click CLS in Figure 1-33 to view its contents. The
partial contents of the Classification Table for the USPREP rule set are shown in Example 1-1 on
page 33. As already mentioned, you can gain valuable insight by browsing the Classification tables
to determine the classification codes, as well as the different literals supported such as ZQMIXAZQ
and ZQNAMEZQ.

Selecting override object types to modify rule sets
In the Rules Management window as shown in Figure 1-33 on page 62, Overrides provides editing
windows to customize rule sets for your business requirements.
IBM WebSphere QualityStage provides five methods of rule set overrides as follows:
Classification
You can modify the classification table of any rule set using the Designer client. Figure 1-34 on
page 65 and Figure 1-35 on page 66 show the classification table override for the domain-specific
USADDR rule set.
In the Input Token field, type the word (AVEDUE) for which you want to override the classification
as it appears in the input file. In the Standard Form field, type the standardized spelling (AVE) of
the token. From the Classification menu, select the one-character tag (T- Street Types) that
indicates the class of the token word. In the Comparison Threshold field, type a value (850) that
defines the degree of uncertainty to tolerate in the spelling of the token word. Click Add in Figure
1-34 to add the override to the pane at the bottom of the window as shown in Figure 1-35 on page
66.
After you create the override (and provision it), the next time you run the rule set, the word
tokens are classified with the designations you specified and appear with the appropriate standard
form.
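
The effect of such an override can be sketched in Python as a lookup in which override entries are consulted before the classification table. This is only a conceptual model: the dictionaries, the `Classification` type, and the `classify` function are hypothetical names, the table rows are made-up samples, and the Comparison Threshold (fuzzy spelling tolerance) is deliberately left out.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    standard_form: str   # standardized spelling of the token
    token_class: str     # one-character class tag, e.g. "T" for street types


# Hypothetical sample rows standing in for a rule set's classification table.
CLASSIFICATION_TABLE = {
    "AVENUE": Classification("AVE", "T"),
    "STREET": Classification("ST", "T"),
}

# The override described in the text: input token AVEDUE, standard form AVE,
# class T (street type).
CLASSIFICATION_OVERRIDES = {
    "AVEDUE": Classification("AVE", "T"),
}


def classify(token: str) -> Optional[Classification]:
    """Look up a token, letting overrides take precedence over the table."""
    key = token.upper()
    if key in CLASSIFICATION_OVERRIDES:
        return CLASSIFICATION_OVERRIDES[key]
    return CLASSIFICATION_TABLE.get(key)  # None when the token is unclassified


print(classify("AVEDUE"))
print(classify("Street"))
```

Without the override entry, AVEDUE would come back unclassified in this model; with it, the token receives the standard form AVE and class T.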

With the input pattern override, you can specify token overrides that are based on the input
pattern. The input pattern overrides take precedence over the pattern-action file. Input pattern
overrides are specified for the entire input pattern.

4. Where can you launch the SRD?

Select one or more:

A. From the Director client

B. From any web browser, with the exception of Firefox

C. From the Information Server Launch Pad

D. From the QualityStage Rules Management dialog in the Designer Client

5. Which statement below is a valid Pattern Action File statement?

Select one:

A. COPY_A [3] (StreetType}

B. COPY_S [2] {StreetName)

C. COPY [1] (HouseNumber)

D. COPY [1] {HouseNumber}

^|D|+|T
COPY [1] {HouseNumber}
COPY [2] {StreetPrefixDirectional}
COPY [3] {StreetName}
COPY [4] {StreetSuffixType}
EXIT
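
The pattern-action set above can be read as: when the token classes match `^|D|+|T`, copy each numbered operand into the named output field, then stop. The Python sketch below models that reading only. The function `apply_pattern`, the `(class, value)` token pairs, and the sample address are hypothetical, and the class meanings (`^` numeric, `D` directional, `+` unclassified word, `T` street type) are assumed for illustration.

```python
def apply_pattern(pattern, tokens, copy_actions):
    """Return the output fields if the token classes match pattern, else None.

    pattern      -- class pattern such as "^|D|+|T"
    tokens       -- list of (token_class, value) pairs from the parsed input
    copy_actions -- list of (operand_number, field_name); operands are 1-based,
                    mirroring COPY [n] {FieldName}
    """
    expected = pattern.split("|")
    classes = [token_class for token_class, _ in tokens]
    if classes != expected:
        return None  # this pattern does not apply to the input
    # Returning here plays the role of EXIT: no further patterns are tried.
    return {field: tokens[n - 1][1] for n, field in copy_actions}


# Hypothetical tokenized input for the address "123 N MAIN ST".
tokens = [("^", "123"), ("D", "N"), ("+", "MAIN"), ("T", "ST")]

actions = [
    (1, "HouseNumber"),
    (2, "StreetPrefixDirectional"),
    (3, "StreetName"),
    (4, "StreetSuffixType"),
]

result = apply_pattern("^|D|+|T", tokens, actions)
print(result)
```

Note that the operand number appears in square brackets and the output field in curly braces, matching the statements in the example above.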

6. Which of the statements below is TRUE about the Comparison Threshold?

Select one:

A. It cannot be used in the Classification Table

B. The second pass through the classification table looks for a fuzzy match based on the threshold level

C. It is always required

D. The second pass through the classification table looks for an exact match

7. Which of the following statements about Overrides is TRUE?

Select one:

A. Overrides cannot be tested with Rules Analyzer

B. Overrides are used to customize rule sets by applying changes to the Pattern Action File

C. Overrides are used to correct problems found during standardization

D. Administrator status is always required to create Overrides

8. Which of the following are methods used to standardize international data?

Select one or more:

A. Use a country pre-processor with a domain pre-processor and domain-specific rules

B. Use a default country code designated by ZC…default value…ZC

C. Use a Multinational Standardize or Address Verification Interface

D. Use a four-byte ISO country code

You can also apply rule sets for international stages such as Worldwide Address Verification and
Enhancement System (WAVES) and Multinational Standardize Stage (MNS). With all of these
stages, you can use rules management (that is, modify existing rules and add new rules).

9. Which statements below are TRUE about Lookup Tables?

Select one or more:

A. QualityStage does not use Lookup Tables

B. Rule sets are being phased out in recent versions of QualityStage

C. Rule sets can contain Lookup Tables

D. They are called from the Pattern Action File

Lookup Tables
Click Reference Tables in the Rules Management window in Figure 1-33 on
page 62 to view information about the rule set.

10. Which statements are TRUE about Rule set revision?

Select one or more:

A. Unpublished changes can be used in the Standardize stage

B. Changes are saved in the SRD database

C. It is a way to save and revert changes to rule sets

D. You can roll back changes by resetting a revision

11. When using Rule Sets, which of the following is an optional file?

Select one:

A. Lookup table

B. Dictionary file

C. Pattern action file


D. Classification table

12. Which statements are TRUE about Text Overrides?

Select one or more:

A. Input Text Overrides apply to the original text string

B. Text overrides must not include character sets with UTF-8 encoding

C. Unhandled Text Overrides only apply to short strings (less than 20 characters)

D. Text Overrides can use partial string matching

E. Text Overrides are used for special cases and specific handling of a string of text

13. Which of the following are TRUE about the Standardization Transformation process?

Select one or more:

A. It may execute a Dictionary File script

B. It may use a comparison threshold for classifying like words

C. It may involve parsing free-form fields

D. It may involve bucketing data tokens

E. It involves decomposing free-form fields into single-component fields and assigning data to its appropriate metadata field

The Standardize stage processes the data with the following outcomes:
- Creates fixed-column, addressable data
- Facilitates effective matching
- Enables output formatting
The Standardize stage parses free-form and fixed-format columns into
single-domain columns to create a consistent representation of the input data.
- Free-form columns contain alphanumeric information of any length as long as
  it is less than or equal to the maximum column length defined for that column.
- Fixed-format columns contain only one specific type of information, such as
  only numeric, character, or alphanumeric, and have a specific format.
