IBM QualityStage V11.5.x Standardizing Data v2
B. Creating phonetic representations of data that can be used in the matching process
C. Creating transformation rules for matching and for creating the load file
The Standardize stage has only one output link. This link can send the raw input
and the standardized output to any other stage. The Standardize stage creates
multiple columns that you can send along with the input columns to the output
link. Any columns from the original input can be written to the output along with
additional data that the Standardize stage derives from the input data (such
as SOUNDEX or NYSIIS phonetic codes). The Match stage and other stages
can use the output from the Standardize stage; for example, you can use any of the
additional data for blocking and matching columns in Match stages (more on this
in 1.8, “Match stage” on page 82).
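QualityStage computes its SOUNDEX and NYSIIS codes internally. The following is a minimal Python sketch of the classic American Soundex algorithm, not QualityStage's own implementation, to show why a phonetic code works as a blocking column: names that sound alike receive the same code and therefore land in the same block. The helper names `soundex` and `block_by_soundex` are hypothetical, and NYSIIS is not shown.

```python
from collections import defaultdict

# Digit groups of the classic American Soundex algorithm.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}

def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]                          # the first letter is kept as-is
    prev = _SOUNDEX_CODES.get(letters[0], "")  # but its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                           # H and W do not separate equal digits
        digit = _SOUNDEX_CODES.get(ch, "")     # vowels and Y map to ""
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit                           # a vowel resets prev, so equal digits around it both count
    return code.ljust(4, "0")                  # pad short codes with zeros

def block_by_soundex(names):
    """Hypothetical blocking helper: group names whose Soundex codes agree."""
    blocks = defaultdict(list)
    for name in names:
        blocks[soundex(name)].append(name)
    return dict(blocks)

print(block_by_soundex(["Robert", "Rupert", "Ashcraft", "Tymczak"]))
# {'R163': ['Robert', 'Rupert'], 'A261': ['Ashcraft'], 'T522': ['Tymczak']}
```

Only records that share a block are compared in detail, which is why a coarse phonetic key keeps "Robert" and "Rupert" together while separating unrelated names.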
A. NYSIIS
B. Marlboro
C. USSUBURBS
D. EU
E. Soundex
A. Word Investigation Word Frequency reports are useful in implementing Classification Overrides
D. Classification overrides are available in domain pre-processor rule sets and in domain-specific
rule sets
Classifications
The Classification Table is used in the standardization process to identify and classify key words
such as titles, street names, street types, and directions. The Classification Table includes the name
of the rule set and the classification legend. Click CLS in Figure 1-33 to view its contents.
Example 1-1 on page 33 shows partial contents of the Classification Table for the USPREP rule set.
As already mentioned, you can gain valuable insight by browsing the Classification Tables
to determine the classification codes, as well as the different supported literals, such as ZQMIXAZQ
and ZQNAMEZQ.
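To show how a classification table turns free text into an input pattern such as the `^|D|+|T` pattern used later in this section, here is a simplified Python sketch. The table contents, the function names, and the fallback rules are assumptions for illustration: `^` for a numeric token and `+` for a single unclassified alphabetic word follow the pattern notation shown in this section, and `@` for any other token is an assumption. Real rule sets such as USADDR use far larger tables and more classes.

```python
# Hypothetical, heavily simplified classification table:
# token -> (standard form, one-character class)
CLASSIFICATION_TABLE = {
    "N": ("N", "D"),
    "NORTH": ("N", "D"),
    "AVE": ("AVE", "T"),
    "AVENUE": ("AVE", "T"),
    "ST": ("ST", "T"),
    "STREET": ("ST", "T"),
}

def classify(token: str):
    """Return (standard form, class) for one token."""
    t = token.upper()
    if t in CLASSIFICATION_TABLE:
        return CLASSIFICATION_TABLE[t]
    if t.isdigit():
        return (t, "^")  # numeric token
    if t.isalpha():
        return (t, "+")  # single unclassified alphabetic word
    return (t, "@")      # anything else; treated as complex/mixed (assumption)

def input_pattern(text: str):
    """Split on whitespace, classify each token, and join the classes with '|'."""
    classified = [classify(tok) for tok in text.split()]
    pattern = "|".join(cls for _, cls in classified)
    return pattern, classified

pattern, tokens = input_pattern("123 North Main Avenue")
print(pattern)  # ^|D|+|T
print(tokens)   # [('123', '^'), ('N', 'D'), ('MAIN', '+'), ('AVE', 'T')]
```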
Selecting override object types to modify rule sets
In the Rules Management window, as shown in Figure 1-33 on page 62, Overrides provides editing
windows for customizing rule sets to your business requirements.
IBM WebSphere QualityStage provides five methods of rule set overrides as follows:
Classification
You can modify the classification table of any rule set using the Designer client. Figure 1-34 on
page 65 and Figure 1-35 on page 66 show the classification table override for the domain-specific
USADDR rule set.
In the Input Token field, type the word (AVEDUE) whose classification you want to override, exactly
as it appears in the input file. In the Standard Form field, type the standardized spelling (AVE) of
the token.[8] From the Classification menu, select the one-character tag (T - Street Types) that
indicates the class of the token word. In the Comparison Threshold field, type a value (850)[9] that
defines the degree of uncertainty to tolerate in the spelling of the token word. Click Add
(Figure 1-34) to add the override to the pane at the bottom of the window, as shown in Figure 1-35
on page 66.
After you create the override (and provision it), the next time you run the rule set, the word
tokens are classified with the designations you specified and appear with the appropriate standard
form.
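The Comparison Threshold lets a token that is spelled slightly differently from a table entry still receive that entry's class. One of the quiz options in this section describes this as a second pass through the classification table that looks for a fuzzy match based on the threshold level. The Python sketch below assumes that two-pass behavior: an exact lookup first, then a fuzzy lookup that accepts an entry only when the similarity score reaches that entry's threshold. The score is an assumption: `difflib.SequenceMatcher.ratio()` scaled to 0-1000 stands in for QualityStage's own string comparison, so the numbers will not match what the product computes. The AVEDUE row mirrors the override described above; the AVENUE row and all function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical entries: token -> (standard form, class, comparison threshold).
# The AVEDUE row mirrors the override in Figure 1-34; the AVENUE row is invented.
CLASSIFICATIONS = {
    "AVEDUE": ("AVE", "T", 850),
    "AVENUE": ("AVE", "T", 900),
}

def similarity(a: str, b: str) -> int:
    """Stand-in 0-1000 score from difflib (NOT QualityStage's string comparison)."""
    return round(1000 * SequenceMatcher(None, a, b).ratio())

def classify_token(token: str):
    """Return (standard form, class) for token, or None if it stays unclassified."""
    t = token.upper()
    # Pass 1: exact lookup.
    if t in CLASSIFICATIONS:
        standard, cls, _ = CLASSIFICATIONS[t]
        return standard, cls
    # Pass 2: fuzzy lookup; an entry qualifies only if the score reaches its threshold,
    # and the best-scoring qualifying entry wins.
    best, best_score = None, -1
    for entry, (standard, cls, threshold) in CLASSIFICATIONS.items():
        score = similarity(t, entry)
        if score >= threshold and score > best_score:
            best, best_score = (standard, cls), score
    return best

print(classify_token("avedue"))  # ('AVE', 'T') via the exact pass
print(classify_token("AVEDU"))   # ('AVE', 'T') via the fuzzy pass: score 909 >= 850
print(classify_token("MAIN"))    # None: no entry reaches its threshold
```

A higher threshold makes the fuzzy pass stricter. With the stand-in score above, raising the AVEDUE threshold to 950 would leave "AVEDU" unclassified.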
With the input pattern override, you can specify token overrides that are based on the input
pattern. The input pattern overrides take precedence over the pattern-action file. Input pattern
overrides are specified for the entire input pattern.
Select one:
^|D|+|T
COPY [1] {HouseNumber}
COPY [2] {StreetPrefixDirectional}
COPY [3] {StreetName}
COPY [4] {StreetSuffixType}
EXIT
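The override above reads as: when the entire input pattern is `^|D|+|T`, copy token 1 into HouseNumber, token 2 into StreetPrefixDirectional, and so on, then exit. Because input pattern overrides take precedence over the pattern-action file, an override lookup happens first and the rule set's own logic runs only when no override matches. The Python sketch below models just that precedence. The dictionary layout, the `standardize` and `apply_pattern_action_file` names, and the `UnhandledPattern`/`UnhandledData` fallback keys are hypothetical, and the sketch does not distinguish original from standardized token values.

```python
# Hypothetical input pattern override: full pattern -> list of (token position, output column).
INPUT_PATTERN_OVERRIDES = {
    "^|D|+|T": [
        (1, "HouseNumber"),
        (2, "StreetPrefixDirectional"),
        (3, "StreetName"),
        (4, "StreetSuffixType"),
    ],
}

def apply_pattern_action_file(pattern, tokens):
    """Placeholder for the rule set's own pattern-action logic (not modeled here)."""
    return {"UnhandledPattern": pattern, "UnhandledData": " ".join(tokens)}

def standardize(pattern, tokens):
    """Apply an override for the whole pattern if one exists; otherwise fall back."""
    actions = INPUT_PATTERN_OVERRIDES.get(pattern)
    if actions is None:
        return apply_pattern_action_file(pattern, tokens)
    # Each COPY [n] {Column} copies token n (1-based) into Column; EXIT ends processing.
    return {column: tokens[position - 1] for position, column in actions}

print(standardize("^|D|+|T", ["123", "N", "MAIN", "AVE"]))
# {'HouseNumber': '123', 'StreetPrefixDirectional': 'N', 'StreetName': 'MAIN', 'StreetSuffixType': 'AVE'}
```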
Select one:
B. The second pass through the classification table looks for a fuzzy match based on the threshold
level
C. It is always required
D. The second pass through the classification table looks for an exact match
7. Which of the following statements about Overrides is TRUE?
Select one:
B. Overrides are used to customize rule sets by applying changes to the Pattern Action File
You can also apply rule sets for international stages such as Worldwide Address Verification and
Enhancement System (WAVES) and Multinational Standardize Stage (MNS). With all of these
stages, you can use rules management (that is, modify existing rules and add new rules).
Lookup Tables
Click Reference Tables in the Rules Management window in Figure 1-33 on
page 62 to view information about the rule set.
Select one:
A. Lookup table
B. Dictionary file
B. Text overrides must not include character sets with UTF-8 encoding
C. Unhandled Text Overrides only apply to short strings (less than 20 characters)
E. Text Overrides are used for special cases and specific handling of a string of text
E. It involves decomposing free-form fields into single-component fields and assigning data to its
appropriate metadata field
The Standardize stage processes the data with the following outcome:
- Creates fixed-column, addressable data
- Facilitates effective matching
- Enables output formatting
The Standardize stage parses free-form and fixed-format columns into
single-domain columns to create a consistent representation of the input data.
- Free-form columns contain alphanumeric information of any length, as long as
it is less than or equal to the maximum column length defined for that column.
- Fixed-format columns contain only one specific type of information, such as
only numeric, character, or alphanumeric, and have a specific format.