Reading Sample
In this selection from Chapter 5, you'll get a taste of some of the main building blocks that make jobs run in SAP Data Services. This chapter provides both the information you need to understand each of the objects and helpful step-by-step instructions to get them set up and running in your system.
Introduction
Objects
Contents
Index
The Authors
Bing Chen, James Hanck, Patrick Hanck, Scott Hertel, Allen Lissarrague, Paul Médaille
www.sap-press.com/3688
© 2015 by Rheinwerk Publishing, Inc. This reading sample may be distributed free of charge. The file may not be altered in any way, nor may individual pages be removed. Use for any commercial purpose other than promoting the book is strictly prohibited.
Introduction
Welcome to SAP Data Services: The Comprehensive Guide. The mission of this book
is to capture information in one source that will show you how to plan, develop,
implement, and perfect SAP Data Services jobs to perform data-provisioning processes simply, quickly, and accurately.
This book is intended for those who have years of experience with SAP Data Services (which we'll mainly refer to as just Data Services) and its predecessor, Data
Integrator, as well as those who are brand new to this toolset. The book is
designed to be useful for architects, developers, data stewards, IT operations, data
acquisition and BI teams, and management teams.
and other key functionality for organizations' EIM strategy and solutions. Finally, the book looks at the outlook for Data Services.
With these four parts, the aim is to give you a navigation panel for where you should start reading first. Thus, if your focus is more operational in nature, you'll likely focus on Part I, and if you're a developer new to Data Services, you'll want to focus on Parts II and III.
The following is a detailed description of each chapter.
Chapter 1, System Considerations, leads technical infrastructure resources,
developers, and IT managers through planning and implementation of the Data
Services environment. The chapter starts with a look at identifying the requirements of the Data Services environments for your organization over a five-year
horizon and how to size the environments appropriately.
Chapter 2, Installation, is for technical personnel. We explain the process of
installing Data Services on Windows and Linux environments.
Chapter 3, Configuration and Administration, reviews components of the landscape and how to make sure your jobs are executed in a successful and timely
manner.
Chapter 4, Application Navigation, walks new users through a step-by-step process of creating a simple batch job using each of the development environments in Data Services.

Chapter 5, Objects, dives into the primary building blocks that make a job run. Along the way, we provide hints to make jobs run more efficiently and make them easier to maintain over their useful life expectancy and then some!

Chapter 13, SAP Information Steward, is for stewardship and developer resources and explores common functionality in Information Steward that enables transparency and trust in the data provisioned by Data Services, as well as how the two technology solutions work together.

Chapter 14, Where Is SAP Data Services Headed?, explores the potential future of Data Services.
You can also find the code that's detailed in the book available for download at
www.sap-press.com/3688.
Common themes presented in the book include simplicity in job creation and the use of standards. Through simplicity and standards, we're able to quickly hand jobs off to a support team or to comprehend how jobs from the development team work. Best practice is to design and build jobs with the perspective that someone will have to support them while half asleep in the middle of the night, or from the other side of the world with limited or no access to the development team. The goal of this book is to empower you with the information you need to create and improve your organization's data-provisioning processes through simplicity and standards, ultimately achieving efficiency in cost and scale.
This chapter dives into the primary building blocks that make a job run.
Along the way, we provide hints to make jobs run more efficiently and
make them easier to maintain over their useful life expectancy and then
some!
Objects
Now that you know how to create a simple batch integration job using the two
integrated development environments (IDEs) that come with SAP Data Services,
we dive deeper and explore the objects available to specify processes that enable
reliable delivery of trusted and timely information. Before we get started, we
want to remind you about the Data Services object hierarchy, illustrated in Figure
5.1. This chapter traverses this object hierarchy and describes the objects in
detail.
As you can see from the object hierarchy, the project object is at the top. The project object is a collection of jobs that is used as an organization method to create
job groupings.
Note
A job can belong to multiple projects.
We'll spend the remainder of this chapter discussing the commonly used objects
in detail.
Figure 5.1 Data Services Object Hierarchy (Project, Job, Workflow, Data Flow, Datastore, Formats, Transforms, and the other objects discussed in this chapter)
5.1 Jobs

When Data Services is considered as an integration solution (under any of its predecessor names, such as BODS, BODI, DI, or Acta), it's often thought of only as an extraction, transformation, and load (ETL) tool in which a job is scheduled, runs, and finishes. Although this use case describes only the batch job, in addition to the batch job object type there is also a real-time job object type. The job object is really two distinct objects: the batch job object and the real-time job object. We break each of these objects down in detail in the following sections.

5.1.1 Batch Job Object

You might be thinking that a batch job sounds old, decrepit, and maybe even legacy mainframe-ish. In a way, a batch job is like a legacy mainframe process.

The following execution properties control how a batch job runs and what information is captured about the run.

Monitor Sample Rate
This specifies the polling, in seconds, of how often you want the logs to capture status about sourcing, targeting, and transforming information. Specifying a small number captures information more often; specifying a large number consumes fewer system resources but lets more time pass before you're presented with errors in the logs.

Trace Messages: Printing All Trace Messages versus Specifying One by One
By specifying Print All Trace Messages, you ignore the individually selected trace options. With all traces being captured, the results can be quite verbose, so this option shouldn't be used when diagnosing an issue, although it does present a simple method of capturing everything. Alternatively, if Print All Trace Messages is unchecked, you can specify individually which traces you want captured.

Disable Data Validation Statistics Collection
If your job contains data flows with validation transforms, selecting this option will forgo collecting those statistics.

Enable Auditing
If you've set audits via Data Services Designer (e.g., a checksum on a field in a table source within a data flow), this option enables you to toggle the capture of that information on and off.

Collect Statistics for Optimization and Use Collected Statistics
By running your batch job with Collect statistics for optimization set, cache sizes are evaluated for each row processed by the contained data flows, so jobs executed in production shouldn't be scheduled to always collect statistics for optimization. After statistics have been collected and the job is executed with the Use collected statistics option, the collected statistics are used to determine whether to use In Memory or Pageable caches, as well as the cache size.

Collect Statistics for Monitoring
This option captures the information pertaining to caching type and size into the log.

Job Server or Server Group
A job can be executed from any job server linked with the local repository in which the job is stored. A listing of these job servers is presented in the dropdown list box. If one or more server groups have been specified to enable load balancing, they too are listed as options.

Note
Job servers within a job server group collect load statistics at 60-second intervals to calculate a load balance index. The job server with the lowest load balance index is selected to execute the next job.

Distribution Level
If you've selected for the job to be executed on a server group, the job execution can be processed on multiple job servers depending on the distribution level chosen. The choices include the following:

Job Level
No distribution among job servers.
There are also execution options that enable you to control how the job will perform and pick up after a failed run. We discuss these in the following subsections.
Enable Recovery and Recover Last Failed Execution
By selecting Enable Recovery, a job will capture results to determine where it was last successful and where it failed. If a job fails with Enable Recovery set, a subsequent run can be performed with Recover from last failed execution that picks up at the beginning of the last failed object (e.g., a workflow, data flow, script, or any of a slew of functions).

Note
A workflow's Recover as a unit option will cause Recover from last failed execution to start at the beginning of that workflow instead of at the last failed object within that workflow.
5.1.2 Real-Time Job Object

Like a batch job, a real-time job object has execution properties that enable differing levels of tracing and some of the execution options. Unlike a batch job, a real-time job is started and typically stays running for hours or days at a time. This doesn't mean real-time jobs are slow; rather, they stay running as a service and respond to numerous requests in a single run. The rest of this section shows how to create a real-time job, execute it, and finally make requests against it.

To create a real-time job from the Data Services Designer IDE, right-click in the Project Area or on the Jobs tab of the local repository and choose New Real-time Job, as shown in Figure 5.2.
The new job opens with process begin and end items. These two items segment the job into three sections: initialization (prior to process begin), the real-time processing loop (between process begin and end), and cleanup (after process end), as shown in Figure 5.3.

Figure 5.3 Real-Time Job Logical Sections

If we were to replace the annotations with objects that commonly exist in a real-time job, it might look like Figure 5.4. In this figure, you can see that initialization, the real-time processing loop, and cleanup are represented by SC_Init, DF_GetCustomerLocation, and SC_CleanUp, respectively.

Within the DF_GetCustomerLocation data flow, you'll notice two sources and one target. Although there can be many objects within a real-time job's data flow, it's required that there be at least one XML Message Source object and one XML Message Target object.

A real-time job can even be executed via Data Services Designer the same way you execute a batch job: by pressing (F8) (or choosing the menu option Debug - Execute) while the job window is active. Even though real-time jobs can be executed this way in development and testing situations, execution in production implementations is managed by setting the real-time job to execute as a service via the access server within the Data Services Management Console.

Note
For a real-time job to be executed successfully via Data Services Designer, the XML Message Source and XML Message Target objects within the data flow must have their XML test file specified, and all sources must exist.
To set up the real-time job to execute as a service, navigate to the Real-Time Services folder, shown in Figure 5.5, within the access server under which you want it to run.

After the configuration has been applied, you can start the service by clicking the Real-Time Services folder, selecting the newly added service, and then clicking Start (see Figure 5.7). The status changes to Service Started.

Now that the service is running, it needs to be exposed. To do this, navigate to the Web Services Configuration page, select Add Real-time Service in the combo list box at the bottom, and then click Apply (see Figure 5.8).

Figure 5.8 Web Services Configuration
The status of the web service can now be viewed from the Web Services Status tab, where it appears as a published web service (see Figure 5.10). The web service is now ready to accept web service requests.

Note
At this point, you can also view the history log from the Web Services Status page, where you can see the status of the initialization and processing steps. The cleanup process status isn't included because the service is still running and hasn't yet been shut down.

To test the web service externally from Data Services, a third-party product such as SoapUI can be used. After importing the web service's Web Services Description Language (WSDL) definition into the third-party product, a request can be created and executed to generate a response from your real-time job. A sample request and response are shown in Figure 5.11 and Figure 5.12.

The rest of this chapter focuses on the remaining objects from Figure 5.1 that job objects rely on to enable their functionality and support their processing.
5.2 Workflow
A workflow and a job are very similar. As shown in the object hierarchy, many of the same objects that interact with a job can optionally also interact with a workflow, and both can contain objects and be configured to execute in a specified order of operation. The following are the primary differences between a job and a workflow:

Although both have variables, a job has global variables, and a workflow has parameters; a small sketch of this distinction follows the list.

A job can be executed, whereas a workflow must ultimately be contained in a job to be executed.

A workflow can contain other workflows and even recursively call itself, although a job isn't able to contain other jobs.

A workflow has the Recover as a unit option, described later in this section, as well as the Continuous option, also described later in this section.
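To make the variable distinction concrete, here is a minimal sketch in the Data Services scripting language. The names used ($G_LoadDate and $P_LoadDate) are assumptions for illustration only, and the actual mapping of a global variable to a workflow parameter is done on the Calls tab of the Variables and Parameters window rather than in script.

# Script at the start of the job: populate a global variable declared on the job.
$G_LoadDate = sysdate();

# Objects inside a workflow reference the workflow's own parameter instead,
# for example $P_LoadDate, which is declared on the workflow and mapped to
# $G_LoadDate when the workflow is called.
print('Processing load date: ' || to_char($P_LoadDate, 'YYYY.MM.DD'));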
You might wonder why you need workflows when you can just specify the order of operations within a job. In addition to some of the functionality mentioned later in this section, the use of workflows in an organization will depend heavily on the organization's standards and design patterns. Such design patterns might be influenced by the following:

Organizations that leverage an enterprise scheduling system sometimes prefer workflow logic to be built into the scheduler's job streams instead of into the jobs it calls. In such instances, a rule can emerge that each logical unit of work is a separate job. This translates into each job having one activity (e.g., a data flow to effect the insertion of data from table A to table B). On the other side of the spectrum, organizations that build all their workflow logic into Data Services jobs might have a single job provision customer data from the source of record to all pertinent systems. This latter design pattern might use a separate workflow for each pertinent system so that logic can be compartmentalized to enable easier support and/or rerunability.

Organizations may have unique logging or auditing activities that are common across a class of jobs. In such cases, rather than rebuilding that logic over and over again in each job, your best option is to create one workflow in a generic manner to contain that logic so that it can be written once and used by all jobs within that class, as sketched below.
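As a sketch of what such a shared activity might contain, the following script could sit inside a common workflow (e.g., CWF_JobAudit) to record the start of a job in an audit table. The datastore DS_AUDIT, the JOB_AUDIT table, and the global variables are assumptions for illustration; a real implementation would use your own audit structures.

# Hypothetical audit script inside a common workflow. The curly braces substitute
# the variable values into the SQL string and add the surrounding quotes.
$G_JobName = job_name();
$G_StartTime = to_char(sysdate(), 'YYYY.MM.DD HH24:MI:SS');
$G_Status = 'STARTED';
sql('DS_AUDIT', 'INSERT INTO JOB_AUDIT (JOB_NAME, START_TIME, STATUS) VALUES ({$G_JobName}, {$G_StartTime}, {$G_Status})');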
Recommendation
Perform a yearly review of the jobs released into the production landscape. Look for
opportunities for simplification where an activity has been written multiple times in
multiple jobs. These activities can be encapsulated within a common workflow. Doing
so will make future updates to that activity easier to implement and ensure that all consuming jobs of the activity get the update. When creating the workflows to contain the
common activity, make sure to denote that commonality with the name of the workflow. This is often done with a CWF_ prefix instead of the standard WF_.
5.2.1 Areas of a Workflow
A workflow has two tabs, the General tab and the Continuous Options tab. In addition to the standard object attributes such as name and description, the General tab also allows specification of workflow-only properties, as shown in the following list and in Figure 5.13:
1 Execution Type
There are three workflow execution types:
Regular is a traditional encapsulation of operations and enables subworkflows.
Continuous is a workflow introduced in Data Services version 4.1 (this is
discussed in detail in Section 5.2.2).
Single is a workflow that specifies that all objects encapsulated within it are
to execute in one operating system (OS) process.
2 Execute only once
Use this option in the rare case where a workflow has been included in a job more than once and only one execution of the workflow is required. You may be wondering when you would ever include the same workflow in the same job twice. Although rare, this comes in handy when you have a job with parallel processes, the workflow contains an activity that more than one subprocess depends on, and it varies which subprocess finishes first. In that case, you would include the workflow prior to each subprocess and check the Execute only once checkbox on both instances. During execution, the first instance to execute will perform the operations, and the second will be bypassed.
3 Recover as a unit
This forces all objects within a workflow to reexecute when a job executed with the Enable recovery option fails and is then reexecuted with the Recover from last failed execution option. If the job is executed with Enable recovery and the workflow doesn't have the Recover as a unit option selected, the job restarts in recovery mode from the last failed
object within that workflow (assuming the prior failure occurred within the
workflow).
4 Bypass
This option was introduced in Data Services 4.2 SP3 to enable certain workflows (and data flows) to be bypassed by passing a value of YES. Its stated purpose is to facilitate testing processes where not all activities are required; it isn't intended for production use.
5.2.2 Continuous Workflow

Setting the stage: an organization wants a process that waits for some event to occur before it kicks off a set of activities and then, upon finishing those activities, waits for the next occurrence of the event so it can repeat. This needs to happen from some start time to some end time. To provide a standard, highly efficient way to do this, Data Services 4.1 introduced the Continuous workflow execution type. After you set the execution type to Continuous, the Continuous Options tab becomes enabled (see Figure 5.14).

Prior to Data Services 4.1, this functionality could be accomplished by coding your own polling process to check for an event occurrence, perform some activities, and then wait for a defined interval. Although the same functional requirements can be met without a continuous workflow, the technical performance and resource utilization are greatly improved by using one. As shown in Figure 5.15, the instantiation and cleanup processes only need to occur once (or as often as resources are released), whereas, as shown in Figure 5.16, with a regular workflow in a loop the instantiation and cleanup processes need to occur for each cycle, resulting in higher system resource utilization.
[Figures 5.15 and 5.16: workflow and data flow instantiation, start, loop, completion, and cleanup steps for a continuous workflow versus a regular workflow in a loop]
5.3 Logical Flow Objects

Logical flow objects are used to map business logic in jobs and workflows. These objects are available in the job and workflow workspaces; they are not available in the data flow workspace. The logical flow objects consist of the conditional, the while loop, and the try and catch block.

5.3.1 Conditional
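A conditional evaluates a single if expression and executes the objects in its Then workspace when the expression is true, or the objects in its Else workspace otherwise. As a minimal sketch (the global variable $G_TriggerFile is an assumption for illustration), the if expression and a possible Else script might look like this:

# If expression defined on the conditional object:
file_exists($G_TriggerFile) <> 0

# Then workspace: the data flow(s) that process the file.
# Else workspace: for example, a script that records that nothing arrived.
print('No trigger file found: ' || $G_TriggerFile);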
5.3.2 While Loop

The while loop has two components, a condition and a workspace area. When the while condition evaluates to true, and for as long as it remains true, the objects within its workspace are repeatedly executed.
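As a minimal sketch (the global variables and the ten-attempt limit are assumptions for illustration), a polling while loop could be set up with the condition defined on the while loop object and a script placed inside its workspace alongside the processing objects:

# Condition defined on the while loop object:
# ($G_FileFound = 0) AND ($G_Attempts < 10)

# Script inside the while loop workspace, executed on every iteration:
$G_FileFound = file_exists($G_TriggerFile);
if ($G_FileFound = 0)
begin
   sleep(60000);   # wait 60 seconds before the next check
   $G_Attempts = $G_Attempts + 1;
end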
5.3.3 Try and Catch Blocks

The try and catch objects, also collectively referred to as a try and catch block, are two separate objects that are always used as a pair. The try object is dragged onto the beginning of a sequence of objects in a job or workflow, and a catch object is dragged to where you want the execution encapsulation to end. If an error is raised in any of the sequenced objects between these two, execution moves directly to the catch object, where a new sequence of steps can be specified.
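Within the catch object you select which exception groups to catch and define a workspace of recovery steps; a script there can read details about the error through the catch-only error functions. A minimal sketch (the message wording is an assumption) might be:

# Script placed inside the catch object's workspace:
print('Job ' || job_name() || ' caught an error: ' || error_message());
print('Error context: ' || error_context());

# Optionally re-raise so the job still ends in error after the handling steps.
raise_exception('Stopping after handled error: ' || error_message());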
5.4 Data Flows
Data flows are collections of objects that represent source data, transform objects, and target data. As the name suggests, a data flow also defines the path the data takes through that collection of objects.
Data flows are reusable and can be attached to workflows to join multiple data
flows together, or they can be attached directly to jobs.
There are two types of data flows: the standard data flow and the ABAP data flow. The ABAP data flow is an object that can be called from a standard data flow. The advantage of the ABAP data flow is that it integrates with SAP applications for better performance. A prerequisite for using ABAP data flows is to have transport files in place on the source SAP application.
The following example demonstrates the use of a standard data flow and an ABAP
data flow. We'll create a standard data flow that has an ABAP data flow that
accesses an SAP ECC table, a pass-through Query transform, and a target template
table.
</ENTRY>
<ENTRY>
<SELECTION>11</SELECTION>
<PRIMARY_NAME1>TRINITY PL</PRIMARY_NAME1>
</ENTRY>
</SUGGESTION_LIST>
Listing 5.1 SUGGESTION_LIST Output Field
5.6 Datastores
In addition to connecting to relational databases, as explored in Chapter 3 and Chapter 4, datastores are also able to source and target applications such as SAP Business Warehouse (SAP BW). A new datastore type introduced in Data Services version 4.2, RESTful web services, will also be explored in this section.
5.6.1 SAP BW Source Datastores
Because different objects are exposed on the SAP BW side for reading and loading data, as shown in Figure 5.109, Data Services uses two different types of datastores: an SAP BW source datastore for reading and an SAP BW target datastore for loading.
user ID and not a dialog user ID. The SAP Application Server name and SAP
Gateway Hostname fields correspond with the server name of your SAP BW
instance. The Client Number and System Number also correspond with those of
your SAP BW instance.
[Figure 5.109: Data Services reads from and loads to SAP BW objects (DataSource/PSA, InfoCube, InfoObjects) through the Designer, Management Console, job server, RFC server, and Staging BAPI]
There are several steps to setting up a job to read data from SAP BW. We'll detail
them all here.
5.6.2 SAP BW Target Datastore
If you'll be loading data into SAP BW, there are five main steps, which we discuss
in the following subsections:
1. Set up InfoCubes and InfoSources in SAP BW.
2. Designate Data Services as a source system in SAP BW.
3. Create an SAP BW target datastore in Data Services.
4. Import metadata with the Data Services datastore.
5. Construct and execute a job.
Setting Up the InfoCubes and InfoSources in SAP BW with the SAP Data
Warehousing Workbench
An InfoSource will be used to hold the data that is loaded from Data Services. You create and activate an InfoSource using the following steps:
1. In the Modeling section of the SAP Data Warehousing Workbench, go to the InfoSource window (InfoSources tab).
2. Right-click InfoSources at the top of the hierarchy, and select Create Application Component. (Application components are tree structures used to organize InfoSources.)
3. Complete the window that appears with the appropriate information. For Application Comp, for example, you might enter DSAPPCOMP; for Long description, you might enter Data Services application component. Press (Enter).
4. The application component is created and appears in the hierarchy list. Right-click the name of your new application component in the component list, and select Create InfoSource.
5. The Create InfoSource: Select Type window appears. Select Transaction data as the type of InfoSource you're creating, and press (Enter).
6. The Create InfoSource (transaction data) window appears. Enter the appropriate information, and press (Enter). The new InfoSource appears in the hierarchy under the application component name.
InfoCubes should be created and activated where the extracted data will ultimately be placed.
1. In the Data Services Designer, right-click in the Datastore tab of the Local Object Library and select New.
2. Within the Create New Datastore editor, select SAP BW Target, as shown in Figure 5.113.
3. Click the Advanced button to open the lower editor window.
4. Enter the Client number and System number.
5. Click OK to complete the datastore creation.

5.6.3 RESTful Web Services

RESTful web services have enjoyed increased popularity in recent years over other communication methods such as the Simple Object Access Protocol (SOAP). One of the reasons REST has become more popular is its simplicity: it uses URL parameters and standard HTTP methods to communicate with the web application to retrieve and manipulate data. In addition, REST supports a wider range of return data formats, such as JavaScript Object Notation (JSON), XML, and plain text. REST is also more efficient when communicating with the web application.

For example, the following Google Maps Geocoding API request returns the XML response shown in Listing 5.2:

http://maps.googleapis.com/maps/api/geocode/xml?address=Chicago
<?xml version="1.0" encoding="UTF-8"?>
<GeocodeResponse>
<status>OK</status>
<result>
<type>locality</type>
<type>political</type>
<formatted_address>Chicago, IL, USA</formatted_address>
<address_component>
<long_name>Chicago</long_name>
<short_name>Chicago</short_name>
<type>locality</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Cook County</long_name>
<short_name>Cook County</short_name>
<type>administrative_area_level_2</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Illinois</long_name>
<short_name>IL</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
<address_component>
<long_name>United States</long_name>
<short_name>US</short_name>
<type>country</type>
<type>political</type>
</address_component>
<geometry>
<location>
<lat>41.8781136</lat>
<lng>-87.6297982</lng>
</location>
<location_type>APPROXIMATE</location_type>
<viewport>
<southwest>
<lat>41.6443349</lat>
<lng>-87.9402669</lng>
Listing 5.2 Google Maps API Sample XML Output
5.6.4 Using RESTful Applications in Data Services
Beginning with version 4.2, Data Services allows for communicating with a REST-based application through the use of a web service datastore object. The available
functions from the REST application can then be imported into the repository and
called from within data flows. For example, you may have a need to perform a
lookup on key master data from a REST application managing master data. The
function call can be done in a Query transform, and the data returned by the REST
function call can then be used in your data flow.
To configure a REST web service datastore object, a Web Application Description
Language (WADL) file is required. A WADL file is an XML-formatted file that
describes the functions and their corresponding parameters available from the
REST application. These functions typically use the HTTP methods GET, PUT, POST,
and DELETE. Listing 5.3 shows a sample WADL file describing a REST application
that has various functions which allow for the retrieval and manipulation of container and item data. Each of the available functions is based on a standard HTTP
method.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<application xmlns="http://research.sun.com/wadl/2006/10">
<doc xmlns:jersey="http://jersey.dev.java.net/"
jersey:generatedBy="Jersey: 1.0-ea-SNAPSHOT 10/02/2008 12:17 PM"/>
<resources base="http://localhost:9998/storage/">
<resource path="/containers">
<method name="GET" id="getContainers">
<response>
<representation mediaType="application/xml"/>
</response>
</method>
<resource path="{container}">
<param xmlns:xs="http://www.w3.org/2001/XMLSchema"
type="xs:string" style="template" name="container"/>
<method name="PUT" id="putContainer">
<response>
<representation mediaType="application/xml"/>
</response>
</method>
<method name="DELETE" id="deleteContainer"/>
<method name="GET" id="getContainer">
<request>
<param xmlns:xs="http://www.w3.org/2001/XMLSchema"
type="xs:string" style="query" name="search"/>
</request>
<response>
<representation mediaType="application/xml"/>
</response>
</method>
<resource path="{item: .+}">
<param xmlns:xs="http://www.w3.org/2001/XMLSchema"
type="xs:string" style="template" name="item"/>
<method name="PUT" id="putItem">
<request>
<representation mediaType="*/*"/>
</request>
<response>
<representation mediaType="*/*"/>
</response>
</method>
<method name="DELETE" id="deleteItem"/>
<method name="GET" id="getItem">
<response>
<representation mediaType="*/*"/>
</response>
</method>
</resource>
</resource>
</resource>
</resources>
</application>
In addition to the WADL file, the web service datastore object can be configured for security and encryption as well as configured to use XML- or JSON-formatted data. After the web service datastore has been created, the available functions can be browsed through the Datastore Explorer in the Data Services Designer application. Figure 5.114 shows the view from the Datastore Explorer in Data Services Designer for the same REST application with functions for manipulating container and item data. Notice that all seven functions from the WADL file are present, along with a nested structure for the functions.

Any of the available functions can then be imported into the repository through the Datastore Explorer. Once imported, they become ready for use in data flows. Figure 5.115 shows the getItem function from our WADL file with both a request schema and a reply schema defined. Notice that the function is based on the HTTP method GET. The request schema includes the available input parameters for the function, which in this case is the parameter item. The reply schema includes the XML- or JSON-formatted data in addition to HTTP error codes (AL_ERROR_NUM) and error messages (AL_ERROR_MSG). These fields can then be integrated and used in downstream data flow processing logic.
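As a hedged illustration of that downstream logic, an expression such as the following could be used in a Validation transform rule or a Case transform to route only successful replies onward; the schema name REPLY is an assumption based on the reply schema shown in Figure 5.115.

# Validation rule or Case expression: treat the call as successful only when
# no HTTP error code was returned by the REST function call.
REPLY.AL_ERROR_NUM = 0

# Failing rows can be routed to an error target together with REPLY.AL_ERROR_MSG.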
5.7 File Formats
Just as datastore structures provide metadata regarding column attribution for databases and applications, file formats provide the same for files. We'll go over the different file format types and explain how to use them in the following sections.
5.7.1 Flat File Format
To use a flat file as a source or target data source, it first needs to exist in the local repository and be visible in the Local Object Library area of the Data Services Designer. The object is a template rather than a specific file. After the object is added to a data flow, it can be configured to represent a specific file or set of files. We'll step through that configuration later in this section, but first we'll create the file format template.
The first step in creating a flat file format template is to right-click Flat Files and select New, as shown in Figure 5.116. The Flat Files objects are located in the Data Services Designer, in the Local Object Library area, in the File Format section.
In the left-hand column, the first option to consider is the type of flat file. The default choice is Delimited, which is appropriately the typical selection. The other choices include Fixed Width, in which the data fields have fixed lengths; these files tend to be easier to read but can be inefficient because they store empty space for values smaller than the defined widths. The SAP Transports type is used when you want to create your own SAP application file format; this is the case if the predefined Transport_Format isn't suitable to read from or write to a file in an SAP data flow. The last two types are Unstructured Text and Unstructured Binary. Unstructured Text is used to consume unstructured files such as text files, HTML, or XML. Unstructured Binary is used to read unstructured binary files such as Microsoft Office documents (Word, Excel, PowerPoint, Outlook emails), generic .eml files, PDF files, and other binary files (Open Document, Corel WordPerfect, etc.). You can then use these sources with Text Data Processing transforms to extract and manipulate data.
The second configuration option is Name. It can't be blank, so a generated name is entered by default. Assuming you want a more descriptive name to reference, you'll need to change the name prior to saving. After the template is created, the Name can't be changed.
The third item of the Delimited file is Adaptable Schema. This is a useful option when processing several files whose formats aren't consistent. For example, you're processing 10 files, and some have a comments column at the end, whereas others do not. In this case, you'll receive an error because your schema is expecting a fixed number of elements and finds either more or fewer. With Adaptable Schema set to Yes, you define the maximum number of columns. For the files with fewer columns, null values are inserted into the missing column placeholders. There is an assumption here that the missing columns are at the end of the row and that the order of the fields isn't changed.
The File Format Editor creates the template based on the data in the specified file and displays the results in the right-hand section, as shown in Figure 5.119. Here are a couple of things to remember when using this method:

The default type of delimiter is Comma. If your results show only one column of combined data, adjust the Column value in the Delimiters section as necessary.

If the first rows of the files are headers, you can select Skip row header to change the column names to the headers. If you want the headers written when using the template as a target, select Yes for the Write row header option.

If you'll be using the template for multiple files, review the data types and field sizes in the right-hand section. The values are determined from the file you imported; if the field sizes are variable, ensure the values are set to their maximum. Data types can also be incorrect based on a small set of values; for example, the part IDs in the file may be all numbers, but in another file they may include alphanumeric values.
The Root Directory setting can be modified from what is in the template and be specific to that one flat file source or target. You can hard-code the path into this setting, but the path will likely change as you move the project through environments. A better practice is to use substitution parameters for the path. Using the substitution parameter configurations for each environment allows the same parameter to represent different values in each environment (or configuration).

The File Name(s) setting is similar to the Root Directory setting. The value comes from the template, but it can be changed and is independent of the template. You can hard-code the name, but that assumes the same file will be used each time. It's not uncommon to have timestamps, department codes, or other identifiers/differentiators appended to file names. In situations where the name is changed and set during the execution of the job, you'll use a parameter or global variable.

If you intend to process multiple files in a given directory, you can use a wildcard (*). For example, if you want to read all text files in a directory, the File Name(s) setting would be *.txt. You can also use the wildcard for just a portion of the file name. This is useful if the directory contains different types of data. For example, a directory contains part data and price data, and the files are appended with timestamps. To process only part data from the year 2014, the setting is part_update_2014*.txt.
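As a small sketch of setting the file name at execution time (the variable name and naming pattern are assumptions for illustration), a script placed before the data flow can build the value, and the flat file source's File Name(s) property is then set to the global variable:

# Script executed before the data flow that reads the files:
$G_FileName = 'part_update_' || to_char(sysdate(), 'YYYY') || '*.txt';

# In the flat file source editor, File Name(s) is set to: $G_FileName
# The Root Directory can likewise point at a substitution parameter that is
# configured per environment, as described above.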
When using the wildcard, Data Services expects a file to exist; an error occurs when the specified file can't be found. This error can be avoided by checking for existence prior to reading the flat file. One method of validating file existence is to use the file_exists(filename) function, which, given the path to the file, returns 0 if no file is located. The one shortcoming of this method is that the file_exists function doesn't process file paths that contain wildcards, which in many cases is the scenario presented. There is another function, wait_for_file(filename, timeout, interval), which does allow wildcards in the file name and can be set up to accomplish the same result. Figure 5.121 shows the setup to accomplish this: first a script calls the wait_for_file function, and then a conditional is used to either process the flat file or catch the case when no file exists.
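A minimal sketch of that pattern follows. The path, the 30-minute timeout, and the 60-second polling interval are assumptions, and the units and return codes of wait_for_file should be confirmed in the function reference for your release (they are commonly given in milliseconds, with a nonzero return indicating that at least one matching file was found).

# Script: wait up to 30 minutes, checking every 60 seconds, for a matching file.
$G_FileFound = wait_for_file('/data/inbound/part_update_2014*.txt', 1800000, 60000);

# Conditional object: if expression
#   $G_FileFound <> 0
# Then workspace: the data flow that reads the flat file.
# Else workspace: a script that logs the missing file or raises an exception.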
When the template is used as a source in a data flow, you can change the Adaptable Schema setting. You can also modify performance settings that weren't present on the template, such as Join Rank, Cache, and Parallel process threads. At the bottom of the left-hand section is the Source Information. The Include file name column setting will add a column to the file's data that contains the file name; this is a useful setting when processing multiple files.
5.7.2 Creating a Flat File Template from a Query Transform
The methods to create flat file templates discussed so far can be used for either
source or target data sources. Target data sources have one other method to
quickly create a flat file template. Within the Query Editor, you can right-click the
output schema and select Create File Format as shown in Figure 5.122. This will
use the output schema as the data type definition and create a template.
5.7.3 Excel File Format

Excel files are widely used in every organization. From marketing to finance departments, a large number of data sets throughout a company are stored in Excel format. As a result, learning to work efficiently with these files in Data Services is critical to the success of many data-provisioning projects.

Data Services supports multiple Excel versions and Excel file extensions. The following versions are compatible as of Data Services version 4.2: Microsoft Excel 2003, 2007, and 2010.

Warning
Data Services references Excel workbooks as sources only. It's not possible to define an Excel workbook as a target. For situations where we've needed to output to Excel workbooks, we've used a User-Defined transform; an example is given in the User-Defined Transform section under Section 5.5.3.

At this point, you can't import or read a password-protected Excel workbook.

To read the data from an Excel file, you must first define an Excel adapter on the job server in the Data Services Management Console.

Warning
Only one Excel adapter can be configured for each job server.

Start the adapter from the Adapter Instance Status tab (see Figure 5.125).

The 64-bit Data Services Designer and job server are incompatible with Microsoft Office products prior to Office 2010. During installation, if Microsoft Office isn't installed on the server, the installer will attempt to install Microsoft Access Database Engine 2010. A warning message will appear if an earlier version of Microsoft Office is installed. To remedy this and be able to use Excel workbook sources, upgrade to Office 2010 64-bit or greater, or uninstall the Microsoft Office products and install the Microsoft Access Database Engine 2010 redistributable that comes with Data Services (located in <LINK_DIR>/ext/microsoft/AccessDatabaseEngine_X64.exe).
Format Name
Give the component a name in the Format name field, and specify the file location (or, alternatively, a parameter if one has been defined) and file name, as shown in Figure 5.128.

If the first row of your spreadsheet contains the column names, don't forget to select the corresponding option, as shown in Figure 5.129.

Warning
To import an Excel workbook into Data Services, it must first be available on the Windows file system. You can later change its actual location (e.g., to the UNIX server). The same goes if you want to reimport the workbook definition or view its content.
After the properties have been specified, click the Import schema button. This will create a file definition based on your range selection. If the column names are present, they'll be used to define each column, as shown in Figure 5.130. A blank will appear if the content type can't be automatically determined for a column. When it's not possible for Data Services to determine the data type, the column will be assigned a default data type of varchar(255).
After confirming and updating column attributes if required, click the OK button,
and the Excel format object will be saved.
Alternatively, an Excel format can be manually defined by entering the column names, data types, content types, and descriptions. These can be entered in the Schema pane at the top of the New File Creation tab. After the file format has been defined, it can be used in any data flow as a source.
Be careful with worksheet names. Blank names or names that contain special characters may not be processed.

FTP
Provide the host name, fully qualified domain name, or IP address of the computer where the data is stored. Also provide the user name and password for that server. Lastly, provide the file name and location of the Excel file you want to access.
Custom
Provide the name of the executable script that will access the data. Also provide
the user name, password, and any arguments to the program.
Note
If both options (FTP/Custom) are left unchecked, Data Services assumes that the file is
located on the same machine where the job server is installed.
Usage Notes

Data Services can deal with Excel formulas. However, if an invalid formula results in an error such as #DIV/0, #VALUE, or #REF, the software will process the cell as a NULL value.

Be careful when opening the file in Excel while it's being used by Data Services. Because the application reads stored formula values, there is a risk of reading incorrect values if Excel isn't closed properly. Ideally, the file shouldn't be opened by any other software while it's in use by Data Services.

5.7.4 Hadoop
With the rise of big data, Apache Hadoop has become a widely used framework
for storing and processing very large volumes of structured and unstructured data
on clusters of commodity hardware. While Hadoop has many components that
can be used in various configurations, Data Services has the capability to connect
and integrate with the Hadoop framework specifically in the following ways:
Hadoop distributed file system (HDFS)
A distributed file system that provides high aggregate throughput access to
data.
MapReduce
A programming model for parallel processing of large data sets.
Hive
An SQL-like interface used for read-only queries written in HiveQL.
Pig
A scripting language used to simplify the generation of MapReduce programs
written in Pig Latin.
Prerequisites

Before attempting to connect Data Services to your Hadoop instance, make sure to validate the following prerequisites and requirements:

Hadoop must be installed on the same server as the Data Services job server. The job server may or may not be part of the Hadoop cluster.

The job server must be started from an environment that has sourced the Hadoop environment script.

Text processing components must be installed on the Hadoop Distributed File System (HDFS).
2. Create a new HDFS file format for the source file (see Figure 5.133).
3. Set the appropriate Data File(s) properties for your Hadoop instance, such as the hostname of the Hadoop cluster and the path to the file you want to read in.

The HDFS file can now be connected to an object (in this case, a Merge transform) and loaded into a number of different types of targets, such as a database table, a flat file, or even another file in HDFS.

In Figure 5.134, we connect HDFS to a Merge transform and then load the data into a database table. The HDFS file can be read and sourced much like any other source object.
5.8 Summary

This chapter has traversed and described the majority of the Data Services object hierarchy. With these objects at your disposal within Data Services, virtually any data-provisioning requirement can be solved. In the next chapter, we explore how to use value placeholders to enable values to be set, shared, and passed between objects.
Contents
Acknowledgments ............................................................................................ 15
Introduction ..................................................................................................... 17
Installation ................................................................................ 57
2.2.2 Preparing for Repository Creation ..................................... 67
2.2.3 Creating Repositories ....................................................... 67
2.3 Postal Directories .......................................................................... 74
2.3.1 USA Postal Directories ..................................................... 74
2.3.2 Global Postal Directories .................................................. 81
2.4 Installing SAP Server Functions ...................................................... 82
2.5 Configuration for Excel Sources in Linux ........................................ 84
2.5.1 Enabling Adapter Management in a Linux Job Server ........ 84
2.5.2 Configuring an Adapter for Excel on a Linux Job Server .... 86
2.6 SAP Information Steward ............................................................... 89
2.7 Summary ....................................................................................... 90
3.6.2 Common Causes of Job Execution Failures ........................ 168
3.7 Summary ....................................................................................... 172
Objects ............................................................................................. 207
5.1 Jobs ............................................................................................... 208
5.1.1 Batch Job Object .............................................................. 208
5.1.2 Real-Time Job Object ....................................................... 211
5.2 Workflow ...................................................................................... 217
5.2.1 Areas of a Workflow ......................................................... 219
5.2.2 Continuous Workflow ...................................................... 220
5.3 Logical Flow Objects ..................................................................... 222
5.3.1 Conditional ...................................................................... 222
5.3.2 While Loop ...................................................................... 223
5.3.3 Try and Catch Blocks ........................................................ 223
5.4 Data Flows .................................................................................... 223
5.4.1 Creating a Standard Data Flow with an Embedded ABAP Data Flow ....... 224
5.4.2 Creating an Embedded ABAP Data Flow .......................... 225
5.5 Transforms .................................................................................... 231
5.5.1 Platform Transforms ......................................................... 231
5.5.2 Data Integrator Transforms ............................................... 249
5.5.3 Data Quality Transforms ................................................... 271
5.6 Datastores ..................................................................................... 299
5.6.1 SAP BW Source Datastores ............................................... 299
5.6.2 SAP BW Target Datastore ................................................. 303
5.6.3 RESTful Web Services ....................................................... 305
5.6.4 Using RESTful Applications in Data Services ..................... 307
5.7 File Formats ................................................................................... 309
5.7.1 Flat File Format ................................................................ 310
5.7.2 Creating a Flat File Template from a Query Transform ...... 316
5.7.3 Excel File Format .............................................................. 316
5.7.4 Hadoop ............................................................................ 324
5.8 Summary ....................................................................................... 327
8.3.2 Oracle .............................................................................. 369
8.3.3 SQL Server ....................................................................... 369
8.4 Target-Based CDC Solution ............................................................ 372
8.5 Timestamp CDC Process ................................................................ 376
8.5.1 Limitations ....................................................................... 376
8.5.2 Salesforce ......................................................................... 377
8.5.3 Example ........................................................................... 377
8.6 Summary ....................................................................................... 379
10.1 Performance .................................................................................. 414
10.1.1 Constraining Results ......................................................... 414
10.1.2 Pushdown ........................................................................ 414
10.1.3 Enhancing Performance When Joins Occur on the Job Server ....... 422
10.1.4 Caching ............................................................................ 423
10.1.5 Degree of Parallelism (DoP) .............................................. 424
10.1.6 Bulk Loading .................................................................... 425
10.2 Simplicity ...................................................................................... 426
10.2.1 Rerunnable ....................................................................... 426
10.2.2 Framework ....................................................................... 426
10.3 Summary ....................................................................................... 427
Index
A
ABAP data flow, 223
create, 225
extraction, 229
ABAP program, dynamically load/execute, 82
Access server
configure, 123, 127
parameters, 129
SSL, 148
Accumulating snapshot, 437, 450
Acta Transformation Language, 493
Adapter Management, 84
Adapter SDK, 504
Adaptive Job Server, 93
Adaptive Process Server, 93
Address
census data, 81
cleansing and standardization, 282
data, clean and standardize, 275
global validation, 81
ISO country codes, 295
latitude/longitude, 294
list of potential matches, 297
street-level validation, 82
Address Cleanse transform, 74
Address SHS Directory, 75
Administrator module, 95
Adapter Instances submodule, 99
Central Repository, 100
Management configuration, 104
Object Promotion submodule, 101
real-time job, 97
Real-Time submodule, 97
SAP Connections submodule, 98
Server Group, 100
Web Services submodule, 98
Aggregate fact, 437, 450
Alias, create, 136
All-world directories, 81
Apache Hadoop, 324
Application
authorization, 92
framework, 46
settings, 92
Architecture
performance considerations, 24
scenario, 24
system availability/uptime considerations, 24
web application performance, 27
Associate transform, 292
Asynchronous changes, 361
ATL, export files from Data Services Designer, 32
Auditing, prevent pushdown, 421
Authentication, 58
Auto Documentation
module, 109
run, 109
B
BAPI function call, 466
BAPI function call, read data into
Data Services, 470
Batch job, 208
auditing, 210
create, 189
execution properties, 209
logging, 209
monitoring sample rate, 209
performance, 210
statistics, 210
trace message, 209
Batch Job Configuration tab, 97
Big data, 500
unstructured repositories, 451
BIP, 24, 57, 62, 63
CMC, 58
licensing, 63
patch, 63
user mapping, 59
Blueprints, 404
BOE Scheduler, 160
BOE → BIP
Brand loyalty, 384
C
CA Wily Introscope, 131
Caching, 423
in-memory, 374
Case transform, 235, 468
configure, 236
CDC, 259, 359
datastore configuration, 363
datastore output fields, 365
design considerations, 362
enable for database table, 369
Map Operation transform, 366
Oracle database, 369
Salesforce functionality, 377
source-based, 361, 363
subscription, 363
synchronous and asynchronous, 369
target-based, 361, 372
timestamp, 376
types, 360
CDC-enabled table, 364
Central Management Console → CMC
Central Management Server → CMS
Central repository, 65
code considerations, 32
reports, 101
Central repository-based promotion, 153
Certificate Logs, 109
Change data capture → CDC
Change Tracking, 369
Changed Data Capture, 369
Changed records, 361
City Directory, 76
Cleansing package, 271
Cleansing Package Builder, 490
Client tier, 46
Cloud-based application, 377
CMC, 58, 91
application authorization, 92
authentication, 58
repository, 61
server services, 92
users and groups, 60
uses, 91
CMS, 58, 90, 94
Enterprise authentication, 59
IPS, 26
login request, 58
logon parameters, 107
security plug-in, 58
SSO, 60
sync backup with filestore, 26
Code
move to production, 157
promote between environments, 150
sharing between repositories, 32
Column derivations, map, 200
Command-line execution
UNIX, 72
Windows, 70
Competitive intelligence, 384
Conditional, 222
Conditional, expression, 235
Conformed dimension, 431
Connection parameter, 178
Consolidated fact, 437
Country ID transform, 295
Country-specific address directories, 82
CPU utilization, 39
Custom function
call, 349
Data Services script, 347
parameters, 347
Custom Functions tab, 345
Customer
data, merge, 177
loyalty, 455
problem, 390
request, 390
sentiment, 384, 390
Customer Engagement Initiative (CEI), 507
D
Data
cleansing, 490
compare with CDC, 373
consolidation project, 189
de-duplicate, 477
dictionary, 271
extract and load, 231
gather/evaluate from social media site, 384
latency requirements, 362
mine versus query, 387
move from column to row, 267
move into data warehouse, 429
nested, 263
parse/clean/standardize, 271
staging, 417
unstructured to structured, 387
Data Cleanse transform, 271
cleansing package, 273
configure, 271
date formatting options, 275
firm standardization options, 274
input fields, 272
options, 273
output fields, 272
parsing rule, 272
person standardization options, 273
Data Cleansing Advisor, 478
Data flow, 176
add objects, 184
branching paths, 235
bypass, 502
calculate delivery time, 466
CDC tables, 366
configured CDC, 367
create, 184, 198
create with embedded ABAP data flow, 224
definition, 223
delete records, 246
execution, 202
fact table loading scenario, 448
flat file error, 169
graphical portrayal, 109
include in batch job, 189
periodic snapshot/aggregate fact table, 450
SCD implementation, 439
G
Generate XML Schema option, 457
GeoCensus files, 81
Geocoder transform, 294
output, 294
Geo-spatial data integration, 504
Global Address Cleanse transform, 81, 275, 282
address class, 287
country assignment, 283
Country ID Options group, 285
Directory Path option, 284
Engines option group, 283
Field Class, 286
map input fields, 282
standardize input address data, 285
H
Hadoop, 451
Hadoop distributed file system (HDFS), 324
Haversine formula, 462
HDFS
connect to Merge transform, 326
load/read data, 325
Hierarchical data structure, 265
read/write, 247
Hierarchy Flattening transform, 261
High-rise business building address, 78
History Preserving transform, 256
effective dates, 256
History table, 444
Horizontal flattening, 261
HotLog mode, 369
I
I/O
performance, 37
pushdown, 414
reduce, 414
speed of memory versus disk, 423
IDE, 175, 179
primary objects, 175
Idea Place, 505
IDoc message transform, 471
Impact and Lineage Analysis module, 112
InfoPackage, 469
Information Platform Services → IPS
Information Steward, 89, 477
expose lineage, 496
Match Review, 477
InfoSource, create, 303
Inner join, 232, 422
Input and Output File Repository, 94
Installation
commands, 42
deployment decisions, 62
Linux, 84
settings, 41
Integrated development environment, 180
Integration, 30
IPS, 24, 57
cluster installation, 24
CMC, 58
licensing, 63
system availability/performance, 24
user mapping, 59
J
Job, 176, 208
add global variable, 335
blueprint, 404
common execution failures, 168
common processes, 426
data provisioning, 191
dependency, 160
execution exception, 166
filter, 96
graphic portrayal, 109
processing tips, 426
read data from SAP BW, 300
replication, 191
rerunnable, 426
schedule, 157
specify specific execution, 158
standards, 330
Job Error Log tab, 167
Job execution
log, 166
statistics, 113
Job server, 211
associate local repository, 119
configure, 115
create, 116
default repository, 121
delete, 117
edit, 117
join performance enhancement, 422
remove repository, 120
SSL, 148
Job service
start, 115
stop, 115
Join pair, 201
Join rank, 238, 422
option, 232
K
Key Generation transform, 249, 441
Kimball approach, 429
principles, 430
Kimball methodology, 429
Klout scores, 385
Knowledge transfer, 165
M
Mapping, 200
input fields to content types, 272
multiple candidates, 200
parent and child objects, 264
Master data management, 479
Match results, create through Data Services, 481
Match Review, 189, 477
approvers/reviewers, 486
best record, 484
configuration, 482
configure job status, 485
data migration, 481
process, 478
results table, 483
task review, 487
terminology, 479
use cases, 479
Match transform, 288, 479
break key, 288
consolidate results from, 292
Group Prioritization option, 288
output fields, 291
scoring method, 289
Memory, 40
page caching, 41
Merge transform, 237
Metadata Management, 64, 496
Mirrored mount point, 25
Monitor log, 97
Monitor, system resources, 37
Multi-developer environment, 65
Multi-developer landscape, 31
Multiline field, 276
N
Network traffic, 45
New numeric sequence, 238
O
Object, 207
common, 426
connect in Data Services Designer, 186
hierarchy, 207
pairs, 223
types, 175
Object Promotion, 155
execute, 156
with CTS+, 157
Operating system, 36
I/O operations, 37
optimize to read from disk, 37
Operation code, UPDATE/DELETE, 372
Operational Dashboard module, 113
Optimized SQL, 415, 416
perform insert, 417
P
Pageable cache, 122, 130
Parameters, 339
Parse Discrete Input option, 274
Parsing rule, 271
Performance metrics and scoping, 131
Performance Monitor, 97
Periodic snapshot, 437, 450
Permissions, users and groups, 60
Pipeline forecast, 465
Pivot transform, 267
Placeholder naming, 331
Platform transform, 231
Postal directory, 74, 278
global, 81
USA, 74
Postal discount, 299
Postcode directory, 76
Postcode reverse directory, 75
Predictive analytics, 385
Privileges, troubleshoot, 168
Q
Quality assurance, 30
Query Editor, mapping derivation, 187
Query join, compare data source/target, 444
Query transform, 229, 231
constraint violation, 170
data field modification, 232
define joins, 232
define mapping, 186
filter records output, 233
mapping, 232
primary key fields, 235
sorting criteria, 234
R
Range, date sequence, 249
Real-time job, 211, 456
execute as service, 214
execute via Data Services Designer, 213
S
Salesforce source tables, CDC functionality, 377
Sandbox, 29
SAP APO, 455, 465
interface, 469
load calculations, 467
read data from, 470
SAP BusinessObjects BI platform → BIP
SAP BusinessObjects Business Intelligence platform, 24
SAP BusinessObjects reporting, 63
SAP BW, 465, 469
load data, 303
set up InfoCubes/InfoSources, 303
source datastore, 299, 302
source system, 470
target datastore, 303, 304
SAP Change and Transport System (CTS), install, 83
SAP Customer Connection, 507
SAP Customer Engagement Intelligence application, 409
SAP Data Services roadmap, 499
SAP Data Warehousing Workbench, 303
SAP ECC, 455, 465, 466
data load, 467
extraction, 467
interfaces, 471
SAP HANA, 451, 505
SAP HANA, integrate platforms, 452
SAP Information Steward → Information Steward
SAP server functions, 82
SCD
change history, 435
dimension entities, 435
type 1, 438
type 1 implement in data flow, 439
type 2, 441
type 2 implement in data flow, 442
type 3, 443
type 3 implement in data flow, 444
type 4, 444
type 4 implement in data flow, 445
typical scenario, process and load, 437
Script, 176
object, 349
Scripting language and Python, 343
Secure Sockets Layer (SSL), 142
Semantic disambiguation, 393
Semantics and linguistic context, 388
Sentiment extraction demonstration, 407
Server
group, 100
list of, 93
Server-based tools, 91
Services configuration, 50
Shared directory, export, 103
Shared object library, 65
SIA
clustering, 24
clustering example, 28
subnet, 27
Simple, 500
Simple Object Access Protocol (SOAP), 306
Sizing, 45
Sizing, planning, 54
Slowly changing dimensions (SCD), 434
SMTP
configuration, 124
for Windows, 131
Snowflaking, 436
SoapUI, 217
Social media analytics, 383
create application, 404
Voice of the Customer domain, 390
Social media data, filter, 409
Source data, surrogate key, 249
Source object, join multiple, 237
Source system
audit column, 414
tune statement, 421
Source table
prepare/review match results, 478
run snapshot CDC, 376
Source-based CDC, 360
datastore configuration, 363
SQL Query transform, 244
SQL Server database CDC, 369
SQL Server Management Studio, 369
SQL transform, 356
SQL transform, custom code, 356
T
Table Comparison transform, 254, 361, 372, 375, 439, 442
comparison columns, 255
primary key, 255
Table, pivot, 267
Target table, load only updated rows, 367
Target-based CDC, 360, 372
data flow design consideration, 375
Targeted mailings, 79
TDP, 387
dictionary, 390
extraction, 388
grammatical parsing, 393
public sector extraction, 391
SAP Customer Engagement Intelligence application, 409
SAP HANA, 410
SAP Lumira, 404
semantic disambiguation, 393
Temporary cache file, encrypt, 149
Text Data Processing Extraction Customization Guide, 391
Text data processing → TDP
Text data, social media analysis, 384
Threshold, Match Review, 482
Tier, 46
Tier to server to services mapping, 49
Time dimension table, 250
Timestamp
CDC, 361, 376
Salesforce table, 377
TNS-less connection, 69
Trace log, 97
Trace message, 209
Transaction fact, 437
load, 448
Transform, 176, 231
compare set of records, 374
Data Integrator, 249
data quality, 271
Entity Extraction, 394
object, 352
platform, 231
Transparency, decreased, 421
Try and catch block, 223
Twitter data, 408
U
Unauthorized access, 30
UNIX, adding an Excel Adapter, 317
Unstructured data, 387
Unstructured text, TDP, 387
Upgrades, 63
US zip codes, 78
V
Validation rule, 112
definition, 241
fail, 241
failed records, 243
reporting, 242
Validation transform, 240
define rules, 241
Varchar, 202
Variable, 336
pass, 339
vs. local variable, 339
Vertical flattening, 261
VOC, requests, 390
Voice of the Customer (VOC) domain, 390
W
Warranty support, 165
Web presence, 384
Workflow
properties, 219
versus job, 217
X
XML Map transform, 247, 265
XML Pipeline transform, 263
XML schema, 456, 457
file format object, 458
use in data flow, 459
XML_Map, 470, 471
batch mode, 247
XML_Pipeline transform, 460
XSD, 456
create, 457
Z
Z4 Change Directory, 78
First-hand knowledge.
Bing Chen leads the Advanced Analytics practice at Method360 and has over 18 years of experience in IT consulting, from custom application development to data integration, data warehousing, and business analytics applications on multiple database and BI platforms.
James Hanck is co-founder of Method360 and currently leads the enterprise information management practice.
Patrick Hanck is a lead Enterprise Information Management consultant at Method360. He specializes in provisioning information, system implementation, and process engineering, and he has been recognized through business excellence and process improvement awards.
Scott Hertel is a senior Enterprise Information Management consultant with over 15 years of consulting experience helping companies integrate, manage, and improve the quality of their data.
© 2015 by Rheinwerk Publishing, Inc. This reading sample may be distributed free of charge. In no way must the file be altered, or individual pages be removed. The use for any commercial purpose other than promoting the book is strictly prohibited.
Bing Chen, James Hanck, Patrick Hanck, Scott Hertel, Allen Lissarrague, Paul Médaille
www.sap-press.com/3688