Ds 42 Tutorial en
Tutorial
© 2022 SAP SE or an SAP affiliate company. All rights reserved.
2 Product overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Product components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 The Designer user interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Designer tool palette. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 About objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1 Object hierarchy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Jobs and subordinate objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Work flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Data flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Naming conventions for objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.6 Delete reusable objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
8.4 Adding a work flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.5 Adding a data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.6 Define the data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Add objects to the DF_SalesOrg data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Define the order of steps in a data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48
Configure the query transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.7 Validating the DF_SalesOrg data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.8 Viewing details of validation errors and warnings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
8.9 Saving the project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52
8.10 Ensuring that the Job Server is running. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52
8.11 Executing the job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
8.12 What's next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
11.7 Leveraging the XML_Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Create a job, work flow, and data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Unnesting the schema with the XML_Pipeline transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11.8 What's next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
View audit results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
14.6 What's next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
17.2 Defining an SAP application datastore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
17.3 Importing metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
17.4 Repopulate the customer dimension table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Adding the SAP_CustDim job, work flow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Adding ABAP data flow to Customer Dimension job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Defining the DF_SAP_CustDim ABAP data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Executing the JOB_SAP_CustDim job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
ABAP job execution errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .188
17.5 Repopulating the material dimension table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Adding the Material Dimension job, work flow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . .189
Adding ABAP data flow to Material Dimension job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Defining the DF_SAP_MtrlDim ABAP data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Executing the JOB_SAP_MtrlDim job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
17.6 Repopulating the Sales Fact table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Adding the Sales Fact job, work flow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Adding ABAP data flow to Sales Fact job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Defining the DF_ABAP_SalesFact ABAP data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Executing the JOB_SAP_SalesFact job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
17.7 What's next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
1 Introduction to the tutorial
This tutorial introduces you to the basic use of SAP Data Services Designer by explaining key concepts and
providing a series of related exercises and sample data.
Data Services Designer is a graphical user interface (GUI) development environment in which you extract,
transform, and load batch data from flat-file and relational database sources for use in a data warehouse. You
can also use Designer for real-time data extraction and integration.
The tutorial introduces core SAP Data Services Designer functionality. We wrote the tutorial assuming that you have experience in some of the following areas: database management, SQL, and Microsoft Windows.
Related Information
1.2 Tutorial objectives
After you complete this tutorial, you’ll be able to extract, transform, and load data from various source and
target types, and understand the concepts and features of SAP Data Services Designer.
You’ll know about the various Data Services objects such as datastores and transforms, and you’ll be able to
define a file format, import data, and analyze data results.
You’ll learn how to use Data Services Designer features and functions to do the following:
Related Information
2 Product overview
SAP Data Services extracts, transforms, and loads (ETL) data from heterogeneous sources into a target
database or data warehouse.
You specify data mappings and transformations by using Designer, the graphical user interface for Data
Services. Data Services combines industry-leading data quality and integration into one platform. It transforms
your data in many ways. For example, it standardizes input data, adds additional address data, cleanses data,
and removes duplicate entries.
Data Services provides additional support for real-time data movement and access. It performs predefined
operations in real time, as it receives information. The Data Services real-time components also provide
services to Web applications and other client applications.
For a complete list of Data Services resources, see the SAP Help Portal at http://help.sap.com/bods.
Component Description
Job Server Application that launches the Data Services processing engine and serves as an interface to the engine
and other components in the Data Services suite.
Engine Executes individual jobs that you define in the Designer to effectively accomplish the defined tasks.
Repository Database that stores Designer predefined system objects and user-defined objects including source and
target metadata and transformation rules. Use a local repository to store your data and objects. To share
data and objects with others and for version control, use a central repository.
Access Server Passes messages between Web applications and the Data Services Job Server and engines. Provides a
reliable and scalable interface for request-response processing.
Administrator Web administrator that provides the following browser-based administration of Data Services resources:
The following diagram illustrates Data Services product components and relationships.
Parent topic: Product overview [page 9]
Related Information
Use the many tools in SAP Data Services Designer to create objects, projects, data flows, and workflows to
process data.
The Designer interface contains key work areas that help you set up and run jobs. The following illustration
shows the key areas of the Designer user interface.
Parent topic: Product overview [page 9]
Related Information
The SAP Data Services objects to build work flows and data flows appear as icons on the tool palette to the
right of the workspace.
If the object isn't applicable to what you have open in the workspace, Data Services disables the icon. We use
only a few of the objects in the tool palette for the tutorial.
The following table contains descriptions for some of the objects in the tool palette. For descriptions and use of
all objects in the tool palette, see the Designer Guide.
Object Description
Related Information
3 About objects
SAP Data Services objects are entities that you create, add, define, modify, or work with in the software.
Each Data Services object has similar characteristics for creating and configuring objects.
Characteristic Description
Properties Text that describes the object. For example, the name, description, and creation date describe aspects of an object.
Attributes Properties that organize objects and make them easier for you to find. For example, organize objects by attributes such as object types.
The Designer contains a Local Object Library that is divided by tabs. Each tab is labeled with an object type.
Objects in a tab are listed in groups. For example, the Project tab groups projects by project name and further
by job names that exist in the project.
• Projects
• Jobs
• Workflows
• Data flows
• Transforms
• Datastores
• Formats
• Functions
Delete reusable objects [page 21]
SAP Data Services deletes objects differently based on the location in which you choose to delete the
object.
Each object in SAP Data Services has a specific place in the object hierarchy.
The highest object in the hierarchy is a project. All other objects are subordinate to a project. The following
diagram shows the hierarchical order of key object types in Data Services.
When you create objects in a project, add objects in hierarchical order.
Example
In a project, you must first create a batch job, which is the second-highest object in Data Services. Then you can add a work flow and a data flow. A work flow isn't required, but a job must have a data flow to process data.
A data flow can contain only subordinate objects such as tables, transforms, and template tables.
Related Information
Add a job to a project, and build the job by adding subordinate objects in a specific order.
A project is the highest-level object in the Designer hierarchy. Projects organize jobs and the related subordinate
objects in a job.
Note
A job doesn't have to be a part of a project. You can create a job independent of a project. A data flow,
however, must be a part of a job.
Open a project by right-clicking the project in the object library and selecting Open. After you open a project, it
appears in the project area pane. If you open a different project from the Project tab in the object library, Data
Services closes the opened project in the project area and displays the newly opened project.
Build a project by adding subordinate objects in hierarchical order. The following table contains a list of objects
and the related subordinate objects.
Object Subordinate object Subordinate description
Related Information
Use a work flow to specify the order in which SAP Data Services processes multiple objects, including other
work flows and subordinate data flows.
A work flow is a reusable object. It executes only within a job. Use work flows to perform the following tasks:
Example
Open a work flow in the workspace and start to build it by adding applicable objects such as data flows.
Arrange data flows in the workspace so that the output from one data flow is ready as input to the next data flow.
The following is an example of a work flow diagram in a workspace:
Related Information
Use data flows to transform source data into target data in a specific order.
Data flows process data in the order in which they’re arranged in the workspace.
A data flow defines the tasks of a job. It also defines the direction or flow of data. Data flows can be as simple as
having a source, transform, and target. However, data flows can be complicated and involve, for example,
several sources, if/then statements, queries, multiple targets, and more.
• Define the transformations to perform on the data.
• Identify the target object and define the transfer protocol for transformed data.
A data flow is a reusable object. It's always called by either a parent work flow or a parent job.
Related Information
We recommend that you decide on a comprehensive naming convention before you begin creating objects.
Example
The naming convention described in the following table adds prefixes or suffixes to identify the object type,
and includes the job name, to relate the object to a specific job.
Object Prefix or suffix Example
Datastore DS SalesOrg_DS
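For example, the objects that you create in the tutorial exercises follow the same pattern:
Job JOB_ JOB_SalesOrg
Work flow WF_ WF_SalesOrg
Data flow DF_ DF_SalesOrg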
Related Information
Jobs and subordinate objects [page 17]
Work flows [page 18]
Data flows [page 19]
Delete reusable objects [page 21]
SAP Data Services deletes objects differently based on the location in which you choose to delete the object.
The following table describes what Data Services deletes when you delete an object from either the project
area or the object library.
Project area Deletes only the object from the opened project.
Object library Deletes the object from the object library, and deletes all
calls to the object in the following locations:
• Repository
• Parent objects
Before Data Services deletes an object from the object library, it issues a message letting you know when an
object is used in multiple locations. The message provides the following options:
• Yes: Continue with the deletion of the object from the repository.
• No: Cancel the deletion of the object from the repository.
• View Where Used: Display a list of the related objects.
Related Information
Naming conventions for objects [page 20]
4 Preparation for this tutorial
Ensure that you or an administrator perform all of the preparation for the tutorial so that you can successfully
complete each exercise.
The preparation includes some steps that your administrator must complete. Contact your administrator for
important connection information and access information related to those tasks.
We have a complete documentation set for SAP Data Services available on our Customer Help Portal. If you are
unclear about a process in the tutorial, or if you don't understand a concept, refer to the online documentation
at http://help.sap.com/bods.
Required tasks
The steps to prepare for the SAP Data Services tutorial exercises include tasks for your administrator and tasks
that you can perform.
You must have sufficient user permission to perform the exercises in the tutorial. For information about
permissions, see the Administrator Guide.
The following table lists each task and who performs the task. Instructions for administrator-only tasks aren’t
included in the tutorial, but they’re included in other Data Services documents such as the Installation Guide.
Installed through either of the following applications:
• SAP BusinessObjects Business Intelligence platform (BI platform)
• Information platform services (IPS platform)
Installed before the installation of SAP Data Services. More information is in the Installation Guide.
Task Who performs
The scripts add the source and target tables to your RDBMS, and add the other data, such as an XML file, to the Tutorial directory.
The tutorial requires that you have access to an SAP Data Services repository, a source database, and a target
database.
Repository database
SAP Data Services requires a repository database. Data Services installation includes the creation of a
repository database. Ask your administrator for the repository name and password. Then use that information
when you log into Data Services.
If you or your administrator creates a repository specifically for this tutorial, make sure to follow the
instructions in the Post-Installation section, “Configuring repositories”, in the Installation Guide.
Source and target databases
Either request that your administrator create a source and target database, or create the databases yourself. If
you create the databases, you must have the required permissions in the relational database management
system (RDBMS) to create users and databases.
Create the source and target database in either the same RDBMS as the repository, or a different RDBMS. For
example, create a source and target database in SAP SQL Anywhere, which is bundled with Data Services.
To add tables to the source and target databases, you run special scripts designed for certain database
management systems. There is a script for each of the following RDBMS:
• DB2
• Informix
• Microsoft SQL Server
• Microsoft SQL Server 2005 and later versions
• ODBC
• Oracle
• SAP SQL Anywhere (Sybase)
The scripts are prepopulated with the database names, user names, and passwords listed in the following
table.
The scripts also require the server name on which you run your RDBMS.
If you use different database names, user names, or passwords when you create the databases, enter the
information in the handy worksheet located in Database connections worksheet [page 26]. We refer to the
information in the worksheet in several of the tutorial exercises.
Remember
If you use different names, update the scripts with the information that you used.
Database requirements
Ensure that the source and target users have the following permissions in the RDBMS:
Note
For Oracle, set the protocol to TCP/IP and enter a service name; for example, training.sap. The service
name can act as your connection name.
Consult your RDBMS documentation for specific requirements for setting permissions.
Related Information
Complete the database connections worksheet and refer to the information in the sheet to complete the
exercises in this tutorial.
Print this page, or copy it to an editable format, and enter the applicable information in each column. We refer
you to the information in the worksheet in many of the tutorial exercises.
Database Connections
Value Repository Source Target
Database type:
Database name:
User name:
Password:
Related Information
Running the provided SQL scripts [page 27]
Run the tutorial batch files to populate the source and target databases with tables.
Before you perform the following steps, make sure that you have permission to copy and edit files for the
directory in which the scripts are located.
• CreateTables_DB2
• CreateTables_Informix
• CreateTables_MSSQL
• CreateTables_MSSQL2005 (use for Microsoft SQL Server versions 2005 and later)
• CreateTables_ODBC
• CreateTables_ORA
• CreateTables_Sybase
The batch files run scripts that create tables in the source and target databases that you prepared for the
tutorial.
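You don't need to write these scripts yourself, but it helps to understand roughly what they do. The following is a minimal sketch of the kind of statement a script runs against the source database; the column names come from the ODS_CUSTOMER table used later in the tutorial, while the data types and lengths are assumptions and will differ from the delivered scripts.

    -- Hypothetical excerpt: creates one of the source tables in the ODS database.
    -- Column names are taken from the tutorial; data types are assumed for illustration.
    CREATE TABLE ods_customer (
        cust_id         VARCHAR(10) NOT NULL,  -- customer key, used later as the primary key
        cust_classf     VARCHAR(2),            -- customer classification
        name1           VARCHAR(35),
        address         VARCHAR(35),
        city            VARCHAR(35),
        region_id       INTEGER,
        zip             VARCHAR(10),
        cust_timestamp  TIMESTAMP              -- not mapped in the tutorial exercises
    );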
To edit and run the provided SQL scripts, perform the following steps:
1. Locate the batch file for your specific RDBMS in the Data Services installation directory.
Example
Alter the original file name by adding “_original” after the file name. For the batch file
CreateTables_ORA, rename the original file to CreateTables_ORA_original.
Example
You created the source and target databases using Oracle. You completed the database connections
worksheet as follows:
Database name User name Password
Example
You created the source and target databases using Microsoft SQL Server 2019. You completed the
database connections worksheet as follows:
The scripts create the following tables in the source and target databases:
Source tables Target tables
CDC_time CDC_time
cust_dim cust_dim
employee_dim employee_dim
mtrl_dim mtrl_dim
ods_customer sales_fact
ods_delivery salesorg_dim
ods_employee status_table
ods_material time_dim
ods_region
ods_salesitem
ods_salesorder
sales_fact
salesorg_dim
status_table
time_dim
Related Information
5 Tutorial data model
To introduce you to the features in SAP Data Services, the tutorial uses a simplified data model.
The tutorial data model is a sales data warehouse with a star schema that contains one fact table and some
dimension tables.
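In a star schema, the fact table holds the measures and carries a foreign key to each dimension table. As a rough illustration only, a reporting query against such a model might look like the following sketch; the table names match the tutorial, but the fact-table column names and the sales measure are assumptions, not the actual tutorial schema.

    -- Illustration of a star-schema join, not the tutorial's delivered schema.
    -- CUST_ID and DATE_ID are dimension keys that appear later in the tutorial;
    -- the fact-table columns (cust_id, date_id, sales_amount) are assumed.
    SELECT c.name1, t.date_id, f.sales_amount
    FROM sales_fact f
    JOIN cust_dim c ON c.cust_id = f.cust_id
    JOIN time_dim t ON t.date_id = f.date_id;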
In the tutorial, you perform tasks on the sales data warehouse. We divided the tutorial exercises into the
following segments:
Tutorial segments
Segment Lessons
Create datastores and import metadata Introduces how to create and use datastores to access data
from various data sources.
Populate a table from a flat file Introduces basic data flows, query transforms, and source
and target tables.
Populate a table with data from a relational table Introduces data extraction from relational tables.
Populate a table from an XML File Introduces data extraction from nested sources.
Populate a table from multiple relational tables Continues data extraction from relational tables and introduces joins and the lookup function.
The tutorial also has segments that introduce the following concepts:
• Data assessment
• Recovery mechanisms
• Multiuser development
• SAP application data
• Real-time jobs
Complete each segment before going on to the next segment. Each segment creates the jobs and objects that
you need in the next segment. We reinforce each skill in subsequent segments.
6 Logging into the Designer
When you start the tutorial, and each time you resume the tutorial, you need to log into the Designer.
Before you perform the following steps, know your user name and password assigned to you when your
administrator created your user account. Also know the repository name and password.
After you log into the Designer a few times, you won't need to refer to these steps, and you will remember your login credentials.
See an example of the Designer in The Designer user interface [page 11].
Option Description
System-host[:port] The name of the Central Management Server (CMS) system. You may also
need to specify the port when applicable.
Note
If applicable, enter localhost.
User name The user name assigned to you when your administrator created your user account in the Central Management Console (CMC).
Password The password assigned to you when your administrator created your user account in the CMC.
Authentication The authentication type for your user account.
Note
This value is usually Enterprise.
If there is more than one repository, the software displays a list of existing local repositories in the bottom
pane. If there is just one repository, the prompt for your password appears.
4. Enter the repository password and select OK.
Next you learn how to use a datastore to define the connections to the source and target databases.
Related Information
1. Select Project > Exit.
If you haven't saved your changes, SAP Data Services prompts you to save your work before you exit.
2. Select Yes to save your work.
Related Information
When you’re ready to resume the tutorial, start at the point where you exited.
To resume the tutorial, log into SAP Data Services and perform the following steps:
Related Information
7 Create datastores and import metadata
Datastores contain connection configurations to databases and applications in which you have data.
• How to create datastores that connect to the database where the source and target tables are stored.
• How to use the datastores to import metadata from source and target tables into the local repository
Use the connection information in datastores to import metadata from the database or application for which
you created the datastore. Use the metadata from the import in source and target objects in jobs.
• As a source object, Data Services accesses the data through the connection information in the datastore
and loads the data into the data flow.
• As a target object, Data Services outputs processed data from the data flow into the target object and, if
configured to do so, uploads the data to the database or application using the datastore connection
information.
In addition to other elements such as functions and connection information, the metadata in a datastore
consists of the following table elements:
• Table name
• Column names
• Column data types
• Primary and foreign key columns
• Table attributes
Data Services datastores can connect to any of the following databases or applications:
• Databases
• Mainframe file systems
• Applications that have prepackaged or user-written adapters
• J.D. Edwards, One World, J.D. Edwards World, Oracle applications, PeopleSoft, SAP applications, SAP Data
Quality Management, microservices for location data, Siebel applications, and Google BigQuery.
• Remote servers using FTP, SFTP, and SCP
• SAP systems: SAP applications, SAP NetWeaver Business Warehouse (BW) Source, and BW Target
For complete information about datastores, see the Designer Guide. See the various supplements for
information about specific databases and applications. For example, for applications with adapters, see the
Supplement for Adapters.
To perform this task, you need to know the RDBMS that you used for the database. You also need the user
name and password information that you entered for your source database in the database connections
worksheet.
The remaining options change based on the database type you choose.
6. Enter the connection information for the source database.
7. Enter the database name, the user name, and the password for the source database.
If you used the suggested values, enter ODS for the database name, and enter ods for both the user name
and password. Otherwise, consult your database connections worksheet.
Note
For the tutorial, we don't open or complete any of the advanced options.
8. Select OK.
Data Services saves the source datastore in the repository.
Related Information
Create a database datastore to use as a connection to the target database that you created for the tutorial.
To create a target datastore, follow the same process as for the source datastore, except enter Target_DS for
the datastore name. If you used the suggested values, enter Target for the database name, and enter target for
both the user name and password. Otherwise, consult your database connections worksheet for the values you
used.
Related Information
7.3 Importing metadata for source tables
Use the datastore that you created for the source database to import metadata into Designer.
SAP Data Services opens the Datastore Explorer in the right pane. With External Metadata selected at the
top, the explorer lists all the tables in the source database.
Note
Data Services imports the metadata for each table into the local repository.
Note
5. Expand the Tables node under the ODS_DS datastore in the object library.
Related Information
Use the datastore that you created for the target database to import metadata into Designer.
SAP Data Services opens the Datastore Explorer in the right pane. With External Metadata selected at the
top of the right pane, the explorer lists all of the tables in the target database.
Note
Data Services imports the metadata for each table into the local repository.
Note
5. Expand the Tables node under the Target_DS datastore in the object library.
Related Information
When you’ve created the source and target datastores and imported metadata, you’re ready to begin the next
segment.
In the next segment, create a file format to define the schema for a flat file named sales_org.txt. Then use
the information in the file format to populate the Sales Org Dimension table with data from the
sales_org.txt flat file.
Related Information
Importing metadata for source tables [page 38]
Importing metadata for target tables [page 38]
Populate a table from a flat file [page 41]
8 Populate a table from a flat file
Populate a table with data from a flat file using a file format object.
After you complete the tasks in this segment, you'll learn how to:
The goal
The purpose of this segment is to populate the Sales Org Dimension table with data from a source flat file
named sales_org.txt.
The circled portion of the Star Schema in the following diagram shows the portion we’ll work on in this
segment.
Each task in this segment adds objects to an SAP Data Services project. The project contains objects in a
specific hierarchical order.
At the end of each task, save your work. You can either proceed to the next task or exit Data Services. If you exit
Data Services before you save your work, the software asks that you save your work before you exit.
Related Information
A file format specifies a set of properties that describe the structure of a flat file.
Use the SAP Data Services file format editor to create a file format for the flat file named sales_org.txt.
1. Open the Formats tab in the object library and right-click a blank area in the tab.
Option Value
General group:
Type Delimited
Name Format_SalesOrg
Location Local
Date Select ddmmyyyy from the list. If the date format isn't in
the list, type the format in the field.
Input/Output group:
4. In the upper right of the File Format Editor, select date in the DateOpen row under the Data Type column.
5. Select Save & Close.
The following screen capture shows the completed File Format Editor.
The new format, Format_SalesOrg, appears under the Flat Files node in the File Formats tab of the object
library.
To create a new project, log into the Designer and perform the following steps:
A list of your existing projects appears. If you don’t have any projects created, the list is empty.
2. Enter the following name in Project name: Class_Exercises.
3. Select Create.
The project Class_Exercises appears in the Project Area of the Designer, and in the Projects tab of the Local
Object Library.
Select the Save All icon to save the project.
Related Information
To create a job in the Class_Exercises project, log into SAP Data Services Designer and perform the
following steps:
1. Open the Project tab in the object library and double-click Class_Exercises.
A new job node appears under the project node with the name "New Job". At this point, the job name is editable. If you make a menu selection or click away from the new job, you can rename it by right-clicking the job and selecting Rename.
Note
The job appears in the Project Area under Class_Exercises, and in the Jobs tab under the Batch Jobs node
in the Local Object Library.
Related Information
8.4 Adding a work flow
Work flows contain the order of steps in which the software executes a job.
To add a work flow to the Job_SalesOrg batch job, with the Class_Exercises project open in the Project
Area, perform the following steps:
The job opens in the workspace and the tool palette appears to the right of the workspace.
2. Select the work flow button ( ) from the tool palette and select an empty area of the workspace.
A work flow icon appears in the workspace. The work flow also appears in the Project Area hierarchy under
the job JOB_SalesOrg node.
3. Rename the new work flow WF_SalesOrg.
4. Select WF_SalesOrg in the project area.
Note
Work flows are easiest to read in the workspace from left to right and from top to bottom. Keep this
arrangement in mind as you add objects to the work flow workspace.
Related Information
A data flow contains instructions to extract, transform, and load data through data flow objects.
To add a data flow to the WF_SalesOrg workspace, perform the following steps:
1. Select the Data Flow icon ( ) from the tool palette and select an empty area of the workspace.
The data flow icon appears in the workspace and the new data flow appears under the work flow node in
the Project Area.
2. Rename the new data flow DF_SalesOrg.
The project, job, work flow, and data flow objects display in hierarchical order in the Project Area.
Related Information
The data flow contains objects with instructions to SAP Data Services for building the sales organization
dimension table.
To build the sales organization dimension table, add a source file, query object, and a target table to the
DF_SalesOrg data flow in the workspace.
Perform the steps in each of the following task groups to define and configure the DF_SalesOrg data flow:
To add objects to the data flow, open DF_SalesOrg in the workspace and perform the following steps:
1. Open the Formats ( ) tab in the Local Object Library and expand the Flat Files node.
2. Drag and drop the Format_SalesOrg file format to the left side of the workspace and choose Make
Source from the popup menu.
Position the object to the left of the workspace area to make room for other objects.
3. Select the Query icon on the tool palette ( ) and select an area in the workspace to the right of the file
format object.
All the objects necessary to create the sales organization dimension table are now in the workspace. In the next
section, you connect the objects in the order in which you want the data to flow.
Task overview: Define the data flow [page 47]
Next task: Define the order of steps in a data flow [page 48]
To define the order that SAP Data Services processes the objects in the data flow DF_SalesOrg, connect the
objects in a specific order.
Data Services reads objects in a data flow from left to right. Therefore, arrange the objects in order from left to
right.
1. Select the small square on the right edge of the Format_SalesOrg source file and drag your pointer to the
triangle on the left edge of the query transform.
The mouse pointer turns into a hand holding a pencil. When you drag from the square to the triangle, the
software connects the two objects with an arrow that points in the direction of the data flow.
2. Use the same drag technique to connect the square on the right edge of the query transform to the triangle
on the left edge of the SALESORG_DIM target table.
The order of operation is established after you connect all of the objects. Next you configure the query
transform.
Previous task: Add objects to the DF_SalesOrg data flow [page 47]
Next task: Configure the query transform [page 49]
The query transform retrieves a data set that satisfies conditions that you specify.
Before you can configure the query transform, all objects in the data flow must be connected. When you
connect the objects in the data flow, the column information from the source and target files appears in the
Query transform to help you set up the query.
To configure the query transform in the Job_SalesOrg job, perform the following steps:
1. Double-click the query object listed under the DF_SalesOrg node in the Project Area.
The query editor opens. The query editor is divided into the following areas:
• Schema In pane: Lists the columns in the source file
• Schema Out pane: Lists the columns in the target file
• Options pane: Contains tabs for defining the query
2. To map the input columns to the output columns, select the column icon in the Schema In pane and drag it
to the corresponding column in the Schema Out pane. Map the columns as listed in the following table.
SalesOffice → SALESOFFICE
DateOpen → DATEOPEN
Region → REGION
Note
After you drag the input column to the output column, an arrow icon appears next to the source column to
indicate that the column has been mapped.
The following graphic of the Query Editor contains red letters that relate to the descriptions in the following
legend:
• A. Target schema
• B. Source schema
• C. Query option tabs
• D. Column mapping definition
3. Optional: Select a field in the Schema Out area and view the column mapping definition in the Mapping tab
of the options pane.
Example
For example, in the graphic, the mapping for the SalesOffice input column to the SALESOFFICE output
column is: Format_SalesOrg.SalesOffice.
4. Select the cell in the SALESOFFICE row under the Type column in the Schema Out pane and choose
Decimal from the list.
5. Set Precision to 10 and Scale to 2 in the Type:Decimal popup dialog and select OK.
6. Select the Back arrow icon from the Designer toolbar to close the query editor and return to the data
flow workspace.
7. Save your work and optionally close the workspaces that you have open.
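Conceptually, the query transform that you just configured behaves like a simple relational projection from the source into the target. The following SQL is only an analogy of that mapping (the flat file isn't really a database table), including the decimal(10,2) conversion that you defined for SALESOFFICE:

    -- Analogy only: what the DF_SalesOrg query transform expresses.
    -- 'sales_org' stands in for the sales_org.txt flat file read through Format_SalesOrg.
    SELECT CAST(SalesOffice AS DECIMAL(10, 2)) AS SALESOFFICE,
           DateOpen                            AS DATEOPEN,
           Region                              AS REGION
    FROM   sales_org;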
Previous task: Define the order of steps in a data flow [page 48]
The Validation menu provides design-time validation options and not runtime verification. You can check for
runtime errors later in the process.
2. Select Validation > Validate Current View.
Note
You can alternatively use the icon bar and select either the Validate Current icon or the Validate All icon
to perform the same validations.
After the validation completes, Data Services displays the Output dialog with either the Errors tab or the Warnings tab open.
Note
For this exercise, two warning messages appear. The warnings are a result of changing the data type of
the SALESOFFICE column when we defined the output schema. You don't have to fix anything for
warning messages.
For this exercise, there aren't any errors. However, if you do receive errors, fix the errors before you
proceed.
3. Select the “X” in the upper right corner of the Output dialog to close it.
Related Information
If there are errors or warnings after you validate your job, view more details and open the area where the
problem exists.
The job doesn't execute when there are validation errors. Therefore, you must fix errors. Warnings don't
prohibit the job from running.
To help you fix validation errors, view more information about the error. To view more information about
validation errors and warnings in the Output dialog box, perform the following steps:
SAP Data Services displays the Message dialog box in which you can read the expanded notification text.
2. Double-click the error notification or right-click the error and select Go to Error.
Data Services takes you to the object that contains the error.
Example
Data Services opens the target editor with the column SALESOFFICE highlighted in the Schema Out
pane.
Save the objects that you have created and exit SAP Data Services at any time.
To save objects and close Data Services, use one of the following methods:
• Save all changed objects from the current session: Select the Save All icon in the toolbar.
• Save while closing: When you select to close Data Services, it presents a list of all changed objects that
haven't been saved. Select Yes to save all objects in the list, or select specific objects to save and then
select Yes.
Before you execute a job, either as an immediate or scheduled task, ensure that the Job Server is running.
With the SAP Data Services Designer open, look at the bottom right of the page. The status of the Job Server is
indicated with icons.
Icon Description
The name of the active Job Server and port number appears in the status bar in the lower left when the cursor
is over the icon.
An additional icon appears indicating whether the Profiler Server is running. You will know what the icon represents by viewing the message that appears when you hover your mouse over the icon.
When you execute the job, SAP Data Services moves your data through the Query transform and loads the data
to the target table, SALESORG_DIM.
Complete all of the steps to populate the Sales Organization Dimension from a flat file. Ensure that all errors
are fixed and that you save the job. If you exited Data Services, log back in to Data Services, and ensure that the
Job Server is running.
If you have not saved changes that you made to the job, the software prompts you to save them now.
The software validates the job and displays the Execution Properties dialog box.
The Execution Properties dialog box includes parameters and options for executing the job and for setting traces and global variables. Do not change the default settings for this exercise.
4. Select OK.
Data Services displays a job log in the workspace. Trace messages appear while the software executes the
job. If the job encounters errors, an error icon becomes active and the job stops executing.
5. Change the log view by selecting the applicable log icon at the top of the job log.
Log files
Log file Description
Trace log A list of the job steps in the order they started.
Monitor log A list of each step in the job, the number of rows processed by that step, and the time required to complete the operation.
Note
The error icon is not active when there are no errors.
Note
Remember that you should periodically close the tabs in the workspace when you are finished working with
the objects in the tab. To close a tab, click the X icon in the upper right of the workspace.
In the next segment, you populate the Time Dimension table with the following time attributes:
• Year number
• Month number
• Business quarter
You can now exit Data Services or go to the next group of tutorial exercises. If you exit, the software reminds
you to save your work if you did not save it before. The software saves all projects, jobs, workflows, data flows,
and results in the local repository.
Related Information
9 Populate a time dimension table
Time dimension tables contain date and time-related attributes such as season, holiday period, fiscal quarter,
and other attributes that aren’t directly obtainable from traditional SQL style date and time data types.
In this segment, you'll practice the basic skills that you learned in the previous segment. In addition, you'll learn
how to do the following:
The goal
The Time Dimension table in this segment is simple in that it contains only the year number, month number,
and business quarter as time attributes. It uses a Julian date as a primary key.
Note
The name of this section implies that we’re working with time, as in hours, minutes, and seconds. However,
we’re actually working with time in the sense of year, month, and business quarter. Don't confuse our
reference to time with time on the clock.
The following diagram shows the Star Schema with the portion we’ll work on in this segment circled.
Adding a job and data flow to the project [page 57]
Prepare a new job and data flow to populate the Time Dimension table.
Prepare a new job and data flow to populate the Time Dimension table.
Log into SAP Data Services Designer and open the Class_Exercises project in the Project Area.
To add a new job and data flow to the Class_Exercises project, perform the following steps:
1. Right-click the project name Class_Exercises in the Project Area and select New Batch Job.
The new job appears under the Class_Exercises project node in the Project Area and an empty job
workspace opens. Notice that the new job listed in the Project Area contains the generic job name in a text
box.
2. Rename the job to JOB_TimeDim.
3. Right-click in the empty Job_TimeDim workspace and select Add New > Data Flow.
Note
A work flow is an optional object. For this job, we don't include a work flow.
Related Information
The components of the DF_TimeDim data flow consist of a Date_Generation transform as a source and a table
as a target.
To add objects to the DF_TimeDim data flow, perform the following steps in the DF_TimeDim workspace:
1. Open the Transforms tab ( ) in the Local Object Library and expand the Data Integrator
node.
2. Drag and drop the Date_Generation transform onto the data flow workspace.
The transforms in the Transform tab are predefined. The transform on your workspace is a copy of the
predefined Date_Generation transform.
3. Select the Query icon on the tool palette ( ) and select an empty area to the right of the
Date_Generation transform.
SAP Data Services adds a query transform icon to the data flow.
4. Open the Datastore tab in the Local Object Library and expand the Tables node under Target_DS.
5. Drag and drop the TIME_DIM table onto the workspace to the right of the query object and select Make
Target from the popup menu.
6. Connect all of the objects starting with the Date_Generation transform, through the query transform and
finally to the target table.
All of the objects to create the time dimension table are in the workspace.
Related Information
The Date_Generation transform outputs one column of dates in a sequence that you define. For more
information about the Date_Generation transform, see the Reference Guide.
To configure the Date_Generation transform, perform the following steps with the DF_TimeDim open in the
workspace:
Increment daily
3. Select the Back arrow icon in the upper toolbar to close the transform editor and return to the data
flow.
4. Select the Save All icon in the toolbar.
Related Information
Configure the query to map the DI_GENERATED_DATE column from the transform, to apply functions to the
output columns, and to map the output columns to an internal data set.
Perform the following steps with the DF_TimeDim data flow workspace open:
The query editor opens. The Schema In pane of the query editor contains one column from the
Date_Generation transform, DI_GENERATED_DATE. The Schema Out pane has columns that are copied
from the target table.
2. Drag the DI_GENERATED_DATE column from the Schema In pane to the NATIVEDATE column in the
Schema Out pane.
A blue arrow appears to the left of each column name indicating that the column is mapped.
3. Map each of the other output columns in the output schema by performing the following substeps:
a. Select the column name in the Schema Out pane.
b. Type a function for the column in the Mapping tab in the lower pane as directed in the following table.
The following table contains the column name and the corresponding function to type.
DATE_ID julian(di_generated_date) Sets the Julian date for the date value.
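The other output columns use date functions in the same way. The following rows are an illustration only; the output column names shown here are placeholders, so map the functions to the year, month, and business-quarter columns that exist in your TIME_DIM table:
YEAR_NUM to_char(di_generated_date, 'yyyy') Extracts the four-digit year from the date value (placeholder column name).
MONTH_NUM month(di_generated_date) Sets the month number for the date value (placeholder column name).
BUS_QUARTER quarter(di_generated_date) Sets the business quarter for the date value (placeholder column name).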
The data flow validates without errors. If there are errors, go back through the steps up to this point and
perform any steps that you missed.
6. Select the Save All icon in the toolbar.
Related Information
After you save and validate the data flow DF_TimeDim, execute the job, JOB_TimeDim.
The JOB_TimeDim job populates the TIME_DIM dimension table with the transformed data.
The following graphic shows the data flow with the magnifying glass icon circled.
The target data table opens in the lower pane. The following screen capture shows the view of the data. The
table contains a row for each date beginning with the start date that you entered in the Date_Generation
transform editor and ending with the end date. The functions that you entered in the query editor break
down each date by month, business quarter, and year.
Task overview: Populate a time dimension table [page 56]
Related Information
In the next segment, you'll extract data to populate the Customer Dimension table.
At this point, you've populated the following tables in the sales data warehouse:
• Sales Org Dimension from a flat file
• Time Dimension from a transform
Remember to periodically close the workspace tabs when you are finished working with the objects in the tab.
Right now, you can exit SAP Data Services or go to the next segment of tutorial exercises. If you exit, Data
Services reminds you to save any work that you haven't saved. Data Services saves all projects, jobs,
workflows, data flows, and results in the local repository.
Related Information
10 Populate a table with data from a relational table
In this segment, you extract data from a relational table to populate the Customer Dimension table.
While you perform the tasks in this segment, you'll learn some basic features of the interactive debugger. In
addition, you'll learn about the following:
The goal
Populate the Customer Dimension table with data from a relational table. Then use the interactive debugger
feature to examine the data after it flows through each transform or object in the data flow.
The following diagram shows the Star Schema with the portion we’ll work on in this segment circled.
Adding the CustDim job, work flow, and data flow [page 65]
Add a new job, work flow, and data flow to the Class_Exercises project.
Add objects to DF_CustDim data flow in the workspace area to build the instructions for populating the
Customer Dimension table.
10.1 Adding the CustDim job, work flow, and data flow
Add a new job, work flow, and data flow to the Class_Exercises project.
Open the Class_Exercises project so it appears in the Project Area in SAP Data Services Designer.
A tab opens in the workspace area for the new batch job.
2. Rename this job JOB_CustDim.
3. Select the work flow button from the tool palette at right and click in the workspace.
Task overview: Populate a table with data from a relational table [page 64]
Related Information
Executing the CustDim job [page 69]
The interactive debugger [page 70]
What's next [page 74]
Work flows [page 18]
Add objects to DF_CustDim data flow in the workspace area to build the instructions for populating the
Customer Dimension table.
In this exercise, you build the data flow by adding the following objects:
• Source table
• Query transform
• Target table
Parent topic: Populate a table with data from a relational table [page 64]
Related Information
Adding the CustDim job, work flow, and data flow [page 65]
Validating the CustDim data flow [page 68]
Executing the CustDim job [page 69]
The interactive debugger [page 70]
What's next [page 74]
1. Open the Datastore tab in the Local Object Library and expand the Tables node under ODS_DS.
2. Drag and drop the ODS_CUSTOMER table to the workspace and select Make Source from the popup dialog
box.
3. Select the Query button on the tool palette at right ( ) and select an empty area of the workspace to
the right of the ODS_CUSTOMER table.
Next you'll define the input and output schemas in the Query transform.
You configure the query transform by mapping columns from the source to the target objects.
Note
Schema In column Schema Out column
CUST_ID → CUST_ID
CUST_CLASSF → CUST_CLASSF
NAME1 → NAME1
ADDRESS → ADDRESS
CITY → CITY
REGION_ID → REGION_ID
ZIP → ZIP
Note
If your database manager is Microsoft SQL Server or Sybase ASE, specify the columns in the order
shown in the table.
A blue arrow appears to the left of each column indicating the column is mapped. Because you didn't map
the CUST_TIMESTAMP column, it doesn’t have a blue arrow.
3. Right-click CUST_ID in the Schema Out pane and select Primary Key.
A key icon appears to the left of the CUST_ID column in the Schema Out pane indicating that the column is
a primary key.
4. Select the Back arrow icon ( ) in the toolbar to close the query editor and return to the data flow.
5. Save your work.
Validate the data flow before execution to make sure that it’s constructed correctly.
Note
You can alternatively use the icon bar and select either the Validate Current icon or the Validate All icon to perform the same validations.
The Output dialog box opens. If there are errors, the dialog box opens with the Errors tab displayed. In this exercise, the dialog box opens with the Warnings tab displayed.
Task overview: Populate a table with data from a relational table [page 64]
Related Information
Adding the CustDim job, work flow, and data flow [page 65]
Define the data flow [page 66]
Executing the CustDim job [page 69]
The interactive debugger [page 70]
What's next [page 74]
1. Right-click the JOB_CustDim job in the Project Area and select Execute.
The Trace log opens and displays the job execution process. The job completes when you see the message
that the job completed successfully.
3. After the execution completes successfully, view the output data:
a. Open the DF_CustDim data flow in the workspace.
b. Select the magnifying glass icon that appears on the lower right corner of the target object.
For information about the icon options above the sample data, see “Using View Data” in the Designer
Guide.
4. Select the Back arrow icon in the upper toolbar to close the workspace pane.
Task overview: Populate a table with data from a relational table [page 64]
Related Information
Adding the CustDim job, work flow, and data flow [page 65]
Define the data flow [page 66]
Validating the CustDim data flow [page 68]
The interactive debugger [page 70]
What's next [page 74]
10.5 The interactive debugger
SAP Data Services Designer has an interactive debugger that enables you to examine and modify data row by
row during job execution.
The debugger uses filters and breakpoints so that you can examine what happens to the data after each
transform or object in the data flow:
• Debug filter: Functions as a simple query transform with a WHERE clause (see the example expression after this list). Use a filter to reduce a data set in a debug job execution.
• Breakpoint: A point in the execution where the debugger pauses the job execution and returns the control
to you.
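As an illustration (not taken from the tutorial itself), a debug filter or breakpoint condition is simply a Boolean expression over source columns. For example, the REGION_ID test used later in this segment could be written as:

ODS_CUSTOMER.REGION_ID = 2

A debug filter with this expression passes only region 2 rows into the debug run; a breakpoint with this condition pauses execution when such a row is processed.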
When you start a job in the interactive debugger, Data Services opens additional panes in the workspace area.
The following screen capture shows the default locations for the additional panes.
The following icons in the upper toolbar enable you to toggle the panes in the workspace:
• Call Stack
• Debug Variables
• Trace
The Tutorial doesn't show you all aspects of the debugger feature. To learn more about the interactive
debugger, see the Designer Guide.
Parent topic: Populate a table with data from a relational table [page 64]
Related Information
Adding the CustDim job, work flow, and data flow [page 65]
Define the data flow [page 66]
Validating the CustDim data flow [page 68]
Executing the CustDim job [page 69]
What's next [page 74]
A breakpoint is a location in the data flow where a debug job execution pauses and returns control to you.
Ensure that you have the Class_Exercises project open in the Project Area.
To set a breakpoint in the DF_CustDim data flow, perform the following steps:
The following screen capture shows the Breakpoint editor with the correct options selected.
5. Select OK.
Related Information
The interactive debugger stops at intervals so that you can see what's happening during job execution.
Before you perform the following task, make sure that you set a breakpoint by following the steps in Setting a
breakpoint in a data flow [page 71].
The debugging starts. After SAP Data Services processes the first row, the debugger stops the process and
displays the first record, Cust_ID DT01 in the Target Data pane. Also notice that, for each row processed,
a trace message appears in the Trace pane.
Another row replaces the existing row in the Target Data pane.
4. To see all debugged rows, select the All checkbox in the upper right of the Target Data pane.
The Target Data pane shows the first two rows that it has debugged. As you progress through each row, the
Target Data pane adds the processed rows.
5. To stop the debugger, select the Stop Debug icon in the toolbar ( ).
Related Information
Set a condition on the breakpoint to stop processing when a specific condition is met.
Example
Add a breakpoint condition for the Customer Dimension job to break when the debugger reaches a row in
the data with a REGION_ID value of 2.
1. Double-click the breakpoint icon on the connector line between the source and the query in the
DF_CustDim data flow.
4. Type 2 for Value.
5. Select OK.
6. Right-click JOB_CustDim in the Project Area and choose Start debug.
SAP Data Services starts to debug the job. The debugger stops after processing the row that has the value
of 2 for the REGION_ID column.
7. To stop the debug mode, select the Stop Debug icon ( ) in the toolbar.
Related Information
In the next segment, you'll learn about document type definitions (DTD) and extracting data from an XML file.
For more information about the topics covered in this section, see the Designer Guide.
Parent topic: Populate a table with data from a relational table [page 64]
Related Information
Adding the CustDim job, work flow, and data flow [page 65]
Define the data flow [page 66]
Validating the CustDim data flow [page 68]
Executing the CustDim job [page 69]
The interactive debugger [page 70]
Populate a table from an XML File [page 75]
11 Populate a table from an XML File
In this segment, use a DTD (document type definition) file to define the format of an XML file, which has a
hierarchical structure.
An XML file represents hierarchical data using XML tags instead of rows and columns as in a relational table.
In this segment, you'll learn two methods to flatten a nested schema and process an XML file:
• Use a query transform to unnest the nested schema manually.
• Use the XML_Pipeline transform.
Tip
Using an XML_Pipeline transform is much easier than using a query transform. However, performing the
exercises using a query transform first helps you to appreciate the simplicity of the XML_Pipeline method.
To help you understand the goal for the tasks in this section, read about nested data in the Designer Guide.
The goal
Data Services can process hierarchical data only after you’ve flattened the hierarchy. The goal of this segment
is to flatten a nested schema from an XML file and output the data to a table.
The circled portion of the Star Schema in the following diagram shows the portion we’ll work on in this
segment.
Nested data [page 77]
SAP Data Services provides a way to view and manipulate hierarchical relationships within data flow
sources, targets, and transforms using Nested Relational Data Modeling (NRDM).
Adding MtrlDim job, work flow, and data flow [page 78]
To create the objects for this task, we omit the details and rely on the skills that you learned in the first
few exercises of the tutorial.
11.1 Nested data
SAP Data Services provides a way to view and manipulate hierarchical relationships within data flow sources,
targets, and transforms using Nested Relational Data Modeling (NRDM).
In this tutorial, we use an XML file that has a hierarchical structure. We use a document type definition (DTD)
schema to define the XML. The DTD describes the data contained in the XML document and the relationships
among the elements in the data.
The nested data method can represent hierarchical data more concisely than other methods because it avoids repeating information.
Example
For example, when you represent nested data in a single data set, you have repeated information. In the
following table, the first four columns contain repeated information.
Also, a nested schema can itself contain columns and other nested schemas. There is a unique instance of each
nested schema for each row at each level of the relationship.
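The tables from the original example aren't reproduced here. Purely as an illustration, with hypothetical values, a flat single-data-set representation of one sales order with two line items repeats the order-level columns on every row:

OrderNo CustID OrderDate ShipTo ItemNo Qty
9001 C001 2007.01.15 Chicago 10 2
9001 C001 2007.01.15 Chicago 20 5

In the nested representation, OrderNo, CustID, OrderDate, and ShipTo are stored once for the order, and the line items (ItemNo, Qty) appear as rows in a nested schema, so no order-level information is repeated.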
The following screen capture shows the structure of nested source data in the Schema In pane of a query
editor in Designer:
Parent topic: Populate a table from an XML File [page 75]
Related Information
Adding MtrlDim job, work flow, and data flow [page 78]
Importing a document type definition [page 79]
Define the MtrlDim data flow [page 80]
Validating the MtrlDim data flow [page 88]
Executing the MtrlDim job [page 89]
Leveraging the XML_Pipeline [page 89]
What's next [page 93]
To create the objects for this task, we omit the details and rely on the skills that you learned in the first few
exercises of the tutorial.
Object Rename
Job JOB_MtrlDim
Related Information
A document type definition (DTD) schema file describes the data contained in an XML document and the
relationships among the elements in the data.
The scripts that you ran at the beginning of the tutorial added the necessary objects for you to perform the
following task.
Import the DTD schema named Mtrl_List by performing the following steps.
MTRL_MASTER_LIST is the primary node. SAP Data Services imports only elements of the DTD that belong
to this primary node and any subnodes.
Data Services adds the DTD Mtrl_List to the Nested Schemas group in the Local Object Library. The
following is a text view of the Mtrl.dtd file:
<?xml encoding="UTF-8"?>
<!ELEMENT MTRL_MASTER_LIST (MTRL_MASTER+, EFF_DATE)>
<!ELEMENT MTRL_MASTER (MTRL_ID, MTRL_TYPE, IND_SECTOR, MTRL_GROUP, UNIT,
TOLERANCE, HAZMAT_IND*, TEXT+ )>
<!ELEMENT MTRL_ID (#PCDATA)>
<!ELEMENT MTRL_TYPE (#PCDATA)>
<!ELEMENT IND_SECTOR (#PCDATA)>
<!ELEMENT MTRL_GROUP (#PCDATA)>
<!ELEMENT UNIT (#PCDATA)>
<!ELEMENT TOLERANCE (#PCDATA)>
<!ELEMENT HAZMAT_IND (HAZMAT_TYPE, HAZMAT_LEVEL )>
<!ELEMENT HAZMAT_TYPE (#PCDATA)>
<!ELEMENT HAZMAT_LEVEL (#PCDATA)>
<!ELEMENT TEXT (LANGUAGE, SHORT_TEXT, LONG_TEXT*)>
<!ELEMENT LANGUAGE (#PCDATA)>
<!ELEMENT SHORT_TEXT (#PCDATA)>
<!ELEMENT LONG_TEXT (#PCDATA)>
<!ELEMENT EFF_DATE (#PCDATA)>
Related Information
In this exercise you add specific objects to the DF_MtrlDim data flow workspace and connect them in the
order in which the software should process them.
Follow the tasks in this exercise to configure the objects in the DF_MtrlDim data flow so that the data flow
correctly processes hierarchical data from an XML source file.
Related Information
Build the DF_MtrlDim data flow with a source, target, and query transform.
The Source File Editor opens containing the Schema Out options in the upper pane and the Source options
in the lower pane.
5. To complete the options in the Source tab, perform the following substeps:
a. Ensure that XML is selected.
b. Choose <Select file> from the File list.
c. Navigate to <LINK_DIR>\Tutorial Files\, select the mtrl_list.xml file, and select Open.
The File option in the Source tab populates with the file name and location of the XML file.
d. Select Enable validation.
Enable validation compares the incoming data to the stored data type definition (DTD) format.
Data Services automatically populates the following options in the Source tab:
• Format name: The schema name Mtrl_List
• Root element name: The primary node name MTRL_MASTER_LIST
6. Select the Back arrow icon in the toolbar to return to the DF_MtrlDim data flow workspace.
7. Select the Query Transform icon in the tool palette and then select an empty area of the workspace, to the
right of the table object.
8. Rename the query transform “qryunnest”.
9. Drag and drop the MTRL_DIM table from the Tables node under Target_DS to the workspace.
10. Choose Make Target from the popup menu.
11. Connect the objects in the data flow to indicate the flow of data from the source XML file through the query
to the target table.
12. Save your work.
Related Information
Use the query transform to unnest the hierarchical Mtrl_List XML source data properly.
We've broken this process into several segments. Make sure that you take your time and try to understand
what you accomplish in each segment.
The Query editor opens as shown in the following screen capture. Notice the nested structure of the source in
the Schema In pane. The Schema Out pane reflects the current structure of the target table MTRL_DIM. Notice
the differences in column names and data types between the input and output schemas.
In the next several exercises, we use specific configuration settings to systematically unnest the table.
Related Information
Move the MTRL_MASTER schema from the Schema In pane to the Schema Out pane in the query editor for
qryunnest:
1. Select all five columns in the Schema Out pane so they're highlighted:
• MTRL_ID
• MTRL_TYP
• IND_SECTOR
• MTRL_GRP
• DESCR
2. Right-click and select Cut.
SAP Data Services removes the five columns and saves the column names and data types to your
clipboard.
Caution
Don’t use Delete. By selecting Cut instead of Delete, SAP Data Services copies the correct column
names and data types from the target schema to the clipboard. In a later step, we instruct you to paste
the clipboard information to the Schema Out pane of the MTRL_Master target table schema.
3. Drag and drop the MTRL_MASTER schema to the Schema Out pane from the Schema In pane.
The following screen capture shows the results on the qryunnest schema in the Schema Out pane. Notice
that MTRL_MASTER is now nested under the qryunnest schema.
The schema now contains the MTRL_MASTER schema that you just moved to the Schema Out pane in
the query editor.
1. Right-click MTRL_MASTER in the Schema Out pane and choose Make Current.
2. Delete the following columns from under the MTRL_MASTER schema in the Schema Out pane:
• MTRL_ID
• MTRL_TYPE
• IND_SECTOR
• MTRL_GROUP
• UNIT
• TOLERANCE
• HAZMAT_IND
The following screen capture shows the remaining nested nodes under MTRL_MASTER in the Schema Out
pane.
3. Right-click the MTRL_MASTER schema in the Schema Out pane and choose Paste.
The columns that you originally deleted from the Schema Out pane are added back to the schema.
However, now the columns appear under the MTRL_MASTER schema.
4. Map the following fields from the Schema In pane to the corresponding columns in the Schema Out pane:
• MTRL_ID
• MTRL_TYPE
• IND_SECTOR
• MTRL_GROUP
The following screen capture shows the columns that you added back under the MTRL_MASTER schema.
11.4.2.3 3. Map the DESCR column
Map the SHORT_TEXT column in the Schema In pane to the DESCR column in the Schema Out pane.
1. Right-click the DESCR column in the Schema Out pane and choose Cut from the popup menu.
SAP Data Services removes the DESCR column from the Schema Out pane, but saves it to your clipboard.
2. Right-click the TEXT nested table in the Schema Out pane and select Make Current from the popup menu.
3. Right-click the LANGUAGE column in the Schema Out pane and select Paste Insert Below.
Data Services places the DESCR column at the same level as the SHORT_TEXT column.
4. Map the SHORT_TEXT column from the Schema In pane to the DESCR column in the Schema Out pane.
5. Delete the following two columns and nested schema from the Schema Out pane:
• LANGUAGE
• SHORT_TEXT
• TEXT_nt_1
The TEXT nested table in the Schema Out pane contains only the DESCR column.
6. View the results of the steps on the MTRL_DIM target table:
a. Double-click MTRL_DIM in the Project Area under DF_MtrlDim.
The Schema In pane shows the same schemas and columns that appear in the qryunnest query
Schema Out pane. However, the Schema In of the MTRL_DIM target table is still not flat, and it won't
produce the flat schema that the target requires. Therefore, next we flatten the remaining schema.
1. In the Schema Out pane, right-click the TEXT node and choose Unnest.
The table icon next to the TEXT node appears with a left-pointing arrow ( ).
2. Right-click MTRL_MASTER in the Schema Out pane and select Make Current.
The Schema In and Schema Out panes show one level for each.
5. Select the Save All icon in the upper toolbar.
After unnesting the source data using the query in the previous exercises, validate the DF_MtrlDim data flow to make sure
that there are no errors.
The Output dialog box opens in the Warnings tab. There are warning messages indicating that data type
conversion will be used to convert from varchar(1024) to the data type and length of the target columns.
If your design contains any errors in the Errors tab, you must fix them. For example, the following error
indicates that the source schema is still nested: “The flat loader...cannot be connected to NRDM”. Right-
click the error message and select Go to error. If you have syntax errors, a dialog box appears with a
message describing the error. Address all errors before executing the job.
Related Information
11.6 Executing the MtrlDim job
Execute the JOB_MtrlDim to see the unnested data in the output table.
Before you execute the JOB_MtrlDim job, validate the data flow and save your work.
The Trace Messages dialog opens showing processing messages. The last message is that the job
completed successfully.
Open DF_MtrlDim in the workspace and select the magnifying glass icon in the lower right corner of the
MTRL_DIM target table. A table opens in the lower pane showing a sample of the transformed data.
Related Information
The XML_Pipeline transform extracts data from an XML file using tools such as SQL SELECT statements.
When you extract data from an XML file to load into a target data warehouse, you obtain only parts of the XML
file. In the previous exercises, we used the Query transform for partial extraction. The XML_Pipeline transform
extracts much more than the Query transform because it uses many of the clauses of a SQL SELECT
statement. Additionally, the XML_Pipeline transform performs better than the Query transform because of the
way it uses memory:
• Uses less memory: Processes each instance of a repeatable schema within the XML file rather than
building the whole XML structure first.
• Uses memory efficiently: Releases and reuses memory continually to flow XML data through the
transform more steadily.
To build the MTRL_DIM table from a nested XML file, use the XML_Pipeline transform in addition to a Query
transform. Construct the data flow with the following objects:
• XML file: The source
• XML_Pipeline transform: Obtains a repeatable portion of the nested source schema
• Query transform: Maps the output from the XML_Pipeline transform to a flat target schema
• Flat file: The target
Related Information
In this exercise, you’ll achieve the same outcome as in the previous exercise, but you use the XML Pipeline
transform for more efficient configuration and processing.
1. Add the following objects to Class_Exercises in the Project Area using one of the methods you've
learned in previous exercises:
• JOB_Mtrl_Pipe
• WF_Mtrl_Pipe
• DF_Mtrl_Pipe
2. Open DF_Mtrl_Pipe in the workspace.
3. Expand the Nested Schemas node in the Formats tab of the Local Object Library.
4. Drag and drop the Mtrl_List file into the DF_Mtrl_Pipe workspace and choose Make File Source.
5. Double-click the Mtrl_List source file to open the source file editor.
6. In the Source tab, ensure XML is selected.
7. Choose Select file from the File list.
8. Select the mtrl_list.xml in <LINK_DIR>\Tutorial Files\ and select Open.
9. Select Enable Validation.
Enable Validation enables comparison of the incoming data to the stored DTD format.
10. Select the back arrow in the upper toolbar to return to the data flow workspace.
11. Expand the Data Integrator node in the Transforms tab of the Local Object Library.
12. Drag and drop the XML_Pipeline transform to the DF_Mtrl_Pipe workspace.
13. Select the Query transform icon in the tool palette and select an empty area of the workspace.
14. Rename the Query transform Query_Pipeline.
15. Drag and drop the MTRL_DIM table from the Tables node of the Target_DS datastore to the DF_Mtrl_Pipe
workspace and select Make Target.
16. Connect the objects in the data flow to indicate the flow of data from the source XML file through the
XML_Pipeline and Query_Pipeline transforms to the target table.
The following shows an example of the data flow in Designer.
Related Information
The XML_Pipeline transform enables you to map a nested column directly to a flat target table.
Set up the job as instructed in Create a job, work flow, and data flow [page 90].
The transform editor opens. The Schema In pane shows the nested structure of the source file.
2. Drag and drop the following columns from the Schema In pane to the Schema Out pane.
• MTRL_ID
• MTRL_TYPE
• IND_SECTOR
• MTRL_GROUP
• SHORT_TEXT
3. Click the Back arrow icon ( ) from the upper toolbar to close the transform editor.
4. Double-click Query_Pipeline to open the query editor.
5. Map each column from the Schema In pane to the column in the Schema Out pane as shown in the
following table.
MTRL_ID → MTRL_ID
MTRL_TYPE → MTRL_TYPE
IND_SECTOR → IND_SECTOR
MTRL_GROUP → MTRL_GROUP
SHORT_TEXT → DESCR
When you map each column from the Schema In pane to the Schema Out pane, the column Type in
Schema Out doesn't change, even though the input fields have the type varchar(1024).
6. Double-click MTRL_DIM in the data flow to open the target table editor.
7. Open the Options tab in the lower pane and select Delete data from table before loading.
This option deletes existing data in the table before loading new data. If you don’t select this option, SAP
Data Services appends data to the existing table.
8. Select the Back arrow icon in the upper toolbar to close the target table editor.
9. Select the Validate icon from the upper toolbar.
The Warnings tab opens. The warnings indicate that each column will be converted to the data type in the
Schema Out pane.
10. Execute the JOB_Mtrl_Pipe job.
11. Accept the default settings in Execution Properties and select OK.
After the job successfully executes, open DF_Mtrl_Pipe in the workspace and select the magnifying glass
icon in the lower right corner of the MTRL_DIM target table. A table opens in the lower pane showing a sample
of the transformed data.
Related Information
11.8 What's next
In the next segment, learn about using joins and functions to obtain data from multiple relational tables.
Related Information
Tutorial
Populate a table from an XML File PUBLIC 93
12 Populate a table from multiple relational
tables
The goal
Create an inner join to combine data from the ods_SalesItem and ods_SalesOrder tables to populate the
SalesFact table. Then add order status information from the ods_Delivery table using a Lookup function.
The circled portion of the Star Schema in the following diagram shows the portion we’ll work on in this
segment.
More information:
• For information about joins in the Query transform, see the Query transform section in the Reference
Guide.
• For more information about operations on nested data, see the Nested data section in the Designer Guide.
• For more information about the Lookup expression, functions, and filters, see the Designer Guide.
• For more information about Impact and Lineage reports, see the Management Console Guide.
Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 105]
Use the Data Services Management Console to view a lineage analysis of the Sales Fact job.
Use the basic skills that you’ve learned in earlier exercises to set up a new job, work flow, and data flow.
1. Add a new job to the Class_Exercises project in the Project Area and name it JOB_SalesFact.
2. Add the following objects to the job using the skills you've learned in previous exercises:
Task overview: Populate a table from multiple relational tables [page 94]
Related Information
Build the DF_SalesFact data flow by adding objects including two source tables.
Arrange the two sources vertically on the left of the workspace, with one above the other.
7. Save your work.
Task overview: Populate a table from multiple relational tables [page 94]
Related Information
Use an inner join to join the columns of the two source tables to include only the matching columns from both
tables.
SAP Data Services defines the relationship between the ODS_SALESORDER and ODS_SALESITEM tables by
matching the key column SALES_ORDER_NUMBER, which is in both tables. The join option generates a join
expression based on primary and foreign keys and column names.
The values in the SALES_ORDER_NUMBER column must be the same in each table before the record is included
in the output.
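For reference, the generated join expression is an equality on the shared key column; it should look similar to the following line in the FROM clause (shown here as a sketch, not copied from the tutorial):

ODS_SALESORDER.SALES_ORDER_NUMBER = ODS_SALESITEM.SALES_ORDER_NUMBER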
1. Double-click the Query transform in the DF_SalesFact workspace to open the query editor.
2. Open the FROM tab in the lower pane.
6. Click the ellipses icon next to the Right table name ODS_SALESITEM.
The Smart Editor opens. Add a filter to apply to the records that qualify for the inner join.
7. Place your cursor at the end of the first line and press Enter .
8. Type the following two lines, each on its own line, using the casing as shown. Alternately, copy and paste
the text into the Smart Editor:
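The exact lines are provided with the full tutorial materials; as a sketch, assuming the ORDER_DATE column of ODS_SALESORDER and the built-in to_date function (the date-format string is an assumption), the filters take this shape:

AND ODS_SALESORDER.ORDER_DATE >= to_date('2007.01.01', 'YYYY.MM.DD')
AND ODS_SALESORDER.ORDER_DATE <= to_date('2007.12.31', 'YYYY.MM.DD')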
These lines filter the sales orders by date. Data Services moves all orders that are from January 1, 2007 up
to and including December 31, 2007 to the target table.
Tip
If you decide to type the lines, as you type the function names, the Smart Editor prompts you with
options. Either ignore the prompts and keep typing or select an option that is highlighted and press
Enter . You can alternately double-click the prompt to accept it.
The join conditions that you added in the Smart Editor appear in the Join Condition column and in the
FROM Clause area.
10. In the Schema In and Schema Out panes of the query editor, map the following source columns to output
columns using drag and drop.
ORDER_DATE → SLS_DOC_DATE
SALES_LINE_ITEM_ID → SLS_DOC_LINE_NO
MTRL_ID → MATERIAL_NO
PRICE → NET_VALUE
11. Keep the query editor open for the next task.
Task overview: Populate a table from multiple relational tables [page 94]
Related Information
Adding objects to the SalesFact data flow [page 96]
Purpose of the lookup_ext function [page 99]
Configuring the lookup_ext function [page 101]
Executing the SalesFact job [page 104]
Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 105]
What's next [page 107]
The lookup_ext function gets data from a lookup table and outputs the data when user-defined conditions
are met.
In this example, we create a lookup_ext function to output data to the SALES_FACT target table from a non-
source table. We designate the non-source table as a lookup table in the data flow configuration.
The SALES_FACT target table contains a column named ORD_STATUS that we haven't mapped because there
are no comparable columns in our two source tables. The ODS_DELIVERY table contains the order status
information in the DEL_ORDER_STATUS column. Therefore, we establish ODS_DELIVERY as a lookup table so
that we include the order status information in our target table. To ensure that the correct delivery order status
is output with each record, we set conditions.
The following table shows the columns from the lookup table and the corresponding columns in the source
table that we use in the lookup_ext conditions. The values in each field pair must match to satisfy the
conditions.
DEL_SALES_ORDER_NUMBER = SALES_ORDER_NUMBER
DEL_ORDER_ITEM_NUMBER = SALES_LINE_ITEM_ID
The syntax of the lookup_ext function seems complicated; however, a graphical user interface helps you
create the function. The following code shows the syntax of the lookup_ext function with just the
portions that we use in this example:
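The syntax listing isn't reproduced here. As a simplified sketch derived from the completed function call shown later in this exercise (see the Reference Guide for the authoritative syntax), the portions we use have this shape:

lookup_ext([<lookup_table>, <cache_spec>, <return_policy>],
 [<return_column_list>],
 [<default_value_list>],
 [<condition_column>, '<operator>', <compare_column>, ...])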
The following table describes each variable and the values for the lookup_ext we set in this example. For a
complete list of all of the options, see “lookup_ext” in the “Descriptions of Data Services built-in functions”
section of the Reference Guide.
Note
Because some of the function sections are too wide for the table, we've shown them with line breaks.
Condition 2: DEL_ORDER_ITEM_NUMBER, '=', ODS_SALESITEM.SALES_LINE_ITEM_ID
Parent topic: Populate a table from multiple relational tables [page 94]
Related Information
What's next [page 107]
The lookup_ext function retrieves data from a column in the ODS_DELIVERY table to include in the
SALES_FACT output table.
The following steps continue from the exercise in Creating an inner join [page 97].
For the following exercise, we use the ODS_DELIVERY table as the lookup table. We'll create two conditions in
the mapping of the ORD_STATUS column in the Schema Out pane. Perform the following steps in the
DF_SalesFact query editor:
The column hasn't been mapped yet, so a Column icon appears to the left of the column name.
2. Open the Mapping tab in the lower pane and select Functions....
The Select Parameters dialog box opens with options to define the Lookup_Ext function.
6. Establish ODS_DELIVERY as the lookup table by performing the following substeps:
Note
The lookup table is where the lookup_ext function obtains the value to put into the ORD_STATUS
column.
a. Select the down pointing arrow at the end of the Lookup Table option at the top.
The Input Parameter dialog box closes. The ODS_DELIVERY table is now the lookup table.
7. Expand the two source tables to expose the columns to use for the conditions:
a. Expand the Lookup table node at left and then expand the ODS_DELIVERY subnode.
b. Expand the Input Schema node at left and then expand the ODS_SALESITEM subnode.
We don't use the ODS_SALESORDER source table because it doesn't contain the columns we need for
the conditions in the function.
8. Set the first condition:
a. Drag and drop DEL_SALES_ORDER_NUMBER from under ODS_DELIVERY to the Condition group under
Column in Lookup table.
b. Ensure that the equal sign (=) appears under Op.(&).
c. Select the ellipses at the end of the row.
ODS_DELIVERY.DEL_SALES_ORDER_NUMBER = ODS_SALESITEM.SALES_ORDER_NUMBER
ODS_DELIVERY.DEL_ORDER_ITEM_NUMBER = ODS_SALESITEM.SALES_LINE_ITEM_ID
The following screen capture shows the completed Select Parameters dialog box:
The final lookup_ext function displays in the Mapping tab and looks as follows:
lookup_ext([ODS_DS.DBO.ODS_DELIVERY,'PRE_LOAD_CACHE','MAX'],
 [DEL_ORDER_STATUS],[NULL],
 [DEL_SALES_ORDER_NUMBER,'=',ODS_SALESITEM.SALES_ORDER_NUMBER,
  DEL_ORDER_ITEM_NUMBER,'=',ODS_SALESITEM.SALES_LINE_ITEM_ID])
SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>' )
12. Select Validate Current in the upper toolbar to make sure that there are no errors.
13. Select Back ( ) in the upper toolbar to close the query editor.
14. Save your work.
Task overview: Populate a table from multiple relational tables [page 94]
Related Information
After you have performed the validation step and fixed any errors, execute the JOB_SalesFact job.
The trace messages open. The execution is complete when you see the message in trace messages that
the job completed successfully.
3. Select DF_SalesFact in the Project Area to open it in the workspace.
4. Select the View Data icon (magnifying-glass) on the lower right corner of the SALES_FACT target object to
view 17 rows of data.
• Based on the filter you set for the inner join, the records show dates in the ORDER_DATE column that are
greater than or equal to January 1, 2007 and less than or equal to December 31, 2007.
• The ORD_STATUS column contains either a “D” or an “O” to indicate that the order status is D = delivered
or O = ordered.
Task overview: Populate a table from multiple relational tables [page 94]
Related Information
12.7 Viewing Impact and Lineage Analysis for the
SALES_FACT target table
Use the Data Services Management Console to view a lineage analysis of the Sales Fact job.
View information about the SALES_FACT target table by performing the following steps:
1. In SAP Data Services Designer, select Tools > Data Services Management Console.
The Impact and Lineage page opens with Objects to Analyze at left and repository information at right.
4. Select Settings in the upper right corner.
5. Check the name in the Repository text box at right to make sure that it contains the current repository.
6. Open the Refresh Usage Data tab to make sure that it lists the current job server in the Job Server text box.
7. Select Calculate Column Mapping.
The software calculates the current column mapping and displays a notification that column mappings are
calculated successfully at the top of the tab.
8. Select Close.
9. In the file tree at left, expand Datastores and then Target_DS to view the list of tables.
10. Expand Data Flow Column Mapping Calculation in the right pane to view the calculation status of each data
flow.
11. Select the SALES_FACT table name under Target_DS in the file tree.
The Overview tab for SALES_FACT table opens at right. The Overview tab displays general information
about the table such as the table datastore name and the table type.
12. Open the Lineage tab.
The following screen capture shows the Impact and Lineage Analysis for the SALES_FACT target table.
When you move the pointer over a source table icon, the name of the datastore, data flow, and owner
appear.
13. Expand the SALES_FACT table in the file tree and double-click the ORD_STATUS column.
The Lineage tab in the right-pane refreshes to show the lineage for the column. The following screen
capture shows the lineage of SALES_FACT.ORD_STATUS.
Notice that it shows the lookup table as the source for the ORD_STATUS column.
14. Print the reports by selecting the print option in your browser. For example, for Chrome, select the Tools
icon in the upper right and select Print.
Task overview: Populate a table from multiple relational tables [page 94]
Related Information
In the next segment, you'll learn how to take advantage of change data capture.
Parent topic: Populate a table from multiple relational tables [page 94]
Related Information
13 Changed data capture
Changed data capture (CDC) extracts only new or modified data after you process an initial load of the data to
the target system.
In this segment, you'll learn how to use SAP Data Services data flows and scripts to build logic for finding
changed data. This method uses date, time, or datetime stamps to identify new rows added to a source table at
a given point in time. To learn about the other methods for capturing changed data, see the Designer Guide.
The tasks in this segment use the following CDC objects:
• Global variables
• Template tables
• Scripts
• Custom functions
The goal
In this exercise, we create two jobs, an initial load job and a delta load job:
• Initial load job: Loads all rows from a source table to a target table. The job deletes all data in the target
table before it loads data. Therefore, the target data is the same as the source data.
• Delta load job: Reads data from the source table, but inserts only the new data into the target because of the
WHERE condition. The WHERE condition filters the data between the $GV_STARTTIME and $GV_ENDTIME
global variables. Therefore, the job loads only the changed data to the target table. The job doesn't delete
existing data in the target table before loading changed data.
In the initial load job, Data Services establishes a baseline using an assigned date and time for each row in the
data source. In the delta load job, Data Services determines which rows are new or changed based on the last
date and time data.
The target database contains a job status table called CDC_time. Data Services stores the last date and time
data for each row in CDC_time. The delta load job updates that date and time for the next execution.
Creating the initial load job and defining global variables [page 110]
Create a job that processes the initial load of data.
Use the replicate feature to copy the existing DF_CDC_Initial data flow and use the copy in the delta
load job.
Creating the Delta job and scripts for global variables [page 116]
SAP Data Services uses the delta load job to update the target table with data that is new or changed
since the last time the job ran.
Global variables are symbolic placeholders for values in a specific job that increase the flexibility and reusability
of jobs.
In general, an initial job contains the usual objects, such as a source, transform, and a target. But, an initial job
can also serve as a baseline for the source data through global variables.
• Global variables are available within the job for which they were created. They aren’t available for any other
jobs.
Example
For example, you have 26 jobs named JobA, JobB, through JobZ. You create a global variable for JobA.
You can't use the JobA global variable for JobsB through Z.
• Set values for global variables in several ways, including in scripts, at job execution, or in job schedule
properties.
• Global variables provide you with maximum flexibility at runtime.
Example
For example, you can change default values for global variables at runtime from a job's schedule or
SOAP call without having to open a job in the SAP Data Services Designer.
For complete information about using global variables in Data Services, see the Designer Guide.
Related Information
Creating the initial load job and defining global variables [page 110]
Replicating the initial load data flow [page 116]
Creating the Delta job and scripts for global variables [page 116]
Execute the jobs [page 119]
What's next [page 122]
After you create the initial load job, create two global variables. The global variables serve as placeholders for
job execution start and end time stamps.
1. Open Class_Exercises in the Project Area and create a new batch job.
2. Rename the new job JOB_CDC_Initial.
3. Select job JOB_CDC_Initial in the Project Area to highlight it.
The Variables and Parameters dialog box opens. The dialog box displays the job name in the context
header.
5. Right-click Global Variables and select Insert.
Map the columns from the source table to the output schema in the QryCDC query, and add a function
that checks the date and time.
Related Information
Add objects to the JOB_CDC_Initial job, including a work flow, data flow, initialization script, and termination
script.
The initialization and termination scripts define values for the global variables.
5. Add a script object ( ) from the tool palette to the left side of the work flow workspace.
6. Name the script SET_START_END_TIME.
7. Add a data flow object from the tool palette to the right of the script object in the workspace.
8. Name the data flow DF_CDC_Initial.
9. Add a second script object from the tool palette to the right of the DF_CDC_Initial data flow object.
10. Name the second script UPDATE_CDC_TIME_TABLE.
11. Connect the objects from left to right to set the order of the work flow.
The following screen capture shows the completed WF_CDC_Initial workflow:
Next you add functions to the scripts.
Task overview: Creating the initial load job and defining global variables [page 110]
Related Information
Use your database management scripting language to add expressions that define the values for the global
variables.
When you define scripts, make sure that you follow the syntax rules for your database management system.
Before you define the scripts, check the date and time in the existing database to make sure you use a date in
the script that includes all of the records. To check the date and time, perform the following prerequisite steps:
To define the scripts in the WF_CDC_Initial work flow, perform the following steps:
1. Expand the WF_CDC_Initial node in the Project Area and select the SET_START_END_TIME script to open
it in the workspace.
This script establishes the $GV_STARTTIME as a date that includes all records in the table. The end time is
the time of job execution. The initial data flow captures all of the rows in the source table. In the
prerequisite steps, you noted the date for all rows in the source table as 2008.03.27 00:00:00. Therefore,
set the value for $GV_STARTTIME to a date that includes all rows. For example, the date 2008.01.01
00:00:000 is before the timestamp date in the table, so it includes all rows from the table.
a. Enter the script directly in the text area using the syntax applicable for your database.
The following example is for Microsoft SQL Server. It establishes a start date and time of 2008.01.01
00:00:000. Then it establishes that the end date is the system date for the initial load job execution:
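The script text isn't reproduced here. A minimal sketch in Data Services script syntax, based on the values described above (verify the exact statements against the full tutorial and your database's date format):

# Baseline start time that predates every row in the source table.
$GV_STARTTIME = '2008.01.01 00:00:000';
# End time is the system date and time at job execution.
$GV_ENDTIME = sysdate();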
Tip
As you start to type the global variable name, a list of variable names appears. Double-click the
applicable variable name from the list to add it to the string.
b. Select the Validate Current icon in the upper toolbar to validate the script.
Fix any syntax errors and revalidate if necessary.
Note
Even if SAP Data Services doesn't find syntax errors, your DBMS can find syntax errors when you
execute the job.
The script resets the LAST_TIME column value in the CDC_time job status table to the system date in
$GV_ENDTIME.
a. Enter the script directly in the text area using the syntax applicable for your database.
The following example is for Microsoft SQL Server. The function deletes the current date in the
CDC_time job status table and inserts the value from the global variable $GV_ENDTIME:
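The script text isn't reproduced here. A sketch of what the UPDATE_CDC_TIME_TABLE script might look like, assuming the CDC_time table is reached through the Target_DS datastore with owner ODS (substitute your datastore and owner names; the exact statements are in the full tutorial):

# Remove the previous timestamp from the job status table.
sql('Target_DS', 'DELETE FROM ODS.CDC_TIME');
# Record this execution's end time for use as the next delta load's start time.
sql('Target_DS', 'INSERT INTO ODS.CDC_TIME VALUES ({$GV_ENDTIME})');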
Note
Note that in the script, “ODS” is the owner name. Use your owner name in your script.
Note
Even if SAP Data Services doesn't find syntax errors, your DBMS can find syntax errors when you
execute the job.
Task overview: Creating the initial load job and defining global variables [page 110]
Related Information
13.2.3 Defining the data flow
Define the data flow by adding a query and a template table to the data flow.
With a target template table, you don’t have to specify the table schema or import metadata. Instead, during
job execution, SAP Data Services has the DBMS create the table with the schema defined by the data flow.
Template tables appear in the Local Object Library under each datastore.
Data Services adds the Query to the data flow in the workspace.
5. Rename the query QryCDC.
6. Expand Target_DS in the Datastores tab in the Local Object Library.
7. Drag and drop the Template Tables icon to the workspace as the target in the data flow.
Task overview: Creating the initial load job and defining global variables [page 110]
Related Information
13.2.4 Defining the QryCDC query
Map the columns from the source table to the output schema in the QryCDC query, and add a function that
checks the date and time.
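The detailed steps aren't reproduced here. The key setting is the WHERE clause of the QryCDC query, which compares the source timestamp column to the two global variables. A sketch, assuming the ODS_CUSTOMER.CUST_TIMESTAMP column mentioned earlier in the tutorial:

(ODS_CUSTOMER.CUST_TIMESTAMP >= $GV_STARTTIME) AND (ODS_CUSTOMER.CUST_TIMESTAMP <= $GV_ENDTIME)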
Task overview: Creating the initial load job and defining global variables [page 110]
Related Information
13.3 Replicating the initial load data flow
Use the replicate feature to copy the existing DF_CDC_Initial data flow and use the copy in the delta load
job.
After you replicate the data flow, change the name and adjust some of the settings.
A new data flow appears in the Data Flow list with "Copy_1" added to the data flow name.
3. Rename the copied data flow DF_CDC_Delta.
4. Double-click the DF_CDC_Delta data flow in the Local Object Library to open it in the workspace.
5. Double-click the CUST_CDC target table object in the workspace to open the Template Target Table Editor.
6. Open the Options tab in the lower pane.
7. Deselect Delete data from table before loading.
This step enables the job to update the target table with changed data while retaining the current data.
8. Close the Template Target Table Editor.
9. Save your work.
Related Information
13.4 Creating the Delta job and scripts for global variables
SAP Data Services uses the delta load job to update the target table with data that is new or changed since the
last time the job ran.
1. Create a new batch job in Class_Exercises and name the job JOB_CDC_Delta.
2. Select the JOB_CDC_Delta job name in the Project Area to highlight it.
The Variables and Parameters dialog box opens. The banner of the dialog box contains the job name
JOB_CDC_Delta.
4. Right-click Global Variables and choose Insert.
Even though you use the same global variable names as for the initial load job, Data Services doesn't consider
them as duplicates because you create them for different jobs.
Related Information
6. Add a second script from the tool palette to the right of the data flow object in the workspace and name it
UPDATE_CDC_TIME_TABLE.
7. Connect the objects in the workspace from left to right.
8. Save your work.
Task overview: Creating the Delta job and scripts for global variables [page 116]
Related Information
Define the $GV_STARTTIME and $GV_ENDTIME global variables in the delta load scripts.
When you define scripts, make sure that you follow the rules for your database management system.
Perform the following steps beginning in the WF_CDC_Delta work flow workspace in SAP Data Services
Designer:
Note
Note that in the script, “ODS” is the owner name. Use your owner name in your script.
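The script text isn't reproduced here. One possible form of the delta load SET_START_END_TIME script, assuming the start time is read back from the CDC_time job status table (the sql statement, datastore name, and conversion format are assumptions; check the full tutorial for the exact text):

# Start from the time stamp recorded by the last successful run.
$GV_STARTTIME = to_date(sql('Target_DS', 'SELECT LAST_TIME FROM ODS.CDC_TIME'), 'YYYY.MM.DD HH24:MI:SS');
# End time is the system date and time at job execution.
$GV_ENDTIME = sysdate();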
Fix any syntax errors and revalidate until there are no errors.
Note
Even if SAP Data Services doesn't find syntax errors, your DBMS can find syntax errors when you
execute the job.
d. Select the Back arrow icon in the toolbar to close the Script Editor.
2. Define the UPDATE_CDC_TIME_TABLE script:
a. Double-click the UPDATE_CDC_TIME_TABLE script in the workspace to open the Script Editor.
b. Define the script to replace the value in the LAST_TIME column in the CDC_Time job status table to the
system date that is defined in $GV_ENDTIME global variable.
Note
Note that in the script, “ODS” is the owner name. Use the actual owner name in your script.
Note
Even if SAP Data Services doesn't find syntax errors, your DBMS can find syntax errors when you
execute the job.
Correct any errors, and ignore any warnings for this exercise.
4. Save your work and close all workspace tabs.
Task overview: Creating the Delta job and scripts for global variables [page 116]
Related Information
To understand how change data capture (CDC) works, execute the JOB_CDC_Initial job, and then use your DBMS to
change the data in the ODS_CUSTOMER table before you execute the JOB_CDC_Delta job.
The JOB_CDC_Delta job extracts the changed data from the table and updates the target table with only the
changed data.
View the results to see the different time stamps and to verify that only the changed data was loaded to the
target table.
Executing the delta load job [page 121]
The delta-load job outputs the row that you added to the table after you ran the initial-load job.
Related Information
The initial load job outputs the source data to the target table, and updates the job status table with the job
execution date and time.
Use your DBMS to open the ODS_CUSTOMER table. Notice that there are 12 rows. The columns are the same as
the columns that appear in the Schema In pane in the QryCDC object.
To execute the initial load job, perform the following steps in SAP Data Services Designer:
1. Right-click the JOB_CDC_Initial job in the Project Area and select Execute.
2. Accept all of the default settings in the Execution Properties dialog box and select OK.
The Job Log opens in the workspace area. View the messages in the Trace page. If there’s an error, the Error
icon activates and processing stops. Select the Error icon and read the error messages. Even when the
script syntax validates in Designer, your database management system can still issue script errors when you execute the job.
3. After successful execution, select the Monitor icon ( ) and view the Row Count column. The job is
successful when the column contains 12, which indicates the job processed all 12 rows of the source table.
4. View the data in the CUST_CDC target table.
The CUST_CDC target table contains the rows from the ODS_CUSTOMER source table.
Related Information
13.5.2 Changing the source data
To see change data capture (CDC) in action, add a row to the ODS_CUSTOMER table and execute the delta load
job.
Note
If your database does not allow nulls for some fields, copy the data from another row.
Cust_ID ZZ01
Cust_Classf ZZ
Name1 EZ BI
Address NULL
City NULL
Region_ID NULL
ZIP ZZZZZ
Related Information
The delta-load job outputs the row that you added to the table after you ran the initial-load job.
Before you perform the following steps, ensure that you added an additional row to the ODS_Customer table in
your database management system.
1. Right-click the JOB_CDC_Delta job in the Project Area and select Execute.
2. Accept all of the default settings in the Execution Properties dialog box and select OK.
The Job Log opens in the workspace area. View the messages in the Trace page. If there’s an error, the Error
icon activates and processing stops. Select the Error icon and read the error messages. Even when the
script syntax validates in Designer, your database management system can still issue script errors when you execute the job.
3. After successful execution, select the Monitor icon ( ) and view the Row Count column.
The job is successful when the column contains 1, which indicates the job processed only the changed row
in the source table.
4. View the data in the CUST_CDC target table.
The row that you added to the table in your database management system appears in the CUST_CDC target
table along with the original content of the table.
Related Information
In the next segment, learn how to verify your source data and improve source data quality.
Related Information
14 Data assessment
Use data assessment features to identify problems in your data, separate out bad data, and audit data to
improve the quality and validity of your data.
Data Assessment provides features that enable you to trust the accuracy and quality of your source data.
In this segment, learn about the following methods to profile and audit data details:
• View table data and use the profile tools to view the default profile statistics.
• Use the Validation transform in a data flow to find records in your data that violate a data format
requirement in a specific column.
• Create an audit expression and an action for when a record fails the expression.
• Add an additional target table for records that fail an audit rule.
• View audit details in Operational Dashboard reports in the Data Services Management Console
The goal
In the previous exercise for change data capture, we instructed you to add the value ZZZZZ in the ZIP
column. For this exercise, we employ a business rule from a fictional company that requires that the target ZIP
column contain only numeric data.
The exercises in this section introduce the following Data Services features:
• Data profiling: Pulls specific data statistics about the quality of your source data.
• Validation transform: Applies your business rules to data and sends data that failed the rules to a separate
target table.
• Audit dataflow: Outputs invalid records to a separate table.
• Auditing tools: Tracks your jobs in the Data Services Management Console.
For more information about data assessment features, see the Designer Guide.
Audit objects [page 130]
Auditing provides a way to ensure that a data flow loads correct data into intended targets.
The Data Profiler executes on a profiler server to provide column and relationship information about your data.
The software reveals statistics for each column that you choose to evaluate. The following table describes the
default statistics.
Statistic Description
Distincts The total number of distinct values out of all records for the column.
Nulls The total number of NULL values out of all records in the column.
For more information about using the data profiler, see the Data Assessment section of the Designer Guide.
Related Information
Use features of the View Data dialog box to see profile statistics about source data that help you determine
data quality before processing.
1. Open the Datastores tab in the Local Object Library and expand ODS_DS > Tables.
2. Right-click the ODS_CUSTOMER table and select View Data.
3. Select the Profile Tab icon ( ), which is the second tab from the left.
The Profile Tab opens. The first column contains the column names in the table. Subsequent columns
contain profile information for each column. The following screen capture shows the Profile Tab for
ODS_CUSTOMER.
Notice the ZIP column contains “ZZZZZ” in the Max column.
4. After you examine the statistics, select the “X” in the upper right corner to close the View Data dialog.
Next, create a validation job that changes the invalid entry of “ZZZZZ” in the ZIP column to blank.
Related Information
The Validation transform qualifies a data set based on rules for input schema columns.
Use a Validation transform to define rules that sort good data from bad data. The Validation transform outputs
up to three values: Pass, Fail, and RuleViolation. Data outputs are based on the condition that you specify
in the transform.
For this exercise, we set up a Pass target table for the first job execution. Then we alter the first job by adding a
Fail target table with audit rules.
Parent topic: Data assessment [page 123]
Related Information
1. Add a new job to the Class_Exercises project and name the job JOB_CustGood.
2. Open JOB_CustGood in the workspace.
3. Add a Data Flow object to the JOB_CustGood workspace from the tool palette and name the data flow
DF_CustGood.
4. Open the DF_CustGood data flow in the workspace.
5. Add the ODS_CUSTOMER table to the DF_CustGood data flow workspace and select Make Source from the
popup menu.
Find ODS_CUSTOMER in the Datastores tab of the Local Object Library under ODS_DS.
6. Add a Validation transform icon to the DF_CustGood workspace.
Find the Validation transform in the Transform tab of the Local Object Library under the Platform node.
7. Add a Template Table icon to the DF_CustGood workspace.
Find the Template Table icon in the Datastores tab of the Local Object Library under Target_DS.
The Pass option requires that SAP Data Services passes all rows to the target table, even rows that fail the
validation rules. The following screen capture shows the data flow:
12. Save your work.
Related Information
Create a rule in the Validation transform that marks records that have a noncompliant value in the ZIP column
and substitutes <Blank> for the noncompliant value.
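The exercise builds the rule through the Rule Editor options rather than by typing an expression. As a hedged illustration of what the rule checks, an equivalent custom validation condition could use the built-in match_pattern function, where each 9 matches a single digit:

match_pattern(ODS_CUSTOMER.ZIP, '99999') = 1

Rows for which this condition is false fail the 5_Digit_ZIP_Column_Rule.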
To configure the Validation transform, open the DF_CustGood data flow workspace and perform the following
steps:
Option Instruction
Note
A small check mark appears next to the specified validation column in the Schema In pane of the Transform Editor.
Send to Pass passes the row to the target table even when
it fails the 5_Digit_ZIP_Column_Rule rule.
Bindings
The following screenshot shows the completed Rule Editor dialog box:
4. Select OK.
The Rule Editor dialog box closes. The new rule appears in the Validation Rules tab under Rules.
5. Select the Enabled checkbox under If any rule fails and Send to Pass, substitute with.
6. Double-click the cell next to the checked Enabled cell, under Column.
7. Select ODS_Customer.ZIP from the list.
8. Enter the following under Expression: '' (two single quotes with no space between).
The two single quotes substitute <Blank> for the ZIP values that don't pass the 5-digit string rule.
9. Select the Validate Current icon in the upper toolbar.
10. Fix any validation errors, if necessary.
11. Select the Back arrow icon in the upper toolbar to close the Transform Editor.
12. Right-click JOB_CustGood in the Project Area and select Execute.
After a successful execution, view the data in the CUST_GOOD target table to verify that the rule worked as you
intended. The row with the CUST_ID value of ZZ01, that contained “ZZZZZ” for the ZIP column, now contains
<Blank> in the ZIP column.
Related Information
Auditing provides a way to ensure that a data flow loads correct data into intended targets.
Collect audit statistics on data that flows out of any object in SAP Data Services, such as a source, transform,
or target.
In the next exercise, we set up the validation job from the last exercise to output records to two target tables:
• Fail target table: Contains records that don't pass the validation rule.
• Pass target table: Contains records that pass the validation rule.
The following table describes the various audit objects involved in creating an audit data flow.
Setting Description
Audit function Collects statistics for the audit points. For this exercise, we set up a Count audit function on the source and pass target tables. The Count audit function collects two statistics: the good row count and the error row count for each audit point.
Audit label Unique name in the data flow that is generated for the audit
statistics for each defined audit function.
Setting Description
Audit rule Boolean expression that uses audit labels to verify the job.
Audit action on failure Action the job takes when there’s a failure.
For a complete list of audit objects and descriptions, see the Data Assessment section of the Designer Guide.
Related Information
To configure auditing in the DF_CustGood data flow, add a second target table to the Validation transform.
Send to Fail causes Data Services to send the rows that fail the 5_Digit_ZIP_Column_Rule to a fail
target table.
6. Select the Back arrow in the upper toolbar to close the Transform Editor.
7. Add a Template Table icon to the DF_CustGood data flow as a second target object.
8. Enter Cust_Bad_Format in Template name in the Create Template dialog box and select OK.
9. Draw a connection from the Validation transform to the Cust_Bad_Format target table and select the Fail
option from the popup menu.
The following screen capture shows an example of the finished data flow.
Related Information
Create an audit function in the DF_CustGood data flow to direct failed records to the applicable target table.
The following steps set up a rule expression for each of the target tables in the data flow.
1. Open the DF_CustGood data flow in the workspace and select the Audit icon ( ) in the upper toolbar.
4. Open the Rule tab.
5. Select Add in the upper right.
6. Select the following values from each of the three lists in the center of the Rule tab:
• $Count_ODS_CUSTOMER
• =
• $CountError_ODS_CUSTOMER
7. Select Add.
Data Services adds the first auditing rule and opens a new line to add a second auditing rule.
8. Select the following values from each of the three lists in the center of the Rule tab:
• $Count_CUST_GOOD
• =
• $CountError_CUST_GOOD
9. In the Action on failure group at the right of the pane, deselect Raise exception.
Deselecting Raise exception prevents the job from stopping when an exception occurs.
10. Select Close to close the Audit dialog box.
The following screen capture shows the completed data flow. Notice that Data Services indicates the audit
points with the Audit icon on the right side of the ODS_Customer source table and Cust_Good target table.
The audit points are where Data Services collects audit statistics.
11. Select the Validate All icon in the upper toolbar to verify that there are no errors.
12. Save your work.
13. Right-click the Job_CustGood job and select Execute.
Related Information
14.5 Viewing audit details in Operational Dashboard reports
View audit details, such as an audit rule summary and audit labels and values, in the SAP Data Services
Management Console.
The Management Console is browser-based. Therefore, Data Services opens your browser and presents
the Management Console login screen.
2. Log into the Management Console using your access credentials.
The Dashboard opens with statistics and data. For more information about the Dashboard, see the
Management Console Guide.
4. Select one of the JOB_CustGood jobs listed in the table at right.
There are two JOB_CustGood jobs because you executed it in this exercise and the last exercise.
Note
If the job doesn't appear in the table, adjust the Time Period dropdown list to a longer or shorter time
period as applicable.
The Job Execution Details pane opens showing the job execution history of the JOB_CustGood job.
5. Select the JOB_CustGood in the Job Execution History table.
The Job Details table opens. The Contains Audit Data column contains YES.
6. Select DF_CustGood in the Data Flow Name column.
Three graphs appear at right: Buffer Used, Row Processed, and CPU Used. Read about these graphs in the
Management Console Guide.
7. Select View Audit Data located just above the View Audit Data table.
The Audit Details dialog box opens. The following screen shot shows an example of the Audit Details dialog
box.
The following table explains the Audit Details pane.
Audit Rule Failed The violated audit rule from the job execution.
The Audit Details table lists the number counts for each Audit Label:
• $Count_ODS_CUSTOMER = 13
• $Count_CUST_GOOD = 12
The validation rule requires that all records comply with the 5_Digit_ZIP_Column_Rule. One record
failed the rule. That was the record that you manually added to the data table. It contained a ZIP value of
“ZZZZZ”. The audit rules that you created require that the row counts are equal. However, because one row
failed the validation rule, the counts are not equal.
Related Information
After the job executes, open the fail target table to view the failed record.
Open the data flow in the workspace and click the magnifying glass icon in the lower right corner of the
CUST_BAD_FORMAT target table. The CUST_BAD_FORMAT target table contains one record. In addition to the
fields selected for output, the software added and populated three additional fields for error information:
• DI_ERRORACTION = F
• DI_ERRORCOLUMNS = Validation failed rule(s): ZIP
• DI_ROWID = 1.000000
These are the rule violation output fields that are automatically included in the Validation transform. For
complete information about the Validation transform, see the Reference Guide.
Parent topic: Viewing audit details in Operational Dashboard reports [page 134]
The next segment shows you how to design jobs that are recoverable if the job malfunctions, crashes, or
doesn’t complete.
Related Information
15 Recovery mechanisms
Use SAP Data Services recovery mechanisms to set up automatic recovery or to recover jobs manually that
don’t complete successfully.
A recoverable work flow is one that can run repeatedly after failure without loading duplicate data. Examples of
failure include source or target server crashes or target database errors that cause a job or work flow to
terminate prematurely.
In this segment, learn about the job recovery mechanisms that you can use to recover jobs that fail after running only partially.
The goal
Create a recoverable job that loads the sales organization dimension table that you loaded in the exercise
Populate a table from a flat file [page 41]. Reuse the data flow DF_SalesOrg from that exercise to complete this
segment.
For more information about recovery methods, see the Designer Guide and the Reference Guide.
Executing the job [page 145]
Execute the job to see how the software functions with the recovery mechanism.
Create a job that contains three objects that are configured so that the job is recoverable.
The recoverable job that you create in this section contains the following objects:
• A script named GetWFStatus that determines whether the previous run completed successfully
• A conditional named recovery_needed that specifies which data flow to execute
• A script named UpdateWFStatus that updates the status table after a successful run
Related Information
Local variables contain information that you can use in a script to determine when a job must be recovered.
In previous exercises you defined global variables. Local variables differ from global variables. Use local
variables in a script or expression that is defined in the job or work flow that calls the script.
1. Open the Class_Exercises project in the Project Area and add a new job named JOB_Recovery.
A new variable appears named $NewVariableX where X indicates the new variable number.
4. Double-click $NewVariableX and enter $recovery_needed for Name.
5. Select int from the Data type dropdown list.
6. Follow the same steps to create another local variable.
7. Name the variable $end_time and select varchar(20) from the Data type dropdown list.
Related Information
15.3 Creating the script that determines the status
Create a script that checks the $end_time variable to determine if the job completed properly.
The script reads the ending time in the status_table table that corresponds to the most recent start time. If
there is no ending time for the most recent starting time, the software determines that the prior data flow must
not have completed properly.
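The exercise assumes that a table named status_table already exists in the target database with a start time and an end time for each run. The following is a minimal sketch of such a table; the column names are inferred from the script logic, so adapt the names and data types to your DBMS:
Sample Code
CREATE TABLE status_table (
    start_time datetime NOT NULL,  -- written when the work flow starts
    end_time   datetime NULL       -- written only after a successful run
);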
1. With JOB_Recovery opened in the workspace, add a script to the left side of the workspace and name it
GetWFStatus.
2. Open the script in the workspace and type the script directly into the Script Editor. Make sure that the
script complies with syntax rules for your DBMS.
For Microsoft SQL Server or SAP ASE, enter a script that reads the most recent end_time from the status_table and sets the $recovery_needed flag accordingly.
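The following is a minimal sketch of such a script, assuming the status_table described earlier and a datastore named Target_DS; adjust the embedded SQL and the date conversion to your DBMS:
Sample Code
# Read the end_time that belongs to the most recent start_time
$end_time = sql('Target_DS', 'SELECT CONVERT(VARCHAR(20), end_time, 120) FROM status_table WHERE start_time = (SELECT MAX(start_time) FROM status_table)');
# If the last run never wrote an end_time, the job must be recovered
$recovery_needed = ifthenelse(($end_time IS NULL) or ($end_time = ''), 1, 0);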
Related Information
15.4 Conditionals
Conditionals are single-use objects, which means that you can use them only in the job for which they were created.
Define a conditional for this exercise to specify a recoverable data flow. To define a conditional, you specify a
condition and two logical branches:
Then Work flow elements to execute when the “If” expression evaluates to TRUE.
Else (Optional) Work flow elements to execute when the “If” expression evaluates to FALSE.
Related Information
15.4.1 Adding the conditional
2. Click the conditional icon on the tool palette, then click in the workspace to the right of the GetWFStatus script.
3. Name the conditional recovery_needed.
4. Double-click the conditional in the workspace to open the Conditional Editor.
5. Enter the following expression in the If field:
($recovery_needed = 1)
Complete the conditional by specifying the work flows to execute for the If and Then conditions.
Related Information
Complete the conditional by specifying the data flows to use if the conditional equals true or false.
Follow these steps with the recovery_needed conditional open in the workspace:
1. Open the Data Flow tab in the Local Object Library and move DF_SalesOrg to the Else portion of the
Conditional Editor using drag and drop.
You use this data flow for the “false” branch of the conditional.
2. Right-click DF_SalesOrg in the Data Flow tab in the Local Object Library and select Replicate.
3. Name the replicated data flow ACDF_SalesOrg.
4. Move ACDF_SalesOrg to the Then area of the conditional using drag and drop.
6. Double-click the SALESORG_DIM target table to open it in the workspace.
7. Open the Options tab in the lower pane of the Target Table Editor.
8. Find the Update control category in the Advanced section and set Auto correct load to Yes.
Auto correct loading ensures that the same row is not duplicated in a target table by matching primary key
fields. See the Reference Guide for more information about how auto correct load works.
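Conceptually, auto correct load behaves like an upsert keyed on the primary key. The following SQL only illustrates that behavior and is not what Data Services literally generates; the table and column names for the sales organization dimension are assumed here:
Sample Code
MERGE INTO salesorg_dim AS tgt
USING incoming_rows AS src
    ON tgt.SalesOffice = src.SalesOffice          -- match on the primary key
WHEN MATCHED THEN
    UPDATE SET Region = src.Region,               -- overwrite non-key columns
               DateOpen = src.DateOpen
WHEN NOT MATCHED THEN
    INSERT (SalesOffice, Region, DateOpen)
    VALUES (src.SalesOffice, src.Region, src.DateOpen);
Because matching rows are updated rather than inserted again, rerunning the data flow does not create duplicate rows in the target table.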
Related Information
This script updates the status_table table with the current timestamp after the work flow in the conditional has
completed. The timestamp indicates a successful execution.
1. With JOB_Recovery opened in the workspace, add the script icon to the right of the recovery_needed
conditional.
2. Name the script UpdateWFStatus.
3. Double-click UpdateWFStatus to open the Script Editor in the workspace.
4. Enter text using the syntax for your RDBMS.
For Microsoft SQL Server and SAP ASE, enter text that sets the end_time for the most recent start_time in the status_table.
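The following is a minimal sketch of such a script, assuming the status_table described earlier and the Target_DS datastore; adjust the date function to your DBMS:
Sample Code
# Mark the most recent run as successfully completed
sql('Target_DS', 'UPDATE status_table SET end_time = GETDATE() WHERE start_time = (SELECT MAX(start_time) FROM status_table)');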
Connect the GetWFStatus script to the recovery_needed conditional, and then connect the recovery_needed conditional to the UpdateWFStatus script.
8. Save your work.
Related Information
Make sure that job configuration for JOB_Recovery is complete by verifying that the objects are ready.
Objects in JOB_Recovery
Object Purpose
GetWFStatus script Determines whether recovery is needed by reading the end_time for the most recent run in the status table.
recovery_needed Conditional Specifies the work flow to execute when the “If” statement is
true or false.
UpdateWFStatus script Updates the status table with the current timestamp after
the work flow in the conditional has completed. The time
stamp indicates a successful execution.
Object Purpose
DF_SalesOrg data flow The data flow to execute when the conditional equals false.
ACDF_SalesOrg data flow The data flow to execute when the conditional equals true.
Parent topic: Recovery mechanisms [page 137]
Related Information
Execute the job to see how the software functions with the recovery mechanism.
Edit the status table status_table in your DBMS and make sure that the end_time column is NULL or blank.
1. Execute JOB_Recovery.
2. View the Trace messages and the Monitor data to see that the conditional chose ACDF_SalesOrg to
process.
ACDF_SalesOrg is the data flow that runs when the condition is true. The condition is true because there
was no date in the end_time column in the status table. The software concludes that the previous job did
not complete and needs recovery.
3. Now execute the JOB_Recovery again.
4. View the Trace messages and the Monitor data to see that the conditional chose DF_SalesOrg to process.
DF_SalesOrg is the data flow that runs when the condition is false. The condition is false for this run
because the end_time column in the status_table contained the date and time of the last execution of the
job. The software concludes that the previous job completed successfully, and that it does not require
recovery.
Related Information
Conditionals [page 141]
Creating the script that updates the status [page 143]
Verify the job setup [page 144]
Data Services automated recovery properties [page 146]
What's next [page 147]
Data Services provides automated recovery methods to use as an alternative to the job setup for
JOB_Recovery.
With automatic recovery, Data Services records the result of each successfully completed step in a job. If a job
fails, you can choose to run the job again in recovery mode. During recovery mode, the software retrieves the
results for successfully completed steps and reruns incomplete or failed steps under the same conditions as
the original job.
Data Services has the following automatic recovery settings that you can use to recover jobs:
• Select Enable recovery and Recover from last failed execution in the job Execution Properties dialog.
• Select Recover as a unit in the work flow Properties dialog.
For more information about how to use the automated recovery properties in Data Services, see the Designer
Guide.
Related Information
15.9 What's next
The remaining segments in the tutorial provide information about some of the advanced features in SAP Data
Services.
The next three segments are optional. They contain exercises that help you learn about working in a multiuser
environment, working with SAP application data, and about running real-time jobs.
Related Information
16 Multiuser development
SAP Data Services enables teams of developers working on separate local repositories to store and share their
work in a central repository.
Each individual developer or team works on the application in their unique local repository. Each team uses a
central repository to store the master copy of its application. The central repository preserves all versions of all
objects in the application so you can revert to a previous version if necessary.
You can implement optional security features for central repositories. For more information about
implementing Central Repository security, see the Designer Guide.
In this segment, you'll learn how to perform the following multiuser development tasks:
The goal
We base the exercises for multiuser development on the following use case:
Example
Two developers use a Data Services job to collect data for the HR department. Each developer has their
own local repository and they share a central repository. Throughout the exercises, the developers modify
the objects in the job and use the central repository to store and manage the modified versions of the
objects.
Perform the exercises by acting as both developers, or work with another person with each of you assuming
one of the developer roles.
How multiuser development works [page 151]
Data Services uses a central repository as a storage location and a version control tool for all objects
uploaded from local repositories.
The central object library provides access to reusable objects in a central repository, which you use in a multiuser environment to check objects out to your local repository.
The central object library is a source control mechanism for objects in a central repository. It tracks the check-out and check-in status of all objects that multiple users access. The central object library is a dockable and movable pane just like the project area and local object library.
Through the central object library, authorized users access the central repository. The central repository
contains versions of objects saved by other users from their local repositories. The central object library
enables administrators to control who can add, view, and modify the objects stored in the central repository.
Example
Check out an object from the central repository to your local repository. Edit and save the object, then
check the object back into the central repository. Data Services adds the edited object to the central
repository as a new version, and also maintains the original version. When you check an object out of the
central repository, no other user can work on that object until you check the object back into the central
repository.
Users must belong to a user group that has permission to perform tasks in the central repository.
Administrators assign permissions to an entire group of users as well as assign various levels of permissions to
the users in a group.
Related Information
What's next [page 176]
Multi-user development
The central object library pane contains controls for working with objects in the central repository as well as
version information.
The top of the central object library pane displays a Group Permission box with the current user permissions,
and the name of the central repository. There are icons located at the top of the pane for performing the
following tasks:
The central object library contains the same tabs as the local object library for accessing the existing objects
from the central repository. For example, open the Datastores tab and the central object library lists all of the
datastores saved to the central repository.
The following table describes the content for the additional columns in the central object library pane.
Check out user The name of the user who currently has the object checked
out of the library. Blank when the object is not checked out.
Check out repository The name of the local repository that contains the checked-out object. Blank when the object is not checked out.
Permission The authorization type for the group that appears in the
Group Permission box at the top of the pane. When you add a
new object to the central object library, the current group
gets FULL permission to the object and all other groups get
READ permission.
Latest version A version number and a timestamp that indicate when the
software saved this version of the object.
Related Information
Data Services uses a central repository as a storage location and a version control tool for all objects uploaded
from local repositories.
The central repository retains a history for all objects stored there. Developers use their local repositories to
create, modify, or execute objects such as jobs.
• Get objects
• Add objects
• Check out objects
• Check in objects
Task Description
Get objects Copy objects from the central repository to your local repository. If the object already exists in your local repository, the file from the central repository overwrites the object in your local repository.
Check out objects The software locks the object when you check it out from the central repository. No one else can work on the object when you have it checked out. Other users can copy a locked object and put it into their local repository, but it is only a copy. Any changes that they make cannot be uploaded to the central repository.
Task Description
Check in objects When you check the object back into the central repository,
Data Services creates a new version of the object and saves
the previous version. Other users can check out the object
after you check it in. Other users can also view the object
history to view changes that you made to the object.
Add objects Add objects from your local repository to the central repository any time, as long as the object does not already exist in the central repository.
The central repository works like file collaboration and version control software. The central repository retains a
history for each object. The object history lists all versions of the object. Revert to a previous version of the
object if you want to undo your changes. Before you revert an object to a previous version, make sure that you
are not mistakenly undoing changes from other users.
Related Information
16.4 Preparation
Your system administrator sets up the multiuser environment to include two repositories and a central
repository.
Create three repositories using the user names and passwords listed in the following table.
User name Password
central central
user1 user1
user2 user2
Example
For example, with Oracle use the same database for the additional repositories. However, first add the users
listed in the table to the existing database. Make sure that you assign the appropriate access rights for each
user. When you create the additional repositories, Data Services qualifies the names of the repository
tables with these user names.
Example
For Microsoft SQL Server, create a new database for each of the repositories listed in the table. When you
create the user names and passwords, ensure that you specify appropriate server and database roles to
each database.
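A rough T-SQL sketch of that setup for one of the repositories follows; the names come from the table above, and the db_owner role assignment is only an example, so follow your own security policy when you assign server and database roles:
Sample Code
CREATE DATABASE user1;
GO
CREATE LOGIN user1 WITH PASSWORD = 'user1';
GO
USE user1;
CREATE USER user1 FOR LOGIN user1;       -- map the login to the repository database
ALTER ROLE db_owner ADD MEMBER user1;    -- example role; adjust to your policy
Repeat the same pattern for the central and user2 repositories.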
Consult the Designer Guide and the Management Console Guide for additional details about multiuser
environments.
Related Information
Follow these steps to configure a central repository. If you created a central repository during installation, use
that central repository for the exercises.
2. From your Windows Start menu, click Programs > SAP Data Services 4.2 > Data Services Repository Manager.
Data Services creates repository tables in the database that you identified.
8. Click Close.
Configure the two local repositories using the Data Services Repository Manager.
Repeat these steps to configure the user1 repository and the user2 repository.
2. From the Start menu, click Programs > SAP Data Services 4.2 > Data Services Repository Manager.
3. Enter the database connection information for the local repository.
4. Type the following user name and password based on which repository you are creating:
Repository User name Password
1 user1 user1
2 user2 user2
16.4.3 Associating repositories to your job server
You assign a Job Server to each repository to enable job execution in Data Services.
1. From the Start menu, click Programs > SAP Data Services 4.2 > Data Services Server Manager.
2. Click Configuration Editor in the Job Server tab.
The Job Server Properties dialog box opens. A list of current associated repositories appears in the
Associated Repositories list, if applicable.
4. Click Add under the Associated Repositories list.
The Repository Information options become active on the right side of the dialog box.
5. Select the appropriate database type for your local repository from the Database type dropdown list.
6. Complete the appropriate connection information for your database type as applicable.
7. Type user1 in both the User name and Password fields.
8. Click Apply.
The software resyncs the job server with the repositories that you just set up.
Assign the central repository named central to user1 and user2 repositories.
1. Start the Designer, enter your log in credentials, and click Log on.
2. Select the repository user1 and click OK.
3. Enter the password for user1.
4. Select Tools > Central Repositories.
If a prompt appears asking to overwrite the Job Server option parameters, select Yes.
10. Exit Designer.
11. Perform the same steps to connect user2 to the central repository.
Related Information
As you perform the tasks in this section, Data Services adds all objects to your local repositories.
Adding objects to the central repository [page 159]
After you import objects to the user1 local repository, add the objects to the central repository for
storage.
Related Information
16.5.1 Activating a connection to the central repository
Activate the central repository for the user1 and user2 local repositories so that the local repository has central
repository connection information.
The Central Repository Connections option is selected by default in the Designer list.
2. In the Central repository connections list, select Central and click Activate.
Data Services activates a link between the user1 repository and the central repository.
3. Select the option Activate automatically.
This option enables you to move back and forth between user1 and user2 local repositories without
reactivating the connection to the central repository each time.
4. Open the Central Object Library by clicking the Central Object Library icon on the Designer toolbar.
For the rest of the exercises in this section, we assume that you have the Central Object Library available in the
Designer.
Related Information
Before you can import objects into the local repository, complete the tasks in the section Preparation [page
152].
2. In the Local Object Library, right-click in a blank space and click Repository > Import From File.
3. Select multiusertutorial.atl located in <LINK_DIR>\Tutorial Files and click Open.
A prompt opens explaining that the chosen ATL file is from an earlier release of Data Services. The older ATL version does not affect the tutorial exercises. Therefore, click Yes.
Another prompt appears asking if you want to overwrite existing data. Click Yes.
The Import Plan window opens.
4. Click Import.
5. Enter dstutorial for the passphrase and click Import.
The multiusertutorial.atl file contains a batch job with previously created work flows and data flows.
6. Open the Project tab in the Local Object Library and double-click MU to open the project in the Project
Area.
The MU project contains the following objects:
• JOB_Employee
• WF_EmpPos
• DF_EmpDept
• DF_EmpLoc
• WF_PosHireDate
• DF_PosHireDate
After you import objects to the user1 local repository, add the objects to the central repository for storage.
When you add objects to the central repository, add a single object or the object and its dependents. All
projects and objects in the object library can be stored in a central repository.
After importing objects into the user1 local repository, you can add them to the central repository for storage.
Follow these steps to add a single object from the user1 repository to the central repository:
Note
Make sure that you verify that you are using the correct library by reading the header information.
4. Optional. Add any comments about the object.
5. Click Continue.
A status Options dialog box opens to indicate that Data Services added the object successfully.
Note
If the object already exists in the central repository, the Add to Central Repository option is not active.
6. Open the Central Object Library and open the Formats tab.
Expand Flat Files to see the NameDate_Format file is now in the central repository.
Related Information
Select to add an object and object dependents from the local repository to the central repository.
Log in to Data Services Designer, select the user1 repository, and enter user1 for the repository password.
• DF_EmpDept
• DF_EmpLoc
3. Right-click WF_EmpPos in the Local Object Library and select Add to Central Repository > Object and dependents.
Instead of choosing the right-click options, you can move objects from your local repository to the central repository using drag and drop. The Version Control Confirmation dialog box opens. Click Next and then click Next again so that all dependent objects are included in the addition.
The comment appears for the object and all dependents when you view the history in the central
repository.
6. Click Continue.
The Output dialog box displays with a message that states “Add object completed”. Close the dialog box.
7. Verify that the Central Object Library contains the WF_EmpPos, DF_EmpDept, and DF_EmpLoc objects in
their respective tabs.
When you include the dependents of the WF_EmpPos, you add other dependent objects, including dependents
of the two data flows DF_EmpDept and DF_EmpLoc.
• Open the Datastores tab in the Central Object Library to see the NAMEDEPT and POSLOC tables.
• Open the Formats tab in the Central Object Library to see the PosDept_Format, NamePos_Format, and NameLoc_Format flat file objects.
Related Information
Add an object and dependents that has dependent objects that were already added to the central repository
through a different object.
This topic continues from Adding an object and dependents to the central repository [page 160]. We assume
that you are still logged in to the user1 repository in Designer.
2. Right-click WF_PosHireDate and select Add to Central Repository > Objects and dependents.
The Add to Central Repository Alert dialog box appears listing the objects that already exist in the central
repository:
• DW_DS
• NameDate_Format
• NamePos_Format
• POSHDATE(DW_DS.USER1)
3. Click Yes to continue.
It is okay to continue with the process because you haven't changed the existing objects yet.
4. Enter a comment and select Apply comments to all objects.
5. Click Continue.
6. Close the Output dialog box.
The central repository now contains all objects in the user1 local repository. Developers who have access to the
central repository can check out, check in, label, and get those objects.
Related Information
When you check out an object from the central repository, it becomes unavailable for other users to change.
You can check out a single object or check out an object with dependents.
• If you check out a single object such as WF_EmpPos, no other user can change it. However, the dependent object DF_EmpDept remains in the central repository, and other users can check it out.
• If you check out WF_EmpPos and the dependent DF_EmpDept, no one else can check out those objects.
Change the objects and save your changes locally, and then check the objects with your changes back into
the central repository. The repository creates a new version of the objects that include your changes.
After you make your changes and check the changed objects back into the central repository, other users can
view your changes, and check out the objects to make additional changes.
Check out an object and dependent objects from the central repository using menu options or icon tools.
Perform the following steps while you are logged in to the user1 repository.
1. Open the Central Object Library and open the Work Flows tab.
A warning appears telling you that checking out WF_EmpPos does not include the datastores. To include the
datastores in the checkout, use the Check Out with Filtering check out option.
Note
The software does not include the datastore DW_DS in the checkout as the message states. However,
the tables NAMEDEPT and POSLOC, which are listed under the Tables node of DW_DS, are included in the
dependent objects that are checked out.
Alternatively, you can select the object in the Central Object Library and then click the Check out object and dependents icon on the Central Object Library toolbar.
Data Services copies the most recent version of WF_EmpPos and its dependent objects from the central repository into the user1 local repository. A red check mark appears on the icon for objects that are checked out in both the local and central repositories.
User1 can modify the WF_EmpPos work flow and the checked out dependents in the local repository while it is
checked out of the central repository.
Task overview: Check out objects from the central repository [page 162]
Related Information
Task overview: Check out objects from the central repository [page 162]
Related Information
You can check in an object by itself or check it in along with all associated dependent objects. When an object
and its dependents are checked out and you check in the single object without its dependents, the dependent
objects remain checked out.
After you change an existing object, check it into the central repository so that other users can access it.
Data Services copies the object from the user1 local repository to the central repository and removes the
check-out marks.
6. In the Central Object Library window, right-click DF_EmpLoc and click Show History.
The History dialog box contains the user name, date, action, and version number for each time the file was
checked out and checked back in. The dialog box also lists the comments that the user included when they
checked the object into the central repository. This information is helpful for many reasons, including:
• Providing information to the next developer who checks out the object.
• Helping you decide what version to choose when you want to roll back to an older version.
• Viewing the difference between versions.
For more information about viewing history, see the Designer Guide.
7. After you have reviewed the history, click Close.
Task overview: Checking in objects to the central repository [page 164]
Related Information
Set up the environment for user2 so that you can perform the remaining tasks in Multiuser development.
Log into SAP Data Services Designer and choose the user2 repository. Enter user2 for the password.
Set up the user2 developer environment in the same way that you set up the environment for user1. The
following is a summary of the steps:
Related Information
Undo a checkout to restore the object in the central repository to the condition in which it was when you
checked it out.
In this exercise, you check out DF_PosHireDate from the central repository, modify it, and save your changes
to your local repository. Then you undo the checkout of DF_PosHireDate from the central repository.
When you undo a checkout, you restore the object in the central repository to the way it was when you checked
it out. SAP Data Services does not save changes or create a new version in the central repository. Your local
repository, however, retains the changes that you made. To undo changes in your local repository, “get” the
object from the central repository after you undo the checkout. The software overwrites your local copy and
replaces it with the restored copy of the object in the central repository.
Undo checkout works for both a single object as well as objects with dependents.
Undo an object checkout when you do not want to save your changes, and you want to revert the object
back to the original content when you checked it out.
Related Information
Check out the DF_PosHireDate and modify the output mapping in the query.
The DF_PosHireDate object appears with a red checkmark in both the Local Object Library and the
Central Object Library indicating that it is checked out.
4. In the local object library, double-click DF_PosHireDate to open it in the workspace.
5. Double-click the query in the data flow to open the Query Editor.
6. In the Schema Out pane, right-click LName and click Cut.
Related Information
Undo an object checkout when you do not want to save your changes, and you want to revert the object back to
the original content when you checked it out.
1. Open the Data Flow tab in the Central Object Library and expand Data Flows.
2. Right-click DF_PosHireDate and click Undo Check Out > Object.
Data Services removes the check-out symbol from DF_PosHireDate in the Local and Central Object Library,
without saving your changes in the central repository. The object in your local repository still has the output
mapping change.
Related Information
Compare two objects, one from the local repository, and the same object from the central repository to view
the differences between the objects.
Make sure that you have followed all of the steps in the Undo checkout section.
1. Expand the Data Flow tab in the Local Object Library and expand Data Flows.
The Difference Viewer opens in the workspace. It shows the local repository contents for DF_PosHireDate
on the left and the central repository contents for DF_PosHireDate on the right.
3. Examine the data in the Difference Viewer.
The Difference Viewer helps you find the differences between the local object and the object in the central
repository.
Expand the Query node and then expand the Query table icon. The Difference Viewer indicates that the
LName column was removed in the local repository on the left, but it was added back in the central
repository. The text is in green, and the green icon appears signifying that there was an insertion.
The Difference Viewer shows the difference between an object in the local repository and the central repository.
In the following screen capture, the Difference Viewer shows the differences between the DF_PosHireDate
objects in the left and right panes. Notice the following areas of the dialog box:
• Each line represents an object or item in the object.
• The red bars on the right indicate where data is different. Click a red bar on the right and the viewer
highlights the line that contains the difference.
• The changed lines contain a colored status icon on the object icon that shows the status: Deleted,
changed, inserted, or consolidated. There is a key at the bottom of the Difference Viewer that lists the
status that corresponds to each colored status icon.
The Difference Viewer contains a status line at the bottom of the dialog box as shown in the image below. The
status line indicates the number of differences. If there are no differences, the status line indicates Difference [ ]
of 0. To the left of the status line is a key to the colored status icons.
Deleted The item does not appear in the object in the right pane.
Changed The differences between the items are highlighted in blue (the default) text.
Inserted The item has been added to the object in the right pane.
Consolidated The items within the line have differences. Expand the item by clicking its plus sign to view the differences.
16.5.9 Check out object without replacement
Check out an object from the central repository so that SAP Data Services does not overwrite your local copy.
Example
For example, you may need to use the checkout without replacement option when you change an object in
your local repository before you check it out from the central repository.
The option prevents Data Services from overwriting the changes that you made in your local copy.
After you have checked out the object from the central repository, the object in both the central and local repositories has a red check-out icon. But the local copy is not replaced with the version in the central repository.
You can then check your local version into the central repository so that it is updated with your changes.
Do not use the check out without replacement option if another user checked out the file from the central repository, made changes, and then checked in the changes.
Example
For example, you make changes to your local copy of Object-A without realizing you are working in your
local copy.
Meanwhile, another developer checks out Object-A from the central repository, makes extensive changes
and checks it back in to the central repository.
You finally remember to check out Object-A from the central repository. Instead of checking the object
history, you assume that you were the last developer to work in the master of Object-A, so you check
Object-A out of the central repository using the without replacement option. When you check your local
version of Object-A into the central repository, all changes that the other developer made are overwritten.
Caution
Before you use the Object without replacement option in a multiuser environment, check the history of the
object in the central repository. Make sure that you are the last person who worked on the object.
In the next exercise, user2 uses the check out option Object without replacement to be able to update the
master version in the central repository with changes from the version in the local repository.
16.5.9.1 Checking out an object without replacement
Use the checkout option without replacement to check out an object from the central repository without
overwriting the local copy that has changed.
1. Open the Data Flow tab in the Local Object Library and expand Data Flows.
2. Double-click DF_EmpLoc to open it in the workspace.
3. Double-click the query in the workspace to open the Query Editor.
4. Right-click FName in the Schema Out pane and click Cut.
5. Save your work.
6. Open the Data Flow tab of the Central Object Library and expand Data Flows.
The software marks the DF_EmpLoc object in the Central Object Library and the Local Object Library as
checked out. The software does not overwrite the object in the Local Object Library, but preserves the
object as is.
Related Information
Check in the local version of DF_EmpLoc to update the central repository version to include your changes.
These steps continue from the topic Checking out an object without replacement [page 170].
1. In the Central Object Library, right-click DF_EmpLoc and select Check in > Object.
2. Type a comment in the Comment dialog box and click Continue.
Now the central repository contains a third version of DF_EmpLoc. This version is the same as the copy of
DF_EmpLoc in the user2 local object library.
3. Right-click DF_EmpLoc in your Local Object Library and select Compare > Object to central.
The Difference Viewer should show the two objects as the same.
Task overview: Check out object without replacement [page 169]
Related Information
Related Information
16.5.10 Get objects
When you get an object from the central repository, you are making a copy of a specific version for your local
repository.
You might want to copy a specific version of an object from the central repository into your local repository.
Getting objects allows you to select a version other than the most recent version to copy. When you get an
object, you replace the version in your local repository with the version that you copied from the central
repository. The object is not checked out of the central repository, and it is still available for others to lock and
check out.
Perform the following steps in Designer. You can use either the user1 or user2 repository.
7. Right-click DF_EmpLoc and select Get Latest Version > Object from the dropdown menu.
Data Services copies the most recent version of the data flow from the central repository to the local
repository.
8. Open the DF_EmpLoc data flow in the Local Object Library.
9. Open the query to open the Query Editor.
10. Notice that there are now three columns in the Schema Out pane: LName, Pos, and Loc.
The latest version of DF_EmpLoc from the central repository overwrites the previous copy in the local
repository.
11. Click the Back arrow in the icon menu bar to return to the data flow.
Related Information
16.5.10.2 Getting a previous version of an object
Obtain a copy of a select previous version of an object from the central repository.
Perform the following steps in Designer. You can use either the user1 or user2 repository.
When you get a previous version of an object, you get the object but not its dependents.
Version 1 of DF_EmpLoc is the version that you first added to the central repository at the beginning of this
section. The software overwrote the altered version in your local repository with Version 1 from the central
repository.
Related Information
Use filtering to select the dependent objects to include, exclude, or replace when you add, check out, or check
in objects in a central repository.
When multiple users work on an application, some objects can contain repository-specific information. For
example, datastores and database tables might refer to a particular database connection unique to a user or a
phase of development. After you check out an object with filtering, you can change or replace the following
configurations:
Related Information
The Version Control Confirmation dialog box opens with a list of dependent object types. Expand each node
to see a list of dependent objects of that object type.
4. Select NamePos_Format under Flat Files.
5. Select Exclude from the Target status dropdown list.
The word “excluded” appears next to NamePos_Format in the Action column. Data Services excludes the
flat file NamePos_Format from the dependent objects to be checked out.
6. Click Next.
The Datastore Options dialog box opens listing the datastores that are used by NamePos_Format.
7. Click Finish.
You may see a Check Out Alert dialog box stating that there are some dependent objects checked out by other
users. For example, if user1 checked in the WF_EmpPos work flow to the central repository without selecting to
include the dependent objects, the dependent objects could still be checked out. The Check Out Alert lists the
reasons why each listed object cannot be checked out. For example, “The object is checked out by the
repository: user1”. This reason provides you with the information to decide what to do next:
• Select Yes to get copies of the latest versions of the selected objects into your repository.
• Select No to check out the objects that are not already checked out by another user.
• Select Cancel to cancel the checkout.
You can delete objects from the local or the central repository.
16.5.12.1 Deleting an object from the central repository
When you delete objects from the central repository, dependent objects and objects in your local repositories are not always deleted.
When you delete objects from the central repository, you delete only the selected object and all versions of
it; you do not delete any dependent objects.
7. Open the Work Flows tab in the local object library to verify that WF_PosHireDate was not deleted from the
user2 local object library.
When you delete an object from a central repository, it is not automatically deleted from the connected
local repositories.
Related Information
When you delete an object from a local repository, it is not deleted from the central repository.
When you delete an object from a local repository, the software does not delete it from the central
repository. If you delete an object from your local repository by accident, recover the object by selecting to
“Get” the object from the central repository, if it exists in your central repository.
4. Open the Central Object Library.
5. Click the Refresh icon on the object library toolbar.
6. Open the Data Flows tab in the Central Object Library and verify that DF_EmpDept was not deleted from
the central repository.
7. Exit Data Services.
Task overview: Deleting objects [page 174]
Related Information
In the next segment, learn how to extract SAP application data using SAP Data Services.
Related Information
17 Extracting SAP application data
To work with data from SAP applications, use specific tools and objects in SAP Data Services.
In this segment, use the following advanced features to obtain and process data from SAP applications:
• ABAP code and ABAP data flow: Define the data to extract from SAP applications.
• Data transport object: Carries data from the SAP application into Data Services.
• Lookup function and additional lookup values: Obtains data from a source that isn't included in a job.
For more information about using SAP application data in Data Services, see the Supplement for SAP.
The goal
In this section, we work with the data sources that are circled in the star schema:
Note
To perform the exercises in this section, your implementation of Data Services must be able to connect to
an SAP remote server. Ask your administrator for details.
Note
The structure of standard SAP tables varies between versions. Therefore, the sample tables for these
exercises may not work with all versions of SAP applications. If the exercises in this section aren’t working
as documented, it can be because of the versions of your SAP applications.
SAP applications are the main building blocks of the SAP solution portfolios for industries.
SAP applications provide the software foundation with which organizations address their business issues. SAP
delivers the following types of applications:
Ask your system administrator about the types of SAP applications that your organization uses.
Related Information
Use the SAP application datastore to connect Data Services to the SAP application server.
Log on to Designer and to the tutorial repository. Do not use the user1, user2, or central repositories that you
created for the multiuser exercises.
For details about completing other datastore options, see the Datastores for SAP applications section of
the Supplement for SAP.
7. Click OK.
The new datastore appears in the Datastore tab of the Local Object Library.
Related Information
17.3 Importing metadata
Import SAP application tables into the new datastore SAP_DS for the exercises in this section.
Create and configure the SAP application datastore named SAP_DS before you import the metadata.
MAKT
MARA
VBAK
VBUP
The software adds the tables to the Datastores tab of the Local Object Library under Tables.
Related Information
Repopulate the customer dimension table by configuring a data flow that outputs SAP application data to a
datastore table.
Configure a Data Services job that includes a work flow and an ABAP data flow. The ABAP data flow extracts
SAP data and loads it into the customer dimension table.
To configure the Data Services job so that it communicates with the SAP application, configure an ABAP data
flow. The ABAP data flow contains Data Services supplied commands so you do not need to know ABAP.
For more information about configuring an ABAP data flow, see the Supplement for SAP.
1. Adding the SAP_CustDim job, work flow, and data flow [page 182]
The job for repopulating the customer dimension table includes a work flow and a data flow.
2. Adding ABAP data flow to Customer Dimension job [page 183]
Add the ABAP data flow to JOB_SAP_CustDim and set options in the ABAP data flow.
3. Defining the DF_SAP_CustDim ABAP data flow [page 184]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_CustDim job [page 187]
Validate and then execute the JOB_SAP_CustDim job.
5. ABAP job execution errors [page 188]
There are some common ABAP job execution errors that have solutions.
Related Information
Defining an SAP application datastore [page 179]
Importing metadata [page 180]
Repopulating the material dimension table [page 188]
Repopulating the Sales Fact table [page 195]
What's next [page 204]
17.4.1 Adding the SAP_CustDim job, work flow, and data flow
The job for repopulating the customer dimension table includes a work flow and a data flow.
Next task: Adding ABAP data flow to Customer Dimension job [page 183]
The SAP_CustDim data flow needs an ABAP data flow to extract SAP application data.
The ABAP data flow interacts directly with the SAP application database layer. Because the database layer is
complex, Data Services accesses it using ABAP code.
Data Services executes the SAP_CustDim batch job in the following way:
17.4.2 Adding ABAP data flow to Customer Dimension job
Add the ABAP data flow to JOB_SAP_CustDim and set options in the ABAP data flow.
2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.
Option Action
Generated ABAP file name Specify a file name for the generated ABAP code. The
software stores the file in the ABAP directory that you
specified in the SAP_DS datastore.
ABAP program name Specify the name for the ABAP program that the Data
Services job uploads to the SAP application. Adhere to the
following name requirements:
• Begins with the letter Y or Z
• Cannot exceed 8 characters
Job name Type SAP_CustDim. The name is for the job that runs in
the SAP application.
4. Open the General tab and name the data flow DF_SAP_CustDim.
5. Click OK.
6. Open the Datastores tab in the Local Object Library and expand Target_DS Tables .
7. Move the CUST_DIM table onto the workspace using drag and drop.
Previous task: Adding the SAP_CustDim job, work flow, and data flow [page 182]
Next task: Defining the DF_SAP_CustDim ABAP data flow [page 184]
17.4.3 Defining the DF_SAP_CustDim ABAP data flow
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
Perform the following group of tasks to define the ABAP data flow:
Previous task: Adding ABAP data flow to Customer Dimension job [page 183]
2. Open the Datastores tab in the Local Object Library and expand SAP_DS Tables .
3. Move the KNA1 table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Add a query from the tool palette to the right of the KNA1 table in the workspace.
6. Add a data transport from the tool palette to the right of the query in the workspace.
7. Connect the icons in the data flow to indicate the flow of data as shown.
17.4.3.2 Defining the query
Complete the output schema in the query to define the data to extract from the SAP application.
1. Open the query in the workspace to open the Query Editor dialog box.
2. Expand the KNA1 table in the Schema In pane to see the columns.
3. Click the column head (above the table name) to sort the list in alphabetical order.
4. Map the following seven source columns to the target schema. Use Ctrl + Click to select multiple
columns and drag them to the output schema.
KUKLA
KUNNR
NAME1
ORT01
PSTLZ
REGIO
STRAS
The icon next to the source column changes to an arrow to indicate that the column has been mapped. The
Mapping tab in the lower pane of the Query Editor shows the mapping relationships.
5. Rename the target columns and verify or change the data types and descriptions using the information in
the following table. To change these settings, right-click the column name and select Properties from the
dropdown list.
Note
Microsoft SQL Server and Sybase ASE DBMSs require that you specify the columns in the order shown
in the following table and not alphabetically.
6. Click the Back arrow icon in the icon toolbar to return to the data flow and to close the Query Editor.
7. Save your work.
A data transport defines a staging file for the data that is extracted from the SAP application.
17.4.3.4 Setting the execution order
Set the order of execution by joining the objects in the data flow.
The data flow contains the ABAP data flow and the target table named Cust_Dim.
2. Connect the ABAP data flow to the target table.
3. Save your work.
Related Information
1. With the job selected in the Project Area, click the Validate All icon on the icon toolbar.
If your design contains errors, a message appears describing the error. The software requires that you
resolve the error before you can proceed.
If the job has warning messages, you can continue. Warnings do not prohibit job execution.
If your design does not have errors, the following message appears:
2. Right-click the job name in the project area and click Execute.
If you have not saved your work, a save dialog box appears. Save your work and continue. The Execution
Properties dialog box opens.
3. Leave the default selections and click OK.
After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the cust_dim table in your DBMS.
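A minimal way to perform this check, assuming a standard SQL query tool connected to the target database and the cust_dim table created in the earlier exercises (adjust the table qualification and letter case for your DBMS):
SELECT COUNT(*) FROM cust_dim;
SELECT * FROM cust_dim;
The first statement confirms that rows were loaded from KNA1; the second lets you inspect the renamed columns and their values.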
Previous task: Defining the DF_SAP_CustDim ABAP data flow [page 184]
17.4.5 ABAP job execution errors
Some common ABAP job execution errors have known solutions.
The following list describes a few common ABAP job execution errors, their probable causes, and how to fix them.
Error: Cannot open ABAP output file
Probable cause: Lack of permissions for the Job Server service account.
Solution:
1. Open the Services Control Panel.
2. Double-click the Data Services service and select a user account that has permissions to the working folder on the SAP server.
Error: Cannot create ABAP output file
Probable cause: The working directory on the SAP server is specified incorrectly.
Solution: Open the Datastores tab in the Local Object Library and correct the working directory specified for the SAP datastore.
If you have other ABAP errors, read about debugging and testing ABAP jobs in the Supplement for SAP.
For this exercise, you create a data flow that is similar to the data flow that you created to repopulate the
customer dimension table. However, in this process, the data for the material dimension table is the result of a
join between two SAP application tables.
1. Adding the Material Dimension job, work flow, and data flow [page 189]
Create the Material Dimension job and add a work flow and a data flow.
2. Adding ABAP data flow to Material Dimension job [page 189]
Add the ABAP data flow to JOB_SAP_MtrlDim and set options in the ABAP data flow.
3. Defining the DF_SAP_MtrlDim ABAP data flow [page 191]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_MtrlDim job [page 194]
Validate and then execute the JOB_SAP_MtrlDim job.
Related Information
17.5.1 Adding the Material Dimension job, work flow, and data
flow
Create the Material Dimension job and add a work flow and a data flow.
Log into SAP Data Services Designer and open the Class_Exercises project in the Project Area.
Next task: Adding ABAP data flow to Material Dimension job [page 189]
Add the ABAP data flow to JOB_SAP_MtrlDim and set options in the ABAP data flow.
2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.
The Properties window of the ABAP data flow opens.
3. Complete the fields in the Options tab as described in the following table:
Option Action
Generated ABAP file name Specify a file name for the generated ABAP code. The
software stores the file in the ABAP directory that you
specified in the SAP_DS datastore.
ABAP program name Specify the name for the ABAP program that the Data
Services job uploads to the SAP application. Adhere to the
following name requirements:
• Begins with the letter Y or Z
• Cannot exceed 8 characters
Job name Type SAP_MtrlDim. The name is for the job that runs in
the SAP application.
4. Open the General tab and name the data flow DF_SAP_MtrlDim.
5. Click OK.
6. Open the Datastores tab in the Local Object Library and expand Target_DS Tables .
7. Move the MTRL_DIM table to the workspace using drag and drop.
Previous task: Adding the Material Dimension job, work flow, and data flow [page 189]
Next task: Defining the DF_SAP_MtrlDim ABAP data flow [page 191]
Related Information
17.5.3 Defining the DF_SAP_MtrlDim ABAP data flow
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
Perform the following group of tasks to define the ABAP data flow:
Defining the query with a join between source tables [page 192]
Set up a join between the two source tables and complete the output schema to define the data to
extract from the SAP application
Previous task: Adding ABAP data flow to Material Dimension job [page 189]
Related Information
Add the necessary objects to complete the DF_SAP_MtrlDim ABAP data flow.
2. Open the Datastores tab in the Local Object Library and expand SAP_DS Tables .
3. Move the MARA table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Move the MAKT table to the workspace using drag and drop. Position it under the MARA table.
6. Select Make Source.
7. Add a query from the tool palette to the right of the tables in the workspace.
8. Add a data transport from the tool palette to the right of the query in the workspace.
9. Connect the icons in the data flow to indicate the flow of data as shown.
Related Information
Set up a join between the two source tables and complete the output schema to define the data to extract from
the SAP application
1. Double-click the query in the workspace to open the Query Editor dialog box.
2. Open the FROM tab in the lower pane.
3. In the Join pairs group, select MARA from the Left dropdown list.
4. Select MAKT from the Right dropdown list.
The source rows must meet the requirements of the condition to be passed to the target, including the join
relationship between sources. The MARA and MAKT tables are related by a common column named
MATNR. The MATNR column contains the material number and is the primary key between the two tables.
(MARA.MATNR = MAKT.MATNR)
6. Type the language filter condition in the Smart Editor. Use all uppercase.
This condition filters the material descriptions by language. Only the records with material descriptions
in English are output to the target. (A hedged sketch of such a condition appears at the end of this
procedure.)
7. Click OK to close the Smart Editor.
8. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop.
Table Column
MARA MATNR
MTART
MBRSH
MATKL
MAKT MAKTX
9. Rename the target columns, verify data types, and add descriptions based on the information in the
following table.
10. Click the Back arrow in the icon toolbar to return to the data flow.
11. Save your work.
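For reference, the join relationship and the language filter described in this procedure can be written as a single condition. The following is only a sketch: the join part (MARA.MATNR = MAKT.MATNR) is stated in this procedure, while the SPRAS column (the SAP language key in MAKT) and the value 'E' for English are assumptions based on standard SAP tables rather than text confirmed by this tutorial.
MARA.MATNR = MAKT.MATNR AND SPRAS = 'E'
If the condition in your environment differs, keep the intent the same: restrict MAKT rows to English-language material descriptions.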
Related Information
A data transport defines a staging file for the data that is extracted from the SAP application.
This file stores the data set produced by the ABAP data flow. The full path name for this file is the path of
the SAP Data Services shared directory concatenated with the file name that you just entered.
4. Select Replace File.
Replace File truncates this file each time the data flow is executed.
5. Click the Back icon in the icon toolbar to return to the data flow.
6. Save your work.
Set the order of execution by joining the objects in the data flow.
The data flow contains the ABAP data flow and the target table named Mtrl_Dim.
2. Connect the ABAP data flow to the target table.
3. Save your work.
Related Information
1. With JOB_SAP_MtrlDim selected in the Project Area, click the Validate All icon on the icon toolbar.
If your design contains errors, a message appears describing the error, which requires solving before you
can proceed.
If your design contains warnings, a warning message appears. Warnings do not prohibit job execution.
If your design does not have errors, a message appears confirming that there are no errors.
2. Right-click the job name in the Project Area and click the Execute icon in the toolbar.
If you have not saved your work, a save dialog box appears. Save your work and continue. The Execution
Properties dialog box opens.
3. Leave the default selections and click OK.
After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the Mtrl_Dim table in your DBMS.
Task overview: Repopulating the material dimension table [page 188]
Previous task: Defining the DF_SAP_MtrlDim ABAP data flow [page 191]
Related Information
Repopulate the Sales Fact table from two SAP application sources.
This task extracts data from two source tables, and it extracts a single column from a third table using a lookup
function.
1. Adding the Sales Fact job, work flow, and data flow [page 196]
Create the Sales Fact job and add a work flow and a data flow.
2. Adding ABAP data flow to Sales Fact job [page 196]
Add the ABAP data flow to JOB_SAP_SalesFact and set options in the ABAP data flow.
3. Defining the DF_ABAP_SalesFact ABAP data flow [page 197]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_SalesFact job [page 203]
Validate and then execute the JOB_SAP_SalesFact job.
Related Information
17.6.1 Adding the Sales Fact job, work flow, and data flow
Create the Sales Fact job and add a work flow and a data flow.
Log into SAP Data Services Designer and open the Class_Exercises project in the Project Area.
Next task: Adding ABAP data flow to Sales Fact job [page 196]
Add the ABAP data flow to JOB_SAP_SalesFact and set options in the ABAP data flow.
2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.
Option Action
Generated ABAP file name Specify a file name for the generated ABAP code. The
software stores the file in the ABAP directory that you
specified in the SAP_DS datastore.
ABAP program name Specify a name for the ABAP program that the Data
Services job uploads to the SAP application. Adhere to the
following naming requirements:
• Begins with the letter Y or Z
• Cannot exceed 8 characters
Job name Type SAP_SalesFact. The name is for the job that runs
in the SAP application.
4. Open the General tab and name the ABAP data flow DF_ABAP_SalesFact.
5. Click OK.
6. Open the Datastores tab in the Local Object Library and expand Target_DS Tables .
7. Move the SALES_FACT table to the workspace using drag and drop.
Previous task: Adding the Sales Fact job, work flow, and data flow [page 196]
Next task: Defining the DF_ABAP_SalesFact ABAP data flow [page 197]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
Perform the following group of tasks to define the ABAP data flow:
Defining the query with a join between source tables [page 199]
Set up a join between the two source tables and complete the output schema to define the data to
extract from the SAP application
Defining the lookup function to add output column with a value from another table [page 200]
Use a lookup function to extract data from a table that is not defined in the job.
Task overview: Repopulating the Sales Fact table [page 195]
Previous task: Adding ABAP data flow to Sales Fact job [page 196]
Related Information
Add the necessary objects to complete the DF_ABAP_SalesFact ABAP data flow.
2. Open the Datastores tab in the Local Object Library and expand SAP_DS Tables .
3. Move the VBAP table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Move the VBAK table to the workspace using drag and drop. Place it under the VBAP table.
6. Select Make Source.
7. Add a query from the tool palette to the right of the tables in the workspace.
8. Add a data transport from the tool palette to the right of the query in the workspace.
9. Connect the icons in the data flow to indicate the flow of data as shown.
10. Save your work.
Related Information
Defining the query with a join between source tables [page 199]
Set up a join between the two source tables and complete the output schema to define the data to extract from
the SAP application
VBAP.VBELN = VBAK.VBELN
This condition joins the two tables on the sales document number. The complete condition also filters the
sales orders by date so that only the sales orders from one year are brought into the target table. (A hedged
sketch of such a combined condition appears at the end of this procedure.)
7. Click OK.
8. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop:
Table Column
VBAP VBELN
POSNR
MATNR
NETWR
VBAK KVGR1
AUDAT
9. Rename the target columns, verify data types, and add descriptions as shown in the following table:
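For reference, the combined join and date condition described earlier in this procedure could look like the following sketch. The join on VBELN is stated in this procedure; the AUDAT column (the order date in VBAK) is a standard SAP field, and the date boundaries are placeholders that depend on your sample data:
VBAP.VBELN = VBAK.VBELN AND VBAK.AUDAT >= '<start_date>' AND VBAK.AUDAT <= '<end_date>'
Replace <start_date> and <end_date> with the one-year range that matches the sales orders in your SAP system.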
Use a lookup function to extract data from a table that is not defined in the job.
Option Value
Name ord_status
Length 1
4. Click OK.
Restriction
The LOOKUP function is case sensitive. Enter the values using the case as listed in the following table.
Type the entries in the text boxes instead of using the dropdown arrow or the Browse button.
Option Value Description
Result column GBSTA The column from the VBUP table that contains the value for the target column ord_status.
Default value 'none' The value used if the lookup isn't successful. Use single quotes as shown.
Cache spec 'NO_CACHE' Specifies whether to cache the table. Use single quotes as shown.
Note
The value for the ord_status column comes from the GBSTA column in the VBUP table. The value in
the GBSTA column indicates the status of a specific item in the sales document. The software needs
both an order number and an item number to determine the correct value to extract from the table.
The function editor provides fields for only one dependency, which you defined using the values from
the table.
The lookup function can process any number of comparison value pairs. To include the dependency on the
item number in the lookup expression, add the item number column from the translation table and the
item number column from the input (source) schema as follows:
POSNR, VBAP.POSNR
13. Click the Back arrow in the icon toolbar to close the Query Editor.
14. Save your work.
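Putting the lookup settings together, the mapping for the ord_status output column is a single lookup expression. The following sketch assumes the standard Data Services syntax lookup(translate_table, result_column, default_value, cache_spec, compare_column, expression, ...) and the SAP_DS datastore used in this exercise; the exact table qualification in your repository may differ:
lookup(SAP_DS.VBUP, GBSTA, 'none', 'NO_CACHE', VBELN, VBAP.VBELN, POSNR, VBAP.POSNR)
The first comparison pair matches the sales document number and the second, added in the step above, matches the item number, so the status is retrieved for the correct line item.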
Related Information
17.6.3.4 Defining the details of the data transport
A data transport defines a staging file for the data that is extracted from the SAP application.
Related Information
Set the order of execution by joining the objects in the data flow.
Related Information
1. With JOB_SAP_SalesFact selected in the Project Area, click the Validate All icon in the toolbar.
If your design contains errors, a message appears describing the error, which requires solving before you
can proceed.
If your design contains warnings, a warning message appears. Warnings do not prohibit job execution.
If your design does not have errors, a message appears confirming that there are no errors.
2. Right-click JOB_SAP_SalesFact in the Project Area and click the Execute icon in the toolbar.
If you have not saved your work, a save dialog box appears. Save your work and continue. The Execution
Properties dialog box opens.
3. Leave the default selections and click OK.
After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the Sales_Fact table in your DBMS.
Previous task: Defining the DF_ABAP_SalesFact ABAP data flow [page 197]
Related Information
In the next section, learn how to import and run a real-time job.
The tutorial employs batch jobs to help you learn how to use SAP Data Services. However, real-time jobs
process requests from external systems or Web applications, and send back replies in real time.
Related Information
18 Real-time jobs
In this segment, you execute a real-time job to see the basic functionality.
For real-time jobs, Data Services receives requests from ERP systems and Web applications and sends replies
immediately after receiving the requested data. Requested data comes from a data cache or a second
application. You define operations for processing on-demand messages by building real-time jobs in the
Designer.
Real-time jobs use the following:
• A single real-time data flow (RTDF) that runs until explicitly stopped
• Requests in XML message format and SAP applications using IDoc format
Note
The tutorial exercise focuses on a simple XML-based example that you import.
For more information about real-time jobs, see the Reference Guide.
The goal
We've developed a simple real-time job that you import and run in test mode.
1. Copy the following files from <LINK_DIR>\ConnectivityTest and paste them into your temporary
directory. For example, C:\temp:
• TestOut.dtd
• TestIn.dtd
• TestIn.xml
• ClientTest.txt
2. Copy the file ClientTest.exe from <LINK_DIR>\bin and paste it to your temporary directory.
Note
ClientTest.exe uses DLLs in your <LINK_DIR>\bin directory. If you encounter problems, ensure
that you have included <LINK_DIR>\bin in the Windows environment variables path statement.
4. Right-click in a blank space in the Local Object Library and select Repository Import From File .
Related Information
Run a real-time job that transforms an input string of Hello World to World Hello.
Use the files that you imported previously to create a real-time job.
5. Expand Job_TestConnectivity and click RT_TestConnectivity to open it in the workspace.
The workspace contains one XML message source named TestIn (XML request) and one XML message
target named TestOut (XML reply).
6. Double-click TestIn to open it. Verify that the Test file option in the Source tab is C:\temp\TestIn.XML.
7. In Windows Explorer, open Testin.XML in your temporary directory. For example, C:\temp\TestIn.XML.
Confirm that it contains the following message:
<test>
<Input_string>Hello World</Input_string>
</test>
8. Back in Designer, double-click TestOut in the workspace to open it. Verify that the Test file option in the
Target tab is C:\temp\TestOut.XML.
9. Execute the job Job_TestConnectivity.
10. Click Yes to save all changes if applicable.
11. Accept the default settings in the Execution Properties dialog box and click OK.
12. When the job completes, open Windows Explorer and open C:\temp\TestOut.xml. Verify that the file
contains the following text:
<test>
<output_string>World Hello</output_string>
</test>
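For reference, the word swap that this job performs could be expressed in a query transform mapping for the output_string column using the Data Services word_ext function. This is only a sketch of one possible implementation; the imported Job_TestConnectivity may be built differently:
word_ext(TestIn.Input_string, 2, ' ') || ' ' || word_ext(TestIn.Input_string, 1, ' ')
word_ext extracts the nth word from a string using the given separator, so this expression outputs the second word, a space, and then the first word.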
Related Information
Important Disclaimers and Legal Information
Hyperlinks
Some links are classified by an icon and/or a mouseover text. These links provide additional information.
About the icons:
• Links with the icon : You are entering a Web site that is not hosted by SAP. By using such links, you agree (unless expressly stated otherwise in your
agreements with SAP) to this:
• The content of the linked-to site is not SAP documentation. You may not infer any product claims against SAP based on this information.
• SAP does not agree or disagree with the content on the linked-to site, nor does SAP warrant the availability and correctness. SAP shall not be liable for any
damages caused by the use of such content unless damages have been caused by SAP's gross negligence or willful misconduct.
• Links with the icon : You are leaving the documentation for that particular SAP product or service and are entering a SAP-hosted Web site. By using such
links, you agree that (unless expressly stated otherwise in your agreements with SAP) you may not infer any product claims against SAP based on this
information.
Example Code
Any software coding and/or code snippets are examples. They are not for productive use. The example code is only intended to better explain and visualize the syntax
and phrasing rules. SAP does not warrant the correctness and completeness of the example code. SAP shall not be liable for errors or damages caused by the use of
example code unless damages have been caused by SAP's gross negligence or willful misconduct.
Bias-Free Language
SAP supports a culture of diversity and inclusion. Whenever possible, we use unbiased language in our documentation to refer to people of all cultures, ethnicities,
genders, and abilities.
www.sap.com/contactsap
SAP and other SAP products and services mentioned herein as well as
their respective logos are trademarks or registered trademarks of SAP
SE (or an SAP affiliate company) in Germany and other countries. All
other product and service names mentioned are the trademarks of their
respective companies.