
Working with the DataFlux Data Job Transformation

Introduction

DataFlux Data Management Studio (with Server)


Figure: Architecture with three tiers. The client tier contains DataFlux Data Management Studio, the DataFlux repository (jobs and services), and the Quality Knowledge Base. The data tier holds source or target data (ODBC, SAS data sets, text, federated data, and so on). The server tier contains the DataFlux Data Management Server and its repository of jobs and services.

Jobs and services are designed in DataFlux Data Management Studio using functionality from the
Quality Knowledge Base (QKB). These jobs and services are stored in the DataFlux repository
registered for Data Management Studio, and they can be uploaded to the DataFlux Data
Management Server's repository for execution.
Note: The Data Management Server is intended to be a more powerful processing system, used to
run the jobs and services created in Data Management Studio in both batch and real-time
environments.
Note: Objects can be deployed to the Data Management Server repository by “remotely submitting”
jobs to the Data Management Server from within Data Management Studio, or by logging
into the server (from within Data Management Studio) and “importing” objects up to the
server’s repository.
Note: To execute jobs and services on the Data Management Server, the server must be
configured to access the source data, the QKB, and any reference data packs that are
used.
Note: Data connections on the Data Management Server need to be identical to the data
connections that are defined in Data Management Studio and referenced in the data jobs
that are to be run on the server.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.

DataFlux Data Management Server Interaction with SAS


Figure: Run-time architecture. The client tier contains DataFlux Data Management Studio and SAS Data Integration Studio. The server tier contains the SAS Metadata Server and its repository (servers, users, roles, and so on, used for authentication), the DataFlux Data Management Server (jobs and services), and the SAS Application Server running SAS Data Quality Server code, which communicates with the Data Management Server via SOAP/HTTP.

The diagram above shows a basic run-time architecture for using SAS Data Integration Studio to
execute data cleansing processes on the DataFlux Data Management Server. SAS client
applications, such as Data Integration Studio, invoke SAS Data Quality Server code on the SAS
Application Server, which communicates with the Data Management Server via SOAP over HTTP.
The jobs and services are stored in a repository on the DataFlux Data Management Server.
The Data Management Server security services are implemented by the SAS Metadata Server.
These security services are a set of permissions that are established in SAS metadata (through SAS
Management Console) for administering security on a Data Management Server. The DataFlux Data
Management Server is registered in the SAS Metadata Repository, along with the users and their
corresponding roles for interacting with jobs and services on the server.


Configuring SAS to Use the Data Management Server

Steps to configure SAS to use the Data Management Server include:
• Make sure the Data Management Server service is started on the server
machine.
• Verify the port the DataFlux Data Management Server is listening on.
• Register the Data Management Server in SAS Management Console’s Server
Manager utility.
• Build SAS Data Integration Studio jobs to run jobs and call services on the Data
Management Server.
• Use SAS Data Quality Server procedures (PROCs) to submit jobs and call services on the Data
Management Server.
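The last step above can be sketched in code. SAS Data Quality Server provides PROC DMSRVBATCHJOB for submitting a deployed batch job from a SAS session; the host name, port, and job name below are illustrative assumptions, not values taken from this course environment.

```sas
/* Sketch: submit a deployed batch job to the DataFlux Data Management */
/* Server. The host name, port, and job name are illustrative; 21036   */
/* is a commonly used default listening port for the server.           */
proc dmsrvbatchjob
   host='dmserver.example.com'   /* DM Server host (assumed)  */
   port=21036                    /* DM Server listening port  */
   job='DIFT Standardize Contacts Data Job.ddf';
run;
```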


Data Management Server Metadata Registration

Figure: Server Manager in SAS Management Console, showing the registered DataFlux Data Management Server selected and its listening port.

In order for the SAS Application Server to communicate with the DataFlux Data Management
Server, the server needs to be registered in SAS metadata. This registration is performed with
the Server Manager component of SAS Management Console.


Reviewing and Testing a Data Job


This demonstration illustrates the steps necessary to access a DataFlux Data
Management Studio repository that contains a pre-built data job and import the job to the DataFlux
Data Management Server’s repository.
1. Access the Administration riser bar of DataFlux Data Management Studio.
a. If necessary, select Start → All Programs → DataFlux → Data Management Studio 2.7.
b. If prompted, click Cancel in the Log On dialog box.
c. Verify that the Home tab is selected.
d. Click the Administration riser bar.


2. Define access to an existing Data Management Studio repository.


a. Click Repository Definitions in the list of administration items on the Navigation pane.
b. Click New to define a new repository.

c. Enter DIFT Demo as the name.


d. Click Browse next to the Database file field.
1) In the Open window, navigate to D:\Workshop\dift\dqdemo.
2) Click dqdemo.rps.
3) Click Open.
e. Click Browse next to the Folder field in the File storage area.
1) Navigate to D:\Workshop\dift\dqdemo\files.
2) Click OK.
f. Clear Private.


The final settings for the new repository definition should resemble the following:

g. Click OK.
The DIFT Demo repository is listed.

3. Review a data job in the DIFT Demo repository.


a. If necessary, click the Folders riser bar.
b. Expand DIFT Demo → batch_jobs.


c. Right-click DIFT Standardize Contacts Data Job and select Open.

The data job appears on a new tab.




d. Verify that there are four nodes in this data job.

• The Contacts table is the source for this job.


• The Standardize Fields node applies standardization to four fields (address, city, state,
phone).
• The Gender Analysis node determines gender based on a name field.
• The target node updates the Contacts table with the new column information.
e. Right-click the node labeled Contacts Table and select Properties.
f. Review the set of fields found in Contacts Table.

g. Click Cancel to close the Data Source Properties window.


h. Right-click the node labeled Standardize Fields and select Properties.


i. Verify that four fields are being standardized.


j. Verify that the standardized values will be written to new fields named
<OriginalFieldName>_Stnd.

k. Click Cancel to close the Standardization Properties window.


l. Right-click the node labeled Gender Analysis and select Properties.
m. Verify that gender analysis is being performed on the CONTACT field using the Name
gender definition.


n. Verify that the gender information will be written to a field named CONTACT_Gender.


o. Click Cancel to close the Gender Analysis Properties window.


p. Right-click the node labeled Data Target (Update) and select Properties.
q. Verify that the table being updated is Contacts.
r. Verify that the updates occur when the ID value (primary key) matches.

s. Verify that the new _Stnd fields, as well as the new gender field, will be updated or added to
the Contacts table.

t. Click Cancel to close the Data Target (Update) Properties window.


u. Select File → Close to close the data job. Do not save any changes.
4. Upload the data job to the Data Management Server.
a. Ensure the DataFlux Data Management Server service is started.
1) Click Start.
2) Click Administrative Tools.
3) Click Services.


4) Locate the DataFlux Data Management Server (server 1) service and verify that its status is
Started.

5) If necessary, right-click the service and select Start.


b. Connect to the available Data Management Servers in Data Management Studio.
1) Click the Data Management Servers riser bar.
2) Click Data Management Servers.
3) In the information pane (on the right), click Connect.
The Log On window appears.
4) Enter Ahmed as the user ID and Student1 as the password.

5) Click Log On.


The navigation panel reflects the available Data Management Servers.


c. Upload a data job to the Data Management Server.


1) Expand DataFlux Data Management Server – sasbap.
2) Right-click the Batch Jobs folder and select Import.

3) In the Import Items window, expand DIFT Demo.


4) Expand batch_jobs.
5) Select DIFT Standardize Contacts Data Job.

6) Click Next.


7) Click the Batch Jobs folder.

8) Click Import.
9) Verify that the import completed successfully.


10) Click Close to close the Import from Repository window.


The data job appears in the Batch Jobs folder.

5. Run the data job on the server.


a. If necessary, click Batch Jobs under DataFlux Data Management Server – sasbap.
b. Click the DIFT Standardize Contacts Data Job in the information pane.

c. Click the Run icon.


d. Click Run in the Run Job dialog window.
e. You will be prompted to specify any macro/value pairs for the job. Click Run.
f. Click the Run History tab.
The Status field will show the status of the job while it is running, and upon completion.



Working with the DataFlux Batch Job Transformation

DataFlux Data Management Server Transformations


There are two transformations available in SAS Data Integration Studio specifically for interacting
with the DataFlux Data Management Server. These transformations are found in the Data Quality
group on the Transformations tab: DataFlux Batch Job and DataFlux Data Service.


DataFlux Batch Job Transformation

Figure: DataFlux Batch Job transformation properties, showing the connection to the DataFlux Data Management Server and the batch job selected from the server's repository.

The DataFlux Batch Job transformation is used to execute a batch data job that has been deployed
to the Data Management Server's repository. You use the transformation in Data Integration Studio
by adding it to the job flow. When you access the transformation's properties, you can connect to a
registered Data Management Server instance, select the type of job to run, and then select the
specific job to be executed.
Note: Before using this transformation, you need to ensure that the DataFlux Data Management
Server service is started.
Note: Before using this transformation, you need to ensure that the job is uploaded to the Data
Management Server, and it runs successfully on the server. You can do this by connecting to
the server from within Data Management Studio.
Note: The SAS Application Server issues SOAP commands over HTTP to return the list of jobs
from the selected server.
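The job list mentioned in the last note can also be retrieved directly from a SAS session: PROC DMSRVADM (part of SAS Data Quality Server) writes job information from the server to an output data set. The host, port, and data set names here are illustrative assumptions.

```sas
/* Sketch: request job information from the DataFlux Data Management */
/* Server and print it. Host, port, and data set names are assumed.  */
proc dmsrvadm
   out=work.dmsrv_jobs           /* output data set of job information */
   host='dmserver.example.com'   /* DM Server host (assumed)           */
   port=21036;                   /* DM Server listening port           */
run;

proc print data=work.dmsrv_jobs;
run;
```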


Verifying the Job Completed Successfully

Figure: Status indicators showing that the DI Studio job completed successfully and that the DM Server batch job completed successfully.


Using the DataFlux Batch Job Transformation


This demonstration illustrates the steps necessary to use the DataFlux Batch Job
transformation in SAS Data Integration Studio to execute a batch job from the DataFlux Data
Management Server repository.
1. If necessary, access SAS Data Integration Studio using Student’s credentials.
a. Select Start → All Programs → SAS → SAS Data Integration Studio.
b. Select My Server as the connection profile.
c. Click OK to close the Connection Profile window. The Log On window appears.
d. Enter Student in the User ID field and Metadata0 in the Password field.
e. Click OK to close the Log On window.
2. Create a new job to use the DataFlux Batch Job transformation.
a. Click the Folders tab.
b. Right-click My Folder and select New Job.
c. Enter DIFT Run DataFlux Batch Job – Standardize Contacts Example in the Name field.
d. Click OK.
e. Add a DataFlux Batch Job transformation to the job.
1) Click the Transformations tab.
2) Expand the Data Quality grouping.
3) Add a DataFlux Batch Job transformation to the Job Editor.

4) Configure the transformation.


5) Right-click the DataFlux Batch Job transformation and select Properties.
6) On the General tab, enter DataFlux Batch Job - Standardize Contacts after the default
name.

7) Click the Job tab.


8) Verify that DataFlux Data Management Server - sasbap is selected in the Server field.


9) Verify that Batch is specified as the Job type.


10) Verify that DIFT Standardize Contacts Data Job.ddf is selected as the Job.

11) Click OK to close the DataFlux Batch Job Properties window.


12) Select File → Save to save the job metadata.
f. Run the job and view the results.
1) Click Run.
2) Verify that the job completed successfully.

3) Click the Folders tab.


4) Expand Data Mart Development → Orion Source Data.
5) Right-click DIFT Contacts and select Update Metadata.
6) Click Yes (you want to continue).
7) Click No (you do not need to view the details).


8) Right-click DIFT Contacts and select Open.

9) Verify that the new field GENDER exists, and that the ADDRESS, CITY, STATE and
PHONE fields have been standardized.
10) Close the View Data window.
11) Close the Data Job, saving any changes.
12) Click File  Exit to close SAS Data Integration Studio.

