Working with the DataFlux Data Job Transformation
Introduction
[Figure: Jobs and services, the Quality Knowledge Base, and source or target data in the data tier
(ODBC, SAS data sets, text, federated data, and so on).]
Jobs and services are designed in DataFlux Data Management Studio using functionality from the
Quality Knowledge Base (QKB). These jobs and services are stored in the DataFlux repository
registered for Data Management Studio, and can be uploaded to the repository for the DataFlux
Data Management Server for execution.
Note: The Data Management Server is intended as a more powerful processing system that runs the
jobs and services created in Data Management Studio, in both batch and real-time
environments.
Note: Objects can be deployed to the Data Management Server repository by “remotely submitting”
jobs to the Data Management Server from within Data Management Studio, or by logging
into the server (from within Data Management Studio) and “importing” objects up to the
server’s repository.
Note: To execute the jobs and services on Data Management Server, the server must be configured
to access the source data, the QKB, and any reference data packs that are used.
Note: Data connections on the Data Management Server must be identical to the data connections
that are defined in Data Management Studio and referenced in the data jobs that are to be
run on the server.
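The requirement that connection names match can be checked mechanically before a job is deployed. The following is a minimal Python sketch, not a DataFlux utility. It assumes that you have already collected, by hand or from your own inventory, the connection names that your data jobs reference and the names that are defined on the server. The function name and the sample connection names are hypothetical, and the sketch compares names only, not connection settings.

```python
def find_missing_connections(job_connections, server_connections):
    """Return connection names that jobs reference but the server does not define.

    Names are compared exactly (case-sensitive), on the assumption that
    "identical" means an exact match of the connection name.
    """
    return sorted(set(job_connections) - set(server_connections))


# Hypothetical connection names, for illustration only.
studio_job_connections = ["contacts_dsn", "orders_dsn"]
server_defined_connections = ["contacts_dsn"]

missing = find_missing_connections(studio_job_connections, server_defined_connections)
if missing:
    print("Define these connections on the server:", ", ".join(missing))
else:
    print("All referenced connections are defined on the server.")
```

An empty result means that every connection name referenced by the jobs also exists on the server. It does not confirm that the two definitions point to the same data source.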
Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
[Figure: Run-time architecture. Data tier: SAS Metadata Repository (servers, users, roles, and so
on) and the repository of jobs and services. Server tier: SAS Metadata Server (authentication),
DataFlux Data Management Server, and SAS Application Server running SAS Data Quality Server
code, communicating over SOAP/HTTP.]
The diagram above shows a basic run-time architecture for using SAS Data Integration Studio to
execute data cleansing processes on the DataFlux Data Management Server. SAS client
applications, such as Data Integration Studio, invoke SAS Data Quality Server code on the SAS
Application Server, which communicates with the Data Management Server using SOAP over HTTP.
The jobs and services are stored in a repository on the DataFlux Data Management Server.
The Data Management Server security services are implemented by the SAS Metadata Server.
These security services are a set of permissions that are established in SAS metadata (through SAS
Management Console) for administering security on a Data Management Server. The DataFlux Data
Management Server is registered in the SAS Metadata Repository, as are the users and their
corresponding roles for interacting with jobs and services on the DataFlux Data Management Server.
For the SAS Application Server to communicate with the DataFlux Data Management Server, the
Data Management Server must be registered in SAS metadata. This registration is made using
the Server Manager component of SAS Management Console.
The final settings for the new repository definition should resemble the following:
g. Click OK.
The DIFT Demo repository is listed.
n. Verify that the gender information will be written to a field named CONTACT_Gender.
s. Verify that the new _Stnd fields, as well as the new gender field, will be updated or added to
the Contacts table.
4) Locate the DataFlux Data Management Server (server 1) and verify that its status is
Started.
6) Click Next.
8) Click Import.
9) Verify that the import completed successfully.
Working with the DataFlux Batch Job Transformation
There are two transformations available in SAS Data Integration Studio specifically for interaction
with the DataFlux Data Management Server. These transformations are found in the Data Quality
group in the Transformations tab. The two transformations available are DataFlux Batch Job and
DataFlux Data Service.
The DataFlux Batch Job transformation executes a batch data job that has been deployed to the
Data Management Server’s repository. To use it in Data Integration Studio, add the transformation
to the job flow. In the properties for the transformation, connect to a registered Data Management
Server instance, select the type of job to run, and then select the specific job to execute.
Note: Before using this transformation, ensure that the DataFlux Data Management Server service
is started.
Note: Before using this transformation, ensure that the job has been uploaded to the Data
Management Server and that it runs successfully on the server. You can do this by
connecting to the server from within Data Management Studio.
Note: The SAS Application Server issues SOAP commands over HTTP to return the list of jobs
from the selected server.
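Because the job list is retrieved over HTTP, a quick way to rule out a stopped service or a blocked port is to test whether the server's listening port accepts TCP connections. The following is a minimal Python sketch using only the standard library. The function name is hypothetical, and the host and port must come from your own server configuration; nothing here is part of the DataFlux or SAS software.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling `is_port_open(host, port)` with the host name and port of your Data Management Server returns False when nothing is listening there. A True result shows only that some process accepts connections on that port. It does not confirm that the job has been uploaded or that it runs successfully, which still has to be verified from Data Management Studio.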
[Figure: The DI Studio job completed successfully, and the DM Server batch job completed
successfully.]
9) Verify that the new field GENDER exists, and that the ADDRESS, CITY, STATE and
PHONE fields have been standardized.
10) Close the View Data window.
11) Close the Data Job, saving any changes.
12) Click File > Exit to close SAS Data Integration Studio.