DataStage Migration Webinar - v3FINAL

Uploaded by

Hulya Sahip

IBM DataStage Webinar

Cloud Pak for Data Migration Best Practices

Jessie Snyder
Offering Manager, IBM Data and AI Data Integration

Bala Vaithyalingam
Principal, IBM Data and AI Expert Labs and Learning

Scott Brokaw
Offering Manager, IBM Data and AI Data Integration
IBM Data and AI / © 2020
Please Note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

IBM Cloud Pak for Data
Simplifies, unifies and automates your journey to AI

Analyze & Infuse
Plug and play 45+ data, analytics and AI apps. Manage your favorite open source capabilities alongside IBM’s market-leading differentiators.

Organize
Catalog and govern all enterprise data, models, rules, and insights through a common experience.

Collect
Virtually connect, manage and query data & AI assets no matter where they live.

OpenShift
Leverage the leading open source hybrid cloud platform to SCALE data & AI workloads.

Run On-Premises or on ANY Cloud
Decoupling enterprise data, analytics and AI will prevent lock-in and accelerate polyglot ecosystems.



Why DataStage on IBM Cloud Pak for Data?
Future-proofing your Data Integration

Key customer benefits
– Design with speed through smart automation and a high level of reusability
– No additional development cost when scaling out to new environments
– Runtime independence through Enterprise container platform foundation
– Cost, speed and scale optimized data integration
– Increased compliance, plus deploy and operate at scale, through full CI/CD integration
– Significantly reduce effort in management and operation

5x: Faster execution than on Spark parallel engine
9x: Faster design than hand-coding
85%: Reduction in infrastructure management time & effort



Paths to Modernize

Today: DataStage, DataStage Workgroup Edition
Modernization Offering: DataStage Enterprise for Cloud Pak for Data

Today: DataStage, QualityStage
Modernization Offering: DataStage Enterprise Plus for Cloud Pak for Data

Today: Information Server for Data Integration, Information Server Enterprise Edition, Information Server for Data Quality
Modernization Offering: Information Server for Cloud Pak for Data



Capability Comparison
Columns, left to right: DataStage (on-prem) | DataStage Enterprise for Cloud Pak for Data | DataStage Enterprise Plus for Cloud Pak for Data | Information Server for Cloud Pak for Data

Business Glossary ✓
Business & Technical Data Lineage ✓
Self-service Data Preparation
Search & find relevant data ✓
Review, Rate & Share data ✓
Data Profiling ✓
Sensitive data discovery ✓
Add by licensing: Watson Knowledge Catalog (WKC) via Cloud Pak for Data base
Data Cleansing and Enrichment ✓ ✓
Data Quality Validation & Monitoring ✓ ✓
Service enablement ✓ ✓
Data Specification Mapping ✓ ✓
Extract, transform, load data ✓ ✓ ✓ ✓
Metadata Management ✓ ✓ ✓ ✓
Common Cloud Pak for Data Platform Management ✓ ✓ ✓
Unlimited users ✓ ✓ ✓
Automatic Workload Balancing ✓ ✓ ✓
How entitlements traded up to Modernization Upgrade can be allocated
Scenario: Your existing DataStage
• Today: 7000 PVUs of DataStage Standalone (Prod)
• At renewal: trade up to DataStage Enterprise Upgrade
• Get at a minimum: 100 VPCs of DataStage Enterprise + 100 VPCs of Cloud Pak for Data
• Once you trade up, you can allocate this entitlement in many ways, some shown below:

Example: Trade up license entitlement but workload still runs on stand-alone offering
• 100 VPCs: Leveraging Service where it was
• 100 VPCs: Leveraging Base Services

Example: Trade up license entitlement and move workload to extension gradually
• 80 VPCs: Modernizing at their own pace
• 20 VPCs: Not locked in once moved
• 100 VPCs: Leveraging Base Services
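
The allocation examples above come down to simple bookkeeping: any split is acceptable as long as the parts stay within the traded-up entitlement. A minimal Python sketch of that check follows. It assumes the 80/20 split in the second example divides the 100 VPCs of DataStage Enterprise; the names `Allocation` and `fits_entitlement` are illustrative and not part of any IBM licensing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Allocation:
    """One slice of an entitlement (label and size are illustrative)."""
    label: str
    vpcs: int


def total_vpcs(allocations: List[Allocation]) -> int:
    """Sum the VPCs across all slices."""
    return sum(a.vpcs for a in allocations)


def fits_entitlement(allocations: List[Allocation], entitlement_vpcs: int) -> bool:
    """True if no slice is negative and the slices do not exceed the entitlement."""
    return all(a.vpcs >= 0 for a in allocations) and total_vpcs(allocations) <= entitlement_vpcs


# Minimum entitlement from the scenario: 100 VPCs of DataStage Enterprise.
DATASTAGE_ENTERPRISE_VPCS = 100

# First example: the whole workload stays where it was.
example_1 = [Allocation("Leveraging service where it was", 100)]

# Second example (assumed reading): 80 VPCs still modernizing, 20 VPCs already moved.
example_2 = [
    Allocation("Modernizing at their own pace", 80),
    Allocation("Not locked in once moved", 20),
]

if __name__ == "__main__":
    for name, example in (("Example 1", example_1), ("Example 2", example_2)):
        print(name, total_vpcs(example), fits_entitlement(example, DATASTAGE_ENTERPRISE_VPCS))
```

Under this reading both examples use exactly the 100 VPCs available, which matches the point that workload can move gradually without changing the entitlement.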



Synergies and Benefits of an integrated Data & AI Platform
The whole is greater than the sum of the parts

• DataStage taking advantage of co-located Netezza or Db2 WH for ultra high-speed data load
• Utilize implicit Data Discovery, Lineage, Governance and Metadata Management
• Optimized and agile data processing when combining DataStage with Data Virtualization
• DataStage being the vital link of the Edge to Analytics value chain on Cloud Pak for Data



DataStage
Environment Modernization



Getting Started - Lift and Shift
A schematic project plan:

t0: Installation (Setup, Config, Network)
t1: Migrate assets (Import / Export). Optionally: Convert Server to Parallel jobs
t2: Test
t3: Go live!

• Lift and Shift (CPD) takes place in parallel to regular DataStage operation.
• Scenario assumes no change in use cases
• Perform asset conversion if needed – can be performed on the legacy environment
• Migration can be done directly from current version of DataStage / QualityStage
• Migration period managed by waiver.



Modernizing with MettleCI

IBM is planning to provide the entire MettleCI tool set to any DataStage or Information Server on Cloud Pak for Data licensee*

This tool set supports both upgrade/migration of DataStage as well as CI/CD

MettleCI Features & Functions
• Automated Peer Review (Compliance)
• Automated Unit Testing (generation & management)
• Universal Git integration (Jobs and Unit Tests)
• Continuous Integration
• Continuous Delivery

MettleCI Benefits
• Shorter time to delivery
• Lower cost of maintenance
• Higher performance Jobs
• Earlier, cheaper defect discovery
• Lower testing costs
• More reliable E2E execution
• Zero-effort work item traceability / auditability
• Release management with minimal effort
• Visibly more productive DataStage teams
• Higher utilisation of Information Server platform
• Demonstrable alignment with organisation strategy

*OEM of MettleCI expected to be complete in Q4 2020
DataStage Upgrade Report Card

• Provides job assessment to help determine complexity for migration and upgrade compatibility

• IBM’s Rapid DataStage Upgrade Assessment (free) additionally provides estimates of upgrade duration, costs, and approach

• Minimizes manual effort, cost, risk, and elapsed time associated with upgrading your DataStage environment



Rapid DataStage Upgrade
Assessment Process



DataStage Compliance Report Summary



DataStage Upgrade through Continuous Delivery Pipeline
Legacy Environment (DataStage, Git, Testing)

1. Source Delta: Jobs and Unit Test specifications and data are managed using DataStage Designer and MettleCI Workbench
2. Git Check-in: All code is promoted via build events triggered by an automatically-managed Git repository.
3. Testing: Legacy testing for job compliance with coding standards, as well as automated unit testing

Target Environment (Deploy, Upgrade, Testing, DataStage)

4. Deploy: Automated import of all artifacts into the new target
5. Upgrade: Jobs are automatically upgraded to replace deprecated stages
6. Testing: Verify baseline tests from legacy still pass in the new target environment
7. Target Ready: All jobs are imported, converted, compiled, and unit tested
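
The pipeline above is an ordered sequence in which each step should only run once the previous one has succeeded. The following Python sketch illustrates that control flow only. It is not MettleCI's API: the stage names are taken from the slide, while `run_pipeline` and `always_ok` are hypothetical placeholders standing in for the real import, upgrade, and test work.

```python
from typing import Callable, List, Tuple

# Stage names follow the seven steps on the slide; the callables are placeholders.
STAGE_NAMES = [
    "Source Delta",
    "Git Check-in",
    "Testing (legacy)",
    "Deploy",
    "Upgrade",
    "Testing (target)",
    "Target Ready",
]

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and return the names of those that completed.

    Execution stops at the first stage that reports failure, so later
    stages never run against an unverified result.
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            break
        completed.append(name)
    return completed


def always_ok() -> bool:
    """Placeholder step standing in for real import, upgrade, or test work."""
    return True


if __name__ == "__main__":
    stages = [(name, always_ok) for name in STAGE_NAMES]
    print(run_pipeline(stages))
```

Stopping at the first failure is one possible design choice; it reflects the idea that the target is only "ready" once every earlier step, including the baseline tests, has passed.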



DataStage Modernization/Migration FAQs

What would a Lift & Shift migration to Cloud Pak for Data look like?
• Lift and Shift from stand-alone DataStage to Cloud Pak for Data is very similar to Lift and Shift when doing a version-to-version migration.
• Users will use similar tools / techniques to move assets (import / export) between a stand-alone DS instance and DS on CP4D as they would in a previous version migration.
• Users can additionally use MettleCI, to be packaged with any DataStage/Information Server for Cloud Pak for Data extension, to assist with moving assets.

Are there any jobs that will not run on Cloud Pak for Data?
• For a Lift and Shift approach:
  • Most jobs are going to work unchanged on Cloud Pak for Data v3.5.
  • Some functionality available on Cloud Pak for Data (v3.5) is in a deprecated state and should therefore be migrated to supported functionality.



DataStage Modernization/Migration FAQs

What should clients do with their Server jobs / Server routines?


• IBM plans to provide a Server to Parallel accelerator tool, planned for early 2021, to help clients convert Server jobs to Parallel jobs. The Server to Parallel job migration can be started prior to migrating to Cloud Pak for Data.

What should clients do if they are on DataStage V11.5 ?


• Clients should work with their sales representative to create a modernization path to the DataStage
cartridge, as there are various routes that can be taken depending on the environment in question.



What help/assistance is IBM providing?

Migration License Waiver
  Description: Users can request a migration waiver, up to a total of 12 months, during which they can continue to run their existing DataStage / IS environment while migrating to DataStage on CP4D.
  When Available: Now

Job assessment
  Description: Users can utilize the job assessment tool from Data Migrators (https://upgrade.mettleci.com/) to understand job compatibility and upgrade complexity.
  When Available: Now

CI/CD Tooling
  Description: IBM is planning to provide the entire MettleCI tool set (including the above assessment component) to any DataStage / Information Server on Cloud Pak for Data licensee. This tool set supports both upgrade/migration of DataStage as well as CI/CD.
  When Available: Q4/2020 – Q1/2021

Server to Parallel Job conversion accelerator
  Description: IBM is planning to provide users with an accelerator tool to migrate DS Server jobs to Parallel jobs.
  When Available: Q1/2021

Service-led deployment / migration
  Description: IBM has a full range of service offerings, from JumpStart deployment services all the way to complete end-to-end migration engagements.
  When Available: Now
One of England's major sport leagues uses IBM technology to increase ticket sales and fan engagement

360 View

Result:
• Increased ticket sales by 50%
• 43% increase in email marketing campaign success

Business Goal:
• Boost fan engagement on the digital channel to drive investment and participation in the sport

Challenge:
• Lacking a single view of a customer (fan) due to siloed data spread across many separate systems
• High risk of customer dissatisfaction due to inconsistent communication

Transformation:
• Creation of a centralized customer hub
• IBM's Integration and Governance capabilities provide the backbone to cleanse, integrate, merge, manage and govern the CRM
• Now providing a 360-degree view of each fan’s individual behaviors and preferences based on their interactions across all its channels
A state judiciary in the US accelerates its case review process by 98 percent

Compliance & Risk Management

Result:
• 98% time savings during case review
• Up to $10M annual cost savings

Challenges:
Jailing people often makes their circumstances worse by keeping them from working and costing them their jobs, causing defaults on home mortgages, family stress and more. This judiciary is finding ways to eliminate the need to jail people while waiting for their day in court.

Transformation:
• A risk assessment tool keeps people from being jailed while awaiting trial
• Utilizing IBM's Data Integration and MDM solution, the Judiciary established an automated risk assessment system that takes less than three minutes per case. This translates into a 98 percent time savings compared to the three hours required for manual information gathering and records analysis, for an expected savings of USD 10 million annually.
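
The 98 percent figure follows directly from the two durations quoted above. As a quick check, here is a small Python sketch that treats three minutes as the automated upper bound and three hours (180 minutes) as the manual baseline; the function name `time_savings_percent` is illustrative.

```python
def time_savings_percent(manual_minutes: float, automated_minutes: float) -> float:
    """Percentage of time saved when a task drops from the manual to the automated duration."""
    if manual_minutes <= 0:
        raise ValueError("manual_minutes must be positive")
    return (1 - automated_minutes / manual_minutes) * 100


if __name__ == "__main__":
    # Three hours of manual work versus an upper bound of three minutes automated.
    print(round(time_savings_percent(180, 3), 1))
```

With these inputs the saving works out to roughly 98.3 percent, consistent with the rounded 98 percent quoted on the slide. Since the automated step takes "less than" three minutes, the actual saving would be at least that.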



Learn more about DataStage

Performance Tech Paper: Up to 30% Faster Execution Time

Video: Auto-scaling and workload management

Blog: Data Integration: The vital baking ingredient in your AI strategy

Tech Talks: Community webinars

Solution brief: IBM DataStage

Join the online DataStage community: bit.ly/datastage-community



Appendix



Canadian Health Provider improves quality of care through analytics

Analytics & Insights

Result:
• Clinicians and managers: instant insights into operations
• Improved efficiency and reduced cost

Business Challenge:
Decision makers must monitor key metrics that influence the hospital’s care processes and funding reimbursement. The hospital needed an analytics architecture that could provide dynamic insight into large volumes of data.

Transformation:
• Creating and utilizing a single integrated analytics platform
• End-to-end process from high-speed integration of source data into the warehouse to sub-second response for complex analytics driving the insights dashboard and applying optimization



US Bank leverages IBM's Cloud Pak for Data platform to accelerate AI

Data Hub / Data Exploration

Result:
• Accelerated AI time to value
• Improved client experience and analytics results

Transformation:
• Using IBM Cloud Pak for Data System for rapid deployment and scaling of AI.
• Single interface platform for end-to-end enterprise analytics and easy creation of a customer 360 system
• Utilizing an integrated stack of services for data acquisition, cleansing, cataloguing, collaboration and data science
• Supports mandates for compliance with privacy regulations

Benefits of an integrated Stack
“…The integrated stack contains what we need to improve data quality, catalog our data assets, enable data collaboration, and build/operationalize data sciences. We're able to move quickly with design, test, build and deployment of new models and analytical applications.”



Expert Lab Services
Modernization Launch Package

Assessment
• Assessment tool for sizing and estimating DataStage & QualityStage Job Migration:
  • Graph Analysis
  • Computational Complexity
  • Compliance Rules

Activities
• REMOTE ASSESSMENT | Review high level architecture, business objectives & prerequisites
• DISCOVERY WORKSHOP | Migration Planning; Determine sample set for migration
• INSTALL | Installation + configuration of OpenShift + Cloud Pak for Data + DataStage Enterprise (or Plus)
• OPERATIONAL WORKSHOP | Walk client through the platform & demonstrate basic functionality
• DATASTAGE CODE MIGRATION | Migrate DataStage project(s) to Cloud Pak for Data environment & perform sample testing
• MIGRATION GUIDANCE | 1 Month of Expertise Connect Advanced resource to support DataStage migration

DataStage Upgrade Process - Comparison

Upgrade Approach
  Traditional Upgrade (Manual): Reactive (Trial & Error Method)
  Automated Upgrade: Proactive (Upfront Analysis & Visibility to Ongoing Upgrade Process)

Complexity
  Traditional Upgrade (Manual): Mostly Unknown (Upgrade blockers unknown)
  Automated Upgrade: Upfront Complexity Analysis Report available. Also provides Compliance summary to adopt best practices.

Upgrade Duration
  Traditional Upgrade (Manual): Longer Timeline
  • Unit Test must be performed on almost all jobs due to job dependency
  • Late discovery of risks and issues, therefore more time spent on troubleshooting and issue remediation
  • Manual migration of project settings and environment configurations.
  • Data Comparison is manual and more time consuming to identify which jobs produce inconsistent results
  Automated Upgrade: Accelerated Timeline
  • Unit Test can be limited to highly complex jobs, unique patterns, and jobs that have upgrade impacts, without worrying about dependency.
  • Risks / Issues known based on upfront analysis of code
  • Automated migration of project settings and environment configurations.
  • Data Comparison is automatic and quicker as it provides visual representation of discrepancies at field level as well as tied to a given DataStage job

Agile Method
  Traditional Upgrade (Manual): Very limited due to job dependency
  Automated Upgrade: Agile upgrade based on integration with DevOps and OSH intercept capability

Monitoring & Tracking Upgrade Process
  Traditional Upgrade (Manual): Manual reporting required
  Automated Upgrade: Provides Live dashboard, Report cards, Workflows and monitoring (consolidated view of multi-source SDLC)