DM-57-2017
Data movement issues: Explicit
SQL Pass-Through can do the trick
Kiran Venna
Dataspace Inc.
Agenda
• Motivation
• Introduction
• Different categories to access Teradata
tables
• Case study 1: ETL Design with write
access to Teradata permanent tables
• Case study 2: ETL Design with no write
access to Teradata permanent tables
• Case Study 3: ETL design involving many
Teradata tables and SAS tables
Agenda contd..
• Case study 4: Advantage of using explicit
SQL pass-through in SAS macro.
• General Issue with macro variables in
explicit SQL pass-through and solution to it.
• Issue of remerging summary statistics with
data in explicit SQL pass-through and
solution to it
• Conclusion
Motivation
• Very Long running queries
• Failed queries
Introduction
• SAS and Teradata play an important role
in decision support systems (DSS).
• Teradata is used for data warehousing, where
large amounts of data can be stored and
retrieved quickly.
• SAS is powerful for reporting and analytics,
owing to its many custom procedures.
Introduction contd..
• ETL with SAS involves moving data in
both directions.
• Data movement between SAS and
Teradata increases I/O time.
• Teradata executes queries in parallel and
can complete an ETL job in less time.
• Explicit SQL pass-through is key to
decreasing the run time of a SAS job.
Different categories to access
Teradata tables and
advantages/disadvantages with
each approach
By Libname statement
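libname teralib teradata server=myserver user=myuserid pwd=mypass;
/* DATA step through the libref */
data teralib.tab1;
  set teralib.tab(keep=name);
run;
/* implicit pass-through: SAS translates the query for Teradata when it can */
proc sql;
  create table teralib.tab3 as
  select * from teralib.tab
  where name like '%ABC%';
quit;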
Explicit SQL Pass-Through
proc sql;
  connect to teradata (server=myserver user=myuserid pw=mypass);
  /* the inner SELECT needs its own parentheses before WITH DATA */
  execute(create table edwwrkuser.tab4 as
          (select * from edwwrkuser.tab)
          with data primary index(cust_id)) by teradata;
  execute(commit work) by teradata;
  disconnect from teradata;
quit;
Pros/Cons of libname method
Pros
• Familiar syntax and functions
• Many queries are sent to the DBMS directly
(implicit pass-through).
• The DBIDIRECTEXEC system option lets PROC SQL
pass eligible statements, such as CREATE TABLE
AS SELECT, directly to Teradata for execution.
Cons
• Some queries and SAS functions are not
passed to Teradata.
• The data is then pulled into SAS, which leads
to heavy I/O and inefficient query processing.
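One way to see which statements the libname method actually hands to Teradata is to trace the generated SQL in the SAS log. The sketch below assumes the same placeholder credentials as the libname slide; the target table name tab5 is hypothetical.

```sas
/* Write the SQL that SAS sends to the DBMS into the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Allow eligible PROC SQL statements to run entirely inside Teradata */
options dbidirectexec;

libname teralib teradata server=myserver user=myuserid pwd=mypass;

proc sql;
  /* tab5 is a hypothetical target table used only for this check */
  create table teralib.tab5 as
  select * from teralib.tab
  where name like '%ABC%';
quit;

/* Switch tracing off again */
options sastrace=off;
```

If the log shows only a plain SELECT being sent and the filtering done in SAS, the query was not passed through and is a candidate for explicit SQL pass-through.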
Pros/Cons of Explicit SQL Pass method
Pros
• Decreases data movement
• Teradata's parallel processing improves
query performance.
Cons
• Requires understanding the nuances of
Teradata SQL.
Case study 1: ETL design
involving many Teradata tables,
with write access to Teradata
permanent tables and final
output as a SAS table.
Case Study 1—Write access to Teradata
permanent table
proc sql;
  connect to teradata (server=srv user=userid pw=mypass);
  /* first staging table */
  execute(create table edwwrkuser.staging_customer as
          (select * from edwwrkuser.Customer_table
           where create_dt between '2017-01-01' and '2017-01-31')
          with data primary index(cust_id)) by teradata;
  execute(commit work) by teradata;
  /* second staging table */
  execute(create table edwwrkuser.staging_txn_tbl as
          (select cust_id, txn_id from PD_EDW_DB.Customer_table
           where order_nb = 1)
          with data primary index(cust_id)) by teradata;
  execute(commit work) by teradata;
Case Study 1—Write access to Teradata
permanent table –contd…
  /* final Teradata table */
  execute(create table edwwrkuser.Final_cust_txn as
          (select a.*, b.txn_id
           from edwwrkuser.staging_customer a
           inner join edwwrkuser.staging_txn_tbl b
           on a.cust_id = b.cust_id)
          with data no primary index) by teradata;
  execute(commit work) by teradata;
  /* clean up the permanent staging table */
  execute(drop table edwwrkuser.staging_customer) by teradata;
  execute(commit work) by teradata;
  disconnect from teradata;
quit;
Moving final datasets to SAS
/* schema= points the libref at the database that holds the final table */
libname teradb teradata user=**** pw=**** schema=edwwrkuser
  fastexport=yes;
libname saslib '/u/mystuff/sastuff/hello';
data saslib.final_cust_txn;
  set teradb.final_cust_txn;
run;
Case study 2: ETL design
involving many Teradata tables,
with no write access to Teradata
permanent tables and final
output as a SAS table.
Case Study 2—No Write access to
Teradata permanent table
proc sql;
  connect to teradata (server=myserver user=myuserid
                       pw=mypass connection=global);
  /* first volatile table */
  execute(create volatile table staging_customer as
          (select * from edwwrkuser.Customer_table
           where create_dt between '2017-01-01' and '2017-01-31')
          with data primary index(cust_id)
          on commit preserve rows) by teradata;
  execute(commit work) by teradata;
Case Study 2: No write access to
Teradata permanent table, contd.
  /* final volatile table */
  execute(create volatile table Final_cust_txn as
          (select a.*, b.txn_id
           from staging_customer a
           inner join staging_txn_tbl b
           on a.cust_id = b.cust_id)
          with data no primary index
          on commit preserve rows) by teradata;
  execute(commit work) by teradata;
  disconnect from teradata;
quit;
Moving final datasets into SAS
libname tdtemp teradata server=server user=userid
  pwd=password connection=global dbmstemp=yes;
libname saslib '/u/mystuff/sastuff/hello';
/* read the final volatile table through the temp-table libref */
data saslib.final_cust_txn;
  set tdtemp.final_cust_txn;
run;
Case Study 3: ETL design
involving many Teradata tables
with a few reference tables in SAS.
Moving reference table into Teradata
Permanent table
libname teralib teradata server=server user=userid
  pwd=password;
libname saslib '/u/mystuff/sastuff/hello';
data teralib.staging_customer_ref(fastload=yes
     dbcreate_table_opts='primary index(cust_id)');
  set saslib.cust_ref;
run;
• Followed by ETL of Case study 1
Moving reference table into Teradata
Volatile table
libname tdtemp Teradata user=**** pw=****
connection=global dbmstemp=yes;
libname saslib '/u/mystuff/sastuff/hello';
data tdtemp.staging_customer_ref;
set saslib.cust_ref;
run;
• Followed by ETL of Case study 2
Case study 4: Advantage of
using explicit SQL pass-through
in SAS macro.
SAS MACRO - Explicit SQL Pass-Through: Introduction
• Inspiration: global data-scrubbing macros
in one of the projects.
• The macro scrubs data in a Teradata table
according to business rules, and a final dataset
is created in SAS.
• Macro with implicit pass-through and DATA
step: 2 hours.
• Explicit SQL pass-through: less than 10
minutes.
SAS MACRO - Explicit SQL Pass-Through
%macro test(tddbnm=, tdtblnm=, saslibnm=, sastblnm=);
proc sql;
  connect to teradata (server=myserver user=myuserid pw=mypass);
  /* one of the data scrubbing steps */
  execute(UPDATE &tddbnm..&tdtblnm
          SET name_indicator = 'BAD'
          WHERE customer_name is null) by teradata;
  execute(commit work) by teradata;
  /* there could be many more steps, not shown for simplicity */
  disconnect from teradata;
quit;
SAS MACRO - Explicit SQL Pass-Through, contd.
/* moving data to SAS */
proc sql;
  connect to teradata (server=myserver user=myuserid pw=mypass);
  create table &saslibnm..&sastblnm as
  select * from connection to teradata
    (select * from &tddbnm..&tdtblnm);
  disconnect from teradata;
quit;
%mend test;
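A call to the macro might look like the sketch below. It assumes %test has already been compiled in the session; the argument values (edwwrkuser, customer_table, customer_scrubbed) are hypothetical and chosen only to match names used elsewhere in these slides.

```sas
/* SAS library that receives the final dataset */
libname saslib '/u/mystuff/sastuff/hello';

/* Hypothetical argument values, for illustration only */
%test(tddbnm=edwwrkuser,
      tdtblnm=customer_table,
      saslibnm=saslib,
      sastblnm=customer_scrubbed);
```

Because the scrubbing runs inside Teradata, only the final SELECT moves rows across the network into SAS.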
General issue with macro
variables in explicit SQL pass-through
and solution to it
Issue with macro variable
• The SAS macro facility resolves macro
variables inside double quotes, but not inside
single quotes.
%let start_dt = 2017-07-01;
%let end_dt = 2017-07-31;
execute(create table edwwrkuser.staging_customer as
(select * from edwwrkuser.Customer_table
where create_dt between "&start_dt" and "&end_dt")
with data primary index(cust_id)) by teradata;
• Results in an error: Teradata treats
double-quoted text as an identifier (such as a
column name), not as a string literal.
1st solution
/* put the single quotes into the macro variable values */
%let start_dt = '2017-07-01';
%let end_dt = '2017-07-31';
proc sql;
  connect to teradata (server=myserver
                       user=myuserid pw=mypass);
  execute(create table edwwrkuser.staging_customer as
          (select * from edwwrkuser.Customer_table
           where create_dt between &start_dt and &end_dt)
          with data primary index(cust_id)) by teradata;
  execute(commit work) by teradata;
  disconnect from teradata;
quit;
2nd Solution
• Keep the values unquoted and let %bquote
supply the single quotes when the variable resolves:
%let start_dt = 2017-07-01;
%let end_dt = 2017-07-31;
create_dt between %bquote('&start_dt') and
%bquote('&end_dt')
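Before sending the predicate to Teradata, it can help to write the resolved text to the SAS log. This is only a sketch that reuses the two macro variables from the slide; %put is a generic way to inspect macro resolution, not something specific to pass-through.

```sas
%let start_dt = 2017-07-01;
%let end_dt = 2017-07-31;

/* Write the resolved predicate to the SAS log for inspection */
%put create_dt between %bquote('&start_dt') and %bquote('&end_dt');
```

The log line should show the dates wrapped in single quotes, which is the form Teradata expects for string literals.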
Issue of remerging summary
statistics with data in explicit
SQL pass-through and solution
to it
Select and group by different variables
proc sql;
  create table teradb.reserve_custagg as
  select hhold_id,
         cust_id,
         count(res_id) as cnt_reserve
  from teradb.customer_table
  group by hhold_id;
quit;
Note in SAS and error in pass-through
• PROC SQL note: "NOTE: The query requires
remerging summary statistics back with the
original data."
• Teradata error in explicit pass-through:
"SELECT Failed. [3504] Selected non-aggregate
values must be part of the associated group."
Rewrite in Explicit SQL Pass-Through
select a.*, b.cnt_reserve from
(select hhold_id, cust_id from
 edwwrkuser.booking_table) a
inner join
(select hhold_id, count(reserv_id) as cnt_reserve
 from edwwrkuser.booking_table
 group by 1) b
on a.hhold_id = b.hhold_id
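An alternative to the self-join, not taken from the slides, is a window aggregate, which attaches the per-household count to every row in a single pass. The sketch below assumes the same booking_table columns and placeholder credentials used above; the output dataset name work.reserve_custagg is hypothetical.

```sas
proc sql;
  connect to teradata (server=myserver user=myuserid pw=mypass);
  /* The inner query runs in Teradata; only the result comes back to SAS */
  create table work.reserve_custagg as
  select * from connection to teradata
    (select hhold_id,
            cust_id,
            count(reserv_id) over (partition by hhold_id) as cnt_reserve
     from edwwrkuser.booking_table);
  disconnect from teradata;
quit;
```

Both forms return one row per input row with the household-level count repeated, which is what the PROC SQL remerge produces.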
Important authors
• Jeff Bailey: bulk load utilities
• Harry Droogendyk: explains Teradata
equivalents of SAS code
Conclusion
• Explicit SQL pass-through saves tremendous
run time for SAS ETL jobs and SAS macros.
• Write access to permanent Teradata tables is
not required; volatile tables can be used instead.
• It is faster with Teradata permanent tables,
because FastLoad and FastExport can be used.
• Issues with macro variables in explicit SQL
pass-through can be fixed with %bquote.
• The remerge feature of PROC SQL can be
emulated by a query rewrite.
Thanks
• Thanks for listening
• I would especially like to thank Kirk Paul
Lafler for encouraging me to write papers.
• I would like to thank Lakshmi Nadiya
Chintalapudi, Charannag Devarapalli,
Anvesh Reddy Perati, and Srirama Reddy for
helping me with proofreading and for
their valuable suggestions.
Data Movement issues: Explicit SQL Pass-Through can do the trick

  • 1. DM-57-2017 Data movement issues: Explicit SQL Pass-Through can do the trick Kiran Venna Dataspace Inc.
  • 2. Agenda • Motivation • Introduction • Different categories to access Teradata tables • Case study 1: ETL Design with write access to Teradata permanent tables • Case study 2: ETL Design with no write access to Teradata permanent tables • Case Study 3: ETL design involving many Teradata tables and SAS tables
  • 3. Agenda contd.. • Case study 4: Advantage of using explicit SQL pass-through in SAS macro. • General Issue with macro variables in explicit SQL pass-through and solution to it. • Issue of remerging summary statistics with data in explicit SQL pass-through and solution to it • Conclusion
  • 4. Motivation • Very Long running queries • Failed queries
  • 5. Introduction • SAS and Teradata play an important role in decision support systems (DSS). • Teradata is used for the purpose of data warehousing, where in large amounts of data can be stored and retrieved quickly. • SAS is powerful in doing reporting and analytics owing to lot of custom procedures
  • 6. Introduction contd.. • ETL with SAS involves moving data in both directions. • Data movement between SAS and Teradata will increase I/O time . • Teradata executes query in parallel and can complete ETL Job in less time. • Explicit SQL pass-through is essential for decreasing run time of SAS Job
  • 7. Different categories to access Teradata tables and advantages/disadvantages with each approach
  • 8. By Libname statement libname teralib teradata server=myserver user=myuserid pwd=mypass; data teralib.tab1; set teralib.tab(keep =name); Run; proc sql; create table teralib.tab3 as select * from teralib.tab where name like ‘%ABC%’; quit;
  • 9. Explicit SQL Pass-Through proc sql; connect to teradata (server=myserver user=myuserid pw=mypass); execute(create table edwwrkuser.tab4 as select * from edwwrkuser.tab) with data primary index(cust_id)) by teradata; execute(commit work) by teradata; disconnect from teradata; quit;
  • 10. Pros/Cons of libname method Pros • Familiar syntax and functions • Many queries sent to DBMS directly • DBDIRECTEXEC= System option passing queries directly to the Teradata for execution. Cons • Some queries and SAS functions are not passed to Teradata. • This will lead to huge I/O processing and also results in inefficient query processing.
  • 11. Pros/Cons of Explicit SQL Pass method Pros • Decreases data movement • Because of parallel processing capabilities of Teradata it enhances query performance. Cons • Understand various nuances of Teradata SQL.
  • 12. Case study 1: ETL design involving lot of Teradata tables with write access to Teradata permanent tables and final output as SAS table.
  • 13. Case Study 1—Write access to Teradata permanent table proc sql; connect to teradata (server=srv user=userid pw=mypass); /* 1st table created*/ execute(create edwwrkuser.staging_customer as select * from edwwrkuser.Cusomter table where create_dt between ‘2017-01-01’ and ‘2017-01-31’ ) with data primary index(cust_id)) by teradata; execute(commit work) by Teradata; /* second Teradata table is created*/ execute(create table edwwrkuser.staging_txn_tbl as select cust_id, txn_id from PD_EDW_DB.Cusomter table where order_nb = 1) with data primary index(cust_id)) by teradata;
  • 14. Case Study 1—Write access to Teradata permanent table –contd… /* final Teradata to be created*/ execute(create table edwwrkuser.Final_cust_txn select a.* , b.txn_id from edwwrkuser.staging_customer a inner join edwwrkuser.staging_txn_tbl On a.cust_id =b.cust_id) with data no primary index)by teradata; execute(commit work) by Teradata; /* cleaning up permanent table*/ execute( drop table edwwrkuser.staging_customer)by teradata; disconnect from teradata; quit;
  • 15. Moving final datasets to SAS libname teradb Teradata user=**** pw=**** FASTEXPORT=YES; libname saslib '/u/mystuff/sastuff/hello'; data saslib.staging_customer_new; set teradb.staging_customer_new; run;
  • 16. Case study 2: ETL design involving lot of Teradata tables with no write access to Teradata permanent tables and final output as SAS table.
  • 17. Case Study 2: No write access to Teradata permanent table proc sql; connect to teradata (server=myserver user=myuserid pw=mypass connection=global); /* first volatile table is created */ execute(create volatile table staging_customer as (select * from edwwrkuser.Customer_table where create_dt between '2017-01-01' and '2017-01-31') with data primary index(cust_id) on commit preserve rows) by teradata; execute(commit work) by teradata;
  • 18. Case Study 2: No write access to Teradata permanent table (contd.) /* final volatile table is created */ execute(create volatile table Final_cust_txn as (select a.*, b.txn_id from staging_customer a inner join staging_txn_tbl b on a.cust_id = b.cust_id) with data no primary index on commit preserve rows) by teradata; execute(commit work) by teradata; disconnect from teradata; quit;
  • 19. Moving final datasets into SAS libname tdtemp teradata server=server user=userid pwd=password connection=global dbmstemp=yes; libname saslib '/u/mystuff/sastuff/hello'; data saslib.staging_customer_new; set tdtemp.staging_customer_new; run;
  • 20. Case Study 3: ETL design involving many Teradata tables with a few reference tables in SAS.
  • 21. Moving reference table into Teradata Permanent table libname teralib teradata server=server user=userid pwd=password ; libname saslib '/u/mystuff/sastuff/hello'; data teralib.staging_customer_ref(fastload =yes dbcreate_table_opts= 'primary index(cust_id)'); set saslib.cust_ref; run; • Followed by ETL of Case study 1
  • 22. Moving reference table into Teradata Volatile table libname tdtemp Teradata user=**** pw=**** connection=global dbmstemp=yes; libname saslib '/u/mystuff/sastuff/hello'; data tdtemp.staging_customer_ref; set saslib.cust_ref; run; • Followed by ETL of Case study 2
  • 23. Case study 4: Advantage of using explicit SQL pass-through in SAS macro.
  • 24. SAS MACRO - Explicit SQL Pass-Through: Introduction • Inspiration: global data-scrubbing macros used in one of the projects. • The macro scrubs data in a Teradata table according to business rules and creates a final dataset in SAS. • Macro with implicit pass-through and DATA step: 2 hours. • Macro with Explicit SQL Pass-Through: less than 10 minutes.
  • 25. SAS MACRO - Explicit SQL Pass-Through %macro test(tddbnm=, tdtblnm=, saslibnm=, sastblnm=); proc sql; connect to teradata (server=myserver user=myuserid pw=mypass); /* one of the data scrubbing steps */ execute(UPDATE &tddbnm..&tdtblnm SET name_indicator = 'BAD' WHERE customer_name is null) by teradata; execute(commit work) by teradata; /* there could be many more steps, not shown for simplicity */
  • 26. SAS MACRO - Explicit SQL Pass-Through (contd.) /* moving data to SAS */ proc sql; connect to teradata (server=myserver user=myuserid pw=mypass); create table &saslibnm..&sastblnm as select * from connection to teradata (select * from &tddbnm..&tdtblnm); disconnect from teradata; quit; %mend test;
  • 27. General issue with macro variables in explicit SQL pass-through and solution to it
  • 28. Issue with macro variable • The SAS macro facility resolves macro variables inside double quotes, but not inside single quotes. %let start_dt = 2017-07-01; %let end_dt = 2017-07-31; execute(create table edwwrkuser.staging_customer as (select * from edwwrkuser.Customer_table where create_dt between "&start_dt" and "&end_dt") with data primary index(cust_id)) by teradata; • Results in an error: Teradata treats double-quoted text as a column name, not as a string literal.
  • 29. 1st solution: store the single quotes in the macro variable values %let start_dt = '2017-07-01'; %let end_dt = '2017-07-31'; proc sql; connect to teradata (server=myserver user=myuserid pw=mypass); execute(create table edwwrkuser.staging_customer as (select * from edwwrkuser.Customer_table where create_dt between &start_dt and &end_dt) with data primary index(cust_id)) by teradata;
  • 30. 2nd Solution • %let start_dt = 2017-07-01; • %let end_dt = 2017-07-31; • create_dt between %bquote('&start_dt') and %bquote('&end_dt')
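The %bquote approach can be sketched in a complete step as follows. This is a minimal sketch, not the author's exact code: the table name customer_table is assumed (the slides spell it inconsistently), and the %unquote wrapper is an optional addition that removes the macro quoting before the text is passed to Teradata, which can help if the masked quotes are not tokenized as expected. The %put line only writes the resolved condition to the SAS log so it can be checked before the query runs.

```sas
%let start_dt = 2017-07-01;
%let end_dt   = 2017-07-31;

/* Write the resolved WHERE condition to the log for a quick check */
%put NOTE: create_dt between %bquote('&start_dt') and %bquote('&end_dt');

proc sql;
  connect to teradata (server=myserver user=myuserid pw=mypass);

  /* customer_table is an assumed name; %unquote is an optional safeguard */
  execute(
    create table edwwrkuser.staging_customer as
    (select *
       from edwwrkuser.customer_table
      where create_dt between %unquote(%bquote('&start_dt'))
                          and %unquote(%bquote('&end_dt')))
    with data primary index(cust_id)
  ) by teradata;

  execute(commit work) by teradata;
  disconnect from teradata;
quit;
```

Keeping the macro variable values unquoted, as here, lets the same variables be reused in SAS code that needs the bare date text.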
  • 31. Issue of remerging summary statistics with data in Explicit SQL pass-through and solution to it
  • 32. Select and group by different variables proc sql; create table teradb.reserve_custagg as select hhold_id, cust_id, count(res_id) as cnt_reserve from teradb.customer_table group by hhold_id; Quit;
  • 33. Note in SAS and error in pass-through • SAS (PROC SQL): "NOTE: The query requires remerging summary statistics back with the original data." • Teradata: SELECT Failed. [3504] Selected non-aggregate values must be part of the associated group.
  • 34. Rewrite in Explicit SQL Pass-Through select a.*, b.cnt_reserve from (select hhold_id, cust_id from edwwrkuser.booking_table) a inner join (select hhold_id, count(reserv_id) as cnt_reserve from edwwrkuser.booking_table group by 1) b on a.hhold_id = b.hhold_id
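As an alternative to the self-join rewrite, the same per-household count can probably be obtained with a Teradata window aggregate. This is a swapped-in technique, not the one shown on the slide. The sketch below reuses the slide's table and column names (edwwrkuser.booking_table, hhold_id, cust_id, reserv_id) and assumes the result is wanted as a SAS table, here given the hypothetical name work.reserve_custagg.

```sas
proc sql;
  connect to teradata (server=myserver user=myuserid pw=mypass);

  /* Teradata computes the count per hhold_id and attaches it to every row, */
  /* so no GROUP BY and no join back to the detail rows is needed.          */
  create table work.reserve_custagg as
  select *
    from connection to teradata
      (select hhold_id,
              cust_id,
              count(reserv_id) over (partition by hhold_id) as cnt_reserve
         from edwwrkuser.booking_table);

  disconnect from teradata;
quit;
```

One difference to keep in mind: the inner join drops rows where hhold_id is null, while the window version keeps them.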
  • 35. Important authors • Jeff Bailey: bulk load utilities • Harry Droogendyk: explains Teradata equivalents of SAS code
  • 36. Conclusion • Explicit SQL pass-through greatly reduces run time for SAS ETL jobs and SAS macros. • Write access to permanent tables is not required, because volatile tables can be used. • Moving data is faster with Teradata permanent tables, as FastLoad and FastExport can be used. • Issues with macro variables in explicit SQL pass-through can be fixed with %bquote. • The remerge feature of PROC SQL can be emulated by a query rewrite.
  • 37. Thanks • Thanks for listening. • I would especially like to thank Kirk Paul Lafler for encouraging me to write papers. • I would like to thank Lakshmi Nadiya Chintalapudi, Charannag Devarapalli, Anvesh Reddy Perati, and Srirama Reddy for helping me with proofreading and giving their valuable suggestions.