The MySQL Availability Company
Tungsten Replicator Master Class
Advanced: Working with Data Warehouse Targets
Chris Parker, Customer Success Director, EMEA & APAC
Topics
In this short course, we will
• Review replicator flow
• Explore Hadoop, Redshift and Vertica specific pre-requisites
• Review configurations
• Demo
Replicator Flow
How Hadoop Replication Works
[Diagram: THL → Target Replicator (Applier) → CSV → JS → HDFS]
How the Hadoop Materialisation Works
Statements run on the source:
insert into t1 values (1, 'Hello World!');
insert into t1 values (2, 'Meet Continuent');
update t1 set msg = 'Goodbye World' where ID = 1;

Resulting change rows:
Op  Seqno  ID  Msg
I   1      1   Hello World!
I   2      2   Meet Continuent
D   3      1
I   3      1   Goodbye World

Rows after materialisation:
Op  Seqno  ID  Msg
I   2      2   Meet Continuent
I   3      1   Goodbye World
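The materialisation step can be sketched in a few lines of Python. This illustrates the idea only and is not the code the replicator or Hadoop runs; the function name `materialise` and the tuple layout `(op, seqno, id, msg)` are assumptions chosen to mirror the columns on the slide. An update arrives as a delete followed by an insert with the same seqno, so replaying the change rows in seqno order leaves only the latest version of each row.

```python
def materialise(change_rows):
    """Replay (op, seqno, id, msg) change rows and return the surviving rows.

    Hypothetical helper for illustration: "I" inserts a row, "D" deletes it.
    """
    table = {}
    # sorted() is stable, so a D and I sharing a seqno keep their original order.
    for op, seqno, row_id, msg in sorted(change_rows, key=lambda row: row[1]):
        if op == "D":
            table.pop(row_id, None)
        elif op == "I":
            table[row_id] = (seqno, msg)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    # Report the survivors as (seqno, id, msg), ordered by seqno.
    return sorted((seqno, row_id, msg) for row_id, (seqno, msg) in table.items())


CHANGE_ROWS = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet Continuent"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]

if __name__ == "__main__":
    for row in materialise(CHANGE_ROWS):
        print(row)
```

Running it on the four change rows from the slide leaves the same two rows as the materialised table: ID 2 at seqno 2 and ID 1 at seqno 3.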
How Vertica Replication Works
[Diagram: THL → Target Replicator (Applier) → CSV; other labels shown: JS, JDBC, cpimport, merge]
How Redshift Replication Works
[Diagram: THL → Target Replicator (Applier) → CSV; other labels shown: JS, JDBC, s3cmd, S3, copy, merge]
Prerequisites
• Review online documentation
  • https://docs.continuent.com
• Download the Prerequisite Checklist
• Extractor/Applier Hosts
  • OS User
  • /etc/hosts
  • sudoers
  • Ruby
  • Java
• Network
  • Review Port Requirements
• MySQL
  • my.cnf settings
  • User accounts
• Hadoop
  • HDFS writeable by replicator user
• Vertica
  • User Accounts
  • JDBC Drivers
• Redshift
  • User Accounts
  • S3 Bucket
  • S3 Tools for uploading
  • AWS JSON Config
• All tables need Primary Keys
AWS JSON Config for Redshift
• /opt/continuent/share/s3-config-<servicename>.json
• awsS3Path: the location within your S3 storage where files should be loaded.
• awsAccessKey: the S3 access key used to access your S3 storage. Not required if awsIAMRole is used.
• awsSecretKey: the S3 secret key associated with the access key. Not required if awsIAMRole is used.
• awsIAMRole: the IAM role configured to allow Redshift to interact with S3. Not required if awsAccessKey and awsSecretKey are in use.
• s3Binary: the binary used to upload CSV files to S3. Valid values: s3cmd, s4cmd, aws. Default: s3cmd.
• cleanUpS3Files: a boolean that controls whether the CSV files loaded into S3 are deleted after they have been imported and merged. Default: true.
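As a rough sketch, the rules above (either an IAM role or an access key pair, and a limited set of upload binaries) can be captured in a small Python helper that produces the JSON document. The helper name `build_s3_config`, the bucket path and the role ARN are made up for illustration. Whether the file expects a JSON boolean or the string "true" for cleanUpS3Files is not stated on the slide, so check the documentation before relying on the output format.

```python
import json

VALID_S3_BINARIES = ("s3cmd", "s4cmd", "aws")


def build_s3_config(s3_path, access_key=None, secret_key=None, iam_role=None,
                    s3_binary="s3cmd", clean_up=True):
    """Build a dict using the key names listed on the slide (hypothetical helper)."""
    if s3_binary not in VALID_S3_BINARIES:
        raise ValueError(f"s3Binary must be one of {VALID_S3_BINARIES}")
    config = {"awsS3Path": s3_path}
    if iam_role:
        # With an IAM role, the access and secret keys are not required.
        config["awsIAMRole"] = iam_role
    elif access_key and secret_key:
        config["awsAccessKey"] = access_key
        config["awsSecretKey"] = secret_key
    else:
        raise ValueError("supply awsIAMRole, or both awsAccessKey and awsSecretKey")
    config["s3Binary"] = s3_binary
    # Assumption: emitted as a JSON boolean; the real file may expect a string.
    config["cleanUpS3Files"] = clean_up
    return config


if __name__ == "__main__":
    example = build_s3_config(
        "s3://example-bucket/redshift-staging",                    # placeholder path
        iam_role="arn:aws:iam::123456789012:role/ExampleRole",     # placeholder role
    )
    print(json.dumps(example, indent=2))
```

The printed JSON would then be saved as /opt/continuent/share/s3-config-<servicename>.json on the applier host.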
Provisioning Options
• Traditional CSV export and import
• Dump and load through Blackhole engine
• If the target supports standard SQL, extract data as INSERT statements
• For Hadoop, use Sqoop
Object Mapping
• Hadoop
  • MySQL Database → HDFS Directory
  • Table → Hive Compatible CSV File
  • Row → Line in the file
• Redshift & Vertica (PostgreSQL Interface and Syntax)
  • MySQL Instance → Database
  • MySQL Database → Schema
DDLScan
• Reverse engineers MySQL objects
• Creates target specific DDL
• Can be used for single objects or entire databases
• Must be configured prior to starting replicators
• Must be run twice, once for base tables, once for staging tables
ddlscan -service alpha -template ddl-mysql-redshift.vm -db test >ddl.sql
ddlscan -service alpha -template ddl-mysql-redshift-staging.vm -db test >ddl-staging.sql
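Because ddlscan has to be run once for the base tables and once for the staging tables, it can help to generate both command lines together. The sketch below only builds the argument lists and does not execute anything. The function name `ddlscan_commands` is hypothetical, and the default template names are the two Redshift templates shown above; templates for other targets are not assumed.

```python
import shlex


def ddlscan_commands(service, db,
                     base_template="ddl-mysql-redshift.vm",
                     staging_template="ddl-mysql-redshift-staging.vm"):
    """Return the base-table and staging-table ddlscan argument lists."""
    return [
        ["ddlscan", "-service", service, "-template", template, "-db", db]
        for template in (base_template, staging_template)
    ]


if __name__ == "__main__":
    outputs = ("ddl.sql", "ddl-staging.sql")
    for argv, output in zip(ddlscan_commands("alpha", "test"), outputs):
        # Print the shell form so it can be reviewed before being run by hand.
        print(f"{shlex.join(argv)} > {output}")
```

Printing the commands rather than running them keeps the sketch safe on a host where ddlscan is not installed.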
Extractor Config
[defaults]
user=tungsten
install-directory=/opt/continuent
mysql-allow-intensive-checks=true
profile-script=~/.bash_profile
disable-security-controls=true
[alpha]
master=tr-ext-2
members=tr-ext-2
replication-user=tungsten
replication-password=secret
replication-port=3306
enable-heterogeneous-service=true
Applier Configs
Vertica
[defaults]
user=tungsten
install-directory=/opt/continuent
profile-script=~/.bash_profile
disable-security-controls=true
[alpha]
master=tr-ext-2
members=verticahost
datasource-type=vertica
replication-user=dbadmin
replication-password=Secret123
replication-port=5433
vertica-dbname=demo
batch-enabled=true
batch-load-template=vertica6
batch-load-language=js
svc-applier-block-commit-interval=30s
svc-applier-block-commit-size=250000
Redshift
[defaults]
user=tungsten
install-directory=/opt/continuent
profile-script=~/.bash_profile
disable-security-controls=true
[alpha]
master=tr-ext-2
members=tr-app-1
datasource-type=redshift
replication-user=dbadmin
replication-password=Secret123
replication-port=5439
replication-host=redshift-endpoint
redshift-dbname=demo
batch-enabled=true
batch-load-template=redshift
svc-applier-block-commit-interval=30s
svc-applier-block-commit-size=250000
Applier Configs
Hadoop
[defaults]
user=tungsten
install-directory=/opt/continuent
profile-script=~/.bash_profile
disable-security-controls=true
[alpha]
master=tr-ext-2
members=hadoopapplier
datasource-type=file
property=replicator.datasource.global.csvType=hive
replication-user=tungsten
replication-password=secret
batch-enabled=true
batch-load-template=hadoop
batch-load-language=js
svc-applier-block-commit-interval=30s
svc-applier-block-commit-size=250000
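The applier configs differ mainly in datasource-type, the batch load template and the connection settings. As a minimal sketch, Python's standard configparser can read an INI file of this shape and report those settings for a service before installation. The helper names are hypothetical, and merging [defaults] into the service section is this sketch's simplification of how the installer treats defaults.

```python
import configparser

SAMPLE_INI = """\
[defaults]
user=tungsten
install-directory=/opt/continuent

[alpha]
master=tr-ext-2
members=hadoopapplier
datasource-type=file
property=replicator.datasource.global.csvType=hive
batch-enabled=true
batch-load-template=hadoop
batch-load-language=js
svc-applier-block-commit-interval=30s
svc-applier-block-commit-size=250000
"""


def service_settings(ini_text, service):
    """Return the [defaults] options overlaid with the options of one service."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    merged = dict(parser["defaults"]) if parser.has_section("defaults") else {}
    merged.update(parser[service])
    return merged


def is_batch_applier(settings):
    """True when the service has batch loading switched on."""
    return settings.get("batch-enabled", "false").lower() == "true"


if __name__ == "__main__":
    settings = service_settings(SAMPLE_INI, "alpha")
    print(settings["datasource-type"], settings["batch-load-template"],
          is_batch_applier(settings))
```

The sample text is a trimmed copy of the Hadoop config; pointing the same helper at the Vertica or Redshift file would show their datasource-type and template values instead.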
Demonstration
Summary
What we have learnt today
• Reviewed replicator flow
• Explored Hadoop, Redshift and Vertica specific pre-requisites
• Reviewed configurations
Next Steps
In the next session we will
• Learn how to use Tungsten Replicator with MongoDB
THANK YOU FOR LISTENING
continuent.com

Training Slides: 351 - Tungsten Replicator for Data Warehouses