
Lab - Batch Data Ingestion With DMS - Instructor Setup

The document provides instructions for setting up an instructor environment for a Database Migration Services lab. It includes steps to: 1. Launch an AWS CloudFormation stack to create resources like a VPC, subnets, security groups, EC2 instance, and RDS PostgreSQL database. 2. Populate the RDS database by running scripts from a GitHub repository on the EC2 instance. 3. Optional steps to enable change data capture on the database for ongoing replication demonstrations.

Amazon Web Services

Data Engineering Immersion Day


Database Migration Services Instructor Setup Instructions
January 2019

Table of Contents
Introduction
Create the Instructor Environment
Create the Change Data Capture Environment (Optional)
Appendix: AWS CloudFormation Template

Database Migration Services Instructor Environment for the Lab

Introduction
***Make sure you select the us-east-1 (Virginia) region***

The AWS Database Migration Service (DMS) hands-on lab provides a scenario in which participants
learn to hydrate an Amazon S3 data lake from a relational database. To do that, participants
need a source endpoint, and this guide helps instructors set up a PostgreSQL database with
a public endpoint as the source database.

In this lab, you will complete the following tasks:

1. Create the source database environment.
2. Hydrate the source database environment.
3. Update the source database environment to demonstrate CDC replication within DMS.

Relevant information about this lab:

• Expected setup time: 45 minutes
• Source database name: sportstickets
• Source schema name: dms_sample


Create the Instructor Environment


In this section, you create an Amazon RDS for PostgreSQL instance. Lab attendees use it as the
data source for AWS Database Migration Service (DMS) when migrating data to an Amazon S3 data lake.

1. Sign in to the Console where you will host the source database environment.

2. Navigate to the AWS CloudFormation page.

3. Launch a new stack with the AWS CloudFormation template
DMSLab_instructor_CFN.json provided with your lab package. Make sure to select the
us-east-1 (Virginia) region.

Alternatively, you can follow the instructions in Appendix: AWS CloudFormation Template to
create your own AWS CloudFormation template for this lab.

4. Enter a stack name and select the key pair to use. If you do not already have an Amazon
EC2 key pair in the us-east-1 (Virginia) region, create one first. See “Amazon EC2 key pairs”
in the Amazon EC2 User Guide for instructions.

5. Enter a value for the Name tag that identifies the resources as part of this lab.

6. Launch the stack. The stack creates a new VPC, subnets, security groups, an EC2 instance,
a route table, routes, and an RDS PostgreSQL instance, and takes about 15 to 20 minutes to
launch. Once it completes, the stack's Resources tab lists everything it created.


7. Once the stack is launched, navigate to the Amazon Relational Database Service
(Amazon RDS) page, select Instances > dmslabinstance, and copy the instance
endpoint.


8. SSH to the EC2 instance created by this template and execute the following commands
in sequence:

cd aws-database-migration-samples/PostgreSQL/sampledb/v1/

export PGPASSWORD=master123

nohup psql --host=<instance endpoint> --port=5432 --dbname=sportstickets --username=master -f install-postgresql.sql >& ~/install.out &

For example: nohup psql --host=dmslabinstance.ccla1oozkrry.us-east-1.rds.amazonaws.com --port=5432 --dbname=sportstickets --username=master -f install-postgresql.sql >& ~/install.out &

The shell prints a process ID upon successful submission.

To check on the job, view the install.out file:

cat ~/install.out

Note:
The output may include messages about tables that do not exist, but you should not see
any errors. The background process ends when the install is complete. You can
check whether the process is still running with the following command:
ps -aef | grep psql

Wait 15 to 20 minutes for the install to complete.

The GitHub repository for aws-database-migration-samples is located here:

https://github.com/aws-samples/aws-database-migration-samples/tree/master/PostgreSQL/sampledb/v1

You can read through the documentation to better understand the source database
environment.
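
The check on install.out described in step 8 (messages about missing tables are fine, errors are not) can be sketched in Python. This is a minimal sketch, not part of the lab package: it assumes psql's usual `NOTICE:` / `ERROR:` message prefixes, and the function names and the sample log text are hypothetical.

```python
import re

# Matches psql messages such as:
#   psql:install-postgresql.sql:3: NOTICE:  table "x" does not exist, skipping
SEVERITY = re.compile(r"\b(ERROR|FATAL|NOTICE|WARNING):")


def summarize_log(text):
    """Count psql message severities found in the install log text."""
    counts = {"ERROR": 0, "FATAL": 0, "NOTICE": 0, "WARNING": 0}
    for line in text.splitlines():
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def install_looks_clean(text):
    """True when the log has no ERROR or FATAL lines (NOTICEs are expected)."""
    counts = summarize_log(text)
    return counts["ERROR"] == 0 and counts["FATAL"] == 0


# Hypothetical log excerpt, for illustration only.
sample = (
    'psql:install-postgresql.sql:3: NOTICE:  table "person" does not exist, skipping\n'
    "CREATE TABLE\n"
    "INSERT 0 1\n"
)
print(summarize_log(sample))
print(install_looks_clean(sample))
```

To run it against the real log, read the file first, for example with `pathlib.Path.home().joinpath("install.out").read_text()`, and pass the result to `install_looks_clean`.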


When you want to generate transactions to demonstrate DMS CDC functionality, connect to the
database:
psql --host=<instance endpoint> --port=5432 --dbname=sportstickets --username=master

Enter the password “master123” when prompted, then execute the following at
the psql command prompt (sportstickets=>):

select dms_sample.generateticketactivity(1000);

select dms_sample.generatetransferactivity(100);

Note:
When enabling CDC in DMS, only one DMS instance/task should activate “Ongoing
replication”, to avoid conflicts.
When replicating to multiple targets, fan out the updates starting from the Amazon S3 bucket
that is the target of the DMS task responsible for ongoing replication.
Do not start the fan-out from the source database, because only one CDC process should
track and set the last committed transaction that was replicated.

Create the Change Data Capture Environment (Optional)


If you plan to show the ongoing CDC capability, also complete the following steps:

1. Create a custom DB parameter group for postgres10 in the RDS console. Go to Amazon RDS
Parameter groups and click the Create parameter group button.


2. In your custom parameter group:

a. Set rds.logical_replication to 1. This is a static parameter that requires a reboot
of the DB instance to take effect.
b. Set the wal_sender_timeout parameter to 0. Setting this parameter to 0
prevents PostgreSQL from terminating replication connections that are inactive
longer than the specified timeout.
c. Increase the max_wal_senders parameter from 10 to 20 to accommodate AWS DMS.
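
If you prefer to script these three settings rather than click through the console, they can be expressed as the `Parameters` list that the RDS ModifyDBParameterGroup API accepts (for example through boto3's `modify_db_parameter_group`). The sketch below only builds and prints that list, with no AWS call. The `build_parameters` helper is hypothetical, and using `pending-reboot` for every entry is an assumption made because that apply method is accepted for both static and dynamic parameters.

```python
import json

# Settings from step 2 of this section.
CDC_SETTINGS = {
    "rds.logical_replication": "1",
    "wal_sender_timeout": "0",
    "max_wal_senders": "20",
}


def build_parameters(settings, apply_method="pending-reboot"):
    """Return the settings as a list of ParameterName/ParameterValue entries."""
    return [
        {
            "ParameterName": name,
            "ParameterValue": value,
            # Assumption: pending-reboot is used for all entries; the new
            # values take effect after the reboot in step 4.
            "ApplyMethod": apply_method,
        }
        for name, value in settings.items()
    ]


parameters = build_parameters(CDC_SETTINGS)
print(json.dumps(parameters, indent=2))
```

The printed list would be passed together with the name of your custom parameter group, which is whatever you chose in step 1.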


3. Modify the RDS instance you created, associate the custom parameter group with
the instance in place of the default parameter group, and choose to apply the change immediately.

4. Once the instance's parameter group shows the “pending-reboot” state, reboot the
RDS instance from the RDS console so that the new static parameters take effect.


5. After the database reboots, SSH to your EC2 instance and run the following:

psql --host=<instance endpoint> --port=5432 --dbname=sportstickets --username=master

For example: psql --host=dmslabinstance.ccla1oozkrry.us-east-1.rds.amazonaws.com --port=5432 --dbname=sportstickets --username=master

Enter the password “master123” when prompted, then run the
following SQL script to create the wrappers needed for DMS CDC replication:

BEGIN;
CREATE SCHEMA IF NOT EXISTS fnRenames;
CREATE OR REPLACE FUNCTION fnRenames.pg_switch_xlog() RETURNS pg_lsn AS $$
SELECT pg_switch_wal(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_xlog_replay_pause() RETURNS VOID AS $$
SELECT pg_wal_replay_pause(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_xlog_replay_resume() RETURNS VOID AS $$
SELECT pg_wal_replay_resume(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_current_xlog_location() RETURNS pg_lsn AS $$
SELECT pg_current_wal_lsn(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_is_xlog_replay_paused() RETURNS boolean AS $$
SELECT pg_is_wal_replay_paused(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_xlogfile_name(lsn pg_lsn) RETURNS TEXT AS $$
SELECT pg_walfile_name(lsn); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_last_xlog_replay_location() RETURNS pg_lsn AS $$
SELECT pg_last_wal_replay_lsn(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_last_xlog_receive_location() RETURNS pg_lsn AS $$
SELECT pg_last_wal_receive_lsn(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_current_xlog_flush_location() RETURNS pg_lsn AS $$
SELECT pg_current_wal_flush_lsn(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_current_xlog_insert_location() RETURNS pg_lsn AS $$
SELECT pg_current_wal_insert_lsn(); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_xlog_location_diff(lsn1 pg_lsn, lsn2 pg_lsn) RETURNS NUMERIC AS $$
SELECT pg_wal_lsn_diff(lsn1, lsn2); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_xlogfile_name_offset(lsn pg_lsn, OUT TEXT, OUT INTEGER) AS $$
SELECT pg_walfile_name_offset(lsn); $$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION fnRenames.pg_create_logical_replication_slot(slot_name name, plugin name,
temporary BOOLEAN DEFAULT FALSE, OUT slot_name name, OUT xlog_position pg_lsn) RETURNS RECORD AS $$
SELECT slot_name::NAME, lsn::pg_lsn FROM pg_catalog.pg_create_logical_replication_slot(slot_name, plugin,
temporary); $$ LANGUAGE SQL;

ALTER user master SET search_path to fnRenames, pg_catalog, "$user", public;


COMMIT;

Details on the above script can be found here:

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source.PostgreSQL.v10
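
Most of the wrappers in the script follow one naming pattern: the pre-version-10 function name uses "xlog" and "location", and the PostgreSQL 10 function it calls uses "wal" and "lsn". The following Python sketch lists the twelve pairs taken from the script and checks that this simple substitution rule reproduces each one. The `to_pg10_name` helper is hypothetical and only illustrates the pattern. The `pg_create_logical_replication_slot` wrapper is left out because it keeps its name and only renames an output column.

```python
# Old function names wrapped by the script, mapped to the PostgreSQL 10
# functions that the wrappers call (copied from the script above).
RENAMES = {
    "pg_switch_xlog": "pg_switch_wal",
    "pg_xlog_replay_pause": "pg_wal_replay_pause",
    "pg_xlog_replay_resume": "pg_wal_replay_resume",
    "pg_current_xlog_location": "pg_current_wal_lsn",
    "pg_is_xlog_replay_paused": "pg_is_wal_replay_paused",
    "pg_xlogfile_name": "pg_walfile_name",
    "pg_last_xlog_replay_location": "pg_last_wal_replay_lsn",
    "pg_last_xlog_receive_location": "pg_last_wal_receive_lsn",
    "pg_current_xlog_flush_location": "pg_current_wal_flush_lsn",
    "pg_current_xlog_insert_location": "pg_current_wal_insert_lsn",
    "pg_xlog_location_diff": "pg_wal_lsn_diff",
    "pg_xlogfile_name_offset": "pg_walfile_name_offset",
}


def to_pg10_name(old_name):
    """Apply the xlog -> wal and _location -> _lsn substitutions."""
    return old_name.replace("xlog", "wal").replace("_location", "_lsn")


# Collect any pair that the substitution rule does not reproduce.
mismatches = {old: new for old, new in RENAMES.items() if to_pg10_name(old) != new}
print(mismatches)
```

An empty result means every wrapper in the list is a plain rename, which is why each function body in the script is a one-line SELECT.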


Appendix: AWS CloudFormation Template


The AWS CloudFormation template is below. This template only works in the us-east-1 region.

Copy and paste this template into an instructor_dmslab.json file on your computer and save it.
Select that file in AWS CloudFormation for Step 3.

{
"AWSTemplateFormatVersion": "2010-09-09",
"Parameters" : {
"KeyName": {
"Description" : "Name of an existing EC2 KeyPair to enable SSH access to the instance",
"Type": "AWS::EC2::KeyPair::KeyName",
"ConstraintDescription" : "must be the name of an existing EC2 KeyPair in us-east-1 only."
}
},
"Resources": {
"dmsinstructorvpc": {
"Type": "AWS::EC2::VPC",
"Properties": {
"CidrBlock": "10.0.0.0/24",
"InstanceTenancy": "default",
"EnableDnsSupport": "true",
"EnableDnsHostnames": "true",
"Tags": [
{
"Key": "Name",
"Value": "DMSLabSourceDB"
}
]
}
},
"RDSSubNet": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"CidrBlock": "10.0.0.0/28",
"AvailabilityZone": "us-east-1d",
"VpcId": {
"Ref": "dmsinstructorvpc"
},
"Tags": [
{
"Key": "Name",
"Value": "DMSLabRDS1"
}
]
}
},
"EC2SubNet": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"CidrBlock": "10.0.0.32/28",
"AvailabilityZone": "us-east-1c",
"VpcId": {
"Ref": "dmsinstructorvpc"
},
"Tags": [
{
"Key": "Name",
"Value": "DMSLabEC2"
}
]
}
},


"RDSSubNet2": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"CidrBlock": "10.0.0.16/28",
"AvailabilityZone": "us-east-1b",
"VpcId": {
"Ref": "dmsinstructorvpc"
},
"Tags": [
{
"Key": "Name",
"Value": "DMSLabRDS2"
}
]
}
},
"igw0887475a258f00277": {
"Type": "AWS::EC2::InternetGateway",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "DMSLabIGW"
}
]
}
},
"dopt1cc25278": {
"Type": "AWS::EC2::DHCPOptions",
"Properties": {
"DomainName": "ec2.internal",
"DomainNameServers": [
"AmazonProvidedDNS"
]
}
},
"rtb0c3fae104a7b64456": {
"Type": "AWS::EC2::RouteTable",
"Properties": {
"VpcId": {
"Ref": "dmsinstructorvpc"
},
"Tags": [
{
"Key": "Name",
"Value": "DMSLabRT"
}
]
}
},
"instancei0f63b887480639040": {
"Type": "AWS::EC2::Instance",
"Properties": {
"DisableApiTermination": "false",
"InstanceInitiatedShutdownBehavior": "stop",
"EbsOptimized": "true",
"ImageId": "ami-04681a1dbd79675a5",
"InstanceType": "t3.2xlarge",
"KeyName": {"Ref" : "KeyName" },
"UserData" : {"Fn::Base64" : {"Fn::Join" : ["", [
"#!/bin/bash -xe\n",
"yum install -y postgresql\n",
"yum install -y git\n",
"yum update -y\n",
"cd /home/ec2-user\n",
"git clone https://github.com/aws-samples/aws-database-migration-samples.git\n"
]]}},
"Monitoring": "false",
"Tags": [
{


"Key": "Name",
"Value": "DMSLabEC2"
}
],
"NetworkInterfaces": [
{
"DeleteOnTermination": "true",
"Description": "Primary network interface",
"DeviceIndex": 0,
"SubnetId": {
"Ref": "EC2SubNet"
},
"PrivateIpAddresses": [
{
"PrivateIpAddress": "10.0.0.40",
"Primary": "true"
}
],
"GroupSet": [
{
"Ref": "sgDMSLabSG"
}
],
"AssociatePublicIpAddress": "true"
}
]
}
},
"rdsdmslabdb": {
"Type": "AWS::RDS::DBInstance",
"Properties": {
"AllocatedStorage": "20",
"AllowMajorVersionUpgrade": "false",
"AutoMinorVersionUpgrade": "true",
"DBInstanceClass": "db.t2.xlarge",
"DBInstanceIdentifier": "dmslabinstance",
"Port": "5432",
"PubliclyAccessible": "true",
"StorageType": "gp2",
"BackupRetentionPeriod": "7",
"MasterUsername": "master",
"MasterUserPassword": "master123",
"PreferredBackupWindow": "04:00-04:30",
"PreferredMaintenanceWindow": "sun:05:20-sun:05:50",
"DBName": "sportstickets",
"Engine": "postgres",
"EngineVersion": "10.4",
"LicenseModel": "postgresql-license",
"DBSubnetGroupName": {
"Ref": "dbsubnetdefaultdmsinstructorvpc"
},
"VPCSecurityGroups": [
{
"Ref": "sgrdslaunchwizard2"
}
],
"Tags": [
{
"Key": "workload-type",
"Value": "other"
}
]
}
},
"dbsubnetdefaultdmsinstructorvpc": {
"Type": "AWS::RDS::DBSubnetGroup",
"Properties": {
"DBSubnetGroupDescription": "Created from the RDS Management Console",
"SubnetIds": [
{
"Ref": "RDSSubNet"


},
{
"Ref": "EC2SubNet"
},
{
"Ref": "RDSSubNet2"
}
]
}
},
"sgDMSLabSG": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "launch-wizard-6 created 2018-08-29T15:10:01.302-04:00",
"VpcId": {
"Ref": "dmsinstructorvpc"
}
}
},
"sgrdslaunchwizard2": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "Created from the RDS Management Console: 2018/08/29 18:14:15",
"VpcId": {
"Ref": "dmsinstructorvpc"
},
"Tags": [
{
"Key": "Name",
"Value": "DMSLabRDS-SG"
}
]
}
},
"dbsgdefault": {
"Type": "AWS::RDS::DBSecurityGroup",
"Properties": {
"GroupDescription": "default"
}
},
"gw1": {
"Type": "AWS::EC2::VPCGatewayAttachment",
"Properties": {
"VpcId": {
"Ref": "dmsinstructorvpc"
},
"InternetGatewayId": {
"Ref": "igw0887475a258f00277"
}
}
},
"subnetroute1": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {
"RouteTableId": {
"Ref": "rtb0c3fae104a7b64456"
},
"SubnetId": {
"Ref": "RDSSubNet2"
}
}
},
"subnetroute2": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {
"RouteTableId": {
"Ref": "rtb0c3fae104a7b64456"
},
"SubnetId": {
"Ref": "RDSSubNet"


}
}
},
"subnetroute3": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {
"RouteTableId": {
"Ref": "rtb0c3fae104a7b64456"
},
"SubnetId": {
"Ref": "EC2SubNet"
}
}
},
"route1": {
"Type": "AWS::EC2::Route",
"Properties": {
"DestinationCidrBlock": "0.0.0.0/0",
"RouteTableId": {
"Ref": "rtb0c3fae104a7b64456"
},
"GatewayId": {
"Ref": "igw0887475a258f00277"
}
},
"DependsOn": "gw1"
},
"dchpassoc1": {
"Type": "AWS::EC2::VPCDHCPOptionsAssociation",
"Properties": {
"VpcId": {
"Ref": "dmsinstructorvpc"
},
"DhcpOptionsId": {
"Ref": "dopt1cc25278"
}
}
},
"ingress1": {
"Type": "AWS::EC2::SecurityGroupIngress",
"Properties": {
"GroupId": {
"Ref": "sgDMSLabSG"
},
"IpProtocol": "tcp",
"FromPort": "22",
"ToPort": "22",
"CidrIp": "0.0.0.0/0"
}
},
"ingress2": {
"Type": "AWS::EC2::SecurityGroupIngress",
"Properties": {
"GroupId": {
"Ref": "sgrdslaunchwizard2"
},
"IpProtocol": "tcp",
"FromPort": "5432",
"ToPort": "5432",
"SourceSecurityGroupId": {
"Ref": "sgDMSLabSG"
},
"SourceSecurityGroupOwnerId": "649225637812"
}
},
"ingress3": {
"Type": "AWS::EC2::SecurityGroupIngress",
"Properties": {
"GroupId": {
"Ref": "sgrdslaunchwizard2"
},


"IpProtocol": "tcp",
"FromPort": "5432",
"ToPort": "5432",
"CidrIp": "72.21.196.67/32"
}
},
"ingress4": {
"Type": "AWS::EC2::SecurityGroupIngress",
"Properties": {
"GroupId": {
"Ref": "sgrdslaunchwizard2"
},
"IpProtocol": "tcp",
"FromPort": "5432",
"ToPort": "5432",
"CidrIp": "0.0.0.0/0"
}
},
"egress1": {
"Type": "AWS::EC2::SecurityGroupEgress",
"Properties": {
"GroupId": {
"Ref": "sgDMSLabSG"
},
"IpProtocol": "-1",
"CidrIp": "0.0.0.0/0"
}
},
"egress2": {
"Type": "AWS::EC2::SecurityGroupEgress",
"Properties": {
"GroupId": {
"Ref": "sgrdslaunchwizard2"
},
"IpProtocol": "-1",
"CidrIp": "0.0.0.0/0"
}
}
},
"Description": "DMS Lab Instructor account",
"Metadata": {
"AWS::CloudFormation::Designer": {
"a79fb943-c167-4e59-8eda-911d4acc331f": {
"size": {
"width": 60,
"height": 60
},
"position": {
"x": 810,
"y": 390
},
"z": 1,
"embeds": []
}
}
}
}
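
If you change the addressing in the template, a quick way to sanity-check the plan is Python's standard `ipaddress` module. The sketch below copies the VPC range, the three subnet ranges, and the EC2 instance's private IP from the template above and confirms that the subnets sit inside the VPC, do not overlap, and that the private IP falls inside EC2SubNet. The `check_layout` helper is hypothetical and is not part of the lab package.

```python
import ipaddress

# Values copied from the template above.
VPC_CIDR = "10.0.0.0/24"
SUBNET_CIDRS = {
    "RDSSubNet": "10.0.0.0/28",
    "RDSSubNet2": "10.0.0.16/28",
    "EC2SubNet": "10.0.0.32/28",
}
EC2_PRIVATE_IP = "10.0.0.40"


def check_layout(vpc_cidr, subnet_cidrs, ec2_ip):
    """Return a list of problems found in the address plan (empty if none)."""
    problems = []
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}

    # Every subnet must lie inside the VPC range.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} is outside the VPC range")

    # No two subnets may overlap.
    names = sorted(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")

    # The fixed private IP must belong to the EC2 subnet.
    if ipaddress.ip_address(ec2_ip) not in subnets["EC2SubNet"]:
        problems.append("EC2 private IP is outside EC2SubNet")
    return problems


print(check_layout(VPC_CIDR, SUBNET_CIDRS, EC2_PRIVATE_IP))
```

An empty list means the address plan is internally consistent. It says nothing about whether the template will deploy, which still depends on the AMI, Availability Zones, and account limits.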
