Extracting SAP Data From Your On-Premise SAP System With Amazon AppFlow SAP OData Connector
SAP systems form a major part of most customers' landscapes. Most SAP systems hold critical
data and cannot be exposed to the public internet. They need to be accessed over secure
connections.
Customers are constantly looking to get more value from their SAP data by leveraging data and
analytics capabilities in the cloud. They want to build data lakes in the cloud, for example on
Amazon Web Services (AWS), and combine SAP data with non-SAP data to gain insights that were
previously not possible. By doing this, customers can take advantage of the advanced data
and analytics services of cloud providers, regardless of where the SAP system resides. Hence,
a secure connection becomes very important.
In one such use case, we wanted to extract master data from an SAP system. The Amazon
AppFlow SAP OData Connector, released in Q3 2021, is a tool that allows customers to
extract data from SAP ERP, SAP HANA, or SAP BW systems that run on-premises or in any
cloud other than AWS. The data is then stored in Amazon S3 to be combined with non-SAP
data and/or analyzed by machine learning algorithms.
In this blog we will walk through the configuration steps required to set up an Amazon AppFlow SAP
OData connection to an SAP system that runs on-premises or otherwise does not run on AWS.
Solution
The end-to-end solution requires configuration in multiple areas, including the SAP
system, the Site-to-Site VPN connection to AWS, and the Amazon AppFlow service. This
blog will focus on the Amazon AppFlow configuration and provide general guidance for
the SAP system and the VPN configuration.
On-premise SAP system to AWS and Amazon AppFlow Architecture
Walkthrough
SAP systems are usually protected and not open to the internet. If the system is behind a
VPN or in a different (non-AWS) cloud, additional configuration is needed to enable
connectivity through AWS PrivateLink, the encrypted connectivity setup required by
Amazon AppFlow.
In this blog, the source system refers to Amazon AppFlow (since it initiates the request) and
the target system refers to the on-premises SAP system (the system we connect to for data
extraction).
• Public Hosted Zone in Route 53 with validation for domain name, VPC Endpoint, AWS
Certificate Manager (ACM) Certificate
• SSL certificates set up – generated through ACM
Prerequisites
For this walkthrough, you should have the following prerequisites and access:
• An AWS account
• VPC Console Access – To create subnets, VPC endpoint services, a Site-to-Site VPN
connection, a Customer Gateway, and a Virtual Private Gateway
• EC2 Console Access – To create the Network Load Balancer and Target Groups
• Route 53 Access – To create the Hosted Zone, route to the right Load Balancer, and validate
the Endpoint Service and SSL Certificate
• AWS Certificate Manager – To provision the certificate for the SSL connection
• On-premises SAP system user access with authorization to use OData services
• From the target side: the VPN tunnel IPs, the ports, and the internal IP of the server that
hosts the SAP system
You can use the “Create VPC” Wizard from the VPC Dashboard on the AWS console to create
the subnets and route tables in a visualized way. Follow this guide
• Ensure you create subnets in at least 50% of the Availability Zones (AZs) of the Region, so
that the Network Load Balancer spans at least 50% of the AZs, which Amazon AppFlow needs
to work.
• Ensure DNS hostnames are enabled for the VPC. A DNS hostname is a name that uniquely and
absolutely names a computer; it is composed of a host name and a domain name. DNS
servers resolve DNS hostnames to their corresponding IP addresses. DNS hostnames must be
enabled to use an internet gateway or other gateways.
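The 50% rule above is easy to get wrong by one in Regions with an odd number of AZs. The following is a minimal sketch of the arithmetic, assuming "at least 50%" means rounding up; the function names and AZ names are illustrative and not part of any AWS SDK.

```python
import math


def min_az_count(total_azs: int) -> int:
    """Smallest number of AZs that covers at least 50% of the Region's AZs."""
    if total_azs < 1:
        raise ValueError("a Region has at least one Availability Zone")
    return math.ceil(total_azs / 2)


def covers_half(subnet_azs, region_azs) -> bool:
    """True if the subnets' AZs span at least half of the Region's AZs."""
    region = set(region_azs)
    used = set(subnet_azs) & region  # ignore AZ names that are not in this Region
    return len(used) >= min_az_count(len(region))


# Placeholder AZ names for a three-AZ Region
region = ["az-a", "az-b", "az-c"]
print(min_az_count(len(region)))
print(covers_half(["az-a"], region), covers_half(["az-a", "az-b"], region))
```

With three AZs, a single subnet is not enough; subnets in two AZs are needed.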
Once the VPC and its components are created, create an Internet Gateway, attach it to the
VPC, and add it to the route table so that all the routes are propagated correctly. The route
table created above must be updated so that it knows the Internet Gateway and allows
traffic through it.
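For readers who script this step, the sketch below only builds the request parameters that the boto3 EC2 client calls `attach_internet_gateway` and `create_route` accept. The resource IDs are placeholders, and the default route `0.0.0.0/0` is an assumption; the original text does not say which destination the route should cover.

```python
import ipaddress


def attach_gateway_request(igw_id: str, vpc_id: str) -> dict:
    """Keyword arguments for ec2.attach_internet_gateway(**kwargs)."""
    return {"InternetGatewayId": igw_id, "VpcId": vpc_id}


def gateway_route_request(route_table_id: str, igw_id: str,
                          destination_cidr: str = "0.0.0.0/0") -> dict:
    """Keyword arguments for ec2.create_route(**kwargs).

    The default destination is an assumed catch-all route.
    """
    # Raises ValueError if the CIDR block is malformed
    ipaddress.ip_network(destination_cidr)
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": destination_cidr,
        "GatewayId": igw_id,
    }


# Placeholder IDs, not real resources
print(attach_gateway_request("igw-0123456789abcdef0", "vpc-0123456789abcdef0"))
print(gateway_route_request("rtb-0123456789abcdef0", "igw-0123456789abcdef0"))
```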
With the VPC setup completed, the next step is to set up the AWS Site-to-Site VPN, which
consists of a number of steps.
• Configure routing
Once both sides are aligned, the tunnels should show as “Up”. If they show “Down”, revisit the
tunnel configuration of the Site-to-Site VPN connection and align it with the target side until
the tunnels show “Up” before proceeding with any further steps.
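The tunnel check can also be done programmatically. The sketch below assumes a response in the shape returned by the boto3 EC2 call `describe_vpn_connections`, where each connection carries a `VgwTelemetry` list with a `Status` of `UP` or `DOWN`. The sample data is hand-written for illustration, not real output.

```python
def tunnels_not_up(describe_response: dict) -> list:
    """Return (vpn_connection_id, outside_ip) for every tunnel not reporting UP."""
    down = []
    for conn in describe_response.get("VpnConnections", []):
        for tunnel in conn.get("VgwTelemetry", []):
            if tunnel.get("Status") != "UP":
                down.append((conn.get("VpnConnectionId"),
                             tunnel.get("OutsideIpAddress")))
    return down


# Hand-written sample; the IDs and IPs are placeholders
sample = {
    "VpnConnections": [
        {
            "VpnConnectionId": "vpn-0123456789abcdef0",
            "VgwTelemetry": [
                {"OutsideIpAddress": "203.0.113.10", "Status": "UP"},
                {"OutsideIpAddress": "203.0.113.11", "Status": "DOWN"},
            ],
        }
    ]
}

print(tunnels_not_up(sample))
```

An empty list means every tunnel is up and the remaining steps can proceed.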
To create the Target Group, you need the internal IP of the server hosting the on-premises SAP
system and the port on which the server accepts requests.
When creating the Target Group, choose “IP Addresses” as the target type, provide a Target Group
name, and provide the transport protocol and port (the port open to receive requests).
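As a rough illustration of the same settings outside the console, the sketch below builds the parameters for the boto3 Elastic Load Balancing v2 calls `create_target_group` and `register_targets`. The name, IP, port, and ARN are placeholders. Setting `AvailabilityZone` to `all` reflects that the SAP server's IP lies outside the VPC, reached through the VPN.

```python
def target_group_request(name: str, vpc_id: str, port: int,
                         protocol: str = "TCP") -> dict:
    """Keyword arguments for elbv2.create_target_group(**kwargs)."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return {
        "Name": name,
        "Protocol": protocol,
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "ip",  # the "IP Addresses" choice in the console
    }


def register_sap_target_request(target_group_arn: str, sap_ip: str,
                                port: int) -> dict:
    """Keyword arguments for elbv2.register_targets(**kwargs)."""
    return {
        "TargetGroupArn": target_group_arn,
        # "all" is used because the IP address is outside the VPC
        "Targets": [{"Id": sap_ip, "Port": port, "AvailabilityZone": "all"}],
    }


# Placeholder values; use the internal IP and port supplied by the target side
print(target_group_request("sap-odata-tg", "vpc-0123456789abcdef0", 44300))
print(register_sap_target_request("arn:aws:elasticloadbalancing:placeholder",
                                  "10.0.0.25", 44300))
```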
Once the Target group is configured, the status will be “Unassigned”. This will change once the
Network Load Balancer (NLB) is attached to the Target Group.
• Choose the scheme “Internal”, as we are going to use it only within the VPC.
• Choose the VPC created in Step 1 and assign all the subnets.
• The NLB should be available in at least 50% of the AZs of the Region to work with
Amazon AppFlow.
Under the Listener, select the protocol as “TCP” and the port as “443”. Choose the target group
created above and create the Load Balancer. Further listeners can be added later.
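The load balancer and listener settings described here map onto the boto3 Elastic Load Balancing v2 calls `create_load_balancer` and `create_listener`. The sketch below only assembles the parameters; all names, subnet IDs, and ARNs are placeholders.

```python
def nlb_request(name: str, subnet_ids) -> dict:
    """Keyword arguments for elbv2.create_load_balancer(**kwargs)."""
    subnets = list(subnet_ids)
    if not subnets:
        raise ValueError("at least one subnet is required")
    return {
        "Name": name,
        "Type": "network",
        "Scheme": "internal",  # used only within the VPC
        "Subnets": subnets,
    }


def tcp_listener_request(load_balancer_arn: str, target_group_arn: str) -> dict:
    """Keyword arguments for elbv2.create_listener(**kwargs), TCP on port 443."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "TCP",
        "Port": 443,
        "DefaultActions": [
            {"Type": "forward", "TargetGroupArn": target_group_arn}
        ],
    }


# Placeholder identifiers
print(nlb_request("sap-odata-nlb", ["subnet-aaa", "subnet-bbb"]))
print(tcp_listener_request("arn:placeholder:nlb", "arn:placeholder:tg"))
```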
The Target Group will now be in assigned status, and if the connection through the VPN tunnels
is up, the health check of the internal IP will show “Healthy”.
To create a hosted zone for an existing domain name, enter the domain name in the field. In the
description, additional names for the same domain name can be entered. Clicking “Create”
creates the public hosted zone.
Figure 3 – Configuration for Public Hosted Zone
Once this is created, the zone will have two records, “NS” and “SOA”. Additional records need to
be added to the zone.
The first record that needs to be added is for the Network Load Balancer. To do that, click on “Create
Record”.
Choose the record type “A” and keep the record name blank. Provide the DNS name of the
Network Load Balancer as the value and create the record.
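An “A” record whose value is a load balancer DNS name is an alias record in Route 53. The sketch below builds a change batch for the boto3 Route 53 call `change_resource_record_sets`, assuming the record sits at the zone apex (blank record name). The domain and DNS names are placeholders. Note that the alias `HostedZoneId` is the load balancer's own canonical hosted zone ID, not the ID of your hosted zone.

```python
def nlb_alias_change_batch(zone_domain: str, nlb_dns_name: str,
                           nlb_canonical_zone_id: str) -> dict:
    """ChangeBatch for route53.change_resource_record_sets(
    HostedZoneId=<your hosted zone ID>, ChangeBatch=...)."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    # A blank record name in the console means the zone apex
                    "Name": zone_domain,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": nlb_canonical_zone_id,
                        "DNSName": nlb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ]
    }


# Placeholder values
batch = nlb_alias_change_batch(
    "sap.example.com",
    "sap-odata-nlb-0123456789abcdef.elb.example-region.amazonaws.com",
    "ZPLACEHOLDER",
)
print(batch["Changes"][0]["ResourceRecordSet"]["AliasTarget"]["DNSName"])
```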
Figure 5 – Choosing DNS Name of Network Load Balancer
The other two records, for the VPC endpoint service and Certificate Manager, will be created in
subsequent steps in a similar way.
Certificates can be created in AWS Certificate Manager (ACM) and assigned to the flow, or
externally created certificates can be imported into ACM and used for this purpose. These
certificates must be trusted certificates from a third party. Self-signed and wildcard certificates will
not work.
In this step, we will provision a certificate for the domain name given in the public hosted zone
(above step) and validate the certificate by adding a record in Route 53.
Open ACM, click on “Request”, and choose “Request a public certificate”.
The next page will show the certificate details and the status will be in “Pending Validation”.
Figure 6 – Certificate Created in AWS Certificate Manager
Open Route 53 Public Hosted Zone and click on “Create Record”. The Certificate Manager
expects a record of type CNAME. So, choose Record Type as “CNAME”.
In the “Name” field, copy the value under CNAME name without the domain name extension.
In the “Value” field, copy the value under CNAME value and save the record.
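Removing the domain extension by hand is a common source of validation failures, because the Route 53 console appends the zone's domain to whatever is typed in the “Name” field. A small helper along these lines can do the trimming; the function name and sample values are illustrative only.

```python
def record_name_without_domain(cname_name: str, zone_domain: str) -> str:
    """Strip the hosted zone's domain (and any trailing dot) from a CNAME name."""
    name = cname_name.rstrip(".")
    suffix = "." + zone_domain.rstrip(".")
    if name.lower().endswith(suffix.lower()):
        return name[: -len(suffix)]
    return name  # the name is not inside this zone, so leave it unchanged


# Placeholder validation record name for a zone called sap.example.com
print(record_name_without_domain("_abc123.sap.example.com.", "sap.example.com"))
```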
Figure 7 – Addition of Certificate record to Route 53
After the record is saved, refreshing the page in AWS Certificate Manager will show the status of
the certificate as “Issued”, and it is ready to use in the flow.
Go to the VPC Console and open VPC Endpoint Services. Click on “Create endpoint service”.
Give it a name and choose Network Load Balancer. A list of the load balancers available in the Region
is shown; choose the one that was created in step 5. Keep the remaining defaults and create the service.
Once it is provisioned, go to “Allow Principals” tab. On this tab, click on “Allow Principals”, and
add “appflow.amazonaws.com” as the principal.
Next, open the “Endpoint Connections” tab, choose the connection, and click on “Accept”. This will
change the connection state to “Available”.
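The two console actions above correspond to the boto3 EC2 calls `modify_vpc_endpoint_service_permissions` and `accept_vpc_endpoint_connections`. The sketch below only prepares their parameters. The service and endpoint IDs are placeholders, and the principal string is the one given in this walkthrough.

```python
# Principal as stated in this walkthrough
APPFLOW_PRINCIPAL = "appflow.amazonaws.com"


def allow_appflow_request(service_id: str) -> dict:
    """Keyword arguments for ec2.modify_vpc_endpoint_service_permissions(**kwargs)."""
    return {
        "ServiceId": service_id,
        "AddAllowedPrincipals": [APPFLOW_PRINCIPAL],
    }


def accept_connections_request(service_id: str, endpoint_ids) -> dict:
    """Keyword arguments for ec2.accept_vpc_endpoint_connections(**kwargs)."""
    ids = list(endpoint_ids)
    if not ids:
        raise ValueError("no endpoint connections to accept")
    return {"ServiceId": service_id, "VpcEndpointIds": ids}


# Placeholder IDs
print(allow_appflow_request("vpce-svc-0123456789abcdef0"))
print(accept_connections_request("vpce-svc-0123456789abcdef0",
                                 ["vpce-0123456789abcdef0"]))
```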
Go back to the Network Load Balancer and edit the listener to attach the certificate created in
ACM.
Figure 8 – Attaching certificate to Network Load Balancer Listener
Add the endpoint service to the Route 53 record set. As described in the steps above, add a new
record: the record type is “TXT”, the record name is the “Domain verification
name”, and the value is the “Domain verification value”.
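When this record is created through the API rather than the console, the TXT value must be wrapped in double quotes. The sketch below builds a change batch for the boto3 Route 53 call `change_resource_record_sets`. The verification name and value are placeholders, and the TTL of 300 seconds is an assumed value.

```python
def txt_verification_change_batch(verification_name: str,
                                  verification_value: str,
                                  ttl: int = 300) -> dict:
    """ChangeBatch adding the endpoint service's domain verification TXT record."""
    # Route 53 expects TXT values enclosed in double quotes
    quoted = '"' + verification_value.strip('"') + '"'
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": verification_name,
                    "Type": "TXT",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": quoted}],
                },
            }
        ]
    }


# Placeholder name and value; copy the real ones from the endpoint service details
txt_batch = txt_verification_change_batch("_verify.sap.example.com",
                                          "vpce:placeholder-token")
print(txt_batch["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
```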
Go to Amazon AppFlow and open “Connections”. Select the connector type “SAP OData” and
click “Create Connection”.
• Enter the “Application Host URL” of the SAP system from which the data must be pulled.
• Enter a valid SAP system client number, logon language, user name, and password.
• For the PrivateLink option, choose “Enabled” and copy the service name of the VPC
endpoint service, highlighted in step 8, into this field.
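The same connection can be described as parameters for the boto3 AppFlow call `create_connector_profile`. The sketch below is one possible shape, not a tested deployment. The profile name, host URL, service path, port, and endpoint service name are all placeholders, and the service path and port are extra fields the API requires that this walkthrough does not discuss.

```python
def sap_odata_profile_request(profile_name: str, host_url: str,
                              service_path: str, port: int,
                              client_number: str, logon_language: str,
                              private_link_service_name: str,
                              username: str, password: str) -> dict:
    """Keyword arguments for appflow.create_connector_profile(**kwargs)."""
    return {
        "connectorProfileName": profile_name,
        "connectorType": "SAPOData",
        "connectionMode": "Private",  # PrivateLink "Enabled" in the console
        "connectorProfileConfig": {
            "connectorProfileProperties": {
                "SAPOData": {
                    "applicationHostUrl": host_url,
                    "applicationServicePath": service_path,
                    "portNumber": port,
                    "clientNumber": client_number,
                    "logonLanguage": logon_language,
                    "privateLinkServiceName": private_link_service_name,
                }
            },
            "connectorProfileCredentials": {
                "SAPOData": {
                    "basicAuthCredentials": {
                        "username": username,
                        "password": password,
                    }
                }
            },
        },
    }


# Placeholder values only; never hard-code real credentials
profile = sap_odata_profile_request(
    "sap-onprem-odata", "https://sap.example.com",
    "/sap/opu/odata/iwfnd/catalogservice;v=2", 443,
    "100", "EN", "com.amazonaws.vpce.example-region.vpce-svc-placeholder",
    "ODATA_USER", "placeholder-password",
)
print(profile["connectionMode"], profile["connectorType"])
```

In practice the password would come from a secrets store rather than source code.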