Implementing Oracle to Snowflake Synchronization Using Cloud Mass Ingestion Databases
© Copyright Informatica LLC 2020, 2021. Informatica, the Informatica logo, and Informatica Cloud Data Integration
are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout
the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/
trademarks.html
Abstract
This article describes how to set up Mass Ingestion Databases and create your first database ingestion job to replicate
data from an Oracle source to a Snowflake target.
Supported Versions
• Informatica Intelligent Cloud Services Fall 2020
Table of Contents
Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Use Case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Environment setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Step 1. Set up an organization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Step 2. Download and install a Secure Agent. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Step 3. Configure a runtime environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Implementation tasks for Oracle to Snowflake ingestion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Create a project folder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Prepare the Oracle source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Oracle privileges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Download the connectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Configure connections. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Oracle Database Ingestion connection properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Snowflake Cloud Data Warehouse V2 connection properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Configure a database ingestion task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Defining basic task information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Configuring the Oracle source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Configuring the Snowflake target. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Configuring runtime options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Deploy the database ingestion task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Run the database ingestion job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Other resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Overview
Many businesses are moving to cloud-based data warehouses and data lakes to modernize their data and analytics in
the cloud and to implement new-age projects that involve artificial intelligence (AI) and machine learning (ML). Their
biggest challenge is to migrate data from various siloed sources to cloud data lakes and data warehouses. They need
the ability to efficiently and accurately ingest large amounts of data from various sources and then keep the sources
and the targets in sync.
Informatica Intelligent Cloud Services Mass Ingestion can ingest data at scale from database, streaming, and file data
sources and transfer the data with low latency to selected messaging systems and cloud targets.
Mass Ingestion provides the following ingestion solutions:
• Mass Ingestion Databases. Propagates data from source objects in a database management system (DBMS)
to multiple types of targets. A database ingestion job can transfer a point-in-time snapshot of all source data
in a schema to a target in a batch operation. A database ingestion job can also incrementally propagate data
changes and schema changes in near real time from a database source such as Oracle to a target on a
continuous basis. If you select the combined initial and incremental load type, the database ingestion task
performs an initial load and then automatically switches to incremental load processing of data changes.
• Mass Ingestion Files. Transfers a large number of files of different types between on-premises and cloud
repositories. You can use Mass Ingestion Files to track and monitor file transfers.
• Mass Ingestion Streaming. Transfers real-time streaming and machine data from selected sources to selected
messaging systems and batch targets.
This article walks through the steps required to initially set up a Mass Ingestion Databases environment and then
configure and implement a database ingestion task that replicates Oracle data to a Snowflake target.
Use Case
A business user needs to move data from an on-premises Oracle database to a Snowflake data warehouse that is used
for analytics and reporting.
To achieve this goal, we'll configure a database ingestion task in Mass Ingestion Databases for a combined initial and
incremental load job. The initial load portion of the job initially populates the Snowflake target with point-in-time data
from Oracle. The incremental load portion of the job replicates Oracle change data in real time to keep the source and
target in sync. We'll also set schema drift options to automatically replicate some types of source DDL changes to the
target.
After deploying the task, we'll monitor the job and data flow in real time to check the job status and determine if any
bottlenecks or alerts occur.
Environment setup
In Administrator, configure your organization and runtime environment, and download and install a Secure Agent.
Step 1. Set up an organization
Your organization is a secure area that contains your licenses, user accounts, ingestion tasks, and information about jobs and security. When you set up an organization, perform the following tasks:
• On the Organization page, configure organization properties such as the organization name and address,
authentication information, and notification email addresses.
• On the Licenses page, verify that your organization has a Cloud Unified Mass Ingestion Edition license, which
covers all ingestion types.
• On the Users, User Groups, and User Roles pages, configure users, user groups, and user role permissions.
To help you complete these tasks, Administrator provides context-sensitive help. To display the help for the page in
which you're working, click the help (?) icon and select Online Help.
Step 2. Download and install a Secure Agent
Note: Mass Ingestion Databases does not support the Hosted Agent or Secure Agent groups with multiple agents.
Downloading and installing the Secure Agent on Windows
Secure Agent registration requires an install token. To get the install token, copy the token when you download the
agent or use the Generate Install Token option in Administrator. The token expires after 24 hours.
Before you download and install the Secure Agent, verify that no other Secure Agent is installed on the machine. If
there is, you must uninstall it.
The Secure Agent Manager opens and prompts you to register the agent.
5. If you did not copy the install token when you downloaded the agent, click Generate Install Token on the
Runtime Environments page in Administrator, and copy the token.
6. In the Secure Agent Manager, enter the following information, and then click Register:
User Name - User name that you use to access Informatica Intelligent Cloud Services.
The Secure Agent Manager displays the status of the Secure Agent. It takes a minute for all of the services to
start.
7. If your organization uses an outgoing proxy server to connect to the internet, enter the proxy server
information.
8. Close the Secure Agent Manager.
The Secure Agent Manager minimizes to the taskbar and continues to run as a service until stopped.
Downloading and installing the Secure Agent on Linux
To install the Secure Agent on a Linux machine, you must download and run the Secure Agent installation program and
then register the agent.
Secure Agent registration requires an install token. To get the install token, copy the token when you download the
agent or use the Generate Install Token option in Administrator. The token expires after 24 hours.
Before you download and install the Secure Agent, verify that no other Secure Agent is installed on the machine using
the same Linux user account. If there is, you must uninstall it.
After you download the Secure Agent to your runtime environment, the DBMI packages are pushed to the on-premises
system where the Secure Agent runs, provided that you have custom licenses for both Mass Ingestion Databases and
the DBMI packages. You can then optionally configure properties for the Database Ingestion service that runs on the
Secure Agent.
Database Ingestion service properties
To change or optimize the behavior of the Database Ingestion service (DBMI agent) that the Secure Agent uses,
configure Database Ingestion properties for your runtime environment.
To configure the properties, open your runtime environment. Under System Configuration Details, click Edit. Then
select the Database Ingestion service and the DBMI_AGENT_CONFIG type.
maxTaskUnits - The maximum number of database ingestion tasks that can run concurrently on the on-premises machine where the Secure Agent runs. The default value is 10.
serviceLogRetentionPeriod - The number of days to retain each internal Database Ingestion service log file after the last update is written to the file. When this retention period elapses, the log file is deleted. The default value is 7 days.
Note: Service logs are retained on the Secure Agent host where they are created, under <infaagent>/apps/Database_Ingestion/logs.
taskLogRetentionPeriod - The number of days to retain each job log file after the last update is written to the file. When this retention period elapses, the log file is deleted. The default value is 7 days.
ociPath - For Oracle sources and targets, the path to the Oracle Call Interface (OCI) file oci.dll or libclntsh.so. For a DBMI agent that is running, this value is appended to the path that is specified in the PATH environment variable on Windows or in the LD_LIBRARY_PATH environment variable on Linux.
serviceUrl - The URL that the Database Ingestion service uses to connect to the Informatica Intelligent Cloud Services cloud.
logLevel - The level of detail to include in the logs that the Database Ingestion service produces. Options are TRACE, DEBUG, INFO, WARN, and ERROR. The default value is TRACE.
taskExecutionHeapSize - The maximum heap size, in gigabytes, for the Task Execution service. This value, in conjunction with the maxTaskUnits property, affects the number of concurrent database ingestion tasks that can run on a Secure Agent. Try increasing the heap size to run more tasks concurrently. Enter the value followed by "g" for gigabytes, for example, '9g'. The default value is '8g'.
DBMI_WRITER_CONN_POOL_SIZE - The number of connections that a database ingestion job uses to propagate change data to the target. Valid values are 4 through 8. The default value is 8.
Note: After you define or change an environment variable, restart the Database Ingestion agent service for the changes
to take effect.
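For example, a runtime environment that must run more concurrent ingestion tasks might use property values along the following lines. These values are purely illustrative, not recommendations from this article, and the ociPath value is a hypothetical Oracle client location:
maxTaskUnits            12
taskExecutionHeapSize   10g
logLevel                INFO
ociPath                 /u01/app/oracle/product/19.0.0/client_1/lib/libclntsh.so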
Create a project folder
To create a project, go to the Explore page, select to explore by projects, and then click New Project.
To create a project folder, go to the Explore page and open the project, and then click New Folder.
You can create one level of folders in a project. You cannot create folders within folders.
Prepare the Oracle source
To use Oracle sources in database ingestion tasks, first prepare the source database and learn about source-specific
usage considerations.
• Define the ORACLE_HOME environment variable on the Linux or Windows system where the Secure Agent runs so that Mass Ingestion Databases can use the Oracle Call Interface (OCI) to communicate with the Oracle source database.
• Make sure the Mass Ingestion Databases user has the Oracle privileges that are required for the database
ingestion load type to be performed. For more information, see “Oracle privileges” on page 10.
• Database ingestion jobs read incremental change data from Oracle redo logs. For the ingestion jobs to access
redo logs that are remote from the on-prem system where the Secure Agent runs, make sure that the redo logs
are managed by Oracle Automatic Storage Management (ASM) or mounted to a network file system (NFS).
• For incremental load or combined initial and incremental load operations, perform the following prerequisite
tasks in Oracle:
- Enable ARCHIVELOG mode for the Oracle database and define an archive log destination. (A SQL sketch of these prerequisites appears after this list.)
- If your Oracle source tables have primary keys, ensure that supplemental logging is enabled for all primary
key columns. For source tables that do not have primary keys, ensure that supplemental logging is enabled
for all columns from which change data will be captured.
Note: When you create a database ingestion task, you have the option of generating a script that implements
supplemental logging for all columns or only primary key columns for the selected source tables.
If you do not have the authority to perform these tasks, ask your Oracle database administrator to perform
them.
• Ensure that the Oracle Database Client or Instant Client is installed. Typically, the Database Client is installed
when you install your Oracle version. If you do not have a client installed, you can download either client from
the Oracle downloads web site.
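The following SQL*Plus session is a minimal sketch of these prerequisites, not a definitive procedure. It assumes a connection with SYSDBA authority, a hypothetical archive log destination of /u01/app/oracle/archivelogs, and hypothetical source tables SALES.ORDERS (with a primary key) and SALES.ORDER_NOTES (without one). Adjust the statements for your environment, or use the supplemental logging script that the task wizard can generate.
-- Enable ARCHIVELOG mode (requires a database restart)
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- Define an archive log destination
ALTER SYSTEM SET LOG_ARCHIVE_DEST_1='LOCATION=/u01/app/oracle/archivelogs' SCOPE=BOTH;
-- Enable supplemental logging for the primary key columns of a table that has a primary key
ALTER TABLE SALES.ORDERS ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
-- For a table without a primary key, enable supplemental logging for all columns
ALTER TABLE SALES.ORDER_NOTES ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;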
Review the following usage considerations:
• If Oracle source CHAR or VARCHAR columns contain nulls, the database ingestion job does not delimit the null values with double-quotation marks (") or any other delimiter when writing data to an Amazon S3, Flat File, Microsoft Azure Data Lake, or Microsoft Azure SQL Data Warehouse target.
• Mass Ingestion Databases does not support the following Oracle source data types with any target type or any
load type:
- INTERVAL
- LOBs
- LONG
- LONG RAW
- UROWID
- XMLTYPE
Source columns that have these data types are excluded from the target definition.
• Mass Ingestion Databases does not support invisible columns in Oracle source tables, regardless of the target type. For these columns, database ingestion incremental load jobs and combined initial and incremental load jobs propagate nulls to the corresponding target columns.
• For Oracle sources that use the multitenant architecture, the source tables must reside in a single pluggable
database (PDB) within a multitenant container database (CDB).
Oracle privileges
To deploy and run a database ingestion task that has an Oracle source, the source connection must specify a Mass
Ingestion Databases user who has the privileges required for the ingestion load type.
For a database ingestion task that performs an incremental load or combined initial and incremental load, ensure that
the user has been granted the following privileges:
GRANT CREATE SESSION TO <cmid_user>;
GRANT SELECT ON table TO <cmid_user>; -- Repeat this grant for each source table
GRANT EXECUTE ON DBMS_FLASHBACK TO <cmid_user>;
-- In the following, do not use ANY TABLE unless your security policy allows it.
GRANT FLASHBACK ON table|ANY TABLE TO <cmid_user>;
GRANT ALTER table|ANY TABLE TO <cmid_user>; -- Required to run the supplemental logging script option
-- Also ensure that you have access to the following ALL_* views:
ALL_CONSTRAINTS
ALL_CONS_COLUMNS
ALL_ENCRYPTED_COLUMNS
ALL_INDEXES
ALL_IND_COLUMNS
ALL_OBJECTS
ALL_TABLES
ALL_TAB_COLS
ALL_TAB_PARTITIONS
ALL_USERS
For a database ingestion task that performs an initial load, ensure that the user has the following privileges at
minimum:
GRANT CREATE SESSION TO <cmid_user>;
Configure connections
Configure an Oracle source connection and a Snowflake target connection on the Connections page in Administrator.
3. Configure the following connection details:
Connection Name - A name for the connection. This name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
Spaces at the beginning or end of the name are trimmed and are not saved as part of the name. Maximum length is 100 characters. Connection names are not case sensitive.
Description - An optional description for the connection. Maximum length is 255 characters.
Type - The type of connection. For an Oracle Database Ingestion connection, the type must be Oracle Database Ingestion.
Runtime Environment - The name of the runtime environment where you want to run database ingestion tasks. You define runtime environments in Administrator.
After you select the connection type, additional properties that are specific to that type appear.
Oracle Database Ingestion connection properties
The following properties apply to Oracle Database Ingestion connections:
User Name - User name for the Oracle database login that is used to retrieve source metadata and data. The user name cannot contain a semicolon.
Password - Password for the Oracle database login. The password cannot contain a semicolon.
Port - Network port number used to connect to the database server. Default is 1521.
Service Name - Service name or system ID (SID) that uniquely identifies the Oracle database. To connect by SID, specify it in the following format: SID:<ORACLE_SID>
Code Page - The code page of the database server. Database ingestion tasks use the UTF-8 code page. Default is UTF-8.
Database Connect String - An Oracle connection string, defined in TNS, that database ingestion tasks use to connect to the Oracle database.
TDE Wallet Directory - The path and file name for the Oracle wallet file that is used for Oracle Transparent Data Encryption (TDE). Specify this property value only if you capture change data from TDE-encrypted tablespaces and one of the following conditions is true:
- The Oracle wallet is not available to the database.
- The Oracle database is running on a server that is remote from the Oracle redo logs.
- The wallet directory is not in the default location on the database host or the wallet name is not the default name of ewallet.p12.
- The wallet directory is not available to the Secure Agent host.
TDE Wallet Password - A clear text password that is required to access the Oracle TDE wallet and get the master key. This property value is required if you need to read and decrypt data from TDE-encrypted tablespaces in the Oracle source database.
Directory Substitution - A local path prefix to substitute for the server path prefix of the redo logs on the Oracle server. This substitute local path is required when the log reader runs on a system other than the Oracle server and uses a different mapping to access the redo log files. Use this property in the following situations:
- The redo logs reside on shared disk.
- The redo logs have been copied to a system other than the Oracle system.
- The archived redo logs are accessed by using a different NFS mount.
Note: Do not use this property if you use Oracle Automatic Storage Management (ASM) to manage the redo logs.
You can define one or more substitutions. Use the following format:
server_path_prefix,local_path_prefix;server_path_prefix,local_path_prefix;...
Reader Active Log Mask - A mask that the log reader uses for selecting active redo logs when the Oracle database uses multiplexing of redo logs. The log reader compares the mask against the member names in an active redo log group to determine which log to read. In the mask, you can use the asterisk (*) wildcard to represent zero or more characters.
The mask can be up to 128 characters in length. It is case-sensitive on Linux or UNIX systems but not on Windows systems.
Reader Archive Destination 1 - The primary log destination from which the log reader reads archived logs, when Oracle is configured to write more than one copy of each archived redo log. Enter a number that corresponds to the n value in an Oracle LOG_ARCHIVE_DEST_n initialization parameter, where n is a value from 1 to 10.
If you set only one of the Reader Archive Destination 1 and Destination 2 properties, the log reader uses that property setting. If you specify neither property, the archive log queries are not filtered by the log destination.
Reader Archive Destination 2 - The secondary log destination from which the log reader reads archived logs when the primary destination becomes unavailable or when the logs at the primary destination cannot be read. For example, logs might have been corrupted or deleted. Enter a number that corresponds to the n value in an Oracle LOG_ARCHIVE_DEST_n initialization parameter, where n is a value from 1 to 10. Usually, this value is a number greater than 1.
Reader ASM Connect String - In an Oracle ASM environment, the Oracle connection string, defined in TNS, that the log reader uses to connect to the ASM instance that manages storage of active and archived redo logs for the source database.
Reader ASM User Name - In an Oracle ASM environment, an Oracle user ID that the log reader uses to connect to the ASM instance that manages storage of active and archived redo logs for the source database. This user ID must have SYSDBA or SYSASM authority. To use SYSASM authority, set the Reader ASM Connect As SYSASM property to Y.
Reader ASM Password - In an Oracle ASM environment, a clear text password for the user that is specified in the Reader ASM User Name property. The log reader uses this password and the ASM user name to connect to the ASM instance that manages storage of active and archived redo logs for the source database.
Reader ASM Connect As SYSASM - If you use Oracle 11g ASM or later and want the log reader to use a user ID that has SYSASM authority to connect to the ASM instance, select this check box. Also specify a user ID that has SYSASM authority in the Reader ASM User Name property. To use a user ID that has SYSDBA authority, clear this check box. By default, this check box is cleared.
Reader Mode - Indicates the source of and types of Oracle redo logs that the log reader reads. Valid options are:
- ACTIVE. Read active and archived redo logs from the Oracle online system. Optionally, you can use the Reader Active Log Mask property to filter the active redo logs and use the Reader Archive Destination 1 and Reader Archive Destination 2 properties to limit the archived log destinations from which to read archived logs.
- ARCHIVEONLY. Read only archived redo logs. Optionally, you can use the Reader Archive Destination 1 and Reader Archive Destination 2 properties to limit the archived log destinations from which to read archived logs.
- ARCHIVECOPY. Read archived redo logs that have been copied to an alternate file system. Use this option in the following situations:
- You do not have the authority to access the Oracle archived redo logs directly.
- The archived redo logs are written to ASM, but you do not have access to ASM.
- The archived log retention policy for the database server causes the archived logs to not be retained long enough.
With this option, the Reader Archive Destination 1 and Reader Archive Destination 2 properties are ignored.
Default is ACTIVE.
Reader Standby Log Mask - A mask that the log reader uses for selecting redo logs for an Oracle standby database when the database uses multiplexing of redo logs. The log reader compares the mask against the member names in a redo log group to determine which log to read. In the mask, you can use the asterisk (*) wildcard to represent zero or more characters.
The mask can be up to 128 characters in length. It is case-sensitive on Linux or UNIX systems but not on Windows systems.
Standby Connect String - An Oracle connection string, defined in TNS, that the log reader uses to connect to the Oracle physical standby database for change capture when the database is not open for read-only access.
Standby User Name - A user ID that the log reader uses to connect to the Oracle physical standby database for change capture. This user ID must have SYSDBA authority.
Standby Password - A clear text password that the log reader uses to connect to the Oracle physical standby database for change capture.
RAC Members - The maximum number of active redo log threads, or members, in an Oracle Real Application Cluster (RAC) that can be tracked. For a Data Guard physical standby database that supports a primary database in a RAC environment, this value is the number of active threads for the primary database. Valid values are 1 to 100. Default is 0, which causes an appropriate number of log threads to be determined automatically. If this value is not appropriate for your environment, set this property to a value greater than 0.
BFILE Access - Select this check box if you use an Amazon Relational Database Service (RDS) for Oracle source. This option enables access to redo logs for a cloud-based database instance deployed in RDS. By default, this check box is cleared.
Note: You can also select this check box for an on-premises Oracle Database source to remotely access redo logs in certain situations, such as for testing purposes.
Snowflake Cloud Data Warehouse V2 connection properties
The following connection properties apply to the Snowflake Cloud Data Warehouse V2 connection type:
Runtime Environment - The name of the runtime environment where you want to run the tasks. You can specify a Secure Agent, Hosted Agent, or serverless runtime environment for a mapping. Specify a Secure Agent or a serverless runtime environment for an elastic mapping.
Authentication - The authentication method that the connector must use to log in to Snowflake. Select Standard. Default is Standard.
Username - The user name to connect to the Snowflake Cloud Data Warehouse account.
Password - The password to connect to the Snowflake Cloud Data Warehouse account.
Warehouse - The Snowflake warehouse name. You must specify the warehouse name.
Configure a database ingestion task
In the Mass Ingestion service, use the database ingestion task wizard to define the task. After you complete all wizard pages, save the information and then click Deploy to make the task available as an executable job to the Secure Agent.
Defining basic task information
To begin defining a database ingestion task, you must first enter some basic information about the task, such as a task
name, project or project folder location, and load operation type.
The Definition page of the Mass Ingestion Databases Task wizard appears.
2. Configure the following properties:
Location - The project or project\folder that will contain the task definition.
Runtime Environment - The runtime environment in which you want to run the task. A database ingestion task must run on a Secure Agent. The runtime environment can include a Secure Agent group with only one agent. You cannot use a Hosted Agent.
Load Type - The type of load operation that the database ingestion task performs. Options are:
- Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
- Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job propagates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
- Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to propagating incremental data changes made to the same source tables on a continuous basis.
3. Click Next.
Configuring the Oracle source
Configure the source on the Source page of the database ingestion task wizard.
1. In the Connection list, select the connection for the source system.
The connection must be predefined in Administrator for a runtime environment that your organization uses.
The list includes only the connection types that are valid for the load type selected on the Definition page. No
connections are listed if you did not select a load type.
If you change the load type and the selected connection is no longer valid, a warning message is issued and
the Connection field is cleared. You must select another connection that is valid for the updated load type.
Note: After you deploy the ingestion task, you cannot change the connection without first undeploying the
associated ingestion job. After you change the connection, you must deploy the task again.
2. In the Schema list, select the source schema that includes the source tables.
The list includes only the schemas that are available in the database that is accessed with the specified
source connection.
An expanded view of the Table Selection Rules area appears. By default, this area contains a single Include rule with a condition that specifies only the asterisk (*) wildcard character. This rule selects all tables in the specified source schema.
3. To select a subset of the source tables in the schema, you can define additional Include rules, Exclude rules,
or both types of rules.
Define the rules in the order in which you want them to be processed.
When you are done, click Table Count to display the number of source tables that match each rule. The Total
Tables field displays the total number of source tables that match all rules.
Important: Mass Ingestion Databases might exclude an unsupported type of table from processing even if
this table matches the selection rules.
4. For an incremental load operation or a combined initial and incremental load operation, if the Oracle
database option for enabling change data capture is not configured for one or more of the selected source
tables, Mass Ingestion Databases generates a script for setting the appropriate database option for the
tables. The script enables unconditional supplemental logging on each source table.
To configure and download or execute the script, perform the following steps:
a. In the CDC Script field, select one of the following options:
• Supplemental logging for all columns. Enables supplemental logging for all columns in the selected
source tables.
• Supplemental logging for primary key columns. Enables supplemental logging for only primary key
columns in the selected source tables.
Note: For source tables without a primary key, including tables with unique indexes, supplemental
logging for all columns is applied by default, regardless of which option is selected.
b. To run the script, click Execute.
c. If you do not have a database role or privilege that allows you to run the script, click Download to
download the script. Then ask your database administrator to run the script.
The script file name has the following format: cdc_script_taskname_number.txt.
Important: Make sure the script runs before you run the database ingestion task.
If you enabled supplemental logging for either all columns or primary key columns and then select the other
supplemental logging option and run the CDC script again, the script first drops supplemental logging for the
original set of columns and then enables supplemental logging for the current set of columns.
5. To create and download a list of the source tables that match the selection rules, perform the following
steps:
a. In the Table Names list, select the type of defined rules that you want to use to select the source tables
to list. Options are:
• Include Rules Only
• Exclude Rules Only
• Include And Exclude Rules
b. If you want to list the columns too, select the Columns check box.
c. Click Download.
The downloaded list has the following format. A sample row appears after the field descriptions below.
status,schema_name,table_name,object_type,column_name,comment
The following information is displayed in the downloaded list:
status - Indicates whether Mass Ingestion Databases excludes the source table or column from processing because it has an unsupported type. Valid values are:
- E. The object is excluded from processing by an Exclude rule.
- I. The object is included in processing.
- X. The object is excluded from processing because it is an unsupported type of object. For example, unsupported types of objects include columns with unsupported data types and tables that include only unsupported columns. The details are specified in the comment field.
object_type - Specifies the type of the source object. Valid values are:
- C. Column.
- T. Table.
column_name - Specifies the name of the source column. This information appears only if you selected the Columns check box.
comment - Specifies the reason why a source object of an unsupported type is excluded from processing even though it matches the selection rules.
For more information, see “Rules for selecting source tables” on page 21.
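For example, a downloaded list that was generated with the Columns check box selected might contain rows like the following. The schema, table, and column names and the comment wording are hypothetical:
I,SALES,ORDERS,T,,
I,SALES,ORDERS,C,ORDER_ID,
X,SALES,ORDERS,C,ORDER_DOC,Column has an unsupported data type (XMLTYPE)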
6. Under Advanced, for an incremental load operation or a combined initial and incremental load operation,
optionally set the Restart Point for Incremental Load property to customize the point in the source logs from
which the database ingestion job starts reading change records the first time it runs. Options are:
• Default. For Oracle, the default restart position is the approximate end of the current online redo log.
• Latest Available. The approximate end of the current Oracle online redo log.
• Position. A valid Oracle SCN that Mass Ingestion uses to determine a position in the change stream from which to start retrieving change records. The SCN must be equal to or less than the current SCN. An invalid value will cause the job to fail. Default is 0, which results in the default behavior.
• Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Mass Ingestion
uses to determine the position in the change stream from which to start retrieving change records. Mass
Ingestion retrieves only the changes that were started after this date and time. If you enter a date and
time earlier than the earliest date and time in the available archived logs, the job will fail.
The default value is Default.
Notes:
• This restart point option pertains only to the initial run of a job. Thereafter, if you resume a stopped or
aborted job, the job begins propagating source data from where it last left off.
• The initial load part of combined initial and incremental load jobs uses Oracle Flashback queries to get committed data that was current at a specific point in the change stream. If you select the Position or Specific Date and Time option, ensure that the position or date and time that you specify is within the flashback retention period of the Oracle source database. Also ensure that no source table is truncated during the flashback period. If these requirements are not met, the job will fail. You can use the Oracle DB_FLASHBACK_RETENTION_TARGET parameter to define how far back in time, in minutes, the database can be flashed back. The default value is 1440 minutes. A SQL sketch for looking up the current SCN and the flashback retention setting appears after this procedure.
7. Click Next.
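If you plan to use the Position or Specific Date and Time option, it can help to look up the current SCN and the flashback retention setting on the Oracle source first. The following queries are a minimal sketch; they assume a session with access to V$DATABASE and V$PARAMETER, and the timestamp shown is only an example:
-- Current SCN, the upper bound for the Position option
SELECT CURRENT_SCN FROM V$DATABASE;
-- Convert a date and time to an approximate SCN
SELECT TIMESTAMP_TO_SCN(TO_TIMESTAMP('2021-03-01 08:00', 'YYYY-MM-DD HH24:MI')) FROM DUAL;
-- How far back, in minutes, the database can be flashed back
SELECT VALUE FROM V$PARAMETER WHERE NAME = 'db_flashback_retention_target';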
Rules for selecting source tables
By default, an Include rule that contains only the asterisk (*) wildcard character is provided. This rule selects all tables in the source schema. To narrow the source tables to be processed by the task, you can define additional Include rules, Exclude rules, or both types of rules.
To define a rule, click the + icon to add a row for the rule. In the row, select the rule type in the Operator column. In the
corresponding Condition column, you can enter a specific table name or a table-name mask. A mask can contain the
asterisk (*) wildcard to represent one or more characters, the question mark (?) wildcard to represent a single
character, or both types of wildcards. A wildcard can occur multiple times in a value and can occur anywhere in the
value.
The task wizard is case sensitive. Enter the table names or masks in the case with which the tables were defined. Do
not include delimiters such as quotation marks or brackets, even if the source database uses them. For example,
Oracle requires quotation marks around lowercase and mixed-case names to force the names to be stored in
lowercase or mixed case. However, in the task wizard, you must enter the lowercase or mixed-case names without
quotation marks. Also, if a table name includes special characters, escape each special character in the name with a
backslash (\) when you enter the rule.
The rules are processed in the order that they are listed. You can change the order by selecting a rule row and clicking
the up-arrow or down-arrow icon button.
Example:
Assume that 1,000 tables are in the source schema. These tables have different prefixes. You want to select the three
tables that have the prefix "2019_SALES" and all tables that match other prefixes except "2019_".
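One rule set that could satisfy this example is sketched below, assuming that rules are applied in the order listed and that a later Include rule can re-select tables removed by an earlier Exclude rule:
Include   *
Exclude   2019_*
Include   2019_SALES*
The first rule selects all tables in the schema, the second rule removes every table whose name starts with 2019_, and the third rule adds back the three tables whose names start with 2019_SALES.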
Configuring the Snowflake target
Configure the target on the Target page of the database ingestion task wizard.
The following Snowflake target properties appear under Target:
Target Creation - The only available option is Create Target Tables, which generates the target tables based on the source tables.
Schema - Select the target schema in which Mass Ingestion Databases creates the target tables.
Stage - The name of the internal staging area that holds the data read from the source before the data is written to the target tables. This name must not include spaces. If the staging area does not exist, it will be created automatically.
To create a rule for renaming tables, first specify the source tables to which the target tables correspond. Enter only the
asterisk (*) wildcard to select all source tables that match the selection criteria on the Source page. Alternatively, enter
a specific source table name or a table-name pattern that includes the asterisk (*) wildcard, for example, CA*.
If you want to use a table-name pattern with the wildcard character in the Target Table column, you must also use the
wildcard character in the corresponding Source Table value. If you use a specific source table name with a target table
pattern that includes the wildcard character, the task deployment will fail.
You can define multiple table rules. The order of the rules does not matter with regard to how they are processed
unless a table matches multiple rules. In this case, the last matching rule determines the name of the table.
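For example, a single rename rule along the following lines would add a prefix to every generated target table name, assuming that the asterisk in the Target Table value is replaced with the matching source table name. The ORA_ prefix is purely illustrative:
Source Table: *
Target Table: ORA_*
With this rule, a source table named ORDERS would produce a target table named ORA_ORDERS.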
1. Under Advanced, set the Number of Rows in Output File value to specify the maximum number of rows that
the database ingestion task writes to an output data file for a Snowflake target.
For incremental load operations and combined initial and incremental load operations, change data is flushed
to the target either when this number of rows is reached or when the flush latency period expires and the job
is not in the middle of processing a transaction. The flush latency period is the time that the job waits for
more change data before flushing data to the target. The latency period is internally set to 10 seconds and
cannot be changed.
Valid values are 1 through 100000000. The default value is 100000 rows.
Note: For Snowflake targets, the data is first stored in an internal stage area before being written to the target
tables.
2. Under Schema Drift Options, specify the schema drift options for each of the supported types of DDL
operations.
Note: Schema drift options are supported only for database ingestion incremental load tasks and combined
initial and incremental load tasks.
The following schema drift options are available:
Ignore - Does not replicate DDL changes that occur on the source database to the target. This option is the default option for the Drop Column and Rename Column operation types.
Stop Table - Stops processing the source table on which the DDL change occurred. When one or more of the tables are excluded from replication because of the Stop Table schema drift option, the job state changes to Running with Warning.
Important: The database ingestion job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resume With Options > Resync option.
Replicate - Allows the database ingestion job to replicate the DDL change to the target. This option is the default option for the Add Column and Modify Column operation types.
Important: If you choose to replicate a type of schema change that is not supported on the target, the database ingestion job ends with an error.
3. Click Save.
Deploy the database ingestion task
After you define a database ingestion task and save it, deploy the task to create an executable job instance on the on-
premises system that contains the Secure Agent and the Mass Ingestion Databases agent and DBMI packages. You
must deploy the task before you can run the job. The deploy process also validates the task definition.
Before you deploy a task with a Snowflake target, drop any existing target tables that do not match the structure of the source tables, for example, because of added source columns, dropped source or target columns, or altered column null constraints or data types. When you deploy the task, the target tables are then generated based on the source table selections and target renaming rules.
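For example, to drop a mismatched target table before redeploying, you might run a statement like the following in Snowflake, where HR_DATA.EMPLOYEES is a hypothetical target schema and table:
DROP TABLE IF EXISTS HR_DATA.EMPLOYEES;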
• To deploy a task, in the database ingestion task wizard, save the completed task definition and then click Deploy.
Note: If you included spaces in the database ingestion task name, the spaces are omitted from the corresponding job
name for the generated job instance.
After you deploy a task successfully, the task is in the Deployed state and you can run it from the My Jobs page in
Mass Ingestion or from the All Jobs tab on the Mass Ingestion page in Monitor.
Run the database ingestion job
You can run a job from either the My Jobs page in the Mass Ingestion service or from the All Jobs tab on the Mass Ingestion page in Monitor.
1. Navigate to the row for the job that you want to run.
2. In the Actions menu for the row, click Run.
A subtask is started for each source table.
Notes:
• If the initial load portion of a combined initial and incremental load job fails to load data from a source
table to a target table, the database ingestion job retries the subtask for the table up to three times. The
interval between retries is a minimum of 60 seconds. If all of the initial load retries fail, the subtask
acquires the state of Error and the table is excluded from replication. The job then tries to proceed with
incremental loading. In this case, the job status changes to Running with Warning.
• For initial load and combined tasks, the initial load might take a long time to perform if the source tables
contain many rows.
On the My Jobs page in Mass Ingestion, you can monitor the ingestion jobs for the tasks that you created
and deployed.
Other resources
For the latest information, see the Mass Ingestion and Administrator help information, which is accessed from within
the Informatica Intelligent Cloud Services interface.
For a high-level overview, see the Mass Ingestion Databases Quick Start, which is available from the Informatica
Documentation Portal right pane.
Author
Informatica Documentation Team