DGC - Sources November2023 ApacheAtlasSources en
November 2023
This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be
reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC.
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial
computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such,
the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the
extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License.
Informatica, Informatica Cloud, Informatica Intelligent Cloud Services, PowerCenter, PowerExchange, and the Informatica logo are trademarks or registered trademarks
of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html. Other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product.
The information in this documentation is subject to change without notice. If you find any problems in this documentation, report them to us at
[email protected].
Informatica products are warranted according to the terms and conditions of the agreements under which they are provided. INFORMATICA PROVIDES THE
INFORMATION IN THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.
Preface
Read Apache Atlas Sources to learn how to register and configure Apache Atlas sources in Metadata
Command Center as catalog sources. After you configure a catalog source, you extract metadata and then
view the results in Data Governance and Catalog.
Chapter 1
A source system is any system that contains data or metadata. For example, Apache Atlas is a source
system from which you can extract metadata through an Apache Atlas catalog source with Metadata
Command Center. A catalog source is an object that represents and contains metadata from the source
system.
Before you extract metadata from a source system, you first create and register a catalog source that
represents the source system. You can configure capabilities that represent tasks that the catalog source
can perform.
When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted
metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets,
viewing lineage, and creating links between those assets and their business context.
The following image shows the process to extract metadata from an Apache Atlas source system:
After you verify prerequisites, perform the following tasks to extract metadata from Apache Atlas:
1. Register a catalog source. Create a catalog source object, select the source system, and specify values
for connection properties.
2. Configure the catalog source. Specify the runtime environment, optionally configure parameters for the
metadata extraction capability, and add filters for metadata extraction.
3. Associate stakeholders. Optionally, associate users with technical assets, giving the users permission to
perform actions determined by their roles.
4. Run or schedule the catalog source job.
5. Optionally, assign a connection to referenced source system assets.
After you run the catalog source job, you view the results in Data Governance and Catalog.
Apache Atlas is the governance and metadata framework for Hadoop. Apache Atlas has a scalable and
extensible architecture that can be plugged into many Hadoop components to manage their metadata in a
central repository.
Extracted metadata
You can extract metadata from an Apache Atlas source system.
Objects extracted
Metadata Command Center extracts the following metadata from an Apache Atlas source system:
• Atlas Server
• Hive Process
• Sqoop Process
• Calculation
Note: Calculation objects are extracted when there is column-level lineage from one asset to another in
Hive and Sqoop processes.
• Spark Application
• Spark Process
The Apache Atlas catalog source extracts data lineage from the following data sources:
• Oracle
• MySQL
• PostgreSQL
• Apache Hive
• Hadoop Distributed File System (HDFS)
• Apache HBase
The Apache Atlas catalog source extracts lineage for Hive processes with the following operation types:
• CREATETABLE
• CREATEVIEW
• CREATE_MATERIALIZED_VIEW
Metadata Command Center extracts folders as reference objects from Hadoop Distributed File System.
Metadata Command Center extracts the following objects as reference objects from Apache Hive:
• Schema
• Table
• View
• External Table
• Column
Field and column objects are extracted when there is column-level lineage from one asset to another in
Apache Atlas.
Chapter 2
Before you extract metadata from an Apache Atlas source system, complete the following prerequisite tasks:
• Verify authentication.
• Verify permissions.
• Import the SSL certificate to the JRE folder of the Informatica Secure Agent.
• Get Apache Atlas source information.
Verify authentication
To extract Apache Atlas metadata, verify that you have the URL to access Apache Atlas and connect to the
Atlas REST API.
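For example, you can send a request to the Atlas REST API to confirm that the URL is reachable. The following command is a minimal sketch that assumes the default Atlas server port 21000, a placeholder host name, and a valid Kerberos ticket in the credential cache:
# Returns a JSON response with the Atlas version if the URL and authentication are correct
curl --negotiate -u : "https://<atlas_host>:21000/api/atlas/admin/version"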
You need to provide the Kerberos principal for authentication when you configure the Apache Atlas catalog
source in Metadata Command Center.
• Add the details of the Kerberos server to the hosts file on the Secure Agent machine in the following format: <ip_address> <hostname>
On a Windows machine, the hosts file is available in the following path: C:\Windows\System32\drivers\etc\hosts
On a Linux machine, the hosts file is available in the following path: /etc/hosts
• Copy the Atlas Keytab file from the Hadoop cluster to any location on the Secure Agent machine.
• Enable the Atlas hook in the Hadoop Distributed File System and Apache Hive configurations so that Apache Atlas can read the metadata.
• Copy the Kerberos configuration file from the Hadoop cluster to any location on the Secure Agent machine. You can modify the Kerberos configuration file as required.
The following code shows a sample Kerberos configuration file:
[libdefaults]
default_realm = *****
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = rc4-hmac
default_tkt_enctypes = rc4-hmac
permitted_enctypes = rc4-hmac
udp_preference_limit = 1
kdc_timeout = 3000
allow_weak_crypto = true

[realms]
<domain name> = {
    kdc = *****
    admin_server = *****
}

[domain_realm]
Note: If the Kerberos encryption algorithms are not compatible with Java Standard Edition version 11, you
can add the allow_weak_crypto=true property in the Kerberos configuration file.
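Optionally, you can confirm that the keytab file and the Kerberos configuration file work together before you configure the catalog source. The following commands are a minimal sketch; the file paths, principal, and realm are placeholders:
# Point the Kerberos tools at the copied configuration file
export KRB5_CONFIG=/opt/infa/krb5.conf
# Obtain a ticket with the copied keytab file
kinit -kt /opt/infa/atlas.service.keytab atlas/<hostname>@<REALM>
# List the acquired ticket to confirm that authentication succeeded
klist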
Verify permissions
Verify that you have the following account and permissions:
• A user account to access and extract metadata from the Apache Atlas source system.
• Read permission for the account to access the Apache Atlas source system.
5. Run the following command to import the SSL certificate:
keytool -import -alias <alias name> -keystore <path to cacert file> -file <absolute path to SSL certificate>
Note: The Java certificate file is named cacerts and is located in the following Java directory: \jre\lib\security\cacerts
For example, you can run the following command on Windows operating systems:
keytool -import -alias aliasname -keystore "C:\data\devprod\jdk\jre\lib\security\cacerts" -file "C:\data\devprod\filename.crt"
Note: You don't need to create a connection object for Apache Atlas. You provide this information when you
configure the catalog source.
Get the following property values:
• Base URL. URL to access Apache Atlas and connect to the Atlas REST API.
• Keytab File Path. The absolute path to the Kerberos keytab file located on the Secure Agent machine, used for authentication.
• Configuration File Path. The absolute path to the Kerberos configuration file located on the Secure Agent machine, used for authentication.
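For example, the property values might look like the following. The host name and file paths are placeholders, and 21000 is the default Atlas server port:
Base URL: https://atlas.example.com:21000
Keytab File Path: /opt/infa/atlas.service.keytab
Configuration File Path: /opt/infa/krb5.conf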
When you configure a catalog source, you define the source system from which you want to extract
metadata. Configure filters to include or exclude source system metadata before you run the job.
To provide stakeholders access to technical assets, you can assign access through roles. To view lineage for
any system that the source system references, create a catalog source and a connection associated with the
referenced source system after you run the job.
3. Click New from the menu.
4. Select Catalog Source from the list of asset types.
5. Select Apache Atlas from the list of source systems.
6. Click Create.
The following image shows the Apache Atlas registration information:
7. In the General Information area, enter a name and an optional description for the catalog source.
Note: After you create a catalog source, you can't change the name.
8. In the Connection Information area, enter the Apache Atlas connection information based on the
connection values that you got from the administrator.
Enter the following properties:
• Base URL. URL to access Apache Atlas and connect to the Atlas REST API.
• Keytab File Path. The absolute path to the Kerberos keytab file located on the Secure Agent machine, used for authentication.
• Configuration File Path. The absolute path to the Kerberos configuration file located on the Secure Agent machine, used for authentication.
9. Click Next.
The Configuration page appears.
The metadata extraction capability extracts source metadata from external source systems.
Before you configure metadata extraction, configure runtime environments in the Informatica Intelligent
Cloud Services Administrator.
1. In the Connection and Runtime area, choose the Secure Agent group where you want to run catalog
source jobs.
2. Use the Metadata Change Option to choose whether the catalog retains or deletes objects that are deleted from the source:
• Retain. The catalog retains objects that are deleted from the source. If you update or add a filter, the catalog retains objects extracted from the previous job and extracts additional objects that match the current filter. Objects deleted from the source are not deleted from the catalog. Enrichments added to deleted objects and relationships are retained.
• Delete. The catalog deletes metadata based on objects deleted from the source and on changes that you make to the filter. Enrichments added to deleted objects and relationships are permanently lost. Objects renamed in the source are removed and recreated in the catalog.
Note: You can also change the configured metadata change option when you run a catalog source.
3. In the Filters area, define one or more filter conditions to apply for metadata extraction:
a. From the Include or Exclude metadata list, choose to include or exclude metadata based on the filter
parameters.
b. From the Object type list, select Hive Database, HDFS Path, or HBase Namespace.
c. Enter the filter values.
Filters can contain the following wildcards:
• Asterisk (*). Represents one or more characters.
• Question mark (?). Represents a single character.
Exclude filter conditions are applied only if the assets in the include filter conditions are not related or linked through lineage to the excluded assets. For example, add a filter condition to include metadata for all tables named EMP across all databases (*.EMP), and then add another filter condition to exclude metadata for the EMP table in the HR database (HR.EMP). Here, the exclude filter condition is applied because the assets are not related or linked through lineage.
Exclude filter conditions are not applied if the assets in the include filter conditions are related or linked through lineage to the excluded assets. For example, add a filter condition to include metadata for the EMP table in the HR database (HR.EMP), and then add another filter condition to exclude metadata for the SAL table in the same database (HR.SAL). Here, the exclude filter condition is not applied because of the lineage links between the EMP and SAL tables.
If you add a filter condition to include metadata from a table deleted from the Apache Atlas source
system, Metadata Command Center ignores the filter condition.
If the value of the HDFS Path filter contains special characters, replace the special characters with an
asterisk wildcard character. For example, replace /Test$~^!()*<>_Folder with /Test*Folder.
4. In the Configuration Parameters area, enter configuration properties.
Note: Click Show Advanced to view all configuration parameters.
• Lineage Direction. The direction of data flow between assets that you extract from Apache Atlas, set with the direction parameter of the Lineage REST API. Select one of the following options:
- BOTH. Extracts both input and output data flow between assets.
- INPUT. Extracts only input data flow between assets.
- OUTPUT. Extracts only output data flow between assets.
• Lineage Depth. The number of lineage hops to extract from Apache Atlas for filtered assets, set with the depth parameter of the Lineage REST API. Default is 3.
• Page Result Limit. Advanced parameter. The maximum number of search result entries per page in a fetch that uses the limit parameter of the Discovery REST API. Default is 1000.
• Entity Bulk Fetch Count. Advanced parameter. The maximum number of entities to include in a bulk fetch when you use the Bulk Entity REST API. Default is 100.
• Connection Timeout. Advanced parameter. The maximum amount of time, in milliseconds, that the Secure Agent waits to set up an HTTP connection to communicate with the Apache Atlas server and get a response. Default is -1, which means that the timeout is disabled.
• Parallel Lineage Fetch Count. Advanced parameter. The maximum number of Lineage REST API calls that can run simultaneously to retrieve lineage data. Default is 5.
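These parameters correspond to query parameters of the Apache Atlas REST APIs. The following commands are a sketch of how the Lineage Direction, Lineage Depth, and Page Result Limit values map to REST calls; the host name and asset GUID are placeholders:
# Lineage call that mirrors Lineage Direction = BOTH and Lineage Depth = 3
curl --negotiate -u : "https://<atlas_host>:21000/api/atlas/v2/lineage/<guid>?direction=BOTH&depth=3"
# Basic search call that mirrors Page Result Limit = 1000
curl --negotiate -u : "https://<atlas_host>:21000/api/atlas/v2/search/basic?typeName=hive_table&limit=1000"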
5. Optionally, enter expert parameters:
• Expert Parameters. Additional configuration options to pass at runtime. Required if you need to troubleshoot the catalog source job.
Caution: Use expert parameters only when Informatica Global Customer Support recommends them.
6. Click Next.
The Associations page appears.
Verify that the organization administrator assigned users and user groups to the role that you want to
associate with technical assets.
4. Select one or more users or user groups to assign as stakeholders for the technical assets, and click OK.
Only the selected users and user groups belonging to the specified role are granted the role-defined
permissions to technical assets.
5. You can assign more than one role to technical assets and add users and user groups from each role. To
assign more roles, click the add button.
6. Choose to save and run the job or to schedule a recurring job.
• To save and run the job, click Save and then Run.
• To schedule a recurring job, click Next to open the Schedule page.
The first time that you run the job, Metadata Command Center extracts the metadata. Subsequently, each
time that you run the job, Metadata Command Center synchronizes the catalog with the source system.
1. On the Schedule page, click the Run on Schedule checkbox to schedule the job.
The Schedule configuration page opens.
2. Enter the start date, time zone, and the interval at which you want to run the job.
3. Click Save to save the schedule.
Before you assign a connection, create a catalog source for each reference source system and run the
catalog source job.
Note: You can view the lineage with reference objects without creating a connection assignment. After
connection assignment, you can view the actual objects.
Apache Atlas uses Sqoop queries to import data from a reference source system to a Hive database. If the
Sqoop query contains double quotes, replace the double quotes with backticks (`) in the Apache Atlas source
system to view the reference objects correctly in Data Governance and Catalog.
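For example, a hypothetical Sqoop import with a free-form query would change as follows. The connection string, database, table, and column names are placeholders:
# Before: double quotes around identifiers in the query
sqoop import --connect "jdbc:mysql://<host>/hr" --query 'SELECT "EMP_ID", "EMP_NAME" FROM EMP WHERE $CONDITIONS' --target-dir /user/hive/emp --split-by EMP_ID
# After: backticks in place of double quotes
sqoop import --connect "jdbc:mysql://<host>/hr" --query 'SELECT `EMP_ID`, `EMP_NAME` FROM EMP WHERE $CONDITIONS' --target-dir /user/hive/emp --split-by EMP_ID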
3. In the Assign Connection dialog box, select one or more endpoint objects to assign to the selected
connection and click Assign.
You can filter the list in the Assign Connection dialog box by name, type, or endpoint.
You can create a connection assignment to the following catalog source types:
• Apache Hive. The target endpoint object must belong to the Database class type.
• Hadoop Distributed File System. The target endpoint object must belong to the File System class
type.
• Oracle. The target endpoint object must belong to the Database class type.
• MySQL. The target endpoint object must belong to the Database class type.
• PostgreSQL. The target endpoint object must belong to the Database class type.
Note: You can assign connections to Oracle, MySQL, and PostgreSQL catalog sources only when
Metadata Command Center extracts Sqoop processes from an Apache Atlas source system.
When you click Assign, Metadata Command Center creates links between matching objects in the
connected catalog sources, and it calculates the percentage of matched and unmatched objects. The
higher the percentage of matched objects, the more accurate the lineage that you view in Data
Governance and Catalog.
When referenced source systems are connected to a catalog source, you can expand the hierarchy to see
details about the technical asset's component elements.
You can view the data lineage of an asset contained within a catalog source to see individual elements such
as data sources, calculations, and filters. When you view data lineage, you can see the individual upstream
elements that contribute data or expressions to each component of a data flow or catalog source.
3. On the Data Governance and Catalog home page, click the number in the Technical Assets panel.
The Technical Assets page opens.
4. Select Catalog Source in the Filter list.
The list of catalog sources opens.
5. Search for the catalog source from which you extracted metadata, and click the name.
The Overview tab of the asset opens.
The following image shows a sample asset page:
Data lineage is a visual representation of the flow of data across the systems in your organization. Lineage
depicts how the data flows from the system of its origin to the system of its destination.
To view data lineage at the source or target level, search for and open a technical asset, and then click the
Lineage tab.
The following image shows how the LOAD Hive process loads data from the ORDERS.avro reference source
file to the avro_load reference target table before connection assignment:
The following image shows how the LOAD Hive process loads data from the ORDERS.avro actual source file
to the avro_load actual target table after connection assignment:
Data sets are technical assets that contain sets of data. Examples include files, databases, or temp files that
hold the results of calculations. Data elements are objects upstream or downstream of a data set, and are
accessible when you expand a data set to the data element level. For example, a column in a source object.
The following image shows table-level lineage where the avro_table referenced table gets data from the
csv_format_parent referenced table after data transformation using the storageformats.avro Hive process
before connection assignment:
The following image shows table-level lineage where the avro_table actual table gets data from the
csv_format_parent actual table after data transformation using the storageformats.avro Hive process after
connection assignment:
The following image shows column-level lineage where the col_decimal referenced column of the avro_table
gets data from the col_decimal referenced column of the csv_format_parent table after data transformation
using the storageformats.avro Hive process before connection assignment:
The following image shows column-level lineage where the col_decimal actual column of the avro_table gets
data from the col_decimal actual column of the csv_format_parent table after data transformation using the
storageformats.avro Hive process after connection assignment: