
Informatica® Metadata Command Center

November 2023

Apache Atlas Sources


Informatica Metadata Command Center Apache Atlas Sources
November 2023
© Copyright Informatica LLC 2023

This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be
reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC.

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial
computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such,
the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the
extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License.

Informatica, Informatica Cloud, Informatica Intelligent Cloud Services, PowerCenter, PowerExchange, and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html. Other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product.

The information in this documentation is subject to change without notice. If you find any problems in this documentation, report them to us at infa_documentation@informatica.com.

Informatica products are warranted according to the terms and conditions of the agreements under which they are provided. INFORMATICA PROVIDES THE INFORMATION IN THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.

Publication Date: 2023-11-20


Table of Contents

Preface

Chapter 1: Introduction to Apache Atlas catalog sources
    Extraction and view process
    About the Apache Atlas catalog source
    Extracted metadata

Chapter 2: Before you begin
    Verify authentication
    Verify permissions
    Import the SSL certificate to the Secure Agent machine
    Get Apache Atlas source information

Chapter 3: Create catalog sources in Metadata Command Center
    Step 1. Register a catalog source
    Step 2. Configure capabilities
        Configure metadata extraction
    Step 3. Associate stakeholders with technical assets
    Step 4. Run or schedule the job
    Step 5. Connect to referenced source systems

Chapter 4: View results in Data Governance and Catalog
    View metadata extraction results
    View data lineage
        View source lineage
        View lineage at data set level and data element level
Preface
Read Apache Atlas Sources to learn how to register and configure Apache Atlas sources in Metadata
Command Center as catalog sources. After you configure a catalog source, you extract metadata and then
view the results in Data Governance and Catalog.

Chapter 1

Introduction to Apache Atlas catalog sources
You can use Metadata Command Center to extract metadata from a source system.

A source system is any system that contains data or metadata. For example, Apache Atlas is a source
system from which you can extract metadata through an Apache Atlas catalog source with Metadata
Command Center. A catalog source is an object that represents and contains metadata from the source
system.

Before you extract metadata from a source system, you first create and register a catalog source that
represents the source system. You can configure capabilities that represent tasks that the catalog source
can perform.

When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted
metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets,
viewing lineage, and creating links between those assets and their business context.

Extraction and view process


To extract metadata from a source system, configure the catalog source and run the extraction job in
Metadata Command Center. Then view the results in Data Governance and Catalog.

The following image shows the process to extract metadata from an Apache Atlas source system:

After you verify prerequisites, perform the following tasks to extract metadata from Apache Atlas:

1. Register a catalog source. Create a catalog source object, select the source system, and specify values
for connection properties.
2. Configure the catalog source. Specify the runtime environment, optionally configure parameters for the
metadata extraction capability, and add filters for metadata extraction.
3. Associate stakeholders. Optionally, associate users with technical assets, giving the users permission to
perform actions determined by their roles.
4. Run or schedule the catalog source job.
5. Optionally, assign a connection to referenced source system assets.

After you run the catalog source job, you view the results in Data Governance and Catalog.

About the Apache Atlas catalog source


You can use the Apache Atlas catalog source to extract metadata from an Apache Atlas source system.

Apache Atlas is the governance and metadata framework for Hadoop. Apache Atlas has a scalable and
extensible architecture that can be plugged into many Hadoop components to manage their metadata in a
central repository.

Extracted metadata
You can extract metadata from an Apache Atlas source system.

Objects extracted
Metadata Command Center extracts the following metadata from an Apache Atlas source system:

• Atlas Server
• Hive Process
• Sqoop Process
• Calculation
Note: Calculation objects are extracted when there is column-level lineage from one asset to another in
Hive and Sqoop processes.
• Spark Application
• Spark Process
The Apache Atlas catalog source extracts data lineage from the following data sources:

• Oracle
• MySQL
• PostgreSQL
• Apache Hive
• Hadoop Distributed File System (HDFS)
• Apache HBase



Note: Metadata Command Center skips extraction of Hive processes and the associated lineage links for the
following operation types:

• CREATETABLE
• CREATEVIEW
• CREATE_MATERIALIZED_VIEW

Metadata Command Center extracts folders as reference objects from Hadoop Distributed File System.

Metadata Command Center extracts the following objects as reference objects from Apache Hive:

• Schema
• Table
• View
• External Table
• Column

Field and column objects are extracted when there is column-level lineage from one asset to another in
Apache Atlas.

Chapter 2

Before you begin


Before you can extract catalog source metadata, get information from the Apache Atlas administrator.

Perform the following prerequisite tasks:

• Verify authentication.
• Verify permissions.
• Import the SSL certificate to the JRE folder of the Informatica Secure Agent.
• Get Apache Atlas source information.

Verify authentication
To extract Apache Atlas metadata, verify that you have the URL to access Apache Atlas and connect to the
Atlas REST API.

You need to provide the Kerberos principal for authentication when you configure the Apache Atlas catalog
source in Metadata Command Center.
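Before you configure the catalog source, you can confirm that the base URL responds by querying the Atlas version endpoint. The following commands are a sketch; the host name, credentials, and port are placeholders, and 21443 is only the common SSL-enabled default for Apache Atlas:

curl -k -u <user>:<password> "https://<atlas-host>:21443/api/atlas/admin/version"

On a Kerberos-enabled cluster, authenticate with a ticket instead of basic credentials:

curl -k --negotiate -u : "https://<atlas-host>:21443/api/atlas/admin/version"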

Complete the following tasks:

• Add details of the Kerberos server to the hosts file on the Secure Agent machine in the following format: <ip_address> <hostname>
On a Windows machine, the hosts file is available at the following path: C:\Windows\System32\drivers\etc\hosts
On a Linux machine, the hosts file is available at the following path: /etc/hosts
• Copy the Atlas Keytab file from the Hadoop cluster to any location on the Secure Agent machine.
• Enable the Atlas hook in the Hadoop Distributed File System and Apache Hive configurations so that Apache Atlas can read the metadata.
• Copy the Kerberos configuration file from the Hadoop cluster to any location on the Secure Agent machine. You can modify the Kerberos configuration file as required.
The following code shows a sample Kerberos configuration file:
[libdefaults]
default_realm = *****
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = rc4-hmac
default_tkt_enctypes = rc4-hmac
permitted_enctypes = rc4-hmac
udp_preference_limit = 1
kdc_timeout = 3000
allow_weak_crypto = true

[realms]
<domain name> = {
    kdc = *****
    admin_server = *****
}

[domain_realm]
Note: If the Kerberos encryption algorithms are not compatible with Java Standard Edition version 11, you
can add the allow_weak_crypto=true property in the Kerberos configuration file.
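For example, on a Linux Secure Agent machine you can confirm that the copied keytab and Kerberos configuration files work before you configure the catalog source. This is a sketch; the address, realm, and file paths are placeholders for your environment:

# Map the KDC host name (requires root privileges)
echo "10.0.0.15 kdc.example.com" >> /etc/hosts

# Point Kerberos tools at the copied configuration file, then obtain a ticket
export KRB5_CONFIG=/opt/kerberos/krb5.conf
kinit -kt /opt/kerberos/atlas.service.keytab atlas/host.example.com@EXAMPLE.COM
klist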

Verify permissions
Verify that you have the following account and permissions:

• A user account to access and extract metadata from the Apache Atlas source system.
• Read permission for the account to access the Apache Atlas source system.

Import the SSL certificate to the Secure Agent machine
If SSL is enabled on the Apache Atlas source system, import the SSL certificate to the JRE folder of the
Informatica Secure Agent installation directory.

Complete the following steps to import the SSL certificate:

1. Download the SSL certificate from the Apache Atlas installation.


2. Copy the SSL certificate file to any location on the Informatica Secure Agent machine.
3. Identify the Java version for the Secure Agent.
You can identify the Java version from the following files:
• <Informatica Secure Agent installation directory>\apps\agentcore\agentcore.log
Search for "AgentCore JRE version".
• On Windows operating systems, open the lcm-env.bat file from one of the following locations:
- <Informatica Secure Agent installation directory>\apps\agentcore\<latest version>\.lcm
- <Informatica Secure Agent installation directory>\apps\DIS\<latest version>\.lcm
• On Linux operating systems, open the lcm-env.sh file from one of the following locations:
- <Informatica Secure Agent installation directory>/apps/agentcore/<latest version>/.lcm
- <Informatica Secure Agent installation directory>/apps/DIS/<latest version>/.lcm

4. Open a command prompt from the following directory:


<Informatica Secure Agent installation directory>\apps\jdk\<latest version>\jre\bin

5. Run the following command to import the SSL certificate:
keytool -import -alias <alias name> -keystore <path to cacerts file> -file <absolute path to SSL certificate>

Note: The Java certificate file is named cacerts and is located in the following Java directory: \jre\lib\security\cacerts
For example, you can run the following command on Windows operating systems:
keytool -import -alias aliasname -keystore "C:\data\devprod\jdk\jre\lib\security\cacerts" -file "C:\data\devprod\filename.crt"

6. Restart the Secure Agent.

Get Apache Atlas source information


Before you configure the catalog source, ask the Apache Atlas administrator for connection information that
you need to configure the catalog source.

Note: You don't need to create a connection object for Apache Atlas. You provide this information when you
configure the catalog source.

The following list describes the properties that you need:

• Base URL. URL to access Apache Atlas and connect to the Atlas REST API.
• Principal. The Kerberos principal used for authentication.
• Keytab File Path. The absolute path to the Kerberos keytab file located on the Secure Agent machine, used for authentication.
• Configuration File Path. The absolute path to the Kerberos configuration file located on the Secure Agent machine, used for authentication.



Chapter 3

Create catalog sources in Metadata Command Center
Use Metadata Command Center to configure a catalog source for Apache Atlas and extract metadata.

When you configure a catalog source, you define the source system from which you want to extract
metadata. Configure filters to include or exclude source system metadata before you run the job.

To provide stakeholders access to technical assets, you can assign access through roles. To view lineage for
any system that the source system references, create a catalog source and a connection associated with the
referenced source system after you run the job.

Step 1. Register a catalog source


When you register a catalog source, provide general information and connection values.

1. Log in to Informatica Intelligent Cloud Services.


The My Services page appears.
2. Click Metadata Command Center.
The following image shows the Metadata Command Center box on the My Services page:

3. Click New from the menu.
4. Select Catalog Source from the list of asset types.
5. Select Apache Atlas from the list of source systems.

6. Click Create.
The following image shows the Apache Atlas registration information:

7. In the General Information area, enter a name and an optional description for the catalog source.
Note: After you create a catalog source, you can't change the name.
8. In the Connection Information area, enter the Apache Atlas connection information based on the
connection values that you got from the administrator.
The following list describes the properties to configure:

• Base URL. URL to access Apache Atlas and connect to the Atlas REST API.
• Principal. The Kerberos principal used for authentication.
• Keytab File Path. The absolute path to the Kerberos keytab file located on the Secure Agent machine, used for authentication.
• Configuration File Path. The absolute path to the Kerberos configuration file located on the Secure Agent machine, used for authentication.

9. Click Next.
The Configuration page appears.

Step 2. Configure capabilities


When you configure the Apache Atlas catalog source, you define the settings for the metadata extraction
capability.

The metadata extraction capability extracts source metadata from external source systems.

Configure metadata extraction


When you configure the Apache Atlas catalog source, you choose a runtime environment, define filters, and
enter configuration parameters for metadata extraction.

Before you configure metadata extraction, configure runtime environments in the Informatica Intelligent
Cloud Services Administrator.

1. In the Connection and Runtime area, choose the Secure Agent group where you want to run catalog
source jobs.
2. Use the Metadata Change Option to choose whether the catalog retains or deletes objects that are deleted from the source.
• Retain. Retains objects in the catalog even when they are deleted from the source. If you update or add a filter, the catalog retains objects extracted from the previous job and extracts additional objects that match the current filter. Objects deleted from the source are not deleted from the catalog, and enrichments added on deleted objects and relationships are retained.
• Delete. Deletes metadata from the catalog based on objects deleted from the source and changes you make to the filter. Enrichments added on deleted objects and relationships are permanently lost. Objects renamed in the source are removed and recreated in the catalog.
Note: You can also change the configured metadata change option when you run a catalog source.
3. In the Filters area, define one or more filter conditions to apply for metadata extraction:
a. From the Include or Exclude metadata list, choose to include or exclude metadata based on the filter
parameters.
b. From the Object type list, select Hive Database, HDFS Path, or HBase Namespace.
c. Enter the filter values.
Filters can contain the following wildcards:
• Question mark (?). Represents a single character.
• Asterisk (*). Represents multiple characters or empty text.
The following image shows the filter condition options:

d. To define an additional filter with an OR condition, click the Add icon.


The following image shows that the filter includes metadata related to Hive tables in the HR
database with names that start with EMP followed by a single character, includes metadata related
to the table named HbaseTable located in the HbaseNS namespace, and excludes metadata related
to all files in the hdfsfolder1 folder and its subfolders:

Exclude filter conditions apply only if the assets in the include filter conditions are not related or linked through lineage to the excluded assets. For example, add a filter condition to include metadata for all tables named EMP across all databases (*.EMP), and then add another filter condition to exclude metadata for the EMP table located in the HR database (HR.EMP). Here, the exclude filter condition is applied because the assets are not related or linked through lineage.
Exclude filter conditions are ignored if the assets in the include filter conditions are related or linked through lineage to the excluded assets. For example, add a filter condition to include metadata for the EMP table in the HR database (HR.EMP), and then add another filter condition to exclude metadata for the SAL table in the same HR database (HR.SAL). Here, the exclude filter condition is ignored because of the lineage links between the EMP and SAL tables.
If you add a filter condition to include metadata from a table deleted from the Apache Atlas source
system, Metadata Command Center ignores the filter condition.
If the value of the HDFS Path filter contains special characters, replace the special characters with an
asterisk wildcard character. For example, replace /Test$~^!()*<>_Folder with /Test*Folder.
4. In the Configuration Parameters area, enter configuration properties.
Note: Click Show Advanced to view all configuration parameters.



The following list describes the properties that you can enter:

• Lineage Direction. The direction of data flow between assets that you extract from Apache Atlas, set with the direction parameter of the Lineage REST API. Select one of the following options:
- BOTH. Extracts both input and output data flow between assets.
- INPUT. Extracts only input data flow between assets.
- OUTPUT. Extracts only output data flow between assets.
• Lineage Depth. The number of lineage hops to extract from Apache Atlas for filtered assets, set with the depth parameter of the Lineage REST API. Default is 3.
• Page Result Limit. Advanced parameter. The maximum number of search result entries for each page of a fetch, set with the limit parameter of the Discovery REST API. Default is 1000.
• Entity Bulk Fetch Count. Advanced parameter. The maximum number of entities to include in a bulk fetch with the Bulk Entity REST API. Default is 100.
• Connection Timeout. Advanced parameter. The maximum amount of time, in milliseconds, that the Secure Agent waits to set up an HTTP connection and get a response from the Apache Atlas server. Default is -1, which disables the timeout.
• Parallel Lineage Fetch Count. Advanced parameter. The maximum number of Lineage REST API calls that can run simultaneously to retrieve lineage data. Default is 5.
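The Lineage Direction and Lineage Depth parameters correspond to the direction and depth parameters of the Apache Atlas lineage REST endpoint. As a sketch, the following call retrieves lineage for a single entity; the host, port, and entity GUID are placeholders:

curl -k --negotiate -u : "https://<atlas-host>:21443/api/atlas/v2/lineage/<entity-guid>?direction=BOTH&depth=3"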

5. Optionally, in the Configuration Parameters area, enter additional settings.


The following list describes the property that you enter for additional settings.
Note: The Additional settings section appears when you click Show Advanced.

• Expert Parameters. Additional configuration options to be passed at runtime. Required if you need to troubleshoot the catalog source job.
Caution: Use expert parameters only when Informatica Global Customer Support recommends them.

6. Click Next.
The Associations page appears.



Step 3. Associate stakeholders with technical assets
You can provide users and groups access to technical assets in Data Governance and Catalog through the
roles assigned to them. Choose a role to associate with the technical assets, and then choose the
stakeholders within that role to provide access to the assets.

Verify that the organization administrator assigned users and user groups to the role that you want to
associate with technical assets.

1. On the Associations page, select Assign Stakeholders.


2. Select a role to assign to the technical assets for the catalog source that you are configuring.
3. Click Select to add users and user groups as stakeholders for the technical assets.
The Add Users & User Groups dialog box displays the list of users and user groups that the organization
administrator has assigned to the selected role.

4. Select one or more users or user groups to assign as stakeholders for the technical assets, and click OK.
Only the selected users and user groups belonging to the specified role are granted the role-defined
permissions to technical assets.
5. You can assign more than one role to technical assets and add users and user groups from each role. To
assign more roles, click the add button.
6. Choose to save and run the job or to schedule a recurring job.
• To save and run the job, click Save and then Run.
• To schedule a recurring job, click Next to open the Schedule page.

Step 4. Run or schedule the job


Choose to run the job manually, or configure it to run on schedule.

The first time that you run the job, Metadata Command Center extracts the metadata. Subsequently, each
time that you run the job, Metadata Command Center synchronizes the catalog with the source system.

Note: You can't run multiple jobs simultaneously.



Run the job manually
If you didn't run the job from the Filters or Associations pages, you can run it from the Schedule page.

1. On the Schedule page, click Run to run the job.


The Run Catalog Source Job dialog box opens.
2. Click Run.

Run the job on a schedule


You can choose to run metadata extraction on a recurring schedule.

1. On the Schedule page, click the Run on Schedule checkbox to schedule the job.
The Schedule configuration page opens.
2. Enter the start date, time zone, and the interval at which you want to run the job.
3. Click Save to save the schedule.

Monitor job status


After the job runs, you can monitor the status of the job on the Overview page for the job.

For more information about job monitoring, see Administration.

Step 5. Connect to referenced source systems


If the source system references another source system, create a connection assignment in Metadata
Command Center to view complete data lineage. To create a connection assignment, create a catalog source
based on the reference source system, and then assign the connection to the catalog source. A reference
source system can be a file system such as Hadoop Distributed File System or relational databases such as
Apache Hive, Oracle, MySQL, and PostgreSQL.

Before you assign a connection, create a catalog source for each reference source system and run the
catalog source job.

Note: You can view the lineage with reference objects without creating a connection assignment. After
connection assignment, you can view the actual objects.

Apache Atlas uses Sqoop queries to import data from a reference source system to a Hive database. If the
Sqoop query contains double quotes, replace the double quotes with backticks (`) in the Apache Atlas source
system to view the reference objects correctly in Data Governance and Catalog.
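For example, a Sqoop free-form query that quotes identifiers would use backticks rather than double quotes. This sketch is illustrative; the connection string, user name, and paths are placeholders:

sqoop import --connect jdbc:mysql://<db-host>/sales --username <user> \
  --query 'SELECT `id`, `amount` FROM orders WHERE $CONDITIONS' \
  --target-dir /data/orders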

1. On the Monitor page, select the Connection Assignment tab.


The Connection Assignment panel displays a list of assigned and unassigned connections along with
details for each connection.
2. Select the connection to the reference source system and click Assign.



The following image shows the Assign button and the list of connections:

3. In the Assign Connection dialog box, select one or more endpoint objects to assign to the selected
connection and click Assign.
You can filter the list in the Assign Connection dialog box by name, type, or endpoint.
You can create a connection assignment to the following catalog source types:
• Apache Hive. The target endpoint object must belong to the Database class type.
• Hadoop Distributed File System. The target endpoint object must belong to the File System class
type.
• Oracle. The target endpoint object must belong to the Database class type.
• MySQL. The target endpoint object must belong to the Database class type.
• PostgreSQL. The target endpoint object must belong to the Database class type.
Note: You can assign connections to Oracle, MySQL, and PostgreSQL catalog sources only when
Metadata Command Center extracts Sqoop processes from an Apache Atlas source system.



The following image shows the Assign Connection dialog box:

When you click Assign, Metadata Command Center creates links between matching objects in the
connected catalog sources, and it calculates the percentage of matched and unmatched objects. The
higher the percentage of matched objects, the more accurate the lineage that you view in Data
Governance and Catalog.



Chapter 4

View results in Data Governance and Catalog
After Metadata Command Center runs a job, you can view the results in Data Governance and Catalog where
the catalog source and its elements are called technical assets. You can view a catalog source as a
hierarchy. Expand each technical asset to see its components.

When referenced source systems are connected to a catalog source, you can expand the hierarchy to see
details about the technical asset's component elements.

You can view the data lineage of an asset contained within a catalog source to see individual elements such
as data sources, calculations, and filters. When you view data lineage, you can see the individual upstream
elements that contribute data or expressions to each component of a data flow or catalog source.

View metadata extraction results


After a job runs in Metadata Command Center, view the results in Data Governance and Catalog. You can
view details about source system contents as hierarchical displays and trace data lineage.

1. Log in to Informatica Intelligent Cloud Services.


The My Services page appears.
2. Click Data Governance and Catalog.
The following image shows the Data Governance and Catalog box on the My Services page:

3. On the Data Governance and Catalog home page, click the number in the Technical Assets panel.
The Technical Assets page opens.
4. Select Catalog Source in the Filter list.
The list of catalog sources opens.
5. Search for the catalog source from which you extracted metadata, and click the name.
The Overview tab of the asset opens.
The following image shows a sample asset page:



6. View the asset from different perspectives by clicking on the tabs.
Note: You can view the calculation properties such as expression, control conditions, and calculation
complexity in the Overview tab of a calculation asset.
If a table or column is deleted from the source system, the Technical Description field in the Overview
tab and the Comment field in the System Attributes tab display the value as Deleted.
For more information about working with assets, see Working with Assets in the Data Governance and
Catalog help.

View data lineage


Data lineage views are available for technical assets in the catalog source. You can view lineage at the
source, target, data set, or data element level.

Data lineage is a visual representation of the flow of data across the systems in your organization. Lineage
depicts how the data flows from the system of its origin to the system of its destination.

View source lineage


The lineage diagram at the source or target level shows how data assets refer to and use sources.

To view data lineage at the source or target level, search for and open a technical asset, and then click the
Lineage tab.

The following image shows how the LOAD Hive process loads data from the ORDERS.avro reference source
file to the avro_load reference target table before connection assignment:

The following image shows how the LOAD Hive process loads data from the ORDERS.avro actual source file
to the avro_load actual target table after connection assignment:



View lineage at data set level and data element level
The lineage at the data set level and the data element level shows how the technical assets such as files and
commands contribute to the selected asset.

Data sets are technical assets that contain sets of data. Examples include files, databases, or temp files that
hold the results of calculations. Data elements are objects upstream or downstream of a data set, and are
accessible when you expand a data set to the data element level. For example, a column in a source object.

View lineage at the data set level


The data set level is a view that shows individual sets of data in the data flow. To view lineage at the data set
level, open a technical asset, click the Lineage tab, and then verify that the level is set to Data Set Level.

The following image shows table-level lineage where the avro_table referenced table gets data from the
csv_format_parent referenced table after data transformation using the storageformats.avro Hive process
before connection assignment:

The following image shows table-level lineage where the avro_table actual table gets data from the
csv_format_parent actual table after data transformation using the storageformats.avro Hive process after
connection assignment:



View lineage at the data element level
The data element level displays detailed information. At the data element level, you can see the input sources
for expressions or commands and calculations or transformations on the data. To view data lineage at the
data element level, open a technical asset, click the Lineage tab, and then verify that the level is set to Data
Element Level.

The following image shows column-level lineage where the col_decimal referenced column of the avro_table
gets data from the col_decimal referenced column of the csv_format_parent table after data transformation
using the storageformats.avro Hive process before connection assignment:

The following image shows column-level lineage where the col_decimal actual column of the avro_table gets
data from the col_decimal actual column of the csv_format_parent table after data transformation using the
storageformats.avro Hive process after connection assignment:

