unified-data-fabric-whitepaper
Abstract
Introduction
About Cloudera
[Figure: layers of the Data Fabric. Recoverable labels: Metadata/catalog; Discovery; Data security; Policies; Data modeling, preparation, curation, graph engine (layer 5); Ingestion, streaming, data movement (layer 2); Data Ingestion; AI/ML and Streaming; Sources: cloud data sources, on premises.]
4 The Unified Data Fabric
How to build a Data Fabric with Cloudera
Cloudera has been built from the ground up to support hybrid, multi-cloud data management in support of a Data Fabric architecture. In this section we provide an introduction to Cloudera, with a focus on the data management capabilities that enable the Data Fabric.

Cloudera provides the freedom to securely move applications, data, and users bi-directionally between the data center and multiple data clouds, regardless of where your data lives. As a result, the platform is perfectly placed to implement modern data architectures:
[Figure 02: Cloudera. Recoverable labels: Lakehouse; Unified Data Fabric; Data Services (DF, DE, DW, OD, ML); Data Flow; Data Engineering; Data Warehouse; Operational Database; Machine Learning; On Premises (Ozone); Cloud (S3, ADLS, GCS); Amazon Athena; Amazon Redshift; Azure Synapse Analytics; BigQuery; Dataproc; data types: Highly Proprietary Data, Customer Data, Machine Data, Advertising Data.]
The Cloudera Control Plane provides a ubiquitous service that is consistent and spans an organization’s deployment instances. The diagram above shows how a public cloud instance shares services such as governance with the private cloud instance. It goes further, supporting multiple cloud and multiple private cloud deployments. The Control Plane is a federated service which enables metadata, security, encryption and governance to be managed as a central, but federated, service. The fundamental building blocks are based on open source components and have an open and accessible API, which provides integration with a wider ecosystem of services and supports open standards and interoperability.

• Comprehensive: Support for all entities that make up the hybrid platform: Hive tables, Kafka topics, NiFi flows, HBase tables, Machine Learning models, etc. Each asset is displayed alongside its contextual metadata, such as schema, security policies, tags and classifications, profile, governance rules and business annotations.
• Discoverability: A single location to discover and search for data from all nodes of the Fabric.
• Governance: Built-in profiling gives insight into data quality and sensitivity, and a built-in classification engine assigns security, compliance and policy-related attributes such as PII.
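The catalog capabilities described above (assets with contextual metadata, a single place to search, and rule-based classification) can be illustrated with a small, self-contained sketch. This is not the Cloudera API: the `Asset` and `Catalog` classes, their fields, and the email-based PII rule are all hypothetical, chosen only to show how the three capabilities might fit together.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical names throughout; this does not mirror any Cloudera interface.
# Assumed classification rule: a value that looks like an email address is PII.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


@dataclass
class Asset:
    name: str
    kind: str                 # e.g. "hive_table", "kafka_topic"
    location: str             # deployment holding the asset, e.g. "public-cloud"
    schema: Dict[str, str]    # column name -> type
    tags: Set[str] = field(default_factory=set)


class Catalog:
    """One registry for assets from every deployment (discoverability)."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}

    def register(self, asset: Asset) -> None:
        self._assets[asset.name] = asset

    def get(self, name: str) -> Asset:
        return self._assets[name]

    def classify(self, name: str, samples: Dict[str, List[str]]) -> Set[str]:
        """Tag an asset as PII when any sampled column value matches the rule."""
        asset = self._assets[name]
        for column, values in samples.items():
            if any(EMAIL.fullmatch(value) for value in values):
                asset.tags.add("PII")
                asset.tags.add(f"PII:{column}")
        return asset.tags

    def search(self, text: Optional[str] = None, tag: Optional[str] = None) -> List[str]:
        """Return sorted asset names matching a name/column substring and/or a tag."""
        hits = []
        for asset in self._assets.values():
            if text is not None:
                haystack = [asset.name, *asset.schema]
                if not any(text in item for item in haystack):
                    continue
            if tag is not None and tag not in asset.tags:
                continue
            hits.append(asset.name)
        return sorted(hits)


catalog = Catalog()
catalog.register(Asset("customers", "hive_table", "private-cloud",
                       {"id": "int", "email": "string"}))
catalog.register(Asset("clickstream", "kafka_topic", "public-cloud",
                       {"url": "string"}))
catalog.classify("customers", {"id": ["1"], "email": ["a@example.com"]})
print(catalog.search(tag="PII"))
```

In this sketch the tags written by `classify` are what a downstream policy engine would key on; a production classifier would rely on profiling over real data rather than a single regular expression.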
[Figure: Cloudera data services mapped to the numbered Data Fabric layers. Recoverable labels, top to bottom:]
6 Global Data Access: Data Warehouse, Operational Database, Data Visualization
5 Data Discovery: Data Catalog, Data Visualization
4 Data Orchestration: Data Catalog, Streaming Analytics, Data Engineering
3 Data Processing and Persistence: Data Warehouse, Operational Database, Data Hub
2 Data Ingestion and Streaming: Data Flow & Streaming, Data Engineering
Cloudera, Inc. | 5470 Great America Pkwy, Santa Clara, CA 95054 USA | cloudera.com
© 2025 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other
trademarks are the property of their respective companies. Information is subject to change without notice. WP_003_V2 December 23, 2024