Cloudera Big Data Architecture Diagram

The document describes the data flow in a Cloudera Data Platform system including data ingestion from various sources into the data lake and data warehouse/mart, processing via the SQL virtualization engine and CDH cluster, and presentation and analysis using business intelligence tools, visualization, reporting, and data science workspaces. It also provides recommendations for the minimum hardware configuration for the master, worker, edge, and utility nodes in the CDH cluster.

[Diagram: Data Flow]

Flow stages: Ingestion, Data Lake, ETL / ELT, Data Warehouse / Mart, SQL Virtualization Engine

Data Source: Operational Database; Unstructured / Semi-Structured; External Data

Cloudera Data Platform, CDH Cluster (n-node):
    Impala, Kudu
    ELT: Impala Job
    ETLT: Talend
    Hue
    Cloudera Navigator: Data Governance
    CDSW (Cloudera Data Science Workbench): Data Scientist Playground, Advanced Analytics, Ad-hoc Data Exploration

Presentation Layer: Business Intelligence Tools; Visualization / Reporting

[Diagram: the same architecture (Data Source, Cloudera Data Platform, Presentation Layer), divided into color-coded sectors, one per role]

Business User: handles the Blue Sector
IT Ops: handles the Green Sector
Data Steward: handles the Purple Sector
Data Engineer: handles the Orange Sector
Data Scientist: handles the Yellow Sector

Minimum Configuration: 1M3W1E (+1U), i.e. 1 Master Node, 3 Worker Nodes, and 1 Edge Node, plus 1 optional Utility Node

Node                          CPU                RAM     Storage                                Deployment
Master Node                   8 Core             64GB    256GB SSD                              Preferable Physical
Worker Node (each of 3)       16 Core            128GB   256GB SSD (OS&App), 1TB HDD (Data)     Must Physical
Edge Node                     16 Core            128GB   512GB SSD (OS&App)                     Preferable Physical
Edge Node (if using Talend)   2 slot x 16 Core   256GB   512GB SSD (OS&App), 500GB HDD (Temp)   -
Utility Node                  8 Core             64GB    256GB SSD                              Preferable Physical

Node roles:
Master Node: coordinates all Worker Nodes (including load balancing).
Worker Node: runs the transformation process and stores the data.
Edge Node: hosts all client tools, which interact with the Master Node but get results directly from the Worker Nodes.
Utility Node: hosts the Cloudera Manager and Management Service backend (optional).
Talend Big Data Integration Server: serves the Talend ETL tools and could be placed on the Edge Node.
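
To make the sizing easier to check, the minimum configuration can be written down as data and summed. The following is only a sketch: the `NodeSpec` class, the `totals` helper, and the field names are hypothetical, the per-node figures are read from the slide (non-Talend Edge Node variant), and 1TB is assumed to mean 1000GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware figures for one node role (hypothetical helper type)."""
    role: str
    count: int
    cores: int
    ram_gb: int
    ssd_gb: int
    hdd_gb: int = 0  # data/temp HDD; 0 where the slide lists none


# 1M3W1E: 1 Master, 3 Worker, 1 Edge (figures as listed on the slide).
MINIMUM_CONFIG = [
    NodeSpec("master", count=1, cores=8, ram_gb=64, ssd_gb=256),
    # Assumption: the 1TB data HDD is counted as 1000GB.
    NodeSpec("worker", count=3, cores=16, ram_gb=128, ssd_gb=256, hdd_gb=1000),
    NodeSpec("edge", count=1, cores=16, ram_gb=128, ssd_gb=512),
]

# (+1U): the optional Utility Node.
UTILITY = NodeSpec("utility", count=1, cores=8, ram_gb=64, ssd_gb=256)


def totals(nodes):
    """Sum node counts and hardware resources across a list of NodeSpec."""
    return {
        "nodes": sum(n.count for n in nodes),
        "cores": sum(n.count * n.cores for n in nodes),
        "ram_gb": sum(n.count * n.ram_gb for n in nodes),
        "ssd_gb": sum(n.count * n.ssd_gb for n in nodes),
        "hdd_gb": sum(n.count * n.hdd_gb for n in nodes),
    }


if __name__ == "__main__":
    print("1M3W1E:     ", totals(MINIMUM_CONFIG))
    print("1M3W1E + 1U:", totals(MINIMUM_CONFIG + [UTILITY]))
```

Under these assumptions the base cluster comes to 5 nodes, 72 cores, and 576GB of RAM, and adding the Utility Node raises that to 6 nodes, 80 cores, and 640GB. Swapping in the Talend Edge Node figures would change the edge entry only.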
