SlideShare a Scribd company logo
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Architecture of Big Data
Solutions
Guido Schmutz
Frankfurt, 13.12.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Architektur of Big Data Solutions
Agenda
1. Introduction
2. Big Data & Fast Data Reference Architectures
3. Continuous Streaming Data Ingestion
4. Big Data & Cloud
5. Microservices Architecture
6. Big Data Ecosystem – many choices sorted!
Architektur of Big Data Solutions
Introduction
Architektur of Big Data Solutions
Big Data Definition (4 Vs)
+	Time	to	action	?	– Big	Data	+	Real-Time	=	Stream	Processing
Characteristics	of	Big	Data:	Its	Volume,	Velocity	
and	Variety	in	combination
Architektur of Big Data Solutions
Architektur von Big Data Lösungen
Enterprise Data
Warehouse
ETL / Stored
Procedures
Data Marts /
Aggregations
Location
Social
Clickstream
Segmentation & Churn
Analysis
BI Tools
Marketing Offers
Billing &
Ordering
CRM / Profile
Marketing
Campaigns
Architektur of Big Data Solutions
Traditional Flow Diagram - Challenges
Enterprise Data
Warehouse
ETL / Stored
Procedures
Data Marts /
Aggregations
Location
Social
Clickstream
Segmentation & Churn
Analysis
BI Tools
Marketing Offers
Billing &
Ordering
CRM / Profile
Marketing
Campaigns
Limited
Processing
Power
Does not
model easily
to traditional
database
schema
Limited
Processing
Power
Storage
Scaling
very
expensive
Based on
sample /
limited data
Loss in
Fidelity
Other /
New Data
Sources
High
Voume
and
Velocity
Architektur of Big Data Solutions
Big Data to the rescue? Why is a structuring /
architecture important?
Architektur of Big Data Solutions
Why talk about Big Data Architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still quite a rather young field and therefore a “moving target”
no standard architectures available which have been used for years
In the past years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
To have a reference architecture in place helps in choosing the
right/matching technologies
Architektur of Big Data Solutions
Big Data & Fast Data Reference
Architectures
Architektur of Big Data Solutions
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Big Data Architecture
BI	Tools
Enterprise Data
Warehouse
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
File Import / SQL Import
SQL
Search	/	Explore
Online	&	Mobile	
Apps
Search
• Machine	Learning
• Graph	Algorithms
• Natural	Language	Processing
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Big Data Architecture - Hadoop
BI	Tools
Enterprise Data
Warehouse
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
File Import / SQL Import
SQL
Search	/	Explore
Online	&	Mobile	
Apps
Search
• Machine	Learning
• Graph	Algorithms
• Natural	Language	Processing
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Big Data Architecture - Spark
BI	Tools
Enterprise Data
Warehouse
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
File Import / SQL Import
SQL
Search	/	Explore
Online	&	Mobile	
Apps
Search
• Machine	Learning
• Graph	Algorithms
• Natural	Language	Processing
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Event
Hub
Event
Hub
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Event Hub for handling streaming data
BI	Tools
Enterprise Data
Warehouse
Event
Hub
SQL
Search	/	Explore
Online	&	Mobile	
Apps
Search
Data Flow • Machine	Learning
• Graph	Algorithms
• Natural	Language	Processing
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Weather
Data
Event
Hub
Event
Hub
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Event Hub for handling streaming data
BI	Tools
Enterprise Data
Warehouse
Event
Hub
SQL
Search	/	Explore
Online	&	Mobile	
Apps
Search
Data Flow • Machine	Learning
• Graph	Algorithms
• Natural	Language	Processing
Parallel
Processing
Storage
Storage
RawRefined
Results
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Weather
Data
Architektur of Big Data Solutions
Event
Hub
Event
Hub
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Event Hub for handling streaming data
BI	Tools
Enterprise Data
Warehouse
Event
Hub
SQL
Search	/	Explore
Online	&	Mobile	
Apps
Search
Data Flow • Machine	Learning
• Graph	Algorithms
• Natural	Language	Processing
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Weather
Data
high	latency
“Data at Rest” vs. “Data in Motion”
Architektur of Big Data Solutions
Data at Rest Data in Motion
Event
Hub
Event
Hub
Hadoop Clusterd
Hadoop Cluster
Stream Processing
Cluster
Streaming Analytics Architecture
BI	Tools
Enterprise Data
Warehouse
Event
Hub
Search	/	Explore
Online	&	Mobile	
Apps
Search
Data Flow Data Flow
Results
• Low	Latency	Processing
• Alerting
• ”Real-Time”	Dashboard
Stream Analytics
Reference /
Models
Dashboard
Architektur of Big Data Solutions
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Weather
Data
Event
Hub
Event
Hub
Hadoop Clusterd
Hadoop Cluster
Stream Processing
Cluster
BI	Tools
Enterprise Data
Warehouse
Event
Hub
Search	/	Explore
Online	&	Mobile	
Apps
Search
Data Flow Data Flow
Results
• Low	Latency	Processing
• Alerting
• ”Real-Time”	Dashboard
Stream Analytics
Reference /
Models
Dashboard
Architektur of Big Data Solutions
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Weather
Data
Streaming Analytics Architecture – Open Source
Event
Hub
Event
Hub
Hadoop Clusterd
Hadoop Cluster
Stream Processing
Cluster
Streaming Analytics Architecture
BI	Tools
Enterprise Data
Warehouse
Event
Hub
Search	/	Explore
Online	&	Mobile	
Apps
Search
Data Flow Data Flow
Results
• Low	Latency	Processing
• Alerting
• ”Real-Time”	Dashboard
Stream Analytics
Reference /
Models
Dashboard
Architektur of Big Data Solutions
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Weather
Data
low	latency	without	keeping	raw	data/events
Hadoop Clusterd
Hadoop Cluster
Event Processing
Cluster
Keep raw event data
BI	Tools
Enterprise Data
Warehouse
Search	/	Explore
Online	&	Mobile	
Apps
Search
Results
Stream Analytics
Reference /
Models
Dashboard
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Event
Hub
Event
Hub
Event
Hub
File Import / SQL Import
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Weather
Data
“Lambda Architecture” for Big Data
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Event
Hub
Event
Hub
Event
Hub
SQL
Search
BI	Tools
Enterprise Data
Warehouse
Search	/	Explore
Online	&	Mobile	
Apps
File Import / SQL Import
Weather
Data
Hadoop Clusterd
Hadoop Cluster
Event Processing
Cluster
Results
Stream Analytics
Reference /
Models
Dashboard
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
“Kappa Architecture” for Big Data
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
SQL
Search
BI	Tools
Enterprise Data
Warehouse
Search	/	Explore
Online	&	Mobile	
Apps
File Import / SQL Import
Weather
Data
Hadoop Clusterd
Hadoop Cluster
Event Processing
Cluster
Results
Stream Analytics
Reference /
Models
Dashboard
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Event
Hub
Event
Hub
Event
Hub
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
“Unified Architecture” for Big Data
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Batch Analytics
Streaming Analytics
Stream Analytics
NoSQL
Reference /
Models
SQL
Search
Dashboard
BI	Tools
Enterprise Data
Warehouse
Search	/	Explore
Online	&	Mobile	
Apps
File Import / SQL Import
Weather
Data
Event
Hub
Event
Hub
Event
Hub
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Continuous Streaming Data
Ingestion
Architektur of Big Data Solutions
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Continuous Data Ingestion
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Batch Analytics
Streaming Analytics
Stream Analytics
NoSQL
Reference /
Models
SQL
Search
Dashboard
BI	Tools
Enterprise Data
Warehouse
Search	/	Explore
Online	&	Mobile	
Apps
File Import / SQL Import
Weather
Data
Event
Hub
Event
Hub
Event
Hub
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Continuous Streaming Data Ingestion
DB	Source
Big	DataLog
Stream	
Processing
IoT Sensor
Event	Hub
Topic
Topic
REST
Topic
IoT GW
CDC	GW
Connect
CDC
DB	Source
Log CDC
Native
IoT Sensor
IoT Sensor
31
Dataflow	GW
Topic
Topic
Queue
Message	GW
Topic
Dataflow	GW
Dataflow
TopicREST
31
File	Source
Log
Log
Log
Social
Native
Topic
Topic
Architektur of Big Data Solutions
Continuous Streaming Data Ingestion
Architektur of Big Data Solutions
SQL Polling
Change Data Capture
(CDC)
File Polling
File Stream (File Tailing)
File Stream (Appender)
Sensor Stream
Continuous Streaming Data Ingestion
DB	Source
Big	DataLog
Stream	
Processing
IoT Sensor
Event	Hub
Topic
Topic
REST
Topic
IoT GW
CDC	GW
Connect
CDC
DB	Source
Log CDC
Native
IoT Sensor
33
Dataflow	GW
Topic
Topic
Queue
Message	GW
Topic
Dataflow	GW
Dataflow
TopicREST
33
File	Source
Log
Log
Log
Social
Native
Topic
Topic
Architektur of Big Data Solutions
Big Data & Cloud
Architektur of Big Data Solutions
Data Locality vs. Compute/Storage Separation
Data Local Compute Separate Compute and Storage
Worker #1
Disk
Processing
Master Node
Worker #2
Disk
Processing
Worker #3
Disk
Processing
Network
Storage
Disk Disk Disk
Compute #1
Processing
Compute #2
Processing
Compute #3
Processing
Network
Master Node
Network
Separation of compute
and storage – the
fundamental difference
• store data in Object
Storage instead of DFS
• bring up Compute nodes
only for data processing
• multiple workloads on
separate clusters can
access same data
Architektur of Big Data Solutions
A new way to Manage Big Data
Big Data Traditional
Assumptions
Bare-metal
Data Locality
HDFS on local disks
Big Data
A New Approach
Containers and VMs
Compute and storage
separation
Shared storage
Benefits and Value
Big-Data-as-a-Service
Agility and cost savings
Faster time-to-insights
Architektur of Big Data Solutions
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Batch Analytics
Streaming Analytics
Stream Analytics
NoSQL
Reference /
Models
SQL
Search
Dashboard
BI	Tools
Enterprise Data
Warehouse
Search	/	Explore
Online	&	Mobile	
Apps
File Import / SQL Import
Weather
Data
Event
Hub
Event
Hub
Event
Hub
Parallel
Processing
Storage
Storage
RawRefined
Results
Architektur of Big Data Solutions
Big Data & Cloud - Amazon WebServices (AWS)
Microservices Architecture
Architektur of Big Data Solutions
Hadoop Clusterd
Hadoop Cluster
Big Data Cluster
Asynchronous Microservice Architecture
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
SQL
Search
BI	Tools
Enterprise Data
Warehouse
Search	/	Explore
Online	&	Mobile	
Apps
File Import / SQL Import
Weather
Data
Event
Hub
Parallel
Processing
Storage
Storage
RawRefined
Results
Microservice Cluster
Microservice State
{		}
API
Stream Analytics Cluster
Stream
Processor
State
{		}
API
Event
Stream
Event
Stream
Service
Architektur of Big Data Solutions
Big Data Ecosystem – many
choices sorted!
Architektur of Big Data Solutions
Big Data Ecosystem – many choices sorted!
Architektur of Big Data Solutions
Big Data Ecosystem – many choices sorted!
Architektur of Big Data Solutions
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com
Architektur of Big Data Solutions
Ad

More Related Content

What's hot (20)

Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Simplilearn
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
WinWire Technologies Inc
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
DataWorks Summit
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Azure Cost Management
Azure Cost ManagementAzure Cost Management
Azure Cost Management
Stefano Tempesta
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Timothy McAliley
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
Alexey Grishchenko
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Kai Wähner
 
Data Mesh
Data MeshData Mesh
Data Mesh
Piethein Strengholt
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
Dimko Zhluktenko
 
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Cathrine Wilhelmsen
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Kai Wähner
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Simplilearn
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
DataWorks Summit
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Timothy McAliley
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Kai Wähner
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
Dimko Zhluktenko
 
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Cathrine Wilhelmsen
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Kai Wähner
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 

Similar to Architecture of Big Data Solutions (20)

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
Guido Schmutz
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
Guido Schmutz
 
Customer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° viewCustomer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° view
Swiss Data Forum Swiss Data Forum
 
Ramesh kutumbaka resume
Ramesh kutumbaka resumeRamesh kutumbaka resume
Ramesh kutumbaka resume
Ramesh Kutumbaka
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...
Trivadis
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
Sai Paravastu
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011
Raghu Kashyap
 
A Data Fabric for All Things Intelligent
A Data Fabric for All Things IntelligentA Data Fabric for All Things Intelligent
A Data Fabric for All Things Intelligent
Denodo
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
Guido Schmutz
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
Guido Schmutz
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoT
Guido Schmutz
 
Customer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° viewCustomer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° view
Guido Schmutz
 
Bringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceBringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to Salesforce
Salesforce Developers
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
James Serra
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
DataWorks Summit/Hadoop Summit
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
Guido Schmutz
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
Guido Schmutz
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by...
Trivadis
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
Sai Paravastu
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011
Raghu Kashyap
 
A Data Fabric for All Things Intelligent
A Data Fabric for All Things IntelligentA Data Fabric for All Things Intelligent
A Data Fabric for All Things Intelligent
Denodo
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
Guido Schmutz
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
Guido Schmutz
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoT
Guido Schmutz
 
Customer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° viewCustomer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° view
Guido Schmutz
 
Bringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceBringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to Salesforce
Salesforce Developers
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
James Serra
 
Ad

More from Guido Schmutz (20)

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code
Guido Schmutz
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
Guido Schmutz
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Guido Schmutz
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
Guido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Guido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Guido Schmutz
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
Guido Schmutz
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Guido Schmutz
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Guido Schmutz
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
Guido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
Guido Schmutz
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
Guido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Guido Schmutz
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code
Guido Schmutz
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
Guido Schmutz
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Guido Schmutz
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
Guido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Guido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Guido Schmutz
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
Guido Schmutz
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Guido Schmutz
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Guido Schmutz
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
Guido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
Guido Schmutz
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
Guido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Guido Schmutz
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Ad

Recently uploaded (20)

Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 

Architecture of Big Data Solutions

  • 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Architecture of Big Data Solutions Guido Schmutz Frankfurt, 13.12.2017 @gschmutz guidoschmutz.wordpress.com
  • 2. Guido Schmutz Working at Trivadis for more than 20 years Oracle ACE Director for Fusion Middleware and SOA Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Head of Trivadis Architecture Board Technology Manager @ Trivadis More than 30 years of software development experience Contact: [email protected] Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz Architektur of Big Data Solutions
  • 3. Agenda 1. Introduction 2. Big Data & Fast Data Reference Architectures 3. Continuous Streaming Data Ingestion 4. Big Data & Cloud 5. Microservices Architecture 6. Big Data Ecosystem – many choices sorted! Architektur of Big Data Solutions
  • 5. Big Data Definition (4 Vs) + Time to action ? – Big Data + Real-Time = Stream Processing Characteristics of Big Data: Its Volume, Velocity and Variety in combination Architektur of Big Data Solutions
  • 6. Architektur von Big Data Lösungen Enterprise Data Warehouse ETL / Stored Procedures Data Marts / Aggregations Location Social Clickstream Segmentation & Churn Analysis BI Tools Marketing Offers Billing & Ordering CRM / Profile Marketing Campaigns Architektur of Big Data Solutions
  • 7. Traditional Flow Diagram - Challenges Enterprise Data Warehouse ETL / Stored Procedures Data Marts / Aggregations Location Social Clickstream Segmentation & Churn Analysis BI Tools Marketing Offers Billing & Ordering CRM / Profile Marketing Campaigns Limited Processing Power Does not model easily to traditional database schema Limited Processing Power Storage Scaling very expensive Based on sample / limited data Loss in Fidelity Other / New Data Sources High Voume and Velocity Architektur of Big Data Solutions
  • 8. Big Data to the rescue? Why is a structuring / architecture important? Architektur of Big Data Solutions
  • 9. Why talk about Big Data Architectures? Choosing the right architecture is key for any (big data) project Big Data is still quite a rather young field and therefore a “moving target” no standard architectures available which have been used for years In the past years, some architectures and best practices have evolved Know your use cases before choosing your architecture / technologies To have a reference architecture in place helps in choosing the right/matching technologies Architektur of Big Data Solutions
  • 10. Big Data & Fast Data Reference Architectures Architektur of Big Data Solutions
  • 11. Hadoop Clusterd Hadoop Cluster Big Data Cluster Big Data Architecture BI Tools Enterprise Data Warehouse Billing & Ordering CRM / Profile Marketing Campaigns File Import / SQL Import SQL Search / Explore Online & Mobile Apps Search • Machine Learning • Graph Algorithms • Natural Language Processing Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions
  • 12. Hadoop Clusterd Hadoop Cluster Big Data Cluster Big Data Architecture - Hadoop BI Tools Enterprise Data Warehouse Billing & Ordering CRM / Profile Marketing Campaigns File Import / SQL Import SQL Search / Explore Online & Mobile Apps Search • Machine Learning • Graph Algorithms • Natural Language Processing Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions
  • 13. Hadoop Clusterd Hadoop Cluster Big Data Cluster Big Data Architecture - Spark BI Tools Enterprise Data Warehouse Billing & Ordering CRM / Profile Marketing Campaigns File Import / SQL Import SQL Search / Explore Online & Mobile Apps Search • Machine Learning • Graph Algorithms • Natural Language Processing Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions
  • 14. Event Hub Event Hub Hadoop Clusterd Hadoop Cluster Big Data Cluster Event Hub for handling streaming data BI Tools Enterprise Data Warehouse Event Hub SQL Search / Explore Online & Mobile Apps Search Data Flow • Machine Learning • Graph Algorithms • Natural Language Processing Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Weather Data
  • 15. Event Hub Event Hub Hadoop Clusterd Hadoop Cluster Big Data Cluster Event Hub for handling streaming data BI Tools Enterprise Data Warehouse Event Hub SQL Search / Explore Online & Mobile Apps Search Data Flow • Machine Learning • Graph Algorithms • Natural Language Processing Parallel Processing Storage Storage RawRefined Results Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Weather Data Architektur of Big Data Solutions
  • 16. Event Hub Event Hub Hadoop Clusterd Hadoop Cluster Big Data Cluster Event Hub for handling streaming data BI Tools Enterprise Data Warehouse Event Hub SQL Search / Explore Online & Mobile Apps Search Data Flow • Machine Learning • Graph Algorithms • Natural Language Processing Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Weather Data high latency
  • 17. “Data at Rest” vs. “Data in Motion” Architektur of Big Data Solutions Data at Rest Data in Motion
  • 18. Event Hub Event Hub Hadoop Clusterd Hadoop Cluster Stream Processing Cluster Streaming Analytics Architecture BI Tools Enterprise Data Warehouse Event Hub Search / Explore Online & Mobile Apps Search Data Flow Data Flow Results • Low Latency Processing • Alerting • ”Real-Time” Dashboard Stream Analytics Reference / Models Dashboard Architektur of Big Data Solutions Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Weather Data
  • 19. Event Hub Event Hub Hadoop Clusterd Hadoop Cluster Stream Processing Cluster BI Tools Enterprise Data Warehouse Event Hub Search / Explore Online & Mobile Apps Search Data Flow Data Flow Results • Low Latency Processing • Alerting • ”Real-Time” Dashboard Stream Analytics Reference / Models Dashboard Architektur of Big Data Solutions Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Weather Data Streaming Analytics Architecture – Open Source
  • 20. Event Hub Event Hub Hadoop Clusterd Hadoop Cluster Stream Processing Cluster Streaming Analytics Architecture BI Tools Enterprise Data Warehouse Event Hub Search / Explore Online & Mobile Apps Search Data Flow Data Flow Results • Low Latency Processing • Alerting • ”Real-Time” Dashboard Stream Analytics Reference / Models Dashboard Architektur of Big Data Solutions Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Weather Data low latency without keeping raw data/events
  • 21. Hadoop Clusterd Hadoop Cluster Event Processing Cluster Keep raw event data BI Tools Enterprise Data Warehouse Search / Explore Online & Mobile Apps Search Results Stream Analytics Reference / Models Dashboard Hadoop Clusterd Hadoop Cluster Big Data Cluster Event Hub Event Hub Event Hub File Import / SQL Import Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Weather Data
  • 22. “Lambda Architecture” for Big Data Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Event Hub Event Hub Event Hub SQL Search BI Tools Enterprise Data Warehouse Search / Explore Online & Mobile Apps File Import / SQL Import Weather Data Hadoop Clusterd Hadoop Cluster Event Processing Cluster Results Stream Analytics Reference / Models Dashboard Hadoop Clusterd Hadoop Cluster Big Data Cluster Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions
  • 23. “Kappa Architecture” for Big Data Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps SQL Search BI Tools Enterprise Data Warehouse Search / Explore Online & Mobile Apps File Import / SQL Import Weather Data Hadoop Clusterd Hadoop Cluster Event Processing Cluster Results Stream Analytics Reference / Models Dashboard Hadoop Clusterd Hadoop Cluster Big Data Cluster Event Hub Event Hub Event Hub Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions
  • 24. Hadoop Clusterd Hadoop Cluster Big Data Cluster “Unified Architecture” for Big Data Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Batch Analytics Streaming Analytics Stream Analytics NoSQL Reference / Models SQL Search Dashboard BI Tools Enterprise Data Warehouse Search / Explore Online & Mobile Apps File Import / SQL Import Weather Data Event Hub Event Hub Event Hub Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions
  • 26. Hadoop Clusterd Hadoop Cluster Big Data Cluster Continuous Data Ingestion Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Batch Analytics Streaming Analytics Stream Analytics NoSQL Reference / Models SQL Search Dashboard BI Tools Enterprise Data Warehouse Search / Explore Online & Mobile Apps File Import / SQL Import Weather Data Event Hub Event Hub Event Hub Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions
  • 27. Continuous Streaming Data Ingestion DB Source Big DataLog Stream Processing IoT Sensor Event Hub Topic Topic REST Topic IoT GW CDC GW Connect CDC DB Source Log CDC Native IoT Sensor IoT Sensor 31 Dataflow GW Topic Topic Queue Message GW Topic Dataflow GW Dataflow TopicREST 31 File Source Log Log Log Social Native Topic Topic Architektur of Big Data Solutions
  • 28. Continuous Streaming Data Ingestion Architektur of Big Data Solutions SQL Polling Change Data Capture (CDC) File Polling File Stream (File Tailing) File Stream (Appender) Sensor Stream
  • 29. Continuous Streaming Data Ingestion DB Source Big DataLog Stream Processing IoT Sensor Event Hub Topic Topic REST Topic IoT GW CDC GW Connect CDC DB Source Log CDC Native IoT Sensor 33 Dataflow GW Topic Topic Queue Message GW Topic Dataflow GW Dataflow TopicREST 33 File Source Log Log Log Social Native Topic Topic Architektur of Big Data Solutions
  • 30. Big Data & Cloud Architektur of Big Data Solutions
  • 31. Data Locality vs. Compute/Storage Separation Data Local Compute Separate Compute and Storage Worker #1 Disk Processing Master Node Worker #2 Disk Processing Worker #3 Disk Processing Network Storage Disk Disk Disk Compute #1 Processing Compute #2 Processing Compute #3 Processing Network Master Node Network Separation of compute and storage – the fundamental difference • store data in Object Storage instead of DFS • bring up Compute nodes only for data processing • multiple workloads on separate clusters can access same data Architektur of Big Data Solutions
  • 32. A new way to Manage Big Data Big Data Traditional Assumptions Bare-metal Data Locality HDFS on local disks Big Data A New Approach Containers and VMs Compute and storage separation Shared storage Benefits and Value Big-Data-as-a-Service Agility and cost savings Faster time-to-insights Architektur of Big Data Solutions
  • 33. Hadoop Clusterd Hadoop Cluster Big Data Cluster Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps Batch Analytics Streaming Analytics Stream Analytics NoSQL Reference / Models SQL Search Dashboard BI Tools Enterprise Data Warehouse Search / Explore Online & Mobile Apps File Import / SQL Import Weather Data Event Hub Event Hub Event Hub Parallel Processing Storage Storage RawRefined Results Architektur of Big Data Solutions Big Data & Cloud - Amazon WebServices (AWS)
  • 35. Hadoop Clusterd Hadoop Cluster Big Data Cluster Asynchronous Microservice Architecture Location Social Click stream Sensor Data Billing & Ordering CRM / Profile Marketing Campaigns Call Center Mobile Apps SQL Search BI Tools Enterprise Data Warehouse Search / Explore Online & Mobile Apps File Import / SQL Import Weather Data Event Hub Parallel Processing Storage Storage RawRefined Results Microservice Cluster Microservice State { } API Stream Analytics Cluster Stream Processor State { } API Event Stream Event Stream Service Architektur of Big Data Solutions
  • 36. Big Data Ecosystem – many choices sorted! Architektur of Big Data Solutions
  • 37. Big Data Ecosystem – many choices sorted! Architektur of Big Data Solutions
  • 38. Big Data Ecosystem – many choices sorted! Architektur of Big Data Solutions