Hadoop Application Architectures: Architecting a Next Generation Data Platform
Strata + Hadoop World, Singapore 2016
tiny.cloudera.com/app-arch-singapore
tiny.cloudera.com/app-arch-questions
Mark Grover | @mark_grover
Ted Malaska | @ted_malaska
Jonathan Seidman | @jseidman
Questions? tiny.cloudera.com/app-arch-questions
Logistics
▪ Break at 10:30 – 11:00 AM
▪ Questions at the end of each section
▪ Slides at tiny.cloudera.com/app-arch-singapore
▪ Code at https://github.com/hadooparchitecturebook/Taxi360
About the book
▪ @hadooparchbook
▪ hadooparchitecturebook.com
▪ github.com/hadooparchitecturebook
▪ slideshare.com/hadooparchbook
About the presenters
▪ Technical Group Architect at
Blizzard Entertainment
▪ Previously Principal Solutions
Architect at Cloudera, lead
architect at FINRA
▪ Contributor to Apache
Hadoop, HBase, Flume, Avro,
Pig, Spark, YARN, Sqoop,
Kudu, Kafka
Ted Malaska
About the presenters
▪ Partner Software Engineer at
Cloudera
▪ Contributor to Apache Sqoop.
▪ Previously Technical Lead on
the big data team at Orbitz,
co-founder of the Chicago
Hadoop User Group and
Chicago Big Data
Jonathan Seidman
About the presenters
▪ Software Engineer on Spark
at Cloudera
▪ Committer on Apache Bigtop,
PMC member on Apache
Sentry (incubating)
▪ Contributor to Apache Spark,
Hadoop, Hive, Sqoop, Pig,
Flume
Mark Grover
Case Study Overview
Internet of Things and Entity 360
Customer 360
Connected Cars
Entity (Taxi) 360 View
Geo-location/
Traffic Data
Customer Data
Maintenance
Data
Other Data
Sources
Streaming
Vehicle Data
What Makes Hadoop a Fit?
Data Sources Extract Transform Load
The early days…
What Makes Hadoop a Fit?
SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE
ERP, CRM, RDBMS, MACHINES, FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS, EXTERNAL DATA SOURCES
Today…
Enabling a Range of New Use Cases…
Fraud Detection Market
Transactions
Internet of
Things
Network Security
Hadoop Challenges
Kafka Streams
Kafka Connect
Kafka
Challenges - Architectural Considerations
▪ Reliable and scalable ingress of multiple data types and sources:
- High volume event data? Batch data?
▪ Reliable and scalable storage to support multiple workloads and access patterns
- Historical data? Real-time search? Analytics?
▪ Processing engines (for background processing):
- Stream processing? Batch processing?
▪ Data Modeling
- Modeling data for real-time random access? Analytic access? Batch access?
Case Study
Requirements
Overview
Requirements
▪ Allow users (technical and non-technical) to analyze and visualize data…
Requirements
▪ Provide analysts with query capabilities via a standard interface…
Requirements
▪ Provide developers the ability to perform batch processing on historical data…
Requirements
▪ To support all this, we need:
- Reliable ingestion of streaming and batch data.
- Ability to perform transformations on streaming data in flight.
- Ability to perform sophisticated processing of historical data.
High level architecture
Walkthrough
High level architecture
[Diagram: Source (Data Producer) → Transport (Pub-Sub) → Stream Processing (Processing & Ingestion Engine) → Storage (Nested Tables, Indexed Cube, Relational Tables, Entity Time Series Lookup) → Access (Batch Processing, SQL, NRT REST, NRT Dashboard)]
Data Buffering
Considerations
But wait!
What about batch data?
Event Data Buffering
Overview
Buffering Data
▪ What do we mean by “buffering” and why do we need it?
event,event,event,event,event,event…
This is bad!
Buffering Data – Message Brokers
Publisher
Publisher
Publisher
Message
Queue
Subscriber
Subscriber
Subscriber
High level architecture
[Diagram: Source (Custom Producer) → Transport (two alternative systems, shown as logos) → Stream Processing (Processing & Ingestion Engine) → Storage (Nested Tables, Indexed Cube, Relational Tables, Entity Time Series Lookup) → Access (Batch Processing, SQL, NRT REST, NRT Dashboard)]
Buffering Data – Flume vs. Kafka
▪ Flume – well integrated with Hadoop.
- Great choice when ingesting data into HDFS.
- Can support simple transformations.
- Less coding.
▪ But…
- Interface between Kafka and the streaming layer is already well defined.
- Transformations are done in the stream processing layer.
- We need a more general purpose system at this layer.
Kafka Connect
Kafka
Connect
(Source)
Kafka
Connect
(Sink)
What is Kafka?
▪ It’s like a message queue, right?
- Actually, it’s a “distributed commit log”.
0 1 2 3 4 5 6 7 8
Data
Source
Data
Consumer
A
Data
Consumer
B
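The "distributed commit log" idea can be sketched in a few lines of plain Scala. This is an in-memory toy, not the Kafka client API; `Log` and `Consumer` are hypothetical names. It shows only that messages are appended at increasing offsets and that each consumer keeps its own read position.

```scala
object CommitLog {
  // An append-only log: each message gets the next offset and is never modified.
  final class Log[A] {
    private var entries = Vector.empty[A]

    def append(msg: A): Long = {
      entries :+= msg
      entries.length - 1L
    }

    // Read up to `max` messages starting at `offset`; reading removes nothing.
    def read(offset: Long, max: Int): Vector[A] =
      entries.slice(offset.toInt, offset.toInt + max)
  }

  // Each consumer tracks its own position, so consumers do not affect each other.
  final class Consumer[A](log: Log[A]) {
    private var position = 0L

    def poll(max: Int): Vector[A] = {
      val batch = log.read(position, max)
      position += batch.length
      batch
    }
  }
}
```

Consumer A can be at offset 5 while Consumer B is still at offset 2; unlike a classic queue, a read by one does not take the message away from the other.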
Topics and Partitions
▪ Messages are organized into topics, and each topic is split into partitions.
- Each partition is an immutable, time-sequenced log of messages on disk.
- Note that time ordering is guaranteed within, but not across, partitions.
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8
Partition 0
Partition 1
Partition 2
Data
Source
Topic
Kafka Background (Physical)
Producers Producers Producers
Broker Broker Broker
Consumers Consumers Consumers
ZooKeeper ZooKeeper ZooKeeper
Consumers
Kafka In Our Architecture
Taxi Trip Data
Producer
Kafka
taxi-trip-input
Topic
Stream
Processing
(Analytic)
Stream
Processing
(Lookup)
Stream
Processing
(Search)
Stream
Processing
(Long Term)
Input Events
vendor_name,Trip_Pickup_DateTime,Trip_Dropoff_DateTime,Passenger_Count,Trip_Distance,Start_Lon,Start_Lat,Rate_Code,store_and_forward,End_Lon,End_Lat,Payment_Type,Fare_Amt,surcharge,mta_tax,Tip_Amt,Tolls_Amt,Total_Amt
CMT,2009-01-05 08:31:55,2009-01-05 8:37:50,1,0.90000000000000002,-73.977936999999997,40.745919000000001,,,-73.983609000000001,40.755051000000002,Credit,5.2999999999999998,0,,0.79000000000000004,0,6.0899999999999999
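A minimal sketch of turning one of these CSV events into a typed record, assuming the 18-column layout of the header shown here. `TaxiTrip` and `parse` are illustrative names, not the ones used in the Taxi360 repository, and only a few columns are kept for brevity.

```scala
object TaxiTripParser {
  final case class TaxiTrip(
    vendorName: String,
    pickupDateTime: String,
    dropoffDateTime: String,
    passengerCount: Int,
    tripDistance: Double,
    paymentType: String,
    totalAmount: Double)

  // Returns None when the line does not have the 18 expected columns
  // or when a numeric column fails to parse (e.g. the header row).
  def parse(line: String): Option[TaxiTrip] = {
    // A limit of -1 keeps empty fields such as ",," instead of dropping them.
    val f = line.split(",", -1)
    if (f.length != 18) None
    else scala.util.Try(
      TaxiTrip(f(0), f(1), f(2), f(3).toInt, f(4).toDouble, f(11), f(17).toDouble)
    ).toOption
  }
}
```

Returning `Option` lets the stream processing layer decide what to do with malformed lines instead of failing the whole batch.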
Kafka Considerations – Reliability
▪ Different reliability levels for topics:
- Taxi Trip Data → Kafka topic taxi-trip-input: 100% delivery, dups are ok (“at least once”)
- Twitter → Kafka topic customer-sentiment: <=100% delivery (“at most once”)
Kafka Considerations – Reliability
▪ But remember there are tradeoffs…
Kafka Reliability – Replication
[Diagram sequence: a Producer writes to Partition1, Partition2, and Partition3 on a single Broker, with a Leader replica marked; the same partitions are then shown replicated across two and then three Brokers, each step marking the Leader replica(s)]
Kafka Reliability – Replication
▪ So how does this relate to our application?
kafka-topics --zookeeper ZKHOST:ZKPORT --partitions 2 --replication-factor 3 \
  --create --topic taxi-trip-input
kafka-topics --zookeeper ZKHOST:ZKPORT --partitions 2 --replication-factor 1 \
  --create --topic customer-sentiment
Kafka Reliability – Producers
Taxi Trip Data
Kafka
taxi-trip-input
Partition 1
Partition 2
Partition 3
Topic B
Partition 1
Partition 2
Partition 3
Message
failure?
Producer
Resend
message
acks=all
Kafka Reliability – Producers
▪ What about duplicates?
Taxi Trip Data
Kafka
taxi-trip-input
Partition 1
Partition 2
Partition 3
Topic B
Partition 1
Partition 2
Partition 3
Producer
ID Message
1000 2009-01-04 03:02:00,1,2.629,...
1001 2009-01-04 03:38:00,3,4.549…
1001 2009-01-04 03:38:00,3,4.549…
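One common way to tolerate the duplicates that producer retries create is to make the downstream write idempotent. The sketch below assumes every message carries a unique ID, as in the table above; `Message` and `IdempotentSink` are hypothetical names and the `Map` stands in for a keyed store.

```scala
object IdempotentSink {
  final case class Message(id: Long, payload: String)

  // Upsert keyed by message ID: replaying a message overwrites the same
  // entry instead of adding a second one.
  def write(store: Map[Long, String], msg: Message): Map[Long, String] =
    store.updated(msg.id, msg.payload)

  def writeAll(msgs: Seq[Message]): Map[Long, String] =
    msgs.foldLeft(Map.empty[Long, String])(write)
}
```

With this shape, at-least-once delivery from Kafka still leaves a single record per ID in the destination.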
Kafka Scaling – Partitions
Producer
Kafka
taxi-trip-input
Partition 1
Partition 2
Partition 3
Consumer Group
Consumer
Consumer
Consumer
Kafka Scaling – Partitions
Producer
Kafka
taxi-trip-input
Partition 1
Partition 2
Partition 3
Consumer Group
Consumer
Consumer
Consumer
Partition 4
Partition 5
Consumer
Consumer
Higher
throughput
Higher
throughput
More
resources
(memory)
More
resources
(file handles)
Producer
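Within a consumer group each partition is read by one consumer, which is why adding partitions is what allows adding consumers. A rough sketch, assuming a simple round-robin rule (Kafka's real assignors are more involved; `GroupAssignment` is a hypothetical name):

```scala
object GroupAssignment {
  // Round-robin: partition p goes to consumer number p % consumers.length.
  def assign(partitions: Int, consumers: Vector[String]): Map[String, Vector[Int]] = {
    require(consumers.nonEmpty, "a group needs at least one consumer")
    val empty = consumers.map(c => c -> Vector.empty[Int]).toMap
    (0 until partitions).foldLeft(empty) { (acc, p) =>
      val owner = consumers(p % consumers.length)
      acc.updated(owner, acc(owner) :+ p)
    }
  }
}
```

With three partitions and five consumers, two consumers receive nothing; with five partitions each consumer owns one.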
Kafka Scaling – Partitions
Custom Partitioning
0 1 2
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5
Producer
0 1 2
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5
Producer
3 4 5 6 7
6 7
Partitioner
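A custom partitioner is just a function from a message key to a partition number. The sketch below is a stand-alone illustration, not Kafka's `Partitioner` interface, and its rule is an assumption made for the example: keys listed as "hot" get a reserved partition, all other keys are hashed over the remaining ones.

```scala
object CustomPartitioning {
  // Hypothetical rule: hot keys go to the last partition, everything else
  // is spread over partitions 0 until numPartitions - 1 by key hash.
  def partitionFor(key: String, numPartitions: Int, hotKeys: Set[String]): Int = {
    require(numPartitions >= 2, "need one reserved and one shared partition")
    if (hotKeys.contains(key)) numPartitions - 1
    else Math.floorMod(key.hashCode, numPartitions - 1)
  }
}
```

Because the result depends only on the key, all events for one key land in the same partition and keep their relative order.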
Kafka Scaling – Producers
Producer
Kafka
taxi-trip-input
Partition 1
Partition 2
Partition 3
Consumer Group
Consumer
Consumer
Consumer
Partition 4
Partition 5
Consumer
Consumer
Producer
Guarding Against Message Loss
▪ Producer – What happens if the producer loses connection to Kafka and the buffer
overflows?
- Consider a producer side buffer (e.g. Flume).
▪ Source – What happens if events are lost before getting sent to producer?
- Once again use some kind of buffer to provide sufficient retention of data.
Stream Processing
Considerations
Streaming agenda
▪ What do we mean by streaming?
▪ Streaming use-cases
▪ Streaming semantics
▪ Which streaming engine to choose?
▪ Streaming in our use-case
What do we mean by
streaming?
What do we mean by streaming?
▪ Real-time: constant low milliseconds & under
▪ Near real-time: low milliseconds to seconds, delay in case of failures
▪ Batch: 10s of seconds or more, re-run in case of failures
But, there’s no free lunch
▪ Real-time: constant low milliseconds & under
▪ Near real-time: low milliseconds to seconds, delay in case of failures
▪ Batch: 10s of seconds or more, re-run in case of failures
▪ Toward real-time: “difficult” architectures, lower latency
▪ Toward batch: “easier” architectures, higher latency
Streaming use-cases
Streaming Use-cases
▪ Ingestion (most relevant in our use-case)
▪ Simple transformations
- Decision (e.g. anomaly detection)
- Enrichment (e.g. add a state based on zipcode)
▪ Advanced usage
- Machine Learning
- Windowing
#1 - Simple ingestion
Buffer
Event e Stream
Processing Long term
storage
Event e
#2 - Enrichment
Buffer
Event e Stream
Processing Storage
Event e’
e’ = enriched event e
Context store
#2 - Decision
Buffer
Event e Stream
Processing Storage
Event e’
e’ = e + decision
Rules
#3 – Advanced usage
Buffer
Event e Stream
Processing Storage
Event e’
e’ = aggregation or
windowed aggregation
Model
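As a concrete case of "e’ = windowed aggregation", here is a sketch of a tumbling-window count in plain Scala. It runs over a finite collection for clarity; a streaming engine would maintain the same counts incrementally. `Event` and `countPerWindow` are hypothetical names.

```scala
object TumblingWindow {
  final case class Event(key: String, timestampMs: Long)

  // Counts events per (key, window start). Windows are windowSizeMs wide
  // and aligned to timestamp 0, so each event falls in exactly one window.
  def countPerWindow(events: Seq[Event], windowSizeMs: Long): Map[(String, Long), Int] =
    events
      .groupBy(e => (e.key, e.timestampMs - Math.floorMod(e.timestampMs, windowSizeMs)))
      .map { case (window, inWindow) => window -> inWindow.size }
}
```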
#1 – Simple Ingestion
1. Zero transformation
- No transformation, plain ingest
- Keep the original format – SequenceFile, Text, etc.
- Allows storing data that may have schema errors
2. Format transformation
- Simply change the format of the field
- To a structured format, say, Avro, for example
- Can do schema validation
3. Atomic transformation
- Mask a credit card number
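An atomic transformation touches one field of one event and needs no outside context. A small sketch of the credit card example, assuming the policy is "keep the last four digits" (the slides do not specify one; `maskCardNumber` is a hypothetical helper):

```scala
object Masking {
  // Replaces every digit except the last four with '*'.
  // Non-digit characters such as spaces or dashes are left in place.
  def maskCardNumber(card: String): String = {
    val totalDigits = card.count(_.isDigit)
    var seen = 0
    card.map { c =>
      if (c.isDigit) {
        seen += 1
        if (seen <= totalDigits - 4) '*' else c
      } else c
    }
  }
}
```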
#2 - Enrichment
Buffer
Event e Stream
Processing Storage
Event e’
e’ = enriched event e
Context store
Need to store the
context
somewhere
Where to store the context?
1. Locally Broadcast Cached Dim Data
- Local to Process (On Heap, Off Heap)
- Local to Node (Off Process)
2. Partitioned Cache
- Shuffle to move new data to partitioned cache
3. External Fetch Data (e.g. HBase, Memcached)
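Options 1 and 3 are often combined: a local cache answers repeated lookups and falls through to the external store on a miss. The sketch below assumes the zipcode-to-state enrichment mentioned earlier; `ContextStore` is a hypothetical class, and the `fetchExternal` function stands in for a call to a system such as HBase or Memcached.

```scala
import scala.collection.mutable

object Enrichment {
  final case class Event(id: String, zipcode: String)
  final case class Enriched(event: Event, state: Option[String])

  // On-heap cache in front of an external lookup.
  final class ContextStore(fetchExternal: String => Option[String]) {
    private val cache = mutable.Map.empty[String, Option[String]]
    var externalCalls = 0 // exposed only to make the cache effect visible

    def lookup(zipcode: String): Option[String] =
      cache.getOrElseUpdate(zipcode, {
        externalCalls += 1
        fetchExternal(zipcode)
      })
  }

  def enrich(e: Event, store: ContextStore): Enriched =
    Enriched(e, store.lookup(e.zipcode))
}
```

This sketch never evicts or refreshes entries; a real job would bound the cache and decide how stale the context is allowed to be.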
#1a - Locally broadcast cached data
Could be
On heap or Off heap
#1b - Off process cached data
Data is cached on the
node, outside of
process. Potentially in
an external system like
Rocks DB
#2 - Partitioned cache data
Data is partitioned
based on field(s) and
then cached
#3 - External fetch
Data fetched from
external system
Partitioned cache + external
Streaming semantics
Delivery Types
▪ At most once
- Not good for many cases
- Only where performance/SLA is more important than accuracy
▪ Exactly once
- Expensive to achieve but desirable
▪ At least once
- Easiest to achieve
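The difference between at-most-once and at-least-once often comes down to whether the consumer commits its offset before or after processing. The simulation below is a toy model under that assumption, with a single crash injected at a chosen event; `DeliverySemantics.run` is a hypothetical function.

```scala
object DeliverySemantics {
  // Replays `events` through a consumer that crashes once while handling the
  // event at index `crashAt`, then restarts from the last committed offset.
  // Returns what the destination received.
  def run(events: Vector[String], crashAt: Int, commitBeforeProcessing: Boolean): Vector[String] = {
    val delivered = Vector.newBuilder[String]
    var committed = 0
    var offset = 0
    var crashed = false
    while (offset < events.length) {
      if (commitBeforeProcessing) committed = offset + 1
      if (!crashed && offset == crashAt) {
        crashed = true
        // Commit-after-processing: the write happened, the commit did not.
        if (!commitBeforeProcessing) delivered += events(offset)
        offset = committed // restart from the last committed offset
      } else {
        delivered += events(offset)
        if (!commitBeforeProcessing) committed = offset + 1
        offset += 1
      }
    }
    delivered.result()
  }
}
```

Committing first loses the in-flight event (at most once); committing last replays it (at least once), which is why the destination then needs de-duplication to approach exactly-once.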
Semantics of our architecture
Source System 1
Destination system
Source System 2
Source System 3
Ingest Extract Streaming
engine
Push
Message broker
Classification of storage systems
▪ File based
- S3
- HDFS
▪ NoSQL
- HBase
- Cassandra
▪ Document based
- Search
▪ NoSQL-SQL
- Kudu
De-duplication at file level
Semantics at key/record level
Which streaming
engine to choose?
High level architecture
[Diagram: Source (Custom Producer) → Transport (two alternative systems, shown as logos) → Stream Processing (Processing & Ingestion Engine; candidate engines shown as logos, including Apache Beam and Kafka Streams) → Storage (Nested Tables, Indexed Cube, Relational Tables, Entity Time Series Lookup) → Access (Batch Processing, SQL, NRT REST, NRT Dashboard)]
Spark Streaming
▪ Micro batch based architecture
▪ Allows stateful transformations
▪ Feature rich
- Windowing
- Sessionization
- ML
- SQL (Structured Streaming)
Spark Streaming
[Diagram: for each micro-batch (First Batch, Second Batch), a Receiver pulls from the Source and produces RDDs; the RDDs form a DStream, and Filter, Count, and Print run over each batch in a single pass]
Spark Streaming
[Diagram: the same flow extended with RDD partitions and with Stateful RDD 1 and Stateful RDD 2 carried across the Pre-first, First, and Second batches]
Flink
▪ True “streaming” system, but not as feature rich as Spark
▪ Much better event time handling
▪ Good built-in backpressure support
▪ Allows stateful transformations
▪ Lower Latency
- No Micro Batching
- Asynchronous Barrier Snapshotting (ABS)
Flink - ABS
[Diagram sequence: an Operator with an input Buffer and two incoming barriers, 1A and 1B]
1. Barrier 1A hit; Barrier 1B still behind.
2. Both barriers hit.
3. Barrier is combined and can move on; buffer can be flushed out.
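The barrier steps above can be modelled in a few lines. This is a simplified, single-operator simulation written for illustration, not Flink's API: records from a channel whose barrier has already arrived are buffered until the barrier has arrived on every channel, then a snapshot marker is emitted and the buffer is flushed. All names are hypothetical.

```scala
object BarrierAlignment {
  sealed trait Msg
  final case class Record(value: String) extends Msg
  final case class Barrier(id: Int) extends Msg

  // `inputs` lists (channel, message) pairs in arrival order.
  // Returns what the operator emits downstream, in order.
  def align(inputs: Seq[(String, Msg)], channels: Set[String]): Vector[String] = {
    val out = Vector.newBuilder[String]
    var blocked = Set.empty[String]     // channels whose barrier has arrived
    var buffered = Vector.empty[String] // records held back from blocked channels
    inputs.foreach {
      case (ch, Record(v)) =>
        if (blocked.contains(ch)) buffered :+= v else out += v
      case (ch, Barrier(id)) =>
        blocked += ch
        if (blocked == channels) {
          out += s"snapshot-$id" // barriers combined: state can be snapshotted
          out ++= buffered       // buffer can be flushed out
          buffered = Vector.empty
          blocked = Set.empty
        }
    }
    out.result()
  }
}
```

Holding back records from the faster channel is what keeps the snapshot consistent: nothing from after barrier 1 is processed before the snapshot is taken.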
Storm
▪ Old school
▪ Didn’t manage state – had to use Trident
▪ No good support for batch processing
Samza
▪ Good integration with Kafka
▪ Doesn’t support batch
▪ Forked by Kafka Streams
Flume
▪ Well integrated with the Hadoop ecosystem
▪ Allowed interceptors (for simple transformations)
▪ Supports buffering
- Memory
- File
- Kafka
▪ But no real fault-tolerance
▪ No state management
Others
▪ Apache Apex
▪ Kafka Streams
▪ Heron
Apache Beam
▪ Abstraction on top of Streaming Engines
▪ Best support for Google Dataflow
Streaming in our use-
case
Spark Streaming
▪ We chose Spark Streaming because:
- Same execution engine for batch and streaming
- Similar code for batch and streaming
- Support for security, kafka integration
- Thriving community
- We don’t have low millisecond requirements
High level architecture
[Diagram: Source (Custom Producer) → Transport (two alternative systems, shown as logos) → Stream Processing → Storage (Nested Tables, Indexed Cube, Relational Tables, Entity Time Series Lookup) → Access (Batch Processing, SQL, NRT REST, NRT Dashboard)]
Storage Layer
Considerations
Data Modeling
Structured Landing Zones
▪ Hive Relational Model (Kudu/HDFS): traditional SQL
▪ Hive Nested Model (HDFS): optimized for nested structures like JSON
▪ Aggregations (Kudu): optimized for storing and mutating aggregates
▪ HBase Entity Time Series: optimized for Entity 360 and time-based access
▪ Solr: optimized for faceted charts and reverse index lookups
Relational
▪ Everyone knows it
▪ Simple
▪ Large joins are very painful
▪ May lead to customers making bad queries
▪ Easier to mutate
Kudu Data Models
▪ Entity Summary Tables
- Quick update and access of aggregate of Entity Stats
▪ Event Tables
- Number of Partitioning strategies
- Partition by Entity
- Partition by Hash on time
Kudu: Table Creation Example
CREATE EXTERNAL TABLE ny_taxi_trip (
vender_id STRING,
tpep_pickup_datetime TIMESTAMP,
tpep_dropoff_datetime TIMESTAMP,
passenger_count INT,
trip_distance DOUBLE,
pickup_longitude DOUBLE,
pickup_latitude DOUBLE,
rate_code_id STRING,
store_and_fwd_flag STRING,
dropoff_longitude DOUBLE,
dropoff_latitude DOUBLE,
payment_type STRING,
fare_amount DOUBLE,
extra DOUBLE,
mta_tax DOUBLE,
improvement_surcharge DOUBLE,
tip_amount DOUBLE,
tolls_amount DOUBLE,
total_amount DOUBLE
)
STORED AS PARQUET
LOCATION 'usr/root/hive/ny_taxi_trip';
Kudu: Data Population
SparkStreamingTaxiTripToKudu.scala
Kudu: REST API
KuduServiceLayer.scala
View Strategies
Hive Relational Model
Hive Nested Model
Models
Hive Normal Views
Hive Materialized Table
Views
Use in the cases where the view requires
a join that is done through a shuffle
Use only for tables that filter
records/columns or use for marking fields
Nested
▪ Less Space than Denormalization
▪ Still have tables but the cost of joins is all but gone
▪ Also great for cartesian joins
- N x M vs N + M
▪ Not really supported yet with Kudu or HBase with SQL
Nested Writing Example in Spark
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" }
]
}
Nested Writing Example in Spark
val jsonDF = hiveContext.read.json(jsonRDD)
jsonDF.write.parquet("./parquet")
hiveContext.createExternalTable("jsonNestedTable", "./parquet")
Nested: Taxi Example
KuduToNestedHDFS.scala
Entity Centric Time Series
▪ Partition by Entity ID
▪ Order by Time
▪ Allows for free windowing
▪ Allows for fetching of single time window of single entity at web scale
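The access pattern behind these bullets can be sketched with a sorted map standing in for the store. This is an in-memory illustration, not HBase client code; the key format (entity ID, then a zero-padded timestamp) is an assumption chosen so that lexicographic order matches time order, and it assumes non-negative timestamps.

```scala
import scala.collection.immutable.TreeMap

object EntityTimeSeries {
  // Zero-padding keeps string order identical to numeric time order.
  def key(entityId: String, time: Long): String = f"$entityId:$time%011d"

  def table(rows: Seq[(String, Long, String)]): TreeMap[String, String] =
    TreeMap(rows.map { case (entity, time, value) => key(entity, time) -> value }: _*)

  // One contiguous range read: start inclusive, end exclusive.
  def scan(t: TreeMap[String, String], entityId: String, start: Long, end: Long): Vector[String] =
    t.range(key(entityId, start), key(entityId, end)).values.toVector
}
```

Because one entity's rows sit next to each other in time order, fetching a time window for a single entity is a short scan rather than a search across the whole table.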
HBase Entity Time Series
[Diagram: rows keyed by (entity ID, time); each entity's rows are adjacent and ordered by time]
Cust-A, 10
Cust-A, 20
Cust-A, 40
Cust-C, 10
Cust-C, 20
Cust-C, 30
Cust-C, 40
Cust-B, 10
Cust-B, 20
Cust-B, 30
Cust-B, 40
Cust-F, 20
Cust-F, 30
Cust-F, 40
Cust-D, 10
Cust-D, 20
Cust-D, 40
Cust-G, 10
Cust-G, 20
Cust-G, 30
Cust-G, 40
[The same table is then shown under two access patterns: a REST call served by a short scan over one entity's rows, and batch processing with several Mappers each reading a range of rows]
HBase: Row Key Example
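// Assumed imports: commons-lang 2.x StringUtils and the HBase Bytes utility.
import org.apache.commons.lang.StringUtils
import org.apache.hadoop.hbase.util.Bytes

// Row key layout: <salt>:<customerId>:<zero-padded timestamp>:trans:<transId>
def generateRowKey(customerTrans: CustomerTran, numOfSalts: Int): Array[Byte] = {
  // The salt spreads entities across regions; left-padding keeps string order equal to numeric order.
  val salt = StringUtils.leftPad(
    Math.abs(customerTrans.customerId.hashCode % numOfSalts).toString, 4, "0")
  Bytes.toBytes(salt + ":" +
    customerTrans.customerId + ":" +
    StringUtils.leftPad(customerTrans.eventTimeStamp.toString, 11, "0") + ":trans:" +
    customerTrans.transId)
}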
HBase: Population Example
SparkStreamingTaxiTripToHBase.scala
HBase: REST Example
HBaseServiceLayer.scala
Solr: Data Model
▪ Think of it like a cube on an object type
- In our case a taxi trip
- Allows for rollups and aggregations from object’s point of view
- Think of objects as immutable
- Try to find time based events
- May design more than one object type
Solr Details
ID Document Live
1 Trip:101 1
2 Trip:102 1
3 Trip:103 1
4 Trip:104 1
5 Trip:105 1
ID Field Value Documents
1 Cash 1,3
2 Credit 2
3 Debit 4,5
Single Value Aggregations
▪ Get Array Lengths
ID Field Value Documents
1 Cash 1,3
2 Credit 2
3 Debit 4,5
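"Get array lengths" means a facet count is simply the size of each value's document list. A small sketch in plain Scala, using the payment types from the table above (this models the idea only and is not Solr code; `FacetCounts` is a hypothetical name):

```scala
object FacetCounts {
  // Builds the inverted index: field value -> sorted list of document IDs.
  def invert(docs: Map[Int, String]): Map[String, Vector[Int]] =
    docs.toVector
      .groupBy { case (_, value) => value }
      .map { case (value, pairs) => value -> pairs.map(_._1).sorted }

  // A single-value aggregation only needs the length of each posting list.
  def facet(index: Map[String, Vector[Int]]): Map[String, Int] =
    index.map { case (value, ids) => value -> ids.length }
}
```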
Multi Value Aggregations
▪ Ordered Merge Join
- Think like a zipper
- Scans
- No Lookups
▪ Top N from both sides
- Leaving the rest to other
▪ Indexes distributed
▪ No need to read document data
Cash: 1 4 5 7 8 9 10 14 16
Credit: 2 3 6 11 12 13 15 17 18
Vender A: 1 2 3 6 7 8 10 15 18
Vender B: 4 5 9 11 12 13 14 16 17
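The "zipper" can be written as a two-pointer walk over two sorted document ID lists. The sketch below counts the documents that appear in both lists, using only scans and no lookups; it illustrates the idea and is not Solr's implementation.

```scala
import scala.annotation.tailrec

object ZipperJoin {
  // Both inputs must be sorted ascending. Advances whichever side is behind
  // and counts positions where the two lists agree.
  def intersectCount(a: Vector[Int], b: Vector[Int]): Int = {
    @tailrec
    def loop(i: Int, j: Int, acc: Int): Int =
      if (i >= a.length || j >= b.length) acc
      else if (a(i) == b(j)) loop(i + 1, j + 1, acc + 1)
      else if (a(i) < b(j)) loop(i + 1, j, acc)
      else loop(i, j + 1, acc)
    loop(0, 0, 0)
  }
}
```

With the lists above, `intersectCount(cash, venderA)` gives the number of cash trips for Vender A without reading any document data.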
Solr: Population Example
SparkStreamingTaxiTripToSolR.scala
Batch Processing
Considerations
Why have batch processing?
▪ When you need a larger context
- Say, to train a model
▪ Complex periodic job that does something
- Convert data to a nested structure for reduced number of shuffles
▪ In our use-case,
- Kudu -> HDFS Nested is batch processing
- KMeans calculation is also in batch
Batch processing options
▪ Spark (+ MLlib)
▪ MapReduce (+ Mahout)
▪ Flink (+ Flink ML)
Spark
▪ Pretty popular
▪ Much faster than MapReduce
▪ Thriving community
MapReduce
▪ Sloooooow
Flink
▪ Pretty popular
▪ Batch is a special case of Streaming
▪ Developing community
In our use-case
▪ We chose Spark
- We were using Spark Streaming anyways
- Similar code between Spark and Spark Streaming
- Thriving community
Interactive
Data Access
Considerations
Types of data access
▪ REST server/APIs for querying entities and aggregates
▪ UI for displaying search facets
▪ SQL engine
REST servers
Considerations
Why have REST server?
▪ Tired of business people telling us how to access data
▪ Serves as an interface between the data engineers and business folks
▪ Lets business folks decide access patterns
▪ Engineers to optimize those patterns
▪ Brownie points from your boss
▪ And, it’s not that difficult to write!
Don’t believe me?
import org.mortbay.jetty.Server
import org.mortbay.jetty.servlet.{Context, ServletHolder}
…
val server = new Server(port)
val sh = new ServletHolder(classOf[ServletContainer])
sh.setInitParameter("com.sun.jersey.config.property.resourceConfigClass",
"com.sun.jersey.api.core.PackagesResourceConfig")
sh.setInitParameter("com.sun.jersey.config.property.packages",
"com.hadooparchitecturebook.taxi360.server.hbase")
sh.setInitParameter("com.sun.jersey.api.json.POJOMappingFeature", "true")
val context = new Context(server, "/", Context.SESSIONS)
context.addServlet(sh, "/*")
server.start()
server.join()
Then, write a ServiceLayer
@GET
@Path("vender/{venderId}/timeline")
@Produces(Array(MediaType.APPLICATION_JSON))
def getTripTimeLine (@PathParam("venderId") venderId:String,
@QueryParam("startTime") startTime:String = Long.MinValue.toString,
@QueryParam("endTime") endTime:String = Long.MaxValue.toString):
Array[NyTaxiYellowTrip] = {
Use REST! Say no to business people!
▪ Access data like so:
http://<serverURL>:8080/vender/{venderId}/timeline
UI
Considerations
UI requirements
Something that can
▪ Represent search results really well
▪ Integrates with Apache Solr on Hadoop
UI options
▪ Hue
▪ Banana
▪ Kibana
We choose Hue
▪ Because it’s included
▪ Please look at the others
SQL engines
Considerations
SQL engine criteria
▪ Low latency SQL access
▪ Allows for high concurrency
▪ JDBC/ODBC integration
▪ Capable of large scale aggregation
▪ Optionally integrates with Kudu for real-time updates to SQL tables
Apache Hive
▪ Good JDBC integration
▪ Not really low latency, even when using Tez
▪ Doesn’t integrate with Kudu
▪ Can run on MapReduce, Spark, or Tez
Presto
▪ Low latency SQL engine from Facebook
▪ Provides JDBC/ODBC access
▪ Is only in-memory, large aggregations can lead to OOM errors
▪ Doesn’t integrate with Kudu
Apache Impala
▪ Low latency SQL access
▪ Provides JDBC/ODBC access
▪ Excellent concurrency support
▪ Integrates with Kudu for real-time SQL
Apache Drill
▪ Similar in architecture to Impala
▪ Provides JDBC/ODBC access
▪ Doesn’t integrate with Kudu
Spark SQL
▪ Builds on top of Spark
▪ JDBC/ODBC access only via Spark Thrift Server
- Doesn’t scale well with larger number of concurrent users
- Doesn’t fully provide secure access.
We choose
▪ Spark SQL
▪ Impala
Overall Architecture
Review
High level architecture
[Diagram: Source (Custom Producer) → Transport → Stream Processing (Processing & Ingestion Engine) → Storage (Nested Tables, Indexed Cube, Relational Tables, Entity Time Series Lookup) → Access (Batch Processing, SQL, NRT REST, NRT Dashboard)]
Demo!
High Level of the Demo Design
Producer
Kafka
Topic Foo
Partition 1
Partition 2
Partition 3
Spark
Streaming
Kudu
Spark
Streaming
HBase
Spark
Streaming
Solr
Spark
Streaming
HDFS
Kudu
HBase
Solr
HDFS
SQL
REST
REST
Hue
SQL
Where else to find us?
Other Sessions
▪ Ask Us Anything session (all) – Wednesday, 2:35 PM
▪ Top Five Mistakes When Writing Spark Applications (Mark/Ted) – Wednesday, 11:15 AM
▪ Storage designs done right equal faster processing and access (Ted) – Wednesday, 4:15 PM
Thank you!
@hadooparchbook
tiny.cloudera.com/app-arch-singapore
Jonathan Seidman | @jseidman
Ted Malaska | @ted_malaska
Mark Grover | @mark_grover
Ad

More Related Content

What's hot (20)

Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
hadooparchbook
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applications
hadooparchbook
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
hadooparchbook
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
hadooparchbook
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
DataWorks Summit
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
gregchanan
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
DataWorks Summit/Hadoop Summit
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
Jamie Grier
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
markgrover
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
MapR Technologies
 
Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
Gwen (Chen) Shapira
 
Uber's data science workbench
Uber's data science workbenchUber's data science workbench
Uber's data science workbench
Ran Wei
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
DataWorks Summit
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Guido Schmutz
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
Spark Summit
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
DataWorks Summit
 
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingThe Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
Spark Summit
 
Big Data Anti-Patterns: Lessons From the Front LIne
Big Data Anti-Patterns: Lessons From the Front LIneBig Data Anti-Patterns: Lessons From the Front LIne
Big Data Anti-Patterns: Lessons From the Front LIne
Douglas Moore
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
hadooparchbook
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applications
hadooparchbook
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
hadooparchbook
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
hadooparchbook
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
DataWorks Summit
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
gregchanan
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
Jamie Grier
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
markgrover
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
MapR Technologies
 
Uber's data science workbench
Uber's data science workbenchUber's data science workbench
Uber's data science workbench
Ran Wei
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
DataWorks Summit
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Guido Schmutz
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
Spark Summit
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
DataWorks Summit
 
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingThe Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
Spark Summit
 
Big Data Anti-Patterns: Lessons From the Front LIne
Big Data Anti-Patterns: Lessons From the Front LIneBig Data Anti-Patterns: Lessons From the Front LIne
Big Data Anti-Patterns: Lessons From the Front LIne
Douglas Moore
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 

Viewers also liked (19)

Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
hadooparchbook
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
hadooparchbook
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
hadooparchbook
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
Zaloni
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
hadooparchbook
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata LondonHadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata London
hadooparchbook
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
hadooparchbook
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
hadooparchbook
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Cloudera, Inc.
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
Cloudera, Inc.
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
Caserta
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data Platform
Cloudera, Inc.
 
Introduction to HCFS
Introduction to HCFSIntroduction to HCFS
Introduction to HCFS
Jazz Yao-Tsung Wang
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
hadooparchbook
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
hadooparchbook
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
hadooparchbook
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
Zaloni
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
hadooparchbook
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata LondonHadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata London
hadooparchbook
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
hadooparchbook
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
hadooparchbook
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Cloudera, Inc.
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
Cloudera, Inc.
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
Caserta
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data Platform
Cloudera, Inc.
 
Ad

Similar to Architecting next generation big data platform (20)

Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018
Jonathan Seidman
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017
Jonathan Seidman
 
Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018
Jonathan Seidman
 
Fraud Detection with Hadoop
Fraud Detection with HadoopFraud Detection with Hadoop
Fraud Detection with Hadoop
markgrover
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
Inside Analysis
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...
DataWorks Summit
 
Elastic-Engineering
Elastic-EngineeringElastic-Engineering
Elastic-Engineering
Araf Karsh Hamid
 
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
Iulia Emanuela Iancuta
 
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
confluent
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
Jim Haughwout
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology
confluent
 
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
 Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e... Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
VMware Tanzu
 
WebRTC 101
WebRTC 101WebRTC 101
WebRTC 101
Kensaku Komatsu
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
Hortonworks
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
HostedbyConfluent
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
Xiao Li
 
Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018
Jonathan Seidman
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017
Jonathan Seidman
 
Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018
Jonathan Seidman
 
Fraud Detection with Hadoop
Fraud Detection with HadoopFraud Detection with Hadoop
Fraud Detection with Hadoop
markgrover
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
Inside Analysis
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...
DataWorks Summit
 
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
Iulia Emanuela Iancuta
 
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
confluent
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
Jim Haughwout
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology
confluent
 
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
 Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e... Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
VMware Tanzu
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
Hortonworks
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
HostedbyConfluent
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
Xiao Li
 
Ad

Recently uploaded (20)

MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
Compiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptxCompiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptx
RushaliDeshmukh2
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
new ppt artificial intelligence historyyy
new ppt artificial intelligence historyyynew ppt artificial intelligence historyyy
new ppt artificial intelligence historyyy
PianoPianist
 
Oil-gas_Unconventional oil and gass_reseviours.pdf
Oil-gas_Unconventional oil and gass_reseviours.pdfOil-gas_Unconventional oil and gass_reseviours.pdf
Oil-gas_Unconventional oil and gass_reseviours.pdf
M7md3li2
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Journal of Soft Computing in Civil Engineering
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Architecting next generation big data platform