Approaching real-time:
Things you can do before going Impala
Chris Huang
SPN Hadoop Architect
About – Chris Huang
• Chris Huang
– SPN Hadoop Architect
– SPN Dumbo Team
– Hadoop.TW Active Member
9/28/2013 Confidential | Copyright 2013 TrendMicro Inc. 2
About – SPN
• SPN, Smart Protection Network
– Proactive cloud-based threat-blocking technology
• 2013 Big Data Foresight Forum
– Scaling Big Data Mining Infrastructure: The Smart Protection
Network
http://www.slideshare.net/chenhsiu/scaling-bigdatamininginfra2
Batch vs. Real-time
Batch, High Throughput
Q: How can I transport 10,000 people from Taipei to Kaohsiung?
Real-time, Timely Information
Q: What’s the fastest way to Taipei Train Station?
67%: Query Hadoop using Hive
51%: Load data into Hadoop in less than 90 mins
54%: Use HBase for real-time data access
* Cloudera customer survey Aug. 2012
Time is Money!
From Batch to Real-time
• Bridge the gap between batch and now
• 80/20 rule
– Hadoop solves 80% easily
– The remaining 20% takes 80% of the effort
• Go as close as possible, don’t overdo it!
What is Real-time?
• Real-time is NOT always “faster than batch”
– If you have really BIG DATA
• Most of the time, we want Timely Information
• Minimize the gap between scheduled MR jobs
Hourly Job
Hourly Job
Hourly Job
How to get result at 1:33?
So, You want
to talk about
Impala?
NO
Impala is not
a silver bullet
* Here Impala denotes any interactive query solution, including Apache Drill, Apache Tez + Stinger
You can do a
lot before
using Impala
3 Arrows for Real-time Applications
HBase (20%)
SolrCloud (60%)
Streaming (20%)
Example Case
Question 1
• If we get a C&C malicious URL
hxxp://www.thebadguy.com/?info=12345678
• Yesterday, who accessed that URL? From where, and how?
What’s the frequency?
Very Simple
But we have 5 billion lines of log per day
It takes about
20 minutes
~1 hour if you’re not lucky
And we may
query 50,000
times a day
We need a
real-time
(interactive)
system
1st Arrow:
HBase
Make Good Use of HBase Row Key
Region  Start Key                       End Key
R1      net.pwnnetwork#201208           net.tlm100.f19e100f#201304
R2      net.tlm100.f19e100f#201304      nl.efkobeton.www#201211
R3      nl.efkobeton.www#201211         no.rubrikk#201305
R4      no.rubrikk#201305               org.saintalphonsus.www#201304
R5      org.saintalphonsus.www#201304   pl.opole.uni.socjologia.www#201301
com.domain.reverse#YYYYMMDD
Easily retrieve data by
row key scan
Hadoop in Taiwan 2012 – Designing a High-Performance HBase Schema: Understanding HBase
http://www.youtube.com/watch?v=8DMzNmVrXEI
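A minimal sketch of the com.domain.reverse#YYYYMMDD key layout, assuming plain string keys. The helper names (reverse_domain, row_key, scan_range) are hypothetical and not part of any HBase client; they only show why reversing the domain lets one prefix scan cover every date for a host.

```python
from datetime import date


def reverse_domain(domain: str) -> str:
    """Reverse the dot-separated labels: www.example.com -> com.example.www."""
    return ".".join(reversed(domain.lower().split(".")))


def row_key(domain: str, day: date) -> str:
    """Build a key in the com.domain.reverse#YYYYMMDD layout."""
    return f"{reverse_domain(domain)}#{day:%Y%m%d}"


def scan_range(domain: str) -> tuple[str, str]:
    """Start (inclusive) and stop (exclusive) keys covering all dates of one domain."""
    prefix = reverse_domain(domain) + "#"
    # '$' is the ASCII character right after '#', so it closes the prefix range.
    return prefix, prefix[:-1] + "$"


key = row_key("www.thebadguy.com", date(2013, 9, 27))
start, stop = scan_range("www.thebadguy.com")
print(key, start, stop)
```

Because keys for the same domain sort next to each other, a scan bounded by start and stop touches only the regions that hold that domain.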
Compute Once, Import Once
• Clarify your use case
• Compute the whole thing once
– Daily job + hourly job
• Import into HBase using Bulk Loading
• On the fly query, with constant query time
If You Really Care About Real-Time
• Delta data are not big, don’t use MR
• Write another program to calculate on the fly
• Dynamically put into HBase
– Row key: com.domain.reverse#YYYYMMDD_HHmmss
• Query from both hourly batch and delta data
• Drop delta data in next hourly batch
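The batch-plus-delta query above can be sketched as follows. This is an illustration only: in-memory dicts stand in for the HBase tables, and the names delta_row_key and combined_count are hypothetical. It assumes the hourly batch covers everything strictly before last_batch_end.

```python
from datetime import datetime


def delta_row_key(reversed_domain: str, ts: datetime) -> str:
    """Key in the com.domain.reverse#YYYYMMDD_HHmmss layout used for delta rows."""
    return f"{reversed_domain}#{ts:%Y%m%d_%H%M%S}"


def combined_count(batch_counts, delta_rows, reversed_domain, last_batch_end, now):
    """Add the precomputed batch count to the delta rows written since the last batch.

    batch_counts: dict of reversed domain -> count produced by the hourly jobs.
    delta_rows:   dict of delta row key -> count, written on the fly.
    """
    start = delta_row_key(reversed_domain, last_batch_end)
    stop = delta_row_key(reversed_domain, now)
    # Fixed-width timestamps make string order match time order.
    delta = sum(c for k, c in delta_rows.items() if start <= k <= stop)
    return batch_counts.get(reversed_domain, 0) + delta


dom = "com.thebadguy.www"
batch = {dom: 120}
delta = {
    delta_row_key(dom, datetime(2013, 9, 28, 1, 5, 10)): 3,
    delta_row_key(dom, datetime(2013, 9, 28, 1, 20, 0)): 2,
    delta_row_key(dom, datetime(2013, 9, 28, 1, 40, 0)): 4,
}
total = combined_count(batch, delta, dom,
                       datetime(2013, 9, 28, 1, 0, 0),
                       datetime(2013, 9, 28, 1, 33, 0))
print(total)
```

Once the next hourly job finishes, its output replaces the delta rows for that hour, so the delta table stays small.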
2 am 3 am
Delta data
But...
Life suffers because of “but”
Question 2
• Query malicious sites with pattern *.com hosted in Japan,
sorted by the distance to GeoLocation (30.0,130.0)
HBase does
not have secondary
indexes (yet)
2nd Arrow:
SolrCloud
Lucene, Solr,
SolrCloud
TW Hadoop User Group Q1 Meetup - Solr Tutorial
http://www.slideshare.net/chenhsiu/20130310-solr-tuorial
What is Lucene?
• Full-text search library
• Written in Java
• Indexing & searching
• One of the top 5 Apache projects
Inverted Index
https://developer.apple.com/library/mac/#documentation/userexperience/Conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
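As a rough illustration of the inverted index idea, here is a toy version: it maps each term to the set of documents containing it, so a lookup never has to read every document. This is a sketch under simplifying assumptions (whitespace tokenization, AND-only queries), not how Lucene is implemented; the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "malicious url in japan",
    2: "benign url in taiwan",
    3: "malicious site in taiwan",
}
index = build_inverted_index(docs)
print(search(index, "malicious taiwan"))
```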
What is Solr?
• Enterprise search server based on Lucene
– NOT a database
• Advanced full-text search capabilities
• Flexible and adaptable with XML configuration
• Extensible plug-in architecture
• REST-like APIs
• Web admin interface
• Runs inside a Java servlet container such as Jetty and
Tomcat
Use Hadoop
MapReduce for
Indexing
Lucene Indexing Flow
Use SolrCloud
for Scalable,
Fault Tolerant
Query
Solr: Index Query Flow
What is SolrCloud?
Indexing in SolrCloud
Searching in SolrCloud
Question 2
• Query malicious sites with pattern *.com hosted in Japan,
sorted by the distance to GeoLocation (30.0,130.0)
A = load 'date://2013/09/28' using NSCTmProxyURLFProtobufLoader();
B = foreach A generate value.addr.peerIp as ip,
value.NSCLog.URL as url, Location(value.addr.peerIp) as loc;
C = foreach B generate ip, url, loc.countryName as cn,
CONCAT(CONCAT((chararray)loc.latitude, ','),
(chararray)loc.longitude) as loc;
store C into 'solrcloud://$COLLECTION'
using SolrStorage('ip_s,url_domain,cn_s,loc_p',
'$USERNAME', '$PASSWORD');
hxxp://$SERVER:8983/solr/$SHARD/select?q=cn_s:Japan+url_s:com*&wt=json&indent=true&rows=5&sort=geodist(loc_p,30.0,130.0)+asc
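The select URL above can also be built programmatically so that the query and sort expressions are escaped correctly. A minimal sketch, assuming the field names from the slide (cn_s, url_s, loc_p) and a hypothetical helper name and host; it only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def solr_select_url(server, collection, country, domain_prefix, lat, lon, rows=5):
    """Build a Solr select URL that filters by country and domain prefix,
    sorted by distance from (lat, lon). Field names follow the slide."""
    params = {
        "q": f"cn_s:{country} url_s:{domain_prefix}*",
        "wt": "json",
        "indent": "true",
        "rows": rows,
        "sort": f"geodist(loc_p,{lat},{lon}) asc",
    }
    # urlencode percent-escapes ':', '*', '(' and ',' and turns spaces into '+'.
    return f"http://{server}:8983/solr/{collection}/select?{urlencode(params)}"


# "solr.example.internal" and "shard1" are placeholder values.
url = solr_select_url("solr.example.internal", "shard1", "Japan", "com", 30.0, 130.0)
print(url)
```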
That’s it?
YES
If You Really Care About Real-Time
• Delta data are not big, don’t use MR
• Write another program to calculate on the fly
• Solr supports dynamic indexing
– Send your data to Solr to create a delta index
• Query from both batch index and delta index
• Drop delta index in next hourly batch
2 am 3 am
Delta data
Domain/IP Census
www.facebook.com
Excellent!
But...
Life suffers because of “but”
We need to
identify use
case first
Yesterday, who accessed
hxxp://www.thebadguy.com?
From where, and how? What’s the
frequency?
Query malicious sites with
pattern *.com hosted in Japan,
sorted by the distance to
GeoLocation (30.0,130.0)
3rd Arrow:
Streaming
Question 1 Revisited
Yesterday, who accessed
hxxp://www.thebadguy.com?
From where, and how? What’s the
frequency?
• Can you send email when there is a
contact to specific C&C server?
• Can you monitor a specific client IP to
a list of C&C server?
• I found a certain pattern in C&C
URL paths; can you give me an hourly
update of the top 10 path groupings?
• Report the C&C connection’s parent
process SHA-1 to the Virus DB for
sourcing
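One of the requests above, the hourly top 10 path grouping, can be sketched as a small stateful counter fed one event at a time. This is plain Python standing in for a streaming worker, not the Storm or TME API; the class name and the example URLs are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from urllib.parse import urlsplit


class HourlyTopPaths:
    """Count URL paths per hour bucket as events arrive and report the top n."""

    def __init__(self, n=10):
        self.n = n
        self.counts = defaultdict(Counter)  # hour bucket -> Counter of paths

    def observe(self, ts: datetime, url: str) -> None:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        self.counts[hour][urlsplit(url).path or "/"] += 1

    def top(self, hour: datetime):
        # .get avoids creating empty buckets for hours with no traffic.
        return self.counts.get(hour, Counter()).most_common(self.n)


tracker = HourlyTopPaths(n=10)
events = [
    (datetime(2013, 9, 28, 1, 5), "http://c2.example/gate.php?id=1"),
    (datetime(2013, 9, 28, 1, 9), "http://c2.example/gate.php?id=2"),
    (datetime(2013, 9, 28, 1, 30), "http://c2.example/ping"),
    (datetime(2013, 9, 28, 1, 45), "http://c2.example/gate.php?id=3"),
    (datetime(2013, 9, 28, 2, 1), "http://c2.example/ping"),
]
for ts, url in events:
    tracker.observe(ts, url)
print(tracker.top(datetime(2013, 9, 28, 1, 0)))
```

Keeping only per-hour counters means memory grows with the number of distinct paths, not with the number of log lines.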
The Messaging
OSDC.TW 2012 - TME: Open Source Realtime Big Data Processing Platform
http://cloud.github.com/downloads/trendmicro/tme/TME_Introduction_OSDC.tw2012%20.pdf
Let’s dump
the data
You need lots
of workers!
Your boss
won’t buy you
another 100
servers
NextGen MapReduce (YARN)
Storm-YARN
Storm-on-YARN: Convergence of Low-Latency and Big-Data
http://www.slideshare.net/Hadoop_Summit/feng-june26-1120amhall1v2
Continuous Processing
• Calculate data on the fly, endless processing
• Hook up your processing anytime
– Or store scripts on ZooKeeper
• Leverage your existing Hadoop cluster
• Dynamically scale in/out your workers
Summary
3 Arrows for Real-time Applications
HBase (20%)
SolrCloud (60%)
Streaming (20%)
80/20 Rule
As close as
possible,
don’t overdo it
Why not just
use Impala?
The same
problem,
anyway
Q&A
You’re Brilliant
We’re hiring!
Ad

More Related Content

What's hot (20)

Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Carol McDonald
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
Carol McDonald
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep Learning
Carol McDonald
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Carol McDonald
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
Carol McDonald
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Spark Summit
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Carol McDonald
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
Carol McDonald
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
Carol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Carol McDonald
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
Carol McDonald
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat Detection
Databricks
 
Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation Workshop
MapR Technologies
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Data Con LA
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific Applications
Sriram Krishnan
 
ebay
ebayebay
ebay
DataWorks Summit/Hadoop Summit
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Brian O'Neill
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
DataWorks Summit
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
MapR Technologies
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Carol McDonald
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
Carol McDonald
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep Learning
Carol McDonald
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Carol McDonald
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
Carol McDonald
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Spark Summit
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Carol McDonald
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
Carol McDonald
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
Carol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Carol McDonald
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
Carol McDonald
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat Detection
Databricks
 
Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation Workshop
MapR Technologies
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Data Con LA
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific Applications
Sriram Krishnan
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Brian O'Neill
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
DataWorks Summit
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
MapR Technologies
 

Viewers also liked (9)

重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1
Chris Huang
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)
Chris Huang
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)
Chris Huang
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)
Chris Huang
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)
Chris Huang
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)
Chris Huang
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)
Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2
Chris Huang
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)
Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1
Chris Huang
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)
Chris Huang
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)
Chris Huang
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)
Chris Huang
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)
Chris Huang
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)
Chris Huang
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)
Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2
Chris Huang
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)
Chris Huang
 
Ad

Similar to Approaching real-time-hadoop (20)

LF Energy Arras GridChat Webinar 30 January 2025_share.pdf
LF Energy Arras GridChat Webinar 30 January 2025_share.pdfLF Energy Arras GridChat Webinar 30 January 2025_share.pdf
LF Energy Arras GridChat Webinar 30 January 2025_share.pdf
DanBrown980551
 
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...
 "How overlay networks can make public clouds your global WAN" by Ryan Koop o... "How overlay networks can make public clouds your global WAN" by Ryan Koop o...
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...
Cohesive Networks
 
A Survey of NGS Data Analysis on Hadoop
A Survey of NGS Data Analysis on HadoopA Survey of NGS Data Analysis on Hadoop
A Survey of NGS Data Analysis on Hadoop
Chung-Tsai Su
 
Highway to heaven - Microservices Meetup Dublin
Highway to heaven - Microservices Meetup DublinHighway to heaven - Microservices Meetup Dublin
Highway to heaven - Microservices Meetup Dublin
Christian Deger
 
GoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 SkinnedGoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 Skinned
MapR Technologies
 
How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017
Cloudera Japan
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
Ted Dunning
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentation
Dr. Mirko Kämpf
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentation
Jan Kunigk
 
"How overlay networks can make public clouds your global WAN" from LASCON 2013
"How overlay networks can make public clouds your global WAN" from LASCON 2013"How overlay networks can make public clouds your global WAN" from LASCON 2013
"How overlay networks can make public clouds your global WAN" from LASCON 2013
Ryan Koop
 
Cloudera streaming with flink oct 29, 2020 meetup london
Cloudera streaming with flink oct 29, 2020 meetup londonCloudera streaming with flink oct 29, 2020 meetup london
Cloudera streaming with flink oct 29, 2020 meetup london
Timothy Spann
 
A Secure and Dynamic Multi Keyword Ranked Search over Encrypted Cloud Data
A Secure and Dynamic Multi Keyword Ranked Search over Encrypted Cloud DataA Secure and Dynamic Multi Keyword Ranked Search over Encrypted Cloud Data
A Secure and Dynamic Multi Keyword Ranked Search over Encrypted Cloud Data
IRJET Journal
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
Timothy Spann
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
ssuser73434e
 
Elephants in the cloud or how to become cloud ready
Elephants in the cloud or how to become cloud readyElephants in the cloud or how to become cloud ready
Elephants in the cloud or how to become cloud ready
Krzysztof Adamski
 
Elephants in the cloud or How to become cloud ready
Elephants in the cloud or How to become cloud readyElephants in the cloud or How to become cloud ready
Elephants in the cloud or How to become cloud ready
GetInData
 
Elephants in the cloud or how to become cloud ready - Krzysztof Adamski, GetI...
Elephants in the cloud or how to become cloud ready - Krzysztof Adamski, GetI...Elephants in the cloud or how to become cloud ready - Krzysztof Adamski, GetI...
Elephants in the cloud or how to become cloud ready - Krzysztof Adamski, GetI...
Evention
 
[TDC 2013] Integre um grid de dados em memória na sua Arquitetura
[TDC 2013] Integre um grid de dados em memória na sua Arquitetura[TDC 2013] Integre um grid de dados em memória na sua Arquitetura
[TDC 2013] Integre um grid de dados em memória na sua Arquitetura
Fernando Galdino
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
Neo4j
 
Parameter study bonn
Parameter study bonnParameter study bonn
Parameter study bonn
Gabriel Stöckle
 
LF Energy Arras GridChat Webinar 30 January 2025_share.pdf
LF Energy Arras GridChat Webinar 30 January 2025_share.pdfLF Energy Arras GridChat Webinar 30 January 2025_share.pdf
LF Energy Arras GridChat Webinar 30 January 2025_share.pdf
DanBrown980551
 
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...
 "How overlay networks can make public clouds your global WAN" by Ryan Koop o... "How overlay networks can make public clouds your global WAN" by Ryan Koop o...
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...
Cohesive Networks
 
A Survey of NGS Data Analysis on Hadoop
A Survey of NGS Data Analysis on HadoopA Survey of NGS Data Analysis on Hadoop
A Survey of NGS Data Analysis on Hadoop
Chung-Tsai Su
 
Highway to heaven - Microservices Meetup Dublin
Highway to heaven - Microservices Meetup DublinHighway to heaven - Microservices Meetup Dublin
Highway to heaven - Microservices Meetup Dublin
Christian Deger
 
How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017
Cloudera Japan
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
Ted Dunning
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentation
Dr. Mirko Kämpf
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentation
Jan Kunigk
 
"How overlay networks can make public clouds your global WAN" from LASCON 2013
"How overlay networks can make public clouds your global WAN" from LASCON 2013"How overlay networks can make public clouds your global WAN" from LASCON 2013
"How overlay networks can make public clouds your global WAN" from LASCON 2013
Ryan Koop
 
Cloudera streaming with flink oct 29, 2020 meetup london



Approaching real-time-hadoop

  • 1. Approaching real-time: Things you can do before going Impala Chris Huang SPN Hadoop Architect
  • 2. About – Chris Huang • Chris Huang – SPN Hadoop Architect – SPN Dumbo Team – Hadoop.TW Active Member 9/28/2013 Confidential | Copyright 2013 TrendMicro Inc. 2
  • 3. About – SPN • SPN, Smart Protection Network – Proactive cloud-based threat-blocking technology • 2013 Big Data Foresight Forum – Scaling Big Data Mining Infrastructure: The Smart Protection Network http://www.slideshare.net/chenhsiu/scaling-bigdatamininginfra2
  • 5. Batch vs. Real-time • Batch, High Throughput – Q: How can I transport 10,000 people from Taipei to Kaohsiung? • Real-time, Timely Information – Q: What’s the fastest way to Taipei Train Station?
  • 6. Time is Money! • 67% query Hadoop using Hive • 51% load data into Hadoop in less than 90 mins • 54% use HBase for real-time data access (Cloudera customer survey, Aug. 2012)
  • 7. From Batch to Real-time • Bridge the gap between batch and now • 80/20 rule – Hadoop solves 80% easily – Remaining 20% takes 80% of the effort • Go as close as possible, don’t overdo it!
  • 8. What is Real-time? • Real-time is NOT always “faster than batch” – If you have really BIG DATA • Most of the time, we want Timely Information • Minimize the gap between scheduled MR jobs • Timeline: Hourly Job, Hourly Job, Hourly Job. How to get a result at 1:33?
  • 9. So, you want to talk about Impala?
  • 10. NO
  • 11. Impala is not a silver bullet (here Impala denotes any interactive query solution, including Apache Drill and Apache Tez + Stinger)
  • 12. You can do a lot before using Impala
  • 13. 3 Arrows for Real-time Applications • HBase (20%) • SolrCloud (60%) • Streaming (20%)
  • 14. Example Case
  • 15. Question 1 • If we get a C&C malicious URL hxxp://www.thebadguy.com/?info=12345678 • Yesterday, who accessed that URL? From where, and how? What’s the frequency?
  • 16. Very Simple
  • 17. But we have 5 billion lines of log per day
  • 18. It takes about 20 minutes, or ~1 hour if you’re not lucky
  • 19. And we may query 50,000 times a day
  • 20. We need a real-time (interactive) system
  • 21. 1st Arrow: HBase
  • 22. Make Good Use of HBase Row Key
      Region  Start Key                        End Key
      R1      net.pwnnetwork#201208            net.tlm100.f19e100f#201304
      R2      net.tlm100.f19e100f#201304       nl.efkobeton.www#201211
      R3      nl.efkobeton.www#201211          no.rubrikk#201305
      R4      no.rubrikk#201305                org.saintalphonsus.www#201304
      R5      org.saintalphonsus.www#201304    pl.opole.uni.socjologia.www#201301
      Row key format: com.domain.reverse#YYYYMMDD
      Easy to retrieve data by row key scan
      Reference: Hadoop in Taiwan 2012 – Designing a High-Performance HBase Schema, Understanding HBase http://www.youtube.com/watch?v=8DMzNmVrXEI
  • 23. Compute Once, Import Once • Clarify your use case • Compute the whole thing once – Daily job + hourly job • Import into HBase using Bulk Loading • On-the-fly query, with constant query time
  • 24. If You Really Care About Real-Time • Delta data are not big, don’t use MR • Write another program to calculate on the fly • Dynamically put into HBase – Row key: com.domain.reverse#YYYYMMDD_HHmmss • Query from both hourly batch and delta data • Drop delta data in next hourly batch • Figure: delta data between the 2 am and 3 am batches
  • 25. But... Life suffers because of “but”
  • 26. Question 2 • Query malicious sites with pattern *.com hosted in Japan, sorted by the distance to GeoLocation (30.0,130.0)
  • 27. HBase does not have a secondary index (yet)
  • 28. 2nd Arrow: SolrCloud
  • 29. Lucene, Solr, SolrCloud • TW Hadoop User Group Q1 Meetup - Solr Tutorial http://www.slideshare.net/chenhsiu/20130310-solr-tuorial
  • 30. What is Lucene? • Full-text search library • Written in Java • Indexing & searching • One of the top 5 Apache projects
  • 31. Inverted Index https://developer.apple.com/library/mac/#documentation/userexperience/Conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
  • 32. What is Solr? • Enterprise search server based on Lucene – NOT a database • Advanced full-text search capabilities • Flexible and adaptable with XML configuration • Extensible plug-in architecture • REST-like APIs • Web admin interface • Runs inside a Java servlet container such as Jetty and Tomcat
  • 33. Lucene Indexing Flow • Use Hadoop MapReduce for Indexing
  • 34. Solr: Index Query Flow • Use SolrCloud for Scalable, Fault Tolerant Query
  • 35. What is SolrCloud?
  • 36. Indexing in SolrCloud
  • 37. Searching in SolrCloud
  • 38. Question 2 • Query malicious sites with pattern *.com hosted in Japan, sorted by the distance to GeoLocation (30.0,130.0)
      A = load 'date://2013/09/28' using NSCTmProxyURLFProtobufLoader();
      B = foreach A generate value.addr.peerIp as ip, value.NSCLog.URL as url, Location(value.addr.peerIp) as loc;
      C = foreach B generate ip, url, loc.countryName as cn, CONCAT(CONCAT((chararray)loc.latitude, ','), (chararray)loc.longitude) as loc;
      store C into 'solrcloud://$COLLECTION' using SolrStorage('ip_s,url_domain,cn_s,loc_p', '$USERNAME', '$PASSWORD');
      hxxp://$SERVER:8983/solr/$SHARD/select?q=cn_s:Japan+url_s:com*&wt=json&indent=true&rows=5&sort=geodist(loc_p,30.0,130.0)+asc
  • 39. That’s it? YES
  • 40. If You Really Care About Real-Time • Delta data are not big, don’t use MR • Write another program to calculate on the fly • Solr supports dynamic indexing – Send your data to Solr to create a delta index • Query from both batch index and delta index • Drop delta index in next hourly batch • Figure: delta data between the 2 am and 3 am batches
  • 41. Domain/IP Census
  • 42. www.facebook.com
  • 43. Excellent!
  • 44. But... Life suffers because of “but”
  • 45. We need to identify the use case first • Yesterday, who accessed hxxp://www.thebadguy.com? From where, and how? What’s the frequency? • Query malicious sites with pattern *.com hosted in Japan, sorted by the distance to GeoLocation (30.0,130.0)
  • 46. 3rd Arrow: Streaming
  • 47. Question 1 Revisited • Yesterday, who accessed hxxp://www.thebadguy.com? From where, and how? What’s the frequency? • Can you send an email when there is a contact to a specific C&C server? • Can you monitor a specific client IP against a list of C&C servers? • I found a certain pattern in C&C URL paths; can you give me an hourly update of the top 10 path groupings? • Report the C&C connection’s parent process SHA-1 to the Virus DB for sourcing
  • 48. The Messaging • OSDC.TW 2012 - TME: Open Source Realtime Big Data Processing Platform http://cloud.github.com/downloads/trendmicro/tme/TME_Introduction_OSDC.tw2012%20.pdf
  • 49. Let’s dump the data
  • 50. You need lots of workers!
  • 51. Your boss won’t buy you another 100 servers
  • 52. NextGen MapReduce (YARN)
  • 53. Storm-YARN • Storm-on-YARN: Convergence of Low-Latency and Big-Data http://www.slideshare.net/Hadoop_Summit/feng-june26-1120amhall1v2
  • 54. Continuous Processing • Calculate data on the fly, endless processing • Hook up your processing anytime – Or store scripts on ZooKeeper • Leverage your existing Hadoop cluster • Dynamically scale in/out your workers
  • 55. Summary
  • 56. 3 Arrows for Real-time Applications • HBase (20%) • SolrCloud (60%) • Streaming (20%)
  • 57. 80/20 Rule • As close as possible, don’t overdo it
  • 58. Why not just use Impala?
  • 59. The same problem, anyway
  • 60. Q&A
  • 61. You’re Brilliant • We’re hiring!
  • 62. 9/28/2013 Confidential | Copyright 2013 TrendMicro Inc. 62
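
Slide 22's row-key layout (com.domain.reverse#YYYYMMDD) can be illustrated without a cluster. The following is a minimal sketch, assuming a sorted Python list stands in for HBase's lexicographically ordered row keys; `make_row_key` and `prefix_scan` are hypothetical helper names, not HBase API calls. It shows why reversing the hostname keeps one domain's rows adjacent, so a start-row/prefix scan touches only a contiguous range.

```python
from bisect import bisect_left
from datetime import date


def make_row_key(hostname: str, day: date) -> str:
    """Build a key shaped like com.domain.reverse#YYYYMMDD."""
    reversed_host = ".".join(reversed(hostname.lower().split(".")))
    return f"{reversed_host}#{day:%Y%m%d}"


def prefix_scan(sorted_keys: list[str], prefix: str) -> list[str]:
    """Return all keys that start with prefix from a sorted key list.

    This mimics a range scan: seek to the first key >= prefix, then
    read forward until a key no longer matches.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


# Illustrative data only; the hostnames and dates are made up.
keys = sorted(
    make_row_key(host, day)
    for host, day in [
        ("www.thebadguy.com", date(2013, 9, 27)),
        ("www.thebadguy.com", date(2013, 9, 28)),
        ("www.example.org", date(2013, 9, 27)),
        ("mail.thebadguy.com", date(2013, 9, 27)),
    ]
)

# All days for one host, then all hosts under one domain.
print(prefix_scan(keys, "com.thebadguy.www#"))
print(prefix_scan(keys, "com.thebadguy."))
```

Because the date is the key's suffix, one host's days sort in chronological order, and widening the prefix to the domain picks up every subdomain in the same pass.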
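
The batch-plus-delta pattern from slides 24 and 40 (query both views, drop the delta when the next hourly batch lands) might look roughly like this. It is a sketch under stated assumptions: in-memory counters replace HBase or Solr, the class `BatchPlusDelta` and its methods are hypothetical names, and each batch load is assumed to already include every event previously recorded in the delta.

```python
from collections import Counter


class BatchPlusDelta:
    """Toy model of an hourly batch view plus a small real-time delta view."""

    def __init__(self) -> None:
        self.batch = Counter()  # totals computed by the scheduled job
        self.delta = Counter()  # events seen since the last batch load

    def record_event(self, row_key: str) -> None:
        """Count one event on the fly, without running a batch job."""
        self.delta[row_key] += 1

    def load_batch(self, batch_counts: dict) -> None:
        """Install a fresh batch result and drop the now-redundant delta."""
        self.batch = Counter(batch_counts)
        self.delta.clear()

    def query(self, row_key: str) -> int:
        """Answer from both views so results stay current between batches."""
        return self.batch[row_key] + self.delta[row_key]


store = BatchPlusDelta()
key = "com.thebadguy.www#20130928"  # illustrative row key

store.load_batch({key: 120})        # the 1 am batch result
store.record_event(key)             # two hits arrive before the 2 am batch
store.record_event(key)
print(store.query(key))             # batch total plus delta

store.load_batch({key: 122})        # the 2 am batch now covers those hits
print(store.query(key))
```

The point of the design is that the delta stays small enough to compute without MapReduce, while the batch view remains the source of truth once it catches up.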
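
Slide 31 names the inverted index, the structure Lucene builds for full-text search. A minimal sketch of the idea, assuming whitespace tokenization and lowercase folding (real Lucene analyzers do considerably more), with hypothetical function names and made-up documents:

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Illustrative documents only.
docs = {
    1: "malicious site hosted in Japan",
    2: "benign site hosted in Taiwan",
    3: "malicious URL seen in Japan",
}
index = build_inverted_index(docs)

print(search_all(index, "malicious japan"))
print(search_all(index, "hosted site"))
```

A lookup costs one dictionary access per query term plus a set intersection, which is why search time depends on the matching postings rather than on the total number of documents.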
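
The Solr select URL on slide 38 can be assembled programmatically rather than by hand. This sketch reuses the field names from the slide (`cn_s`, `url_s`, `loc_p`) and the `geodist(...)` sort; the function name `build_select_url` and the base URL are assumptions for illustration, and the request is only constructed here, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url: str, country: str, domain_prefix: str,
                     lat: float, lon: float, rows: int = 5) -> str:
    """Build a Solr /select URL that sorts matches by distance to a point."""
    params = {
        # Same two clauses as the slide: country name and URL prefix match.
        "q": f"cn_s:{country} url_s:{domain_prefix}*",
        "wt": "json",
        "indent": "true",
        "rows": rows,
        # Nearest first, measured from (lat, lon) against the loc_p field.
        "sort": f"geodist(loc_p,{lat},{lon}) asc",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and collection; substitute your own.
url = build_select_url("http://localhost:8983/solr/collection1",
                       "Japan", "com", 30.0, 130.0)
print(url)

# Decoding the query string recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["sort"][0])
```

Letting `urlencode` handle the escaping avoids the hand-encoded `+` and `*` characters in the slide's URL, which are easy to get wrong when the query terms change.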
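
Two of the streaming requests on slide 47 (alert on any contact with a C&C server, and an hourly top-10 of C&C URL paths) reduce to a small per-event loop. The sketch below is not a Storm topology: it assumes events arrive as an iterable of `(timestamp, client_ip, url)` tuples, and `CC_HOSTS`, `process_stream`, and the sample events are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from urllib.parse import urlsplit

CC_HOSTS = {"www.thebadguy.com"}  # hypothetical C&C watch list


def process_stream(events, top_n=10):
    """Collect C&C contact alerts and per-hour top URL paths.

    events: iterable of (timestamp, client_ip, url) tuples.
    Returns (alerts, top_paths), where top_paths maps each hour to a
    list of (path, count) pairs, most frequent first.
    """
    alerts = []
    per_hour = defaultdict(Counter)
    for ts, client_ip, url in events:
        parts = urlsplit(url)
        if parts.hostname in CC_HOSTS:
            # A real system would send an email or push a message here.
            alerts.append((ts, client_ip, parts.hostname))
            hour = ts.replace(minute=0, second=0, microsecond=0)
            per_hour[hour][parts.path or "/"] += 1
    top_paths = {hour: counts.most_common(top_n)
                 for hour, counts in per_hour.items()}
    return alerts, top_paths


# Illustrative events only.
events = [
    (datetime(2013, 9, 28, 1, 5), "10.0.0.7", "http://www.thebadguy.com/gate.php?info=1"),
    (datetime(2013, 9, 28, 1, 33), "10.0.0.9", "http://www.thebadguy.com/gate.php?info=2"),
    (datetime(2013, 9, 28, 1, 40), "10.0.0.7", "http://www.example.org/index.html"),
    (datetime(2013, 9, 28, 2, 2), "10.0.0.7", "http://www.thebadguy.com/update"),
]
alerts, top_paths = process_stream(events)

print(len(alerts))
print(top_paths[datetime(2013, 9, 28, 1)])
```

Since each event is handled once as it arrives, a result for 1:33 is available at 1:33 instead of after the next hourly job.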

Editor's Notes

  • #2: The analytics platform at Trend Micro has experienced tremendous growth over the past few years in terms of size and complexity. In this talk, we’ll discuss the evolution of our infrastructure and the development of capabilities for data mining on “big data”.