Hadoop Presentation
       2012

Presenter : Pham Thai Hoa
Email : thaihoabo@gmail.com
Web : http://mobion.com/hoa



                    4/14/2012   Pham Thai Hoa
Topic
 - Introduction to Hadoop
 - Introduction to Hive
 - Introduction to the Logger
 - Using Hadoop at Mobion
 - Warehouse at Mobion
 - Q&A




What is Hadoop
 - A framework for distributed processing
 - Inspired by Google's architecture: MapReduce and GFS (Google File System)
 - A top-level Apache project
 - Open source
 - Has two core components:
   + MapReduce (processing)
   + Hadoop Distributed File System, HDFS (storage)
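The MapReduce model named above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, and reduce steps on a word count; it does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    emitted = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
```

On a real cluster the map and reduce steps run in parallel on many nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.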
Why use Hadoop
 - Fault-tolerant hardware is expensive
 - Hadoop is designed to run on cheap commodity hardware
 - It automatically handles data replication and node failure
 - It does the hard work, so you can focus on processing data
 - It supports three modes: Local, Pseudo-Distributed, and Fully-Distributed
Data Flow into Hadoop




Who uses Hadoop
 - Amazon: builds product search indices using the streaming API and pre-existing C++, Perl, and Python tools
 - Yahoo: more than 100,000 CPUs in over 40,000 computers running Hadoop
 - Facebook: uses Hadoop to store copies of internal log and dimension data sources, and as a source for reporting/analytics and machine learning
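The streaming API mentioned for Amazon lets any program act as a mapper or reducer by reading lines and writing tab-separated key/value lines. A minimal sketch of that contract follows, counting terms; the `mapper` and `reducer` functions are hypothetical and work on any iterable of lines so they can be tried without a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'term<TAB>1' record per whitespace-separated term."""
    for line in lines:
        for term in line.split():
            yield f"{term.lower()}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive records that share a key.

    The input is assumed to arrive sorted by key, as it does between the
    map and reduce stages of a streaming job, so groupby is sufficient.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for term, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{term}\t{sum(int(count) for _, count in group)}"
```

In an actual streaming job each function would live in its own script that reads `sys.stdin` and prints its records; locally, `reducer(sorted(mapper(lines)))` imitates the sort that the framework performs between the two stages.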
What is Hive
 - Hive is a data warehouse system for Hadoop
 - Uses MapReduce for execution
 - Uses HDFS for storage
 - Keeps metadata in an RDBMS
 - Scalability and performance
 - Interoperability
 - Queries are written in a SQL-like language called HiveQL
Data Flow into Hive




Hive Data Model
 - Tables
   + Typed columns (int, float, string, …)
   + Also array/map/struct for JSON-like data
 - Partitions
   + e.g., to range-partition tables by date
 - Buckets
   + Hash partitions within ranges (useful for sampling and join optimization)
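How partitions and buckets split a table can be sketched as follows. Each partition maps to a `key=value` directory, and within a partition a hash of the bucketing column picks the bucket. The table root `/warehouse/plays`, the columns `dt` and `user_id`, and the bucket count are assumptions for illustration, and the hash is simplified to a plain modulo on an integer key.

```python
from typing import Dict, List, Tuple

NUM_BUCKETS = 4  # hypothetical bucket count for this sketch


def partition_dir(table_root: str, dt: str) -> str:
    """Directory that holds one date partition, in key=value form."""
    return f"{table_root}/dt={dt}"


def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket by hashing the bucketing column (simplified to modulo)."""
    return user_id % num_buckets


def place_rows(
    table_root: str, rows: List[Tuple[str, int]]
) -> Dict[Tuple[str, int], List[int]]:
    """Group (date, user_id) rows by (partition directory, bucket number)."""
    layout: Dict[Tuple[str, int], List[int]] = {}
    for dt, user_id in rows:
        key = (partition_dir(table_root, dt), bucket_for(user_id))
        layout.setdefault(key, []).append(user_id)
    return layout
```

A query that filters on the date only needs to read the matching partition directory, and sampling can read a single bucket instead of the whole partition.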
Hive Metastore
 - Database: a namespace containing a set of tables
 - Holds table/partition definitions (column types, mappings to HDFS directories)
 - Statistics
 - Implemented with the DataNucleus ORM; runs on Derby, MySQL, and many other relational databases
Introduction to the Logger
 - A logging system has three broad components:
   + Client Code Interface
   + Distribution System
   + Do Something Usefullizer
 - Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and to be robust to network and node failures
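The three components can be sketched as a tiny in-process pipeline: a client call, a forwarder that buffers while the downstream link is unavailable, and a sink that consumes the messages. This is not Scribe's API; the names `log`, `Forwarder`, and `Sink` are hypothetical, and buffering on failure is shown only as one common way to tolerate network or node outages.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Message = Tuple[str, str]  # (category, text)


class Sink:
    """Final consumer: here it only collects messages per category."""

    def __init__(self) -> None:
        self.by_category: DefaultDict[str, List[str]] = defaultdict(list)

    def handle(self, message: Message) -> None:
        category, text = message
        self.by_category[category].append(text)


class Forwarder:
    """Distribution layer: forwards messages, buffering while the link is down."""

    def __init__(self, deliver: Callable[[Message], None]) -> None:
        self.deliver = deliver
        self.link_up = True
        self.buffer: List[Message] = []

    def send(self, message: Message) -> None:
        """Queue the message and deliver immediately if the link is up."""
        self.buffer.append(message)
        if self.link_up:
            self.flush()

    def flush(self) -> None:
        """Deliver buffered messages in arrival order, then clear the buffer."""
        pending, self.buffer = self.buffer, []
        for message in pending:
            self.deliver(message)


def log(forwarder: Forwarder, category: str, text: str) -> None:
    """Client code interface: one call per log line."""
    forwarder.send((category, text))
```

Setting `link_up` to `False` simulates an outage: messages accumulate in the buffer and are delivered in order once the link is restored and `flush()` is called.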
Why use Scribe
 - Scalability and performance
 - Event notification library
 - Thrift framework
 - Hadoop is optional
 - Client using
 - Distributed Scribe system
 - Over 1 million messages per second for logging
 - Hierarchical stores

Warehouse at Mobion
 - Log Collector
 - Log/Data Transformer
 - Data Analyzer
 - Web Reporter
 - Log definition
 - Log integration (into applications)
 - Log/data analysis
 - Report development (API, Mobion, Music, …)
Warehouse at Mobion
 - Data mining
 - Music recommendation
 - Spam detection
 - Application performance
 - Export data and import it into MySQL for web reports
 - Analytics system



Q&A
 - Why use Hadoop?
 - Why use Hive?
 - Why do we need a logging system?
 - What is the warehouse system architecture?
 - Do we use these systems for voting, chat, messages, and feeds?
 - How can we use them for recommendations and suggestions?

Following Links
 - http://facebook.com
 - http://highscalability.com/product-scribe-facebooks-scalable-logging-system
 - http://hadoop.apache.org/
 - http://hive.apache.org/
 - http://wiki.apache.org/hadoop/PoweredBy
 - http://www.apache.org/foundation/thanks.html
THANK YOU
