SlideShare a Scribd company logo
Intro to Hadoop
   Jaideep Dhok
Hi!


●   I work at

●   Involved with Hadoop for 2+ years
Outline
Brief History of Hadoop
●   2005 -
●   Inspired by the GFS and MapReduce papers
    published by Google.
●   Promoted heavily by Yahoo! Since 2006
●   Today, the defacto standard in 'Big Data'
    computing
The Buzz
Why?
●   'Big Data'
●   How big? - petabyte scale
●   Scalable
●   Robust
●   Secure!
Scalability
When To Use It
●   Can you use Hadoop to do X?
    ●   Is your problem 'embarassingly' parallel?
    ●   Workflow?
        –   Dependent/Independent Tasks
    ●   Data/CPU intensive?
●   Can you use Hadoop to do X in the Clouds?
    ●   Depends where your data is
Why To Use It?
●   Ad hoc analysis
    ●   Semi/structured data
        –   Log files
        –   Text
        –   CSV, XML, anything really
        –   RDBMS
        –   NoSQL!
Use Cases
●   Analytics
    ●   User behavior
●   Reporting
●   Filtering
●   Machine Learning
●   Just storing your data
Just From The Logs
●   Suppose you run a web-site
    ●   User breakdown by browsers
    ●   Location
    ●   Understanding user session
        –   How long do they use it?
        –   Who are the active users?
        –   What part of my app they use the most?
        –   What part of my app is user X's fav?
Tools
●   Native Hadoop APIs – Java
●   Streaming – Perl, Python, Ruby, any language
    as long it has support for 'stdin' and 'stdout'
●   Pig
●   HIVE
●   Pipes – C and C++
Ecosystem
Don't Wait
●   Hadoop
    ●   hadoop.apache.org
●   Cloudera tutorials on Hadoop
●   Books
Questions?
Thank You!
jaideep.dhok@gmail.com

More Related Content

What's hot (19)

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Steve Watt
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
Rohit Agrawal
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
Brian Enochson
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
Rahul Agarwal
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
yhadoop
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
Cloudera, Inc.
 
Map reduce and hadoop at mylife
Map reduce and hadoop at mylifeMap reduce and hadoop at mylife
Map reduce and hadoop at mylife
responseteam
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
royans
 
Hadoop
Hadoop Hadoop
Hadoop
Shamama Kamal
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
Antonio Silveira
 
Bw tech hadoop
Bw tech hadoopBw tech hadoop
Bw tech hadoop
Mindgrub Technologies
 
Hadoop
HadoopHadoop
Hadoop
Kartik Kalpande Patil
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
Bouquet
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
Kelly Technologies
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
Steve Staso
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
Chirag Ahuja
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Steve Watt
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
Rohit Agrawal
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
Brian Enochson
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
yhadoop
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
Cloudera, Inc.
 
Map reduce and hadoop at mylife
Map reduce and hadoop at mylifeMap reduce and hadoop at mylife
Map reduce and hadoop at mylife
responseteam
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
royans
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
Bouquet
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
Kelly Technologies
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 

Similar to Geek camp (20)

Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
Sufi Nawaz
 
Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...
Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...
Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...
ggphotosmuskan
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
Big Data Montreal
 
JOSA TechTalks - Big Data on Hadoop
JOSA TechTalks - Big Data on HadoopJOSA TechTalks - Big Data on Hadoop
JOSA TechTalks - Big Data on Hadoop
Jordan Open Source Association
 
Unit 3 intro.pptx
Unit 3 intro.pptxUnit 3 intro.pptx
Unit 3 intro.pptx
AkhilJoseph63
 
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Cloudera, Inc.
 
Drupal sharing in HP7
Drupal sharing in HP7Drupal sharing in HP7
Drupal sharing in HP7
jimyhuang
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute Beginners
Sam Dias
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
Data Science London
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
Kibrom Gebrehiwot
 
Scaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of dataScaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of data
WSO2
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
scorlosquet
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
Humoyun Ahmedov
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
Harshdeep Kaur
 
Mr hadoop seedrocket
Mr hadoop seedrocketMr hadoop seedrocket
Mr hadoop seedrocket
SeedRocket
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
Eric Baldeschwieler
 
Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012
scorlosquet
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Hw09 Next Steps For Hadoop
Hw09   Next Steps For HadoopHw09   Next Steps For Hadoop
Hw09 Next Steps For Hadoop
Cloudera, Inc.
 
Apache pig
Apache pigApache pig
Apache pig
Suresh Mandava
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
Sufi Nawaz
 
Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...
Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...
Enhancing Big Data Analytics with Pig and Hadoop: Harnessing the Power of Dis...
ggphotosmuskan
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
Big Data Montreal
 
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Cloudera, Inc.
 
Drupal sharing in HP7
Drupal sharing in HP7Drupal sharing in HP7
Drupal sharing in HP7
jimyhuang
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute Beginners
Sam Dias
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
Data Science London
 
Scaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of dataScaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of data
WSO2
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
scorlosquet
 
Mr hadoop seedrocket
Mr hadoop seedrocketMr hadoop seedrocket
Mr hadoop seedrocket
SeedRocket
 
Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012
scorlosquet
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Hw09 Next Steps For Hadoop
Hw09   Next Steps For HadoopHw09   Next Steps For Hadoop
Hw09 Next Steps For Hadoop
Cloudera, Inc.
 

Geek camp

  • 1. Intro to Hadoop Jaideep Dhok
  • 2. Hi! ● I work at ● Involved with Hadoop for 2+ years
  • 4. Brief History of Hadoop ● 2005 - ● Inspired by the GFS and MapReduce papers published by Google. ● Promoted heavily by Yahoo! Since 2006 ● Today, the defacto standard in 'Big Data' computing
  • 6. Why? ● 'Big Data' ● How big? - petabyte scale ● Scalable ● Robust ● Secure!
  • 8. When To Use It ● Can you use Hadoop to do X? ● Is your problem 'embarassingly' parallel? ● Workflow? – Dependent/Independent Tasks ● Data/CPU intensive? ● Can you use Hadoop to do X in the Clouds? ● Depends where your data is
  • 9. Why To Use It? ● Ad hoc analysis ● Semi/structured data – Log files – Text – CSV, XML, anything really – RDBMS – NoSQL!
  • 10. Use Cases ● Analytics ● User behavior ● Reporting ● Filtering ● Machine Learning ● Just storing your data
  • 11. Just From The Logs ● Suppose you run a web-site ● User breakdown by browsers ● Location ● Understanding user session – How long do they use it? – Who are the active users? – What part of my app they use the most? – What part of my app is user X's fav?
  • 12. Tools ● Native Hadoop APIs – Java ● Streaming – Perl, Python, Ruby, any language as long it has support for 'stdin' and 'stdout' ● Pig ● HIVE ● Pipes – C and C++
  • 14. Don't Wait ● Hadoop ● hadoop.apache.org ● Cloudera tutorials on Hadoop ● Books