(… & Hadoop …)

April 12, 2013
Takumi Asai
(26)
– 2009–2011 (H21–H23): NTT Communications, IP …
– 2011 (H23): NTT …

– Twitter: @p_i_o4545
– Blog: http://pioneerinocean.hatenablog.com/
    • …
    • … R … Hadoop … (…)

(… : 4/12)
    Hadoop
(… : …)
    R

… Ruby, R
Introduction to Data Analysis Technology (Hadoop Edition)
… / … / …
… / …
⇒ Wikipedia
… = …
21 … (…)
⇒ …

Google, Facebook
1000 …

… D … R
VS
… IT …

… RDBMS
SPSS … R

… IT
VS

FSP (Frequent Shopper Program) … Web

… FSP
… TESCO
VS

Win-Win
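Hadoop

  Hadoop
  – An Apache project: an open-source distributed-processing framework implemented in Java
  – Based on Google's MapReduce and Google File System (GFS)
      • Both were originally described in papers published by Google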
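Hadoop

  Hadoop
  – Its core consists of HDFS and MapReduce
  – HBase …
  HDFS
  – A distributed file system modeled on Google's GFS

  MapReduce
  – A distributed-processing framework modeled on Google's MapReduce
  – Processes data as key-value pairs
  – Implemented in Java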
HDFS

HDFS is made up of three kinds of node: Namenode, Secondary Namenode (2NN), and Datanode.

[Figure: a Name Node and a Secondary Name Node together with six Data Nodes]
HDFS

• Files stored in HDFS are split into fixed-size blocks (64MB by default)

[Figure: a 150MB file (abcdefg / hijklmn / opqrstu / vwxyz) split into #Block1 (64MB), #Block2 (64MB), and #Block3 (22MB)]
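The split above is plain integer arithmetic: 150MB divided into 64MB blocks gives two full blocks and a 22MB remainder. A minimal Ruby sketch, assuming sizes in whole megabytes; `split_into_blocks` is a hypothetical helper, not part of any Hadoop API. The final block holds only the remainder, and HDFS does not pad it out to a full block on disk (later Hadoop 2.x releases raised the default block size to 128MB).

```ruby
# Hypothetical helper (not a Hadoop API): sizes in MB of the blocks that a
# file of file_size_mb is split into, for a given block size.
def split_into_blocks(file_size_mb, block_size_mb = 64)
  raise ArgumentError, "block size must be positive" unless block_size_mb.positive?

  full_blocks, remainder = file_size_mb.divmod(block_size_mb)
  sizes = Array.new(full_blocks, block_size_mb)
  sizes << remainder if remainder.positive?   # last, partial block
  sizes
end

p split_into_blocks(150)   # => [64, 64, 22]
```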
HDFS

[Figure: the blocks #Block1 (64MB), #Block2 (64MB), and #Block3 (22MB) replicated across Data Nodes]
Data Node A has blocks 1, 2
Data Node B has blocks 2, 3
Data Node C has blocks 1, 3
Data Node D has block 1
Data Node E has blocks 2, 3
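The placement in the figure can be checked mechanically: inverting the "Data Node → blocks" table gives "block → Data Nodes", and each block should end up on three nodes, since HDFS's default replication factor is 3. A sketch using the node names and block numbers from the slide:

```ruby
# Block placement from the slide: Data Node => blocks it stores
placement = {
  "A" => [1, 2],
  "B" => [2, 3],
  "C" => [1, 3],
  "D" => [1],
  "E" => [2, 3]
}
REPLICATION = 3   # HDFS default (dfs.replication)

# Invert the table: block => Data Nodes holding a replica of it
replicas = Hash.new { |hash, block| hash[block] = [] }
placement.each do |node, blocks|
  blocks.each { |block| replicas[block] << node }
end

replicas.keys.sort.each do |block|
  nodes = replicas[block]
  status = nodes.size == REPLICATION ? "OK" : "needs re-replication"
  puts "Block#{block}: #{nodes.join(', ')} (#{nodes.size} replicas, #{status})"
end
```

Deleting a node from `placement` and re-running the loop shows which blocks fall below three copies; on a real cluster the Namenode detects such under-replicated blocks and schedules new copies.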
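Namenode (NN)
– Namenode …
– Manages HDFS metadata (the file namespace and which Data Nodes hold each block)
Datanode (DN)
– Stores the actual block data
– Each block is kept as a file named blk_xxxxxx

[Figure: a Name Node and a Secondary Name Node alongside the Data Nodes]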
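To make the division of labour concrete: the Namenode holds only metadata (which blocks make up a file and where each block lives), while Datanodes hold the block contents. On a read, the client asks the Namenode for block locations and then fetches the bytes from a Datanode. A toy Ruby model under those assumptions; the class names, the path `/data/sample.txt`, and the block IDs are hypothetical, and this is not how the real Namenode is implemented:

```ruby
# Toy Datanode: stores block contents, keyed by block id
class ToyDatanode
  attr_reader :name

  def initialize(name)
    @name = name
    @blocks = {}            # "blk_1001" => contents
  end

  def store(block_id, data)
    @blocks[block_id] = data
  end

  def read(block_id)
    @blocks.fetch(block_id)
  end
end

# Toy Namenode: metadata only, never the file contents themselves
class ToyNamenode
  def initialize
    @files = {}             # path => ordered list of block ids
    @locations = {}         # block id => Datanodes holding a replica
  end

  def add_file(path, block_ids)
    @files[path] = block_ids
  end

  def add_replica(block_id, datanode)
    (@locations[block_id] ||= []) << datanode
  end

  # Answers "where are the blocks of this file?"
  def block_locations(path)
    @files.fetch(path).map { |id| [id, @locations.fetch(id)] }
  end
end

dn_a = ToyDatanode.new("A")
dn_b = ToyDatanode.new("B")
nn = ToyNamenode.new

dn_a.store("blk_1001", "abcdefg ")
dn_b.store("blk_1001", "abcdefg ")
dn_b.store("blk_1002", "hijklmn")
nn.add_file("/data/sample.txt", %w[blk_1001 blk_1002])
nn.add_replica("blk_1001", dn_a)
nn.add_replica("blk_1001", dn_b)
nn.add_replica("blk_1002", dn_b)

# Client read: ask the Namenode, then read each block from a Datanode
contents = nn.block_locations("/data/sample.txt").map do |id, nodes|
  nodes.first.read(id)
end.join
puts contents   # prints: abcdefg hijklmn
```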
Secondary Namenode

Secondary Namenode (2NN)
– 2NN … Namenode …
– Namenode …
     • … 3 …
2NN … NN
– Namenode …
– … Namenode …
– 2NN …
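The Secondary Namenode's job is checkpointing. The Namenode records each metadata change in an edit log, and the 2NN periodically merges that log into the saved namespace image (fsimage) so the log does not grow without bound; it is not a hot standby for the Namenode. A simplified model of the merge, assuming the image is a Hash from path to block list and each edit is a hypothetical `[operation, path, blocks]` triple (the real fsimage and edits files use their own on-disk formats):

```ruby
# Simplified checkpoint: replay the edit log on top of the last image and
# return a new image. The original image is left unchanged.
def checkpoint(fsimage, edit_log)
  merged = fsimage.dup
  edit_log.each do |op, path, blocks|
    case op
    when :create then merged[path] = blocks
    when :append then merged[path] = merged.fetch(path) + blocks
    when :delete then merged.delete(path)
    else raise ArgumentError, "unknown edit: #{op}"
    end
  end
  merged
end

fsimage  = { "/logs/a.log" => ["blk_1"] }
edit_log = [
  [:create, "/logs/b.log", ["blk_2"]],
  [:append, "/logs/a.log", ["blk_3"]],
  [:delete, "/logs/b.log", nil]
]

new_image = checkpoint(fsimage, edit_log)
# new_image maps only "/logs/a.log" to ["blk_1", "blk_3"]

# Once the merged image is saved, the edit log can start again from empty
edit_log.clear
```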
The Namenode is a single point of failure!
– If the Namenode goes down, HDFS cannot be used
– … NN … 2NN …
– … HDFS …
HDFS …

… HDFS …

[Figure: Data Nodes managed by an Active Name Node and a Standby Name Node]

The Standby Namenode also takes over the 2NN's checkpointing role, so no separate 2NN is needed
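HDFS

  HDFS …
   – Datanode …
   – Datanode …

  Namenode …
   – … Namenode
   – Namenode ⇔ Datanode, Datanode ⇔ Datanode …

  Commands and permissions work much like on Linux
   – ls, cat
   – Permissions use rwx, as on Linux
       • The x (execute) bit has no effect on files in HDFS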
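MapReduce

   MapReduce
   – Processing is split into two phases: Map and Reduce
   – The programs for the two phases are called the Mapper and the Reducer
   – A Shuffle step runs between Map and Reduce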
MapReduce

[Figure: data on HDFS; a Job Tracker distributing work to six Task Trackers]

  The JobTracker assigns tasks to the TaskTrackers
[Figure: each worker node runs a Data Node and a Task Tracker; the master runs the Name Node and the Job Tracker, with a Secondary Name Node alongside]
※ Data Node / Name Node: HDFS
※ Task Tracker / Job Tracker: MapReduce
MapReduce

  YARN
  – HDFS … MapReduce …
  – YARN (MapReduce version 2)
  – MapReduce …
  – … YARN
  – … YARN
MapReduce

   WordCount
    – The classic first MapReduce program (the MapReduce equivalent of "Hello World")

    Input: Hello Hadoop Goodbye World Hello Goodbye World World Hadoop

   Map
   <Hello,1> <Hadoop,1> <Goodbye,1> <World,1> <Hello,1>
   <Goodbye,1> <World,1> <World,1> <Hadoop,1>

  Shuffle
   <Goodbye,[1,1]> <Hadoop,[1,1]> <Hello,[1,1]> <World,[1,1,1]>

  Reduce
   <Goodbye,2> <Hadoop,2> <Hello,2> <World,3>
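The three phases above can be reproduced in plain Ruby without Hadoop. This single-process sketch models only the data flow; on a cluster the Map, Shuffle, and Reduce work is spread across machines.

```ruby
input = "Hello Hadoop Goodbye World Hello Goodbye World World Hadoop"

# Map: emit <word, 1> for every word
mapped = input.split.map { |word| [word, 1] }

# Shuffle: group the values by key and sort by key
shuffled = mapped.group_by(&:first)
                 .transform_values { |pairs| pairs.map(&:last) }
                 .sort.to_h

# Reduce: sum the list of counts for each key
reduced = shuffled.transform_values(&:sum)

puts mapped.map { |k, v| "<#{k},#{v}>" }.join(" ")
puts shuffled.map { |k, vs| "<#{k},[#{vs.join(',')}]>" }.join(" ")
puts reduced.map { |k, v| "<#{k},#{v}>" }.join(" ")
```

The last line printed is `<Goodbye,2> <Hadoop,2> <Hello,2> <World,3>`, matching the Reduce output on the slide.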
MapReduce

   Mapper / Reducer
   – … HDFS "…"

[Figure: several Map tasks feeding their output to multiple reduce tasks]
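When several reducers run in parallel, each intermediate key must go to exactly one of them so that all values for that key meet in the same place. In Hadoop's Java API the default partitioner, `HashPartitioner`, does this by taking the key's hash modulo the number of reduce tasks. A Ruby sketch of the same idea; it uses `String#sum` as a stand-in hash because Ruby's `String#hash` is randomized per process, and the two map outputs and the choice of 2 reducers are arbitrary:

```ruby
NUM_REDUCERS = 2

# Deterministic stand-in for a hash partitioner: same key => same reducer
def partition(key, num_reducers = NUM_REDUCERS)
  key.sum % num_reducers
end

# Output of two hypothetical map tasks
map_outputs = [
  %w[Hello Hadoop Goodbye World].map { |w| [w, 1] },
  %w[Hello Goodbye World World Hadoop].map { |w| [w, 1] }
]

# Route every pair to its reducer, then sort each reducer's input by key
reducer_inputs = Array.new(NUM_REDUCERS) { [] }
map_outputs.flatten(1).each do |key, value|
  reducer_inputs[partition(key)] << [key, value]
end
reducer_inputs.each { |pairs| pairs.sort_by!(&:first) }

reducer_inputs.each_with_index do |pairs, i|
  counts = pairs.group_by(&:first).transform_values { |ps| ps.sum(&:last) }
  puts "reducer #{i}: #{counts}"
end
```

Because every occurrence of a word lands on the same reducer, each reducer can produce final counts for its keys without talking to the others.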
MapReduce

  – … WordCount …
  – Map … Reduce …
  – …
      • fizz buzz fizzbuzz fizz
  – … Ruby … Ruby …
  – Map only needs to output "#{word}\t1" for each word
  – Reduce …
MapReduce

  hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred

   Hadoop        3
   hdfs          1
   Mapred        4

  – Reduce only needs to output one line per word:
      • #{word}\t#{count}
  – cat test.txt | ruby map.rb | sort | ruby reduce.rb
      • … Hadoop …
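The shell pipeline mirrors what Hadoop Streaming does: run the mapper, sort its output by key, and feed the result to the reducer. The same flow can be modelled in one Ruby process to check the logic before using a cluster. A sketch using the input text from the slide; note that Ruby's `sort` compares bytes (like `sort` under `LC_ALL=C`), so `Mapred` comes before `hdfs`, whereas the slide's order (`Hadoop`, `hdfs`, `Mapred`) suggests a locale with case-insensitive collation. The counts are the same either way.

```ruby
# In-process model of: cat test.txt | ruby map.rb | sort | ruby reduce.rb
text = "hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred"

# map.rb: one "word\t1" line per word
map_lines = text.split.map { |word| "#{word}\t1" }

# sort: brings identical keys together (byte order)
sorted_lines = map_lines.sort

# reduce.rb: add up the counts for each word
counts = Hash.new(0)
sorted_lines.each do |line|
  word, count = line.split("\t")
  counts[word] += count.to_i
end

counts.each { |word, count| puts "#{word}\t#{count}" }
# prints Hadoop 3, Mapred 4, hdfs 1 (tab-separated, one per line)
```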
MapReduce

  … : Map
  hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred

   hdfs         1
   Hadoop       1
   Hadoop       1
   Mapred       1
   Mapred       1
Map
#!/usr/bin/env ruby
# map.rb: emit "word<TAB>1" for every word read from standard input

STDIN.each_line do |line|
  line.split.each do |word|
    puts "#{word}\t1"
  end
end
Reduce
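#!/usr/bin/env ruby
# reduce.rb: total the counts for each word read from standard input
wordhash = {}
STDIN.each_line do |line|
  word, count = line.chomp.split("\t")
  next if word.nil? || word.empty?   # skip blank lines

  if wordhash.has_key?(word)
    wordhash[word] += count.to_i
  else
    wordhash[word] = count.to_i
  end
end

wordhash.each { |record, count| puts "#{record}\t#{count}" }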
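The reducer above keeps every word in a hash, so its memory grows with the number of distinct words. Since Hadoop Streaming (like `sort` in the local pipeline) hands the reducer its input sorted by key, a reducer can instead emit each total as soon as the key changes and hold only one running count. A sketch of that variant; `each_total` is a hypothetical name, and the sample input is the sorted map output from the previous slides:

```ruby
# Streaming-style reducer logic. Assumes lines of the form "word\tcount",
# already sorted by word. Yields [word, total] when a key's run of lines ends.
def each_total(lines)
  current_word = nil
  current_count = 0

  lines.each do |line|
    word, count = line.chomp.split("\t", 2)
    next if word.nil? || word.empty?   # skip blank lines

    if word == current_word
      current_count += count.to_i
    else
      # Key changed: the previous word's total is final, so emit it
      yield current_word, current_count if current_word
      current_word = word
      current_count = count.to_i
    end
  end

  # Emit the last word's total
  yield current_word, current_count if current_word
end

sorted_input = ["Hadoop\t1\n", "Hadoop\t1\n", "Hadoop\t1\n",
                "Mapred\t1\n", "Mapred\t1\n", "Mapred\t1\n", "Mapred\t1\n",
                "hdfs\t1\n"]
results = []
each_total(sorted_input) { |word, total| results << [word, total] }
results.each { |word, total| puts "#{word}\t#{total}" }
```

Used as `reduce.rb`, the script body would be `each_total(STDIN.each_line) { |w, t| puts "#{w}\t#{t}" }`. The trade-off is that this version is only correct when its input really is sorted, while the hash-based reducer also works on unsorted input.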
Hadoop

  … Hadoop …
  – … Java … OK
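Hadoop …
Projects around Hadoop
– High-level languages that compile to MapReduce jobs
    • Pig
    • Hive
– Bulk data transfer between relational databases and Hadoop
    • Sqoop
– Machine learning library
    • Mahout
– Running Hadoop on cloud services
    • Whirr
  etc.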
Hadoop …
– HDFS …
    • RAID …
– MapReduce can also run on storage other than HDFS
    • Amazon S3
(… Hadoop …)
– … RDB …
– Hive, Pig …
(… Hadoop …)
– … HDD …
– … MapReduce …
– … Hadoop …