Running Hadoop On Ubuntu Linux
Agenda
  • Introduction
  • Single-Node Cluster: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
  • Multi-Node Cluster: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
  • Decommission
  • Issues
  • ITRI Cloud Storage System Architecture
Introduction
HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
Introduction (cont.)
[Figure: HDFS Architecture. Source: http://hadoop.apache.org/core/docs/current/hdfs_design.html]
Introduction (cont.)
[Figure: HDFS multi-node overview. Source: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)]
Introduction (cont.)
[Figure: HDFS multi-node cluster architecture. Source: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)]
Single-Node Cluster
  • Prerequisites
      - Sun Java 6
      - A dedicated hadoop system user
      - SSH public key authentication (even a single-node cluster needs SSH access to localhost)
      - IPv6 disabled
  • Hadoop installation
  • Configuration
      - <HADOOP_INSTALL>/conf/hadoop-env.sh
      - <HADOOP_INSTALL>/conf/core-site.xml
      - <HADOOP_INSTALL>/conf/mapred-site.xml
      - <HADOOP_INSTALL>/conf/hdfs-site.xml
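The configuration step can be sketched as a small bash script that writes the four files. This is a minimal sketch for a Hadoop 0.20.x single-node setup, not the tutorial's exact files. The property names (`fs.default.name`, `mapred.job.tracker`, `dfs.replication`, `hadoop.tmp.dir`) are the 0.20-era names. The ports 54310/54311, the `/app/hadoop/tmp` directory and the `JAVA_HOME` default are assumptions to adapt. Exporting `-Djava.net.preferIPv4Stack=true` in hadoop-env.sh is shown as one commonly used alternative to disabling IPv6 system-wide.

```bash
#!/usr/bin/env bash
# Sketch: write single-node configuration for Hadoop 0.20.x.
# Assumed values (not from the slides): ports 54310/54311, /app/hadoop/tmp,
# and the Sun Java 6 path used as the JAVA_HOME default.
set -euo pipefail

# write_conf FILE NAME VALUE [NAME VALUE ...]: emit a Hadoop XML config file.
write_conf() {
  local file="$1"; shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

# write_single_node_conf CONF_DIR [JAVA_HOME]
write_single_node_conf() {
  local conf_dir="$1"
  local java_home="${2:-/usr/lib/jvm/java-6-sun}"  # assumed Ubuntu location of Sun Java 6
  mkdir -p "$conf_dir"

  # hadoop-env.sh: point Hadoop at Java and make the JVM prefer IPv4 sockets.
  {
    echo "export JAVA_HOME=$java_home"
    echo 'export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true'
  } >> "$conf_dir/hadoop-env.sh"

  # core-site.xml: base temp dir and the default file system (the NameNode).
  write_conf "$conf_dir/core-site.xml" \
    hadoop.tmp.dir /app/hadoop/tmp \
    fs.default.name hdfs://localhost:54310

  # mapred-site.xml: where the JobTracker listens.
  write_conf "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:54311

  # hdfs-site.xml: a single DataNode can hold only one replica per block.
  write_conf "$conf_dir/hdfs-site.xml" dfs.replication 1
}
```

Usage, assuming the script is saved under a hypothetical name such as write-conf.sh: `source write-conf.sh && write_single_node_conf <HADOOP_INSTALL>/conf`. The function appends to hadoop-env.sh and overwrites the three XML files, so keep copies of any existing versions.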
Single-Node Cluster (cont.)
  • Format the NameNode:
      <HADOOP_INSTALL>/bin/hadoop namenode -format
  • Start/stop your single-node cluster:
      <HADOOP_INSTALL>/bin/start-all.sh
      <HADOOP_INSTALL>/bin/stop-all.sh
  • Check that the Hadoop processes are running:
      jps
  • Copy local example data to HDFS:
      <HADOOP_INSTALL>/bin/hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg
      <HADOOP_INSTALL>/bin/hadoop dfs -ls
      <HADOOP_INSTALL>/bin/hadoop dfs -ls gutenberg
  • Run the MapReduce job (from <HADOOP_INSTALL>):
      bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
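The command sequence on this slide can be wrapped in a script. This is a sketch that assumes Hadoop 0.20.2 is unpacked at `/usr/local/hadoop` (override with `HADOOP_INSTALL`) and that the Gutenberg text files are already in `/tmp/gutenberg`. `DRY_RUN=1` is a convention of this sketch, not a Hadoop option: it prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch of the single-node workflow for Hadoop 0.20.2.
# Assumptions: install location below and sample text already in /tmp/gutenberg.
set -euo pipefail

HADOOP_INSTALL="${HADOOP_INSTALL:-/usr/local/hadoop}"  # assumed install path

# run CMD...: echo the command, then execute it unless DRY_RUN is set.
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

single_node_wordcount() {
  local h="$HADOOP_INSTALL"
  run "$h/bin/hadoop" namenode -format   # first setup only: erases HDFS metadata
  run "$h/bin/start-all.sh"              # starts the HDFS and MapReduce daemons
  run jps                                # expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
  run "$h/bin/hadoop" dfs -copyFromLocal /tmp/gutenberg gutenberg
  run "$h/bin/hadoop" dfs -ls gutenberg
  run "$h/bin/hadoop" jar "$h/hadoop-0.20.2-examples.jar" wordcount gutenberg gutenberg-output
  run "$h/bin/hadoop" dfs -ls gutenberg-output
  run "$h/bin/stop-all.sh"
}
```

`DRY_RUN=1 single_node_wordcount` lists the eight commands. Without `DRY_RUN` they are executed, stopping at the first failure because of `set -e`.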
Single-Node Cluster (cont.)
  • http://localhost:50030/ : web UI for the MapReduce JobTracker
  • http://localhost:50060/ : web UI for the TaskTracker(s)
  • http://localhost:50070/ : web UI for the HDFS NameNode
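You can also confirm that the three web UIs are listening without opening a browser. The sketch below tries a TCP connection to each port. It relies on bash's `/dev/tcp` redirection, which is a bash feature and not a real device file. It only checks that something accepts connections on the port, not that the page renders.

```bash
#!/usr/bin/env bash
# Sketch: report whether the Hadoop 0.20 web UIs from the slide are listening.
# Uses bash's /dev/tcp redirection, so curl or wget is not required.
set -euo pipefail

# port_open HOST PORT: succeed if a TCP connection can be opened.
port_open() {
  # The subshell opens fd 3 to HOST:PORT and closes it on exit;
  # connection errors are silenced and turn into a non-zero status.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# check_web_uis [HOST]: print UP or DOWN for each web UI port.
check_web_uis() {
  local host="${1:-localhost}" entry port label
  for entry in 50030:JobTracker 50060:TaskTracker 50070:NameNode; do
    port="${entry%%:*}"
    label="${entry#*:}"
    if port_open "$host" "$port"; then
      echo "UP   http://$host:$port/ ($label)"
    else
      echo "DOWN http://$host:$port/ ($label)"
    fi
  done
}
```

On a multi-node cluster, pass the master's hostname, for example `check_web_uis master`.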
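Multi-Node Cluster
  • /etc/hosts: every node must be able to resolve master and all slave hostnames
  • SSH access: the master needs passwordless SSH to itself and to every slave
  • Configuration
      - <HADOOP_INSTALL>/conf/masters (host of the SecondaryNameNode in Hadoop 0.20):
          master
      - <HADOOP_INSTALL>/conf/slaves (one DataNode/TaskTracker host per line):
          master
          slave
          anotherslave01
          anotherslave02
          anotherslave03
      - <HADOOP_INSTALL>/conf/core-site.xml (fs.default.name):
          <value>hdfs://master:54310</value>
      - <HADOOP_INSTALL>/conf/mapred-site.xml
      - <HADOOP_INSTALL>/conf/hdfs-site.xml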
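The multi-node configuration can be generated the same way for the hostnames on the slide. This sketch makes three assumptions:
  • The JobTracker port 54311 mirrors a typical single-node setup.
  • `dfs.replication` is set to 3 (also the HDFS default) as an illustrative choice.
  • The /etc/hosts and SSH steps are left to the administrator.
In Hadoop 0.20, conf/masters names the host that runs the SecondaryNameNode. The NameNode and JobTracker run on the machine where start-dfs.sh and start-mapred.sh are invoked.

```bash
#!/usr/bin/env bash
# Sketch: write multi-node configuration for the hosts on the slide (Hadoop 0.20.x).
# Assumptions: JobTracker port 54311 and dfs.replication=3 are illustrative choices.
set -euo pipefail

MASTER=master
SLAVES=(master slave anotherslave01 anotherslave02 anotherslave03)

# write_conf FILE NAME VALUE [NAME VALUE ...]: emit a Hadoop XML config file.
write_conf() {
  local file="$1"; shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

# write_multi_node_conf CONF_DIR: the XML files are typically copied to every node;
# masters and slaves are read by the start scripts on the master.
write_multi_node_conf() {
  local conf_dir="$1"
  mkdir -p "$conf_dir"
  # conf/masters: in 0.20 the SecondaryNameNode is started on these hosts.
  echo "$MASTER" > "$conf_dir/masters"
  # conf/slaves: hosts that run a DataNode and a TaskTracker, one per line.
  printf '%s\n' "${SLAVES[@]}" > "$conf_dir/slaves"
  write_conf "$conf_dir/core-site.xml" fs.default.name "hdfs://$MASTER:54310"
  write_conf "$conf_dir/mapred-site.xml" mapred.job.tracker "$MASTER:54311"
  write_conf "$conf_dir/hdfs-site.xml" dfs.replication 3
}
```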
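Decommission
Make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done?
  • Create a file "excludes" listing the hosts to remove, one per line:
      slave97
      slave98
      slave99
  • Add the following property to <HADOOP_INSTALL>/conf/hadoop-site.xml (conf/hdfs-site.xml in Hadoop 0.20 and later). The value must be the full path of the excludes file:
      <property>
        <name>dfs.hosts.exclude</name>
        <value>/full/path/to/excludes</value>
      </property>
  • Make the NameNode re-read the host files:
      <HADOOP_INSTALL>/bin/hadoop dfsadmin -refreshNodes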
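The decommission steps can be sketched as bash functions. The hostnames come from the slide. The default `HADOOP_INSTALL` path and the `DRY_RUN` switch are assumptions of this sketch. `exclude_property` resolves the excludes file to an absolute path, because `dfs.hosts.exclude` expects a full pathname.

```bash
#!/usr/bin/env bash
# Sketch of the decommission steps from the slide (Hadoop 0.20.x).
# Assumptions: HADOOP_INSTALL default below; DRY_RUN=1 prints instead of running.
set -euo pipefail

HADOOP_INSTALL="${HADOOP_INSTALL:-/usr/local/hadoop}"  # assumed install path

# write_excludes FILE HOST...: write one hostname per line.
write_excludes() {
  local file="$1"; shift
  printf '%s\n' "$@" > "$file"
}

# exclude_property FILE: print the site-file property with FILE as an absolute
# path, since dfs.hosts.exclude expects the full pathname.
exclude_property() {
  local abs
  abs="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  printf '<property>\n  <name>dfs.hosts.exclude</name>\n  <value>%s</value>\n</property>\n' "$abs"
}

# refresh_nodes: ask the NameNode to re-read the include/exclude host files.
refresh_nodes() {
  local cmd=("$HADOOP_INSTALL/bin/hadoop" dfsadmin -refreshNodes)
  echo "+ ${cmd[*]}"
  if [ -z "${DRY_RUN:-}" ]; then
    "${cmd[@]}"
  fi
}
```

Usage: `write_excludes conf/excludes slave97 slave98 slave99`, paste the output of `exclude_property conf/excludes` into the site file, then call `refresh_nodes`. `hadoop dfsadmin -report` shows each node's decommission state. Wait until the listed nodes are reported as decommissioned before shutting them down.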
Issues
  • NameNode backup
  • NameNode shutdown
  • DataNode shutdown
  • Adding a DataNode dynamically
  • Removing a DataNode dynamically (decommission?)
  • How to tune file/block size?
  • Big data testing
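The block-size question starts with simple arithmetic: the number of blocks for a file is a ceiling division. This sketch assumes the Hadoop 0.20 defaults of a 64 MB `dfs.block.size` (67108864 bytes) and a `dfs.replication` of 3. A partial last block stores only its actual length, so block size changes the block count but not the raw disk usage. Larger blocks therefore mainly reduce the number of block objects the NameNode keeps in memory. They also reduce the number of map tasks for a job over that file, since by default there is roughly one split per block.

```bash
#!/usr/bin/env bash
# Sketch: HDFS block-count arithmetic for tuning dfs.block.size.
# Defaults are the Hadoop 0.20 values: 64 MB blocks, replication 3.
set -euo pipefail

# blocks_for SIZE_BYTES [BLOCK_BYTES]: number of blocks (ceiling division).
blocks_for() {
  local size="$1" block="${2:-67108864}"
  echo $(( (size + block - 1) / block ))
}

# raw_bytes SIZE_BYTES [REPLICATION]: bytes stored across all DataNodes.
# The last partial block stores only its real length, so block size adds no padding.
raw_bytes() {
  local size="$1" repl="${2:-3}"
  echo $(( size * repl ))
}
```

For a 1 GiB file (1073741824 bytes), `blocks_for 1073741824` gives 16 blocks at 64 MB, `blocks_for 1073741824 134217728` gives 8 blocks at 128 MB, and `raw_bytes 1073741824` gives 3221225472 bytes with either block size.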
Cloud Storage System Architecture
[Figure: components VM Volume, iSCSI Initiator, iSCSI Target, DMS, HDFS Client, HDFS NameNode, and several HDFS DataNodes]
Read Flow
[Figure: read flow among VM Volume, iSCSI Initiator, iSCSI Target, DMS, HDFS Client, HDFS NameNode, and HDFS DataNode, with numbered steps 1 to 6 and I.1 to I.5]
Write Flow
[Figure: write flow among a VM (Domain-U), VM Volume, iSCSI Initiator, iSCSI Target, DMS, HDFS Client, HDFS NameNode, and HDFS DataNodes 1 and 2, with numbered steps 1 to 12 (plus 7.1 and 8.2) and I.1 to I.5]
Editor's Notes

  • #3 to #13 (the same note appears on each of these slides): Explain the importance of project planning, its relationship to each phase of the life cycle, and the related process areas. Explain what the CMMI Project Planning process area specifies. Provide some example project planning procedures. Explain how to plan and produce a project planning procedure.