2016 09 05 Raspberry Pi Hadoop Setup v1
Installing Hadoop 2.7.3, Hive 2.1.0, Scala 2.11.8, and Spark 2.0 on Raspberry Pi Cluster of 3 Nodes
NOTES
1. Please follow the instruction PARTS in order because the results are
   cumulative (i.e. PART I, then PART II, then PART III, then PART IV, then
   PART V). PARTS III, IV and V are optional.
2. I am using 3 Raspberry Pi 3 boards with an 8-port switch. They each have a
   32 GB micro SD card (you have to buy this separately) and a case (also
   bought separately). They also each come with 1 GB RAM (not upgradable).
   They also have wireless capability built-in, so you may try it without the
   8-port switch, but I'm choosing wired.
6. If you get stuck, you might try these websites for reference, though they
   seem to have errors:
   http://scn.sap.com/community/bi-platform/blog/2015/04/25/a-hadoop-data-lab-project-on-raspberry-pi--part-14
   http://scn.sap.com/community/bi-platform/blog/2015/05/03/a-haddop-data-lab-project-on-raspberry-pi--part-24
   http://scn.sap.com/community/bi-platform/blog/2015/07/10/a-hadoop-data-lab-project-on-raspberry-pi--part-44
   http://www.widriksson.com/raspberry-pi-hadoop-cluster/
   http://spark.apache.org/docs/latest/spark-standalone.html
Part I: Basic Setup
1. Download the Raspbian Jessie OS disk image. (I didn't use the lite
   version, though you could try it to save disk space; I'm not sure whether
   you would then have to install Java or other components that the lite
   version leaves out.)
2. Burn the disk image to a micro SD card using Win32 Disk Imager (Windows)
4. SSH into the Raspberry Pi using Putty (you have to find out what IP address
   it was given using a network scanning tool; I used one I put on my phone).
   The default username is "pi" and the password is "raspberry".
(to save in nano, press CTRL-X, then press y, and then hit enter; I won't
repeat this for nano editing in the future)
-Type "sudo nano /etc/hosts" and delete everything then enter the
following:
127.0.0.1 localhost
192.168.0.110 node1
Make sure that is all that is in that file and no other items exist
such as ipv6, etc.
-Type "sudo nano /etc/hostname" and make sure the file contains only:
node1
9. Type "java -version" and make sure you have the correct java version. I
am using java version "1.8.0_64" i.e. Java 8. If you don't have the
correct version, type "sudo apt-get install oracle-java8-jdk". You might
have multiple Java versions installed. You can use the command "sudo
update-alternatives --config java" to select the correct one.
10. Now, we set up a group and user that will be used for Hadoop. We also
make the user a superuser.
-Type "sudo addgroup hadoop"
-Type "sudo adduser --ingroup hadoop hduser"
-Type "sudo adduser hduser sudo"
11. Next, we create a RSA key pair so that the master node can log into
slave nodes through ssh without password. This will be used later when we
have multiple slave nodes.
-Type "su hduser"
-Type "mkdir ~/.ssh"
-Type "ssh-keygen -t rsa -P """
-Type "cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys"
-Verify by typing "ssh localhost"
13. Login as hduser and make sure you can access the Internet (note that
    Putty should now use 192.168.0.110 to access the raspberry pi).
-Type "ping www.cnn.com"
-Press CTRL-C when finished.
If you can't access the Internet something is wrong with your network setup
(probably you aren't hooked up to a router, you misspelled something, or your
Internet isn't working).
Part II: Hadoop 2.7.3 / YARN Installation: Single Node Cluster
http://apache.cs.utah.edu/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
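The download and extraction commands are not shown above; a typical sequence
(using the mirror URL above and unpacking to /opt/hadoop, the location the
rest of this guide assumes) looks like this:
-Type "cd ~"
-Type "wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz"
-Type "sudo tar -xzf hadoop-2.7.3.tar.gz -C /opt"
-Type "sudo mv /opt/hadoop-2.7.3 /opt/hadoop"
-Type "sudo chown -R hduser:hadoop /opt/hadoop"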
8. Now that we have hadoop, we have to configure it before it can launch its
daemons (i.e. namenode, secondary namenode, datanode, resourcemanager, and
nodemanager). Make sure you are logged in as hduser.
-Type "su hduser"
10. Many configuration files for Hadoop and its daemons are located in the
    /opt/hadoop/etc/hadoop folder. We will edit some of these files for
    configuration purposes. Note, there are a lot of configuration parameters
    to explore.
    -Type "cd /opt/hadoop/etc/hadoop"
    -Type "sudo nano hadoop-env.sh"
    -Edit the existing JAVA_HOME line so it reads:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
    -Edit the existing HADOOP_HEAPSIZE line (uncomment it if necessary) so it
     reads:
export HADOOP_HEAPSIZE=250
    The default is 1000 MB of heap per daemon launched by Hadoop, but we
    are dealing with the limited memory of the Raspberry Pi (1 GB).
11. Now we will edit the core Hadoop configuration in core-site.xml.
-Type "sudo nano core-site.xml"
-Add the following properties between the <configuration> and
</configuration> tags.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hdfs/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://node1:54310</value>
</property>
</configuration>
12. Now edit the hdfs (hadoop file system) configuration in hdfs-site.xml.
-Type "sudo nano hdfs-site.xml"
-Add the following properties between the <configuration> and
</configuration> tags.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
13. Now edit the YARN (Yet Another Resource Negotiator) configuration in
yarn-site.xml.
-Type "sudo nano hdfs-site.xml"
-Add the following properties between the <configuration> and
</configuration> tags.
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
</configuration>
14. Now edit the MapReduce configuration in mapred-site.xml. If the file does
    not exist yet, copy it from the template first.
    -Type "sudo cp mapred-site.xml.template mapred-site.xml" (only if needed)
    -Type "sudo nano mapred-site.xml"
    -Add the following properties between the <configuration> and
     </configuration> tags.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
18. Create a location for hdfs (see core-site.xml) and format hdfs.
-Type "sudo mkdir -p /hdfs/tmp"
-Type "sudo chown hduser:hadoop /hdfs/tmp"
-Type "sudo chmod 750 /hdfs/tmp"
-Type "hadoop namenode -format"
19. Start Hadoop (hdfs) and YARN (resource scheduler). Ignore any warning
    messages that may occur (most are harmless, e.g. the warning that the
    native-hadoop library could not be loaded, since the bundled native code
    was not built for the Pi's platform).
-Type "cd ~"
-Type "start-dfs.sh"
-Type "start-yarn.sh"
20. Check that the Hadoop and YARN daemons are running.
    -Type "jps"
    You should see NameNode, SecondaryNameNode, DataNode, ResourceManager,
    NodeManager, and Jps listed.
If you don't see ResourceManager and NodeManager, something is probably set
up incorrectly in .bashrc, yarn-site.xml, or mapred-site.xml.
21. You can test a calculation using the examples provided in the
    distribution. Here we put a local file into hdfs. Then we execute a Java
    program that counts the frequency of words in that file (now located on
    hdfs). Then we grab the output from hdfs and put it on the local computer.
-Type "hadoop fs -copyFromLocal /opt/hadoop/LICENSE.txt /license.txt"
-Type "hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-
examples-2.7.3.jar wordcount /license.txt /license-out.txt"
-Type "hadoop fs -copyToLocal /license-out.txt"
-Type "more ~/license-out.txt/part-r-00000"
Here you can see the output that counts the frequency of words in the
LICENSE.txt file.
22. You can view the setup in your Windows browser by following these URLs.
NAMENODE INFORMATION
http://192.168.0.110:50070
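The YARN resource manager also has a web UI (on its default port):
RESOURCE MANAGER INFORMATION
http://192.168.0.110:8088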
23. There are a lot of commands to explore (there are also "hdfs dfs"
    commands, which are the newer preferred way to work with the file system;
    the older "hadoop fs" form still works). Here are a few to try out:
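For example (these use the /license.txt and /license-out.txt paths from step
21; any hdfs path will do):
    -Type "hadoop fs -ls /" (list the root of hdfs)
    -Type "hadoop fs -cat /license-out.txt/part-r-00000" (print the wordcount
     output straight from hdfs)
    -Type "hadoop fs -rm -r /license-out.txt" (delete the output directory)
    -Type "hdfs dfs -ls /" (the same listing using the newer hdfs command)
    -Type "hdfs dfsadmin -report" (summary of hdfs capacity and datanodes)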
Part III: Hadoop 2.7.3 / YARN Installation: Multi-Node Cluster
1. On node1, login as hduser. Update the hosts file so it lists all three
   nodes.
   -Type "sudo nano /etc/hosts" and delete everything then enter the
    following:
127.0.0.1 localhost
192.168.0.110 node1
192.168.0.111 node2
192.168.0.112 node3
Make sure that is all that is in that file and no other items exist
such as ipv6, etc.
5. Now we will clone the single node we created onto 2 other SD cards for the
other two raspberry pis. Then we will change the configuration for each to
setup the cluster. Node 1 will be the master node. Nodes 2 and 3 will be
the slave nodes.
6. We will now copy the node1 32 GB micro SD card to the other two blank SD
cards.
-Unplug the raspberry pi from power.
-Remove the SD card from the raspberry pi.
-Using a micro SD card reader and Win32 Disk Imager, "READ" the SD card
to an .img file on your Windows computer (you can choose any name for the
.img file, like node1.img). Warning: this file will be approximately 32 GB,
so make sure you have room for it on your Windows computer.
-After the image is created, put your node1 micro SD card back into the
original raspberry pi. Get your other two blank micro SD cards for the
other two raspberry pis and "WRITE" the node1 image you just created to
them one at a time.
-After writing the images, put the micro SD cards back into their respective
raspberry pis and set them aside for now.
7. Now plug the raspberry pi you want for node2 into the network and power
   it up (it should be the only one attached to the network switch). Login
   to it as hduser using Putty.
8. Since this card is a clone of node1, change this pi's identity to node2:
   set its hostname to node2 and its static IP address to 192.168.0.111
   (mirroring the settings you used on node1), then reboot.
   -Type "sudo reboot"
9. Now plug the raspberry pi you want for node3 into the network and power
   it up (node2 and node3 should be the only ones attached to the network
   switch). Login to it as hduser using Putty.
10. As with node2, change this pi's identity to node3: set its hostname to
    node3 and its static IP address to 192.168.0.112, then reboot.
    -Type "sudo reboot"
11. Now attach node1 to the network switch and power it up. Login to node1
(192.168.0.110) using Putty as hduser. You should now see 192.168.0.110,
192.168.0.111, and 192.168.0.112 on your network.
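Because node2 and node3 were cloned from node1, the RSA key created in Part I
step 11 is already in their authorized_keys files, so passwordless ssh from
node1 should already work (Hadoop needs this). You can verify it as follows;
accept the host key fingerprint the first time and type "exit" to come back:
-Type "ssh node2"
-Type "exit"
-Type "ssh node3"
-Type "exit"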
node2
-Type "sudo nano masters"
-Edit the file so it only contains the following:
node1
-Type "sudo reboot"
24. Test Hadoop and YARN (see if the daemons are running on each node).
    -Type "jps"
    On node1 you should at least see NameNode, SecondaryNameNode, and
    ResourceManager; on node2 and node3 you should see DataNode and
    NodeManager.
25. You can view the setup in your Windows browser by following these URLs.
NAMENODE INFORMATION
http://192.168.0.110:50070
Part IV: Hive 2.1.0 Installation
1. Here we will install Hive on node1. Hive only needs to be installed on
the master node.
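The download and extraction commands are not shown above; a typical sequence,
assuming the apache-hive-2.1.0-bin package is unpacked to /opt/hive to mirror
the /opt layout used for Hadoop, looks like this:
-Type "cd ~"
-Type "wget https://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz"
-Type "sudo tar -xzf apache-hive-2.1.0-bin.tar.gz -C /opt"
-Type "sudo mv /opt/apache-hive-2.1.0-bin /opt/hive"
-Type "sudo chown -R hduser:hadoop /opt/hive"
You would then add /opt/hive/bin to hduser's PATH in .bashrc, the same way as
for Hadoop.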
6. Log back into node1 as hduser. We shall now start up hdfs and yarn
services and make some directories.
-Type "start-dfs.sh"
-Type "start-yarn.sh"
8. On node1, you can start the hive command line interface (cli).
-Type "hive"
Part V: Spark 2.0 Installation
1. Here we will install Spark (Standalone Mode) on node1, node2, and node3.
   Then we will configure each node separately. node1 will be the master node
   for Spark, and node2 and node3 will be the slave nodes. Before we install
   Spark, we will install Scala on node1.
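The download and extraction steps for Spark are not shown above; a typical
sequence, assuming the spark-2.0.0-bin-hadoop2.7 package is unpacked to
/opt/spark (the location used in step 9 below), looks like this on each node:
-Type "cd ~"
-Type "wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz"
-Type "sudo tar -xzf spark-2.0.0-bin-hadoop2.7.tgz -C /opt"
-Type "sudo mv /opt/spark-2.0.0-bin-hadoop2.7 /opt/spark"
-Type "sudo chown -R hduser:hadoop /opt/spark"
(The Scala 2.11.8 installation on node1 is not shown here; note that the Spark
package already bundles the Scala runtime it needs, so a separate Scala
install is only required for compiling your own applications.)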
9. On node3 we configure Spark. Login to node3 as hduser.
-Type "cd ~"
-Type "sudo nano .bashrc"
Add the following to bottom of .bashrc file
export PATH=$PATH:/opt/spark/bin
-Type "sudo reboot"
12. You can check the spark monitoring website as well in a web browser at
http://192.168.0.110:8080.
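To confirm the cluster accepts work, you can open a Spark shell against the
standalone master and run a trivial job (the master URL assumes node1 and the
default port 7077):
-Type "/opt/spark/bin/spark-shell --master spark://node1:7077"
At the scala> prompt, type "sc.parallelize(1 to 1000).sum()" which should
return 500500.0, and the running application will show up on the 8080
monitoring page. Type ":quit" to leave the shell.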