CCS335 - Cloud Computing Record
CCS335 – CLOUD COMPUTING
LABORATORY
NAME :______________________
YEAR :______________________
SEMESTER :______________________
BRANCH :______________________
1
JEPPIAAR INSTITUTE OF TECHNOLOGY
SELF BELIEF | SELF DISCIPLINE | SELF RESPECT
KUNNAM, SUNGUVARCHATRAM, SRIPERUMBUDUR, CHENNAI - 631 604.
BONAFIDE CERTIFICATE
Date ___________________
2
VISION OF THE INSTITUTION
Jeppiaar Institute of Technology aspires to provide technical education in futuristic technologies with
the perspective of innovative, industrial and social application for the betterment of humanity.
MISSION OF THE INSTITUTION
• To produce competent and disciplined high-quality professionals with the practical skills
necessary to excel as innovative professionals and entrepreneurs for the benefit of the society.
• To improve the quality of education through excellence in teaching and learning, research,
leadership and by promoting the principles of scientific analysis, and creative thinking.
• To provide excellent infrastructure, serene and stimulating environment that is most conducive to
learning.
• To strive for productive partnership between the Industry and the Institute for research and development.
• To serve the global community by instilling ethics, values and life skills among the students
needed to enrich their lives.
3
VISION OF THE DEPARTMENT
The department will be an excellent centre to impart futuristic and innovative technological education
to facilitate the evolution of problem-solving skills along with knowledge application in the field of
Information Technology, understanding industrial and global requirements and societal needs for the benefit
of humanity.
MISSION OF THE DEPARTMENT
To enable students to be competent and employable by providing excellent infrastructure to learn and
contribute to the welfare of the society.
To channelize the potentials of the students by offering state of the art amenities to undergo research
and higher education.
To facilitate students to obtain a profound understanding of nature and social requirements, and to grow as
professionals with values and integrity.
Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences,
and engineering sciences.
Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
4
Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information
to provide valid conclusions.
Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.
The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.
Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the
engineering practice.
Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
Students are able to design and develop algorithms for real time problems, scientific and
business applications through analytical, logical and problem-solving skills.
Students are able to provide security solution for network components and data storage and
management which will enable them to work efficiently in the industry.
5
PRACTICAL EXERCISES
1. Install VirtualBox/VMware/equivalent open source cloud workstation with different flavours of Linux or Windows OS.
2. Install a C compiler in the virtual machine created using a virtual box and execute simple programs.
3. Install Google App Engine. Create a hello world app and other simple web applications using python/java.
4. Use the GAE launcher to launch the web applications.
5. Simulate a cloud scenario using CloudSim and run a scheduling algorithm that is not present in CloudSim.
6. Find a procedure to transfer the files from one virtual machine to another virtual machine.
7. Install a Hadoop single node cluster and run simple applications like wordcount.
8. Create and execute your first container using Docker.
9. Run a container from Docker Hub.
10. Find a procedure to run virtual machines of different configurations. Check how many virtual machines can be utilized at a particular time.
6
TABLE OF CONTENTS
7
EX.NO:1 INSTALL VIRTUALBOX/VMWARE WORKSTATION WITH DIFFERENT FLAVOURS OF LINUX OR WINDOWS OS
Date:
AIM:
To install VirtualBox/VMware Workstation with different flavours of Linux or Windows OS on top of
Windows 7 or 8.
PROCEDURE
STEP 1: Go to VirtualBox website to download the binary for your current operating system. Since
our host machine is running on Windows, I'll choose 'x86/amd64' from Windows hosts. When
download is finished, run the executable file. Continue with the installation of VirtualBox with the
defaults. This will open VirtualBox at the end of the installation.
8
CREATE VIRTUAL MACHINE
STEP 2: Download the Ubuntu desktop ISO file (64-bit) from the Ubuntu website.
STEP 3: In the VirtualBox Manager, click 'New', give the virtual machine a name and select 'Linux' as the type and 'Ubuntu (64-bit)' as the version, then click 'Next'.
9
STEP 4: The memory size depends on your host machine's memory size. In this case, there is 12GB of physical
RAM. I like to allocate as much as possible for Ubuntu but leave some for my Windows host machine, so
I pick 8192 MB for my Ubuntu. Note that VirtualBox will create a swap partition with the same
amount of space as the base memory you have entered here. So later, when you are selecting the size of the
virtual hard drive, make sure it is large enough, since the hard drive will be split into root (/) and
swap partitions. The root partition contains by default all your system files, program settings and
documents.
10
Accept the default 'Create a virtual hard drive now' and click 'Create' button.
Continue to accept the default 'VDI' drive file type and click 'Next' button.
Change the storage type from the default 'Dynamically allocated' to 'Fixed size' to increase performance.
For the virtual hard drive space, the default value is 8GB, which is too little for the exercises in this lab.
I'll pick 100GB since I have plenty of space on my hard disk. Choose a size appropriate for your own
work: if you later realize the drive space is not large enough, you will need to go over
these steps again to create another virtual machine.
11
Click the 'Create' button and VirtualBox will generate the Ubuntu virtual machine.
Now the virtual machine is created and we are ready to install Ubuntu in it. Select
your new virtual machine and click the 'Settings' button. Click on the 'Storage' category and then 'Empty'
under Controller: IDE. Click the "CD/DVD" icon on the right-hand side and select the Ubuntu ISO file to
mount.
When downloading the Ubuntu ISO file, make sure to select the 64-bit version. Also make sure that VT-x/
Virtualization Technology has been enabled in your computer's BIOS (Basic Input Output System).
12
Since many programs can take advantage of multiple processors/threads, it is a good idea to
specify a larger number of processors for the virtual machine (the default value is 1). You can change this
number by clicking on the 'System' category. In this case, I change the number of CPUs to 4, since 4 is
the largest value shown on the green bar in my case. Now you can click the 'OK' button to continue.
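For reference, the same virtual machine can also be created from the command line with VBoxManage; the following is a sketch using the sizes chosen above (the VM name, disk file name and ISO file name are assumptions, and older VirtualBox releases use "createhd" instead of "createmedium"):
VBoxManage createvm --name "Ubuntu" --ostype Ubuntu_64 --register
VBoxManage modifyvm "Ubuntu" --memory 8192 --cpus 4
VBoxManage createmedium disk --filename Ubuntu.vdi --size 102400 --variant Fixed
VBoxManage storagectl "Ubuntu" --name "SATA" --add sata
VBoxManage storageattach "Ubuntu" --storagectl "SATA" --port 0 --device 0 --type hdd --medium Ubuntu.vdi
VBoxManage storagectl "Ubuntu" --name "IDE" --add ide
VBoxManage storageattach "Ubuntu" --storagectl "IDE" --port 0 --device 0 --type dvddrive --medium ubuntu-14.04.1-desktop-amd64.iso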
13
Virtual Box may pop up a message about 'Auto capture keyboard' option. Read the message there and
check 'Do not show this message again' option before clicking OK.
14
INSTALL UBUNTU
Back to Oracle VM Virtual Box Manager, click on the new Ubuntu virtual machine and hit 'Start'
button. Now you shall see a 'Welcome' screen. Click 'Install Ubuntu' button. Note that the
installation process may differ a little bit from version to version. The screenshots here are based
on Ubuntu 14.04.1.
Ubuntu will ask you a few questions. If the default is good, click 'Continue' button.
In 'Who are you?' dialog, enter your preferred name, username and password. Note that this user
will have root/sudo privilege. Click 'Continue' button. The installation will continue until it is finished.
After installation is complete, click 'Restart Now' button. When you see a screen with a black
background saying 'Please remove installation media and close the tray (if any) then press ENTER:', just follow it.
Enter the password you have chosen and press 'Enter'. The Ubuntu Desktop OS is ready.
OUTPUT:
15
RESULT:
16
EX.NO 2: INSTALL A C COMPILER IN THE VIRTUAL MACHINE AND EXECUTE A
SAMPLE PROGRAM
DATE:
AIM: To Install a C compiler in the virtual machine and execute a sample C program.
PROCEDURE:
Step 1: Open Terminal (Applications-Accessories-Terminal)
Step 2: Open gedit by typing “gedit &” on terminal
(You can also use any other Text Editor application)
Step 3: Type the following on gedit (or any other text editor) and save the file as "helloworld.c"
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
Step 4: Type “ls” on Terminal to see all files under current folder
Confirm that “helloworld.c” is in the current directory. If not, type cd DIRECTORY_PATH to
go to the directory that has “helloworld.c”
Step 5: Type “gcc helloworld.c” to compile, and type “ls” to confirm that a new executable
file “a.out” is created
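To run the program (a natural next step, not shown above), execute "./a.out". A complete compile-and-run session looks like this; the output line comes from the printf in the program:
$ gcc helloworld.c
$ ls
a.out  helloworld.c
$ ./a.out
Hello World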
OUTPUT:
RESULT:
18
EX. NO 3 INSTALL GOOGLE APP ENGINE CREATE HELLO WORLD APP
AND OTHER SIMPLE WEB APPLICATIONS USING PYTHON/JAVA.
DATE:
AIM: To create hello world app and other simple web applications using python/java.
PROCEDURE:
STEP 1: PRE-REQUISITES: PYTHON 2.5.4
The App Engine SDK allows you to run Google App Engine Applications on your local computer.
It simulates the run-time environment of the Google App Engine infrastructure.
If you don't already have Python 2.5.4 installed in your computer, download and Install Python 2.5.4
from:
http://www.python.org/download/releases/2.5.4/
STEP 2: DOWNLOAD AND INSTALL THE SDK
You can download the Google App Engine SDK by going to:
http://code.google.com/appengine/downloads.html
Download the Windows installer – the simplest thing is to download it to your Desktop or another
folder that you remember.
19
Double click on the GoogleAppEngine installer.
Click through the installation wizard, and it should install the App Engine. If you do not have
Python 2.5, it will install Python 2.5 as well.
Once the install is complete you can discard the downloaded installer.
20
STEP 3: MAKING YOUR FIRST APPLICATION
Now you need to create a simple application. We could use the “+” option to have the launcher make
us an application – but instead we will do it by hand to get a better sense of what is going on.
Make a folder for your Google App Engine applications. Make the folder on the Desktop called "apps"
– the path to this folder is:
And then make a sub-folder within apps called "ae-01-trivial" – the path to this folder would be:
Using a text editor such as JEdit (www.jedit.org), create a file called app.yaml in the
ae-01-trivial folder with the following contents:
application: ae-01-trivial
version: 1
runtime: python
api_version: 1
handlers:
- url: /.*
  script: index.py
Note: Please do not copy and paste these lines into your text editor – you might end up with strange
characters – simply type them into your editor.
Then create a file in the ae-01-trivial folder called index.py with three lines in it:
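The three lines themselves are not reproduced in this record. Based on the "Chuck" greeting this exercise refers to below, they are the classic CGI-style response (a sketch; the exact greeting text is an assumption):
print 'Content-Type: text/plain'
print ' '
print 'Hello there Chuck'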
Then start the GoogleAppEngineLauncher program that can be found under Applications. Use the
File -> Add Existing Application command, navigate into the apps directory and select the
ae-01-trivial folder. Once you have added the application, select it so that you can control
the application using the launcher.
21
Once you have selected your application, press Run. After a few moments your application will
start and the launcher will show a little green icon next to your application. Then press Browse to
open a browser pointing at your application, which is running at http://localhost:8080/
STEP 4: Paste http://localhost:8080 into your browser and you should see your application as
follows. Just for fun, edit index.py to change the name "Chuck" to your own name and press
Refresh in the browser to verify your updates.
You can watch the internal log of the actions that the web server is performing when you are
interacting with your application in the browser. Select your application in the Launcher and press the
Logs button to bring up a log window:
22
Each time you press Refresh in your browser, you can see it retrieving the output with a GET
request.
To shut down the server, use the Launcher, select your application and press the
Stop button.
OUTPUT:
RESULT:
23
EX. NO. 4: USE GAE LAUNCHER TO LAUNCH THE WEB APPLICATIONS
DATE:
AIM: To use the GAE Launcher to launch the web applications and deploy them to App Engine.
PROCEDURE:
Deploying the app to App Engine
To upload the guestbook app, run the following command from within the appengine-guestbook-python
directory of your application, where the app.yaml and index.yaml files are located:
gcloud app deploy app.yaml index.yaml
Optional flags:
Include the --project flag to specify an alternate Cloud Console project ID other than the one you
initialized as the default in the gcloud tool.
Example: --project [YOUR_PROJECT_ID]
Include the -v flag to specify a version ID, otherwise one is generated for you.
Example: -v [YOUR_VERSION_ID]
The Datastore indexes might take some time to generate before your application is available. If the
indexes are still in the process of being generated, you will receive a NeedIndexError message
when accessing your app. This is a transient error, so try again a little later if at first you receive this
error.
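For reference, the app.yaml of the guestbook sample being deployed looks roughly like this (a sketch based on the appengine-guestbook-python sample; the handler script and library list may differ in your copy):
runtime: python27
api_version: 1
threadsafe: yes

handlers:
- url: /.*
  script: guestbook.app

libraries:
- name: jinja2
  version: latest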
24
OUTPUT:
RESULT:
25
EX.NO.5 SIMULATE A CLOUD SCENARIO USING CLOUDSIM AND RUN A
SCHEDULING ALGORITHM THAT IS NOT PRESENT IN
CLOUDSIM
DATE:
AIM:
To simulate a cloud scenario using CloudSim and run a scheduling algorithm that is not present in
CloudSim.
PROCEDURE:
The steps to be followed: How to use CloudSim in Eclipse
CloudSim is written in Java. The knowledge you need to use CloudSim is basic Java programming and
some basics about cloud computing. Knowledge of programming IDEs such as Eclipse or NetBeans is
also helpful. It is a library and, hence, CloudSim does not have to be installed. Normally, you can unpack
the downloaded package in any directory, add it to the Java classpath and it is ready to be used. Please
verify whether Java is available on your system.
6. Data centres are the resource providers in CloudSim; hence, creation of data centres is the second
step. To create a Datacenter, you need the DatacenterCharacteristics object that stores the properties of a
data centre such as architecture, OS, list of machines, allocation policy (time- or space-shared), the time
zone and its price:
Datacenter datacenter9883 = new Datacenter(name, characteristics,
        new VmAllocationPolicySimple(hostList), storageList, 0);
8. The fourth step is to create one virtual machine. Its parameters are the unique ID of the VM, the userId
of the VM's owner, MIPS rating, numberOfPes (number of CPUs), amount of RAM, amount of bandwidth,
amount of storage, the virtual machine monitor (VMM), and the CloudletScheduler policy for cloudlets:
Vm vm = new Vm(vmid, brokerId, mips, pesNumber, ram, bw, size, vmm,
        new CloudletSchedulerTimeShared());
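The remaining steps of the simulation (creating a cloudlet, submitting the lists to the broker and starting the simulation) follow the bundled CloudSimExample1; the sketch below uses variable names taken from that example (vmList and cloudletList are the lists to which the VM and cloudlet are added):
// Fifth step: create one Cloudlet (the task that will run on the VM)
int cloudletId = 0;
long length = 400000;       // length in million instructions (MI)
long fileSize = 300;        // input file size
long outputSize = 300;      // output file size
UtilizationModel utilizationModel = new UtilizationModelFull();
Cloudlet cloudlet = new Cloudlet(cloudletId, length, pesNumber, fileSize, outputSize,
        utilizationModel, utilizationModel, utilizationModel);
cloudlet.setUserId(brokerId);
cloudlet.setVmId(vmid);

// Sixth step: submit the VM list and cloudlet list to the broker, then run the simulation
broker.submitVmList(vmList);
broker.submitCloudletList(cloudletList);
CloudSim.startSimulation();
CloudSim.stopSimulation();

To run a scheduling algorithm that is not present in CloudSim, write your own subclass of CloudletScheduler (or of VmAllocationPolicy, for host-level scheduling) and pass it in place of CloudletSchedulerTimeShared (or VmAllocationPolicySimple) above.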
When the simulation completes, the console prints: CloudSimExample1 finished!
27
OUTPUT:
RESULT :
28
EX. NO:6 FILES TRANSFER FROM ONE VIRTUAL MACHINE TO
ANOTHER VIRTUAL MACHINE
DATE
AIM:
To find a procedure to transfer the files from one virtual machine to another virtual machine.
PROCEDURE:
1. You can copy a few (or more) lines with the copy & paste mechanism.
For this you need to share the clipboard between the host OS and the guest OS by installing Guest Additions on
both the virtual machines (setting the clipboard to bidirectional and restarting them). You copy from the
guest OS into the clipboard that is shared with the host OS.
Then you paste from the host OS into the second guest OS.
2. You can enable drag and drop too with the same method (click on the machine,
Settings, General, Advanced, Drag and drop: set to Bidirectional).
3. You can have common Shared Folders on both virtual machines and use one of the shared directories
as a buffer for copying.
Installing Guest Additions gives you the possibility to set Shared Folders too. As you put a file in a shared
folder from the host OS or from a guest OS, it is immediately visible to the other. (Keep in mind that some
problems can arise with the date/time of the files when there are different clock settings on the different
virtual machines.)
If you use the same folder shared on more machines, you can exchange files directly by copying them into this
folder.
4. You can use the usual methods to copy files between 2 different computers with a client-server
application, e.g. scp with sshd active for Linux, or WinSCP (see the example scp command after this list).
You need an active server (sshd) on the receiving machine and a client on the sending machine. Of course
you need to have the authorization set (via password or, better, via an automatic authentication method).
Note: many Linux/Ubuntu distributions install sshd by default; you can see if it is running with pgrep sshd
from a shell. You can install it with sudo apt-get install openssh-server.
5. You can mount part of the file system of one virtual machine via NFS or SSHFS on the other, or you
can share files and directories with Samba.
You may find the article "Sharing files between guest and host without VirtualBox shared folders"
interesting, with detailed step by step instructions.
You should remember that you are dealing with a small network of machines with different operating
systems, and in particular:
Each virtual machine has its own operating system running on it and acts as a physical machine.
Each virtual machine is an instance of a program owned by a user in the hosting operating system and
is subject to the restrictions of that user in the hosting OS.
E.g. let us say that Hastur and Meow are users of the hosting machine, but they did not allow each other to
see their directories (no read/write/execute authorization). When each of them runs a virtual machine, for the
hosting OS those virtual machines are two normal programs owned by Hastur and Meow and cannot see the
private directory of the other user. This is a restriction due to the hosting OS. It's easy to overcome: it's
enough to give read/write/execute authorization to a directory, or to choose a different directory in which
both users can read/write/execute.
29
Windows likes the mouse and Linux likes the fingers. :-)
That is, I suggest you enable Drag & drop and the Shared folders to be comfortable with the Windows
machines, and ssh/scp to be comfortable with Linux.
When you need to be fast with Linux you will feel the need of ssh-keygen and of generating SSH
keys once, so that you can copy files to/from a remote machine without typing a password anymore. In this way
bash auto-completion works remotely too!
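As a concrete example of method 4 above, copying a file from one VM to another with scp (the user name and the receiving VM's IP address are assumptions; the receiving VM must be running sshd):
# copy a single file to user 'hastur' on the receiving VM
$ scp test.txt hastur@192.168.56.102:/home/hastur/
# copy a whole directory recursively
$ scp -r project/ hastur@192.168.56.102:/home/hastur/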
30
OUTPUT:
RESULT:
31
EX.NO:7 INSTALL HADOOP SINGLE NODE CLUSTER AND RUN
SIMPLE APPLICATIONS LIKE WORDCOUNT
DATE:
AIM:
To install a Hadoop single node cluster and run simple applications like wordcount.
PROCEDURE:
The steps followed for Hadoop 2 - Pseudo Node Installation
Prerequisites:
Hadoop is a framework written in Java for running applications on large clusters of commodity
hardware. Hadoop needs Java 6 or above to work.
Step 1: Download the JDK tar.gz file for Linux 64-bit and extract it into "/opt"
boss@solaiv[]# cd /opt
boss@solaiv[]# sudo tar xvpzf /home/itadmin/Downloads/jdk-8u5-linux-x64.tar.gz
boss@solaiv[]# cd /opt/jdk1.8.0_05
Step 2:
Open the "/etc/profile" file and add the following lines (as per your JDK version) to set the
environment for Java. Use the root user to save /etc/profile, or use gedit instead of vi.
The 'profile' file contains commands that ought to be run for login shells.
boss@solaiv[]# sudo vi /etc/profile
#-- insert JAVA_HOME
JAVA_HOME=/opt/jdk1.8.0_05
#-- in the PATH variable just append at the end of the line
PATH=$PATH:$JAVA_HOME/bin
#-- Append JAVA_HOME at end of the export statement
32
export PATH JAVA_HOME
source /etc/profile
2) configure ssh
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your
local machine if you want to use Hadoop on it (which is what we want to do in this
short tutorial). For our single-node setup of Hadoop, we therefore need to configure
SSH access to localhost
Password-less SSH key based authentication is needed so that the master node can log in to
slave nodes (and the secondary node) to start/stop them easily without any delays for authentication.
If you skip this step, you will have to provide a password every time.
Generate an SSH key for the user, then enable password-less SSH access to your local machine:
root@solaiv[]# ssh-keygen
root@solaiv[]# ssh-copy-id -i localhost
# After the above 2 steps, you will be connected without a password:
root@solaiv[]# ssh localhost
root@solaiv[]# exit
33
3) Hadoop installation
Now download Hadoop from the official Apache site, preferably a stable release version
of Hadoop 2.7.x, and extract the contents of the Hadoop package to a location of your choice.
Step 1: Download the tar.gz file of the latest version of Hadoop (hadoop-2.7.x) from the official
site.
Step 2: Extract (untar) the downloaded file with the following commands to /opt
root@solaiv[]# cd /opt
root@solaiv[/opt]# sudo tar xvpzf /home/itadmin/Downloads/hadoop-2.7.0.tar.gz
root@solaiv[/opt]# cd hadoop-2.7.0/
boss@solaiv[]# cd $HADOOP_PREFIX
boss@solaiv[]# bin/hadoop version
34
Next, modify the Hadoop configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml & yarn-site.xml.
boss@solaiv[]# cd $HADOOP_PREFIX/etc/hadoop
boss@solaiv[]# vi hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_05
export HADOOP_PREFIX=/opt/hadoop-2.7.0
Modify hdfs-site.xml and paste the following between the <configuration> tags:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
YARN configuration (single node): modify mapred-site.xml and yarn-site.xml.
boss@solaiv[]# vi mapred-site.xml
boss@solaiv[]# vi yarn-site.xml
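The property blocks themselves are not reproduced above; for a single-node setup, the standard values from the Hadoop setup guide are as follows (a sketch; the NameNode port 9000 matches the HDFS URI used later in this manual):
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>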
The first step to starting up your Hadoop installation is formatting the Hadoop file
system, which is implemented on top of the local file system of our "cluster" (which here
includes only our local machine). You need to do this the first time you set up a Hadoop
cluster.
Do not format a running Hadoop file system, as you will lose all the data currently in
the cluster (in HDFS).
root@solaiv[]# cd $HADOOP_PREFIX
root@solaiv[]# bin/hadoop namenode -format
root@solaiv[]# sbin/start-dfs.sh
To know the running daemons, just type jps (or /opt/jdk1.8.0_05/bin/jps).
Start the ResourceManager daemon and the NodeManager daemon (web UI on port 8088):
root@solaiv[]# sbin/start-yarn.sh
36
Make the HDFS directories required to execute MapReduce jobs:
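The directory-creation commands themselves are not listed above; in the standard single-node setup guide they are (replace <username> with your login name):
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>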
Copy the output files from the distributed filesystem to the local filesystem and examine them, or view
the output files directly on the distributed filesystem:
$ bin/hdfs dfs -cat /output/*
HDFS Shell commands
cd $HADOOP_PREFIX
# create new dir
bin/hadoop fs -mkdir /mit/
bin/hadoop fs -mkdir /mit/day2
# OR
bin/hadoop fs -mkdir -p /mit/day2/
bin/hadoop fs -mkdir /mit/day3
# rmdir (use -rm -r to remove a directory)
bin/hadoop fs -rm -r /mit/day3
# Append a single src, or multiple srcs, from the local file system to the destination
# file system. Also reads input from stdin and appends to the destination file system.
bin/hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile
# both localfile1 and localfile2 will be appended to hdfs
bin/hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
# HDFS path can also be given like below
bin/hdfs dfs -appendToFile localfile hdfs://localhost:9000/user/hadoop/hadoopfile
bin/hdfs dfs -appendToFile /home/bigdata/Downloads/ hdfs:///user/hadoop/hadoopfile
# read the input from stdin (the prompt waits for input; finish with <Ctrl+D>)
hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile
# cat
bin/hadoop fs -cat /user/hadoop/hadoopfile
# chown & chmod
# put (or) copyFromLocal
bin/hadoop fs -put /media/bdalab/shortNotes/examples_files/vote_data /mit
bin/hadoop fs -put /home/bigdata/Downloads/mrdata /mit/day2
bin/hadoop fs -put /home/bigdata/Downloads/hivedata /mit/day2
# moveFromLocal
# Similar to the put command, except that the source <localsrc> is deleted after it's copied.
bin/hdfs dfs -moveFromLocal <localsrc> <dst>
# cp is used to copy files between directories present in HDFS
./bin/hdfs dfs -cp /user/hadoopfile /user/hadoop/hadoopfile
# mv, move the file from one hdfs location to another
./bin/hdfs dfs -mv /user/hadoopfile /user/hadoop/hadoopfile
# list the files
bin/hadoop fs -ls /
# The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME
./bin/hdfs dfs -count /
# du displays the size of a file or directory
./bin/hdfs dfs -du -h /
# for an aggregate summary, add -s
./bin/hdfs dfs -du -h -s /
# expunge, empty the Trash
bin/hdfs dfs -expunge
# Displays the Access Control Lists (ACLs) of files and directories
# (we can get similar information by using the "ls" command)
./bin/hdfs dfs -getfacl /user
# getmerge
# Sets an extended attribute name and value for a file or directory. -x <name> removes the attribute.
hdfs dfs -setfattr -n user.myAttr -v myValue /user
# Displays the extended attribute names and values (if any) for a file or directory.
./bin/hdfs dfs -getfattr -d /user
# setrep, change the replication factor of a file; -w waits till the replication is achieved
# hdfs dfs -setrep [-R] [-w] <numReplicas> <path>
hdfs dfs -setrep -w 2 /user/hadoop/dir1
# distcp, copy a directory from one cluster (or node) to another
# -overwrite option to overwrite existing files
# -update option to synchronize both directories
hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop
# tail, displays the last kilobyte of the file to stdout
bin/hdfs dfs -tail /filename
# change the default configuration by using a Generic Option
./bin/hadoop fs -D dfs.replication=2 -mkdir /user/solai
MapRed
========
# running the example wordcount job
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /mit/day2/mrdata /op/
# list jobs
./bin/mapred job -list
# in order to show details of previously run jobs, start the history server (port 10020)
./sbin/mr-jobhistory-daemon.sh start historyserver
# job status
./bin/mapred job -status job_1423283172541_0001
# Get the file/block size as seen by the OS and by Hadoop
# OS block size
blockdev --getbsz /dev/sda1
# HDFS block size
./bin/hdfs getconf -confKey dfs.blocksize
PROCEDURE:
Hadoop MapReduce is a software framework for easily writing applications which process
vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of
nodes) of commodity hardware in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data-set into independent chunks which are
processed by the map tasks in a completely parallel manner. The framework sorts the
outputs of the maps, which are then input to the reduce tasks. Typically both the input and
the output of the job are stored in a file-system. The framework takes care of scheduling
tasks, monitoring them and re-executes the failed tasks.
Typically the compute nodes and the storage nodes are the same, that is, the MapReduce
framework and the Hadoop Distributed File System are running on the same set of nodes.
This configuration allows the framework to effectively schedule tasks on the nodes where
data is already present, resulting in very high aggregate bandwidth across the cluster.
The Hadoop job client then submits the job (jar/executable etc.) and configuration to
the ResourceManager which then assumes the responsibility of distributing the
software/configuration to the slaves, scheduling tasks and monitoring them, providing status
and diagnostic information to the job-client.
Prerequisites:
40
The key and value classes have to be serializable by the framework and hence need to
implement the Writable interface. Additionally, the key classes have to implement
the WritableComparable interface to facilitate sorting by the framework.
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3,
v3> (output)
CODING:
WordCount is a simple application that counts the number of occurrences of each word in a
given input set.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
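The imports above are only the first part of the program; the mapper, reducer and driver that follow them are the standard WordCount v1.0 example from the Hadoop MapReduce tutorial, reproduced here for completeness (if you build it with the ant file shown later, which expects a Main-Class of wc.WordCount, add "package wc;" as the first line):

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // emit (word, 1) for every token in the input line
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    // sum the counts for each word
    public void reduce(Text key, Iterable<IntWritable> values, Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}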
EXECUTION :
export JAVA_HOME=/usr/java/default
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
Hadoop launches jobs by getting a jar file containing the compiled Java code. In addition, we
typically pass two command line arguments through to the Java program: the input data file
or directory, and an output directory for the results from the reduce tasks. Using a tool
called ant makes it pretty quick to create a jar file from the above code.
The ant tool uses an XML file that describes what needs to be compiled and packaged into a
jar file. Here is the one used for the above WordCount example:
<project name="wordcount" default="jar">
<target name="init">
<property name="sourceDir" value="."/>
<property name="outputDir" value="classes" />
<property name="buildDir" value="jar" />
<property name="lib.dir" value="/usr/lib/hadoop"/>
<path id="classpath">
<fileset dir="${lib.dir}" includes="**/*.jar"/>
</path>
</target>
<target name="clean" depends="init">
<delete dir="${outputDir}" />
<delete dir="${buildDir}" />
</target>
<target name="prepare" depends="clean">
<mkdir dir="${outputDir}" />
<mkdir dir="${buildDir}"/>
</target>
<target name="compile" depends="prepare">
<javac srcdir="${sourceDir}" destdir="${outputDir}" classpathref="classpath" />
</target>
<target name="jar" depends="compile">
<jar destfile="${buildDir}/wc.jar" basedir="${outputDir}">
<manifest>
<attribute name="Main-Class" value="wc.WordCount"/>
</manifest>
</jar>
</target>
</project>
Assuming that the input files have already been copied into HDFS under /user/joe/wordcount/input:
43
$ bin/hadoop fs -cat /user/joe/wordcount/input/file01
hai hello how are you
Applications can specify a comma-separated list of paths which would be present in the
current working directory of the task using the option -files. The -libjars option allows
applications to add jars to the classpaths of the maps and reduces. The option -archives
allows them to pass a comma-separated list of archives as arguments. These archives are
unarchived, and a link with the name of the archive is created in the current working
directory of tasks.
OUTPUT:
44
RESULT:
45
EX NO:8 CREATING AND EXECUTING YOUR FIRST CONTAINER USING DOCKER
Date:
Aim:
To create and execute your first container using Docker.
Procedure:
Docker Image
A Docker image is a file used to execute code in a Docker container. Docker images act as a set of
instructions to build a Docker container, like a template. An image is comparable to a snapshot in virtual
machine (VM) environments.
Docker Container
A Docker container is a lightweight, standalone, executable package of software that includes
everything needed to run an application: code, runtime, system tools, system libraries and settings.
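The screenshots for this exercise are not reproduced in this record. As a reference, a typical first-container session from the command line looks like the following (a sketch; it assumes Docker Engine or Docker Desktop is already installed):
# verify the installation
$ docker --version
# pull and run the hello-world image from Docker Hub; Docker creates a container from the image
$ docker run hello-world
# list running containers, then all containers including stopped ones
$ docker ps
$ docker ps -a
# run an interactive Ubuntu container and open a shell inside it (leave with 'exit')
$ docker run -it ubuntu bash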
OUTPUT:
56
RESULT:
57
Ex No: 9 RUN A CONTAINER FROM A DOCKER HUB
Date:
Aim:
To run a container from a docker hub.
Procedure:
Step 1: Search for the image.
You can search for images by selecting the search bar at the top, or by using the Ctrl + K shortcut.
Search for welcome-to-docker to find the image.
Step 2: Run the image.
Select Run.
When the optional settings appear, specify host port number 8090 and select Run.
Step 3: Explore the container.
Go to the Containers tab in Docker Desktop to view the container.
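The same steps can also be performed from the command line (a sketch; it assumes the tutorial image docker/welcome-to-docker, which serves its page on container port 80):
$ docker search welcome-to-docker
$ docker pull docker/welcome-to-docker
$ docker run -d -p 8090:80 docker/welcome-to-docker
$ docker ps          # confirm the container is running, then browse to http://localhost:8090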
OUTPUT:
63
RESULT:
64
CONTENTS BEYOND SYLLABUS
DATE:
OBJECTIVE:
To understand the procedure to run virtual machines of different configurations and check how many
virtual machines can be utilized at a particular time.
PROCEDURE:
KVM INSTALLATION
Check that your CPU supports hardware virtualization
To run KVM, you need a processor that supports hardware virtualization. Intel and AMD both
have developed extensions for their processors, deemed respectively
Intel VT-x (code name Vanderpool) and AMD-V (code name Pacifica). To see if your
processor supports one of these, you can review the output from this command:
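The command itself is not reproduced above; the usual check (as used in the Ubuntu KVM installation guide) is:
$ egrep -c '(vmx|svm)' /proc/cpuinfo
A result of 1 or more means the CPU supports hardware virtualization; 0 means it does not, and KVM will not be usable.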
Now see if your running kernel is 64-bit; just issue the following command:
$ uname -m
x86_64 indicates a running 64-bit kernel. If you see i386, i486, i586 or i686, you're
running a 32-bit kernel.
You can also check that the KVM kernel modules are present:
$ ls /lib/modules/3.16.0-30-generic/kernel/arch/x86/kvm
65
Install Necessary Packages
1. qemu-kvm
2. libvirt-bin
3. bridge-utils
4. virt-manager
5. qemu-system
$ sudo apt-get install qemu-kvm
$ sudo apt-get install libvirt-bin
$ sudo apt-get install bridge-utils
$ sudo apt-get install virt-manager
$ sudo apt-get install qemu-system
Verify Installation
You can test if your install has been successful with the following command:
$ virsh -c qemu:///system list
Id Name State
66
To Login in Guest OS
Step 1: Under the Project Tab, Click Instances. In the right side screen Click Launch
Instance.
Step 2: In the details, Give the instance name(eg. Instance1).
Step 3: Click Instance Boot Source list and choose 'Boot from image'
Step 4: Click Image name list and choose the image currently uploaded.
Step 5: Click launch.
Your VM will get created.
67
OPEN STACK INSTALLATION
68
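Before ./stack.sh can be run, DevStack itself has to be downloaded and minimally configured; the following is a sketch (the repository URL follows the current DevStack documentation, and the passwords are assumptions to be changed for your environment):
$ sudo apt-get install git
$ git clone https://opendev.org/openstack/devstack
$ cd devstack
$ cat > local.conf <<EOF
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=secret
RABBIT_PASSWORD=secret
SERVICE_PASSWORD=secret
EOF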
Run DevStack:
$ ./stack.sh
Re-starting OpenStack:
$ ./rejoin.sh
$ ps -ef | grep devstack    # shows all the DevStack processes running
End all the processes to stop OpenStack.
Result:
69
EX.NO:11 MOUNT THE ONE NODE HADOOP CLUSTER USING FUSE
DATE:
AIM: To mount the one node Hadoop cluster using FUSE.
PROCEDURE:
FUSE (Filesystem in Userspace) enables you to write a normal user application as a bridge for a
traditional filesystem interface. The hadoop-hdfs-fuse package enables you to use your HDFS
cluster as if it were a traditional filesystem on Linux. It is assumed that you have a working HDFS
cluster and know the hostname and port that your NameNode exposes.
To install fuse-dfs on Ubuntu systems:
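The install command itself is not shown above; on Ubuntu, the hadoop-hdfs-fuse package mentioned earlier is installed with apt-get (assuming the corresponding Hadoop/CDH repository has been configured):
$ sudo apt-get install hadoop-hdfs-fuse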
To set up and test your mount point:
$ mkdir -p <mount_point>
$ hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
You can now run operations as if they are on your mount point. Press Ctrl+C to end the fuse-dfs
program, and umount the partition if it is still mounted:
$ umount <mount_point>
You can now add a permanent HDFS mount which persists through reboots. To add a system
mount:
70
hadoop-fuse-dfs#dfs://<name_node_hostname>:<namenode_port> <mount_point> fuse
allow_other,usetrash,rw 2 0
For example:
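A concrete line would look like this (the NameNode host/port and the mount point are assumptions; the port matches the HDFS URI used elsewhere in this manual):
hadoop-fuse-dfs#dfs://localhost:9000 /mnt/hdfs fuse allow_other,usetrash,rw 2 0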
Your system is now configured to allow you to use the ls command and use that mount point
as if it were a normal system disk.
OUTPUT:
RESULT:
71
Ex. No. 12 USE THE API’S OF HADOOP FOR INTERACTION WITH HADOOP
DATE:
AIM: To write a program to use the APIs of Hadoop to interact with it.
PROCEDURE:
Prerequisites:
First we write a program to fetch titles from one or more web pages in Python:
#!/usr/bin/env python
# multifetch.py
import sys, urllib, re
title_re = re.compile("<title>(.*?)</title>",
re.MULTILINE | re.DOTALL | re.IGNORECASE)
for url in sys.argv[1:]:
match = title_re.search(urllib.urlopen(url).read())
if match:
print url, "\t", match.group(1).strip()
OUTPUT (Sample):
http://www.jeppiaarinstitute.org Jeppiaar Institute of Technology
http://www.annauniv.edu Anna University
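The script above reads its URLs from the command line. To actually drive it through Hadoop, as the aim suggests, one common approach is Hadoop Streaming, which feeds input lines to the mapper on standard input; the following is a sketch of a streaming-style variant and its invocation (the input/output paths, file names and the exact streaming jar version are assumptions):

#!/usr/bin/env python
# multifetch_mapper.py - streaming variant: reads one URL per line from stdin
import sys, urllib, re
title_re = re.compile("<title>(.*?)</title>",
re.MULTILINE | re.DOTALL | re.IGNORECASE)
for line in sys.stdin:
    url = line.strip()
    if not url:
        continue
    match = title_re.search(urllib.urlopen(url).read())
    if match:
        print url, "\t", match.group(1).strip()

It can then be submitted as a map-only streaming job:
$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.0.jar \
      -D mapreduce.job.reduces=0 \
      -input /mit/urls.txt -output /mit/titles \
      -mapper multifetch_mapper.py -file multifetch_mapper.py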
RESULT:
72