
Practical 1-2com


Faculty of Engineering & Technology

BIG-DATA ANALYSIS (203105348)
B.Tech CSE 4th Year 7th Semester

FACULTY OF ENGINEERING AND TECHNOLOGY

Big Data Analytics (203105348)

7th SEMESTER

7A9 (CSE)

Name: Aashutosh.S.yadav

Year/Sem: 4th

Enrolment No. 2203051057108


Course: B.Tech (CSE)


CERTIFICATE

This is to certify that

Mr./Ms..............................................................................................................
with enrolment no. ................................................................ has
successfully completed his/her laboratory experiments in the Big Data
Analytics (203105348) from the Department of

...................................................................................................

during the academic year ............................................

Date of Submission: -........................... Staff In charge: -...........................


INDEX

SR NO | PRACTICAL LIST | DATE OF START | DATE OF COMPLETION | PAGE NO. FROM | PAGE NO. TO | MARKS | SIGN
1 | To understand the overall programming architecture using Map Reduce API. | 15-06-2024 | 15-06-2024 | 4 | 6 | |
2 | Write a program of Word Count in Map Reduce over HDFS. | | | 7 | 10 | |
3 | Basic CRUD operations in MongoDB. | | | | | |
4 | Store the basic information about students such as roll no, name, date of birth, and address of student using various collection types such as List, Set and Map. | | | | | |
5 | Basic commands available for the Hadoop Distributed File System. | | | | | |
6 | Basic commands available for HIVE Query Language. | | | | | |
7 | Basic commands of HBASE Shell. | | | | | |
8 | Creating the HDFS tables, loading them in Hive, and learning to join tables in Hive. | | | | | |


Practical:1
AIM: To understand the overall programming architecture using Map Reduce API.

A MapReduce task is mainly divided into two phases: the Map phase and the Reduce phase. The idea is illustrated here with Python functions:
1. map(), filter(), and reduce() in Python.
2. These functions are most commonly used with lambda functions.
1. map():
"A map function executes the instructions or functionality provided to it on every item of an
iterable, which could be a list, tuple, set, etc."
SYNTAX:
map(function, iterable)
EXAMPLE:
items = [1, 2, 3, 4, 5]
a = list(map(lambda x: x ** 3, items))
print(a)  # prints [1, 8, 27, 64, 125]

2. filter():
"A filter function in Python tests a specific user-defined condition for a function and returns an
iterable for the elements and values that satisfy the condition or, in other words, return true."


SYNTAX:
filter(function, iterable)

EXAMPLE:
a = [1, 2, 3, 4, 5]
b = [2, 5, 0, 7, 3]
c = list(filter(lambda x: x in a, b))
print(c)  # prints [2, 5, 3]

3. reduce():
"The reduce function applies a function cumulatively to the items of an iterable and gives back a
single value as the result."
We have to import the reduce function from the functools module using the import statement shown in the example.
SYNTAX:
reduce(function, iterable)
EXAMPLE:
from functools import reduce
a = reduce(lambda x, y: x * y, [1, 2, 3, 4])
print(a)  # prints 24
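
The three functions can also be chained, which gives a small taste of how a map step feeds a reduce step. The following is only a minimal sketch: the word list and the names words, pairs, big_pairs and big_count are made up for illustration and are not part of the practical.

```python
from functools import reduce

# Illustrative input; any list of words would do
words = ["big", "data", "big", "hadoop", "data", "big"]

# map: turn every word into a (word, 1) pair
pairs = list(map(lambda w: (w, 1), words))

# filter: keep only the pairs whose key is "big"
big_pairs = list(filter(lambda p: p[0] == "big", pairs))

# reduce: add up the counts of the remaining pairs, starting from 0
big_count = reduce(lambda total, p: total + p[1], big_pairs, 0)

print(big_count)  # prints 3
```

Passing 0 as the third argument of reduce() gives it a starting value, so the call also works when the filtered list is empty.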


Practical-2

Aim: Write a program of Word Count in Map Reduce over HDFS.


Description:
MapReduce is a framework for processing large datasets using a large number of computers
(nodes), collectively referred to as a cluster. Processing can occur on data stored in a file
system (HDFS). It is a method for distributing computation across multiple nodes: each node
processes the data that is stored at that node.
It consists of two main phases:
Mapper phase
Reducer phase
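
The idea of each node working on its own share of the data can be imitated on a single machine. The sketch below is an assumption-laden illustration, not Hadoop code: worker threads stand in for nodes, the sample lines are invented, and count_split is a hypothetical helper name.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Invented sample data standing in for a large file
lines = [
    "big data needs big clusters",
    "hadoop stores data in hdfs",
    "map and reduce process big data",
    "hdfs keeps data close to the nodes",
]

def count_split(split):
    """Count the words in one split, as a single node would."""
    counts = Counter()
    for line in split:
        counts.update(line.split())
    return counts

# Cut the input into two splits, one per simulated node
splits = [lines[0:2], lines[2:4]]

# Each worker thread stands in for a node processing its own split
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_split, splits))

# Merge the per-node results into one total
total = sum(partial_counts, Counter())

print(total["data"], total["big"])  # prints 4 3
```

On a real cluster the splits would be HDFS blocks and the workers would be separate machines; only the split, process and merge pattern carries over.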

[Figure: I/P File -> Map tasks -> Reduce -> HDFS]

The input data set is split into independent blocks, which are processed in parallel. Each input
split is converted into key-value pairs. The mapper logic processes each key-value pair and
produces intermediate key-value pairs based on the implementation logic. The resulting
key-value pairs can be of a different type from that of the input key-value pairs. The output of
the mapper is the input for the reducer. The intermediate key-value pairs are sorted by key, the
reducer logic is applied to them, and the output is produced in the desired format. The output is
stored in HDFS.
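
The flow described above (map, sort by key, reduce) can be sketched in plain Python. This is a single-process imitation under stated assumptions, not the Hadoop API: the functions mapper and reducer and the three sample lines are hypothetical, chosen only to show how key-value pairs move between the phases.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit an intermediate (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in line.split()]

def reducer(word, counts):
    """Reduce phase: sum all the counts that share one key."""
    return (word, sum(counts))

# Invented input lines; each line plays the role of one input record
lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]

# Map: run the mapper over every input line
intermediate = [pair for line in lines for pair in mapper(line)]

# Sort: order the pairs by key so that equal keys sit next to each other
intermediate.sort(key=itemgetter(0))

# Reduce: group the sorted pairs by key and reduce each group
result = [
    reducer(word, (count for _, count in group))
    for word, group in groupby(intermediate, key=itemgetter(0))
]

print(result)  # prints [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

The sort step matters because groupby() only merges neighbouring items with the same key, which mirrors why the intermediate pairs are sorted before the reducer runs.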

The overall MapReduce word count process

Python Code
import urllib.request

# Text file whose words are counted
story = 'http://sixty-north.com/c/t.txt'

request = urllib.request.Request(story)
response = urllib.request.urlopen(request)

each_word = []    # every word in the file, in order
same_words = {}   # lower-cased word -> number of occurrences

# Loop over the entire file and collect all the words into a list
for line in response:
    # The response yields bytes, so decode each line before splitting
    line_words = line.decode('utf-8').split()
    for word in line_words:
        each_word.append(word)

# Count the occurrences of each word, ignoring case
for words in each_word:
    if words.lower() not in same_words:
        same_words[words.lower()] = 1
    else:
        same_words[words.lower()] += 1

# Print every distinct word with its count
for each in same_words:
    print("word = ", each, "count = ", same_words[each])


Output:-
