Big Data Lab Manual
HINDUSTHAN INSTITUTE OF TECHNOLOGY
(An Autonomous Institution)
(Approved by AICTE, New Delhi, Affiliated to Anna University, Chennai, Accredited by NBA & NAAC with 'A' Grade)
Coimbatore – 641 032
Branch: ……………………………………………………..
Year/Semester: ……………………………………………….
Place: Coimbatore
Date:
INDEX
Exno:1 INSTALL APACHE HADOOP
Date:
AIM:
To install Apache Hadoop in stand-alone mode and verify the installation by running an example MapReduce program.
Description:
Hadoop is a Java-based programming framework that supports the processing and storage of
extremely large datasets on a cluster of inexpensive machines. It was the first major open
source project in the big data playing field and is sponsored by the Apache Software
Foundation.
Hadoop Common is the collection of utilities and libraries that support other Hadoop
modules.
HDFS, which stands for Hadoop Distributed File System, is responsible for persisting data to disk.
YARN, short for Yet Another Resource Negotiator, acts as the "operating system" of a Hadoop cluster, scheduling and managing its resources.
MapReduce is the original processing model for Hadoop clusters. It distributes work across the cluster (the map phase), then organizes and reduces the results from the nodes into an answer to a query (the reduce phase). Many other processing models are available for the 2.x line of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode
which is suitable for learning about Hadoop, performing simple operations, and debugging.
Procedure:
We'll install Hadoop in stand-alone mode and run one of the example MapReduce programs it includes to verify the installation.
Prerequisites:
Step 1: Installing Java
Hadoop is written in Java, so a working Java (JDK) installation is required before proceeding.
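A quick way to confirm that Java is available (the exact version string will vary with your installation):
C:\>java -version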
Step 2: Installing Hadoop
With Java in place, we'll visit the Apache Hadoop Releases page to find the most recent stable release and follow the link to the binary for the current release.
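After unpacking the downloaded archive (here we assume it is extracted to c:\hadoop), the installation can be checked by printing the Hadoop version:
C:\hadoop>bin\hadoop version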
Procedure to Run Hadoop
1. If Apache Hadoop 2.2.0 is not already installed, follow the post "Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS".
2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and Node Manager).
Namenode, Datanode, Resource Manager and Node Manager will be started in a few minutes, ready to execute Hadoop MapReduce jobs on the single-node (pseudo-distributed mode) cluster.
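Assuming Hadoop is unpacked at c:\hadoop, the daemons are started with the scripts shipped in the sbin directory of the 2.x Windows distribution:
C:\hadoop>sbin\start-dfs.cmd
C:\hadoop>sbin\start-yarn.cmd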
Run wordcount MapReduce job
Create a text file with some content; we'll pass this file as input to the wordcount MapReduce job for counting words. For example, C:\file1.txt containing:
Install Hadoop
Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for counting words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input
Copy the text file (say 'file1.txt') from the local disk to the newly created 'input' directory in HDFS.
C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input
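With the input file in HDFS, the wordcount example job can be run and its result inspected. The examples jar below ships with the 2.2.0 binary distribution (adjust the version number to your build; the 'output' directory name is our choice):
C:\hadoop>bin\yarn jar share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar wordcount input output
C:\hadoop>bin\hdfs dfs -cat output/*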
Among the counters printed when the job completes you should see, for example:
Bytes Written=59
The job and cluster status can also be tracked in the Resource Manager web UI, e.g.:
http://abhijitg:8088/cluster
RESULT:
We've installed Hadoop in stand-alone mode and verified it by running an example program it provides.
Exno:2 MAPREDUCE PROGRAM TO CALCULATE THE FREQUENCY OF A GIVEN WORD
Date:
AIM:
To develop a MapReduce program to calculate the frequency of a given word in a given file.
Map Function – It takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs).
Example – (Map function in Word Count)
Input
Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN
Output
Converts into another set of data (Key, Value)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
Reduce Function – Takes the output from Map as an input and combines those data tuples into a smaller set of tuples.
Example – (Reduce function in Word Count)
Input
Set of tuples (output of Map function)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1),
(bus,1),(TRAIN,1),(BUS,1),
(buS,1),(caR,1),(CAR,1), (car,1), (BUS,1), (TRAIN,1)
Output Converts into smaller set of tuples
(BUS,7), (CAR,7), (TRAIN,4)
Workflow of Program
The workflow of MapReduce consists of 5 steps:
1. Splitting – The splitting parameter can be anything, e.g. splitting by space, comma, semicolon, or even by a new line ('\n').
2. Mapping – as explained above.
3. Intermediate splitting – the entire process runs in parallel on different clusters. In order to group them in the "Reduce" phase, records with the same KEY must end up on the same cluster.
4. Reduce – it is nothing but mostly a group-by phase.
5. Combining – The last phase, where all the data (the individual result sets from each cluster) is combined together to form a result.
PROGRAM:
from collections import defaultdict

def mapper(text):
    # Emit a (word, 1) pair for every word, lower-cased and stripped of punctuation
    words = text.lower().split()
    return [(word.strip(".,!?;\"()"), 1) for word in words]

def shuffle(mapped_data):
    # Group the counts of each word together
    grouped = defaultdict(list)
    for word, count in mapped_data:
        grouped[word].append(count)
    return grouped

def reducer(grouped_data):
    # Sum the counts for each word
    return {word: sum(counts) for word, counts in grouped_data.items()}

text = "This is a sample text. It contains some words, and we want to count them!"
mapped = mapper(text)
print("Mapped Output:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Output:\n", dict(shuffled))
reduced = reducer(shuffled)
print("\nReduced Output (Final Word Count):\n", reduced)
OUTPUT:
Mapped Output:
[('this', 1), ('is', 1), ('a', 1), ('sample', 1), ('text', 1), ('it', 1), ('contains', 1), ('some', 1), ('words', 1), ('and', 1), ('we', 1),
('want', 1), ('to', 1), ('count', 1), ('them', 1)]
Shuffled Output:
{'this': [1], 'is': [1], 'a': [1], 'sample': [1], 'text': [1], 'it': [1], 'contains': [1], 'some': [1], 'words': [1], 'and': [1], 'we': [1], 'want': [1], 'to': [1], 'count': [1], 'them': [1]}
Reduced Output (Final Word Count):
{'this': 1, 'is': 1, 'a': 1, 'sample': 1, 'text': 1, 'it': 1, 'contains': 1, 'some': 1, 'words': 1, 'and': 1, 'we': 1, 'want': 1, 'to': 1, 'count': 1, 'them': 1}
RESULT:
Thus the above program to find the count of each word in the given text has been executed and verified successfully
Exno:3 MAPREDUCE PROGRAM TO FIND THE MAXIMUM
TEMPERATURE IN EACH YEAR
Date:
AIM:
To Develop a MapReduce program to find the maximum temperature in each year.
Description:
MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. The previous experiment gave an introduction to MapReduce; this one explains how to design a MapReduce program.
PROGRAM:
from collections import defaultdict
data = [
"2020-01-01 30",
"2020-05-12 45",
"2020-12-30 10",
"2021-01-15 20",
"2021-06-18 50",
"2021-09-20 48",
"2022-02-11 25",
"2022-07-04 39",
"2022-11-22 41"
]
def mapper(data):
    mapped = []
    for record in data:
        date_str, temp_str = record.split()
        year = date_str.split("-")[0]
        temperature = int(temp_str)
        mapped.append((year, temperature))
    return mapped

def shuffle(mapped_data):
    grouped = defaultdict(list)
    for year, temp in mapped_data:
        grouped[year].append(temp)
    return grouped
# Step 3: Reducer - Find max temperature per year
def reducer(grouped_data):
    reduced = {year: max(temps) for year, temps in grouped_data.items()}
    return reduced
# Execute MapReduce
mapped_data = mapper(data)
print("Mapped Output:\n", mapped_data)
shuffled_data = shuffle(mapped_data)
print("\nShuffled Output:\n", dict(shuffled_data))
reduced_data = reducer(shuffled_data)
print("\nReduced Output (Max Temperature per Year):\n", reduced_data)
Output:
Mapped Output:
[('2020', 30), ('2020', 45), ('2020', 10), ('2021', 20), ('2021', 50), ('2021', 48), ('2022', 25), ('2022', 39), ('2022', 41)]
Shuffled Output:
{'2020': [30, 45, 10], '2021': [20, 50, 48], '2022': [25, 39, 41]}
Reduced Output (Max Temperature per Year):
{'2020': 45, '2021': 50, '2022': 41}
RESULT:
Thus the above program to find the maximum temperature recorded in each year with the given data has been executed and verified successfully
Exno:4 MAPREDUCE PROGRAM TO FIND THE GRADES OF STUDENTS
Date:
AIM:
To develop a MapReduce program to find the grades of students.
Program:
def mapper(data):
    mapped = []
    for line in data:
        name, score = line.split()
        mapped.append((name, int(score)))
    return mapped

# Step 2: Shuffle – not needed here since we map directly per student

# Grade boundaries below are assumed (a standard 90/80/70/60 scale)
def grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

def reducer(mapped_data):
    reduced = {name: grade(score) for name, score in mapped_data}
    return reduced

data = [
    "Alice 95",
    "Bob 67",
    "Charlie 88",
    "David 73",
    "Eva 54",
    "Frank 100",
    "Grace 82"
]

mapped_data = mapper(data)
print("Mapped Output:\n", mapped_data)
reduced_data = reducer(mapped_data)
print("\nReduced Output (Student Grades):\n", reduced_data)
OUTPUT:
Mapped Output:
[('Alice', 95), ('Bob', 67), ('Charlie', 88), ('David', 73), ('Eva', 54), ('Frank', 100), ('Grace', 82)]
Reduced Output (Student Grades):
{'Alice': 'A', 'Bob': 'D', 'Charlie': 'B', 'David': 'C', 'Eva': 'F', 'Frank': 'A', 'Grace': 'B'}
RESULT:
Thus the above program to find the grades of students with the given data has been executed and verified successfully
Exno:5 MAPREDUCE PROGRAM TO IMPLEMENT MATRIX MULTIPLICATION
Date:
AIM:
To develop a MapReduce program to implement matrix multiplication.
Algorithm for Map Function:
a. For each element m_ij of matrix M, produce the (key, value) pairs ((i,k), (M, j, m_ij)) for k = 1, ..., p, where p is the number of columns of N.
b. For each element n_jk of matrix N, produce the (key, value) pairs ((i,k), (N, j, n_jk)) for i = 1, ..., m, where m is the number of rows of M.
c. Return the set of (key, value) pairs so that each key (i,k) has a list with values (M, j, m_ij) and (N, j, n_jk) for all possible values of j.
Algorithm for Reduce Function:
For each key (i,k), examine its list of values; for each j, multiply the m_ij from the M entries with the n_jk from the N entries, and sum these products. The sum is the value of element (i,k) of the output matrix.
PROGRAM:
from collections import defaultdict
# Expected result (for verification):
# C = [[58, 64],
#      [139, 154]]
# Input matrices
A=[
[1, 2, 3],
[4, 5, 6]
]
B=[
[7, 8],
[9, 10],
[11, 12]
]
# Dimensions
m, n = len(A), len(A[0])
n2, p = len(B), len(B[0])
# Step 1: Mapper
def mapper(A, B):
    mapped = []
    # Each element A[i][k] contributes to every output cell (i, j)
    for i in range(m):
        for k in range(n):
            for j in range(p):
                mapped.append(((i, j), ('A', k, A[i][k])))
    # Each element B[k][j] contributes to every output cell (i, j)
    for k in range(n2):
        for j in range(p):
            for i in range(m):
                mapped.append(((i, j), ('B', k, B[k][j])))
    return mapped
# Step 2: Shuffle
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for key, value in mapped_data:
        grouped[key].append(value)
    return grouped
# Step 3: Reducer – multiply matching A and B values and sum over k
def reducer(grouped_data):
    result = defaultdict(int)
    for (i, j), values in grouped_data.items():
        a_vals = {k: v for tag, k, v in values if tag == 'A'}
        b_vals = {k: v for tag, k, v in values if tag == 'B'}
        for k in a_vals:
            result[(i, j)] += a_vals[k] * b_vals.get(k, 0)
    return result
# Execute MapReduce
mapped = mapper(A, B)
print("Mapped Data:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Data:\n", dict(shuffled))
reduced = reducer(shuffled)
print("\nReduced Output (Result Matrix):\n", dict(reduced))
EXPECTED OUTPUT:
Mapped Data:
[((0, 0), ('A', 0, 1)), ((0, 1), ('A', 0, 1)), ((0, 0), ('A', 1, 2)), ((0, 1), ('A', 1, 2)), ((0, 0), ('A', 2, 3)), ((0, 1), ('A',
2, 3)), ((1, 0), ('A', 0, 4)), ((1, 1), ('A', 0, 4)), ((1, 0), ('A', 1, 5)), ((1, 1), ('A', 1, 5)), ((1, 0), ('A', 2, 6)), ((1, 1),
('A', 2, 6)), ((0, 0), ('B', 0, 7)), ((1, 0), ('B', 0, 7)), ((0, 1), ('B', 0, 8)), ((1, 1), ('B', 0, 8)), ((0, 0), ('B', 1, 9)),
((1, 0), ('B', 1, 9)), ((0, 1), ('B', 1, 10)), ((1, 1), ('B', 1, 10)), ((0, 0), ('B', 2, 11)), ((1, 0), ('B', 2, 11)), ((0, 1),
('B', 2, 12)), ((1, 1), ('B', 2, 12))]
Shuffled Data:
{(0, 0): [('A', 0, 1), ('A', 1, 2), ('A', 2, 3), ('B', 0, 7), ('B', 1, 9), ('B', 2, 11)], (0, 1): [('A', 0, 1), ('A', 1, 2), ('A',
2, 3), ('B', 0, 8), ('B', 1, 10), ('B', 2, 12)], (1, 0): [('A', 0, 4), ('A', 1, 5), ('A', 2, 6), ('B', 0, 7), ('B', 1, 9), ('B', 2,
11)], (1, 1): [('A', 0, 4), ('A', 1, 5), ('A', 2, 6), ('B', 0, 8), ('B', 1, 10), ('B', 2, 12)]}
Reduced Output (Result Matrix):
{(0, 0): 58, (0, 1): 64, (1, 0): 139, (1, 1): 154}
RESULT:
Thus the above program to implement matrix multiplication with the given data has been executed and verified successfully
Exno:6 MAPREDUCE TO FIND THE MAXIMUM ELECTRICAL CONSUMPTION IN EACH YEAR
Date:
AIM:
To develop a MapReduce program to find the maximum electrical consumption in each year.
Description:
Given such data as input, we have to write applications to process it and produce results such as finding the year of maximum usage, the year of minimum usage, and so on. This is a walkover for programmers with a finite number of records: they will simply write the logic to produce the required output and pass the data to the application. But think of the data representing the electrical consumption of all the large-scale industries of a particular state since its formation.
PROGRAM:
from collections import defaultdict

# Sample input reconstructed to match the output below; the month parts are illustrative
data = [
    "2020-01 350",
    "2020-05 420",
    "2020-09 390",
    "2021-03 450",
    "2021-07 430",
    "2022-02 470",
    "2022-06 510",
    "2022-10 495"
]

def mapper(data):
    mapped = []
    for line in data:
        date_str, value_str = line.split()
        year = date_str.split("-")[0]
        value = int(value_str)
        mapped.append((year, value))
    return mapped
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for year, value in mapped_data:
        grouped[year].append(value)
    return grouped
def reducer(grouped_data):
    reduced = {year: max(values) for year, values in grouped_data.items()}
    return reduced
# Run MapReduce simulation
mapped_data = mapper(data)
print("Mapped Output:\n", mapped_data)
shuffled_data = shuffle(mapped_data)
print("\nShuffled Output:\n", dict(shuffled_data))
reduced_data = reducer(shuffled_data)
print("\nReduced Output (Max Consumption per Year):\n", reduced_data)
OUTPUT:
Mapped Output:
[('2020', 350), ('2020', 420), ('2020', 390), ('2021', 450), ('2021', 430), ('2022', 470), ('2022', 510), ('2022', 495)]
Shuffled Output:
{'2020': [350, 420, 390], '2021': [450, 430], '2022': [470, 510, 495]}
Reduced Output (Max Consumption per Year):
{'2020': 420, '2021': 450, '2022': 510}
RESULT:
Thus the above program to find the maximum electrical consumption in each year with the given data has been executed and verified successfully
Exno:7 MAPREDUCE TO ANALYZE WEATHER DATA SET AND PRINT WHETHER THE DAY IS SHINY OR COOL
Date:
AIM:
To develop a MapReduce program to analyze a weather data set and print whether each day is a shiny or a cool day.
PROGRAM:
weather_data = [
"2023-06-01 32 Sunny",
"2023-06-02 21 Rainy",
"2023-06-03 25 Cloudy",
"2023-06-04 34 Cloudy",
"2023-06-05 29 Sunny",
"2023-06-06 22 Foggy"
]
def mapper(data):
    mapped = []
    for line in data:
        parts = line.split()
        date = parts[0]
        temp = int(parts[1])
        condition = parts[2]  # the condition field is not used in the classification
        # Threshold inferred from the expected output: above 25 degrees is 'Shiny'
        category = 'Shiny' if temp > 25 else 'Cool'
        mapped.append((date, category))
    return mapped
def reducer(mapped_data):
    result = {date: category for date, category in mapped_data}
    return result
# Execute MapReduce
mapped = mapper(weather_data)
print("Mapped Output:\n", mapped)
reduced = reducer(mapped)
print("\nFinal Classification (Day → Shiny or Cool):")
for date, label in reduced.items():
    print(f"{date}: {label}")
OUTPUT:
Mapped Output:
[('2023-06-01', 'Shiny'), ('2023-06-02', 'Cool'), ('2023-06-03', 'Cool'), ('2023-06-04', 'Shiny'), ('2023-06-05', 'Shiny'), ('2023-06-06', 'Cool')]
Final Classification (Day → Shiny or Cool):
2023-06-01: Shiny
2023-06-02: Cool
2023-06-03: Cool
2023-06-04: Shiny
2023-06-05: Shiny
2023-06-06: Cool
RESULT:
Thus the above program to analyze the weather condition with the given data has been executed and
verified successfully
Exno:8 MAPREDUCE PROGRAM TO FIND THE NUMBER OF PRODUCTS SOLD IN EACH COUNTRY
Date:
AIM:
To develop a MapReduce program to find the number of products sold in each country, considering sales data containing the fields country, product, and quantity sold.
PROGRAM:
sales_data = [
"USA,TV,10",
"India,Laptop,5",
"USA,Phone,7",
"India,Tablet,3",
"Germany,TV,6",
"Germany,Phone,4",
"USA,Laptop,2"
]
from collections import defaultdict

# Step 1: Mapper – emit (country, quantity) for every sales record ("country,product,quantity")
def mapper(data):
    mapped = []
    for line in data:
        country, product, qty = line.split(",")
        mapped.append((country, int(qty)))
    return mapped

# Step 2: Shuffle – group quantities by country
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for country, qty in mapped_data:
        grouped[country].append(qty)
    return grouped

# Step 3: Reducer – sum the quantities for each country
def reducer(grouped_data):
    return {country: sum(qtys) for country, qtys in grouped_data.items()}

# Run MapReduce
mapped = mapper(sales_data)
print("Mapped Output:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Output:\n", dict(shuffled))
reduced = reducer(shuffled)
print("\nReduced Output (Products Sold per Country):")
for country, total in reduced.items():
    print(f"{country}: {total}")
OUTPUT:
Mapped Output:
[('USA', 10), ('India', 5), ('USA', 7), ('India', 3), ('Germany', 6), ('Germany', 4), ('USA', 2)]
Shuffled Output:
{'USA': [10, 7, 2], 'India': [5, 3], 'Germany': [6, 4]}
Reduced Output (Products Sold per Country):
USA: 19
India: 8
Germany: 10
RESULT:
Thus the above program to find the total products sold per country with the given data has been executed and verified successfully.
Exno:9 MAPREDUCE PROGRAM TO FIND THE TAGS ASSOCIATED WITH EACH MOVIE BY ANALYZING MOVIELENS DATA
Date:
AIM:
To develop a MapReduce program to find the tags associated with each movie by analyzing MovieLens data.
PROGRAM:
from collections import defaultdict
# Step 1: Load movieId → title mapping
def load_movies(movie_data):
    movie_dict = {}
    for line in movie_data:
        parts = line.split(",", 2)
        movie_id = parts[0]
        title = parts[1]
        movie_dict[movie_id] = title
    return movie_dict
# Sample tag data ("userId,movieId,tag"); the movie ids and tags match the output below, the user ids are illustrative
tag_data = [
    "100,10,epic",
    "101,10,sci-fi",
    "102,12,romantic"
]

# Step 2: Mapper – emit (movieId, tag) pairs
def mapper(tag_data):
    mapped = []
    for line in tag_data:
        user_id, movie_id, tag = line.split(",")
        mapped.append((movie_id, tag))
    return mapped

# Step 3: Shuffle – group tags by movieId
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for movie_id, tag in mapped_data:
        grouped[movie_id].append(tag)
    return grouped

# Execute MapReduce (load_movies above can map movie ids back to titles when movie data is supplied)
mapped = mapper(tag_data)
print("Mapped Output:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Output:\n", dict(shuffled))
OUTPUT:
Mapped Output:
[('10', 'epic'), ('10', 'sci-fi'), ('12', 'romantic')]
Shuffled Output:
{'10': ['epic', 'sci-fi'], '12': ['romantic']}
RESULT:
Thus the above program to find the tags associated with each movie with the given data has been executed and verified successfully.
Exno:10 XYZ.COM IS AN ONLINE MUSIC WEBSITE WHERE USERS LISTEN TO VARIOUS TRACKS
Date:
AIM:
XYZ.com is an online music website where users listen to various tracks, and the data gets collected as given below. To develop a MapReduce program to find the number of times each track was played.
PROGRAM:
from collections import defaultdict

# Step 1: Mapper – emit (track, 1) for every listen record ("user,track")
def mapper(data):
    mapped = []
    for line in data:
        user, track = line.split(",")
        mapped.append((track, 1))
    return mapped

# Step 2: Shuffle – group the counts by track
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for track, count in mapped_data:
        grouped[track].append(count)
    return grouped

# Step 3: Reducer – sum the counts for each track
def reducer(grouped_data):
    reduced = {track: sum(counts) for track, counts in grouped_data.items()}
    return reduced

# Sample Data
logs = [
    "user1,trackA",
    "user2,trackB",
    "user3,trackA",
    "user1,trackC",
    "user2,trackA",
    "user3,trackC",
    "user4,trackA"
]

# Run MapReduce
mapped = mapper(logs)
print("Mapped Output:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Output:\n", dict(shuffled))
reduced = reducer(shuffled)
print("\nReduced Output (Play Count per Track):")
for track, count in reduced.items():
    print(f"{track}: {count}")
OUTPUT:
Mapped Output:
[('trackA', 1), ('trackB', 1), ('trackA', 1), ('trackC', 1), ('trackA', 1), ('trackC', 1), ('trackA', 1)]
Shuffled Output:
{'trackA': [1, 1, 1, 1], 'trackB': [1], 'trackC': [1, 1]}
Reduced Output (Play Count per Track):
trackA: 4
trackB: 1
trackC: 2
RESULT:
Thus the above program to count the number of times each track was played, using the data available from the website, has been executed and verified successfully.
Exno:11 MAPREDUCE PROGRAM TO FIND THE FREQUENCY OF BOOKS PUBLISHED EACH YEAR
Date:
AIM:
To develop a MapReduce program to find the frequency of books published each year, and to find the year in which the maximum number of books was published, using the following data.
Fields: Title, Author, Published Year, Language, No. of pages
PROGRAM:
# Sample Data
books = [
"The Great Gatsby,F. Scott Fitzgerald,1925",
"1984,George Orwell,1949",
"To Kill a Mockingbird,Harper Lee,1960",
"The Catcher in the Rye,J.D. Salinger,1951",
"Moby-Dick,Herman Melville,1851",
"Pride and Prejudice,Jane Austen,1813",
"The Hobbit,J.R.R. Tolkien,1937",
"1984,George Orwell,1949",
"The Lord of the Rings,J.R.R. Tolkien,1954"
]

from collections import defaultdict

# Step 1: Mapper – emit (year, 1) for every book record ("title,author,year")
def mapper(data):
    mapped = []
    for line in data:
        year = line.rsplit(",", 1)[1]
        mapped.append((year, 1))
    return mapped

# Step 2: Shuffle – group the counts by year
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for year, count in mapped_data:
        grouped[year].append(count)
    return grouped

# Step 3: Reducer – sum the counts for each year
def reducer(grouped_data):
    return {year: sum(counts) for year, counts in grouped_data.items()}

# Run MapReduce
mapped = mapper(books)
print("Mapped Output:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Output:\n", dict(shuffled))
reduced = reducer(shuffled)
print("\nFinal Output (Books Published Each Year):")
for year, count in reduced.items():
    print(f"{year}: {count}")

# Year with the maximum number of books published
max_year = max(reduced, key=reduced.get)
print(f"\nYear with maximum books published: {max_year} ({reduced[max_year]} books)")
OUTPUT:
Mapped Output:
[('1925', 1), ('1949', 1), ('1960', 1), ('1951', 1), ('1851', 1), ('1813', 1), ('1937', 1), ('1949', 1), ('1954',
1)]
Shuffled Output:
{'1925': [1], '1949': [1, 1], '1960': [1], '1951': [1], '1851': [1], '1813': [1], '1937': [1], '1954': [1]}
Final Output (Books Published Each Year):
1925: 1
1949: 2
1960: 1
1951: 1
1851: 1
1813: 1
1937: 1
1954: 1

Year with maximum books published: 1949 (2 books)
RESULT:
Thus the above program to find the frequency of books published each year, and the year with the maximum number of books, has been executed and verified successfully.
Exno:12 MAPREDUCE PROGRAM TO ANALYZE TITANIC SHIP DATA AND TO FIND THE AVERAGE AGE OF THE PEOPLE
Date:
AIM:
To develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy, and how many persons survived in each class.
PROGRAM:
from collections import defaultdict

titanic_data = [
"1,1,Allen,Mr. William Henry,Male,35,0,0,A/5 21171,8.05,,S",
"2,1,Braund,Mr. James,22,Male,1,0,PC 17599,71.2833,C85,C",
"3,3,Creasey,Miss. Alicia,Female,28,0,0,STON/OQ 392076,7.925,,Q",
"4,1,Heikkinen,Miss. Laina,Female,26,0,0,STON/OQ 392078,7.925,,S",
"5,3,Johnson,Miss. Elizabeth,34,Female,0,0,CA. 2343,8.05,,S",
"6,3,Allen,Mr. Thomas,Male,,0,0,315098,8.05,,S"
]
# Step 1: Mapper – Extract Age and count the number of valid ages
def mapper(data):
    mapped = []
    for line in data:
        parts = line.split(",")
        age = parts[4].strip()  # Extract the age field
        try:
            age = float(age)  # Convert age to float if valid
            if age > 0:  # Only consider valid ages
                mapped.append((1, age))  # Emit (1, age)
        except ValueError:
            continue  # Skip invalid age entries (empty or non-numeric values)
    return mapped
# Step 2: Shuffle – Group all values by key (since it's only one key: 1)
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for key, age in mapped_data:
        grouped[key].append(age)
    return grouped
# Step 3: Reducer – Calculate the sum and count of ages, then compute the average
def reducer(grouped_data):
    reduced = {}
    for key, ages in grouped_data.items():
        total_age = sum(ages)
        count = len(ages)
        average_age = total_age / count if count > 0 else 0
        reduced[key] = average_age
    return reduced

# Execute MapReduce
mapped = mapper(titanic_data)
print("Mapped Output:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Output:\n", dict(shuffled))
reduced = reducer(shuffled)
print("\nFinal Output (Average Age of Passengers):")
for key, average_age in reduced.items():
    print(f"Average Age: {average_age:.2f}")
OUTPUT:
Mapped Output:
[(1, 22.0), (1, 34.0)]
Shuffled Output:
{1: [22.0, 34.0]}
Final Output (Average Age of Passengers):
Average Age: 28.00
RESULT:
Thus the above program to analyze the Titanic ship data and to find the average age of the people (both
male and female) who died in the tragedy has been executed and verified successfully
Exno:13
MAPREDUCE PROGRAM TO ANALYZE UBER DATA SET
Date:
AIM:
To develop a MapReduce program to analyze an Uber data set to find the days on which each basement has more trips, using the following dataset. (The program below computes the average ride duration for each pickup location.)
PROGRAM:
from collections import defaultdict

uber_data = [
"1,Location1,Location2,2025-04-01 08:00:00,15,123,456",
"2,Location1,Location3,2025-04-01 09:15:00,30,124,457",
"3,Location2,Location1,2025-04-01 10:00:00,20,125,458",
"4,Location1,Location4,2025-04-01 11:00:00,10,126,459",
"5,Location2,Location3,2025-04-01 12:30:00,25,127,460",
"6,Location3,Location2,2025-04-01 13:00:00,18,128,461"
]
def mapper(data):
    mapped = []
    for line in data:
        parts = line.split(",")
        pickup_location = parts[1].strip()  # Pickup location
        ride_duration = int(parts[4].strip())  # Ride duration (in minutes)
        mapped.append((pickup_location, ride_duration))  # Emit (pickup_location, ride_duration)
    return mapped
def shuffle(mapped_data):
    grouped = defaultdict(list)
    for location, duration in mapped_data:
        grouped[location].append(duration)
    return grouped
# Step 3: Reducer – Calculate the average ride duration for each pickup location
def reducer(grouped_data):
    reduced = {}
    for location, durations in grouped_data.items():
        total_duration = sum(durations)
        count = len(durations)
        average_duration = total_duration / count if count > 0 else 0
        reduced[location] = average_duration
    return reduced
# Execute MapReduce
mapped = mapper(uber_data)
print("Mapped Output:\n", mapped)
shuffled = shuffle(mapped)
print("\nShuffled Output:\n", dict(shuffled))
reduced = reducer(shuffled)
print("\nFinal Output (Average Ride Duration by Pickup Location):")
for location, avg_duration in reduced.items():
    print(f"{location}: {avg_duration:.2f} minutes")
OUTPUT:
Mapped Output:
[('Location1', 15), ('Location1', 30), ('Location2', 20), ('Location1', 10), ('Location2', 25), ('Location3', 18)]
Shuffled Output:
{'Location1': [15, 30, 10], 'Location2': [20, 25], 'Location3': [18]}
Final Output (Average Ride Duration by Pickup Location):
Location1: 18.33 minutes
Location2: 22.50 minutes
Location3: 18.00 minutes
RESULT:
Thus the above MapReduce program to analyze the Uber data set with the given dataset has been executed and verified successfully
Exno:14 PYTHON APPLICATION TO FIND THE MAXIMUM TEMPERATURE USING SPARK
Date:
AIM:
To develop a Python application to find the maximum temperature using Spark.
PROGRAM:
# Input file: weather.csv
date,location,temperature
2025-04-01,New York,22
2025-04-01,Los Angeles,28
2025-04-01,Chicago,18
2025-04-02,New York,24
2025-04-02,Los Angeles,30
2025-04-02,Chicago,20

# The session-creation and file-reading steps below are reconstructed,
# assuming the data above is saved as weather.csv
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, max as max_

# Step 1: Create a Spark session
spark = SparkSession.builder.appName("TemperatureAnalysis").getOrCreate()

# Step 2: Read the CSV file into a DataFrame
columns = ["date", "location", "temperature"]
df = spark.read.csv("weather.csv", header=True, inferSchema=True).toDF(*columns)

# Step 3: Display the data
df.show()

# Step 4: Group the data by 'location' and calculate the average temperature
avg_temp_by_location = df.groupBy("location").agg(avg("temperature").alias("avg_temperature"))
avg_temp_by_location.show()

# Step 5: Find the maximum temperature per location (as stated in the AIM)
max_temp_by_location = df.groupBy("location").agg(max_("temperature").alias("max_temperature"))
max_temp_by_location.show()
OUTPUT:
+----------+-----------+-----------+
| date| location|temperature|
+----------+-----------+-----------+
|2025-04-01| New York| 22|
|2025-04-01|Los Angeles| 28|
|2025-04-01| Chicago| 18|
|2025-04-02| New York| 24|
|2025-04-02|Los Angeles| 30|
|2025-04-02| Chicago| 20|
+----------+-----------+-----------+
+-----------+---------------+
| location|avg_temperature|
+-----------+---------------+
|Los Angeles| 29.0|
| Chicago| 19.0|
| New York| 23.0|
+-----------+---------------+

+-----------+---------------+
|   location|max_temperature|
+-----------+---------------+
|Los Angeles|             30|
|    Chicago|             20|
|   New York|             24|
+-----------+---------------+
RESULT:
Thus the above Python application to find the maximum temperature using Spark has been executed and verified successfully
ADDITIONAL EXPERIMENTS
Exno:1 PIG LATIN MODES, PROGRAMS
Date:
OBJECTIVE:
a) To convert a sentence to Pig Latin, based on whether the first letter of each word is a vowel.
PROGRAM:
def pig_latin(word):
    vowels = "aeiou"
    # If the word starts with a vowel, append "way"; otherwise move the
    # first letter to the end and append "ay" (rule inferred from the expected output)
    if word[0].lower() in vowels:
        return word + "way"
    return word[1:] + word[0] + "ay"

def convert_sentence_to_pig_latin(sentence):
    words = sentence.split()
    pig_latin_words = [pig_latin(word) for word in words]
    return ' '.join(pig_latin_words)
# Sample sentence
sentence = "Hello world this is a test"
pig_latin_sentence = convert_sentence_to_pig_latin(sentence)
print(f"Original Sentence: {sentence}")
print(f"Pig Latin Sentence: {pig_latin_sentence}")
OUTPUT:
Original Sentence: Hello world this is a test
Pig Latin Sentence: elloHay orldway histay isway away esttay
RESULT:
Thus the above Pig Latin program to convert the given sentence based on its vowels has been executed and verified successfully
Exno:2
HIVE OPERATIONS
Date:
AIM:
To Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.
Sample Scenario:
We will create a simple sales database with some sample data and then run Hive queries to perform
various operations like:
Creating a database and tables
Inserting data
Running queries like filtering, aggregating, and joining
Using partitioning
Step 1: Creating the Database
Output:
OK
Step 2: Creating the Transactions Table
Output:
OK
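A minimal sketch of the statements Steps 1 and 2 would run (the database name salesdb is an assumption; the table's column layout is inferred from the query outputs below):

CREATE DATABASE IF NOT EXISTS salesdb;
USE salesdb;

CREATE TABLE transactions (
    transaction_id INT,
    product_name STRING,
    amount DOUBLE,
    transaction_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';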
Step 3: Inserting Data
We'll simulate inserting some sample data for transactions. In Hive, we typically load data from HDFS or a local file system into the table; here, we'll show the concept using a query that could be used with HDFS, as sketched below.
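A minimal sketch of such a load statement (the HDFS path here is hypothetical):

LOAD DATA INPATH '/user/hive/data/transactions.csv' INTO TABLE transactions;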
Output:
OK
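Step 4: Querying the Table
A minimal sketch of the query behind the listing below (assuming the transactions table above):

SELECT * FROM transactions;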
Output:
+-----------------+-------------+--------+------------------+
| transaction_id | product_name| amount | transaction_date |
+-----------------+-------------+--------+------------------+
|1 | Laptop | 1200.0 | 2025-04-01 |
|2 | Mobile | 600.0 | 2025-04-01 |
|3 | Laptop | 1100.0 | 2025-04-02 |
|4 | Tablet | 400.0 | 2025-04-02 |
|5 | Mobile | 650.0 | 2025-04-03 |
+-----------------+-------------+--------+------------------+
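Step 5: Aggregating the Data
A minimal sketch of the aggregation behind the totals below (assuming the same table):

SELECT product_name, SUM(amount) AS total_sales
FROM transactions
GROUP BY product_name;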
Output:
+-------------+------------+
| product_name| total_sales|
+-------------+------------+
| Laptop | 2300.0 |
| Mobile | 1250.0 |
| Tablet | 400.0 |
+-------------+------------+
Step 6: Filtering the Data
We can filter the data to show only transactions where the amount is greater than 1000, as sketched below.
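A minimal sketch of the filter (assuming the same table):

SELECT * FROM transactions WHERE amount > 1000;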
Output:
+-----------------+-------------+--------+------------------+
| transaction_id | product_name| amount | transaction_date |
+-----------------+-------------+--------+------------------+
|1 | Laptop | 1200.0 | 2025-04-01 |
|3 | Laptop | 1100.0 | 2025-04-02 |
+-----------------+-------------+--------+------------------+
Step 7: Creating a Partitioned Table
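A minimal sketch of the statements this step would run (the table name sales_data_partitioned and the partition column transaction_year are taken from the outputs below and from Step 8):

CREATE TABLE sales_data_partitioned (
    transaction_id INT,
    product_name STRING,
    amount DOUBLE,
    transaction_date STRING
)
PARTITIONED BY (transaction_year INT);

INSERT INTO TABLE sales_data_partitioned PARTITION (transaction_year=2025)
SELECT transaction_id, product_name, amount, transaction_date FROM transactions;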
Output:
OK
Output:
OK
Now, we can query the data for a specific partition (e.g., for year 2025):
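A minimal sketch of the partition query:

SELECT * FROM sales_data_partitioned WHERE transaction_year = 2025;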
Output:
+-----------------+-------------+--------+------------------+-------------------+
| transaction_id | product_name| amount | transaction_date | transaction_year |
+-----------------+-------------+--------+------------------+-------------------+
|1 | Laptop | 1200.0 | 2025-04-01 | 2025 |
|2 | Mobile | 600.0 | 2025-04-01 | 2025 |
|3 | Laptop | 1100.0 | 2025-04-02 | 2025 |
|4 | Tablet | 400.0 | 2025-04-02 | 2025 |
|5 | Mobile | 650.0 | 2025-04-03 | 2025 |
+-----------------+-------------+--------+------------------+-------------------+
Step 8: Dropping a Table
Once you're done with a table, you can drop it. Here's how you drop the sales_data_partitioned table:
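A minimal sketch of the drop statement:

DROP TABLE IF EXISTS sales_data_partitioned;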
RESULT:
Thus the above Hive operations to create, alter, and drop databases, tables, views, functions, and indexes has
been executed and verified successfully