
INDEX

Serial No. | List of Experiments | Date of Experiment | Date of Submission | Sign of Faculty

1. To draw and explain Hadoop architecture and ecosystem with the help of a case study.
2. Perform setting up and installing single-node Hadoop in a Windows environment.
3. To implement the following file management tasks in the Hadoop Distributed File System (HDFS): adding files and directories, retrieving files, deleting files.
4. Create a database 'STD' and make a collection (e.g. "student" with fields 'No., Stu_Name, Enrol., Branch, Contact, e-mail, Score') using MongoDB. Perform various operations in the following experiments.
5. Insert multiple records (at least 10) into the created student collection.
6. Execute the following queries on the collection created:
   a. Display data in proper format.
   b. Update the contact information of a specific student.
   c. Add a new field 'remark' to the document with the name 'REM'.
   d. Add a new field as No. 11, Stu_Name XYZ, Enrol. 00101, Branch VB, e-mail [email protected], Contact 098675345 without using an insert statement.
7. Create an employee table in MongoDB with 4 departments and 25 employees equally divided, along with one manager. The following fields should be added: Employee_ID, Dept_ID, First_Name, Last_Name, Salary (range between 20K and 60K). Now run the following queries:
   a. Find all the employees of a particular department where salary is < 40K.
   b. Find the highest salary for each department and fetch the names of such employees.
   c. Find all the employees who are on a salary lower than 30K, increase their salary by 10%, and display the results.
8. To design and implement a social network graph of 50 nodes and edges between nodes using the NetworkX library in Python.
9. Design and plot an asymmetric social network (sociograph) of 5 nodes (A, B, C, D, and E) such that A is directed to B, B is directed to D, D is directed to A, and D is directed to C.
10. Consider the above scenario (No. 09) and plot a weighted asymmetric graph; the weight range is between 20 and 50.
11. Implement the betweenness measure between nodes across the social network. (Assume a social network of 10 nodes.)
Name of Student: Class: BE
Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:

List of Experiments:

Question 1. To draw and explain Hadoop architecture and ecosystem with the help of a case study.

Answer:

Hadoop is an open-source framework for distributed storage and processing of large datasets. It consists
of a comprehensive ecosystem of components that work together to handle big data challenges. To
illustrate Hadoop's architecture and ecosystem, let's consider a case study involving a retail company,
"Retail Mart," which aims to analyze and derive insights from its vast number of sales and customer
data.

Hadoop Architecture and Ecosystem Components:

1. Hadoop Distributed File System (HDFS):


○ Retail Mart’s data includes sales transactions, customer profiles, inventory data, and
more. HDFS is used to store and manage these large datasets across a distributed
cluster of commodity hardware. Data is divided into blocks and replicated for fault
tolerance.
2. MapReduce:
○ Retail Mart wants to perform analytics on its sales data to gain insights. MapReduce is



a core processing framework in Hadoop. It allows parallel, distributed processing of
data stored in HDFS. In our case study, MapReduce jobs can be used to analyze sales
data, calculate revenues, and extract relevant insights.
3. YARN (Yet Another Resource Negotiator):
○ YARN manages cluster resources and schedules tasks for MapReduce jobs and other
applications. It enables efficient resource utilization by allocating resources
dynamically to various applications. Retail Mart can use YARN to ensure that its sales
data analysis does not overwhelm the cluster.
4. Hive:
○ Hive is a data warehousing and SQL-like query language for Hadoop. Retail Mart can
use Hive to run SQL queries on their sales data, generate reports, and create dashboards
for business intelligence.
5. Pig:
○ Pig is a high-level platform for creating MapReduce programs. It provides a scripting
language to process and analyze data. Retail Mart can use Pig to perform data
transformation and ETL (Extract, Transform, Load) operations on their raw sales
data.
6. HBase:
○ Retail Mart needs to store and access customer profiles and inventory data in a
scalable, real-time database. HBase, a NoSQL database on top of Hadoop, can be used
to achieve this. It provides fast and random access to large datasets.

7. Spark:
○ To perform more complex and iterative data processing, Retail Mart can utilize
Apache Spark, which is often used alongside Hadoop. Spark is well-suited for
machine learning, data streaming, and graph processing.
8. Sqoop:
○ Retail Mart wants to import data from their existing relational databases into
Hadoop. Sqoop is a tool for efficiently transferring data between Hadoop and
structured data stores like relational databases.
9. Flume and Kafka:
○ To ingest real-time data from online transactions and weblogs, Retail Mart can use
Flume and Kafka. These tools enable data streaming into Hadoop for immediate
analysis.

Case Study Scenario:

Retail Mart loads daily sales transaction data into HDFS, which includes information about products,
customers, and sales. They run MapReduce jobs to calculate daily, weekly, and monthly revenues and
perform market basket analysis to find correlations between products. The results are stored in HBase
for real-time access.

Retail Mart also utilizes Hive to create reports for management, showing sales trends and customer
behavior. They use Pig to clean and preprocess the data, while Spark helps them build
recommendation engines based on customer preferences.

In addition, Retail Mart employs Sqoop to import historical sales data from their existing relational
database into HDFS, ensuring all data is in one place for analysis. Flume and Kafka are used to stream
data from online transactions, enabling real-time analytics.
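For illustration, a small PySpark job of the kind described above might compute Retail Mart's daily revenue from the sales files in HDFS. This is only a sketch: the HDFS path and the column names sale_date and amount are assumed, not taken from the case study.

python

from pyspark.sql import SparkSession

# Start a Spark session (assumes Spark is configured to read from the Hadoop cluster)
spark = SparkSession.builder.appName("RetailMartDailyRevenue").getOrCreate()

# Hypothetical sales data in HDFS with columns such as sale_date, product_id, amount
sales = spark.read.csv("hdfs:///retailmart/sales/", header=True, inferSchema=True)

# Aggregate revenue per day
daily_revenue = sales.groupBy("sale_date").sum("amount")
daily_revenue.show()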

Name of Student: Class: BE


Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:

Question 2. Perform setting up and installing single node Hadoop in a Windows environment.

Answer:

Prerequisites: Before you begin, ensure that you have the following prerequisites in place:

1. A Windows machine with sufficient system resources (RAM, CPU, and storage) to run
Hadoop.
2. Java Development Kit (JDK) installed. Hadoop requires Java. You can download and install
Oracle JDK or OpenJDK.
3. Download the Hadoop distribution for Windows. You can download it from the official
Apache Hadoop website.

Installation Steps:

1. Java Installation:
○ Install Java (if not already installed) and set the JAVA_HOME environment variable to
point to the Java installation directory. Make sure you have the correct version of Java
compatible with the Hadoop version you are using.
2. Hadoop Installation:
○ Extract the downloaded Hadoop distribution to a directory on your Windows
machine. For example, you can extract it to C:\hadoop.
3. Configuration:
○ Navigate to the C:\hadoop\etc\hadoop directory (or the location where you
extracted Hadoop) and edit the following configuration files:
■ core-site.xml: Add the following configuration to specify Hadoop's data
directory:
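The XML itself is not reproduced in this copy of the manual; a typical minimal core-site.xml for a single-node setup looks like the following (the port 9000 and the temporary-data path are assumed example values and should be adjusted to your machine):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/C:/hadoop/data/tmp</value>
  </property>
</configuration>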

● hdfs-site.xml: Configure the HDFS data and replication settings:
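Again as an assumed example (the directory paths are placeholders; replication is set to 1 because this is a single-node cluster):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/C:/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/C:/hadoop/data/datanode</value>
  </property>
</configuration>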

4. Formatting HDFS:

● Open a Command Prompt and navigate to the Hadoop bin directory, typically
located at C:\hadoop\bin.
● Run the following command to format the HDFS:

hdfs namenode -format

5. Starting Hadoop Services:


○ Start the Hadoop services using the following commands:
■ Start the NameNode and DataNode:



start-dfs.cmd

■ Start the ResourceManager and NodeManager:

start-yarn.cmd

Name of Student: Class: BE


Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:

Question 3. To implement the following file management tasks in the Hadoop Distributed File System (HDFS): adding files and directories, retrieving files, deleting files.

Answer:

● In a Hadoop Distributed File System (HDFS), you can perform file


management tasks like adding files and directories, retrieving files, and
deleting files using Hadoop's command-line utilities. Here's how you can
execute each of these tasks:
○ Adding Files and Directories:
■ To add files and directories to HDFS, you can use the
hadoop fs command-line utility. The command for adding a
file from your local file system to HDFS is hadoop fs -copyFromLocal:

hadoop fs -copyFromLocal /local/path/to/source /hdfs/path/to/destination

● To create an HDFS directory, you can use the mkdir


command:

hadoop fs -mkdir /hdfs/path/to/directory

● Retrieving Files:
■ To retrieve files from HDFS to your local file system,
use the get command:

hadoop fs -get /hdfs/path/to/source /local/path/to/destination

● Deleting Files:
■ To delete files in HDFS, you can use the rm command. Be
cautious when using this command as it permanently deletes
files.
bash

hadoop fs -rm /hdfs/path/to/file

● To delete an empty directory, use the


rmdir command:

hadoop fs -rmdir /hdfs/path/to/empty_directory

● To delete a directory and its contents, use the -r
option with rm:

hadoop fs -rm -r /hdfs/path/to/directory

Name of Student: Class: BE

Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:

Question 4. Create a database ‘STD’ and make a collection (e.g. "student" with fields 'No.,
Stu_Name, Enrol., Branch, Contact, e-mail, Score') using MongoDB. Perform various operations
in the following experiments.

Answer:

Create a Database:
In the MongoDB shell, you can create a database named 'STD' using the use command. If the database
doesn't exist, MongoDB will create it when you insert data into it:

use STD

Create a Collection and Insert Data:

1. Now, create a collection called 'student' and insert some sample data into it. You can use the
insertOne or insertMany method to add documents to the collection. Here's an example of
inserting a single document:
javascript

db.student.insertOne({
No: 1,
Stu_Name: "John Doe",
Enrol: "E12345",
Branch: "Computer Science",
Contact: "123-456-7890",
email: "[email protected]",
Score: 95
});

2. You can insert more documents using the insertOne or insertMany methods.

Querying Data:
db.student.find()

Updating Data:
db.student.updateOne(
{ Enrol: "E12345" },
{ $set: { Score: 98 } }
);

Deleting Data:
db.student.deleteOne({ No: 1 });
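As an optional check (not part of the original task), querying for the removed document afterwards should return nothing:

db.student.find({ No: 1 })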



Name of Student: Class: BE

Enrolment No: Batch

Date of Experiment Date of Submission Submitted on:

Remarks by faculty: Grade:

Signature of student: Signature of Faculty:

Question 5. Insert multiple records (at least 10) into the created student collection.

Answer:

Below is the code for inserting 10 records into the student collection:

db.student.insertMany([
{
No: 2,
Stu_Name: "Bhavik",
Enrol: "E23456",
Branch: "Computer Science",
Contact: "9876543210",
email: "[email protected]",
Score: 88
},
{
No: 3,
Stu_Name: "Bhavika",
Enrol: "E34567",
Branch: "Mathematics",
Contact: "9988776655",
email: "[email protected]",
Score: 73
},
{
No: 4,
Stu_Name: "Aeshna",
Enrol: "E45678",



Branch: "Physics",
Contact: "9871234560",
email: "[email protected]",
Score: 92
},
{

No: 5,
Stu_Name: "Ansh",
Enrol: "E56789",
Branch: "Chemistry",
Contact: "9877005500",
email: "[email protected]",
Score: 79
},
{
No: 6,
Stu_Name: "Devendra",
Enrol: "E67890",
Branch: "Mechanical Engineering",
Contact: "9966337700",
email: "[email protected]",
Score: 85
},
{
No: 7,
Stu_Name: "Yash",
Enrol: "E78901",
Branch: "Electrical Engineering",
Contact: "9977553344",
email: "[email protected]",
Score: 94
},
{
No: 8,
Stu_Name: "Anuj",
Enrol: "E89012",
Branch: "Economics", Contact:
"9876009900",
email: "[email protected]",
Score: 78
},
{
No: 9,
Stu_Name: "Aditya",
Enrol: "E90123",
Branch: "History",
Contact: "9888998899",
email: "[email protected]",
Score: 87
},
{

No: 10,
Stu_Name: "Jay", Enrol:
"E01234",
Branch: "Geography", Contact:
"9966558877",
email: "[email protected]",

Score: 91
},
{

No: 11,
Stu_Name: "Patel", Enrol:
"E12345",
Branch: "Business Administration",
Contact: "9876543000",
email: "[email protected]", Score:
75
}
]);
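As an optional check (not part of the original task), you can count the documents now stored in the collection; countDocuments is available in recent MongoDB shells:

db.student.countDocuments()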



Name of Student: Class: BE
Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:

Question 6. Execute the following queries on the collection created.


a. Display data in proper format.
b. Update the contact information of a specific student.
c. Add a new field remark to the document with the name 'Yash'.
d. Add a new field as no 11, stu_name XYZ, enroll 00101, branch VB, e-mail [email protected]
Contact 098675345 without using insert statement.

Answer:

a. Display Data in Proper Format (in Ascending Alphabetical Order of Name):

javascript
db.student.find().sort({ Stu_Name: 1 }).pretty()



This query will retrieve all documents from the 'student' collection and display them in ascending order
of the 'Stu_Name' field, in a properly formatted manner using the pretty() method.

b. Update the Contact Information of a Specific Student (Name = Anuj):

javascript
db.student.updateOne({ Stu_Name: "Anuj" }, { $set: { Contact: "9876543210" } })

This query updates the 'Contact' field for the student with the name 'Anuj'. You can replace
"9876543210" with the new contact information as needed.

c. Add a New Field 'remark' to the Document with the Name 'Yash':

javascript
db.student.updateOne({ Stu_Name: "Yash" }, { $set: { remark: "New remark text" } })

This query adds a new field 'remark' with the value "New remark text" to the document where
'Stu_Name' is 'Yash'. You can adjust the value accordingly.

d. Add a New Field with Student No. 12 (Name: Shruti, Enroll: 00101, Branch: CSE, Email:
[email protected], Contact: 098675345): This task can be accomplished using the updateOne
method without an insert statement. To add a new student with No. 12 and the specified details, run the
following query:
javascript
db.student.updateOne(
{ No: 12 },
{
$set: {
Stu_Name: "Shruti",
Enrol: "00101",
Branch: "CSE",
email: "[email protected]",
Contact: "098675345",
Score: 0 // You can set an initial score or any other default value
}
},
{ upsert: true } // Use upsert to insert if the document doesn't exist
)
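Because no document with No: 12 exists yet, the upsert option makes updateOne insert a new document instead of modifying an existing one. As an optional check, you can fetch it back:

db.student.find({ No: 12 }).pretty()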



Name of Student: Class: BE
Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:
Question 7. Create an employee table in MongoDB with 4 departments and 25 employees equally divided, along with one manager. The following fields should be added: Employee_ID, Dept_ID, First_Name, Last_Name, Salary (range between 20K and 60K). Now run the following queries:

a. Find all the employees of a particular department where salary is < 40K.
b. Find the highest salary for each department and fetch the name of such employees.
c. Find all the employees who are on a lesser salary than 30k; increase their salary by 10%
and display the results.

Answer:

1. Create the Employee Collection:

Assuming you are using the MongoDB shell, you can create the employee collection as follows:
// Create the Employee Collection
db.createCollection("employee")

// Insert Employee Data


var departments = ["HR", "Finance", "Marketing", "Engineering"];
var salaries = [20000, 30000, 40000, 50000, 60000];

// Insert employees for each department
for (var deptId = 1; deptId <= 4; deptId++) {
  // Insert the department manager
  db.employee.insertOne({
    Employee_ID: deptId,
    Dept_ID: deptId,
    First_Name: "Manager",
    Last_Name: "Dept " + deptId,
    Salary: salaries[4],
  });

  // Insert the 25 employees of this department dynamically
  for (var i = 1; i <= 25; i++) {
    db.employee.insertOne({
      Employee_ID: (deptId - 1) * 25 + i + 4,
      Dept_ID: deptId,
      First_Name: "Employee" + i,
      Last_Name: "Dept " + deptId,
      Salary: salaries[i % 5],
    });
  }
}
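As an optional sanity check (not part of the original task), count the documents per department; with the loop above, each department should contain 26 documents (one manager plus 25 employees):

db.employee.countDocuments({ Dept_ID: 1 })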

a. Find all the employees of a particular department where salary is less than 40K. For
example, to find employees in the "HR" department with a salary less than 40K:

db.employee.find({ Dept_ID: 1, Salary: { $lt: 40000 } })

b. Find the highest salary for each department and fetch the names of such employees:

db.employee.aggregate([
  {
    $group: {
      _id: "$Dept_ID",
      maxSalary: { $max: "$Salary" }
    }
  },
  {
    $lookup: {
      from: "employee",
      localField: "_id",
      foreignField: "Dept_ID",
      as: "employees"
    }
  },
  {
    $unwind: "$employees"
  },
  {
    // Keep only the employees whose salary equals the department maximum
    $match: { $expr: { $eq: ["$employees.Salary", "$maxSalary"] } }
  },
  {
    $project: {
      _id: 0,
      Dept_ID: "$_id",
      Employee_ID: "$employees.Employee_ID",
      First_Name: "$employees.First_Name",
      Last_Name: "$employees.Last_Name",
      Salary: "$maxSalary"
    }
  }
])

c. Find all employees who are on a salary less than 30K, increase their salary by 10%, and
display the results:

// Capture the affected employees first, so that exactly those records can be displayed after the raise
var lowPaidIds = db.employee.find({ Salary: { $lt: 30000 } }).toArray().map(function (e) { return e.Employee_ID; });

db.employee.updateMany(
  { Employee_ID: { $in: lowPaidIds } },
  { $mul: { Salary: 1.1 } }
)

// Display the updated records
db.employee.find({ Employee_ID: { $in: lowPaidIds } })



Name of Student: Class: BE
Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:
Question 8. To design and implement a social network graph of 50 nodes and edges between nodes using
networkx library in Python.

Answer:
To design and implement a social network graph with 50 nodes and edges between nodes using the
NetworkX library in Python, follow these steps:

1. Install NetworkX:
If you haven't already, you need to install the NetworkX library. You can install it using pip:
bash
pip install networkx

2. Import NetworkX and Create a Graph:


In your Python script or Jupyter Notebook, import the NetworkX library and create a graph:
python

import networkx as nx

# Create an empty graph
G = nx.Graph()

3. Add Nodes:
You can add nodes to the graph. Since you want 50 nodes, you can do this programmatically:
python

num_nodes = 50
for i in range(1, num_nodes + 1):
    G.add_node(i)

4. Add Edges:
To create edges between nodes, you can use various methods, such as adding random
connections or defining a specific network structure. Here's an example of adding random edges
to create a connected graph:

python
import random

# Add random edges to create connections
for i in range(1, num_nodes + 1):
    for j in range(i + 1, num_nodes + 1):
        if random.random() < 0.1:  # Adjust the probability as needed
            G.add_edge(i, j)

5. Visualize the Graph (Optional):


If you want to visualize the network graph, you can use the matplotlib library along with
NetworkX. Make sure to install matplotlib:
bash

pip install matplotlib

Here's an example of how to visualize the graph:


Python:

import matplotlib.pyplot as plt


nx.draw(G, with_labels=True, node_color='lightpink', font_weight='bold')
plt.show()
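As an optional sanity check (not part of the original steps), you can print the size of the generated graph:

print(G.number_of_nodes(), "nodes and", G.number_of_edges(), "edges")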



Name of Student: Class: BE
Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:

Question 9. Design and plot an asymmetric social network (socio graph) of 5 nodes (A, B, C, D, and E)
such that A is directed to B, B is directed to D, D is directed to A, and D is directed to C.

Answer:

1. Install NetworkX

If you haven't already, you need to install the NetworkX library. You can install it using pip:

pip install networkx

2. Import NetworkX and Create a Directed Graph:

In your Python script or Jupyter Notebook, import the NetworkX library and create a directed graph
(DiGraph):

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()

3. Add Nodes:
Add nodes A, B, C, D, and E to the graph:
nodes = ["A", "B", "C", "D", "E"]
G.add_nodes_from(nodes)

4. Add Directed Edges:

Add directed edges between nodes A, B, D, and C as specified:

G.add_edge("A", "B")
G.add_edge("B", "D")
G.add_edge("D", "A")
G.add_edge("D", "C")

5. Visualize the Graph:


You can use NetworkX and matplotlib to visualize the directed sociograph:

pos = {
"A": (0, 1),
"B": (1, 2),
"C": (2, 1),
"D": (1, 0),
"E": (3, 2)
}
nx.draw(G, pos, with_labels=True, node_color='green', font_weight='bold', node_size=1000, arrowsize=20)
plt.show()
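Optionally, you can also print the edges to confirm the directions before or after plotting:

print(list(G.edges()))  # expected: [('A', 'B'), ('B', 'D'), ('D', 'A'), ('D', 'C')]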



Name of Student: Class: BE
Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:

Question 10. Consider the above scenario (No. 09) and plot a weighted asymmetric graph, the
weight range is between 20 to 50.

Answer:

To create a weighted asymmetric graph based on the scenario provided in question 9, you can use
NetworkX in Python. In this case, you will assign random weights in the range of 20 to 50 to the
directed edges between nodes A, B, D, and C. Here's how you can achieve this:

1. Install and Import NetworkX


We have already installed and imported NetworkX; the same code is repeated here, with the random module added because it is used for the edge weights below:

pip install networkx

import networkx as nx
import matplotlib.pyplot as plt
import random

G = nx.DiGraph()

2. Add Nodes: Add nodes A, B, C, D, and E to the graph:

nodes = ["A", "B", "C", "D", "E"]


G.add_nodes_from(nodes)

3. Add Directed Edges with Random Weights: Add directed edges between nodes A, B, D, and C,
and assign random weights in the range of 20 to 50:

# Define the edges and assign random weights
edges = [("A", "B", random.randint(20, 50)),
         ("B", "D", random.randint(20, 50)),
         ("D", "A", random.randint(20, 50)),
         ("D", "C", random.randint(20, 50))]

for edge in edges:
    source, target, weight = edge
    G.add_edge(source, target, weight=weight)



4. Visualize the Weighted Directed Graph: Use NetworkX and matplotlib to visualize the
weighted asymmetric graph:

pos = {
"A": (0, 1),
"B": (1, 2),
"C": (2, 1),
"D": (1, 0),
"E": (3, 2)
}
# Extract edge weights for drawing
edge_weights = nx.get_edge_attributes(G, 'weight')

# Draw the weighted directed graph with edge labels


nx.draw(G, pos, with_labels=True, node_color='lightpink', font_weight='bold', node_size=1000,
arrowsize=20)
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_weights)
plt.show()
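Optionally, the randomly assigned weights can also be inspected as text:

print(nx.get_edge_attributes(G, 'weight'))  # maps each edge to its random weight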

Name of Student: Class: BE


Enrolment No: Batch
Date of Experiment Date of Submission Submitted on:
Remarks by faculty: Grade:
Signature of student: Signature of Faculty:
Question 11. Implement betweenness measure between nodes across the social network. (Assume the social
network of 10 nodes)

Answer:

To implement the betweenness centrality measure between nodes in a social network using Python
and NetworkX, you can follow these steps. Here, I'll assume a social network with 10 nodes for
demonstration:



1. Install NetworkX: If you haven't already, install the NetworkX library using pip:
bash

pip install networkx

2. Import NetworkX and Create a Graph: Import NetworkX and create a graph for your social
network:

import networkx as nx

# Create a graph (assuming a social network with 10 nodes)
G = nx.Graph()
nodes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
G.add_nodes_from(nodes)

3. Add Edges (Connections between Nodes): Define the connections (edges) between nodes to
represent your social network. This can be done based on your specific network structure:

# Example edges for demonstration (customize as needed)
edges = [
(1, 2), (1, 3), (1, 4),
(2, 3), (2, 4), (2, 5),
(3, 5), (3, 6), (3, 7),
(4, 7), (4, 8),
(5, 6), (5, 9),
(6, 9), (6, 10),
(7, 8), (7, 10),
(8, 10),
(9, 10)
]
G.add_edges_from(edges)

4. Calculate Betweenness Centrality: Now, calculate the betweenness centrality for each node in
the social network:

betweenness = nx.betweenness_centrality(G)

The betweenness_centrality function computes the normalized betweenness centrality for all nodes in the graph. The
result is stored in the betweenness dictionary, where the keys are node identifiers, and the values are the betweenness
centrality scores.

5. Display Betweenness Centrality Scores: You can print or analyze the betweenness
centrality scores for each node:

for node, centrality in betweenness.items():

    print(f"Node {node}: Betweenness Centrality = {centrality:.4f}")
