
P. R. POTE PATIL
College of Engineering & Management, Amravati.
​ ​

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA


SCIENCE

Year: 2024-25 Semester: VI

Course: Data Analytics Laboratory


Institute Vision & Mission


Vision
To flourish as a center of excellence for producing the skilled technocrats
and committed human beings.
Mission
●​ To create a conducive environment for teaching & learning.
●​ To impart quality education through demanding academic programs.
●​ To enhance career opportunities by exposure to Industries & recent
technologies.
●​ To develop professionals with strong ethics and human values for the
betterment of society.

Department of Artificial Intelligence & Data Science


Vision
To achieve excellence in education, research & innovation for nurturing
ethical, impactful, and globally competitive engineers.
Mission
●​ To provide academic excellence, ensuring students are equipped with
cutting-edge knowledge and skills.
●​ To promote a vibrant research ecosystem, encouraging faculty and
students to contribute to the advancement of knowledge and innovation.
●​ To inculcate ethical values and a sense of responsibility in students,
preparing them to be ethical leaders who prioritize societal well-being and
equity.

Program Outcomes:
Engineering Graduate will be able to:
1.​ Engineering Knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of
complex engineering problems.
2.​ Problem Analysis: Identify, formulate, research literature, and analyze
complex engineering problems reaching substantiated conclusions using first
principles of mathematics, natural sciences, and engineering sciences.
3.​ Design/development of solutions: Design solutions for complex
engineering problems and design system components or processes that meet
the specified needs with appropriate consideration for the public health and
safety, and the cultural, societal, and environmental considerations.
4.​ Conduct investigations of complex problems: Use research-based
knowledge and research methods including design of experiments, analysis
and interpretation of data, and synthesis of the information to provide valid
conclusions.
5.​ Modern tool usage: Create, select, and apply appropriate techniques,
resources, and modern engineering and IT tools including prediction and
modeling to complex engineering activities with an understanding of the
limitations.
6.​ The engineer and society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the
consequent responsibilities relevant to the professional engineering practice.
7.​ Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and
demonstrate the knowledge of, and need for sustainable development.
8.​ Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.
9.​ Individual and team work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.
10.​Communication: Communicate effectively on complex engineering activities
with the engineering community and with society at large, such as, being able
to comprehend and write effective reports and design documentation, make
effective presentations, and give and receive clear instructions.
11.​Project management and finance: Demonstrate knowledge and
understanding of the engineering and management principles and apply
these to one’s own work, as a member and leader in a team, to manage
projects and in multidisciplinary environments.
12.​Life-long learning: Recognize the need for, and have the preparation and
ability to engage in independent and life-long learning in the broadest context
of technological change.

Program Educational Objectives (PEO):


To prepare the Engineering graduates to​
PEO1:​Technical Proficiency: Graduates will demonstrate proficiency in core
areas of computer science, including algorithms, data structures, software
engineering, and database systems, with a specialized focus on artificial
intelligence and data science techniques.
PEO2: Leadership and Innovation: Graduates will exhibit leadership qualities and
innovative thinking, contributing to the development and implementation
of AI-driven solutions that address societal, industrial, and environmental
challenges.
PEO3: Ethical and Professional Responsibility: Graduates will adhere to ethical
and professional standards in their engineering practice, demonstrating
integrity, accountability, and a commitment to societal well-being in the
design and deployment of AI and DS systems.
PEO4: Lifelong Learning: Graduates will adopt a culture of lifelong learning and
professional development, empowering individuals to adapt to emerging
technologies, navigate complex socio-technical landscapes, and contribute
meaningfully to the advancement of AI & DS throughout their careers.

Program Specific Outcome (PSO):


PSO1:​Apply Intelligent Systems Development: Develop intelligent systems and
applications integrating AI and DS technologies to enhance functionality
and performance.
PSO2: Problem-Solving Skills: Possess strong analytical and problem-solving
skills, enabling them to identify, formulate, and solve complex engineering
problems using AI and DS methodologies in diverse application domains.

GUIDELINES FOR TEACHERS


Teachers shall discuss the following points with students before start of practical of the
subject.
1.​ Learning Overview: To develop better understanding of importance of the subject.
To know related skills to be developed such as intellectual and motor skills.
2.​ Know your Laboratory Work: To understand the layout of the laboratory, specifications
of equipment / instruments / materials, procedure, working in groups, planning time,
etc., and to know the total amount of work to be done in the laboratory.
3.​ Teacher shall ensure that required equipment is in working condition before start of
each experiment, also keep operating instruction manual available.
4.​ Explain prior concepts to the students before starting of each experiment.
5.​ Involve students actively during the conduct of each experiment.
6.​ While taking readings / observations, each student (from a batch of 20 students) shall be
given a chance to perform / observe the experiment.
7.​ Teacher shall assess the performance of students continuously.
8.​ Teacher is expected to share the skills to be developed in the students.
9.​ Teacher should ensure that the respective skills are developed in the students after
the completion of the practical exercise.
10.​Teacher may provide additional knowledge and skills to the students even though not
covered in the manual but are expected from students by the industries.
11.​Teacher may suggest the students to refer additional related literature of the technical
papers / reference books / Seminar Proceedings, etc.
12.​Focus should be given on development of enlisted skills rather than theoretical /
codified knowledge.
13.​During assessment the teacher is expected to ask questions to the students to gauge their
achievement of the related knowledge and skills.
14.​Teacher should give more focus on hands on skills.

INSTRUCTIONS FOR STUDENTS


1.​ Students shall read the points given below for understanding the theoretical
concepts and practical applications.
2.​ Listen carefully to the lecture given by teacher about importance of subject,
curriculum philosophy, learning structure, skills to be developed, information about
equipment, instruments, procedure, method of continuous assessment, tentative
plan of work in laboratory and total amount of works to be done in a semester.
3.​ Student shall undergo study visit of the laboratory for types of equipment, and
material to be used, before performing experiments.
4.​ Read the write up of each experiment to be performed, a day in advance.
5.​ Organize the work in the group and make a record of all observations.
6.​ Understand the purpose of experiment and its practical implications.
7.​ Student should not hesitate to ask any difficulty faced during conduct of practical
/exercise.
8.​ Write the answers to the questions allotted by the teacher during practical hours if
possible, or immediately afterwards.
9.​ Student should develop the habit of peer discussion / group discussion related to
experiments / exercises so that exchange of knowledge / skills can take place.
10.​Students shall attempt to develop related hands-on-skills and gain confidence.
11.​Student shall focus on development of skills rather than theoretical or codified
knowledge.
12.​Student shall insist for the completions of recommended Laboratory Work, answers
to the given question etc.
13.​Student shall develop the habit of evolving more ideas, innovations, skills, etc. beyond
those included in the scope of the manual.
14.​Student shall refer to technical magazines, proceedings of seminars, and websites
related to the scope of the subject, and update their knowledge and skills.
15.​Student should develop the habit of not depending totally on teachers but of developing
self-learning techniques.
16.​Student should develop the habit of interacting with the teacher without hesitation on
academic matters.
17.​Student should develop the habit of submitting the practical exercises continuously and
progressively on the scheduled dates and should get the assessment done.
18.​Student should be well prepared while submitting the write-up of the exercise. This
will develop continuity of the studies, and they will not be overloaded at the end of
the term.

P. R. POTE PATIL
COLLEGE OF ENGINEERING & MANAGEMENT, AMRAVATI.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

Certificate
This is to certify that Mr./Ms. …………………………………………... of
….... Semester of Bachelor of Technology in Artificial Intelligence &
Data Science of P. R. Pote Patil College of Engineering &
Management, Amravati, has satisfactorily completed the term work of
course …………………. for the academic year 20 - 20 as
prescribed in the curriculum.

Date………………… Roll No……………………

​ ​ ​ ​ ​ ​
Subject Teacher Head of the Department

LIST OF PRACTICALS / EXPERIMENTS & PROGRESSIVE ASSESSMENT FOR TERM WORK

Academic Year :2024-25 Course……………………………….


Course Code: 6AD01 Semester: VI
Name of Faculty: Prof. S. C. Pakhale
Name of Student: ………………………………………
Roll No………………………………

SN.​Title of the Practical / Experiment​Date of Performance​Date of Submission​Sign of Teacher
1.​Install and configure Hadoop framework
2.​Working with Hadoop distributed file system
3.​Implement the MapReduce method in Hadoop and write the word count program
4.​Develop the program for the Apriori algorithm
5.​Installation of RStudio and write a simple program for it
6.​To construct a data frame and develop an R program for data frames
7.​Construct a program for manipulating & processing data in R
8.​To generate graphs using plot(), hist(), line chart, pie(), boxplot(), and scatter plots, develop an R program
Signature of Faculty
Course Outcomes
After successful completion of this laboratory course, the students will be able to:
1. Understand the data analytics life cycle and business challenges.
2. Understand analytical techniques and statistical models.
3. Understand statistical modeling languages.
Rubrics used for continuous assessment in every lab session

Process Related Skills (15 marks):
1. Handle equipment / tools / commands correctly or logic formation (5): High — mostly satisfactory (4-5); Medium — partially successful (3); Low — below expectation (0-2)
2. Work cohesively in a team (2): Exceptional (2); Satisfactory (1); Unsatisfactory (0)
3. Integrate system & measure parameters correctly or debugging ability (4): Highly satisfactory (4); Partially correct (2-3); Incorrect or unsatisfactory (0-1)
4. Completed experiment as per schedule (4): In time (4); Completed but delayed (2); Completed with 50% delay (1)

Product Related Skills (10 marks):
5. Obtain correct results, interpret results (4): Highly accurate (4); Partially correct (2-3); Incorrect (0-1)
6. Draw conclusion (3): Highly accurate (3); Partially correct (2); Unsatisfactory (1)
7. Answer practical-related questions & submit the write-up of the experiment on time (3): Highly satisfactory (3); Moderately satisfactory (2); Unsatisfactory (1)

Total Marks: 25
Assessment
Marks Rubrics: 25 = a (05) + b (02) + c (04) + d (04) + e (04) + f (03) + g (03)
a: Handle equipment/ tools/ commands correctly or Logic Formation
b: Work cohesively in team
c: Integrate system & measure parameters correctly or Debugging Ability
d: Completed experiment as per schedule
e: Obtain correct results, Interpret results
f: Draw conclusion
g: Answer practical related questions & submit the write up of experiment on time
Marks per head: (a) 05, (b) 02, (c) 04, (d) 04, (e) 04, (f) 03, (g) 03; Total: 25 Marks
SN.​Title of the Practical / Experiment
1.​Install and configure Hadoop framework
2.​Working with Hadoop distributed file system
3.​Implement the MapReduce method in Hadoop and write the word count program
4.​Develop the program for the Apriori algorithm
5.​Installation of RStudio and write a simple program for it
6.​To construct a data frame and develop an R program for data frames
7.​Construct a program for manipulating & processing data in R
8.​To generate graphs using plot(), hist(), line chart, pie(), boxplot(), and scatter plots, develop an R program
EXPERIMENT NO: 01
Title: Install and configure Hadoop framework
Objective: To enable the use of a distributed computing environment for handling
large volumes of data efficiently.
Software Details:
SN Name of Software/Tools Specification Qty Required
01 Hadoop 3.12.4 01
02 Java 8 version 01
Theory:
Hadoop software can be installed in three modes: stand-alone, pseudo-distributed, and fully distributed.
Hadoop is a Java-based programming framework that supports the processing and
storage of extremely large data sets on a cluster of inexpensive machines. It was the first
major open source project in the big data playing field and is sponsored by the Apache
Software Foundation.
Hadoop is composed of four main layers:
●​ Hadoop Common is the collection of utilities and libraries that support other
hadoop modules.
●​ HDFS, which stands for Hadoop Distributed File System, is responsible for
persisting data to disk.
●​ YARN, short for Yet Another Resource Negotiator, is the "operating system" for
HDFS.
●​ Map Reduce is the original processing model for Hadoop clusters. It distributes
work within the cluster or map, then organizes and reduces the results from the
nodes into a response to a query. Many other processing models are available for
the 2.x version of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone
mode which is suitable for learning about Hadoop, performing simple operations, and
debugging.
Procedure:
We’ll install Hadoop in stand-alone mode and run one of the example MapReduce
programs it includes to verify the installation.
Prerequisites:
Step 1: Installing Java 8.
After installation, running java -version should produce output like:
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
This output verifies that OpenJDK has been successfully installed.
Note: Set the JAVA_HOME environment variable to the Java installation path.
Step 2: Installing Hadoop
With Java in place, we’ll visit the Apache Hadoop Releases page to find the most recent
stable release and follow the link to the binary for the current release:

Download Hadoop from www.hadoop.apache.org

Conclusion:
Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty

EXPERIMENT NO: 02
Title: Working with Hadoop distributed file system
Objective: To efficiently store and manage large volumes of data across a
distributed environment
Theory:
Working with the Hadoop Distributed File System (HDFS) involves interacting with a
distributed storage system designed to handle large amounts of data across multiple
machines. HDFS is part of the Apache Hadoop ecosystem and provides fault-tolerant
storage with high throughput.
Here’s an overview of how to work with HDFS, including both the Hadoop command-line
interface (CLI) and programmatic approaches (Java and Python).
1. HDFS Overview
HDFS has two key components:
1. NameNode: This is the master node that manages the filesystem namespace and
regulates access to files.
2. DataNode: These are the worker nodes that store the actual data blocks.
HDFS stores large files by splitting them into blocks (default block size is 128 MB or 256
MB) and distributing them across different nodes in the cluster.
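The block arithmetic is easy to check by hand. As a small sketch in plain Python (the function name is ours, not part of Hadoop), the number of blocks a file occupies is its size divided by the block size, rounded up, with the last block possibly only partially filled:

```python
import math

def num_blocks(file_size_mb: int, block_size_mb: int = 128) -> int:
    """Number of HDFS blocks a file is split into (last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)

# A 1 GB (1024 MB) file with the default 128 MB block size:
print(num_blocks(1024))  # 8
# A 300 MB file still occupies 3 blocks (128 + 128 + 44 MB):
print(num_blocks(300))   # 3
```

Each of these blocks is then replicated (three times by default) across different DataNodes for fault tolerance.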
2. Using the HDFS Command-Line Interface (CLI)
You can interact with HDFS using the Hadoop CLI. Some basic commands include:
a. Listing files in HDFS:
hdfs dfs -ls /user/hadoop/
This lists all the files and directories in the /user/hadoop/ directory.
b. Creating directories in HDFS:
hdfs dfs -mkdir /user/hadoop/new_dir
This creates a new directory in HDFS at /user/hadoop/new_dir.
c. Copying files from local filesystem to HDFS:
hdfs dfs -put localfile.txt /user/hadoop/
This uploads localfile.txt from your local file system to HDFS under /user/hadoop/.
d. Copying files from HDFS to local file system:
hdfs dfs -get /user/hadoop/testfile.txt /path/to/local/
This retrieves the testfile.txt from HDFS to your local machine.
e. Reading a file from HDFS:
hdfs dfs -cat /user/hadoop/testfile.txt
This prints the content of testfile.txt from HDFS to the terminal.
f. Deleting files from HDFS:
hdfs dfs -rm /user/hadoop/testfile.txt
This deletes the file testfile.txt from HDFS.
g. Checking the status of HDFS:
hdfs dfsadmin -report
This provides an overview of the HDFS cluster's status, including the amount of data
stored and available space.
Program:-

Result:-

Conclusion:
Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty

EXPERIMENT NO: 03
Title: Implement the Map Reduce method in hadoop and write the Word count program
Objective: To leverage the distributed computing capabilities of Hadoop to efficiently
process large datasets.
Theory:
Map Reduce is a core component of Hadoop that enables distributed data
processing. It allows you to perform operations like filtering, aggregation, and
transformation of large datasets.
Example Program: Word Count. This is a classic example where you
count the number of occurrences of each word in a dataset. Here’s how it works:
Mapper: Reads the input and splits it into words.
Reducer: Aggregates the word counts.
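The Java program itself is left for the student to write; the map–shuffle–reduce flow can first be sketched in a few lines of plain Python (a simulation of the MapReduce model, not Hadoop API code; all function names here are ours):

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle phase: group all values by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # Reduce phase: sum the counts for each word.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [kv for line in lines for kv in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In the real Hadoop job the same mapper and reducer are written as subclasses in the org.apache.hadoop.mapreduce API, and the framework performs the shuffle across the cluster.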
Code (MapReduce in Java):
Program:-

Result:-

Conclusion:

Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty
EXPERIMENT NO: 04
Title: Develop the Program for Apriori Algorithm
Objective: To implement an efficient method for identifying frequent itemsets and
generating association rules from a transactional dataset.
Theory:
The Apriori algorithm is a classic data mining technique used to find frequent itemsets in
a transaction dataset and derive association rules. It was introduced by Rakesh Agrawal
and Ramakrishnan Srikant in 1994 and is mainly applied in market basket analysis.
Key Concepts:
Frequent Itemsets: A set of items that appear together in a transaction dataset with
frequency above a specified threshold.
Association Rules: These rules express relationships between items, showing how the
presence of one item in a transaction affects the presence of another item. For example,
"If a customer buys bread, they are likely to also buy butter."
Example:
Let's say you have a transaction dataset:
Transaction ID​ Items Bought
1​ ​ ​ Milk, Bread
2​ ​ ​ Milk, Butter
3​ ​ ​ Milk, Bread, Butter
4​ ​ ​ Bread, Butter
Step-by-step:
Step 1: Find frequent 1-itemsets (e.g., Milk, Bread, Butter) by counting the frequency of
individual items.
Step 2: Generate frequent 2-itemsets (e.g., Milk & Bread, Milk & Butter, Bread & Butter).
Step 3: Generate rules like "If Milk is bought, then Bread is also bought."
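The program is left as the exercise; as an illustrative sketch of the counting and join steps on the four transactions above (plain Python, function and variable names ours), a toy Apriori can be written as:

```python
def apriori(transactions, min_support=2):
    """Toy Apriori: return every frequent itemset (as a frozenset) with its support count."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent, k = {}, 1
    while candidates:
        # Count the support of each candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into candidate (k+1)-itemsets.
        k += 1
        candidates = list({a | b for a in level for b in level if len(a | b) == k})
    return frequent

# The four-transaction example from above:
transactions = [{"Milk", "Bread"}, {"Milk", "Butter"},
                {"Milk", "Bread", "Butter"}, {"Bread", "Butter"}]
result = apriori(transactions, min_support=2)
print(result[frozenset({"Bread", "Butter"})])  # 2 (transactions 3 and 4)
```

The Apriori property does the pruning here implicitly: only itemsets built from frequent subsets ever become candidates, so the infrequent triple {Milk, Bread, Butter} (support 1) is counted once and discarded.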
Program: -
Result: -
Conclusion:
Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty
EXPERIMENT NO: 05
Title: Installation of R Studio and write a simple program for it
Objective: The objective of installing R is to set up an environment for statistical
computing and data analysis
Theory:
R is a programming language and free software environment developed by Ross Ihaka and Robert
Gentleman in 1993.
R possesses an extensive catalog of statistical and graphical methods. It includes
machine learning algorithms, linear regression, time series, and statistical inference, to name
a few. Most of the R libraries are written in R, but for heavy computational tasks, C, C++,
and Fortran code is preferred.
R is not only trusted by academia; many large companies also use the R programming
language, including Uber, Google, Airbnb, Facebook, and so on.
Data analysis with R is done in a series of steps: programming, transforming, discovering,
modeling, and communicating the results.
Program: R is a clear and accessible programming tool
Transform: R is made up of a collection of libraries designed specifically for data science
Discover: Investigate the data, refine your hypothesis and analyze them
Model: R provides a wide array of tools to capture the right model for your data
Communicate: Integrate codes, graphs, and outputs to a report with R Markdown or build
Shiny apps to share with the world
What is R used for?
Statistical inference
Data analysis
Machine learning algorithm
Procedure:
Installation of R-Studio on windows:
Step – 1: With R-base installed, let’s move on to installing RStudio. To begin, go to
the RStudio download page and click on the download button for RStudio Desktop.
Step – 2: Click on the link for the windows version of RStudio and save the .exe file.
Step – 3: Run the .exe and follow the installation instructions.
1.​ Click next on the welcome window

• Enter/Browse the path to the installation folder and click Next to proceed.

• Select the folder for the start menu shortcut or click on do not create shortcuts and then
click Next.
Wait for the installation process to complete.

• Click Finish to end the installation.

Install the R Packages:-


In R Studio, if you require a particular library, then you can go through the
following instructions:
First, run R Studio.
After clicking on the packages tab, click on install. The following dialog box will
appear.
• In the Install Packages dialog, write the name of the package you want to install in
the Packages field and then click Install. This will install the package you
searched for, or give you a list of matching packages based on your package text.
Installing Packages:-
The most common place to get packages from is CRAN. To install packages from CRAN
you use install.packages("packagename"). For instance, if you want to install the ggplot2
package, which is a very popular visualization package, you would type the following in
the console:-
Syntax:-
# install package from CRAN
install.packages("ggplot2")
Loading Packages:-
Once the package is downloaded to your computer you can access the functions and
resources provided by the package in two different ways:
# load the package to use in the current R session
library(packagename)
Getting Help on Packages:-
For more direct help on packages that are installed on your computer you can use the
help and vignette functions. Here we can get help on the ggplot2 package with the
following:
help(package = "ggplot2") # provides details regarding contents of a package
vignette(package = "ggplot2") # list vignettes available for a specific package
vignette("ggplot2-specs") ​ # view specific vignette
vignette() ​ # view all vignettes on your computer

Program:

Result: -
Conclusion:

Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty

EXPERIMENT NO: 06
Title: To construct a Data Frame and develop an R program for data frame
Objective: To understand how to construct and manipulate a structured collection of
data using a data frame.
Theory:
In R, a data frame is a two-dimensional, tabular data structure that can hold different
types of data (like numeric, character, factor, etc.) in columns. Each column can have
different data types, similar to a spreadsheet or SQL table. Data frames are a key
component in R for data manipulation and analysis.
Creating a Data Frame
You can create a data frame using the data.frame() function.
Example:
# creating a simple data frame
df<- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 35),
Height = c(5.5, 6.0, 5.8)
)
print(df)
# Output:
# Name ​ Age ​ Height
# 1 Alice ​ 25 ​ 5.5
# 2 Bob ​ 30 ​ 6.0
# 3 Charlie ​ 35 ​ 5.8

Accessing Data Frame Components


You can access data frame components using various methods:
Accessing Columns
You can access columns by their name using the $ operator or by using square brackets.
# Access a column using $
ages<- df$Age
print(ages) # Output: [1] 25 30 35

# Access a column using square brackets


heights<- df[, "Height"]
print(heights) # Output: [1] 5.5 6.0 5.8

Accessing Rows
You can access rows using square brackets with row and column indices.
# Access the first row
first_row<- df[1, ]
print(first_row)

# Output:
# Name ​ Age ​ Height
# 1 Alice ​ 25 ​ 5.5

# Access specific rows and columns (e.g., first row, second column)
age_of_first_person<- df[1, 2]
print(age_of_first_person) # Output: [1] 25

Accessing Multiple Rows and Columns


You can also access multiple rows and columns.
# Access multiple rows (1st and 3rd) and specific columns (Name and Height)
subset_df<- df[c(1, 3), c("Name", "Height")]
print(subset_df)

# Output:
# Name ​ Height
# 1 Alice ​ 5.5
# 3 Charlie ​ 5.8

Adding and Modifying Columns


You can add a new column or modify existing columns easily.
# Adding a new column
df$Weight<- c(120, 150, 180)
print(df)
# Output:
# Name ​ Age ​ Height ​ Weight
# 1 Alice ​ 25 ​ 5.5 ​ ​ 120
# 2 Bob ​ 30 ​ 6.0 ​ ​ 150
# 3 Charlie ​ 35 ​ 5.8 ​ ​ 180

# Modifying an existing column (e.g., increase age by 1)


df$Age<- df$Age + 1
print(df)
# Output:
# Name ​ Age ​ Height ​ Weight
# 1 Alice ​ 26 ​ 5.5 ​ ​ 120
# 2 Bob ​ 31 ​ 6.0 ​ ​ 150
# 3 Charlie ​ 36 ​ 5.8 ​ ​ 180

Deleting Rows and Columns


You can delete rows or columns from a data frame.
Deleting a Column:
# Remove the Weight column
df$Weight<- NULL
print(df)
# Output:
# Name ​ Age ​ Height
# 1 Alice ​ 26 ​ 5.5
# 2 Bob ​ 31 ​ 6.0
# 3 Charlie ​ 36 ​ 5.8

Deleting a Row:
# Remove the second row
df<- df[-2, ] # or df = df[-which(df$Name == "Bob"), ]
print(df)
# Output:
# Name ​ Age ​ Height
# 1 Alice ​ 26 ​ 5.5
# 3 Charlie ​ 36 ​ 5.8

Basic Statistics and Summary


You can perform basic statistics on data frame columns.
# Summary statistics
summary(df)

# Output:
#     Name                Age            Height
#  Length:2           Min.   :26.0   Min.   :5.500
#  Class :character   1st Qu.:28.5   1st Qu.:5.575
#  Mode  :character   Median :31.0   Median :5.650
#                     Mean   :31.0   Mean   :5.650
#                     3rd Qu.:33.5   3rd Qu.:5.725
#                     Max.   :36.0   Max.   :5.800

Importing and Exporting Data Frames


You often need to import data from CSV or Excel files and export data frames to these
formats.
Importing Data
# Importing data from a CSV file
df_imported<- read.csv("filename.csv")

Exporting Data
# Exporting data frame to a CSV file
write.csv(df, "output.csv", row.names = FALSE)

Program:-
Result: -

Conclusion:

Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty

EXPERIMENT NO: 07
Title: Construct a program for Manipulating & Processing Data in R.

Objective: To provide an efficient framework for performing essential data operations,
such as filtering, sorting, transforming, aggregating, and summarizing datasets.
Theory:
Data manipulation and processing are fundamental tasks in data analysis, and R is a
powerful tool for handling these operations. By constructing a program for
manipulating and processing data in R, you can automate common data preparation
tasks such as cleaning, transforming, summarizing, and analyzing datasets. This
allows for efficient and reproducible workflows that support data-driven
decision-making.
Steps Involved in Constructing a Data Manipulation Program in R
Import Data: Load the dataset into R using appropriate functions based on the file
type (e.g., read.csv() for CSV files, read_excel() for Excel files).
Inspect the Data:
Check the structure, dimensions, and summary statistics of the data using functions
like head(), str(), summary(), and glimpse().
This helps in identifying the types of variables and spotting potential issues like
missing values or incorrect formats.
Clean Data:
Handle missing values (NAs) by removing them or replacing them with suitable
values (mean, median, or other imputed values).
Remove duplicates using distinct() and correct data types if needed (e.g., convert a
character column to a factor or numeric).
Transform Data:
Create new columns using the mutate() function. This can involve mathematical
operations or string manipulation.
For example, creating a new column to categorize a numerical variable into
categories (e.g., converting scores into letter grades).
Filter and Sort Data:
Use filter() to extract specific rows based on conditions.
Use arrange() to sort the data in a desired order, such as by score in descending
order.
Group and Aggregate Data:
Use group_by() to group the data by categorical variables (e.g., department or
region).
Use summarise() to compute aggregate values (mean, median, sum, etc.) for each
group.
Summarize the Data:
Summarize the dataset using statistical measures, and get insights into its
distribution or central tendency.
Export Processed Data:
Save the processed data to a new file using write.csv() or other appropriate
functions.
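The steps above correspond to R functions such as filter(), mutate(), arrange(), group_by(), and summarise(); the R program itself is left for the student. As a language-neutral sketch of the same clean–transform–filter–aggregate pipeline (written here in plain Python with an invented toy dataset purely for illustration):

```python
from statistics import mean

# Toy dataset: one dict per row (hypothetical columns: name, dept, score).
rows = [
    {"name": "Asha",   "dept": "CS", "score": 82},
    {"name": "Ravi",   "dept": "CS", "score": 74},
    {"name": "Meena",  "dept": "EE", "score": 91},
    {"name": "Vikram", "dept": "EE", "score": None},  # missing value
]

# Clean: drop rows with missing scores (cf. handling NAs in R).
clean = [r for r in rows if r["score"] is not None]

# Transform: add a grade column (cf. mutate()).
for r in clean:
    r["grade"] = "A" if r["score"] >= 80 else "B"

# Filter and sort (cf. filter() and arrange()).
top = sorted((r for r in clean if r["score"] >= 75),
             key=lambda r: r["score"], reverse=True)

# Group and aggregate: mean score per department (cf. group_by() + summarise()).
by_dept = {}
for r in clean:
    by_dept.setdefault(r["dept"], []).append(r["score"])
dept_means = {d: mean(scores) for d, scores in by_dept.items()}
print(dept_means)  # {'CS': 78, 'EE': 91}
```

Each stage mirrors one step of the workflow described above, which is what makes the resulting program reproducible: re-running it on a refreshed dataset repeats exactly the same preparation.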
Program:-
Result:

Conclusion:

Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty
EXPERIMENT NO: 08
Title: To Generate Graphs Using Plot(), Hist(), Linechart(), Pie(), Boxplot(), and
Scatterplots() Develop an R program
Objective: To visualize data distributions and relationships between variables
using R’s standard plotting functions.
Theory:
Graphs using the graph functions plot(), hist(), lines(), pie(), and boxplot()
in R programming.
In R, various functions are provided for creating different types of graphs and
visualizations. Below are examples of how to use the most commonly used plotting
functions, including plot(), hist(), lines(), pie(), and boxplot().

1. Basic Plotting with plot()

The plot() function is a versatile function for creating basic scatter plots and line graphs.
Example: Scatter Plot
# Create sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 5, 6, 5)

# Basic scatter plot
plot(x, y, main = "Scatter Plot", xlab = "X-axis Label", ylab = "Y-axis Label",
col = "blue", pch = 19)
2. Histogram with hist()
The hist() function is used to create histograms to visualize the distribution of a dataset.
Example: Histogram
# Create sample data
data<- rnorm(1000) # Generate 1000 random numbers from a normal distribution
# Create histogram
hist(data, main = "Histogram", xlab = "Value", col = "lightblue", border = "black", breaks
= 30)
3. Line Chart with lines()
After creating a basic plot, you can add lines using the lines() function.
Example: Line Chart
# Create data for line chart
x <- seq(1, 10, by = 0.1)
y <- sin(x)

# Basic plot ("type = 'n'" sets up the axes without plotting points)
plot(x, y, type = "n", main = "Line Chart", xlab = "X-axis", ylab = "Y-axis")
# Adding the line
lines(x, y, col = "red", lwd = 2)

4. Pie Chart with pie()

The pie() function is used to create pie charts.
Example: Pie Chart
# Create sample data
values <- c(10, 20, 30, 40)
labels <- c("A", "B", "C", "D")
# Create pie chart
pie(values, labels = labels, main = "Pie Chart", col = rainbow(length(values)))

5. Boxplot with boxplot()

The boxplot() function creates box-and-whisker plots to visually summarize a dataset.
Example: Boxplot
# Create sample data
data <- list(A = rnorm(100), B = rnorm(100, mean = 1), C = rnorm(100, mean = 2))
# Create boxplot
boxplot(data, main = "Boxplot", xlab = "Groups", ylab = "Values",
col = c("lightgreen", "lightcoral", "lightblue"))

6. Scatter Plots with plot()

You can also use the plot() function to create scatter plots (as demonstrated above), or for
more complex scatter plots, you can utilize additional arguments.
Example: Enhanced Scatter Plot
# Create sample data
set.seed(42) # For reproducibility
x <- rnorm(100)
y <- rnorm(100)

# Enhanced scatter plot
plot(x, y, main = "Enhanced Scatter Plot", xlab = "X Value", ylab = "Y Value",
col = "darkblue", pch = 19, cex = 1.5)
# Adding a regression line
abline(lm(y ~ x), col = "red", lwd = 2)

Program: -
Result: -

Conclusion:
Assessment Scheme:
Process Related Skills Product Related Skills Total Signature of
(15-M) (10-M) (25-M) Faculty
