Lab Record 21BCG10126

Computer Science (Anna University)


Downloaded by Tanay Vyas ([email protected])
VIT Bhopal University

NAS1001 – Associative Data Analytics (LTP-4)


Slot: B11+B12+B13+B14
Class ID: BL2023241000207
FALL SEMESTER 2023-2024

Course Instructor: Dr. D Lakshmi

Name of the Student: Aniket Shrivastava


Register Number: 21BCG10126

List of Experiments

List of Challenging Experiments (Indicative) (SLO: 1, 2, 5, 9, 12)

1. Understanding of R System and installation and configuration of R Environment and RStudio, Understanding R Packages, their installation and management

2. Understanding of nuts and bolts of R:
a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

3. Excel and R integration with R connector.

4. Preparing Data in R
a. Data Cleaning
b. Data imputation
c. Data conversion

5. Outliers detection using R

6. Correlation and Regression Analysis in R

7. Clustering Algorithms implementation using R

8. Classification Algorithm implementation using R
Classification (Spam/Not spam)

9. Case study on Stock Market Analysis and applications. Stock data can be obtained from Yahoo! Finance, Google Finance. A team of students can apply statistical modeling on the stock data to uncover hidden patterns. R provides tools for moving averages, auto regression and time-series analysis, which form the crux of financial applications.

10. Detect credit card fraudulent transactions - The dataset can be obtained from Kaggle. The team will use a variety of machine learning algorithms that will be able to distinguish fraudulent transactions from non-fraudulent ones.

Experiment No: 1

Aim: Understanding of R System and installation and configuration of R Environment and RStudio, Understanding R Packages, their installation and management

Data Description: R is a programming language for statistical computing and graphics, supported by the R Core Team and the R Foundation for Statistical Computing.
Designed by: Ross Ihaka, Robert Gentleman

Installing R:

Download R:

1. Go to the R Project's official website: https://www.r-project.org/
2. Click the "CRAN" link under "Download", choose a mirror, and download the installer for your operating system.
3. For Windows: Double-click the downloaded executable file and follow the installation instructions.
4. For macOS: Double-click the downloaded package file and follow the installation instructions.
5. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing RStudio:

Download RStudio:

1. Go to the RStudio download page: https://www.rstudio.com/products/rstudio/download/
2. Under "RStudio Desktop," click the appropriate download link for your operating system (Windows, macOS, or Linux).

Install RStudio:

1. For Windows: Double-click the downloaded installer and follow the installation instructions.
2. For macOS: Double-click the downloaded disk image (.dmg) file, drag the RStudio icon to the Applications folder, and then open RStudio from the Applications folder.
3. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing R packages
Installing packages is a fundamental part of working with R. R packages contain pre-built functions, data sets, and documentation that extend the capabilities of the R programming language. Here are the steps to install R packages using the R console within RStudio:

Open RStudio:
Launch RStudio on your computer.

Open R Console:

Once RStudio is open, you'll see several panels. By default, the R Console is on the left (below the script editor when a script is open). This is where you can directly interact with R by typing commands.

Install a Package:
To install an R package, use the install.packages() function with the name of the package you want to install. For example, to install the "ggplot2" package, type the following command in the R Console and press Enter:
install.packages("ggplot2")

Load the Package:


After installing a package, you need to load it into your R session to use its functions. Use the library() function for this purpose. For example, to load the "ggplot2" package, type:
library(ggplot2)
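
The install and load steps can be combined so that a package is downloaded only when it is missing. The following is a minimal sketch, not part of the original record; the helper name ensure_package is hypothetical, and "stats" is used only because it ships with R and therefore needs no download.

```r
# Hypothetical helper (not from the lab record): install a package only if
# it is not already available, then load it into the session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the package name as a string
  library(pkg, character.only = TRUE)
  invisible(pkg %in% loadedNamespaces())
}

# "stats" is bundled with R, so this call never touches the network
loaded <- ensure_package("stats")
print(loaded)
```

Calling ensure_package("ggplot2") would follow the same path as the two commands above, but would skip the download on later runs.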

Experiment No: 2

Aim: Understanding of nuts and bolts of R:


a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

Data Description

a. R Program Structure: An R program consists of a series of commands that are executed sequentially. These commands can be typed directly into the R console or saved in a script file with a .R extension.

b. R Data Types, Command Syntax, and Control Structures: R supports various data types, including numeric, character, logical, factor, and more. Here's a quick overview:
Numeric: Used for storing numeric values (integers or decimals).
Character: Used for storing text data.
Logical: Represents binary values TRUE or FALSE.
Factor: Represents categorical data with levels or categories.

c. File Operations in R: R provides functions to perform various file operations:

R Code

a. R Program Structure:

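# Load the packages the script needs (package_name is a placeholder)
library(package_name)

# Define functions before they are called
my_function <- function(arg1, arg2) {
  result <- arg1 + arg2  # placeholder computation
  return(result)
}

# Call the function and print the result (value1 and value2 are placeholders)
result <- my_function(value1, value2)
print(result)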

b. R Data Types, Command Syntax, and Control Structures:

x <- 5
name <- "John"
is_valid <- TRUE
sum_result <- 3 + 7
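
The assignments above cover data types and command syntax but not control structures. The sketch below is an illustrative addition rather than the author's code: it reuses the same variables, adds a factor (the name level is an assumption), and shows if/else, for, and while.

```r
# Values of the four data types described above
x <- 5
name <- "John"
is_valid <- TRUE
level <- factor(c("low", "high", "low"))  # illustrative categorical data

# class() reports the data type of each object
types <- c(class(x), class(name), class(is_valid), class(level))
print(types)

# if / else: choose a branch based on a logical condition
if (is_valid && x > 3) {
  msg <- paste(name, "passed")
} else {
  msg <- paste(name, "failed")
}
print(msg)

# for loop: add up the integers from 1 to x
total <- 0
for (i in 1:x) {
  total <- total + i
}
print(total)

# while loop: count how many steps it takes to reach zero
countdown <- x
steps <- 0
while (countdown > 0) {
  countdown <- countdown - 1
  steps <- steps + 1
}
print(steps)
```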

c. File Operations in R:
Reading files
# Reading text files
data <- read.table("data.txt", header = TRUE)

# Reading CSV files
data <- read.csv("data.csv")

# Reading Excel files (requires 'readxl' package)
library(readxl)
data <- read_excel("data.xlsx")

Writing files
# Writing data to text file
write.table(data, "output.txt", sep = "\t", row.names = FALSE)

# Writing data to CSV file
write.csv(data, "output.csv", row.names = FALSE)

# Writing data to Excel file (requires 'openxlsx' package)
library(openxlsx)
write.xlsx(data, "output.xlsx")
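
A quick way to check the read and write functions together is a round trip through a temporary file. This sketch is an addition for illustration; the data frame and its column names are made up, and tempfile() is used so that no file in the working directory is touched.

```r
# Small illustrative data frame (values are made up)
employees <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file instead of the working directory
csv_path <- tempfile(fileext = ".csv")
write.csv(employees, csv_path, row.names = FALSE)

# Read the file back and compare it with the original
restored <- read.csv(csv_path, stringsAsFactors = FALSE)
same_shape <- identical(dim(restored), dim(employees))
print(same_shape)

# Remove the temporary file when done
unlink(csv_path)
```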

Experiment No: 3

Aim: Excel and R integration with R connector.

Data Description:
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.
Sample rows and columns

R Code
> # read.csv() is part of base R, so no extra package needs to be installed
> Salary_Dataset <- read.csv(file.choose(), header = TRUE)
> Salary_Dataset
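
file.choose() opens a dialog, so the commands above only work interactively. A non-interactive variant might look like the sketch below. It is an assumption-laden illustration: the helper name load_salary is hypothetical, the demo values are made up, and the Excel branch relies on the readxl package being installed.

```r
# Hypothetical helper: read the salary data from a CSV file or an Excel workbook
load_salary <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext %in% c("xlsx", "xls")) {
    # Excel workbooks need the 'readxl' package
    if (!requireNamespace("readxl", quietly = TRUE)) {
      stop("Install the 'readxl' package to read Excel workbooks")
    }
    as.data.frame(readxl::read_excel(path))
  } else {
    read.csv(path, header = TRUE)
  }
}

# Demo with a temporary CSV file holding made-up values
demo_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(experience_years = c(1, 3, 5), salary = c(30000, 45000, 60000)),
  demo_path,
  row.names = FALSE
)
Salary_Dataset <- load_salary(demo_path)
print(Salary_Dataset)
```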

Sample Input and Output

Experiment No: 4

Aim: Preparing Data in R


a. Data Cleaning
b. Data imputation
c. Data conversion

Data Description

In this example, the CSV file has two columns:


experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns

R Code
# Load libraries
library(dplyr)
library(missForest)

# Read dataset
data <- read.csv("data.csv")

# Data Cleaning: drop duplicate rows and a column that is not needed
# (Irrelevant_Column is a placeholder name)
cleaned_data <- data %>%
  distinct() %>%
  select(-Irrelevant_Column)

# Check for missing values
missing_values <- sum(is.na(cleaned_data))

if (missing_values > 0) {
  # Data Imputation: missForest() returns a list; the imputed data frame is in $ximp
  imputed_data <- missForest(cleaned_data, verbose = TRUE)$ximp
} else {
  imputed_data <- cleaned_data
}

# Data Conversion (if needed; Categorical_Column is a placeholder name)
imputed_data$Categorical_Column <- as.factor(imputed_data$Categorical_Column)

# Display prepared dataset
print(imputed_data)
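
The same three steps can be tried without any extra packages. The sketch below is a simplified illustration, not the method used above: it swaps missForest for plain mean imputation, and the data values and the grade column are made up.

```r
# Made-up data with one duplicate row and two missing values
raw <- data.frame(
  experience_years = c(1, 2, 2, NA, 5),
  salary = c(30000, 35000, 35000, 50000, NA),
  grade = c("junior", "junior", "junior", "mid", "senior"),
  stringsAsFactors = FALSE
)

# Data cleaning: remove exact duplicate rows
cleaned <- unique(raw)

# Data imputation: replace each missing number with the column mean
# (a deliberately simple stand-in for missForest)
for (col in c("experience_years", "salary")) {
  missing <- is.na(cleaned[[col]])
  cleaned[[col]][missing] <- mean(cleaned[[col]], na.rm = TRUE)
}

# Data conversion: store the text column as a factor
cleaned$grade <- as.factor(cleaned$grade)

str(cleaned)
```

Mean imputation ignores relationships between columns, which is the main reason a model-based method such as missForest is often preferred on real data.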

Sample Input and Output

Experiment No: 5

Aim: Outliers detection using R

Data Description
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns

R Code
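
The code for this experiment did not survive in this copy of the record. As a stand-in, here is a minimal sketch of one common approach, the 1.5 x IQR rule, applied to a made-up salary vector. It is an assumption that this resembles the original; the values and variable names are illustrative only.

```r
# Made-up salary values with one obviously extreme entry
salary <- c(30000, 32000, 35000, 36000, 38000, 40000, 41000, 150000)

# Quartiles and interquartile range
q <- quantile(salary, c(0.25, 0.75))
iqr <- IQR(salary)

# Fences: anything beyond 1.5 * IQR from the quartiles is flagged
lower_fence <- unname(q[1] - 1.5 * iqr)
upper_fence <- unname(q[2] + 1.5 * iqr)

outliers <- salary[salary < lower_fence | salary > upper_fence]
print(outliers)

# boxplot.stats() applies a closely related rule based on the hinges
print(boxplot.stats(salary)$out)
```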

Sample Input and Output

Experiment No: 6

Aim: Correlation and Regression Analysis in R

Data Description

In this example, the CSV file has two columns:


experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns

R Code
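
The code for this experiment is missing from this copy. The sketch below shows how correlation and simple linear regression could be run on a data set with the two columns described above. The values are made up and the approach (cor() and lm() from base R) is an assumption, not the author's recorded code.

```r
# Made-up data with the two columns described above
salary_data <- data.frame(
  experience_years = c(1, 2, 3, 4, 5, 6),
  salary = c(32000, 37000, 41000, 48000, 52000, 57000)
)

# Pearson correlation between experience and salary
r <- cor(salary_data$experience_years, salary_data$salary)
print(r)

# Simple linear regression: salary as a function of experience
model <- lm(salary ~ experience_years, data = salary_data)
slope <- unname(coef(model)["experience_years"])
print(coef(model))

# Predict the salary for a hypothetical person with 7 years of experience
predicted <- predict(model, newdata = data.frame(experience_years = 7))
print(predicted)
```

summary(model) would additionally report the R-squared value and the significance of the slope.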

Experiment No: 7

Aim: Clustering Algorithms implementation using R

Data Description
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns

R Code
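
The code for this experiment is missing from this copy. As an illustration only, the sketch below runs k-means clustering on made-up data with the two columns described above. The choice of k-means, two clusters, and the values themselves are all assumptions.

```r
# Made-up data: three junior and three senior profiles
salary_data <- data.frame(
  experience_years = c(1, 2, 3, 10, 11, 12),
  salary = c(30000, 32000, 35000, 80000, 85000, 90000)
)

# Scale both columns so salary does not dominate the distance calculation
scaled <- scale(salary_data)

# k-means with two clusters; nstart repeats the random initialisation
set.seed(123)
fit <- kmeans(scaled, centers = 2, nstart = 10)

# Attach the cluster label to each row
salary_data$cluster <- fit$cluster
print(salary_data)
```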

Sample Input and Output

Experiment No: 8

Aim: Classification Algorithm implementation using R


Classification (Spam/Not spam)

R Code

# Load required libraries


library(tm) # Text mining
library(e1071) # For Naive Bayes classifier
library(caret) # For model evaluation

# Load the SpamAssassin dataset (replace with your actual file path)
spam_data <- read.csv("path/to/spamassassin_data.csv", stringsAsFactors = FALSE)

# Preprocess the text data


corpus <- Corpus(VectorSource(spam_data$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)

# Create a document-term matrix


dtm <- DocumentTermMatrix(corpus)

# Convert the document-term matrix to a data frame


spam_df <- as.data.frame(as.matrix(dtm))
colnames(spam_df) <- make.names(colnames(spam_df))

# Combine with labels (stored as a factor, which naiveBayes() and confusionMatrix() expect)
spam_df$label <- factor(spam_data$label)

# Split data into training and testing sets


set.seed(123)
train_indices <- sample(1:nrow(spam_df), 0.7 * nrow(spam_df))
train_data <- spam_df[train_indices, ]
test_data <- spam_df[-train_indices, ]

# Train a Naive Bayes classifier


naive_bayes_model <- naiveBayes(label ~ ., data = train_data)

# Make predictions
predictions <- predict(naive_bayes_model, newdata = test_data, type = "class")

# Evaluate the model


conf_matrix <- confusionMatrix(predictions, test_data$label)
print(conf_matrix)

Sample Input and Output

Experiment No: 9

Aim: Case study on Stock Market Analysis and applications. Stock data can be obtained from Yahoo! Finance, Google Finance. A team of students can apply statistical modeling on the stock data to uncover hidden patterns. R provides tools for moving averages, auto regression and time-series analysis, which form the crux of financial applications.

Data Description

Stock data imported from Yahoo! Finance.

R Code
# Load required libraries
library(dplyr)
library(lubridate)
library(TTR) # Provides SMA(), used for the moving averages below

# Read the stock data CSV file (or load data from API)
stock_data <- read.csv("stock_data.csv")

# Convert date column to Date format


stock_data$Date <- ymd(stock_data$Date)

# Calculate 50-day and 200-day moving averages


stock_data$MA_50 <- SMA(stock_data$Close, n = 50)
stock_data$MA_200 <- SMA(stock_data$Close, n = 200)

# Load required library


library(forecast)

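# Convert data to time series format
# (frequency = 365 treats the series as daily with a yearly cycle;
# decompose() below needs at least two full periods of data)
stock_ts <- ts(stock_data$Close, frequency = 365)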

# Fit auto-regression model (ARIMA)


ar_model <- auto.arima(stock_ts)

# Load required libraries


library(ggplot2)
library(forecast)

# Decompose time series into trend, seasonal, and residual components


decomposed <- decompose(stock_ts)

# Plot decomposed components


plot(decomposed)

# Create a time series plot of stock prices and moving averages


ggplot(stock_data, aes(x = Date)) +
geom_line(aes(y = Close, color = "Stock Price")) +
geom_line(aes(y = MA_50, color = "50-day MA")) +
geom_line(aes(y = MA_200, color = "200-day MA")) +
labs(title = "Stock Price and Moving Averages", y = "Price") +
scale_color_manual(values = c("Stock Price" = "blue", "50-day MA" = "red", "200-day MA" =
"green"))
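
To see what a simple moving average actually computes, it can be reproduced in base R with stats::filter(). This is a sketch for illustration; the helper name simple_ma and the price values are made up, and the result is expected to line up with SMA() in that the first n - 1 entries are NA.

```r
# Made-up closing prices
close_price <- c(10, 11, 12, 13, 14, 15, 16)

# Hypothetical helper: trailing moving average over the last n observations.
# sides = 1 makes the filter use only current and past values.
simple_ma <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

ma_3 <- simple_ma(close_price, 3)
print(ma_3)
```

Each value is the mean of the current price and the two before it, so the third entry is (10 + 11 + 12) / 3.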

Sample Input and Output

Experiment No: 10

Aim: Detect credit card fraudulent transactions - The dataset can be obtained from Kaggle. The team will use a variety of machine learning algorithms that will be able to distinguish fraudulent transactions from non-fraudulent ones.

Data Description
The dataset was obtained from Kaggle

R Code
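# Load required library
library(randomForest)

# Load the credit card fraud dataset downloaded from Kaggle
# (the file name "creditcard.csv" is an assumption; adjust it to your download)
CreditCardFraud <- read.csv("creditcard.csv")

# Treat the target as a factor so randomForest() fits a classifier, not a regression
CreditCardFraud$Class <- as.factor(CreditCardFraud$Class)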

# Split data into training and testing sets (70% training, 30% testing)
set.seed(123)
train_indices <- sample(1:nrow(CreditCardFraud), 0.7 * nrow(CreditCardFraud))
train_data <- CreditCardFraud[train_indices, ]
test_data <- CreditCardFraud[-train_indices, ]

# Build Random Forest model


rf_model <- randomForest(Class ~ ., data = train_data, ntree = 100)

# Make predictions
predictions <- predict(rf_model, newdata = test_data)

# Calculate accuracy
accuracy <- sum(predictions == test_data$Class) / nrow(test_data)
print(paste("Accuracy score on Test Data:", accuracy))
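
Fraud data sets are usually heavily imbalanced, so accuracy alone can look high even when most fraud cases are missed. The sketch below, an addition with made-up labels, shows how precision and recall for the fraud class could be derived from a confusion table in base R. With the model above, predictions and test_data$Class would take the place of the two illustrative vectors.

```r
# Made-up labels: 1 = fraud, 0 = legitimate
actual    <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 0, 1), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1), levels = c(0, 1))

# Confusion table: rows are predictions, columns are true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

tp <- cm["1", "1"]  # fraud correctly flagged
fp <- cm["1", "0"]  # legitimate transactions flagged as fraud
fn <- cm["0", "1"]  # fraud that was missed

accuracy_demo <- sum(diag(cm)) / sum(cm)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)

print(c(accuracy = accuracy_demo, precision = precision, recall = recall))
```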

Sample Input and Output
