DA_Lab_Week-1

SREE VIDYANIKETHAN ENGINEERING COLLEGE

(AUTONOMOUS)
SREE SAINATH NAGAR, A. RANGAMPET –517102.

Week-1
Aim: Introduction to RStudio, basic operations, and import and export of data using R.
Agenda:
1. About Data Mining
2. About R and RStudio
3. Basic Operations
4. Datasets
5. Data Import and Export
a. Save and Load R Data
b. Import from and Export to .CSV Files

• Data mining is the process of discovering interesting knowledge from large amounts of data [Han and Kamber, 2000].
• It is an interdisciplinary field with contributions from many areas, such as:
o Statistics, machine learning, information retrieval, pattern recognition and bioinformatics.
• Data mining is widely used in many domains, such as:
o Retail, finance, telecommunication and social media.

• The main techniques for data mining include:

o Classification and prediction, clustering, outlier detection, association rules, sequence analysis, time series analysis and text mining, as well as newer techniques such as social network analysis and sentiment analysis.

• In real-world applications, a data mining process can be broken into six major phases, as defined by CRISP-DM (Cross Industry Standard Process for Data Mining):
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
About R:
• R is a free software environment for statistical computing and graphics.
• It provides a wide variety of statistical and graphical techniques (http://www.r-project.org/).
• R can be easily extended with packages from CRAN (Comprehensive R Archive Network) (http://cran.r-project.org/); 7324 packages were available there when this material was written.
• The CRAN Task Views (http://cran.r-project.org/web/views/) are good guidance for finding out which R packages to use. They provide collections of packages for different tasks. Some Task Views related to data mining are:
o Machine Learning & Statistical Learning
o Cluster Analysis & Finite Mixture Models
o Time Series Analysis
o Natural Language Processing
o Multivariate Statistics and
o Analysis of Spatial Data.
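A CRAN package is installed once with install.packages() and attached in each session with library(). The sketch below wraps that routine in a small helper; the name use_package() is hypothetical and is not part of base R.

```r
# Hypothetical helper (not a base R function): install a CRAN package only if
# it is missing, then attach it so its functions can be used.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads from the CRAN mirror configured in getOption("repos")
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the package name as a string
  library(pkg, character.only = TRUE)
}

# Example: TH.data provides the bodyfat dataset used later in this lab
# use_package("TH.data")
```

Checking with requireNamespace() first avoids re-downloading a package that is already installed.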

RStudio
• RStudio is an integrated development environment (IDE) for R and can run on various operating systems such as Windows, Mac OS X and Linux. It is a very useful and powerful tool for R programming.

• When RStudio is launched for the first time, you will see a window similar to the figure below. There are four panels:
1. Source panel (top left), which shows your R source code. If you cannot see the source panel, open it by clicking the menu "File", then "New File" and then "R Script". You can run a line or a selection of R code by clicking the "Run" button at the top of the source panel, or by pressing "Ctrl + Enter".
2. Console panel (bottom left), which shows outputs and system messages displayed in a normal R console;
3. Environment/History/Presentation panel (top right), whose three tabs show, respectively, all objects and functions loaded in R, a history of submitted R code, and presentations generated with R;
4. Files/Plots/Packages/Help/Viewer panel (bottom right), whose tabs show, respectively, a list of files, plots, installed R packages, help documentation and local web content.

It is good practice to begin R programming with an RStudio project, which is a folder that holds your R code, data files and figures.
• To create a new project, click the "Project" button at the top-right corner and then choose "New Project".
• After that, select "Create project from new directory" and then "Empty Project". After typing a directory name, which will also be your project name, click "Create Project" to create your project folder and files.

After that, create three folders as below:

1. code, for your R source code;
2. data, for your datasets; and
3. figures, for the diagrams you produce.

In addition to the above three folders, which are useful for most projects, you may create the following folders depending on your project and preferences:
1. rawdata, for all raw data,
2. models, for all analytics models you produce, and
3. reports, for your analysis reports.
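The folders can also be created from the R console instead of the Files panel. A minimal sketch, assuming it is run from the project root; the helper name create_project_dirs() is hypothetical and only wraps base R's dir.create().

```r
# Hypothetical helper: create the recommended sub-folders under `root`.
create_project_dirs <- function(root = ".",
                                dirs = c("code", "data", "figures")) {
  for (d in dirs) {
    # showWarnings = FALSE keeps dir.create() quiet if the folder already exists
    dir.create(file.path(root, d), showWarnings = FALSE)
  }
  # Return TRUE/FALSE per folder so the caller can confirm they exist
  invisible(dir.exists(file.path(root, dirs)))
}

# Usage from the project root (see getwd()):
# create_project_dirs()
# create_project_dirs(dirs = c("rawdata", "models", "reports"))
```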

Datasets

1. The Iris Dataset

2. The Bodyfat Dataset

The iris dataset (https://archive.ics.uci.edu/ml/datasets/Iris) has been used for classification in many research publications. It consists of 50 samples from each of three classes of iris flowers [Frank and Asuncion, 2010]. One class is linearly separable from the other two, while the latter are not linearly separable from each other. There are five attributes in the dataset:
1. sepal length in cm,
2. sepal width in cm,
3. petal length in cm,
4. petal width in cm, and
5. class: Iris Setosa, Iris Versicolour, and Iris Virginica.

> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
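The dimensions and class balance described above can be checked directly in the console. This small sketch uses only base R functions (dim(), table(), levels(), summary()); it should report 150 rows, 5 columns and 50 samples per species.

```r
# iris ships with base R (package "datasets"), so no loading step is needed
dim(iris)                    # number of rows and columns
table(iris$Species)          # number of samples in each class
levels(iris$Species)         # the three class labels
summary(iris$Petal.Length)   # quick numeric summary of one attribute
```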

2. The Bodyfat Dataset

Bodyfat is a dataset available in package TH.data [Hothorn, 2015]. It has 71 rows, and each row contains information about one person. It contains the following 10 numeric columns:
• age: age in years.
• DEXfat: body fat measured by DXA, response variable.
• waistcirc: waist circumference.
• hipcirc: hip circumference.
• elbowbreadth: breadth of the elbow.
• kneebreadth: breadth of the knee.
• anthro3a: sum of logarithm of three anthropometric measurements.
• anthro3b: sum of logarithm of three anthropometric measurements.
• anthro3c: sum of logarithm of three anthropometric measurements.
• anthro4: sum of logarithm of three anthropometric measurements.

The value of DEXfat is to be predicted by the other variables.

> data("bodyfat", package = "TH.data")
> str(bodyfat)
'data.frame': 71 obs. of 10 variables:
$ age : num 57 65 59 58 60 61 56 60 58 62 ...
$ DEXfat : num 41.7 43.3 35.4 22.8 36.4 ...
$ waistcirc : num 100 99.5 96 72 89.5 83.5 81 89 80 79 ...
$ hipcirc : num 112 116.5 108.5 96.5 100.5 ...
$ elbowbreadth: num 7.1 6.5 6.2 6.1 7.1 6.5 6.9 6.2 6.4 7 ...
$ kneebreadth : num 9.4 8.9 8.9 9.2 10 8.8 8.9 8.5 8.8 8.8 ...
$ anthro3a : num 4.42 4.63 4.12 4.03 4.24 3.55 4.14 4.04 3.91 3.66 ...
$ anthro3b : num 4.95 5.01 4.74 4.48 4.68 4.06 4.52 4.7 4.32 4.21 ...
$ anthro3c : num 4.5 4.48 4.6 3.91 4.15 3.64 4.31 4.47 3.47 3.6 ...
$ anthro4 : num 6.13 6.37 5.82 5.66 5.91 5.14 5.69 5.7 5.49 5.25 ...
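The split between the response DEXfat and the nine predictors can be written as an R model formula. A sketch, assuming TH.data is installed; it builds the formula with base R's reformulate() instead of typing every column name, and the resulting formula could later be passed to a modelling function such as lm().

```r
# TH.data must be installed first, e.g. install.packages("TH.data")
if (requireNamespace("TH.data", quietly = TRUE)) {
  data("bodyfat", package = "TH.data")
  # DEXfat is the response; every other column is a predictor
  predictors <- setdiff(names(bodyfat), "DEXfat")
  # Build DEXfat ~ age + waistcirc + ... without listing the columns by hand
  f <- reformulate(predictors, response = "DEXfat")
  print(f)
} else {
  message("Package TH.data is not installed; run install.packages(\"TH.data\")")
}
```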

Data Import and Export

Save and Load R Data

• Data in R can be saved to .Rdata files with the function save(), and .Rdata files can be reloaded into R with load().
• With the code below, we first create a new object a as an integer sequence (1, 2, ..., 10) and a second new object b as a character vector ("a", "b", "c", "d", "e").
• The object letters is a built-in R vector of the 26 lowercase English letters, and letters[1:5] returns the first five letters. We then save a and b to a file and remove them from R with the function rm(). After that, we reload both a and b from the file and print their values.
> a <- 1:10
> b <- letters[1:5]
> getwd()  # shows the current working directory, where the file is saved; change it with setwd()
> save(a, b, file="mydatafile.Rdata")
> rm(a, b)
> load("mydatafile.Rdata")
> print(a)
[1] 1 2 3 4 5 6 7 8 9 10
> print(b)
[1] "a" "b" "c" "d" "e"
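load() also returns, invisibly, the names of the objects it restored, which is handy when you do not remember what a .Rdata file contains. The sketch below repeats the example with a temporary file from tempfile() so the working directory is not cluttered; the variable names rdata_file and restored are illustrative.

```r
a <- 1:10
b <- letters[1:5]
# tempfile() gives a throwaway path in the session's temporary directory
rdata_file <- tempfile(fileext = ".Rdata")
save(a, b, file = rdata_file)
rm(a, b)
# load() restores a and b and invisibly returns their names
restored <- load(rdata_file)
print(restored)
file.remove(rdata_file)   # clean up the temporary file
```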

• An alternative way to save and load R data objects is to use the functions saveRDS() and readRDS(). They work similarly to save() and load().

• The differences are:
a. multiple R objects can be saved into one single file with save(), but only one object can be saved in a file with saveRDS(); and
b. readRDS() lets us restore the data under a different object name, while load() restores the data under the same object name it had when it was saved.
> a <- 1:10
> saveRDS(a, file="mydatafile2.rds")
> a2 <- readRDS("mydatafile2.rds")
> print(a2)
[1] 1 2 3 4 5 6 7 8 9 10
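Because saveRDS() stores exactly one object, a common workaround is to bundle several objects into a named list. A minimal sketch; the names rds_file and bundle are illustrative.

```r
# saveRDS() stores one object, so wrap several objects in a named list
rds_file <- tempfile(fileext = ".rds")
saveRDS(list(a = 1:10, b = letters[1:5]), file = rds_file)
# readRDS() returns the object, so it can be assigned to any name
bundle <- readRDS(rds_file)
str(bundle)
print(bundle$b)
```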

R also provides the function save.image(), which saves everything in the current workspace into a single file. This is very convenient for saving your current work and resuming it later, provided the data loaded into R are not very large.
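A short sketch of that save-and-resume cycle, assuming the code runs at the top level so the objects live in the global environment. It writes to a temporary file; without a file argument, save.image() writes to ".RData" in the working directory.

```r
x <- 1:3
y <- "resume me later"
ws_file <- tempfile(fileext = ".RData")
# Saves every object in the global environment, including ws_file itself
save.image(file = ws_file)
rm(x, y)
load(ws_file)   # x and y are restored
print(x)
print(y)
```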

Import from and Export to .CSV Files

• The data frame is the data format we deal with most often in R. A data frame is similar to a table in a database, with each row being an observation (or record) and each column being a variable (or feature).
• The example below demonstrates saving a data frame to a file and then reloading it into R. First, we create three vectors (an integer vector, a numeric (real) vector and a character vector) and use the function data.frame() to combine them into the data frame df1. Function sample(5) produces a random permutation of the integers 1 to 5, so your values of Var.Int may differ from those shown.
• Column names in the data frame are then set with the function names(), and df1 is saved to a .CSV file with write.csv(); row.names = FALSE stops R from writing the row names as an extra column. After that, we reload the data frame from the file into a new data frame df2 with read.csv(). Note that the very first column printed below is the row names, created automatically by R.

Example:
> var1 <- sample(5)
> var2 <- var1 / 10
> var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
> df1 <- data.frame(var1, var2, var3)

> names(df1) <- c("Var.Int", "Var.Num", "Var.Char")
> write.csv(df1, "mydatafile3.csv", row.names = FALSE)
> df2 <- read.csv("mydatafile3.csv")
> print(df2)
  Var.Int Var.Num     Var.Char
1       3     0.3            R
2       4     0.4          and
3       1     0.1  Data Mining
4       2     0.2     Examples
5       5     0.5 Case Studies
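The row.names = FALSE argument matters when the file is read back. The sketch below uses a smaller, hypothetical data frame and a temporary file to show what happens with the default row.names = TRUE: read.csv() turns the written row names into an extra column named X.

```r
df1 <- data.frame(Var.Int = 1:5, Var.Num = (1:5) / 10)
csv_file <- tempfile(fileext = ".csv")

# Default row.names = TRUE writes the row names as an unnamed first column
write.csv(df1, csv_file)
df_extra <- read.csv(csv_file)
print(names(df_extra))    # the unnamed column comes back as "X"

# row.names = FALSE avoids the extra column
write.csv(df1, csv_file, row.names = FALSE)
df_clean <- read.csv(csv_file)
print(names(df_clean))
```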
