Experiment 2

This document demonstrates how to arrange and visualize data using R by loading sample employee data, performing summary statistics and creating frequency tables and cross tabulations, and generating various plots including bar plots and box plots to represent the data graphically. Code is provided to subset the data by gender and status, calculate summary statistics, and create tables and plots to analyze and visualize the employee data.

Uploaded by

HARISH V

Experiment-2

Computing Summary Statistics / Plotting and Visualizing Data Using Tabulation and Graphical Representations

Aim:
The purpose of this experiment is to learn different ways of arranging a data set in R and
of representing it graphically.

Procedure:
Step-by-step procedure for the experiment:
1. Arrange the data using various R functions.
2. Visualize the data set using various R functions.
Code and Results:

empid=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
empid

## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

age=c(30,37,45,32,50,60,35,32,34,43,32,30,43,50,60)
age

## [1] 30 37 45 32 50 60 35 32 34 43 32 30 43 50 60

gender=c(0,1,0,1,1,1,0,0,1,0,0,1,1,0,0)
gender

## [1] 0 1 0 1 1 1 0 0 1 0 0 1 1 0 0

status=c(1,1,2,2,1,1,1,2,2,1,2,1,2,1,2)
status

## [1] 1 1 2 2 1 1 1 2 2 1 2 1 2 1 2

# combine the vectors into a data frame
empinfo=data.frame(empid,age,gender,status)
empinfo

##    empid age gender status
## 1      1  30      0      1
## 2      2  37      1      1
## 3      3  45      0      2
## 4      4  32      1      2
## 5      5  50      1      1
## 6      6  60      1      1
## 7      7  35      0      1
## 8      8  32      0      2
## 9      9  34      1      2
## 10    10  43      0      1
## 11    11  32      0      2
## 12    12  30      1      1
## 13    13  43      1      2
## 14    14  50      0      1
## 15    15  60      0      2

# labeling numeric codes as factor levels
empinfo$gender=factor(empinfo$gender,labels=c("male","female"))
empinfo$gender

## [1] male female male female female female male male female male
## [11] male female female male male
## Levels: male female

empinfo$status=factor(empinfo$status,labels=c("staff","faculty"))
empinfo$status

##  [1] staff   staff   faculty faculty staff   staff   staff   faculty faculty
## [10] staff   faculty staff   faculty staff   faculty
## Levels: staff faculty
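
The labeling step above works because factor() sorts the distinct codes before attaching the labels, so 0 becomes "male" and 1 becomes "female". A minimal sketch of a more explicit variant follows, in which the levels argument names each code; the variable names gender_f and status_f are illustrative and not part of the experiment.

```r
# Numeric codes as in the experiment (0/1 for gender, 1/2 for status)
gender <- c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
status <- c(1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2)

# Naming the codes in `levels` pins each label to its code
# instead of relying on the sorted order of the values
gender_f <- factor(gender, levels = c(0, 1), labels = c("male", "female"))
status_f <- factor(status, levels = c(1, 2), labels = c("staff", "faculty"))

levels(gender_f)      # label order
table(gender_f)       # counts per label
as.integer(gender_f)  # internal codes start at 1, not at the original 0
```

Stating the levels also guards against a code that happens to be absent from the data, which would otherwise shift the labels.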

empinfo

##    empid age gender  status
## 1      1  30   male   staff
## 2      2  37 female   staff
## 3      3  45   male faculty
## 4      4  32 female faculty
## 5      5  50 female   staff
## 6      6  60 female   staff
## 7      7  35   male   staff
## 8      8  32   male faculty
## 9      9  34 female faculty
## 10    10  43   male   staff
## 11    11  32   male faculty
## 12    12  30 female   staff
## 13    13  43 female faculty
## 14    14  50   male   staff
## 15    15  60   male faculty

# Extract male data
male=subset(empinfo,empinfo$gender=="male")
male

##    empid age gender  status
## 1      1  30   male   staff
## 3      3  45   male faculty
## 7      7  35   male   staff
## 8      8  32   male faculty
## 10    10  43   male   staff
## 11    11  32   male faculty
## 14    14  50   male   staff
## 15    15  60   male faculty

# Extract female data
female=subset(empinfo,empinfo$gender=="female")
female

##    empid age gender  status
## 2      2  37 female   staff
## 4      4  32 female faculty
## 5      5  50 female   staff
## 6      6  60 female   staff
## 9      9  34 female faculty
## 12    12  30 female   staff
## 13    13  43 female faculty
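
The two subsets above can also be taken with logical row indexing. The sketch below rebuilds the data frame so that it runs on its own and then shows droplevels(), which removes factor levels that no longer occur in a subset. The names male2 and female2 are illustrative only.

```r
# Rebuild the experiment's data frame so the sketch is self-contained
empinfo <- data.frame(
  empid  = 1:15,
  age    = c(30, 37, 45, 32, 50, 60, 35, 32, 34, 43, 32, 30, 43, 50, 60),
  gender = factor(c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0),
                  levels = c(0, 1), labels = c("male", "female")),
  status = factor(c(1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2),
                  levels = c(1, 2), labels = c("staff", "faculty"))
)

# Logical row indexing: keep the rows where the condition is TRUE
male2 <- empinfo[empinfo$gender == "male", ]

# The subset still carries the unused "female" level; drop it
male2 <- droplevels(male2)
levels(male2$gender)

# subset() also accepts bare column names and a column selection
female2 <- subset(empinfo, gender == "female", select = c(empid, age))
nrow(female2)
```

Without droplevels(), a summary of the male subset still lists a female count of 0, because the unused level is kept.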

# summary statistics for empinfo data
summary(empinfo)

##      empid           age           gender      status
##  Min.   : 1.0   Min.   :30.00   male  :8   staff  :8
##  1st Qu.: 4.5   1st Qu.:32.00   female:7   faculty:7
##  Median : 8.0   Median :37.00
##  Mean   : 8.0   Mean   :40.87
##  3rd Qu.:11.5   3rd Qu.:47.50
##  Max.   :15.0   Max.   :60.00

# summary statistics of male, female and age
summary(male)

##      empid             age           gender      status
##  Min.   : 1.000   Min.   :30.00   male  :8   staff  :4
##  1st Qu.: 6.000   1st Qu.:32.00   female:0   faculty:4
##  Median : 9.000   Median :39.00
##  Mean   : 8.625   Mean   :40.88
##  3rd Qu.:11.750   3rd Qu.:46.25
##  Max.   :15.000   Max.   :60.00

summary(female)

##      empid             age           gender      status
##  Min.   : 2.000   Min.   :30.00   male  :0   staff  :4
##  1st Qu.: 4.500   1st Qu.:33.00   female:7   faculty:3
##  Median : 6.000   Median :37.00
##  Mean   : 7.286   Mean   :40.86
##  3rd Qu.:10.500   3rd Qu.:46.50
##  Max.   :13.000   Max.   :60.00

summary(age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   30.00   32.00   37.00   40.87   47.50   60.00
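
The per-group summaries above were produced by first splitting the data into male and female subsets. As a possible shortcut, the sketch below computes group-wise means directly with tapply() and aggregate(). The data frame is rebuilt so that the block runs on its own; mean_by_gender and age_by_group are illustrative names.

```r
# Rebuild the experiment's data frame so the sketch is self-contained
empinfo <- data.frame(
  empid  = 1:15,
  age    = c(30, 37, 45, 32, 50, 60, 35, 32, 34, 43, 32, 30, 43, 50, 60),
  gender = factor(c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0),
                  levels = c(0, 1), labels = c("male", "female")),
  status = factor(c(1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2),
                  levels = c(1, 2), labels = c("staff", "faculty"))
)

# Mean age for each gender, without creating subsets first
mean_by_gender <- tapply(empinfo$age, empinfo$gender, mean)
mean_by_gender

# Mean age for every gender/status combination, returned as a data frame
age_by_group <- aggregate(age ~ gender + status, data = empinfo, FUN = mean)
age_by_group
```

Replacing mean with median, sd or summary in either call gives the corresponding statistic per group.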

# creating table (one-way)
table1=table(empinfo$gender)
table1

##
##   male female
##      8      7

table2=table(empinfo$status)
table2

##
##   staff faculty
##       8       7

# creating table (two-way)
table3=table(empinfo$gender, empinfo$status)
table3

##
##          staff faculty
##   male       4       4
##   female     4       3
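
The cross tabulation above gives raw counts. The sketch below shows one way to add row and column totals with addmargins() and to convert the counts to row proportions with prop.table(). The factors are rebuilt so that the block runs on its own, and tab is an illustrative name for the same table as table3.

```r
# Rebuild the two factors used in the cross tabulation
gender <- factor(c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0),
                 levels = c(0, 1), labels = c("male", "female"))
status <- factor(c(1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2),
                 levels = c(1, 2), labels = c("staff", "faculty"))

# Two-way frequency table: rows are gender, columns are status
tab <- table(gender, status)

# Append a "Sum" row and a "Sum" column with the totals
addmargins(tab)

# Share of staff and faculty within each gender (each row sums to 1)
prop.table(tab, margin = 1)
```

Using margin = 2 instead gives the share of each gender within staff and within faculty.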

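# Graphical representation (Line plot)
# note: the ylab text is cut off in the source after "age in"; "years" is assumed
plot(empinfo$age,type="l",main="Age of employees",xlab="empid",ylab="age in years")

# Graphical representation (Pie chart)
pie(table1)
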
# Graphical representation (Bar plot)
barplot(table3,beside=TRUE,xlim=c(1,15),ylim=c(0,5),col=c("blue", "red"))
legend("topright",legend=rownames(table3),fill=c('blue','red'),bty="n")

# Graphical representation (Box plot)


boxplot(empinfo$age~empinfo$status,col=c('red','blue'))
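
The box plot above draws, for each status group, the minimum, lower hinge, median, upper hinge and maximum of age. The sketch below retrieves those numbers without drawing anything by passing plot = FALSE. The data frame is rebuilt so that the block runs on its own, and b is an illustrative name.

```r
# Rebuild the columns needed for the box plot
empinfo <- data.frame(
  age    = c(30, 37, 45, 32, 50, 60, 35, 32, 34, 43, 32, 30, 43, 50, 60),
  status = factor(c(1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2),
                  levels = c(1, 2), labels = c("staff", "faculty"))
)

# plot = FALSE returns the statistics instead of drawing the plot
b <- boxplot(age ~ status, data = empinfo, plot = FALSE)

b$names  # group labels, in the order of the columns of b$stats
b$stats  # one column per group: lower whisker, lower hinge, median,
         # upper hinge, upper whisker
b$out    # points drawn as outliers (beyond 1.5 times the box length)
```

The hinges come from fivenum(), so they can differ slightly from the quartiles reported by summary().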
Conclusion:
Different ways of arranging a data set in R and of representing it graphically have been
explored and executed.
Exercise
1. Import a data set from an online source and draw a histogram using R.
2. Import the given data set (the attached Excel sheet) and do the following using R code:
(i) Summary statistics
(ii) Frequency table
(iii) Cross tabulation
(iv) All possible plots
