0% found this document useful (0 votes)
3 views

Introduction

Introduction

Uploaded by

junaidshah7922
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Introduction

Introduction

Uploaded by

junaidshah7922
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 2

Introduction

The majority of data these days is stored in databases, either on


the cloud or on-premises. For purposes of analysis, sharing, etc,
subsets of the data are stored in MS Excel or in Comma Separated
Value (CSV) files. So, it’s quintessential to learn to read data from
these files into R and eventually write our R script’s output back
into Excel/CSV files.
CSV files
The base packages in R are capable of reading CSV files into R.
There is no need to download any package for this. This is how we
import a CSV file named “my_data.csv” into R.
#1. Simple
csv_data = read.csv("data_file_name.csv")
# Notes- the variable csv_data will be stored as a data frame data
type.
# 2. Additional arguments
csv_data1 = read.csv("my_data.csv", sep = ",", na.strings = c("",
"Missing"), skip = 1)
# Notes
# a. sep - this argument is to tell R what's the field separator
character in our data. The default separator is comma
# b. na.strings - (character or character vector) telling R how is
missing data coded in the data. Here in my data misssing data
can be either a blank cell ("") or can be coded as "Missing".
# c. skip - (integer) tells R how lines to skip at the start before
reading the data. Some people practice leaving a few blank lines
at the top before the data.
Excel files
Unlike CSV files, R cannot read in Excel files without downloading
additional packages. There are quite a few packages to read data
Excel files into R — openxlsx, readxl, xlsx, etc.). My personal
favorite is openxlsx and we will be seeing that in detail in this
post and beyond.
First, install openxlsx package by invoking the install.packages
command
install.packages("openxlsx")
Then invoke the library and read the excel file, as follows.
library(openxlsx)
# 1. Simple
excel_data = read.xlsx("my_data.xlsx")
# 2. With additional arguments
excel_data1 = read.xlsx("my_data.xlsx", detectDates = T, sheet =
"Data", startRow = 3, na.strings = "Missing", skipEmptyRows =
True)
# Notes
# a. detectDates- (True or False) this argument will tell R to detect
dates in the data and store it in "Date" format
# b. sheet- (character) tells R the name of Sheet in the Excel file
where the data is stored.You can only import data one sheet at a
time.
# c. na.strings - (character or character vector) telling R how is
missing data coded in the data. Here in my data misssing data
can be either a blank cell ("").
# c. skipEmptyRows - (True/False) an integer that tells R to skip
empty rows after the 1st row of data. If there are any blank rows
in the data, they will be skipped, thereby making data cleaning
easier.
Conclusions
In this post we have seen how to load data from CSV and Excel
files into R. This is a crucial aspect of data analysis that every
analyst needs to know. Here, I’ve explained one package
(openxlsx) that helps to import data in Excel file into R. Feel free
to explore the other packages, mentioned above, and choose the
one that works best for you.
The post is like setting up a tent for camping — the actual fun
starts only now. In the next post, we will be learning about data
processing via a popular package called DPLYR. I’m excited about
that post. 😃
In the meanwhile, just create a sample Excel/CSV file or use any
existing files on your computer and try loading it into R. I will
reiterate the reading of files Excel/CSV files in the next post.

You might also like