
ETW2001 FOUNDATIONS OF DATA ANALYSIS

WEEK 1
INTRODUCTION TO R
This week marks the beginning of our exploration into the world of data
analysis with a detailed introduction to R, utilizing the powerful platform of
RStudio. The aim of this initial phase is to build a solid foundation in
understanding R's unique syntax, an essential skill set for future data
analysts. Students will be introduced to the basics of writing and running R
scripts, covering fundamental programming concepts such as variables,
data types, and elementary operations. This foundational week is crafted to
provide students with the tools they need to effectively work within the
RStudio environment, preparing for more complex subjects in subsequent
weeks.
Prepared by Dr. Mogana Darshini Ganggayah

LEARNING OUTCOMES
1. Understand and describe the essential elements
of R's syntax, including data types and basic
operations, demonstrating a foundational grasp of
programming concepts in R.
2. Apply knowledge of R syntax to write and execute
simple scripts, effectively using various data types
and structures to solve basic programming tasks.
3. Analyse and troubleshoot R scripts,
demonstrating the ability to identify and rectify
errors, and optimize code efficiency in basic data
analysis scenarios.


TOPICS OF THE WEEK

01 Introduction to R programming and RStudio
02 Basic syntax in R programming
03 Installing and importing packages in R
04 Data structures in R


OVERVIEW OF R PROGRAMMING
IN DATA ANALYSIS
R programming has emerged as a cornerstone in the realm of
data analysis, offering a comprehensive suite of tools for data
manipulation, statistical modeling, and graphical representation.
Its open-source nature and extensive community support have
fostered a rich ecosystem of packages tailored for diverse
analytical needs across various domains. R's syntax and
programming environment are specifically designed with data
analysis and statistics in mind, enabling analysts to conduct
complex data operations, create compelling visualizations, and
develop sophisticated statistical models with relative ease. By
leveraging R's capabilities, data analysts can uncover insights
from data, facilitate decision-making processes, and contribute
to evidence-based research, making it an indispensable tool in
today’s data-driven world.

KEY ASPECTS OF R PROGRAMMING LANGUAGE

01 Statistical Analysis
R excels in statistical analysis. It offers a comprehensive range of
statistical techniques including linear and nonlinear modeling, time-series
analysis, classification, clustering, and more.

02 Graphics Capabilities
R allows users to create high-quality plots, including scatter plots, line
charts, histograms, and more. These plots can be highly customized to
effectively communicate data insights.

03 Packages and Extensions
Users can enhance R's capabilities by installing packages for specific
functions or applications. The Comprehensive R Archive Network (CRAN) hosts
thousands of packages.

04 Programming Features
R has features typically found in programming languages like C, Java, and
Python. These include conditionals, loops, user-defined recursive functions,
and input/output facilities.

05 Data Handling and Storage
R has robust data handling capabilities. It can import data from various
sources including Excel, databases, and other statistical packages. It
supports a variety of data structures like vectors, matrices, arrays, data
frames, and lists.

06 Applications in Various Fields
Originally designed for statisticians, R is now widely used in fields like
finance, genomics, pharmaceuticals, and social sciences. Its open-source
nature and comprehensive toolset make it a preferred choice for data analysis
in research and industry.
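
The programming features mentioned above (conditionals, loops, and user-defined recursive functions) can be illustrated with a minimal sketch. The function and variable names (classify_number, squares, factorial_recursive) are made up for this example and do not come from the slides.

```r
# Conditional: label a number as even or odd.
classify_number <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

# Loop: collect the squares of 1 to 5.
squares <- numeric(0)
for (i in 1:5) {
  squares <- c(squares, i^2)
}

# User-defined recursive function: n! computed by calling itself.
factorial_recursive <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_recursive(n - 1)
}

print(classify_number(4))
print(squares)
print(factorial_recursive(5))
```

In everyday R code the loop would usually be written as the vectorized expression (1:5)^2; the explicit loop is shown only to illustrate the syntax.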


BENEFITS OF DATA ANALYSIS

1. Informed Decision Making

One of the primary benefits of data analysis is its ability to provide a
solid foundation for decision-making. By analyzing data, businesses can
identify trends, understand customer behavior, and make strategic decisions
that are backed by evidence rather than intuition.

2. Identifying and Solving Problems

Data analysis helps in identifying patterns that may indicate underlying
problems. By recognizing these patterns early, organizations can address
issues before they escalate, saving resources and time.

3. Enhancing Customer Experience

By understanding customer preferences and behaviors through data analysis,
companies can tailor their products, services, and interactions to meet the
expectations of their target audience, thereby improving customer
satisfaction and loyalty.

4. Optimizing Operations

Data analysis can streamline operations, reduce costs, and improve efficiency
by identifying bottlenecks, wasteful expenditures, and opportunities for
optimization within an organization's operations.

5. Driving Innovation

By analyzing data, organizations can uncover new opportunities for product
development or service enhancement, driving innovation and maintaining
competitive advantage.

6. Risk Management

Data analysis plays a crucial role in risk assessment by predicting potential
risks and devising strategies to mitigate them, thus safeguarding against
financial losses and operational disruptions.

7. Enhancing Performance

Through the continuous monitoring and analysis of performance data,
organizations can set realistic goals, measure outcomes, and refine
strategies to achieve higher levels of performance.

8. Personalization

In marketing, data analysis allows for the personalization of campaigns and
strategies to target specific segments of the market more effectively,
increasing the return on investment of marketing efforts.


OVERVIEW OF RSTUDIO

RStudio is a comprehensive development environment designed specifically for R, a programming language
dedicated to statistical computing and graphics. As a powerful tool for data scientists and statisticians, RStudio
enhances the R experience by providing an accessible, user-friendly interface that simplifies coding, debugging,
and plotting. It supports seamless integration with various data sources and a vast array of packages, facilitating
extensive data analysis, modeling, and visualization capabilities. RStudio is available in both desktop and server
versions, allowing users to work locally or connect to RStudio Server on a remote server, offering flexibility for
individuals and teams working in diverse environments. The platform promotes efficient workflow management,
from data manipulation to publication-ready visualizations, and includes features like version control integration
for collaborative projects. Its ability to knit R Markdown documents enables users to combine code, output, and
narrative text into comprehensive reports, making RStudio an indispensable tool for reproducible research and
sharing insights.


STEP-BY-STEP GUIDE
Step 1: Installing R and RStudio

1. Install R:
   Visit the Comprehensive R Archive Network (CRAN) at
   https://cran.r-project.org/.
   Select the download link for your operating system
   (Windows, macOS, or Linux).
   Follow the installation instructions for your OS.
2. Install RStudio:
   Go to the RStudio download page at
   https://www.rstudio.com/products/rstudio/download/.
   Download the free version of RStudio Desktop that
   matches your operating system.
   Install RStudio by following the prompts.
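
Once both installations have finished, a quick way to confirm that R itself works is to ask it for its version in the RStudio Console. This is only a suggested check, not part of the original steps; the variable names are illustrative.

```r
# Human-readable version string of the running R installation.
r_version_text <- R.version.string
print(r_version_text)

# The same version as a comparable object, e.g. to check a minimum version.
r_version <- getRversion()
print(r_version >= "3.0.0")
```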

Step 2: Familiarizing Yourself with the RStudio Interface

R Script (top left): Area where you can write and edit your R scripts.

Console (bottom left by default): Where you can type and execute R commands.

Environment/History (top right): Displays variables in your current workspace
and command history.

Files/Plots/Packages/Help/Viewer (bottom right): Navigate files, view plots,
manage packages, access R documentation, and view web content.

Step 3: Setting Up Your Workspace

1. Create a New Project: Go to File > New Project to create a new workspace. This helps in managing your
work for different projects separately.
2. Create an R Script: Click File > New File > R Script to open a new script tab in the top-left panel. This is where
you'll write longer blocks of code.
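
As a sketch of what a first script in the new project might contain, the lines below print the working directory (which RStudio sets to the project folder when a project is open) and a greeting. The variable names and the message are illustrative assumptions.

```r
# Where R currently reads and writes files.
project_dir <- getwd()
print(project_dir)

# A first value to see appear in the Environment pane.
greeting <- "Hello, R!"
print(greeting)
```

In RStudio, a line of the script can be sent to the Console with Ctrl+Enter (Cmd+Return on macOS).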

Step 4: Saving Your Work

1. Save the script: File > Save or Ctrl+S (Cmd+S on macOS).


2. Save the project: Ensures all files related to your project are saved together.

BASIC SYNTAX IN R PROGRAMMING


1. Assignment Operator in R

The Leftward Assignment Operator <-:

This is the most commonly used and traditionally preferred operator in R for
assigning a value to a variable.
It reads as: the value on the right is assigned to the variable on the left.
Example: x <- 5 assigns the value 5 to the variable x.
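
A minimal sketch of the leftward assignment operator in use, extending the x <- 5 example above. The second variable y is added only for illustration.

```r
x <- 5        # the value 5 is assigned to x
y <- x + 2    # y is computed from the current value of x
x <- 10       # reassigning x does not change y

print(x)
print(y)
```

Here y keeps the value computed at the time of its assignment, because R stores the result of x + 2 and not the expression itself.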



2. Comments in R

a) Single-Line Comments:
Single-line comments are created using the # symbol.
Everything following the # on that line is treated as a
comment.

b) Multi-Line Comments:
R does not have a distinct multi-line comment feature like
some other languages. However, you can place a # at the
beginning of each line to create a block of comments.
Alternatively, you can wrap a block of code in
if (FALSE) { ... } so that it is never executed. This is a
workaround and less common; the wrapped code must still be
valid R syntax.
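
The comment styles described above can be sketched as follows. The variable name total is an assumption made for this example.

```r
# This is a single-line comment.
total <- 2 + 3  # A comment can also follow code on the same line.

# A block comment is simply
# several consecutive lines
# that each start with #.

# Workaround: code wrapped in if (FALSE) is parsed but never executed.
if (FALSE) {
  total <- 999
}

print(total)
```

Because the if (FALSE) block never runs, total keeps the value from the first assignment.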



3. Variable Assignment

Variable assignment in R is the process of storing data in a variable. In R,
you can assign values to variables using different operators. The main
assignment operators are <- (leftward), = and -> (rightward).

For the character/string data type, enclose the value in quotation marks ("").
For a number, write the value without quotation marks (append L, as in 5L, to
store it as an integer rather than a double).

4. Basic Arithmetic Operations

Basic arithmetic operations are fundamental in R programming, just like in
any other programming language. The primary arithmetic operators in R are:

1. Addition (+): Adds two numbers.
2. Subtraction (-): Subtracts the second number from the first.
3. Multiplication (*): Multiplies two numbers.
4. Division (/): Divides the first number by the second.
5. Exponentiation (^ or **): Raises the first number to the power of the
   second.
6. Modulus (%%): Returns the remainder of the division of the first number by
   the second.
7. Integer Division (%/%): Divides the first number by the second and returns
   the integer quotient.
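
The seven operators listed above can be tried directly in the Console. The values 7 and 3 and the variable names are chosen only for illustration.

```r
a <- 7
b <- 3

sum_ab      <- a + b     # 10
diff_ab     <- a - b     # 4
product_ab  <- a * b     # 21
quotient_ab <- a / b     # 2.333333 (ordinary division keeps the fraction)
power_ab    <- a ^ b     # 343
power_alt   <- a ** b    # 343, ** is accepted as a synonym for ^
remainder   <- a %% b    # 1, because 7 = 2 * 3 + 1
int_div     <- a %/% b   # 2, the whole-number part of 7 / 3

print(c(sum_ab, diff_ab, product_ab, power_ab, remainder, int_div))
print(quotient_ab)
```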

INSTALLING AND HANDLING LIBRARIES

In R, libraries (also known as packages) are collections of R functions,
data, and compiled code that have been developed by the community to enhance
R's basic capabilities. They cover a wide range of functionalities, from
statistical techniques to graphical devices. Here's how to import and use
them, with examples:

Installing Packages

Before you can use a library in R, you need to install it. This is typically
done using the install.packages() function.
This command needs to be run only once per R installation, as it downloads
and installs the package on your system.
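
One common pattern, sketched below, is to install a package only when it is missing. The helper ensure_package() and its install_if_missing argument are hypothetical names written for this example, and the CRAN mirror URL is an assumption; requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: returns TRUE if the package is available afterwards.
ensure_package <- function(pkg, install_if_missing = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (!install_if_missing) {
      return(FALSE)
    }
    # Downloads from CRAN, so this branch needs an internet connection.
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check succeeds without downloading anything.
stats_available <- ensure_package("stats")
print(stats_available)

# For a CRAN package such as ggplot2, the one-time install would be:
# ensure_package("ggplot2", install_if_missing = TRUE)
```

Calling install.packages("ggplot2") directly in the Console does the same job; the wrapper only avoids reinstalling a package that is already present.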


Loading Libraries

Once a package is installed, you use the library() function to load it into
your R session. This makes the functions and data in the package available
for use. For example, to load ggplot2, run library(ggplot2).
This command needs to be run each time you start a new R session and want to
use the package.

Note:
You need an internet connection to install packages from CRAN (the
Comprehensive R Archive Network).
Some packages may depend on other packages, which are usually installed
automatically as needed.
Regularly updating R and its packages ensures access to the latest functions
and bug fixes.
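
As a sketch of loading a package, the example below uses tools, a package that ships with R but is not attached by default, so it runs without installing anything. For a CRAN package such as ggplot2 the call would be library(ggplot2) after installation. The variable names are illustrative.

```r
# Attach the package for this session.
library(tools)

# After library(), the package appears on the search path ...
tools_attached <- "package:tools" %in% search()
print(tools_attached)

# ... and its functions can be called by name.
extension <- file_ext("survey_results.csv")
print(extension)

# A single function can also be used without attaching the whole package.
extension_again <- tools::file_ext("notes.txt")
print(extension_again)
```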

DATA STRUCTURES IN R

1. FACTOR
Example shown on slide: Customer Satisfaction Survey

A) Description:

Factors are used to represent categorical data. They can be unordered
(nominal) or ordered (ordinal). Factors are stored as integers internally,
with labels associated with these unique integer values.

B) Use Case:

Essential in statistical modeling, especially for categorical variables like
gender, social class, etc.
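
The slide's own example is not recoverable from the text, so the sketch below uses made-up customer satisfaction responses to show an ordered factor, its labels, and the integer codes stored internally.

```r
# Hypothetical survey responses.
responses <- c("Satisfied", "Neutral", "Dissatisfied", "Satisfied", "Neutral")

# Ordered (ordinal) factor: the levels are listed from lowest to highest.
satisfaction <- factor(
  responses,
  levels  = c("Dissatisfied", "Neutral", "Satisfied"),
  ordered = TRUE
)

print(levels(satisfaction))      # the labels
print(as.integer(satisfaction))  # the underlying codes: 3 2 1 3 2
print(table(satisfaction))       # counts per category

# Ordered factors support comparisons between categories.
first_higher <- satisfaction[1] > satisfaction[2]
print(first_higher)
```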


2. VECTOR

A) Description:

Vectors are one-dimensional arrays that can hold numeric, character, or
logical data. All elements in a vector must be of the same type.

B) Use Case:

Vectors are fundamental in R and are used for storing and manipulating a
series of data points.
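
A minimal sketch of creating and using vectors with c(). The values and names are invented. The last lines show what the same-type rule means in practice: mixing types makes R convert every element to one common type.

```r
scores  <- c(72, 85, 90)             # numeric vector
names_v <- c("Ali", "Mei", "Raj")    # character vector
passed  <- scores >= 80              # logical vector: FALSE TRUE TRUE

doubled     <- scores * 2            # operations apply to every element
high_scores <- scores[passed]        # logical indexing keeps 85 and 90

print(doubled)
print(high_scores)

# Mixed input is coerced to a single type, here character.
mixed <- c(1, "a", TRUE)
print(class(mixed))
```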


3. LIST

A) Description:

Lists are ordered collections of objects. A list can contain elements of
different types, including numbers, strings, vectors, and even other lists.

B) Use Case:

Lists are versatile and can be used to build complex data structures, like
nested lists or data frames.
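
The sketch below builds a list that mixes a string, an integer, a numeric vector, and a nested list. All names and values are hypothetical.

```r
student <- list(
  name    = "Aisha",
  age     = 21L,
  marks   = c(72, 85, 90),
  contact = list(city = "Kuala Lumpur", on_campus = TRUE)
)

# Elements are accessed by name with $ or [[ ]].
student_name <- student$name
second_mark  <- student[["marks"]][2]
city         <- student$contact$city   # reaching into the nested list

print(student_name)
print(second_mark)
print(city)
print(length(student))                  # number of top-level elements
```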


4. MATRIX

A) Description:

Matrices are two-dimensional, rectangular data sets where all elements are of
the same type.

B) Use Case:

Commonly used in mathematical computations, statistical analyses, and any
context where data is naturally two-dimensional.
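
A short sketch of creating and using a matrix. The values are arbitrary; note that matrix() fills column by column unless byrow = TRUE is given.

```r
# 2 rows x 3 columns, filled column-wise: row 1 is 1 3 5, row 2 is 2 4 6.
m <- matrix(1:6, nrow = 2)
print(m)
print(dim(m))

# Element in row 2, column 3.
element <- m[2, 3]
print(element)

# Transpose (3 x 2) and matrix multiplication (2 x 2 result).
m_t     <- t(m)
product <- m %*% m_t
print(product)
```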


5. ARRAY

A) Description:

Arrays are similar to matrices but can have more than two dimensions.

B) Use Case:

Useful for higher-dimensional data, used in more complex mathematical and
statistical contexts.
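
As an illustration, the sketch below creates a three-dimensional array with array(). The dimensions (2 rows, 3 columns, 4 layers) and the values are arbitrary choices for this example.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers.
a <- array(1:24, dim = c(2, 3, 4))
print(dim(a))

# One element: row 1, column 2, layer 3.
element <- a[1, 2, 3]
print(element)

# Leaving an index empty selects everything along that dimension,
# so this is the first layer as a 2 x 3 matrix.
first_layer <- a[, , 1]
print(first_layer)
```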


6. DATA FRAME

A) Description:

Data frames are table-like structures similar to matrices but can contain
different types of data in each column.

B) Use Case:

They are the most commonly used data structure in R for data analysis,
especially with datasets where columns represent different variables of
varying types.
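
A minimal sketch of a data frame whose columns hold different types. The column names and values are made up for illustration.

```r
survey <- data.frame(
  id     = 1:3,
  name   = c("Ali", "Mei", "Raj"),
  score  = c(4.5, 3.0, 5.0),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text columns as character
)

# Each column keeps its own type.
column_types <- sapply(survey, class)
print(column_types)

# Columns are accessed with $, and rows can be filtered by a condition.
average_score <- mean(survey$score)
high_scorers  <- survey[survey$score > 4, ]

print(average_score)
print(high_scorers)
```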

SUMMARY OF WEEK 1