UNIT 1
1. Origins of R:
o R was conceived in 1992 by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
o The language was named R partly after the first letter of its creators' first names (Ross and Robert) and partly as a play on the name of the S language.
o Although R wasn’t publicly released until 1995, its development was inspired by the limitations encountered with the S
language.
2. S Language:
o The S language was developed by John Chambers and others at Bell Telephone Laboratories (originally part of AT&T Corp).
o Initially implemented as Fortran libraries, S was an internal statistical analysis environment.
o Early versions lacked functions for statistical modeling.
o In 1988, S was rewritten in C and began to resemble the system in use today (this was Version 3 of the language).
o The book Statistical Models in S documented its statistical analysis functionality.
3. S Evolution and Ownership:
o Since the early '90s, the life of S has taken a winding path.
o In 1993, Bell Labs granted StatSci (later Insightful Corp.) an exclusive license for S.
o In 2004, Insightful purchased S from Lucent for $2 million.
o Insightful sold its implementation as S-PLUS (with additional features).
o In 2008, Insightful was acquired by TIBCO, which now owns and develops the S language.
4. S Philosophy:
o The S philosophy emphasizes ease of data analysis.
o It allows users to start interactively without consciously thinking of themselves as programmers.
o As needs grow, users can gradually transition into programming.
In summary, R, a modern implementation of S, has become the de facto language for data science due to its flexibility, power, and expressiveness.
DATA STRUCTURES:
The essential data structures in R allow us to organize and manipulate data effectively. Here are the key ones:
1. Vectors:
o A vector is an ordered collection of values of a single basic data type, with a given length.
o All elements in a vector must be of the same data type (homogeneous).
o Vectors are generally created using the c() function.
o Example:
    # Creating a numeric vector
    x <- c(1, 5, 4, 9, 0)
    typeof(x)   # Returns "double"
    length(x)   # Returns 5

    # Creating a character vector
    y <- c("apple", "banana", "cherry")
    typeof(y)   # Returns "character"
2. Lists:
o A list is a generic object consisting of an ordered collection of objects.
o Lists can hold different data types (heterogeneous).
o Example:
    # Creating a list
    empId <- c(1, 2, 3, 4)
    empName <- c("Debi", "Sandeep", "Subham", "Shiba")
    numberOfEmp <- 4
    empList <- list(empId, empName, numberOfEmp)
    print(empList)
3. Dataframes:
o Dataframes are two-dimensional tabular data structures.
o They are heterogeneous and commonly used for data analysis.
o Each column must have the same number of items, and each item in a column must be of the same data type.
o Example:
    # Creating a dataframe
    Name <- c("Amiya", "Raj", "Asish")
    Language <- c("R", "Python", "Java")
    Age <- c(22, 25, 45)
    df <- data.frame(Name, Language, Age)
    print(df)
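To make the homogeneous/heterogeneous distinction concrete, here is a minimal sketch. The variable names are illustrative and not part of the notes above. It shows c() coercing mixed values to one common type, while a list and a data frame keep each type intact.

```r
# c() coerces mixed inputs to a single common type (here: character)
mixed_vec <- c(1, "a", TRUE)
vec_type <- typeof(mixed_vec)

# A list keeps each element's own type
mixed_list <- list(1, "a", TRUE)
list_types <- sapply(mixed_list, typeof)

# A data frame keeps one type per column
people <- data.frame(
  Name = c("Amiya", "Raj"),
  Age = c(22, 25),
  stringsAsFactors = FALSE   # keep Name as character in older R versions too
)
col_types <- sapply(people, typeof)

print(vec_type)
print(list_types)
print(col_types)
```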
These data structures play a crucial role in organizing and analyzing data in R.
FUNCTIONS:
Functions play a crucial role in R programming.
1. What Is a Function in R?
o A function is a set of statements organized together to perform a specific task.
o It’s an object that executes a predefined sequence of commands when called.
o Functions can be built-in (provided by R) or user-defined (created by you).
2. Built-in Functions:
o R offers many helpful built-in functions for various tasks:
▪ min(), max(), mean(), median(): Compute statistics.
▪ sum(): Calculate the sum of a numeric vector.
▪ range(): Find the minimum and maximum values.
▪ abs(): Get the absolute value of a number.
▪ str(): Display the structure of an R object.
▪ length(): Count items in a vector or list.
▪ sort(): Sort a vector.
▪ exists(): Check if a variable exists.
o Example:
    vector <- c(3, 5, 2, 3, 1, 4)
    print(min(vector))
    print(mean(vector))
    print(median(vector))
    print(sum(vector))
    print(range(vector))
    print(length(vector))
    print(sort(vector, decreasing = TRUE))
    print(exists('vector'))   # Note the quotation marks
3. Creating User-Defined Functions:
o To create your own function, use the function keyword and assign the result to a name.
o Syntax:
    my_function <- function(parameters) {
      # Function body
      # Perform specific tasks
    }
o Example:
    # Custom function to add two numbers
    add_numbers <- function(a, b) {
      result <- a + b
      return(result)
    }
    # Calling the function
    print(add_numbers(10, 20))   # Prints 30
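User-defined functions can also take default argument values, and the last evaluated expression is returned even without return(). The sketch below uses a hypothetical helper named power, which is not a built-in function and is defined here only for illustration.

```r
# Hypothetical helper: raises x to a power, squaring by default
power <- function(x, exponent = 2) {
  x^exponent   # the last evaluated expression is the return value
}

squared <- power(3)                # uses the default exponent
cubed <- power(2, exponent = 3)    # overrides the default by name
many <- power(1:3)                 # works element-wise on a vector

print(squared)
print(cubed)
print(many)
```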
Functions help modularize code, improve readability, and avoid repetition.
SESSION:
1. What Is an R Session?
o An R session refers to the environment where you interact with R.
o During a session, you can execute R commands, create and manipulate objects, and perform data analysis.
o The session includes any user-defined objects (such as vectors, matrices, data frames, lists, and functions).
2. Workspace and Saving Sessions:
o The workspace is your current R working environment within a session.
o It holds all the objects you create or load during your work.
o At the end of an R session, you can save an image of the current workspace (by default to a file named .RData in the working directory). R reloads this image automatically the next time it starts in that directory.
o Saving the workspace allows you to continue where you left off, with all your variables and data intact.
3. Exiting an R Session:
o To exit an R session, you can:
▪ Close the R console or RStudio.
▪ Use the q() function (type q() and press Enter).
4. Listing Objects and Removing Them:
o To list all objects in the current session, use ls().
o To remove an object, use rm(object_name).
5. Setting Working Directories:
o You can set the current working directory using setwd("path/to/directory").
o To get the current working directory, use getwd().
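The session commands above can be tried together in a short sketch. The object names demo_a and demo_b are made up for illustration, and a temporary directory stands in for a real project folder.

```r
# Create two illustrative objects in the workspace
demo_a <- 1:3
demo_b <- "temporary"

print(ls())          # object names, including "demo_a" and "demo_b"
rm(demo_b)           # remove demo_b from the workspace
print(exists("demo_b"))

# Switch to a temporary directory, then restore the previous one
old_dir <- getwd()
setwd(tempdir())
print(getwd())
setwd(old_dir)
```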
Managing sessions helps organize your work and ensures continuity across R sessions.
VARIABLES:
Variables play a crucial role in storing and managing data in R. A variable is a name bound to a value by assignment, and it lets you store and manipulate data efficiently.
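As a minimal sketch of how variables behave in R (all names below are illustrative), the following shows assignment, rebinding a name to a value of another type, and case sensitivity.

```r
# Assignment with the conventional <- operator
count <- 10
greeting <- "hello"

# = also assigns at the top level; <- is the more common style
ratio = 0.75

# A variable can be rebound to a value of a different type
count <- "ten"
print(class(count))

# Names are case-sensitive: total and Total are different variables
total <- 1
Total <- 2
print(total == Total)
```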
DATA TYPES:
Understanding data types is essential in R programming. The fundamental data types in R are:
1. Numeric:
o Represents real numbers (with or without decimal points).
o Default type for numbers in R.
o Example:
    x <- 5.6
    typeof(x)   # Output: "double"
2. Integer:
o Represents whole numbers (integers).
o You can use the L suffix to explicitly declare an integer.
o Example:
    y <- 5L
    typeof(y)   # Output: "integer"
3. Logical:
o Represents Boolean values (TRUE or FALSE).
o Used for logical operations and conditions.
o Example:
    z <- TRUE
    typeof(z)   # Output: "logical"
4. Complex:
o Represents complex numbers (with real and imaginary parts).
o Written as a + bi, where i is the imaginary unit.
o Example:
    w <- 1 + 2i
    typeof(w)   # Output: "complex"
5. Character:
o Represents text or strings.
o Enclosed in double or single quotes.
o Example:
    name <- "Hello, R!"
    typeof(name)   # Output: "character"
6. Raw:
o Represents raw bytes (binary data).
o Created using the as.raw() function.
o Example:
    raw_value <- as.raw(255)
    typeof(raw_value)   # Output: "raw"
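Values can be tested and converted between these types with the is.*() and as.*() families of functions. The sketch below uses arbitrary example values. It also shows that class() reports a higher-level label than typeof().

```r
x <- 5
print(typeof(x))   # the internal storage type
print(class(x))    # the higher-level class

print(is.numeric(5L))      # integers also count as numeric
print(is.character("a"))

n <- as.integer("42")      # character -> integer
s <- as.character(3.14)    # double -> character
b <- as.logical("TRUE")    # character -> logical

# A string that is not a number cannot be converted and yields NA
bad <- suppressWarnings(as.numeric("abc"))
print(is.na(bad))
```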
Choosing the right data type ensures efficient memory usage and accurate computations in R.
VECTORS:
Vectors are fundamental data structures in R that store and manipulate elements of the same data type.
1. What Is a Vector in R?
o A vector is a one-dimensional array that holds elements of the same data type.
o It can store numeric values, characters, logical values, and more.
o Vectors are the building blocks of R and play a crucial role in various data manipulation and analysis tasks.
2. Creating Vectors:
o Vectors are generally created using the c() function (which stands for “combine” or “concatenate”).
o Example:
    # Creating a numeric vector
    x <- c(1, 5, 4, 9, 0)
    typeof(x)   # Output: "double"
    length(x)   # Output: 5

    # Creating a character vector
    y <- c("apple", "banana", "cherry")
    typeof(y)   # Output: "character"
3. Accessing Elements of a Vector:
o Elements of a vector can be accessed using vector indexing.
o Vector indexing starts from 1 (unlike languages such as C, Java, or Python, where it starts from 0).
o Example:
    # Accessing specific elements
    x[3]         # Returns the 3rd element (4)
    x[c(2, 4)]   # Returns the 2nd and 4th elements (5, 9)

    # Using negative indexing to exclude elements
    x[-1]        # Returns all elements except the 1st one
4. Creating Sequences with seq():
o The seq() function generates sequences with specific step sizes or lengths.
o Example:
    seq(1, 3, by = 0.2)         # Generates a sequence from 1 to 3 with a step size of 0.2
    seq(1, 5, length.out = 4)   # Generates 4 evenly spaced values from 1 to 5
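Beyond creating and indexing, vectors can be modified, extended, and named. The following sketch reuses the numeric vector from the examples above. The element names "a" to "f" are arbitrary labels chosen for illustration.

```r
x <- c(1, 5, 4, 9, 0)

x[2] <- 50      # replace the 2nd element
x <- c(x, 7)    # append a value; x now has 6 elements

# The colon operator is shorthand for an integer sequence
ids <- 1:6

# Elements can be named and then accessed by name
names(x) <- c("a", "b", "c", "d", "e", "f")
print(x["b"])

# Arithmetic applies element by element
doubled <- x * 2
print(doubled)
```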
Vectors are versatile and efficient for storing and manipulating data in R.
SCALARS:
In R programming, a scalar refers to a single value, a basic building block for more complex data structures.
1. Definition:
o A scalar is the simplest kind of object in R. R has no separate scalar type: a scalar is stored as a vector of length one.
o It represents a single value, such as a number or a name.
2. Examples of Scalars:
o Numeric: Decimal numbers (e.g., 1.5) or whole numbers (integers).
    x <- 1.5
    typeof(x)   # Output: "double"
o Character: Strings (sequences of characters enclosed in quotes).
    name <- "John"
    typeof(name)   # Output: "character"
o Logical: Boolean values (TRUE or FALSE).
    flag <- TRUE
    typeof(flag)   # Output: "logical"
3. Remember:
o Scalars are fundamental for computations and data manipulation in R.
o They serve as the building blocks for more complex data structures like vectors and matrices.
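R stores a single value as a vector of length one. A brief sketch with an arbitrary value shows this.

```r
x <- 1.5

print(length(x))      # a single value has length 1
print(is.vector(x))   # it is still a vector
print(x[1])           # indexing works as on longer vectors
print(identical(x, c(1.5)))
```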
CONCLUSION:
In an R analysis or project, the conclusion is where you summarize the results or findings.
1. Concluding an Analysis:
o When you perform data analysis or statistical modeling in R, you often arrive at insights or results.
o The conclusion is where you summarize these findings and draw meaningful insights.
o It’s essential to communicate your conclusions clearly and concisely.
2. Steps for a Good Conclusion:
o Summarize Results: Briefly state the key findings from your analysis.
o Interpretation: Explain what these findings mean in the context of your problem or research question.
o Recommendations: If applicable, provide recommendations or next steps based on your results.
3. Example: Suppose you analyzed a dataset on customer churn for a telecom company using R. Your conclusion might be:
“After analyzing the data, we found that customers with longer contract durations and higher monthly charges are less likely to
churn. We recommend focusing on retention strategies for high-value customers.”
A well-crafted conclusion helps stakeholders understand the implications of your work.
DATA FRAMES:
A data frame is a versatile and widely used data structure in R. Data frames are essential for data manipulation, exploration, and statistical modeling.
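As a minimal sketch of common data frame operations, the example below rebuilds the small Name/Language/Age table used earlier in these notes, then inspects, filters, and extends it. The derived column name AgeNextYear is made up for illustration.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = FALSE
)

str(df)            # compact summary of the column types
print(nrow(df))    # number of rows
print(ncol(df))    # number of columns

print(df$Age)                   # one column as a vector
older <- df[df$Age > 23, ]      # rows where Age exceeds 23
df$AgeNextYear <- df$Age + 1    # add a derived column
print(older)
```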
LISTS:
A list is a versatile and widely used data structure in R.
1. What Is a List in R?
o A list is an ordered collection of objects (elements).
o Unlike vectors, lists can contain elements of different data types (heterogeneous).
o You can think of a list as a container that holds various data objects together.
2. Creating Lists:
o To create a list, use the list() function.
o Example:
    # Creating a simple list
    my_list <- list(
      Name = "Alice",
      Age = 30,
      Scores = c(85, 92, 78)
    )
    print(my_list)
3. Accessing List Components:
o You can access list components by name or index.
o Examples:
    # Accessing by name
    my_list$Name     # Access the 'Name' component
    my_list$Scores   # Access the 'Scores' component

    # Accessing by index
    my_list[[2]]     # Access the second component (Age)
4. Named List Components:
o Naming list components makes it easier to access them.
o Example:
    named_list <- list(
      name = "Sudheer",
      age = 25,
      city = "Delhi"
    )
    print(named_list)
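Single brackets on a list return a smaller list, while double brackets (or $) return the element itself. The sketch below illustrates the difference and shows how components are added, changed, and removed. The City value is a made-up example.

```r
my_list <- list(Name = "Alice", Age = 30, Scores = c(85, 92, 78))

sub_list <- my_list["Age"]       # single brackets return a list
age_value <- my_list[["Age"]]    # double brackets return the element itself

my_list$City <- "Paris"   # add a new component
my_list$Age <- 31         # modify an existing one
my_list$Scores <- NULL    # remove a component

print(names(my_list))
```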
Lists are powerful for storing heterogeneous data and organizing complex structures in R.
MATRICES:
In R programming, a matrix is a two-dimensional arrangement of data in rows and columns. It resembles a spreadsheet or a table, except that all of its elements share one data type. Here are the key points about matrices:
1. Creating a Matrix in R:
o To create a matrix in R, use the matrix() function.
o You need to provide the set of elements (values) in the vector, along with the desired number of rows and columns.
o By default, matrices are filled column-wise; pass byrow = TRUE to fill them row by row.
o Example:
    # Creating a 3x3 matrix, filled row by row
    A <- matrix(
      c(1, 2, 3, 4, 5, 6, 7, 8, 9),
      nrow = 3,
      ncol = 3,
      byrow = TRUE
    )
    print(A)
2. Special Matrices in R:
o R allows you to create various types of special matrices:
▪ Constant Matrix: Filled with a single constant value.
▪ Diagonal Matrix: Non-diagonal elements are zeros.
▪ Identity Matrix: Diagonal elements are ones, and others are zeros.
o Examples:
    # Constant matrix (filled with 5)
    B <- matrix(5, 3, 3)
    print(B)

    # Diagonal matrix (with elements 5, 3, 3 on the diagonal)
    C <- diag(c(5, 3, 3), 3, 3)
    print(C)

    # Identity matrix (3x3)
    D <- diag(1, 3, 3)
    print(D)
3. Matrix Metrics:
o You can obtain information about a matrix:
▪ Number of rows: nrow(A)
▪ Number of columns: ncol(A)
▪ Dimensions: dim(A)
▪ Total number of elements: length(A)
o Example:
    cat("Number of rows:", nrow(A), "\n")
    cat("Number of columns:", ncol(A), "\n")
    cat("Total elements:", length(A), "\n")
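Matrices also support indexing by row and column and two kinds of multiplication. The sketch below uses small 2x2 matrices chosen for illustration and contrasts element-wise * with the matrix product %*%.

```r
A <- matrix(1:4, nrow = 2)              # filled column-wise: columns are (1, 2) and (3, 4)
B <- matrix(c(0, 1, 1, 0), nrow = 2)

print(A[1, 2])   # element in row 1, column 2
print(A[, 1])    # entire first column
print(t(A))      # transpose

elementwise <- A * B    # multiplies matching entries
product <- A %*% B      # true matrix multiplication

print(elementwise)
print(product)
```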
Matrices are essential for linear algebra, statistical modeling, and data manipulation in R.
ARRAY:
Arrays are data structures in R that store and manipulate data in multiple dimensions.
1. What Is an Array in R?
o An array is a multi-dimensional data structure that can hold elements of the same data type.
o Unlike vectors (one-dimensional) and matrices (two-dimensional), arrays can have any number of dimensions.
o A three-dimensional array can be pictured as a stack of matrices, each with the same rows and columns.
2. Creating Arrays:
o You can create an array using the array() function.
o Specify the data elements, dimensions (rows, columns, and matrices), and optionally provide names for dimensions.
o Example:
    # Creating a 3x3x2 array (3 * 3 * 2 = 18 elements)
    my_array <- array(
      data = 1:18,
      dim = c(3, 3, 2),
      dimnames = list(
        c("Row1", "Row2", "Row3"),
        c("Col1", "Col2", "Col3"),
        c("Matrix1", "Matrix2")
      )
    )
    print(my_array)
3. Accessing Array Elements:
o Use indexing to access specific elements within the array.
o Example:
    # Accessing specific elements
    my_array[2, 3, 1]   # Element in the second row, third column, first matrix
4. Arrays and Related Structures:
o Matrices: A two-dimensional array (a special case of an array).
o Data Frames: Not arrays. A data frame is a list of equal-length columns, used for tabular data whose columns may have different types.
o Lists: Not arrays. A list is a generic vector whose elements can be of different data types.
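Leaving an index position empty selects everything along that dimension, so a whole matrix "layer" can be pulled out of a three-dimensional array. The sketch below uses an illustrative 2x3x4 array and also applies a function over one dimension with apply().

```r
arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers

print(dim(arr))
layer2 <- arr[, , 2]    # the second layer, itself a 2x3 matrix
value <- arr[1, 2, 3]   # row 1, column 2, layer 3

# Sum within each layer by applying sum() over the third dimension
layer_sums <- apply(arr, 3, sum)

print(layer2)
print(value)
print(layer_sums)
```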
Arrays are powerful for handling multi-dimensional data, especially in scientific computing and data analysis.
CLASSES:
Classes play a crucial role in object-oriented programming in R. R has several class systems:
1. S3 Class:
o S3 class is the most common and straightforward class system in R.
o An S3 object is an ordinary R object (often a list) carrying a class attribute; generic functions such as print() dispatch on that attribute.
o S3 classes are used by many built-in R functions and packages.
o Example:
    # Creating an S3 class object
    student1 <- list(name = "John", age = 21, GPA = 3.5)
    class(student1) <- "Student_Info"
2. S4 Class:
o S4 class provides a more formal and structured approach to object-oriented programming.
o You define classes explicitly using the setClass() function.
o S4 classes have slots (member variables) with defined data types.
o Example:
    # Creating an S4 class
    setClass("Student_Info", slots = list(name = "character", age = "numeric", GPA = "numeric"))
    student2 <- new("Student_Info", name = "Alice", age = 22, GPA = 3.8)
3. Reference Class:
o Reference class (also known as RC or R5) is a more recent addition to R.
o It provides a more traditional object-oriented programming experience.
o Reference classes have mutable state (unlike S3 and S4).
o Example:
    # Creating a reference class; setRefClass() returns a generator object
    Person <- setRefClass("Person", fields = list(name = "character", age = "numeric"))
    person1 <- Person$new(name = "Bob", age = 30)
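The three systems differ mainly in how methods are attached and how fields are accessed. The sketch below is illustrative only: the class names Course and Counter and the increment method are invented for this example. It shows an S3 print method, S4 slot access with @, and a reference class method that changes the object in place.

```r
library(methods)

# S3: a method is an ordinary function named generic.class
student <- structure(list(name = "John", GPA = 3.5), class = "Student_Info")
print.Student_Info <- function(x, ...) {
  cat("Student:", x$name, "GPA:", x$GPA, "\n")
  invisible(x)
}
print(student)   # dispatches to print.Student_Info

# S4: slots are accessed with @
setClass("Course", slots = list(title = "character", credits = "numeric"))
course <- new("Course", title = "Statistics", credits = 4)
print(course@title)

# Reference class: methods can modify the object in place
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1
      invisible(.self)
    }
  )
)
counter <- Counter$new(count = 0)
counter$increment()
print(counter$count)
```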
In summary, classes in R allow you to define custom data structures and methods, making your code more organized and reusable. Each class
system has its own features, so choose the one that best fits your needs!
DATA INPUT/OUTPUT:
Managing data input and output is crucial for any data analysis or statistical modeling task in R. Here are the key aspects:
1. Data Input:
o Reading Data from External Sources:
▪ R allows you to read data from various formats, including:
▪ Text Files: Use functions like read.table(), read.csv(), or read.delim() to read data from plain text files.
▪ Excel Files: Use packages like readxl or openxlsx to read data from Excel spreadsheets.
▪ Database Connections: Connect to databases (e.g., MySQL, PostgreSQL) using packages like RMySQL,
RPostgreSQL, or odbc.
▪ Web Services: Fetch data from APIs or web services using packages like httr or jsonlite.
o Interactive Input:
▪ Use functions like readline() or scan() to read input directly from the user via the console.
2. Data Output:
o Writing Data to External Files:
▪ Save your results or data to external files:
▪ Text Files: Use functions like write.table() or write.csv() to save data to plain text files.
▪ Excel Files: Use packages like writexl or openxlsx to write data to Excel files.
▪ Other Formats: Save data in formats like JSON, XML, or HDF5 using relevant packages.
o Printing Output:
▪ Use functions like print(), cat(), or writeLines() to display output in the console.
▪ You can also redirect output to a file using sink().
3. Example:
o Reading data from a CSV file:
    # Read data from a CSV file
    my_data <- read.csv("my_data.csv")
o Writing data to a text file:
    # Save results to a tab-separated text file (my_results is assumed to be an existing data frame)
    write.table(my_results, "output.txt", sep = "\t", row.names = FALSE)
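The read and write functions above can be combined into a self-contained round trip. This sketch writes an illustrative data frame (the scores data is invented) to a temporary CSV file and reads it back, so it does not depend on any existing file.

```r
# Illustrative data frame to write out
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Write to a temporary CSV file, omitting row names
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read it back and inspect the result
restored <- read.csv(csv_path)
str(restored)

# cat() prints plain text without the [1] prefix that print() adds
cat("Rows read:", nrow(restored), "\n")
```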
Efficient data input/output is essential for effective data analysis and reporting in R.
DATA STORAGE FORMATS:
Data storage formats determine how data is saved and exchanged in R. Some common formats are:
1. Textual Formats:
o CSV (Comma-Separated Values):
▪ CSV files store tabular data with values separated by commas.
▪ Widely used for data exchange between different software.
▪ Read using functions like read.csv() or read.table().
o TSV (Tab-Separated Values):
▪ Similar to CSV, but values are separated by tabs.
▪ Useful when data contains commas.
▪ Read using functions like read.delim().
2. Object Representation Formats:
o These functions write R objects in a textual form (as R code):
▪ dput(): Deparses a single R object into R code; read it back with dget().
▪ dump(): Writes one or more R objects to a file as R code; read them back with source().
3. Binary Formats:
o RDS (R Data Serialization):
▪ Binary format for saving a single R object.
▪ Efficient and preserves metadata.
▪ Written using saveRDS() and read using readRDS().
4. Other Formats:
o Excel Files: Use packages like readxl or openxlsx.
o Stata Files: Use read.dta() and write.dta() from the foreign package.
o JSON, XML, HDF5: For specialized data storage needs.
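The textual and binary routes can be compared side by side. The sketch below saves an illustrative settings list (invented for this example) once as R code with dput() and once as an RDS file, using temporary files, and restores both copies.

```r
settings <- list(alpha = 0.05, labels = c("low", "high"))

# Text representation: dput() writes R code, dget() reads it back
txt_path <- tempfile(fileext = ".R")
dput(settings, file = txt_path)
from_text <- dget(txt_path)

# Binary representation: saveRDS() writes, readRDS() reads
rds_path <- tempfile(fileext = ".rds")
saveRDS(settings, rds_path)
from_rds <- readRDS(rds_path)

print(from_text)
print(from_rds)
```

The text file can be opened and edited by hand, while the RDS file is typically more compact and faster to load.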
Choose the appropriate format based on your use case, compatibility, and efficiency.
SUBSETTING OBJECTS:
Subsetting extracts specific elements from an object (such as a vector, data frame, or matrix). There are several ways to subset, depending on the type of object and your requirements, and they help you extract relevant information from your data efficiently.
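As a minimal sketch of the most common subsetting forms, the example below applies [ ], [[ ]], $, and subset() to a vector, a matrix, a list, and a data frame. All object names and values are made up for illustration.

```r
# Vectors: by position, by name, by condition, by exclusion
v <- c(a = 10, b = 20, c = 30, d = 40)
print(v[2])
print(v[c("a", "d")])
print(v[v > 15])
print(v[-1])

# Matrices: [row, column]; an empty position selects everything
m <- matrix(1:6, nrow = 2)
print(m[2, 3])
print(m[, 2])

# Lists: [ ] returns a list, [[ ]] and $ return the element itself
lst <- list(x = 1:3, y = "text")
print(lst["x"])
print(lst[["x"]])
print(lst$y)

# Data frames: rows by condition, columns by name
df <- data.frame(id = 1:4, score = c(55, 80, 65, 90))
high <- df[df$score > 60, c("id", "score")]
same <- subset(df, score > 60)   # equivalent result using subset()
print(high)
```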
VECTORIZATION:
Vectorization is a powerful concept in R programming that allows you to perform operations on entire vectors or arrays at once, rather than using explicit loops.
1. What Is Vectorization?
o Vectorization refers to the practice of applying an operation to an entire vector (or array) of data elements simultaneously.
o Instead of using explicit loops (like for loops), vectorized functions take advantage of optimized low-level code to process data
efficiently.
o Vectorization is a key feature of R, making code concise, readable, and computationally efficient.
2. Examples of Vectorized Operations:
o Element-Wise Arithmetic:
▪ You can perform arithmetic operations (addition, subtraction, multiplication, division) on entire vectors without explicit
loops.
▪ Example:
    x <- 1:5
    y <- 2:6
    z <- x + y   # Element-wise addition
o Logical Operations:
▪ Logical operators (&, |, !) and comparison operators (==, !=, <, >) work element-wise on vectors.
▪ Example:
    is_even <- x %% 2 == 0   # Check if elements are even
o Math Functions:
▪ Functions like sqrt(), log(), sin(), etc., operate element-wise on vectors.
▪ Example:
    sin_values <- sin(x)
3. Advantages of Vectorization:
o Speed: Vectorized operations are usually much faster than equivalent explicit loops written in R.
o Readability: Code is concise and easier to understand.
o Efficiency: R’s optimized C/Fortran code handles the low-level details.
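To see the difference in practice, the sketch below computes the same squares once with an explicit loop and once with a single vectorized expression, then uses ifelse() as a vectorized conditional. The variable names are illustrative.

```r
values <- 1:10

# Loop version: build the result one element at a time
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Vectorized version: one expression over the whole vector
squares_vec <- values^2

# Vectorized conditional with ifelse()
parity <- ifelse(values %% 2 == 0, "even", "odd")

# The single value 10 is recycled across every element
scaled <- values * 10

print(squares_vec)
print(parity)
print(scaled)
```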
Whenever possible, leverage vectorization in R to write efficient and elegant code.