Data Analytics Using R, Unit-3

Uploaded by

Padakandla Madhusudanacharyulu

UNIT-3

1. What is big data, and why is data analytics needed?

Big data refers to large and complex datasets that are difficult to process
using traditional data management tools. These datasets typically have one
or more of the following characteristics:

1. Volume: Big data involves a large volume of data that exceeds the
processing capacity of conventional database systems. This could be
terabytes, petabytes, or even larger datasets.
2. Variety: Big data comes in various formats, including structured data
(like databases), unstructured data (such as text documents and social
media posts), and semi-structured data (like XML and JSON files).
3. Velocity: Big data is often generated at high speed and needs to be
processed quickly to extract valuable insights in a timely manner. For
example, data streaming from sensors or social media feeds.
4. Veracity: Big data can have quality and accuracy issues due to its
diverse sources and complex nature. Data analytics techniques are
needed to clean, validate, and preprocess the data for analysis.
5. Value: Despite the challenges, big data contains valuable information
that can lead to insights, improvements in decision-making, and
competitive advantages for businesses and organizations.

Now, let's talk about the need for data analytics in R programming,
especially concerning big data:

1. Handling Large Datasets: R provides various packages and tools
(e.g., dplyr, data.table, sqldf) that allow users to efficiently handle and
manipulate large datasets, making it suitable for big data analytics.
2. Statistical Analysis: R is widely used for statistical analysis, making
it a valuable tool for exploring and analyzing large datasets to uncover
patterns, trends, correlations, and anomalies.
3. Machine Learning: R offers numerous libraries (e.g., caret,
randomForest, xgboost) for machine learning tasks, enabling users to
build predictive models, clustering algorithms, and other advanced
analytics solutions on big data.
4. Visualization: R has powerful visualization libraries like ggplot2 and
plotly (whose ggplotly() function turns ggplot2 charts into interactive
ones) that help in creating insightful visualizations and dashboards to
communicate findings from big data analysis effectively.
5. Integration: R can be integrated with big data technologies such as
Apache Hadoop, Spark, and databases like MySQL, PostgreSQL, and
NoSQL databases, allowing seamless data access and analysis across
different platforms.
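
As a rough illustration of the large-dataset point above, the following base R
sketch computes a mean by reading a file in chunks instead of loading it all at
once. The temporary file, the value column, and the chunk size are assumptions
made up for this example; the packages named above provide faster tools that
are not shown here.

```r
# Write a small CSV to a temporary file to stand in for a large dataset
# (hypothetical data: the integers 1 to 1000 in a column named "value").
path <- tempfile(fileext = ".csv")
write.csv(data.frame(value = 1:1000), path, row.names = FALSE)

con <- file(path, open = "r")
header <- readLines(con, n = 1)   # read and discard the header line

total <- 0
count <- 0
chunk_size <- 100                 # number of lines held in memory at a time

repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break   # end of file reached
  values <- as.numeric(lines)
  total <- total + sum(values)
  count <- count + length(values)
}
close(con)

chunked_mean <- total / count
print(chunked_mean)
```

Only running totals are kept between chunks, so memory use depends on the chunk
size rather than on the size of the file.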

2. Explain the mean, median, standard deviation, variance, and correlation
functions in R programming.

Mean

The mean is calculated by dividing the sum of the values by the number of
values in a data series.

The function mean() is used to calculate this in R.

Syntax

mean(x, trim = 0, na.rm = FALSE, ...)

# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x)
print(result.mean)
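
The trim and na.rm arguments in the syntax above are easiest to understand by
trying them. The sketch below reuses the same vector; the variable names
trimmed_mean and x_with_na are chosen only for this illustration.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# trim = 0.1 drops 10% of the values (here, one value) from each end
# of the sorted vector before averaging, so -21 and 54 are excluded.
trimmed_mean <- mean(x, trim = 0.1)
print(trimmed_mean)                    # 6.15

# A single missing value makes the plain mean NA ...
x_with_na <- c(x, NA)
print(mean(x_with_na))                 # NA

# ... unless na.rm = TRUE removes missing values first.
print(mean(x_with_na, na.rm = TRUE))   # 8.22
```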

Median

The middle value of a sorted data series is called the median. When the series
has an even number of values, the median is the mean of the two middle values.

The median() function is used in R to calculate this value.

Syntax

median(x, na.rm = FALSE)

# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find the median.
median.result <- median(x)
print(median.result)
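
Because this vector has ten values, there is no single middle element. As a
small check, the sketch below sorts the vector and averages the two middle
values by hand; sorted_x and manual_median are names introduced only for this
example.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Sort the values: -21, -5, 2, 3, 4.2, 7, 8, 12, 18, 54
sorted_x <- sort(x)
n <- length(sorted_x)

# With an even count, average the two middle values (4.2 and 7).
manual_median <- (sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2
print(manual_median)   # 5.6

# The built-in function gives the same result.
print(median(x))       # 5.6
```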

Standard Deviation:

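Standard deviation measures the amount of variation or dispersion in a set of
values. In R, the sd() function computes the sample standard deviation, which
uses n - 1 (not n) in the denominator.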

# Create a sample vector

data <- c(10, 20, 30, 40, 50)
sd_result <- sd(data)

sd_result
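
To see what sd() computes, the sketch below works through the sample standard
deviation formula step by step on the same vector. The intermediate names
(squared_deviations, manual_sd) are introduced only for this illustration.

```r
data <- c(10, 20, 30, 40, 50)
n <- length(data)

# Squared distance of each value from the mean (30): 400, 100, 0, 100, 400
squared_deviations <- (data - mean(data))^2

# Sample standard deviation: divide by n - 1, then take the square root.
manual_sd <- sqrt(sum(squared_deviations) / (n - 1))
print(manual_sd)   # 15.81139

# sd() returns the same value.
print(sd(data))
```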

Variance:
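Variance is a measure of how spread out the values in a dataset are. The var()
function computes the sample variance, which is the square of the sample
standard deviation.

# Calculate variance (reuses the data vector created above)
variance_result <- var(data)
variance_result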
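
The following sketch checks that var() and sd() agree and shows one way to get
the population variance, which divides by n instead of n - 1. Base R has no
dedicated population-variance function, so the rescaling below is an
assumption about how a reader might compute it, and the variable names are
illustrative.

```r
data <- c(10, 20, 30, 40, 50)
n <- length(data)

# Sample variance (denominator n - 1).
sample_variance <- var(data)
print(sample_variance)       # 250

# It equals the square of the sample standard deviation.
print(sd(data)^2)

# Population variance (denominator n), obtained by rescaling.
population_variance <- sample_variance * (n - 1) / n
print(population_variance)   # 200
```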

Correlation:
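Correlation measures the strength and direction of the linear relationship
between two variables. The coefficient ranges from -1 (perfect negative
relationship) to 1 (perfect positive relationship), and cor() computes the
Pearson coefficient by default.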

# Create two sample vectors

x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9, 11)
correlation_result <- cor(x, y)
correlation_result
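
In the example above, y is exactly 2x + 1, so the correlation is 1. The sketch
below reproduces that value from the covariance and standard deviations, then
compares the Pearson and Spearman methods on a curved relationship. The vector
z and the result names are assumptions added for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9, 11)   # y = 2x + 1, a perfect linear relationship

# Pearson correlation is the covariance scaled by both standard deviations.
manual_r <- cov(x, y) / (sd(x) * sd(y))
print(manual_r)          # 1 (up to floating-point rounding)

# Reversing the direction of y flips the sign.
print(cor(x, -y))        # -1 (up to floating-point rounding)

# z grows with x but not linearly (z = x^2).
z <- c(1, 4, 9, 16, 25)
pearson_xz <- cor(x, z)                        # below 1: not a straight line
spearman_xz <- cor(x, z, method = "spearman")  # 1: the rank order is identical
print(pearson_xz)
print(spearman_xz)
```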

3. Explain the basic analysis techniques chi-square test and t-test.

The chi-square test is a statistical method for determining whether two
categorical variables are significantly associated with each other. Both
variables should come from the same population, and both should be
categorical, such as Yes/No, Male/Female, or Red/Green.

For example, we can build a data set with observations on people's ice-cream
buying patterns and test whether a person's gender is associated with the
ice-cream flavor they prefer. If an association is found, we can plan an
appropriate stock of flavors based on the gender mix of the people visiting.

Syntax:

The function used for performing a chi-square test is chisq.test().

chisq.test(data)

Here, data is a contingency table of counts, given as a matrix or a table
object.
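
When the data arrive as one row per person rather than as counts, table() can
build the contingency table first. The sketch below does this for the
ice-cream scenario; the gender and flavor values are invented purely for
illustration and are not real survey data.

```r
# Hypothetical observations: 50 male and 50 female customers.
gender <- rep(c("Male", "Female"), times = c(50, 50))
flavor <- c(
  rep(c("Chocolate", "Vanilla"), times = c(30, 20)),  # choices of the 50 males
  rep(c("Chocolate", "Vanilla"), times = c(15, 35))   # choices of the 50 females
)

# Cross-tabulate the two categorical variables into counts.
flavor_table <- table(gender, flavor)
print(flavor_table)

# Pass the contingency table to chisq.test().
flavor_test <- chisq.test(flavor_table)
print(flavor_test)
```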

Example:

observed <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)

colnames(observed) <- c("Group A", "Group B")

rownames(observed) <- c("Category 1", "Category 2")

chi_square_result <- chisq.test(observed)

print(chi_square_result)
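
The object returned by chisq.test() is a list whose parts can be read
individually. The sketch below pulls out the test statistic, the p-value, and
the expected counts for the same matrix. For a 2 x 2 table, chisq.test()
applies Yates' continuity correction by default, so the last lines rerun the
test with correct = FALSE for comparison. The variable names are illustrative.

```r
observed <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)

result <- chisq.test(observed)
print(result$statistic)   # chi-square statistic (Yates-corrected for 2 x 2)
print(result$p.value)     # probability of a statistic this large by chance

# Counts expected if rows and columns were independent:
# (row total * column total) / grand total, i.e. 12, 18, 28, 42.
print(result$expected)

# The same test without the continuity correction.
uncorrected <- chisq.test(observed, correct = FALSE)
print(uncorrected$statistic)
```

A p-value above the chosen significance level (commonly 0.05) means the data
do not provide evidence of an association between the two variables.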

T-test:
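In R, the t.test() function performs a t-test, a statistical test used to
determine whether there is a significant difference between the means of two
groups. By default, t.test() runs Welch's two-sample test, which does not
assume that the two groups have equal variances.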
# Generate example data
group1 <- c(25, 30, 35, 40, 45)
group2 <- c(20, 22, 25, 28, 30)
t_test_result <- t.test(group1, group2)
print(t_test_result)
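
As with the chi-square test, the result of t.test() is a list that can be
inspected piece by piece. The sketch below reads the group means, p-value, and
confidence interval from the default Welch test, then runs the classic pooled
version with var.equal = TRUE. The names welch and pooled are chosen only for
this example.

```r
group1 <- c(25, 30, 35, 40, 45)
group2 <- c(20, 22, 25, 28, 30)

# Default: Welch's two-sample t-test (unequal variances allowed).
welch <- t.test(group1, group2)
print(welch$estimate)    # sample means of the two groups: 35 and 25
print(welch$statistic)   # t statistic
print(welch$p.value)     # two-sided p-value
print(welch$conf.int)    # 95% confidence interval for the difference in means

# Student's t-test, which assumes both groups share the same variance.
pooled <- t.test(group1, group2, var.equal = TRUE)
print(pooled$parameter)  # degrees of freedom: 5 + 5 - 2 = 8
```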
