Data Analytics Using R: Unit-3
1. Volume: Big data involves a large volume of data that exceeds the
processing capacity of conventional database systems. This could be
terabytes, petabytes, or even larger datasets.
2. Variety: Big data comes in various formats, including structured data
(like databases), unstructured data (such as text documents and social
media posts), and semi-structured data (like XML and JSON files).
3. Velocity: Big data is often generated at high speed and needs to be
processed quickly to extract valuable insights in a timely manner.
Examples include data streaming from sensors or social media feeds.
4. Veracity: Big data can have quality and accuracy issues due to its
diverse sources and complex nature. Data analytics techniques are
needed to clean, validate, and preprocess the data for analysis.
5. Value: Despite the challenges, big data contains valuable information
that can lead to insights, improvements in decision-making, and
competitive advantages for businesses and organizations.
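The cleaning and validation step mentioned under Veracity can be sketched
in base R. This is only an illustration: the data frame raw, its columns,
the -999 sentinel, and the accepted temperature range are all hypothetical
and not taken from these notes.

```r
# Hypothetical sensor readings, invented for illustration
raw <- data.frame(
  sensor_id   = c("a", "b", "c", "d", "e"),
  temperature = c(21.5, NA, 22.1, -999, 23.0),
  stringsAsFactors = FALSE
)

# Clean: drop rows whose reading is missing
complete_rows <- raw[!is.na(raw$temperature), ]

# Validate: keep only readings inside an assumed plausible range,
# which also discards the -999 sentinel value
in_range <- complete_rows$temperature > -50 & complete_rows$temperature < 60
clean <- complete_rows[in_range, ]

# Preprocess: standardise the readings to mean 0 and standard deviation 1
clean$temperature_scaled <-
  (clean$temperature - mean(clean$temperature)) / sd(clean$temperature)

print(clean)
```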
Now, let's talk about the need for data analytics in R programming,
especially concerning big data:
Mean
Madhusudanacharyulu Padakandla
UNIT-3
It is calculated by taking the sum of the values and dividing by the number of
values in a data series.
Syntax
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)
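The definition can be checked directly against mean(). The sketch below
reuses the vector x from the example; the names manual_mean and x_with_na
are introduced here for illustration only. It also shows how a missing
value affects the result.

```r
# Same vector as in the example
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Mean by definition: sum of the values divided by their count
manual_mean <- sum(x) / length(x)
print(manual_mean)   # [1] 8.22

# A missing value makes mean() return NA unless na.rm = TRUE is set
x_with_na <- c(x, NA)
mean_with_na <- mean(x_with_na)
mean_without_na <- mean(x_with_na, na.rm = TRUE)
print(mean_with_na)
print(mean_without_na)
```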
Median
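The median is the middle value of the sorted data; when the number of
values is even, it is the average of the two middle values. A minimal
sketch, assuming the same vector x as in the Mean example (the names
result.median, sorted_x and manual_median are illustrative):

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Find Median with the built-in function
result.median <- median(x)
print(result.median)   # [1] 5.6

# Manual check for this even-length vector: average the two middle values
sorted_x <- sort(x)
n <- length(sorted_x)
manual_median <- (sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2
print(manual_median)
```

Unlike the mean, the median is not pulled upward by the single large
value 54 in this vector.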
Standard Deviation:
sd_result
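Standard deviation is the square root of the variance and is expressed in
the same units as the data. A minimal sketch of how sd_result might be
computed, assuming an illustrative numeric vector data whose values are
not from the original notes:

```r
# Illustrative values (an assumption, not from the notes)
data <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Calculate the sample standard deviation with the built-in function
sd_result <- sd(data)

# Manual check: square root of the sum of squared deviations over (n - 1)
n <- length(data)
manual_sd <- sqrt(sum((data - mean(data))^2) / (n - 1))

sd_result
manual_sd
```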
Variance:
Variance is a measure of how spread out the values in a dataset are.
# Example data (illustrative values)
data <- c(2, 4, 4, 4, 5, 5, 7, 9)
# Calculate variance
variance_result <- var(data)
variance_result
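R's var() returns the sample variance, which divides the sum of squared
deviations by n - 1 rather than n. The sketch below, on an illustrative
vector that is not from the original notes, compares it with the
population variance and with the squared standard deviation:

```r
# Illustrative values (an assumption, not from the notes)
data <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(data)

# Sample variance, as returned by var(): divides by (n - 1)
sample_variance <- var(data)

# Population variance: divides by n instead
population_variance <- sum((data - mean(data))^2) / n

# The variance is the square of the standard deviation
sd_squared <- sd(data)^2

print(sample_variance)
print(population_variance)   # [1] 4
print(sd_squared)
```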
Correlation:
Correlation measures the strength and direction of the linear relationship
between two variables.
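In R, the Pearson correlation coefficient is computed with cor() and lies
between -1 (perfect negative linear relationship) and 1 (perfect positive
linear relationship). A minimal sketch with invented numeric vectors; none
of these values come from the notes:

```r
# Illustrative data (an assumption, not from the notes)
x <- c(1, 2, 3, 4, 5)
y_up    <- 2 * x + 1                      # exact increasing line
y_down  <- 10 - 3 * x                     # exact decreasing line
y_noisy <- c(2.1, 3.9, 6.2, 7.8, 10.1)    # roughly linear, with noise

r_up    <- cor(x, y_up)      # 1
r_down  <- cor(x, y_down)    # -1
r_noisy <- cor(x, y_noisy)   # close to, but below, 1

print(c(r_up, r_down, r_noisy))

# cor.test() adds a significance test for the correlation
correlation_test <- cor.test(x, y_noisy)
print(correlation_test$p.value)
```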
Chi-Square Test:
For categorical variables, the association between them is tested with a
chi-square test rather than a correlation coefficient. For example, we can
build a data set with observations on people's ice-cream buying patterns and
test whether a person's gender is associated with the flavor of ice-cream they
prefer. If an association is found, we can plan an appropriate stock of flavors
from the number of visitors of each gender.
Syntax:
Example:
print(chi_square_result)
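A minimal sketch of the ice-cream example using chisq.test() on a
contingency table. The counts and the name flavor_counts are hypothetical,
invented only to make the example runnable:

```r
# Hypothetical counts of preferred flavor by gender (invented for illustration)
flavor_counts <- matrix(
  c(30, 10, 20,
    15, 25, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Female", "Male"),
    flavor = c("Chocolate", "Vanilla", "Strawberry")
  )
)

# Test whether gender and preferred flavor are independent
chi_square_result <- chisq.test(flavor_counts)
print(chi_square_result)

# Individual parts of the result can be extracted by name
chi_square_result$statistic   # chi-square statistic
chi_square_result$parameter   # degrees of freedom: (2 - 1) * (3 - 1) = 2
chi_square_result$p.value     # a small p-value suggests an association
```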
T-test:
In R, the t.test() function is used to perform a t-test, which is a statistical test used
to determine if there is a significant difference between the means of two groups.
# Generate example data
group1 <- c(25, 30, 35, 40, 45)
group2 <- c(20, 22, 25, 28, 30)
t_test_result <- t.test(group1, group2)
print(t_test_result)
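By default, t.test() runs Welch's two-sample test, which does not assume
equal variances in the two groups; passing var.equal = TRUE gives the
classical pooled-variance test instead. The sketch below reuses group1 and
group2 from the example; the names welch_result and pooled_result are
illustrative.

```r
group1 <- c(25, 30, 35, 40, 45)
group2 <- c(20, 22, 25, 28, 30)

# Default: Welch two-sample t-test (unequal variances allowed)
welch_result <- t.test(group1, group2)

# Classical Student's t-test with a pooled variance estimate
pooled_result <- t.test(group1, group2, var.equal = TRUE)

# Extract individual parts of the result
welch_result$estimate     # the two group means: 35 and 25
welch_result$statistic    # t statistic
welch_result$p.value      # two-sided p-value
welch_result$conf.int     # 95% confidence interval for the mean difference

# With equal group sizes the t statistic is the same in both tests;
# only the degrees of freedom differ
welch_result$parameter
pooled_result$parameter   # 5 + 5 - 2 = 8
```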