UNIT II - DA USING R
In R, "aggregating" and "grouping" refer to processes that summarize and transform data
based on certain criteria. The aggregate function and the dplyr package are commonly used
for these tasks.
aggregate Function
The aggregate function in R is used to compute summary statistics of data, such as sums,
means, and more, for subsets of the data grouped by one or more variables.
Syntax
aggregate(x, by, FUN)
The iris dataset is a built-in dataset in R that contains measurements of different species of
iris flowers.
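A minimal example, using the built-in iris data, that produces the output below (the object name agg_mean is illustrative):

```r
# Load the built-in iris dataset
data(iris)

# Mean Sepal.Length for each Species using base R's aggregate
agg_mean <- aggregate(iris$Sepal.Length,
                      by = list(Species = iris$Species),
                      FUN = mean)
print(agg_mean)
```

Because the target column is passed without a name, the summary column in the result is simply labelled x.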
Output
Species x
1 setosa 5.006
2 versicolor 5.936
3 virginica 6.588
The dplyr package provides more readable and powerful functions for grouping and
aggregation.
Syntax
library(dplyr)
data %>%
  group_by(grouping_variable) %>%
  summarise(
    summary_variable1 = FUN1(target_variable),
    summary_variable2 = FUN2(target_variable)
  )
NOTE:
The %>% operator is the pipe operator from the magrittr package (re-exported by dplyr). It passes the object on its left as the first argument to the function on its right, so the dataset flows through each step of the chain.
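The output below can be produced with a chain like the following (a sketch; the summary column names are chosen to match the output):

```r
library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(
    Mean_Sepal_Length = mean(Sepal.Length),
    Mean_Sepal_Width  = mean(Sepal.Width)
  )
```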
Output
# A tibble: 3 × 3
Species Mean_Sepal_Length Mean_Sepal_Width
<fct> <dbl> <dbl>
1 setosa 5.01 3.43
2 versicolor 5.94 2.77
3 virginica 6.59 2.97
Explanation
These examples demonstrate how to perform aggregation and grouping in R using both base
R functions and the dplyr package. The dplyr package is often preferred for its readability
and ease of use.
FUN:
In the context of the aggregate function in R, FUN stands for "function." It specifies the function to be applied to the grouped subsets of data to calculate the summary statistic.
When you use the aggregate function, you're typically interested in summarizing the data in some way. The FUN argument allows you to specify what kind of summary you want, such as the mean, sum, maximum, minimum, etc. Without specifying FUN, the aggregate function wouldn't know how to summarize the data.
Syntax of aggregate
aggregate(x, by, FUN)
Here’s an example using the iris dataset, where we calculate the mean of Sepal.Length for
each species:
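The call (a sketch; agg_mean is an illustrative name):

```r
# Mean Sepal.Length grouped by Species
agg_mean <- aggregate(iris$Sepal.Length,
                      by = list(Species = iris$Species),
                      FUN = mean)
print(agg_mean)
```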
Output
Species x
1 setosa 5.006
2 versicolor 5.936
3 virginica 6.588
Explanation
You can use various functions with FUN to get different types of summaries. Here are some
examples:
Sum: FUN = sum
Maximum: FUN = max
Minimum: FUN = min
Standard Deviation: FUN = sd
By changing the function specified in FUN, you can easily compute different summary
statistics for your grouped data.
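For instance (the object names agg_sum, agg_max, and so on are illustrative):

```r
# Sum of Sepal.Length per species
agg_sum <- aggregate(iris$Sepal.Length, by = list(Species = iris$Species), FUN = sum)

# Maximum per species
agg_max <- aggregate(iris$Sepal.Length, by = list(Species = iris$Species), FUN = max)
print(agg_max)

# Minimum per species
agg_min <- aggregate(iris$Sepal.Length, by = list(Species = iris$Species), FUN = min)

# Standard deviation per species
agg_sd <- aggregate(iris$Sepal.Length, by = list(Species = iris$Species), FUN = sd)
```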
Step-by-Step Analysis
The iris dataset is a built-in dataset in R. It contains 150 observations of iris flowers, with
measurements for sepal length, sepal width, petal length, and petal width, along with the
species of the iris flower.
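Loading the data and inspecting the first six rows produces the output below:

```r
# Load the built-in dataset and preview it
data(iris)
head(iris)
```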
Output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Before performing any aggregation, it’s important to understand the data by exploring its
basic properties.
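A quick way to do this is with summary, which gives the output below:

```r
# Five-number summary plus mean for numeric columns,
# and counts for the factor column Species
summary(iris)
```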
Output
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
This output gives a quick overview of the dataset, including the minimum, first quartile,
median, mean, third quartile, and maximum values for each numerical variable, as well as the
counts for each species.
We can use the aggregate function to calculate summary statistics for different groups
within the data. Here, we'll calculate the mean sepal length and sepal width for each species.
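Using the formula interface of aggregate, one call summarizes every numeric column at once (agg_all is an illustrative name):

```r
# Mean of every numeric column, grouped by Species
agg_all <- aggregate(. ~ Species, data = iris, FUN = mean, na.rm = TRUE)
print(agg_all)
```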
Output
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
Explanation
aggregate(. ~ Species, data = iris, FUN = mean, na.rm = TRUE): This line
uses the aggregate function to calculate the mean of all numerical variables
(Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width) grouped by
Species. The na.rm = TRUE argument ensures that any missing values are ignored in
the calculation.
The dplyr package provides more readable and flexible functions for data manipulation and
aggregation. We will achieve the same aggregation using dplyr.
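The equivalent dplyr chain (a sketch; the Mean_* column names are chosen to match the output):

```r
library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(
    Mean_Sepal_Length = mean(Sepal.Length),
    Mean_Sepal_Width  = mean(Sepal.Width),
    Mean_Petal_Length = mean(Petal.Length),
    Mean_Petal_Width  = mean(Petal.Width)
  )
```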
Output
# A tibble: 3 × 5
  Species    Mean_Sepal_Length Mean_Sepal_Width Mean_Petal_Length Mean_Petal_Width
  <fct>                  <dbl>            <dbl>             <dbl>            <dbl>
1 setosa                  5.01             3.43              1.46            0.246
2 versicolor              5.94             2.77              4.26            1.33
3 virginica               6.59             2.97              5.55            2.03
Explanation
group_by(Species) splits the rows into one group per species, and summarise computes the mean of each measurement within every group, returning one row per species.
Conclusion
We have performed a simple analysis of the iris dataset in R, including loading the data,
conducting basic EDA, and summarizing the data using both the aggregate function and the
dplyr package. This analysis helps us understand the average measurements of different iris
species, providing insights into the dataset's structure.
Let's walk through a simple data analysis example using the iris dataset in R. The iris
dataset contains measurements of sepal length, sepal width, petal length, and petal width for
three species of iris flowers.
1. Data Collection
2. Data Cleaning
If there are missing values, you could handle them like this:
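One possible approach (a sketch; iris contains no missing values, so this is purely illustrative):

```r
# Count missing values in each column
colSums(is.na(iris))

# Option 1: drop rows that contain any NA
iris_clean <- na.omit(iris)

# Option 2: replace NAs in a numeric column with the column mean
iris$Sepal.Length[is.na(iris$Sepal.Length)] <-
  mean(iris$Sepal.Length, na.rm = TRUE)
```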
3. Data Exploration
# Summary statistics
summary(iris)
4. Data Transformation
Group the data by species and calculate the mean of each measurement.
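For example, with dplyr (a sketch; across and where require dplyr 1.0 or later):

```r
library(dplyr)

# Mean of every numeric column, one row per species
iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), mean))
```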
5. Data Modeling
For this simple example, we won't apply a complex model, but we could proceed with
various statistical or machine learning models to analyze relationships and patterns further.
6. Data Interpretation
7. Reporting
Conclusion
Analysis involves breaking down data into meaningful patterns and insights through
systematic steps. By following these steps, you can gain a deeper understanding of your data
and make informed decisions based on your findings.
NOTE:
The attributes such as minimum, first quartile, median, mean, third quartile, and maximum
values, along with counts for each category, are fundamental descriptive statistics. They
provide essential insights into the distribution and central tendency of numerical data. Here
are the practical uses of each attribute:
1. Minimum Value
o Definition: The smallest value in the dataset.
o Use: Identifies the lower bound of the data range. It's useful for understanding
the lowest extreme and for detecting outliers or unusual values.
2. First Quartile (Q1)
o Definition: The value below which 25% of the data falls.
o Use: Helps to understand the lower 25% of the data distribution. It's useful in
identifying the spread of the lower portion of the dataset and is also a
component of the interquartile range (IQR), which measures statistical
dispersion.
3. Median (Q2)
o Definition: The middle value of the dataset when sorted in ascending order.
o Use: Represents the central tendency of the data, less affected by outliers and
skewed data compared to the mean. It's used to understand the typical value in
the dataset.
4. Mean
o Definition: The average of all data points.
o Use: Represents the central tendency but can be influenced by outliers. It's
useful for calculating the expected value and in various statistical analyses.
5. Third Quartile (Q3)
o Definition: The value below which 75% of the data falls.
o Use: Helps to understand the upper 25% of the data distribution. Like Q1, it's
a component of the interquartile range (IQR).
6. Maximum Value
o Definition: The largest value in the dataset.
o Use: Identifies the upper bound of the data range. It's useful for understanding
the highest extreme and for detecting outliers or unusual values.
7. Counts for Each Category
o Definition: The number of occurrences of each category (e.g., species in the
iris dataset).
o Use: Useful for understanding the distribution of categorical data, comparing
the frequency of different categories, and ensuring that each category is
adequately represented in the analysis.
Practical Examples
Let's consider practical scenarios where these descriptive statistics are useful:
1. Business Analytics
Sales Analysis: Understanding the minimum, maximum, and quartiles of daily sales
can help a business manage inventory, identify sales trends, and detect anomalies.
Customer Feedback: Median customer satisfaction scores provide a robust measure
of central tendency, helping businesses understand typical customer sentiment without
being skewed by outliers.
2. Healthcare
Patient Data: Analyzing the mean, median, and quartiles of patient wait times can
help in resource allocation and improving service efficiency.
Blood Pressure Levels: Understanding the distribution of blood pressure readings
(minimum, Q1, median, Q3, maximum) can aid in identifying at-risk patients and
tailoring medical interventions.
3. Education
Student Scores: Teachers can use the quartiles and median scores to understand the
distribution of student performance and identify students who may need additional
support.
Class Participation: Counts of class participation by different groups (e.g., gender,
grade level) help in assessing engagement and inclusivity.
4. Real Estate
Property Prices: Real estate agents can use the descriptive statistics of property
prices in a neighborhood to advise clients on buying and selling decisions.
Rental Rates: Understanding the distribution of rental rates helps in setting
competitive prices and identifying market trends.
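Returning to the iris dataset, the output below can be generated with a dplyr chain such as the following (a sketch; the column names are chosen to match the output):

```r
library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(
    Min_Sepal_Length    = min(Sepal.Length),
    Q1_Sepal_Length     = quantile(Sepal.Length, 0.25),
    Median_Sepal_Length = median(Sepal.Length),
    Mean_Sepal_Length   = mean(Sepal.Length),
    Q3_Sepal_Length     = quantile(Sepal.Length, 0.75),
    Max_Sepal_Length    = max(Sepal.Length),
    Count               = n()
  )
```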
Output
# A tibble: 3 × 8
  Species    Min_Sepal_Length Q1_Sepal_Length Median_Sepal_Length Mean_Sepal_Length Q3_Sepal_Length Max_Sepal_Length Count
  <fct>                 <dbl>           <dbl>               <dbl>             <dbl>           <dbl>            <dbl> <int>
1 setosa                  4.3             4.8                 5.0              5.01             5.2              5.8    50
2 versicolor              4.9             5.6                 5.9              5.94             6.3              7.0    50
3 virginica               4.9             6.2                 6.5              6.59             6.9              7.9    50
Methods for reading Data
Reading data refers to the process of importing or loading data from various sources
into a programming environment or software for analysis. In R, there are multiple methods
for reading data depending on the source and format of the data. Below are the methods,
along with their meanings and definitions:
readr::read_csv()
library(readr)
data <- read_csv("path/to/your/file.csv")
readxl::read_excel()
library(readxl)
data <- read_excel("path/to/your/file.xlsx", sheet = "Sheet1")
readr::read_table()
library(readr)
data <- read_table("path/to/your/file.txt")
jsonlite::fromJSON()
library(jsonlite)
data <- fromJSON("path/to/your/file.json")
DBI::dbGetQuery()
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "path/to/your/database.sqlite")
data <- dbGetQuery(con, "SELECT * FROM your_table")
dbDisconnect(con)
httr::GET()
library(httr)
response <- GET("https://api.example.com/data")
data <- content(response, "parsed")
haven::read_sas()
library(haven)
data_sas <- read_sas("path/to/your/file.sas7bdat")
Summary
Each of these methods allows you to import data into R from different file formats and
sources. Understanding these methods and their specific functions helps you efficiently load
and manage data for analysis.
R can connect to several types of databases, such as SQLite, MySQL, PostgreSQL, SQL
Server, and Oracle, using the DBI package along with database-specific packages like
RSQLite, RMySQL, RPostgres, odbc, etc.
Once the data is loaded into R, you can perform various data manipulation, analysis, and
visualization tasks using packages such as dplyr, ggplot2, and others.
R can be integrated with various BI systems for enhanced reporting and dashboarding
capabilities. This integration allows for complex analytics within BI platforms.
Power BI supports R scripts for data transformation and visualization. Here's an example of
using R in Power BI:
# 'dataset' is the data frame Power BI passes to the R script
library(ggplot2)

# Create a plot
ggplot(dataset, aes(x = category_column, y = numeric_column)) +
  geom_point() +
  labs(title = "Scatter Plot", x = "Category", y = "Value")
Tableau supports R integration through the Rserve package. Here’s how to set it up:
1. Start Rserve in R:
install.packages("Rserve")
library(Rserve)
Rserve()
2. Connect Tableau to R:
o In Tableau, go to "Help" > "Settings and Performance" > "Manage External Service
Connection".
o Choose "Rserve" and enter the server details.
3. Use R Scripts in Tableau:
o Create calculated fields using R scripts.
o Example: SCRIPT_REAL("mean(.arg1)", SUM([numeric_column]))
Example Workflow
Combining these steps into a workflow, you can connect to a database, perform data
manipulation and visualization in R, and then integrate the results into a BI system for
reporting.
1. Connect to Database:
library(DBI)
library(RMySQL)
2. Data Manipulation:
library(dplyr)
3. Data Visualization:
library(ggplot2)
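The three steps above can be sketched as a single script (a sketch only; the connection details, table name your_table, and the column names category_column and numeric_column are placeholders):

```r
library(DBI)
library(RMySQL)
library(dplyr)
library(ggplot2)

# 1. Connect to the database and pull a table
con  <- dbConnect(RMySQL::MySQL(), dbname = "your_db", host = "localhost",
                  user = "user", password = "password")
data <- dbGetQuery(con, "SELECT * FROM your_table")
dbDisconnect(con)

# 2. Summarise with dplyr
summary_data <- data %>%
  group_by(category_column) %>%
  summarise(mean_value = mean(numeric_column, na.rm = TRUE))

# 3. Visualise with ggplot2; the chart or the summary table
#    can then be surfaced in the BI tool
ggplot(summary_data, aes(x = category_column, y = mean_value)) +
  geom_col() +
  labs(title = "Mean Value by Category")
```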
Conclusion
Using R with databases and business intelligence systems enhances the ability to
perform advanced analytics and create insightful visualizations.
This integration allows for efficient data management, complex statistical analysis,
and seamless reporting, making it a powerful combination for data-driven decision-making.