(Tutorial) The 10 Most Important Packages in R For Data Science - DataCamp
Olivia Smith
August 30th, 2020
R PROGRAMMING
R is one of the most popular languages for data science, and it offers many packages and
libraries for different tasks. For example, dplyr and data.table handle data manipulation,
ggplot2 handles data visualization, and tidyr handles data cleaning. There is also Shiny
for creating web applications and knitr for report generation, and finally mlr3, xgboost,
and caret are used in machine learning.
1. ggplot2
ggplot2 is a popular data visualization library based on the 'Grammar of Graphics'. It can
build graphs with one, two, or three variables, using both categorical and numerical data.
Grouping can also be expressed through symbol, size, color, etc. Interactive graphics can
be made with the help of plotly, while 3D plots can be made with plot3D.
https://www.datacamp.com/community/tutorials/top-ten-most-important-packages-in-r-for-data-science 1/8
You can easily install the package ggplot2 in R's console as seen below:
install.packages("ggplot2")
You can easily load the package ggplot2 by using the following syntax:
library(ggplot2)
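As a minimal sketch of the grouping described above, the example below plots two numerical variables and groups the points by color and size. The choice of the built-in mtcars dataset and of its columns is an assumption made for illustration; it is not part of the original tutorial.

```r
library(ggplot2)

# Scatter plot of two numerical variables from the built-in mtcars data.
# Color groups the points by a categorical variable (number of cylinders),
# and size maps a third numerical variable (horsepower).
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders",
    size = "Horsepower"
  )

# Draw the plot on the active graphics device.
print(p)
```

Each aesthetic inside aes() maps a column to a visual property, so changing the grouping only requires changing the mapping.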
The following tutorials on DataCamp provide more detailed knowledge about ggplot2.
2. data.table
data.table is one of the fastest packages for manipulating very large amounts of data. It
is mostly used in health care for genomic data and in fields like business for predictive
analytics. It handles data sizes ranging from 10 GB to 100 GB.
You can easily install the package data.table in R's console as seen below:
install.packages("data.table")
library(data.table)
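A short sketch of data.table's dt[i, j, by] syntax may help here. It uses the small built-in mtcars dataset as a stand-in for a large table; the dataset and the column names kpl, mean_mpg, and n are assumptions chosen for illustration.

```r
library(data.table)

# Convert a data frame to a data.table, keeping the row names as a column.
dt <- as.data.table(mtcars, keep.rownames = "model")

# Add a column by reference (no copy of the table is made).
dt[, kpl := mpg * 0.425144]

# General form dt[i, j, by]: filter rows, compute summaries, group by a column.
res <- dt[hp > 100, .(mean_mpg = mean(mpg), n = .N), by = cyl]
print(res)
```

The same three-part form scales to much larger tables, which is where the package's speed matters most.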
3. dplyr
dplyr is a package for data manipulation that provides a set of verbs such as select(),
arrange(), filter(), summarise(), and mutate(). It can also work with computational
backends through packages like dbplyr, sparklyr, and dtplyr.
You can install dplyr through the tidyverse package, which includes dplyr, or install
dplyr on its own:
install.packages("tidyverse")
install.packages("dplyr")
library(dplyr)
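The verbs listed above are usually chained into a pipeline. The following is a minimal sketch, assuming the built-in mtcars dataset; the derived column names kpl, mean_kpl, and n are hypothetical.

```r
library(dplyr)

by_cyl <- mtcars %>%
  filter(hp > 100) %>%                       # keep rows matching a condition
  select(mpg, cyl, wt) %>%                   # keep only some columns
  mutate(kpl = mpg * 0.425144) %>%           # add a derived column
  group_by(cyl) %>%                          # group rows by a column
  summarise(mean_kpl = mean(kpl), n = n()) %>%  # one summary row per group
  arrange(desc(mean_kpl))                    # sort the result

print(by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what makes the steps composable.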
The following tutorial and course on DataCamp provide detailed knowledge of dplyr.
4. tidyr
tidyr helps to create tidy data. A significant amount of work in data analysis goes into
cleaning and tidying the data. Tidy data is a dataset in which every cell holds a single
value, every row is an observation, and every column is a variable.
install.packages("tidyr")
library(tidyr)
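To illustrate the tidy data definition above, the sketch below reshapes a small untidy table, in which years are spread across columns, into one row per observation. The table and its values are made up for this example.

```r
library(tidyr)

# An untidy table: the year variable is spread across two columns.
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 25),
  check.names = FALSE
)

# Gather the year columns so that each row is one country-year observation.
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "cases"
)

print(long)
```

After reshaping, year and cases are each a single column, so every column is a variable and every row is an observation.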
The following tutorial on DataCamp provides detailed knowledge of tidyr:
Cleaning Data in R
5. Shiny
Shiny can be used to build web applications without requiring JavaScript. It can be
combined with htmlwidgets, JavaScript actions, and CSS themes for extended features. It
can be used to build dashboards as well as standalone web applications.
install.packages("shiny")
library(shiny)
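A Shiny application pairs a user interface definition with a server function. The following is a minimal sketch under assumed names: the input id "bins", the output id "hist", and the use of the built-in faithful dataset are all illustrative choices.

```r
library(shiny)

# User interface: a slider input and a placeholder for a plot.
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider value changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(
      faithful$waiting,
      breaks = input$bins,
      main = "Waiting time between eruptions",
      xlab = "Minutes"
    )
  })
}

# Combine both parts into an app object.
app <- shinyApp(ui = ui, server = server)

# Only launch the app in an interactive session, since runApp() blocks.
if (interactive()) runApp(app)
```

The whole application is written in R; Shiny generates the HTML and JavaScript needed in the browser.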
You can visit the link mentioned below to learn more about Shiny .
Shiny Fundamentals with R
6. plotly
plotly is a graphing library used to create interactive graphs. It is built on the
JavaScript library plotly.js.
install.packages("plotly")
library(plotly)
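As a small sketch of an interactive graph, the example below builds a scatter plot with plot_ly(). The built-in mtcars dataset and the chosen columns are assumptions for illustration.

```r
library(plotly)

# Interactive scatter plot: hovering over a point shows its values,
# and the legend entries can be clicked to toggle each group.
fig <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  color = ~factor(cyl),
  type = "scatter",
  mode = "markers"
)

# In an interactive session, printing the object opens the widget
# in the viewer or browser.
if (interactive()) print(fig)
```

The formula notation (~wt) tells plotly to look the variable up in the supplied data frame.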
You can visit the link mentioned below to learn more about plotly .
Intermediate Interactive Data Visualization with plotly in R
7. knitr
knitr is a package for reproducible research, used mostly for report generation. It
integrates with document formats such as LaTeX, HTML, Markdown, and LyX. It was inspired
by Sweave and extends it by combining features from add-on packages such as weaver,
animation, and cacheSweave.
install.packages("knitr")
library(knitr)
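The sketch below shows the core idea of knitr: code chunks embedded in a document are executed and replaced by their results. The document text here is a made-up example, and it is passed as a character vector instead of a file to keep the example self-contained.

```r
library(knitr)

# Build the chunk delimiter programmatically (three backticks).
fence <- strrep("`", 3)

# A tiny R Markdown document: a heading, one code chunk, one inline expression.
rmd <- c(
  "# Speed report",
  "",
  paste0(fence, "{r speed-summary}"),
  "summary(cars$speed)",
  fence,
  "",
  "The mean speed is `r mean(cars$speed)` mph."
)

# knit() runs the chunk and the inline code, returning Markdown as text.
out <- knit(text = rmd, quiet = TRUE)
cat(out)
```

In practice the document usually lives in an .Rmd file, and the resulting Markdown is converted to HTML or PDF.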
You can visit the link mentioned below to learn more about knitr .
Reporting with R Markdown
8. mlr3
The mlr3 package is built for machine learning. It is efficient and object-oriented,
providing 'R6' objects for the building blocks of a machine learning workflow. It is also
an extensible framework for clustering, regression, classification, and survival analysis.
install.packages("mlr3")
library(mlr3)
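A minimal sketch of the R6-based workflow follows, assuming the iris task and the decision tree learner that ship with mlr3 (the learner relies on the rpart package). The 80/20 split and the seed are arbitrary choices for illustration.

```r
library(mlr3)

# Task and learner are R6 objects retrieved from mlr3's dictionaries.
task <- tsk("iris")
learner <- lrn("classif.rpart")

# Split the row ids into training and test sets.
set.seed(1)
split <- partition(task, ratio = 0.8)

# Methods are called on the objects with $, which modifies them in place.
learner$train(task, row_ids = split$train)
pred <- learner$predict(task, row_ids = split$test)

# Score the predictions with classification accuracy.
acc <- pred$score(msr("classif.acc"))
print(acc)
```

Swapping in a different learner or measure only changes the key passed to lrn() or msr(); the rest of the workflow stays the same.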
You can visit the link mentioned below to learn more about mlr3 .
mlr3Book
9. XGBoost
XGBoost is an implementation of the gradient boosting framework. It provides an interface
for R and is also available as a model in R's caret package. It is known for its speed,
and is often reported to be faster than the gradient boosting implementations in H2O,
Spark, and Python. This package's primary use case is machine learning tasks like
classification, ranking problems, and regression.
install.packages('xgboost')
library(xgboost)
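As a rough sketch of a classification task, the example below trains a small boosted model with xgb.train(). The use of mtcars, the chosen predictors, and the parameter values are assumptions for illustration, and the data is far too small for a realistic model.

```r
library(xgboost)

# Predict transmission type (am: 0 or 1) from three numerical features.
X <- as.matrix(mtcars[, c("mpg", "wt", "hp")])
y <- mtcars$am

# xgboost works on its own matrix type that bundles features and labels.
dtrain <- xgb.DMatrix(data = X, label = y)

# Train 10 boosting rounds of shallow trees for binary classification.
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2),
  data = dtrain,
  nrounds = 10,
  verbose = 0
)

# Predictions are probabilities of class 1.
prob <- predict(bst, newdata = dtrain)
print(head(prob))
```

For regression or ranking, the objective parameter changes while the rest of the call keeps the same shape.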
You can visit the link mentioned below to learn more about XGBoost .
Extreme Gradient Boosting with XGBoost
10. Caret
The caret package, whose name is short for Classification And Regression Training, is used
for predictive modeling. It provides tools for the following steps:
2. Data splitting: The data is split into two sets with similar distributions of the categorical outcome.
3. Feature selection: Suitable techniques such as recursive feature elimination can be
used.
4. Training Model: caret provides an interface to many machine learning algorithms from
other packages.
5. Resampling for model tuning: The model can be tuned using k-fold cross-validation,
repeated k-fold cross-validation, etc. The number of tuning parameter values to try can
be set with 'tuneLength'.
6. Variable importance estimation: varImp() can be used on a trained model to access the
variable importance estimates.
install.packages('caret')
library(caret)
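The steps listed above can be sketched in one short workflow. This is an illustration under assumptions, not the tutorial's own code: it uses the built-in iris dataset, a decision tree via method = "rpart" (which requires the rpart package), and arbitrary values for the split ratio, folds, repeats, and tuneLength.

```r
library(caret)

set.seed(42)

# Data splitting: a stratified 80/20 split that preserves class proportions.
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_df <- iris[idx, ]
test_df <- iris[-idx, ]

# Resampling for model tuning: repeated 5-fold cross-validation.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2)

# Training: fit a decision tree, trying 3 values of its tuning parameter.
fit <- train(
  Species ~ .,
  data = train_df,
  method = "rpart",
  trControl = ctrl,
  tuneLength = 3
)

# Variable importance estimation for the fitted model.
imp <- varImp(fit)
print(imp)

# Evaluate on the held-out set.
pred <- predict(fit, newdata = test_df)
acc <- mean(pred == test_df$Species)
print(acc)
```

Changing the algorithm is mostly a matter of changing the method argument, since train() wraps the underlying packages behind one interface.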
You can visit the link mentioned below to learn more about caret from its author, Max
Kuhn.
Machine Learning with caret in R
Congratulations
Congratulations, you have made it to the end of this tutorial!
In this tutorial, you've learned about different packages in R used for the data science
process. This tutorial focused on installing and loading each package and, finally, on
pointing to DataCamp resources for learning more about these packages.