Applied Analytics
through Case Studies
Using SAS and R
Implementing Predictive Models and
Machine Learning Techniques
—
Deepti Gupta
Applied Analytics through Case Studies Using SAS and R
Deepti Gupta
Boston, Massachusetts, USA
Table of Contents
Acknowledgments
Introduction
Index
About the Author
Deepti Gupta completed her MBA in Finance and PGPM in
Operations Research in 2010. She has worked with KPMG
and IBM Private Limited as a Data Scientist and is currently
working as a data science freelancer. Deepti has extensive
experience in predictive modeling and machine learning
and her expertise is in SAS and R. Deepti has developed
data science courses and delivered data science trainings
and conducted workshops in both corporate and academic
institutions. She has written multiple blogs and white papers.
Deepti has a passion for mentoring budding data scientists.
About the Contributor
Dr. Akshat Gupta is currently working as a Senior
Applications Engineer at MilliporeSigma in Applications
Engineering, Global Manufacturing Sciences and
Technology (MSAT) group. He authored the healthcare
case study (Chapter 5) of this book. His focal area of
research is cell culture clarification and tangential flow
filtration. Dr. Gupta has extensive experience in Design
of Experiments (DOE) and statistical analysis. He holds
a Bachelor of Technology (B.Tech) degree in Chemical
Engineering from the Vellore Institute of Technology, and a Master of Science (MS)
and Doctor of Philosophy (Ph.D.) in Chemical Engineering from the University of
Massachusetts Lowell. He also has graduate certificates in Modeling and Simulation, and
Nanotechnology.
About the Technical Reviewer
Preeti Pandhu has a Master of Science degree in Applied
(Industrial) Statistics from the University of Pune. She is SAS
certified as a base and advanced programmer for SAS 9 as
well as a predictive modeler using SAS Enterprise Miner 7.
Preeti has more than 18 years of experience in analytics and
training.
She started her career as a lecturer in statistics and began
her journey into the corporate world with IDeaS (now a SAS
company), where she managed a team of business analysts
in the optimization and forecasting domain. She joined SAS as a corporate trainer before
stepping back into the analytics domain to contribute to a solution-testing team and
research/consulting team. She was with SAS for 9 years. Preeti is currently passionately
building her analytics training firm, DataScienceLab (www.datasciencelab.in).
Acknowledgments
Writing a book is one of the most interesting and challenging endeavors one can take up.
This book could not have been completed without the encouragement, guidance, and
support of my family. I would like to thank Dr. Akshat Gupta, Ved Prakash Garg, Col. Atul
Gupta, Dr. Anvita Garg, Ayush Gupta, RS Miyan, Ansi Miyan, Dr. James Chrostowski, and
my colleagues and friends for their productive discussions and suggestions. My special
thanks to Celestin John who provided great help on everything ranging from technical
support to answering my queries. I appreciate the thoughtful and insightful comments
from the editor and the reviewers. Thanks to the Apress team, especially to Divya Modi
for all the patience, support, and guidance in completing this project.
Introduction
Analytics is both a buzzword and a necessity for today’s industries as they work to solve their business problems.
Analytics helps in mining structured and unstructured data to extract
actionable insights, which in turn support effective business decisions. SAS
and R are widely used analytics tools across industries worldwide for data mining
and for building machine learning and predictive models. This book focuses on industrial
business problems and on a practical analytical approach to solving those problems by
implementing predictive models and machine learning techniques using the SAS and R
analytical languages.
The primary objective of this book is to help statisticians, developers, engineers,
and data analysts who are well versed in writing code, have a basic understanding of
data and statistics, and are planning to transition to a data scientist profile. The most
challenging part of that transition is acquiring practical, hands-on knowledge of building predictive models and
machine learning algorithms and deploying them to address industrial
business problems. This book helps readers solve business problems
in various industrial domains by sharpening their analytical skills and giving them practical
exposure to various predictive models and machine learning algorithms across six industrial
domains.
describes how analytics contributes to the retail industry and offers a detailed explanation
of a forecasting case study in R and SAS. Chapter 4 describes how analytics is reshaping
the telecommunications industry and gives a detailed explanation of a case study on
predicting customer churn in R and SAS. Chapter 5 describes the application of analytics
in the healthcare industry and gives a clear explanation of a case study on predicting the
probability of benign and malignant breast cancer using R and SAS. Chapter 6 describes
the role of analytics in the airline industry and provides a case study on predicting flight
arrival delays (minutes) in R and SAS. Chapter 7 describes the application of analytics
in the FMCG industry with a detailed explanation of a business case study on customer
segmentation based on their purchasing history using R and SAS.
• Data analysts who know about data mining but would like to
implement predictive models and machine learning techniques.
• Developers who are well versed with coding but would like to
transition to a career in data science.
CHAPTER 1
Data Analytics and Its Application in Various Industries
© Deepti Gupta 2018
D. Gupta, Applied Analytics through Case Studies Using SAS and R,
https://doi.org/10.1007/978-1-4842-3525-6_1
[Figure: the data analytics process as a cycle: Data Collection, Data Preparation, Data Analysis, Model Building, Results, Put into use]
Data Collection
The first step in the process is data collection. Data relevant to the application is collected.
The quality, quantity, validity, and nature of data directly impact the analytical outcome.
A thorough understanding of the data on hand is extremely critical.
It is also useful to have an idea about some other variables that may not directly be
sourced from the industry or the specific application itself but may have a significant
impact if included in the model. For example, when developing a model to predict
flight delays, weather can be a very important variable, but it might have to be obtained
from a different source than the rest of the data set. Data analytics firms also have ready
access to certain key global databases including weather, financial indices, etc. In recent
years, data mining of digital social media like Twitter and Facebook has also become
very popular.7 This is particularly helpful in understanding trends related to customer
satisfaction with various services and products. This technique also helps reduce the
reliance on surveys and feedback. Figure 1-2 shows a Venn diagram of various sources
of data that can be tapped into for a given application.
[Figure 1-2: Venn diagram of data sources: industry data, firm-specific data, application-specific data, and data from other sources (weather, social media, etc.)]
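The flight-delay example can be sketched in a few lines of Python. This is only an illustration, under the assumption that the flight records and the weather observations arrive as two separate lists; the field names (`origin`, `airport`, `snow_cm`, `delay_min`) and all values are hypothetical and are not taken from the case studies in this book.

```python
# Hypothetical flight records (illustrative values only).
flights = [
    {"flight": "AA10", "date": "2018-01-05", "origin": "BOS", "delay_min": 42},
    {"flight": "DL22", "date": "2018-01-05", "origin": "JFK", "delay_min": 5},
    {"flight": "UA31", "date": "2018-01-06", "origin": "BOS", "delay_min": 0},
]

# Hypothetical weather observations obtained from a separate source.
weather = [
    {"date": "2018-01-05", "airport": "BOS", "snow_cm": 12.0},
    {"date": "2018-01-05", "airport": "JFK", "snow_cm": 0.5},
]


def add_weather(flights, weather):
    """Left-join weather onto flights by (date, airport).

    Flights with no matching weather observation keep snow_cm=None,
    so missing external data stays visible instead of being dropped.
    """
    lookup = {(w["date"], w["airport"]): w for w in weather}
    merged = []
    for f in flights:
        match = lookup.get((f["date"], f["origin"]))
        row = dict(f)  # copy so the original record is not modified
        row["snow_cm"] = match["snow_cm"] if match else None
        merged.append(row)
    return merged


merged = add_weather(flights, weather)
for row in merged:
    print(row)
```

A left join is used here so that every flight is retained even when the external source has no matching observation; how such gaps are handled afterward is a modeling decision.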
Data Preparation
The next step is data preparation. Usually raw data is not in a format that can be directly
used to perform data analysis. In very simple terms, most platforms require data to be
in a matrix form with the variables being in different columns and rows representing
various observations. Figure 1-3 shows an example of structured data.
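As a minimal sketch of what "matrix form" means in practice, the following Python snippet turns loosely structured records into a header row plus one row per observation. The record fields (`id`, `age`, `income`) are hypothetical, and filling absent values with `None` is an assumption made for illustration rather than a recommendation from this book.

```python
# Hypothetical raw records: keys appear in different orders and one is missing.
raw_records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "income": 61000, "age": 45},
    {"id": 3, "age": 29},
]


def to_matrix(records):
    """Return (columns, rows): variables as columns, observations as rows."""
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    # Every row has one cell per column; absent values become None.
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


columns, rows = to_matrix(raw_records)
print(columns)
for row in rows:
    print(row)
```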
Data Analysis
Once data is converted into a structured format, the next stage is to perform data
analysis. At this stage, underlying trends in the data are identified. This step can include
fitting a linear or nonlinear regression model, performing principal component analysis
or cluster analysis, or checking whether the data is normally distributed. The goal is to identify
what kind of information can be extracted from the data and whether there are underlying
trends that can be useful for a given application. This phase is also very useful for
scoping out the models that can best capture the trends in the data and for checking whether the
data satisfies the underlying assumptions of those models. One example would be to check whether the
data is normally distributed in order to decide whether parametric models can be used or a
non-parametric model is required.
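One way to make the normality check concrete is the Jarque-Bera statistic, which combines sample skewness and kurtosis. The text does not name a specific test, so this choice is an assumption made for illustration; the function names and the sample data below are hypothetical as well.

```python
import statistics


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S is the sample skewness and K the sample kurtosis, both computed
    from central moments with divisor n.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_CRITICAL_5PCT_2DF = 5.991


def looks_normal(values):
    """True if the Jarque-Bera statistic does not reject normality at 5%."""
    return jarque_bera(values) < CHI2_CRITICAL_5PCT_2DF


# Illustrative data: evenly spaced normal quantiles vs. a right-skewed sample.
n = 200
symmetric = [statistics.NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [float(i) ** 3 for i in range(1, 101)]

print(looks_normal(symmetric), looks_normal(skewed))
```

The chi-square approximation behind the critical value holds only asymptotically, so for small samples a check like this is best treated as a rough screen alongside a histogram or Q-Q plot.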
Model Building
Once the trends in data are identified, the next step is to put the data to work and build
a model that will help with the given application or help solve a business problem. A
vast number of statistical models are available that can be used, and new models are
being developed every day. Models can significantly vary in terms of complexity and
can range from simple univariate linear regression models to complex machine learning
algorithms. The quality of a model is governed not by its complexity but by its ability to
account for real trends and variations in data and to separate information from noise.
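At the simple end of that range, a univariate linear regression can be fitted by ordinary least squares in a few lines. The sketch below is illustrative only: the data points are made up, and `fit_line` and `r_squared` are hypothetical helper names rather than functions from SAS, R, or this book.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up observations that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(slope, intercept, r2)
```

The R-squared value is one simple measure of how much of the real variation a model captures; it says nothing by itself about whether a more complex model would generalize better.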
Results
Results obtained from the models are validated to ensure accuracy and model
robustness. This can be done in two ways. The first is to split the original data set into
training and validation data sets: part of the data is used for model
building and the remaining part is used for validation. The other approach is to validate
the model's output against real-time data once the model is deployed. In some cases, the same data is
used to build several different types of models to confirm that the model outputs are real
and not statistical artifacts.
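The first approach, splitting the data into training and validation sets, might look like the following sketch. The 70/30 ratio, the fixed seed, and the function name `train_validation_split` are assumptions chosen for illustration; the text does not prescribe a particular split.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data keeps its order
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten placeholder observations standing in for rows of a data set.
rows = list(range(10))
train, validation = train_validation_split(rows)
print(len(train), len(validation))
```

Shuffling before the split avoids accidental ordering effects, such as all recent records landing in the validation set. For time-ordered data a chronological split is often more appropriate.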
Types of Analytics
Analytics can be broadly classified under three categories: descriptive analytics,
predictive analytics, and prescriptive analytics.9 Figure 1-4 shows the types and
descriptions of types of analytics.
• Binary: Binary data can take only two possible values. For
example, Yes/No, True/False.
While on the topic of data, it is a good time to get a basic understanding of “Big Data.”
Big Data is not just a buzzword but is fast becoming a critical aspect of data analytics.
It is discussed in more detail in the following section.
harness their data to use it for finding new opportunities, faster and better decision
making, increased security, and competitive advantages over rivals, such as higher
profits and better customer service. The characteristics of Big Data are often described using
the 5 Vs: velocity, volume, value, variety, and veracity.11 Figure 1-5 illustrates the 5 Vs
of Big Data.
[Figure 1-5: the 5 Vs of Big Data: Volume (petabytes, files, records); Velocity (batch, real time, streams); Value (statistical, events, correlations); Variety (structured, unstructured, social, mobile); Veracity (authenticity, reliability, trustworthiness)]
Big Data analytics applications help data miners, data scientists, statistical modelers,
and other professionals analyze the growing volumes of structured and mostly
unstructured data, such as data from social media, emails, web servers, sensors, etc.
Big Data analytics gives companies access to nontraditional variables and
sources of information, which helps organizations make quicker and smarter business
decisions.