BA Unit IV
UNIT IV
PREDICTIVE ANALYTICS
Predictive analytics
• Predictive analytics is the process of using data to forecast future
outcomes. The process uses data analysis, machine learning, artificial
intelligence, and statistical models to find patterns that might predict
future behavior. Organizations can use historical and current data to
forecast trends and behaviors seconds, days, or years into the future,
with a precision that depends on the quality of the data and the model.
PREDICTIVE ANALYTICS FRAMEWORKS
• Define the problem: A prediction starts with a good thesis and set of requirements. For
instance, can a predictive analytics model detect fraud? Determine optimal inventory levels for
the holiday shopping season? Identify potential flood levels from severe weather? A distinct
problem to solve will help determine what method of predictive analytics should be used.
• Acquire and organize data: An organization may have decades of data to draw upon, or a
continual flood of data from customer interactions. Before predictive analytics models can be
developed, data flows must be identified, and then datasets can be organized in a repository
such as a data warehouse (for example, BigQuery).
• Pre-process data: Raw data is only nominally useful by itself. To prepare the data for the
predictive analytics models, it should be cleaned to deal with anomalies, missing data points, and
extreme outliers, any of which might be the result of input or measurement errors.
• Develop predictive models: Data scientists have a variety of tools and techniques to develop
predictive models, depending on the problem to be solved and the nature of the dataset. Machine
learning, regression models, and decision trees are some of the most common types of
predictive models.
• Validate and deploy results: Check the accuracy of the model and adjust it accordingly. Once
acceptable results have been achieved, make them available to stakeholders via an app,
website, or data dashboard.
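The pre-process, develop and validate steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production workflow: the weekly demand figures are invented, the outlier rule (more than 5 median absolute deviations from the median) is an arbitrary choice, and the model is a simple straight-line fit.

```python
from statistics import mean, median

# Hypothetical weekly demand data: (week, units sold). None marks a missing value.
raw = [(1, 102), (2, 108), (3, None), (4, 121), (5, 990),
       (6, 131), (7, 138), (8, 144), (9, 149), (10, 157)]

def clean(rows, k=5.0):
    """Pre-process: drop missing values, then drop extreme outliers.

    A point counts as an outlier when it lies more than k median absolute
    deviations from the median (k=5 is an arbitrary choice for this sketch).
    """
    complete = [(x, y) for x, y in rows if y is not None]
    ys = [y for _, y in complete]
    centre = median(ys)
    spread = median([abs(y - centre) for y in ys])
    return [(x, y) for x, y in complete if abs(y - centre) <= k * spread]

def fit_line(points):
    """Develop the model: least-squares fit of y = intercept + slope * x."""
    x_bar = mean([x for x, _ in points])
    y_bar = mean([y for _, y in points])
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_abs_error(model, points):
    """Validate: average absolute gap between forecast and actual values."""
    slope, intercept = model
    return mean([abs(y - (intercept + slope * x)) for x, y in points])

cleaned = clean(raw)
train, holdout = cleaned[:-2], cleaned[-2:]  # hold back the two latest weeks
model = fit_line(train)
holdout_error = mean_abs_error(model, holdout)
print(f"holdout mean absolute error: {holdout_error:.2f} units")
```

Validating on weeks the model never saw during fitting is what makes the error figure a fair estimate of how the forecast would behave once deployed.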
PREDICTIVE ANALYTICS TECHNIQUES:
Predictive analytics tends to be performed with three main types of techniques:
Regression analysis
Regression is a statistical analysis technique that estimates relationships between variables. It is useful for
finding patterns in large datasets and for measuring how strongly inputs are related. It is best employed on continuous
data that follows a known distribution. Regression is often used to determine how one or more independent
variables affect another, such as how a price increase will affect the sales of a product.
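As a small illustration of the price example, the sketch below fits a straight line to a handful of price and sales observations using ordinary least squares. The data points and the names `prices` and `units_sold` are invented for the example.

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical observations: unit price and units sold at that price.
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
units_sold = [520, 480, 455, 410, 385]

slope, intercept = least_squares(prices, units_sold)
forecast_at_13 = intercept + slope * 13.0  # extrapolated sales at a price of 13

print(f"each unit of price increase changes sales by {slope:.1f} units")
print(f"forecast at price 13: {forecast_at_13:.0f} units")
```

The slope is the quantity of interest here: it estimates how many units of sales are lost for each unit of price increase, within the range of prices observed.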
Decision trees
Decision trees are classification models that place data into different categories based on distinct variables. The
method is best used when trying to understand an individual's decisions. The model looks like a tree, with each
branch representing a potential choice and the leaf of the branch representing the result of the decision. Decision
trees are typically easy to understand and can work well when a dataset has missing values.
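The core step in building a decision tree is choosing the variable and threshold that best separate the categories. The sketch below learns a one-level tree (a "stump") by minimising Gini impurity; a full tree would repeat the same split recursively on each branch. The loan data and names such as `applicants` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def majority(labels):
    """Most common label in a group; this becomes the leaf's prediction."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical applicants: (income in thousands, existing debt in thousands).
applicants = [(25, 10), (32, 4), (41, 12), (48, 3), (55, 9), (63, 2)]
outcomes = ["defaulted", "defaulted", "defaulted", "repaid", "repaid", "repaid"]

feature, threshold, impurity = best_split(applicants, outcomes)
left_leaf = majority([lab for row, lab in zip(applicants, outcomes) if row[feature] <= threshold])
right_leaf = majority([lab for row, lab in zip(applicants, outcomes) if row[feature] > threshold])

def predict(row):
    """Follow the single branch and return the label stored at the leaf."""
    return left_leaf if row[feature] <= threshold else right_leaf

print(feature, threshold, predict((50, 5)))
```

On this toy data the income variable separates the two outcomes cleanly, which is why the learned rule is easy to read and explain.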
Neural networks
Neural networks are machine learning methods that are useful in predictive analytics when modeling very complex
relationships. Essentially, they are powerhouse pattern recognition engines. Neural networks are best used to
determine nonlinear relationships in datasets, especially when no known mathematical formula exists to analyze
the data. Neural networks can be used to validate the results of decision trees and regression models.
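To see why a network can capture a nonlinear relationship, consider the exclusive-or (XOR) pattern, which no single straight-line rule can reproduce. The sketch below is a two-layer network with hand-picked weights, chosen purely for illustration; in practice the weights would be learned from data rather than set by hand.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def tiny_network(x1, x2):
    """Two hidden neurons feeding one output neuron (weights are illustrative)."""
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if at least one input is 1
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires only if both inputs are 1
    return neuron((h_or, h_and), (1.0, -2.0), -0.5)  # "or, but not and"

outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
for pair, value in outputs.items():
    print(pair, "->", value)
```

The hidden layer is what makes the difference: each hidden neuron draws one straight boundary, and the output neuron combines them into a pattern that neither boundary could express alone.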
USES AND EXAMPLES OF PREDICTIVE ANALYTICS
Predictive analytics can be used to streamline operations, boost revenue, and mitigate risk for almost any business or industry, including
banking, retail, utilities, public sector, healthcare, and manufacturing. Sometimes augmented analytics, which applies machine learning
to big data, is used. Here are some example use cases.
• Fraud detection
• Predictive analytics examines all actions on a company’s network in real time to pinpoint abnormalities that
indicate fraud and other vulnerabilities.
• Conversion and purchase prediction
• Companies can take actions, like retargeting online ads to visitors, with data that predicts a greater likelihood of
conversion and purchase intent.
• Customer segmentation
• By dividing a customer base into specific groups, marketers can use predictive analytics to make forward-looking
decisions to tailor content to unique audiences.
• Risk reduction
• Credit scores, insurance claims, and debt collections all use predictive analytics to assess and determine the
likelihood of future defaults.
• Operational improvement
• Companies use predictive analytics models to forecast inventory, manage resources, and operate more
efficiently.
• Maintenance forecasting
• Organizations use data to predict when routine equipment maintenance will be required and can then schedule
it before a problem or malfunction arises.
LOGIC AND DATA DRIVEN MODELS
• Predictive modelling is the method of building, testing and validating
a model to best predict the likelihood of an outcome. Several modelling
procedures from artificial intelligence, machine learning and statistics are
available in predictive analytics software solutions. Models can use
one or more classifiers to estimate the probability that a set of data
belongs to another set.
• The different models available in predictive analytics software enable
the system to derive new information from data and to build predictive
models. Each model has its own strengths and weaknesses and is best
suited to particular types of problems.
PREDICTIVE MODELLING
• Predictive modelling is at the heart of business decision making
• Building decision models is more an art than a science
• Creating an ideal decision model demands:
• Armed with all the above details, we can logically arrive at a conclusion and can derive
the following model for the above problem statement:
• M = profit margin
Data-driven models
• The main aim of the data-driven modelling concept is to find links between
the system's state variables (input and output) without explicit knowledge
of the physical attributes and behaviour of the system. Data-driven
predictive modelling derives the modelling method from the set of
existing data and uses a predictive methodology to forecast future
outcomes.
• A model is data-driven only when there is no clear knowledge of the
relationships among the variables of the system, even though a lot of data
is available. Here, you are simply predicting the outcomes based on the
data. The model is not based on hand-picked variables, but may contain
unobserved, hidden combinations of variables.
Data driven modeling (DDM)
• Data driven modeling (DDM) is a technique in which configurator model components are
dynamically injected into the model based on data derived from external systems such as a catalog
system, customer relationship management (CRM), Watson, and so on.
• The Omni-Configurator engine constructs the model components, including option classes and option items,
at runtime based on the service request parameters, and populates the associated properties before
executing the business logic contained in the configurator model.
• Using the DDM technique, a modeler can define a configurator model in the Sterling Configurator
Visual Modeler tool with DDM properties that define the data source and the selection criteria for injecting
the catalog items into the model. The data is retrieved from the system or data source by using the data
source adapters implemented for each system or data source.
• Based on the data source defined in the model, the corresponding data source adapters are invoked to
fetch the data. Model components are dynamically created in the configurator model based on the
data returned by the data source adapter.
• The DDM technique provides the following benefits over the static modeling technique:
• It reduces the total cost of ownership (TCO) by eliminating the manual construction, by the modeler,
of the model components that represent products within configurator models.
• It reduces the time to market since the model is dynamically updated with changes in the catalog
system.
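The adapter idea described above can be sketched generically as follows. This is not the Sterling Configurator or Omni-Configurator API: every class, function and property name below (`InMemoryCatalogAdapter`, `build_option_class`, `data_source`, `selection_criteria`) is hypothetical, and an in-memory list stands in for a real catalog system.

```python
from dataclasses import dataclass, field

@dataclass
class OptionItem:
    name: str
    properties: dict

@dataclass
class OptionClass:
    name: str
    items: list = field(default_factory=list)

class InMemoryCatalogAdapter:
    """Hypothetical data source adapter; a real one would query the catalog system."""

    def __init__(self, records):
        self._records = records

    def fetch(self, criteria):
        """Return the records whose fields match every selection criterion."""
        return [r for r in self._records
                if all(r.get(key) == value for key, value in criteria.items())]

# One adapter per data source, looked up by the name stored in the model.
ADAPTERS = {
    "catalog": InMemoryCatalogAdapter([
        {"sku": "LAP-13", "category": "laptop", "price": 999},
        {"sku": "LAP-15", "category": "laptop", "price": 1299},
        {"sku": "MON-27", "category": "monitor", "price": 349},
    ]),
}

def build_option_class(name, ddm_properties):
    """Create the option items at run time from whatever the adapter returns."""
    adapter = ADAPTERS[ddm_properties["data_source"]]
    records = adapter.fetch(ddm_properties["selection_criteria"])
    return OptionClass(name, [OptionItem(r["sku"], dict(r)) for r in records])

laptops = build_option_class(
    "Laptop options",
    {"data_source": "catalog", "selection_criteria": {"category": "laptop"}},
)
print([item.name for item in laptops.items])
```

The point of the design is that the model stores only where to look and what to select. Adding a new laptop to the catalog changes the configurator's options without anyone editing the model.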
1. Differences between Static and dynamic modeling techniques
Static Dynamic
There are three key trends that will drive the future of data
modeling.
1. First, data modeling capabilities are being baked into more
business applications and citizen data science tools. These
capabilities can provide the appropriate guardrails and templates
for business users to work with predictive modeling.
2. Second, the tools and frameworks for low-code predictive
modeling are making it easier for data science experts to quickly
cleanse data, create models and vet the results.
3. Third, better tools are coming to automate many of the data
engineering tasks required to push predictive models into
production. Carroll predicts this will allow more organizations to
shift from simply building models to deploying them in ways that
deliver on their potential value.
Challenges of predictive modeling
Data preparation. One of the most frequently overlooked challenges of predictive
modeling is acquiring the correct amount of data and sorting out the right data to use
when developing algorithms. By some estimates, data scientists spend about 80% of their
time on this step. Data collection is important but limited in usefulness if this data is not
properly managed and cleaned.
Once the data has been sorted, organizations must be careful to avoid overfitting. Over-
testing on training data can result in a model that appears very accurate but has
memorized the key points in the data set rather than learned how to generalize.
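The difference between memorizing and generalizing can be shown with a toy comparison. In this sketch, a nearest-neighbour model that simply recalls the closest training example is compared with a single-threshold rule. The data is invented, with two deliberately mislabelled training points standing in for noise, so the exact numbers are illustrative only.

```python
# Hypothetical training data: (score, label). The underlying pattern is
# "label 1 for high scores"; (25, 1) and (70, 0) are deliberate noise.
train = [(10, 0), (20, 0), (25, 1), (30, 0), (40, 0),
         (55, 1), (60, 1), (70, 0), (80, 1), (90, 1)]
# Held-out data that follows the underlying pattern and was not used for fitting.
holdout = [(24, 0), (27, 0), (45, 0), (52, 1), (68, 1), (72, 1), (85, 1)]

def nearest_neighbour_model(data):
    """Memorizes the training set: predicts the label of the closest example."""
    def predict(x):
        _, label = min(data, key=lambda point: abs(point[0] - x))
        return label
    return predict

def threshold_model(data):
    """Learns one cut-off: predicts 1 above it, 0 otherwise."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    def hits(t):
        return sum((x > t) == bool(y) for x, y in data)
    cutoff = max(candidates, key=hits)
    return lambda x: int(x > cutoff)

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

memorizer = nearest_neighbour_model(train)
simple_rule = threshold_model(train)

for name, model in (("memorizer", memorizer), ("simple rule", simple_rule)):
    print(f"{name}: train={accuracy(model, train):.2f} "
          f"holdout={accuracy(model, holdout):.2f}")
```

The memorizer scores perfectly on the data it was built from because it has also memorized the noise, and it does worse than the simpler rule on the held-out data. This is why accuracy should be judged on data kept aside from training.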
Technical and cultural barriers. While predictive modeling is often considered to be
primarily a mathematical problem, users must plan for the technical and organizational
barriers that might prevent them from getting the data they need. Often, systems that
store useful data are not connected directly to centralized data warehouses. Also, some
lines of business may feel that the data they manage is their asset, and they may not
share it freely with data science teams.
Choosing the right business case. Another potential obstacle for predictive modeling
initiatives is making sure projects address significant business challenges. Sometimes,
data scientists discover correlations that seem interesting at the time and build algorithms
to investigate the correlation further. However, just because they find something that is
statistically significant does not mean it presents an insight the business can use.
Predictive modeling initiatives need to have a solid foundation of business relevance.
Bias. "One of the more pressing problems everyone is talking about, but few have
addressed effectively, is the challenge of bias," Carroll said. Bias is naturally introduced
into the system through historical data since past outcomes reflect existing bias.
Nate Nichols, distinguished principal at Narrative Science, a natural language generation
tools provider, is excited about the role that new explainable machine learning methods
such as LIME or SHAP could play in addressing concerns about bias and promoting trust.
PREDICTIVE ANALYSIS PROCEDURE
DATA MINING FOR PREDICTIVE ANALYTICS
• What is data mining?
• Data mining refers to the process of analyzing data from different perspectives and
summarizing it into useful information. The information gathered from data mining could
include customer patterns, purchase patterns, transaction times, customer demand, and the
relationships between the items sold. It is a powerful technology with great potential to
help companies focus on the most significant information in the data they have
gathered about the behaviour and potential of their customers.
• The process of data mining involves the following steps:
• Business understanding
• Data selection
• Data preparation
• Modelling
• Evaluation
• Deployment
APPLICATION OF DATA MINING
• Financial analysis
• Biological data analysis
• Market analysis
• Retail industry
• Manufacturing engineering
• Criminal investigation
Predictive Analytics Data Mining
• Exploration – the first stage usually starts with data preparation, i.e., from cleaning data
to data transformations, selecting subsets of records and so forth. Depending on the nature of the
analytic problem, this stage can range from a simple choice of straightforward predictors for a
regression model to elaborate exploratory analyses using a wide variety of graphical and statistical
methods. The aim is to identify the most relevant variables and to determine the complexity and/or
the general nature of the models to consider.
• Model building or pattern identification – the second stage involves considering several models
and choosing the best one based on predictive performance. This sounds simple but can be an
elaborate process. Several techniques can be used to achieve it, such
as bagging (voting, averaging), boosting, stacking (stacked generalizations), and meta-learning.
Many of these are based on so-called “competitive evaluation of models.” This
means applying different models to the same data set and then comparing their performance to choose
the best.
• Deployment – the final stage involves applying the selected model to new data to
generate predictions or estimates of the expected outcome. Data mining is becoming increasingly
popular as a business information management tool. The main difference between
data mining and traditional exploratory data analysis (EDA) is that data mining is more oriented
towards applications than towards the fundamental nature of the underlying phenomena, which means it is less
concerned with identifying the specific relations between the variables involved.
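The competitive evaluation of models mentioned in the model building stage can be sketched as follows: several candidate models are fitted to the same training data, scored on the same held-back data, and the one with the lowest error is selected. The sales series and the three candidate models are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical monthly sales; the last three months are held back for comparison.
series = [100, 104, 109, 113, 118, 122, 127, 131, 136, 140]
train, holdout = series[:7], series[7:]

def mean_model(history):
    """Forecast every future month as the historical average."""
    level = mean(history)
    return lambda step: level

def last_value_model(history):
    """Forecast every future month as the most recent observation."""
    last = history[-1]
    return lambda step: last

def linear_trend_model(history):
    """Fit a least-squares line through the history and extend it."""
    n = len(history)
    t_bar = (n - 1) / 2
    y_bar = mean(history)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in enumerate(history))
             / sum((t - t_bar) ** 2 for t in range(n)))
    return lambda step: y_bar + slope * (n - 1 + step - t_bar)

def holdout_mae(build, history, future):
    """Mean absolute error of a model's forecasts over the held-back months."""
    forecast = build(history)
    return mean([abs(actual - forecast(step))
                 for step, actual in enumerate(future, start=1)])

candidates = {
    "mean": mean_model,
    "last_value": last_value_model,
    "linear_trend": linear_trend_model,
}
scores = {name: holdout_mae(build, train, holdout) for name, build in candidates.items()}
best_name = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("selected:", best_name)
```

Because every candidate sees the same training data and is judged on the same held-back months, the comparison is like for like. The selected model is then the one carried into deployment.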
ANALYSIS OF PREDICTIVE ANALYTICS
• In predictive analysis, both Excel and IBM SPSS are used to compute
statistics in this step of the BA process.
• Stepwise multiple regression can be used to evaluate which independent
variables are best included in or excluded from a linear model.
• Validation statistics – the multiple correlation coefficient and the F-test
from ANOVA