BBABB602 Study Material and Syllabus
STUDY MATERIAL
COURSE OBJECTIVES:
1. To describe the role of data analytics and decision support systems in business and relate current
issues to those of the firm in order to solve business problems.
2. To introduce the fundamental principles of computer-based information analysis and design and
develop an understanding of the principles and techniques used.
3. To enable students to understand the various knowledge representation methods and different expert
system structures as strategic weapons to counter the threats to business and make business more
competitive.
4. To enable students to use data analysis to assess the impact of technology on electronic
commerce and electronic business and to understand the specific threats and vulnerabilities of
computer systems.
COURSE OUTCOMES:
CO1: The students will be able to relate the basic concepts and technologies used in the field of data
analytics.
CO2: The students will be able to compare the processes of developing and implementing data analytics
algorithms.
CO3: The students will be able to examine the role of the ethical, social, and security issues of data
analytics systems.
CO4: The students will be able to investigate and translate the role of data analytics in organizations,
and the strategic management processes, with the implications for the management.
Course Content:
Module Number | Description of Topic | Page No.
1 | Simple Linear Regression: Introduction – Overview – Importance – Least Square Method – Normal Equations – Calculation of Regression Coefficients – Properties of Regression Line – Uses of Regression. Multiple Linear Regression: Overview – Importance – Least Square Method – Normal Equations – Calculation of Regression Coefficients – Properties – Testing Relevance of an Additional Explanatory Variable | 9-35
2 | Basic concept of Logistic Regression – Assessing the Model – log-likelihood statistic – deviance statistic – R and R² – Wald Statistic – odds ratio – Sources of Bias and Common Problems – Interpreting Binary Logistic Regression | 35-50
3 | Basic concept of Factor Analysis, Factor Analysis Model, Statistics Associated with Factor Analysis, Factor Analysis Process – Formulate the Problem – Construct the Correlation Matrix – Determine the method of Factor Analysis – Determine the number of Factors – Factor Extraction: eigenvalues and scree plot – Factor Rotation – Interpret Factors – Calculate Factor Scores – Determine Model Fit | 50-71
4 | Basic concept of Cluster Analysis, Statistics Associated with Cluster Analysis, Cluster Analysis Process – Formulate the Problem – Select a distance measure – Select a clustering procedure – Decide on the number of Clusters – Interpret and Profile Cluster – Assess the reliability and validity | 71-82
Module 2: Binary Logistic Regression
Sub-topics: Basic concept of Logistic Regression – Assessing the Model – log-likelihood statistic – deviance statistic – R and R² – Wald Statistic – odds ratio – Sources of Bias and Common Problems – Interpreting Binary Logistic Regression
Mapping with Industry and International Standard:
International Academia: https://ocw.mit.edu/courses/15-071-the-analytics-edge-spring-2017/pages/logistic-regression/
Industrial Mapping: Predictive model creation
Lecture Hours: 12
Corresponding Assignment: Basic concept of Logistic Regression – Assessing the Model – log-likelihood statistic – deviance statistic – R and R²

Module 3: Factor Analysis
Sub-topics: Basic concept of Factor Analysis, Factor Analysis Model, Statistics Associated with Factor Analysis, Factor Analysis Process – Formulate the Problem – Construct the Correlation Matrix – Determine the method of Factor Analysis – Determine the number of Factors – Factor Extraction: eigenvalues and scree plot – Factor Rotation – Interpret Factors – Calculate Factor Scores – Determine Model Fit
Mapping with Industry and International Standard:
International Academia: https://ocw.mit.edu/courses/18-s096-topics-in-mathematics-with-applications-in-finance-fall-2013/resources/lecture-15-factor-modeling/
Industrial Mapping: Predictive model creation
Lecture Hours: 12
Corresponding Assignment: Basic concept of Factor Analysis, Factor Analysis Model, Statistics Associated with Factor Analysis, Factor Analysis Process – Formulate the Problem – Construct the Correlation Matrix – Determine the method of Factor Analysis –

Module 4: Cluster Analysis
Sub-topics: Basic concept of Cluster Analysis, Statistics Associated with Cluster Analysis, Cluster Analysis Process – Formulate the Problem – Select a distance measure – Select a clustering procedure – Decide on the number of Clusters – Interpret and Profile Cluster – Assess the reliability and validity
Mapping with Industry and International Standard:
International Academia: https://ocw.mit.edu/courses/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/resources/lecture-12-clustering/
Industrial Mapping: Predictive model creation
Lecture Hours: 12
Corresponding Assignment: Basic concept of Cluster Analysis, Statistics Associated with Cluster Analysis, Cluster Analysis Process
Learning Resources:
Text Book:
References:
CO-PO Mapping:
BBABA602CO1 2 3 1 2
BBABA602CO2 2 3 2 2
BBABA602CO3 1 1 1 2
BBABA602CO4 1 1 3 3
MODULE – 1
Multiple Linear Regression
Exercise
2.
(BL6, Create)
6. Explain the assumptions of multiple linear regression models using matrix notation.
7. How can you test the overall significance of a regression model? (BL: 5, Evaluate)
MODULE – 2
In multiple regression, in which there are several predictors, a similar equation is derived in which
each predictor has its own coefficient. As such, Y is predicted from a combination of each predictor
variable multiplied by its respective regression coefficient.
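The combination described above can be sketched in a few lines of Python. The function name, the intercept, the coefficients, and the predictor values are hypothetical and chosen only for illustration; they do not come from any dataset in this material.

```python
def predict(intercept, coefficients, predictors):
    """Return b0 + b1*x1 + ... + bn*xn for a single observation."""
    if len(coefficients) != len(predictors):
        raise ValueError("each predictor needs exactly one coefficient")
    # Multiply each predictor by its own coefficient and add the intercept.
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical model: b0 = 2.0, b1 = 0.5, b2 = -1.0
# Hypothetical observation: x1 = 4.0, x2 = 3.0
y_hat = predict(2.0, [0.5, -1.0], [4.0, 3.0])
print(y_hat)  # prints 1.0
```

In logistic regression the same weighted sum is used, but it is passed through the logistic function so that the result can be read as a probability.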
USES OF LOGISTIC REGRESSION
Logistic regression is commonly used for prediction and classification problems. Some of these
use cases include:
• Fraud detection: Logistic regression models can help teams identify data anomalies,
which are predictive of fraud. Certain behaviors or characteristics may have a higher
association with fraudulent activities, which is particularly helpful to banking and other
financial institutions in protecting their clients. SaaS-based companies have also started to
adopt these practices to eliminate fake user accounts from their datasets when conducting
data analysis around business performance.
• Disease prediction: In medicine, this analytics approach can be used to predict the
likelihood of disease or illness for a given population. Healthcare organizations can set up
preventative care for individuals who show a higher propensity for specific illnesses.
Logistic regression does not make many of the key assumptions of linear regression and general
linear models that are based on ordinary least squares algorithms, particularly regarding linearity,
normality, homoscedasticity, and measurement level.
Firstly, it does not need a linear relationship between the dependent and independent variables.
Logistic regression can handle non-linear relationships between the predictors and the probability
of the outcome, because it applies a non-linear log transformation to the predicted odds. Secondly,
the independent variables do not need to be multivariate normal, although multivariate normality
yields a more stable solution. Also, the error terms (the residuals) do not need to be normally
distributed. Thirdly, homoscedasticity is not needed: logistic regression does not require the
variances to be equal (homoscedastic) for each level of the independent variables. Lastly, it can
handle ordinal and nominal data as independent variables. The independent variables do not need to
be metric (interval or ratio scaled). However, some other assumptions still apply.
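The non-linear link between a predictor and the predicted probability can be illustrated with the logistic (sigmoid) function. This is a minimal sketch: the coefficients b0 and b1 are hypothetical values, not estimates from any data in this material.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for a model with one predictor x.
b0, b1 = -3.0, 1.0

for x in [0, 1, 2, 3, 4, 5, 6]:
    p = sigmoid(b0 + b1 * x)  # predicted P(Y=1) at this value of x
    print(f"x={x}  P(Y=1)={p:.3f}")
```

Equal steps in x give unequal steps in the probability: the curve is steepest around P(Y=1) = 0.5 and flattens towards 0 and 1.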
Firstly, binary logistic regression requires the dependent variable to be binary, and ordinal logistic
regression requires the dependent variable to be ordinal. Reducing an ordinal or even metric
variable to a dichotomous level loses a lot of information, which makes binary logistic regression
inferior to ordinal logistic regression in these cases.
Secondly, since logistic regression assumes that P(Y=1) is the probability of the event occurring,
it is necessary that the dependent variable is coded accordingly. That is, for a binary regression,
the factor level 1 of the dependent variable should represent the desired outcome.
Thirdly, the model should be fitted correctly. Neither overfitting nor underfitting should occur.
That is, only meaningful variables should be included, and all meaningful variables should be
included. One common approach to this is to use a stepwise method to estimate the logistic
regression.
Fourthly, the error terms need to be independent. Logistic regression requires each observation to
be independent. That is, the data points should not come from a dependent-samples design, e.g.,
before-after measurements or matched pairings. Also, the model should have little or no
multicollinearity. That is, the independent variables should not be highly correlated with each other.
However, there is the option to include interaction effects of categorical variables in the analysis
and the model. If multicollinearity is present, centering the variables (i.e., subtracting the mean
of each variable) might resolve the issue. If this does not lower the multicollinearity, a factor
analysis with orthogonally rotated factors should be done before the logistic regression is estimated.
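The effect of centering can be checked numerically. The sketch below uses hypothetical data and a hand-written Pearson correlation helper. It illustrates one case where centering helps, namely a predictor and its own squared term; centering does not change the correlation between two distinct predictors, because the Pearson correlation is unaffected by subtracting a constant.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predictor values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Raw predictor versus its square: strongly correlated.
r_raw = pearson(x, [v ** 2 for v in x])

# Centre x by subtracting its mean, then square the centred values.
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
r_centered = pearson(x_centered, [v ** 2 for v in x_centered])

print(f"correlation before centering: {r_raw:.3f}")
print(f"correlation after centering:  {r_centered:.3f}")
```

With these symmetric values the correlation between the centred predictor and its square drops to essentially zero, while the raw predictor and its square are almost perfectly correlated.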
Fifthly, logistic regression assumes linearity of independent variables and log odds. Whilst it does
not require the dependent and independent variables to be related linearly, it requires that the
independent variables are linearly related to the log odds. Otherwise, the test underestimates the
strength of the relationship and too easily finds it not significant (fails to reject the null
hypothesis) where it should be significant. A solution to this problem is the categorization of the
independent variables, that is, transforming metric variables to ordinal level and then including
them in the model. Another approach would be to use discriminant analysis, if the assumptions of
homoscedasticity, multivariate normality, and absence of multicollinearity are met.
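The linearity of the log odds can be made concrete with a short sketch. The coefficients b0 and b1 are hypothetical. The code converts predicted probabilities back to log odds and shows that each one-unit increase in x adds b1 to the log odds, which is the same as multiplying the odds by exp(b1), the odds ratio.

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Log odds (logit) of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients for a model with one predictor x.
b0, b1 = -3.0, 0.8

probabilities = [sigmoid(b0 + b1 * x) for x in range(6)]
logits = [log_odds(p) for p in probabilities]

# Change in log odds for each one-unit step in x: always b1.
steps = [logits[i + 1] - logits[i] for i in range(len(logits) - 1)]

# The odds ratio for a one-unit increase in x.
odds_ratio = math.exp(b1)

print("log-odds steps:", [round(s, 6) for s in steps])
print("odds ratio:", round(odds_ratio, 4))
```

The probabilities themselves do not change by a constant amount per step; only the log odds do, which is the sense in which the model is linear.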
Lastly, it requires quite large sample sizes, because maximum likelihood (ML) estimates are less
powerful than ordinary least squares (OLS) estimates (e.g., simple linear regression, multiple linear
regression). Whilst OLS needs 5 cases per independent variable in the analysis, ML needs at least
10 cases per independent variable; some statisticians recommend at least 30 cases for each
parameter to be estimated.
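Maximum likelihood estimation, mentioned above, can be sketched for a model with one predictor. The data are hypothetical and deliberately tiny, far below the sample sizes recommended above, so the numbers only illustrate the mechanics. Plain gradient ascent is used here for readability; it is an assumption of this sketch, and statistical packages typically use faster Newton-type methods.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Sum of log P(observed y) under the logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate b0 and b1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Observed outcome minus predicted probability, per case.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Hypothetical data: the outcome becomes more likely as x grows.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

ll_start = log_likelihood(0.0, 0.0, xs, ys)  # before fitting
b0, b1 = fit(xs, ys)
ll_model = log_likelihood(b0, b1, xs, ys)    # after fitting
deviance = -2.0 * ll_model                   # deviance statistic (-2LL)

print(f"b0={b0:.3f}  b1={b1:.3f}")
print(f"log-likelihood: start={ll_start:.3f}  fitted={ll_model:.3f}")
print(f"deviance (-2LL): {deviance:.3f}")
```

A log-likelihood closer to zero, and therefore a smaller deviance, indicates that the fitted model explains the observed outcomes better than the starting model.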
TYPES OF LOGISTIC REGRESSION
There are three types of logistic regression models, defined by the type of categorical response:
binary, multinomial, and ordinal.
Exercises:
MODULE – 3
Exercises
1. Why is it useful to rotate the factors? Which is the most common method of rotation? (BL: 5,
Evaluate)
2. What guidelines are available for interpreting the factors? (BL: 4, Analyze)
3. What is the major difference between principal components analysis and common factor
analysis? (BL: 4, Analyze)
4. What hypothesis is examined by Bartlett’s test of sphericity? For what purpose is this test
used? (BL: 5, Evaluate)
5. For what purpose is the Kaiser–Meyer–Olkin measure of sampling adequacy used? (BL: 5,
Evaluate)
MODULE – 4
Exercise:
1. Why is the average linkage method usually preferred to single linkage and complete linkage? (BL:
5, Evaluate)
2. What guidelines are available for deciding the number of clusters? (BL: 4, Analyze)
3. Upon what basis may a researcher decide which variables should be selected to formulate a
clustering problem? (BL: 5, Evaluate)
4. What are some of the uses of cluster analysis in marketing? (BL: 4, Analyze)