Assignment 2 Sol
The model intercept is 1.537386 on the log-odds scale, which corresponds to a high baseline
probability of default (about 82%) when all predictors are zero.
Age
Older customers are a bit less likely to default on their loans. For each additional year
of age, the log-odds of defaulting decrease by 0.014793.
Number of Dependents
Having more dependents slightly increases the chance of defaulting. For each
additional dependent, the log-odds of defaulting increase by 0.039196.
Monthly Income
Higher monthly income slightly reduces the chance of defaulting. For each additional
thousand INR of income, the log-odds of defaulting decrease by 0.005679.
Salary Date
Getting paid earlier in the month significantly lowers the risk of defaulting. For each
unit increase in the salary date fraction, the log-odds of defaulting decrease by
0.453885.
Loan Tenure
Longer loan terms significantly increase the risk of defaulting. For each additional
year of loan tenure, the log-odds of defaulting increase by 0.725872.
Down Payment
Making a larger down payment significantly reduces the risk of defaulting. For each
unit increase in the down payment fraction, the log-odds of defaulting decrease by
1.073405.
Profession
Business people are slightly more likely to default compared to professionals. The
log-odds of defaulting increase by 0.269408 for business people.
Education Level
Customers with only high school education (HSC) are a bit more likely to default
compared to undergraduates. The log-odds of defaulting increase by 0.105550 for
HSC qualified customers. Those with a postgraduate education are less likely to
default, with the log-odds decreasing by 0.346036.
Gender
Men are slightly more likely to default than women. The log-odds of defaulting
increase by 0.237154 for men.
Post-Dated Checks
Customers who provide full post-dated checks are much less likely to default. The
log-odds of defaulting decrease by 1.280570.
Refrigerator and Washing Machine Ownership
Owning household appliances slightly reduces the risk of defaulting. The log-odds of
defaulting decrease by 0.190968 for refrigerator owners and by 0.154021 for washing
machine owners.
Key Takeaways
Older age (-0.014793), higher income (-0.005679), earlier salary date (-0.453885),
larger down payment (-1.073405), postgraduate education (-0.346036), and
ownership of household appliances (refrigerator: -0.190968, washing machine: -
0.154021) are all factors that reduce the likelihood of loan default.
More dependents (0.039196) and being a business person (0.269408) slightly increase the
risk of defaulting, while longer loan terms (0.725872) increase it substantially.
Providing full post-dated checks (-1.280570) is a strong indicator of a lower risk of
default.
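For interpretation, the log-odds coefficients above can be converted to odds ratios with exp(). A minimal sketch, assuming the fitted model object is named lg.mod.1 as in the code section at the end:
# Convert log-odds coefficients to odds ratios (assumes the fitted glm object lg.mod.1)
odds_ratios=exp(coef(lg.mod.1))
round(odds_ratios, 3)
# e.g. exp(-1.073405) is about 0.342, so a one-unit increase in the down payment
# fraction multiplies the odds of default by roughly 0.34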
Code and output-
Call:
glm(formula = DefaulterFlag ~ AGE + NOOFDEPE + MTHINCTH + SALDATFR +
TENORYR + DWNPMFR + PROFBUS + QUALHSC + QUAL_PG + SEXCODE +
FULLPDC + FRICODE + WASHCODE, family = binomial, data = trainData)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.537386 0.128123 11.999 < 2e-16 ***
AGE -0.014793 0.001605 -9.218 < 2e-16 ***
NOOFDEPE 0.039196 0.010093 3.884 0.000103 ***
MTHINCTH -0.005679 0.003376 -1.682 0.092525 .
SALDATFR -0.453885 0.038951 -11.653 < 2e-16 ***
TENORYR 0.725872 0.042502 17.079 < 2e-16 ***
DWNPMFR -1.073405 0.121978 -8.800 < 2e-16 ***
PROFBUS 0.269408 0.045698 5.895 3.74e-09 ***
QUALHSC 0.105550 0.038096 2.771 0.005594 **
QUAL_PG -0.346036 0.074628 -4.637 3.54e-06 ***
SEXCODE 0.237154 0.057760 4.106 4.03e-05 ***
FULLPDC -1.280570 0.033054 -38.742 < 2e-16 ***
FRICODE -0.190968 0.036351 -5.253 1.49e-07 ***
WASHCODE -0.154021 0.043326 -3.555 0.000378 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Performance Metrics
Accuracy: 0.741
Precision: 0.619
Recall (Sensitivity): 0.301
F1 Score: 0.405
AUC: 0.717
The recall is quite low at 30.1%, which means the model is missing a large number of actual
defaulters.
The precision is relatively better at 61.9%, indicating that when the model predicts a default,
it is reasonably accurate.
The balanced accuracy (0.612) takes into account both the sensitivity and specificity, giving a
more balanced view of the model performance.
There are many false negatives (313), which means many defaulters are being classified as
non-defaulters.
There are also many false positives (1184), which means many non-defaulters are being
classified as defaulters.
The model shows moderate performance with an accuracy of 74.1% and an AUC of 0.717.
However, the low recall (30.1%) is a major issue as it indicates the model's poor performance
in identifying actual defaulters.
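As a quick check, the F1 score and balanced accuracy follow directly from the caret confusion matrix. A minimal sketch, assuming conf_matrix is the confusionMatrix object shown in the output below:
# F1 and balanced accuracy recomputed from the caret confusion matrix
prec=conf_matrix$byClass['Pos Pred Value']   # 0.619
rec=conf_matrix$byClass['Sensitivity']       # 0.301
spec=conf_matrix$byClass['Specificity']      # 0.923
f1=2 * prec * rec / (prec + rec)             # ~0.405
bal_acc=(rec + spec) / 2                     # ~0.612
print(c(F1 = unname(f1), BalancedAccuracy = unname(bal_acc)))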
"AUC: 0.717266143080232"
> opt_threshold=0.5
> print(paste("Optimal Threshold:", opt_threshold))
[1] "Optimal Threshold: 0.5"
> testData$predicted_class=ifelse(testData$predicted_prob > opt_threshold, 1, 0)
> levels(testData$predicted_class)=levels(testData$DefaulterFlag)
> table(testData$predicted_class)
0 1
822 4959
> table(testData$DefaulterFlag)
0 1
1693 4088
> conf_matrix=confusionMatrix(as.factor(testData$predicted_class),
as.factor(testData$DefaulterFlag))
> print(conf_matrix)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 509 313
1 1184 3775
Accuracy : 0.741
95% CI : (0.7295, 0.7523)
No Information Rate : 0.7071
P-Value [Acc > NIR] : 5.438e-09
Kappa : 0.2638
Sensitivity : 0.30065
Specificity : 0.92343
Pos Pred Value : 0.61922
Neg Pred Value : 0.76124
Prevalence : 0.29286
Detection Rate : 0.08805
Detection Prevalence : 0.14219
Balanced Accuracy : 0.61204
'Positive' Class : 0
> table(testData$predicted_class,testData$DefaulterFlag)
0 1
0 509 313
1 1184 3775
> accuracy=conf_matrix$overall['Accuracy']
> precision=conf_matrix$byClass['Pos Pred Value']
> recall=conf_matrix$byClass['Sensitivity']
> f1_score=2 * ((precision * recall) / (precision + recall))
> print(paste("Accuracy:", accuracy))
[1] "Accuracy: 0.741048261546445"
> print(paste("Precision:", precision))
[1] "Precision: 0.619221411192214"
> print(paste("Recall:", recall))
[1] "Recall: 0.300649734199646"
> print(paste("F1 Score:", f1_score))
[1] "F1 Score: 0.404771371769384"
Q3. Suppose Auto Finance Ltd. provides a loan for 2 years. The management of Auto Finance Ltd. has estimated that the profit associated with a “True Positive” case is Rs. 6,000. Furthermore, they also estimated that the losses associated with a “False Negative” case and a “False Positive” case are Rs. 12,000 and Rs. 6,000, respectively. The profit associated with a “True Negative” case is Rs. 2,000. Provide the benefit matrix. Based on the confusion matrix obtained for the validation data set, calculate the total profit/loss for the company.
TP: 3775
TN: 509
FP: 1184
FN: 313
Benefit Matrix
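Per-case profit/loss as specified by management, laid out in the same prediction-by-reference orientation as the confusion matrix:
               Reference = 0          Reference = 1
Prediction 0   TN: Rs.  2,000         FN: Rs. -12,000
Prediction 1   FP: Rs. -6,000         TP: Rs.  6,000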
profit_TP=6000
> loss_FP=-6000
> loss_FN=-12000
> profit_TN=2000
> TP=3775
> TN=509
> FP=1184
> FN=313
> total_profit_loss=(TP * profit_TP) + (FP * loss_FP) + (FN * loss_FN) + (TN * profit_TN)
> print(paste("Total Profit/Loss:", total_profit_loss))
[1] "Total Profit/Loss: 12808000"
Using an alternative approach to determine the threshold that maximizes profit, we identified an
optimal threshold of 0.37, which yields a maximum profit of Rs. 14,520,000. This is a substantial
improvement over the Rs. 12,808,000 obtained at the default threshold of 0.5.
At this threshold, the model's performance metrics are as follows: an accuracy of 71.93%,
sensitivity (recall) of 6.56%, specificity of 98.99%, precision of 73.03%, and balanced
accuracy of 52.78%. While the model achieves high specificity and reasonable precision, the
sensitivity is very low, indicating that the model misses many actual defaulters. This is a
significant risk, since a missed defaulter carries the largest per-case loss in the benefit
matrix (Rs. 12,000 per false negative).
From a managerial perspective, this threshold allows for a selective loan approval strategy,
ensuring that those approved are less likely to default, thus reducing overall risk exposure.
However, due to the low sensitivity, many potential defaulters are not identified,
necessitating additional monitoring and intervention strategies for high-risk customers.
The model's high specificity gives confidence in identifying non-defaulters, allowing secure
loan approvals, but the low sensitivity means enhanced monitoring and proactive engagement
with potential defaulters are crucial to mitigate risk. This approach focuses on profit
maximization by approving fewer, but more reliable, loans. The threshold should be reviewed
continuously and adjusted as economic conditions and customer behavior change in order to
maintain financial performance.
Reference
Prediction 0 1
0 111 41
1 1582 4047
Accuracy : 0.7193
95% CI : (0.7075, 0.7308)
No Information Rate : 0.7071
P-Value [Acc > NIR] : 0.02196
Kappa : 0.0757
Sensitivity : 0.06556
Specificity : 0.98997
Pos Pred Value : 0.73026
Neg Pred Value : 0.71896
Prevalence : 0.29286
Detection Rate : 0.01920
Detection Prevalence : 0.02629
Balanced Accuracy : 0.52777
'Positive' Class : 0
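Check at threshold 0.37: 4,047 x 6,000 + 111 x 2,000 - 1,582 x 6,000 - 41 x 12,000 = 24,282,000 + 222,000 - 9,492,000 - 492,000 = Rs. 14,520,000, matching the reported maximum.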
Logistic Regression
Accuracy: 74.1%
Precision: 61.9%
Recall: 30.1%
F1 Score: 40.5%
AUC: 0.717
Classification Tree
Accuracy: 73.1%
Precision: 55.5%
Recall: 33.2%
F1 Score: 41.5%
AUC: 0.675
Random Forest
Accuracy: 72.7%
Precision: 55.1%
Recall: 29.2%
F1 Score: 38.2%
AUC: 0.712
Logistic Regression: Best overall performance with highest accuracy (74.1%) and AUC (0.717).
Moderate precision and recall.
Classification Tree: Slightly higher recall (33.2%) than logistic regression but lower overall
performance.
Random Forest: Lowest recall (29.2%) and slightly lower accuracy. Most important predictors:
FULLPDC, DWNPMFR, MTHINCTH, TENORYR, AGE.
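To visualise the AUC gap between the three models, the ROC curves can be overlaid on one plot. A minimal sketch, assuming pred, pred_tree and pred_rf are the ROCR prediction objects created in the code section:
# Overlay ROC curves for the three models
plot(performance(pred, "tpr", "fpr"), col = "black", main = "ROC comparison")
plot(performance(pred_tree, "tpr", "fpr"), col = "blue", add = TRUE)
plot(performance(pred_rf, "tpr", "fpr"), col = "red", add = TRUE)
abline(0, 1, lty = 2)   # reference line for a random classifier
legend("bottomright",
       legend = c("Logistic (AUC 0.717)", "Tree (AUC 0.675)", "Random Forest (AUC 0.712)"),
       col = c("black", "blue", "red"), lty = 1)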
print(paste("Accuracy:", accuracy))
[1] "Accuracy: 0.741048261546445"
> print(paste("Precision:", precision))
[1] "Precision: 0.619221411192214"
> print(paste("Recall:", recall))
[1] "Recall: 0.300649734199646"
> print(paste("F1 Score:", f1_score))
[1] "F1 Score: 0.404771371769384"
tree_pred_class
0 1
996 4785
> table(testData$DefaulterFlag)
0 1
1666 4115
> conf_matrix_tree=confusionMatrix(as.factor(tree_pred_class),
as.factor(testData$DefaulterFlag))
> print(conf_matrix_tree)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 553 443
1 1113 3672
Accuracy : 0.7308
95% CI : (0.7192, 0.7422)
No Information Rate : 0.7118
P-Value [Acc > NIR] : 0.000689
Kappa : 0.2548
Sensitivity : 0.33193
Specificity : 0.89235
Pos Pred Value : 0.55522
Neg Pred Value : 0.76740
Prevalence : 0.28819
Detection Rate : 0.09566
Detection Prevalence : 0.17229
Balanced Accuracy : 0.61214
'Positive' Class : 0
> table(testData$DefaulterFlag)
0 1
1666 4115
> conf_matrix_rf=confusionMatrix(as.factor(rf_pred_class),
as.factor(testData$DefaulterFlag))
> print(conf_matrix_rf)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 487 397
1 1179 3718
Accuracy : 0.7274
95% CI : (0.7157, 0.7388)
No Information Rate : 0.7118
P-Value [Acc > NIR] : 0.004512
Kappa : 0.2276
Sensitivity : 0.29232
Specificity : 0.90352
Pos Pred Value : 0.55090
Neg Pred Value : 0.75924
Prevalence : 0.28819
Detection Rate : 0.08424
Detection Prevalence : 0.15291
Balanced Accuracy : 0.59792
'Positive' Class : 0
Using the logistic regression model, Auto Finance Ltd. can make a number of strategic
management decisions that improve the loan approval process, strengthen risk management,
segment customers more effectively, focus marketing efforts, and streamline operations.
With an AUC of 0.717 and an accuracy of 74.1%, the logistic regression model provides a
solid basis for these decisions.
Loan Approval Strategy: The main goal is to reduce defaults while still granting loans to
applicants who can repay them. The company can balance recall and precision by adjusting the
decision threshold: lowering the threshold flags more applicants as potential defaulters and
so catches more of them, at the cost of some precision. Using the predicted probabilities, a
credit scoring system can also be built to automate decisions, with low-risk applicants
approved right away and applicants in the middle range routed for additional review (a
minimal banding sketch follows below). This speeds up the approval process while keeping
the risk assessment consistent.
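A minimal sketch of such probability-based banding, assuming testData$predicted_prob holds the model's predicted default probabilities; the cut-offs 0.37 and 0.70 are illustrative only and would need business validation:
# Band applicants by predicted default probability (cut-offs are hypothetical)
testData$risk_band=cut(testData$predicted_prob,
                       breaks = c(0, 0.37, 0.70, 1),
                       labels = c("Approve", "Manual review", "Decline"),
                       include.lowest = TRUE)
table(testData$risk_band)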
Risk Management: Auto Finance Ltd. can increase monitoring of the high-risk customers
identified by the model to cut down on losses from defaults. Early-intervention tactics such
as payment reminders or loan-restructuring options can help prevent missed payments.
Diversifying the loan portfolio by balancing high- and low-risk customers makes overall risk
easier to manage. Adjusting credit limits based on risk scores, with higher limits for
low-risk customers and lower limits or additional security requirements for high-risk
customers, can reduce losses further.
Customer Segmentation: Understanding the different customer segments helps the business
tailor its products and services to each one. The predicted risk scores can be used to divide
customers into low-, medium-, and high-risk groups, and loan products can then be designed
for each group; for example, low-risk customers can be offered loans at lower interest rates,
while high-risk customers can be offered loans with flexible repayment plans. Personalized
communication strategies can also be used to build stronger relationships with low-risk
customers and to engage high-risk customers proactively to prevent defaults.
Marketing and Customer Outreach: The logistic regression model can make marketing campaigns
more effective by targeting low- and medium-risk customers who are more likely to be approved
for loans. This focused approach acquires more customers while keeping risk down. Different
risk segments can be offered different promotions; for example, low-risk customers can
receive preferential interest rates or waived processing fees, while medium-risk customers
can be retained through rewards for on-time payments and loyalty, building long-term
relationships and improving overall retention.
Operational Improvements: The logistic regression model makes it possible to set up automated
systems for loan processing and risk assessment, reducing manual work and speeding up
decision-making. Staff training on interpreting model output and using it in decisions helps
ensure the predictions are applied correctly. The model should be refreshed regularly with
new data to keep it accurate and useful, and its performance metrics should be tracked so
that the threshold and the model itself can be adjusted when needed.
Code-
set.seed(017)
library(readr)
library(dplyr)
library(caret)
d=read.csv(file.choose())
head(d)
str(d)
attach(d)
colSums(is.na(d))
data=na.omit(d)
data$DefaulterFlag=as.factor(data$DefaulterFlag)
set.seed(017)
trainIndex=createDataPartition(data$DefaulterFlag, p = 0.8, list = FALSE)
trainData=data[trainIndex,]
testData=data[-trainIndex,]
# Logistic regression model (same specification as the summary output above)
lg.mod.1=glm(DefaulterFlag ~ AGE + NOOFDEPE + MTHINCTH + SALDATFR + TENORYR +
               DWNPMFR + PROFBUS + QUALHSC + QUAL_PG + SEXCODE + FULLPDC +
               FRICODE + WASHCODE, family = binomial, data = trainData)
summary(lg.mod.1)
# Predicted default probabilities for the test set
testData$predicted_prob=predict(lg.mod.1, testData, type = "response")
install.packages("ROCR")
library(ROCR)
pred=prediction(testData$predicted_prob, testData$DefaulterFlag)
perf=performance(pred, "tpr", "fpr")
auc=performance(pred, "auc")
auc_value=auc@y.values[[1]]
print(paste("AUC:", auc_value))
opt_threshold=0.5
print(paste("Optimal Threshold:", opt_threshold))
# Classify using the chosen threshold
testData$predicted_class=ifelse(testData$predicted_prob > opt_threshold, 1, 0)
table(testData$predicted_class)
table(testData$DefaulterFlag)
conf_matrix=confusionMatrix(as.factor(testData$predicted_class),
as.factor(testData$DefaulterFlag))
print(conf_matrix)
table(testData$predicted_class,testData$DefaulterFlag)
accuracy=conf_matrix$overall['Accuracy']
precision=conf_matrix$byClass['Pos Pred Value']
recall=conf_matrix$byClass['Sensitivity']
f1_score=2 * ((precision * recall) / (precision + recall))
print(paste("Accuracy:", accuracy))
print(paste("Precision:", precision))
print(paste("Recall:", recall))
print(paste("F1 Score:", f1_score))
profit_TP=6000
loss_FP=-6000
loss_FN=-12000
profit_TN=2000
TP=3775
TN=509
FP=1184
FN=313
## Question4-:
library(ROCR)
library(caret)
profit_TP=6000
loss_FP= -6000
loss_FN=-12000
profit_TN=2000
thresholds=seq(0, 1, by = 0.01)
profits_losses=numeric(length(thresholds))
for (i in seq_along(thresholds)) {
  threshold=thresholds[i]
  predicted_class=ifelse(testData$predicted_prob > threshold, 1, 0)
  # Confusion counts at this threshold (rows = predicted, columns = actual)
  cm=table(factor(predicted_class, levels = c(0, 1)),
           factor(testData$DefaulterFlag, levels = c(0, 1)))
  TP=cm[2, 2]
  TN=cm[1, 1]
  FP=cm[2, 1]
  FN=cm[1, 2]
  profits_losses[i]=(TP * profit_TP) + (FP * loss_FP) + (FN * loss_FN) + (TN * profit_TN)
}
optimal_threshold=thresholds[which.max(profits_losses)]
max_profit_loss=max(profits_losses)
print(paste("Optimal Threshold:", optimal_threshold))
print(paste("Maximum Profit/Loss:", max_profit_loss))
# Re-classify at the optimal threshold and evaluate
testData$predicted_class_opt=ifelse(testData$predicted_prob > optimal_threshold, 1, 0)
testData$predicted_class_opt=as.factor(testData$predicted_class_opt)
testData$DefaulterFlag=as.factor(testData$DefaulterFlag)
levels(testData$predicted_class_opt)=levels(testData$DefaulterFlag)
conf_matrix_opt=confusionMatrix(testData$predicted_class_opt, testData$DefaulterFlag)
print(conf_matrix_opt)
TP_opt=conf_matrix_opt$table[2, 2]
TN_opt=conf_matrix_opt$table[1, 1]
FP_opt=conf_matrix_opt$table[2, 1]
FN_opt=conf_matrix_opt$table[1, 2]
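The extracted counts can then be plugged back into the same profit formula to confirm the maximum; a short sketch using the variables defined above:
profit_opt=(TP_opt * profit_TP) + (FP_opt * loss_FP) + (FN_opt * loss_FN) + (TN_opt * profit_TN)
print(paste("Profit/Loss at optimal threshold:", profit_opt))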
library(readr)
library(dplyr)
library(caret)
library(rpart)
library(randomForest)
library(ROCR)
set.seed(017)
data$DefaulterFlag = as.factor(data$DefaulterFlag)
trainIndex = createDataPartition(data$DefaulterFlag, p = 0.8, list = FALSE)
trainData = data[trainIndex,]
testData = data[-trainIndex,]
tree_model = rpart(DefaulterFlag ~ AGE + NOOFDEPE + MTHINCTH + SALDATFR +
TENORYR + DWNPMFR +
PROFBUS + QUALHSC + QUAL_PG + SEXCODE + FULLPDC +
FRICODE + WASHCODE,
data = trainData, method = "class")
tree_pred_prob = predict(tree_model, testData, type = "prob")[, 2]
tree_pred_class = predict(tree_model, testData, type = "class")
table(tree_pred_class)
table(testData$DefaulterFlag)
conf_matrix_tree = confusionMatrix(as.factor(tree_pred_class),
                                   as.factor(testData$DefaulterFlag))
print(conf_matrix_tree)
pred_tree = prediction(tree_pred_prob, testData$DefaulterFlag)
perf_tree = performance(pred_tree, "tpr", "fpr")
auc_tree = performance(pred_tree, "auc")
auc_value_tree = auc_tree@y.values[[1]]
print(paste("AUC for Classification Tree:", auc_value_tree))
# Fit the random forest on the same predictors (package defaults)
rf_model = randomForest(DefaulterFlag ~ AGE + NOOFDEPE + MTHINCTH + SALDATFR +
                          TENORYR + DWNPMFR + PROFBUS + QUALHSC + QUAL_PG +
                          SEXCODE + FULLPDC + FRICODE + WASHCODE,
                        data = trainData)
rf_pred_prob = predict(rf_model, testData, type = "prob")[, 2]
rf_pred_class = predict(rf_model, testData, type = "response")
table(rf_pred_class)
table(testData$DefaulterFlag)
conf_matrix_rf = confusionMatrix(as.factor(rf_pred_class),
                                 as.factor(testData$DefaulterFlag))
print(conf_matrix_rf)
pred_rf = prediction(rf_pred_prob, testData$DefaulterFlag)
perf_rf = performance(pred_rf, "tpr", "fpr")
auc_rf = performance(pred_rf, "auc")
auc_value_rf = auc_rf@y.values[[1]]
print(paste("AUC for Random Forest:", auc_value_rf))
importance(rf_model)
varImpPlot(rf_model)
rf_accuracy=conf_matrix_rf$overall['Accuracy']
rf_precision=conf_matrix_rf$byClass['Pos Pred Value']
rf_recall=conf_matrix_rf$byClass['Sensitivity']
rf_f1_score=2 * ((rf_precision * rf_recall) / (rf_precision + rf_recall))
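To report the random forest metrics in the same format as the logistic model, the following print calls can be appended:
print(paste("RF Accuracy:", rf_accuracy))
print(paste("RF Precision:", rf_precision))
print(paste("RF Recall:", rf_recall))
print(paste("RF F1 Score:", rf_f1_score))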