CIS111-6 Assignment 2: Advanced Data Techniques for Data Mining

This document discusses advanced data mining techniques for direct marketing campaigns. It introduces the topic and describes how banks analyze customer data to target marketing offers. The assignment is to analyze different data mining techniques that can be used for this purpose using a bank marketing dataset from Kaggle. Business intelligence and data mining techniques are commonly used to improve business performance.


UNIT:

ASSIGNMENT:
UNIT COORDINATOR:

STUDENT NAME: DIKSHA


ID:
EMAIL: {YOUR.NAME}@STUDY.BEDS.AC.UK

ADVANCED DATA MINING TECHNIQUES
FOR DIRECT MARKETING CAMPAIGNS

1. INTRODUCTION

This task describes basic and advanced data mining techniques applied to the bank marketing dataset from Kaggle. The banking sector is growing day by day in terms of innovation and is constantly evolving. We chose this dataset for two reasons: first, it has been used in a Kaggle competition; second, banks store very large amounts of data, including customers' personal information and the full history of all their customers, and they market their products and offers with the help of that history. Targeting customers' demands through one-to-one meetings and media channels is called direct marketing. In this assignment we analyse different data mining techniques for this purpose.

Business intelligence with data mining is very common nowadays, and many techniques and solutions for business improvement have been developed, particularly in the data science field, on which the modern world relies for its decisions. Understanding previous data is the first step in solving an existing problem, while prediction on upcoming data is very useful. Many basic and advanced techniques exist, but in this task we use only some of them. The data mining methods and algorithms used here are: data cleaning, such as dealing with missing values and removing outliers; data visualization with several Python libraries; tracking patterns in the data with two types of analysis, univariate analysis over a single column and bivariate analysis over pairs of columns, both shown as graphs drawn with Python plotting libraries; classification, the process of assigning rows to classes from several columns in a simple format, for which we use a linear model on the provided dataset; and, finally, a decision tree classifier built on the target column, which, as we know, works with a single labelled target column. The objective of this task should be stated here: as said earlier, business decisions can be made with data mining, and for datasets that have a labelled target column the decision tree classifier is among the best of these techniques. Other machine learning algorithms, such as XGBoost and SVM, are also strong choices for analysing this dataset, but in this assignment we use only the decision tree classifier. The objective on this dataset is to reach the right customers and improve direct marketing by calculating the features for which direct marketing is used most efficiently.

2. DESIGNING A SOLUTION

First we need to analyse the dataset and how many features it has. The questions are: which features are useful for meeting the objective, which are just outliers, and how many columns affect the dataset in different directions. As described earlier, many data mining techniques exist, but we use the few with which we can find a solution.

DATASET:
The dataset is described with the help of Python. It includes more than 41k rows with 20 columns; later we extract the most important features from them. The picture below shows the basic structure of the dataset. The first column, age, gives the age of the customer, and the next, job, gives his or her profession. The other features, such as education and campaign duration, also carry important information. Let's dive into the column values.

Fig 1: Describing the dataset
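The inspection described above can be sketched with pandas. The read_csv call is commented out because the Kaggle file name and path are assumptions; a tiny illustrative frame stands in for the real 41k-row dataset so the snippet runs on its own.

```python
import pandas as pd

# In the assignment the full Kaggle file is loaded; the file name below is an
# assumption -- adjust the path to wherever the dataset was downloaded.
# df = pd.read_csv("bank-additional-full.csv", sep=";")

# Tiny illustrative frame with the same kind of columns as the bank dataset.
df = pd.DataFrame({
    "age": [34, 45, 58, 29, 41],
    "job": ["admin.", "technician", "retired", "student", "blue-collar"],
    "education": ["university.degree", "basic.4y", "unknown",
                  "high.school", "basic.9y"],
    "y": ["no", "no", "yes", "no", "yes"],
})

print(df.shape)   # the real dataset has over 41k rows and 20 columns
print(df.head())  # this is the view summarised in Fig 1
print(df.dtypes)
```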

The picture below raises another important point: it shows the counts of the target column, and this gives two pieces of information.

A- The dataset has far more "no" values than "yes" values in the target column.

B- This column behaves differently even if we use only half of the dataset; the distribution is heavily skewed towards the majority value.
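The imbalance in points A and B can be checked directly with value_counts; the series below is an illustrative stand-in for the real target column, in which roughly 90% of rows are "no".

```python
import pandas as pd

# Illustrative target column: 9 "no" rows for every "yes" row, mimicking the
# roughly 90/10 imbalance described in points A and B.
y = pd.Series(["no"] * 9 + ["yes"])

counts = y.value_counts()
share_no = counts["no"] / len(y)
print(counts)
print(f"share of 'no': {share_no:.0%}")  # 90%
```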

Fig 2: Explains the target column

The picture above covers the complete dataset; now, using this column, we analyse at which ages customers have a deposit account ("yes") and at which ages they do not, given the many "no" values in the target column.

Fig 3: Count of the target column with respect to the age column

The picture above shows that most records fall between ages 30 and 50, but the split between "yes" and "no" differs by age. Although almost 90% of records are "no" overall, once we condition on age the distribution of having an account or not is no longer skewed as strongly. This means there is not much skewness in the distribution of the age feature.

Now we move to univariate analysis. This technique is used for analysing the values of a single column, i.e. the distribution of one column at a time. The questions are how skewed the column is towards particular values and how the data is distributed within the dataset.
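A univariate look at one column amounts to value counts for a categorical column and a skewness figure for a numeric one; both series below are illustrative stand-ins for the dataset's columns.

```python
import pandas as pd

# Illustrative categorical column: univariate analysis is just its frequency
# table, as counts and as proportions.
education = pd.Series(["basic.4y", "high.school", "high.school",
                       "university.degree", "university.degree",
                       "university.degree", "unknown"])
print(education.value_counts())
print(education.value_counts(normalize=True))

# For a numeric column such as age, skewness quantifies the asymmetry of the
# distribution; the high outlier here pulls the skew positive.
age = pd.Series([25, 30, 32, 35, 38, 41, 45, 52, 60, 88])
print(f"age skewness: {age.skew():.2f}")
```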

Fig 4: Explains the education column value counts

Fig 5: Explains the job column

Now let's move to bivariate analysis, which looks at column-to-column dependencies: how one column affects another column's values, and how they interact across multiple values of the distribution.

BIVARIATE ANALYSIS:

The first analysis relates marital status to age and to the target column of this dataset. Many kinds of graph can show this type of analysis, but we use a boxplot, which has the benefit of showing the values of one column split by another in different colours.

This analysis shows that divorced and married customers have more "yes" target values than the other marital statuses. The target column also shows that single people have fewer "yes" values, and age-wise they are younger than the other marital statuses. The picture below presents this analysis.

Fig 6: Bivariate analysis of age and marital status with the target column
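The comparison in Fig 6 can also be read off numerically with a groupby over marital status and the target; the small frame below is illustrative only, and the seaborn call noted in the comment is what would draw the boxplot itself.

```python
import pandas as pd

# Illustrative rows: single customers are younger, married/divorced older,
# mirroring the pattern described for Fig 6.
df = pd.DataFrame({
    "age":     [24, 27, 30, 44, 51, 39, 48, 56, 61],
    "marital": ["single", "single", "single", "married", "married",
                "married", "divorced", "divorced", "divorced"],
    "y":       ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes"],
})

# Median age per (marital status, target) cell -- the numbers behind the boxes.
summary = df.groupby(["marital", "y"])["age"].median().unstack()
print(summary)
# seaborn.boxplot(data=df, x="marital", y="age", hue="y") would draw Fig 6.
```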

Now let's move to another bivariate analysis; this time the column we target is education, which is very skewed with respect to the target column. Since the target column itself is very skewed towards "no" values, we examine how this column affects the others. The picture below shows the main education values with respect to age and the target feature. The question we are analysing is how many educated and uneducated people have a deposit account, and how age matters.

Fig 7: Describing education on the target column ("yes" or "no")

The picture above shows, first, that customers with basic.4y education and age 60+ have more deposit accounts than the others, and, second, that customers whose education is unknown also hold deposit accounts. By count, basic.4y education has the highest number of "yes" records.

3. EXPERIMENTS

Now we apply the advanced data mining techniques: classification, regression and the decision tree classifier. First we need to extract features and split the dataset into two streams, a training dataset and a testing dataset. The training dataset is used to train our model, the decision tree in this case, and we predict values on the testing dataset. A proper split is very important for accuracy, and we also have to identify the main features needed for the evaluation procedure. First of all we compute a column-to-column correlation matrix; this is how we decide which main features to keep.
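The correlation step can be sketched with DataFrame.corr() over the numeric columns; the values below are illustrative stand-ins, not the real dataset's figures.

```python
import pandas as pd

# Illustrative numeric columns: duration rises with age, campaign falls, so
# the matrix shows one strong positive and one negative correlation.
df = pd.DataFrame({
    "age":      [25, 32, 47, 51, 62],
    "duration": [100, 180, 240, 310, 380],
    "campaign": [3, 2, 2, 1, 1],
})

corr = df.corr()
print(corr.round(2))  # column-to-column correlation matrix
```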

We use the sklearn library for preprocessing the dataset. The LabelEncoder function transforms the dataset's categorical columns into numeric codes. For training a model, all the numeric columns should fall within one boundary, meaning every feature column is scaled relative to the maximum and minimum values of its distribution. To put it this way: when the columns have very different distributions it is very difficult to train a model, so all feature columns are brought to the same distribution. StandardScaler from sklearn is used to transform the values to such a distribution. After transforming, the dataset looks like this:

Fig 8: Dataset shape after transformation
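A minimal sketch of this preprocessing step, using sklearn's LabelEncoder and StandardScaler on illustrative data:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Illustrative frame with one categorical and one numeric column.
df = pd.DataFrame({
    "job": ["admin.", "technician", "admin.", "retired"],
    "age": [30, 40, 50, 60],
})

# LabelEncoder maps each category to an integer code (classes sorted
# alphabetically: admin.=0, retired=1, technician=2).
df["job"] = LabelEncoder().fit_transform(df["job"])

# StandardScaler rescales a numeric column to mean 0 and unit variance --
# the "one boundary" the text refers to.
df["age"] = StandardScaler().fit_transform(df[["age"]]).ravel()
print(df)
```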

After transforming, the next step is to split the dataset into train and test sets using sklearn's train_test_split. The shapes after the split are as follows: X_train.shape is 32950 rows and 19 columns (one column is removed from training because it serves as the y target for evaluation, and the y column likewise has 32950 rows); for testing, X_test.shape is 8238 rows with 19 columns, and y_test.shape is also 8238 rows.
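The quoted shapes can be reproduced with train_test_split and a 20% test share (the random_state value here is an assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the dataset's dimensions: 41188 rows, 19 features.
X = np.zeros((41188, 19))
y = np.zeros(41188)

# A 20% test share gives ceil(0.2 * 41188) = 8238 test rows and 32950
# training rows -- exactly the shapes quoted in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (32950, 19) (8238, 19)
```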

The first experiment is logistic regression, which reaches an accuracy of 90% on the test set, so this model needs some improvement. Before moving to the classification report for the next experimental technique, we need to create a confusion matrix to assess the results.
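A sketch of this first experiment on synthetic stand-in data from make_classification; the roughly 90% figure in the text comes from the real run on the bank dataset, not from this toy data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data with 19 features, matching the
# post-split feature count of the bank dataset.
X, y = make_classification(n_samples=2000, n_features=19, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit logistic regression and score it on the held-out test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = model.score(X_test, y_test)
print(f"test accuracy: {acc:.2f}")
```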

The following parameters are calculated for the classification techniques: the confusion matrix and the prediction metrics for the classification problem.

The picture below, taken from the code, shows the confusion matrix, accuracy score, F1-score and all the other prediction parameters computed against the y target class. Whether we use logistic regression or another classification technique, the difference in results is measured with these values.

Fig 9: Summary of classification results

The picture above shows that our predictions in the confusion matrix are far more often correct than wrong. Actual "no" predicted as "no" occurs 7191 times, while actual "no" predicted as "yes" occurs only 103 times, so correct predictions greatly outnumber incorrect ones. With a precision of 91%, the correct predictions carry the higher weight.
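The matrix in Fig 9 can be recomputed from label vectors with sklearn.metrics. Only the 7191 and 103 counts are quoted above, so the two "actual yes" cells below are illustrative assumptions.

```python
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

# Actual-"no" rows reproduce the quoted counts (7191 predicted "no",
# 103 predicted "yes"); the 600/344 split of the actual-"yes" rows is
# illustrative, chosen only so the snippet runs end to end.
y_true = ["no"] * 7294 + ["yes"] * 944
y_pred = ["no"] * 7191 + ["yes"] * 103 + ["no"] * 600 + ["yes"] * 344

cm = confusion_matrix(y_true, y_pred, labels=["no", "yes"])
print(cm)
print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}")
print(f"f1 (yes): {f1_score(y_true, y_pred, pos_label='yes'):.2f}")
```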

Fig 10: ROC curve over the false positive rate

The ROC curve plots the true positive rate against the false positive rate. A false positive is a record predicted positive whose actual value is negative, and a true positive is a record predicted positive whose actual value is indeed "yes"; the curve shows how well the algorithm separates these two rates as the decision threshold varies, and so gives us a way of testing the algorithm. The graph above shows the ROC curve of the logistic regression model.
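A minimal sketch of computing the ROC curve and its area with sklearn; the labels and predicted scores below are illustrative, not taken from the real model.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Illustrative true labels (0 = "no", 1 = "yes") and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9]

# roc_curve sweeps the decision threshold and returns the false positive
# rate and true positive rate at each step; the AUC summarises the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC: {auc:.2f}")
# matplotlib's plt.plot(fpr, tpr) would draw the curve shown in Fig 10.
```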

Next, the decision tree classifier acts as a path-finder for the results: the complete dataset is divided into paths that each lead to an exact result.

Fig 11: Decision tree classifier of the bank dataset

The tree reads as follows:

1- When entropy is greater than 0.9, the predicted class is always "yes".

2- When the column value nr.employed <= -1.099, the entropy is 0.5 and the predicted class is "yes".

3- Together with that nr.employed value, if cons.conf.idx <= -1.328 the prediction is always "yes", whichever other column value is added.

4- When checking the three column values nr.employed, month and day of week together, the prediction is "no" with an entropy of 0.9.

5- If we consider poutcome <= -1.5 together with day of week, comparing every entropy value of nr.employed leads to a "no" decision.

6- If we consider poutcome <= -1.5 together with cons.price.idx, comparing every entropy value of nr.employed likewise leads to a "no" decision.

7- The best case is nr.employed with month in addition to a poutcome value of <= 1.5, which leads to a predicted class of "yes".
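Rule paths like those listed above can be printed from a tree trained with the entropy criterion. The data here is synthetic and the feature names are illustrative stand-ins for the bank dataset columns, so the printed thresholds will not match the figures quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; the four names below mimic bank dataset columns.
X, y = make_classification(n_samples=500, n_features=4, random_state=1)
names = ["nr.employed", "cons.conf.idx", "poutcome", "cons.price.idx"]

# Depth-limited tree using the entropy criterion, as in the assignment.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                              random_state=1).fit(X, y)

# export_text prints the tree as threshold rules, one path per leaf --
# the textual form of the paths read off Fig 11.
rules = export_text(tree, feature_names=names)
print(rules)
```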

4. CONCLUSIONS

The bank's marketing strategy is affected by multiple data patterns. The results obtained from the decision tree are the ones that lead to a positive marketing strategy: the patterns it finds point to the best marketing campaigns. Each resulting path ends in either "no" or "yes"; for the "no" paths a scheme is needed to shift customers towards "yes", while the "yes" patterns deserve further business attention. Managers and other stakeholders can make their choices according to these situations. Many patterns in the bivariate analysis resemble the old styles, and moreover the classifier takes considerable effort before it lifts the business.

