
INTERNSHIP

ON
DATA SCIENCE

BY
K.ANIL
FROM
TABLE OF CONTENTS

 Introduction to Data Science
 Python for Data Science
 Understanding statistics for Data Science
 Predictive modeling and basics of Machine Learning
 About the final project
What is Data Science?

 Data Science is about finding patterns in data through analysis and making future predictions.
 By using Data Science, companies are able to make:
 Better decisions (should we choose A or B?)
 Predictive analyses (what will happen next?)
 Pattern discoveries (finding patterns, or hidden information, in the data)
Structured vs Unstructured data
oStructured data refers to data that is
organized and formatted in a specific
way to make it easily readable and
understandable by both humans and
machines.

oStructured data is typically found in


databases and spreadsheets, and is
characterized by its organized nature.

oStructured data is highly valuable because it can be easily searched,


queried, and analyzed using various tools and techniques
Python for Data Science

o List: Lists are used to store multiple items in a single variable.
• Lists are one of 4 built-in data types in Python used to store collections of data; the other 3 are Tuple, Set, and Dictionary, all with different qualities and usage.
• Lists are created using square brackets:
Example: thislist = ["apple", "banana", "cherry"]

Method       Description
append()     Adds an element at the end of the list
clear()      Removes all the elements from the list
copy()       Returns a copy of the list
sort()       Sorts the list
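As a quick illustration, a minimal sketch of these methods in action, reusing the fruit list from the example above:

thislist = ["apple", "banana", "cherry"]

thislist.append("orange")    # adds "orange" at the end of the list
backup = thislist.copy()     # returns an independent copy of the list
thislist.sort()              # sorts the list in place, alphabetically
print(thislist)              # ['apple', 'banana', 'cherry', 'orange']
thislist.clear()             # removes all the elements from the list
print(thislist)              # []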
o Tuple: Tuples are used to store multiple items in a single variable.
• Tuples are written with round brackets.
• A tuple is a collection which is ordered and unchangeable.
Example of tuple:
thistuple = ("apple", "banana", "cherry")

Method       Description
count()      Returns the number of times a specified value occurs in a tuple
index()      Searches the tuple for a specified value and returns the position where it was found
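A minimal sketch of the two tuple methods (a repeated "apple" is added just to make count() meaningful):

thistuple = ("apple", "banana", "cherry", "apple")

print(thistuple.count("apple"))    # 2: "apple" occurs twice in the tuple
print(thistuple.index("banana"))   # 1: position where "banana" was found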
• Dictionary: Dictionaries are used to store data values in key:value pairs.
• A dictionary is a collection which is ordered (as of Python 3.7), changeable, and does not allow duplicate keys.
Example: thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}

Method       Description
clear()      Removes all the elements from the dictionary
copy()       Returns a copy of the dictionary
fromkeys()   Returns a dictionary with the specified keys and value
get()        Returns the value of the specified key
items()      Returns a list containing a tuple for each key:value pair
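A minimal sketch of these dictionary methods, reusing the car dictionary from the example above:

thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

print(thisdict.get("model"))              # Mustang
print(thisdict.items())                   # dict_items([('brand', 'Ford'), ...])
blank = dict.fromkeys(["a", "b"], 0)      # {'a': 0, 'b': 0}
spare = thisdict.copy()                   # independent copy of the dictionary
thisdict.clear()                          # thisdict is now {}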
Statistics for Data Science

• Data scientists need to understand the fundamental concepts of descriptive statistics and probability theory, which include the key concepts of probability distributions, statistical significance, hypothesis testing, and regression.

• Competency in statistics, computer programming, and information technology can lead to a successful career in a wide range of industries. Data scientists are needed almost everywhere, from health care and science to business and banking.
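As a small illustration of statistical significance and hypothesis testing, a minimal sketch using NumPy and SciPy; the simulated sample and the hypothesized mean of 100 are invented for the example:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=103, scale=10, size=50)   # simulated measurements

print(sample.mean(), sample.std())                # descriptive statistics
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(t_stat, p_value)   # a small p-value suggests the sample mean differs significantly from 100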
ROLE OF MACHINE LEARNING IN DATA SCIENCE

 Machine learning analyzes and examines large chunks of data automatically.

 It automates the data analysis process and makes predictions in real time without any human involvement.

 You can further build and train the data model to make real-time predictions.
Some basic libraries imported in Data Science:

 NumPy
 Pandas
 Matplotlib
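A minimal sketch of the conventional import aliases for these three libraries:

import numpy as np                 # numerical arrays and linear algebra
import pandas as pd                # tabular data (DataFrame, Series)
import matplotlib.pyplot as plt    # plotting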
FINAL PROJECT
PROJECT STATEMENT:
 Your client is a retail banking institution. Term deposits are a
major source of income for a bank. A term deposit is a cash
investment held at a financial institution. Your money is
invested for an agreed rate of interest over a fixed amount of
time, or term.
 The bank has various outreach plans to sell term deposits to its customers, such as email marketing, advertisements, telephonic marketing, and digital marketing. Telephonic marketing campaigns still remain one of the most effective ways to reach out to people. However, they require huge investment, as large call centers are hired to actually execute these campaigns. Hence, it is crucial to identify beforehand the customers most likely to convert, so that they can be specifically targeted via call.
You are provided with client data such as: age of the client, their job type, their marital status, etc. Along with the client data, you are also provided with information about the call, such as the duration of the call, day and month of the call, etc. Given this information, your task is to predict whether the client will subscribe to a term deposit.
DATA PROVIDED:
You are provided with the following files:
train.csv: Use this dataset to train the model. This file contains all the client and call details as well as the target variable "subscribed". You have to train your model using this file.
test.csv: Use the trained model to predict whether a new set of clients will subscribe to the term deposit.
Information provided in test.csv file
Information provided in train.csv file
Libraries used in this project

 pandas is used to read the csv files.
 The columns attribute lists the columns present in the train.csv and test.csv files.
 shape returns the number of rows and columns of the data.
 dtypes returns the data types of the columns of the data.
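A minimal sketch of these steps, assuming train.csv and test.csv sit in the working directory:

import pandas as pd

train = pd.read_csv("train.csv")   # read the csv files
test = pd.read_csv("test.csv")

print(train.columns)               # columns present in train.csv
print(train.shape)                 # (number of rows, number of columns)
print(train.dtypes)                # data type of each column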
Univariate Analysis
Now let's look at the distribution of our target variable, i.e. subscribed. As it is a categorical variable, let us look at its frequency table, percentage distribution, and bar plot.

 The proportions of subscribed and unsubscribed clients can be obtained as follows.
 Plot the bar graph for the obtained ratios.
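A minimal sketch of the frequency table, proportions, and bar plot, assuming the train DataFrame loaded above:

import matplotlib.pyplot as plt

print(train["subscribed"].value_counts())                 # frequency table
print(train["subscribed"].value_counts(normalize=True))   # proportions

train["subscribed"].value_counts().plot.bar()             # bar plot of the counts
plt.show()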

So, 3715 users out of a total of 31647 have subscribed, which is around 12%. Let's now explore the variables to get a better understanding of the dataset. We will first explore the variables individually using univariate analysis, then we will look at the relation between the various independent variables and the target variable. We will also look at the correlation plot to see which variables affect the target variable most.
Now let's look at the different types of jobs of the clients. As job is a categorical variable, we will look at its frequency table, shown in the sketch below.
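A short sketch of that frequency table and its bar plot, continuing the session above:

print(train["job"].value_counts())       # frequency of each job type
train["job"].value_counts().plot.bar()
plt.show()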
Bivariate Analysis
From the above graph we can infer that students and retired people have higher chances of subscribing to a term deposit, which is surprising, as students generally do not subscribe to a term deposit. The possible reason is that the number of students in the dataset is small and, compared to other job types, a higher proportion of students have subscribed to a term deposit.
We can infer that clients with no previous default have slightly higher chances of subscribing to a term deposit than clients with a previous default history.
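A minimal sketch of the bivariate comparison behind these graphs, assuming the columns are named job, default, and subscribed:

import pandas as pd
import matplotlib.pyplot as plt

# proportion of subscribers within each job type
pd.crosstab(train["job"], train["subscribed"], normalize="index").plot.bar(stacked=True)
plt.show()

# the same comparison for the default variable
print(pd.crosstab(train["default"], train["subscribed"], normalize="index"))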
Let's now look at how correlated our numerical variables are. We will compute the correlation between each pair of these variables; pairs with high positive or negative values are strongly correlated. This gives an overview of which variables might affect our target variable. We will convert our target variable into numeric values first.
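A minimal sketch of that step; the subscribed_num column name is my own:

# convert the target into numeric values, then inspect correlations with it
train["subscribed_num"] = train["subscribed"].map({"no": 0, "yes": 1})
corr = train.select_dtypes(include="number").corr()
print(corr["subscribed_num"].sort_values(ascending=False))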
We can infer that the duration of the call is highly correlated with the target variable. This makes intuitive sense: the longer the call, the more interest the client is showing in the term deposit, and hence the higher the chances that the client will subscribe.

Next, we will start building our predictive model to predict whether a client will subscribe to a term deposit or not. As the sklearn models take only numerical input, we will convert the categorical variables into numerical values using dummies. We will remove the ID variable, as its values are unique, and then apply dummies. We will also remove the target variable and keep it separately, as sketched below.
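A minimal sketch of this preprocessing plus a train/validation split; variable names such as features and x_train are my own:

import pandas as pd
from sklearn.model_selection import train_test_split

target = train["subscribed_num"]                   # numeric target from above
features = train.drop(columns=["ID", "subscribed", "subscribed_num"])
features = pd.get_dummies(features)                # one-hot encode the categorical variables

x_train, x_val, y_train, y_val = train_test_split(
    features, target, test_size=0.2, random_state=42)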
Model Building
Logistic Regression
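A minimal sketch of fitting and scoring the model on the split above; the solver settings are an assumption, not the original notebook's:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

lreg = LogisticRegression(max_iter=1000)
lreg.fit(x_train, y_train)
print(accuracy_score(y_val, lreg.predict(x_val)))   # validation accuracy (the slides report ~90%)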

 We got an accuracy score of around 90% on the validation dataset.

Logistic regression has a linear decision boundary. What if our data has non-linearity? We need a model that can capture this non-linearity. Let's try the decision tree algorithm now to check whether we get better accuracy with it.
Decision Tree
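A minimal sketch, reusing the split above; clf matches the variable name used in the prediction step below, while max_depth=4 is an assumed setting:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(x_train, y_train)
print(accuracy_score(y_val, clf.predict(x_val)))    # validation accuracy (the slides report >90%)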

 We got an accuracy of more than 90% on the validation set. You can try to improve the score by tuning the hyperparameters of the model. Let's now make the predictions on the test dataset. We will make the same changes to the test set as we made to the training set before making the predictions.
test_ids = test["ID"]                          # keep the IDs for the submission file
test = pd.get_dummies(test.drop(columns=["ID"]))
test = test.reindex(columns=features.columns, fill_value=0)   # align with the training columns
test_prediction = clf.predict(test)

Finally, we will save these predictions into a csv file. You can then open this csv file and copy-paste the predictions into the provided excel file to generate the score.

submission = pd.DataFrame()
# create the ID and subscribed columns and save the predictions in them
submission['ID'] = test_ids
submission['subscribed'] = test_prediction
# map the numeric predictions back to the 'no'/'yes' labels
submission['subscribed'] = submission['subscribed'].replace({0: 'no', 1: 'yes'})
submission.to_csv('submission.csv', header=True, index=False)

This produces the submission file containing the predicted values.
