
Assignment II

Prediction of Credit Card Defaulters


Hive:
I initially dropped the ID column in Hive, but my VM crashed and I lost all the screenshots for that step.

Understand and analyze the Dataset

First we load the data.


We then remove the ID column since it is not required.

We look at the schema.

We look at the summary statistics of the numerical columns.


We then look at the distribution of each feature.
Next we examine the distribution of the target variable.
As can be seen, the dataset is imbalanced: one class of the target is far more frequent than the other.

Next we check if there are any null values.

Next we find the correlation between different features.


We can see that the bill_amt columns are highly correlated with one another. Since logistic regression assumes the features are not strongly correlated (multicollinearity makes the coefficient estimates unstable), we remove bill_amt2 through bill_amt6.

We then change the target variable from 0/1 to No/Yes.


We transform the pay columns, since the one-hot encoder needs them to be category indices starting from 0.
Determine the features.

We ignore bill_amt2 through bill_amt6, as stated above.

We first transform the categorical columns to a one-hot representation.

Then we assemble all the required features into a single vector so that they can be fed as input to the logistic regression model.

We also scale the data to zero mean and unit variance.

We do all this by creating a pipeline of transformations, fitting it, and then passing the features through the fitted pipeline.
Divide dataset

We split the dataset into train and test sets in a 60:40 ratio.


Determine a Model and its measurement function

We define a logistic regression model and train the model on the train dataset.

Verify the Model accuracy.

We look at the area under the ROC curve, the accuracy, and the F1-score of our model.
Use the Spark Web UI to determine which tasks take most of your
execution time.

The fit command took the most time. It spawned 106 jobs with 126 stages. The maximum time spent in a single stage was 7 seconds, as shown above.
