Activity 4 CGPA Vs Placement Package Program

Uploaded by

Himanshu Sharma

11/18/24, 12:53 PM ML PROJECT 2: CGPA VS Package.ipynb - Colab

Data processing is an important part of any data-driven task: it turns raw data into a form from which meaningful insights can be drawn. Python is a widely used programming language, and it offers various libraries and tools for data processing.

In this article, we look at data processing in Python: loading data, printing rows and columns, summarizing a DataFrame, handling missing values, sorting and merging DataFrames, applying functions, and visualizing DataFrames.

Data Preprocessing involves a series of steps such as:

1. Data Collection.

2. Data Cleaning.

3. Data Transformation.

4. Feature Engineering: Scaling, Normalization and Standardization.

5. Feature Selection.

6. Handling Imbalanced Data.

7. Encoding Categorical Features.

8. Data Splitting.
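
Step 4 of the list above names scaling and standardization. As a minimal sketch (the CGPA values and the helper names `min_max_scale` and `standardize` are made up for illustration and are not part of the notebook), the two transformations can be written with NumPy as follows:

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1].

    Assumes the values are not all equal (otherwise the span is zero).
    """
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Hypothetical CGPA values, used only for illustration
cgpa = np.array([6.0, 7.0, 8.0, 9.0])
scaled = min_max_scale(cgpa)
standardized = standardize(cgpa)
print(scaled)
print(standardized)
```

For a single-feature linear regression such as the one in this notebook, scaling is not required for the fit itself; it matters more when features with different units are combined.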

IMPORT THE DEPENDENCIES/NECESSARY LIBRARIES

CGPA v/s Package (in LPA) Prediction using Simple Linear Regression

Dataset Link: https://www.kaggle.com/datasets/parvmodi/cgpa-vs-package-in-lpa

# import important libraries

import numpy as np # for linear algebra
import pandas as pd # for data frames processing
import matplotlib.pyplot as plt # for plotting basic graphs
import seaborn as sns # for plotting advanced graphics & data visualization
from sklearn.linear_model import LinearRegression # for linear regression model
from sklearn.model_selection import train_test_split # for splitting the dataset for training and testing
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score # for evaluating the model performance

LOAD/IMPORT DATASET FROM CSV FILE TO PANDAS DATA FRAMES

# import the data

placement_data = pd.read_csv('/content/Placement.csv')
print(placement_data) # print command is used to show the output

---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-2-9d60e68a70f6> in <cell line: 2>()
1 # import the data
----> 2 placement_data = pd.read_csv('/content/Placement.csv')
3 print(placement_data) # print command is used to show the output

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map,
is_text, errors, storage_options)
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,

FileNotFoundError: [Errno 2] No such file or directory: '/content/Placement.csv'
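
The traceback above means that '/content/Placement.csv' was not present in the Colab session when the cell ran, so every later cell that uses `placement_data` would fail as well. One possible way to fail early with a clearer message is sketched below; the helper name `load_placement_data` is hypothetical and not part of the notebook.

```python
from pathlib import Path

import pandas as pd

def load_placement_data(csv_path):
    """Read the CSV into a DataFrame, failing early with a clear message."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; upload the CSV to the session or fix the path"
        )
    return pd.read_csv(path)

# '/content/Placement.csv' is the path used in the notebook
try:
    placement_data = load_placement_data('/content/Placement.csv')
except FileNotFoundError as err:
    placement_data = None  # later cells need a real DataFrame to run
    print(err)
```

In Colab the file usually has to be uploaded to the session (or read from a mounted drive) before the load cell is run.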

# SHOW/DISPLAY FIRST 5 ROWS OF DATASET

placement_data.head()

# IF WE WANT TO SEE FIRST 10 ROWS

placement_data.head(10)

# SHOW THE LAST 5 ROWS OF DATASET

placement_data.tail()

# SHOW THE LAST 10 ROWS OF DATASET
placement_data.tail(10)

# FIND THE SHAPE/DIMENSION OF DATASET

placement_data.shape

# GET MORE INFORMATION ABOUT DATASET (SUCH AS NO. OF ROWS, COLUMNS, COLUMN NAME, DATA TYPE, NULL VALUE)
placement_data.info() # summary of columns, non-null counts and data types (no imputation is performed)

# FIND MISSING VALUE IN DATASET

placement_data.isnull().sum() # isnull command tells missing entries in a particular column

# GET STATISTICAL INFORMATION ABOUT DATASET

placement_data.describe()

# VISUALISE DATASET IN SCATTER PLOT

sns.regplot(x = placement_data['cgpa'], y = placement_data['package']) # regplot is used to plot regression plot
plt.show()

Step 2: Performing Simple Linear Regression

Steps of Model Building

1.Create X and y

2.Create train and test sets

3.Train the model on training set (i.e. learn the coefficients)

4.Evaluate the model on training set and test set
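
The four steps above can be sketched end to end on synthetic data. Everything about the data here is an assumption made for illustration (a linear relation with slope 0.6, intercept -1.0 and a little noise); it is not the Kaggle dataset, and the numbers it prints say nothing about the real one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption: package grows linearly with cgpa)
rng = np.random.default_rng(0)
cgpa = rng.uniform(5.0, 10.0, size=100)
package = 0.6 * cgpa - 1.0 + rng.normal(0.0, 0.1, size=100)

# 1. Create X (2D: samples x features) and y
X = cgpa.reshape(-1, 1)
y = package

# 2. Create train and test sets (same split settings as the notebook)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, random_state=2
)

# 3. Train the model on the training set (learn slope and intercept)
model = LinearRegression().fit(X_train, y_train)

# 4. Evaluate the model on the training set and the test set
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print('slope:', round(model.coef_[0], 2), 'intercept:', round(model.intercept_, 2))
print('R2 train:', round(r2_train, 2), 'R2 test:', round(r2_test, 2))
```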

# SEPARATE INDEPENDENT VARIABLE (X) AND DEPENDENT VARIABLE (Y)

# 'cgpa' is an independent variable and 'package' is a dependent variable
X = placement_data['cgpa']
Y = placement_data['package']

# print independent and dependent variable

print(X)
print(Y)

# train test split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size = .70, random_state = 2)
print('The shape of X_train is', X_train.shape)
print('The shape of X_test is', X_test.shape)
print('The shape of Y_train is', Y_train.shape)
print('The shape of Y_test is', Y_test.shape)
X_test.head()

# X_train and X_test are a series and we want to convert them to the 2D array for model building
# reshape X_train and X_test to (n,1)
X_train_lm = X_train.values.reshape(-1, 1)
print('The shape of X_train_lm is', X_train_lm.shape)
X_test_lm = X_test.values.reshape(-1, 1)
print('The shape of X_test_lm is', X_test_lm.shape)
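
scikit-learn estimators expect the feature matrix as a 2D array of shape (n_samples, n_features), which is why the 1D series is reshaped above. A small sketch with made-up values shows what `reshape(-1, 1)` does:

```python
import numpy as np

cgpa = np.array([6.5, 7.2, 8.1])   # hypothetical values, 1D with shape (3,)
cgpa_2d = cgpa.reshape(-1, 1)      # -1 lets NumPy infer the number of rows

print(cgpa.shape)     # (3,)
print(cgpa_2d.shape)  # (3, 1)
```

The target (`Y_train`) can stay 1D; only the features need the extra dimension.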

# create an object of linear regression

lm = LinearRegression()

# fit the model

lm.fit(X_train_lm, Y_train)

# see the parameters

print('The coefficient is', round(lm.coef_[0], 2))
print('The intercept is', round(lm.intercept_, 2))
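
For a single feature, the coefficient and intercept reported by the fitted model correspond to the ordinary least squares slope and intercept, which have a closed form. The sketch below computes them directly with NumPy; the function name `fit_line` and the data are hypothetical, chosen so the points lie exactly on a known line.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares slope and intercept for one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical points lying exactly on y = 0.5 * x - 1
x = [6.0, 7.0, 8.0, 9.0]
y = [2.0, 2.5, 3.0, 3.5]
slope, intercept = fit_line(x, y)
print(round(slope, 2), round(intercept, 2))
```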

# make predictions on the training dataset

x_train_data_pred = lm.predict(X_train_lm)

# plot the model

plt.scatter(X_train, Y_train)
plt.plot(X_train, x_train_data_pred, color = 'r')
plt.show()


Step 4: Predictions and Evaluation on Test Set

# make predictions on the test set

x_test_data_pred = lm.predict(X_test_lm)
#print(x_test_data_pred)

R squared Method

R squared is a regression metric that measures how well the model fits. It is the proportion of the variance in the response/target variable that is explained by the independent variables.

Thus, R squared describes how well the target variable is explained by the independent variables taken together.

On the training data of a least-squares fit with an intercept, R squared lies between 0 and 1; on test data it can be negative when the model predicts worse than the mean of the target. It is given by the formula below:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

Here,

$SS_{res}$: the sum of squares of the residual errors, i.e. the squared differences between the actual and predicted values.

$SS_{tot}$: the total sum of squares, i.e. the squared differences between the actual values and their mean.

The higher the R squared value, the better the model fits the data.
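
A minimal sketch of the R squared formula, written out with NumPy on made-up values (the function name `r_squared` is hypothetical). It also illustrates that predictions worse than simply predicting the mean give a value below 0.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [2.0, 3.0, 4.0]
perfect = r_squared(y_true, [2.0, 3.0, 4.0])    # exact predictions
mean_only = r_squared(y_true, [3.0, 3.0, 3.0])  # always predicting the mean
worse = r_squared(y_true, [4.0, 3.0, 2.0])      # worse than predicting the mean
print(perfect, mean_only, worse)
```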

# Model Evaluation on training data

MAE_train = round(mean_absolute_error(y_true = Y_train, y_pred = x_train_data_pred), 2)
MSE_train = round(mean_squared_error(y_true = Y_train, y_pred = x_train_data_pred), 2)
RMSE_train = round(np.sqrt(mean_squared_error(y_true = Y_train, y_pred = x_train_data_pred)), 2)
r_square_train = round(r2_score(y_true = Y_train, y_pred = x_train_data_pred), 2)
# Print each metric for the training data
print('The Mean Absolute Error for the training set is', MAE_train)
print('The Mean Square Error for the training set is', MSE_train)
print('The Root Mean Square Error for the training set is', RMSE_train)
print('The R Square score for the training set is', r_square_train)

# Model Evaluation on test data

MAE_test = round(mean_absolute_error(y_true = Y_test, y_pred = x_test_data_pred), 2)
MSE_test = round(mean_squared_error(y_true = Y_test, y_pred = x_test_data_pred), 2)
RMSE_test = round(np.sqrt(mean_squared_error(y_true = Y_test, y_pred = x_test_data_pred)), 2)
r_square_test = round(r2_score(y_true = Y_test, y_pred = x_test_data_pred), 2)
# Print each metric for the test data
print('The Mean Absolute Error for the Test Data is', MAE_test)
print('The Mean Square Error for the Test Data is', MSE_test)
print('The Root Mean Square Error for the Test Data is', RMSE_test)
print('The R Square score for the Test Data is', r_square_test)
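
The error metrics used in the two evaluation cells above are simple averages over the prediction errors. As a sketch on made-up values (not results from the dataset), they can be computed by hand like this:

```python
import numpy as np

y_true = np.array([3.0, 4.0, 5.0])   # hypothetical packages in LPA
y_pred = np.array([2.5, 4.0, 6.0])   # hypothetical predictions

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # Mean Absolute Error
mse = np.mean(errors ** 2)      # Mean Squared Error
rmse = np.sqrt(mse)             # Root Mean Squared Error, same unit as the target
print(mae, mse, rmse)
```

MAE and RMSE are in the unit of the target (LPA here), while MSE is in squared units, which is why RMSE is often the easier of the two to interpret.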

# plot the model with the test set

plt.scatter(X_train, Y_train)
plt.scatter(X_test, Y_test)
plt.plot(X_test, x_test_data_pred, color = 'g')
plt.show()

# Make predictive model

# NOW PREDICT ON NEW, UNSEEN INPUT DATA POINTS
input_data = 7.48 # a single CGPA value
input_data_as_numpy_array = np.asarray(input_data)
# RESHAPE THE NUMPY ARRAY TO (1, 1) AS WE ARE PREDICTING FOR A SINGLE DATA INSTANCE AT A TIME
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
prediction = lm.predict(input_data_reshaped)
print(prediction)
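
`predict` expects the same 2D layout as the training features: one row per sample and one column per feature. The sketch below, which fits a model on hypothetical points lying exactly on package = 0.5 * cgpa - 1 (not the real dataset), shows a single-value prediction next to a batch prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on package = 0.5 * cgpa - 1
X = np.array([[6.0], [7.0], [8.0], [9.0]])
y = np.array([2.0, 2.5, 3.0, 3.5])
model = LinearRegression().fit(X, y)

single = np.array([[7.48]])                # one row, one feature
batch = np.array([[6.5], [7.48], [9.1]])   # three rows, one feature

single_pred = model.predict(single)
batch_pred = model.predict(batch)
print(single_pred.shape, batch_pred.shape)
```

The result is always an array with one prediction per input row, so a single prediction is read with `single_pred[0]`.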
