Activity 4 CGPA Vs Placement Package Program
Data processing is an important part of any data-driven task. It helps us extract meaningful insights from the data. Python is a widely used programming language, and it offers various libraries and tools for data processing.
This article covers data processing in Python: loading data, printing rows and columns, summarizing a DataFrame, handling missing values, sorting and merging DataFrames, applying functions, and visualizing DataFrames.
1. Data Collection.
2. Data Cleaning.
3. Data Transformation.
5. Feature Selection.
8. Data Splitting.
# import the data
import pandas as pd
placement_data = pd.read_csv('/content/Placement.csv')
print(placement_data) # print command is used to show the output
In the recorded run, this cell stopped at pd.read_csv with a FileNotFoundError: the file /content/Placement.csv was not present in the Colab session. Upload Placement.csv to /content (or correct the path) and run the cell again.
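The FileNotFoundError above comes from pd.read_csv being pointed at a path that does not exist. One way to make the failure easier to diagnose is to check the path before loading. The following is a minimal sketch, not the notebook's own code: the helper name load_placement_data is hypothetical, and the demo file with its cgpa and package columns is made up purely so the example can run on its own.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_placement_data(csv_path):
    """Read the CSV at csv_path, failing with a clear hint if it is missing."""
    path = Path(csv_path)
    if not path.is_file():
        # Same exception type pandas raises, but with a hint attached.
        raise FileNotFoundError(
            f"{path} not found; upload the file or fix the path before loading."
        )
    return pd.read_csv(path)


# Hypothetical demo: write a two-row stand-in file, then load it back.
# The column names and values are invented for illustration only.
demo_dir = Path(tempfile.mkdtemp())
demo_csv = demo_dir / "Placement.csv"
demo_csv.write_text("cgpa,package\n6.5,3.0\n8.0,4.0\n")

demo_data = load_placement_data(demo_csv)
print(demo_data)
```

In Colab, the same helper could be called with '/content/Placement.csv' once the file has been uploaded to the session.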
https://colab.research.google.com/drive/17Px7K5hY0IQ4R396TXofcZRG1ZD-qxUT#scrollTo=3NZ4Y6FmQiR0&printMode=true 1/4
11/18/24, 12:53 PM ML PROJECT 2: CGPA VS Package.ipynb - Colab
# SHOW THE LAST 10 ROWS OF DATASET
placement_data.tail(10)
# GET MORE INFORMATION ABOUT DATASET (SUCH AS NO. OF ROWS, COLUMNS, COLUMN NAME, DATA TYPE, NULL VALUE)
placement_data.info() # the non-null counts show whether data imputation is needed
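The info() call reports non-null counts per column, which is the starting point for data imputation. As a rough sketch of what that step could look like, the block below counts missing values and fills them with the column mean. The DataFrame demo, its column names, and its values are assumptions made for illustration, and mean imputation is only one possible strategy; the notebook does not say which one it uses.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for placement_data; columns and values are made up.
demo = pd.DataFrame(
    {
        "cgpa": [6.0, 7.0, np.nan, 9.0],
        "package": [2.5, 3.0, 3.5, np.nan],
    }
)

# Count missing values per column (info() shows the complement: non-null counts).
missing_counts = demo.isnull().sum()
print(missing_counts)

# One simple imputation strategy: fill each numeric column with its own mean.
imputed = demo.fillna(demo.mean(numeric_only=True))
print(imputed)
```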
1.Create X and y
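The cells that create X and y and split them are not shown here, so the following is only a sketch of how this step is commonly done with scikit-learn. The column names cgpa and package, the toy values, the 75/25 split, and the random_state are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for placement_data; columns and values are made up.
demo = pd.DataFrame(
    {
        "cgpa": [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5],
        "package": [2.3, 2.6, 2.9, 3.2, 3.5, 3.8, 4.1, 4.4],
    }
)

# X is the single predictor (a pandas Series), y is the target.
X = demo["cgpa"]
y = demo["package"]

# Hold out 25% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=100
)
print(len(X_train), len(X_test))
```

Because X is a single column, X_train and X_test come back as one-dimensional Series, which is why they need reshaping before model building.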
# X_train and X_test are a series and we want to convert them to the 2D array for model building
# reshape X_train and X_test to (n,1)
X_train_lm = X_train.values.reshape(-1, 1)
print('The shape of X_train_lm is', X_train_lm.shape)
X_test_lm = X_test.values.reshape(-1, 1)
print('The shape of X_test_lm is', X_test_lm.shape)
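To see why the reshape matters, here is a small sketch that fits a scikit-learn LinearRegression on a reshaped single feature. The model choice is an assumption suggested by the _lm suffix, not something the notebook confirms, and the training values are made up so that they lie exactly on the line package = 0.6 * cgpa - 1.3.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data; values are invented and exactly linear.
X_train = pd.Series([6.0, 7.0, 8.0, 9.0], name="cgpa")
y_train = pd.Series([2.3, 2.9, 3.5, 4.1], name="package")

# scikit-learn estimators expect X with shape (n_samples, n_features).
X_train_lm = X_train.values.reshape(-1, 1)
print("The shape of X_train_lm is", X_train_lm.shape)

lm = LinearRegression()
lm.fit(X_train_lm, y_train)
print("slope:", lm.coef_[0], "intercept:", lm.intercept_)
```

Passing the one-dimensional X_train.values directly to fit would raise a ValueError asking for a 2D array, which is the problem the reshape avoids.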
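R-squared (R², the coefficient of determination) is a regression metric used to evaluate the performance of a model. It measures how much of the variation in the response/target variable is explained by the independent variables.
Thus, R-squared describes how well the target variable is explained by the combination of the independent variables taken as a single unit.
For a least-squares fit with an intercept, evaluated on its training data, R-squared ranges from 0 to 1; on held-out data it can fall below 0 when the model predicts worse than the mean of the target. It is given by the formula:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Here,
SSres: the sum of squares of the residual errors, i.e. the squared differences between the actual and predicted values.
SStot: the total sum of squares, i.e. the squared differences between the actual values and their mean.
The higher the R-squared value, the better the model fits the data.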
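The definition above, one minus the ratio of SSres to SStot, can be checked numerically. The sketch below computes it directly with NumPy; the function name r_squared and the sample values are hypothetical and chosen so the arithmetic is easy to follow by hand (SSres = 0.5, SStot = 5.0).

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Return 1 - SSres / SStot for the given actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted packages, for illustration only.
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 4.0, 4.5]

score = r_squared(y_true, y_pred)
print(score)  # 1 - 0.5 / 5.0, approximately 0.9
```

Predicting the mean for every row gives a score of 0, and predictions worse than that give a negative score. scikit-learn's sklearn.metrics.r2_score computes the same quantity.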