22IZ023 Nikhil - Exercise 6_ Linear Regression

Uploaded by

nikhildeutsch03

Exercise-6 Linear Regression

Aim
To implement a linear regression model for predicting glucose levels based on age
using Python.

Logic Description
Linear regression is a statistical method used to model the relationship between a
dependent variable (Glucose) and an independent variable (Age). The model fits a
straight line, Glucose = intercept + coefficient × Age, to the training data and
uses that line to make predictions.
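
As a rough illustration of what fitting the line means, the sketch below computes the least-squares slope and intercept for a single feature in plain Python. The function name fit_line and the age and glucose values are made up for this example and are not taken from the exercise dataset.

```python
def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up, noise-free data generated from Glucose = 0.5 * Age + 80
ages = [20, 30, 40, 50, 60]
glucose = [0.5 * a + 80 for a in ages]

slope, intercept = fit_line(ages, glucose)
print(f"Coefficient: {slope:.6f}")
print(f"Intercept: {intercept:.6f}")
```

Because the sample data lie exactly on a line, the fit recovers the coefficient 0.5 and the intercept 80. With real, noisy data the values would only approximate the underlying trend.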

Algorithm

1. Load the dataset using Pandas.
2. Check for the existence of required columns (Age, Glucose).
3. Remove rows with missing values.
4. Split the dataset into training (80%) and testing (20%) sets.
5. Train a linear regression model using the training data.
6. Make predictions on the test data.
7. Evaluate the model using metrics such as Mean Squared Error (MSE) and R² Score.
8. Display actual vs. predicted values.

Package/Tools Description

Sl. No.   Name of Package/Tool   Description
1         Pandas                 Data manipulation and analysis
2         NumPy                  Numerical operations
3         Scikit-learn           Machine learning library for regression and evaluation

Source Code:-
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: Get the filename from the user
filename = input("Enter the filename: ")

# Step 2: Load the dataset
df = pd.read_csv(filename)

# Step 3: Check if 'Age' and 'Glucose' columns exist
if "Age" not in df.columns or "Glucose" not in df.columns:
    print("Error: Dataset must contain 'Age' and 'Glucose' columns.")
else:
    # Step 4: Remove rows with missing values in 'Age' or 'Glucose'
    df = df[['Age', 'Glucose']].dropna()

    # Step 5: Split the dataset into training and testing sets
    # (80% train, 20% test)
    X = df[['Age']]    # Feature (2-D, as scikit-learn expects)
    y = df['Glucose']  # Target variable
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Step 6: Train a Linear Regression model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Step 7: Make predictions
    y_pred = model.predict(X_test)

    # Step 8: Evaluate the model
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    coef = model.coef_[0]
    intercept = model.intercept_

    # Step 9: Display results
    print("\nLinear Regression Model Evaluation:")
    print(f"Mean Squared Error: {mse:.6f}")
    print(f"R² Score: {r2:.6f}")
    print(f"Coefficient: {coef:.6f}")
    print(f"Intercept: {intercept:.6f}")

    # Step 10: Display actual vs. predicted values
    results = pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
    print("\nComparison of Actual vs. Predicted Glucose Levels:")
    print(results.head(10))  # Display first 10 rows

Output:-
Test Cases

Test Case   Input                                       Expected Output
1           Dataset without the Age or Glucose column   Error message stating that both columns are required
2           Dataset after dropna()                      Cleaned dataset without missing values
3           Linear Regression Model Training            Successful model training
4           Prediction on test data                     Glucose level predictions
5           Model Evaluation                            MSE and R² Score
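
The first two test cases could be checked with a small script along the following lines. This is only a sketch: the helper prepare is hypothetical, it raises an exception where the source code prints an error message, and the two small DataFrames are invented for illustration.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Age", "Glucose"]  # column names used in the exercise


def prepare(df):
    """Hypothetical helper: validate the columns, then drop incomplete rows."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Dataset must contain columns: {missing}")
    return df[REQUIRED_COLUMNS].dropna()


# Test case 1: a required column is absent, so an error is expected
no_glucose = pd.DataFrame({"Age": [25, 40]})
try:
    prepare(no_glucose)
    case1_raised = False
except ValueError:
    case1_raised = True

# Test case 2: rows containing missing values are removed by dropna()
with_gaps = pd.DataFrame({"Age": [25, None, 40],
                          "Glucose": [90.0, 110.0, None]})
cleaned = prepare(with_gaps)

print("Case 1 raised an error:", case1_raised)
print("Case 2 rows kept:", len(cleaned))
```

In the invented data only the first row has both values present, so it is the only row that survives the cleaning step.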

Inferences

● The model predicts glucose level as a linear function of age.
● Model performance depends on the distribution and quality of the data.
● A lower MSE and a higher R² indicate a better fit to the test data.
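
To make the last point concrete, the sketch below computes MSE and R² by hand for two sets of predictions against the same actual values. The function names and all numbers are made up for illustration. The formulas are the standard definitions of the two metrics, not output from the exercise.

```python
def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up glucose values and two hypothetical sets of predictions
actual = [90, 100, 110, 120]
close = [92, 98, 111, 119]    # predictions near the actual values
rough = [100, 100, 105, 105]  # predictions further from the actual values

print(f"close: MSE={mse(actual, close):.2f}, R2={r2(actual, close):.2f}")
print(f"rough: MSE={mse(actual, rough):.2f}, R2={r2(actual, rough):.2f}")
```

The closer predictions give the smaller MSE and the larger R², which matches the rule of thumb stated in the inferences.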

Result
Linear regression was successfully implemented for predicting glucose levels,
demonstrating an understanding of regression analysis using Python.
