Experiment 1
Objective:
● Explore a medical dataset suitable for a linear/logistic regression problem
● Explore the patterns in the dataset and apply a suitable algorithm
System Requirements:
Linux OS with Python and the required libraries, or R, or Windows with MATLAB
Theory:
Regression analysis has several types, each serving different purposes based on the nature of the
data and the relationship between variables. The main types include:
1. Linear Regression: This is the simplest form, where the relationship between the
dependent and independent variables is modeled as a straight line. It is used when the
data shows a linear trend.
2. Multiple Linear Regression: An extension of linear regression, this involves multiple
independent variables to predict the dependent variable. It is significant for understanding
how several factors simultaneously affect an outcome.
3. Polynomial Regression: This type models the relationship as an nth-degree polynomial,
allowing for curved relationships between variables. It is useful when the data exhibits
nonlinear trends.
4. Logistic Regression: Although named "regression," it is used for binary classification
problems, modeling the probability that a given input belongs to a specific class. It is
significant in fields like medical diagnostics and social sciences.
5. Ridge and Lasso Regression: These are regularization techniques applied to linear
regression to prevent overfitting by adding a penalty to the magnitude of coefficients.
Ridge regression penalizes the sum of squared coefficients, while Lasso penalizes the
sum of absolute coefficients, also allowing for feature selection (a short sketch
contrasting the two follows at the end of this section).
6. Quantile Regression: Instead of modeling the mean of the dependent variable, quantile
regression estimates the median or other quantiles. It is significant in cases where the
relationship between variables varies across different points of the distribution.
Each type of regression is significant in its own way, allowing analysts and researchers to choose
the most appropriate model for their specific data characteristics and research objectives.
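To make the contrast in point 5 concrete, here is a minimal sketch on synthetic data (the feature count, alpha values, and coefficients below are illustrative assumptions, not part of the experiment):
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: 5 features, but only the first two actually drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)  # penalty on the sum of squared coefficients
lasso = Lasso(alpha=0.1).fit(X, y)  # penalty on the sum of absolute coefficients

# Ridge shrinks every coefficient toward zero; Lasso tends to drive the
# irrelevant ones exactly to zero, which is why it doubles as feature selection.
print("Ridge coefficients:", ridge.coef_.round(3))
print("Lasso coefficients:", lasso.coef_.round(3))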
Datasets:
For Linear Regression:
Patient Records of a Particular Hospital
(https://huggingface.co/datasets/Nicolybgs/healthcare_data)
For Logistic Regression:
Diabetes Dataset
(https://www.kaggle.com/datasets/mathchi/diabetes-data-set)
Algorithm:
Step 1: Create a sample dataset with multiple independent variables and one dependent
variable (Y).
Step 2: Split the data into training and testing sets using the train_test_split function.
Step 3: Create a regression model and fit it to the training data.
Step 4: Make predictions on the test set.
Step 5: Evaluate the model using metrics such as Mean Absolute Error, Mean Squared Error,
and Root Mean Squared Error.
Step 6: Finally, print the coefficients and intercept of the regression equation.
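The steps above can be sketched end to end on a small synthetic dataset (all names and values here are illustrative, not the experiment's actual data):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Step 1: sample dataset with three independent variables and one dependent variable Y
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
Y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Step 2: train/test split (7:3 here, matching one of the ratios used below)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Step 3: create and fit the regression model
model = LinearRegression().fit(X_train, y_train)

# Step 4: predict on the test set
y_pred = model.predict(X_test)

# Step 5: evaluate with MAE, MSE, and RMSE
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"MAE: {mae:.3f}, MSE: {mse:.3f}, RMSE: {np.sqrt(mse):.3f}")

# Step 6: print the coefficients and intercept of the regression equation
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")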
Code:
For Linear Regression:
(Colab Notebook - LinearRegression.ipynb)
Task:
To predict the number of days an admitted patient will stay in a particular hospital based on
the severity of the patient's illness, the hospital department, the attending doctor, the ward, etc.
Imports-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv("hf://datasets/Nicolybgs/healthcare_data/healthcare_data.csv")
Preprocessing-
Getting all columns’ information and their respective unique values
for column in df.columns:
    print(column, ' ---> ', df[column].unique())
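The Age column in this dataset stores ranges as strings (e.g. '21-30'), so it must be converted to a numeric value before regression. The helper below is a minimal sketch assuming the 'low-high' string format:
def range_to_midpoint(age_range):
    # '21-30' -> 25.5; assumes every entry is a 'low-high' string
    low, high = age_range.split('-')
    return (int(low) + int(high)) / 2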
df['Age'] = df['Age'].apply(range_to_midpoint)
Encoding all the categorical columns using the one-hot encoding scheme, keeping the
columns with numeric values intact
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Assumed definition: split column names by dtype, so string columns get encoded
categorical_columns = df.select_dtypes(include='object').columns.tolist()
numeric_columns = df.select_dtypes(exclude='object').columns.tolist()

categorical_transformer = OneHotEncoder(sparse_output=False)  # use sparse=False on scikit-learn < 1.2
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_columns)
    ],
    remainder='passthrough'
)
df = preprocessor.fit_transform(df)
df = pd.DataFrame(df, columns=(
    list(preprocessor.named_transformers_['cat'].get_feature_names_out(categorical_columns)) +
    numeric_columns
))
Analysis-
Finding out the variables that are highly correlated with the output variable
corr_matrix = df.corr()
plt.figure(figsize=(12, 12))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1, fmt='.2f')
plt.title('Correlation Matrix Heatmap')
plt.show()
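Beyond eyeballing the heatmap, the correlations with the target can also be ranked directly (a small convenience snippet, reusing the same corr_matrix):
corr_with_target = corr_matrix['Stay (in days)'].drop('Stay (in days)')
print(corr_with_target.abs().sort_values(ascending=False))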
Dropping the attributes that have almost no correlation with the concerned variable
df_imp = df.drop(columns=list(set(df.columns) - set([
    'Age', 'Stay (in days)',
    'Department_TB & Chest disease', 'Department_anesthesia',
    'Department_gynecology', 'Department_radiotherapy', 'Department_surgery',
    'gender_Female', 'gender_Male', 'gender_Other',
    'Ward_Facility_Code_A', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C',
    'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F',
    'doctor_name_Dr Isaac', 'doctor_name_Dr John', 'doctor_name_Dr Mark',
    'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam',
    'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia',
])))
Training and Testing: Model A (Considering all attributes)
X = df.drop('Stay (in days)', axis=1)
y = df['Stay (in days)']
# test_size=0.3 for the 7:3 split; use 0.2 for the 8:2 split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"MSE: {mse}, RMSE: {rmse}, R2: {r2}")
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
print('-' * 40)
Training and Testing: Model B (Considering attributes showing strong correlation with output)
X = df_imp.drop('Stay (in days)', axis=1)
y = df_imp['Stay (in days)']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"MSE: {mse}, RMSE: {rmse}, R2: {r2}")
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
print('-' * 40)
For Logistic Regression:
Imports-
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
df = pd.read_csv('/content/drive/MyDrive/Datasets/diabetes.csv')
Preprocessing-
Normalizing the data
scaler_minmax = MinMaxScaler()
df = pd.DataFrame(scaler_minmax.fit_transform(df), columns=df.columns)
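For reference, min-max scaling maps each value x of a feature to (x - min) / (max - min), so every feature ends up in the [0, 1] range.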
Analysis-
Finding the extent of correlations of independent variables with the dependent binary variable
corr_matrix = df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1, fmt='.2f')
plt.title('Correlation Matrix Heatmap')
plt.show()
Considering attributes that are strongly correlated with the outcome
df_imp = df[['Pregnancies', 'Glucose', 'BMI', 'Age', 'Outcome']]
Training and Testing: Model A (Considering all attributes)-
X = df.drop('Outcome', axis=1)
y = df['Outcome']
# test_size=0.3 for the 7:3 split; use 0.2 for the 8:2 split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
Training and Testing: Model B (Considering attributes showing strong correlation with the outcome)-
X = df_imp.drop('Outcome', axis=1)
y = df_imp['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
Output:
For Linear Regression:
Model A on two split ratios - 7:3, 8:2
Model B on two split ratios - 7:3, 8:2
For Logistic Regression:
Conclusion:
By performing this experiment, I was able to understand how regression analysis can be carried
out on healthcare datasets to predict both categorical and continuous outcomes.
The following are some observations regarding the models trained during this experiment-
● In the case of Linear Regression, the two models did not show a significant change in
their performance metrics when the train-test split ratio was altered. However, the model
does improve when the independent variables under consideration are those that show
positive correlation with the output variable (the number of days an admitted patient will
stay in the hospital).
● In the case of Logistic Regression, the first model, trained on all the attributes, showed an
accuracy of 77.5% on a 7:3 split ratio, while the accuracy was around 80.5% on an 8:2
split ratio. Although the analysis shows that only a few variables have a comparatively
strong correlation with the outcome, training a model on only those variables reduces the
accuracy to 73.6%, implying that the variables with low individual correlation still drive
the outcome collectively.