Mini Project Template Both

This document is a mini project report submitted by Aryan Khera and Harsh Singh to Visvesvaraya Technological University in partial fulfillment of their Bachelor of Engineering degree in Artificial Intelligence and Machine Learning. The report details their project on "Urban Governance Forecasting with Machine Learning" conducted under the guidance of their professor Mr. Manjunatha P B. It includes certificates of completion, an acknowledgment of those who assisted in the project, and an abstract of the project.

Uploaded by

Harsh singh

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“Jnana Sangama”, Belagavi-590018, Karnataka

AI&ML Application Development Laboratory (18AIL76)


Mini Project Report
on
“Urban Governance Forecasting with Machine Learning”

Submitted in partial fulfillment for the award of the degree of


Bachelor of Engineering in
Artificial Intelligence & Machine Learning

Submitted by

USN Name
1BI20AI005 Aryan Khera
1BI20AI014 Harsh Singh

for the academic year 2023-24

Under the Guidance of


Mr. Manjunatha P B
Assistant Professor

Department of Artificial Intelligence & Machine Learning


Bangalore Institute of Technology
K.R. Road, V.V.Pura, Bengaluru-560 004
2023-24
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“Jnana Sangama”, Belagavi-590018, Karnataka

BANGALORE INSTITUTE OF TECHNOLOGY


Department of Artificial Intelligence & Machine Learning
K.R. Road, V.V.Pura, Bengaluru-560 004

Certificate

This is to certify that the AI&ML Application Development Laboratory mini
project work entitled “Urban Governance Forecasting with Machine
Learning” carried out by

USN Name
1BI20AI005 Aryan Khera
1BI20AI014 Harsh Singh

bonafide students of Bangalore Institute of Technology, in partial fulfillment
for the award of the degree of Bachelor of Engineering in Artificial Intelligence &
Machine Learning under Visvesvaraya Technological University, Belagavi,
during the academic year 2023-24, is a true representation of the mini project work
completed satisfactorily.

Mr. Manjunatha P B      Dr. Jyothi D. G.     Dr. Aswath M. U.
Assistant Professor     Professor & HoD      Principal
Dept. of AI&ML          Dept. of AI&ML       BIT
BIT, Bengaluru          BIT, Bengaluru       Bengaluru

Name of the Examiners, Signature with date


1.

2.
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the completion of a task would be incomplete
without crediting the people who made it possible, whose constant guidance and
encouragement crowned the efforts with success.

We would like to profoundly thank the Management of Bangalore Institute of Technology for
providing such a healthy environment for the successful completion of the mini project work.

We would like to express our thanks to the Principal Dr. Aswath M.U. for his
encouragement that motivates us for the successful completion of mini project work.

It gives us immense pleasure to thank Dr. Jyothi D.G. Professor & Head, Department of
Artificial Intelligence & Machine Learning for her constant support and encouragement.

We would like to express our deepest gratitude to our mini project guide Mr. Manjunatha
P.B for his constant support and guidance throughout the Mini Project work.

It gives us great pleasure to express our sincere gratitude for the friendly co-operation
shown by all the staff members of the Artificial Intelligence and Machine Learning
Department, BIT.

Last but not the least, we hereby acknowledge and thank our friends and family, who
have always been our source of inspiration and were instrumental in the successful
completion of the project work.

Date: 1-1-2024
Place: Bengaluru

Aryan Khera
Harsh Singh
ABSTRACT
INDEX
LIST OF FIGURES

Fig No Title Page no


CHAPTER – 1
INTRODUCTION

Dept., of AI&ML,BIT 2023-24 1


CHAPTER – 2
LITERATURE REVIEW
Paper 1:
[1] Title: “Classification of Diabetes Disease Using Support Vector Machine”,
International Journal of Engineering Research and Applications, by Jegan, Chitra
(2013)
• Diabetes mellitus, a global health concern, affects 285 million people worldwide,
a figure projected to reach 380 million in the next 20 years. To address this, a
classifier is essential for efficient and cost-effective diabetes detection. The Pima
Indian diabetes database from the UCI Machine Learning Repository is a standard
benchmark for testing such algorithms. This study advocates the use of Support
Vector Machine (SVM) as a classifier, showcasing its success in accurately
diagnosing diabetes from high-dimensional medical datasets.

Paper 2:
[2] Title: “Prediction of Diabetes Using Data Mining Techniques”, Research Journal of
Pharmacy and Technology, by Mareeswari, V., Saranya, R., Mahalakshmi, R. &
Preethi, E. (2017)

• Diabetes, a widespread global ailment, disproportionately affects women and poses
an escalating risk. In response to the cumbersome diagnosis process, this study
employs the K Nearest Neighbor (KNN) classification technique. Using a
diabetes dataset for training and patient details for testing, the KNN algorithm
classifies the training data and predicts the target data. The research reports on
the algorithm's effectiveness in diabetes diagnosis, with an analysis conducted
across different k values.

Paper 3:
[3] Title: “Prognosis of Diabetes Using Data Mining Approach - Fuzzy C Means
Clustering and Support Vector Machine”, International Journal of Computer Trends
and Technology, by Sanakal, Ravi & Jayakumari, Smt. (2014)

• This study employs data mining methods, specifically Fuzzy C-Means (FCM)
clustering and SVM, to analyze a diabetes dataset for diagnosis. The dataset,
sourced from the UCI repository, contains 768 cases, each with 8 clinical attributes
and an outcome column indicating the diabetes diagnosis (9 columns in total). By
leveraging these techniques, the research aims to enhance clinical
decision-making with efficient analysis of large medical datasets.

2.1 EXISTING SYSTEM:


The existing systems for diabetes prediction often involve the application of machine
learning algorithms on datasets containing relevant health information. Commonly used
algorithms include logistic regression, decision trees, support vector machines, and neural
networks. These models are trained on historical data that include features such as glucose
levels, blood pressure, BMI, and other health indicators. The goal is to develop accurate
prediction models that can assess an individual's risk of diabetes based on their health
profile. These systems play a crucial role in early detection and preventive healthcare
strategies for managing diabetes.

2.2 PROBLEM STATEMENT:


Achieving predictive healthcare for diabetes involves training machines with expert-like
understanding from authentic datasets to address complex health issues proactively.

2.3 PROPOSED SYSTEM:


The proposed system is designed to address the binary classification challenge presented by
the Pima Indians Diabetes dataset, with a specific focus on diabetes diagnosis. This
classification task involves determining whether an individual has diabetes or not. To achieve
this goal, the system employs both machine learning and deep learning approaches. In the
realm of machine learning, the Support Vector Machine (SVM) algorithm is selected for its
proven efficacy in handling complex datasets. Simultaneously, deep learning is integrated into
the system through the use of a Neural Network, a model capable of capturing intricate
patterns and relationships within the data. This dual approach is expected to enhance
prediction accuracy, providing a robust solution for the classification of the Pima Indians
Diabetes dataset and contributing to more effective diabetes prediction.


CHAPTER – 3
SYSTEM REQUIREMENTS

3.1 Hardware Requirements

Processor: Intel i5 10th gen or above
Monitor Screen: 14-inch or above
RAM: 6GB or above
Storage: 512GB SSD / 1TB HDD or above

3.2 SOFTWARE REQUIREMENT

Programming Language: Python
Software Required: Jupyter Notebook
Python Libraries: numpy, pandas, scikit-learn, seaborn, matplotlib
Operating System: Windows
Terminal window: to run the Python file using commands



CHAPTER 4
SYSTEM DESIGN
4.1 ALGORITHM
• LOGISTIC REGRESSION: Logistic regression is a commonly used statistical
method for predicting the likelihood of a particular outcome based on one or more
predictor variables. When it comes to healthcare, specifically in the realm of diabetes
prediction, logistic regression can be a valuable tool for assessing the risk of an
individual developing diabetes based on various risk factors or predictors.
• By using logistic regression, healthcare providers and researchers can develop
predictive models to estimate the probability of an individual developing diabetes
within a specified time frame. This can be particularly useful for early intervention
and preventive measures. Logistic regression models can serve as a clinical decision
support tool for healthcare professionals. By inputting patient-specific data into the
model, clinicians can obtain a risk score or probability estimate, which can inform
diagnostic and treatment decisions.
• Binary Logistic Regression: A statistical method for predicting the probability of a
binary outcome (an event with only two possible outcomes). Uses a logistic function
(S-shaped curve) to map the relationship between independent variables (predictors)
and the dependent variable (outcome).
• Nominal Logistic Regression: Used for response variables with three or more
unordered categories. It treats each category independently, estimating a set of
coefficients for each category compared to a reference category. Its coefficients
indicate the change in the log odds of being in a particular category
relative to the reference category, given a one-unit change in the predictor variable.
• Ordinal Logistic Regression: Used for response variables with three or more
ordered categories. It incorporates the order of the categories, often using the
proportional odds assumption. This means that the coefficients for each predictor
are assumed to be the same across all levels of the response variable, except for a
possible intercept shift. Its coefficients indicate the change in the log odds of
being in a higher category (or lower, depending on the model), given a one-unit
change in the predictor variable.
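
To make the binary case described above concrete, the following minimal sketch maps a linear combination of predictors to a probability with the logistic (sigmoid) function and converts each coefficient into an odds ratio. The coefficient values, the intercept, and the two predictor names are invented for illustration; they are not fitted to the project's dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a value in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(features, coefficients, intercept):
    """Return P(outcome = 1) for one sample under a binary logistic model."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(log_odds)


# Hypothetical coefficients for two standardized predictors (illustrative only).
coefficients = [1.2, 0.4]   # e.g. glucose, BMI
intercept = -0.8

p = predict_probability([0.5, -1.0], coefficients, intercept)
print(f"Predicted probability: {p:.3f}")

# A one-unit increase in a predictor multiplies the odds by exp(coefficient).
odds_ratios = [math.exp(b) for b in coefficients]
print("Odds ratios:", [round(r, 3) for r in odds_ratios])
```

An odds ratio above 1 means the predictor raises the odds of the positive class, and a value below 1 means it lowers them.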


The following steps are involved in the diabetes prediction model:

1. Gathering Data:

• Data Collection: Obtain relevant data from sources such as healthcare databases,
electronic health records (EHRs), medical literature, or patient surveys. The dataset
might include variables like age, gender, BMI, family history of diabetes, blood
pressure, cholesterol levels, and glucose levels.
• Data Sources: Collect data from clinical settings, research studies, public health
databases, or wearable devices that monitor health metrics.
• Data Quality: Ensure that the data is accurate, reliable, and comprehensive. Validate
the data sources and address any issues related to missing values, outliers, or
inconsistencies that could affect the predictive model's performance.

2. Data Pre-processing:

• Data Cleaning: Clean the dataset by handling missing values (e.g., imputation using
mean, median, or regression-based methods), removing duplicates, and correcting
errors in the data.
• Data Transformation: Transform variables as needed, such as normalizing numerical
variables (e.g., glucose levels, BMI) to a common scale or encoding categorical
variables (e.g., gender, family history) using appropriate techniques like one-hot
encoding.
• Feature Engineering: Create new features or variables that may be predictive of
diabetes risk, such as BMI categories, age groups, or interaction terms between
relevant variables.
• Data Splitting: Divide the dataset into a training set (to train the predictive model)
and a test set (to evaluate the model's performance).
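
The cleaning and transformation steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the project's pipeline: missing values are represented as `None`, and the toy columns `glucose` and `family_history` contain made-up values.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:               # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator list per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Toy columns (made-up values, not taken from the project's dataset).
glucose = [148, None, 183, 89, None, 116]
family_history = ["yes", "no", "no", "yes", "no", "no"]

glucose_filled = impute_median(glucose)
glucose_scaled = min_max_scale(glucose_filled)
history_encoded = one_hot(family_history)

print(glucose_filled)
print([round(v, 3) for v in glucose_scaled])
print(history_encoded)
```

In practice the imputation value and the scaling range should be computed on the training set only and then reused on the test set, so that no information leaks from the test data.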

3. Researching the Model:

• Model Selection: Research various models suitable for binary classification tasks.
Logistic regression is a natural choice, given its interpretability and simplicity. Other
models like decision trees, random forests, or support vector machines can also be
considered.

• Model Assumptions: Understand the assumptions of logistic regression, such as the
linearity of the log-odds and absence of multicollinearity. Ensure that these
assumptions align with the characteristics of the diabetes dataset.
• Hyperparameter Tuning: For models with hyperparameters (e.g., regularization
strength in logistic regression), perform tuning to find the optimal values that
maximize predictive performance.
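
One of the assumptions listed above, absence of multicollinearity, can be screened with pairwise Pearson correlations between predictors. The sketch below is a rough check only; the 0.9 cut-off, the helper names, and the three toy columns are assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """List predictor pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Toy predictor columns (made-up values for illustration).
columns = {
    "bmi": [22.0, 27.5, 31.0, 35.2, 40.1],
    "weight_kg": [60.0, 75.0, 85.0, 96.0, 110.0],
    "age": [50, 23, 41, 35, 29],
}

flagged = highly_correlated_pairs(columns)
print(flagged)
```

A flagged pair suggests dropping or combining one of the two predictors before interpreting the logistic regression coefficients.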

4. Training and Testing the Model:

• Model Training: Train the selected model on the training dataset using features
(predictors) to predict the binary outcome variable (diabetes or non-diabetes).
• Model Testing: Evaluate the model's performance on the test dataset. Assess metrics
such as accuracy, precision, recall, F1-score, and ROC-AUC to gauge how well the
model generalizes to new, unseen data.
• Cross-Validation: Implement cross-validation techniques to ensure robustness in
model evaluation and mitigate overfitting or underfitting issues.
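
The evaluation metrics named above all derive from the four confusion-matrix counts. The following minimal sketch computes them by hand on a small set of made-up labels and predictions; the function names are illustrative, and the code assumes both classes appear among the labels and predictions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score for the positive class."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels and predictions for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one class dominates, which is why precision and recall are reported alongside it.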

5. Evaluation:

• Model Evaluation: Analyze the evaluation metrics to understand the model's
strengths and weaknesses. Interpret the results in the context of diabetes prediction
and healthcare decision-making.
• Model Interpretability: For logistic regression, interpret the coefficients associated
with each predictor variable. This provides insights into the direction and strength
of the relationships between features and the likelihood of diabetes.
• Iterative Refinement: Based on the evaluation, refine the model. This may involve
adjusting features, experimenting with different models, or fine-tuning
hyperparameters to achieve better predictive performance.
• By following these steps, we can develop a diabetes prediction model that is
informed by the specific characteristics of the dataset and optimized for accurate and
interpretable predictions in a healthcare context.


4.2 Dataflow

DATA ACQUISITION -> DATA PRE-PROCESSING -> FEATURE SELECTION ->
TRAINING DATA / TESTING DATA -> LOGISTIC REGRESSION -> PREDICTION RESULT

Fig 4.1 Dataflow


1. Data Acquisition:
Data acquisition for diabetes prediction using Logistic Regression involves
gathering patient data from sources such as electronic health records or clinical
databases, selecting relevant features like age, gender, blood pressure, and
cholesterol levels, and ensuring data quality and privacy compliance. The acquired
dataset is then split into training and testing sets, with the training set used to train
the Logistic Regression model to recognize patterns associated with diabetes.
This step is crucial because the quality of the acquired data directly influences the
model's ability to make accurate predictions and generalize to new, unseen patient
data.

2. Data Preprocessing:
Data preprocessing for diabetes prediction using Logistic Regression
involves a series of steps that prepare the raw data for model training.
This process includes handling missing values through imputation or
removal, addressing outliers to limit their impact on model performance,
normalizing or standardizing numerical features to ensure uniform scaling, encoding
categorical variables into a numerical format, performing feature engineering to
derive new informative features, and addressing class imbalance in the dataset
through oversampling or undersampling.
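
As an illustration of the oversampling option mentioned above, the sketch below duplicates randomly chosen minority-class rows until every class is as large as the largest one. The function name, the fixed seed, and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows of this class to draw duplicates from.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 7 negative and 3 positive samples (made-up values).
rows = [[i] for i in range(10)]
labels = [0] * 7 + [1] * 3

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training set only; duplicating rows before the train/test split would let copies of the same sample appear on both sides and inflate the test score.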

3. Feature Selection:
Feature selection is a critical step in the machine learning pipeline, aimed at
identifying and retaining the most relevant features from the dataset while discarding
less informative or redundant ones. In the context of diabetes prediction using
Logistic Regression, feature selection involves choosing the subset of features that
contribute most significantly to the model's predictive performance.
4. Splitting the Dataset:
For diabetes prediction using Logistic Regression, the dataset is
split into a training set for model training and a testing set for evaluation. This
makes it possible to check that the model generalizes well to new data. Typically, a
random or stratified split is employed, with a portion reserved for training (e.g., 80%)
and the remainder for testing (e.g., 20%). The aim is to detect overfitting, so that the
Logistic Regression model can be trusted to predict diabetes accurately on unseen data.
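
A stratified 80/20 split keeps the proportion of each class roughly equal in both subsets. The sketch below shows the idea with the standard library; it is not a reimplementation of scikit-learn's `train_test_split`, and the function name and toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=2):
    """Return (train_idx, test_idx), sampling the test share within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 50 negative and 25 positive samples (made-up proportions).
labels = [0] * 50 + [1] * 25

train_idx, test_idx = stratified_split(labels)
positives_in_test = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), positives_in_test)
```

Because sampling happens inside each class, the positive rate in the test set matches the positive rate of the full dataset, which matters when the classes are imbalanced.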


5. Classification:
Logistic Regression (LR) is one of the simplest and most widely used supervised
ML algorithms for binary classification. It models a categorical dependent
variable whose value is discrete, typically 0 or 1. The sigmoid function maps the
model's real-valued output to a probability between 0 and 1. The model parameters
are fitted by minimizing a cost function, usually the log loss (binary
cross-entropy), computed on these probabilities.
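
For reference, logistic regression is usually fitted by minimizing the log loss (binary cross-entropy) computed on the sigmoid outputs. The following minimal sketch evaluates that cost for two made-up sets of predicted probabilities; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def log_loss(y_true, probabilities, eps=1e-15):
    """Average binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up labels and two sets of predicted probabilities.
y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]
mostly_wrong = [0.1, 0.9, 0.2, 0.8]

loss_right = log_loss(y_true, mostly_right)
loss_wrong = log_loss(y_true, mostly_wrong)
print(f"loss when mostly right: {loss_right:.4f}")
print(f"loss when mostly wrong: {loss_wrong:.4f}")
```

The cost grows sharply when the model assigns a high probability to the wrong class, which is what pushes the fitted coefficients toward well-calibrated predictions.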

6. Predict Result:
To predict diabetes using Logistic Regression, preprocess the new patient data,
extract features, and input them into the trained model. The model assigns a
probability score, and a threshold (e.g., 0.5) is applied for binary classification.
Predictions above the threshold indicate the presence of diabetes; below the
threshold, absence. Interpretation should consider the chosen threshold's impact on
sensitivity and specificity in the context of diabetes prediction.
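
The effect of the threshold on sensitivity and specificity can be seen in a short sketch. The probabilities and labels below are made up for illustration, and the helper names are hypothetical; a prediction counts as positive only when its probability is strictly above the threshold.

```python
def classify(probabilities, threshold=0.5):
    """Label a sample 1 when its probability is above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up true labels and model probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probabilities = [0.92, 0.61, 0.38, 0.45, 0.12, 0.55, 0.30, 0.71]

results = {}
for threshold in (0.3, 0.5, 0.7):
    y_pred = classify(probabilities, threshold)
    results[threshold] = sensitivity_specificity(y_true, y_pred)
    sens, spec = results[threshold]
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold catches more true cases at the cost of more false alarms, so a screening setting may prefer a threshold below 0.5.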



CHAPTER 5
IMPLEMENTATION
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Data Collection and Processing

# load the CSV data into a pandas DataFrame
data = pd.read_csv('diabetes.csv')

# print the first 5 rows of the dataset
data.head()

# print the last 5 rows of the dataset
data.tail()

# number of rows and columns in the dataset
data.shape

# basic information about the data (column types, non-null counts)
data.info()

# check for missing values
data.isnull().sum()

# statistical measures about the data
data.describe()

# check the distribution of the target variable
data['Outcome'].value_counts()

Splitting the Features and Target

# features are every column except the target 'Outcome'
x = data.drop(columns='Outcome')
y = data['Outcome']

print(x)
print(y)

# analyse the Outcome variable
import seaborn as sns
sns.countplot(x=y)

# analyse the 'Pregnancies' feature
data['Pregnancies'].unique()
sns.barplot(x=data['Pregnancies'], y=y)

# analyse the 'Glucose' feature
data['Glucose'].unique()
sns.barplot(x=data['Glucose'], y=y)

Splitting the Data into Training Data and Test Data

# stratified 80/20 split so both sets keep the same class proportions
X_train, X_test, Y_train, Y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=2)
print(x.shape, X_train.shape, X_test.shape)

Model Training

Logistic Regression

# default settings; increase max_iter if a ConvergenceWarning is raised
model = LogisticRegression()

# train the LogisticRegression model with the training data
model.fit(X_train, Y_train)

Model Evaluation

Accuracy Score

# accuracy on training data (true labels first, predictions second)
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy on Training data:', training_data_accuracy)

# accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy on Test data:', test_data_accuracy)


Building a Predictive System

# one patient record, in the same column order as the training features
input_data = (1, 85, 66, 29, 0, 26.6, 0.351, 31)

# change the input data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the array as we are predicting for only one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 0:
    print('The Person does not have diabetes')
else:
    print('The Person has diabetes')


CHAPTER – 6
RESULTS WITH SNAPSHOTS

Fig 6.11 Data collection and Processing


Data collection is the acquisition of the dataset and its display in the form we need.

Fig 6.12 Printing values in a row


Fig 6.13 Getting information and checking for missing values in the dataset

Fig 6.14 Statistical measures about the data which is essential for the regression.


Fig 6.15 Splitting the features and target


Splitting the features and target separates the predictor columns (x) from the Outcome
column (y) that the model learns to predict. This should not be confused with node splitting
in decision trees, where Reduction in Variance is used when the target variable is
continuous, i.e., in regression problems. That method is so called because it uses variance
as the measure of a node's homogeneity when deciding the feature on which a node is split
into child nodes.


Fig 6.2 Exploratory Data Analysis


Exploratory Data Analysis (EDA) is a process that helps data analysts understand their data
and form hypotheses. It's a crucial step before applying machine learning techniques.

Fig 6.3 Analysis of Pregnancies feature


Fig 6.4 Analysis of Glucose feature

Fig 6.5 Splitting the data


Data splitting is a technique that divides the original data into two or more sets; here, a
training set (80%) and a test set (20%). Evaluating on held-out data makes it possible to
detect overfitting, which is when a machine learning model fits its training data too well
and fails to fit additional data.


Fig 6.6 Model training


Model training is a phase in the data science development lifecycle. It's the primary step in
machine learning, resulting in a working model that can be validated, tested, and deployed.

Fig 6.7 Model Evaluation

Model evaluation in machine learning is the process of using various metrics and approaches
to determine a trained model's effectiveness and quality. It involves evaluating whether the
model achieves the required goals and how well it generalizes to fresh, untested data.


Fig 6.8 Predictive System



CONCLUSION

Diabetes prediction using machine learning has demonstrated the potential to significantly
contribute to the field of healthcare and disease management. Through the utilization of
advanced machine learning algorithms, we have developed a predictive model that
analyzes relevant medical data to accurately identify individuals at risk of diabetes.

The main objective of the project, discussed throughout this report, is to classify and
identify diabetes patients using ML algorithms.

We build the model using supervised machine learning algorithms such as logistic
regression, decision tree, Random Forest and Gradient Boosting.

As part of the future scope, we hope to try out different algorithms to optimize the feature
extraction process and to increase the feature similarity of the data, in order to improve
the model’s representation capability.



REFERENCES

[1] Ahamed, K. U., Islam, M., Uddin, A., Akhter, A., Paul, B. K., Yousuf, M. A., ... Moni, M. A.,
et al. (2021). A deep learning approach using effective preprocessing techniques to detect
COVID-19 from chest CT-scan and X-ray images. Computers in Biology and Medicine, 139,
Article 105014. doi:10.1016/j.compbiomed.2021.105014.

[2] Albahli, S. (2020). Type 2 machine learning: An effective hybrid prediction model for
early type 2 diabetes detection. Journal of Medical Imaging and Health Informatics, 10,
1069–1075.

[3] Brownlee, J. (2016a). A gentle introduction to the gradient boosting algorithm for
machine learning.
https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/
Accessed: 2021-03-20.
Brownlee, J. (2016b). K-nearest neighbors for machine learning.
https://machinelearningmastery.com/k-nearest-neighbors-for-machine-learning/
Accessed: 2021-03-20.
Brownlee, J. (2016c). Logistic regression for machine learning.
https://www.geeksforgeeks.org/understanding-logistic-regression/
Accessed: 2021-03-20.

[4] Choubey, D. K., Paul, S., Kumar, S., & Kumar, S. (2017). Classification of Pima Indian
diabetes dataset using naive Bayes with genetic algorithm as an attribute selection. In
Proceedings of the international conference on communication and computing system
(ICCCS 2016) (pp. 451–455).

[5] Dinh, A., Miertschin, S., Young, A., & Mohanty, S. D. (2019). A data-driven approach to
predicting diabetes and cardiovascular disease with machine learning. BMC Medical
Informatics and Decision Making, 19, 211.
