IOMP_DOC-2[1]3 final
IOMP_DOC-2[1]3 final
on
Bachelor of Technology
in
by
ABHIJITH S R – 21J21A0501
of
1
JOGINPALLY B.R. ENGINEERING COLLEGE
Accredited by NAAC with A+ Grade, Recognized under Sec. 2(f) of UGC Act. 1956
Approved by AICTE, Affiliated to JNTUH, Hyderabad and ISO 9001:2015 Certified
Bhaskar Nagar, Yenkapally, Moinabad (Mandal)
R.R (Dist)-500075. T.S., India
CERTIFICATE
Dr.D. Magdalene Delighta Angeline B.Tech., M.Tech., Ph.D. Dr. T. Prabakaran B.E, M.E., Ph.D.,
Associate Professor Professor
EXTERNAL EXAMINER
2
DECLARATION OF THE STUDENT
I hereby declare that the Industrial Oriented Mini Project entitled “SOFTWARE DRIVEN
WASTE SEGREGATION USING MACHINE LEARNING AND CNN”, presented under the
supervision of Dr.D.Magdalene Delighta Angeline B.Tech.,M.Tech.,Ph.D. Assistant professor and
submitted to Joginpally B.R. Engineering College is original and has not been submitted in part or
whole for Bachelor degree to any other university.
ABHIJITH S R – 21J21A0501
SAI KRISHNA RAO K - 21J21A0536
JAGADEESH KUMAR K – 22J25A0507
3
ACKNOWLEDGEMENT
We would like to take this opportunity to place it on record that this Project Report would
never have taken shape but for the cooperation extended to us by certain individuals. Though it is
not possible to name all of them, it would be unpardonable on our part if we do not mention some
of the very important persons.
We express our gratitude to Dr.T. PRABAKARAN B.E, M.E., Ph.D, HOD of Computer
Science and Engineering for his valuable suggestions and advices.
Finally, we would like to thank our parents and friends for their cooperation to complete
this Project Report.
ABHIJITH S R – 21J21A0501
SAI KRISHNA RAO K - 21J21A0536
JAGADEESH KUMAR K – 22J25A0507
4
ABSTRACT
The Smart Recycling System aims to enhance waste management efficiency through software-
driven waste identification and classification. By leveraging image processing techniques with OpenCV
and machine learning algorithms, the system classifies recyclable materials such as plastic, paper, and
metal from uploaded images. The system utilizes a Convolutional Neural Network (CNN) model, trained
on a custom dataset, to identify various waste types with high accuracy. The user interacts with the
system by uploading images of waste, which the software then processes and classifies, suggesting the
correct recycling category for each material. This approach eliminates manual sorting, helping to reduce
waste segregation errors. The system’s intuitive software interface ensures ease of use, even for
individuals unfamiliar with recycling guidelines. By focusing solely on software, this project provides
an effective and scalable solution to optimize recycling processes without the need for hardware or IoT
integration.
5
TABLE OF CONTENTS
1 INTRODUCTION 11
1.1 Objective 12
1.2 Scope and Challenges 12
1.3 Problem Analysis 13
2 LITERATURE REVIEW 15
2.1 Existing System Analysis 15
2.2 Areas for Improvement 16
2.3 Proposed System 17
3 SYSTEM ANALYSIS 19
3.1 Functional Requirements 21
3.2 Non-Functional Requirements 22
3.3 Hardware Requirements 22
3.4 Software Requirements 22
3.5 Data Collection Process 23
3.6 Deliverables and Beneficiaries 23
3.7 Algorithm 23
3.8 Methodology 24
4 SYSTEM FEASIBILITY 25
4.1 Economical Feasibility 26
4.2 Technical Feasibility 28
7
LIST OF FIGURES
8
LIST OF TABLES
9
LIST OF ABBREVATIONS
AI - Artificial Intelligence
ML - Machine Learning
DL - Deep Learning
API - Application Programming Interface
DB - Database
SQL - Structured Query Language
JSON - JavaScript Object Notation
UI - User Interface
UX - User Experience
IDE - Integrated Development Environment
CSV - Comma-Separated Values
SVM - Support Vector Machine
KNN - K-Nearest Neighbours
RF - Random Forest
OCR - Optical Character Recognition
UAT - User Acceptance Testing
ER - Entity-Relationship
RAM - Random Access Memory
10
CHAPTER 1
INTRODUCTION
The code is focused on data preprocessing, feature engineering, and machine learning model
training for a smart bin dataset. It starts by loading the data, handling missing values, and encoding
categorical features. The next steps involve detecting and removing outliers, followed by the creation
of new features based on the existing ones. Afterward, a correlation analysis is performed to identify
relationships between features. The data is scaled for machine learning models that are sensitive to
feature scaling. The code then trains and evaluates various machine learning models (KNN, SVM,
Logistic Regression, Decision Tree, Random Forest, and Neural Networks) using performance
metrics like accuracy, precision, recall, and confusion matrices to predict the target variable, likely
related to waste classification or recycling.
Once the dataset is cleaned, the next step involves identifying and removing outliers that could
potentially skew the analysis. This is done using boxplots for visual inspection, followed by replacing
extreme outlier values with the mean of the respective columns. Afterward, the program calculates
the change in fill levels (FL_C, FL_C_3, FL_C_12) based on the difference between the FL_A and
FL_B columns, which likely represent different states or measurements of the dataset over time.
The correlation matrix is then computed to understand the relationships between various
features, and a heatmap is plotted to visualize the strength of these correlations. To ensure that features
are on a comparable scale, standard scaling is applied to the selected columns, ensuring that no
feature disproportionately influences the model due to differing units or ranges.
The dataset is then split into training and testing sets, where features (FL_A, FL_C, and VS)
are used to predict the target variable (Class). Various classification algorithms are tested, including
K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Logistic Regression, Decision Tree,
Neural Networks, and Random Forest. Each model is trained on the training data and tested on the
11
testing set to evaluate its performance. Metrics like accuracy, precision, recall, F1 score, and the
Matthews correlation coefficient (MCC) are calculated for each model to assess the quality of the
predictions.
The code also includes plotting functions for visualizing the model scores for KNN, decision
tree, and random forest classifiers for different hyperparameters (like the number of neighbours or
trees). These visualizations help identify the optimal number of neighbours or trees that maximize the
model’s performance. Additionally, confusion matrices are plotted for each model, providing a clear
view of how well each algorithm distinguishes between the different classes.
1.1 Objective
The objective of the provided code is to preprocess a dataset, train multiple machine learning
models, evaluate their performance, and identify the best model for predicting a target variable, Class,
based on several input features related to sensor data from waste bins. The code seeks to apply various
classification algorithms to the pre-processed data and compare their effectiveness in terms of
accuracy and other evaluation metrics. Ultimately, the goal is to determine which machine learning
model offers the best predictive performance for classifying the waste data.
A key part of the objective is to handle and clean the data effectively before model training.
The preprocessing includes handling missing values, removing outliers, and encoding categorical
variables. Missing values are replaced with the median of respective columns to ensure that the dataset
is complete and does not introduce biases due to missing data. Outliers are identified using boxplots
and replaced with median values to prevent them from skewing the results. The encoding of
categorical variables like Class, Container Type, and Recyclable fraction ensures that all features are
in a numerical format suitable for machine learning algorithms.
Another significant aspect of the objective is to scale the features to ensure that no individual
feature dominates the learning process due to differences in scale. Feature scaling using
standardization (StandardScaler) is applied to the numerical features, ensuring that each feature
contributes equally to the model. This is particularly important for distance-based algorithms like
KNN and SVM, which are sensitive to the magnitude of the features. The scaling step ensures that
the models are not biased towards features with larger numerical ranges.
The scope of the provided code is focused on applying machine learning techniques to a real-
world dataset, which involves sensor data from waste bins. The dataset contains various features that
help in predicting the class of waste, such as Class, Container Type, Recyclable fraction, and sensor
12
readings like temperature and humidity. The primary goal is to preprocess the data, train different
machine learning models, evaluate their performances, and identify the best-suited model for the
classification task. This objective is pertinent to waste management systems, where accurate
classification of waste is essential for efficient sorting and recycling processes.
One of the key aspects of the scope is the use of multiple machine learning algorithms to solve
the classification problem. The code implements algorithms like K-Nearest Neighbors (KNN),
Support Vector Machine (SVM), Logistic Regression, Decision Trees, Multi-Layer Perceptron
(MLP), and Random Forest. Each model is chosen based on its distinct characteristics, and the code
aims to assess their effectiveness in classifying waste into predefined categories. This comparison is
important to identify the strengths and weaknesses of different algorithms, ultimately selecting the
one that provides the best accuracy and reliability for waste classification.
One of the main challenges in the code arises from data preprocessing. The dataset contains
missing values, and the process of imputing these missing values with the median could introduce
biases if the missingness is not random. For example, if certain classes or features have higher missing
rates, simply imputing them may distort the overall distribution of the data. Additionally, the
treatment of outliers could be another challenge. While replacing outliers with the median helps
mitigate their impact, this approach may not always be appropriate for all types of data or
distributions, potentially affecting model performance.
The problem presented in the code involves classifying waste data, where the goal is to predict
the class of waste based on various sensor features such as temperature, humidity, and other
characteristics associated with waste containers. In essence, it is a classification problem where the
system must accurately identify the category of waste based on the input sensor readings. The
challenge lies in working with a real-world dataset that may contain missing values, noisy data,
imbalanced classes, and features of varying types, which can affect the performance of machine
learning models. Addressing these challenges is crucial for creating an effective waste classification
system.
One significant problem is the presence of missing data in the dataset. Real-world datasets often
contain gaps due to errors in data collection, sensor malfunctions, or non-responses from the monitored
environment. In this case, the missing values are imputed using the median value of the corresponding
feature. However, the imputation strategy is simplistic and may not always be ideal, particularly if the
missingness is not random. If the data is missing in a non-random manner, the imputation of missing
values using the median could introduce bias into the dataset, which may affect the performance of the
models, leading to inaccurate predictions or misrepresentations of the waste classes. This challenge is
13
particularly evident when dealing with large datasets that are highly sensitive to data quality
Feature scaling also presents a challenge, especially when using algorithms like K-Nearest
Neighbors (KNN), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). These models
are sensitive to the scale of the features, meaning that differences in the magnitude of features could
lead to biased results. For example, if one feature has values ranging from 0 to 1 and other ranges from
1000 to 10000, models like SVM or KNN could give undue importance to the feature with the larger
range, distorting the classification results. While the code addresses this issue by standardizing the
numerical features, the challenge remains that different algorithms react differently to scaled data. Tree-
based models like Random Forest or Decision Trees are not sensitive to feature scaling, which makes
the preprocessing step more complex when trying to balance the needs of all models.
Another important problem is the handling of categorical variables, such as Class (the target
variable), Container Type, and Recyclable Fraction. These features are encoded using label encoding,
which assigns numerical values to each category. While this method works for models like decision
trees, which are not sensitive to the ordinal nature of categorical variables, it can introduce issues for
models that assume a continuous relationship between the encoded values. For example, linear models
or logistic regression may misinterpret the numerical labels as having an inherent ordinal relationship,
which is not the case for all categorical variables. This misinterpretation can lead to less accurate
predictions. As a result, a more advanced encoding method such as one-hot encoding or target encoding
may be needed to better represent the categorical data for certain models.
14
CHAPTER 2
LITERATURE REVIEW
Smart Bins are an advanced innovation aimed at revolutionizing traditional waste management
practices by introducing automation, data collection, and intelligent processing. These bins are
designed to address common inefficiencies, such as delayed waste collection, improper segregation,
and low recycling rates, through the integration of sensor technology and data analytics.
2.1 Fill Levels Monitoring
Sensors installed in the bins continuously monitor how full they are and report this data as fill levels,
typically categorized as FL_A (low), FL_B (medium), and FL_C (high). This information is
transmitted to a centralized system to ensure timely waste collection. For instance, when a bin
approaches FL_C, it triggers an alert to prevent overflow, thereby maintaining hygiene and aesthetics
in urban areas.
2.2 Waste Type Detection
Smart Bins can identify the type of waste they contain. By utilizing specialized sensors or image
processing techniques, they can distinguish between organic, plastic, metal, and paper waste. This
automated classification eliminates human intervention and improves segregation accuracy, which is
critical for efficient recycling processes.
2.3 Environmental Condition Monitoring
For bins storing organic waste, additional sensors can measure parameters like temperature and
moisture. This data helps monitor decomposition rates and detect any issues, such as the potential
release of harmful gases, allowing timely intervention.
By leveraging these features, Smart Bins not only automate waste segregation but also optimize
waste collection schedules and reduce operational inefficiencies. For example, municipal authorities
can plan collection routes based on real-time fill levels, avoiding unnecessary trips to half-empty bins
while prioritizing those nearing capacity. This targeted approach minimizes fuel consumption, reduces
carbon emissions, and improves overall resource allocation.
Moreover, Smart Bins are not limited to passive waste management; they actively engage users by
guiding them to dispose of waste correctly. Some systems use visual or auditory cues to indicate the
appropriate disposal bin for each waste type, promoting user participation in recycling efforts.
2.4 Existing System Analysis
The existing system presented above focuses on processing a dataset related to smart bin data, using
various machine learning techniques to build predictive models for waste management classification. It
involves several steps:
15
The dataset is loaded and missing values are replaced with the median value of respective columns. Label
encoding is performed to transform categorical variables into numerical values. Outliers in specific
features (e.g., FL_A, FL_B) are detected using boxplots and then replaced with the mean value where
necessary. A new feature, FL_C, is created as the difference between FL_A and FL_B to represent the
change in fill level. Standard scaling is applied to certain features to standardize them for better
performance in machine learning models. Several models such as K-Nearest Neighbour’s (KNN),
Support Vector Machine (SVM), Logistic Regression, Decision Trees, Neural Networks, and Random
Forests are used for classification. Each model is evaluated using metrics like accuracy, precision,
recall, F1 score, and Matthews Correlation Coefficient (MCC). The performance of each model is
evaluated using confusion matrices and various classification metrics. Visualizations such as confusion
matrices and model performance plots are generated
17
Moreover, the system is designed to encode categorical variables into numerical
representations. This step is essential since machine learning algorithms generally require numerical
input. The proposed system employs label encoding for categorical columns, transforming each
category into a unique integer value. While label encoding works in certain cases, this method can be
expanded or replaced with one-hot encoding for nominal categorical variables to avoid introducing any
unintended ordinal relationships, thereby improving model accuracy. Data normalization or scaling can
also be added to the pipeline, especially for algorithms like KNN or SVM that are sensitive to feature
scale.
18
CHAPTER 3
SYSTEM ANALYSIS
System Analytics refers to the application of data analysis techniques to evaluate, optimize, and
understand the performance of a system. It helps identify inefficiencies, enhance system components,
and ensure functionality aligns with desired outcomes. For the provided project, system analytics
revolves around the systematic preprocessing, modelling, and evaluation of data for the smart bin waste
classification system.
Standard scaling is applied to numerical features (FL_A, FL_B, FL_C, and VS) to normalize their
ranges, ensuring that models like KNN and SVM are not biased by varying magnitudes of features.
• Model Training and Evaluation
Splitting Dataset: The data is divided into training and testing sets for effective model validation.
19
Model Implementation: Multiple machine learning algorithms are implemented to classify waste
bins, including:
• K-Nearest Neighbors (KNN)
• Support Vector Machines (SVM)
• Logistic Regression
• Decision Trees
• Neural Networks (MLP)
• Random Forest
Performance Metrics: Models are evaluated using metrics like:
• Accuracy: Measures overall correctness.
• Precision and Recall: Focus on relevance and detection capability.
• F1-Score: Balances precision and recall.
• Matthews Correlation Coefficient (MCC): Evaluates the quality of binary classifications.
Confusion Matrices: These visualize model performance by illustrating true positives, false positives,
true negatives, and false negatives.
• Visualization and Insights
Model Accuracy: Plots are generated to analyses accuracy trends, such as evaluating KNN
performance across varying K values.
Confusion Matrices: Graphical representations of prediction outcomes for each model.
Metric Comparisons: Precision, Recall, and F1-scores are visualized
20
3.1 Functional Requirements
The functional requirements define the core tasks and capabilities the system must provide:
3.1.1 Data Collection
The system accepts intake and manage data related to recycling bins, focusing on critical
aspects like fill levels, container types, and recyclable fractions. For instance, the dataset
may contain columns representing the amount of waste in different sections of a bin (e.g.,
FL_A, FL_B), the type of waste container (plastic, metal, or paper), and whether the
contents are recyclable. This collected data forms the foundation for making predictions
about bin status and recycling optimization.
3.1.2 Prediction and Classification
Based on the collected data, the system predicts whether a bin is full or not. For example,
if the fill level exceeds a predefined threshold, the bin will be classified as "full." This
prediction helps streamline waste collection by ensuring bins are emptied on time.
Furthermore, the classification system can identify the type of recyclables and segregate
them accordingly.
22
3.6 Deliverables and Beneficiaries
3.6.1 Deliverables
A trained model capable of accurately predicting bin statuses (e.g., full, or empty) based on
input data. Visualization tools, including confusion matrices, accuracy plots, and performance
metrics, to provide insights into model performance.
3.6.2 Beneficiaries
• Waste Management Companies: The system optimizes collection schedules,
reducing unnecessary trips and operational costs.
• Local Authorities: Helps in better allocation of resources for waste management.
• Environmentally Conscious Users: Encourages responsible disposal habits by
providing clear recycling instructions.
3.7 Algorithm
The project employs multiple machine learning classifiers to predict bin statuses:
3.7.1 Primary Algorithms
K-Nearest Neighbors (KNN): KNN Identifies bin status by comparing its features to
the closest neighbors in the dataset.
Support Vector Machine (SVM): SVM Builds a hyperplane to classify bins into
"full" or "not full."
Logistic Regression: Logistic Regression uses probabilities to classify bins. Decision
Tree: Decision Tree is tree-based model that splits data into rules for prediction.
Neural Network (NN): Neural Network Employs layers to detect complex patterns in
data.
Random Forest: Random Forest combines multiple decision trees to improve
prediction reliability.
3.8 Methodology
The methodology involves a systematic approach to model development and evaluation:
3.8.1 Data Preprocessing
In Data Processing missing values are imputed using the median to prevent data loss.
Outliers, identified via boxplots, are replaced with the column mean to maintain data
consistency.
Features are scaled using StandardScaler to ensure compatibility across models.
3.8.2 Modelling
Multiple classifiers are implemented to compare performance.
Hyperparameter tuning is performed using grid search or manual adjustments.
23
CHAPTER 4
SYSTEM FEASIBILITY
• Data Quality
The effectiveness of the analysis largely depends on the quality and relevance of the dataset. A
high-quality dataset should contain enough examples with a balanced distribution of classes, which
prevents bias in the model’s predictions. Furthermore, the features used for model training should
be meaningful and capture the underlying patterns of the data. If the dataset includes noisy or
irrelevant features, the model’s performance will suffer. Therefore, ensuring that the dataset is well-
pre-processed—by removing or imputing missing values, encoding categorical variables correctly,
and addressing any inconsistencies—is crucial to achieving reliable results.
• Model Evaluation
Even though the models are trained and evaluated using standard performance metrics like
accuracy, precision, recall, and F1-score, it is essential to evaluate feature relevance. The features
selected for training play a significant role in determining the model's accuracy. Poorly selected or
irrelevant features may lead to overfitting or underfitting, diminishing model performance.
Therefore, feature selection or engineering techniques can significantly improve the model's
predictive ability. For instance, creating new features from existing data or selecting the most
influential ones could increase the model’s robustness and interpretability.
• Model Selection
The choice of algorithms (e.g., KNN, SVM, Logistic Regression, Random Forest) should be
driven by the problem's nature and the data's characteristics. For example, if the task is a
classification problem, algorithms like KNN or Logistic Regression might be more effective.
However, for more complex data patterns, algorithms like Random Forest or SVM could yield better
results. Additionally, computational efficiency plays a crucial role—while SVM models can be
computationally expensive, particularly for large datasets, Random Forests may better handle
complex, non-linear relationships. The decision should consider the trade-off between accuracy and
computational cost, especially when dealing with large-scale datasets.
24
• Handling Imbalanced Data
Imbalanced datasets, where one class significantly outnumbers the other, can cause the model
to be biased toward the majority class. This can lead to misleading predictions, especially in
classification problems like bin status prediction. To address this, techniques such as SMOTE
(Synthetic Minority Over-sampling Technique) or under sampling can be employed to balance the
dataset. SMOTE generates synthetic examples for the minority class, while under sampling reduces
the number of examples from the majority class. Implementing these techniques ensures that the
model treats all classes with equal importance, improving its ability to generalize well across both
classes.
The Smart Recycling System project, as designed, focuses on software-based solutions for
automating waste segregation and classification, without hardware-based sensors or real-time data
collection. The economic feasibility analysis considers the costs associated with software
development, operational costs, and potential benefits.
25
4.1.2 Operational Costs
• Cloud Computing / Hosting
Although the system is not dependent on real-time sensors, cloud infrastructure may be required
to host the trained machine learning models and run predictions for large datasets. This would
include hosting the models on cloud platforms like AWS, Google Cloud, or Microsoft Azure,
which typically charge based on computational usage.
Data Storage: Storing large amounts of historical data (e.g., fill levels, waste classification data)
on cloud storage platforms would incur additional costs, but these costs would be manageable
given the software focus of the project.
• Software Maintenance and Updates:
Periodic updates and model retraining may be required as new data becomes available, which
would incur additional developer hours and cloud costs for training and testing new models.
Regular maintenance of the user interface and backend systems to ensure smooth operation and
bug fixes.
4.1.3 Revenue Generation and Cost Savings
• Improved Waste Management Efficiency:
While this project is purely software-based, it can still offer significant cost savings for waste
management companies by optimizing the segregation of waste. More accurate classification of
recyclables ensures that waste is properly sorted, leading to higher rates of recycling and lower
disposal costs.
Reduced Labor Costs: By automating the waste classification process, the system reduces the
manual effort required to sort recyclables, lowering labor costs for waste management operations.
• Environmental and Social Benefits:
Recycling Optimization: The system improves the efficiency of waste segregation, which in turn
helps reduce the amount of waste sent to landfills, thus promoting environmental sustainability.
Public Awareness and Engagement: By providing an intuitive interface for users to input data
and receive feedback on recycling practices, the system can enhance public awareness of
recycling protocols, contributing to broader environmental goals and possibly attracting
government funding or incentives.
• Potential for Commercialization:
Licensing: The software could be licensed to waste management companies, municipalities, or
organizations focused on sustainability. This can become a recurring revenue stream for future
versions of the software.
26
Subscription Model: Offering the software as a service (SaaS) to local authorities, urban
planners, or environmental organizations could generate ongoing subscription-based revenue,
allowing you to scale the system over time.
4.1.4 Scalability
• Geographical Expansion: The system is scalable because it is entirely software-based, and with
appropriate marketing and partnerships, it could be deployed across different cities or
municipalities. As the software expands its reach, it would handle data from multiple sources,
increasing its overall utility and profitability.
• Adoption by Other Industries: Beyond waste management, the system can be expanded to
industries that rely on material classification, such as manufacturing, logistics, or supply chain
industries that manage recyclable materials.
4.1.5 Social and Environmental Impact
• Job Creation: While the system automates waste classification, the development, marketing, and
ongoing maintenance of the software would create jobs for developers, data scientists, and
marketing professionals.
• Environmental Sustainability: The system supports sustainability by promoting accurate waste
segregation and recycling, directly contributing to the reduction of environmental pollutants, and
encouraging better recycling practices in communities.
• Cloud Infrastructure: If necessary, cloud platforms such as AWS, Google Cloud, or Microsoft
Azure can be used to host the system, especially if it needs to scale to handle multiple smart bins'
data. These platforms provide the computational power needed to train and test models without
requiring local hardware upgrades.
• Local Computational Resources: If a smaller-scale system is required, the model training and
predictions can be run locally on machines with adequate processing power. The resource
requirements for machine learning models like KNN, Random Forest, and Logistic Regression
are not computationally intensive, so they can run on systems with moderate specs.
• Data Storage: Data storage requirements are relatively moderate, as historical data about
recycling bins can be stored on cloud or local databases, ensuring that enough storage is available
for training datasets and model outputs.
• Programming Language: Python is the primary programming language used for this project, as
it has extensive libraries for machine learning, data analysis, and visualization. Python's versatility
and popularity ensure that finding technical resources, support, and troubleshooting are not
challenging.
• Libraries: The main libraries required are:
scikit-learn for implementing machine learning algorithms and model evaluation. matplotlib and
seaborn for visualization of data and results.
Flask or Django for creating a simple web-based user interface if required.
The use of these libraries ensures technical feasibility, as they are well-documented, widely
used, and supported by the community.
29
4.3 Social Feasibility
The social feasibility of the Smart Recycling System focuses on evaluating how well the system
aligns with societal needs, its potential impact on the community, and its acceptance by
stakeholders, including local authorities, waste management companies, and the general public.
This analysis helps determine if the project will be well-received and if it will contribute
positively to society.
30
4.3.2 Community and Stakeholder Benefits
• Waste Management Companies: The system improves operational efficiency for waste
management companies by helping them plan waste collection schedules more effectively. By
predicting fill levels, these companies can deploy collection trucks only when needed, reducing
the number of unnecessary trips. This results in cost savings, improved logistics, and a reduction
in greenhouse gas emissions.
• Local Authorities: Local authorities, responsible for waste management and public health, can
use the system to monitor the status of recycling bins and ensure that waste is handled promptly.
This could lead to better service delivery for citizens, with improved cleanliness and sanitation in
neighborhoods.
• Environmentally Conscious Citizens: The public, especially environmentally conscious
citizens, will benefit from an organized and efficient recycling system. The system offers
convenience by ensuring recycling bins are emptied in a timely manner, making it easier for
people to recycle without the risk of overflowing bins. This could increase participation in
recycling programs and overall community engagement in sustainability efforts.
• Educational Value: The system can also serve as an educational tool, raising awareness about
recycling practices. Through its interface or public reporting features, it can educate users about
the importance of waste separation, the impact of recycling, and how individuals can contribute
to a greener planet.
32
CHAPTER 5
SYSTEM DESIGN
33
Use in Smart Recycling System: It models different types of waste (e.g., paper, plastic) under the
common Waste Type entity, enhancing data organization.
34
5.2.1 Data Collection Layer
Purpose: Collects data from smart bins, including fill levels, waste types, and user interactions.
Components: Smart sensors in the bins, user input interfaces (e.g., mobile apps), and real-time data
collection systems.
Functionality: Sensors continuously monitor the bin's fill levels and waste types, while users can
input data related to the waste they deposit.
5.2.2 Data Processing and Machine Learning Layer
Purpose: Processes the collected data and makes predictions about the bin's status (e.g., full, empty).
Components: Data processing units, machine learning algorithms (KNN, Random Forest, etc.), and
data storage (databases).
Functionality: This layer applies pre-processing techniques like data cleaning, feature engineering,
and scaling before feeding the data into machine learning models. The system predicts whether the
bin is full or not and classifies the type of waste.
5.2.3 Prediction and Decision-Making Layer
Purpose: Executes predictive models to determine the fill status and waste type classification.
Components: Machine learning models (e.g., KNN, SVM), prediction servers, and algorithms.
Functionality: Based on the processed data, the models classify the bin’s fill level and waste type.
The system uses these predictions to optimize the collection process, schedule pickups, and provide
alerts to users.
5.2.4 User Interface Layer
Purpose: Provides an interface for users (e.g., waste management operators, local authorities, and
citizens) to interact with the system.
Components: Web or mobile applications, dashboards, and alert systems.
Functionality: Users can view the status of recycling bins, receive alerts about full bins, and manage
collection schedules. The interface provides real-time data visualization, including bin statuses, waste
type classifications, and system performance metrics.
5.2.5 Collection and Optimization Layer:
Purpose: Manages the physical collection of waste from the bins and optimizes collection routes.
Components: Collection trucks, GPS systems, and route optimization software.
Functionality: Based on the predictions and alerts from the machine learning layer, the system
optimizes waste collection routes. The collection trucks are dispatched to bins based on their fill
status, reducing inefficiencies in the collection process.
5.2.6 Integration and Communication Layer:
Purpose: This Layer ensures seamless communication between different layers and components.
Components: APIs, data transmission protocols (e.g., MQTT), and cloud services.
35
Functionality: This layer connects various components of the system (data collection, machine
learning models, user interfaces) and ensures that data flows in real-time, allowing the system to
operate smoothly and respond quickly to changes.
36
and their interactions with system features such as logging waste, checking bin statuses, and
scheduling collections.
The key elements are
Actors: Users (e.g., Waste Management Personnel, Citizens)
Use Cases: Bin fill level monitoring, waste classification, collection scheduling
Relationships: Associations between actors and use cases, showing what functionality each actor can
access.
37
Relationships: Inheritance, Association, and Aggregation relationships that depict.
38
Fig 5.5 Sequence Diagram
39
CHAPTER 6
SYSTEM IMPLEMENTATION
The system implementation focuses entirely on the design, development, and integration of
software components, as the project does not involve any hardware sensors or IoT devices. The
system is designed to manage and analyze data related to recycling, such as waste types and bin
statuses, using machine learning models and data processing techniques. By leveraging these
software-driven methods, the system predicts bin statuses and classifies waste types, providing an
efficient and scalable solution for optimizing recycling processes.
42
CHAPTER 7
SYSTEM TESTING
Testing is the process of evaluating and verifying that a software application or system meets
specified requirements and functions correctly. It involves running the software under different
conditions to identify defects, bugs, or areas for improvement, ensuring that the system works as
intended and meets user expectations. Testing helps ensure software quality, reliability, security, and
performance by identifying issues before the software is deployed for end-users.
TC02 Handle Missing Column Missing values Missing Pass Verified Completed
Values with are replaced with values successfully.
missing median for replaced. No
values numeric columns. NaNs.
44
TC03 Correlation Full Correlation Correlation Pass Matches Completed
Matrix dataset heatmap is heatmap expected
displayed. displayed. visualization.
Correct
relationships
between features
are visualized.
TC04 Split Dataset Dataset Dataset is Dataset split Pass Splitting Completed
and split correctly split correctly: confirmed.
ratio (80- into training and 80-20 ratio.
20) testing subsets.
TC06 Evaluate KNN Test data Model predicts Accuracy: Pass Performance Completed
on Test Data class labels and 87%. F1- is acceptable.
provides score: 0.85.
accuracy,
precision, recall,
and F1-score.
TC07 Handle Missing Dataset Missing values Missing Fail Preprocessing Under
Values in Input with are values not function Review
Data missing replaced/imputed, handled needs review.
values and no errors properly.
occur during
preprocessing or
training.
TC08 Compute Test data, Confusion matrix Confusion Pass Correct Completed
Confusion predicted is computed and matrix output
Matrix for labels matches expected matches verified.
KNN values. expected
Predictions values.
45
CHAPTER 8
RESULTS
The image presents two components: an overview of the dataset used in the project and a correlation
heatmap visualizing interdependencies among various features. The dataset consists of 13 columns,
each representing specific attributes related to waste classification: The target variable indicating the
waste category and The image shows that the dataset contains 4638 non-null entries for all columns,
ensuring no missing data after preprocessing. Data types include float64 for numerical features
and int64 for categorical features, reflecting efficient storage and representation.
46
2. Correlation Heatmap
The heatmap visualizes the Pearson correlation coefficients between features in the dataset.
Correlation values range from -1 (strong negative correlation) to +1 (strong positive
correlation).
Observations:
Target Variable (Class):
• Moderately correlated with FL_B (-0.55) and weakly correlated with other fill- level
features like FL_A and derived features (FL_C, FL_C_3, FL_C_12).
• Weak negative correlation with Container Type (-0.4).
• Strong positive correlations among related features like FL_B and FL_B_3
(0.87), indicating temporal consistency in waste fill levels.
• High correlation values (close to 1) between features within the same time interval or
derived features like FL_C_12 and FL_A_12 (0.69).
Recyclable Fraction:
• Weak correlations with most features, indicating its limited dependence on fill- level
measurements.
47
Fig 8.2 Neural Network Confusion Matrix
The output illustrates the confusion matrix and the evaluation metrics for a Neural Network model
used for waste classification.
▪ True Negative (TN): 398 instances were correctly classified as negative (e.g., waste not belonging
48
▪ False Negative (FN): 41 instances were incorrectly classified as negative.
▪ True Positive (TP): 459 instances were correctly classified as positive (e.g., waste belonging to
The confusion matrix is supported by quantitative metrics for a detailed performance assessment:
• TP+TN
Accuracy is calculated as Accuracy =
𝑇𝑜𝑡𝑎𝑙 𝑠𝑎𝑚𝑝𝑙𝑒𝑠
This model achieved an accuracy of 92.35%, highlighting the model’s ability to correctly
classify the majority of instances.
Precision of 93.86% indicates that the model is highly reliable when it predicts a positive class.
A recall of 91.8% demonstrates that the model can identify most of the positive cases
correctly.
• 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 × 𝑅𝑒𝑐𝑎𝑙𝑙
F1 Score is calculated as Precision = 2 ×
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛+𝑅𝑒𝑐𝑎𝑙𝑙
The F1 Score of 92.82% reflects a strong balance between precision and recall.
A robust metric for imbalanced datasets, accounting for all confusion matrix elements.
The MCC score of 0.847 indicates a high level of agreement between predicted and actual
labels.
• The Neural Network model exhibits excellent performance across multiple metrics,
deployment, where both false positives and false negatives have significant implications.
• The high MCC score further validates the robustness of the classification, suggesting that the
This output demonstrates the effectiveness of the Neural Network in achieving high accuracy
and reliability in waste classification, validating its potential for scalable and practical
applications.
49
Fig 8.3 Random Forest
The graph shows how accuracy improves with the increasing number of trees in the Random Forest
model.
Key data points include
At 1 tree, accuracy is 83.1%.With 7 trees, accuracy peaks at 89.2%, after which it stabilizes. Beyond
16 trees, accuracy plateaus at 89.4%, indicating diminishing returns for additional trees. The
steady improvement in accuracy with more trees demonstrates the effectiveness of ensemble learning.
The plateau suggests that increasing the number of trees beyond 16 does not significantly enhance
performance.
50
CHAPTER 9
CONCLUSION AND FUTURE ENHANCEMENT
9.1 Conclusion
The Smart Recycling System project successfully demonstrates the potential of software-
driven solutions to address the growing global issue of waste management and recycling. By
developing a user-friendly platform that automates the classification of waste and provides actionable
recycling recommendations, the system contributes significantly to sustainable practices. The
integration of algorithms for waste categorization ensures that users can efficiently identify the type
of waste they are disposing of, while the recommendation engine offers appropriate disposal or
recycling options, making the entire recycling process smoother and more accessible.
Throughout its development, the system has been thoroughly tested to ensure its functionality,
reliability, and user experience. Unit testing, integration testing, and other essential software testing
methods confirmed that the individual components work as expected and that the system operates
seamlessly as a whole. The final product is an effective tool for individuals looking to adopt better
recycling practices, contributing to environmental sustainability efforts.
The Smart Recycling System is not only a practical solution but also highlights the growing
importance of technology in promoting eco-friendly habits. With its ability to accurately classify
waste and provide relevant recommendations, the system empowers users to make informed decisions
about waste disposal. In the long run, it has the potential to play a key role in reducing waste, lowering
carbon footprints, and supporting global recycling initiatives. The project lays a strong foundation for
future developments in smart waste management, and its impact can be further amplified with
additional features and enhancements in future iterations.
9.1.1 Impact and Contribution
The project contributes to the circular economy model by encouraging responsible waste
disposal and maximizing the reuse of resources. By making recycling more accessible and efficient,
the system promotes environmental sustainability. This project also demonstrates how technology
can be used to address critical global challenges, providing a scalable solution for waste management.
51
types automatically. This would enhance the system's ability to classify waste without
requiring user input, improving efficiency.
• Machine Learning Model Improvement: The waste classification algorithm can be
further refined using advanced machine learning techniques. By training the model on a
larger and more diverse dataset, the system's classification accuracy can be significantly
improved. Additionally, integrating deep learning models could help classify more
complex or ambiguous waste items.
• Geolocation-Based Recommendations: Future versions of the system could integrate
geolocation features, providing users with information on nearby recycling centres, drop-
off locations, or collection schedules based on their location. This would make the system
more convenient and relevant to users' daily lives.
• Mobile Application Development: The system could be expanded into a mobile
application, making it accessible to a broader audience. A mobile app could offer real-
time notifications, push alerts for waste collection schedules, and allow users to access
recycling tips on the go.
• Collaborations with Local Governments and Organizations: Partnering with local
governments or recycling organizations could help scale the system and ensure that it is
aligned with local recycling rules and regulations. Such collaborations could also help
provide users with rewards or incentives for actively participating in recycling programs.
• User Engagement and Education: The system could include educational resources about
recycling best practices and the environmental impact of waste. Gamifying the experience
(e.g., with rewards or points for recycling efforts) could encourage users to engage with
the system more frequently and improve participation in recycling efforts.
• Support for More Waste Categories: The system could expand its database to include a
wider variety of waste materials, such as electronic waste, textiles, and organic waste. This
would make the system applicable to a broader range of recycling scenarios.
By implementing these enhancements, the Smart Recycling System could become an even more
powerful tool in promoting sustainable waste management and encouraging responsible consumption.
The continuous evolution of the system will contribute to global sustainability efforts and help tackle
the growing environmental challenges posed by waste.
52
REFERENCES
53
APPENDIX
A. SOURCE CODE:
#Corrleation Matrix
import seaborn as sns
from matplotlib import rcParams
from matplotlib.cm import rainbow
corrmat=df.corr()
top_corr_features=corrmat.index
plt.figure(figsize=(15,15))
sns.heatmap(df[top_corr_features].corr(),annot=True,cmap='RdYlGn')
plt.show()
#Standard Scaling
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
standardScaler=StandardScaler()
columns_to_scale=['FL_B','FL_A', 'FL_C', 'VS']
df[columns_to_scale]=standardScaler.fit_transform(df[columns_to_scale])
df.head()
#K Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier
knn_scores=[]
for k in (range(1,21)):
knn_classifier=KNeighborsClassifier(n_neighbors=k)
knn_classifier.fit(train_X,train_y)
knn_scores.append(knn_classifier.score(test_X,test_y))
56
#Plotting the Graph of model scores for different k values
plt.figure(figsize=(30,30))
plt.plot([k for k in range(1,21)],knn_scores,color="blue")
for i in range(1,21):
plt.text(i, knn_scores[i-1],(i,round(knn_scores[i-1],4)))
plt.xticks([i for i in range(1,21)])
plt.xlabel("Number of Neighbors (K)",color="Red",weight="bold",fontsize="18")
plt.ylabel("Scores",color="Red",weight="bold",fontsize="18")
plt.title("K Neighbors Classifier scores for different K values",color="Red",weight="bold",fontsize="20")
#plt.figure(figsize=(30, 20))
plt.show()
plt.rcParams["font.weight"]="bold"
plt.rcParams["axes.labelweight"]="bold"
# Classification Report
print(classification_report(test_y, pred))
TP = cm[1][1]
FP = cm[0][1]
TN = cm[0][0]
57
FN = cm[1][0]
b = ((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))
b = b**0.5
a = (TP*TN-FP*FN)
MCC_KNN = round(a/b,3 )
Precision_Score = TP / (FP + TP)
Recall_Score = TP / (FN + TP)
Accuracy_Score = (TP + TN)/ (TP + FN + TN + FP)
F1_Score = 2* Precision_Score * Recall_Score/ (Precision_Score + Recall_Score)
print("Accuracy : ",Accuracy_Score*100)
print("Precision : ",Precision_Score*100)
print("Recall : ",Recall_Score*100)
print("F1 score : ",F1_Score*100)
print('MCC Score : ', MCC_KNN)
sns.jointplot(x='FL_C',y='FL_A',data=df,hue='Class')
# Assuming you have 'train_X', 'train_y', 'test_X', 'test_y' ready and already split
svm_model = LinearSVC()
svm_model.fit(train_X, train_y) # Train the SVM model
# Predictions
pred = svm_model.predict(test_X)
58
# Manually plotting the confusion matrix using seaborn heatmap
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Predicted Negative", "Predicted
Positive"], yticklabels=["True Negative", "True Positive"])
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.title('Confusion Matrix')
plt.show()
# MCC Calculation
b = ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5
a = (TP * TN - FP * FN)
MCC_SVM = round(a / b, 3) if b != 0 else 0 # Avoid division by zero
# Printing metrics
print("Accuracy (calculated manually):", Accuracy_Score * 100)
print("Accuracy (from model's score):", svm_model.score(test_X, test_y) * 100) # Alternatively, use the
model's built-in accuracy
print("Precision:", Precision_Score * 100)
print("Recall:", Recall_Score * 100)
print("F1 score:", F1_Score * 100)
print("MCC score:", MCC_SVM)
59
#Logistic Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
import seaborn as sns # For better heatmap visualization
# Assuming 'train_X', 'train_y', 'test_X', 'test_y' are already defined and split
classifier = LogisticRegression(random_state=0)
classifier.fit(train_X, train_y) # Train the logistic regression model
60
# MCC Calculation (Matthews Correlation Coefficient)
b = ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5
a = (TP * TN - FP * FN)
MCC_SVM = round(a / b, 3) if b != 0 else 0 # Avoid division by zero
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix
import seaborn as sns # For better heatmap visualization
# Assuming 'train_X', 'train_y', 'test_X', 'test_y' are already defined and split
# Decision Tree Classifier
clf_model = DecisionTreeClassifier(random_state=0)
clf_model.fit(train_X, train_y)
pred = clf_model.predict(test_X)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
# Assuming the 'df' DataFrame is available with required features and 'Class' as target
# Splitting the data
train, test = train_test_split(df, test_size=0.2)
train_X = train[['FL_B', 'FL_C', 'FL_C_12', 'FL_C_3']]
train_y = train['Class']
test_X = test[['FL_B', 'FL_C', 'FL_C_12', 'FL_C_3']]
test_y = test['Class']
64
# Plot the Random Forest performance with different n_estimators
plt.figure(figsize=(10, 6))
plt.plot(range(1, 21), RF_scores, color='blue')
for i in range(1, 21):
plt.text(i, RF_scores[i-1], f"({i}, {RF_scores[i-1]:.3f})", ha='center', va='bottom')
plt.xticks(range(1, 21))
plt.xlabel('Number of Trees', color='Red', weight='bold', fontsize=12)
plt.ylabel('Accuracy', color='Red', weight='bold', fontsize=12)
plt.title('Random Forest Accuracy vs Number of Trees', color='Red', weight='bold', fontsize=14)
plt.show()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()
# MCC Calculation
b = ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5
a = (TP * TN - FP * FN)
MCC_RF = round(a / b, 3) if b != 0 else 0 # Avoid division by zero
# Other metrics
Precision_Score = TP / (FP + TP) if (FP + TP) != 0 else 0
Recall_Score = TP / (FN + TP) if (FN + TP) != 0 else 0
Accuracy_Score = (TP + TN) / (TP + FN + TN + FP)
F1_Score = 2 * Precision_Score * Recall_Score / (Precision_Score + Recall_Score) if (Precision_Score +
Recall_Score) != 0 else 0
66
B. INTERNSHIP CERTIFICATES
67
68
69
C.PUBLICATION
70