A Robotic Waste Sorting Machine With Modified Conveyor Plates and Deep Learning Based Optical Detection
Abstract—In this work, we introduce a groundbreaking robotic waste-sorting system that fuses an innovative mechanical conveyor design with state-of-the-art deep learning for real-time classification. Our conveyor features ten rows of hinged trays, each independently actuated by compact servos, enabling precise, row-level ejection of targeted waste fractions. A dual-camera setup captures RGB frames every 0.5 s, and an efficient queue-based synchronization aligns detection and actuation with millisecond accuracy—eliminating the need for encoders or complex pneumatics. We rigorously benchmarked five modern architectures—ShuffleNetV2, MobileNetV2, Custom ResNet-50, a Swin-CNN hybrid, and ViT—on the TrashNet dataset, uncovering a superior balance of speed, accuracy, and model compactness. Our final deployment employs a lightweight CNN variant achieving >96% classification accuracy with sub-50 ms latency, all within a <5 MB footprint. Designed for plug-and-play integration into existing materials-recovery facilities, this system promises to slash energy use, reduce maintenance overhead, and scale from small-town recycling centers to metro-scale operations—paving the way toward a truly circular economy.

I. INTRODUCTION

Due to India's rapid urbanization and industrialization, the volume of municipal waste is ever increasing, posing an unprecedented management challenge. Major cities like Delhi, Mumbai, Bangalore, Chennai, and Kolkata are battling the dual issues of waste quality and segregation practices, which pose serious environmental hazards and undermine resource recovery efforts and sustainability. A significant portion of waste ends up in open dumps or landfills, despite government initiatives like the Swachh Bharat Mission, which aims to improve sanitation and waste management nationwide. Unorganized manual segregation, inefficient recycling facilities, and a lack of organized waste streams are major barriers that play a vital role in environmental degradation and public health concerns.

In many Indian municipalities, waste management is largely characterized by informal means where waste is collected, sorted, and recycled by an unorganized labor force. Although this informal sector plays a vital role in the recycling chain, the process is time-consuming, hazardous, and inefficient, especially for such a large volume of waste. The implementation of automated, accurate, and sustainable waste sorting solutions is more of an urgent necessity than a luxury.

Recent advancements in automation and artificial intelligence have yielded promising platforms for smart waste management systems. The integration of computer vision and deep learning techniques has the potential to transform waste segregation by offering real-time, highly accurate sorting of complex and heterogeneous waste streams. Advanced image processing techniques can identify subtle differences in material composition and color, ensuring that recyclables are sorted more effectively. This not only reduces the dependence on human labor but also enhances the purity of sorted waste.

The present study introduces an innovative robotic waste sorting machine specifically designed to address the challenges faced by Indian waste management systems. Unlike traditional state-of-the-art optical sorting systems that rely on pneumatic actuation, our system employs a mechanical conveyor belt with rows of plates and servo-actuated trays. At the intake zone, waste is continuously fed onto the conveyor, where it is divided into multiple plate regions. A dual high-frame-rate camera system captures images at a high frequency, and a deep learning model classifies the images of the waste in real time. The system captures images every 0.5 seconds at a detection point (Point A) and aligns these detections with the actuation point (Point B) after a calculated delay of about
2 seconds, corresponding to the speed of the conveyor belt and the distance between Point A and Point B.
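The delay is simply the Point A to Point B spacing divided by the belt speed; as a purely illustrative check (the symbols d_AB and v_belt and the numeric values below are placeholders, not measured parameters of the machine):

t_delay = d_AB / v_belt,   e.g.   t_delay = 0.6 m / 0.3 m/s = 2 s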
In our approach, five state-of-the-art models were trained, including convolutional neural network (CNN) models, visual transformers (ViT), and hybrid models (CNN + ViT). After a rigorous comparison based on balanced performance in accuracy, inference speed, model compactness, and other factors, we decided to go forward with the XXX model. The deep learning model is integrated into a holistic system where it predicts the control signals for the servo motors, ensuring that when the detected waste on a specific plate region (among the three per row) meets the predefined criteria, the corresponding mechanical actuator is triggered to eject and separate the waste into another waste collection bin or direct it onto another path for further processing.

By combining deep learning techniques with a novel yet simple mechanical design, our system not only increases the efficiency and reliability of waste segregation but also offers a cost-effective solution for Indian cities. This integration of cutting-edge technology is poised to transform the waste management landscape by reducing manual intervention, increasing recycling rates, and ultimately contributing to a cleaner, more sustainable urban environment in India.

II. LITERATURE REVIEW

A. Overview of Waste Sorting and its Evolution

In rapidly urbanizing countries like India, managing mass-scale municipal waste is one of the most critical environmental challenges. Traditional methods of sorting wastes and recyclables have heavily relied on manual segregation, which is not only labor intensive but also inconsistent. The inefficiencies in manual sorting have spurred efforts to develop automated systems. The early mechanically autonomous solutions incorporating basic sensor technologies and image processing built the foundation of today's technologies, although these methods often struggled with the heterogeneous nature of wastes and changing environmental conditions.

Computers have increasingly played a central role in the automation of the sorting process by enabling machines to "see" and differentiate waste based on visual patterns. This evolution from basic feature extraction to state-of-the-art deep learning techniques, particularly Convolutional Neural Networks (CNNs), has shown dramatic improvements in accuracy and precision. However, despite the advancements in artificial intelligence, the seamless integration of these state-of-the-art models with physical machinery is still a challenge. There is a significant research gap between the theoretical studies and practical implementation of these systems.

B. Traditional Computer Vision Models

Early research in automated waste sorting relied on primitive CV algorithms. These incorporated feature extraction techniques like Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Histogram of Oriented Gradients (HOG) to classify wastes under predefined and controlled conditions. For example, Arebey et al. [24] developed a system that utilized texture features (extracted through techniques like the Gray Level Co-occurrence Matrix) to monitor levels of waste bins. Such techniques were conceptually straightforward and computationally lightweight but were impractical in real-world environments, where significant variations in lighting, occlusion, and background clutter can dramatically decrease performance.

Traditional techniques often relied on edge detection algorithms and explicitly tuned thresholds. Although these methods laid a foundation for automated inspection, their high sensitivity to variations in environmental conditions and their inability to adapt to complex tasks limited their practicality and application outcomes.

C. Deep Learning-Based Methods

The introduction of deep learning has revolutionized computer vision. These techniques provide significantly more robust and accurate results for waste classification tasks. CNNs have emerged as the most popular approach due to their automatic and dynamic feature extraction capabilities and high performance in image recognition, classification, and segmentation tasks. Several studies have applied CNNs to waste sorting applications with notable success:

• Electronic Waste Classification: Sarswate et al. [9] employed a CNN-based system using the YOLO (You Only Look Once) object detection framework for the classification and segregation of various components of electronic waste. Their approach achieved high precision and recall in detecting e-waste items, but the process was computationally expensive for real-time application.

• Plastic Waste Segregation: Research by Choi et al. [4] demonstrated how deep learning models were able to distinguish chemically similar plastics. The use of deep CNNs enabled the system to maintain high classification accuracy even when waste items exhibited subtle visual differences.

• Advanced Architectures: More recent models, such as DSYOLO-Trash by Ma et al. [19], incorporate attention mechanisms (Convolutional Block Attention Module) and object-tracking algorithms that enhance detection performance even under mixed conditions.

In terms of accuracy and precision, deep learning techniques are the clear winners. However, they require large, annotated datasets for supervised learning, are computationally intensive, and demand substantial resources.

D. Hybrid Approaches

Hybrid methods combine traditional techniques with deep learning to produce the final result. They typically pair conventional pre-processing techniques (noise reduction, edge detection, and thresholding) with deep learning models to enhance overall performance. For example, Cuingnet et al. [5] developed a hybrid system that integrated traditional image processing (using OpenCV) with deep learning-based classification to improve the accuracy of sorted aluminum can streams in real time. Similarly,
Jadli and Hain [3] proposed a system in which conventional feature extraction was incorporated with transfer learning techniques to boost performance. These hybrid approaches reduce the dependency on large datasets and train the deep learning model more efficiently.

E. Application-Specific Studies

Several studies have tailored computer vision-based waste sorting systems to specific waste streams or operational contexts:

• Municipal Solid Waste (MSW): In urban scenarios, the complexity of waste streams is relatively high, requiring a model that can generalize across diverse waste types. Studies such as those by Sousa et al. [14] and Lavanya et al. [] have focused on MSW, demonstrating the potential of CNN-based systems to handle mixed waste conditions, albeit with challenges in model adaptability.

• E-Waste and Electronic Components: With the rise of consumer electronics, electronic waste has become not only a rising concern but also an environmental hazard due to the presence of hazardous components like batteries. The work by Joseph et al. [] and subsequent studies has shown that deep learning–based approaches deliver promising results with high classification accuracy for sorting e-waste.

• Drone-Based Waste Monitoring: Malche et al. [8] explored the use of drone-based systems incorporating TinyML for aerial waste monitoring, which broadens the application of computer vision techniques from static facilities to dynamic urban scenarios. This approach addresses challenges regarding large-scale monitoring of waste in expansive areas.

• Smart Cities: Incorporating computer vision in bins (smart bins), as discussed by Pan et al. [10], highlights a shift towards IoT-based waste classification and management. These systems also monitor bin levels in real time to optimize collection and minimize operational costs.

F. Challenges and Limitations

Despite significant progress, several major challenges still exist for computer vision-based waste classification systems:

• Variability in Waste Materials: Waste streams vary heavily both in material composition and presentation. As models are trained in controlled environments, they may perform sub-optimally when deployed in an industrial setting, where there is varying illumination, occlusion, and background complexity.

• Dataset Limitations: Deep learning models like CNNs are supervised models and thus require large-scale, annotated datasets to train and achieve high accuracy. While datasets like TrashNet and TACO are widely used, their limited scale and controlled conditions do not fully capture the diversity expected in a practical setting.

• Real-Time Processing: Deep learning models are computationally intensive, especially for high-throughput and real-time applications. To achieve the desired outcome and inference speeds, advanced hardware such as high-end GPUs or accelerated processors especially suited for matrix calculations, along with efficient algorithms, is required.

• Integration with Existing Infrastructure: Adapting and integrating these systems into pre-existing waste management frameworks or machines presents technical, logistical, and compatibility issues, particularly in developing countries like India.

III. PREPROCESS WORKFLOW OF MIXED WASTE STREAMS

Before an item reaches the optical sorter, it undergoes several steps to separate materials into specific categories efficiently. The workflow consists of multiple sequential stages designed to progressively condition mixed waste into streams that are optimal for high-precision optical sorting. This section explains each step thoroughly, including the industry-leading machinery used.

A. Manual Pre-sorting (Hazardous/Bulky Items)

The incoming waste, straight from a landfill or dumpyard, is first manually sorted to remove hazardous or bulky items that might interfere with the workflow or damage the machinery. Although most of this presorting is performed by human labor, modern facilities increasingly employ robotic sorting arms for improved safety and efficiency. One industry-leading solution is the ZenRobotics Recycler (ZRR) by ZenRobotics. These AI-powered robots use hyperspectral imaging to accurately detect and remove batteries, electronics, and other hazardous materials from the waste stream.

Fig. 1. ZenRobotics Recycler (ZRR) for hazardous and bulky item removal (https://zenrobotics.com/recycler/)

B. Primary Shredding

After pre-sorting, waste is broken into smaller chunks using a primary shredder. Dual-shaft shredders are common due to their high throughput and ability to produce uniform
particle sizes. The Tana E-Shredder by Tana can process over 50 tons/hour, reducing waste pieces to less than 15 cm. An alternative is Krause Manufacturing's dual-shaft shredding system, which offers similar high-volume performance.
Fig. 2. Tana 440ET Electric Shredder (https://www.tana.fi/en/shredders/tana-440et/)
C. Ferrous Metal Removal
Once shredded, the stream passes under overhead magnetic
separators that extract ferrous metals such as steel cans, nails,
and car parts. The Eriez Suspended Electromagnets by Eriez
Magnetics achieve up to 99% removal efficiency for ferrous
fragments.
https://www.eriez.com/electromagnets
2) Actuation Zone (Point B): Directly beneath the conveyor at a fixed location (Point B), three red levers—each linked to its own servo motor—are aligned in three parallel columns corresponding to the Plate 1, Plate 2, and Plate 3 positions. When the detection logic identifies that a given plate in a specific row contains the target waste category, the associated servo is triggered, pivoting its lever to eject the item from that plate.

Fig. 11. Conveyor belt structure and intake system (2)

B. Sensor System

1) Cameras and Sensors:
• Digital (RGB) Camera: Captures high-resolution color images for object shape, texture, and visible-spectrum features.
• NIR Camera: Mounted adjacent to the digital camera in the blue intake box, intended for material composition analysis via near-infrared spectroscopy. (Note: NIR data is currently not leveraged due to the lack of annotated NIR datasets.)

The combined field of view is calibrated so that each captured frame spans exactly three plate regions. Software divides each frame into three vertical sections—one per plate—for independent classification.

Fig. 13. Dual-camera sensor system layout.

2) Hardware Integration: Both cameras interface via USB/FireWire to a central PC running the image processing pipeline. An Arduino Uno R3 is connected over serial (COM port at 9600 baud) to receive actuation commands. Each of the three servos is wired to Arduino PWM pins (e.g., 9, 10, 11), providing precise angular control for lever pivoting and subsequent return to the ready position.
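A minimal sketch of the PC side of this serial link is given below (it assumes the pyserial package and an illustrative Windows-style port name such as COM3, which is not necessarily the port used on the actual machine):

import serial  # pyserial

# Open the Arduino's serial port at the baud rate expected by the firmware (9600).
arduino = serial.Serial(port="COM3", baudrate=9600, timeout=1)

def send_actuation(plates):
    """Send a command such as "1,3\n" to eject plates 1 and 3 of the current row."""
    command = ",".join(str(p) for p in plates) + "\n"
    arduino.write(command.encode("ascii"))

send_actuation([1, 3])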
C. Software System and Queue Logic

1) Real-Time Image Capture and Processing: A Python-based main loop captures a new frame every 0.5 seconds via OpenCV. Each frame is automatically split into three regions of interest (ROIs), corresponding to Plates 1–3. These ROIs are preprocessed (resized, normalized) and passed through the selected deep learning classifier to detect whether the target waste class is present in each plate.
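A minimal sketch of this capture-and-split step is shown below (OpenCV array slicing is assumed; function names such as split_into_plate_rois, the camera index, and the input resolution are illustrative, not the identifiers of the actual implementation):

import cv2

INPUT_SIZE = (224, 224)  # classifier input resolution (illustrative)

def split_into_plate_rois(frame):
    """Split a frame into three vertical sections, one per plate region."""
    h, w = frame.shape[:2]
    third = w // 3
    return [frame[:, 0:third], frame[:, third:2 * third], frame[:, 2 * third:w]]

def preprocess(roi):
    """Resize and normalize a plate ROI before classification."""
    roi = cv2.resize(roi, INPUT_SIZE)
    return roi.astype("float32") / 255.0

cap = cv2.VideoCapture(0)          # capture device index is an assumption
ok, frame = cap.read()
if ok:
    rois = [preprocess(r) for r in split_into_plate_rois(frame)]
    # each element of `rois` is then passed to the selected classifier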
2) Queue-Based Row Tracking:

a) Queue Structure:
• Implements a circular deque with maximum length 10 (equal to row count).
• Entry format:
{
  "row_id": <int>,
  "detection_time": <timestamp>,
  "active_plates": [<1|2|3>, ...]
}
• row_id cycles from 0 to 9, matching physical rows on the conveyor.

b) Synchronization Logic: A background task continuously polls the queue. For each entry, it computes elapsed = now - detection_time. When elapsed ≥ 2.0 seconds, the entry's active_plates list is read and serialized into a command string (e.g., "1,3\n"). This string is sent via serial to the Arduino, which actuates the corresponding servo(s). Entries are then removed from the queue, ensuring timely and accurate actuation. The logic supports simultaneous activation of multiple servos whenever multiple plates in the same row require discharge.
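The row tracking and synchronization described above can be sketched as follows (a simplified, single-threaded illustration; names such as ACTUATION_DELAY_S, enqueue_detection, and send_actuation are placeholders rather than the exact identifiers used in our code):

import time
from collections import deque

ROW_COUNT = 10
ACTUATION_DELAY_S = 2.0  # travel time from Point A to Point B

# Circular queue of pending detections, one entry per detected row.
row_queue = deque(maxlen=ROW_COUNT)

def enqueue_detection(row_id, active_plates):
    """Record that the given plates of a row contained the target waste class."""
    row_queue.append({
        "row_id": row_id,
        "detection_time": time.time(),
        "active_plates": active_plates,
    })

def poll_and_actuate(send_actuation):
    """Actuate every entry whose conveyor travel delay has elapsed."""
    while row_queue and time.time() - row_queue[0]["detection_time"] >= ACTUATION_DELAY_S:
        entry = row_queue.popleft()
        if entry["active_plates"]:
            command = ",".join(str(p) for p in entry["active_plates"]) + "\n"
            send_actuation(command)  # e.g., write the string to the Arduino serial port

In this sketch, rows are enqueued in conveyor order, so only the head of the queue needs to be checked on each poll.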
3) Deep Learning Model Training and Comparison:

a) Detailed Overview of Selected Deep Learning Models: This section provides an in-depth discussion of five state-of-the-art architectures—ShuffleNetV2, Custom ResNet-50, MobileNetV2, Hybrid Swin-CNN, and Vision Transformer—highlighting their design principles, computational characteristics, and suitability for waste image classification.

1. ShuffleNetV2 (×1.0): Optimized for real-time edge inference by balancing memory-access cost and FLOPs via channel splitting, pointwise convolutions, and channel shuffling [44].

Fig. 14. ShuffleNetV2 architecture flow (Input 3×224×224 → 3×3 Conv, s=2 → BatchNorm → ReLU → 2×2 MaxPool, s=2 → [ShuffleNetV2 Unit]×N → 1×1 Conv → AdaptiveAvgPool → Flatten → FC → Softmax).

b) Key Equations:

Y_{:,:,c} = X ∗ K^dw(c),   K^dw ∈ R^{k×k×C}                      (1)

Z_{:,:,d} = Σ_{c=1}^{C} Y_{:,:,c} · K^pw_{c,d},   K^pw ∈ R^{1×1×C×D}      (2)

FLOPs_sep = HW(k²C + CD),   FLOPs_std = HW k²CD                  (3)

2. Custom ResNet-50: Employs residual learning via skip connections to enable very deep networks without gradient vanishing [41].

c) Residual Mapping:

y = F(x; {W_i}) + x                                              (4)

3. MobileNetV2: Uses inverted residuals and linear bottlenecks to minimize computations on mobile devices [43], [31].

d) Block Equations:

x̂ = σ(BN(W_exp x))                                              (5)

x̃ = σ(BN(W_dw x̂))                                              (6)

y = BN(W_proj x̃)                                                (7)

4. Hybrid Swin-CNN: Merges hierarchical shifted-window self-attention (Swin) with convolutional blocks (ResNet-18) to capture global context and local textures [33].
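As an illustration of how such a two-backbone fusion can be wired, the sketch below follows the flow in the accompanying architecture figure (Swin-Tiny 768-d features and ResNet-18 512-d features concatenated into a 1280-d vector, then FC 512, GELU, Dropout, LayerNorm, and a final classifier). It uses torchvision backbones; the class count, dropout rate, and other details are assumptions for illustration, not the exact training configuration:

import torch
import torch.nn as nn
from torchvision.models import resnet18, swin_t

class HybridSwinCNN(nn.Module):
    """Concatenate Swin-Tiny (768-d) and ResNet-18 (512-d) global features, then classify."""
    def __init__(self, num_classes=6):  # e.g., the six TrashNet categories
        super().__init__()
        self.swin = swin_t(weights=None)
        self.swin.head = nn.Identity()   # expose 768-d pooled features
        self.cnn = resnet18(weights=None)
        self.cnn.fc = nn.Identity()      # expose 512-d pooled features
        self.classifier = nn.Sequential(
            nn.Linear(768 + 512, 512),
            nn.GELU(),
            nn.Dropout(p=0.3),
            nn.LayerNorm(512),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        feats = torch.cat([self.swin(x), self.cnn(x)], dim=1)  # (N, 1280)
        return self.classifier(feats)    # softmax is applied in the loss

model = HybridSwinCNN()
logits = model(torch.randn(1, 3, 224, 224))  # -> shape (1, 6)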
[Figure: Custom ResNet-50 architecture flow: Input 3×224×224 → 7×7 Conv, s=2 → BatchNorm → ReLU → 3×3 MaxPool, s=2 → [Bottleneck×3] → [Bottleneck×4] → [Bottleneck×6] → [Bottleneck×3]]

[Figure: Hybrid Swin-CNN architecture flow: Input 3×224×224 → Swin-Tiny backbone (GlobalAvgPool, 768-d) and ResNet-18 backbone (GlobalAvgPool, 512-d) → Concatenate (1280) → FC 512 → GELU → Dropout → LayerNorm → FC → Softmax]

[Figure: Vision Transformer architecture flow: Input 3×224×224 → Patchify 16×16 → ...]
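Since the model comparison weighs accuracy against inference speed and model compactness, the sketch below shows one simple way such latency and size numbers can be gathered for any candidate network (illustrative only; it assumes PyTorch, CPU timing, and the hypothetical HybridSwinCNN class from the earlier sketch rather than the actual benchmarking harness):

import time
import torch

def benchmark(model, input_size=(1, 3, 224, 224), runs=50):
    """Report rough per-image latency (ms) and parameter footprint (MB) of a classifier."""
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        for _ in range(5):               # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        latency_ms = (time.perf_counter() - start) / runs * 1000.0
    params = sum(p.numel() for p in model.parameters())
    size_mb = params * 4 / (1024 ** 2)   # assuming 32-bit (4-byte) weights
    return latency_ms, size_mb

# Example: latency_ms, size_mb = benchmark(HybridSwinCNN())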