Vision-Based Automated Hole Assembly System with Quality Inspection
2023-09-21
Kim, Doowon
Kim, D. (2023). Vision-based automated hole assembly system with quality inspection (Master's
thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
https://hdl.handle.net/1880/117321
UNIVERSITY OF CALGARY

Vision-Based Automated Hole Assembly System with Quality Inspection

by

Doowon Kim

A THESIS

CALGARY, ALBERTA

SEPTEMBER, 2023

Abstract
Automated manufacturing, driven by rising demands for mass-produced products, calls for
efficient systems such as the peg-in-hole assembly. Traditional industrial robots perform these
tasks but often fall short in speed during pick-and-place processes. This study presents an
innovative mechatronic system for peg-in-hole assembly, integrating a novel peg insertion tool,
assembly mechanism and control algorithm. This combination achieves peg insertion with a 200
µm tolerance without the need for pick-and-place, meeting the requirements for high precision and
rapidity in modern manufacturing. Dual cameras and computer vision techniques, both traditional
and machine learning (ML)-based, are employed to detect workpiece features essential for
assembly. Traditional methods focus on image enhancement, edge detection and circular feature
recognition, whereas ML verifies workpiece positions. This research also introduces a novel vision-based quality inspection (QI) method for the assembled workpieces. Through testing on varied workpiece surfaces, the robustness of the methods is affirmed. The assembly system demonstrates a 99.00% success rate, while the quality inspection method attains a 97.02% accuracy across diverse conditions, underscoring the potential of these techniques in automated manufacturing.
Acknowledgements
I wish to sincerely express my gratitude to those instrumental in bringing this thesis to fruition.
First and foremost, my profound thanks go to my supervisor, Dr. Jihyun Lee, who not only granted
me this invaluable opportunity but also furnished me with unwavering guidance and support
throughout this endeavour. Equally, I am indebted to my co-supervisor, Dr. Simon Park, for his
academic counsel and mentorship. Their combined wisdom and support have been the
cornerstones of my journey.
I am also grateful to my colleagues at our research laboratory (fondly known as the iAR Lab). I would especially like to highlight the contributions of
Ebrahim Mahairi, Zhanhao Wang, Mitchell Weber, Jeff Shin, Ali Khishtan, Omer Sajjad, Dr.
Erfan Shojaei Barjuei and Navid Moghtaderi. My collaboration with the team at the Multi-
functional Engineering Dynamics Automaton Lab (MEDAL) has been both insightful and
rewarding. Special thanks to Dr. Majid TabkhPaz, Mr. Sherif Hassan and GN Corporations Inc.
Their invaluable research opportunities and technical expertise have profoundly influenced my
engineering journey.
On a personal note, the unwavering support from my parents, brother and close friends has
been a foundation throughout this research journey. Sincerely, thank you all.
Finally, a significant portion of this research was made feasible by the support from NSERC.
Table of Contents

Abstract
Acknowledgements
6.1. Contributions
References
List of Figures

Figure 2.1. Automated peg-in-hole assembly system with compliance control system [18].
Figure 2.2. Automated peg-in-hole assembly system with a robot manipulator and a camera [8].
Figure 2.3. Unit motions for peg-in-hole assembly. (a) Pushing. (b) Rubbing. (c) Wiggling. (d) Screwing. [3]
Figure 2.6. Generated welding path of a golf club from the system using computer vision [53].
Figure 2.8. Automated QI for ceramic tiles using only traditional vision techniques for detecting cracks [58].
Figure 2.9. Visualization of features in different layers of the CNN algorithm presented by Ren et al. (A) Original image. (B) Feature map from first convolutional layer. (C) Feature map from third convolutional layer. (D) Feature map from fifth convolutional layer [59].
Figure 3.4. Peg insertion tool. (a) Cross-sectional view. (b) Isometric view. (c) 3D printed prototype.
Figure 3.5. Assembly state after peg launch.
Figure 3.6. Components of the peg insertion tool. (a) Combined assembly of the plunger with the top-view camera, highlighting their positioning on the peg insertion tool. (b) Detailed view of the plunger.
Figure 3.7. Components of the peg insertion tool. (a) Rear view of the pusher. (b) Isometric view of the pusher. (c) Combined assembly of the pusher, highlighting its positioning on the peg insertion tool.
Figure 3.8. Reloader. (a) Auxiliary view. (b) Photo of the integrated machine with the reloader.
Figure 3.10. Overview of the image processing and machine positioning algorithm. Red circle: detected closest circle; red dot: centre of the detected closest circle; green "×" mark: target point.
Figure 3.13. Visual representation of the assembly workspace and the workpiece as captured by the side-view camera.
Figure 4.3. Statistical method results based on hole status. (a) Negative defect - white peg inserted. (b) Negative defect - black peg inserted. (c) Positive defect.
Figure 4.5. Sample hole images used for ML-based QI training.
Figure 5.1. Assembly experiment light settings. (a) Room. (b) Room + lamp brightness Level 1. (c) Room + lamp brightness Level 2. (d) Lamp brightness Level 1. (e) Lamp brightness Level 2.
Figure 5.2. Experimented workpiece types. (a) Metallic surface. (b) Metal surface wrapped in black vinyl film. (c) 3D printed with blue PLA plastic filament.
Figure 5.3. Hole image samples for QI testing, extracted from the following workpiece types: (a) Metallic surface. (b) Metal surface wrapped in black vinyl film. (c) 3D printed with blue PLA plastic filament.
Figure 5.4. QI testing results on hole images from the metallic surface workpiece (Fig. 5.3a) using different ML models. (a) Traditional CNN. (b) ResNet.
Figure 5.5. QI testing results on hole images from the metal surface wrapped in black vinyl film workpiece (Fig. 5.3b) using different ML models. (a) Traditional CNN. (b) ResNet.
Figure 5.6. QI testing results on hole images from the 3D printed with blue PLA plastic filament workpiece (Fig. 5.3c) using different ML models. (a) Traditional CNN. (b) ResNet.

List of Tables

Table 4.1. Minimum and maximum of one hundred σ values for each hole status.
Table 5.1. Assembly system results for the light settings (Fig. 5.1) with 20 trials per setting. An assembly trial is counted as successful when all 20 holes on the workpiece are filled with pegs.
Table 5.2. Assembly system results for the workpiece types (Fig. 5.2) with 20 trials per type. An assembly trial is counted as successful when all 20 holes on the workpiece are filled with pegs.
Table 5.3. Performance of three QI methods for each workpiece type (Fig. 5.3) with 615 hole images per type. Identical hole images are used for each QI method.
Table 5.4. σ_threshold adjustment sensitivity analysis results for the statistical method with 615 hole images per adjustment.
Table 5.5. Sensitivity analysis results for different light settings (Fig. 5.1) on the statistical method with 615 hole images per setting.
Chapter 1. Introduction
Hole assemblies, such as peg-in-hole, are manufacturing processes used in the automotive, oil
and gas, aerospace, and construction fields. One prime example is the frac plug used in the oil and
gas sector. These plugs, essential for hydraulic fracturing—a key operation for extracting oil from
unconventional shale oilfields—feature a cylindrical body riddled with multiple holes intended for
peg assembly. With shale oilfields becoming dominant contributors to the global oil supply in the
last two decades [1], the demand for frac plugs has surged. Despite this, many factories still rely
on manual labor for hole assembly, struggling to keep pace with increasing product demands. This
labor-intensive approach, combined with the diversity in workpiece dimensions and hole configurations, has hindered the adoption of automated assembly systems. However, the pressing need
spurred the demand for automated solutions in the manufacturing sector [2, 3]. As a result, the
development of novel mechatronics systems that combine advanced automation, computer vision
(CV) techniques and machine learning (ML) algorithms has emerged as a solution to address the
challenges posed by peg-in-hole assemblies, allowing for efficient and reliable automated assembly.
Flexible automation in manufacturing and assembly provides an adequate solution for many
manufacturers in meeting their high demand while dealing with different product variations and a
changing environment. Sensors play a vital role in flexible automation because they enable
automated systems to detect and read the changes and behave accordingly. Hence, sensor-based
technologies such as radiography [4], laser scanners [5] and vision [6] have been used for
automated systems. Although radiography and laser scanners provide measurements with high
precision and accuracy, their integration with automated systems is challenging due to additional
requirements. In Canada, for example, Health Canada has established Safety Code 34, which
outlines the radiation protection and safety requirements for industrial X-ray equipment [7]. This
code mandates the creation of a controlled environment for radiographic operations, a provision
that can sometimes be complex to implement. Therefore, alternative sensors are being explored to meet these measurement needs without such overhead. A common approach is to use CV-based automation, which offers several advantages, such as cost-effective devices and various techniques for different applications. These benefits have accelerated the adoption of CV in automated manufacturing systems.
Customized mechatronics systems for similar assembly processes have been proposed in
previous studies [8, 9, 10]. Most of these systems employ robotic systems with a gripper to perform
the assembly. The detection system is mainly composed of an image acquisition device, connected
to an electronic device for image processing and robot control. However, these robotic systems are
often time-consuming, especially during the pick-and-place process. For instance, a robot arm
must first move to the peg, grasp it, then navigate back to the designated hole for assembly. When
faced with multiple holes, this sequence must be executed repeatedly, underscoring the need to
enhance the efficiency of this mechanism. Additionally, expenses can escalate due to complex
joint control involving multiple sensors to ensure accurate assembly. Therefore, an advanced
mechatronics system with a simple control, low system costs, and high assembly efficiency is
necessary to automate the peg-in-hole assembly with high efficiency and repeatability.
In addition to manufacturing and assembly, CV techniques for quality inspection (QI) have
been developed and discussed in numerous studies for product inspection [11]. They provide low-
cost and high-accuracy solutions compared to human labour. The CV techniques typically involve
steps of image data acquisition, image preprocessing, image segmentation and region of interest
extraction, and image processing. Lately, the integration of ML methods within CV systems has
gained traction due to their broad application potential [12, 13, 14]. However, they necessitate
highly trained experts to generate the appropriate model [15]. Dataset preparation for training often
requires intensive manual work, such as labelling or modifying each image [16]. The model must
be retrained if the system requires additional classifications, with use of additional images that
have been prepared with manual work. As such, ML methods might not be universally optimal,
especially if iterative upgrades are expected. Systems based on traditional CV techniques can
parallel the functionalities of ML methods. However, they often require extensive customization
for each application, which can introduce uncertainty during method development. In contrast,
traditional CV techniques are computationally efficient, and their results are more traceable for
subsequent analysis. Thus, developing adaptable methods with traditional CV techniques can offer a practical alternative.
1.2. Objectives
The primary goal of this research is to develop a vision-based automated peg-in-hole assembly system, complemented by a QI algorithm. This goal is achieved through the following specific objectives:
Objective 1: Design of a novel peg-insertion machine for improved production efficiency
This objective introduces a peg-insertion tool that automates the hole assembly process.
Unlike existing models that employ robotic grippers, this tool aims to insert pegs into specified
holes with precision. A thorough exploration of the peg-insertion process highlights challenges
and prerequisites. Factors such as peg dimensions, hole specifics, insertion pressures and alignment requirements are examined.
Key components, such as the three-axis machine, rotary roller and syringe pumps, are
integrated into the system. The design balances durability and compactness, making it fit for
industrial contexts. An advanced automation control system integrates into the design. By merging
microcontrollers with visual feedback mechanisms, the machine offers precise movement during
operations. Following its development, the machine undergoes tests to assess its functionality, precision and repeatability.
Objective 2: Development of a vision-based workpiece detection and positioning algorithm
This objective covers the CV pipeline of the system. Each stage, from image acquisition to tool path generation, is carefully managed. A
suitable camera system captures images of assembly components. Once obtained, preprocessing
methods improve image quality and support the subsequent feature extraction process. Utilizing
the You Only Look Once (YOLOv5) architecture, an object recognition system identifies assembly
workpieces. Feature extraction then employs established techniques such as edge detection and the Hough circle transform.
Objective 3: Development of a vision-based QI algorithm
The aim is to develop a CV algorithm that differentiates between positive and negative
assembly defects. Using the established image capture system, images are processed to ensure
quality for defect detection and classification. Three prediction models are in development for QI.
The statistical model incorporates traditional CV techniques and statistical analysis. The CNN
model uses convolutional neural networks, and the residual network (ResNet50) model applies
deep learning for defect detection. These models are in training and validation phases, with a focus on classification accuracy and robustness across diverse conditions.
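To illustrate the idea behind the statistical model, the following is a minimal sketch, assuming the defect indicator is the standard deviation (σ) of grayscale intensities within a cropped hole image; the threshold value shown is a hypothetical placeholder, and the tuned σ_threshold and the mapping from σ to hole status are developed in Chapter 4.

    import numpy as np

    SIGMA_THRESHOLD = 30.0  # hypothetical placeholder; the tuned value is derived in Chapter 4

    def hole_sigma(gray_hole: np.ndarray) -> float:
        # Spread of grayscale intensities inside a cropped hole image.
        return float(np.std(gray_hole))

    def classify_hole(gray_hole: np.ndarray) -> str:
        # A wide intensity spread is treated here as a sign of a defect;
        # the actual decision rule is established empirically in Chapter 4.
        return "defect" if hole_sigma(gray_hole) > SIGMA_THRESHOLD else "assembled"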
This thesis is structured into several chapters to provide a comprehensive understanding of the
research conducted. Chapter 2 offers a thorough literature review encompassing various topics that are
relevant to this thesis. The aim is to familiarize the reader with the terminology and concepts
associated with each topic, as well as present an overview of the current progress in these areas.
In Chapter 3, the setup of the assembly system is described, along with the development of
the CV algorithm and the control algorithm specifically designed for this system. The focus is on
providing detailed insights into the technical aspects of the system’s construction and operation.
Moving on to Chapter 4, the vision-based QI system developed for this project is explained.
The chapter explores the specifics of the algorithms employed and outlines the development and validation of the prediction models.
Chapter 5 presents the results of tests conducted with the prototype system and engages in a discussion of the findings. Finally, Chapter 6 concludes the thesis, summarizing the outcomes and highlighting key insights. Additionally, it offers recommendations for future work.
Chapter 2. Literature Survey
This chapter provides a comprehensive literature survey of two distinct research areas related
to this study. The primary objective is to establish a foundational understanding of key concepts
and terminology within each field, while also emphasizing their current advancements. The first
section focuses on the domain of assembly automation, specifically centred around peg-in-hole
systems and discusses devices employed in such systems. In the second section, an extensive review of CV techniques is presented, covering both traditional and ML-based approaches. Furthermore, the chapter summarizes the various methods utilized for feature extraction and quality control in CV. Finally, a comprehensive summary is provided, highlighting the key findings from both areas.
2.1. Assembly automation
The development of automated peg-in-hole assembly processes has embraced the utilization
of robotic arms due to their high flexibility. Consequently, significant research has been conducted
in the field of robotic control to facilitate the peg-in-hole assembly. Numerous techniques,
including feedback [9] and compliant control [3], elastic displacement device [9], and impedance
control [17], have been extensively studied to adopt robotic systems for the assembly. Notably,
Song et al. [18] presented a peg-in-hole assembly system using a robot arm (manipulator) equipped
with a force sensor, as shown in Fig. 2.1. A new assembly strategy was proposed that does not need exact relative pose data between the assembled parts. The system did not include sensors, such as cameras, that can perceive the surroundings. Instead, a "dragging teaching mode" was employed, wherein a manual training process guides the robot arm to bring the peg close to the hole. However, this assembly technique begins with the presumption that assembly proceeds in a singular direction, indicating potential areas for enhancement.
Figure 2.1. Automated peg-in-hole assembly system with compliance control system [18].
Additional sensors, such as cameras, are adopted for peg-in-hole assemblies to approximate the position and orientation of the hole. Vision systems that use cameras combined with image processing algorithms offer an efficient method to capture detailed visual
information from the assembly area. This visual feedback, when integrated with the control
mechanisms of the robotic arm or manipulator, facilitates enhanced precision and adaptability,
especially in environments where the hole's position may be subject to minute variations or
disturbances. Nigro et al. [8] presented an innovative assembly system in which a camera was
affixed to the gripper of a robotic manipulator, as illustrated in Fig. 2.2. Their approach utilized
the You-Only-Look-Once version 3 (YOLOv3) object detector for hole detection, complemented
by a 3D surface reconstruction method. The data acquired from this detection phase steered the
robot's approach towards the hole. Nonetheless, certain positional inaccuracies stemming from
factors such as reprojection, reconstruction, and merging were identified. Following the initial
positioning, the assembly's peg insertion phase was executed, adopting an admittance control that
conferred the required compliance to the peg. Yet, the process still manifested an augmented error
rate during insertion, largely attributed to the aforementioned positional discrepancies and the
absence of any subsequent hole detection after the initial approach. An effective solution to counteract these issues could lie in a feedback mechanism that continually monitors the hole position throughout insertion.
Figure 2.2. Automated peg-in-hole assembly system with a robot manipulator and a camera [8].
Other control techniques for the peg-in-hole assembly have been developed to compensate for positional errors. Combining force control with the vision system is one of the most widely adopted control techniques because it provides a direct measure of the contact situation between the peg and the hole. Deng et al. [19] used force control in addition to vision control to solve the position error in their assembly. However, attempts have also been made to simplify the process by reducing the reliance on such sensors. Other studies have investigated the use of motion planning algorithms and trajectory optimization techniques to achieve precise and reliable assembly. As a part of this, Park et al. [3] proposed an assembly
strategy (Fig. 2.3) that consists of an analysis of the state of contact between the peg and the hole
and unit motions to resolve different states, replacing expensive hardware such as force sensors or compliance devices.
Figure 2.3. Unit motions for peg-in-hole assembly. (a) Pushing. (b) Rubbing. (c) Wiggling. (d)
Screwing. [3]
The techniques reviewed in this section revolve around the utilization of robotic arms and grippers to perform the peg-in-hole assembly process. However, this design approach imposes a constraint on the number of pegs that can be handled at once, limiting it to one peg, or at most a few, at a time. After inserting a peg, the gripper needs to retrieve the next peg from storage and return to the hole, which introduces the runtime of the pick-and-place mechanism. It is desirable to minimize this runtime, and therefore, additional research is being conducted to explore alternative assembly mechanisms.
2.2. Computer vision techniques
CV has emerged as a prominent field of research, driven by the extensive exploration of vision
sensors. Vision sensors provide unparalleled richness of information compared to other sensing modalities. This inherent advantage has propelled the evolution of CV and machine vision technologies. The recent surge in ML has significantly broadened the capabilities of CV, revolutionizing the field with data-driven approaches.
2.2.1. Image acquisition
The implementation and design of CV technology starts with the selection of an image sensor. The charge-coupled device (CCD) and the complementary metal-oxide semiconductor (CMOS) are the two most commonly used image sensors in cameras. CCD offers
several benefits, including a broad dynamic range, exceptional sensitivity and resolution, minimal
distortion, and compact size. Meanwhile, CMOS is known for its cost-effectiveness, low power
consumption and high level of integration [20]. In addition, the selection of camera type plays a
vital role in the performance of CV. The camera type is decided by the arrangement of the
photosensitive units. Line scan cameras consist of photosensitive units distributed linearly in a
line, whereas area scan cameras consist of units distributed widely in two dimensions [21].
Two separate image acquisition units computationally merged can create 3D cloud images
using stereo vision techniques. This enables users to capture the depth and positional information
of the scene. Similarly, light detection and ranging (LiDAR) and time-of-flight (ToF) sensors acquire images that
provide depth information about the scene. They measure distance by emitting an artificial light
signal such as laser or LED and measuring the time for the reflected light to return to the receiver.
By measuring the distance, they can offer a wide array of applications, such as shape analysis [22].
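For reference, the distance follows from the measured round-trip time t as d = c·t/2, where c is the speed of light; an object 1.5 m away, for example, returns the signal after roughly 10 ns, which illustrates the timing resolution these sensors require.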
2.2.2. Traditional CV techniques
The images acquired by the sensors undergo a series of image processing steps to extract the
necessary features and information. The initial step in image processing is often image denoising,
which aims to reduce the noise present in raw images. By applying denoising techniques, such as
averaging, Gaussian and Kalman filters, the noise can be effectively suppressed, enhancing the
quality of the image and facilitating subsequent analysis [25, 26, 27, 28]. Once denoising is
performed, region of interest (ROI) identification is commonly carried out to focus on the relevant
information while eliminating redundant background details, thereby enhancing system accuracy
and computational efficiency. This technique, although simple in its calculation, has proven to be highly effective. Zhang et al. [29] utilized ROI identification to develop an off-axis vision system that remained effective across varied scenarios. However, it is important to note that ROI identification relies on user-defined constraints, which can limit its adaptability.
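As a brief illustration of these two steps, the following sketch applies a Gaussian filter and then crops a user-defined ROI using OpenCV; the file name and ROI coordinates are placeholders.

    import cv2

    # Load a raw camera frame in grayscale (file name is illustrative).
    frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # Suppress sensor noise with a 5x5 Gaussian filter.
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)

    # Keep only a user-defined region of interest: rows 100-400, columns 200-600.
    roi = denoised[100:400, 200:600]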
Feature extraction plays a crucial role in enabling the system to identify distinctive traits in
the image for subsequent analysis. Various methods are available for feature extraction, including
thresholding, edge detection and segmentation techniques. Thresholding divides the pixels of an
image into different categories by setting a threshold value. Over the years, numerous thresholding
methods have been proposed, particularly for images with varying grayscale values between the
target and background [30]. These methods encompass fixed thresholding, histogram-based
techniques and adaptive thresholding [31, 32, 33]. The fixed threshold method is characterized by
its speed and suitability for cases where there is a significant difference between the image
background and the target. It involves a simple comparison between the threshold value and the
pixel values to determine whether to retain or discard them based on the threshold value [31].
Histogram-based methods, such as histogram equalization, adjust the contrast of the image using
its histogram, making them among the most widely employed image enhancement techniques [32].
Adaptive thresholding is similar to the fixed thresholding method but dynamically adjusts the
threshold value across the image, resulting in a more localized and adaptive approach [33]. The
edge detection methods are another commonly employed technique for the feature extraction. The
edge detection methods use the discontinuity at the edge pixels in their grayscale values to identify
edges through derivation. The first derivative edge detection operators mainly include Sobel, Prewitt and Canny, and the second derivative operators mainly include the Laplacian and the Laplacian of Gaussian [34].
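The following sketch contrasts these operations in OpenCV: a fixed threshold, an adaptive threshold, histogram equalization and Canny edge detection; all parameter values are illustrative.

    import cv2

    gray = cv2.imread("workpiece.png", cv2.IMREAD_GRAYSCALE)

    # Fixed threshold: pixels above 127 become white, the rest black.
    _, fixed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Adaptive threshold: the cutoff is recomputed over each 11x11 neighbourhood.
    adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)

    # Histogram equalization stretches the contrast using the image histogram.
    equalized = cv2.equalizeHist(gray)

    # Canny edge detection with lower/upper hysteresis thresholds of 50 and 150.
    edges = cv2.Canny(equalized, 50, 150)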
The enhanced and edge-identified images are further processed to extract various features,
including texture, transform domain and shape. Texture features capture the visual patterns and
variations in the image, allowing for texture-based analysis and classification. Common texture
feature extraction methods include local binary patterns (LBP) [35], gray-level co-occurrence
matrices (GLCM) [36] and Gabor filters [37]. Transform domain features involve transforming
the image into a different domain, such as the frequency domain via the Fourier transform or the wavelet transform. These representations expose periodic structure in the image, enabling the extraction of frequency-based features or multi-resolution analysis [38].
Shape features describe the geometric characteristics and contours of objects within the image.
These features can be extracted using techniques such as contour tracing [39], Hough transform
[40] or mathematical shape descriptors such as Hu moments or Zernike moments [41, 42].
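As an example of shape feature extraction, the sketch below detects circular features with the Hough transform and computes Hu moments with OpenCV; all parameters are illustrative and would need tuning per application.

    import cv2
    import numpy as np

    gray = cv2.imread("holes.png", cv2.IMREAD_GRAYSCALE)
    blurred = cv2.medianBlur(gray, 5)

    # Hough transform for circles; parameters control the accumulator
    # resolution, the minimum centre spacing and the radius search range.
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.2, minDist=30,
                               param1=120, param2=40, minRadius=10, maxRadius=60)
    if circles is not None:
        for x, y, r in np.round(circles[0]).astype(int):
            print(f"circle centre=({x}, {y}), radius={r}")

    # Hu moments: seven shape descriptors invariant to translation,
    # scale and rotation, computed from a binarized silhouette.
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    hu_moments = cv2.HuMoments(cv2.moments(binary)).flatten()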
Although traditional CV techniques have proven to be reliable and widely adopted in various
applications, they do have drawbacks when compared to ML-based CV techniques. One of the
main limitations of traditional methods is their reliance on handcrafted features and heuristics,
which can be labour-intensive and require domain-specific knowledge for fine-tuning [43]. This
manual feature engineering process may not always capture all possible intricate patterns and
complexities present in the image, potentially leading to reduced performance and accuracy in
challenging scenarios.
Furthermore, traditional CV techniques may not perform as designed when confronted with
variations or when adapting to new and unseen data. These techniques often lack the flexibility to
generalize well to diverse and dynamic environments because they heavily depend on predefined
rules and assumptions [43]. Interpreting and explaining the decisions made by traditional CV
algorithms can also be challenging. As the number of traditional CV techniques employed within
an algorithm increases, the reasoning behind the results may become less transparent, posing
difficulties in understanding the underlying processes and potentially limiting their applications in certain domains.
Nevertheless, traditional CV techniques continue to be developed and applied alongside the emergence of ML-based algorithms. Traditional techniques serve as the foundation
for various algorithms and possess distinct advantages, including lower computational
requirements and easily interpretable behaviours. These advantages make traditional techniques a
subject of ongoing research and practical implementation [43]. Despite the advancements in ML,
traditional CV methods continue to contribute to the field, providing optimized solutions and complementing ML-based approaches.
2.2.3. ML-based CV techniques
Over the past decade, ML-based algorithms have gained significant popularity and proven
their effectiveness in various CV tasks, including region detection, feature extraction and image classification, with applications such as face detection and vehicle recognition. By leveraging the power of ML, these algorithms can learn
complex patterns and features from large datasets, leading to improved accuracy and performance
[43].
Fig. 2.4 illustrates the evolutionary journey of ML-based CV techniques, particularly focusing
on object detectors. Over time, these detectors have undergone significant advancements and
improvements, driven by research and technological developments in the field of ML. The early
stages of ML object detectors are characterized by simpler architectures and limited capabilities.
However, with the introduction of deep learning and CNNs, the performance of object
detectors has significantly improved, enabling them to achieve higher accuracy and handle more
complex tasks.
Over the years, notable detector architectures have been proposed, such as the Viola-Jones (VJ) detector, the faster region-based convolutional neural network (RCNN), YOLO
and single shot multibox detector. These architectures are designed to address specific challenges,
such as faster inference speed, better real-time performance and improved accuracy. For earlier
detectors, such as the VJ Detector, the algorithms are built based on the handcrafted features to
compensate for the lack of effective image representations. The VJ Detector was the first real-time human face detector without any constraints. It used the sliding window technique to go through all possible locations and scales in an image to localize the human face. It used three important techniques, namely the integral image, feature selection and detection cascades, to outperform other methods of its time.
Following the earlier stages, a significant advancement in ML object detection occurred with
the groundbreaking work of Krizhevsky et al. [46], who presented AlexNet, a deep learning model
based on the CNN architecture. Prior to AlexNet, while CNN architectures had existed, none had
demonstrated the same level of efficacy on large-scale datasets. AlexNet's design was unique,
consisting of multiple convolutional layers, followed by max-pooling, fully connected layers, and
a final softmax layer, making it substantially deeper than previous models. It also introduced the
efficient usage of the Rectified Linear Unit (ReLU) as its activation function, which accelerated
training times by mitigating the vanishing gradient problem. A significant aspect of AlexNet's
success was its performance in the ImageNet Large Scale Visual Recognition Challenge
(ILSVRC), where it dramatically reduced the error rates compared to prior methods. This was not just a victory but a demonstrative showcase of deep learning's capability, particularly with CNN architectures. AlexNet's success in image classification tasks inspired researchers to explore the potential of CNNs in object detection.
The success of AlexNet led to the development of CNN-based two-stage detectors that use a
multi-stage approach to object detection. They involve a separate region proposal step to identify
potential object regions in the image, followed by a classification and regression step to refine the
proposed regions and classify the objects [47]. This approach allowed for more accurate and robust
object detection, especially in complex scenarios with multiple objects and overlapping instances.
RCNN, developed by Girshick et al. [48], is one of the first two-stage detectors to leverage
CNNs for object detection. They proposed an innovative approach that involved generating region
proposals using selective search and then performing CNN-based classification and bounding box
regression on these regions. Despite its significant progress in accuracy, RCNN had some
limitations, particularly in terms of speed and computational complexity, due to the need to process a large number of region proposals individually for each image.
In response to these challenges, Ren et al. [49] introduced Faster RCNN as an improvement
over RCNN. It incorporated a region proposal network (RPN) that learned to generate region proposals
directly from the convolutional feature maps, eliminating the need for external region proposal
methods. This innovation led to faster and more efficient object detection with improved accuracy.
Another key advancement is the introduction of feature pyramid networks (FPN), presented
by Lin et al. [50] to address the issue of scale variation in object detection. FPN
proposed a top-down architecture that allowed feature maps of different scales to be merged and
used for object detection. By leveraging multi-scale features, FPN significantly enhanced the
ability of detectors to detect objects of varying sizes, leading to more robust and accurate
performance.
One-stage detectors, by contrast, perform object detection in a single forward pass through the neural network. Unlike their two-stage counterparts, which
adopt a multi-stage approach, one-stage detectors directly predict location and size (bounding
boxes), and class scores for potential objects in the input image without requiring a separate region proposal stage.
The efficiency and simplicity of one-stage detectors stem from their direct prediction process,
wherein they simultaneously determine object locations and associated class labels. This
characteristic enables them to achieve real-time performance, making them ideal for applications
where speed is critical. Additionally, one-stage detectors are well-suited for scenarios involving
numerous objects in cluttered environments because they can efficiently detect multiple instances
in a single pass.
YOLO is one of the major one-stage detectors, presented by Redmon et al. [51]. The YOLO
architecture is designed to achieve real-time performance, making it highly suitable for time-
critical applications, such as video surveillance and autonomous vehicles. By leveraging a unified
framework, YOLO can handle multiple objects simultaneously, effectively detecting and
classifying objects even in cluttered environments. Its single-shot nature allows YOLO to strike a
balance between speed and accuracy, making it an attractive choice for various CV tasks including
manufacturing.
Figure 2.5. YOLO prediction model [51].
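As a concrete illustration, the sketch below runs a pretrained YOLOv5 model through the public Ultralytics torch.hub interface; the model size and image path are placeholders, not the configuration used in this thesis.

    import torch

    # Load a small pretrained YOLOv5 model from the Ultralytics repository.
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

    # Run inference; YOLOv5 resizes and normalizes the image internally.
    results = model("assembly_scene.jpg")

    # Each detection row holds x1, y1, x2, y2, confidence and class.
    print(results.pandas().xyxy[0])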
The continual advancement of YOLO has led to the publication of multiple versions, including
v2, v3, v4 and v5. The successive iterations of YOLO have addressed various challenges and
further optimized its detection accuracy and processing speed. As a result, YOLO has garnered
widespread attention and adoption in the CV community, driving innovations in real-time object detection.
As the technology continues to advance, it is expected that ML-based CV techniques will play
an increasingly critical role in ensuring consistent and high-quality manufacturing outputs across
various industries. Additionally, the integration of transfer learning and fine-tuning methodologies
allowed ML CV techniques to be more adaptable and efficient because they could leverage
pretrained models to handle new and diverse datasets with reduced training time. Furthermore, the
continuous expansion of annotated datasets and the availability of powerful hardware accelerated
the progress of ML object detectors, enabling them to excel in numerous applications. With the
advent of edge computing and embedded systems, ML-based CV techniques have become increasingly deployable in industrial settings. It is anticipated that ML object detectors will undergo further advancements, unlocking new possibilities and applications.
Although ML-based CV techniques have shown remarkable progress and achieved state-of-
the-art performance in various applications, they are not without their downsides compared to
traditional CV techniques. One of the main disadvantages of ML-based approaches is their high computational resource requirements, making them less suitable for resource-constrained environments [43].
In contrast, traditional CV techniques often rely on handcrafted algorithms and heuristics, which
generally have lower computational overheads, enabling faster and more efficient processing [43].
Another drawback of ML-based techniques is the need for extensive labeled datasets for
training. Building large and diverse datasets can be time-consuming and labour-intensive,
hindering the adoption of ML models in domains with limited data availability [43]. Traditional
CV techniques, conversely, can be more easily adapted and customized for specific tasks without
the need for extensive data-driven training. Additionally, the interpretability of ML models can be limited. Whereas traditional techniques generally produce interpretable results, ML models, especially deep neural networks, can be regarded as
“black boxes,” making it challenging to understand the reasoning behind their decisions. This lack
of transparency can raise questions about the reliability and trustworthiness of ML-based systems
[43].
Despite these challenges, both ML and traditional CV techniques have their unique strengths,
and the choice between them depends on the specific requirements and constraints of the
application. Striking a balance between the advantages and limitations of each approach is crucial when designing a CV system. Hybrid approaches that integrate traditional and ML-based techniques have therefore gained attention.
By combining these approaches, researchers and practitioners can leverage the strengths of each
method. For instance, a hybrid approach may involve using traditional techniques for initial feature
extraction and preprocessing, followed by ML algorithms for further analysis and classification.
This hybridization can lead to more comprehensive and accurate analysis of visual data while
providing an optimized computational efficiency [43]. For instance, image preprocessing steps with traditional CV algorithms are often performed on an input image before it is passed to an ML-based object detector.
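A minimal sketch of such a hybrid step is shown below, assuming contrast-limited adaptive histogram equalization (CLAHE) is used to normalize lighting before an ML detector sees the image; the exact preprocessing chain is application dependent.

    import cv2

    def preprocess_for_detector(bgr_image):
        # Denoise lightly, then normalize contrast in the lightness channel
        # so the downstream detector sees more uniform inputs.
        denoised = cv2.GaussianBlur(bgr_image, (3, 3), 0)
        lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        lab = cv2.merge((clahe.apply(l), a, b))
        return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)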
2.3. CV in manufacturing and assembly
CV technology has made significant strides in the field of manufacturing and assembly,
playing a pivotal role in component and feature detection, and aiding in the planning of tool paths. Traditionally, component detection has relied on manual identification or physical templates, which can be time-consuming and prone to errors. CV systems
offer a more efficient and reliable solution by automatically detecting and recognizing components
within an image or video stream. By leveraging advanced algorithms, these systems analyze visual
data to identify specific components based on their shape, colour, texture or other distinguishing
features. This automated component detection not only saves time but also improves accuracy,
reducing the risk of assembly errors and enhancing overall productivity. Once components are
detected, planning tool paths around them can be used for optimizing the assembly process. CV
can assist in this aspect by providing information about the positions and orientations of
components. By utilizing cameras and sensors, the system can accurately determine the location and orientation of each component. This information can then be used to plan efficient tool paths that minimize the distance travelled
by robotic arms or other automated systems, reducing assembly time and improving production
throughput. Additionally, CV can help identify obstacles or potential collisions, enabling the
system to plan safe and collision-free tool paths. Tsai et al. [53] demonstrated vision technology
that recognizes the weld seam on golf-club heads and automatically generates a welding path for
the robot (Fig. 2.6). For some of the early works, Solvang et al. [54] presented a robot path
generation strategy that uses the vision system to track a line that has been drawn by an operator.
Figure 2.6. Generated welding path of a golf club from the system using computer vision [53].
A variety of traditional CV techniques underpins these capabilities. These include image segmentation algorithms for isolating components from the background, feature extraction algorithms for capturing relevant geometric or textural information, and pattern recognition algorithms for matching components. However, the drawbacks of traditional CV techniques are clear, as mentioned earlier in Section 2.2.2, and they generally need human intervention to make the final classification. Hence, they are mostly used as a supplementary tool
that helps operators working at the site. In addition, some techniques require references to specific
features or tools for calibration or boundary definition, adding complexity in adopting the
techniques. De Araujo et al. [55] presented an image processing algorithm based on only
traditional CV techniques to find the workpiece position on a three-axis machine to aid machining
workers in their operations. A physical reference point called part zero is used to zero the object's coordinate system (Fig. 2.7).
Figure 2.7. Workpiece referencing computer vision system [55].
In contrast, ML-based object detectors can directly recognize components, eliminating the need for calibration or explicit boundary definition, which are major constraints of traditional CV techniques. ML object detectors also possess the capability to adapt to varying conditions. By training on diverse datasets that encompass a wide range of lighting scenarios, camera perspectives and component variations, CV systems can acquire the ability to generalize and make accurate predictions in previously unseen situations. This is a significant advantage because their performance can improve through continuous
learning and exposure to new data. Consequently, ML object detectors provide the necessary
flexibility and adaptability to effectively address the challenges posed by dynamic environments
Derived from these benefits, ML has proven to be highly successful in various process
optimization, monitoring and control applications [56]. For instance, Jin et al. [57] developed a
real-time monitoring and autonomous correction system using an ML model to modify 3D-printing
parameters iteratively and adaptively in additive manufacturing processes. This system can detect
in-plane printing conditions and in-situ correct defects faster than a human can.
The integration of CV into manufacturing and assembly processes brings several benefits. It reduces manual effort and improves detection accuracy. By providing accurate component information, CV facilitates precise tool path planning, minimizing assembly time and maximizing efficiency. Moreover, the ability of CV systems to adapt to varying conditions broadens their industrial applicability.
QI systems share many of the underlying techniques with CV for manufacturing and assembly. However, the primary distinction lies in the final output: QI applications classify product quality, whereas manufacturing and assembly applications focus on feature extraction for localization and positioning purposes.
Traditional CV-based QI methods have proven their effectiveness and practicality in various manufacturing applications. These methods offer several
advantages, including lower computational requirements. With manual feature engineering, these
systems can be tailored to specific manufacturing processes and product characteristics, allowing
for precise defect detection and classification. Additionally, traditional CV techniques have a
proven track record of reliability and accuracy in detecting common defects and flaws in
manufacturing processes. By leveraging these well-established methods, manufacturers can
benefit from cost-effective and efficient QI solutions that have stood the test of time. Furthermore, the interpretability of traditional techniques provides insights into the detected defects, contributing to a better understanding of the production process. As a result, these systems continue to play a crucial role in ensuring product quality.
Hocenski et al. [58] developed an automated QI system for ceramic tiles using only traditional
vision techniques, including Canny edge detection and a histogram subtraction method. Their system proved effective in detecting cracks and ensuring the quality of ceramic tiles (Fig. 2.8), showcasing the practicality and reliability of traditional approaches.
Figure 2.8. Automated QI for ceramic tiles using only traditional vision techniques for detecting
cracks [58].
ML-based object detectors have also been increasingly adopted for QI in manufacturing. These detectors leverage the power of ML algorithms, such as CNNs, to identify
defects, anomalies and deviations accurately and efficiently in manufactured products. By training
on large datasets of both defect-free and defective products, these detectors can learn the
distinguishing features and patterns associated with different types of defects. As a result, they can
quickly and reliably identify and classify defects, ensuring that only products of the highest quality
reach the market. Moreover, ML-based object detectors offer the advantage of adaptability,
allowing them to handle variations in product appearance and environmental conditions, making
For instance, Ren et al. [59] applied a CNN-based algorithm to perform defect classification
for surface inspection tasks, encompassing diverse surfaces such as Northeastern University
surface defect database, weld, wood and micro-structure surfaces. This algorithm effectively built a patch-level classifier using features transferred from a pretrained deep learning model. Subsequently, pixel-wise prediction is conducted by convolving the trained classifier over the input image, enabling image classification and defect segmentation. The algorithm's performance was demonstrated across these diverse surfaces (Fig. 2.9).
Figure 2.9. Visualization of features in different layers of the CNN algorithm presented by Ren
et al. (A) Original image. (B) Feature map from first convolutional layer. (C) Feature map from
third convolutional layer. (D) Feature map from fifth convolutional layer [59].
Similarly, in the domain of hot-rolled strip steel inspection, Feng et al. [60] developed an ML-based surface defect classifier employing the ResNet50 architecture augmented with FcaNet and attention mechanisms. With these mechanisms, the model demonstrated its ability to effectively capture intricate details and classify surface defects reliably. Integrating such ML-based QI into manufacturing processes not only improves product quality but also streamlines inspection procedures and reduces inspection costs.
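To indicate how such a ResNet-based classifier is typically set up, the sketch below fine-tunes an ImageNet-pretrained ResNet50 from torchvision for a two-class defect task; the batch of random tensors stands in for real training images, and none of the hyperparameters are taken from [60] or from this thesis.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Start from an ImageNet-pretrained ResNet50 and replace the final
    # fully connected layer with a two-class head (defect / no defect).
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, 2)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    # One illustrative training step on a dummy batch of 224x224 RGB images.
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 2, (8,))
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()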
2.4. Summary
The literature survey in this chapter provides a comprehensive review of two key research areas related to this study: assembly automation and CV in manufacturing and assembly. In the first area, the chapter reviews automated peg-in-hole assembly, which is applied in various industries such as robotics, electronics and automotive. The chapter discusses the
significance of fast and precise assembly and explores various control techniques. It also addresses
the constraint of the current design approach, which limits the speed of the assembly, and
highlights the ongoing research to optimize systems for efficient peg-in-hole assembly.
The second part of the chapter explores CV techniques, categorizing them into ML-based and traditional approaches. The review underlines the potential and growth of ML-based CV techniques. This suggests a possible
increasing reliance on such methods for consistent manufacturing outcomes in the future.
Traditional CV techniques remain valuable due to their low computational requirements and
the interpretability of their results. These methods have consistently supported various
manufacturing tasks such as component detection and tool path planning. Their straightforward
nature and ability to be adjusted for specific manufacturing scenarios make them an integral part of many industrial systems, increasingly in hybrid combination with ML-based approaches. Although the traditional methods continue to offer reliable solutions, ML techniques are steadily broadening the range of feasible applications.
In conclusion, the literature presents a scenario where traditional CV and modern ML
techniques coexist. This combination is likely to provide the manufacturing sector with diverse
tools to address varying needs. As research progresses, further insights and improvements in both
Chapter 3. Peg-in-Hole Assembly System
This chapter focuses on the design and implementation of the prototype for the proposed
assembly system, encompassing both hardware and software components. The main objective of
the system is to assemble the workpiece with the pegs, as illustrated in Fig. 3.1. The workpiece
itself possesses a cylindrical geometry with a diameter of 93.0 mm. It incorporates 20 surface
holes, each with a diameter of 9.7 mm and a depth of 7.1 mm. The assembly process involves the
utilization of pegs, available in black and white colours (represented as yellow in Fig. 3.1 simply
to improve the visibility). These pegs are standardized, measuring 9.5 mm in diameter and 6.9 mm
in height. Both the workpiece and peg dimensions come with a tolerance of ±0.1 mm.
A key aspect of this assembly process is ensuring the 200 µm nominal diametral clearance between the pegs and holes (9.7 mm − 9.5 mm = 0.2 mm). The system is therefore tailored to uphold this clearance. An additional step involves
applying glue to firmly anchor the pegs. Lastly, the assembly process does not require any specific
This assembly (Fig. 3.1) is the frac plug, a crucial component in the oil and gas industry.
Surprisingly, there have been no automated systems introduced for its assembly thus far. This has
led to the reliance on manual labor for what is, essentially, a basic and repetitive task. Such a
reliance can be considered a suboptimal use of skilled labor, especially when these professionals
could be more effectively utilized in performing complex and intricate operations. Recognizing
this gap, the thesis aims to develop a system to automate the assembly process, significantly
The hardware design integrates the base machine with the peg insertion tool, and the software
design emphasizes image processing techniques for automating the assembly process. As depicted
in Fig. 3.2, the assembly system prototype's primary components are the central computer,
microcontroller, two cameras, and the assembly machine. Throughout the chapter, the design
process and its design iterations will be explored and discussed. These iterations showcase the evolution of the system design, highlighting the improvements made to enhance its functionality and performance.
Figure 3.2. Configuration of the assembly system.
3.1. Hardware design
This section presents the detailed hardware design of the proposed automated peg-in-hole
assembly system prototype. The hardware components are carefully selected and integrated to
ensure smooth and accurate operation of the system. The key elements of the hardware design are described below.
The key objective of the hardware design is to achieve full control over the workpiece. For this, the three-axis machine (CNCTOPBAOS 3018 Pro CNC Router Kit) is selected as the
base machine, and the rotary roller (Genmitsu Laser Rotary Roller Engraving Module) is installed to rotate the cylindrical workpiece.
The hardware selection process for the system design prioritized the use of commercially
available components to ensure the simplicity and accessibility of the overall solution. The system
incorporates four NEMA 17 stepper motors (1.3 A), with each motor dedicated to controlling a single machine axis.
To drive these stepper motors, the system utilizes TB6600 stepper motor drivers. The TB6600
drivers are H-bridge bipolar constant phase flow drivers that provide micro-stepping capability,
allowing for smoother and more precise motor control. In this system, the motors are operated at
a current of 1.0 A. The motors responsible for y-axis and r-axis movements (Fig. 3.3), which are
crucial for workpiece alignment, are set to run on a 32 micro-step setting. The remaining motors
are configured to operate on the full step setting. By carefully selecting and configuring the
components, the system can achieve the desired level of control and accuracy in its motion
operations.
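The resulting angular resolution can be worked out directly; the sketch below assumes the common 1.8° full-step angle for NEMA 17 motors, which the hardware list above does not state explicitly.

    # Angular resolution under TB6600 micro-stepping (assuming 1.8 deg/full step).
    FULL_STEP_DEG = 1.8   # typical NEMA 17 step angle (assumption)
    MICROSTEPS = 32       # setting used for the y- and r-axis motors

    steps_per_rev = (360 / FULL_STEP_DEG) * MICROSTEPS  # 200 * 32 = 6400
    deg_per_step = 360 / steps_per_rev                  # 0.05625 deg per microstep
    print(f"{steps_per_rev:.0f} microsteps/rev, {deg_per_step:.5f} deg/microstep")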
Figure 3.3. Hardware configuration of the assembly machine prototype.
To ensure consistency and repeatability during the assembly process, the peg insertion tool is
securely attached to the tool mount, while the camera mount remains fixed, maintaining a constant
relative position between the camera and the tool. Detailed information regarding the peg insertion tool is provided in Section 3.1.1.
To achieve glue metering, the system incorporates a syringe pump (DIY Syringe Pump,
RobotDigg) that is controlled by a stepper motor. This syringe pump is driven by the TB6600
driver, utilizing a 32 micro-step setting to ensure controlled metering of the glue application. The
syringe is mounted onto the syringe pump, and a needle adapter is connected to PVC tubing. The
other end of the tubing is attached to a blunt tip needle. Connections between the tubing segments are
established using hose barbs and Luer-lock fitting adapters. The selection of the needle size
(25Ga), tube material (PVC transparent hose vinyl tubing) and magnitude of the syringe pump
actuation is determined through iterative testing to ensure controllability and sustainability. The needle tip, responsible for dispensing the glue, is positioned within the peg insertion tool where the glue can be applied directly to the target hole.
3.1.1. Peg insertion tool
The peg insertion tool, shown in Fig. 3.4, performs two main roles: storing and launching
pegs. It features a long loading path designed to hold multiple pegs in a sequential stack. This path
intersects the pusher path near its base, where the launching mechanism operates. Powered by a
stepper motor (NEMA 14, 0.4A), the pusher employs a linear back-and-forth movement. When
actuated in one direction, it releases one peg from the base of the stack. In the reverse motion, it
prepares the next peg for dispatch. This pusher and its corresponding path have undergone iterative
design and tests, ensuring its mechanical integrity and consistent performance. Once the peg
reaches the end of the pusher path, termed the peg launch zone, it descends into its designated hole,
completing the peg launch. Consequently, the need for a pick-and-place mechanism is eliminated,
Figure 3.4. Peg insertion tool. (a) Cross-sectional view. (b) Isometric view. (c) 3D printed
prototype.
Figure 3.5. Assembly state after peg launch.
Fig. 3.6a illustrates the strategic design and positioning of the glue needle and top-view camera
within the peg insertion tool. The glue needle, linked to a syringe pump, is situated at a distance
of 7.41 mm from the peg launch zone. This configuration ensures that the machine can alternate
between applying glue and executing peg assembly within the glue's curing time of 10 seconds.
Furthermore, the top-view camera is housed within the plunger, enabling direct visualization of
the target hole during the assembly process. This placement allows the plunger to exert downward
pressure on the launched peg without necessitating any machine repositioning. For this purpose, the plunger is designed as detailed in Fig. 3.6b.
Figure 3.6. Components of the peg insertion tool. (a) Combined assembly of the plunger with the
top-view camera, highlighting their positioning on the peg insertion tool. (b) Detailed view of the
plunger.
The illustrations in Fig. 3.7a and b showcase the pusher's rear and isometric views,
respectively, highlighting the presence of the pusher guide. Designed to fit within the pusher guide
path, which is essentially a groove depicted in Fig. 3.7c, this guide ensures the pusher maintains
linear oscillation without veering off its dedicated course. Furthermore, the peg contact area on the
pusher features a contoured design, facilitating an enhanced grip when interfacing with the peg.
This design maximizes the contact surface area and therefore prevents the peg from rebounding upon
initial contact. Consequently, the peg remains in consistent touch with the pusher throughout its
launch trajectory. Such a design prohibits any unwanted rotational movement of the peg within
the tool and ensures finer control of the peg drop position from the peg launch zone.
Figure 3.7. Components of the peg insertion tool. (a) Rear view of the pusher. (b) Isometric view
of the pusher. (c) Combined assembly of the pusher, highlighting its positioning on the peg
insertion tool.
3.1.2. Reloader
The reloader, an additional tool designed to be used with the peg insertion tool, serves the
purpose of reloading pegs into the peg insertion mechanism. This tool comprises two stepper
motors (NEMA 17, 1.5A), each responsible for actuating a sorter that rotates to align the
orientation of the pegs correctly. Additionally, the reloader features a vibration motor that
facilitates the movement of pegs down the funnel, ensuring a smooth and continuous feeding
process into the peg insertion tool. To minimize any undesirable effects of vibration on the peg-
in-hole assembly, a spring mount is incorporated between the reloader mount and the funnel,
isolating the vibration and maintaining the stability of the overall system. The reloader enhances
the efficiency of the peg reloading process, streamlining the assembly workflow and minimizing
interruptions.
Figure 3.8. Reloader. (a) Auxiliary view. (b) Photo of the integrated machine with the reloader.
3.1.3. Summary of the tools
The integration of the peg insertion tool and the reloader, with their specialized storage and
launching features, streamlines the automated peg-in-hole assembly process. By employing the
wiggling and pushing mechanisms, the system adeptly addresses the challenges of insertion,
guaranteeing precise and stable assembly. Using a 3D printer (Mega X, Anycubic) and PLA
filament (1.75mm), prototypes of both the peg insertion tool and the reloader are produced.
3.2. Software design

This section provides an in-depth exploration of the software design of the proposed
automated peg-in-hole assembly system prototype. The software design encompasses critical
aspects such as the communication protocol, assembly process, camera selection, and image
processing techniques.
The software operates on a central computer, powered by an 11th Gen Intel(R) Core(TM) i7-
11800H @ 2.30GHz CPU and an NVIDIA RTX A2000 Laptop GPU with 12 GB RAM. Python
(Version 3.9.10) is the chosen language, focusing on image processing and system control
algorithms.
The Python program interacts with the Teensy 4.1 microcontroller through serial communication, and the microcontroller produces control signals based on the instructions from the Python program. Notably, the microcontroller employs digital signals, encompassing digital high, low and PWM signals, for motor control. A core aspect of the design is the communication protocol between the central computer and the microcontroller.

The communication between the central computer and the microcontroller is facilitated through a serial connection, in which data is transmitted serially, byte by byte. The chosen baud rate for this system is 9600 bit/sec.
The communication protocol is designed to streamline the interaction between the Python
program (central computer) and the microcontroller. Commands sent from the Python program to the microcontroller are acknowledged with a Boolean
value: "False" or "True". The transmission of a "False" response signifies that the microcontroller
has received the command and is initiating the requisite motor control sequence based on the
provided instructions. Once the motor operation concludes, the microcontroller returns a "True"
signal, denoting that it is ready to accept and execute the subsequent command.
This feedback mechanism is pivotal as it ensures command serialization, eliminating the risk
of command overlaps. Such serialization is crucial as it prevents potential errors that might arise from concurrent or conflicting motor instructions.
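As a sketch of this handshake, the following Python fragment (using the pySerial package) blocks on the "False" acknowledgement and the "True" completion flag before issuing the next command; the port name and command string are hypothetical, since the exact command set is not listed here.

```python
import serial  # pySerial

def send_command(ser: serial.Serial, command: str) -> None:
    """Send one command and block until the microcontroller reports
    completion, enforcing the False -> True handshake."""
    ser.write((command + "\n").encode("ascii"))
    # "False" acknowledges receipt and signals that motor control has started.
    ack = ser.readline().decode("ascii").strip()
    assert ack == "False", f"unexpected acknowledgement: {ack!r}"
    # "True" signals that the motor operation has finished and the
    # microcontroller is ready for the next command.
    done = ser.readline().decode("ascii").strip()
    assert done == "True", f"unexpected completion flag: {done!r}"

# Usage (port name and command string are hypothetical):
# ser = serial.Serial("COM3", baudrate=9600, timeout=30)
# send_command(ser, "MOVE_X 12.5")
```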
This section presents the implementation and programming of the assembly process presented
in Fig. 3.9. The algorithm begins by waiting for the placement of the workpiece on the base
machine’s worktable, which is detected by the side-view camera. Upon confirmation of the
workpiece’s presence, the algorithm proceeds to position the system based on the workpiece’s
centre. Subsequently, a series of image processing techniques is applied to locate the holes. By
comparing the displacement of all circles with the predefined target location, the algorithm
calculates the tool path based on the nearest hole. The target location is determined considering
the peg launch position and the camera’s location. The algorithm aligns the hole using the
computed tool path and applies glue. The final step involves launching and inserting the peg into
the hole, and the process repeats until all holes are filled. Throughout the assembly, the algorithm
ensures peg alignment and insertion by coordinating the synchronized motion of the base machine,
peg insertion tool and other system components. Feedback from vision sensors is integrated into
the algorithm to validate peg insertion and trigger subsequent steps. The subsequent sections will
explore the selection of cameras and the application of specific techniques to facilitate the
assembly process, providing a comprehensive understanding of the software design aspects of the proposed system.
Figure 3.9. Assembly process flow diagram.
Consideration is given to the selection and integration of cameras into the software design.
The chosen cameras are positioned to capture different perspectives of the workpiece and the peg
insertion tool as shown in Fig. 3.2. The Zed mini camera, chosen as the side-view camera, offers
depth sensing capabilities and comes with a software development kit that facilitated the custom
software development for this project. Positioned at a distance of 125–150 mm from the centre of
the worktable and the rotary roller, the side-view camera provides a comprehensive view of the
entire worktable. To minimize external interference from the background, an acrylic board is
placed at the opposite end of the machine. However, due to the resolution limitations and the
substantial distance between the side-view camera and the workpiece, the accuracy is insufficient
to compensate for the 200 µm clearance between the peg and the hole. To address this challenge,
a top-view camera is incorporated into the system, offering a direct and consistent view of the
workpiece and the hole within a range of 15 mm. It also serves as a verifier, confirming successful
peg insertion before the algorithm proceeds to the next hole. The top-view camera selected is an
endoscope with a 5.5 mm outer diameter, installed inside the peg insertion tool using a custom
clamp mount, as illustrated in Fig. 3.4a. The technical specifications of the cameras used are
provided in Table 3.1, offering a comprehensive overview of their capabilities and functionalities.
Table 3.1. Camera technical specifications for the side-view and top-view cameras. (Only the entries HD resolution and rolling shutter are recoverable from the source.)
The software design incorporates a four-step image processing algorithm, as shown in Fig. 3.10. This algorithm incorporates multiple techniques to detect the workpiece, define the ROI, extract the hole locations and accurately position the machine based on the calculated distance between the peg launch zone and the extracted holes. To accomplish these tasks, the algorithm extensively utilizes the OpenCV library [61], a popular CV library known for its comprehensive image processing capabilities. The algorithm proceeds through the following sequential steps.
Figure 3.10. Overview of the image processing and machine positioning algorithm. Red circle:
Detected closest circle; Red dot: Centre of the detected closest circle; Green “×” mark: Target
point.
In the first step of the image processing algorithm, the system remains in a standby state until
the side-view camera, which provides a comprehensive view of the workspace, detects the
presence of the workpiece. To achieve this, the YOLOv5 object detector [62] is employed, and a
customized model is trained using PyTorch (Version 1.10.1 + CU113). This model is specifically trained to detect the presence of the workpiece in the camera view.
The dataset used for model training comprises 697 raw images of the workpiece. To improve
the training process and enhance the model’s ability to generalize, data augmentation techniques
are applied. These techniques introduce variations and diversity into the dataset, resulting in a total
of 7520 augmented images. The augmentation process includes transformations such as cut out,
sharpen, flip and darken, as shown in Fig. 3.11. This process creates a more comprehensive dataset for training a robust detection model.
To evaluate and validate the trained model, the augmented dataset is divided into three subsets:
train, validation and test datasets. The train subset is the largest, containing 6893 images, and is
used as the primary dataset for training the model. The validation subset consists of 313 images,
which are utilized during the training process to assess the model’s performance, fine-tune its
parameters and prevent overfitting. Finally, the test subset comprises 314 images, serving as an
independent dataset for evaluating the final performance and generalization ability of the trained
model.
Figure 3.11. Data augmentation for YOLOv5 model training.
To facilitate the training process, the YOLOv5l model is selected as a pretrained model,
providing a strong foundation for further training. Several parameters are optimized to ensure
efficient training within the available GPU memory. The training is conducted over 100 epochs,
with a batch size of 13 and an image size of 416. Training progress is monitored through three loss metrics: box, objectness and classification losses. The box loss specifically gauges the algorithm’s
ability to accurately locate the centre of an object and ensure that the predicted bounding box
adequately encompasses the object. The objectness loss measures the probability of an object’s
existence within a proposed ROI, providing valuable insights into the likelihood of an image
window containing an object. The classification loss evaluates the algorithm’s capability to predict
the correct class of a given object. During the training process (Fig. 3.12), convergence is observed
in all the loss metrics, including objectness, box and classification losses. These losses gradually
approach a value close to 0, indicating that the algorithm effectively locates the centre of objects,
accurately predicts bounding boxes and correctly classifies objects within the proposed regions of
interest. Notably, there is a rapid decline in these losses until approximately epoch 25, showcasing
substantial improvement in the training and validation datasets. These metrics serve as essential
indicators of the model’s accuracy and ability to correctly identify and classify objects. As a result,
the model trained in this stage is considered apt for incorporation into the assembly system and is
subsequently integrated.
Figure 3.12. Custom YOLOv5 model training results.
The integration of the object detection model into the assembly system allows for the detection
of the workpiece as soon as it enters the field of view of the side-view camera. Upon detection, the
system proceeds to draw a boundary box around the identified object using the custom trained
YOLOv5 model. However, the system does not immediately progress to the next stage of the
assembly process. It instead invokes a placement verification step to confirm the workpiece's
positioning within the designated working zone. During this verification, the system observes the
workpiece for a continuous span of four seconds. This duration affords the operator sufficient time
to position the workpiece on the workspace subsequent to its initial detection by the system.
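A minimal sketch of this detection-and-verification step is given below, assuming the custom weights file (here named workpiece.pt, a hypothetical name) is loaded through the public torch.hub interface for YOLOv5; the four-second hold mirrors the placement verification described above.

```python
import time
import cv2
import torch

# Load the custom-trained detector; "workpiece.pt" is a hypothetical file name.
model = torch.hub.load("ultralytics/yolov5", "custom", path="workpiece.pt")

def wait_for_workpiece(cap: cv2.VideoCapture, hold_seconds: float = 4.0):
    """Block until the workpiece stays detected for `hold_seconds`,
    then return the centre of its bounding box in pixel coordinates."""
    first_seen = None
    while True:
        ok, frame = cap.read()
        if not ok:
            continue
        det = model(frame).xyxy[0]      # rows: x1, y1, x2, y2, conf, class
        if len(det) == 0:
            first_seen = None           # detection lost; restart the 4 s timer
            continue
        if first_seen is None:
            first_seen = time.time()
        if time.time() - first_seen >= hold_seconds:
            x1, y1, x2, y2 = det[0, :4].tolist()
            return (x1 + x2) / 2.0, (y1 + y2) / 2.0
```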
In the second step of the image processing algorithm, the machine performs necessary
movements to bring the workpiece to the location of the peg insertion tool. This step also involves
an image cropping process to prepare for subsequent steps. Prior to image cropping, the following
algorithm is implemented to centre the detected workpiece with respect to the peg insertion tool
location:
i. The algorithm establishes the relationship between the image coordinate system and the machine coordinate system.
ii. Based on the peg insertion tool location, a target point is defined, and the centre of the detected workpiece’s boundary box is computed.
iii. The target and centre points are then converted from the image coordinate system to the machine coordinate system.
iv. A tool path is generated from the displacement between the target and centre points.
v. The machine is moved until the target and centre points overlap, ensuring the workpiece is centred with respect to the peg insertion tool.
From the perspective of the side-view camera (Fig. 3.13), the vertical position of the detected
workpiece is assumed to be fixed. Only the horizontal position within the image coordinate system
is taken into account for the centring process, based on the workpiece location. The object detector
returns the location of the four corners of the boundary box. The workpiece centre is computed by
summing the coordinate value of all four corners and dividing it by four. Then, the algorithm
calculates the displacement in pixels by subtracting the target location from the centre location of
the workpiece. For precision, measurements from five frames are captured. The centre location
closest to the target position is chosen for path creation. Prior to initiating machine movements, the pixel displacement is converted into physical distance, following the relationship illustrated in Fig. 3.14.
Figure 3.13. Visual representation of the assembly workspace and the workpiece as captured by
the side-view camera.
Figure 3.14. Pixel to mm conversion.
d (mm) = cd · tan(FOV / 2)    (3-1)

mm/pixel = 2d / (total number of pixels in the axis)
where d is the distance from the center of the captured image to its edge, measured in
millimeters. cd is camera distance, denoting the gap between the workpiece and the camera. FOV
is field of view, which is the visible scene captured by the camera lens. The cd is determined by
the mount system demonstrated in Fig. 3.2. For the side-view camera, the cd is continuously
verified using the depth measuring feature based on stereography. The FOV is provided in the
camera datasheets as listed in Table 3.1. Additionally, the camera datasheets specify the total
number of pixels in the vertical and horizontal axes. For instance, in the case of 1080p resolution,
the vertical and horizontal axes comprise 1080 and 1920 pixels, respectively. By utilizing these
parameters, including cd, FOV and pixel dimensions from the camera datasheets, the algorithm derives the mm-per-pixel conversion for each axis. It is important to note that separate conversion magnitudes are utilized for the side-view and top-view cameras,
tailored to their specific parameters. This consideration accounts for the differing characteristics
of each camera and ensures precise and reliable measurements in the respective camera views.
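The conversion in Eq. (3-1) can be expressed as a short Python helper; the numeric values in the usage comment are illustrative, not the system's calibrated parameters.

```python
import math

def mm_per_pixel(cd_mm: float, fov_deg: float, pixels_in_axis: int) -> float:
    """Eq. (3-1): distance d from the image centre to its edge in mm,
    then the mm-per-pixel magnitude along one axis."""
    d = cd_mm * math.tan(math.radians(fov_deg) / 2.0)
    return 2.0 * d / pixels_in_axis

# Illustrative numbers only: a camera 140 mm from the scene with a 90 degree
# horizontal FOV and 1920 pixels across gives about 0.146 mm per pixel.
# print(mm_per_pixel(140.0, 90.0, 1920))
```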
Upon obtaining the precise measurements using the aforementioned technique, the assembly
system proceeds to send commands to the microcontroller and the machine for executing the
required movement. Once the movement is completed, the vision system enters a verification
phase to assess the alignment accuracy. This verification is achieved by comparing the position of
the object centre with the target location. To establish a feedback loop and ensure precise
alignment, the system iterates through this process continuously until the object centre is within a
predefined threshold distance of the target location. In this particular case, the threshold value is
set at 0.8 mm, indicating that the system aims to achieve a high level of accuracy, with the object centre deviating from the target location by no more than 0.8 mm.
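This feedback loop can be summarized by the following sketch, in which the vision and motion routines are stand-in callables rather than the system's actual functions.

```python
def align_to_target(get_centre_px, move_machine, px_to_mm, target_px,
                    threshold_mm=0.8):
    """Closed-loop centring: re-measure and move until the detected centre
    lies within `threshold_mm` of the target. `get_centre_px` and
    `move_machine` stand in for the vision and motion routines."""
    while True:
        cx, cy = get_centre_px()
        dx_mm = (target_px[0] - cx) * px_to_mm
        dy_mm = (target_px[1] - cy) * px_to_mm
        if (dx_mm ** 2 + dy_mm ** 2) ** 0.5 <= threshold_mm:
            return  # aligned within the 0.8 mm threshold
        move_machine(dx_mm, dy_mm)
```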
Once the positioning is completed, the system crops the image by setting the boundary box as
the ROI. The background is replaced with a black image of the same size. This step prevents the
background information from being processed in subsequent image processing steps while
maintaining the correspondence between the machine and image coordinate systems.
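A sketch of this cropping step is shown below; keeping the output the same size as the input is what preserves the pixel-to-machine coordinate mapping.

```python
import numpy as np

def isolate_workpiece(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Keep only the bounding-box region and black out the background.
    The output keeps the input size, so image coordinates still map to
    the same machine coordinates."""
    x1, y1, x2, y2 = box
    masked = np.zeros_like(frame)
    masked[y1:y2, x1:x2] = frame[y1:y2, x1:x2]
    return masked
```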
In the third step of the image processing algorithm, the holes are extracted, and the machine
is positioned and prepared for peg insertion. The image processing techniques are applied
sequentially in the following order: Contrast Limited Adaptive Histogram Equalization (CLAHE),
Grayscale Conversion, Canny Edge Detector and Hough Circle Transform [63, 64, 65].
The workpiece used in the assembly process is made of a reflective metallic material, which
often results in the appearance of light marks on its surface. These light marks can be similar in
colour to the white pegs, occasionally obstructing the algorithm’s ability to detect the holes when
they are assembled with white pegs. To address this issue, CLAHE is employed. CLAHE improves
the contrast of the image based on the local grayscale distribution, making the edge boundaries of
the holes more distinctive while minimizing the impact on other regions of the image [63].
Next, the image is converted to grayscale as a prerequisite step for edge detection. Grayscale
conversion simplifies the image representation by eliminating colour information while retaining
the necessary intensity values. Then, Gaussian blurring is applied to the input grayscale image,
which aims to smooth the image [66]. The Canny Edge Detector is then applied to the grayscale
image. This edge detection technique uses the calculus of variations to convert the image into a
binary image, highlighting the edges only [64, 67]. By emphasizing the edges, the algorithm can more reliably isolate the circular boundaries of the holes in the subsequent step.
The image, post-processing via the Canny Edge Detector, undergoes evaluation using the
Hough Circle Transform to identify circles. The Hough Circle Transform is a specialized algorithm
devised for circle detection in images. For every edge pixel from the input image, probable circle
centers are deduced based on a designated range of radii in conjunction with the pixel's gradient
direction. Votes are accumulated in a 3D matrix for these potential centers and their respective
radii. Predominant vote accumulations in this matrix denote probable circle centers with their
associated radii. By specifying a radius range, the algorithm narrows its focus to circles within that
size range, bolstering both computational efficiency and precision of detection [65, 68]. The side-
view and top-view cameras may require different parameter values to achieve better accuracy
in circle detection. Therefore, individualized parameters are designated for each camera. By
implementing the Hough Circle Transform using optimal parameters, the algorithm isolates circles
from binary images, providing circle data for subsequent stages in the assembly process.
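The step-3 pipeline can be sketched with OpenCV as follows; all numeric parameters are illustrative rather than the tuned per-camera values, and OpenCV's HOUGH_GRADIENT method applies the Canny detector internally, so the edge map is not computed as a separate array here.

```python
import cv2
import numpy as np

def find_circles(frame: np.ndarray, r_min: int, r_max: int):
    """Step-3 detection sketch: CLAHE -> blur -> Hough circles."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # CLAHE sharpens hole boundaries under uneven metallic reflections;
    # OpenCV's implementation operates on a single-channel image.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)
    blurred = cv2.GaussianBlur(enhanced, (5, 5), 0)
    # HOUGH_GRADIENT runs the Canny detector internally (param1 is its
    # upper threshold); restricting the radius range narrows the search.
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=30,
                               minRadius=r_min, maxRadius=r_max)
    return [] if circles is None else np.round(circles[0]).astype(int)  # x, y, r
```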
Recent studies have explored the use of ML-based techniques for hole detection, which offer
high accuracy with reduced false positives [8]. However, these techniques tend to be more complex
and computationally intensive compared to the image processing algorithm proposed in this study.
The algorithm presented in this study, which includes image cropping, CLAHE and Hough Circle
Transform, achieves a high level of accuracy while imposing a lighter computational load.
After finding circles with the Hough Circle Transform, the algorithm calculates the distance
between each circle and the target point, following a similar methodology as described in step 2.
To enhance precision, data from two consecutive frames are factored into this computation.
Ultimately, the machine adjusts its position until the nearest identified hole aligns with the target
point. However, it is crucial to acknowledge the presence of potential uncertainties in the final
positioning due to the inherent vibrations that may cause shifts in the relationship between the
image and machine coordinate systems, due to the linkage limitation of the side-view camera as
discussed in Section 3.2.3. These vibrations can introduce variations and compromise the accuracy
of the positioning, thus introducing a level of uncertainty in the assembly process. In response to
this challenge, step 4 is introduced wherein the algorithm is applied to the top-view camera, which
offers improved accuracy and provides reliable data for the assembly process.
In step 4, the algorithm seamlessly transitions to the top-view camera, which is positioned
inside the peg insertion tool to provide a direct and unobstructed view of the assembled hole. This
configuration ensures a stationary relative position between the top-view camera and the peg
launch zone, reducing the impact of uncertainties compared to the side-view camera, which
involves multiple links in between. Similar to the side-view camera, the top-view camera
undergoes grayscale conversion, Canny edge detection and Hough Circle Transform techniques
for hole extraction. These image processing techniques enable the identification and extraction of
the holes with high accuracy. The relative position between the closest hole and the newly defined
target point for the top-view camera is calculated. To facilitate this calculation, a new relationship
between the top-view camera’s image coordinate system and the machine coordinate system is
established as mentioned in step 2, enabling the conversion of the relative position to the machine
coordinate system.
Once the relative position is determined, the machine adjusts its position until the hole aligns
with the target point. For the top-view camera, the data from three frames are used to find the
closest circle to the target location. For this step, the alignment threshold value of 0.3 mm is
adopted. After the alignment, the glue application, peg launch and the two assembly mechanisms
described in Section 3.1.1 are activated to complete the hole assembly process.
To address potential failures in the peg assembly process, an additional verification step is
implemented using the top-view camera. This verification step becomes crucial, especially
considering the tight tolerance requirement of 200 µm. When an insertion failure occurs, the peg
remains in the peg insertion zone directly beneath the top-view camera, obstructing its view, as
illustrated in Fig. 3.15. To overcome this challenge, the verification process focuses on detecting
circles after the assembly mechanisms have been executed. If the top-view camera fails to detect
any circles, it indicates that the peg has not been successfully inserted, despite the alignment being
correct. Leveraging this characteristic, the system repeats the assembly mechanisms until circles
are detected by the top-view camera once again. However, if the elapsed time during this process
exceeds eight seconds, indicating a prolonged unsuccessful attempt, the system takes an alternative
approach. It clears the peg from the peg launch zone and proceeds to repeat the entire sequence,
starting from circle alignment and proceeding through glue application, peg launch and the two assembly
mechanisms. This allows sufficient time for the glue to cure without the peg being present. By
incorporating this additional verification step and implementing the necessary adjustments in case
of prolonged failures, the system enhances the reliability and accuracy of the peg assembly process,
ensuring the successful and secured assembly of the peg within the specified tolerance.
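The retry-and-clear logic can be summarized as follows; the three callables are placeholders for the circle detection, assembly mechanism and peg-clearing routines described above, and only the eight-second timeout is taken from the text.

```python
import time

def verify_insertion(detect_circles, run_assembly_mechanisms, clear_peg,
                     timeout_s: float = 8.0) -> bool:
    """Post-launch verification: a peg stuck in the launch zone blocks the
    top-view camera, so the hole circle reappears only after a successful
    insertion. Returns True on success, False after clearing the peg."""
    start = time.time()
    while time.time() - start < timeout_s:
        run_assembly_mechanisms()        # re-run the two assembly mechanisms
        if len(detect_circles()) > 0:    # circle visible again: peg is seated
            return True
    clear_peg()                          # prolonged failure: clear and restart
    return False
```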
After the successful completion of the verification step, the workpiece undergoes rotation to
orient the tool toward the next hole, followed by repetition of steps 3 and 4. This iterative process
continues until all the holes in the workpiece have been effectively assembled.
The integration of the top-view camera plays a crucial role in this assembly procedure. By
capturing the necessary visual information and executing the sequential steps outlined earlier, the
algorithm ensures the accurate detection of hole locations and precise alignment. This integration
of the top-view camera, combined with the systematic execution of the assembly steps, results in a reliable and repeatable assembly procedure.
Overall, this approach enables the automated assembly system to consistently and accurately
assemble the workpiece, providing a high level of reliability and efficiency in the assembly
process.
3.3. Summary
In this chapter, the image processing techniques employed in the automated assembly system
for precise hole detection and alignment are discussed. The system utilizes a combination of CV
algorithms and machine control to achieve accurate and reliable assembly of workpieces.
The chapter began by introducing the image processing steps involved in the detection and
extraction of holes. The process starts with capturing images from a side-view camera and a top-
view camera. The captured images are then processed using various techniques to identify the
holes accurately.
The first step in the image processing algorithm involves obtaining the gradient magnitude
and direction using the Canny edge detection algorithm. The gradient magnitude represents the
rate of change of pixel values in the image and helps identify areas with significant intensity
variations, corresponding to edges. The gradient direction is quantized into four angles, representing the horizontal, vertical and two diagonal directions.
After obtaining the gradient magnitude and direction, non-maximum suppression is applied
to eliminate redundant pixels that do not represent edges. This step ensures that only the pixels
corresponding to the sharpest change in intensity, the true edges, are preserved. Hysteresis
thresholding is then used to classify pixels as sure edges or non-edges based on their intensity values relative to two thresholds.
By employing these image processing techniques, a binary edge image is generated, where
white pixels represent detected edges, and black pixels represent non-edges. This binary edge
image provides a precise representation of the edges in the original image, enabling subsequent circle detection.
The next step in the algorithm involves applying the Hough Circle Transform technique to
detect circles in the image. This technique utilizes specific parameters such as the minimum
distance between circle centres, edge intensity threshold and circle radius range to identify and
extract circles accurately. Separate lines of code are used for the side-view and top-view cameras, each with individually tuned parameters.
The integration of the top-view camera is particularly crucial in the assembly procedure
because it offers an unobstructed view and reduces uncertainties associated with vibrations.
Similar to the side-view camera, the top-view camera undergoes grayscale conversion, Canny edge
detection and Hough Circle Transform techniques for hole extraction. The relative position
between the closest hole and the target point is calculated, facilitating precise alignment.
To address potential failures in the peg assembly process, an additional verification step is
implemented using the top-view camera. This step focuses on detecting circles after the assembly
mechanisms have been executed. If no circles are detected, indicating a failed insertion, the system
repeats the assembly mechanisms until circles are detected again. If a prolonged unsuccessful
attempt occurs, the system clears the peg and restarts the entire sequence, ensuring reliable
assembly.
The chapter concludes by emphasizing the reliability and efficiency achieved through the
integration of the top-view camera and the systematic execution of assembly steps. The
combination of CV algorithms and machine control enables consistent and accurate assembly of
workpieces, ensuring successful peg insertion within specified tolerances. Overall, the image
processing techniques discussed in this chapter provide a robust foundation for automated
assembly systems, facilitating precise hole detection, alignment and reliable assembly processes.
Chapter 4. Vision-Based Quality Inspection
The goal of the proposed QI system is to detect positive defects, specifically unfilled holes,
during the assembly process. To achieve this, the QI system utilizes the hardware system, various
image processing techniques and the side-view camera integrated into the assembly system, as
described in Chapter 3. In this chapter, we evaluate two QI methods: a statistical method and ML-
based methods.
Both approaches directly require the cropped image of the hole as their input. Thus, this
chapter includes a prerequisite step, the hole-cropping algorithm, which identifies and extracts the
hole images for further analysis. The statistical method is a novel approach developed specifically for
this hole assembly process, utilizing traditional CV techniques. We subsequently compare this
method’s performance and feasibility with ML-based methods, including traditional CNN and
ResNet architectures. The comparative analysis aims to determine the most effective and suitable QI method for the proposed assembly system.

4.1. Hole-cropping algorithm

The hole-cropping algorithm is explained in Fig. 4.1. The objective of this algorithm is to crop
out the hole images from the assembly image while it is being assembled. Traditional image
processing techniques including grayscale conversion, Canny Edge Detector and Hough Circle
Transformation are first used in the process to detect the circles similar to the process elaborated
in Section 3.2.4. However, these techniques often detect false positives, which will always produce
the wrong results once input into the QI methods. Hence, they should always be filtered before the
QI predictions. The false positives cause more detrimental effects in QI compared to the assembly
process. The assembly process receives continual updates from the real-time view of the workspace,
hence it can self-correct and mitigate the effect of the false positives from updating frames.
However, the QI system takes in a single image as an input to make the predictions. Hence, it does
not possess the ability to self-correct. To resolve this problem, the hole-cropping algorithm is developed; it proceeds through the following steps (Fig. 4.1):
Figure 4.1. Hole-cropping algorithm.
i. Five images of the rotating workpiece are acquired, each with an approximate 1° rotation between consecutive images.
ii. Holes are detected in all images using image processing techniques.
iii. Holes located outside the new ROI and those exceeding the threshold aspect ratios (defined as 0.75:1 and 1.25:1) are filtered out. The new ROI encompasses regions that provide undisturbed and undistorted hole images, determined based on their relative position to the centre of the boundary box. Holes with distorted aspect ratios, falling outside this range, are discarded (a filtering sketch follows this list).
iv. Each remaining hole is matched to the closest hole detected in the preceding image.
v. Holes with a distance exceeding the threshold distance (equivalent to 1° rotation) to the
closest hole in the preceding image are eliminated. For instance, if a hole is detected in the
second image, it is linked to the closest circle in the first image. If the relative distance
between two holes exceeds the 1° equivalent distance, both circles are filtered out.
However, if the relative distance is less than or equal to the 1° equivalent distance, the circle in the second image is retained.
vi. Steps iii–v are repeated until all acquired images are processed.
vii. The remaining holes from the processed images are cropped and subjected to the QI
methods.
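The geometric filters of steps iii and v can be sketched as follows; hole records are assumed to be pixel-space tuples, and the ROI bounds and 1°-equivalent shift are supplied by the caller.

```python
import math

def filter_holes(prev_holes, holes, roi, max_shift_px):
    """Steps iii and v (sketch): each hole is (x, y, w, h) in pixels, `roi`
    is (x1, y1, x2, y2), and `max_shift_px` is the 1-degree-equivalent
    shift between consecutive frames."""
    kept = []
    for x, y, w, h in holes:
        if not (roi[0] <= x <= roi[2] and roi[1] <= y <= roi[3]):
            continue                     # outside the undistorted ROI
        if not (0.75 <= w / h <= 1.25):
            continue                     # perspective-distorted aspect ratio
        if prev_holes:
            nearest = min(math.hypot(x - px, y - py)
                          for px, py, _, _ in prev_holes)
            if nearest > max_shift_px:
                continue                 # moved more than ~1 degree of rotation
        kept.append((x, y, w, h))
    return kept
```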
Through this process, the holes are effectively detected and cropped during the assembly
process using traditional image processing techniques. By applying rigorous filtering, it eliminates
false positives and ensures accurate hole identification. The algorithm’s iterative approach, combined with the new ROI creation and aspect ratio thresholding, results in reliable data for QI predictions. This enhances the QI system’s performance, making it a valuable tool for quality control in assembly applications.
4.2. Statistical method

The statistical method capitalizes on variations in light reflection intensity levels observed at
the holes during assembly (negative defects) or when they are not assembled (positive defects).
This behaviour can be described by the reflection model:

I = E ρ(θi, ϕi, θr, ϕr) cos θi    (4-1)

where I represents the light reflection intensity level, E denotes the light source intensity, (θi, ϕi) and (θr, ϕr) represent the incident angle and the viewing angle relative to the surface normal, and ρ denotes the surface reflectance.
In the context of our assembly process, the workpiece is constructed from magnesium, which
exhibits a high value for surface reflectance (ρ). Additionally, the unassembled holes in the
workpiece create three additional surfaces with different surface normals. Consequently, the light
is reflected in multiple directions when the holes are unassembled, leading to rapid fluctuations in
the intensity level within the hole region. Conversely, in the case of the assembled workpiece, the
holes are filled with ceramic pegs that possess significantly lower ρ values. As a result, light is
primarily reflected in the direction normal to the workpiece surface, leading to a more consistent
The high ρ value of magnesium and the presence of multiple surfaces contribute to the distinct
variation in light reflection intensity observed in the unassembled holes, making them identifiable
as positive defects. Conversely, the reduced ρ value and uniform reflection in the direction normal
to the workpiece surface make the assembled holes distinguishable, indicating negative defects in
the assembly process. By exploiting these variations in light reflection intensity, the statistical
method offers a viable approach for accurate defect detection and QI during the assembly process.
The statistical method encompasses various image processing steps to capture and
mathematically express specific behaviours, as depicted in Fig. 4.2. Initially, the Sobel filter is
employed to smoothen the hole image and eliminate noise. Unlike its typical use for edge
detection, we adapt the Sobel filter with a box blur kernel to prioritize noise removal over edge
sharpening. This adjustment is significant, as modifying filter component values might influence
subsequent analyses, specifically changing the attributes at the boundary between areas of varying
intensity levels within the hole image. The resulting gradient array computes the gray value
fluctuations numerically, with edges of the hole exhibiting higher magnitudes due to intensity
discontinuities [70]. Because our method focuses on measuring intensity fluctuations in the hole
images, these edge gradient values carry significant information and cannot be disregarded.
However, to emphasize inner region information and eliminate edge effects, we remove the first and last gradient values, which correspond to the hole edges.
Figure 4.2. Statistical method.
To characterize the intensity fluctuation in the hole images quantitatively, we calculate the
standard deviation (σ) of the resulting gradient values. Fig. 4.3 illustrates exemplary results
obtained using the statistical method based on the hole status. When white or black pegs are
inserted, the gray value graph of negative defect holes remains generally consistent within the
cropped region, with changes limited to the ±10 range. Consequently, their gradient values remain
close to 0. Conversely, positive defect holes exhibit greater fluctuations in the gray value graph,
with a significant leap corresponding to the boundary between the shaded and unshaded area
within the hole. This results in an unstable behaviour throughout the gradient graph, with a sharp
spike evident at the shaded area. In summary, negative defect holes yield σ values close to 0,
whereas positive defect holes produce σ values greater than 1. Hence, the statistical method
effectively classifies positive and negative defects based on the σ values, enabling accurate defect detection.
Figure 4.3. Statistical method results based on hole status. (a) Negative defect - white peg
inserted. (b) Negative defect - black peg inserted. (c) Positive defect.
The threshold σ value (σthreshold) denotes the specific numerical value utilized to differentiate
positive and negative defects. To determine σthreshold, the σ values of 100 hole images for each hole
status are computed, and their corresponding maximum and minimum values are identified, as
presented in Table 4.1. A negative defect hole assembly is defined as one that does not impede the
workpiece's rotation on the system's rotary roller. An obstruction is any halt or bump caused by a
peg that is only partially inserted, which may protrude and interfere with the rotary roller. Thus,
only the assembled workpieces that rotate without such obstructions are used to source negative
defect hole images. Contrastingly, a positive defect hole assembly is defined by its vacancy, which
permits the insertion of an additional peg to conform to the negative defect criteria previously
described. The minimum σ value for positive defect holes (1.05) exceeds the maximum σ value
for negative defect holes (0.72). To establish a balanced threshold, the mean of these minimum and maximum values, approximately 0.89, is adopted as σthreshold.
Table 4.1. Minimum and maximum of one hundred σ values for each hole status.
Hole status | Minimum σ | Maximum σ
Negative defect | – | 0.72
Positive defect | 1.05 | –
Consequently, during hole analysis, a hole is categorized as a positive defect if its σ value surpasses σthreshold, and as a negative defect if its σ value is equal to or less than σthreshold:

status = positive defect if σ > σthreshold, negative defect if σ ≤ σthreshold
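A sketch of this classification is given below. The exact smoothing kernel, gradient axis and edge-trim width are not fully specified here, so those choices are illustrative; the 0.89 threshold is the mean of the observed extrema (0.72 and 1.05).

```python
import cv2
import numpy as np

SIGMA_THRESHOLD = 0.89  # mean of 0.72 (max negative) and 1.05 (min positive)

def is_positive_defect(hole_gray: np.ndarray, trim: int = 2) -> bool:
    """Smooth the cropped hole image, take the gradient of its gray-value
    profile, drop the edge-dominated ends and threshold the standard
    deviation of what remains."""
    smoothed = cv2.blur(hole_gray, (3, 3))       # box-blur noise removal
    profile = smoothed.mean(axis=0)              # 1D gray-value profile
    gradient = np.gradient(profile)[trim:-trim]  # discard hole-edge gradients
    return float(np.std(gradient)) > SIGMA_THRESHOLD
```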
It is important to emphasize that σthreshold is determined based on the maximum value that accounts for both white and black peg-inserted negative defects, solely focusing on the metallic surface workpiece.
4.3. Machine learning-based method
4.3.1. Convolutional neural network

CNN has gained widespread adoption as a powerful algorithm for QI [71, 72, 73]. Its success
in various CV tasks can be attributed to its ability to effectively extract essential features from
input images. A typical CNN consists of multiple convolutional layers, followed by a neural
network component. The convolutional layers play a critical role in feature extraction by applying
filters to the input images, enabling them to capture intricate patterns and distinctive characteristics
relevant to the inspection process. Furthermore, the inclusion of pooling layers between
convolutional layers is instrumental in reducing the activation map size of the output data. Pooling
helps in down-sampling the feature maps, facilitating more efficient processing and contributing
to the network’s ability to handle larger and more complex datasets. During the feature extraction
process, the convolutional layers transform the extracted values into nonlinear values through
activation functions. This nonlinearity introduces essential degrees of freedom to the model,
enhancing its capacity to learn and distinguish between intricate patterns in the inspection data
[46].
The use of CNN for QI is justified by its capability to automatically learn and adapt to the
specific features and patterns related to the inspected items. This makes CNN a versatile and robust
algorithm for tasks such as defect detection, classification and other quality assessment
applications. Additionally, its ability to handle large amounts of data and its capacity for deep
learning make CNN a suitable choice for addressing complex inspection challenges and achieving
high accuracy in quality control processes. As a result, CNN has become a popular choice for
researchers and practitioners in various industries seeking reliable and efficient QI solutions [44].
For this thesis, the CNN architecture shown in Fig. 4.4 is utilized. A hole image, as shown in
Fig. 4.5, is extracted from the hole-cropping algorithm and resized to 300×300 for training. The
CNN output is of Boolean data type, where “true” denotes positive defect and “false” denotes
negative defect holes. Leveraging the capabilities of CNN in learning intricate patterns and
distinguishing between defects, this architecture is well-suited for addressing the specific QI requirements of this study.
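A minimal PyTorch sketch of a CNN of this kind is shown below; the layer counts and channel widths are illustrative and do not reproduce the exact architecture of Fig. 4.4.

```python
import torch.nn as nn

class HoleCNN(nn.Module):
    """Binary hole classifier for 3x300x300 inputs; an output above 0.5 is
    read as "true" (positive defect), otherwise "false" (negative defect)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 300 -> 150
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 150 -> 75
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 75 -> 37
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 37 * 37, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```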
4.3.2. Residual network
ResNet, another widely recognized ML algorithm, has been extensively studied and proven
effective in various research works [15, 60, 74]. ResNet is built upon the CNN structure but
introduces skip connections that directly connect inputs to the back layers. This unique architecture
allows the back layers to directly learn the residuals, resulting in more efficient training and improved performance in deeper networks.
For this thesis, the ResNet50 model, as illustrated in Fig. 4.6, is utilized to extract the features
from hole images. The model is trained using hole images of size 300x300, similar to the sample
images displayed in Fig. 4.5. Upon processing a hole image, the ResNet50 model produces a 1x2
size array as its output. Each element in this array represents a class in the model, specifically the positive defect and negative defect classes.
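In recent torchvision versions, this adaptation amounts to replacing the final fully connected layer, as sketched below; the choice of pretrained ImageNet weights is an assumption.

```python
import torch.nn as nn
from torchvision import models

# Two output units, one per QI class (positive defect, negative defect),
# matching the 1x2 output array described above. Pretrained ImageNet
# weights are an assumption, not confirmed by the text.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = nn.Linear(resnet.fc.in_features, 2)
```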
4.4. Summary
In this chapter, the vision-based QI system is presented, with the primary objective of
detecting positive defects, specifically unfilled holes, during the assembly process. The QI system
utilizes the hardware system, various image processing techniques and the side-view camera
integrated into the assembly system for defect identification. Two distinct QI methods are
evaluated: the statistical method and ML-based methods, including CNN and ResNet.
The hole-cropping algorithm serves as a prerequisite step to extract and identify hole images
from the assembly images. Traditional image processing techniques, such as grayscale conversion,
Canny Edge Detection and Hough Circle Transformation, are employed in this algorithm. By
effectively filtering false positives and generating reliable data, the algorithm contributes to
The statistical method capitalizes on variations in light reflection intensity levels observed at
the holes during assembly and when not assembled. By quantifying the intensity fluctuations using
the standard deviation (σ) of gradient values, positive and negative defect holes are effectively
classified. The threshold σ value (σthreshold) is used to differentiate between positive and negative defects.
In contrast, the ML-based methods, CNN and ResNet, demonstrate their efficacy in feature
extraction and defect classification. CNN’s ability to learn intricate patterns and handle complex
datasets, along with ResNet’s unique skip connections and efficient training, make them suitable candidates for automated defect classification.
By combining the strengths of both the statistical method and ML-based methods, the vision-
based QI system provides a comprehensive approach to defect detection in the assembly process.
The application of these methods facilitates accurate identification and classification of defects,
enhancing the quality control process and ensuring the production of high-quality assembled
products.
Chapter 5. Results and Discussions
This chapter presents the comprehensive results obtained from the implementation and
evaluation of the proposed automated peg-in-hole assembly system prototype and the QI system.
The previous chapters have discussed the design considerations, software development and image
processing techniques employed to achieve accurate and reliable hole detection and alignment. In
addition to the assembly process, this chapter explores the outcomes of the QI system, which plays
a critical role in assessing the quality and accuracy of the assembled workpieces. The QI system employs image processing algorithms to verify the alignment and securing of each peg within the hole. The presented results not only focus on the successful execution of the assembly process but also highlight the effectiveness of the QI methods in supporting a more efficient and precise manufacturing workflow. Throughout this chapter, an analysis and
discussion of the obtained results describe the system’s performance, addressing any challenges
encountered during the testing phase and proposing potential refinements to optimize both the
assembly and QI processes. By critically examining the system’s strengths and limitations, this
chapter aims to provide valuable insights toward the advancement of automated assembly systems and vision-based quality inspection.
5.1. Assembly system results
The proposed assembly system is evaluated with a focus on assessing its robustness and
performance under various light settings. A total of 100 assembly trials are conducted, with each
light setting tested 20 times, resulting in a diverse range of lighting conditions achieved by
manipulating the lamp’s brightness level and controlling the room light environment, as illustrated
in Fig. 5.1. Throughout the experiments, a metallic surface workpiece (Fig. 5.2a) and white pegs are
consistently employed to ensure uniformity and consistency in testing conditions. Each assembly trial consists of one complete workpiece assembly and is recorded as successful only when all 20 holes of the workpiece are filled. Thus, over the course of the 100 trials, a total of 2000 hole assemblies are attempted. Following each assembly, the workpiece undergoes a complete rotation on the machine. A trial's success is contingent upon this rotation proceeding without hindrance from any protruding pegs, in adherence to the definitions of negative and positive defects delineated in Section 4.2. This evaluation aims to examine the system’s ability to operate effectively under real-world scenarios and its performance across varying lighting conditions.
Figure 5.1. Assembly experimental light settings (a) Room. (b) Room + lamp brightness Level 1.
(c) Room + lamp brightness Level 2. (d) Lamp brightness Level 1. (e) Lamp brightness Level 2.
The key metrics analyzed include the success rate, representing the proportion of successful
assembly trials, and the assembly rate, measuring the average time required for a single workpiece
assembly. The results, presented in Table 5.1, demonstrate a high level of robustness and reliability
in the system’s operation, with consistent success rates and assembly times across all light settings.
The overall success rate achieved 99%, signifying the system’s capability to consistently achieve
accurate and reliable assembly under different lighting conditions. Additionally, the system
exhibited an average assembly rate of 386.00 seconds per workpiece, highlighting its efficiency in
completing the assembly process. These findings underscore the system’s potential for industrial
applications, offering a reliable and efficient solution for automated assembly processes with good
success rates and assembly rate. In conclusion, the experimental evaluation presents crucial
insights into the system’s performance, validating its efficacy and suitability for practical industrial applications.
Table 5.1. Assembly system results on the light settings (Fig. 5.1) with 20 trials per setting. The
successful completion of an assembly trial is determined when all 20 holes on the workpiece are
filled with the pegs.
Light setting | Success rate (%) | Assembly rate (sec/workpiece)
Additional experiments are conducted to assess the adaptability of the assembly system using
different types of workpieces, as shown in Fig. 5.2. The first variation involves a metallic
workpiece wrapped in black vinyl film (Fig. 5.2b). To ensure the tolerance of the holes and pegs,
the black vinyl film is removed around the holes. The second variation comprises a workpiece 3D
printed using blue PLA filament (Fig. 5.2c). Each type undergoes 20 experiments under identical
light settings of room + lamp brightness level 2, as shown in Fig. 5.1c. Similar to previous
experiments on varied lighting conditions, a trial's success is contingent upon the complete
insertion of pegs into all 20 holes of the workpiece without any interference during a full machine
rotation. The results, presented in Table 5.2, indicate a success rate of 90% and above for all
workpiece types. However, there are notable differences in the assembly rate among the variations,
with the metallic surface exhibiting the fastest rate of 386.00 seconds/workpiece, followed by the
surface wrapped in black vinyl film at 420.00 seconds/workpiece, and the 3D printed workpiece
at 493.00 seconds/workpiece.
Figure 5.2. Experimented workpiece types (a) Metallic surface. (b) Metal surface wrapped in
black vinyl film. (c) 3D printed with blue PLA plastic filament.
Table 5.2. Assembly system results on the workpiece types (Fig. 5.2) with 20 trials per type. The
successful completion of an assembly trial is determined when all 20 holes on the workpiece are
filled with the pegs.
Workpiece type | Success rate (%) | Assembly rate (sec/workpiece)
Throughout all experiments, no programming errors are observed, demonstrating the system’s
stability and reliability. The system proved capable of continuous operation for more than two
consecutive days without requiring any system restarts, further affirming its robustness and reliability.

5.2. Assembly system discussions

The proposed assembly system is tailored to efficiently assemble metallic surface workpieces
with fixed characteristics, such as cylindrical geometry and specific hole and peg dimensions and
colours. However, the system’s adaptability to different workpiece properties is evident from the
results obtained. Although optimized for metallic surfaces, the system can effectively detect
workpieces with diverse characteristics and accurately extract hole locations using image
processing techniques. The system has demonstrated high success rates when assembling variation
types, including surfaces wrapped in black vinyl film and 3D printed workpieces with blue PLA
plastic filament. However, the optimization impact is observable in the assembly rate, with the
metallic surface achieving the fastest rate, followed by the surface wrapped in black vinyl film and
the 3D printed workpiece. The variation in assembly rates primarily results from the additional
time taken in the hole detection process for different workpiece types.
Adapting the assembly system to new workpieces with varying diameters or a different
number of holes is straightforward. This process involves adjusting specific parameters, including
the number of holes, workpiece diameter, side-view camera target point and tool height.
Modifications to the peg insertion tool, including adjustments to its dimensions and circle detection
parameters, are required if the dimensions of the hole and peg change. Additionally, changes in workpiece material properties require adjustments to the image processing algorithms. When dealing with different surface geometries, the design of a new fixture and re-design of the machine
positioning algorithm are necessary. The presented prototype is designed for components with
dimensions not exceeding 300×180×120 mm. For larger components, the system can be adapted
by utilizing a machine with a larger work area. The versatility of the assembly system allows it to accommodate such variations through targeted parameter and hardware adjustments rather than a complete redesign.
5.3. QI results
The ML models employed in this study are trained using a dataset comprising 2503 hole
images exclusively obtained from the metallic surface similar to images shown in Fig. 5.3a. Out
of these 2503 images, 500 are reserved as a validation dataset, of which 285 belong to the negative
defect class and 215 to the positive defect class. The training dataset comprises the remaining 2003
hole images, with 1141 from the negative defect class and 862 from the positive defect class. The
negative defect class is split evenly between images of black peg insertions and white peg
insertions. The criteria for selecting negative defect hole images in the dataset aligns with the
approach discussed in Section 4.2, ensuring that they do not impede the workpiece's rotation within
the system. The training process is conducted three times for each architecture, with distinct test
datasets utilized for evaluation. The test datasets are divided into three categories, corresponding
to different workpiece types: metallic surface, surface wrapped in black vinyl film and blue PLA
plastic surface, as illustrated in Fig. 5.3. Each test dataset encompassed 615 hole images, prepared
using the hole-cropping algorithm as elaborated in Section 4.1. Consistent light settings,
characterized by room + lamp brightness level 2 in Fig. 5.1, including the direction of the lamp, are maintained throughout the preparation of all test datasets.
Figure 5.3. Hole image sample for QI testing. Samples are extracted from the following
workpiece types. (a) Metallic surface. (b) Metal surface wrapped in black vinyl film. (c) 3D
printed with blue PLA plastic filament.
It is important to note that no identical hole images are employed in both the training and test datasets, nor in the images used to calculate the σthreshold in Section 4.2. For the training of the ResNet50 model, a batch size of 64 and a learning rate of 1.0×10⁻³ are adopted for 128 epochs. The traditional CNN utilizes the same batch size and number of epochs, but with a decaying learning rate with a factor of 0.2, patience set at 3 and a minimum learning rate of 1.0×10⁻⁴. The
training histories of the models are depicted in Fig. 5.4 for the metallic surface, Fig. 5.5 for the
metal surface wrapped in black vinyl film, and Fig. 5.6 for the 3D printed with blue PLA plastic
filament workpiece types, respectively. The final test accuracies recorded upon completion of
model training during cross-validation are used to represent the accuracy of each ML model. To
derive statistical test accuracies, each test dataset is inputted into the program separately, which
systematically analyzes each hole image and generates the prediction results for further evaluation.
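The decaying schedule described above maps directly onto PyTorch's ReduceLROnPlateau; in the sketch below the model is a stand-in and the optimizer choice is an assumption, while the factor, patience and floor follow the values stated here.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 2)  # stand-in for the traditional CNN of Section 4.3.1
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-3)  # optimizer assumed
# Decay on a validation-loss plateau: factor 0.2, patience 3, floor 1.0e-4.
scheduler = ReduceLROnPlateau(optimizer, factor=0.2, patience=3, min_lr=1.0e-4)

# for epoch in range(128):          # batch size 64, 128 epochs
#     val_loss = train_one_epoch()  # hypothetical training routine
#     scheduler.step(val_loss)
```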
Figure 5.4. QI testing results on hole images from the metallic surface workpiece (Fig. 5.3a)
using different ML models. (a) Traditional CNN. (b) ResNet.
Figure 5.5. QI testing results on hole images from the metal surface wrapped in black vinyl film
workpiece (Fig. 5.3b) using different ML models. (a) Traditional CNN. (b) ResNet.
Figure 5.6. QI testing results on hole images from the 3D printed with blue PLA plastic filament
workpiece (Fig. 5.3c) using different ML models. (a) Traditional CNN. (b) ResNet.
Table 5.3 presents the experimental accuracy of the three QI methods discussed in this study.
Among them, the statistical method achieves the highest average experimental accuracy at 97.02%.
This outperforms the ResNet50 model, which achieves 93.98%, and the traditional CNN model at
89.65%. All methods demonstrate a perfect accuracy of 100% when tested on the metallic surface.
However, when tested on the black vinyl-wrapped surface workpiece, there is a decline in accuracy
for each method, though all still exceed 85%. The most significant drop in accuracy is observed
when testing on a blue PLA plastic surface. Here, the ResNet50 and traditional CNN models yield
test accuracies of 82.28% and 79.19% respectively, marking a decrease of 17.72% and 20.81%
from their performance on metallic surfaces. In contrast, the statistical method experiences a more
modest drop in accuracy, with a decline of only 8.13% when compared to its performance on the
metallic surface.
Table 5.3. Performance of three QI methods for each workpiece type (Fig. 5.3) with 615 hole images per each type test. Identical hole images are used for each QI method.
Workpiece type | QI method | Test accuracy (%)
Metallic surface | Statistical | 100.00
Metallic surface | ResNet50 | 100.00
Metallic surface | Traditional CNN | 100.00
Black vinyl film wrap | Statistical | 99.19
Black vinyl film wrap | ResNet50 | 99.67
Black vinyl film wrap | Traditional CNN | 89.76
Blue PLA plastic | Statistical | 91.87
Blue PLA plastic | ResNet50 | 82.28
Blue PLA plastic | Traditional CNN | 79.19
Average | Statistical | 97.02
Average | ResNet50 | 93.98
Average | Traditional CNN | 89.65
To further evaluate the statistical method, sensitivity analyses are conducted. For these tests,
the same dataset of 615 hole images from the metallic surface workpiece (Figure 5.3a) is used, as
in the previous QI assessments. The σthreshold is adjusted incrementally from 0% (σthreshold = 0.89) to ±50% (σthreshold = 0.45 and σthreshold = 1.33), with accuracies logged based on this dataset of 615 hole images (refer to Table 5.4). In all instances, the test accuracy surpasses 90%.
Table 5.4. σthreshold adjustment sensitivity analysis results on the statistical method, with 615 hole images per adjustment.
σthreshold adjustment (%)    Test accuracy (%)
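For concreteness, a minimal sketch of this sweep is given below. Only the nominal σthreshold of 0.89 comes from the text; the decision rule (predict the high-variation class when the centre-column gradient standard deviation exceeds the threshold) and all names are assumptions consistent with the statistical method described earlier.

    import numpy as np

    def sweep_threshold(std_values: np.ndarray, labels: np.ndarray,
                        nominal: float = 0.89, steps: int = 11) -> dict:
        """Adjust sigma_threshold from -50% to +50% of its nominal value and
        record the test accuracy at each step."""
        results = {}
        for adj in np.linspace(-0.5, 0.5, steps):
            threshold = nominal * (1.0 + adj)   # roughly 0.45 ... 1.33
            preds = std_values > threshold      # assumed decision rule
            results[int(round(adj * 100))] = float((preds == labels).mean())
        return results

    # Example: accuracies = sweep_threshold(stds, labels.astype(bool))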
Moreover, testing is conducted under the various lighting conditions illustrated in Fig. 5.1, excluding the room setting (Fig. 5.1a) due to its lack of an angled light source (the lamp). A fresh dataset, also containing 615 hole images but representing each lighting condition, is used for this purpose. Results are documented in Table 5.5, and the statistical method consistently yields an accuracy above 90%.
Table 5.5. Light-setting (Fig. 5.1) sensitivity analysis results on the statistical method, with 615 hole images per setting.
Light setting Test accuracy (%)
5.4. QI discussions
The study’s findings indicate that the statistical method consistently performs well across
different surface conditions, whereas the traditional CNN and ResNet50 architectures used exhibit
reduced adaptability. The QI methods evaluated in this study are optimized mainly for inspecting
metallic workpieces and black and white pegs, taking into account consistent characteristics in
terms of diameter, height, surface and material properties. Given this focus, hole images
exclusively from metallic surfaces are utilized both for calculating the σthreshold of the statistical
method and for training the ML models. As a result, all methods demonstrate high test accuracies
on both metallic and black vinyl-wrapped surfaces due to the similarities in hole characteristics.
Nonetheless, there is a discernible decline in accuracy for the black vinyl-wrapped surfaces, which can be attributed to their differences from the purely metallic surfaces used for model calibration and training.
When assessing the blue PLA plastic surface, which has unique colour, material and reflective properties, all methods exhibit a marked reduction in accuracy. Yet, the statistical method maintains a relatively high level of performance. In contrast, the traditional CNN and ResNet50 models see a more pronounced decrease in accuracy. This reduction might stem from
overfitting, given the vast number of trainable parameters and the specificity of the training dataset, which consists solely of hole images from metallic surfaces. Notably, the statistical method, being devoid of ML training
intricacies, successfully avoids such overfitting issues. Furthermore, the statistical method
consistently showcases test accuracies above 90% in all sensitivity tests, underscoring its
robustness.
In conclusion, our experimental findings underscore that the statistical method outperforms
both the ResNet50 and traditional CNN models in terms of average test accuracy. This finding
suggests that, within this specific context, the statistical method offers a more efficient and
effective solution. Unlike its counterparts, it eliminates the need for model training, sidestepping
the inherent complexities that often accompany ML-based methods. Moreover, the statistical method avoids common ML-related burdens such as optimization complexity and data overload [75, 76], making it a well-balanced approach in terms of accuracy and
adaptability.
However, the statistical method does have some limitations. First, a sufficient number of sample images must be obtained to calculate the σthreshold, and the inspection accuracy heavily depends on this calibration.
Additionally, finding the σthreshold can be challenging if the pegs and workpiece have similar
reflective properties. Second, a consistent directional light source, as shown by the angled lamp in
Fig. 3.3, must be provided to create shade gradients inside the holes, which is crucial for defect
classification.
5.5. Summary
In this chapter, a detailed exploration of the assembly system’s capabilities and adaptability is
presented. Initially tailored for metallic workpieces, the system demonstrates commendable
versatility by efficiently handling different workpiece variations, such as those wrapped in black
vinyl film and those 3D printed with blue PLA filament. Notably, the assembly success rate varies with workpiece type. An essential feature of the system’s robustness is its capacity for uninterrupted operation, running for over two consecutive days without the need for human intervention.
For the QI methods, a deep dive into a statistical method that uses a series of image processing
techniques, and into ML models, specifically the traditional CNN and ResNet50, is undertaken. The ML models are rigorously trained, and the statistical method calibrated, with a comprehensive set of hole images from metallic surfaces.
Among the methods evaluated, the statistical QI approach stands out, consistently outperforming
the ML models across different workpiece surfaces. This suggests the superior adaptability of the
statistical method, especially when the training dataset is specialized, as is the case in this study.
However, it is also worth noting that the statistical method comes with its inherent challenges,
particularly its dependency on substantial sample sizes for σthreshold calculation and its reliance on
consistent lighting.
In conclusion, this chapter highlights the system’s impressive adaptability across varied
workpiece types. When compared to its ML peers, the statistical QI method emerges as a more
balanced solution, offering superior accuracy and adaptability, despite certain limitations.
Chapter 6. Conclusions
This thesis aimed to develop an automated assembly and vision-based QI system for industrial applications, realized through a working prototype, while exploring the feasibility and effectiveness of various assembly techniques and image processing methods. Throughout the study, several key objectives were achieved, and important findings were obtained.
The development of an efficient image processing algorithm lies at the core of the automated
peg-in-hole assembly system. In this regard, the algorithm is designed to capture images from a
side-view camera and a top-view camera, serving as valuable inputs for the assembly process.
Utilizing a custom-trained YOLOv5 object detection model, the images obtained from the side-
view camera underwent thorough processing to detect the workpiece and accurately determine its
position. To optimize computational efficiency, the side-view camera image was cropped,
eliminating redundant background information and enabling focused analysis on the workpiece
region. Additionally, the Canny edge detection algorithm was employed to calculate the gradient
magnitude and direction of the captured images from both cameras. This facilitated the
identification of critical features such as edges and circles that corresponded to the holes in the
workpiece.
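As an illustration of this step, the sketch below crops a frame to a detected bounding box and applies OpenCV's Canny detector. The Gaussian blur and the threshold values are assumptions for the sketch, not the parameters used in the thesis.

    import cv2

    def workpiece_edges(frame, box):
        """Crop the side-view frame to the detected workpiece bounding box,
        then compute a Canny edge map. box = (x, y, w, h) in pixels."""
        x, y, w, h = box
        roi = frame[y:y + h, x:x + w]                # drop redundant background
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # suppress sensor noise
        # Canny internally computes gradient magnitude and direction, then
        # applies non-maximum suppression and hysteresis thresholding.
        return cv2.Canny(blurred, 50, 150)           # thresholds are assumptions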
The feasibility of the automated peg-in-hole assembly system was then demonstrated through the construction of a
prototype. The integration of key components, including the peg insertion tool, three-axis machine
and rotary roller, played a crucial role in achieving precise positioning and seamless assembly of
the workpiece. The system exhibited exceptional capabilities in hole detection, peg alignment, glue
application and peg launching, effectively completing the assembly process with average success rates of 99% and 95% under different light settings and different workpiece types, respectively.
These results reinforce the potential and viability of the automated peg-in-hole assembly system
in industrial applications, providing a strong foundation for further research and development in
the field.
The vision-based image processing methods discussed in this thesis aim to address the
classification of positive and negative defects in assemblies produced by the automated assembly
system introduced earlier. Three distinct methods are presented, all of which rely on the proposed
hole image extraction process to obtain the necessary input hole images. The hole image extraction
process utilizes a combination of conventional image processing techniques, such as the Canny
edge detector and Hough circle transform, to detect and extract circular features. To mitigate the
occurrence of false positives, which are commonly encountered with conventional image processing techniques, additional conditions such as bounding-box and aspect-ratio checks are introduced. These conditions successfully generate cropped hole images containing only true positives, which then serve as inputs to the classification methods.
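A hedged sketch of this extraction step follows. The Hough and Canny parameters are illustrative assumptions, and the aspect-ratio check on the edge pixels inside each candidate crop is one plausible realization of the conditions described, not the exact thesis implementation.

    import cv2
    import numpy as np

    def extract_hole_crops(gray):
        """Detect circular hole candidates with the Hough transform and keep
        only those passing bounding-box and aspect-ratio checks; gray is an
        8-bit single-channel image."""
        edges = cv2.Canny(gray, 50, 150)
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
                                   param1=150, param2=30,
                                   minRadius=10, maxRadius=40)
        crops = []
        if circles is None:
            return crops
        h_img, w_img = gray.shape
        for x, y, r in np.round(circles[0]).astype(int):
            x0, y0, x1, y1 = x - r, y - r, x + r, y + r
            # Bounding-box condition: candidate must lie fully inside the image.
            if x0 < 0 or y0 < 0 or x1 > w_img or y1 > h_img:
                continue
            # Aspect-ratio condition: edge pixels in the crop should span a
            # near-square region; elongated responses are rejected.
            ys, xs = np.nonzero(edges[y0:y1, x0:x1])
            if xs.size == 0:
                continue
            w = int(np.ptp(xs)) + 1
            h = int(np.ptp(ys)) + 1
            if not 0.8 <= w / h <= 1.25:
                continue
            crops.append(gray[y0:y1, x0:x1])
        return crops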
Among the three methods proposed in this thesis, the statistical model represents a novel
approach that builds upon conventional image processing techniques. Initially, the input image
undergoes a series of preprocessing steps, including noise removal using the Sobel filter, grayscale
conversion and extraction of pixel values along the centre column. Leveraging the observation that
positive defects exhibit greater variations in pixel values due to light reflection and shading, the method proceeds by computing the gradient of the grayscale values along the centre column and subsequently calculating its standard
deviation. Through an iterative process involving 300 trials, a threshold value is determined, which
is then compared against the standard deviation value for prediction. The statistical method
achieved an average test accuracy of 97.02%, showcasing its effectiveness in defect classification.
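The core of this classifier can be summarized in a few lines, as in the hedged sketch below. Only the σthreshold value of 0.89 comes from the text; the function name is illustrative, and the crop is assumed to be preprocessed and scaled as in the thesis pipeline.

    import numpy as np

    SIGMA_THRESHOLD = 0.89  # nominal value reported in the text

    def classify_hole(gray_crop: np.ndarray,
                      threshold: float = SIGMA_THRESHOLD) -> bool:
        """Return True for the high-variation ('positive') class: take the
        first-order gradient of the grayscale values along the centre column
        and compare its standard deviation against the threshold."""
        centre_column = gray_crop[:, gray_crop.shape[1] // 2].astype(float)
        gradient = np.gradient(centre_column)
        return float(np.std(gradient)) > threshold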
Meanwhile, the traditional CNN and ResNet methods represent ML-based approaches for QI.
These architectures were selected due to their wide usage and proven effectiveness in image
classification tasks. Both models were trained on a dataset comprising 2503 hole images,
encompassing both positive and negative defects. The training process involved optimizing the
model parameters to minimize the classification error and maximize the accuracy of defect
identification. The traditional CNN and ResNet models achieved average test accuracies of 89.65% and 93.98%, respectively.
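The exact layer configuration of the traditional CNN is not reproduced here; the following is a hedged sketch of a small sequential CNN of the kind described, with illustrative layer sizes and input shape, for the two-class hole classification task.

    from tensorflow.keras import layers, models

    def build_traditional_cnn(input_shape=(64, 64, 1)):
        """A small sequential CNN for two-class hole classification; layer
        sizes and input shape are illustrative, not the thesis architecture."""
        return models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(16, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(64, activation="relu"),
            layers.Dense(2, activation="softmax"),  # positive vs. negative
        ])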
By incorporating these vision-based image processing methods into the automated assembly
system, a comprehensive framework for assembly and defect detection can be established. The
combination of statistical analysis and traditional CNN and ResNet models provides a diverse
range of approaches to cater to various scenarios and requirements. The achieved results highlight
the potential of these methods in accurately identifying and classifying defects, thereby enhancing the overall quality of automated manufacturing.
6.1. Contributions
This research work has made several significant contributions to the field of automated peg-
in-hole assembly and QI. The key contributions of this thesis are summarized as follows:
i. A novel automated peg-in-hole assembly system
The system integrated several key components, including a side-view camera, a top-view
camera, a custom-trained YOLOv5 object detection model, and a feature detection algorithm.
Through the integration of these components, the system achieved the automation of the assembly
process, enabling precise positioning and assembly of the workpiece. The system effectively
detected the holes, aligned the pegs and applied glue before launching the pegs into the
corresponding holes, resulting in successful assembly. Notably, the presented peg insertion tool
introduced a novel concept by incorporating peg storage and the sequential launch of one peg at a
time. This approach eliminated the need for conventional pick-and-place mechanisms, enhancing the speed and efficiency of the assembly process.
ii. Hybrid image processing techniques for effectively executing the hole assemblies
This thesis introduced hybrid image processing techniques for automated assembly systems. The developed hole image extraction process
effectively utilizes conventional techniques, such as the Canny edge detector and Hough circle
transform, to detect and extract hole features. To enhance the accuracy of defect classification,
additional conditions are introduced to filter out false positives, resulting in the generation of
cropped hole images that accurately represent true positives. These enhancements improve the
overall effectiveness and reliability of the defect classification system, further contributing to the robustness of the automated assembly process.
iii. Vision-based defect classification methods
This thesis presents three distinct methods for classifying positive and negative defects in
assembly products. The statistical model introduces a novel approach to defect classification by
leveraging conventional image processing techniques. The traditional CNN and ResNet models
provide ML-based approaches for QI. Through rigorous training and testing, these methods
demonstrate promising results, highlighting their potential for accurate defect identification. The
statistical model utilizes gradient and standard deviation calculations to capture pixel value
variations, achieving an average test accuracy of 97.02%. The CNN and ResNet models showcase average test accuracies of 89.65% and 93.98%, respectively.
6.2. Limitations and assumptions
Though this research work has made significant progress in the development of the automated peg-in-hole assembly system and the vision-based defect classification methods, it is important to acknowledge its limitations and underlying assumptions.
The workpieces and pegs are assumed to be defect-free components, adhering to specified
dimensions within a tolerance of ±0.1 mm. The surface finish across all workpieces and pegs is
also presumed to be consistent. Environmental conditions such as temperature and humidity during
the assembly operations are held constant and fall within typical industrial parameters, ensuring
no external influence on the material properties or the assembly process itself. Additionally,
potential external disturbances like vibrations that could affect the peg-in-hole assembly process are assumed to be absent.
The assembly system is designed to accommodate various workpiece variations. For optimal
operation, specific parameters, such as the number of holes per workpiece, the number of rows of
holes, the displacement between rows, and any patterns associated with hole positions, must be
adjusted in the program to reflect these variations. In its present configuration, the system can
manage workpieces with a maximum diameter of 125 mm and a length of up to 150 mm. To
facilitate larger workpieces, there would be a need to expand the machine's dimensions.
Additionally, if pegs of different dimensions are to be used, the peg insertion tool must undergo
modifications. As it stands, the tool is specially designed to cater to the peg of a specific size. Any
deviation from this size would necessitate design adjustments to ensure seamless operation and
accurate assembly.
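As an illustration, these operating parameters could be grouped into a configuration object such as the hypothetical sketch below. All names and defaults are illustrative; only the 125 mm diameter and 150 mm length envelope limits are taken from the text.

    from dataclasses import dataclass

    @dataclass
    class WorkpieceConfig:
        """Hypothetical per-workpiece parameters for the assembly program."""
        holes_per_workpiece: int        # total number of holes
        hole_rows: int                  # number of rows of holes
        row_displacement_mm: float      # displacement between adjacent rows
        hole_pattern: str = "aligned"   # pattern associated with hole positions
        diameter_mm: float = 100.0
        length_mm: float = 120.0

        MAX_DIAMETER_MM = 125.0         # current machine envelope (from the text)
        MAX_LENGTH_MM = 150.0

        def validate(self) -> None:
            if (self.diameter_mm > self.MAX_DIAMETER_MM
                    or self.length_mm > self.MAX_LENGTH_MM):
                raise ValueError("Workpiece exceeds the current machine envelope.")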
The performance of the image processing algorithms and defect classification methods
discussed in this research is dependent on the availability of a minimum light source. Adequate
illumination is essential to ensure clear and high-quality image capture, which is crucial for
accurate detection of workpieces, holes and defects. Insufficient lighting conditions can lead to
reduced image clarity, increased noise and inaccurate identification of assembly components.
Therefore, it is recommended to maintain a minimum light source level that meets the system’s
requirements to ensure optimal performance and reliable results. Future researchers can explore
techniques to enhance the system’s adaptability to varying lighting conditions and develop more illumination-robust image processing algorithms.
The developed automated assembly system and the proposed defect classification methods are
implemented and evaluated using a prototype. The performance and robustness of the system may
vary when scaled up to an industrial-level machine. Further research and development are required
to address the challenges associated with scaling up the system, including the integration of
industrial motors, enhancing the reloader’s capacity and organizing the electrical wiring for reliable long-term operation.
The defect classification models assume that all defects can be accurately represented and
identified through the proposed image processing and ML techniques. However, certain complex
defects or variations in the assembly process may not be adequately captured or classified by the
current models. For instance, no data on partially inserted pegs are provided, hence the models are
classifying between fully inserted pegs and completely empty holes. The performance of the defect
classification methods may be affected by the complexity and variability of defects encountered in practice. Although computational efficiency is improved through measures such as cropping the side-view camera image and employing optimized algorithms, there remain potential areas for further improvement. This study did not include specific metrics to quantify the computational requirements of the image processing and defect classification methods, which would be needed to ensure performance in real-time settings; attention should also be paid to computer overheating, because prolonged operation under demanding conditions may pose risks. Future researchers should focus on optimizing these aspects, taking into account both efficiency and system reliability, to enhance the overall effectiveness and practicality of the automated peg-in-hole assembly system.
By recognizing these limitations and assumptions, future researchers can build upon this work
and address these challenges to advance the field of automated assembly and QI. These
considerations provide valuable insights into the practical implementation and further development of such systems.
6.3. Future work
The focus of this thesis has been the development of a lab-scale prototype for automated peg-
in-hole assembly, aimed at industrial applications. However, to achieve the ultimate goal of an
industrial-scale machine capable of robust, complete and reliable long hours of operation to meet production demands, several enhancements remain necessary.
ML-based computer vision continues to be an area of extensive research across various institutions. This dynamic and rapidly evolving field continually
introduces improved models each year, driven by advancements in deep learning, neural networks
and data augmentation methodologies. It is worth noting that the ML architectures utilized for QI
in this study represent fundamental models, and it is evident that numerous advanced models have
already surpassed their performance. Consequently, a significant aspect of our future work will
focus on exploring and implementing these cutting-edge techniques to enhance the QI system’s
capabilities.
Although both the assembly and QI systems have been developed independently as presented,
the integration of these two tasks into a single system remains pending. Additional algorithms are
required to enable simultaneous assembly and QI, including a mechanism for counting the number of inserted pegs.
To fulfill the project’s scope of an industrial-scale machine suitable for installation and operation in a production environment, the following upgrades are proposed:
a. Production of a metallic peg insertion tool for improved robustness and reliability.
b. Integration with a robotic arm and gripper to enable automatic placement of the workpiece.
reliability.
time.
e. Addition of a gauge sensor to the reloader for automatic pausing and resuming of operation.
Upon completion of the upgrades and production of the industrial-scale machine, validation
and optimization steps must be undertaken. This includes rigorous testing of the system’s
performance, assessing its robustness and reliability under various operating conditions, and fine-tuning its parameters for optimal performance.
By addressing these future work areas, the development of an industrial-scale automated peg-
in-hole assembly system will be advanced, providing a highly robust, reliable and efficient solution for modern manufacturing.
References
[1] L. Zhao, T. R. Dunne, J. Ren and P. Cheng, "Dissolvable Magnesium Alloys in Oil
[2] J. Su, R. Li, H. Qiao, J. Xu, Q. Ai and J. Zhu, "Study on dual peg-in-hole insertion
[3] H. Park, J. Park, D.-H. Lee, J.-H. Park, M.-H. Baeg and J.-H. Bae, "Compliance-
[5] S. Son, H. Park and K. H. Lee, "Automated laser scanning system for reverse
[6] J. Beyerer, F. P. León and C. Frese, Machine Vision: Automated Visual Inspection: Theory, Practice and Applications, Berlin: Springer, 2016.
[7] Health Canada, "Radiation Protection and Safety for Industrial X-Ray Equipment,"
Vegas, 2020.
[9] S. Wang, G. Chen, H. Xu and Z. Wang, "A Robotic Peg-in-Hole Assembly Strategy
Based on Variable Compliance Center," IEEE Access, vol. 7, no. 7, pp. 167534-167546,
2019.
compliant micro gripper for robotic peg-in-hole assembly," in 2013 IEEE International
pipe images," Automation in Construction, vol. 15, no. 1, pp. 58-72, 2006.
[12] X. Zheng, J. Chen, H. Wang, S. Zheng and Y. Kong, "A deep learning-based
approach for the automated surface inspection of copper clad laminate images," Applied
[13] S. A. Singh and K. A. Desai, "Automated surface defect detection framework using
[14] F. Wei, G. Yao, Y. Yang and Y. Sun, "Instance-level recognition and quantification
for concrete surface bughole based on deep learning," Automation in Construction, vol.
107, 2019.
[15] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[16] D. Wang and Y. Shang, "A new active labeling method for deep learning," in 2014 International Joint Conference on Neural Networks (IJCNN), 2014.
[18] J. Song, Q. Chen and Z. Li, "A peg-in-hole robot assembly system based on Gauss
[19] W. Deng, C. Zhang, Z. Zou, M. Gao, X. Wang and W. Yu, "Peg-in-hole assembly
of industrial robots based on Object detection and Admittance force control," in 2021
2021.
and A. Zehnder, "CCD and CMOS sensors," in Observing Photons in Space: A Guide
to Experimental Space Astronomy, New York, NY, Springer, New York, NY, 2013, pp.
423-442.
[21] K. C. P. Wang and X. Li, "Use of Digital Cameras for Pavement Surface Distress
Survey," Transportation Research Record, vol. 1675, no. 1, pp. 91-97, 1999.
[22] N. M. Tuan, Y. Kim, J.-Y. Lee and S. Chin, "Automatic Stereo Vision-Based
stereo-vision system," in Proc. SPIE 7390, Modeling Aspects in Optical Metrology II,
73900X, 2009.
[24] K. Schmid, T. Tomic, F. Ruess, H. Hirschmüller and M. Suppa, "Stereo vision based
Images by Using the Averaging Filter Name COV," in Intelligent Information and
[26] A. Jain and R. Gupta, "Gaussian filter threshold modulation for filtering flat and
[27] J. Pan, X. Yang, H. Cai and B. Mu, "Image noise smoothing using a modified
[28] M. Piovoso and P. A. Laplante, "Kalman filter recipes for real-time image
[29] Y. Zhang, J. Y. Fuh, D. Ye and G. S. Hong, "In-situ monitoring of laser-based PBF
via off-axis vision and image processing approaches," Additive Manufacturing, vol. 25,
[30] P. Sahoo, S. Soltani and A. Wong, "A survey of thresholding techniques," Computer
Vision, Graphics, and Image Processing, vol. 41, no. 2, pp. 233-260, 1988.
[31] V. Baligar, L. Patnaik and G. Nagabhushana, "Low complexity, and high fidelity
image compression using fixed threshold method," Information Sciences, vol. 176, no.
[32] R. P. Singh and M. Dixit, "Histogram Equalization: A Strong Technique for Image
[35] Z. Guo, L. Zhang and D. Zhang, "A Completed Modeling of Local Binary Pattern
Operator for Texture Classification," IEEE Transactions on Image Processing, vol. 19,
[36] F. Roberti de Siqueira, W. Robson Schwartz and H. Pedrini, "Multi-scale gray level
co-occurrence matrices for texture description," Neurocomputing, vol. 120, pp. 336-345,
2013.
[38] S. Hemalath, U. D. Acharya, A. Renuka and P. R. Kamath, "A Secure Color Image
[39] J. Seo, S. Chae, J. Shim, D. Kim, C. Cheong and T.-D. Han, "Fast Contour-Tracing
Algorithm Based on a Pixel-Following Method for Image Sensors," Sensors, vol. 16, p.
353, 2016.
[40] J. Illingworth and J. Kittler, "A survey of the hough transform," Computer Vision,
Graphics, and Image Processing, vol. 44, no. 1, pp. 87-116, 1988.
[41] R. K. Sabhara, C.-P. Lee and K.-M. Lim, "Comparative study of hu moments and
zernike moments in object recognition," SmartCR, vol. 3, no. 3, pp. 166-173, 2013.
[42] Y. Mingqiang, K. Kidiyo and R. Joseph, "A survey of shape feature extraction
Krpalkova, D. Riordan and J. Walsh, "Deep Learning vs. Traditional Computer Vision,"
[44] Z. Zou, K. Chen, Z. Shi, Y. Guo and J. Ye, "Object Detection in 20 Years: A
Survey," Proceedings of the IEEE, vol. 111, no. 3, pp. 257-276, 2023.
[45] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
Systems, 2012.
One-stage and Two-stage Object Detection," Library Philosophy and Practice, pp. 1-
32, 2021.
[48] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[49] S. Ren, K. He, R. B. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in Advances in Neural Information Processing Systems, 2015.
[50] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan and S. Belongie, "Feature Pyramid Networks for Object Detection," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[51] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[52] S. Lu, B. Wang, H. Wang, L. Chen, M. Linjian and X. Zhang, "A real-time object
detection algorithm for video," Computers & Electrical Engineering, vol. 77, pp. 398-
408, 2019.
[53] M. J. Tsai, H.-W. Lee and N.-J. Ann, "Machine vision based path planning for a
[57] Z. Jin, Z. Zhang and G. X. Gu, "Autonomous in-situ correction of fused deposition
modeling printers using computer vision and deep learning," Manufacturing Letters, vol.
Ceramic Tiles Defect Detection," in IECON 2006 - 32nd Annual Conference on IEEE
[59] R. Ren, T. Hung and K. C. Tan, "A Generic Deep-Learning-Based Approach for
Automated Surface Inspection," IEEE Transactions on Cybernetics, vol. 48, no. 3, pp.
929-940, 2018.
[60] X. Feng, X. Gao and L. Luo, "A ResNet50-Based Method for Classifying Surface
systems for signal, image and video technology, vol. 38, no. 1, pp. 35-44, 2004.
Pattern Analysis and Machine Intelligence, Vols. PAMI-8, no. 6, pp. 679-698, 1986.
[65] R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and
curves in pictures," Communications of the ACM, vol. 15, no. 1, p. 11–15, 1972.
[68] J. Illingworth and J. Kittler, "The Adaptive Hough Transform," IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vols. PAMI-9, no. 5, pp. 690-698, 1987.
[69] H.-C. Lee, E. J. Breneman and C. P. Schulte, "Modeling light reflection for
[70] O. R. Vincent and O. Folorunso, "A Descriptive Algorithm for Sobel Image Edge
2009, 2009.
[72] W. Lihao and D. Yanni, "A Fault Diagnosis Method of Tread Production Line
[73] P. Wei, C. Liu, M. Liu, Y. Gao and H. Liu, "CNN-based reference comparison
method for classifying bare PCB defects," The Journal of Engineering, vol. 2018, no.
based on ResNet-50 for beef quality classification," Information Sciences Letters, vol.
[75] C. Gambella, B. Ghaddar and J. Naoum-Sawaya, "Optimization problems for
machine learning: A survey," European Journal of Operational Research, vol. 290, no.
3, 2021.
Learning With Big Data: Challenges and Approaches," IEEE Access, vol. 5, pp. 7776-
7797, 2017.
2023].