electronics

Article
An Enhanced Detection Method of PCB Defect Based on
Improved YOLOv7
Yujie Yang and Haiyan Kang *

School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China
* Correspondence: [email protected]

Abstract: Printed circuit boards (PCBs) are a critical component of modern electronic equipment,
playing a crucial role in the electronic information industry chain. However, accurate detection of
PCB defects can be challenging. To address this problem, this paper proposes an enhanced detection
method based on an improved YOLOv7 network. First, the SwinV2_TDD module is proposed, which
adds a convolutional layer to extract the local features of the PCB. Then, the Magnification Factor
Shuffle Attention (MFSA) mechanism is introduced, which adds a convolutional layer to each branch
of the Shuffle Attention (SA) to expand its depth and enhance the adaptability of the attention
mechanism. The SwinV2_TDD module and MFSA mechanism are integrated into the YOLOv7 network,
replacing some ELAN modules and changing the activation function to Mish. The evaluation indexes
used are Precision (P), Recall (R), and mean Average Precision (mAP). Experimental results show that
the enhanced method achieves an AP of 98.74%, indicating a significant improvement in PCB defect
detection performance.

Keywords: deep learning; printed circuit boards; YOLOv7; target detection; swin transformer;
attention mechanism

Citation: Yang, Y.; Kang, H. An Enhanced Detection Method of PCB Defect Based on Improved YOLOv7.
Electronics 2023, 12, 2120. https://doi.org/10.3390/electronics12092120
Academic Editor: Mohamed Benbouzid
Received: 24 February 2023
Revised: 27 April 2023
Accepted: 4 May 2023
Published: 6 May 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
Electronics 2023, 12, 2120. https://doi.org/10.3390/electronics12092120 https://www.mdpi.com/journal/electronics

1. Introduction
The printed circuit board (PCB) holds immense importance in the electronic industry
as a crucial component for the development of electronic products. PCBs are becoming
increasingly integrated and smaller [1] due to the excellent craftsmanship, precise wiring,
and rapid development of integrated circuits. However, with the reduction in size, defects
in the PCBs are also getting smaller and more challenging to detect. Therefore, it is
imperative to conduct a thorough defect detection process during PCB-related production
to improve product quality and reduce company costs.
The conventional methods of detecting defects in PCBs are classified into three
categories: manual visual inspection, electrical testing, and optical inspection [2]. Manual
visual inspection involves workers inspecting bare PCBs directly using their eyes and
other equipment. However, this method has become inadequate due to the increasing
demand for higher precision in PCB development, as it has poor detection stability and low
efficiency. On the other hand, electrical testing employs contact testing to detect defects in
bare PCBs, which requires complex testing circuits, expensive molds, and fixtures for each
batch of PCBs. This method is also limited in detecting multi-layer PCBs and poses a risk
of secondary damage. In contrast, automated optical inspection (AOI) is a non-contact
inspection method that uses machine vision technology and image processing algorithms [3].
Industrial cameras capture images of the PCBs, which are transmitted to a computer that
provides feedback on the defect detection results. AOI is more stable and accurate than the
previous methods, with a faster detection speed [4], and does not impact the PCB.
The advancement of deep learning has led to the development of contactless automatic
detection methods, which have become a popular area of research due to their strong
recognition adaptability and generalization ability. Typically, deep learning-based detection
networks can be categorized into one-stage and two-stage networks. The one-stage network
includes Single Shot Detector (SSD) [5], and You Only Look Once (YOLO) [6]. In contrast,
the two-stage network includes regions with convolutional neural networks (R-CNN) [7],
Fast R-CNN [8], and Faster R-CNN [9], which is an improved version of R-CNN. The
primary difference between these networks is that the one-stage network directly predicts
the location and category of defects in the network after feature extraction, while the
two-stage network first generates proposals that may include defects, then conducts the
detection process. Specifically, the two-stage network generates candidate boxes of different
sizes that may contain defect features, then performs target detection to predict defect
classes and locations. However, the detection speed is slow due to the generation of many
candidate frames. On the other hand, the one-stage network performs both training and
detection in a single network without the need for explicit region proposals, resulting in
faster detection speed. This paper adopts the one-stage network based on YOLOv7 [10]
and improves it to meet real-time performance requirements in the industrial field.
The Swin Transformer v2 [11] is designed to overcome three significant challenges
in large visual model training and application, namely, model instability, the resolution
gap problem, and a chronic lack of labeled data. To address these challenges, the Swin
Transformer v2 proposes three primary methods. First, it combines cosine attention and
post-normalization to enhance model stability. Second, it introduces a log-spaced continuous
position bias (location deviation) method, which enables a model trained on low-resolution
images to be transferred to its higher-resolution counterparts. Lastly, it introduces SimMIM,
a self-supervised pretraining method that reduces the need for large amounts of labeled
data. To improve the global feature extraction and stability of the model, SwinV2_CSPB
modules can replace some ELAN modules in the YOLOv7 backbone network.
The attention mechanism is a widely used method to enhance model performance.
Typically, attention weight is obtained by calculating the importance of each position in the
input sequence. Shuffle Attention (SA) [12] improves upon this method by shuffling and
reordering the input sequence, then calculating the importance of each position to obtain
the attention weight. Compared with the traditional attention mechanism, it increases
computational efficiency by using a new calculation method that reduces the amount of
computation required to calculate attention weight. Additionally, SA can enhance the
model’s generalization ability, resulting in more consistent performance on both training
and test data.
The main contributions of this paper are as follows:
(1) The Swin Transformer v2 has been further enhanced with the SwinV2_TDD (Tiny
Defect Detection) structure, which involves adding a convolutional layer and an upsampling
layer at the beginning of each stage in the Swin Transformer v2. This is to extract local
features of PCBs and prevent excessive compression of feature maps, thereby improving
the accuracy of detecting small defects.
(2) The Magnification Factor Shuffle Attention (MFSA) mechanism is introduced as a
solution to the issue of gradient vanishing in the attention calculation of SA, which is based
on a simple, fully connected layer. MFSA proposes adding a 1 × 1 convolutional layer to
expand the network’s layers and introducing a scaling factor to adjust the model’s perception
of data dynamically. This improvement enhances the model’s ability to effectively
capture long-range dependencies and improves its generalization ability.
(3) The SwinV2_TDD structure and the MFSA mechanism are integrated into the
backbone network of YOLOv7 to enhance its performance in detection on PCBs. The
SwinV2_TDD structure is used to replace some of the ELAN modules. The activation
function is changed to Mish, which improves the model’s nonlinear expression capability.
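As a minimal illustration of the activation change mentioned in contribution (3), the sketch below evaluates Mish next to SiLU, the activation used in the CBS block of YOLOv7. It uses the standard definition Mish(x) = x · tanh(ln(1 + e^x)); the function names are illustrative and are not taken from the authors' code.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


if __name__ == "__main__":
    # Both functions are smooth and non-monotonic near zero; print a few samples.
    for v in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={v:+.1f}  mish={mish(v):+.4f}  silu={silu(v):+.4f}")
```

Both functions pass small negative values through instead of clipping them to zero, which is the property usually cited when Mish is preferred over ReLU-style activations.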
The rest of this paper is organized as follows: Section 2 reviews related
work in PCB defect detection. Section 3 presents the three main techniques used in the
enhanced method and provides their formulations. In Section 4, the performance of the
proposed method is evaluated through ablation experiments. Finally, Section 5 concludes
the paper.

2. Related Work
The conventional approach of detecting defects by manual visual inspection has
drawbacks, such as high cost, low efficiency, and errors in detection. As an alternative, the
electrical properties of components can be leveraged for detecting defects in printed circuit
boards (PCBs) through a semi-automatic, manual detection method that includes online
and functional testing [13]. Researchers have explored various techniques to enhance
this method, such as compressing images using wavelet transform to reduce memory
and computation requirements [14], using traditional machine learning algorithms for
defect detection [15], and designing low-complexity neural network and machine vision
schemes to improve defect detection [16]. Other approaches include using Fourier image
reconstruction to identify small defects [17] and ultrasonic laser thermal imaging for real-
time defect detection [18]. Although these methods can reduce costs compared to manual
detection, their limited application is attributed to factors, such as the non-reusability of
the test process, the high cost of equipment, and complex writing functions, among others.
Machine vision detection methods have emerged as a viable solution to overcome
the shortcomings of traditional manual detection methods and are increasingly being
applied in modern industries [19]. There are three primary categories of PCB defect
detection methods based on machine vision: reference, non-reference, and hybrid methods.
The reference method [20] typically involves image segmentation techniques to detect
defects. For example, Li et al. compared PCB images with and without defects to identify
defects [21]. Non-reference methods [22] mainly rely on machine learning algorithms for
defect detection. For instance, Malge et al. employed an image segmentation algorithm
to detect PCB defects [23]. The hybrid method [24] combines reference and non-reference
methods to achieve more accurate defect detection. For example, Ray et al. developed
a hybrid detection method by comparing PCB images and using image segmentation
techniques [25]. Image segmentation techniques include threshold segmentation, edge
segmentation, and region segmentation methods. For example, Ardhy et al. [26] used
the adaptive Gaussian threshold segmentation to achieve rapid detection with minimal
parameters, but the detection efficacy varied significantly in different areas with light strips.
Baygin et al. [27] used Hough transform for edge segmentation and combined it with
the Canny operator to enhance detection efficiency. Ma et al. [28] improved the region
growth algorithm for region segmentation to achieve better detection outcomes. However,
these methods require manual tuning of model parameters, which may lead to suboptimal
accuracy and efficiency.
Recent studies have demonstrated that the accuracy of automated optical inspection
(AOI) is higher compared to other methods. However, due to the system’s high sensitivity,
it has very strict parameter-setting rules and may miss some cases, necessitating manual
screening after machine screening is complete [29]. Meanwhile, deep learning technology
has been rapidly advancing. Target defect detection methods based on deep learning have
been shown to be highly accurate and fast, and they do not require manual screening. Thus, they are
more cost-effective and efficient. Moreover, the parameter-setting rules are not as strict as
those in the AOI system. As a result, deep learning-based methods are being increasingly
studied and applied in various industries.
Due to advancements in computing technology, complex operations have become
more affordable, resulting in the rapid development of neural networks, including a large
number of deep neural networks. In the field of PCB defect detection, many scholars have
applied deep learning techniques. DenseNet [30] achieved better performance with fewer
parameters and computing costs by densely connecting all front and back layers to enable
feature reuse. Huang et al. [31] improved detection accuracy and efficiency by designing a
convolutional neural network that connects each layer in a feedforward manner. Compared
to conventional machine vision methods, deep learning algorithms have stronger nonlinear
abilities, higher robustness, and are applicable to more complex scenarios. He [32] proposed
an improvement measure that helped achieve a 96.91% accuracy rate. Geng et al. [33]
improved the detection accuracy to 96.65% by using focal loss and ResNet50 as the backbone
network. Ding et al. [34] designed TDD-net, a detection network specifically aimed at tiny
PCB defects, which adopted a multi-scale fusion strategy and applied online hard example
mining to enhance the certainty of ROI proposals, resulting in a detection accuracy of
98.90%. Sun et al. [35] proposed the Inception-ResNet-v2 model, which improved the PCB
detection accuracy by adding an SE module to part of the structure. Hu et al. [36] presented
UF-Net, which retained more defect target information by using the Skip Connect method
and achieved a detection accuracy of 98.6%. Li et al. [37] improved the mAP value to
98.71% by replacing the convolution layer in the trunk with the residual structure unit CSP
based on the YOLOv4 algorithm. Wang et al. [38] proposed a lightweight model that used
the ShuffleNetV2 structure in the YOLOv5 backbone and achieved an accuracy of 95%.
YOLOv7, as a classic representative of the target detection algorithm, has surpassed the
previous YOLO series in detection speed and accuracy.
This paper proposes an improved PCB defect detection method based on the study
of the algorithms discussed above. The proposed method is based on the YOLOv7
algorithm and achieves higher accuracy. The specific improvements include applying the
SwinV2_TDD structure in the backbone network to enlarge resolution, improve model
stability, and extract local features of PCB images better. The proposed MFSA
mechanism effectively combines spatial attention and channel attention to enhance target feature
information and dynamically adjust the model’s perception of data. Additionally, the
activation function is changed to Mish to improve training stability and final accuracy. The
experiment shows that the proposed enhanced detection method performs better in PCB
defect detection.

3. Materials and Methods


3.1. YOLOv7 Description
The YOLOv7 algorithm is a one-stage target detector that excels in both speed and
accuracy within the 5 FPS to 160 FPS range. Its main contributions include model
reparameterization, which is introduced into the network architecture, a label allocation strategy
that adopts cross-grid search and matching, and an efficient network architecture based
on ELAN. The algorithm also includes an auxiliary head training method that accepts a
higher training cost in exchange for better accuracy without affecting the inference time, as the
auxiliary head only appears in the training process. The YOLOv7 network comprises three
parts, and the specific structure is illustrated in Figure 1.
Input: To prepare input images for the network, several preprocessing steps are
performed. These steps include random scaling, cropping, and splicing to enrich the dataset
and add small targets to make the network more robust. The best anchor is adaptively
calculated from the training sets. The images are then uniformly scaled to a standard size
and input to the backbone network.
Backbone: This module is extensively utilized for feature extraction and comprises
CBS, ELAN, and MP-1 structures. CBS, which is composed of Conv + BN + SiLU, is primarily
employed for image channel alteration, feature extraction, and image downsampling.
ELAN enhances network robustness by extracting additional features and controlling the
shortest and longest gradient paths. ELAN comprises two branches, with the first branch
passing through a 1 × 1 convolution module and the second branch through four 3 × 3
convolution modules to extract features, then the four features are merged to obtain the final
feature extraction result. Based on ELAN design, E-ELAN employs merge cardinality,
shuffle, and expand techniques to enhance network learning capacity while preserving the
initial gradient path. MP involves two branches, max pooling and Conv with stride = 2,
that are used simultaneously for image downsampling, and the number of channels before
and after remains constant.
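The downsampling behaviour of the MP structure can be checked with simple shape arithmetic. The sketch below is only an illustration: the exact layer layout (each branch producing half of the channels, a 3 × 3 kernel with padding 1 in the convolution branch, and concatenation at the end) is an assumption that the text above does not spell out, and the helper names are hypothetical.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def mp_block_shape(channels: int, height: int, width: int) -> tuple:
    """Output (C, H, W) of an MP block under the assumed layout.

    Assumed layout (illustrative, not confirmed by the text):
      branch 1: 2x2 max pooling with stride 2, then a 1x1 conv to C/2 channels
      branch 2: 1x1 conv to C/2 channels, then a 3x3 conv, stride 2, padding 1
      output:   channel-wise concatenation of both branches
    """
    if channels % 2 != 0:
        raise ValueError("channel count must be even to split across two branches")
    half = channels // 2
    # Branch 1: only the pooling changes H and W; the 1x1 conv keeps them.
    pool_hw = (conv_out_size(height, 2, 2, 0), conv_out_size(width, 2, 2, 0))
    # Branch 2: only the stride-2 3x3 conv changes H and W.
    conv_hw = (conv_out_size(height, 3, 2, 1), conv_out_size(width, 3, 2, 1))
    if pool_hw != conv_hw:
        raise ValueError("branches disagree on spatial size; use even height and width")
    # Concatenating two C/2-channel maps restores the original channel count.
    return (2 * half, pool_hw[0], pool_hw[1])


if __name__ == "__main__":
    print(mp_block_shape(256, 80, 80))
```

Under these assumptions, height and width are halved while the channel count is unchanged, which matches the statement that the number of channels before and after the MP structure remains constant.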
Head: The backbone network persists in producing three-layer feature maps with
varying sizes. The RepVGG block and Conv are followed by the prediction of three image
detection tasks: classification, background classification, and frame. Auxiliary head training
and positive and negative sample matching strategies are employed to enhance the overall
performance of the model.

Figure 1. YOLOv7 Network structure.

3.2. Improved Swin Transformer v2
3.2.1. Swin Transformer v2
Deep learning networks often encounter challenges during training and application,
such as (1) visual models being prone to large-scale instability; (2) high-resolution images
or windows being required for many downstream visual tasks; and (3) high graphics
processing unit (GPU) memory consumption when dealing with large images and high
resolutions. To tackle these issues, Liu et al. proposed the Swin Transformer technology [11],
which includes (1) post-normalization technology and scaling cosine attention to enhance
the stability of large visual tasks; and (2) a log-spaced continuous location deviation method
to enable the model trained on coarse images to be applied to higher-resolution counterparts.
The specific structure is depicted in Figure 2. Furthermore, zero redundancy optimizers,
activation checkpoints, and sequential self-attention calculations can significantly reduce
GPU memory consumption. By training a Swin Transformer model using these methods,
it can be applied to large visual tasks, including those involving high-resolution images,
while mitigating model instability and GPU memory consumption. The structure of Swin
Transformer v2 is shown in Figure 2.

Figure 2. Swin Transformer v2 structure. (Pipeline shown: Images, Patch Partition; Stage 1: Linear
Embedding, Swin Transformer Block × 2; Stage 2: Patch Merging, Swin Transformer Block × 2;
Stage 3: Patch Merging, Swin Transformer Block × 6; Stage 4: Patch Merging, Swin Transformer
Block × 2.)

The structure of the Swin Transformer Block is shown in Figure 3.

Figure 3. Swin Transformer Block structure. (Components shown: Layer Norm, MLP, Attention with
Softmax over q, k, and v, and the Log-CPB module that produces the bias b.)

The reason for the instability in the training process is the discrepancy in activation
amplitudes across layers, caused by adding the output of each residual unit directly back to
the main branch. To address this issue, the Swin Transformer V2 relocates the LN layer from
the beginning of each residual unit to its end (post-normalization). Specifically, for the
largest models (Swin V2-H and Swin V2-G), an additional LN layer is added to every six
Transformer modules to ensure training stability. While the attention of pixel pairs is
typically computed by taking the dot product of the query and key vectors, this method often
leads to a few pixel pairs dominating the attention maps of some blocks and heads. To mitigate
this problem, a scaled cosine attention method is proposed that calculates the attention of
pixel i and pixel j by scaling the cosine similarity:

$$\mathrm{Sim}(q_i, K_j) = \cos(q_i, K_j)/\pi + B_{ij} \tag{1}$$

In this context, the variable π is a learnable scalar that is not shared across heads or
layers and typically has a value greater than 0.01, and $B_{ij}$ represents the relative position
deviation between pixel i and pixel j. The cosine function, due to its natural normalization,
results in milder attention values. Rather than directly optimizing the deviation parameters,
the continuous relative position deviation method employs a small meta-network on the
relative coordinates:

$$B(\Delta x, \Delta y) = g(\Delta x, \Delta y) \tag{2}$$
The symbol g in this equation represents a small meta-network, namely a two-layer
multi-layer perceptron (MLP) with a ReLU activation function in between. $(\Delta x, \Delta y)$
corresponds to the scaled coordinate of linear space, and the new deviation is learned from the
original deviation. If the deviation is parameterized and trained directly and the pre-trained
bias parameters are not used, then performance may suffer when the window is scaled up to
high resolution. The network g generates a deviation value for any relative position, which
makes it suitable for fine-tuning tasks with arbitrarily varying window sizes. During
inference, the bias value for each relative position can be calculated in advance and saved as
a model parameter. This ensures that inference remains the same as with the original
parameterized bias method.
When the window size changes significantly, a large proportion of the relative coordinate
range must be extrapolated. To address this challenge, logarithmic space is used instead of
linear space:

$$\widehat{\Delta x} = \mathrm{sign}(x) \cdot \log(1 + |\Delta x|) \tag{3}$$

$$\widehat{\Delta y} = \mathrm{sign}(y) \cdot \log(1 + |\Delta y|) \tag{4}$$

In this context, $(\widehat{\Delta x}, \widehat{\Delta y})$ represents the coordinate in logarithmic
space. The use of logarithmic interval coordinates results in a significantly reduced
extrapolation ratio when passing relative position deviations compared to using the original
linear interval coordinates.
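To make Equations (1), (3), and (4) concrete, the following sketch computes a scaled cosine similarity for one query/key pair and compares how far the relative coordinates must be extrapolated in linear and in logarithmic space. It is a hedged illustration rather than the authors' implementation: the bias $B_{ij}$ is passed in as a plain number instead of coming from the meta-network g, the clamp at 0.01 simply mirrors the lower bound mentioned in the text, and the 8 × 8 to 16 × 16 window change is a hypothetical example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def scaled_cosine_sim(q_i, k_j, scale, bias_ij):
    """Equation (1): cos(q_i, K_j) / scale + B_ij.

    `scale` plays the role of the learnable scalar in Equation (1). Clamping it
    at 0.01 is an assumption that mirrors the lower bound stated in the text.
    """
    return cosine(q_i, k_j) / max(scale, 0.01) + bias_ij


def log_spaced(delta):
    """Equations (3)-(4): sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def extrapolation_ratio(old_window, new_window, transform):
    """Extra coordinate range needed by the new window, relative to the old range.

    The largest relative offset inside a window of size w is w - 1.
    """
    old_max = transform(old_window - 1)
    new_max = transform(new_window - 1)
    return (new_max - old_max) / old_max


if __name__ == "__main__":
    # Vector magnitude does not matter: only the angle between q and k does.
    print(scaled_cosine_sim([1.0, 0.0], [50.0, 0.0], scale=0.1, bias_ij=0.0))
    # Hypothetical window change from 8x8 to 16x16.
    print("linear:", extrapolation_ratio(8, 16, float))
    print("log-spaced:", extrapolation_ratio(8, 16, log_spaced))
```

For this hypothetical window change, the linear coordinates must be extrapolated by 8/7 of the original range, whereas the log-spaced coordinates need only about one third, which is the reduction in extrapolation ratio described above.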

3.2.2. SwinV2_TDD
To better capture local information and accurately locate defect areas in PCB defect
detection, Swin Transformer v2 has been improved. As defects tend to be concentrated in
local areas, greater attention is required for local information. The first improvement
involves adding a convolutional layer with a 3 × 3 kernel to the head of each stage of Swin
Transformer v2. Additionally, a pooling operation with a 2 × 2 max pooling operation is
added to each stage’s convolutional layer to reduce the size of feature maps and improve
the computation efficiency. This results in faster feature extraction while maintaining
high-quality feature extraction and reducing memory consumption.
To prevent excessive compression of feature maps and enhance the feature’s expressive
power, an upsampling layer with a 2 × 2 upsampling operation is added after each
stage’s convolutional layer to restore the feature map to its original size. This ensures that
the feature maps are not excessively compressed, thus, improving expressive power. The
improved Swin Transformer v2 structure is illustrated in Figure 4, and this enhanced
model is referred to as SwinV2_TDD.

Figure 4. SwinV2_TDD structure.
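The size bookkeeping of the pooling and upsampling steps described above can be sketched on a single-channel feature map. This is an illustrative sketch only: the learned 3 × 3 convolution is omitted, nearest-neighbour interpolation is an assumption (the text does not state the upsampling mode), and the function names are hypothetical.

```python
def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 on a 2-D list (odd trailing row/column dropped)."""
    height, width = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def upsample_2x2(fmap):
    """2 x 2 nearest-neighbour upsampling (assumed mode): repeat every value twice per axis."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


if __name__ == "__main__":
    feature_map = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    pooled = max_pool_2x2(feature_map)   # 4 x 4 -> 2 x 2
    restored = upsample_2x2(pooled)      # 2 x 2 -> 4 x 4
    print(pooled)
    print(restored)
```

The restored map has the original height and width, but each 2 × 2 block now carries its local maximum, so the spatial size is recovered while the values themselves are coarser.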

3.3. Improved SA Mechanism


3.3.1. SA
The SA module is a neural network module that processes input features and aims to enhance the performance of convolutional neural networks by establishing relationships between features in both the spatial and channel dimensions. It leverages channel splitting and Shuffle units to integrate channel and spatial attention into each group block for processing input features. The input features of the SA module are a four-dimensional tensor with a shape of [N, C, H, W], where N represents the batch size, C represents the number of channels, and H and W represent the height and width of the input image, respectively. The SA module divides the input feature map into multiple groups and incorporates channel and spatial attention into a block of each group using the Shuffle unit. This reduces the dependence between features and improves computational efficiency through parallel computing.
The channel attention branch of SA utilizes global average pooling (GAP) to generate channel-wise statistics, followed by scaling and shifting the channel vector using a pair of parameters. More specifically, for each subgroup of features, a C-dimensional vector is produced using global average pooling, representing the average value of each channel in that subgroup. Then, an MLP is applied to this vector, resulting in scaling and shifting parameters utilized to scale and shift each channel in the subgroup.
The spatial attention branch generates a tensor of shape H × W × 1 using group normalization, representing the sum of squares of channel values at each spatial position. Then, an MLP is used to process this tensor, generating scaling and shifting parameters applied to each pixel in the subgroup. Finally, the outputs of the spatial and channel attention branches are added together and normalized using batch normalization.
The Shuffle unit is a component in the SA module that swaps channels to reduce coupling between features and improve computational efficiency. It splits the input tensor into two parts, one that requires channel swapping and the other that does not. The sub-tensor channels that need swapping are interleaved and concatenated to achieve channel swapping, increasing data diversity and improving generalization ability while reducing the risk of overfitting.
The SA module then combines the outputs of the channel attention and spatial attention branches, which are normalized with a normalization layer to avoid gradient problems, such as vanishing or exploding gradients, and facilitate better feature learning. The SA process and structure are illustrated in Figure 5.

Figure 5. Shuffle Attention structure.



The SA module applies “channel segmentation” to process sets of sub-features in parallel, dividing the input feature map into groups and integrating channel attention and spatial attention within each group using the Shuffle unit. In the channel attention branch, global average pooling is utilized to create channel-wise statistics, which are then scaled and shifted by a pair of parameters. In the spatial attention branch, spatial statistics are generated using group normalization, then scaled and shifted to create compact features similar to the channel information. The two branches are combined, and the Shuffle unit is used for each sub-feature to capture feature dependencies in both spatial and channel dimensions. Finally, the sub-features are merged using the “channel shuffle” operator to enable message communication among each sub-feature.
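The “channel shuffle” operator mentioned above can be illustrated with a short sketch. It assumes the usual formulation (view the channel axis as groups × channels-per-group, transpose, and flatten), which the text does not spell out; the function name and the channel labels are hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so that each output neighbourhood mixes all groups.

    channels : list of per-channel items, ordered group by group
    groups   : number of groups the channel axis was split into
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Equivalent to reshape(groups, per_group) -> transpose -> flatten.
    return [channels[g * per_group + k]
            for k in range(per_group)
            for g in range(groups)]


# Two groups (a and b) of three channels each.
shuffled = channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2)
```

After the shuffle, consecutive channels come from different groups, which is what allows information to flow between sub-features in the next layer.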

3.3.2. MFSA
In YOLOv7, when the input image size is (640, 640), the model produces three prediction layers of varying sizes, specifically (20, 20), (40, 40), and (80, 80). However, as PCB defects are typically small in size and there are relatively fewer large targets, it is important to focus on improving the recognition accuracy of smaller objects [39]. To achieve this, the paper proposes using the SA mechanism module, which applies weights to different scale prediction layers to highlight the proportion of small-scale targets. This attention mechanism has improved the accuracy of recognizing low-contrast objects in the model.
The SA module is effective at capturing features of different scales and orientations in images by combining interactions between channels and spatial dimensions. However, the attention calculation in SA is limited by the use of a simple, fully connected layer, which may lead to problems such as gradient vanishing. To address this issue, a 1 × 1 convolution layer is added to increase the depth and enhance the adaptive nature of the attention mechanism, allowing the network to better adapt to different tasks and datasets. Additionally, a magnification factor is introduced to dynamically adjust the model’s perception of data, which improves its nonlinear fitting ability and overall accuracy. The optimal value of the magnification factor can be determined through experimentation.
The MFSA mechanism depicted in Figure 6 is achieved by adding a shortcut connection and applying a max pooling layer. The input data is processed, and each channel is multiplied to produce a feature map that completes the original feature relocation of the channel dimension data, resulting in enhanced model performance.

Figure 6. MFSA structure.



The original computation of SA is expressed as follows:

$$f_c = \frac{1}{N}\sum_{i=1}^{N} \alpha_i x_i W_{c,i} \tag{5}$$

where $x_i$ represents the i-th feature map in the input tensor, $N$ represents the number of channels in the input tensor, $\alpha_i$ represents the attention weights, and $W_{c,i}$ represents the weights after channel shuffling. After adding convolutional layers and a scaling factor $s$, the above computation can be expressed as the following formula:

$$f_c = \frac{1}{N}\sum_{i=1}^{N} \alpha_i x_i \left(W_{c,i} + W_{c,j}\right) s \tag{6}$$

where $W_{c,j}$ represents the weights after the convolutional operation, and $s$ represents the scaling factor.
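To make the relationship between Equations (5) and (6) concrete, the following sketch evaluates both with scalar stand-ins for the feature maps and weights. It is only a numerical illustration: the function names are hypothetical, every quantity is reduced to a single float, and the convolutional weight $W_{c,j}$ is paired one-to-one with each term of the sum, which is an assumption because the text does not define how the index j relates to i.

```python
def sa_output(alpha, x, w_shuffle):
    """Equation (5): mean of attention-weighted, shuffled-weight features."""
    n = len(x)
    return sum(a * xi * w for a, xi, w in zip(alpha, x, w_shuffle)) / n


def mfsa_output(alpha, x, w_shuffle, w_conv, s):
    """Equation (6): add the convolutional weights and apply the magnification factor s."""
    n = len(x)
    return sum(a * xi * (ws + wc) * s
               for a, xi, ws, wc in zip(alpha, x, w_shuffle, w_conv)) / n


alpha = [0.5, 0.5]       # attention weights
x = [2.0, 4.0]           # scalar stand-ins for the feature maps
w_shuffle = [1.0, 1.0]   # weights after channel shuffling
w_conv = [1.0, 1.0]      # weights after the added convolution

base = sa_output(alpha, x, w_shuffle)                    # 1.5
magnified = mfsa_output(alpha, x, w_shuffle, w_conv, 3)  # 9.0
```

Setting the convolutional weights to zero and $s = 1$ recovers Equation (5), so the magnification factor acts as a global gain on the combined weights.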

3.4. Change the Activation Function to Mish




Activation functions are crucial components in deep learning: they introduce the nonlinearity that allows complex neural network architectures to learn. In recent years, the Swish activation function has been widely adopted. It is a sigmoid-weighted linear unit that is smooth, non-monotonic, unbounded above, and bounded below. Its computational formula is given below.

$$f(x) = x \cdot \sigma(\beta x) \tag{7}$$

Here, $\sigma(x)$ is the sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{8}$$
$\beta$ is a trainable parameter. When $\beta = 1$, the Swish activation function becomes the SiLU activation function. The YOLOv7 trunk network uses the SiLU [40] activation function in its convolutional layer to avoid the problem of gradient disappearance caused by the saturation of the Sigmoid function. When x is very large, $f(x)$ approaches x, and as x tends to negative infinity, $f(x)$ approaches 0. The Mish activation function combines the nonlinear characteristics of the tanh and softplus functions, and exhibits stronger nonlinear expression ability when the input value is small. The defects on the PCB are relatively small and require precise extraction and processing of details and edge information during detection. The Mish activation function can better capture the detailed information in the image and improve the accuracy of object detection. At the same time, the derivative of the Mish activation function has a shape similar to that of a function with an adaptive slope, which can alleviate the problem of gradient disappearance and improve the training stability and convergence speed of the model. Therefore, the Mish function [41] was used to replace SiLU. The Mish computational formula is as follows:

$$f(x) = x \tanh\left(\ln\left(1 + e^{x}\right)\right) \tag{9}$$
The Mish activation function is unsaturated [42] and has no upper bound, which helps to avoid the problems of gradient disappearance or explosion caused by saturation, resulting in significantly improved training speeds. Additionally, it has a small weight and a lower bound on the negative axis, which helps to prevent the neuron necrosis phenomenon associated with the ReLU function and produces a strong regularization effect. The function is nearly smooth at every point, making it easier to optimize and more generalizable, and it facilitates the flow of information in deeper networks. The Mish activation function is widely used in YOLOv4 [43], demonstrating a 0.494% improvement over Swish and a 1.671% improvement over ReLU. Figure 7 illustrates the graph of the Mish function.
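The activation functions in Equations (7) to (9) can be written directly with the Python standard library, as in the sketch below. The numerically stable forms of the sigmoid and of ln(1 + e^x) are implementation choices added here to avoid overflow for large |x|; they are not part of the formulas in the text.

```python
import math


def sigmoid(x):
    """Equation (8), evaluated in an overflow-safe way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Equation (7): x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def silu(x):
    """SiLU is Swish with beta fixed to 1."""
    return swish(x, beta=1.0)


def softplus(x):
    """ln(1 + e^x), rewritten so that exp never receives a large positive argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Equation (9): x * tanh(ln(1 + e^x))."""
    return x * math.tanh(softplus(x))


samples = [(x, silu(x), mish(x)) for x in (-20.0, -1.19, 0.0, 20.0)]
```

Evaluating the samples shows the behaviour described above: both functions pass through zero, approach x for large positive inputs, and stay slightly negative and bounded for negative inputs.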

Figure 7. Mish function graph.

3.5. Enhanced YOLOv7 Backbone
Based on the characteristics of SwinV2_TDD and MFSA, two ELAN modules in the backbone network of YOLOv7 are replaced with SwinV2_TDD modules, combining the strong modeling capability of the transformer structure with important visual signal priors. Meanwhile, the MFSA mechanism is introduced to enhance the extraction of small-scale features in the image. The original backbone and enhanced backbone structures are shown in Figure 8.

Figure 8. Enhanced backbone structure comparison: (left) original backbone and (right) enhanced backbone.

4. Results
4.1. Experimental Conditions
This paper’s experimental environment is based on the Ubuntu 20.04 LTS operating system. The CPU used is AMD Ryzen 7 5800H, and the GPU used is NVIDIA GeForce RTX 3060. The CUDA 11.7 acceleration library is used, and the PyTorch framework is used for implementation.
4.2. Dataset
In this paper, the Intelligent Robot Open Laboratory of Peking University’s open-source dataset [34] is utilized, which includes six types of common defects: missing hole, mouse bite, open circuit, short circuit, spur, and spurious copper, as shown in Figure 9. The dataset comprises a total of 693 images, each containing 3 to 5 defects. The image size is 600 × 600.

Figure 9. Diagram of six defect types: (a) missing hole; (b) mouse bite; (c) open circuit; (d) short circuit; (e) spur; (f) spurious copper.

The limited size of the dataset used in this study can affect the detection of PCB board defects. To address this, data augmentation techniques were employed to improve the generalization ability of the network during training. Data augmentation is a technique that involves transforming original images through operations, such as rotations, cropping, and scaling, to generate more training data [44]. During model testing, an unaugmented test dataset was used to evaluate the model’s performance and generalization ability. Using the same augmented images for testing as for training can lead to overly optimistic evaluations of the model’s performance because the training and test sets contain different versions of the same original images. To address this issue, the dataset of 693 original images was randomly divided into training and testing sets at an 8:2 ratio. Data augmentation was only applied to the training set, and all images were resized to a uniform size of 640 × 640. The augmented training set contained 9920 images, while the test set contained 139 images, which were original images that had not undergone any augmentation.
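The split-before-augmentation protocol described above can be sketched as follows. The helper name `split_dataset`, the fixed random seed, and the use of integer image identifiers are illustrative assumptions; the point is only that the test images are set aside before any augmentation, so no augmented variant of a test image can leak into training.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle once, then cut into disjoint training and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)           # seed is an arbitrary illustrative choice
    n_train = int(len(items) * train_ratio)      # 8:2 ratio -> 554 of 693 images
    return items[:n_train], items[n_train:]


image_ids = list(range(693))                     # stand-ins for the 693 original images
train_ids, test_ids = split_dataset(image_ids)

# Augmentation (rotation, cropping, scaling, ...) would be applied to train_ids only;
# test_ids are evaluated as original, unaugmented images.
```

With 693 images, the 8:2 split leaves 139 images in the test set, which matches the test-set size reported above.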
4.3. Evaluation Indicators
Precision (P), Recall (R), False Positive Rate (FPR), and mean Average Precision (mAP) were used as evaluation indicators. P is the ratio of correctly detected positive samples to all samples detected as positive for a given class, and its calculation formula is as follows:

$$P = \frac{TP}{TP + FP} \tag{10}$$

R is the ratio of correctly detected positive samples to all actual positive samples:

$$R = \frac{TP}{TP + FN} \tag{11}$$

FPR is the ratio of negative samples incorrectly detected as positive to all actual negative samples:

$$FPR = \frac{FP}{FP + TN} \tag{12}$$

where TP represents the number of samples predicted as positive that are actually positive; FP represents the number of samples predicted as positive that are actually negative; TN represents the number of samples predicted as negative that are actually negative; and FN represents the number of samples predicted as negative that are actually positive.
The value of P or R alone cannot objectively reflect the quality of the detection results. Therefore, the two indexes must be combined to measure the performance of the algorithm. Plotting the points obtained at different P and R values yields a precision-recall (P-R) curve. Based on the P-R curve, AP is obtained by integrating the P value over the corresponding R values. Its computational formula is as follows:

$$AP = \int_{0}^{1} P(R)\,dR \tag{13}$$

The sum of the AP values of all classes divided by the number of classes n is the mAP:

$$mAP = \frac{\sum_{i=1}^{n} AP(i)}{n} \tag{14}$$
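The evaluation indicators in Equations (10) to (14) can be computed as in the following sketch. The function names and the sample counts are hypothetical, and the integral in Equation (13) is approximated with the trapezoidal rule over a few (R, P) points; detection toolkits commonly use an interpolated variant instead, so this is a simplification rather than the exact procedure used in the experiments.

```python
def precision(tp, fp):
    """Equation (10)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Equation (11)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Equation (12)."""
    return fp / (fp + tn)


def average_precision(recalls, precisions):
    """Equation (13), approximated by the trapezoidal rule.

    recalls must be sorted in ascending order and paired with precisions.
    """
    area = 0.0
    for k in range(1, len(recalls)):
        width = recalls[k] - recalls[k - 1]
        area += width * (precisions[k] + precisions[k - 1]) / 2.0
    return area


def mean_average_precision(ap_per_class):
    """Equation (14): average of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


p = precision(tp=90, fp=10)               # 0.9
r = recall(tp=90, fn=30)                  # 0.75
fpr = false_positive_rate(fp=10, tn=90)   # 0.1
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])   # 0.875
m_ap = mean_average_precision([ap, 0.625])                 # 0.75
```

For PCB defect detection, the per-class list passed to `mean_average_precision` would hold six AP values, one for each defect type.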

4.4. Analysis of Experimental Results


4.4.1. Performance Analysis of SwinV2_TDD Structure
In this experiment, we investigated the impact of the SwinV2_TDD structure on
model performance. We conducted comparative experiments that included the original
YOLOv7 network, the YOLOv7 network with Swin Transformer v2 structure, and the
improved YOLOv7 network with SwinV2_TDD structure. To evaluate the performance, we
used P, R, and mAP as the metrics. The results of the experiments are presented in Table 1.

Table 1. Analysis and Comparison of SwinV2_TDD Structure Performance.

Experiments                   P/%     R/%     mAP/%   mAP0.5:0.95/%
YOLOv7s                       86.47   97.21   95.08   51.25
Swin Transformer v2-YOLOv7    89.03   98.87   97.14   52.47
SwinV2_TDD-YOLOv7             90.21   99.11   97.54   53.50

Table 1 demonstrates the effectiveness of the proposed SwinV2_TDD method in this paper.
(1) The results indicate that replacing some ELAN modules with the Swin Transformer v2 structure in the original YOLOv7 network improves the P value by 2.56%, R value by 1.66%, and mAP value by 2.06%. This suggests that incorporating the Swin Transformer v2 structure in YOLOv7 can enhance the accuracy of detecting PCB defects.
(2) Moreover, replacing some ELAN modules with the SwinV2_TDD structure in the YOLOv7 network results in a greater improvement in P value by 3.74%, R value by 1.90%, and mAP value by 2.46%, compared to the original YOLOv7 network. Furthermore, compared to adding the Swin Transformer v2 structure to YOLOv7, SwinV2_TDD improves the P value by 1.18%, R value by 0.24%, and mAP value by 0.40%, as computed from the values in Table 1. Therefore, these findings verify the effectiveness of SwinV2_TDD in achieving higher detection accuracy than Swin Transformer v2 in YOLOv7.
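The gains quoted for Table 1 are plain differences between table rows, which can be reproduced with a few lines of Python. The dictionary below copies the P, R, and mAP columns of Table 1; the helper name `gain` is hypothetical.

```python
# (P, R, mAP) in percent, copied from Table 1.
table1 = {
    "YOLOv7s": (86.47, 97.21, 95.08),
    "Swin Transformer v2-YOLOv7": (89.03, 98.87, 97.14),
    "SwinV2_TDD-YOLOv7": (90.21, 99.11, 97.54),
}


def gain(model, baseline):
    """Difference in P, R, and mAP between two rows, rounded to two decimals."""
    return tuple(round(m - b, 2) for m, b in zip(table1[model], table1[baseline]))


swin_vs_base = gain("Swin Transformer v2-YOLOv7", "YOLOv7s")
tdd_vs_base = gain("SwinV2_TDD-YOLOv7", "YOLOv7s")
tdd_vs_swin = gain("SwinV2_TDD-YOLOv7", "Swin Transformer v2-YOLOv7")
```

Relative to the Swin Transformer v2 variant, the SwinV2_TDD row differs by 1.18 in P, 0.24 in R, and 0.40 in mAP.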

4.4.2. MFSA Magnification Factor Experiment


This experiment aimed to examine how the scaling factor in the MFSA mechanism
affects algorithm performance. The study kept other network structures constant and
varied the scaling factor to identify the optimal value, ultimately enhancing the model’s
performance. mAP and FPR served as the evaluation criteria, and Table 2 shows the
comparative results of the experiments.

Table 2. The influence of different magnification factors on model performance.

Magnification Factor    mAP/%   FPR/%
1                       93.23   7.64
2                       96.05   5.22
3                       97.16   4.41
4                       96.78   5.60
5                       95.56   6.85
6                       93.04   7.02
7                       93.73   7.19
8                       91.56   7.78
9                       91.04   8.56

Based on the findings in Table 2, the mAP of the model improves and the false alarm rate decreases as the scaling factor increases from 1 to 3. Specifically, the highest accuracy of the MFSA-YOLOv7 model was achieved at a scaling factor of 3, with a maximum accuracy of 97.16% and a minimum false alarm rate of 4.41%. However, when the scaling factor exceeds 3, the model’s accuracy generally decreases while the false alarm rate rises again. Thus, the experiment suggests that the optimal value for the scaling factor is 3; a plausible explanation is that a larger value causes the attention mechanism to learn too much irrelevant information, leading to a decrease in the model’s performance.

4.4.3. Performance Analysis of MFSA Mechanism


The focus of this experiment was to investigate the impact of the improved MFSA mechanism on model performance. The comparative study comprised three models: the original YOLOv7 network, the YOLOv7 network with the SA mechanism, and the YOLOv7 network with the improved MFSA mechanism, which utilized a scaling factor of 3. P, R, and mAP were the evaluation metrics used, and Table 3 presents the experimental results.

Table 3. Performance analysis of MFSA mechanism.

Experiments     P/%     R/%     mAP/%   mAP0.5:0.95/%
YOLOv7s         86.47   97.21   95.08   51.25
SA-YOLOv7       88.63   98.21   96.54   51.92
MFSA-YOLOv7     89.88   98.84   97.16   52.73

Table 3 shows the effectiveness of the proposed MFSA mechanism in this paper. The
following observations can be made.
(1) When compared to the original YOLOv7 network, introducing the SA mechanism
in the YOLOv7 network resulted in an increased P value by 2.16%, R value by 1.00%, and
mAP value by 1.46% in detecting PCB defects. This suggests that incorporating the SA
mechanism in the YOLOv7 network can improve the accuracy of PCB defect detection.
(2) Comparing the original YOLOv7 network to the YOLOv7 network with the MFSA mechanism, it was found that the latter improved the P value by 3.41%, R value by 1.63%, and mAP value by 2.08%. Additionally, when compared to the YOLOv7 network with the SA mechanism, the MFSA mechanism improved the P value by 1.25%, R value by 0.63%, and the mAP value by 0.62%. These results demonstrate that incorporating the MFSA mechanism in the YOLOv7 network can achieve higher detection accuracy than incorporating the SA mechanism, thereby validating the effectiveness of the MFSA mechanism.

4.4.4. Comparison of Model Performance with Different Activation Functions


The impact of various activation functions on the performance of the YOLOv7 network was investigated in this experiment. Activation functions are known to enhance the model’s nonlinear fitting ability and facilitate the learning of intrinsic correlations within the data, ultimately leading to improved model performance. The experiment involved substituting Sigmoid, ReLU, SiLU, and Mish activation functions for those in the original YOLOv7 network structure while keeping the network structure unchanged. P, R, and mAP were employed as performance evaluation criteria, and the findings are tabulated in Table 4.

Table 4. Comparison of model performance with different activation functions.

Experiments   P/%     R/%     mAP/%   mAP0.5:0.95/%
Sigmoid       83.62   94.19   92.38   48.61
ReLU          85.41   96.12   94.78   50.93
SiLU          86.47   97.21   95.08   51.25
Mish          87.93   98.34   96.17   51.74

Based on the experiment’s outcomes, the Mish activation function exhibited the best performance, achieving P, R, and mAP scores of 87.93%, 98.34%, and 96.17%, respectively. Compared to the SiLU activation function used in the original YOLOv7 network, Mish improved these metrics by 1.46%, 1.13%, and 1.09%, respectively. Furthermore, compared to the Sigmoid and ReLU activation functions, the Mish activation function showed larger improvements in P, R, and mAP. These findings suggest that the Mish activation function enhances the model’s nonlinear fitting ability and has a stronger nonlinear expression ability when detecting smaller objects.

4.4.5. Comparison of Performance between Different Models


The results of the PCB defect detection model based on the improved YOLOv7 designed in this work were compared with the current mainstream object detection networks, including SSD512, YOLOv3, YOLOv5, YOLOv7, Faster R-CNN, and DenseNet. The detection results are shown in Table 5.

Table 5. Comparison of performances of different models.

Algorithms      P/%     R/%     mAP/%   mAP0.5:0.95/%
SSD512          84.07   94.85   92.09   48.79
YOLOv3          85.13   95.36   92.75   49.12
YOLOv5s         86.47   97.21   94.69   50.53
YOLOv7s         87.21   97.81   95.08   51.25
Faster R-CNN    85.42   96.48   93.08   49.87
DenseNet        87.35   97.46   94.12   51.39
Our proposed    94.53   99.49   98.74   53.52

(1) The experimental results show that the enhanced YOLOv7 network model exhibited
the highest accuracy in detecting PCB defects, with P, R, and mAP scores of 94.53%,
99.49%, and 98.74%, respectively. Compared to the original YOLOv7 network, there was
a significant improvement of 7.32%, 1.68%, and 3.66% in P, R, and mAP, respectively.
Additionally, the improved model also demonstrated notable performance enhancements
in comparison to various other popular object detection networks.
(2) Furthermore, the improved YOLOv7 network model achieved the highest mAP0.5:0.95, reaching 53.52%, which is a 2.27% increase compared to the original YOLOv7 network and is also higher than the values achieved by other mainstream object detection networks.
These results suggest that based on the improved YOLOv7 network, the enhanced method
can maintain high P and R values across different IoU thresholds in PCB defect detection.
Therefore, the findings suggest that the accuracy of PCB defect detection can be effectively
improved using the enhanced method.

4.4.6. Display of Detection Effect
In the following examples, the enhanced method was able to detect all six types of defects with high accuracy. Specifically, the detection accuracy for missing_hole, mouse_bite, and spurious_copper was 1.00, while the detection accuracy for open_circuit, short_circuit, and spur was 0.99. Figure 10 shows the specific detection effect pictures.

Figure 10. Sample test results: (a) missing hole; (b) mouse bite; (c) open circuit; (d) short circuit; (e) spur; (f) spurious copper.

5. Conclusions
The paper presents an improved method for detecting defects in printed circuit
boards (PCBs) by enhancing the YOLOv7 network with an improved Swin Transformer
V2 structure. The proposed method introduces the MFSA mechanism, which includes a
convolutional layer and a scaling factor to enhance the attention mechanism’s adaptability
and perception ability. Moreover, the activation function is changed to Mish to increase
accuracy and generalization ability. The experiments are conducted on public datasets
and a dataset of painted and wired rigid PCBs. Moreover, the proposed defect detection
method is trained and tested only on a dataset of painted and wired rigid PCBs. The
results show that the proposed method achieves an mAP 3.66% higher than that of the
original YOLOv7 network, demonstrating its effectiveness for PCB defect detection.
However, since detecting small PCB defects is challenging, the network will be further
optimized in the future to improve detection accuracy.
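As a brief illustration of the Mish activation mentioned above, the following sketch uses the definition given in [41], Mish(x) = x · tanh(softplus(x)). It is a scalar, pure-Python version written for clarity; the function names are illustrative, and it is not the implementation used in the experiments, which would normally rely on a deep learning framework.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large positive x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

Unlike ReLU, this function is smooth and passes small negative values through (for example, an input of -1 maps to roughly -0.30), while behaving almost like the identity for large positive inputs.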

Author Contributions: Conceptualization, Y.Y. and H.K.; methodology, Y.Y.; software, Y.Y.;
validation, Y.Y. and H.K.; formal analysis, Y.Y.; investigation, Y.Y.; resources, Y.Y.; data curation, Y.Y.;
writing—original draft preparation, Y.Y.; writing—review and editing, H.K.; visualization, Y.Y.;
supervision, H.K. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Humanities and Social Sciences research project of the
Ministry of Education (grant number 20YJAZH046) and the Scientific Research Project of the Beijing
Educational Committee (grant number KM202011232022).
Data Availability Statement: The data that support the findings of this study are available from the
corresponding author upon reasonable request.
Acknowledgments: This work was supported by the Humanities and Social Sciences Research Project
of the Ministry of Education, the Scientific Research Project of the Beijing Educational Committee, and
the Department of Information Security of Beijing Information Science and Technology University.
Conflicts of Interest: The authors declare that they have no competing interests.

References
1. Zheng, L.J.; Zhang, X.; Wang, C.Y.; Wang, L.F.; Li, S.; Song, Y.X.; Zhang, L.Q. Experimental study of micro-holes position accuracy
on drilling flexible printed circuit board. In Proceedings of the 11th Global Conference on Sustainable Manufacturing, Berlin,
Germany, 23–25 September 2013.
2. Deng, L. Research on PCB Surface Assembly Defect Detection Method Based on Machine Vision. Master’s Thesis, Wuhan
University of Technology, Wuhan, China, 2019.
3. Zhu, Y.; Ling, Z.G.; Zhang, Y.Q. Research progress and prospect of machine vision technology. J. Graph. 2020, 41, 871–890.
4. Khalid, N.K.; Ibrahim, Z.; Abidin, M.S.Z. An Algorithm to Group Defects on Printed Circuit Board for Automated Visual
Inspection. Int. J. Simul. Syst. Sci. Technol. 2008, 9, 1–10.
5. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings
of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I 14;
Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
7. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014;
pp. 580–587.
8. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December
2015; pp. 1440–1448.
9. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28.
10. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object
detectors. arXiv 2022, arXiv:2207.02696.
11. Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up
capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans,
LA, USA, 18–24 June 2022; pp. 12009–12019.
12. Zhang, Q.L.; Yang, Y.B. Sa-net: Shuffle attention for deep convolutional neural networks. In Proceedings of the ICASSP 2021-
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021;
pp. 2235–2239.
13. Liu, Z.; Qu, B. Machine vision based online detection of PCB defect. Microprocess. Microsyst. 2021, 82, 103807.
14. Kim, J.; Ko, J.; Choi, H.; Kim, H. Printed circuit board defect detection using deep learning via a skip-connected convolutional
autoencoder. Sensors 2021, 21, 4968.
15. Gaidhane, V.H.; Hote, Y.V.; Singh, V. An efficient similarity measure approach for PCB surface defect detection. Pattern Anal. Appl.
2018, 21, 277–289.
16. Annaby, M.H.; Fouda, Y.M.; Rushdi, M.A. Improved normalized cross-correlation for defect detection in printed-circuit boards.
IEEE Trans. Semicond. Manuf. 2019, 32, 199–211.
17. Tsai, D.M.; Huang, C.K. Defect detection in electronic surfaces using template-based Fourier image reconstruction. IEEE Trans.
Compon. Packag. Manuf. Technol. 2018, 9, 163–172.
18. Cho, J.W.; Seo, Y.C.; Jung, S.H.; Jung, H.K.; Kim, S.H. A study on real-time defect detection using ultrasound excited thermography.
J. Korean Soc. Nondestruct. Test. 2006, 26, 211–219.
19. Dong, J.Y.; Lu, W.T.; Bao, X.M.; Luo, S.Y.; Wang, C.Q.; Xu, W.Q. Research progress of the PCB surface defect detection method
based on machine vision. J. Zhejiang Sci.-Tech. Univ. (Nat. Sci. Ed.) 2021, 45, 379–389.
20. Chen, S. Analysis of PCB defect detection technology based on image processing and its importance. Digit. Technol. Appl. 2016,
10, 64–65.
21. Li, Z.M.; Li, H.; Sun, J. Detection of PCB Based on Digital Image Processing. Instrum. Tech. Sens. 2012, 8, 87–89.
22. Liu, B.F.; Li, H.W.; Zhang, S.Y.; Lin, D.X. Automatic Defect Inspection of PCB Bare Board Based on Machine Vision. Ind. Control.
Comput. 2014, 27, 7–8.
23. Malge, P.S.; Nadaf, R.S. PCB defect detection, classification and localization using mathematical morphology and image processing
tools. Int. J. Comput. Appl. 2014, 87, 40–45.
24. Moganti, M.; Ercal, F. Automatic PCB inspection systems. IEEE Potentials 1995, 14, 6–10.
25. Ray, S.; Mukherjee, J. A Hybrid Approach for Detection and Classification of the Defects on Printed Circuit Board. Int. J. Comput.
Appl. 2015, 121, 42–48.
26. Ardhy, F.; Hariadi, F.I. Development of SBC based machine-vision system for PCB board assembly automatic optical
inspection. In Proceedings of the 2016 International Symposium on Electronics and Smart Devices (ISESD), Bandung, Indonesia,
29–30 November 2016; pp. 386–393.
27. Baygin, M.; Karakose, M.; Sarimaden, A.; Akin, E. Machine vision-based defect detection approach using image processing.
In Proceedings of the 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey,
16–17 September 2017; pp. 1–5.
28. Ma, J. Defect detection and recognition of bare PCB based on computer vision. In Proceedings of the 2017 36th Chinese Control
Conference (CCC), Dalian, China, 26–28 July 2017; pp. 11023–11028.
29. Deng, Y.S.; Luo, A.C.; Dai, M.J. Building an automatic defect verification system using deep neural network for pcb defect
classification. In Proceedings of the 2018 4th International Conference on Frontiers of Signal Processing (ICFSP), Poitiers, France,
24–27 September 2018; pp. 145–149.
30. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
31. Huang, W.; Wei, P. A PCB dataset for defects detection and classification. arXiv 2019, arXiv:1901.08204.
32. He, X.Z. Research on Image Detection of Solder Joint Defects Based on Deep Learning. Master’s Thesis, Southwest University of
Science and Technology, Mianyang, China, 2021; pp. 30–38.
33. Geng, Z.; Gong, T. PCB surface defect detection based on improved Faster R-CNN. Mod. Comput. 2021, 19, 89–93.
34. Ding, R.; Dai, L.; Li, G.; Liu, H. TDD-net: A tiny defect detection network for printed circuit boards. CAAI Trans. Intell. Technol.
2019, 4, 110–116.
35. Sun, C.; Deng, X.Y.; Li, Y.; Zhu, J.R. PCB defect detection based on improved Inception-ResNet-v2. Inf. Technol. 2020, 44, 4.
36. Hu, S.S.; Xiao, Y.; Wang, B.S.; Yin, J.Y. Research on PCB defect detection based on deep learning. Electr. Meas. Instrum. 2021, 58,
139–145.
37. Li, C.F.; Cai, J.L.; Qiu, S.H.; Liang, H.J. Defect detection of PCB based on improved YOLOv4 algorithm. Electron. Meas. Technol.
2021, 44, 146–153.
38. Wang, S.Q.; Lu, H.; Lu, D.; Liu, Y.; Yao, R. PCB Board Defect Detection Based on Lightweight Artificial Neural Network. Instrum.
Tech. Sens. 2022, 5, 98–104.
39. Zhou, W.J.; Li, F.; Xue, F. Identification of Butterfly Species in the Wild Based on YOLOv3 and Attention Mechanism. J. Zhengzhou
Univ. (Eng. Sci.) 2022, 43, 34–40.
40. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement
learning. Neural Netw. 2018, 107, 3–11.
41. Misra, D. Mish: A self-regularized non-monotonic activation function. arXiv 2019, arXiv:1908.08681.
42. Kateb, Y.; Meglouli, H.; Khebli, A. Steel surface defect detection using convolutional neural network. Alger. J. Signals Syst. 2020, 5,
203–208.
43. Guo, X. Research on PCB Bare Board Defect Detection Algorithm Based on Deep Learning. Master’s Thesis, Nanchang University,
Nanchang, China, 2021.
44. Guo, D.; Qiu, B.; Liu, Y.; Xiang, G. Supernova Detection Based on Multi-scale Fusion Faster RCNN. In Proceedings of the 2021 6th
International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 9–11 April 2021.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
