
Proceedings of 2024 IEEE International Conference on Mechatronics and Automation
August 4-7, Tianjin, China

Research on Target Localization Technology Based on Depth-of-Field Fusion of Generative Adversarial Networks

Lingfeng Qiao1, Guodong Liu1, Cheng Lu1, Binghui Lu*
1 School of Instrumentation Science and Engineering, Harbin Institute of Technology, 92 Xidazhi Street, Harbin, Heilongjiang, China
[email protected]; [email protected]; [email protected]; [email protected]
*corresponding author

979-8-3503-8807-7/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICMA61710.2024.10633036

Abstract - Target positioning is a crucial step in beam-target coupling, but the multi-channel imaging system of the beam-target coupling sensor struggles to obtain reliable target position information because of its small depth of field and poor illumination. This paper proposes a depth-of-field fusion algorithm based on generative adversarial networks that merges multiple partially clear target images into a single image with higher clarity, thereby extending the depth of field and improving the accuracy of target positioning. Specifically, we first use OpticalFlow-Net to estimate optical flow and warp the source and defocused images into a unified camera view, then generate an image with clear target edges using Fusion-Net; finally, we extract the target position parameters through post-processing steps such as edge detection to assist high-precision positioning. Additionally, to train our network, we created a multi-focus, pixel-misaligned dataset based on the point spread principle. Experimental results show that our algorithm outperforms existing advanced methods in both quality and effectiveness, controlling the target positioning accuracy within 2.7 μm and meeting the accuracy requirements.

Index Terms - Depth of field fusion, Target positioning, GANs, PSF.

I. INTRODUCTION

Inertial Confinement Fusion (ICF) is one of the main ways to obtain fusion energy [1], and target identification and positioning are key issues [2]. The online positioning of the target is completed by the multi-channel imaging system of the beam-target coupling aiming sensor, as shown in Fig. 1. The specific process is as follows: the upper, lower, middle and side cameras of the multi-channel imaging system in the beam-target coupling sensor jointly observe the target to determine its spatial position, and a six-degree-of-freedom robotic arm adjusts the position of the target according to the measurement results, thereby achieving high-precision positioning of the target.

The target is made of metal, has a complex surface structure, and its side is mostly curved. Compared with the upper and lower cameras, and limited by the small depth of field of the imaging system, the middle and side cameras cannot obtain a clear and complete target edge, which degrades the positioning accuracy. For imaging systems with a small depth of field, multi-focus image fusion (MFIF) is a method that synthesizes multiple images of different focus regions in the same scene into a fully focused image with high reliability and interpretability. Existing deep learning-based MFIF algorithms can be divided into decision map-based methods and end-to-end methods [3]. Similar to transform domain-based methods, end-to-end MFIF algorithms feed the source images directly into the network and produce the fused image directly, without post-processing. Many end-to-end MFIF algorithms have an encoder-decoder structure. Xu et al. [4] designed the first end-to-end fully convolutional two-stream network, which includes a convolutional layer, a fusion layer, and a deconvolution layer: the convolutional layer generates feature maps from the source images and feeds them to the fusion layer for feature integration, and these features are then used to reconstruct the fused image in the deconvolution layer. Zero-shot Multi-Focus Image Fusion (ZMFF) [5] is an unsupervised network that uses INet and MNets to simultaneously generate a clear fused image and the corresponding focus map, and uses reconstruction constraints to improve information transmission. Most existing deep learning-based MFIF methods fuse images with good lighting conditions and strict alignment, in scenes with an obvious depth difference between foreground and background. However, in the application scenario of this paper, the target objects to be processed are small, and the actually collected images have the following characteristics: the depth of field of the microscopic imaging system is very small, there are few texture details in dark areas affected by lighting, and the target images collected while the camera moves are not aligned.

When multiple continuous defocused images are collected, the image pixels become misaligned because of camera shake as the camera is moved. In general, there are two ways to generate MFIF datasets. The first is to capture multi-focus images in a natural scene: Kou et al. [6] obtained images with different depths of field by changing the aperture size, to some extent sacrificing the resolution of the fused image. The second is to generate blurred images from a fully focused image: Gaussian blur is applied to the all-in-focus image in SYNDOF [7], and Ma et al. [8] proposed an α-matte boundary defocus model that produces realistic blur near focus boundaries through an effective focus-diffusion simulation. Maximov et al. [9]
used the Blender Cycles renderer to create a synthetic dataset; synthetic datasets, however, often face a domain gap between synthetic and natural images.

In summary, given that the middle and side cameras cannot obtain clear target edges, we propose a target image processing method for imaging systems with a small depth of field and few identifiable features, which is used to assist target localization with the middle and side cameras:

1) We propose a novel end-to-end network, a depth-of-field fusion algorithm based on generative adversarial networks, for fusing multi-sequence defocused images.

2) The depth-of-field fusion algorithm is combined with image processing to assist the middle and side cameras in observing the target and realizing target localization. The fusion algorithm produces an image with clear target edges, and subsequent image processing extracts the target edge position for target positioning.

3) A virtual multi-focus, pixel-misaligned dataset is established, and a more realistic dataset is obtained by using the point spread principle. Each group of image data consists of one clear all-in-focus image and ten defocused images.

The structure of this paper is as follows: Section II provides a detailed introduction to the proposed target positioning algorithm, including the depth-of-field fusion algorithm, dataset creation, and target position calculation. Section III presents a quantitative analysis of the algorithm and its application to target positioning, along with the calculation of positioning accuracy. Section IV offers the conclusion.

Fig. 1 Principle diagram of target positioning guidance

II. METHOD

A. Overall design
In order to locate the target, the acquired defocused target images are processed. First, ten defocused target images are fused into an image with a clear target edge using the image fusion algorithm proposed in this paper. Then, the position of the target is calculated to realize target localization with the middle and side cameras.

B. Image fusion
In order to obtain a clear image of the target edge and perform high-precision target positioning, we process the collected defocused target images. On the one hand, the partially clear images are fused; on the other hand, the pixel misalignment introduced during image acquisition is corrected.

C. OpticalFlow-Net
In order to solve the pixel misalignment caused by camera jitter during shooting, we adopt Recurrent All-Pairs Field Transforms (RAFT) [10] as our optical flow network; it generalizes well, can be applied to small-amplitude optical flow prediction, and aligns the focus-stacked source images into a unified camera view. Optical flow is the motion of each pixel across the image sequence. During dataset generation we offset and rotate the rendered defocused images and record the offset and rotation parameters as optical flow annotations.

RAFT consists of three main parts: the feature encoder extracts a feature vector for each pixel; the correlation layer builds a 4D correlation volume for all pixel pairs, with subsequent pooling generating lower-resolution volumes; and a GRU-based recurrent update operator retrieves values from the correlation volume and iteratively updates a flow field initialized to zero. By iterating this process, RAFT refines its optical flow estimate until convergence. The loss function used is the L1 loss.
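As a minimal sketch of the alignment step described above, the snippet below estimates flow with torchvision's pretrained raft_small (used here as a stand-in for the "smaller version of RAFT" the paper mentions) and backward-warps a defocused frame into the reference camera view. Input shapes, the normalization call and the warping helper are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical alignment step: estimate optical flow with torchvision's RAFT-small
# and warp a defocused frame into the reference camera view (cf. Sec. II-C).
import torch
import torch.nn.functional as F
from torchvision.models.optical_flow import raft_small, Raft_Small_Weights

def warp_with_flow(img, flow):
    """Backward-warp `img` [B,C,H,W] with a dense flow field [B,2,H,W] (x, y order)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(img.device)   # [2,H,W] pixel coordinates
    coords = grid.unsqueeze(0) + flow                            # where each pixel should sample from
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0                # normalize to [-1, 1] for grid_sample
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # [B,H,W,2]
    return F.grid_sample(img, grid_norm, align_corners=True)

@torch.no_grad()
def align_to_reference(ref, moving):
    """Align `moving` to the view of `ref`; both [1,3,H,W] in [0,1], H and W divisible by 8."""
    weights = Raft_Small_Weights.DEFAULT
    model = raft_small(weights=weights).eval()
    ref_t, mov_t = weights.transforms()(ref, moving)   # normalization expected by the RAFT weights
    flow = model(ref_t, mov_t)[-1]                     # flow of the final refinement iteration
    return warp_with_flow(moving, flow)
```

Because RAFT's flow maps the first image to the second, sampling the moving frame at the flowed coordinates returns it in the reference frame, which is the behaviour the fusion network expects from the warped focus stack.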

D. Fusion-Net
The fusion network model is divided into three main parts: feature extraction, feature fusion and image reconstruction. With a generative adversarial network as the framework, we adopt the encoder-decoder structure widely used in image processing tasks such as image segmentation; in this work, U-Net is used as the backbone network, as shown in Fig. 2.

In the feature extraction part, the encoder uses 3×3 convolution layers with a stride of 2 for downsampling. The feature maps produced by convolution mainly encode local information. To make the feature maps contain more contextual information, Huang et al. [11] proposed the criss-cross attention (CCAttention) block, which gathers horizontal and vertical global context for each pixel. We apply CCAttention twice after the last downsampling stage to capture global context. Compared with the traditional self-attention mechanism, which must compute the relationship between every pixel and all other pixels, it integrates global information while maintaining computational efficiency.

In the feature fusion part, the decoder gradually restores the spatial dimensions through nearest-neighbor interpolation and transposed convolution. Concatenation skip connections between the encoder and the decoder fuse the low-level feature maps in the encoder with the feature maps in the decoder, which helps retain detailed features during reconstruction. After the spatial dimensions are restored, the extracted features are fused again using a 3×3 convolution layer combined with a softmax operator. To retain the shallow information of the original images during reconstruction, the input original images and the fused image are concatenated as the input of the feature reconstruction module, continuously providing low-level information to the subsequent layers and thereby preserving the texture details of the original images.

In recent years, GANs (generative adversarial networks) [12] have been used to produce high-quality reconstructed images. However, the training process of the original GAN is unstable. To improve this, Jolicoeur-Martineau [13] proposed the Relativistic Average Discriminator (RaGAN), which estimates the probability that the input data are more realistic than data of the opposite type; that is, the discriminator evaluates not only how real an individual image is, but also how real it is relative to other images.

Fig. 2 Architecture of the depth-of-field fusion model based on generative adversarial networks
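To make the architecture in Fig. 2 concrete, the following is a minimal U-Net-style generator sketch: stride-2 3×3 convolutions for downsampling, attention applied twice at the deepest level, and nearest-neighbor upsampling with concatenation skip connections. The AxialAttention module is a simplified stand-in for the criss-cross attention block of CCNet [11]; channel widths, depth and the single 3-channel input are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.LeakyReLU(0.2, inplace=True))

class AxialAttention(nn.Module):
    """Simplified stand-in for CCAttention: each pixel attends to its own row and column."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // reduction, 1)
        self.k = nn.Conv2d(ch, ch // reduction, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        row = torch.einsum("bchw,bchv->bhwv", q, k).softmax(-1)   # W x W affinities per row
        out_r = torch.einsum("bhwv,bchv->bchw", row, v)
        col = torch.einsum("bchw,bcgw->bwhg", q, k).softmax(-1)   # H x H affinities per column
        out_c = torch.einsum("bwhg,bcgw->bchw", col, v)
        return x + self.gamma * (out_r + out_c)

class FusionGenerator(nn.Module):
    def __init__(self, in_ch=3, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.down1 = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)      # H/2
        self.enc2 = conv_block(base * 2, base * 2)
        self.down2 = nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1)  # H/4
        self.bottleneck = nn.Sequential(conv_block(base * 4, base * 4),
                                        AxialAttention(base * 4),
                                        AxialAttention(base * 4))           # attention applied twice
        self.up1 = nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                                 nn.Conv2d(base * 4, base * 2, 3, padding=1))
        self.dec1 = conv_block(base * 4, base * 2)                           # after concat skip
        self.up2 = nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                                 nn.Conv2d(base * 2, base, 3, padding=1))
        self.dec2 = conv_block(base * 2, base)
        self.out = nn.Conv2d(base, 3, 3, padding=1)

    def forward(self, x):                       # x: [B, in_ch, H, W], H and W divisible by 4
        e1 = self.enc1(x)
        e2 = self.enc2(self.down1(e1))
        b = self.bottleneck(self.down2(e2))
        d1 = self.dec1(torch.cat([self.up1(b), e2], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d1), e1], dim=1))
        return torch.sigmoid(self.out(d2))
```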
E. Loss
The adversarial loss helps bridge the gap between the generated data distribution and the real data distribution, and the perceptual loss improves the structural similarity between the reconstructed image and the ground-truth image. The overall loss function of the network is composed of the generator loss and the discriminator loss, where the generator loss Γ_G consists of the generator adversarial loss L_G and the perceptual loss PL, and the discriminator loss Γ_D consists of the discriminator loss L_D:

\Gamma_G = PL + L_G    (1)

\Gamma_D = L_D    (2)

Since RaGAN performs better than the traditional GAN, the GAN loss is replaced with the RaGAN loss to generate higher-quality images, and the adversarial loss is expressed as (3):

C(x_r) = S\big( D(x_r) - E_{x_f \sim E}[ D(x_f) ] \big)
C(x_f) = S\big( D(x_f) - E_{x_r \sim P}[ D(x_r) ] \big)
L_D = E_{x_r \sim P}\big[ \log(1 - C(x_r)) \big] + E_{x_f \sim E}\big[ \log C(x_f) \big]
L_G = E_{x_r \sim P}\big[ \log C(x_r) \big] + E_{x_f \sim E}\big[ \log(1 - C(x_f)) \big]    (3)

where x_r and x_f represent the real image data and the generated fake image data, respectively; P and E are the distributions of the real images and the fake images, respectively. When the input of the discriminator is x, its output is D(x); S is the sigmoid function, a nonlinear activation that compresses the output value to (0, 1).

In this paper, the perceptual loss of the VGG19 [14] network is used to compare the difference in high-level feature representations between the generated image and the target image, which helps generate images that are similar to the target image in content and texture. The perceptual loss PL contains the content loss ζ_feat^{φ,j} and the style loss ϑ_style^{φ,j}. When processing an image x, the feature map φ_j(x) with shape C_j × H_j × W_j is generated by the j-th convolutional layer of the network φ:

\zeta_{feat}^{\phi,j}(x, real) = \frac{1}{C_j H_j W_j} \big\| \phi_j(x) - \phi_j(real) \big\|_F^2
\vartheta_{style}^{\phi,j}(x, real) = \big\| G_j^{\phi}(x) - G_j^{\phi}(real) \big\|_F^2, \quad
G_j^{\phi}(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c}\, \phi_j(x)_{h,w,c'}
PL = \zeta_{feat}^{\phi,j} + \vartheta_{style}^{\phi,j}    (4)
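A hedged sketch of these loss terms is given below: a relativistic average adversarial loss [13] written in the common BCE-with-logits form (the exact sign convention of Eq. (3) may differ), plus a VGG19 perceptual loss combining the content and style (Gram) distances of Eq. (4). The chosen VGG layer, the omitted ImageNet normalization and the equal content/style weighting are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

def ragan_losses(d_real, d_fake):
    """RaGAN adversarial terms; d_real/d_fake are raw discriminator logits."""
    rel_real = d_real - d_fake.mean()            # logit of C(x_r) before the sigmoid
    rel_fake = d_fake - d_real.mean()            # logit of C(x_f) before the sigmoid
    ones, zeros = torch.ones_like(d_real), torch.zeros_like(d_fake)
    loss_d = F.binary_cross_entropy_with_logits(rel_real, ones) + \
             F.binary_cross_entropy_with_logits(rel_fake, zeros)
    loss_g = F.binary_cross_entropy_with_logits(rel_real, zeros) + \
             F.binary_cross_entropy_with_logits(rel_fake, ones)
    return loss_d, loss_g

class PerceptualLoss(torch.nn.Module):
    """Content + style (Gram) distance on frozen VGG19 features, cf. Eq. (4).
    Layer index 16 (~relu3 block) is an illustrative choice; VGG expects
    ImageNet-normalized 3-channel input, which is omitted here for brevity."""
    def __init__(self, layer=16):
        super().__init__()
        feats = vgg19(weights=VGG19_Weights.DEFAULT).features[:layer].eval()
        for p in feats.parameters():
            p.requires_grad_(False)
        self.feats = feats

    @staticmethod
    def gram(f):
        b, c, h, w = f.shape
        f = f.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def forward(self, fake, real):
        pf, pr = self.feats(fake), self.feats(real)
        content = F.mse_loss(pf, pr)                      # zeta_feat
        style = F.mse_loss(self.gram(pf), self.gram(pr))  # vartheta_style
        return content + style                            # PL

# Overall objectives, Eqs. (1)-(2):
#   loss_D, adv_G = ragan_losses(D(real), D(G(stack)))
#   loss_G = PerceptualLoss()(G(stack), real) + adv_G
```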

F. Dataset
When the optical system photographs an object that is not imaged on the sensor plane, the image is blurred. The blur caused by defocus of the image plane is called defocus blur, and it corresponds to the mathematical defocus model shown in Fig. 3.

Fig. 3 Imaging model of a defocused image

According to the imaging part of the model, the radius of the blur spot r, the diameter of the exit pupil D, the distance Δx' of the focus position from the focal plane of the camera, and the distance s between the sensor and the lens satisfy the similar-triangle principle, which gives:

\frac{2r}{D} = \frac{\Delta x'}{s \pm \Delta x'}    (5)

According to geometrical optics, the formula for axial magnification is:

\alpha = \frac{\Delta x'}{\Delta x} = \beta^2    (6)

Substituting the lens parameters of the beam-target coupling sensor, the relationship between the diameter of the blur circle and the defocus distance is obtained:

c(x, y) = 2r = \frac{8.915\,\Delta x}{152.4 + \Delta x}    (7)

Considering diffraction and other non-ideal properties of the lens, the point spread function of the lens can be approximated by a two-dimensional Gaussian function h(x, y) [15]:

h(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}    (8)

where σ(x, y) = k·c(x, y)/2 [15] and k is a constant, taken here as k = 1/2. The blur formula for each pixel is therefore established from the point spread principle, and we build a multi-focus dataset on this basis and then apply pixel misalignment to it.

The dataset production includes the following steps: first, the 3D model is created; then images of the model in different poses are rendered and the corresponding depth data are obtained. The depth data are converted into the defocus amount of each pixel, and the blurring effect of each pixel is simulated. A set of 500 image groups was generated, each containing one fully focused image and ten defocused images that simulate gradually approaching the target, as shown in Fig. 4.

Fig. 4 (a) Fully focused image of the model; (b) depth map obtained from the depth data; (c) defocused image after blurring

In order to simulate the offset and rotation that occur during actual camera acquisition and to ensure the accuracy of target pose detection, offset and rotation processing is applied to defocused images 2-10 during dataset production, with the position of the first defocused image used as the reference to reduce the impact of camera shake. The angle change of the images was set to 0.02 rad, and the position change to 0.0276 m.
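The defocus simulation above can be illustrated with the following sketch: Eq. (7) maps a defocus distance to a blur-circle diameter, Eq. (8) turns it into a Gaussian PSF width via σ = k·c/2, and a layered approximation (quantized σ levels) applies the per-pixel blur. The pixel-scale conversion, the number of levels and the layered compositing are our simplifications, not the paper's exact rendering pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_circle_diameter(defocus_mm):
    """Eq. (7): blur-circle diameter c as a function of the defocus distance (assumed in mm)."""
    return 8.915 * defocus_mm / (152.4 + defocus_mm)

def simulate_defocus(image, defocus_map_mm, k=0.5, px_per_mm=1.0, levels=8):
    """image: HxW or HxWx3 float array; defocus_map_mm: HxW per-pixel defocus distance."""
    c_px = blur_circle_diameter(defocus_map_mm) * px_per_mm   # diameter in pixels (assumed scale)
    sigma = k * c_px / 2.0                                    # Eq. (8): Gaussian PSF width per pixel
    out = np.zeros_like(image, dtype=float)
    edges = np.linspace(sigma.min(), sigma.max(), levels + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (sigma >= lo) & (sigma <= hi)                  # pixels falling in this blur level
        if not mask.any():
            continue
        s = 0.5 * (lo + hi)
        blurred = gaussian_filter(image, sigma=(s, s, 0) if image.ndim == 3 else s)
        out[mask] = blurred[mask]
    return out
```

Rendering one fully focused image plus ten such blurred versions per pose, followed by the small offsets and rotations described above, reproduces the structure of the virtual dataset.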

G. Experimental setting and training strategy
OpticalFlow-Net is pre-trained first. We adopt a smaller version of RAFT as our OpticalFlow-Net to calculate the optical flow between two random images in the focus stack, which achieves good generalization on our dataset. We use AdamW [16] with a weight decay of 0.00005 to optimize OpticalFlow-Net; the learning rate is set to 0.0001 and the batch size to 15. In addition, to improve the optical flow estimates, the number of refinement iterations is set to 10. OpticalFlow-Net is then trained jointly with the fusion network using the 500 groups of virtual data, each containing one clear image and ten defocused images. The network was trained for 1200 epochs with a batch size of 2. The Adam optimizer with momentum parameters β1 = 0.9 and β2 = 0.999 was used, and the learning rate was set to 0.0001. In each training iteration, the generator and the discriminator update their parameters alternately.

H. Position calculation
Target localization consists of two parts: edge extraction and position parameter calculation. After depth-of-field fusion based on the point spread principle, a target image with clear edges is obtained. Edge features can be extracted by edge detection, and the six-dimensional pose of the target can then be derived from the edge information. This paper therefore uses the deep learning-based DexiNed edge detection algorithm [17], which outputs clearer and finer edges even when the imaging quality is poor. After the target edge is obtained, the left edge of the target is extracted by local threshold segmentation, contour extraction and rotated-rectangle fitting.

III. EXPERIMENTAL RESULTS

A. Evaluation of the algorithm
To test the effectiveness of the proposed depth-of-field fusion algorithm for image fusion, our method was compared with boundary-finding multi-focus image fusion based on multi-scale morphological focus measures (BF) [18], DSIFT (dense SIFT) [19], IFCNN [20] and FusionDiff [21]. We used two sets of image data for the comparative experiment: one consists of the fusion results of 20 groups from the virtual dataset built in this paper, and the other consists of the fusion results of 20 groups of defocused images collected by the middle and side cameras. The fusion results were quantitatively analyzed using the BF method, FusionDiff and the method proposed in this paper. Considering that the camera depth of field in our application scenario is small but the appropriate number of defocused images cannot be determined in advance, both four defocused images (Our method4) and ten defocused images (Our method10) were used for training and comparison, so four results were obtained for each group of images.

B. Quantitative analysis
For fused images from the virtual dataset, reference images are available, so the commonly used full-reference image quality metrics PSNR and SSIM [22] are adopted.

PSNR is a pixel-wise image evaluation index; the higher the PSNR, the better the image quality. It measures the deviation between the reconstructed image R and the true image G:

PSNR(G, R) = 20 \times \log_{10}\!\left( \frac{MAX_G}{\| G - R \|_2} \right)    (9)

Wang et al. [22] proposed the perception-based image quality metric SSIM, which quantifies the differences in luminance, contrast, and structure between the reconstructed image R and the true image G. The value of SSIM ranges from 0 to 1, where a value closer to 1 indicates that the two images are more similar. It is expressed as follows:

SSIM(G, R) = \frac{(2\mu_G \mu_R + 2.55)(2\sigma_{RG} + 58.5225)}{(\mu_R^2 + \mu_G^2 + 2.55)(\sigma_R^2 + \sigma_G^2 + 58.5225)}    (10)

where µ is the mean of an image, σ_G is the variance of image G, and σ_RG is the covariance of images G and R.

Twenty groups from the virtual dataset are selected to evaluate the fusion quality of the compared methods using PSNR and SSIM, as shown in Table I:
TABLE I
VIRTUAL IMAGE QUALITY ASSESSMENT VALUES OF DIFFERENT FUSION METHODS

Fusion method   DSIFT    BF       IFCNN    FusionDiff   Our method4   Our method10
PSNR            19.9     19.147   20.026   17.160       19.294        22.786
SSIM            0.879    0.932    0.880    0.886        0.936         0.962
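For reference, a quick sketch of the two full-reference metrics behind Table I (Eqs. (9)-(10)) is given below, assuming 8-bit grayscale arrays. PSNR is written with the RMS error, and SSIM is taken from scikit-image (assumed available), whose internal constants may differ slightly from those printed in Eq. (10).

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(g, r, max_val=255.0):
    """Eq. (9) with ||G - R|| read as the root-mean-square error."""
    rmse = np.sqrt(np.mean((g.astype(float) - r.astype(float)) ** 2))
    return 20.0 * np.log10(max_val / rmse)

def ssim(g, r):
    """Structural similarity on 8-bit grayscale images."""
    return structural_similarity(g, r, data_range=255)

# Averaging (psnr(gt, fused), ssim(gt, fused)) over the 20 virtual groups
# reproduces the style of Table I.
```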

The collected target images have no reference image, so no-reference image quality assessment is used. Owing to the particularity of the target image scene, several mainstream no-reference quality assessment methods are compared. For this study, four spatial-domain measures were selected, namely the energy gradient function (EOG) [23], the gray variance product function (SMD2) [23], and maximum gradient and variability of gradients (MG and VOG) [24]; in addition, a method based on multi-scale morphological focus measures (Multi-scale), the SVR-based method BRISQUE [25], and the deep learning-based method CLIP-IQA+ [26] were considered.

To evaluate the sensitivity of these no-reference quality measures to camera defocus blur, we acquired two sets of images with different defocus intervals. When the defocus interval of the acquired images is reduced to 0.025 mm and the evaluation results are normalized, only Multi-scale retains a unimodal response. In summary, to meet the requirements of no-reference quality assessment, the Multi-scale method is selected as the quality assessment standard for the actually acquired images.

As shown in Table II, the evaluation of 20 groups of fusion results (10 groups of middle-camera and 10 groups of side-camera images) shows that our method ranks in the top two in quality while requiring a shorter running time. Compared with the stronger BF method, the running time is shortened by nearly a factor of 400, improving computational efficiency. In addition, our method fuses ten defocused images and achieves a good fusion effect while remaining fast.
TABLE II
NO-REFERENCE IMAGE QUALITY ASSESSMENT VALUES FOR DIFFERENT FUSION METHODS

Metric                             DSIFT    BF       IFCNN    FusionDiff   Our method4   Our method10
Middle camera quality assessment   19.201   20.129   19.921   19.079       18.534        19.952
Side camera quality assessment     20.012   22.784   22.687   20.034       20.059        23.164
Time/s                             19.270   21.012   2.344    217.121      0.013         0.050
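As a small illustration of the spatial measures compared above, the energy-of-gradient (EOG) score can be written as the sum of squared first differences; this is one common formulation and is our assumption about the exact variant used, offered only as a sketch.

```python
import numpy as np

def energy_of_gradient(img):
    """EOG focus measure for a 2-D grayscale array; larger means sharper."""
    img = img.astype(float)
    dx = img[:, 1:] - img[:, :-1]      # horizontal first differences
    dy = img[1:, :] - img[:-1, :]      # vertical first differences
    return float((dx ** 2).sum() + (dy ** 2).sum())
```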
C. Target positioning accuracy verification
Based on the above design, a laser measuring instrument is introduced to provide the reference ground truth for target image pose extraction and to verify the effect of the depth-of-field fusion algorithm. As shown in Fig. 5, the laser measuring instrument is first placed below the target to measure the distance and check whether the lower plane of the target is horizontal. After the target is leveled, target images are collected with the middle camera. The target is moved by the 6-DOF robotic arm, the defocused images at each position are recorded by the middle camera, and the actual movement distance is measured by the laser measuring instrument.

Fig. 5 Principle of accuracy verification

The distance measured by the laser measuring instrument is used as the actual moving distance of the target, and the target is commanded to move 1 mm each time. A total of eight groups of image data were collected, each including ten defocused images before and after the target movement. Each group of defocused images was fused by the depth-of-field fusion network, and the DexiNed edge detection
method was used to obtain the target edge position and calculate the pixel distance moved by the target. To verify the reliability of the depth-of-field fusion algorithm, defocused images from each group were also selected and subjected to the same edge detection and position calculation, as shown in Fig. 6.

Fig. 6 (a) Fused image; (b) DexiNed edge detection result of the fused image; (c) rectangle fitting result of the fused image

The imaging pixel equivalent of the calibrated camera is 4.6 μm, from which the actual distance moved by the target edge can be calculated. Table III shows the error and standard deviation of the detection results with respect to the 1 mm step for the eight groups of images.
TABLE III
POSITION DETECTION ERROR BEFORE AND AFTER CAMERA IMAGE FUSION FOR THE CYLINDRICAL TARGET (MOVING STEP SIZE 1 MM)

Image serial number     1     2     3     4     5     6     7     8     Standard deviation
Fused image (μm)        5.2   0.1   3.4   1.8   4.8   6.4   3.4   0.8   2.7
Defocused image (μm)    8.9   5.4   1.6   3.2   24.7  7.4   15.6  54.2  22.2
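The arithmetic behind Table III can be sketched as follows: a rotated rectangle fitted to the detected edge gives a displacement in pixels, the calibrated pixel equivalent (4.6 μm/px) converts it to micrometres, and the error is taken against the commanded 1 mm step. The function names, the use of OpenCV's minAreaRect and the choice of the rectangle centre as the edge coordinate are assumptions about the post-processing, not the paper's exact implementation.

```python
import numpy as np
import cv2

PIXEL_EQUIVALENT_UM = 4.6   # um per pixel, from camera calibration
STEP_UM = 1000.0            # commanded target motion per step (1 mm)

def left_edge_x(edge_map):
    """Fit a rotated rectangle to the largest contour of a binary edge map and
    return the x-coordinate of its centre (stand-in for the fitted left-edge position)."""
    contours, _ = cv2.findContours(edge_map.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    (cx, _), _, _ = cv2.minAreaRect(largest)
    return cx

def step_errors(edge_maps_before, edge_maps_after):
    """Per-group positioning error (um) with respect to the 1 mm step, plus its std."""
    moves_px = np.array([left_edge_x(a) - left_edge_x(b)
                         for b, a in zip(edge_maps_before, edge_maps_after)])
    moves_um = np.abs(moves_px) * PIXEL_EQUIVALENT_UM
    errors = np.abs(moves_um - STEP_UM)
    return errors, errors.std()
```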
Analysis of the data in Table III shows that the standard deviation of the pose extraction error using defocused images is 22.2 μm, significantly higher than that obtained with depth-of-field fusion images, which demonstrates the superiority and effectiveness of the depth-of-field fusion algorithm for target pose extraction. At the same time, the pose extraction error for the clear target edge image after depth-of-field fusion can be controlled within 2.7 μm, which meets the accuracy requirements of target positioning.

IV. CONCLUSION

In this paper, we propose a depth-of-field fusion algorithm based on generative adversarial networks for high-precision target localization with the middle and side cameras during beam-target coupling. The algorithm combines OpticalFlow-Net and Fusion-Net to fuse multi-focus misaligned images and generate an all-in-focus image with clear target edges. The DexiNed edge detection method is used to obtain the target position information and assist target positioning. Compared with existing depth-of-field fusion algorithms, the PSNR and SSIM of our algorithm on the virtual dataset improve to 22.786 and 0.962, respectively, and the target positioning accuracy is within 2.7 μm, which meets the accuracy requirements and shows good performance. The method can be extended to other small depth-of-field microscopic observation tasks. In addition, we create a virtual dataset of multi-focus, misaligned images with continuous defocus changes based on the point spread principle; this approach can be extended to the detection of models of other shapes.

REFERENCES
[1] Shilovskaia, Olga. "Nuclear energy - the energy of the future: hybrid fusion reactor." (2021).
[2] Majumdar, K. C., et al. Systems reliability analysis for the National Ignition Facility. No. UCRL-JC-122826; CONF-960912-11. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States), 1996.
[3] Zhang, Xingchen. "Deep learning-based multi-focus image fusion: A survey and a comparative study." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.9 (2021): 4819-4838.
[4] Xu, Kaiping, et al. "Multi-focus image fusion using fully convolutional two-stream network for visual sensors." KSII Transactions on Internet and Information Systems (TIIS) 12.5 (2018): 2253-2272.
[5] Hu, Xingyu, et al. "Zero-shot multi-focus image fusion." 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2021.
[6] Kou, Tingdong, et al. "Integrated MPCAM: Multi-PSF learning for large depth-of-field computational imaging." Information Fusion 89 (2023): 452-472.
[7] Lee, Junyong, et al. "Deep defocus map estimation using domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[8] Ma, Haoyu, et al. "An α-matte boundary defocus model-based cascaded network for multi-focus image fusion." IEEE Transactions on Image Processing 29 (2020): 8668-8679.
[9] Maximov, Maxim, Kevin Galim, and Laura Leal-Taixé. "Focus on defocus: bridging the synthetic to real domain gap for depth estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[10] Teed, Zachary, and Jia Deng. "RAFT: Recurrent all-pairs field transforms for optical flow." Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part II. Springer International Publishing, 2020.
[11] Huang, Zilong, et al. "CCNet: Criss-cross attention for semantic segmentation." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[12] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems 27 (2014).
[13] Jolicoeur-Martineau, Alexia. "The relativistic discriminator: a key element missing from standard GAN." arXiv preprint arXiv:1807.00734 (2018).
[14] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[15] Subbarao, Murali, and Gopal Surya. "Application of spatial-domain convolution/deconvolution transform for determining distance from image defocus." Optics, Illumination, and Image Sensing for Machine Vision VII. Vol. 1822. SPIE, 1993.
[16] Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization." International Conference on Learning Representations (ICLR). 2019.
[17] Soria, Xavier, et al. "Dense extreme inception network for edge detection." Pattern Recognition 139 (2023): 109461.
[18] Zhang, Yu, Xiangzhi Bai, and Tao Wang. "Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure." Information Fusion 35 (2017): 81-101.
[19] Liu, Yu, Shuping Liu, and Zengfu Wang. "Multi-focus image fusion with dense SIFT." Information Fusion 23 (2015): 139-155.

[20] Zhang, Yu, et al. "IFCNN: A general image fusion framework based on convolutional neural network." Information Fusion 54 (2020): 99-118.
[21] Li, Mining, et al. "FusionDiff: Multi-focus image fusion using denoising diffusion probabilistic models." Expert Systems with Applications 238 (2024): 121664.
[22] Wang, Zhou, et al. "Image quality assessment: from error visibility to structural similarity." IEEE Transactions on Image Processing 13.4 (2004): 600-612.
[23] Willmott, Cort J., Scott M. Robeson, and Kenji Matsuura. "A refined index of model performance." International Journal of Climatology 32.13 (2012): 2088-2094.
[24] Zhan, Yibing, and Rong Zhang. "No-reference image sharpness assessment based on maximum gradient and variability of gradients." IEEE Transactions on Multimedia 20.7 (2017): 1796-1808.
[25] Mittal, Anish, Anush Krishna Moorthy, and Alan Conrad Bovik. "No-reference image quality assessment in the spatial domain." IEEE Transactions on Image Processing 21.12 (2012): 4695-4708.
[26] Wang, Jianyi, Kelvin C. K. Chan, and Chen Change Loy. "Exploring CLIP for assessing the look and feel of images." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 2. 2023.
