Advances in Machine Learning and Deep Learning Applications Towards Wafer Map Defect Recognition and Classification-A Review
https://doi.org/10.1007/s10845-022-01994-1
Received: 24 September 2021 / Accepted: 7 July 2022 / Published online: 23 August 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
Abstract
With the high demand and sub-nanometer design for integrated circuits, surface defect complexity and frequency for semiconductor wafers have increased, emphasizing the need for highly accurate fault detection and root-cause analysis systems, as manual defect diagnosis is time-intensive and expensive. As such, machine learning and deep learning methods have been integrated into automated inspection systems for wafer map defect recognition and classification to enhance performance, overall yield, and cost-efficiency. Concurrent with algorithm and hardware advances, in particular the onset of neural networks like the convolutional neural network, the literature for wafer map defect detection expanded with new developments to address the limitations of data preprocessing, feature representation and extraction, and model learning strategies. This article aims to provide a comprehensive review of the advancement of machine learning and deep learning applications for wafer map defect recognition and classification. The defect recognition and classification methods are introduced and analyzed for discussion on their respective advantages, limitations, and scalability. The future challenges and trends of wafer map detection research are also presented.
Keywords Wafer Map · Semiconductor manufacturing · Machine learning · Deep learning · Defect recognition · Defect
classification
3216 Journal of Intelligent Manufacturing (2023) 34:3215–3247
sliced into thin wafers by (diamond) wire cutting. For the purposes of wafer tracking, wafers are marked with characters to indicate manufacturing information (i.e., identification, dopants, orientation) (Airaksinen et al., 2015). Afterwards, using a profiled diamond wheel, the wafer edges are ground to a standardized or customized edge profile to adjust diameter, and minimize the risk of slipping and chipping (Airaksinen et al., 2015). Resulting from the prior cutting process, the wafer surface is susceptible to large total thickness variations (TTV), which disposes the surface to additional process variations from downstream processes. As such, lapping or single-sided grinding is conducted to achieve TTV, surface roughness, and thickness measures within acceptable standard ranges. Residual mechanical damage may develop on the surface and/or edges after the lapping and grinding operations (Airaksinen et al., 2015). To remove the damage and any remaining impurities, chemical etching (alkaline or acidic) is conducted. Subsequently, the wafers undergo polishing to achieve the desired thickness, TTV, and flatness. The polished wafers then undergo a cleaning sequence and quality inspection prior to IC fabrication. Quality inspections for wafers involve measuring the physical, material, and chemical properties of the finished product with respect to standard and design specifications (Airaksinen et al., 2015; Cuevas & Sinton, 2018). For surface inspections, wafer defect detection systems leverage WM images, or wafer images. WM images are the spatial results from electrical testing, which illustrate individual die functionality, such that defect patterns are clusters of faulty dies. Wafer bin maps (WBM) are the resulting binarized WM images. Wafer images are generated from automated visual or electron beam inspection systems (Patel et al., 2020). Automated visual inspection systems typically utilize optical imaging techniques, including scanning acoustic tomography (SAT) (Chen, 2020), scanning electron microscopy (SEM) (Kim & Oh, 2017; Cheon et al., 2019), and charge-coupled device (CCD)-based imaging (Chen et al., 2020a, 2020b; Wen et al., 2020).

IC fabrication consists of photolithography, assembly, and packaging. Photolithography is used to pattern the wafer, and involves a repetition of various steps: masking, exposure, and etching. Mask design is used to develop the desired patterns for masking; inverse-lithography technologies (ILT) determine the optimal mask to achieve the desired wafer patterns, and are emerging as a prominent research field (Shi et al., 2019, 2020). Masking involves the application of photoresist, and photomask alignment to the wafer. The wafer is then exposed to ultraviolet (UV) light through the photomask to reveal the patterns, which is followed by etching. Using chemical processes, etching develops and removes the exposed photoresist and exposed oxide layer. To create the desired IC patterns, photolithography is repeated in cycles for pattern and structure development. After the dies (also known as chips) have been developed, wafers undergo a sorting test, which involves electrical testing to determine die functionality. As part of assembly and packaging, the wafer is sliced into individual pieces, in which the faulty dies are discarded, and the remaining dies are forwarded to packaging.

The current technologies and design standards for IC fabrication are evolving, specifically for photolithography and IC design. Current designs and lithography technologies are at the sub-10 nm scale, specifically with extreme ultraviolet (EUV) lithography (Hasan & Luo, 2018; Preil, 2016). With competition and fast-evolving technologies, the future trends for IC fabrication include sub-5 and sub-3 nm scale lithography. As these future trends and technologies are realized, defect frequency and complexity increase, subsequently increasing the emergence of unknown, rare, and mixed-type defects; rendering defect detection more difficult, and emphasizing the need for more robust and reliable detection methods. The wafer production and IC fabrication processes, associated defects, and causes are summarized in Table 1.

Table 1 Summary of processes and associated defects

Defect | Associated process/stage | Cause
Random | Clean room | Environmental conditions of the clean room may induce particles and debris onto the wafer surface
Loc | Lapping, grinding; Polishing | Non-uniform surface; Uneven cleaning
Edge-Loc | Lapping, grinding; Polishing | Non-uniform surface; Uneven cleaning
Center | Polishing | Non-uniform surface during the chemical mechanical process (CMP)
Edge-Ring | Lapping, grinding; Photolithography | Non-uniform surface; Layer-to-layer misalignment; Chemical etching issues
Scratch | Assembly, packaging; Polishing; Clean room | Mishandling; Hardening of polishing pads; Agglomeration of particles
Near-full | – | Agglomeration of multiple systematic and random defects
Donut | Lapping, grinding; Polishing | Non-uniform surface; Equipment handling or hardening of polishing pads
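Such a process-to-defect mapping lends itself to a simple lookup during root-cause triage. The sketch below encodes the associations from Table 1 as a plain Python dictionary; the structure, names, and helper function are illustrative only, not part of any cited work.

```python
# Hypothetical lookup of candidate process stages per defect pattern,
# transcribed from Table 1 for quick root-cause triage.
DEFECT_STAGES = {
    "Random":    ["Clean room"],
    "Loc":       ["Lapping, grinding", "Polishing"],
    "Edge-Loc":  ["Lapping, grinding", "Polishing"],
    "Center":    ["Polishing"],
    "Edge-Ring": ["Lapping, grinding", "Photolithography"],
    "Scratch":   ["Assembly, packaging", "Polishing", "Clean room"],
    "Donut":     ["Lapping, grinding", "Polishing"],
}

def candidate_stages(defect: str) -> list[str]:
    """Return the process stages associated with a detected defect pattern."""
    return DEFECT_STAGES.get(defect, [])

print(candidate_stages("Scratch"))  # stages to inspect first for a Scratch pattern
```

A flat dictionary keeps the association auditable; in practice such a table would be maintained by process engineers rather than hard-coded.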
Fig. 3 Normal, single-type, and mixed-type defects with image dimensions of (52, 52) from Wang et al. (2020)
Fig. 4 Defect class distribution for WM-811K (top) and Mixed WM-38 (center, bottom) datasets
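Class distributions like those in Fig. 4 can be tallied directly from a labeled dataset. The sketch below is generic: the label list and counts are placeholders, not the actual WM-811K statistics, and the imbalance ratio is included because heavy class imbalance is a recurring obstacle with these datasets.

```python
from collections import Counter

def class_distribution(labels: list[str]) -> dict[str, int]:
    """Tally per-class counts for a labeled wafer map dataset."""
    return dict(Counter(labels))

def imbalance_ratio(counts: dict[str, int]) -> float:
    """Majority-to-minority class ratio; large values signal heavy imbalance."""
    return max(counts.values()) / min(counts.values())

# Illustrative labels only -- not the real WM-811K distribution.
labels = ["none"] * 90 + ["Edge-Ring"] * 6 + ["Scratch"] * 3 + ["Donut"] * 1
counts = class_distribution(labels)
print(counts, imbalance_ratio(counts))  # Donut is the rarest class here
```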
Geometrical (White et al., 2008; Chang et al., 2012; Wu et al., 2015; Fan et al., 2016; Wang & Ni, 2019; Saqlain et al., 2019; Kang & Kang, 2021):
- Area: Reflects the area of the defect pattern; typically the most salient region (single-type) is considered. Also expressed as the ratio of defect pattern area to wafer map area
- Perimeter: Defines the perimeter of the defect pattern. Also expressed as the ratio of defect pattern perimeter to the radius of the wafer map
- Convexity: Indicates the convexity of a defect pattern, expressed as the ratio of the convex hull's perimeter to the total perimeter of the defect
- Length of Major/Minor Axes: Computes the length of the major/minor axes of the approximated ellipse that surrounds the defect pattern (most salient region)
- Eccentricity: Describes the outline shape of the approximated ellipse that surrounds the defect pattern
- Solidity: Estimates the proportion of defective dies in the convex hull of the defect pattern
- Hough Transform: Identifies edges and lines in defect patterns
- Hu Invariant Moments: Set of seven values for central image moments. Image moments can recognize patterns independent of size, position, and orientation (Hu, 1962)

Projection (Wu et al., 2015; Yu & Lu, 2016; Piao et al., 2018; Saqlain et al., 2019; Kang & Kang, 2021):
- Radon Transform: Image projections at various angles are collected as a 2D representation of the wafer map. Projections obtain geometric/structural information specific to defect patterns

Density (Fan et al., 2016; Saqlain et al., 2019; Kang & Kang, 2021):
- Reflects the computed failed die density distribution. Involves dividing the wafer map into multiple segments, and computing the defective die density per segment

Texture (Yu & Lu, 2016):
- Describes and extracts surface textural features from images using the statistical method, gray level co-occurrence matrix (GLCM). Examples of textural features are correlation, entropy, energy, contrast, and uniformity (Mohanaiah et al., 2013)

Gray (Yu & Lu, 2016):
- Reflects the pixel distribution in images. Typically expressed with various statistical features, including mean, variance, skewness, and kurtosis
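As a concrete illustration of the area- and density-type features above, the following sketch computes an area ratio and per-segment defective-die densities from a binary wafer bin map using numpy. The function names and the 4×4 segmentation grid are arbitrary choices for illustration, not taken from the cited works.

```python
import numpy as np

def area_ratio(wbm: np.ndarray) -> float:
    """Geometrical feature: defective area as a fraction of the wafer map area."""
    return float(wbm.sum()) / wbm.size

def density_features(wbm: np.ndarray, grid: int = 4) -> np.ndarray:
    """Density features: defective-die density per segment of a grid x grid split."""
    h, w = wbm.shape
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    return np.array([[wbm[np.ix_(r, c)].mean() for c in cols] for r in rows]).ravel()

# Toy 8x8 binary wafer bin map with a defect cluster in the top-left corner.
wbm = np.zeros((8, 8), dtype=int)
wbm[:2, :2] = 1
print(area_ratio(wbm))        # 4 defective dies out of 64 -> 0.0625
print(density_features(wbm))  # 16 per-segment densities; only the first is nonzero
```

Such handcrafted vectors are what the conventional classifiers discussed below consume in place of raw wafer map images.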
However, this advantage also poses a caveat to generating effective, handcrafted features because the degree of domain knowledge may not be sufficient to represent and differentiate the different defect patterns (Kang & Kang, 2021; Yu & Lu, 2016). This also imposes a limitation in detecting rare/unknown defects in regard to forming features: important characteristics of these defects may not be known or understood to generate effective features for detection and classification.

In contrast to feature generation, feature extraction can be applied to raw data, such as the wafer map images. Feature extraction includes dimensionality reduction techniques, and representation learning. Dimensionality reduction techniques, like principal component analysis (PCA) and linear discriminant analysis (LDA), are applied to extract the critical features for a lower dimensional representation (Wang & Ni, 2019; Yu & Liu, 2020). As information is lost when transforming into a lower dimensional space, PCA aims to minimize the number of features while maximizing the amount of variance captured by the set of features. A major limitation of PCA is that it does not consider spatial relations within the data, such that the underlying patterns are not effectively captured. On the other hand, LDA weakly maintains spatial relations by using class labels to instill low-level discriminatory power in separating classes in the lower dimensional subspace (Wang et al., 2014). Despite these limitations, dimensionality reduction techniques can reduce computational complexity, and improve model performance. Additionally, research into non-linear dimensionality reduction (manifold learning) techniques has demonstrated improved retention of spatial relations, including autoencoders (AE), t-distributed stochastic neighbor embedding (t-SNE), locally linear embedding (LLE), multi-dimensional scaling (MDS), and isomap (Faaeq et al., 2018).

Representation learning is automated feature extraction. Deep learning models, like the convolutional neural network (CNN), a neural network that employs nonlinear kernels to learn shared weights for input feature maps, have been widely used in various computer vision tasks due to their automated feature extraction ability (Nakazawa & Kulkarni, 2018; Park et al., 2020; Shen & Yu, 2019). The automated feature extraction learns rich and highly descriptive features at each convolution layer. Similarly, representation learning can also be conducted via inference models. Probabilistic generative models, such as variational autoencoders (VAE) and generative adversarial networks (GAN), leverage inference methods to approximate and learn latent feature representations of the data via latent variable(s) z (Kingma et al., 2014; Kong & Ni, 2020a). It is important to note that the latent space embeds the input to a compact, non-linear representation. Depending on the learning approach, automated feature learning can be executed with labeled and/or unlabeled data. With representation learning, raw data can be used, and can gain high discriminatory power as the underlying structure of the data can be learned, demonstrating capability with complex patterns and data structures (Khastavaneh & Ebrahimpour-Komleh, 2020; Zhong et al., 2016). The significance of representation learning is demonstrated with transfer learning (Section Enhanced learning strategies), wherein the feature extractor networks (backbone) of pretrained models have gained strong feature extraction capabilities to extract meaningful features (Chien et al., 2020; Ishida et al., 2019; Shen & Yu, 2019). However, the capacity of representation learning is constrained by model complexity, as performance is dependent on whether the model is suited to the respective data complexity and problem.

With the onset of neural networks, research has shifted from manual feature generation to feature representation learning, as leveraging feature learning algorithms has proven to generate more meaningful and effective features for downstream tasks, especially for problems with complex data structures.

Algorithms for wafer map defect detection

The algorithms are the learning strategies in which the model learns and trains from the input data. In this section, the three learning strategies that we will focus on are introduced: supervised, unsupervised, and semi-supervised learning. The main algorithms for wafer map defect detection are discussed in-depth in Section Methodologies and learning strategies. In Table 3, the prominent works for each main algorithm used in wafer map defect detection are listed.

Table 3 Selection of prominent machine learning and deep learning algorithms for wafer map defect detection

Algorithm | Approach | Paper
Supervised | Conventional ML classifiers (i.e., SVM, decision trees, ensembles) | Piao et al. (2018); Saqlain et al. (2019); Kang and Kang (2021)
Supervised | Neural networks | Kyeong and Kim (2018); Nakazawa and Kulkarni (2018); Kim et al. (2021); Wang et al. (2020)
Unsupervised | Mixture models | Kim et al. (2018); Ezzat et al. (2021)
Unsupervised | Density-based (DBSCAN, OPTICS) | Jin et al. (2018)
Semi-supervised | Pretraining-finetuning | Yu (2019); Shon et al. (2021)
Semi-supervised | Generative modelling | Kong and Ni (2018); Hu et al. (2021); Yu et al. (2019b); Yu and Liu (2020); Lee and Kim (2020); Kong and Ni (2020a)
Enhanced learning | Model optimization | Bello et al. (2017); Jang et al. (2020); Shon et al. (2021)
Enhanced learning | Incremental learning | Shim et al. (2020); Kong and Ni (2020a)
Enhanced learning | Data augmentation | Wang et al. (2019); Saqlain et al. (2020)
Enhanced learning | Transfer learning | Shen and Yu (2019); Ishida et al. (2019); Chien et al. (2020)

Supervised learning utilizes labels for model training, and loss functions, which measure the error between the predictions and ground truth. The labels are factored into the loss function, and act as the supervisory signal for the model to learn the mapping between an input and the respective desired model output. Loss functions are optimized by finding the
global minimum or the optimal local minimum. It should be noted that loss functions are dependent on the downstream task, and their mathematical optimization is constrained by their convexity. The problem for supervised wafer map defect detection is defined as classification, in which the algorithms aim to learn the mapping from input to output to predict specific defect patterns. Early literature has transitioned from conventional machine learning classifiers to neural networks. Conventional machine learning classifiers typically require extensive preprocessing and manual feature generation, and have mainly been applied for single-type defect detection. Common classifiers used in WM defect detection include SVM, decision trees, and ensembles (Fan et al., 2016; Piao et al., 2018; Saqlain et al., 2019; Wu et al., 2015). Neural networks are prominently used throughout the literature and have demonstrated capability for single-type and mixed-type defect classification.

It is important to note that the classification problem can be multi-class or multi-label. In multi-class classification, there are a distinct number of classes that the classifier learns and models. Each data sample belongs to a single class, and the classifier predicts the probability across all classes that the data sample belongs to a particular class. Multi-label classification is a multi-output algorithm, such that the data examples can be annotated with multiple target classes. For multi-class neural networks, the softmax function is used in the final output layer to compute the decimal probabilities, which add up to 1.0. On the other hand, multi-label neural networks utilize the sigmoid function in the final output layer to predict the probabilities (between 0 and 1) for each class. Mixed-type defect detection can be framed as a multi-class or multi-label classification problem. As a multi-class classification problem, mixed-type defects are segmented into multiple single-type defect patterns and are subsequently classified with a network of binary classifiers (Kong & Ni, 2019, 2020b; Kyeong & Kim, 2018). On the other hand, as a multi-label classification problem, mixed-type defect detection aims to recognize the different patterns and predicts the probability per class label for a single wafer map (Lee & Kim, 2020; Wang et al., 2020).

Unsupervised learning algorithms leverage unlabeled data to learn their underlying patterns, and structure. For wafer map defect detection applications, the main unsupervised learning tasks are clustering, and pretraining. Clustering focuses on self-organization to cluster data based on similarity and dissimilarity distances. Popular clustering algorithms for wafer map defect detection include density-based spatial clustering of applications with noise (DBSCAN), ordering points to identify the clustering structure (OPTICS), and mixture models, such as Gaussian mixture models (GMM) and infinite warped mixture models (iWMM) (Ezzat et al., 2021; Fan et al., 2016; Iwata et al., 2013; Kim et al., 2018). Spatial clustering applications in WMDD aim to segment the different defect patterns for both single-type and mixed-type defects. Unsupervised methods have also been leveraged for pretraining to supplement supervised methods with unsupervised feature representation learning using autoencoders (Shon et al., 2021; Yu, 2019). By taking advantage of the plethora of unlabeled data, unsupervised pretraining methods operate to learn general feature representations to better initialize the model weights for supervised training (relative to zero or random initialization) via reconstruction errors.

Semi-supervised learning leverages both labeled and unlabeled data for the model training process. During the training process, the labeled data is utilized in the same manner as supervised learning, whereas the unlabeled data is leveraged for transduction-based inference learning. This is reflected in the loss function, where a combined, weighted loss function is defined to account for both labeled and unlabeled data. With transduction-based inference learning, all available data is observed to enhance the learned data representations for inferring missing labels. Relative to the former learning strategies, development of semi-supervised algorithms is growing to overcome the limitations imposed by supervised and unsupervised learning. For WMDD, pretraining-finetuning and semi-supervised generative models have been implemented to tackle the real-world issue of limited annotated wafer maps. For pretraining-finetuning, unsupervised pretraining methods are followed by supervised finetuning. Semi-supervised generative models are probabilistic methods, which include variational autoencoders (VAE) and modified Ladder networks (Kong & Ni, 2020a; Lee & Kim, 2020). These methods have been applied towards both single-type and mixed-type defect patterns.

Beyond the model training algorithms, enhanced learning algorithms and techniques have been applied for wafer map defect detection to boost performance, and to address the issues with labeled data availability, class imbalance, rare/unknown defect detection, and model sensitivity. These algorithms and techniques have been introduced as data augmentation, incremental learning, transfer learning, and model optimization (Bello et al., 2017; Jang et al., 2020; Ji & Lee, 2020; Shim et al., 2020).

Evaluation

Evaluation methods are used to assess the performance, and can be conducted at validation, or the final testing stage. The results from the validation stage drive hyperparameter tuning and model optimization. The evaluation methods are dependent on the data and learning approach. Across the wafer map defect detection literature, the common performance evaluation indices have been identified and summarized in Table 4 below (Hwang & Kim, 2020; Kim et al., 2018; Lee & Kim, 2020; Li et al., 2021; Saqlain et al., 2019). In Table 4, the variables TP, TN, FP, and FN represent True Positives, True Negatives, False Positives, and False Negatives respectively. The micro-recall entry of Table 4 (Eq. 8) reads:

\[ M_{Re} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{C} y_i^j \hat{y}_i^j}{\sum_{i=1}^{N}\sum_{j=1}^{C} y_i^j} = \frac{\sum_{j=1}^{C} TP_j}{\sum_{j=1}^{C}\left(TP_j + FN_j\right)} \tag{8} \]

The evaluation methods for supervised learning algorithms indicate how well the model has learned by the number of correct and incorrect predictions. The (top-1) accuracy, precision, recall, and confusion matrix are the metrics typically used to evaluate and compare models. Equations (1) to (5) represent the (top-1) accuracy, precision, and recall. The accuracy indicates the total number of correctly identified wafer maps; precision signifies the total correctly identified wafer maps from all identified wafer maps, and recall indicates the total number of correctly identified wafer maps within a given set. Note that Eqs. (2) and (3), as well as Eqs. (4) and (5), represent the same equation, but are qualified by the given class i, such that the precision and recall are computed for each respective class i. The F-1 metric is the weighted average of precision and recall (Eq. 6), such that its respective score indicates how close the predicted and ground truth values are. These metrics are used to evaluate the multi-class classification performance for wafer map defect detection.

In the context of multi-label classification problems, exact match ratio (EMR), micro-precision (MPre), micro-recall (MRe), and Hamming loss can be used (Lee & Kim, 2020; Santos & Canuto, 2012; Wang et al., 2020). MPre (Eq. 7) and MRe (Eq. 8) differ from their multi-class counterparts by considering partially correct predictions, as each correct target label is counted for each sample i ∈ N, where N represents the total number of samples, and each class j ∈ C, where C represents the total number of known class labels. On the other hand, EMR (Eq. (9)) is computed similarly to accuracy and reflects all fully correct predictions. Note that y_i and ŷ_i represent the true labels and predicted labels respectively, whereas y_i^j and ŷ_i^j are the per-class-label equivalents. Hamming loss reflects the proportion of incorrectly predicted
labels to the total number of labels at the individual label level. As shown in Eq. (10), the indicator function evaluates to 1 when predicted labels do not match the ground truth label. Like MPre and MRe, N and C represent the total number of samples and the total number of known classes respectively.

As semi-supervised algorithms leverage unlabeled data for label imputation, and feature representation learning during training, the performance is evaluated like supervised algorithms. Evaluation methods like accuracy, EMR, etc. are calculated on the labeled data.

For unsupervised wafer map defect detection methods, the performance indices typically evaluate the defect clustering results. The following have been identified and described by Eqs. (11) to (15) as the commonly used evaluation metrics for unsupervised defect detection algorithms: (i) Rand Index (RI), (ii) adjusted Rand Index (ARI), (iii) normalized mutual information (NMI), (iv) adjusted mutual information (AMI), and (v) Purity. These metrics focus on comparing the clusters via similarity, and shared information.

RI is the ratio of the number of correct similar pairs (a), and correct dissimilar pairs (b) to all possible combination pairs, where n represents the number of samples. ARI is the RI, but adjusted, such that independent of the number of clusters and samples, randomly clustered samples score closer to 0, and highly similar samples score closer to 1. In Eq. (12), E[RI] indicates the expected RI value. NMI (Eq. 13) is the normalization of mutual information (MI), which results in scores between 0 and 1. For Eq. (13), I(X; Y), H(X), and H(Y) represent the mutual information between X and Y, and the entropy of X and Y respectively. AMI is mutual information adjusted, such that permutations of the class and cluster labels would not affect the score. Lastly, purity (Eq. 15) measures the accuracy of cluster assignments by tallying the number of correctly assigned samples and dividing by the total number of samples (N).

Methodologies and learning strategies

In this section, the recent developments in AI applications for WM defect recognition and classification are introduced, analyzed, and discussed. This section is organized into (1) preprocessing, (2) supervised learning, (3) unsupervised learning, (4) semi-supervised learning, and (5) enhanced learning strategies.

Preprocessing

The purpose of the data preprocessing stage is to preprocess and prepare the wafer map images for feature extraction and model training. Data preprocessing typically includes a multitude of operations for image transformations, and spatial filtering. Preprocessing operations include image size standardization, binarization, and denoising.

Image size standardization reshapes the raw wafer maps to a single, uniform size, and utilizes interpolation algorithms to minimize quality loss. Interpolation algorithms are subject to the pixel neighborhood size for approximation, such that increasing sizes result in longer rendering times and higher quality. The bicubic interpolation algorithm is typically used due to its optimal quality and time trade-off. Binarization is used to convert wafer maps to wafer bin maps, in which individual die functionality is indicated by 0s and 1s.

Image denoising (outlier detection) and filtering refers to the process of removing random defects. It is typically conducted to enhance model performance and accuracy, as the removal of random defects enhances the systematic defects. Past works have utilized spatial filtering and clustering methods to remove noise and isolate the systematic defects (Chien et al., 2013; Liu & Chien, 2013; Wang, 2008, 2009; Yuan et al., 2010). Spatial filtering algorithms focus on how to effectively differentiate between the random defects and the dies that belong to systematic defects. Spatial clustering algorithms focus on forming a separate cluster for each different defect pattern; the input to these methods has already been filtered for defect patterns. Support vector clustering (SVC) has been used in Wang (2009) and Yuan et al. (2010) for defect denoising, and identification of systematic defect patterns. SVC demonstrated robustness against noisy data, but high sensitivity to defect complexity, as clustering efficiency decreases with more complex defect patterns (i.e., multiple defects). Similarly, the k-nearest neighbors (kNN) algorithm is also used to differentiate between defective dies that belong to systematic defect patterns (Huang, 2007). The spatial randomness filter is a statistical method that checks the spatial independence of adjacent dies. The spatial independence is computed by taking the logarithm (Log) of the odds ratio (θ̂), in which the resulting Log θ̂ determines whether the wafer map is spatially random, contains a defect cluster, or contains repeating patterns (Chien et al., 2013; Liu & Chien, 2013). Although the filtering results indicate which wafer maps should be used for classification, as the spatial independence test is computed for the dies and not the pattern, misclassification can occur. Median filtering is a popular denoising method that replaces each die's value with the median value of the neighboring dies, and has been used in many works for image preprocessing (Kong & Ni, 2020a; Wang et al., 2006; Yu & Lu, 2016; Yu, 2019). Median filtering can be effective in removing the random defects; however, it may also remove important pattern information, as some of the systematic pattern dies may be removed. The thin geometries of the Scratch and Edge-Ring defects are particularly sensitive to median filtering (Fig. 5).

Fig. 5 a–d Original wafer maps, e–h Wafer maps after median filtering

It is important to note that poor spatial filtering and spatial clustering can significantly affect
downstream tasks as the filtered systematic defect pattern quality is damaged.

Wang and Chen (2019) proposed using three masking filters to preprocess wafer maps and extract rotation-invariant features for defect pattern classification. To address the limits of traditional spatial filtering methods for curvilinear and edge patterns, polar, line, and arc masks were applied at various angles to real-world wafer maps to extract features of concentric, linear, and eccentric patterns. Used to train various classifiers (i.e., neural networks, random forest, SVM), the masking filters demonstrated effectiveness with high defect recognition rates, but limited recognition for defect patterns with complex geometry (i.e., Scratch, Reticle).

The king-move neighborhood (Chien et al., 2013; Hsu et al., 2020; Wang, 2008; Wang & Ni, 2019) and Moore neighborhood (Jin et al., 2019) are utilized to compute the spatial correlation weights for the adjacent dies. Although both the king-move neighborhood and Moore neighborhood filters consider the eight surrounding dies, the Moore neighborhood filter also considers the center die. Typically, a global threshold criterion is applied to the spatial correlation weights, such that dies are removed if the criterion is not met. The downfall of using a global threshold criterion is that it does not consider the geometries and typical defect die densities of each defect, in particular the Scratch and Edge-Ring defects. According to Jin et al. (2019), their proposed DBSCAN-based algorithm considers defect pattern type for outlier detection. The outliers are completely removed for most defects (i.e., Loc, Donut, Random), and carefully removed for the Scratch and Edge-Ring defects. The authors recommended not completely removing the outliers for the Scratch and Edge-Ring defects, as defect pattern quality would deteriorate.

The above filtering methods have demonstrated limitations towards the Scratch and Edge-Ring defects due to their thin and elongated shapes. As such, Kim et al. (2018) proposed the connected-path filtering (CPF) algorithm. The CPF algorithm uses depth-first search (DFS) to explore all possible paths between two defective dies, and recognizes the connected paths that are longer than a threshold criterion as the identified defective die connected paths. Note that the CPF algorithm relies on an optimal threshold criterion to effectively detect systematic defects, which can be determined by parameter tuning or domain experts. The authors utilized domain experts to set a global threshold criterion of 12 for all defects, in which distances greater than 12 are recognized as systematic defects. The advantage of the CPF algorithm is that the threshold criterion allows for
the detection of the Scratch defect. The limitation of applying a global threshold criterion for all defect-types is that the local spatial information, such as defective die density and distribution, defect-type geometry, and disjoint connection paths (due to random defects), is not considered. Specifically, the defective dies that are not associated with a connected path are completely disregarded. Additionally, defining a universal threshold for all defect-types causes scalability issues for real-world applications, as the onset of complex and mixed-type defects would require domain experts and frequent updates to threshold values.

To address the limitations of the CPF algorithm, the graph-theoretic approach for adjacency clustering (AC) was developed by Ezzat et al. (2021). Based on graph theory, this algorithm represents the dies and the neighborhood connections on the wafer map as the graph nodes and graph edges. Although the AC algorithm is executed as a spatial clustering task, it functions as a spatial filtering method by leveraging spatial correlation information between adjacent dies to cluster the defective dies into two groups: random and systematic defects. The authors compared the AC and CPF algorithms, and demonstrated the improved performance of AC in filtering high-complexity defects, as well as its overall positive impact on the defect recognition task. The authors noted that too small or too large a separation loss would result in undesired filtering results (i.e., a weak to absent filtering effect, or same-label wafer maps), and that cross-validation may be used to determine the optimal weight trade-off. In comparison to existing preprocessing methods, this algorithm fully utilizes the available spatial information (i.e., the spatial dependency of adjacent dies), demonstrating state-of-the-art performance.

Supervised learning

Supervised learning utilizes labels as a supervisory signal for training. Early literature for wafer map defect detection mostly consists of supervised machine learning algorithms, including common models such as artificial neural networks (ANN), random forests (RF), and support vector machines (SVM). Note that in wafer map defect detection applications, multi-class classification is more popular and widely developed than multi-label learning. The methodologies discussed in this section are structured into three categories: (i) conventional machine learning, (ii) deep learning, and (iii) specialized modules.

Conventional machine learning algorithms used for WMDD include SVM, decision trees, and ensembles. Although somewhat antiquated since the onset of neural networks and deep learning, conventional ML algorithms can remain competitive. In related works, SVM and decision trees were prominently used for single-type WM defect classification, as the classifiers are relatively computationally inexpensive, stable, and can work well with high-dimensional data (Chang et al., 2012; Hsu & Chien, 2007; Kim et al., 2020b; Li & Huang, 2009; Liao et al., 2014; Ooi et al., 2013). These methods reported a high overall detection accuracy (approximately >90%); however, they demonstrated low detection rates for geometrically complex defect patterns (i.e., Donut, Scratch, mixed-types), and diminished effectiveness with imbalanced datasets. To boost overall classification accuracy, Jin et al. (2020) incorporated error-correcting output codes (ECOC) and SVM for single-type WM defect classification using CNN-based feature extraction.

Yu and Lu (2016) proposed the joint local and non-local linear discriminant analysis (JLNLDA) framework, which utilizes manifold learning to extract highly discriminative features. With the aim to preserve defect geometry in lower dimensional space, four neighborhood graphs are constructed: two graphs for local and non-local spatial information, and two penalization graphs that apply penalties to promote maximizing between-class separation and minimizing within-class separation. Geometry, gray, texture, and radon-based features were generated, followed by dimensionality reduction and feature extraction. For wafer defect detection, JLNLDA was extended to construct JLNLDA-FD, a Fisher discriminant-based recognition model that computes the discriminant function value of a wafer map belonging to each defect class, such that wafer maps are classified as the defect class with the maximum probability.

Saqlain et al. (2019) proposed a soft voting ensemble (SVE) classifier for wafer defect recognition and classification. Using the WM-811K dataset, three multi-type features (geometry-based, density-based, radon-based) are extracted and used as inputs to train the base classifiers of the ensemble. The authors used four state-of-the-art ML classifiers for the ensemble: logistic regression, gradient boosting machine (GBM), ANN, and random forest. To train the proposed ensemble, the base classifiers are trained individually using the extracted features, and then, in a soft voting approach, the results of the base classifiers are combined to output the final defect prediction. Soft voting uses weighted averages to determine the final prediction; based on performance, better performing classifiers have higher weights for voting. The authors reported a defect classification accuracy of 95.87%, demonstrating that the ensemble classifier achieves improved performance relative to any single base classifier. Although both JLNLDA and SVE achieved high defect classification rates, their performance is contingent on manually generated features, which can bottleneck performance.

Extensions of supervised ANNs have featured in WMDD literature, including the multilayer perceptron (MLP) and the general regression network (GRN) (Adly et al., 2015a, 2015b; Huang, 2007; Huang et al., 2009; Tello et al., 2018). In (Huang, 2007) and (Huang et al., 2009), self-supervised MLP models were trained to recognize clusters of defective dies; however, classification was restrained to predicting
good and bad wafers, such that limited details of the defect were learned. GRNs utilize Gaussian kernels as activation functions in the hidden layer. Adly et al. (2015b) applied a randomized bootstrapping technique to train an ensemble of GRN models, such that each model would learn from random, independently sampled data to decrease variance and increase detection accuracy. Similarly, Adly et al. (2015a) extended the previous work with a data dimensionality reduction technique, which employed Voronoi diagrams for data partitioning and K-means for clustering to represent the data at a reduced size. As the Voronoi diagrams partition the data into a vector space, smaller regions reflect different defect patterns, and K-means clustering was used to find the centroid for each region in the vector space, which was subsequently used for training. Both GRN-based models demonstrated high accuracy, and by applying the data reduction technique, computational time complexity was reduced. As these previous works considered only single-type defects, Tello et al. (2018) combined the randomized GRN (RGRN) model with a CNN model. By using information gain theory to separate the data into single-type and mixed-type defects, RGRN and CNN classify single-type and mixed-type defects respectively, achieving an overall accuracy of 86.17%. Although mixed-type defect detection was investigated, a limited range of mixed-type defects was considered.

Deep learning models employ CNNs and additional layers for training. Due to their automated feature extraction capability, deep learning models have been heavily applied to image-based tasks, including wafer map defect recognition and classification. Deep learning models typically have more than three layers, and with each progressive layer, the model extracts higher-level features. Many related works utilize CNNs for single-type and mixed-type WMDD. In (Batool et al., 2020; Bella et al., 2019; Du & Shi, 2020; Kim et al., 2020a; Maksim et al., 2019; Nakazawa & Kulkarni, 2018; Yu et al., 2019a), CNNs with customized model architectures were trained for single-type WM defect classification. For example, the custom CNN architecture by Nakazawa and Kulkarni (2018) for multi-class defect pattern classification achieved an overall test accuracy of 98.2% and considered 22 defect classes (Fig. 6), in which many classes were variations of fundamental defect patterns. It should be noted that, regarding class distinctiveness, many classes were quite similar, such that misclassification rates were high as the model had difficulty differentiating between the similar-looking defect patterns. Additionally, in multi-class classification methods, mixed-type defect detection is difficult as the most salient defect pattern is typically predicted, disregarding the other present defects.

The related works for mixed-type WM defect detection framed the problem as multi-label classification (Devika & George, 2019; Hyun & Kim, 2020; Wang et al., 2020) or multi-class classification (Byun & Baek, 2020; Kim et al., 2021; Kong & Ni, 2019, 2020b; Kyeong & Kim, 2018; Zhuang et al., 2020). For multi-label classification of mixed-type defects, CNN models used sigmoid activation to compute the probability for each defect label. On the other hand, for multi-class classification of mixed-type defects, Kyeong and Kim (2018) proposed the use of CNNs for mixed-type defect pattern classification by training multiple binary CNNs (Fig. 7). Each CNN is built to detect the absence or presence of a distinct pattern (Scratch, Ring, Circle, Zone), and then the CNN outputs are combined. By leveraging multiple CNNs, this method has the advantage of adaptability, as new defect patterns can be easily trained and added to the existing framework. Compared to SVM and multilayer perceptron (MLP), the proposed CNN achieved superior classification accuracy, recall, and precision of 0.910, 0.945, and 0.949, respectively. Similarly, in (Zhuang et al., 2020), a network of deep belief networks (DBN) was used to classify six defect patterns for single-type and mixed-type defect classification. Kong and Ni proposed mixed-type defect detection by pattern segmentation, such that overlapped defect patterns are processed into multiple single patterns, which are then classified using multiple binary CNNs (Kong & Ni, 2019, 2020b). Both proposed models achieved classification performance comparable to other high-performing models and demonstrated how pattern segmentation of overlapped mixed-type defects can improve recognition and classification accuracy. Kim et al. (2021) applied an object detection algorithm, the single shot detector (SSD), to effectively recognize, segment, and classify the multiple instances of defect patterns within a mixed-type defect sample. As object detection frameworks require bounding box (BB) information for the desired object instances, an automatic BB generator was designed, utilizing digital image preprocessing techniques and libraries (i.e., PIL, spatial filters) to obtain the BBs. The SSD algorithm simultaneously solves the object classification and localization problems, which subsequently improves run-time and performance. The SSD model utilized pretraining from large-scale image datasets, and the last output layer was fine-tuned on a selection of the WM-811K data. Compared to the CNN model, the proposed SSD model achieves higher accuracy for single-type and mixed-type defects.

The methods categorized as specialized modules integrate advanced model elements different from standardized model components, which can encompass specialized loss functions, modified kernel functions, etc. Park et al. (2020) proposed a Siamese network integrated with an uncertainty-reducing technique for class label reconstruction via G-means clustering (Fig. 8). For discriminative feature learning, the Siamese network learns feature embeddings based on similarities between the input image pairs, and aims to minimize the contrastive loss, such that embeddings for similar images are closer together and embeddings for dissimilar images are farther apart. G-means clustering leverages the
learned feature embeddings from the Siamese network to enable enhanced class label reconstruction and outlier detection. The results demonstrate that the proposed model can segment mixed-type defects; however, it has difficulty with controlling the degree of pattern segmentation and with differentiating unknown cases from known cases. By leveraging class label reconstruction, uncertainty associated with the wafer map labels can be mitigated.

Modified convolutional blocks were proposed by Wang et al. (2020), Tsai and Lee (2020a), Hyun and Kim (2020), and Alawieh et al. (2020). Wang et al. (2020) used deformable convolution networks (DCN) for multi-label classification, which demonstrated enhanced performance as deformable convolutional layers can learn and recognize the geometric variations of defect patterns. Deformable convolutional units learn two-dimensional offsets to capture different deformations of the filter sizes and geometric characteristics, which are subsequently added to a standard convolution (Fig. 9) (Dai et al., 2017; Zhu et al., 2019). The authors compared the proposed DCN to state-of-the-art mixed-type defect classification models on the Mixed WM-38 dataset, in which the results demonstrated the superior performance of DCN in the detection of complex mixed-type defects. Similarly, Tsai and Lee (2020a) incorporated depth-wise separable convolutions to improve run-time and reduce overfitting, as they have fewer parameters than standard convolutions. By using depth-wise separable convolutions, the proposed model achieved a 96.63% classification accuracy on single-type defect patterns. Another development of modified convolutional blocks was introduced by Hyun and Kim (2020): a memory module that keeps track of a fixed number of rare occurrences for each class to mitigate class imbalance issues. The memory module is used to learn high-quality representative samples in latent space for each defect class.
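The parameter savings behind depth-wise separable convolutions can be made concrete with a short sketch; the layer sizes below are illustrative choices of ours, not taken from Tsai and Lee (2020a):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (biases omitted)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Weights in a depth-wise separable convolution: one k x k filter
    per input channel, followed by a 1 x 1 point-wise convolution that
    mixes the channels."""
    depthwise = k * k * c_in   # spatial filtering, channel by channel
    pointwise = c_in * c_out   # 1 x 1 conv recombines the channels
    return depthwise + pointwise

# Illustrative layer: 3 x 3 kernels mapping 64 channels to 128 channels.
standard = conv_params(3, 64, 128)        # 73,728 weights
separable = separable_params(3, 64, 128)  # 8,768 weights
```

For this layer the separable factorization uses roughly 8x fewer weights, which is the source of the run-time and overfitting benefits noted above.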
patterns, and to determine the respective spatial dependence relative to the identified centroid defective die point. Hierarchical clustering was used by Alawieh et al. (2018) to minimize clustering sensitivity to outliers, incorporating various optimization methods to determine the optimal number of clusters, the optimal number of singular values for noise removal, and the optimal number of defect patterns. As clustering algorithms are sensitive to initialization and hyperparameters (i.e., the number of clusters), many methods struggled to determine the appropriate number of clusters for defect patterns (Patel et al., 2015; Xu & Tian, 2015).

Related works (Hwang & Kim, 2020; Jin et al., 2019; Kim et al., 2018) leveraged clustering algorithms for defect detection. Kim et al. (2018) utilized connected-path filtering, and then spatial clustering via infinite warped mixture models (iWMM). iWMMs (originally introduced in (Iwata et al., 2013)) apply a warping function to the defect clusters, such that in the latent space, the clusters have Gaussian shapes. In (Ezzat et al., 2021; Iwata et al., 2013; Kim et al., 2018), the authors report iWMM as an effective clustering algorithm due to its warping function and its ability to effectively estimate the number of clusters, which circumvents the influence of setting the number of clusters. However, Kim et al. (2018) noted that iWMM had difficulty in appropriately isolating the partial-ring defect pattern due to its complex and non-Gaussian geometry. Jin et al. (2019) introduced DBSCANWBM, a novel DBSCAN-based clustering method. DBSCANWBM inherits DBSCAN characteristics, and was adapted to: (i) consider defect-type for outlier detection, (ii) bypass the requirement to specify the number of clusters, (iii) parallelize outlier detection and defect detection, and (iv) detect both single-type and mixed-type defects. By adjusting outlier removal relative to defect-type, the systematic defect geometries can be better preserved, which in turn can improve classification accuracy. Hwang and Kim (2020) developed a one-step clustering method that combines Gaussian mixture models and the Dirichlet process (DP) in a VAE framework. Within the proposed VAE framework, DP is used to automate the updating of the number of clusters, and the GMMs are employed as a prior distribution to learn the nuances of different wafer maps. Like iWMM and DBSCANWBM, this VAE framework works without specifying the number of clusters in advance. The VAE framework encodes and decodes latent feature representations that follow a Gaussian mixture distribution (Hwang & Kim, 2020). The authors reported that their proposed clustering framework estimated the number of clusters more accurately than the comparison models, and achieved better clustering performance relative to adjusted mutual information and adjusted Rand index. The clustering methods that utilized generative models have demonstrated improved performance as the models are built to learn effective feature representations.

Unsupervised pre-training is typically conducted by training an autoencoder in an unsupervised approach to minimize the reconstruction loss and learn latent feature representations of the data. For classification tasks, a classifier is added to the trained encoder and fine-tuned; the fine-tuning adjusts both the encoder and the classifier. The general process of unsupervised pre-training is shown in Fig. 10. Shon et al. (2021) applied unsupervised pre-training and data augmentation to improve CNN classifier performance given limited labeled wafer maps. Using the unlabeled data of WM-811K, a convolutional variational autoencoder (CVAE) was trained in an effort to better initialize the feature extraction layers of the CNN classifier. Subsequently, the CVAE encoder and CNN classifier are fine-tuned in an end-to-end manner by minimizing the cross-entropy loss. The results showed that the proposed method achieved high classification performance at early epochs, indicating the benefit of unsupervised pre-training. Although pre-training can improve downstream classification performance, as the WM-811K data consists of single-type defects, the proposed model is limited in complex mixed-type defect recognition, since the CVAE may have difficulty differentiating between multiple defects with a single discriminative network. Similarly, Yu (2019) proposed a two-phase methodology for wafer map recognition: an enhanced stacked denoising autoencoder (ESDAE) for feature learning via unsupervised pre-training, followed by supervised fine-tuning. ESDAE consists of two autoencoders, which incorporate manifold regularization such that intrinsic local and nonlocal geometric information is preserved. ESDAE involves a cost-sensitive layer-wise training procedure, in which each layer is trained to minimize the reconstruction error, and different misclassification costs are assigned to different defect classes to address class imbalance. The experimental results on the influence of manifold regularization demonstrate that performance improved with an increasing degree of regularization (γ). Compared to a typical stacked denoising autoencoder (SDAE), logistic regression, DBN, and a back propagation network (BPN), ESDAE achieved the best defect recognition accuracy of 97.03%. Despite the improved performance, the proposed methodology involves generating original geometrical, gray, texture, and projection features for model training; generally, as it is difficult to estimate the effectiveness of manually generated features, model performance may be hampered. Like CVAE, ESDAE trains on the single-type defects in WM-811K, and as such is inadequate against mixed-type defects.

Semi-supervised learning

The performance of supervised learning is limited by the amount of available labels; on the other hand, without the supervisory signal from labels, the performance of unsupervised learning for defect classification is unsatisfactory
in comparison. Semi-supervised learning addresses the limitations of both supervised and unsupervised learning. Regarding real-world applicability, with limited available labels, a surplus of unlabeled wafer maps, and large volumes of incoming unlabeled wafer maps, semi-supervised learning can achieve better performance as it utilizes both labeled and unlabeled data for model training. For semi-supervised learning, the labeled wafer maps are used to learn the relevant features for each defect pattern, and then the unlabeled wafer maps are used to refine the feature representations. To the best of our knowledge, semi-supervised learning algorithms for wafer map defect recognition and classification have been scarcely developed.

Kong and Ni (2018) trained a CNN-based Ladder network in a semi-supervised manner to detect and classify wafer map defects. The semi-supervised Ladder network consists of a clean encoder, a corrupted encoder, and a decoder, which were trained and tested separately on two datasets with 22 classes of single-type defect patterns. The encoders are responsible for learning the latent features of the wafer maps. The latent features from the encoder layers are shared with the decoder through skip connections to recover additional spatial information. Given the noised latent features from the corrupted encoder, the decoder reconstructs the wafer maps with the aim of minimizing the reconstruction error at each layer. By comparing against a supervised CNN with varying amounts of labeled data, the authors established how semi-supervised learning can improve wafer map defect classification accuracy. As the proposed framework trained on two small datasets containing only single-type defect patterns, the small class sample sizes most likely skewed feature learning, such that the model had difficulty differentiating between similar-looking pattern variations. This is shown by the confusion matrices reported in (Kong & Ni, 2018), which reveal the misclassification rates of select defects. Additionally, as the datasets contained only single-type defect wafer maps, defect classification is limited and requires modification and model re-training for mixed-type defects.

In (Yu & Liu, 2020), PCACAE, a novel semi-supervised two-dimensional PCA-based convolutional autoencoder with effective feature extraction capability, is introduced. To overcome class imbalance and preserve spatial information, conditional two-dimensional PCA (C2DPCA) is proposed. C2DPCA aims to find the optimal projection direction by minimizing the reconstruction error, and, as an image projection method, can effectively map the high-dimensional wafer maps into a lower-dimensional space. By transforming the principal eigenvectors from 1D to 2D, C2DPCA-based kernels are formed, such that discriminative principal components are learned and used downstream for pretraining and fine-tuning purposes. The authors compared PCACAE performance to pretrained deep learning models (i.e., AlexNet, GoogLeNet), a stacked denoising autoencoder (SDAE), and a DBN. The results and visualizations reported in (Yu & Liu, 2020) indicate the usefulness of pretraining, and that the C2DPCA-based kernels have effective and powerful feature learning capabilities. As the PCACAE framework was trained on the WM-811K dataset, defect recognition is limited to single-defect patterns. With the use of pretraining, PCACAE
has shown reduced computational run-time (per iteration) relative to the comparison models. Although C2DPCA has been demonstrated to be effective, it is limited regarding non-linear data, as it is essentially an orthogonal linear transformation of the data.

Kong and Ni (2020a) also presented a semi-supervised variational autoencoder (SVAE) with incremental learning (Section Enhanced learning strategies) for wafer map defect classification, which was trained and tested on two datasets with 22 classes of single-type defect patterns. The proposed SVAE framework (Fig. 11) comprises three networks: (i) an inference network, (ii) a discriminative network, and (iii) a generative network. The inference network is responsible for approximating and learning the latent feature representations of the wafer map defects. The discriminative network is used to predict the labels of the unlabeled WMs, including WMs with rare/unseen defect patterns. The generative network leverages the learned latent features and predicted labels for the unlabeled wafer maps to reconstruct the original wafer map. The authors compared the classification performance of a CNN, the supervised components of SVAE, and the semi-supervised Ladder network (Kong & Ni, 2018) with different percentages of supervised training data. The results demonstrated the superior performance of the semi-supervised approach, as the Ladder network and SVAE consistently achieved higher classification accuracy than the supervised CNN, particularly at lower percentages of supervised data. Despite the improved performance, the confusion matrices showed some defect classes were prone to misclassification, which may be attributed to class imbalance, as the classes of the datasets were the defect patterns and their respective variants. Yu et al. (2019b) proposed a hybrid learning model, the stacked convolutional sparse denoising autoencoder (SCSDAE). Employing data sampling methods, SCSDAE has demonstrated effective learning of discriminative features from the single-type WM data, with performance superior to deep neural networks. Similarly to (Kong & Ni, 2018), the training and test datasets contained only single-type defect patterns, which constrains defect recognition and classification to single-type defects, disregarding the onset of mixed-type defects.

A semi-supervised convolutional deep generative model (SS-CDGMM), shown in Fig. 12, was proposed by Lee and Kim (2020). In contrast to other semi-supervised models, which established multi-class classification for single-type defect patterns, a multi-label configuration for mixed-type defect classification was utilized. Kingma et al. (2014) introduced semi-supervised deep generative models (SS-DGM), wherein the data is described as being generated by a latent class variable and a continuous latent variable. As an extension of SS-DGM, SS-CDGMM consists of multiple discriminative network structures, such that each corresponding latent class variable is dedicated to one of the fundamental defect-types. Like Kong and Ni (2020a), SS-CDGMM consists of an inference network, discriminative networks, and a generative network; however, each discriminative network is used to learn the absence or presence of its respective single-defect pattern. Compared to various models (i.e., CNN, multilayer perceptron (MLP), SS-DGM, unified VAEs), including the state-of-the-art convolutional Ladder network (ConvLadder), the results showed comparable or better performance relative to the state-of-the-art. Relative to the comparison models, SS-CDGMM demonstrated how effectively it uses labeled and unlabeled data, as well as the effectiveness of using multiple discriminative networks. However, as the training and test data were generated and balanced across the classes, the impact of class imbalance has not been investigated or addressed. Additionally, only four distinct single-type defect patterns were considered, disregarding the other known distinct defect patterns (i.e., Donut, Near-full). Although more defect patterns may be considered, this would result in higher run-times, as the marginal log-likelihood component of the objective function requires computation over all defect classes.

Moving away from generative modelling, self-supervised pretraining is emerging as an effective pretraining method for semi-supervised frameworks and classification tasks (He et al., 2019; Chen et al., 2020b). Self-supervised contrastive
learning has been increasingly leveraged as a feature learning method, wherein meaningful representations can be learned from unlabeled data and data augmentations. Hu et al. (2021) proposed a contrastive learning framework for single-type defect patterns, followed by supervised fine-tuning of a classifier. Although overall performance was lower than that of other algorithms, the reported results demonstrate detection rates on par with state-of-the-art contrastive methods (i.e., SimCLR), and great potential for contrastive learning.

Enhanced learning strategies

The methods included in this section focus on enhancing model learning, and are used to elevate model performance. They are organized into the following groups: (i) data augmentation, (ii) incremental learning, (iii) transfer learning and fine-tuning, and (iv) model optimization.

Data augmentation aims to reduce overfitting by increasing the amount of data, and is typically used to mitigate class imbalance issues, to which neural networks and deep learning models are particularly sensitive (Perez & Wang, 2017). Data augmentation can be executed in many ways, such as resampling, data modification, and data generation.

Resampling methods function to balance the class distribution of the existing data. Undersampling and oversampling are subcategories of resampling methods. Undersampling reduces the number of data examples from the majority classes by removing data, whereas oversampling increases the number of data examples by sampling from the minority classes with replacement. Both subcategories of resampling methods are effective for obtaining a more balanced class distribution; however, they have their share of limitations. Undersampling ultimately reduces the overall amount of data, and may discard critical data examples from the majority classes, which may impede feature learning and model performance. Oversampling may result in overfitting and increased generalization error, as well as increased computational time, since the overall amount of training data is increased. Due to the limitations of resampling methods, data augmentation via modification and generation is typically conducted instead, as these approaches can increase and balance the amount of data as well as increase data diversity.
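The resampling trade-offs described above can be sketched with a naive random oversampler; this is our own minimal illustration, and production pipelines typically rely on a library such as imbalanced-learn:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Randomly oversample every minority class, with replacement,
    until all classes match the majority-class count. Because minority
    examples are duplicated, the overfitting risk noted above applies."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()                  # majority-class size
    xs, ys = [], []
    for cls in classes:
        idx = np.where(y == cls)[0]
        take = rng.choice(idx, size=target, replace=True)
        xs.append(X[take])
        ys.append(y[take])
    return np.concatenate(xs), np.concatenate(ys)

# Imbalanced toy set: five "no pattern" wafer maps vs. two "Scratch" maps.
X = np.arange(14).reshape(7, 2)   # stand-in features, one row per wafer map
y = np.array([0, 0, 0, 0, 0, 1, 1])
X_bal, y_bal = random_oversample(X, y)
```

Undersampling is the mirror image: each class is sampled down to the minority-class count instead, at the cost of discarding majority-class examples.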
Data modification methods apply label-preserving operations to create synthetic variations of the existing data. Considering the circular shape of the wafer maps and the diversity of defect patterns, select geometric operations can be applied to maintain the geometric characteristics and original labels. In Kang (2020) and Jang et al. (2020), rotation and horizontal flipping operations were applied to create diversified, rotation-invariant wafer maps, which subsequently improved defect classification performance. Similarly, Saqlain et al. (2020) applied random rotations of 10°, horizontal flipping, width shift, height shift, shearing, channel shifting, and zooming to augment the data. These operations are used as they diversify the data with changes in orientation, position, and/or size. The different operations used in data modification methods help improve model generalization as models are trained to be highly tolerant to the diversified variations of defect patterns.

The data generation methods utilize generative models, such as generative adversarial networks and autoencoders, to supplement the existing collection of data by generating new synthetic data. The generative models focus on learning the latent feature representations and distributions of the data. As the performance of many deep learning models is contingent on the amount and distribution of labeled data, data generation methods are used to create realistic, new instances of data. GANs consist of two convolutional neural networks: a generator and discriminator (Fig. 13). The generator learns to create authentic fake data, and the discriminator learns to distinguish between the real and fake data. Variations of GANs have been developed to improve the generative modelling capability. Wang et al. (2019) proposed the adaptive balancing generative adversarial network (AdaBalGAN), a conditional categorical GAN that incorporates imbalanced learning to generate a balanced set of synthetic data. In addition to the generator and discriminator, AdaBalGAN includes an adaptive generative controller, which recognizes the minority defect classes by considering defect class size, as well as the recognition accuracy difference between each defect class and the majority defect class. By recognizing the imbalanced class distribution, the adaptive generative controller automatically adjusts the number of synthesized wafer maps for each defect-type. Ji and Lee (2020) developed a deep convolutional GAN, which compounds the image processing capabilities of multiple convolutional layers, for data augmentation. Aside from GANs, there are many types of autoencoders, including variational, convolutional, denoising, stacked, and sparse; the fundamental components of autoencoders are the encoder and decoder. The encoder compresses the input into a latent space representation, and the decoder uses the latent representation to reconstruct the input. Shawon et al. (2019) and Tsai and Lee (2020b) utilized a convolutional autoencoder (CAE) to generate new instances of denoised training data to improve model training of deep convolutional neural networks. Similarly, in (Lee & Kim, 2020), the authors employed the trained VAE to generate labeled wafer maps by leveraging the learned class latent variables for each defect-type. Data augmentation via generation can create highly diverse and realistic data; however, it requires substantial computational time and power to effectively train the generative models.

Incremental learning (IL) aims to increase model performance by extending and adapting an existing model's knowledge base with new training data. In the context of real-world wafer fabrication, labels are expensive to obtain and have limited availability, which bottlenecks model performance. Additionally, as defect complexity evolves and new, unseen defect patterns emerge, model efficiency may decrease over time. As such, IL methods are employed to enhance model performance in the long-term against evolving wafer map data and defect patterns. Popular methods include active learning and pseudo-labeling.

Active learning utilizes a querying strategy to select informative unlabeled data for manual annotation to fine-tune and further train an existing model (Fig. 14). It is important to note that there are many querying strategies (Settles, 2009), including uncertainty sampling, information gain, query-by-committee, expected error reduction, and total expected variance minimization. Shim et al. (2020) proposed a CNN with active learning via uncertainty sampling for wafer map defect classification. Uncertainty sampling selects the most ambiguous unlabeled data examples; least confidence, margin, and entropy are common estimators for uncertainty. In addition to the common uncertainty estimators, the authors compared mean standard deviation, variation ratio, Bayesian active learning by disagreement (BALD), and predictive entropy as uncertainty estimation methods. Their results indicated that BALD and mean standard deviation provided the best performance for defect classification via CNN with active learning. On the other hand, Kong and Ni (2020a) employed active learning using information entropy for their semi-supervised models, such that the unlabeled wafer maps with the maximum information entropy were selected for labeling and model fine-tuning. When investigating the significance of active learning and pseudo-labeling, the results demonstrated improved classification accuracy. Although active learning strategies have helped in improving model performance, they are vulnerable to class imbalance and catastrophic forgetting. Class imbalance introduces sampling bias in query sampling, which skews the querying towards the newer classes (Ren et al., 2020), and brings on catastrophic forgetting. In the process of finetuning the model with the new labeled data, catastrophic forgetting can occur when the previously learned information is degraded, which significantly lowers model generalization (Luo et al., 2020). The effectiveness of active learning methods is sensitive to
the querying and model updating strategies, which warrants careful consideration for model implementation.

Pseudo-labeling supplements the incremental training data for model fine-tuning with predicted class labels for the unlabeled data. As a semi-supervised learning strategy, this method uses an existing trained model to assign the pseudo-label as the class with the maximum predicted probability. The pseudo-labels for the unlabeled data increase the overall training dataset size; however, it is important to note that pseudo-labels may disturb model performance if they are incorrectly predicted. Kong and Ni (2020a) implemented pseudo-labeling with confidence level constraints. The authors computed and compared the information entropy for each unlabeled wafer map against a criterion threshold to ensure highly confident wafer maps were used for model fine-tuning. Similarly, to account for uncertainty, a 2:1 ratio of original labeled wafer maps to pseudo-labeled wafer maps was used to diminish the potential disturbance from incorrect pseudo-labels.

Table 5 Models compared: T-DenseNet (Shen & Yu, 2019); VGG (Ishida et al., 2019); Faster R-CNN-KITTI (Chien et al., 2020); Faster R-CNN-COCO (Chien et al., 2020)

Transfer learning is the process of utilizing a pretrained model for another task. As the pretrained models were trained
on large, diverse image datasets (i.e., ImageNet, CIFAR-10), it is presumed that the model effectively learned feature representations and obtained powerful generalization capabilities. The learned feature representations of the pretrained models can be re-purposed to train a new classifier, or the pretrained models can be fine-tuned to fit a specific dataset and task. The application of transfer learning and fine-tuning can significantly reduce training time, and achieve high performance without requiring large volumes of data. Related works have utilized pretrained models for wafer map defect recognition and classification. Shen and Yu (2019) proposed the T-DenseNet framework; the pretrained DenseNet model was fine-tuned on the wafer map dataset, and then the refined feature representations were used to set up an online testing system for incoming unlabeled wafer maps. Similarly, the pretrained VGG model (Ishida et al., 2019) and faster R-CNN model (Chien et al., 2020) were utilized for wafer map defect recognition and classification. In Table 5, the performance of the models in (Chien et al., 2020; Ishida et al., 2019; Shen & Yu, 2019) for the test dataset is shown, reflecting how effective deep transfer learning is despite the shorter training times, and how pretrained model selection may affect performance on the downstream tasks.

Research has demonstrated the importance of model selection and hyperparameter tuning as these design choices (i.e., conditional variables, objective function, architecture, etc.) significantly influence performance (Banchhor & Srinivasu, 2021; Parsa et al., 2020; Ungredda & Branke, 2021). As an enhanced learning strategy, model optimization concentrates on the advanced strategies for optimizing model parameters. For image tasks like wafer map defect recognition and classification, the design choices for model architecture can affect performance and computational time. Standard optimization techniques involve extensive hyperparameter tuning of layer parameters (i.e., stride, filter size, etc.), training batches and epochs, etc., which typically requires extensive manual searching. As such, strategies for optimization policies and network architecture engineering have been developed to automate the design process.

Recently, reinforcement learning (RL) models have been leveraged to parse optimization policies. Bello et al. (2017) train a recurrent neural network (RNN) controller with RL for neural optimizer searching. Essentially, the performance of child networks trained with different sets of optimizer update rules is compared to determine the optimal set of updating rules for optimization methods (Bello et al., 2017). Similarly, in (Shon et al., 2021), RL was used to train an RNN controller to determine the optimal data augmentation policy for wafer map transformation operations (i.e., rotation, flipping, zooming). The general training process for RNN controllers and search algorithms is shown in Fig. 15. Architecture engineering is used to learn and automate the design process of deep neural network design selection. Related works, like (Baker et al., 2017; Zoph et al., 2018), have also used RL to explore and discover high performing network architectures relative to task and dataset. In both applications, RL was leveraged as a search algorithm for optimal parameters and design, which improved model training and performance, but required separate and extensive training.

In contrast to various existing frameworks for global optimization of hyperparameters and model parameters (i.e., grid search, random search, sequential search), Bayesian optimization (BO) frameworks have demonstrated state-of-the-art performance with high efficiency in computationally expensive-to-evaluate applications (Snoek et al., 2012, 2015). As a black-box method, BO algorithms aim to probabilistically model the unknown function—commonly with Gaussian processes—and establish the posterior distribution of the respective results for the explored hyperparameter settings. By maintaining the resulting posterior distribution and updating it as new settings are evaluated, BO can balance exploration and exploitation when selecting the next hyperparameter setting to evaluate.
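The BO loop described above (probabilistic surrogate, posterior update, acquisition-driven selection) can be sketched with a Gaussian-process surrogate and an expected-improvement rule. The 1-D objective, RBF kernel length-scale, candidate grid, and iteration budget below are illustrative assumptions, not taken from the surveyed works:

```python
import numpy as np

# Toy Bayesian optimization of one hyperparameter (e.g., a learning-rate
# exponent) against a made-up validation-loss curve.

def objective(x):                       # pretend validation loss (illustrative)
    return (x - 0.3) ** 2 + 0.05 * np.sin(8 * x)

def rbf(a, b, ls=0.2):
    # squared-exponential kernel between two 1-D point sets
    d = np.asarray(a).reshape(-1, 1) - np.asarray(b).reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(xs, ys, grid, noise=1e-6):
    # standard GP regression equations: posterior mean and std on the grid
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(grid, xs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ ys
    var = np.clip(np.diag(rbf(grid, grid) - Ks @ Kinv @ Ks.T), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # EI for minimization: (best - mu) * Phi(z) + sigma * phi(z)
    from math import erf
    z = (best - mu) / sigma
    cdf = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (best - mu) * cdf + sigma * pdf

grid = np.linspace(0, 1, 101)
xs = np.array([0.0, 0.5, 1.0])          # initial evaluations
ys = objective(xs)
for _ in range(5):                       # BO loop: fit, acquire, evaluate
    mu, sigma = gp_posterior(xs, ys, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, ys.min()))]
    xs = np.append(xs, x_next)
    ys = np.append(ys, objective(x_next))

print("best x:", xs[ys.argmin()], "best loss:", ys.min())
```

Practical frameworks additionally learn the kernel hyperparameters, model observation noise, and handle multi-dimensional search spaces, which is where the computational cost discussed later in this section comes from.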
Handcrafted feature extraction requires the defect patterns to be known or understood well enough to generate effective features. The onset of CNNs was prompted by the automated feature extraction capability in which rich, descriptive features can be learned. Similarly, with representation learning, raw data can be used with minimal preprocessing and can gain high-level discriminatory power for complex patterns, proving that feature representation learning methods can generate more meaningful and effective features for downstream defect pattern classification. However, the capacity of feature representation learning is constrained by model complexity, and whether the model is suited towards learning the complex structure of the data and problem (task) at hand.

For supervised learning methods, although the use of labels can help models achieve improved performance with low computational cost, they are limited by the following: (i) the amount of labeled data, (ii) heavy influence of class label distribution and data splits, and (iii) overfitting. As the acquisition of labels is expensive, limited amounts of high-quality wafer maps are available for training and testing; these limited amounts bottleneck classification performance and highlight limitations in terms of real-world scalability. To tackle this limitation, related works have employed data augmentation techniques to supplement the small-sized datasets, however with the risk of increasing computational costs and generalization error. Similarly, related works implemented specialized modules and deep learning networks to improve learning. With modified modules like the deformable convolutional unit and the usage of specialized loss functions (i.e., contrastive loss, triplet loss), discriminative feature representations were learned. However, these works focused on single-type defects, or considered a limited degree of diversity for the mixed-type defects. In the face of new defects and combinations, the performance of these algorithms may decrease with the growth in the number of defect classes as class distinctiveness and imbalance may take a toll. As labels are used as the supervisory signal for training, model performance is sensitive to class label distribution, data splits, and class distinctiveness. Class imbalance can induce overfitting on the majority classes, with high misclassification on the minority classes, and the model may not be able to differentiate between similar defect patterns. This is a critical issue as supervised methods are highly susceptible to overfitting because training is contingent on labels, such that careful considerations should be made for model and training process parameters to prevent overfitting. Relative to defect type, the majority of works for supervised algorithms are focused on the detection of single-type defects, and are not suitable for recognizing mixed-type defects, despite the increasing relevance of mixed-type defects. Multi-label based mixed-type WMDD has limited development and has not been extensively studied for scalability under low resource settings, whereas for multi-class based mixed-type defect detection, much more literature exists. It has been noted that additional computational power was required to train the network of binary classifiers, which only considered a limited range of defects, and required large amounts of data for sufficient training. The prominent supervised methods are summarized in Table 6.

As supervised methods face performance limitations due to the amounts of labeled wafer maps, the unsupervised methods demonstrate how the plethora of unlabeled wafer maps can be leveraged. Despite achieving comparable defect detection performance, unsupervised clustering algorithms are sensitive to kernel methods and their respective parameters, and typically have high time complexity, resulting in long run-times. These methods are sensitive to initialization and hyperparameters, indicating the criticality of hyperparameter optimization for performance (Samariya & Thakkar, 2021). Related works have recognized this difficulty of using pre-set parameters (i.e., number of clusters), and in response adapted clustering algorithms with the ability to estimate the number of clusters. However, as these methods involve inference networks, computational complexity increases, leading to long inference times which subsequently increase overall run-time. It is important to note that despite the importance of hyperparameter optimization, related works employed simpler optimization frameworks, such as grid search or low-level sensitivity evaluation. Based on reconstruction error, unsupervised pretraining is utilized to improve the initialization of the model weights relative to random initialization, such that training time is faster as the weights are closer to a local optimum. However, in (Alberti et al., 2017), the authors demonstrated how minimizing the reconstruction error for layer-wise training of the autoencoder is not optimal for downstream finetuning for classification tasks, as the learned feature representations may not necessarily be meaningful (i.e., an identity function may be learned). The literature for unsupervised pretraining methods demonstrates that representation learning with unlabeled data can be advantageous but needs an effective strategy to learn meaningful feature representations without high computational costs. In Table 7, the performance of the prominent unsupervised clustering algorithms is summarized.

Semi-supervised algorithms address the issues of data availability and ineffective feature learning from supervised and unsupervised methods, demonstrating how the use of both labeled and unlabeled data can achieve improved defect recognition and classification. In particular, the semi-supervised deep generative modelling approach has shown effective latent representation learning and generative capabilities, but at a relatively high computational cost. It is important to note that with limited amounts of labeled data, model selection is quite important for semi-supervised learning to avoid overfitting and to promote effective representation learning (Kingma et al., 2014). In comparison to supervised and unsupervised methods, the literature for semi-supervised methods is scarcely developed despite the promising
results, indicating great potential in future developments. In Table 8, the prominent semi-supervised methods are summarized, including the methods that utilized unsupervised pretraining and supervised finetuning.

Table 7 Summary of prominent unsupervised clustering algorithms for wafer map defect detection

  Kim et al. (2018)       CPF-iWMM     Real               RI: 0.96; ARI: 0.92; NMI: 0.92
  Taha et al. (2018)^a    DDPfinder    Real & Synthetic   0.9980
  Hwang and Kim (2020)    DPGMM        Real               ARI: 0.76; AMI: 0.76

  ^a Result reflects the clustering accuracy, which is the algorithm's ability to correctly cluster defect patterns

Table 8 Summary of prominent semi-supervised algorithms for wafer map defect detection^1

  Yu et al. (2019b)^2     SCSDAE                                      WM-811K (Real)     0.9883
  Yu (2019)               ESDAE                                       WM-811K (Real)     0.9703
  Hu et al. (2021)^3      Semi-supervised with contrastive learning   WM-811K (Real)     0.7790
  Lee and Kim (2020)^4    SS-CDGMM                                    Real & Synthetic   0.9488
  Kong and Ni (2020b)     Semi-supervised Ladder Network              Real               0.9020
  Kong and Ni (2020a)     SVAE                                        Real               0.9030
  Yu and Liu (2020)       PCACAE                                      WM-811K (Real)     0.9377
  Shon et al. (2021)      CVAE + CNN                                  WM-811K (Real)     0.9689

  ^1 Performance results here reflect the overall average recognition rates. ^2 Based on performance with original dataset. ^3 Performance notes the best overall accuracy. ^4 Notes the highest EMR score from the labeled-unlabeled ratios.

Enhanced learning strategies were used to boost defect recognition and classification performance. Data augmentation methods utilized image transformations and/or generative models to mitigate class imbalance issues, and subsequently increase data diversity. Although GANs have advanced data generation capabilities, they require substantial computational time to effectively train the generator and discriminator networks. As GAN training involves a trade-off between the generator and discriminator, the models are
susceptible to getting stuck in local minima. For incremental learning strategies, techniques like active learning and pseudo-labeling have demonstrated capability in boosting model performance. However, they are susceptible to catastrophic forgetting and hyperparameter sensitivity (i.e., querying strategy, ratio of original to pseudo-labeled data). With the help of transfer learning, many training processes have been expedited to achieve relatively high accuracy with shorter training times. However, as model complexity, data, and other design choices can impact performance, model selection and hyperparameter tuning need to be carefully considered. For model optimization strategies, RL and BO frameworks are used to bypass the extensive manual searching. These strategies are important in understanding the sensitivities a model may have to input/output, architecture, etc. Although RL imposes extensive training to determine optimal parameters and design, BO provides a more computationally efficient alternative to tuning model hyperparameters. However, it should be noted that for multiple objectives and increasing numbers of observations, BO frameworks become more computationally complex, which subsequently requires more processing resources.

Challenges and outlook

In this article, we survey the literature of ML and DL applications for wafer map defect recognition and classification, which demonstrated superior performance, as well as great potential and applicability for in-line integration. However, despite the reported successes, many challenges in implementing these methodologies have been identified, including difficulty learning new defects, difficulty differentiating between similar defect patterns, taxing computational loads, and lack of robust detection of complex defects. With respect to the surveyed literature, the following findings emerged as the most prominent challenges in the WMDD field: (i) data availability, (ii) mixed-type defects, and (iii) high computational complexity. The field of WMDD is continuously developing; however, there is limited access to databases that reflect the current design and complexity of wafers and ICs. This is apparent in recent works that utilize the WM-811K dataset, which is most likely outdated in terms of wafer size, IC node size, etc. Similarly, as only private data can properly reflect the present design standards, and such data has restricted access, the innovation and research for WMDD is slowed. Although simulated data is an option, there is currently a gap in producing realistic, synthetic wafer maps similar to real defect patterns. The majority of existing literature focuses on single-type defects, despite the growing criticality of mixed-type defects. Although mixed-type defect detection algorithms exist, many are limited in terms of labeled data availability, range of defect pattern types, and computational load. Many developments impose a high computational load, which in turn restricts scalability and potential deployment for real-time implementation, and increases training time and needed processing resources. As this industry will remain competitive and continuously growing, computational complexity should be reduced to be more efficient. These existing challenges bear on implementation, scalability, and adaptability to new state-of-the-art designs and feature sizes.

With the plethora of unlabeled data available, recent developments that leverage ML/DL for self-supervised and semi-supervised learning indicate potential to surpass supervised learning for efficient feature representation learning, image recognition, and classification. To promote future developments for defect detection, which would allow researchers and engineers to validate and test against new designs and feature sizes, consideration towards building a database with real-world defects is needed. Consolidating continuous innovation, growth, and development indicates great promise towards achieving efficient and robust defect detection. Based on the challenges and current landscape of this field, the future outlook of WMDD research is summarized as follows:

(1) Handling class imbalance: As many works have focused on tackling the class imbalance issue, it is evident that performance suffers with skewed data distribution. Development of more robust handling of class imbalance is needed, particularly as mixed-type defects become more critical.
(2) Effective unsupervised feature representation learning: As ML/DL and computer vision applications are increasingly developing self-supervised techniques for image classification and pattern segmentation, these methods should be investigated, especially in the face of limited labeled data and the limitations of pretraining via reconstruction loss.
(3) Real-time Monitoring: The majority of developments are offline systems; consideration of model requirements is needed to meet the conditions for real-time monitoring and operation.
(4) Computational Complexity: With respect to real-time monitoring, more efficient and less computationally complex algorithms are needed to reduce the burden from training and processing, memory requirements, and scalability limitations.
(5) Model Optimization: Due to the complex parameter-structure-performance relationship, calibrating model selection and the optimal set of parameters is needed. From the existing literature, there is limited exploration and investigation into model optimization and joint hyperparameter tuning.
Acknowledgements The work described in this paper was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant RGPIN-217525. The authors are grateful for their support.

Funding This work was supported by Natural Sciences and Engineering Research Council of Canada (NSERC), Grant RGPIN-217525.

Appendix

Abbreviation  Term
AC  Adjacency Clustering
AdaBalGAN  Adaptive Balancing Generative Adversarial Network
AMI  Adjusted Mutual Information
ANN  Artificial Neural Network
ARI  Adjusted Rand Index
BALD  Bayesian Active Learning by Disagreement
BO  Bayesian Optimization
BB  Bounding Box
BPN  Back Propagation Network
C2DPCA  Conditional Two-Dimensional PCA
CCD  Charge-Coupled Devices
CMP  Chemical Mechanical Process
CNN  Convolutional Neural Network
CPF  Connected-Path Filtering
CVAE  Convolutional Variational Autoencoder
CZ  Czochralski
DBN  Deep Belief Network
DBSCAN  Density-based Spatial Clustering of Applications with Noise
DCN  Deformable Convolutional Network
DCNN  Deep Convolutional Neural Network
DDPfinder  Dominant Defective Patterns Finder
DFS  Depth-first Search
DL  Deep Learning
DP  Dirichlet Process
ECOC  Error-Correcting Output Codes
EMR  Exact Match Ratio
ESDAE  Enhanced Stacked Denoising Autoencoder
EUV  Extreme Ultraviolet
FAM  Fuzzy ARTMAP
FD  Fischer-discriminant
FZ  Float-zone
GAN  Generative Adversarial Network
GBM  Gradient Boosting Machine
GMM  Gaussian Mixture Model
GRN  Generalized Regression Network
IC  Integrated Circuit
IL  Incremental Learning
ILT  Inverse-lithography Technology
iWMM  Infinite Warped Mixture Model
JLNLDA  Joint Local and Non-local Linear Discriminant Analysis
kNN  k-Nearest Neighbors
LDA  Linear Discriminant Analysis
LLE  Locally Linear Embedding
LR  Logistic Regression
MDS  Multi-Dimensional Scaling
MI  Mutual Information
ML  Machine Learning
MLP  Multi-Layer Perceptron
MPre  Micro-Precision
MRe  Micro-Recall
NMI  Normalized Mutual Information
OPTICS  Ordering Point to Identify the Cluster Structure
PCA  Principal Component Analysis
PCACAE  PCA-based Convolutional Autoencoder
RCA  Root-cause Analysis
RF  Random Forest
RGRN  Randomized General Regression Network
RI  Rand Index
RL  Reinforcement Learning
RNN  Recurrent Neural Network
SAT  Scanning Acoustic Tomography
SCSDAE  Stacked Convolutional Sparse Denoising Autoencoder
SDAE  Stacked Denoising Autoencoder
SEM  Scanning Electron Microscopy
SS-CDGMM  Semi-supervised Convolutional Deep Generative Multiple Models
SSD  Single Shot Detector
SVAE  Semi-supervised Variational Autoencoder
SVC  Support Vector Clustering
SVE  Soft Voting Ensemble
SVM  Support Vector Machine
t-SNE  t-distributed Stochastic Neighbor Embedding
TTV  Total Thickness Variation
UV  Ultraviolet
VAE  Variational Autoencoder
WBM  Wafer Bin Map
WM  Wafer Map
WMDD  Wafer Map Defect Detection

References

Adly, F., Alhussein, O., Yoo, P. D., Al-Hammadi, Y., Taha, K., Muhaidat, S., Jeong, Y.-S., Lee, U., & Ismail, M. (2015a). Simplified subspaced regression network for identification of defect patterns in semiconductor wafer maps. IEEE Transactions on Industrial Informatics, 11(6), 1267–1276. https://doi.org/10.1109/TII.2015.2481719
Adly, F., Yoo, P., Muhaidat, S., Al-Hammadi, Y., Lee, U., & Ismail, M. (2015b). Randomized general regression network for identification of defect patterns in semiconductor wafer maps. IEEE Transactions on Semiconductor Manufacturing, 28(2), 145–152. https://doi.org/10.1109/tsm.2015.2405252
Airaksinen, V.-M. (2015). Silicon wafer and thin film measurements. In M. Tilli, T. Motooka, V.-M. Airaksinen, S. Franssila, M. Paulasto-Kröckel, & V. Lindroos (Eds.), Handbook of Silicon Based MEMS Materials and Technologies (2nd Ed., pp. 381–390). https://doi.org/10.1016/B978-0-323-29965-7.00015-4
Alawieh, M. B., Boning, D., & Pan, D. Z. (2020). Wafer map defect patterns classification using deep selective learning. In 2020 57th ACM/IEEE Design Automation Conference (DAC). https://doi.org/10.1109/dac18072.2020.9218580
Alawieh, M. B., Wang, F., & Li, X. (2018). Identifying wafer-level systematic failure patterns via unsupervised learning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(4), 832–844. https://doi.org/10.1109/TCAD.2017.2729469
Alberti, M., Seuret, M., Ingold, R., & Liwicki, M. (2017, December 17). A Pitfall of Unsupervised Pre-Training. arXiv.org. Retrieved from https://arxiv.org/abs/1712.01655.
Baker, B., Raskar, R., Naik, N., & Gupta, O. (2017). Designing Neural
Bella, R. D., Carrera, D., Rossi, B., Fragneto, P., & Boracchi, G. (2019, September). Wafer defect map classification using sparse convolutional networks. In International Conference on Image Analysis and Processing (pp. 125–136). Springer, Cham.
Bello, I., Zoph, B., Vasudevan, V., & Le, Q. V. (2017). Neural Optimizer Search with Reinforcement Learning. In Proceedings of 34th International Conference on Machine Learning (pp. 459–468). Sydney. Retrieved from https://arxiv.org/abs/1709.07417.
Banchhor, C., & Srinivasu, N. (2021). Analysis of Bayesian optimization algorithms for big data classification based on Map Reduce framework. Journal of Big Data, 8(1), 81. https://doi.org/10.1186/s40537-021-00464-4
Byun, Y., & Baek, J. G. (2020). Mixed pattern recognition methodology on wafer maps with pre-trained convolutional neural networks. In A. Rocha, L. Steels, & J. van den Herik (Eds.), ICAART 2020 - Proceedings of the 12th International Conference on Agents and Artificial Intelligence (pp. 974–979). (ICAART 2020—Proceedings of the 12th International Conference on Agents and Artificial Intelligence; Vol. 2). SciTePress.
Chang, C.-W., Chao, T.-M., Horng, J.-T., Lu, C.-F., & Yeh, R.-H. (2012). Development pattern recognition model for the classification of circuit probe wafer maps on semiconductors. IEEE Transactions on Components, Packaging and Manufacturing Technology, 2(12), 2089–2097. https://doi.org/10.1109/TCPMT.2012.2215327
Chen, H.-C. (2020). Automated detection and classification of defective and abnormal dies in wafer images. Applied Sciences, 10(10), 3423. https://doi.org/10.3390/app10103423
Chen, S.-H., Kang, C.-H., & Perng, D.-B. (2020a). Detecting and measuring defects in wafer die using GAN and YOLOv3. Applied Sciences, 10(23), 8725. https://doi.org/10.3390/app10238725
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020b). A Simple Framework for Contrastive Learning of Visual Representations.
Chen, F.-L., & Liu, S.-F. (2000). A neural-network approach to recognize defect spatial pattern in semiconductor fabrication. IEEE Transactions on Semiconductor Manufacturing, 13(3), 366–373. https://doi.org/10.1109/66.857947
Cheon, S., Lee, H., Kim, C. O., & Lee, S. H. (2019). Convolutional neural network for wafer surface defect classification and the detection of unknown defect class. IEEE Transactions on Semiconductor Manufacturing, 32(2), 163–170. https://doi.org/10.1109/tsm.2019.2902657
Chien, C.-F., Hsu, S.-C., & Chen, Y.-J. (2013). A system for online detection and classification of wafer bin map defect patterns for manufacturing intelligence. International Journal of Production Research, 51(8), 2324–2338. https://doi.org/10.1080/00207543.2012.737943
Chien, J.-C., Wu, M.-T., & Lee, J.-D. (2020). Inspection and classification of semiconductor wafer surface defects using CNN deep learning networks. Applied Sciences, 10(15), 5340. https://doi.org/10.3390/app10155340
Choi, G., Kim, S.-H., Ha, C., & Bae, S. J. (2012). Multi-step ART1 algorithm for recognition of defect patterns on semiconductor wafers. International Journal of Production Research, 50(12), 3274–3287. https://doi.org/10.1080/00207543.2011.574502
Cuevas, A., & Sinton, R. A. (2018). Chapter III-1-A - Characterization and Diagnosis of Silicon Wafers, Ingots, and Solar Cells. In D. Macdonald & S. A. Kalogirou (Eds.), McEvoy's Handbook of Photovoltaics (3rd Ed., pp. 1119–1154). Essay, Academic Press.
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., & Wei, Y. (2017).
Network Architectures using Reinforcement Learning. In Proc. of Deformable convolutional networks. In 2017 IEEE International
ICLR 2017. Retrieved from https://arxiv.org/abs/1611.02167. Conference on Computer Vision (ICCV). https://doi.org/10.1109/
Batool, U., Shapiai, M. I., Fauzi, H., & Fong, J. X. (2020). Convolutional iccv.2017.89
neural network for imbalanced data classification of silicon wafer
defects. In 2020 16th IEEE International Colloquium on Signal
Processing Its Applications (CSPA), 230–235. https://doi.org/10.
1109/CSPA48992.2020.9068669
Devika, B., & George, N. (2019). Convolutional neural network for semiconductor wafer defect detection. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1–6). https://doi.org/10.1109/ICCCNT45670.2019.8944584

di Palma, F., de Nicolao, G., Miraglia, G., Pasquinetti, E., & Piccinini, F. (2005). Unsupervised spatial pattern classification of electrical-wafer-sorting maps in semiconductor manufacturing. Pattern Recognition Letters, 26(12), 1857–1865. https://doi.org/10.1016/j.patrec.2005.03.007

Du, D.-Y., & Shi, Z. (2020). A wafer map defect pattern classification model based on deep convolutional neural network. In 2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT) (pp. 1–3). https://doi.org/10.1109/ICSICT49897.2020.9278021

Ebayyeh, A. A., & Mousavi, A. (2020). A review and analysis of automatic optical inspection and quality monitoring methods in electronics industry. IEEE Access, 8, 183192–183271. https://doi.org/10.1109/access.2020.3029127

Ezzat, A. A., Liu, S., Hochbaum, D. S., & Ding, Y. (2021). A graph-theoretic approach for spatial filtering and its impact on mixed-type spatial pattern recognition in wafer bin maps. IEEE Transactions on Semiconductor Manufacturing, 34(2), 194–206. https://doi.org/10.1109/tsm.2021.3062943

Faaeq, A., Guruler, H., & Peker, M. (2018). Image classification using manifold learning based non-linear dimensionality reduction. In 2018 26th Signal Processing and Communications Applications Conference (SIU). https://doi.org/10.1109/siu.2018.8404441

Fan, M., Wang, Q., & van der Waal, B. (2016). Wafer defect patterns recognition based on OPTICS and multi-label classification. In 2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC) (pp. 912–915). https://doi.org/10.1109/IMCEC.2016.7867343

Hasan, R. M., & Luo, X. (2018). Promising lithography techniques for next-generation logic devices. Nanomanufacturing and Metrology, 1(2), 67–81. https://doi.org/10.1007/s41871-018-0016-9

He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. B. (2019). Momentum contrast for unsupervised visual representation learning. CoRR, abs/1911.05722. http://arxiv.org/abs/1911.05722

Hsu, S.-C., & Chien, C.-F. (2007). Hybrid data mining approach for pattern extraction from wafer bin map to improve yield in semiconductor manufacturing. International Journal of Production Economics, 107(1), 88–103. https://doi.org/10.1016/j.ijpe.2006.05.015

Hsu, C.-Y., Chen, W.-J., & Chien, J.-C. (2020). Similarity matching of wafer bin maps for manufacturing intelligence to empower Industry 3.5 for semiconductor manufacturing. Computers & Industrial Engineering, 142, 106358. https://doi.org/10.1016/j.cie.2020.106358

Hu, M. (1962). Visual pattern recognition by moment invariants. IEEE Transactions on Information Theory, 8(2), 179–187. https://doi.org/10.1109/tit.1962.1057692

Hu, H., He, C., & Li, P. (2021). Semi-supervised wafer map pattern recognition using domain-specific data augmentation and contrastive learning. In 2021 IEEE International Test Conference (ITC) (pp. 113–122). https://doi.org/10.1109/ITC50571.2021.00019

Huang, C.-J. (2007). Clustered defect detection of high quality chips using self-supervised multilayer perceptron. Expert Systems with Applications, 33(4), 996–1003. https://doi.org/10.1016/j.eswa.2006.07.011

Huang, C.-J., Chen, Y.-J., Wu, C.-F., & Huang, Y.-A. (2009). Application of neural networks and genetic algorithms to the screening for high quality chips. Applied Soft Computing, 9(2), 824–832. https://doi.org/10.1016/j.asoc.2008.10.002

Hwang, J., & Kim, H. (2020). Variational deep clustering of wafer map patterns. IEEE Transactions on Semiconductor Manufacturing, 33(3), 466–475. https://doi.org/10.1109/tsm.2020.3004483

Hyun, Y., & Kim, H. (2020). Memory-augmented convolutional neural networks with triplet loss for imbalanced wafer defect pattern classification. IEEE Transactions on Semiconductor Manufacturing, 33(4), 622–634. https://doi.org/10.1109/tsm.2020.3010984

Ishida, T., Nitta, I., Fukuda, D., & Kanazawa, Y. (2019). Deep learning-based wafer-map failure pattern recognition framework. In 20th International Symposium on Quality Electronic Design (ISQED). https://doi.org/10.1109/isqed.2019.8697407

Iwata, T., Duvenaud, D., & Ghahramani, Z. (2013). Warped mixtures for nonparametric cluster shapes. arXiv. https://arxiv.org/abs/1206.1846

Jang, J., Seo, M., & Kim, C. O. (2020). Support weighted ensemble model for open set recognition of wafer map defects. IEEE Transactions on Semiconductor Manufacturing, 33(4), 635–643. https://doi.org/10.1109/tsm.2020.3012183

Ji, Y. S., & Lee, J.-H. (2020). Using GAN to improve CNN performance of wafer map defect type classification: Yield enhancement. In 2020 31st Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC). https://doi.org/10.1109/asmc49169.2020.9185193

Jin, C. H., Kim, H.-J., Piao, Y., Li, M., & Piao, M. (2020). Wafer map defect pattern classification based on convolutional neural network features and error-correcting output codes. Journal of Intelligent Manufacturing, 31(8), 1861–1875. https://doi.org/10.1007/s10845-020-01540-x

Jin, C. H., Na, H. J., Piao, M., Pok, G., & Ryu, K. H. (2019). A novel DBSCAN-based defect pattern detection and classification framework for wafer bin map. IEEE Transactions on Semiconductor Manufacturing, 32(3), 286–292. https://doi.org/10.1109/tsm.2019.2916835

Kang, S. (2020). Rotation-invariant wafer map pattern classification with convolutional neural networks. IEEE Access, 8, 170650–170658. https://doi.org/10.1109/access.2020.3024603

Kang, H., & Kang, S. (2021). A stacking ensemble classifier with handcrafted and convolutional features for wafer map pattern classification. Computers in Industry, 129, 103450. https://doi.org/10.1016/j.compind.2021.103450

Khastavaneh, H., & Ebrahimpour-Komleh, H. (2020). Representation learning techniques: An overview. In M. Bohlouli, B. Sadeghi Bigham, Z. Narimani, M. Vasighi, & E. Ansari (Eds.), Data Science: From Research to Application (CiDaS 2019). Lecture Notes on Data Engineering and Communications Technologies, Vol. 45. Springer, Cham. https://doi.org/10.1007/978-3-030-37309-2_8

Kim, Y., Cho, D., & Lee, J.-H. (2020a). Wafer map classifier using deep learning for detecting out-of-distribution failure patterns. In 2020 IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA) (pp. 1–5). https://doi.org/10.1109/IPFA49335.2020.9260877

Kim, B., Jeong, Y.-S., Tong, S. H., & Jeong, M. K. (2020b). A generalised uncertain decision tree for defect classification of multiple wafer maps. International Journal of Production Research, 58(9), 2805–2821. https://doi.org/10.1080/00207543.2019.1637035

Kim, J., Lee, Y., & Kim, H. (2018). Detection and clustering of mixed-type defect patterns in wafer bin maps. IISE Transactions, 50(2), 99–111. https://doi.org/10.1080/24725854.2017.1386337

Kim, T. S., Lee, J. W., Lee, W. K., & Sohn, S. Y. (2021). Novel method for detection of mixed-type defect patterns in wafer maps based on a single shot detector algorithm. Journal of Intelligent Manufacturing. https://doi.org/10.1007/s10845-021-01755-6
Kim, S., & Oh, I. S. (2017). Automatic defect detection from SEM images of wafers using component tree. JSTS: Journal of Semiconductor Technology and Science, 17(1), 86–93. https://doi.org/10.5573/jsts.2017.17.1.086

Kingma, D. P., Rezende, D. J., Mohamed, S., & Welling, M. (2014). Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems (Vol. 4, pp. 3581–3589).

Kong, Y., & Ni, D. (2018). Semi-supervised classification of wafer map based on ladder network. In 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT). https://doi.org/10.1109/icsict.2018.8564982

Kong, Y., & Ni, D. (2019). Recognition and location of mixed-type patterns in wafer bin maps. In 2019 IEEE International Conference on Smart Manufacturing, Industrial & Logistics Engineering (SMILE). https://doi.org/10.1109/smile45626.2019.8965309

Kong, Y., & Ni, D. (2020a). A semi-supervised and incremental modeling framework for wafer map classification. IEEE Transactions on Semiconductor Manufacturing, 33(1), 62–71. https://doi.org/10.1109/tsm.2020.2964581

Kong, Y., & Ni, D. (2020b). Qualitative and quantitative analysis of multi-pattern wafer bin maps. IEEE Transactions on Semiconductor Manufacturing, 33(4), 578–586. https://doi.org/10.1109/tsm.2020.3022431

Kyeong, K., & Kim, H. (2018). Classification of mixed-type defect patterns in wafer bin maps using convolutional neural networks. IEEE Transactions on Semiconductor Manufacturing, 31(3), 395–402. https://doi.org/10.1109/tsm.2018.2841416

Lee, H., & Kim, H. (2020). Semi-supervised multi-label learning for classification of wafer bin maps with mixed-type defect patterns. IEEE Transactions on Semiconductor Manufacturing, 33(4), 653–662. https://doi.org/10.1109/tsm.2020.3027431

Li, K., Liao, P., Cheng, K., Chen, L., Wang, S., Huang, A., et al. (2021). Hidden wafer scratch defects projection for diagnosis and quality enhancement. IEEE Transactions on Semiconductor Manufacturing, 34(1), 9–16. https://doi.org/10.1109/tsm.2020.3040998

Li, T.-S., & Huang, C.-L. (2009). Defect spatial pattern recognition using a hybrid SOM–SVM approach in semiconductor manufacturing. Expert Systems with Applications, 36(1), 374–385. https://doi.org/10.1016/j.eswa.2007.09.023

Liao, C.-S., Hsieh, T.-J., Huang, Y.-S., & Chien, C.-F. (2014). Similarity searching for defective wafer bin maps in semiconductor manufacturing. IEEE Transactions on Automation Science and Engineering, 11(3), 953–960. https://doi.org/10.1109/TASE.2013.2277603

Liu, C.-W., & Chien, C.-F. (2013). An intelligent system for wafer bin map defect diagnosis: An empirical study for semiconductor manufacturing. Engineering Applications of Artificial Intelligence, 26(5–6), 1479–1486. https://doi.org/10.1016/j.engappai.2012.11.009

Luo, Y., Yin, L., Bai, W., & Mao, K. (2020). An appraisal of incremental learning methods. Entropy, 22(11), 1190. https://doi.org/10.3390/e22111190

Maksim, K., Kirill, B., Eduard, Z., Nikita, G., Aleksandr, B., Arina, L., Vladislav, S., Daniil, M., & Nikolay, K. (2019). Classification of wafer maps defect based on deep learning methods with small amount of data. In 2019 International Conference on Engineering and Telecommunication (EnT) (pp. 1–5). https://doi.org/10.1109/EnT47717.2019.9030550

Mohanaiah, P., Sathyanarayana, P., & GuruKumar, L. (2013). Image texture feature extraction using GLCM approach. International Journal of Scientific and Research Publications, 3(5), 1–5.

Nakazawa, T., & Kulkarni, D. V. (2018). Wafer map defect pattern classification and image retrieval using convolutional neural network. IEEE Transactions on Semiconductor Manufacturing, 31(2), 309–314. https://doi.org/10.1109/tsm.2018.2795466

Northcutt, C., Jiang, L., & Chuang, I. (2021). Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research, 70, 1373–1411. https://doi.org/10.1613/jair.1.12125

Ooi, M. P.-L., Sok, H. K., Kuang, Y. C., Demidenko, S., & Chan, C. (2013). Defect cluster recognition system for fabricated semiconductor wafers. Engineering Applications of Artificial Intelligence, 26(3), 1029–1043. https://doi.org/10.1016/j.engappai.2012.03.016

Park, S., Jang, J., & Kim, C. O. (2020). Discriminative feature learning and cluster-based defect label reconstruction for reducing uncertainty in wafer bin map label. Journal of Intelligent Manufacturing, 32(1), 251–263. https://doi.org/10.1007/s10845-020-01571-4

Parsa, M., Mitchell, J. P., Schuman, C. D., Patton, R. M., Potok, T. E., & Roy, K. (2020). Bayesian multi-objective hyperparameter optimization for accurate, fast, and efficient neural network accelerator design. Frontiers in Neuroscience. https://doi.org/10.3389/fnins.2020.00667

Patel, D. V., Bonam, R., & Oberai, A. A. (2020). Deep learning-based detection, classification, and localization of defects in semiconductor processes. Journal of Micro/Nanolithography, MEMS, and MOEMS, 19(2), 024801. https://doi.org/10.1117/1.jmm.19.2.024801

Patel, S., Sihmar, S., & Jatain, A. (2015). A study of hierarchical clustering algorithms. In 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 537–541).

Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. CoRR, abs/1712.04621. http://arxiv.org/abs/1712.04621

Piao, M., Jin, C. H., Lee, J. Y., & Byun, J.-Y. (2018). Decision tree ensemble-based wafer map failure pattern recognition based on radon transform-based features. IEEE Transactions on Semiconductor Manufacturing, 31(2), 250–257. https://doi.org/10.1109/TSM.2018.2806931

Pleschberger, M., Scheiber, M., & Schrunner, S. (2019). Simulated analog wafer test data for pattern recognition. Zenodo. https://doi.org/10.5281/zenodo.2542504

Preil, M. E. (2016). Patterning challenges in the sub-10 nm era. In Optical Microlithography XXIX. https://doi.org/10.1117/12.2222256

Ren, P., Xiao, Y., Chang, X., Huang, P.-Y., Li, Z., Chen, X., & Wang, X. (2020). A survey of deep active learning. arXiv. https://arxiv.org/abs/2009.00236

Ruthotto, L., & Haber, E. (2021). An introduction to deep generative modeling. GAMM-Mitteilungen. https://doi.org/10.1002/gamm.202100008

Samariya, D., & Thakkar, A. (2021). A comprehensive survey of anomaly detection algorithms. Annals of Data Science. https://doi.org/10.1007/s40745-021-00362-9

Santos, A. M., & Canuto, A. M. P. (2012). Using semi-supervised learning in multi-label classification problems. In The 2012 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). https://doi.org/10.1109/IJCNN.2012.6252800

Saqlain, M., Abbas, Q., & Lee, J. Y. (2020). A deep convolutional neural network for wafer defect identification on an imbalanced dataset in semiconductor manufacturing processes. IEEE Transactions on Semiconductor Manufacturing, 33(3), 436–444. https://doi.org/10.1109/tsm.2020.2994357

Saqlain, M., Jargalsaikhan, B., & Lee, J. Y. (2019). A voting ensemble classifier for wafer map defect patterns identification in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 32(2), 171–182. https://doi.org/10.1109/tsm.2019.2904306

Settles, B. (2009). Active learning literature survey (Tech. Rep.). Madison, WI.
Shawon, A., Faruk, M. O., Habib, M. B., & Khan, A. M. (2019). Silicon wafer map defect classification using deep convolutional neural network with data augmentation. In 2019 IEEE 5th International Conference on Computer and Communications (ICCC). https://doi.org/10.1109/iccc47050.2019.9064029

Shen, Z., & Yu, J. (2019). Wafer map defect recognition based on deep transfer learning. In 2019 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM). https://doi.org/10.1109/ieem44572.2019.8978568

Shi, X., Yan, Y., Zhou, T., Yu, X., Li, C., Chen, S., & Zhao, Y. (2020). Fast and accurate machine learning inverse lithography using physics based feature maps and specially designed DCNN. In 2020 International Workshop on Advanced Patterning Solutions (IWAPS). https://doi.org/10.1109/iwaps51164.2020.9286814

Shi, X., Zhao, Y., Cheng, S., Li, M., Yuan, W., Yao, L., Zhao, W., Xiao, Y., Kang, X., & Li, A. (2019). Optimal feature vector design for computational lithography. In Optical Microlithography XXXII. https://doi.org/10.1117/12.2515446

Shim, J., Kang, S., & Cho, S. (2020). Active learning of convolutional neural network for cost-effective wafer map pattern classification. IEEE Transactions on Semiconductor Manufacturing, 33(2), 258–266. https://doi.org/10.1109/tsm.2020.2974867

Shon, H. S., Batbaatar, E., Cho, W.-S., & Choi, S. G. (2021). Unsupervised pre-training of imbalanced data for identification of wafer map defect patterns. IEEE Access, 9, 52352–52363. https://doi.org/10.1109/access.2021.3068378

Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems, 25.

Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N., Patwary, Md. M. A., Prabhat, & Adams, R. P. (2015). Scalable Bayesian optimization using deep neural networks. arXiv:1502.05700.

Taha, K., Salah, K., & Yoo, P. D. (2018). Clustering the dominant defective patterns in semiconductor wafer maps. IEEE Transactions on Semiconductor Manufacturing, 31(1), 156–165. https://doi.org/10.1109/TSM.2017.2768323

Tello, G., Al-Jarrah, O., Yoo, P., Al-Hammadi, Y., Muhaidat, S., & Lee, U. (2018). Deep-structured machine learning model for the recognition of mixed-defect patterns in semiconductor fabrication processes. IEEE Transactions on Semiconductor Manufacturing, 31(2), 315–322. https://doi.org/10.1109/tsm.2018.2825482

Tsai, T.-H., & Lee, Y.-C. (2020a). Wafer map defect classification with depthwise separable convolutions. In 2020 IEEE International Conference on Consumer Electronics (ICCE). https://doi.org/10.1109/icce46568.2020.9043041

Tsai, T.-H., & Lee, Y.-C. (2020b). A light-weight neural network for wafer map classification based on data augmentation. IEEE Transactions on Semiconductor Manufacturing, 33(4), 663–672. https://doi.org/10.1109/TSM.2020.3013004

Ungredda, J., & Branke, J. (2021). Bayesian optimisation for constrained problems. CoRR, abs/2105.13245. https://arxiv.org/abs/2105.13245

Wang, C.-H. (2009). Separation of composite defect patterns on wafer bin map using support vector clustering. Expert Systems with Applications, 36(2, Part 1), 2554–2561. https://doi.org/10.1016/j.eswa.2008.01.057

Wang, C.-H. (2008). Recognition of semiconductor defect patterns using spatial filtering and spectral clustering. Expert Systems with Applications, 34(3), 1914–1923. https://doi.org/10.1016/j.eswa.2007.02.014

Wang, R., & Chen, N. (2019). Wafer map defect pattern recognition using rotation-invariant features. IEEE Transactions on Semiconductor Manufacturing, 32(4), 596–604. https://doi.org/10.1109/TSM.2019.2944181

Wang, C.-H., Kuo, W., & Bensmail, H. (2006). Detection and classification of defect patterns on semiconductor wafers. IIE Transactions, 38(12), 1059–1068. https://doi.org/10.1080/07408170600733236

Wang, J., Xu, C., Yang, Z., Zhang, J., & Li, X. (2020). Deformable convolutional networks for efficient mixed-type wafer defect pattern recognition. IEEE Transactions on Semiconductor Manufacturing, 33(4), 587–596. https://doi.org/10.1109/tsm.2020.3020985

Wang, J., Yang, Z., Zhang, J., Zhang, Q., & Chien, W.-T. K. (2019). AdaBalGAN: An improved generative adversarial network with imbalanced learning for wafer defective pattern recognition. IEEE Transactions on Semiconductor Manufacturing, 32(3), 310–319. https://doi.org/10.1109/tsm.2019.2925361

Wang, W., Huang, Y., Wang, Y., & Wang, L. (2014). Generalized autoencoder: A neural network framework for dimensionality reduction. In 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. https://doi.org/10.1109/cvprw.2014.79

Wang, Y., & Ni, D. (2019). Multi-bin wafer maps defect patterns classification. In 2019 IEEE International Conference on Smart Manufacturing, Industrial & Logistics Engineering (SMILE). https://doi.org/10.1109/smile45626.2019.8965299

Wen, G., Gao, Z., Cai, Q., Wang, Y., & Mei, S. (2020). A novel method based on deep convolutional neural networks for wafer semiconductor surface defect inspection. IEEE Transactions on Instrumentation and Measurement, 69(12), 9668–9680. https://doi.org/10.1109/tim.2020.3007292

White, K. P., Kundu, B., & Mastrangelo, C. M. (2008). Classification of defect clusters on semiconductor wafers via the Hough transformation. IEEE Transactions on Semiconductor Manufacturing, 21(2), 272–278. https://doi.org/10.1109/tsm.2008.2000269

Wu, M.-J., Jang, J.-S. R., & Chen, J.-L. (2015). Wafer map failure pattern recognition and similarity ranking for large-scale data sets. IEEE Transactions on Semiconductor Manufacturing, 28(1), 1–12. https://doi.org/10.1109/tsm.2014.2364237

Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2), 165–193. https://doi.org/10.1007/s40745-015-0040-1

Yu, J. (2019). Enhanced stacked denoising autoencoder-based feature learning for recognition of wafer map defects. IEEE Transactions on Semiconductor Manufacturing, 32(4), 613–624. https://doi.org/10.1109/tsm.2019.2940334

Yu, J., & Liu, J. (2020). Two-dimensional principal component analysis-based convolutional autoencoder for wafer map defect detection. IEEE Transactions on Industrial Electronics, 68(9), 8789–8797. https://doi.org/10.1109/tie.2020.3013492

Yu, J., & Lu, X. (2016). Wafer map defect detection and recognition using joint local and nonlocal linear discriminant analysis. IEEE Transactions on Semiconductor Manufacturing, 29(1), 33–43. https://doi.org/10.1109/tsm.2015.2497264

Yu, N., Xu, Q., & Wang, H. (2019a). Wafer defect pattern recognition and analysis based on convolutional neural network. IEEE Transactions on Semiconductor Manufacturing, 32(4), 566–573. https://doi.org/10.1109/TSM.2019.2937793

Yu, J., Zheng, X., & Liu, J. (2019b). Stacked convolutional sparse denoising auto-encoder for identification of defect patterns in semiconductor wafer map. Computers in Industry, 109, 121–133. https://doi.org/10.1016/j.compind.2019.04.015

Yuan, T., Bae, S. J., & Park, J. I. (2010). Bayesian spatial defect pattern recognition in semiconductor fabrication using support vector clustering. The International Journal of Advanced Manufacturing Technology, 51(5), 671–683. https://doi.org/10.1007/s00170-010-2647-x

Zhong, G., Wang, L., Ling, X., & Dong, J. (2016). An overview on data representation learning: From traditional feature learning to recent deep learning. The Journal of Finance and Data Science, 2(4), 265–278. https://doi.org/10.1016/j.jfds.2017.05.001
Zhu, X., Hu, H., Lin, S., & Dai, J. (2019). Deformable ConvNets v2: More deformable, better results. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2019.00953

Zhuang, J., Mao, G., Wang, Y., Chen, X., & Wei, Z. (2020). A neural-network approach to better diagnosis of defect pattern in wafer bin map. In 2020 China Semiconductor Technology International Conference (CSTIC) (pp. 1–3). https://doi.org/10.1109/CSTIC49141.2020.9282438

Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. arXiv. https://arxiv.org/abs/1707.07012

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.