Abstract— Unlearning the data observed during the training of a machine learning (ML) model is an important task that can play a pivotal role in fortifying the privacy and security of ML-based applications. This article raises the following questions: 1) can we unlearn a single or multiple class(es) of data from an ML model without looking at the full training data even once? and 2) can we make the process of unlearning fast and scalable to large datasets, and generalize it to different deep networks? We introduce a novel machine unlearning framework with error-maximizing noise generation and impair-repair based weight manipulation that offers an efficient solution to the above questions. An error-maximizing noise matrix is learned for the class to be unlearned using the original model. The noise matrix is used to manipulate the model weights to unlearn the targeted class of data. We introduce impair and repair steps for a controlled manipulation of the network weights. In the impair step, the noise matrix along with a very high learning rate is used to induce sharp unlearning in the model. Thereafter, the repair step is used to regain the overall performance. With very few update steps, we show excellent unlearning while substantially retaining the overall model accuracy. Unlearning multiple classes requires a similar number of update steps as for a single class, making our approach scalable to large problems. Our method is quite efficient in comparison to the existing methods, works for multiclass unlearning, does not put any constraints on the original optimization mechanism or network design, and works well in both small and large-scale vision tasks. This work is an important step toward fast and easy implementation of unlearning in deep networks. Source code: https://github.com/vikram2000b/Fast-Machine-Unlearning.

Index Terms— Data privacy, forgetting, machine unlearning, privacy in artificial intelligence (AI).

I. INTRODUCTION

Consider a scenario where the users of a face recognition application ask the company to remove their facial information from the already trained face recognition model. In addition, there is a constraint such that the company no longer has access to those (requested to be removed) facial images. How do we solve such a problem? With the increase in privacy awareness among the general populace and the cognizance of the negative impacts of sharing one's data with ML-based applications, such demands could be raised frequently in the near future. Privacy regulations [1], [2] are increasingly likely to include such provisions in the future to give the control of personal privacy to the individuals. For example, the California Consumer Privacy Act (CCPA) [2] allows companies to collect user data by default. However, the user has the right to delete her personal data and the right to opt out of the sale of her personal information. In case a company has already used the data collected from the users (in our example, face data) to train its ML model, then the model needs to be manipulated suitably to reflect the data deletion request. The naive way is to redo the model training from scratch for every such request. This would result in a significant cost of time and resources to the company. How can this process be made more efficient? What are the challenges? How do we know that the model has actually unlearned those class/classes of data? How do we ensure a minimal effect on the overall accuracy of the model? These are some of the questions that have been asked, and possible solutions have been explored, in recent times [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19].

The unlearning (also called selective forgetting, data deletion, or scrubbing) solutions presented in the literature are focused on simple learning algorithms such as linear/logistic regression.
Manipulating the weights of the model is a nontrivial problem, as has been shown in the existing works [3], [9]. Moreover, efficiently manipulating the network weights without using the unlearning data still remains an unsolved problem. Other challenges are to unlearn multiple classes of data, perform unlearning for large-scale problems, and generalize the solution to different types of deep networks.

Estimating the effect of a data sample or a class of data samples on the deep model parameters is a challenging problem [22], [23]. Therefore, several unlearning research efforts have been focused on the simpler convex learning problems (i.e., linear/logistic regression) that offer better theoretical analysis. Koh and Liang [22] used influence functions to study the behavior of black-box models such as CNNs through the lens of training samples. It is observed that data points with high training loss are more influential for the model parameters. The adversarial versions [24] of the training images are generated by maximizing the loss on these images. It is further shown that the influence functions are also useful for studying the effect of a group of data points [23]. Recently, Huang et al. [25] proposed to learn an error-minimizing noise to make training examples unlearnable for deep learning models. The idea is to add noise to the image samples such that the model is fooled into believing that there is nothing to be learned from those samples. If used in training, such images have no effect on the model.

Unlearning requires the model to forget specific class(es) of data but remember the rest of the data. For the class(es) to be forgotten, if the model can be updated by observing patterns that are somehow the opposites of the patterns learned at the time of original training, then the updated model weights might reflect the desired unlearning and, hopefully, preserve the information of the remaining classes. We know that the original model is trained by minimizing the loss for all the classes. So, intuitively, maximizing a noise with respect to the model loss only for the unlearning class will help us learn patterns that induce forgetting. It can also be viewed as learning anti-samples for a given class and using these anti-samples to damage the previously learned information. In this article, we propose a framework for unlearning in a zero-glance privacy setting, i.e., the model cannot see the unlearning class of data. We learn an error-maximizing noise matrix consisting of highly influential points corresponding to the unlearning class. After that, we train the model using the noise matrix to update the network weights. We introduce unlearning by selective impair and repair (UNSIR), a single-pass method to unlearn single/multiple classes of data in a deep model without requiring access to the data samples of the requested set of unlearning classes. Our method can be directly applied to an already trained deep model to make it forget the information about the requested class of data, while at the same time retaining accuracy very close to that of the original model on the remaining tasks. In fact, our method performs exceedingly well in both unlearning the requested classes and retaining the accuracy on the remaining classes. To the best of our knowledge, this is the first method to achieve efficient multiclass unlearning in deep networks not only for small-scale problems (ten classes) but also for large-scale vision problems (100 classes). Our method works in the stringent zero-glance setting where the data samples of the requested unlearning class are either not available or cannot be used. This makes our solution unique and practical for real-world applications. An important and realistic use case of unlearning is face recognition. We show that our method can effectively make a trained model forget a single face as well as multiple faces in a highly efficient manner, without glancing at the samples of the unlearning faces.

To summarize, our key contributions are as follows.
1) We introduce the problem of unlearning in a zero-glance setting, which is a stricter formalization compared to the existing settings and offers the prospect of a higher level of privacy guarantees.
2) We learn an error-maximizing noise for the respective unlearning classes. UNSIR is proposed to perform single-pass impair and single-pass repair by using a very high learning rate. The impair step makes the network forget the unlearning data classes. The repair step stabilizes the network weights to better remember the remaining tasks. The combination of the two steps allows the model to obtain excellent unlearning while retaining accuracy.
3) We show that, along with a better privacy setting and support for multiclass unlearning, our method is also highly efficient. The multiclass unlearning is performed in a single impair-repair pass instead of sequentially unlearning individual classes.
4) The proposed method works on large-scale vision datasets with strong performance on different types of deep networks such as convolutional networks and ViTs. Our method does not require any prior information related to the process of the original model training, and it is easily applicable to a wide class of deep networks. Specifically, we show excellent unlearning results on face recognition. To the best of our knowledge, it is the first machine unlearning method to demonstrate all the above characteristics together.

II. RELATED WORK

A. Machine Unlearning

Machine unlearning was formulated as a data forgetting algorithm in statistical query learning [26]. Brophy and Lowd [7] introduced a variant of random forests that supports data forgetting with minimal retraining. Data deletion in k-means clustering has been studied in [11] and [27]. Guo et al. [12] give a certified information removal framework based on Newton's update removal mechanism for convex learning problems. The data removal is certified using a variation of the differential privacy condition [28], [29]. Izzo et al. [14] present a projective residual update method to delete data points from linear models. A method to hide the class information from the output logits is presented in [13]. This, however, does not remove the information present in the network weights. Unlearning in a Bayesian setting using variational inference is explored for regression and Gaussian processes in [8]. Neel et al. [15] study gradient descent based approaches to unlearning in convex models.
All these methods are designed for convex problems, whereas we aim to present an unlearning solution for deep learning models.

Some methods adopt a strategic grouping of data in the training procedure and thus enable smooth unlearning by limiting the influence of data points on model learning [4], [16]. This approach results in a high storage cost, as it mandates storing multiple snapshots of the network and gradients to ensure good unlearning performance. These approaches are independent of the types of learning algorithms and rely on the efficient division of the training data. They also need to retrain a subset of the models, while we aim to create a highly efficient unlearning algorithm without any memory overhead. Gupta et al. [30] proposed an algorithm to handle a sequence of adaptive deletion requests in this setting.

B. Unlearning in Deep Neural Networks

Forgetting in deep neural networks is challenging due to their highly nonconvex loss functions. The term forgetting is also used quite often in the continual learning literature [31], where a model rapidly loses accuracy on the previous task when fine-tuned for a new task. This, however, does not address the information remaining in the network weights. Throughout this article, we use the terms unlearning and forgetting interchangeably, both denoting that the information of the data in the network weights is also removed. Golatkar et al. [9] proposed an information theoretic method to scrub the information from intermediate layers of deep networks trained with stochastic gradient descent (SGD). They also give an upper bound on the amount of remaining information in the network [32] after forgetting by exploiting the stability of SGD. This work is extended in [10] by including an update mechanism for the final activations of the model. They present a neural tangent kernel (NTK) based approximation of the training process and use it to estimate the updated network weights after forgetting. However, both the approximation accuracy and the computational costs degrade for larger datasets. The computational cost even on a small dataset is quite high, as the cost is quadratic in the number of samples. Golatkar et al. [3] directly train a linearized network and use it for forgetting. They train two separate networks: the core model and a mixed-linear model. The mixed-linear model requires Jacobian-vector product (JVP) computations and some additional fine-tuning. This framework was shown to be scalable for several standard vision datasets. However, they present such a network only for ResNet50, which requires a lot of fine-tuning to obtain the results. Designing a mixed-linear network for every deep architecture is also an inefficient approach. Some researchers have studied the unintended privacy risks resulting from the existing unlearning methods [33], [34]. Thudi et al. [35] show the difficulty of formally proving the absence of certain data points in the model. They suggest that the current unlearning methods are well-defined only at the algorithmic level. Forgetting in federated learning [36] and recommendation systems [37] has also been explored. Several other notable works include [38], [39], [40]. Our method does not put any constraints on the type of optimization to be used while training. We do not train any additional network; in fact, we do not require any prior information related to the training process. In addition, we propose the first unlearning method that works for both CNNs and ViTs. We show results on different deep learning models and on small and large datasets, and demonstrate successful unlearning in face recognition.

C. Data Privacy

Privacy in ML has been extensively studied and various privacy-preserving mechanisms have been presented [41], [42]. The most common assumption in such privacy-protecting frameworks is that the model can freely access the entire training data, and algorithms are devised to protect the model from leaking information about the training data. Another privacy setting [25], [43] considers a scenario where the goal is to make the personal data completely unusable for unauthorized deep learning models. The solutions in such a setting are based on the principles of adversarial attack and defense methods [44], [45]. Some privacy settings [3], [14] allow the user to make a request to forget their data from the already trained model. These privacy settings assume having access to all the training data before forgetting. We propose to work in a stricter setting where, once the user has made a request for forgetting her data (for example, her face in the face recognition model), the model cannot use those samples even for the purpose of network weight manipulation.

III. UNLEARNING IN ZERO-GLANCE PRIVACY SETTING

A. Zero-Glance Privacy Assumptions

In several use cases, the ML model is trained with facial images and personal medical data. Due to the sensitive nature of the data and the time constraints usually set by the data protection regulations (general data protection regulation (GDPR), CCPA), it may not be possible to use the forget set data even for unlearning purposes. We assume that the user can request immediate deletion of her data and a time-bound removal of the information (in the network weights) from the already trained model. The immediate removal of the requested data leaves us with only the remaining data to perform unlearning. Once the network weights are updated, the model should not have any information corresponding to the forgetting data. Even after being exposed to the forgetting samples, the relearn time (RT) should be substantially high to ensure that the model has actually forgotten those samples.

B. Preliminaries and Objective

We formulate the unlearning problem in the context of deep networks. Let the complete training dataset, consisting of n samples and K total classes, be D_c = {(x_i, y_i)}_{i=1}^{n}, where x_i ∈ X ⊂ R^d are the inputs and y_i ∈ Y = {1, . . . , K} are the corresponding class labels. If the forget and retain classes are denoted by Y_f and Y_r, then D_f ∪ D_r = D_c and D_f ∩ D_r = ∅. Let the deep learning model be represented by the function f_θ(x) : X → Y, parameterized by θ ∈ R^d, used to model the relation X → Y. The weights θ of the original trained deep network f¹ are a function of the complete training data D_c.

¹We use the notation f to denote the model in the rest of this article.
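To make this notation concrete, the split into the forget set D_f, the retain set D_r, and the small retain subset D_r_sub used later for weight manipulation can be expressed in a few lines of PyTorch. This is a minimal sketch under assumptions: CIFAR-10, the forget class Y_f = {0}, and the 1000-sample subset size are illustrative placeholders, not the exact setup of this article.

    import torch
    from torch.utils.data import Subset
    from torchvision import datasets, transforms

    # Complete training data D_c; CIFAR-10 is a placeholder choice.
    d_c = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())

    targets = torch.tensor(d_c.targets)
    forget_classes = torch.tensor([0])               # Y_f, illustrative
    forget_mask = torch.isin(targets, forget_classes)

    d_f = Subset(d_c, torch.nonzero(forget_mask).flatten().tolist())   # D_f
    d_r = Subset(d_c, torch.nonzero(~forget_mask).flatten().tolist())  # D_r

    # Zero-glance constraint: only a small subset D_r_sub of the retain
    # data may be used to manipulate the weights; D_f is never used again.
    g = torch.Generator().manual_seed(0)
    sub_idx = torch.randperm(len(d_r), generator=g)[:1000]
    d_r_sub = Subset(d_r, sub_idx.tolist())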
Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:15:20 UTC from IEEE Xplore. Restrictions apply.
TARUN et al.: FAST YET EFFECTIVE MACHINE UNLEARNING 13049
Fig. 1. Proposed unlearning framework. We use the pretrained model to learn the error-maximizing noise matrix for the unlearning class. The generated noise N is then used along with a subset of the retain data D_r_sub to update the model for one epoch (impair). Next, we apply a healing step by further updating the network with only the retain data D_r_sub (repair). The repair step helps in regaining the overall model performance while unlearning the requested class/classes of data.
Forgetting in the zero-glance privacy setting is an algorithm that, using the trained model f and a subset of retain images D_r_sub ⊂ D_r, gives a new set of weights θ_D_r_sub that does not remember the information regarding D_f and behaves similarly to a model that has never seen D_f in the parameter and output space.

To achieve unlearning, we first learn a noise matrix N for each class in Y_f by using the trained model. Then we transform the model in such a way that it fails to classify the samples from the forget set D_f while maintaining the accuracy for classifying the samples from the retain set D_r. This is ensured by using a small subset of samples D_r_sub drawn from the retain dataset D_r.

IV. ERROR-MAXIMIZING NOISE BASED UNLEARNING

Our approach aims to learn a noise matrix for the unlearning class by maximizing the model loss. Such generated noise samples will damage/overwrite the previously learned network weights for the relevant class(es) during the model update and induce unlearning. Error-maximizing noise will have high influence to enable parameter updates corresponding to the unlearning class.

A. Error-Maximizing Noise

We learn an error-maximizing noise N of the same size as that of the model input. The goal is to create a correlation between N and the unlearning class label, f : N → Y_f, N ≠ X. We freeze the weights of the pretrained model during this error-maximizing process (see Fig. 1). Given a noise matrix N, initialized randomly with a normal distribution N(0, 1), we propose to optimize the error-maximizing noise by solving the following optimization problem:

    arg min_N  E_θ [ −L(f, y) + λ‖w_noise‖ ]        (1)

where L(f, y) is the classification loss corresponding to the class to unlearn and f denotes the trained model. The w_noise are the parameters of the noise N (which can be interpreted as pixel values in terms of an image), and λ is used to manage the trade-off between the two terms. The optimization problem finds the L_p-norm bounded noise that maximizes the model's classification loss. In our method, we use a cross-entropy loss function L with L_2 normalization. We maximize the error corresponding to the forget class(es) so that this noise is opposite to what D_f represents. Using this noise in the impair stage of the UNSIR algorithm erases information related to D_f. Overall, it enables efficient unlearning in deep networks. The second term λ‖w_noise‖ in (1) is proposed to regularize the overall loss by preventing the values in N from becoming too large. Without this regularization of N, the model would start believing that images with higher values belong to the unlearn class. For multiple classes of data, we learn the noise matrix N for each class separately. Since the optimization is performed using the model loss with respect to the noise matrix, this can be done in an insignificant amount of time. The UNSIR algorithm is executed only once for both single-class and multiclass unlearning.
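A minimal PyTorch sketch of the optimization in (1) is given below. The frozen model, the negated cross-entropy, and the norm penalty follow the text; the batch shape, step count, learning rate, and the choice of Adam are assumptions made for illustration (the article fixes only λ = 0.1).

    import torch
    import torch.nn.functional as F

    def learn_noise(model, forget_class, shape=(8, 3, 32, 32),
                    steps=40, lr=0.1, lam=0.1, device="cpu"):
        """Learn an error-maximizing noise batch N for one forget class."""
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)      # the pretrained model stays frozen

        noise = torch.randn(shape, device=device, requires_grad=True)  # N(0, 1)
        labels = torch.full((shape[0],), forget_class,
                            dtype=torch.long, device=device)
        opt = torch.optim.Adam([noise], lr=lr)

        for _ in range(steps):
            opt.zero_grad()
            # Minimizing -L maximizes the classification error on the forget
            # class; the norm penalty keeps the noise values from blowing up.
            loss = -F.cross_entropy(model(noise), labels) + lam * noise.norm()
            loss.backward()
            opt.step()
        return noise.detach()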
B. UNSIR: Unlearning With Single Pass Impair and Repair

We combine the noise matrix with the samples in D_r_sub, i.e., D_r_sub ∪ N, and train the model for one epoch (impair) to induce unlearning. After that, we again train (repair) the model for one epoch, now on D_r_sub only. The final model shows excellent performance in unlearning the targeted classes of data and retaining the accuracy on the remaining classes.

Impair: We train the model on a small subset of data from the original distribution which also contains the generated noise. This step is called impair as it corrupts those weights in the network which are responsible for recognition of the data in the forget class(es). We use a high learning rate and observe that almost always only a single epoch of impair is sufficient.

Repair: The impair step may sometimes disturb the weights that are responsible for predicting the retain classes. Thus, we repair those weights by training the model for a single epoch (on rare occasions, more epochs may be required) on the retain data D_r_sub. The final updated model has a high RT, i.e., it takes a substantial number of epochs for the network to relearn the forget samples. This is one of the important criteria for effective unlearning, and the proposed method shows good robustness in this regard. The overall framework of our unlearning algorithm is shown in Fig. 1.
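The impair and repair passes can then be sketched as two one-epoch training loops. This is a sketch under assumptions: the learning rates and batch size stand in for the "very high learning rate" described above rather than values prescribed by the article, and it reuses the learn_noise() output and D_r_sub split from the earlier sketches.

    import torch
    import torch.nn.functional as F
    from torch.utils.data import ConcatDataset, DataLoader

    def unsir(model, noise, forget_class, d_r_sub, device="cpu",
              impair_lr=0.02, repair_lr=0.01, batch_size=64):
        """Single-pass impair on D_r_sub ∪ N, then single-pass repair on D_r_sub."""
        # Label each noise sample with the class to be forgotten so that the
        # high loss it induces damages exactly the corresponding weights.
        noise_ds = [(n.cpu(), forget_class) for n in noise]

        for p in model.parameters():
            p.requires_grad_(True)       # unfreeze after the noise learning step

        phases = [(ConcatDataset([d_r_sub, noise_ds]), impair_lr),  # impair
                  (d_r_sub, repair_lr)]                             # repair
        for data, lr in phases:
            loader = DataLoader(data, batch_size=batch_size, shuffle=True)
            opt = torch.optim.SGD(model.parameters(), lr=lr)
            model.train()
            for x, y in loader:          # one epoch per phase
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
        return model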
V. EXPERIMENTS AND RESULTS

We show the performance of our proposed method for unlearning single and multiple classes of data across a variety of settings. We use different types of deep networks, namely ResNet18 [46], AllCNN [47], MobileNetv2 [48], and ViTs [49], for evaluation, and empirically demonstrate the applicability of our method across these different networks. The experiments are conducted for networks trained from scratch as well as pretrained models fine-tuned on specific datasets. The unlearning method is analyzed over CIFAR-10 [50], CIFAR-100 [50], and VGGFace-100 (100 face IDs collected from VGGFaces2 [51]). Results on this variety of models and datasets demonstrate the wide applicability of our method. The experimental results are reported with a single step (one epoch) of impair and a single step of repair. Additional fine-tuning could be done; however, we focus on such a setting (single-shot) to demonstrate the efficacy of our method under a uniform setup. All the models learned from scratch have been trained for 40 epochs, and the pretrained models have been fine-tuned for five epochs. We observe that λ = 0.1 in (1) works quite well across various tasks, and we thus keep it fixed at 0.1 for all the experiments.

…task [38], and we are not aware of any method claiming to do so.

B. Models

In CIFAR-10, we trained ResNet18 and AllCNN from scratch and used the proposed method to unlearn a single class and multiple classes (two classes, four classes, and seven classes) from the model. Without loss of generality, we use class 0 for single-class unlearning and a random manual selection of class subsets for multiclass unlearning. For example, in two-class unlearning we unlearn classes 1 and 2, in four-class unlearning we unlearn classes 3–6, and in seven-class unlearning we unlearn classes 3–9. In CIFAR-100, we use pretrained ResNet18 and MobileNetv2. The unlearning is performed for one class (class 0), and for 20 and 40 randomly sampled classes. In the latter part, we also demonstrate unlearning on VGGFace-100 using pretrained ResNet18 and ViT. The unlearning is performed for 1-faceID, 20-faceID, 40-faceID, and 60-faceID, respectively.
TABLE I
Unlearning on CIFAR-10. Original Model: the model trained on the complete dataset D_c. Retrain Model: the model trained on the retain set D_r. FineTune: the fine-tuned model on D_r. NegGrad: the network fine-tuned on D_f with negative gradients (gradient ascent). Our Method: the proposed unlearning method. RT: the number of epochs taken by the model to regain full accuracy on the forget set when trained on 500 random samples from D_c. A higher value of RT denotes robust erasure of information in the network weights. The accuracy A_Df on the forget set should be close to zero, and A_Dr should be close to the original model's A_Dr. #Y_f denotes the number of unlearning classes.
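The RT metric defined in this caption can be measured with a loop of the following shape: retrain the unlearned model on a few hundred random samples from D_c and count the epochs until full forget-set accuracy returns. The optimizer, learning rate, and epoch cap below are illustrative assumptions; only the 500-sample budget follows the caption.

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader

    def relearn_time(model, d_c_small, d_f, max_epochs=200, lr=0.01,
                     device="cpu"):
        """Epochs until the model regains full accuracy on the forget set."""
        train_loader = DataLoader(d_c_small, batch_size=64, shuffle=True)
        forget_loader = DataLoader(d_f, batch_size=256)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for epoch in range(1, max_epochs + 1):
            model.train()
            for x, y in train_loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
            model.eval()                 # check forget-set accuracy per epoch
            correct = total = 0
            with torch.no_grad():
                for x, y in forget_loader:
                    pred = model(x.to(device)).argmax(dim=1).cpu()
                    correct += (pred == y).sum().item()
                    total += y.numel()
            if correct == total:         # full accuracy on D_f regained
                return epoch
        return max_epochs                # reported as ">max_epochs" in practice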
TABLE II
Comparison of our method with the single-class Fisher forgetting [9] method on CIFAR-10. Fisher achieves forgetting but fails to maintain the accuracy on the retained dataset.
E. Results

Our results are compared with three baseline unlearning methods: Retrain Model, FineTune [9], and NegGrad [9]. We compare the single-class unlearning results with the existing Fisher forgetting method in Table II. Due to the poor results of FineTune, NegGrad, and Fisher forgetting [9] on CIFAR-10, we compare our results only with the Retrain Model in the subsequent experiments.

1) Single-Class Unlearning: Tables I and III show that our model is able to erase the information with respect to a particular class and unlearn in a single shot of impair and repair. Table I shows the retain and forget set accuracy after unlearning, along with the average standard deviation over three runs. We obtain superior accuracy on the retain set (D_r) and the forget set (D_f) over the existing methods; for example, in the case of ResNet18 and CIFAR-10, we preserve 71.06% accuracy on D_r (from an initial 77.86%) while degrading the performance on D_f significantly (0% from an initial 81.01%). The RT is much higher for our method in comparison to the baseline methods (for example, >100 versus 12 and 18 in the case of AllCNN, CIFAR-10). This shows the capability of our method to enforce robust unlearning. From Table II, we observe that our method is far superior to Fisher forgetting as well. Fisher forgetting is able to preserve only 10.85% accuracy on D_r with ResNet on CIFAR-10.
Fig. 2. Prediction distribution of the unlearned model on the forget class of data. Our method gives a randomized response to input queries of the forget class of data. (a) 1-C unlearning (AllCNN). (b) 2-C unlearning (AllCNN). (c) 1-C unlearning (ResNet18). (d) 2-C unlearning (ResNet18).
TABLE IV
Unlearning on VGGFace-100. The models are pretrained on ImageNet and fine-tuned for VGGFace-100.
Fig. 4. Layer-wise weight distance between the unlearned models (retrain model, our model) and the original model. The values are presented on a log scale. Our method obtains comparable or higher weight distances in comparison to the retrain model. (a) 1-C unlearning (AllCNN). (b) 2-C unlearning (AllCNN). (c) 1-C unlearning (ResNet18). (d) 2-C unlearning (ResNet18).
Fig. 5. GradCAM visualization of ResNet18 on CIFAR-100. The first column depicts the visualization in one-class unlearning and the remaining columns depict the visualization in four-class unlearning. (a) Input. (b) Original model. (c) Unlearned model.

…observed that the output distribution in both models is very similar. This further shows the robustness of our method.

VI. ANALYSIS

A. Layer-Wise Distance Between the Network Weights

The layer-wise distance between the original and unlearned models helps in understanding the effect of unlearning at each layer. The weight difference should be comparable to that of the retrain model, as a lower distance indicates ineffective unlearning and a much higher distance may point to the Streisand effect and possible information leaks. We compare the weight distance of 1) the retrained model and 2) the proposed method, for AllCNN and ResNet18 in Fig. 4. We notice that the weight differences of the proposed method with respect to the original model show a similar trend to those of the retrain model.
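A minimal sketch of this measurement is given below, assuming the two models share an architecture; the log-scale plotting of Fig. 4 is omitted.

    import torch

    def layerwise_distance(original, unlearned):
        """L2 distance between corresponding parameter tensors of two models."""
        other = dict(unlearned.named_parameters())
        return {name: (w.detach() - other[name].detach()).norm().item()
                for name, w in original.named_parameters()}

    # Usage: compare both the retrain model and the unlearned model against
    # the original; similar per-layer trends suggest comparable unlearning.
    # dist_retrain = layerwise_distance(original_model, retrain_model)
    # dist_ours    = layerwise_distance(original_model, unsir_model)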
B. Visualizing the Unlearning in Models

We use GradCAM [53] to visualize the areas of focus of the model (ResNet18) for images in the unlearn class. Fig. 5 depicts where the model focuses before and after applying our method for unlearning one class and 20 classes, respectively. As expected, after applying our method, the model is unable to focus on the relevant areas, indicating that the network weights no longer contain information related to those unlearn classes.
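For reference, a bare-bones Grad-CAM can be written with PyTorch hooks as below. The choice of the last convolutional block of torchvision's ResNet18 as the target layer is an assumption for illustration, not the exact visualization pipeline behind Fig. 5.

    import torch
    import torch.nn.functional as F

    def grad_cam(model, x, target_class, target_layer):
        """Class-activation heatmap for one input x of shape (1, C, H, W)."""
        feats, grads = {}, {}
        h1 = target_layer.register_forward_hook(
            lambda m, inp, out: feats.update(v=out))
        h2 = target_layer.register_full_backward_hook(
            lambda m, gin, gout: grads.update(v=gout[0]))
        model.eval()
        model.zero_grad()
        model(x)[0, target_class].backward()
        h1.remove(); h2.remove()
        w = grads["v"].mean(dim=(2, 3), keepdim=True)   # GAP of the gradients
        cam = F.relu((w * feats["v"]).sum(dim=1))       # weighted feature sum
        return (cam / (cam.max() + 1e-8)).detach()      # normalized heatmap

    # e.g., target_layer = model.layer4[-1] for torchvision's ResNet18 (assumed).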
C. Efficiency

Our method is fast and highly efficient in comparison to retraining and the existing unlearning approaches [9], [10]. The Fisher forgetting [9] and NTK-based forgetting [10] approaches require Hessian approximation, which is computationally very expensive. These methods give some bounds on the amount of information remaining, but they are quite inefficient for practical use. They take even more time than retraining itself. Whereas retraining took around 10 min (617 s), running Fisher forgetting [9] for one-class unlearning on ResNet18 + CIFAR-10 took more than 2 h. Fisher forgetting for one-class unlearning on AllCNN + CIFAR-10 takes around 1 h. For CIFAR-100, the estimated time surpassed 25 h. The NTK-based forgetting [10] uses Fisher noise along with NTK-based model approximations and is thus even more time-consuming. Our method only requires 1.1 s for 40 steps of noise optimization on ResNet18 + CIFAR-10, 1.70 s for one epoch of impair, and 1.13 s for an epoch of repair. The total computational time for unlearning is less than 4 s. This is 154× faster than the retraining approach and 1875× faster than the Fisher approach. We achieve fast unlearning without compromising the effectiveness of the method. Moreover, our method is scalable to large problems and big models. The cost of noise matrix estimation depends on the cost of a forward pass in the model. Usually, in multiclass unlearning, the cost of noise matrix estimation is linearly dependent on the number of forget classes. In the case of UNSIR, the algorithm is executed only once for both single-class and multiclass unlearning. Thus, our method offers the most efficient multiple-class unlearning among these methods. Fig. 6 shows the time complexity comparison for retraining, Fisher forgetting, and UNSIR. Our method requires 1250× less time than retraining and 125× less time than Fisher forgetting.

Fig. 6. Training time comparison between retraining, Fisher forgetting, and UNSIR (our method).

TABLE V
Observing the effect of different combinations of impair-repair steps. The experiments are done on ResNet18 + VGGFace-100.

D. Comparing Different Impair-Repair Configurations

We conduct experiments to provide a comparison between different impair-repair configurations on ResNet18 + VGGFace-100 in Table V. A single impair-repair cycle does not yield the expected 0% accuracy on the forget set. Since most of the damage is done in the impair step, we observe the effect of executing two impair steps before the repair step. After impair, the performance on the forget set reaches the desired 0%, but the model regains 3% accuracy after the repair step. We then execute two cycles of impair-repair, i.e., one impair, one repair, one impair, and one repair step. This yields the expected 0% on the forget set with minimal loss of performance on the retain set (72.79% and 72.5% versus 70.86%). Furthermore, additional ablation analysis is presented in the Supplementary Material.
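In terms of the unsir() sketch from Section IV-B, the two-cycle configuration of Table V is simply the same impair-repair routine applied twice; the lines below are a usage illustration under that sketch's assumptions.

    # Two impair-repair cycles: impair, repair, impair, repair.
    model = unsir(model, noise, forget_class=0, d_r_sub=d_r_sub)
    model = unsir(model, noise, forget_class=0, d_r_sub=d_r_sub)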
E. Limitations

Our method achieves unlearning in already trained deep learning models. The existing approaches [3], [9], [10], [12] either require training the models in a specific manner or make impractical assumptions, such as linear models or treating deep model training as a convex optimization problem, and are thus incompatible with our target setting of unlearning from an already trained model. The unlearning approach in [4] provides an exact unlearning guarantee but consumes a lot of memory and requires implementation during the training process. Our method can be used to perform unlearning as an afterthought, i.e., to delete data from previously deployed deep learning models. Similarly, unlike the linear/convex case, where strong bounds on the amount of remaining information can be formulated, forgetting in DNNs often does not come with any provable bound. This is still an open problem. The kinds of information bounds given in the above works are not compatible with our framework. To cope with this limitation, we conduct extensive experimental analysis to check the unlearning performance through a variety of widely accepted metrics. We use the performance on the retain and forget sets, the layer-wise weight difference, the prediction distribution comparison for the forget set, and the RT to evaluate the unlearning, and showcase that our method is effective with no empirical signs of information leakage. However, a more formal guarantee of unlearning might be desired in highly privacy-sensitive applications.

Unlearning a random cohort of data is beyond the scope of this work. Although, in theory, an error-maximizing noise matrix can be generated corresponding to the random samples, this would hurt the zero-glance assumption; thus, unlearning random samples, or only a subset of a class, is out of the scope of this article. Furthermore, the analysis of adaptive adversaries that have exact knowledge about the proposed algorithm is out of scope as well. We also point out the trade-off between speed and accuracy in our unlearning method. For example, the efficiency gain in the proposed method is 154× and 1875× over the retrain and Fisher methods, respectively. However, this efficiency gain comes at the cost of decreased accuracy in comparison to the retrain method.

VII. CONCLUSION

In this article, we presented a stringent zero-glance setting for unlearning and explored an efficient solution for it. We also developed a scalable, multiple-class unlearning method. The unlearning method consists of learning an error-maximizing noise matrix followed by a single-pass impair and repair to update the network weights. Different from existing works, our method is highly efficient in unlearning multiple classes of data, and we empirically demonstrate its effectiveness on a variety of deep networks such as CNNs and ViTs. The method is applicable to deep networks trained with any kind of optimization. Excellent unlearning results on a large-scale face recognition dataset are also shown, which is the first such attempt. Our work opens up a new direction for efficient multiclass unlearning on large-scale problems. A possible future direction could be to perform unlearning without using any kind of training samples.

ACKNOWLEDGMENT

This research/project is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.

REFERENCES

[1] P. Voigt and A. Von Dem Bussche, "The EU general data protection regulation (GDPR)," in A Practical Guide, 1st ed. Cham, Switzerland: Springer, 2017.
[2] E. Goldman, "An introduction to the California consumer privacy act (CCPA)," in Proc. Santa Clara Univ. Legal Stud. Res. Paper, 2020, pp. 1–7.
[3] A. Golatkar, A. Achille, A. Ravichandran, M. Polito, and S. Soatto, "Mixed-privacy forgetting in deep networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 792–801.
[4] L. Bourtoule et al., "Machine unlearning," in Proc. IEEE Symp. Secur. Privacy (SP), May 2021, pp. 141–159.
[5] D. M. Sommer, L. Song, S. Wagh, and P. Mittal, "Athena: Probabilistic verification of machine unlearning," Proc. Privacy Enhancing Technol., vol. 2022, no. 3, pp. 268–290, Jul. 2022.
[6] S. Garg, S. Goldwasser, and P. N. Vasudevan, "Formalizing data deletion in the context of the right to be forgotten," in Proc. Adv. Cryptol.—EUROCRYPT, Zagreb, Croatia, May 2020, pp. 373–402.
[7] J. Brophy and D. Lowd, "Machine unlearning for random forests," in Proc. Int. Conf. Mach. Learn., 2021, pp. 1092–1104.
[8] Q. P. Nguyen, B. K. H. Low, and P. Jaillet, "Variational Bayesian unlearning," in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 16025–16036.
[9] A. Golatkar, A. Achille, and S. Soatto, "Eternal sunshine of the spotless net: Selective forgetting in deep networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9304–9312.
[10] A. Golatkar, A. Achille, and S. Soatto, "Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations," in Proc. Eur. Conf. Comput. Vis., 2020, pp. 383–398.
[11] A. Ginart, M. Y. Guan, G. Valiant, and J. Zou, "Making AI forget you: Data deletion in machine learning," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 3513–3526.
[12] C. Guo, T. Goldstein, A. Hannun, and L. Van Der Maaten, "Certified data removal from machine learning models," in Proc. Int. Conf. Mach. Learn., 2020, pp. 3832–3842.
[13] T. Baumhauer, P. Schöttle, and M. Zeppelzauer, "Machine unlearning: Linear filtration for logit-based classifiers," Mach. Learn., vol. 111, no. 9, pp. 3203–3226, Sep. 2022.
[14] Z. Izzo, M. A. Smart, K. Chaudhuri, and J. Zou, "Approximate data deletion from machine learning models," in Proc. Int. Conf. Artif. Intell. Statist., 2021, pp. 2008–2016.
[15] S. Neel, A. Roth, and S. Sharifi-Malvajerdi, "Descent-to-delete: Gradient-based methods for machine unlearning," in Proc. 32nd Int. Conf. Algorithmic Learn. Theory, 2021, pp. 931–962.
[16] Y. Wu, E. Dobriban, and S. Davidson, "DeltaGrad: Rapid retraining of machine learning models," in Proc. Int. Conf. Mach. Learn., 2020, pp. 10355–10366.
[17] V. S. Chundawat, A. K. Tarun, M. Mandal, and M. Kankanhalli, "Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher," in Proc. AAAI Conf. Artif. Intell., 2023, pp. 1–12.
[18] A. K. Tarun, V. S. Chundawat, M. Mandal, and M. Kankanhalli, "Deep regression unlearning," 2022, arXiv:2210.08196.
[19] V. S. Chundawat, A. K. Tarun, M. Mandal, and M. Kankanhalli, "Zero-shot machine unlearning," IEEE Trans. Inf. Forensics Security, early access, Apr. 7, 2023, doi: 10.1109/TIFS.2023.3265506.
[20] A. Mahadevan and M. Mathioudakis, "Certifiable machine unlearning for linear models," 2021, arXiv:2106.15093.
[21] A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, and Y. LeCun, "The loss surfaces of multilayer networks," in Proc. 18th Int. Conf. Artif. Intell. Statist., 2015, pp. 192–204.
[22] P. W. Koh and P. Liang, "Understanding black-box predictions via influence functions," in Proc. Int. Conf. Mach. Learn., 2017, pp. 1885–1894.
[23] P. W. Koh, K.-S. Ang, H. Teo, and P. S. Liang, "On the accuracy of influence functions for measuring group effects," in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 5254–5264.
[24] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," Stat, vol. 1050, p. 20, Dec. 2015.
[25] H. Huang, X. Ma, S. M. Erfani, J. Bailey, and Y. Wang, "Unlearnable examples: Making personal data unexploitable," in Proc. ICLR, 2021, pp. 1–17.
[26] Y. Cao and J. Yang, "Towards making systems forget with machine unlearning," in Proc. IEEE Symp. Secur. Privacy, May 2015, pp. 463–480.
[27] B. Mirzasoleiman, A. Karbasi, and A. Krause, "Deletion-robust submodular maximization: Data summarization with 'the right to be forgotten,'" in Proc. Int. Conf. Mach. Learn., 2017, pp. 2449–2458.
[28] M. Abadi et al., "Deep learning with differential privacy," in Proc. 2016 ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 308–318.
[29] C. Dwork and A. Roth, "The algorithmic foundations of differential privacy," Found. Trends Theor. Comput. Sci., vol. 9, nos. 3–4, pp. 211–407, Aug. 2014.
[30] V. Gupta, C. Jung, S. Neel, A. Roth, S. Sharifi-Malvajerdi, and C. Waites, "Adaptive machine unlearning," in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 16319–16330.
[31] A. Prabhu, P. H. Torr, and P. K. Dokania, "GDumb: A simple approach that questions our progress in continual learning," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 524–540.
[32] A. Achille, G. Paolini, and S. Soatto, "Where is the information in a deep neural network?" 2019, arXiv:1905.12213.
[33] M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y. Zhang, "When machine unlearning jeopardizes privacy," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Nov. 2021, pp. 896–911.
[34] N. G. Marchant, B. I. Rubinstein, and S. Alfeld, "Hard to forget: Poisoning attacks on certified machine unlearning," in Proc. AAAI Conf. Artif. Intell., vol. 36, no. 7, 2022, pp. 7691–7700.
[35] A. Thudi, H. Jia, I. Shumailov, and N. Papernot, "On the necessity of auditable algorithmic definitions for machine unlearning," in Proc. 31st USENIX Secur. Symp. (USENIX Secur.), 2022, pp. 4007–4022.
[36] C. Wu, S. Zhu, and P. Mitra, "Federated unlearning with knowledge distillation," 2022, arXiv:2201.09441.
[37] C. Chen, F. Sun, M. Zhang, and B. Ding, "Recommendation unlearning," in Proc. ACM Web Conf., Apr. 2022, pp. 2768–2777.
[38] L. Graves, V. Nagisetty, and V. Ganesh, "Amnesiac machine learning," in Proc. AAAI Conf. Artif. Intell., vol. 35, no. 13, 2021, pp. 11516–11524.
[39] A. Sekhari, J. Acharya, G. Kamath, and A. T. Suresh, "Remember what you want to forget: Algorithms for machine unlearning," in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021.
[40] T. Shibata, G. Irie, D. Ikami, and Y. Mitsuzumi, "Learning with selective forgetting," in Proc. 30th Int. Joint Conf. Artif. Intell., vol. 2, no. 4, 2021, p. 6.
[41] R. Shokri and V. Shmatikov, "Privacy-preserving deep learning," in Proc. 53rd Annu. Allerton Conf. Commun., Control, Comput. (Allerton), Sep. 2015, pp. 1310–1321.
[42] N. Phan, Y. Wang, X. Wu, and D. Dou, "Differential privacy preservation for deep auto-encoders: An application of human behavior prediction," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 1–8.
[43] S. Shan, E. Wenger, J. Zhang, H. Li, H. Zheng, and B. Y. Zhao, "Fawkes: Protecting privacy against unauthorized deep learning models," in Proc. 29th USENIX Secur. Symp. (USENIX Secur.), 2020, pp. 1589–1604.
[44] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1765–1773.
[45] Z. Shen, S. Fan, Y. Wong, T.-T. Ng, and M. Kankanhalli, "Human-imperceptible privacy protection against machines," in Proc. 27th ACM Int. Conf. Multimedia, Oct. 2019, pp. 1119–1128.
[46] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[47] J. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," in Proc. ICLR (Workshop Track), 2015, pp. 1–14.
[48] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4510–4520.
[49] A. Dosovitskiy et al., "An image is worth 16 × 16 words: Transformers for image recognition at scale," in Proc. Int. Conf. Learn. Represent., 2021, pp. 1–22.
[50] A. Krizhevsky et al., "Learning multiple layers of features from tiny images," CIFAR, Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
[51] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, "VGGFace2: A dataset for recognising faces across pose and age," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 67–74.
[52] S. Rezaei and X. Liu, "On the difficulty of membership inference attacks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 7892–7900.
[53] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 618–626.