
Breaking the Trilemma of Privacy, Utility, Efficiency via Controllable Machine Unlearning

Zheyuan Liu∗, University of Notre Dame, South Bend, Indiana, USA, [email protected]
Guangyao Dou∗, University of Pennsylvania, Philadelphia, Pennsylvania, USA, [email protected]
Yijun Tian, University of Notre Dame, South Bend, Indiana, USA, [email protected]
Chunhui Zhang, Dartmouth College, Hanover, New Hampshire, USA, [email protected]
Eli Chien†, Georgia Institute of Technology, Atlanta, Georgia, USA, [email protected]
Ziwei Zhu†, George Mason University, Fairfax, Virginia, USA, [email protected]

∗Equal Contributions   †Corresponding Authors

arXiv:2310.18574v1 [cs.CR] 28 Oct 2023

ABSTRACT
Machine Unlearning (MU) algorithms have become increasingly critical due to the imperative adherence to data privacy regulations. The primary objective of MU is to erase the influence of specific data samples on a given model without the need to retrain it from scratch. Accordingly, existing methods focus on maximizing user privacy protection. However, there are different degrees of privacy regulations for each real-world web-based application. Exploring the full spectrum of trade-offs between privacy, model utility, and runtime efficiency is critical for practical unlearning scenarios. Furthermore, designing an MU algorithm with simple control of the aforementioned trade-off is desirable but challenging due to the inherent complex interaction. To address these challenges, we present Controllable Machine Unlearning (ConMU), a novel framework designed to facilitate the calibration of MU. The ConMU framework contains three integral modules: an important data selection module that reconciles runtime efficiency and model generalization, a progressive Gaussian mechanism module that balances privacy and model generalization, and an unlearning proxy that controls the trade-off between privacy and runtime efficiency. Comprehensive experiments on various benchmark datasets have demonstrated the robust adaptability of our control mechanism and its superiority over established unlearning methods. ConMU explores the full spectrum of the Privacy-Utility-Efficiency trade-off and allows practitioners to account for different real-world regulations. Source code available at: https://github.com/guangyaodou/ConMU.

Figure 1: Privacy, utility, efficiency trilemma in machine unlearning. All previous works have focused on either one or two extremities of the problem while ignoring the full spectrum of trade-offs between the trinity (as shown in blue dots on each subplot). Each of the proposed modules in our ConMU offers smooth control of a pair of two unlearning aspects specifically. Together, ConMU is capable of achieving a satisfactory outcome for versatile practical scenarios, including various degrees of privacy regulations, efficiency constraints, and utility objectives.
KEYWORDS
Machine Unlearning, Data Privacy, Trustworthy ML, Deep Learning

1 INTRODUCTION
Machine Learning (ML) models are often trained on real-world datasets in various domains, including computer vision, natural language processing, and recommender systems [6, 10, 32, 58]. For example, many computer vision models are trained on images provided by Flickr users [55], while many natural language processing and recommender system algorithms rely heavily on IMDB [40]. Meanwhile, privacy regulations such as the General Data Protection Regulation of the European Union [33] and the California Consumer Privacy Act [46] have established the right to be forgotten [5, 17, 44], which mandates the elimination of specific user data from models upon removal requests.

One naive approach to "forget" user data is to remove it from the training set. However, this alone cannot provide sufficient privacy, since ML models tend to memorize training samples [23]. Organizations must therefore either expensively retrain the model from scratch without the specified samples or employ machine unlearning [9] techniques to protect user data privacy. Machine unlearning methods are designed to meticulously eliminate samples and their associated influence from both the dataset and the trained model. This safeguards data privacy, protecting it from potential malicious attacks and privacy breaches.

Beyond privacy, utility and efficiency are also important aspects of machine unlearning problems. For instance, sacrificing utility by naively returning constant or purely random output ensures privacy but results in a useless model. On the other hand, retraining from scratch without the data subject to removal guarantees privacy and utility yet is prohibitively expensive. Designing a method that simultaneously maximizes the privacy, utility, and efficiency aspects of machine unlearning is therefore critical. Unfortunately, theoretical machine unlearning research provides evidence that there is an inevitable privacy-utility-efficiency trade-off even for convex problems [13, 30, 45, 48], and similar phenomena exist in other privacy problems [11, 12]. This leads to a trilemma among the three aspects of machine unlearning: accuracy, privacy, and runtime efficiency. While the aforementioned theoretical unlearning solutions provide smooth control among the trinity, they are restricted to simple models and cannot be generalized to deep-learning approaches.

While a number of efforts have been put into machine unlearning, existing unlearning solutions for deep neural networks mainly focus on maximizing part of the trinity while neglecting their delicate trade-off. In real-world scenarios, different applications require different levels of privacy regulations, runtime constraints, and utility demands. For example, protecting user identities in healthcare applications [22] might be stricter than safeguarding friendship data on social networks, whereas autonomous driving [10] and fraudulent attack detection in financial systems [59] would prioritize accuracy and runtime efficiency over privacy.

Therefore, a practical machine unlearning solution should be able to easily account for the different levels of privacy, utility, and efficiency requirements that arise from various tasks in practice. Unfortunately, the literature lacks a comprehensive and controllable MU approach for deep learning that addresses the intricate dynamics involved in balancing privacy, model accuracy, and runtime efficiency. A natural yet pivotal research question arises: "How to resolve the unlearning trilemma for deep neural networks?"

To answer this question, we present Controllable Machine Unlearning (ConMU), a novel framework that consists of three components: an important data selection module, a progressive Gaussian mechanism, and an unlearning proxy. Each component emphasizes one part of the aforementioned trilemma; see Figure 1 for an illustration. In particular, the important data selection module modulates the relationship between runtime and model accuracy. The progressive Gaussian mechanism controls the trade-off between accuracy and privacy. The unlearning proxy facilitates a re-calibration between runtime and privacy. We further underscore that ConMU is adaptable and can be generalized across diverse model architectures. Among all conducted experiments, ConMU achieves the best privacy performance in 10 out of 12 experiments, with competitive model utility and 10-15x faster runtime. Additionally, compared to the naive control baseline, ConMU demonstrates greater control over the trilemma by exhibiting superior and stable performance under the influence of multiple trade-offs. Our main contributions are as follows:

• To the best of our knowledge, this is the first work tackling the critical trilemma within the realm of machine unlearning for deep neural networks, with a specific focus on the delicate balance between privacy, model utility, and runtime efficiency.
• We propose ConMU, which contains three modules, each designed to reconcile these competing factors: important data selection, progressive Gaussian mechanism, and unlearning proxy.
• Extensive experiments demonstrate the effectiveness of our proposed framework under both class-wise and random forgetting requests.

2 RELATED WORK
2.1 Machine Unlearning with Theoretical Guarantees
The concept of machine unlearning was first raised in [9]. In general, two unlearning criteria have been considered in previous works: Exact Unlearning and Approximate Unlearning. Exact unlearning requires eliminating all information relevant to the removed data so that the unlearned model performs exactly the same as a completely retrained model. For example, the authors of [25] presented unlearning approaches for k-means clustering. [5] proposed the SISA framework that partitions data into shards and slices, with a weak learner per shard, which enables quick retraining when dealing with unlearning requests. However, exact unlearning does not allow algorithms to trade privacy for utility and efficiency due to its high requirements on the privacy level. In contrast, approximate unlearning only requires the parameters of the unlearned model to be similar to those of a model retrained from scratch. Therefore, approximate unlearning can sacrifice a portion of the privacy in exchange for better utility and efficiency. [30, 48] studied approximate unlearning for linear models and convex losses. [42] extended the idea and provided a theoretical guarantee for weakly convex losses. [13, 45] generalized the method of [30] to the graph learning domain. Nevertheless, none of these approximate unlearning solutions apply to general deep neural networks.

2.2 Unlearning in Deep Learning Models
Machine unlearning for deep neural networks is challenging because of the non-convex nature of the loss function [3]. [36] approximated the model perturbation towards the empirical risk minimization on the remaining datasets, using the inverse of the Hessian. [27] used Fisher-based unlearning and introduced an upper bound for SGD-based algorithms to scrub information from intermediate layers of DNNs. [28] extended the framework and introduced forgetting methods using NTK theory [34]. [14] proposed a knowledge adaptation technique in which the unlearned model learns from a competent teacher model about the retained dataset and an incompetent model about the forgetting dataset. [15] proposed an unlearning method that uses no training samples: it uses the error-maximizing noise proposed by [54] to generate an impaired forgetting dataset, and error-minimizing noise to generate an approximated retained dataset. However, this method yields poor results. Thus, [15] proposed another unlearning algorithm that uses gated knowledge transfer in a teacher-student framework.

[56] also proposed a Knowledge Gap Alignment method that minimizes the output distribution difference between models that are trained on different data samples. [53] focuses on deep regression unlearning tasks, using a partially trained blindspot model to minimize the distribution difference with the original model. Lastly, [35] showed that applying unlearning algorithms on pruned models gives better performance.

Though important data selection has been widely used in deep learning [47], its implementation in the MU field is still unexplored. We discovered that important data selection is able to offer strong control over the utility-efficiency trade-off. Similarly, Gaussian noise has been widely adopted in the field of differential privacy [1, 2, 7, 8, 18–20, 24], while its usage in machine unlearning is not yet fully investigated. In addition, the concept of utilizing partially competent teachers for a privacy-efficiency trade-off has not been previously examined. Moreover, many of the existing works focus solely on privacy, overlooking the relationships between accuracy, privacy, and runtime efficiency. Unlike other machine-unlearning algorithms, our method gives users exceptional flexibility and control over the trade-offs among these three factors. In addition, our method imposes no restrictions on optimization methodologies or model architecture.

3 PRELIMINARIES
Removing certain training data samples can impact a model's accuracy, potentially improving, maintaining, or diminishing it [52]. As noted by [14], significant discrepancies between unlearned and retrained models can lead to the Streisand effect, inadvertently revealing information about forgotten samples through unusual model behavior. Therefore, the goal of Machine Unlearning is to erase the influence of the set of samples we want to forget so that the unlearned model approximates the retrained one.

Let 𝐷𝑜 = {𝑥𝑖}_{𝑖=1}^{𝑁} be the complete dataset before unlearning requests, in which 𝑥𝑖 is the 𝑖-th sample. Let 𝐷𝑓 be the set of samples we want to forget, called the forgetting dataset, and let its complement 𝐷𝑟 be the set of samples retained for training, i.e., 𝐷𝑓 ∪ 𝐷𝑟 = 𝐷𝑜 and 𝐷𝑓 ∩ 𝐷𝑟 = ∅. In the setting of random forgetting, 𝐷𝑓 may contain samples from different classes of 𝐷𝑜. In class-wise forgetting, 𝐷𝑓 is a set of examples that share the same class label. We denote 𝜃𝑜 as the parameters of the original model trained on 𝐷𝑜, 𝜃𝑢 as the parameters of the unlearned model, and 𝜃𝑟 as the parameters of the retrained model, i.e., the model completely retrained from scratch using only 𝐷𝑟. Lastly, let 𝜃𝐼 denote the parameters of the unlearning proxy, which has the same model architecture as 𝜃𝑜 but is only partially trained on 𝐷𝑟 for a few epochs.

𝜃𝑟 is the gold standard in our MU problem. The goal of machine unlearning is to approximate 𝜃𝑢 to 𝜃𝑟 with less computational overhead. However, for machine unlearning on deep neural networks, achieving a balance between utility, privacy, and efficiency has always been a difficult task.
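To make the two forgetting settings concrete, the following sketch builds 𝐷𝑓 and 𝐷𝑟 from a labeled dataset. It is a minimal illustration under our own assumptions (PyTorch Subset indexing and (x, y) samples); it is not the authors' released code.

import torch
from torch.utils.data import Subset

def split_forget_retain(dataset, forget_ratio=0.2, forget_class=None, class_fraction=0.5, seed=0):
    # Returns (D_f, D_r) as Subsets of the original dataset D_o.
    # Random forgetting: a random `forget_ratio` of all samples is forgotten.
    # Class-wise forgetting: a random `class_fraction` of one class is forgotten.
    g = torch.Generator().manual_seed(seed)
    n = len(dataset)
    if forget_class is None:                           # random forgetting
        perm = torch.randperm(n, generator=g)
        forget_idx = perm[: int(forget_ratio * n)]
    else:                                              # class-wise forgetting
        labels = torch.tensor([dataset[i][1] for i in range(n)])   # assumes (x, y) samples
        class_idx = (labels == forget_class).nonzero(as_tuple=True)[0]
        perm = class_idx[torch.randperm(len(class_idx), generator=g)]
        forget_idx = perm[: int(class_fraction * len(class_idx))]
    mask = torch.ones(n, dtype=torch.bool)
    mask[forget_idx] = False
    retain_idx = mask.nonzero(as_tuple=True)[0]
    return Subset(dataset, forget_idx.tolist()), Subset(dataset, retain_idx.tolist())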
4 METHODS
To address this trilemma in machine unlearning, we introduce ConMU (Figure 2), a novel framework that consists of an important data selection, a progressive Gaussian mechanism, and an unlearning proxy, which modulate the relationships among accuracy, privacy, and runtime efficiency. First, the important data selection module (Figure 2 (a)) selectively discards unimportant retaining and forgetting data samples that will not be utilized by subsequent modules. Discarding more samples improves training time while degrading the model's accuracy. Next, the Progressive Gaussian Mechanism module (Figure 2 (b)) injects Gaussian noise into the remaining forgetting dataset. The amount of noise can control the balance between privacy and accuracy. Subsequently, an unlearning proxy model (Figure 2 (c)) is trained on the retained dataset for a select number of epochs. Through knowledge transfer, the training epochs of the proxy can balance the runtime and privacy. Finally, by fine-tuning the original model using the concatenated retained and noised forgetting datasets, it is transformed into an unlearned version. As a result, by controlling the data volume, the Gaussian noise level, and the proxy training duration, we are able to account for different privacy-utility-efficiency requirements. Subsequent sections delve deeper into each module's capabilities and their influence on the trilemma.

4.1 Important Data Selection
Unlearning acceleration is crucial in MU. Since our method uses both the remaining noised 𝐷𝑓 and the remaining 𝐷𝑟 to perform fine-tuning, the amounts of 𝐷𝑓 and 𝐷𝑟 play significant roles in the runtime of our proposed method. However, large quantities of 𝐷𝑓 and 𝐷𝑟 will likely result in an inefficient MU algorithm with a long runtime. Therefore, to facilitate this process, we introduce a novel filtering method using EL2N scores to determine which samples are important for unlearning scenarios. Suppose that 𝑓(𝜃, 𝑥) is the output of the neural network 𝜃 with given data 𝑥, and denote 𝑦 as the true class label of 𝑥. We calculate the mean and the standard deviation of the 𝑙2-normed loss:

𝜇𝜃(𝑥) = E𝑥 ||𝑓(𝜃, 𝑥) − 𝑦||₂,  (1)
𝜎𝜃(𝑥) = sqrt(V𝑥 ||𝑓(𝜃, 𝑥) − 𝑦||₂).  (2)

A higher 𝜇𝜃 means that 𝑥 is hard to learn and tends to be an outlier in the dataset. A lower 𝜇𝜃 means that 𝜃 can fit 𝑥 well. Therefore, we can keep the data samples that are important for the generalization of the model by keeping samples whose scores are neither very high nor very low. In our method, we introduce two controllable hyperparameters 𝑧1 and 𝑧2 and calculate a bound:

[𝜇𝜃(𝑥) − 𝑧1 × 𝜎𝜃(𝑥), 𝜇𝜃(𝑥) + 𝑧2 × 𝜎𝜃(𝑥)].  (3)

This bound gives users control over how many important data points to include by tuning 𝑧1 and 𝑧2. If we include more data, accuracy increases, but the runtime also increases. As a result, ConMU can achieve a greater speed-up while maximally preserving accuracy by utilizing important data samples.
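As an illustration of how the EL2N-based selection of Equations (1)-(3) could be implemented, the sketch below scores each sample and keeps those falling inside the [𝜇 − 𝑧1𝜎, 𝜇 + 𝑧2𝜎] window. The use of softmax outputs against one-hot labels and the helper names are our assumptions rather than the paper's exact implementation.

import torch
import torch.nn.functional as F

@torch.no_grad()
def el2n_scores(model, loader, num_classes, device="cpu"):
    # Per-sample EL2N score: l2 distance between softmax output and one-hot label.
    model.eval()
    scores = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        probs = torch.softmax(model(x), dim=1)
        one_hot = F.one_hot(y, num_classes).float()
        scores.append(torch.norm(probs - one_hot, p=2, dim=1))
    return torch.cat(scores)

def select_important_indices(scores, z1, z2):
    # Keep samples whose score lies inside [mu - z1*sigma, mu + z2*sigma] (Eq. 3).
    mu, sigma = scores.mean(), scores.std()
    keep = (scores >= mu - z1 * sigma) & (scores <= mu + z2 * sigma)
    return keep.nonzero(as_tuple=True)[0]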

Figure 2: The overall framework of the proposed method ConMU, which is placed after the forgetting request. In (a), an important data selection is implemented to select data samples that are important to the model. A customized upper/lower bound is attached to this module to facilitate the selection process. Then, the selected forgetting data 𝐷′𝑓 is passed to (b), the progressive Gaussian mechanism, to gradually inject Gaussian noise. More noise in the image leads to higher privacy. Afterward, the processed forgetting data 𝐷″𝑓 is concatenated with the selected retaining data 𝐷′𝑟, which is used for fine-tuning the original model. The unlearning proxy (c) is partially trained on the retaining data 𝐷𝑟, and its knowledge is transferred to the original model via KL divergence.

4.2 Progressive Gaussian Mechanism
MU algorithms aim to erase the information about 𝐷𝑓 from the original model. In order to forget 𝐷𝑓, we can continue training the original model using an obfuscated version of 𝐷𝑓, prompting catastrophic forgetting of 𝐷𝑓. Within this context, we propose the progressive Gaussian mechanism, which leverages Gaussian noise to obscure the selected 𝐷𝑓. Moreover, one of the standout features of this approach is that the magnitude and the shape of the Gaussian noise applied to the dataset serve as tunable hyperparameters, granting a remarkable degree of control over the process. More formally, after selecting a subset of important samples:

𝐷′𝑓 ∈ [𝜇𝜃𝑢(𝐷𝑓) − 𝑧1 × 𝜎𝜃𝑢(𝐷𝑓), 𝜇𝜃𝑢(𝐷𝑓) + 𝑧2 × 𝜎𝜃𝑢(𝐷𝑓)],  (4)

ConMU adds Gaussian noise to the data samples to balance privacy and accuracy. More specifically, for each data sample in 𝐷′𝑓, we add Gaussian noise and obtain:

𝐷″𝑓 = 𝐷′𝑓 + 𝛼 × 𝑁, 𝑁 ∼ N(𝜇, 𝜎²I),  (5)

where 𝛼, 𝜇, and 𝜎² are controllable hyperparameters: 𝜇 and 𝜎² represent the mean and variance of the Gaussian distribution, and 𝛼 represents the number of times the noise is added to the sample. With more noise added to the data samples, we get higher privacy but lower model accuracy. Therefore, the progressive Gaussian mechanism lets practitioners control the amount of information they want to scrub away and the amount of information they want to preserve to maintain the accuracy of the model. In Section 5.3, we empirically demonstrate that with larger 𝛼, the accuracy decreases and the privacy increases, and vice versa.
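A minimal sketch of the noise-injection step in Equation (5) is given below; the tensor layout and function name are our own assumptions. The defaults mirror the hyperparameters reported in Table 2 of the appendix (𝛼 = 3, 𝜇 = 0, 𝜎 = 1 for CIFAR-10 on ResNet-18).

import torch

def progressive_gaussian(forget_images, alpha=3, mu=0.0, sigma=1.0):
    # Obscure the selected forgetting samples by adding Gaussian noise alpha times (Eq. 5).
    # forget_images: tensor of shape (N, C, H, W).
    noised = forget_images.clone()
    for _ in range(alpha):
        noised = noised + torch.normal(mu, sigma, size=noised.shape)
    return noised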
4.3 Fine-tuning with Unlearning Proxy
The objective of machine unlearning is to align the output distribution of the unlearned model closely with that of the retrained model, that is, a model never exposed to the forgotten data samples. To achieve this, we can utilize an unlearning proxy model, which has the same architecture as the original model and is partially trained on the retained dataset for a few epochs. By transferring the knowledge of the behavior of the unlearning proxy, we can obtain an unlearned model that contains less information about the forgetting dataset.

More formally, the unlearning proxy model 𝜃𝐼 is partially trained on the retained dataset 𝐷𝑟 for 𝛿 epochs, in which 𝛿 is a hyperparameter. Next, we compute the KL divergence between the probability distribution of 𝜃𝐼's output on the input data 𝑥 and that of 𝜃𝑢 as:

𝐷𝐾𝐿(𝜃𝐼(𝑥) ∥ 𝜃𝑢(𝑥)) = Σ𝑖 𝜃𝐼(𝑥)(𝑖) log(𝜃𝐼(𝑥)(𝑖) / 𝜃𝑢(𝑥)(𝑖)),  (6)

where 𝑖 corresponds to the data class. We want to minimize this KL divergence, aiming to make the output distribution of the unlearned model 𝜃𝑢 as close as possible to that of a model that has never seen 𝐷𝑓, which is the unlearning proxy. In Section 5.3, we demonstrate that as 𝛿 increases, 𝜃𝑢 becomes more similar to 𝜃𝑟, but with increasing runtime.
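The KL term of Equation (6) can be written directly with standard PyTorch operations; the sketch below assumes both models output raw logits and keeps the proxy frozen.

import torch.nn.functional as F

def proxy_kl_loss(proxy_logits, unlearned_logits):
    # KL(theta_I(x) || theta_u(x)) averaged over the batch, as in Eq. (6).
    p_proxy = F.softmax(proxy_logits, dim=1).detach()        # proxy probabilities, no gradient
    log_p_unlearned = F.log_softmax(unlearned_logits, dim=1)  # log-probabilities of theta_u
    # F.kl_div(log_q, p) computes sum_i p_i * (log p_i - log q_i) = KL(p || q)
    return F.kl_div(log_p_unlearned, p_proxy, reduction="batchmean")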

4.4 Controlling Machine Unlearning
After discussing the individual modules for important data selection, the progressive Gaussian mechanism, and the unlearning proxy, we now focus on how these parts come together. First, we obtain 𝐷𝑛𝑒𝑤 = 𝐷″𝑓 ∪ 𝐷′𝑟, in which:

𝐷′𝑟 ∈ [𝜇𝜃𝑢(𝐷𝑟) − 𝑧′1 × 𝜎𝜃𝑢(𝐷𝑟), 𝜇𝜃𝑢(𝐷𝑟) + 𝑧′2 × 𝜎𝜃𝑢(𝐷𝑟)],  (7)

and 𝑧′1 and 𝑧′2 are two hyperparameters for filtering the retained dataset, as discussed in Section 4.1. With 𝐷𝑛𝑒𝑤, we use the cross-entropy (CE) loss to further train 𝜃𝑢 on 𝐷𝑛𝑒𝑤, combined with the KL loss in Section 4.3. The loss to train the unlearned model 𝜃𝑢 is defined as:

L = 𝐶𝐸(𝐷𝑛𝑒𝑤) + 𝛾 𝐷𝐾𝐿(𝜃𝐼(𝐷𝑛𝑒𝑤) ∥ 𝜃𝑢(𝐷𝑛𝑒𝑤)).  (8)

The 𝛾 in Equation (8) ensures that these two losses are on the same scale. In summary, ConMU uses Equation (8) to fine-tune the original model 𝜃𝑜 into 𝜃𝑢 with far fewer epochs than complete retraining, allowing calibration of the amount of data to preserve, the amount of noise added to the filtered forgetting data samples, and the number of epochs used to train the unlearning proxy. With these three modules, ConMU allows controllable trade-offs between accuracy, privacy, and runtime.
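Putting the pieces together, one fine-tuning step on 𝐷𝑛𝑒𝑤 with the combined loss of Equation (8) could look like the following sketch. The function name and training-loop details are our assumptions; 𝛾 = 0.5 matches the value reported in Table 2 of the appendix.

import torch
import torch.nn.functional as F

def conmu_step(unlearned_model, proxy_model, x, y, optimizer, gamma=0.5):
    # One optimization step of Eq. (8): cross-entropy on D_new plus gamma * KL to the proxy.
    unlearned_model.train()
    proxy_model.eval()
    optimizer.zero_grad()
    logits_u = unlearned_model(x)
    with torch.no_grad():
        probs_proxy = F.softmax(proxy_model(x), dim=1)
    ce = F.cross_entropy(logits_u, y)
    kl = F.kl_div(F.log_softmax(logits_u, dim=1), probs_proxy, reduction="batchmean")
    loss = ce + gamma * kl
    loss.backward()
    optimizer.step()
    return loss.item()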
4.5 Forget-Retain-MIA Score
There are many evaluation metrics to determine the privacy of unlearning algorithms. For example, much of the literature uses Retain Accuracy (RA) and Forget Accuracy (FA) [4, 14, 15, 26, 27, 35, 44, 54], which measure the generalization ability of the unlearned model on 𝐷𝑟 and 𝐷𝑓, respectively. Moreover, many previous works have used Membership Inference Attacks (MIA) [14, 21, 29, 35, 44, 51], which determine whether a particular training sample was present in the training data of a model.

Given this landscape of varied metrics, it becomes imperative to consolidate them to yield a more comprehensive evaluation. As stated in Section 3, our goal for the evaluation of privacy is to ensure minimal disparity between our model's outcomes and the retrained model, which is the gold standard of unlearning tasks. Therefore, we introduce a new evaluation metric called the Forget-Retain-MIA (FRM) score, which considers the differences between the unlearned model and the retrained model on the trifecta of FA, RA, and MIA, and is inspired by the NeurIPS 2023 machine unlearning challenge¹. Suppose we denote 𝐹𝐴𝑟, 𝑅𝐴𝑟, and 𝑀𝐼𝐴𝑟 as the FA, RA, and MIA performance of the retrained model, and 𝐹𝐴𝑢, 𝑅𝐴𝑢, and 𝑀𝐼𝐴𝑢 as the FA, RA, and MIA performance of the unlearned model; we calculate the FRM score as:

𝐹𝑅𝑀 = exp(−(|𝐹𝐴𝑢 − 𝐹𝐴𝑟|/𝐹𝐴𝑟 + |𝑅𝐴𝑢 − 𝑅𝐴𝑟|/𝑅𝐴𝑟 + |𝑀𝐼𝐴𝑢 − 𝑀𝐼𝐴𝑟|/𝑀𝐼𝐴𝑟)).  (9)

The FRM score quantitatively compares the normalized differences in FA, RA, and MIA performance of the unlearned model with those of its retrained counterpart. The FRM score lies between 0 and 1. The FRM score of an unlearned model will be closer to 1 if its privacy is perfectly aligned with the retrained model's privacy, and closer to 0 if the model is completely different from the retrained model. An ideal FRM score of 1 signifies that the unlearning algorithm has achieved exact unlearning. We use the FRM score to evaluate the privacy performance of ConMU and the other baseline models in the subsequent experiment sections.

¹ https://unlearning-challenge.github.io/
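The FRM score is straightforward to compute from the reported metrics; the snippet below implements Equation (9) and checks it against one row of Table 1.

import math

def frm_score(fa_u, ra_u, mia_u, fa_r, ra_r, mia_r):
    # Forget-Retain-MIA score of Eq. (9); values closer to 1 mean closer to the retrained model.
    gap = (abs(fa_u - fa_r) / fa_r
           + abs(ra_u - ra_r) / ra_r
           + abs(mia_u - mia_r) / mia_r)
    return math.exp(-gap)

# ResNet-18 random forgetting on CIFAR-10 (ConMU vs. retrain, Table 1):
# frm_score(81.22, 81.75, 18.93, 80.46, 91.47, 19.62) gives roughly 0.86,
# close to the 0.855 reported in the table.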
5 EXPERIMENTS
In this section, we conduct extensive experiments to validate the effectiveness of ConMU. In particular, through the experiments, we aim to answer the following research questions: (1) Can ConMU find the best balance point given the trilemma? (2) Can each module effectively control a specific aspect of the trilemma? (3) Can the naive fine-tune method possess the same control ability as ConMU?

5.1 Experiment setups
5.1.1 Datasets and models. Our experiments mainly focus on image classification for CIFAR-10 [37] on ResNet-18 [31] under two unlearning scenarios: random data forgetting and class-wise data forgetting. Besides, additional experiments are conducted on the CIFAR-100 [37] and SVHN [43] datasets using VGG-16 [50].

5.1.2 Baseline Models. For baselines, we compare with Fine-Tuning (FT) [27, 35, 57], Gradient Ascent (GA) [29, 35], and Influence Unlearning (IU) [16, 26, 35, 36, 44]. In particular, FT directly utilizes the retained dataset 𝐷𝑟 to fine-tune the original model 𝜃𝑜. The GA method adds the gradient updates on 𝐷𝑓 during the training process back to 𝜃𝑜. Lastly, IU leverages influence functions to remove the influence of the target data samples from 𝜃𝑜. Besides, [35] has shown that pruning first before applying unlearning algorithms increases performance. Therefore, we apply OMP (one-shot magnitude pruning) [35, 38, 39, 41] to each baseline model as well as to ConMU. The details of each baseline model are elaborated in Appendix B.1.

5.1.3 Evaluation Metrics. We aim to evaluate MU methods from five perspectives: test accuracy (TA), forget accuracy (FA), retain accuracy (RA), membership inference attack (MIA) [49], and runtime efficiency (RTE), together with the FRM privacy score. Specifically, TA measures the accuracy of 𝜃𝑢 on the testing dataset and evaluates the generalization ability of MU methods. FA and RA measure the accuracy of the unlearned model on the forgetting dataset 𝐷𝑓 and the retaining dataset 𝐷𝑟, respectively. MIA verifies whether a particular training sample existed in the training data of the original model. Lastly, we use the FRM privacy score to comprehensively evaluate the privacy level of an MU method. Additional details of the evaluation metrics are given in Appendix A.

5.1.4 Implementation Details. We report the mean and the standard deviation in the form 𝑎 ± 𝑏 over ten independent runs with different data splits and random seeds. For random forgetting, we randomly selected 20% of the training samples as the forgetting dataset. For class-wise forgetting, we randomly selected 50% of a particular class for each dataset as the forgetting samples. Further details are in Appendix B.2.

5.2 Experiment Results
To answer the first question: Can ConMU find the best trade-off points given the three important factors? We conduct random forgetting and class-wise forgetting to comprehensively evaluate the effectiveness of an MU method. The performance is reported in Table 1. Note that better performance of an MU method means a smaller performance gap with the retrained model (except for TA and RTE), which is the gold standard for MU tasks. According to the table, we find that IU (influence unlearning) and GA (gradient ascent) with OMP pruning achieve satisfactory results under the unlearning privacy (FA, RA, MIA, and FRM) and efficiency (RTE) metrics, with relatively shorter runtime. According to the table, IU is usually the fastest baseline, while GA is the runner-up.

Table 1: Overall results of our proposed ConMU, with a number of baselines under two unlearning scenarios: random forgetting
and class-wise forgetting. Since a retrained model is the golden baseline for unlearning tasks, we evaluate the performance of
MU models based on their differences to the retrained model. Bold indicates the best performance and underline indicates
the runner-up. The unlearning performance of each MU method is evaluated under five metrics: test accuracy (𝑇 𝐴), accuracy
on forget data (𝐹𝐴), accuracy on retain data (𝑅𝐴), membership inference attack (𝑀𝐼𝐴), and running time efficiency (𝑅𝑇 𝐸). An
additional 𝐹 𝑅𝑀 is added on top of that to thoroughly evaluate the privacy level of each method. Note that a larger value of FRM
denotes a smaller performance discrepancy with the retrained model, which means a higher privacy level. A performance
difference against the retrained model is reported in (•). The best performance in the metrics with blue ↓ has the smallest gap
with the retrained model. While the metrics with black ↑ favor a greater performance value.

MU Methods    TA (%) ↑    FA (%) ↓    RA (%) ↓    MIA (%) ↓    RTE (s)    FRM Privacy ↑


Resnet-18 Random data forgetting (CIFAR-10)
retrain 79.99 80.46 (0.00) 91.47 (0.00) 19.62 (0.00) 933.51 1
IU + Pruning 41.63±0.14 41.62±0.11 (38.84) 41.21±0.07 (50.26) 57.61±0.11 (37.99) 33.69 0.051
GA + Pruning 64.61±0.14 66.74±0.24 (13.72) 66.15±0.13 (25.32) 34.15±0.24 (14.53) 28.15 0.305
FT + Pruning 84.71±0.14 84.13±0.06 (3.67) 90.96±0.05 (0.51) 15.42±0.06 (4.20) 475.99 0.767
ConMU 78.83±0.57 81.22±0.53 (0.76) 81.75±0.69 (9.72) 18.93±0.10 (0.69) 59.59 0.855
Resnet-18 Class-Wise forgetting (CIFAR-10)
retrain 82.55 68.22 (0.00) 89.67 (0.00) 33.92 (0.00) 1241.92 1
IU + Pruning 20.39±2.25 0.01±0.00 (68.21) 20.96±2.33 (68.71) 100.00±0.00 (66.08) 40.15 0.024
GA + Pruning 52.22±0.35 15.22±0.41 (53) 53.88±0.38 (35.79) 83.23±0.41 (49.31) 25.24 0.072
FT + Pruning 85.75±0.11 69.89±0.72 (1.67) 92.10±0.03 (2.43) 27.81±0.72 (6.11) 565.47 0.793
ConMU 83.61±1.97 67.23±2.27 (0.99) 86.68±2.50 (5.75) 32.92±2.21 (1.00) 89.4 0.925
VGG Random data forgetting (CIFAR-10)
retrain 81.10 81.49 (0.00) 92.09 (0.00) 19.54 (0.00) 881.57 1
IU + Pruning 59.74±0.08 57.97±0.10 (23.52) 57.52±0.09 (34.57) 39.32±0.09 (19.78) 38.36 0.186
GA + Pruning 69.43±0.14 69.97±0.04 (11.52) 69.79±0.09 (22.30) 29.20±0.04 (9.66) 47.17 0.414
FT + Pruning 83.88±0.71 58.98±0.60 (22.51) 90.64±0.77 (1.45) 38.41±0.60 (18.87) 378.02 0.283
ConMU 79.09±2.19 82.52±2.41 (1.03) 84.00±2.40 (8.09) 17.53±2.43 (2.01) 32.42 0.816
VGG Class-Wise forgetting (CIFAR-10)
retrain 82.41 69.02 (0.00) 92.90 (0.00) 33.44 (0.00) 1034.40 1
IU + Pruning 53.06±17.55 27.16±28.87 (41.86) 53.08±20.04 (38.82) 65.50±14.07 (32.06) 46.70 0.136
GA + Pruning 53.18±0.25 11.96±0.28 (57.06) 54.51±0.29 (38.39) 86.42±0.28 (52.98) 30.74 0.059
FT + Pruning 83.88±0.91 58.98±8.94 (10.04) 90.64±0.82 (2.26) 38.31±8.94 (4.87) 353.97 0.729
ConMU 81.12±3.27 63.75±8.33 (5.27) 87.10±3.79 (5.80) 36.20±3.30 (2.76) 148 0.800
VGG Random data forgetting (CIFAR-100)
retrain 60.65 60.54 (0.00) 92.49 (0.00) 40.60 (0.00) 823.15 1
IU + Pruning 7.15±0.01 5.97±0.02 (54.57) 5.83±0.01 (86.66) 6.65±0.02 (33.95) 38.74 0.069
GA + Pruning 14.71±0.07 13.96±0.10 (46.58) 14.24±0.09 (78.25) 85.53±48.73 (44.93) 47.42 0.066
FT + Pruning 49.78±0.57 47.67±0.81 (12.87) 60.70±0.79 (31.79) 50.95±0.81 (10.35) 215.22 0.444
ConMU 55.22±0.32 61.53±2.58 (0.99) 71.77±1.16 (20.72) 41.42±2.22 (0.82) 65.01 0.771
VGG Class-Wise forgetting (CIFAR-100)
retrain 63.71 65.78 (0.00) 93.00 (0.00) 38.21 (0.00) 1036.40 1
IU + Pruning 7.14±0.03 3.56±0.01 (62.22) 5.89±0.92 (87.11) 4.44±0.92 (33.77) 37.95 0.063
GA + Pruning 10.68±0.16 0.44±0.37 (65.34) 10.21±0.14 (82.79) 1.00±0.37 (37.21) 23.01 0.057
FT + Pruning 57.33±0.58 64.00±10.41 (1.78) 74.60±0.86 (18.40) 36.45±10.41 (1.76) 403.07 0.762
ConMU 58.78±0.95 58.39±4.53 (7.39) 80.03±1.56 (12.97) 38.54±4.61 (0.33) 53.07 0.771

However, this outstanding unlearning efficiency comes at a high cost to model utility, rendering them the worst baseline models in terms of test accuracy. Alternatively, FT (fine-tuning) performs well across all metrics with the exception of unlearning efficiency. As shown in Table 1, FT is the runner-up in the majority of cases and achieves a high FRM across all benchmarks. However, this comes with a high sacrifice on runtime efficiency, making it the slowest baseline method. Finally, we observe that ConMU can outperform other baselines by remarkable margins and achieve a good balance on all privacy metrics and competitive accuracy across CIFAR-10, CIFAR-100, and SVHN, respectively. Additionally, ConMU has the highest FRM score among all baseline models, with an acceptable runtime efficiency relative to other baselines. More experimental results on the different datasets are presented in Appendix C.

[Figure 3 panels: (a) Test Acc. vs Runtime; (b) Test Acc. vs FRM Score; (c) FRM Score vs Runtime]

Figure 3: Ablation study results of each module on CIFAR-10 with ResNet-18. For every module, we fix the other two novel
modules while adjusting its own controllable parameters. Since each proposed module is designed to control one side of the
trilemma, we present the results for each module in a chart with 𝑥 and 𝑦 axes representing their respective controlled factors.

5.3 Unlearning Trilemma Analysis
Given that our proposed ConMU aims to narrow the performance gap with the gold-standard retrained model and to better control the trade-offs between different metrics, we conduct further experiments to validate the effectiveness of each module. The central question addressed is: Can each module effectively govern a specific facet of the aforementioned trilemma? The associated results are shown in Figure 3. To better answer this question, we first represent the unlearning trilemma as a triangle (Figure 1), wherein each side corresponds to a distinct aspect of the trilemma. An effective control module should identify a balance point anywhere along the side, rather than being confined to the two endpoints. Since ConMU contains three modules, to better observe the compatibility and flexibility of each module in influencing different metrics, we systematically adjust the input values of each module with random forgetting requests on the CIFAR-10 dataset, showcasing the ability to control trade-offs at various levels.

5.3.1 Utility vs Efficiency. In our proposed method, the important data selection module is specifically designed to curate the samples that are later utilized in the fine-tuning process of the pruned model. The rationale behind this is twofold: (1) extracting the samples that contribute significantly to the model generalization process, and (2) expediting the runtime of the unlearning process. To further investigate the trade-off between model utility and runtime efficiency, we carefully adjust the upper and lower bounds of the important data selection to incorporate different percentiles of data. In Figure 3 (a), we present the control ability of the proposed important data selection module by selecting different portions of data. We start with the inclusion of 5% of the data and gradually progress to 90%. From Figure 3 (a), we first discover that a higher percentile of selected data not only prolongs the runtime but also enhances the utility performance of ConMU. For instance, increasing the data percentile from 5% to 25% results in a 7.89% increase in model accuracy, from 75.09 to 81.02. However, this improvement comes at the cost of a 68.17% increase in runtime, escalating from 38.86 seconds to 65.35 seconds. Furthermore, we observe diminishing returns as the included data percentage increases. Take the last two points as an example: including 10% more data leads to a mere 1.39% increase in model utility but incurs a substantial 29.7% increase in runtime, escalating from 135.43 seconds to 175.71 seconds. This phenomenon suggests that beyond a certain threshold of included data, sacrificing runtime yields only marginal improvements in model utility.

5.3.2 Utility vs Privacy. In the intricate landscape of the trilemma, another crucial facet involves the delicate equilibrium between utility and privacy. As mentioned in Section 4.2, the purpose of the progressive Gaussian mechanism is to disrupt the forgetting information in samples, where a higher noise level indicates that the sample contains more chaotic information, which represents better privacy. To validate this hypothesis, we modify the mechanism's noise level to demonstrate the relationship between model utility and privacy. Figure 3 (b) illustrates the performance of the proposed progressive Gaussian mechanism module under varying noise levels. We begin with a noise level of 0, which uses the selected data from the previous module, and increase it to a noise level of 10. As shown by the increasing FRM score, a higher noise level results in a privacy level closer to that of the retrained model. For example, increasing the noise level from 0 to 2 increases the FRM by 4.2%, from 0.707 to 0.737. This enhancement, however, comes with a 0.72% decrease in model utility, from 81.21% to 80.62%. This trend is consistent as the intensity of noise increases. As the noise level increases from 8 to 10, the model test accuracy decreases from 79.59% to 78.22%, a decrease of 1.75%, while the FRM increases by 3.2%. This phenomenon demonstrates the viability of the compromise between model utility and privacy.

5.3.3 Privacy vs Efficiency. Lastly, there is a discernible trade-off between model privacy and runtime performance. As mentioned in Section 4.3, we introduce an unlearning proxy to strike a balance between these two crucial factors. This module's purpose is to reduce the privacy disparity between the retrained model and the unlearned model by means of the unlearning proxy. To validate this effect, we progressively increase the training epochs of the unlearning proxy from 0 to 8, bringing it closer to the retrained model. As shown in Figure 3 (c), an increase in the number of training epochs of the unlearning proxy results in improved privacy performance, as indicated by a higher FRM score. Raising the proxy training epochs from 2 to 3 increases runtime by 23.13%, from 67.61 seconds to 83.25 seconds. This results in a 26.02% increase in FRM score.

[Figure 4 panels: (a) Test Acc. vs Runtime; (b) Test Acc. vs FRM Score; (c) FRM Score vs Runtime]

Figure 4: Unlearning performance of ConMU and Naive Fine-tuning on CIFAR-10 with ResNet-18 under random forgetting
request using various combinations of controllable mechanisms. For FT, we adjust the epoch number from 5 to 35 and try
different learning rates ranging from 0.1 to 0.0001. Figure 4 (a) focuses on evaluating the relationship between utility and
runtime efficiency, where 𝑥 and 𝑦 axes denote the test runtime and accuracy, respectively. Figure 4 (b) focuses on the relationship
between utility and privacy, where 𝑥 and 𝑦 axes denote the FRM score and test accuracy, respectively. Figure 4 (c) depicts the
relationship between privacy and runtime efficiency, where 𝑥 and 𝑦 axes denote the runtime and FRM score, respectively. The
red point represents the performance of the fine-tuning method and the blue point denotes the ConMU.

However, this progress diminishes when FRM exceeds 0.75. Consider the last three data points as an illustration: a 66.1% increase in duration, from 100.47 seconds to 166.89 seconds, results in a 4.64% increase from 0.776 to 0.812. Similarly to the trade-off between utility and runtime, there exists a threshold between privacy and runtime where sacrificing one does not result in a substantial improvement in the other.

5.4 ConMU vs. Naive Fine-Tune Method
The overall performance of the rudimentary fine-tune (FT) baseline method, as shown in Table 1, is comparable to that of ConMU. Consequently, an intriguing question may be posed: Can the naive FT model attain the same control ability over the trilemma as our method by merely adjusting its hyperparameters? In order to answer this question, we compare the control ability of ConMU and naive fine-tuning. To demonstrate this distinction in a holistic manner, we evaluate the performance of the two models based on three crucial factors: privacy (FRM), utility (TA), and efficiency (runtime). We primarily demonstrate the control ability of the naive fine-tuning method by varying two parameters: learning rate and fine-tuning epochs. For ConMU, we alter the three proposed modules. Figure 4 (a) demonstrates the trade-off between the runtime of each sample and the utility. In addition, Figure 4 (b) illustrates the trade-off between utility and privacy, in which greater 𝑥 values indicate a higher FRM score, which corresponds to a privacy level closer to that of the retrained model. As demonstrated in Section 5.3, an expected trade-off would emerge as 𝑥 values increase from left to right. Figure 4 (c) demonstrates the relationship between privacy and runtime efficiency. Ideally, a sample that resolves the trilemma should be placed in the top left corner of (a), the top right corner of (b), and the top left corner of (c). Given a similar level of test accuracy, ConMU can achieve a higher FRM score with a shorter runtime. When the test accuracy for FT is 77.99% and that for ConMU is 78.08%, for example, the FRM is 0.71 and 0.87, respectively. Meanwhile, the runtime is 480.14 seconds and 52.12 seconds, respectively, which is 9x faster. Furthermore, as test accuracy improves, the performance of ConMU remains relatively stable and consistent. For instance, when the test accuracy for FT increases from 84.22% to 85%, FT's FRM falls from 0.77 to 0.74. In contrast, the FRM for ConMU increases by 0.3%, from 0.799 to 0.801, when the test accuracy changes from 84.32% to 85.09%. In terms of runtime, FT increases from 600.22 seconds to 720.19 seconds, whereas ConMU increases by only 65.55 seconds, from 207.13 to 272.68 seconds. Throughout the resulting chart, ConMU displays its superiority not only in the stability of controlling the trilemma but also by a significant margin in overall performance.

6 CONCLUSION
In this paper, we identify the trilemma between model privacy, utility, and efficiency that exists in machine unlearning for deep neural networks. To address this issue and gain greater control over this trilemma, we present ConMU, a novel MU calibration framework. Specifically, ConMU introduces three control modules: the important data selection, the progressive Gaussian mechanism, and the unlearning proxy, each of which seeks to calibrate a portion of the MU trilemma. Extensive experiments and in-depth studies demonstrate the superiority of ConMU across multiple benchmark datasets and a variety of unlearning metrics. Future work could focus on extending our control mechanism to other fields of study, such as the NLP and graph domains.

REFERENCES
[1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 308–318.
[2] Borja Balle and Yu-Xiang Wang. 2018. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning. PMLR, 394–403.
[3] Samyadeep Basu, Philip Pope, and Soheil Feizi. 2020. Influence functions in deep learning are fragile. arXiv preprint arXiv:2006.14651 (2020).
[4] Alexander Becker and Thomas Liebig. 2022. Evaluating machine unlearning via epistemic uncertainty. arXiv preprint arXiv:2208.10836 (2022).
[5] Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. 2021. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 141–159.

[6] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[7] Zhiqi Bu, Jinshuo Dong, Qi Long, and Weijie J Su. 2020. Deep learning with Gaussian differential privacy. Harvard Data Science Review 2020, 23 (2020), 10–1162.
[8] Clément L Canonne, Gautam Kamath, and Thomas Steinke. 2020. The discrete Gaussian for differential privacy. Advances in Neural Information Processing Systems 33 (2020), 15676–15688.
[9] Yinzhi Cao and Junfeng Yang. 2015. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy. IEEE, 463–480.
[10] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In European Conference on Computer Vision. Springer, 213–229.
[11] Wei-Ning Chen, Peter Kairouz, and Ayfer Ozgur. 2020. Breaking the communication-privacy-accuracy trilemma. Advances in Neural Information Processing Systems 33 (2020), 3312–3324.
[12] Eli Chien, Wei-Ning Chen, Chao Pan, Pan Li, Ayfer Özgür, and Olgica Milenkovic. 2023. Differentially Private Decoupled Graph Convolutions for Multigranular Topology Protection. arXiv preprint arXiv:2307.06422 (2023).
[13] Eli Chien, Chao Pan, and Olgica Milenkovic. 2022. Efficient model updates for approximate unlearning of graph-structured data. In The Eleventh International Conference on Learning Representations.
[14] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. 2023. Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 7210–7217.
[15] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. 2023. Zero-shot machine unlearning. IEEE Transactions on Information Forensics and Security (2023).
[16] R Dennis Cook and Sanford Weisberg. 1980. Characterizations of an empirical influence function for detecting influential cases in regression. Technometrics 22, 4 (1980), 495–508.
[17] Quang-Vinh Dang. 2021. Right to be forgotten in the age of machine learning. In Advances in Digital Science: ICADS 2021. Springer, 403–411.
[18] Jinshuo Dong, Aaron Roth, and Weijie J Su. 2019. Gaussian differential privacy. arXiv preprint arXiv:1905.02383 (2019).
[19] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. 2006. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology-EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28-June 1, 2006. Proceedings 25. Springer, 486–503.
[20] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407.
[21] Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. 2015. Robust traceability from trace amounts. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. IEEE, 650–669.
[22] Olivia G d'Aliberti and Mark A Clark. 2022. Preserving patient privacy during computation over shared electronic health record data. Journal of Medical Systems 46, 12 (2022), 85.
[23] Vitaly Feldman. 2020. Does learning require memorization? A short tale about a long tail. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. 954–959.
[24] Changyu Gao and Stephen J Wright. 2023. Differentially Private Optimization for Smooth Nonconvex ERM. arXiv preprint arXiv:2302.04972 (2023).
[25] Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. 2019. Making AI forget you: Data deletion in machine learning. Advances in Neural Information Processing Systems 32 (2019).
[26] Aditya Golatkar, Alessandro Achille, Avinash Ravichandran, Marzia Polito, and Stefano Soatto. 2021. Mixed-privacy forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 792–801.
[27] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. 2020. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9304–9312.
[28] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. 2020. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIX 16. Springer, 383–398.
[29] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. 2021. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11516–11524.
[30] Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens Van Der Maaten. 2020. Certified Data Removal from Machine Learning Models. In International Conference on Machine Learning. PMLR, 3832–3842.
[31] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[32] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. 173–182.
[33] Chris Jay Hoofnagle, Bart Van Der Sloot, and Frederik Zuiderveen Borgesius. 2019. The European Union General Data Protection Regulation: What it is and what it means. Information & Communications Technology Law 28, 1 (2019), 65–98.
[34] Arthur Jacot, Franck Gabriel, and Clément Hongler. 2018. Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems 31 (2018).
[35] Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, and Sijia Liu. 2023. Model sparsification can simplify machine unlearning. arXiv preprint arXiv:2304.04934 (2023).
[36] Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In International Conference on Machine Learning. PMLR, 1885–1894.
[37] Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. (2009).
[38] Yann LeCun, John Denker, and Sara Solla. 1989. Optimal brain damage. Advances in Neural Information Processing Systems 2 (1989).
[39] Xiaolong Ma, Geng Yuan, Xuan Shen, Tianlong Chen, Xuxi Chen, Xiaohan Chen, Ning Liu, Minghai Qin, Sijia Liu, Zhangyang Wang, et al. 2021. Sanity checks for lottery tickets: Does your winning ticket really win the jackpot? Advances in Neural Information Processing Systems 34 (2021), 12749–12760.
[40] Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 142–150.
[41] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2016. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440 (2016).
[42] Seth Neel, Aaron Roth, and Saeed Sharifi-Malvajerdi. 2021. Descent-to-delete: Gradient-based methods for machine unlearning. In Algorithmic Learning Theory. PMLR, 931–962.
[43] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and A Ng. 2018. The street view house numbers (SVHN) dataset. Technical Report. Accessed 2016-08-01. [Online].
[44] Thanh Tam Nguyen, Thanh Trung Huynh, Phi Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, and Quoc Viet Hung Nguyen. 2022. A survey of machine unlearning. arXiv preprint arXiv:2209.02299 (2022).
[45] Chao Pan, Eli Chien, and Olgica Milenkovic. 2023. Unlearning graph classifiers with limited data resources. In Proceedings of the ACM Web Conference 2023. 716–726.
[46] Stuart L Pardau. 2018. The California Consumer Privacy Act: Towards a European-style privacy regime in the United States. J. Tech. L. & Pol'y 23 (2018), 68.
[47] Mansheej Paul, Surya Ganguli, and Gintare Karolina Dziugaite. 2021. Deep learning on a data diet: Finding important examples early in training. Advances in Neural Information Processing Systems 34 (2021), 20596–20607.
[48] Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. 2021. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems 34 (2021), 18075–18086.
[49] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 3–18.
[50] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[51] Liwei Song and Prateek Mittal. 2021. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX Security Symposium (USENIX Security 21). 2615–2632.
[52] Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari Morcos. 2022. Beyond neural scaling laws: Beating power law scaling via data pruning. Advances in Neural Information Processing Systems 35 (2022), 19523–19536.
[53] Ayush Kumar Tarun, Vikram Singh Chundawat, Murari Mandal, and Mohan Kankanhalli. 2023. Deep regression unlearning. In International Conference on Machine Learning. PMLR, 33921–33939.
[54] Ayush K Tarun, Vikram S Chundawat, Murari Mandal, and Mohan Kankanhalli. 2023. Fast yet effective machine unlearning. IEEE Transactions on Neural Networks and Learning Systems (2023).
[55] Bart Thomee, David A Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2016. YFCC100M: The new data in multimedia research. Commun. ACM 59, 2 (2016), 64–73.
[56] Lingzhi Wang, Tong Chen, Wei Yuan, Xingshan Zeng, Kam-Fai Wong, and Hongzhi Yin. 2023. KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment. arXiv preprint arXiv:2305.06535 (2023).

[57] Alexander Warnecke, Lukas Pirch, Christian Wressnegger, and Konrad Rieck. 2021. Machine unlearning of features and labels. arXiv preprint arXiv:2108.11577 (2021).
[58] Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. 2023. Visual ChatGPT: Talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671 (2023).
[59] Xiaoqian Zhu, Xiang Ao, Zidi Qin, Yanpeng Chang, Yang Liu, Qing He, and Jianping Li. 2021. Intelligent financial fraud detection practices in post-pandemic era. The Innovation 2, 4 (2021).
Appendix A EVALUATION METRICS
Retain Accuracy (RA) and Forget Accuracy (FA) [4, 14, 15, 26, 27, 35, 44, 54]: Retain Accuracy is the generalization ability of 𝜃𝑢 on 𝐷𝑟, which is referred to as the fidelity of machine unlearning. Forget Accuracy, on the other hand, is the generalization ability of 𝜃𝑢 on 𝐷𝑓, which reflects the efficacy of machine unlearning. For an approximate unlearning method, a better FA minimizes the disparity with the retrained model, which is the gold standard.
Test Accuracy (TA): Test Accuracy, unlike the accuracy on 𝐷𝑓 or 𝐷𝑟, assesses the generalization ability of 𝜃𝑢 on the test dataset after unlearning. Except for class-wise forgetting, in which the forgotten class is excluded from the test dataset, the test accuracy is computed on the entire test dataset.
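To make the three accuracy metrics concrete, the following is a minimal sketch (not the exact evaluation code used in our experiments); the loader names retain_loader, forget_loader, and test_loader are placeholder PyTorch DataLoaders:

```python
# Minimal sketch: RA, FA, and TA are plain classification accuracies of the
# unlearned model theta_u on the retain, forget, and test splits, respectively.
import torch

@torch.no_grad()
def accuracy(model, loader, device="cuda"):
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.size(0)
    return 100.0 * correct / total

# ra = accuracy(theta_u, retain_loader)   # Retain Accuracy (fidelity)
# fa = accuracy(theta_u, forget_loader)   # Forget Accuracy (efficacy)
# ta = accuracy(theta_u, test_loader)     # Test Accuracy
```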
Confidence-based Membership Inference Attack (MIA) [14, 21, 29, 35, 44, 51]: Membership inference attacks determine whether a particular sample was present in the training data of a model. In our work, we employ the confidence-based MIA of [35]. Formally, we first train an MIA predictor using 𝐷𝑟 and the test set. Then, we apply the trained MIA predictor to 𝜃𝑢 on 𝐷𝑓 and obtain the score MIA-efficacy = 𝑇𝑁/|𝐷𝑓|, where 𝑇𝑁 is the number of forgetting samples that are predicted as non-training samples. A higher MIA-efficacy score indicates that 𝜃𝑢 retains less information about 𝐷𝑓.
information about 𝐷 𝑓 . Forget, 𝑧 2 1.0 0.2 0.4 0.2 0.4 0.4
Runtime Efficiency (RTE) [9, 35, 44]: Runtime efficiency mea- Forget, 𝑧 1 0.85 0.18 0.3 0.15 0.37 0.3
𝛼 3 4 5 3 5 5
sures the computational efficiency of an MU method. In Machine 𝜇 0 0 0 0 0 0
Unlearning, we want to achieve the 𝜃𝑟 -like model in less computa- 𝜎 1 1 1 1 1 1
tional time. 𝛿 1 1 1 1 1 1
𝛾 0.5 0.5 0.5 0.5 0.5 0.5
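A minimal way to record RTE is to wrap any unlearning routine with a wall-clock timer, for example:

```python
# Sketch: RTE is simply the wall-clock time consumed by an unlearning routine.
import time

def timed(unlearn_fn, *args, **kwargs):
    start = time.perf_counter()
    theta_u = unlearn_fn(*args, **kwargs)
    return theta_u, time.perf_counter() - start  # (unlearned model, RTE in seconds)
```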

Appendix B IMPLEMENTATION DETAILS

B.1 Baseline Descriptions
Influence unlearning [16, 26, 35, 36, 44]: The influence function is a technique for understanding how model parameters change when training data points are up-weighted; the effect of these data points can be estimated in closed form. In machine unlearning, influence functions can be used to estimate |𝜃𝑜 − 𝜃𝑟| when 𝐷𝑓 is removed from 𝐷𝑜. Influence unlearning assumes that the empirical risk is twice-differentiable and strictly convex in 𝜃𝑜. However, for deep neural networks, the loss is non-convex and approximating the inverse-Hessian vector product can be erroneous [3, 36, 44], so using influence functions to approximate 𝜃𝑟 is not effective in practice.
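As a rough illustration only (our own simplification under the convexity assumption above, not the baseline's actual implementation), the influence-based update adds an inverse-Hessian-vector product of the forget-set gradient to the original parameters; the damping term and conjugate-gradient solver below are illustrative choices:

```python
# Sketch of an influence-function unlearning step: theta_u ≈ theta_o + (1/n) H^{-1} g_f,
# where g_f is the summed forget-set gradient, H is the (damped) Hessian of the
# empirical risk, and n = |D_o|. H^{-1} g_f is approximated by conjugate gradient
# built on Hessian-vector products.
import torch

def flat_grad(loss, params, create_graph=False, retain_graph=None):
    grads = torch.autograd.grad(loss, params, create_graph=create_graph,
                                retain_graph=retain_graph)
    return torch.cat([g.reshape(-1) for g in grads])

def make_hvp(loss, params, damping=1e-2):
    # Damping keeps the solve stable; for non-convex deep nets the true Hessian
    # is not positive definite, hence the caveat in the text above.
    grad = flat_grad(loss, params, create_graph=True)
    def hvp(v):
        return flat_grad((grad * v).sum(), params, retain_graph=True) + damping * v
    return hvp

def conjugate_gradient(matvec, b, steps=10):
    x = torch.zeros_like(b)
    r, p = b.clone(), b.clone()
    rs = r @ r
    for _ in range(steps):
        Ap = matvec(p)
        alpha = rs / (p @ Ap + 1e-12)
        x, r = x + alpha * p, r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / (rs + 1e-12)) * p
        rs = rs_new
    return x

# Usage sketch: with `risk` the training loss and `forget_loss` the summed loss on
# D_f, both evaluated at theta_o's parameter list `params`:
#   delta = conjugate_gradient(make_hvp(risk, params), flat_grad(forget_loss, params))
#   then add delta / n back into the flattened parameter vector.
```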
Gradient Ascent [29, 35]: Gradient Ascent does exactly the opposite of gradient descent: it reverses the effect of training 𝜃𝑜 on 𝐷𝑓 by moving the parameters in the direction that increases the loss on the data points to be erased.
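A minimal sketch of this baseline is shown below; the learning rate, epoch count, and loader name are placeholders rather than the settings used in our experiments:

```python
# Sketch of the Gradient Ascent baseline: take gradient *ascent* steps on the
# forget set so that the loss on D_f increases (illustrative hyper-parameters).
import torch
import torch.nn.functional as F

def gradient_ascent_unlearn(model, forget_loader, epochs=1, lr=1e-2, device="cuda"):
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in forget_loader:
            x, y = x.to(device), y.to(device)
            loss = -F.cross_entropy(model(x), y)  # negated loss => ascent on D_f
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```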
Regular Fine-tuning [27, 35, 57]: Unlike naively retraining from scratch, FT fine-tunes the pre-trained 𝜃𝑜 on 𝐷𝑟 for a few epochs with a slightly larger learning rate. The motivation is that fine-tuning on 𝐷𝑟 can trigger catastrophic forgetting of 𝐷𝑓.
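For completeness, a comparable sketch of the FT baseline follows; the hyper-parameters are again placeholders (Table 2 and Table 3 list the values used for ConMU, not for this baseline):

```python
# Sketch of the Fine-Tuning (FT) baseline: continue training theta_o on the
# retain set only, relying on catastrophic forgetting of D_f.
import torch
import torch.nn.functional as F

def finetune_unlearn(model, retain_loader, epochs=5, lr=1e-2, device="cuda"):
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in retain_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```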
B.2 Hyperparameters
The hyper-parameters of ConMU are listed in Table 2 for random forgetting and in Table 3 for class-wise forgetting. For random forgetting, we randomly selected 20% of the original training data as the forgetting dataset. For class-wise forgetting, we randomly selected 50% of a particular class as the forgetting data. Due to the limitation of GPU memory, the batch size is restricted to 128. Note that 𝛼 is generally larger for class-wise forgetting than for random forgetting.

Table 2: Hyper-parameters of ConMU for the CIFAR-10, CIFAR-100, and SVHN datasets on ResNet-18 and VGG for random forgetting. 𝑧1 and 𝑧2 indicate the lower and upper bounds of important data selection, respectively. 𝛼 is the amount of Gaussian noise used in the Progressive Gaussian Mechanism; the Gaussian noise has mean 𝜇 and standard deviation 𝜎. 𝛿 is the number of epochs for which the unlearning proxy model was trained. 𝛾 is the coefficient of the KL loss term.

Model ConMU (ResNet-18) ConMU (VGG)
Dataset CIFAR-10 CIFAR-100 SVHN CIFAR-10 CIFAR-100 SVHN
Fine-Tune Epoch 5 5 5 5 5 5
Learning Rate 1e-2 1e-2 1e-2 1e-2 1e-2 1e-2
Batch Size 128 128 128 128 128 128
Optimizer SGD SGD SGD SGD SGD SGD
Retain, 𝑧2 0.3 0.25 0.28 0.2 0.45 0.3
Retain, 𝑧1 0.17 0.16 0.1 0.16 0.35 0.2
Forget, 𝑧2 1.0 0.2 0.4 0.2 0.4 0.4
Forget, 𝑧1 0.85 0.18 0.3 0.15 0.37 0.3
𝛼 3 4 5 3 5 5
𝜇 0 0 0 0 0 0
𝜎 1 1 1 1 1 1
𝛿 1 1 1 1 1 1
𝛾 0.5 0.5 0.5 0.5 0.5 0.5

Table 3: Hyper-parameters of ConMU for the CIFAR-10, CIFAR-100, and SVHN datasets on ResNet-18 and VGG for class-wise forgetting. The Forgotten Class is the class index of the dataset we chose to forget for the experiments. 𝑧1 and 𝑧2 indicate the lower and upper bounds of important data selection, respectively. 𝛼 is the amount of Gaussian noise used in the Progressive Gaussian Mechanism; the Gaussian noise has mean 𝜇 and standard deviation 𝜎. 𝛿 is the number of epochs for which the unlearning proxy was trained. 𝛾 is the coefficient of the KL loss term.

Model ConMU (ResNet-18) ConMU (VGG)
Dataset CIFAR-10 CIFAR-100 SVHN CIFAR-10 CIFAR-100 SVHN
Forgotten Class 5 69 5 5 69 5
Fine-Tune Epoch 5 5 5 5 5 5
Learning Rate 1e-2 1e-2 1e-2 1e-2 1e-2 1e-2
Batch Size 128 128 128 128 128 128
Optimizer SGD SGD SGD SGD SGD SGD
Retain, 𝑧2 0.3 0.3 0.3 0.4 0.45 0.3
Retain, 𝑧1 0.2 0.1 0.2 0.3 0.35 0.2
Forget, 𝑧2 1.0 0.4 0.8 0.8 1.0 0.8
Forget, 𝑧1 0.8 0.3 0.5 0.5 0.85 0.5
𝛼 12 1 4 20 10 4
𝜇 0 0 0 0 0 0
𝜎 1 1 1 1 1 1
𝛿 1 1 1 1 1 1
𝛾 0.5 0.5 0.5 0.5 0.5 0.5
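As a usage illustration, the ResNet-18 / CIFAR-10 column of Table 2 can be collected into a single configuration object; the dictionary keys below are our own naming for this sketch, not identifiers from the released code:

```python
# Hypothetical configuration for ConMU on CIFAR-10 with ResNet-18 (random
# forgetting), transcribed from Table 2; key names are illustrative only.
conmu_cifar10_resnet18 = {
    "fine_tune_epochs": 5,
    "learning_rate": 1e-2,
    "batch_size": 128,
    "optimizer": "SGD",
    "retain_bounds": (0.17, 0.3),  # (z1, z2) for important data selection on D_r
    "forget_bounds": (0.85, 1.0),  # (z1, z2) for important data selection on D_f
    "alpha": 3,                    # amount of Gaussian noise (Progressive Gaussian Mechanism)
    "noise_mean": 0.0,             # mu
    "noise_std": 1.0,              # sigma
    "proxy_epochs": 1,             # delta: unlearning-proxy training epochs
    "kl_coefficient": 0.5,         # gamma
}
```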
Table 4: Additional results of our proposed ConMU compared with a number of baselines under two unlearning scenarios: random forgetting and class-wise forgetting. Bold indicates the best performance and underline indicates the runner-up. The efficacy of each MU method is evaluated under five metrics: test accuracy (𝑇𝐴), accuracy on forget data (𝐹𝐴), accuracy on retain data (𝑅𝐴), membership inference attack (𝑀𝐼𝐴), and runtime efficiency (𝑅𝑇𝐸). The performance of ConMU is reported in the form 𝑎±𝑏, where 𝑎 is the mean over 10 independent trial runs and 𝑏 denotes the standard deviation. The performance difference against the retrained model is reported in (•).

MU Methods 𝑇𝐴(%) ↑ 𝐹𝐴(%) ↓ 𝑅𝐴(%) ↓ 𝑀𝐼𝐴(%) ↓ 𝑅𝑇𝐸(𝑠) 𝐹𝑅𝑀 𝑃𝑟𝑖𝑣𝑎𝑐𝑦 ↑
ResNet-18 Random data forgetting (CIFAR-100)
retrain 51.45 40.82 (0.00) 99.97 (0.00) 49.34 (0.00) 1066.69 1
IU + Pruning 6.48±0.02 6.03±0.02 (38.84) 6.02±0.00 (93.95) 92.92±0.02 (43.58) 38.36 0.069
GA + Pruning 31.33±0.19 31.26±0.13 (9.56) 31.64±0.18 (68.33) 67.51±0.13 (18.17) 54.58 0.276
FT + Pruning 56.36±0.83 54.87±0.87 (14.05) 70.55±0.89 (29.42) 45.18±0.87 (4.16) 282.09 0.484
ConMU 48.97±1.78 54.95±4.49 (14.13) 61.25±2.96 (38.72) 45.30±4.60 (4.04) 27.83 0.443
ResNet-18 Class-Wise forgetting (CIFAR-100)
retrain 56.43 60.89 (0.00) 99.98 (0.00) 39.12 (0.00) 1323.98 1
IU + Pruning 6.52±0.00 8.89±1.11 (52.00) 6.05±0.01 (93.93) 94.01±1.11 (54.89) 39.14 0.041
GA + Pruning 20.18±0.36 7.78±0.66 (53.11) 19.53±0.40 (80.45) 93.37±0.66 (54.25) 22.57 0.047
FT + Pruning 60.07±0.87 76.67±5.58 (15.78) 73.87±0.88 (26.11) 24.42±5.58 (14.70) 364.23 0.408
ConMU 55.22±0.32 61.53±2.58 (28.27) 71.77±1.16 (0.023) 41.46±2.31 (2.34) 48.23 0.704
VGG Random data forgetting (SVHN)
retrain 93.53 93.47 (0.00) 99.62 (0.00) 6.58 (0.00) 840.33 1
IU + Pruning 90.18±0.00 92.30±0.00 (1.17) 91.96±0.00 (7.66) 8.07±0.00 (1.49) 49.30 0.726
GA + Pruning 92.26±0.04 94.34±0.03 (0.87) 94.38±0.03 (5.24) 94.09±0.03 (87.51) 26.40 0.000
FT + Pruning 94.19±0.14 95.71±0.10 (2.24) 99.65±0.00 (0.03) 4.34±0.10 (2.24) 280.15 0.696
ConMU 90.95±0.67 91.93±0.97 (1.54) 92.51±1.09 (7.11) 8.13±0.12 (1.55) 79.71 0.716
VGG Class-Wise forgetting (SVHN)
retrain 94.12 90.21 (0.00) 99.55 (0.00) 9.80 (0.00) 1048.65 1
IU + Pruning 90.08±0.01 92.83±0.01 (2.62) 91.92±0.15 (7.63) 7.21±0.15 (2.59) 35.01 0.624
GA + Pruning 81.38±0.26 31.64±0.76 (58.57) 86.45±0.25 (13.10) 68.42±0.76 (58.62) 27.99 0.057
FT + Pruning 95.02±0.15 94.87±0.57 (4.66) 99.99±0.00 (0.44) 5.11±0.57 (4.69) 320.77 0.3772
ConMU 92.31±0.61 88.43±2.89 (1.78) 94.73±1.23 (4.82) 10.59±2.91 (0.79) 85.05 0.864
ResNet-18 Random data forgetting (SVHN)
retrain 91.05 91.68 (0.00) 99.45 (0.00) 8.35 (0.0) 954.66 1
IU + Pruning 35.04±0.00 36.45±0.00 (55.23) 36.78±0.00 (62.67) 63.70±0.00 (55.35) 40.52 0.000
GA + Pruning 85.35±0.05 87.87±0.03 (3.81) 87.51±0.03 (11.94) 12.17±0.03 (3.82) 57.55 0.538
FT + Pruning 92.95±2.89 93.34±1.82 (1.66) 100.00±0.00 (0.55) 6.63±1.82 (1.72) 729.64 0.795
ConMU 92.98±0.43 93.05±0.38 (1.37) 93.84±0.34 (5.61) 7.02±0.42 (1.33) 74.36 0.796
ResNet-18 Class-Wise forgetting (SVHN)
retrain 91.40 83.73 (0.00) 98.71 (0.00) 16.21 (0.00) 1075.59 1
IU + Pruning 32.68±0.05 13.14±0.41 (70.59) 35.48±0.07 (63.23) 86.87±0.41 (70.66) 41.08 0.003
GA + Pruning 73.96±0.10 28.25±0.79 (55.48) 77.42±0.11 (21.29) 71.72±0.79 (55.51) 19.28 0.014
FT + Pruning 93.39±0.32 90.57±1.21 (6.84) 100±0.00 (1.29) 9.40±1.21 (6.81) 458.9 0.598
ConMU 92.15±0.52 88.44±2.69 (4.71) 94.34±1.07 (4.37) 11.64±2.72 (4.57) 51.37 0.681

Appendix C ADDITIONAL EXPERIMENTS
The additional experiments are reported in Table 4, which has the
same setup as Table 1.
