Lecture 1
Han Zhao
01/21/2025
Brief Bio
• Name: Han Zhao
• Current position: Assistant Professor @ CS
• Research Interests: Machine Learning
- Domain adaptation / generalization
- Multitask learning / Multi-objective Optimization
- Algorithmic fairness
- Probabilistic circuits (e.g., arithmetic circuits, sum-product networks)
Logistics
• Discussion forum:
• Piazza, UIUC, CS 442, signup code: s25-cs442
• Registration link: https://piazza.com/illinois/spring2025/cs442
Logistics
• Homework submission: Gradescope
• Link: https://www.gradescope.com/courses/957044
• Entry code: 4JN4GR
Logistics
My Office Hour:
• Tue 3:30pm - 4:30pm (right after the class)
• Email: [email protected]
• My office: 3320 Siebel Center
Logistics
Teaching Assistant: Weixin Chen
• Email: [email protected]
• Office Hour: F 4pm - 5pm
• Location: Lounge in front of 3102 Siebel Center
Course Topics
What is trustworthy ML and why should we care?
Accuracy on the training distribution is not enough!
Course Topics
What is trustworthy ML and why should we care?
Accuracy on the training distribution is not enough!
Aiming to build ML systems that are:
- Fair
- Generalizable
- Interpretable / Explainable
- Robust
- Privacy-preserving
Course Topics
Five parts:
- Basic Machine Learning
- Algorithmic Fairness
- Robustness
- Privacy
- Generalization under distribution shift
Course Topics
A brief introduction to supervised learning models
- Linear models:
- Classification: logistic regression
- Regression: linear / ridge regression
- Nonlinear models:
- Feed-forward neural networks
- Convolutional neural networks
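As a concrete sketch of the first linear model above, here is a minimal logistic regression trained by gradient descent (the toy data and hyperparameters are made up for illustration, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, steps=2000):
    """Fit logistic regression by full-batch gradient descent on the log-loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)            # predicted P(y = 1 | x)
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean log-loss in w
        b -= lr * np.mean(p - y)          # gradient in b
    return w, b

# Toy data: label is 1 when x1 + x2 > 1 (linearly separable)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [2., 1.], [0.2, 0.3]])
y = np.array([0., 0., 0., 1., 1., 0.])
w, b = train_logreg(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(float)
```

On separable toy data like this, gradient descent drives the training accuracy to 100%.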
Course Topics
Algorithmic Fairness
- Definitions of group and individual fairness
- Tensions between different fairness definitions
- Tradeoffs between fairness and accuracy
- Classification
- Regression
- Methods to achieve fairness in supervised learning
- Learning fair representations
Course Topics
Generalization
- Domain generalization
- distribution shift, domain adaptation / generalization
- distributional robust optimization
Course Topics
Robustness
- Adversarial robustness
- adversarial examples, empirical defense techniques
- certified robustness
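To make "adversarial examples" concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM) against a logistic model; the weights and the input are made-up numbers:

```python
import numpy as np

def fgsm(x, w, b, y, eps):
    """FGSM for a logistic model p(y=1|x) = sigmoid(w.x + b):
    move x by eps in the sign of the loss gradient w.r.t. x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w              # gradient of the log-loss w.r.t. x
    return x + eps * np.sign(grad_x)

w, b = np.array([1.0, -2.0]), 0.0
x, y = np.array([0.5, 0.1]), 1.0      # clean score w.x + b = 0.3 > 0: predicts 1
x_adv = fgsm(x, w, b, y, eps=0.3)
score_adv = x_adv @ w + b             # score flips sign: the prediction is now 0
```

A small, bounded perturbation (here, ±0.3 per coordinate) is enough to flip the model's prediction.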
Course Topics
Privacy-Preserving
- Differential Privacy
- Definition
- Laplacian mechanism / Gaussian mechanism
- Membership inference attacks
- Inferential Privacy
- Information obfuscation, information bottleneck, privacy funnel
- Attribute inference attacks
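The Laplacian mechanism in the list above admits a short sketch: add Laplace noise with scale sensitivity/ε to the true query answer. The counting query and the ε below are illustrative choices, not values from the course:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """epsilon-DP release of a numeric query with given L1 sensitivity:
    add Laplace noise with scale sensitivity / epsilon."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many records satisfy P?") has sensitivity 1:
# adding or removing one record changes the count by at most 1.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(1234, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Smaller ε means a larger noise scale, i.e. stronger privacy at the cost of accuracy.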
Course Overview
Mostly focus on the theory and algorithms of these topics.
Prerequisites:
Course Overview
Lecture-based course:
- 4 homework assignments (Homework 0 does not count towards the final grade)
- Section TMG: one final project (due May 7th)
- One final exam
Course Overview
(TMG) Project
- Must be finished individually
- Either a literature review or original research on a topic related to this course
- Three components:
- Proposal (due Feb. 15th, 20%): 2 pages, briefly describing the type and goal of this project.
- Oral/Poster Presentation (date TBD, 40%): in-person presentation
- Final report (due May 7th, 40%): ~8 pages
Format: pdf in NeurIPS LaTeX template (https://neurips.cc/Conferences/2021/PaperInformation/StyleFiles)
Note: The score for the course project will be normalized towards 30% of your final grade
Course Overview
Homework today:
- Sign up for Piazza, Gradescope and Canvas
- Take a look at the course syllabus on Canvas
- Homework 0
Questions?
Introduction
The success of large-scale supervised learning in
computer vision:
Introduction
The success of large-scale supervised learning in
natural language understanding:
Machine Translation, ~3M parallel sentences [Cho et al. 2014; Devlin et al. 2014]
Introduction
But is it enough?
Models can be accurate on average at the cost of a minority group
Introduction
A real-world example: recidivism prediction
COMPAS (Northpointe):
Recidivism risk assessment
tool used in a county in
Florida
Introduction
A real-world example: recidivism prediction
COMPAS (high level): a defendant's features
(prior arrests, prior sentences, age, drug history, race, education history, age at first arrest, vocation history, gender)
are fed to COMPAS, which outputs a recidivism risk score
- Black defendants more likely than white to be incorrectly labeled “high risk”
- White defendants more likely than black to be incorrectly labeled “low risk”
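The two error-rate disparities above can be measured directly; a minimal sketch with made-up labels and predictions (the numbers are illustrative only, not COMPAS data):

```python
import numpy as np

def false_positive_rate(y_true, y_pred, group, a):
    """P(prediction = 1 | true label = 0, A = a): the rate at which
    non-recidivating members of group a are labeled 'high risk'."""
    mask = (group == a) & (y_true == 0)
    return float(np.mean(y_pred[mask]))

# Hypothetical data for two groups (illustrative numbers only)
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
y_pred = np.array([1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1])
group  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

fpr_0 = false_positive_rate(y_true, y_pred, group, 0)   # 2/4 = 0.5
fpr_1 = false_positive_rate(y_true, y_pred, group, 1)   # 1/4 = 0.25
```

A gap between the two rates is exactly the kind of disparity described in the bullets above.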
Introduction
Northpointe's defense:
Defendants labeled as “high risk” equally likely to recidivate,
regardless of race
∀a ∈ {0, 1}, ∀c ∈ (0, 1): Pr(Y = 1 | C(x) = c, A = a) = c
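This condition can be checked empirically: within each group, among defendants who receive score c, a fraction c should recidivate. A sketch on made-up data:

```python
import numpy as np

def empirical_calibration(scores, y, group, a, c):
    """Empirical estimate of Pr(Y = 1 | C(x) = c, A = a)."""
    mask = (group == a) & (scores == c)
    return float(np.mean(y[mask]))

# Made-up data where the score 0.8 is calibrated for both groups:
# 4 out of 5 defendants scored 0.8 recidivate in each group.
scores = np.array([0.8] * 10)
y      = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

p_0 = empirical_calibration(scores, y, group, 0, 0.8)   # 0.8
p_1 = empirical_calibration(scores, y, group, 1, 0.8)   # 0.8
```

Here both groups satisfy the condition at c = 0.8, even though (as the previous slide shows) error rates can still differ across groups.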
Introduction
Lesson learned:
Depending on the problem, choose the appropriate criterion
But, there are just too many definitions…
Introduction
Key assumption underlying the success: large-scale
labeled data from stationary domains
[Figure: sample images from the source and target domains]
Introduction
But such an assumption often does not hold in practice
Introduction
But such an assumption often does not hold in practice
Semi-supervised learning:
Training distribution = Test distribution
Domain adaptation:
Training distribution ≠ Test distribution
Introduction
Domain adaptation: Training phase
Source domain: [figure: digit images with labels (4, 8, 5), …]
+
Target domain: [figure: digit images with labels (7, 3, 0), …]
Introduction
Domain adaptation: Training phase
Source domain: [figure: digit images (4, 8, 5)]
+
Target domain: [figure: digit images (7, 3, 0)]
Classifier
Introduction
Domain adaptation: Testing phase
Target domain: [figure: digit images with labels (4, 0, 1), (0, 4, 2), …]
The Netflix Prize Competition
Netflix's goal: better movie recommendation
The Netflix Prize Competition
Movie recommendation: collaborative filtering
Goal: given (sparse) existing ratings from the seed users, how to
complete this user-movie rating matrix?
The Netflix Prize Competition
The Netflix Prize:
- An open competition for the best collaborative filtering algorithm to
predict user ratings for movies
- Data: ~100M ratings, ~480K users, ~18K movies
- In the contest data, no other information about the users or movies is available
- Grand prize $1M won by BellKor's Pragmatic Chaos team: ensemble model (matrix factorization) using gradient boosted decision trees (GBDT), beating Netflix's own algorithm by more than 10%
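The matrix-factorization component mentioned above can be sketched in a few lines: learn user and item factors U, V so that each observed rating is approximated by U[u]·V[i]. The toy data and hyperparameters below are illustrative; the winning system was far more elaborate:

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01,
              epochs=2000, seed=0):
    """Fit R[u, i] ~= U[u] . V[i] on observed (user, item, rating)
    triples by stochastic gradient descent with L2 regularization."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]
            u_old = U[u].copy()                    # use pre-update U[u] for V's step
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

# Tiny toy rating matrix; entry (user 2, movie 0) is unobserved.
obs = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 1.0), (1, 1, 1.0), (2, 1, 4.5)]
U, V = factorize(obs, n_users=3, n_items=2)
missing = U[2] @ V[0]        # completed entry for (user 2, movie 0)
```

Once U and V fit the observed entries, the inner product U[u]·V[i] fills in any missing cell of the user-movie matrix.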
The Netflix Prize Competition
Unfortunately, this naive form of anonymization (replacing user names with random IDs) was insufficient
This discovery led to a class action lawsuit against Netflix, and the cancellation
of a sequel competition
“Robust De-anonymization of Large Sparse Datasets”, Narayanan and Shmatikov, IEEE S&P '08
Memorization in Neural Networks
What if we instead just release some function or model of the
dataset?
- This only gives a restricted view of the dataset
- Perhaps this partial release prevents it from revealing private
information about the data used to train the model?
Model Interpretability
Problem: machine learning models (esp. neural networks) can be
a black-box
Model Interpretability
Example: credit lending with a black-box ML model
Model Interpretability
Black-box AI creates confusion and doubt, hence reducing its
trustworthiness
Examples:
- Attribute an object recognition network’s prediction to its pixels
- Attribute a text sentiment network’s prediction to individual words
- Attribute a credit scoring model’s prediction to its features
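For a linear scoring model, such attributions have a particularly clean form: gradient-times-input credits feature j with exactly w_j · x_j, and the attributions sum to the score. A sketch with made-up credit features and weights:

```python
import numpy as np

def gradient_x_input(w, x):
    """Gradient-times-input attribution for a linear score w . x:
    feature j is credited with exactly w[j] * x[j]."""
    return w * x

# Hypothetical features: [income, outstanding_debt, years_of_history]
w = np.array([0.8, -1.5, 0.3])        # made-up model weights
x = np.array([2.0, 1.0, 4.0])         # made-up applicant
attr = gradient_x_input(w, x)          # [1.6, -1.5, 1.2]
score = float(w @ x)                   # attributions sum to the score
```

For nonlinear networks, the same "attribute the prediction to inputs" goal motivates methods like integrated gradients, which reduce to this rule in the linear case.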
Summary
- Course overview, syllabus, covered topics
- Real-world examples on fairness, robustness, privacy
and explanation