Pilot Study on
Semi-Automated Patch Diffing by
Applying Machine-Learning Techniques
Asuka Nakajima (@AsuNa_jp)
NTT Secure Platform Laboratories
ROOTCON 13
2
#whoami
 Asuka Nakajima (@AsuNa_jp)
Security Researcher at NTT Secure Platform Laboratories
Vulnerability Discovery / Reverse Engineering / IoT Security
 Founder of “CTF for GIRLS”
First female infosec community in Japan (est.2014)
 Black Hat Asia Review Board
From 2018-Present
 Veteran Conference/Event Speaker
BlackHatUSA 2019, AsiaCCS 2019, AIS3 2018/2016, PHDays IV, SECCON,etc
3
Agenda
Background
PART 1: Extracting Security Fix Patterns Using Unsupervised Machine Learning Algorithm*
PART 2: Classifying Security Fixes and Other Fixes*
Conclusion
*Original Paper
Asuka Nakajima, Ren Kimura, Yuhei Kawakoya, Makoto Iwamura, Takeo Hariu, “An Investigation of Method to Assist Identification of Patched Part of the Vulnerable Software Based on Patch Diffing”, Multimedia, Distributed, Cooperative, and Mobile Symposium, June 2017, Japan
4
What is Patch Diffing?
Compare the program Before Patched and After Patched
Identify the vulnerable part & create a 1-day exploit
5
Example: CVE-2006-4691 (MS06-070)
Stack-Based Buffer Overflow in the NetpManageIPCConnect Function (Windows)
Before Patched (netapi32.dll), Pseudo Code
if ( !v5 )
{
    _wcscpy(&Dest, L"");
    v6 = (wchar_t *)&v24;
}
After Patched (netapi32.dll), Pseudo Code
if ( _wcslen(Str) > 0x101 ){    // Security Check
    NetpLogPrintHelper("NetpManageIPCConnect: server name %ws too long - error out\n", (char)Str);
    return 87;
}
if ( *Str != 92 ){
    _wcscpy(&Dest, L"");
    v4 = (wchar_t *)&v24;
}
6
Tools for Patch Diffing
 Bindiff (Zynamics, acquired by Google)
   https://www.zynamics.com/bindiff.html
 Turbodiff (Core SECURITY)
   https://www.coresecurity.com/corelabs-research/open-source-tools/turbodiff
 Diaphora (Joxean Koret)
   http://diaphora.re/
However, patch diffing is still a difficult task
because it requires deep knowledge and experience
-> Semi-automated patch diffing
7
Previous Work
 DarunGrim (Jeongwook Oh)
   Shows the candidate functions to which security fixes might have been applied
 Approach
   Uses heuristic pattern-matching rules to identify the candidate functions
   These patterns are manually defined by the developer
Pattern            Type             Score
cmp                Opcode           +1
test               Opcode           +1
0xFFFFFFF          Immediate Value  +3
wcslen             Function Name    +2
strlen             Function Name    +2
StringCchCopyW     Function Name    +2
ULongLongToUlong   Function Name    +2
Machine learning techniques could be applied
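The scoring idea behind such hand-written rules can be sketched as follows. This is only an illustration, not DarunGrim's actual implementation: the `score_function` helper and the token-list input are assumptions made for this example, and only the weights come from the table above.

```python
# Heuristic weights copied from the table above; everything else is illustrative.
PATTERN_SCORES = {
    "cmp": 1,
    "test": 1,
    "0xFFFFFFF": 3,
    "wcslen": 2,
    "strlen": 2,
    "StringCchCopyW": 2,
    "ULongLongToUlong": 2,
}


def score_function(added_tokens):
    """Sum the weights of every known pattern found among the added tokens."""
    return sum(PATTERN_SCORES.get(token, 0) for token in added_tokens)


# Hypothetical tokens (opcodes, immediates, callee names) added by two patches.
patched = ["call", "wcslen", "cmp", "0xFFFFFFF", "jbe"]
refactored = ["mov", "push", "call"]

print(score_function(patched))     # 2 + 1 + 3 = 6
print(score_function(refactored))  # 0
```

A function with a higher score would be shown earlier in the candidate list. The drawback noted on the slide is that every weight has to be chosen by hand.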
Extracting Security Fix Patterns Using
Unsupervised Machine Learning Algorithm
PART 1
9
Hypothesis
@@ -672,10 +675,6 @@ static int do_ssl3_write(SSL *s,
+ if (wb->buf == NULL)
+ if (!ssl3_setup_write_buffer(s))
+ return -1;
+
if (len == 0 && !create_empty_fragment)
return 0;
CVE-2014-0198
@@ -92,8 +92,6 @@ X509_REQ *X509_to_X509_REQ(X509 *x,
pktmp = X509_get_pubkey(x);
+ if (pktmp == NULL)
+ goto err;
i = X509_REQ_set_pubkey(ret, pktmp);
EVP_PKEY_free(pktmp);
CVE-2015-0288
Similar Types of Vulnerabilities will be Fixed in a Similar Manner
Example: Null Pointer Dereference
 Cause: occurs when a program attempts to read or write to memory with a NULL pointer
 Fix: check whether the pointer is NULL or not
-> Extract Fix Patterns Using Unsupervised Machine Learning Algorithm (Cluster Analysis)
10
Challenges
Challenge 1: Optimization
1. Basic Block Reordering
2. Instruction Reordering
3. Operand Changes
4. Inline Expansion / Loop Unrolling
Challenge 2: Other Fixes May Have Been Applied
11
Challenge 1: Basic Block Reordering
CVE-ID: CVE-2015-1788
Program1 (Before Patched)
call 0x81031d0 <BN_copy>
test eax,eax
setne bl
jmp 0x820337a <BN_GF2m_mod_inv+970>
mov eax,DWORD PTR [esp+0x38]
mov DWORD PTR [esp+0x4],edi
mov DWORD PTR [esp],eax
call 0x8102f90 <bn_expand2>
jmp 0x8203149 <BN_GF2m_mod_inv+409>
lea eax,[esp+0x58]
mov edx,eax
jmp 0x820339a <BN_GF2m_mod_inv+1002>
shl ecx,0x5
mov DWORD PTR [esp+0x20],ecx
jmp 0x8203307 <BN_GF2m_mod_inv+855>
mov eax,DWORD PTR [esp+0x30]
mov DWORD PTR [esp+0x4],eax
mov eax,DWORD PTR [esp+0x3c]
mov DWORD PTR [esp],eax
call 0x8102f90 <bn_expand2>
jmp 0x82031dd <BN_GF2m_mod_inv+557>
mov eax,DWORD PTR [esp+0x30]
mov DWORD PTR [esp+0x4],eax
mov eax,DWORD PTR [esp+0x34]
mov DWORD PTR [esp],eax
call 0x8102f90 <bn_expand2>
jmp 0x8203190 <BN_GF2m_mod_inv+480>
Program2 (After Patched)
call 0x81031d0 <BN_copy>
test eax,eax
setne bl
jmp 0x820338a <BN_GF2m_mod_inv+986>
mov eax,DWORD PTR [esp+0x30]
mov DWORD PTR [esp+0x4],eax
mov eax,DWORD PTR [esp+0x3c]
mov DWORD PTR [esp],eax
call 0x8102f90 <bn_expand2>
jmp 0x82031dd <BN_GF2m_mod_inv+557>
mov eax,DWORD PTR [esp+0x30]
mov DWORD PTR [esp+0x4],eax
mov eax,DWORD PTR [esp+0x34]
mov DWORD PTR [esp],eax
call 0x8102f90 <bn_expand2>
jmp 0x8203190 <BN_GF2m_mod_inv+480>
mov eax,DWORD PTR [esp+0x38]
mov DWORD PTR [esp+0x4],edi
mov DWORD PTR [esp],eax
call 0x8102f90 <bn_expand2>
jmp 0x8203149 <BN_GF2m_mod_inv+409>
lea eax,[esp+0x58]
mov edx,eax
jmp 0x82033aa <BN_GF2m_mod_inv+1018>
shl ecx,0x5
mov DWORD PTR [esp+0x20],ecx
jmp 0x8203317 <BN_GF2m_mod_inv+871>
12
Challenge 1: Instruction Reordering
CVE-ID: CVE-2015-1789
Program1
mov [esp+44h+var_4], eax
push esi
mov ebp, [esp+4Ch+arg_4]
push esi
mov esi, [esp+4Ch+arg_0]
mov ecx, [esi]
mov eax, [esi + 8]
push edi
mov edi, [esi+4]
cmp edi, 17h
Program2
mov [esp+44h+var_4], eax
push esi
mov ebp, [esp+4Ch+arg_4]
push esi
mov esi, [esp+4Ch+arg_0]
push edi
mov edi, [esi+4]
mov ecx, [esi]
mov eax, [esi + 8]
cmp edi, 17h
IDA Pro
13
Challenge 1: Operand Changes
CVE-ID: CVE-2008-5023
Program1
xor ebx, ebx
add rsp, 38h
mov eax, ebx
pop rbx
pop rbp
pop r12
pop r13
retn
Program2
xor r12d, r12d
add rsp, 38h
mov eax, r12d
pop rbx
pop rbp
pop r12
pop r13
retn
Register is different (ebx -> r12d)
14
Challenge 1: Inline Expansion / Loop Unrolling
Inline Expansion
Source code
void my_print(int n){
    printf("%d", n);
}
int main(){
    int n = 1;
    my_print(n);
    return 0;
}
Program1 (Before)
<main>:
push ebp
mov ebp,esp
sub esp,0x4
mov DWORD PTR [ebp-0x4],0x1
push DWORD PTR [ebp-0x4]
call 804840b <my_print>
add esp,0x4
mov eax,0x0
leave
ret
Program2 (After)
<main>:
push 0x1
push 0x80484e0
push 0x1
call 8048310 <__printf_chk@plt>
add esp,0xc
xor eax,eax
ret
Loop Unrolling
Source code
int main(){
    int i;
    for(i = 0; i < 3; i++){
        printf("HelloWorld!");
    }
    return 0;
}
Program1 (Before)
<main>:
push ebp
mov ebp,esp
sub esp,0x4
mov DWORD PTR [ebp-0x4],0x0
jmp 804842b <main+0x20>
push 0x80484c0
call 80482e0 <printf@plt>
add esp,0x4
add DWORD PTR [ebp-0x4],0x1
cmp DWORD PTR [ebp-0x4],0x2
jle 804841a <main+0xf>
Program2 (After)
<main>:
push 0x80484d0
push 0x1
call 8048310 <__printf_chk@plt>
push 0x80484d0
push 0x1
call 8048310 <__printf_chk@plt>
push 0x80484d0
push 0x1
call 8048310 <__printf_chk@plt>
15
Challenge 2
 Other fixes may have been applied
1. Bug Fixes
2. Refactoring
3. Feature Updates
16
Experiment Overview
Dataset
 Target Software: OpenSSL 1.0.1
Collected 62 Security Fixes
 Cluster Analysis
 Hierarchical Clustering Algorithm
Extract security fix patterns which could be used
to support the semi-automated patch diffing
GOAL
17
Data Collection Method [1/3]
 OpenSSL 1.0.1 (git / 4675a56, openssl 1.0.1 stable)
STEP 1: Analyze Commit Log & Release Notes*
CVE-ID          Type             Hash value
CVE-2012-2110   Before Patched   d36e0ee460f41d6b64015455c4f5414a319865c3
                After Patched    8d5505d099973a06781b7e0e5b65861859a7d994
CVE-2016-6304   Before Patched   151adf2e5cc23284a059e0f155505006a1c9fad9
                After Patched    2c0d295e26306e15a92eb23a84a1802005c1c137
STEP 2: Diff the source code & identify the patched part (Function)
@@ -260,7 +265,11 @@ int BN_dec2bn(BIGNUM **bn, const char *a)
a++;
}
- for (i = 0; isdigit((unsigned char)a[i]); i++) ;
+ for (i = 0; i <= (INT_MAX/4) && isdigit((unsigned char)a[i]); i++)
+ continue;
+
+ if (i > INT_MAX/4)
STEP 3: Compile/Disassemble before & after patched source code
*OpenSSL 1.0.1 Series Release Notes
https://web.archive.org/web/20170208161711/https://www.openssl.org/news/openssl-1.0.1-notes.html
18
Data Collection Method [2/3]
 Feature Extraction Method
STEP 1: Extract the increased instructions
Before Patched: push, mov, sub, jmp, …
After Patched: push, mov, sub, + mov, + mov, jmp, …
STEP 2: Normalize the instructions
Before Normalization: mov, mov, cmp, jne, test, je, push, jmp, lea, pop, …
After Normalization: trans, trans, cmp, jump, cmp, jump, stack, jump, lea, stack, …
STEP 3: Count the number of occurrences of each normalized instruction
# of occurrences (each instruction): trans: 2, cmp: 2, jump: 3, stack: 2, lea: 1, …
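STEP 1 can be sketched with a sequence diff over the opcode lists. The slides do not say which diffing algorithm was used, so Python's `difflib.SequenceMatcher` is a stand-in here, and `added_instructions` is a hypothetical helper name.

```python
import difflib


def added_instructions(before, after):
    """Return the opcodes that appear only in the patched version."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute new material on the patched side.
        if tag in ("insert", "replace"):
            added.extend(after[j1:j2])
    return added


# Opcode sequences from the STEP 1 example on the slide.
before = ["push", "mov", "sub", "jmp"]
after = ["push", "mov", "sub", "mov", "mov", "jmp"]
print(added_instructions(before, after))  # ['mov', 'mov']
```

A plain sequence diff like this is sensitive to the compiler effects listed under Challenge 1 (for example, reordered basic blocks), so real binaries would first need function and block matching from a patch-diffing tool.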
19
Data Collection Method [3/3]
 Opcode (Instruction) Normalization
   Summarized and expressed the instructions that fall into similar categories by one (normalized) instruction
   e.g.) Branch instructions such as jns, jle, jne, jge are normalized as “jump”
   Normalized the instructions which appeared in the security fixes (Function)
Normalized Instruction   Type of Instruction         Target Instructions
jump                     Branch                      jns, jle, jne, jge, jae, jmp, js, jl, je, jg, ja, jb, jbe
trans                    Data Transfer               movzx, mov, movsx, xchg, cdq
ctrans                   Conditional Data Transfer   cmovge, cmovae, cmovs, cmovns, cmove, cmovne
stack                    Stack Manipulation          push, pop
logical                  Logical Operation           and, xor, or, not
arith                    Arithmetic Operation        sub, add, imul, neg, adc
nop                      No Operation                nop
bop                      Bit/Byte Operation          bt, setne, sete
shift                    Shift Operation             shr, shl, sar
func                     Function Operation          call, ret
str                      String Operation            repz *
cmp                      Comparison                  test, cmp
lea                      Address Computation         lea
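STEP 2 and STEP 3 (normalize, then count) can be sketched as below. The category map is transcribed from the normalization table, with the movzx spelling assumed. The `feature_vector` helper and the choice to pass unknown opcodes through unchanged are assumptions of this example.

```python
from collections import Counter

# Category -> opcodes, transcribed from the normalization table.
# "repz *" on the slide covers repz-prefixed instructions; only the bare prefix is mapped here.
CATEGORIES = {
    "jump": "jns jle jne jge jae jmp js jl je jg ja jb jbe",
    "trans": "movzx mov movsx xchg cdq",
    "ctrans": "cmovge cmovae cmovs cmovns cmove cmovne",
    "stack": "push pop",
    "logical": "and xor or not",
    "arith": "sub add imul neg adc",
    "nop": "nop",
    "bop": "bt setne sete",
    "shift": "shr shl sar",
    "func": "call ret",
    "str": "repz",
    "cmp": "test cmp",
    "lea": "lea",
}
NORMALIZE = {op: cat for cat, ops in CATEGORIES.items() for op in ops.split()}


def feature_vector(opcodes):
    """Normalize each opcode and count occurrences per category."""
    # Opcodes missing from the table are kept unchanged (an assumption of this sketch).
    return Counter(NORMALIZE.get(op, op) for op in opcodes)


# The "Before Normalization" column from the feature extraction slide.
added = ["mov", "mov", "cmp", "jne", "test", "je", "push", "jmp", "lea", "pop"]
print(dict(feature_vector(added)))
# {'trans': 2, 'cmp': 2, 'jump': 3, 'stack': 2, 'lea': 1}
```

Because only counts are kept, two instruction sequences that differ in order or in register operands map to the same vector.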
20
Cluster Analysis [1/2]
Divides data into groups that are meaningful/useful
[Figure: data points before clustering, and after clustering into Cluster 1, Cluster 2, and Cluster 3]
21
Cluster Analysis [2/2]
 Hierarchical Clustering Algorithm
 Produce a classification in which small clusters of very similar data points
are nested within larger clusters of less closely-related data points*
 e.g.) Agglomerative Hierarchical Clustering
 Non-Hierarchical Clustering Algorithm
 Generates a classification by partitioning dataset*
 e.g.) K-means Clustering
*Hierarchical and non-Hierarchical Clustering
https://www.daylight.com/meetings/mug96/barnard/E-MUG95.html
22
Cluster Analysis [2/2] (Hierarchical Clustering Algorithm)
 Distance and linkage: Euclidean Distance & Ward’s Method
   $d(A, B) = E(A \cup B) - E(A) - E(B)$
   where $E(X)$ is the sum of squared Euclidean distances from each data point in cluster $X$ to the centroid of $X$
[Figure: Cluster A and Cluster B being merged]
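Ward's method merges the pair of clusters whose union increases the within-cluster sum of squared errors the least. A minimal pure-Python sketch of that merge cost follows. The helper names and the toy 2-D vectors are hypothetical, and a real experiment would more likely rely on a library implementation.

```python
def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def sse(cluster):
    """E(X): sum of squared Euclidean distances to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(point, c)) for point in cluster)


def ward_distance(a, b):
    """Merge cost d(A, B) = E(A u B) - E(A) - E(B)."""
    return sse(a + b) - sse(a) - sse(b)


# Toy 2-D feature vectors (hypothetical), e.g. (jump count, trans count).
cluster_a = [[9, 7], [7, 5]]
cluster_b = [[3, 4], [2, 4]]
cluster_c = [[7, 2]]

print(ward_distance(cluster_a, cluster_b))
print(ward_distance(cluster_a, cluster_c))
```

In this toy data the cost of merging A with C is lower than merging A with B, so an agglomerative pass would merge A and C first.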
23
CWE [1/2]
 CWE (Common Weakness Enumeration)
   List of Software Weakness Types
   Gives a unique identifier (CWE-ID) to each type
   e.g., CWE-120: Buffer Copy without Checking Size of Input (Classic Buffer Overflow)
   Latest Version: 3.4 / Total 808 Weaknesses
https://cwe.mitre.org/data/definitions/120.html
 Used as a label for each data point
24
CWE [2/2]
CWE organizes a wide variety of weakness types in a hierarchical structure
 The weakness types at higher levels in the structure give a more abstract and broader concept*
   e.g., [Parent] CWE-119 -> [Child] CWE-120
 Structure Types
   Development Concepts
   Research Concepts
   Architectural Concepts
 Used these root CWE-IDs as labels
CWE-ID    Description
CWE-17    Code
CWE-19    Data Processing Error
CWE-254   Security Features
CWE-361   Time and State
CWE-398   Indicator of Poor Code Quality
CWE-399   Resource Management Errors
*CWE Overview, IPA
https://www.ipa.go.jp/security/english/vuln/CWE_en.html
26
Result
[Figure: Dendrogram of the clustering result. Each leaf is labeled with CVE-ID + Label (CWE-ID); two groups are marked as ① Cluster 1 and ② Cluster 2]
27
Details of the Cluster 1
CWE-ID    CVE-ID          Feature Vectors (Normalized Instruction: # of Occurrences)
CWE-19    CVE-2016-6306   jump: 9, trans: 7, cmp: 6, lea: 4, arith: 3, func: 1
CWE-19    CVE-2016-0797   jump: 7, lea: 4, cmp: 4, trans: 2, arith: 2
CWE-19    CVE-2015-0206   jump: 7, trans: 5, func: 4, stack: 4, cmp: 4, arith: 2, lea: 1
CWE-19    CVE-2014-3508   jump: 9, trans: 5, cmp: 5, stack: 4, nop: 3, lea: 2, arith: 2, bop: 1, func: 1
CWE-398   CVE-2014-5139   jump: 7, stack: 4, cmp: 3, logical: 2, trans: 2, nop: 2, func: 1
Summary
Most of the labels are CWE-19 (Data Processing Error)
 Most of the vulnerabilities in this cluster are related to memory or value manipulation errors that were not initially expected by the developers
   e.g.) Out-of-bounds read, info-leak, integer overflow
 A certain number of Comparison/Branch/Arithmetic Operation instructions exist
28
CVE-2016-0797
Patched Part of CVE-2016-0797
@@ -190,11 +189,7 @@ int BN_hex2bn(BIGNUM **bn,
}
- for (i = 0; isxdigit((unsigned char)a[i]); i++) ;
+ for (i = 0; i <= (INT_MAX/4) && isxdigit((unsigned char)a[i]); i++)
+ continue;
+
+ if (i > INT_MAX/4)
+ goto err;
Integer Overflow Vulnerability: i will be used in bn_expand(ret, i*4)
Fix: Added a check to confirm the integer value is under the expected upper limit
29
Details of the Cluster 2
CWE-ID    CVE-ID          Feature Vectors (Normalized Instruction: # of Occurrences)
CWE-254   CVE-2015-1793   trans: 4, jump: 3, ctrans: 1, nop: 1, func: 1
CWE-254   CVE-2014-3567   trans: 4, jump: 2, func: 1
CWE-254   CVE-2014-3470   trans: 4, jump: 2, nop: 2, lea: 1, func: 1, cmp: 1
CWE-254   CVE-2015-0205   trans: 5, func: 1
CWE-254   CVE-2014-0224   trans: 5, jump: 3, logical: 2, cmp: 1
CWE-19    CVE-2014-0195   trans: 6, jump: 3, cmp: 1
Summary
Most of the labels are CWE-254 (Security Features)
 Most of the security fixes for the vulnerabilities in this cluster contain some sort of error handling function
 A certain number of Data Transfer/Branch/Function related instructions exist
30
CVE-2014-3470
Patched Part of CVE-2014-3470
@@ -2512,13 +2512,6 @@ int ssl3_send_client_key_exchange
int field_size = 0;
+ if (s->session->sess_cert == NULL)
+ {
+ ssl3_send_alert(s,SSL3_AL_FATAL,SSL_AD_UNEXPECTED_MESSAGE);
+ SSLerr(SSL_F_SSL3_SEND_CLIENT_KEY_EXCHANGE,SSL_R_UNEXPECTED_MESSAGE);
+ goto err;
+ }
NULL Pointer Dereference
Fix: Added two error-handling function calls and an exit from the function
31
Discussions
 Why Only Two Clusters?
   Some vulnerabilities are found in multiple functions
   Similar functions contain the same vulnerability
 How to Improve
   Include other features such as function name? (currently, we count the number of occurrences of normalized instructions)
   Collect more security fixes
     Use vulnerability corpus generation tools? (e.g., LAVA*)
   Use other machine learning techniques
 For Semi-Automated Patch-Diffing
   Calculate the similarity between the extracted security fix patterns (instructions; e.g., the cluster centroid) and the difference (increased instructions) found by the patch diffing
*LAVA: Large-scale Automated Vulnerability Addition
https://www.andreamambretti.com/files/papers/oakland2016_lava.pdf
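One possible way to realize the similarity idea is to compare the count vector of the increased instructions with the centroid of each cluster. The slides leave the similarity measure open, so cosine similarity is an assumption made for this sketch, and the `unknown` vector is hypothetical. The cluster vectors are copied from the cluster detail tables.

```python
import math
from collections import Counter


def centroid(vectors):
    """Average the per-category counts of several feature vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: total[k] / len(vectors) for k in total}


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Two feature vectors per cluster, taken from the cluster detail tables.
cluster1 = [
    {"jump": 9, "trans": 7, "cmp": 6, "lea": 4, "arith": 3, "func": 1},
    {"jump": 7, "lea": 4, "cmp": 4, "trans": 2, "arith": 2},
]
cluster2 = [
    {"trans": 4, "jump": 2, "func": 1},
    {"trans": 5, "func": 1},
]

# Hypothetical increased instructions found by diffing an unknown patch.
unknown = {"trans": 4, "jump": 3, "func": 1}

scores = {
    "Cluster 1": cosine(unknown, centroid(cluster1)),
    "Cluster 2": cosine(unknown, centroid(cluster2)),
}
print(max(scores, key=scores.get))
```

A changed function whose vector is close to one of the centroids would then be ranked higher as a candidate security fix.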
Classifying Security Fixes and Other Fixes
PART 2
33
Classifying Security Fixes and Other Fixes
 Dataset
 OpenSSL 1.0.1 (62 Security Fixes / 377 Other fixes)
 Classification Method
 Supervised Classifier
 Soft Margin Support Vector Machine (SVM)
 Kernel: RBF (C=10, γ = 0.001)
 Experiment
 Used 62 Security Fixes and 62 Other fixes (Random sampling)
 Conducted 10-fold Cross-Validation 3 times
• Perform random sampling for each cross-validation
 Environment
 OS: Ubuntu 14.04, Compiler: gcc 5.4.0
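The sampling and cross-validation protocol can be sketched with the standard library alone. The placeholder IDs stand in for the real feature vectors, the helper names are hypothetical, and the classifier step (an RBF-kernel SVM with C=10 and γ=0.001 according to the slide) is deliberately left out.

```python
import random


def balanced_sample(security, other, rng):
    """Pair every security fix with an equally sized random sample of other fixes."""
    sampled = rng.sample(other, len(security))
    data = [(x, 1) for x in security] + [(x, 0) for x in sampled]
    rng.shuffle(data)
    return data


def k_folds(data, k=10):
    """Yield (train, test) splits; fold i takes every k-th item starting at i."""
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        yield train, test


# Placeholder IDs (hypothetical) with the dataset sizes from the slide.
security_fixes = [f"sec-{i}" for i in range(62)]
other_fixes = [f"other-{i}" for i in range(377)]

for run in range(3):  # three repetitions, each with a fresh random sample
    rng = random.Random(run)
    dataset = balanced_sample(security_fixes, other_fixes, rng)
    folds = list(k_folds(dataset))
    print(run, len(dataset), len(folds), len(folds[0][0]), len(folds[0][1]))
```

The folds here are not stratified by label. Whether the original experiment stratified them is not stated on the slide.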
34
Support Vector Machine (SVM)
Method used for classification (+regression) tasks
[Figure: data points of class A and class B, Before and After separation by the SVM]
35
Result
Metric      Type of Fix    Dataset 1   Dataset 2   Dataset 3   Average
Accuracy    (all)          0.62        0.54        0.54        0.56
Precision   Security Fix   0.70        0.57        0.57        0.61
            Other          0.59        0.54        0.54        0.55
Recall      Security Fix   0.42        0.49        0.42        0.41
            Other          0.82        0.71        0.68        0.73
F-Score     Security Fix   0.53        0.46        0.44        0.47
            Other          0.68        0.61        0.63        0.64
36
Summary of Result
 Summary
   Overall Accuracy: 56% (average)
   Security Fixes: Precision 61% / Recall 41% (average)
   Other Fixes: Precision 55% / Recall 73% (average)
 Discussions
   Use other metrics? (e.g., Cyclomatic complexity)
Glossary
Accuracy: Ratio of the number of correctly labeled fixes to the number of all fixes in the dataset
Precision: Ratio of the number of correctly labeled security fixes to the number of all fixes labeled as “security fix” by the program
Recall: Ratio of the number of correctly labeled security fixes to the number of all security fixes in the dataset
F-Score: Harmonic mean of the Precision and Recall
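The glossary definitions translate directly into code. The sketch below computes the four metrics for the "security fix" class. The label lists are hypothetical and are not data from the experiment.

```python
def metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_score


# Hypothetical labels: 1 = security fix, 0 = other fix.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
print(metrics(y_true, y_pred))
```

Passing `positive=0` gives the corresponding values for the "Other" rows of the result table.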
37
Summary & Conclusion
✔ Patch diffing is still a difficult task because it requires deep knowledge and experience
✔ Extracted security fix patterns which could be used to support semi-automated patch diffing
✔ Conducted an experiment to see if it is possible to distinguish between security fixes and other fixes
Provided insights for future research related to semi-automated patch diffing
38
Appendix [1/3]
 Original Paper
Asuka Nakajima, Ren Kimura, Yuhei Kawakoya, Makoto Iwamura, Takeo Hariu, “An
Investigation of Method to Assist Identification of Patched Part of the Vulnerable
Software Based on Patch Diffing” Multimedia, Distributed, Cooperative, and Mobile
Symposium, June 2017, Japan
Download URL
https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_uri&item_id=190132&file_id=1&file_no=1
39
Appendix [2/3]
 Other Research (1)
 Asuka Nakajima, Takuya Watanabe, Eitaro Shioji, Mitsuaki Akiyama, and
Maverick Woo “A Pilot Study on Consumer IoT Device Vulnerability
Disclosure and Patch Release in Japan and the United States”
Proceedings of the 14th ACM ASIA Conference on Information, Computer and
Communications Security (ASIA CCS 2019)
 [PDF] https://www.cylab.cmu.edu/_files/pdfs/tech_reports/CMUCyLab19001.pdf
Revealed Significant 1-Day Risk Related to IoT
ASIA CCS 2019
Example: CVE-2017-7852 (Vendor: D-Link, Product: Network Camera)
Patch Release Timeline
DCS-932L RevA: 2015/Nov/18
DCS-932L RevA: 2016/Jul/19 (244 Days later)
1-Day Risk: Unsynchronized Patch Release (Geographical Arbitrage)
40
Appendix [3/3]
 Other Activities (Female InfoSec Community)
 CTF for GIRLS: http://girls.seccon.jp (Twitter: @ctf4g)
 Asuka Nakajima, Suhee Kang, Hazel Yen, “Women in Security:
Building a Female InfoSec Community in Korea, Japan, and Taiwan”,
BlackHatUSA 2019
Women-Only CTF Workshop Talk about Asian Female InfoSec Community

More Related Content

What's hot (20)

PPTX
Cisco IOS shellcode: All-in-one
DefconRussia
 
PDF
深入淺出C語言
Simen Li
 
PPTX
Дмитрий Демчук. Кроссплатформенный краш-репорт
Sergey Platonov
 
PDF
Csw2016 gong pwn_a_nexus_device_with_a_single_vulnerability
CanSecWest
 
PPTX
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
Positive Hack Days
 
PDF
Коварный code type ITGM #9
Andrey Zakharevich
 
PDF
Metaprogramming and Reflection in Common Lisp
Damien Cassou
 
PPTX
Process management
Utkarsh Kulshrestha
 
PDF
ITGM #9 - Коварный CodeType, или от segfault'а к работающему коду
delimitry
 
PDF
Advanced cfg bypass on adobe flash player 18 defcon russia 23
DefconRussia
 
PDF
Inside Winnyp
FFRI, Inc.
 
PPT
Евгений Крутько, Многопоточные вычисления, современный подход.
Platonov Sergey
 
PDF
Windbg랑 친해지기
Ji Hun Kim
 
PDF
Implementing Lightweight Networking
guest6972eaf
 
PPTX
Evgeniy Muralev, Mark Vince, Working with the compiler, not against it
Sergey Platonov
 
PDF
Java, Up to Date Sources
輝 子安
 
PDF
Kirk Shoop, Reactive programming in C++
Sergey Platonov
 
PDF
20190521 pwn 101_by_roy
Roy
 
DOC
Network security mannual (2)
Vivek Kumar Sinha
 
PDF
[HITB Malaysia 2011] Exploit Automation
Moabi.com
 
Cisco IOS shellcode: All-in-one
DefconRussia
 
深入淺出C語言
Simen Li
 
Дмитрий Демчук. Кроссплатформенный краш-репорт
Sergey Platonov
 
Csw2016 gong pwn_a_nexus_device_with_a_single_vulnerability
CanSecWest
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
Positive Hack Days
 
Коварный code type ITGM #9
Andrey Zakharevich
 
Metaprogramming and Reflection in Common Lisp
Damien Cassou
 
Process management
Utkarsh Kulshrestha
 
ITGM #9 - Коварный CodeType, или от segfault'а к работающему коду
delimitry
 
Advanced cfg bypass on adobe flash player 18 defcon russia 23
DefconRussia
 
Inside Winnyp
FFRI, Inc.
 
Евгений Крутько, Многопоточные вычисления, современный подход.
Platonov Sergey
 
Windbg랑 친해지기
Ji Hun Kim
 
Implementing Lightweight Networking
guest6972eaf
 
Evgeniy Muralev, Mark Vince, Working with the compiler, not against it
Sergey Platonov
 
Java, Up to Date Sources
輝 子安
 
Kirk Shoop, Reactive programming in C++
Sergey Platonov
 
20190521 pwn 101_by_roy
Roy
 
Network security mannual (2)
Vivek Kumar Sinha
 
[HITB Malaysia 2011] Exploit Automation
Moabi.com
 

Similar to [ROOTCON13] Pilot Study on Semi-Automated Patch Diffing by Applying Machine-Learning Techniques (20)

PDF
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Anne Nicolas
 
PDF
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Anne Nicolas
 
PPT
Georgy Nosenko - An introduction to the use SMT solvers for software security
DefconRussia
 
PDF
Mathematically Guaranteed C and C++ Code
Pauline Schellenberger
 
PDF
Fuzzing - Part 1
UTD Computer Security Group
 
PDF
Secure Coding Practices for Middleware
Manuel Brugnoli
 
PDF
Marat-Slides
Marat Vyshegorodtsev
 
PPT
E-Commerce Security - Application attacks - Server Attacks
phanleson
 
PDF
Offensive cyber security: Smashing the stack with Python
Malachi Jones
 
PDF
Davide Berardi - Linux hardening and security measures against Memory corruption
linuxlab_conf
 
PPTX
Hypercritical C++ Code Review
Andrey Karpov
 
PDF
Ch 18: Source Code Auditing
Sam Bowne
 
PDF
A Boring Article About a Check of the OpenSSL Project
Andrey Karpov
 
PPT
Detecting and Preventing Memory Attacks#
gwarloki1
 
PDF
[ENG] Hacktivity 2013 - Alice in eXploitland
Zoltan Balazs
 
PPTX
Static analysis and writing C/C++ of high quality code for embedded systems
Andrey Karpov
 
PPTX
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT Talks
Vadym Muliavka
 
PDF
Chapter 2 program-security
Vamsee Krishna Kiran
 
PDF
Fuzzing: Finding Your Own Bugs and 0days! at Arab Security Conference
Rodolpho Concurde
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Anne Nicolas
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Anne Nicolas
 
Georgy Nosenko - An introduction to the use SMT solvers for software security
DefconRussia
 
Mathematically Guaranteed C and C++ Code
Pauline Schellenberger
 
Fuzzing - Part 1
UTD Computer Security Group
 
Secure Coding Practices for Middleware
Manuel Brugnoli
 
Marat-Slides
Marat Vyshegorodtsev
 
E-Commerce Security - Application attacks - Server Attacks
phanleson
 
Offensive cyber security: Smashing the stack with Python
Malachi Jones
 
Davide Berardi - Linux hardening and security measures against Memory corruption
linuxlab_conf
 
Hypercritical C++ Code Review
Andrey Karpov
 
Ch 18: Source Code Auditing
Sam Bowne
 
A Boring Article About a Check of the OpenSSL Project
Andrey Karpov
 
Detecting and Preventing Memory Attacks#
gwarloki1
 
[ENG] Hacktivity 2013 - Alice in eXploitland
Zoltan Balazs
 
Static analysis and writing C/C++ of high quality code for embedded systems
Andrey Karpov
 
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT Talks
Vadym Muliavka
 
Chapter 2 program-security
Vamsee Krishna Kiran
 
Fuzzing: Finding Your Own Bugs and 0days! at Arab Security Conference
Rodolpho Concurde
 
Ad

More from Asuka Nakajima (7)

PDF
[Dagstuhl Seminar 17281] Similarity Calculation Method for Binary Executables
Asuka Nakajima
 
PDF
技術紹介: S2E: Selective Symbolic Execution Engine
Asuka Nakajima
 
PDF
[JPCERT/CC POC Meeting] 研究紹介 + DLLハイジャックの脆弱性
Asuka Nakajima
 
PDF
[CSS×2.0 2014] Polyglotシェルコードの最高記録に挑戦しよう☆
Asuka Nakajima
 
PDF
[セキュリティ・キャンプフォーラム 2014] 卒業生プレゼンテーション 『私とセキュリティと過去と未来』
Asuka Nakajima
 
PDF
[AsiaCCS2019] A Pilot Study on Consumer IoT Device Vulnerability Disclosure a...
Asuka Nakajima
 
PDF
2014年10月江戸前セキュリティ勉強会資料 -セキュリティ技術者になるには-
Asuka Nakajima
 
[Dagstuhl Seminar 17281] Similarity Calculation Method for Binary Executables
Asuka Nakajima
 
技術紹介: S2E: Selective Symbolic Execution Engine
Asuka Nakajima
 
[JPCERT/CC POC Meeting] 研究紹介 + DLLハイジャックの脆弱性
Asuka Nakajima
 
[CSS×2.0 2014] Polyglotシェルコードの最高記録に挑戦しよう☆
Asuka Nakajima
 
[セキュリティ・キャンプフォーラム 2014] 卒業生プレゼンテーション 『私とセキュリティと過去と未来』
Asuka Nakajima
 
[AsiaCCS2019] A Pilot Study on Consumer IoT Device Vulnerability Disclosure a...
Asuka Nakajima
 
2014年10月江戸前セキュリティ勉強会資料 -セキュリティ技術者になるには-
Asuka Nakajima
 
Ad

Recently uploaded (20)

PDF
NFPA 10 - Estandar para extintores de incendios portatiles (ed.22 ENG).pdf
Oscar Orozco
 
PDF
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
IJDKP
 
PPSX
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
PPTX
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
PDF
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
PPTX
MATERIAL SCIENCE LECTURE NOTES FOR DIPLOMA STUDENTS
SAMEER VISHWAKARMA
 
PPTX
WHO And BIS std- for water quality .pptx
dhanashree78
 
PDF
Rapid Prototyping for XR: Lecture 4 - High Level Prototyping.
Mark Billinghurst
 
PPTX
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
PPTX
Introduction to Python Programming Language
merlinjohnsy
 
PPTX
Tesla-Stock-Analysis-and-Forecast.pptx (1).pptx
moonsony54
 
PPTX
Computer network Computer network Computer network Computer network
Shrikant317689
 
PPTX
FSE_LLM4SE1_A Tool for In-depth Analysis of Code Execution Reasoning of Large...
cl144
 
PDF
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
AlqualsaDIResearchGr
 
PDF
Python Mini Project: Command-Line Quiz Game for School/College Students
MPREETHI7
 
PDF
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
PDF
Validating a Citizen Observatories enabling Platform by completing a Citizen ...
Diego López-de-Ipiña González-de-Artaza
 
PDF
Rapid Prototyping for XR: Lecture 3 - Video and Paper Prototyping
Mark Billinghurst
 
PDF
13th International Conference of Security, Privacy and Trust Management (SPTM...
ijcisjournal
 
PPT
FINAL plumbing code for board exam passer
MattKristopherDiaz
 
NFPA 10 - Estandar para extintores de incendios portatiles (ed.22 ENG).pdf
Oscar Orozco
 
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
IJDKP
 
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
MATERIAL SCIENCE LECTURE NOTES FOR DIPLOMA STUDENTS
SAMEER VISHWAKARMA
 
WHO And BIS std- for water quality .pptx
dhanashree78
 
Rapid Prototyping for XR: Lecture 4 - High Level Prototyping.
Mark Billinghurst
 
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
Introduction to Python Programming Language
merlinjohnsy
 
Tesla-Stock-Analysis-and-Forecast.pptx (1).pptx
moonsony54
 
Computer network Computer network Computer network Computer network
Shrikant317689
 
FSE_LLM4SE1_A Tool for In-depth Analysis of Code Execution Reasoning of Large...
cl144
 
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
AlqualsaDIResearchGr
 
Python Mini Project: Command-Line Quiz Game for School/College Students
MPREETHI7
 
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
Validating a Citizen Observatories enabling Platform by completing a Citizen ...
Diego López-de-Ipiña González-de-Artaza
 
Rapid Prototyping for XR: Lecture 3 - Video and Paper Prototyping
Mark Billinghurst
 
13th International Conference of Security, Privacy and Trust Management (SPTM...
ijcisjournal
 
FINAL plumbing code for board exam passer
MattKristopherDiaz
 

[ROOTCON13] Pilot Study on Semi-Automated Patch Diffing by Applying Machine-Learning Techniques

  • 1. Pilot Study on Semi-Automated Patch Diffing by Applying Machine-Learning Techniques Asuka Nakajima (@AsuNa_jp) NTT Secure Platform Laboratories ROOTCON 13
  • 2. 2 #whoami  Asuka Nakajima (@AsuNa_jp) Security Researcher at NTT Secure Platform Laboratories Vulnerability Discovery / Reverse Engineering / IoT Security  Founder of “CTF for GIRLS” First female infosec community in Japan (est.2014)  Black Hat Asia Review Board From 2018-Present  Veteran Conference/Event Speaker BlackHatUSA 2019, AsiaCCS 2019, AIS3 2018/2016, PHDays IV, SECCON,etc
  • 3. 3 Agenda Background Extracting Security Fix Patterns Using Unsupervised Machine Learning Algorithm* Classifying Security Fixes and Other Fixes* Conclusion PART 1 PART 2 *Original Paper Asuka Nakajima, Ren Kimura, Yuhei Kawakoya, Makoto Iwamura, Takeo Hariu, “An Investigation of Method to Assist Identification of Patched Part of the Vulnerable Software Based on Patch Diffing” Multimedia, Distributed, Cooperative, and Mobile Symposium, June 2017, Japan
  • 4. 4 What is Patch Diffing? Before Patched After Patched Identify vulnerable part & Create 1-day exploit Compare
  • 5. 5 Example: CVE-2006-4691 (MS06-70) Before Patched (netapi32.dll) After Patched (netapi32.dll) if ( !v5 ) { _wcscpy(&Dest, L""); v6 = (wchar_t *)&v24; } if ( _wcslen(Str) > 0x101 ){ NetpLogPrintHelper(“NetpManageIPCConnect: server name %ws too long - error outn“, (char)Str); return 87; } if ( *Str != 92 ){ _wcscpy(&Dest, L""); v4 = (wchar_t *)&v24; } Assembly Pseudo Code Assembly Pseudo Code Stacked-Based Buffer overflow in NetpManageIPCConnect Function Windows Security Check
  • 6. 6 Tools for Patch Diffing However, patch diffing is still a difficult task because it requires deep knowledge and experience  Bindiff (Zynamics)  https://www.zynamics.com/bindiff.html  Turbodiff (Core SECURITY)  https://www.coresecurity.com/corelabs-research/open-source-tools/turbodiff  Diaphora (Joxean Koret)  http://diaphora.re/ Acquired by Google Semi-automated patch diffing
  • 7. 7 Previous Work Machine learning techniques could be applied  DarunGrim  Shows the candidate functions that security fixes might have been applied  Approach  Use heuristics pattern-matching rules to identify the candidate functions  These patterns are manually defined by the developer Pattern Type Score cmp Opcode +1 test Opcode +1 0xFFFFFFF Immediate Value +3 wcslen Function Name +2 strlen Function Name +2 StringCchCopyW Function Name +2 ULongLongToUlong Function Name +2 DarunGrim (Jeongwook Oh)
  • 8. Extracting Security Fix Patterns Using Unsupervised Machine Learning Algorithm PART 1
  • 9. 9 Hypothesis @@ -672,10 +675,6 @@ static int do_ssl3_write(SSL *s, + if (wb->buf == NULL) + if (!ssl3_setup_write_buffer(s)) + return -1; + if (len == 0 && !create_empty_fragment) return 0; CVE-2014-0198 @@ -92,8 +92,6 @@ X509_REQ *X509_to_X509_REQ(X509 *x, pktmp = X509_get_pubkey(x); + if (pktmp == NULL) + goto err; i = X509_REQ_set_pubkey(ret, pktmp); EVP_PKEY_free(pktmp); CVE-2015-0288 Similar Types of Vulnerabilities will be Fixed in a Similar Manner Null Pointer Dereference Extract Fix Patterns Using Unsupervised Machine Learning Algorithm (Cluster Analysis)  Occurs when a program attempts to read or write to memory with a NULL pointer  Check weather the pointer is NULL or not
  • 10. 10 Challenges Challenge 1 : Optimization Challenge 2 : Other Fixes May Have Been Applied 1. Basic Block Reordering 2. Instruction Reordering 3. Operand Changes 4. Inline Expansion / Loop Unrolling
  • 11. 11 Challenge 1: Basic Block Reordering (CVE-2015-1788)
      Program1 (Before Patched), one basic block per line:
        call 0x81031d0 <BN_copy>; test eax,eax; setne bl; jmp 0x820337a <BN_GF2m_mod_inv+970>
        mov eax,DWORD PTR [esp+0x38]; mov DWORD PTR [esp+0x4],edi; mov DWORD PTR [esp],eax; call 0x8102f90 <bn_expand2>; jmp 0x8203149 <BN_GF2m_mod_inv+409>
        lea eax,[esp+0x58]; mov edx,eax; jmp 0x820339a <BN_GF2m_mod_inv+1002>
        shl ecx,0x5; mov DWORD PTR [esp+0x20],ecx; jmp 0x8203307 <BN_GF2m_mod_inv+855>
        mov eax,DWORD PTR [esp+0x30]; mov DWORD PTR [esp+0x4],eax; mov eax,DWORD PTR [esp+0x3c]; mov DWORD PTR [esp],eax; call 0x8102f90 <bn_expand2>; jmp 0x82031dd <BN_GF2m_mod_inv+557>
        mov eax,DWORD PTR [esp+0x30]; mov DWORD PTR [esp+0x4],eax; mov eax,DWORD PTR [esp+0x34]; mov DWORD PTR [esp],eax; call 0x8102f90 <bn_expand2>; jmp 0x8203190 <BN_GF2m_mod_inv+480>
      Program2 (After Patched), one basic block per line:
        call 0x81031d0 <BN_copy>; test eax,eax; setne bl; jmp 0x820338a <BN_GF2m_mod_inv+986>
        mov eax,DWORD PTR [esp+0x30]; mov DWORD PTR [esp+0x4],eax; mov eax,DWORD PTR [esp+0x3c]; mov DWORD PTR [esp],eax; call 0x8102f90 <bn_expand2>; jmp 0x82031dd <BN_GF2m_mod_inv+557>
        mov eax,DWORD PTR [esp+0x30]; mov DWORD PTR [esp+0x4],eax; mov eax,DWORD PTR [esp+0x34]; mov DWORD PTR [esp],eax; call 0x8102f90 <bn_expand2>; jmp 0x8203190 <BN_GF2m_mod_inv+480>
        mov eax,DWORD PTR [esp+0x38]; mov DWORD PTR [esp+0x4],edi; mov DWORD PTR [esp],eax; call 0x8102f90 <bn_expand2>; jmp 0x8203149 <BN_GF2m_mod_inv+409>
        lea eax,[esp+0x58]; mov edx,eax; jmp 0x82033aa <BN_GF2m_mod_inv+1018>
        shl ecx,0x5; mov DWORD PTR [esp+0x20],ecx; jmp 0x8203317 <BN_GF2m_mod_inv+871>
  • 12. 12 Challenge 1: Instruction Reordering (CVE-2015-1789, disassembled with IDA Pro)
      Program1: mov [esp+44h+var_4], eax; push esi; mov ebp, [esp+4Ch+arg_4]; push esi; mov esi, [esp+4Ch+arg_0]; mov ecx, [esi]; mov eax, [esi + 8]; push edi; mov edi, [esi+4]; cmp edi, 17h
      Program2: mov [esp+44h+var_4], eax; push esi; mov ebp, [esp+4Ch+arg_4]; push esi; mov esi, [esp+4Ch+arg_0]; push edi; mov edi, [esi+4]; mov ecx, [esi]; mov eax, [esi + 8]; cmp edi, 17h
  • 13. 13 Challenge 1: Operand Changes (CVE-2008-5023)
      Program1: xor ebx, ebx; add rsp, 38h; mov eax, ebx; pop rbx; pop rbp; pop r12; pop r13; retn
      Program2: xor r12d, r12d; add rsp, 38h; mov eax, r12d; pop rbx; pop rbp; pop r12; pop r13; retn
      The register is different (ebx -> r12d)
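One way to blunt instruction reordering and operand changes is to compare only the multiset of opcodes, which is close in spirit to the opcode counting used for feature extraction later in the talk. The sketch below is an illustration under that assumption; `opcode_multiset` is a hypothetical helper, and it does not address inline expansion or loop unrolling, which change the opcodes themselves.

```python
from collections import Counter


def opcode_multiset(listing):
    """Keep only the opcode (first token) of each instruction and count it.
    Operands and instruction order are deliberately ignored."""
    return Counter(line.split()[0] for line in listing if line.strip())


# Operand change example from the slide (ebx -> r12d).
program1 = ["xor ebx, ebx", "add rsp, 38h", "mov eax, ebx", "pop rbx", "retn"]
program2 = ["xor r12d, r12d", "add rsp, 38h", "mov eax, r12d", "pop rbx", "retn"]

# Same opcodes with different registers: the multisets are equal.
print(opcode_multiset(program1) == opcode_multiset(program2))

# Reordered instructions also compare equal.
reordered = list(reversed(program1))
print(opcode_multiset(program1) == opcode_multiset(reordered))
```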
  • 14. 14 Challenge 1: Inline Expansion / Loop Unrolling
      Inline Expansion
        Source code: void my_print(int n){ printf("%d", n); } int main(){ int n = 1; my_print(n); return 0; }
        Program1 (Before) <main>: push ebp; mov ebp,esp; sub esp,0x4; mov DWORD PTR [ebp-0x4],0x1; push DWORD PTR [ebp-0x4]; call 804840b <my_print>; add esp,0x4; mov eax,0x0; leave; ret
        Program2 (After) <main>: push 0x1; push 0x80484e0; push 0x1; call 8048310 <__printf_chk@plt>; add esp,0xc; xor eax,eax; ret
      Loop Unrolling
        Source code: int main(){ int i; for(i = 0; i < 3; i++){ printf("HelloWorld!"); } return 0; }
        Program1 (Before) <main>: push ebp; mov ebp,esp; sub esp,0x4; mov DWORD PTR [ebp-0x4],0x0; jmp 804842b <main+0x20>; push 0x80484c0; call 80482e0 <printf@plt>; add esp,0x4; add DWORD PTR [ebp-0x4],0x1; cmp DWORD PTR [ebp-0x4],0x2; jle 804841a <main+0xf>
        Program2 (After) <main>: push 0x80484d0; push 0x1; call 8048310 <__printf_chk@plt>; push 0x80484d0; push 0x1; call 8048310 <__printf_chk@plt>; push 0x80484d0; push 0x1; call 8048310 <__printf_chk@plt>
  • 15. 15 Challenge 2: Other fixes may have been applied
      1. Bug Fixes
      2. Refactoring
      3. Feature Updates
  • 16. 16 Experiment Overview
      - Goal: Extract security fix patterns which could be used to support semi-automated patch diffing
      - Dataset: Target Software: OpenSSL 1.0.1; Collected 62 Security Fixes
      - Cluster Analysis: Hierarchical Clustering Algorithm
  • 17. 17 Data Collection Method [1/3]: OpenSSL 1.0.1 (git / 4675a56, openssl 1.0.1 stable)
      - STEP 1: Analyze Commit Log & Release note*
          CVE-ID | Type | Hash value
          CVE-2012-2110 | Before Patched | d36e0ee460f41d6b64015455c4f5414a319865c3
          CVE-2012-2110 | After Patched | 8d5505d099973a06781b7e0e5b65861859a7d994
          CVE-2016-6304 | Before Patched | 151adf2e5cc23284a059e0f155505006a1c9fad9
          CVE-2016-6304 | After Patched | 2c0d295e26306e15a92eb23a84a1802005c1c137
      - STEP 2: Diff the source code & Identify the patched part (Function)
          @@ -260,7 +265,11 @@ int BN_dec2bn(BIGNUM **bn, const char *a)
            a++;
            }
          - for (i = 0; isdigit((unsigned char)a[i]); i++) ;
          + for (i = 0; i <= (INT_MAX/4) && isdigit((unsigned char)a[i]); i++)
          +     continue;
          +
          + if (i > INT_MAX/4)
      - STEP 3: Compile/Disassemble before & after patched source code
      *OpenSSL 1.0.1 Series Release Notes https://web.archive.org/web/20170208161711/https://www.openssl.org/news/openssl-1.0.1-notes.html
  • 18. 18 Data Collection Method [2/3]: Feature Extraction Method
      - STEP 1: Extract the increased instructions
          Before Patched: push mov sub jmp …
          After Patched: push mov sub + mov + mov jmp …
          Increased Instructions: mov mov cmp jne test je push jmp lea pop …
      - STEP 2: Normalize the instructions
          Before Normalization: mov mov cmp jne test je push jmp lea pop …
          After Normalization (Normalized Instructions): trans trans cmp jump cmp jump stack jump lea stack …
      - STEP 3: Count the number of occurrences of each normalized instruction
          # of occurrences (each instruction): trans: 2, cmp: 2, jump: 3, stack: 2, lea: 1, …
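STEP 1 can be sketched as a multiset difference of opcodes. The slides do not say how the increased instructions were computed, so treating them as "opcodes that occur more often after the patch than before" is an assumption, and `increased_instructions` is a hypothetical helper name.

```python
from collections import Counter


def increased_instructions(before_ops, after_ops):
    """Return the opcodes that occur more often after the patch than before.
    Counter subtraction keeps only positive differences."""
    return Counter(after_ops) - Counter(before_ops)


# Before/after opcode sequences from the slide's example.
before = ["push", "mov", "sub", "jmp"]
after = ["push", "mov", "sub", "mov", "mov", "jmp"]

increased = increased_instructions(before, after)
print(dict(increased))  # {'mov': 2}
```

A positional diff (for example with `difflib`) would be an alternative, but it is more sensitive to the reordering effects described under Challenge 1.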
  • 19. 19 Data Collection Method [3/3]: Opcode (Instruction) Normalization
      - Instructions that fall into similar categories are summarized and expressed as one (normalized) instruction
      - e.g., branch instructions such as jns, jle, jne, jge are normalized as "jump"
      - Only the instructions which appeared in the security fixes (functions) were normalized
      Normalized Instruction | Type of Instruction | Target Instructions
      jump | Branch | jns, jle, jne, jge, jae, jmp, js, jl, je, jg, ja, jb, jbe
      trans | Data Transfer | movzx, mov, movsx, xchg, cdq
      ctrans | Conditional Data Transfer | cmovge, cmovae, cmovs, cmovns, cmove, cmovne
      stack | Stack Manipulation | push, pop
      logical | Logical Operation | and, xor, or, not
      arith | Arithmetic Operation | sub, add, imul, neg, adc
      nop | No Operation | nop
      bop | Bit/Byte Operation | bt, setne, sete
      shift | Shift Operation | shr, shl, sar
      func | Function Operation | call, ret
      str | String Operation | repz *
      cmp | Comparison | test, cmp
      lea | Address Computation | lea
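The normalization table and the counting step can be sketched as a lookup followed by a tally. The mapping is taken from the slide, with `movzx` written in its standard x86 spelling; `feature_vector` and the `"other"` bucket for unlisted opcodes are assumptions made for this example.

```python
from collections import Counter

# Normalized instruction -> target opcodes, as listed on the slide.
NORMALIZATION = {
    "jump": ["jns", "jle", "jne", "jge", "jae", "jmp", "js", "jl",
             "je", "jg", "ja", "jb", "jbe"],
    "trans": ["movzx", "mov", "movsx", "xchg", "cdq"],
    "ctrans": ["cmovge", "cmovae", "cmovs", "cmovns", "cmove", "cmovne"],
    "stack": ["push", "pop"],
    "logical": ["and", "xor", "or", "not"],
    "arith": ["sub", "add", "imul", "neg", "adc"],
    "nop": ["nop"],
    "bop": ["bt", "setne", "sete"],
    "shift": ["shr", "shl", "sar"],
    "func": ["call", "ret"],
    "str": ["repz"],
    "cmp": ["test", "cmp"],
    "lea": ["lea"],
}

# Invert the table once so each opcode maps to its normalized name.
OPCODE_TO_CLASS = {op: cls for cls, ops in NORMALIZATION.items() for op in ops}


def feature_vector(opcodes):
    """Count normalized instructions. Opcodes missing from the table are
    grouped under 'other' (an assumption, not stated on the slides)."""
    return Counter(OPCODE_TO_CLASS.get(op, "other") for op in opcodes)


# Increased instructions from the feature extraction example.
ops = ["mov", "mov", "cmp", "jne", "test", "je", "push", "jmp", "lea", "pop"]
print(dict(feature_vector(ops)))
```

For the example sequence this yields trans: 2, cmp: 2, jump: 3, stack: 2, lea: 1, which matches the counts shown on the feature extraction slide.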
  • 20. 20 Cluster Analysis [1/2]: Divides data into groups that are meaningful/useful
      [Figure: data points before clustering and after clustering, grouped into Cluster 1, Cluster 2, and Cluster 3]
  • 21. 21 Cluster Analysis [2/2]: Divides data into groups that are meaningful/useful
      - Hierarchical Clustering Algorithm
        - Produces a classification in which small clusters of very similar data points are nested within larger clusters of less closely-related data points*
        - e.g., Agglomerative Hierarchical Clustering
      - Non-Hierarchical Clustering Algorithm
        - Generates a classification by partitioning the dataset*
        - e.g., K-means Clustering
      *Hierarchical and non-Hierarchical Clustering https://www.daylight.com/meetings/mug96/barnard/E-MUG95.html
  • 22. 22 Cluster Analysis [2/2]: Divides data into groups that are meaningful/useful
      - Hierarchical Clustering Algorithm
        - Produces a classification in which small clusters of very similar data points are nested within larger clusters of less closely-related data points*
        - e.g., Agglomerative Hierarchical Clustering
      - Used here: Euclidean Distance & Ward's Method
        - Distance between Cluster A and Cluster B: d(A, B) = E(A ∪ B) - E(A) - E(B), where E(X) is the sum of squared Euclidean distances from the points of X to the centroid of X
      *Hierarchical and non-Hierarchical Clustering https://www.daylight.com/meetings/mug96/barnard/E-MUG95.html
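Ward's criterion can be checked with a few lines of plain Python. The sketch reads the two subtracted terms of the slide's formula as the error of each cluster being merged (A and B) and takes E(X) to be the sum of squared Euclidean distances to the centroid of X; the function names and sample points are made up for illustration.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sse(points):
    """E(X): sum of squared Euclidean distances to the centroid of X."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_distance(a, b):
    """d(A, B) = E(A u B) - E(A) - E(B): the increase in within-cluster
    error caused by merging clusters A and B."""
    return sse(a + b) - sse(a) - sse(b)


# Two small made-up clusters in two dimensions.
cluster_a = [[0.0, 0.0], [2.0, 0.0]]
cluster_b = [[0.0, 4.0], [2.0, 4.0]]
print(ward_distance(cluster_a, cluster_b))  # 16.0
```

Agglomerative clustering with Ward's method repeatedly merges the pair of clusters with the smallest such distance, which is what produces the dendrogram.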
  • 23. 23 CWE [1/2]
      - CWE (Common Weakness Enumeration)
        - List of Software Weakness Types
        - Gives a unique identifier (CWE-ID) to each type
        - e.g., CWE-120: Buffer Copy without Checking Size of Input ("Classic Buffer Overflow") https://cwe.mitre.org/data/definitions/120.html
        - Latest Version: 3.4 / Total 808 Weaknesses
      - Used as a label for each data point
  • 24. 24 CWE [2/2]: CWE organizes a wide variety of weakness types in a hierarchical structure
      - The weakness types at higher levels in the structure give a more abstract and broader concept*
      - e.g., [Parent] CWE-119 -> [Child] CWE-120
      - Structure Types: Development Concepts, Research Concepts, Architectural Concepts
      *CWE Overview, IPA https://www.ipa.go.jp/security/english/vuln/CWE_en.html
  • 25. 25 CWE [2/2]: CWE organizes a wide variety of weakness types in a hierarchical structure
      - The weakness types at higher levels in the structure give a more abstract and broader concept*
      - e.g., [Parent] CWE-119 -> [Child] CWE-120
      - Structure Types: Development Concepts, Research Concepts, Architectural Concepts
      - Used these root CWE-IDs as labels:
      CWE-ID | Description
      CWE-17 | Code
      CWE-19 | Data Processing Error
      CWE-254 | Security Features
      CWE-361 | Time and State
      CWE-398 | Indicator of Poor Code Quality
      CWE-399 | Resource Management Errors
      *CWE Overview, IPA https://www.ipa.go.jp/security/english/vuln/CWE_en.html
  • 26. 26 Result
      [Figure: dendrogram of the clustering result; each leaf is labeled with CVE-ID + Label (CWE-ID); two clusters are marked, ① Cluster 1 and ② Cluster 2]
  • 27. 27 Details of Cluster 1
      CWE-ID | CVE-ID | Feature Vector (Normalized Instruction: # of Occurrences)
      CWE-19 | CVE-2016-6306 | jump: 9, trans: 7, cmp: 6, lea: 4, arith: 3, func: 1
      CWE-19 | CVE-2016-0797 | jump: 7, lea: 4, cmp: 4, trans: 2, arith: 2
      CWE-19 | CVE-2015-0206 | jump: 7, trans: 5, func: 4, stack: 4, cmp: 4, arith: 2, lea: 1
      CWE-19 | CVE-2014-3508 | jump: 9, trans: 5, cmp: 5, stack: 4, nop: 3, lea: 2, arith: 2, bop: 1, func: 1
      CWE-398 | CVE-2014-5139 | jump: 7, stack: 4, cmp: 3, logical: 2, trans: 2, nop: 2, func: 1
      Summary: Most of the labels are CWE-19 (Data Processing Error)
      - Most of the vulnerabilities in this cluster are related to memory or value manipulation errors that were not initially expected by the developers (e.g., out-of-bounds read, info-leak, integer overflow)
      - A certain number of Comparison/Branch/Arithmetic Operation instructions exist
  • 28. 28 CVE-2016-0797 (Integer Overflow Vulnerability): Patched Part of CVE-2016-0797
      @@ -190,11 +189,7 @@ int BN_hex2bn(BIGNUM **bn,
        }
      - for (i = 0; isxdigit((unsigned char)a[i]); i++) ;
      + for (i = 0; i <= (INT_MAX/4) && isxdigit((unsigned char)a[i]); i++)
      +     continue;
      +
      + if (i > INT_MAX/4)
      +     goto err;
      - Added a check to confirm that the integer value is under the expected upper limit
      - The value i will be used in bn_expand(ret, i*4)
  • 29. 29 Details of Cluster 2
      CWE-ID | CVE-ID | Feature Vector (Normalized Instruction: # of Occurrences)
      CWE-254 | CVE-2015-1793 | trans: 4, jump: 3, ctrans: 1, nop: 1, func: 1
      CWE-254 | CVE-2014-3567 | trans: 4, jump: 2, func: 1
      CWE-254 | CVE-2014-3470 | trans: 4, jump: 2, nop: 2, lea: 1, func: 1, cmp: 1
      CWE-254 | CVE-2015-0205 | trans: 5, func: 1
      CWE-254 | CVE-2014-0224 | trans: 5, jump: 3, logical: 2, cmp: 1
      CWE-19 | CVE-2014-0195 | trans: 6, jump: 3, cmp: 1
      Summary: Most of the labels are CWE-254 (Security Features)
      - Most of the security fixes for the vulnerabilities in this cluster contain some sort of error handling function
      - A certain number of Data Transfer/Branch/Function related instructions exist
  • 30. 30 CVE-2014-3470 (NULL Pointer Dereference): Patched Part of CVE-2014-3470
      @@ -2512,13 +2512,6 @@ int ssl3_send_client_key_exchange
        int field_size = 0;
      + if (s->session->sess_cert == NULL)
      + {
      +     ssl3_send_alert(s,SSL3_AL_FATAL, SSL_AD_UNEXPECTED_MESSAGE);
      +     SSLerr(SSL_F_SSL3_SEND_CLIENT_KEY_EXCHANGE, SSL_R_UNEXPECTED_MESSAGE);
      +     goto err;
      + }
      - Added two error-handling function calls and an exit from the function
  • 31. 31 Discussions
      - Why Only Two Clusters?
        - Some vulnerabilities are found in multiple functions
        - Similar functions contain the same vulnerability
      - How to Improve
        - Include other features such as function name?
        - Collect more security fixes
          - Use vulnerability corpus generation tools? (e.g., LAVA*)
        - Use other machine learning techniques
      - For Semi-Automated Patch Diffing
        - Calculate the similarity between the extracted security fix patterns (instructions) and the difference (increased instructions) found by the patch diffing
      Slide callouts: "We count the number of occurrences of normalized instructions"; "Centroid"
      *LAVA: Large-scale Automated Vulnerability Addition https://www.andreamambretti.com/files/papers/oakland2016_lava.pdf
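The proposed similarity calculation could look like the sketch below. It takes the slide's "Centroid" callout to mean that each cluster's centroid serves as the extracted pattern, and it uses Euclidean distance as the similarity measure; both are assumptions. The cluster members are copied from the two cluster-detail slides, while the candidate vector and all function names are hypothetical.

```python
import math

CLASSES = ["jump", "trans", "ctrans", "stack", "logical", "arith",
           "nop", "bop", "func", "cmp", "lea"]

# Feature vectors copied from the cluster-detail slides.
CLUSTER_1 = [
    {"jump": 9, "trans": 7, "cmp": 6, "lea": 4, "arith": 3, "func": 1},
    {"jump": 7, "lea": 4, "cmp": 4, "trans": 2, "arith": 2},
    {"jump": 7, "trans": 5, "func": 4, "stack": 4, "cmp": 4, "arith": 2, "lea": 1},
    {"jump": 9, "trans": 5, "cmp": 5, "stack": 4, "nop": 3, "lea": 2,
     "arith": 2, "bop": 1, "func": 1},
    {"jump": 7, "stack": 4, "cmp": 3, "logical": 2, "trans": 2, "nop": 2, "func": 1},
]
CLUSTER_2 = [
    {"trans": 4, "jump": 3, "ctrans": 1, "nop": 1, "func": 1},
    {"trans": 4, "jump": 2, "func": 1},
    {"trans": 4, "jump": 2, "nop": 2, "lea": 1, "func": 1, "cmp": 1},
    {"trans": 5, "func": 1},
    {"trans": 5, "jump": 3, "logical": 2, "cmp": 1},
    {"trans": 6, "jump": 3, "cmp": 1},
]


def to_vector(counts):
    """Turn a {normalized instruction: count} dict into a fixed-order list."""
    return [counts.get(c, 0) for c in CLASSES]


def centroid(members):
    """Mean feature vector of a cluster, used here as its fix pattern."""
    vectors = [to_vector(m) for m in members]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_pattern(diff_counts, patterns):
    """Name of the pattern closest to the increased-instruction counts."""
    vec = to_vector(diff_counts)
    return min(patterns, key=lambda name: distance(vec, patterns[name]))


patterns = {"cluster1": centroid(CLUSTER_1), "cluster2": centroid(CLUSTER_2)}
candidate = {"trans": 4, "jump": 2, "func": 1, "cmp": 1}  # hypothetical diff result
print(nearest_pattern(candidate, patterns))
```

In a patch-diffing workflow, changed functions whose increased-instruction vector lies close to one of the patterns would be reviewed first.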
  • 32. Classifying Security Fixes and Other Fixes PART 2
  • 33. 33 Classifying Security Fixes and Other Fixes
      - Dataset
        - OpenSSL 1.0.1 (62 Security Fixes / 377 Other Fixes)
      - Classification Method
        - Supervised Classifier
        - Soft Margin Support Vector Machine (SVM)
        - Kernel: RBF (C = 10, γ = 0.001)
      - Experiment
        - Used 62 Security Fixes and 62 Other Fixes (random sampling)
        - Conducted 10-fold Cross-Validation 3 times
          - Performed random sampling for each cross-validation
      - Environment
        - OS: Ubuntu 14.04, Compiler: gcc 5.4.0
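The sampling and cross-validation setup can be sketched with the standard library alone. The slides do not name the SVM implementation or say whether the folds were stratified, so the plain shuffled folds, the seeds, and every function name below are assumptions; only the sample sizes, the fold count, and γ come from the slide.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.001):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2), gamma from the slide."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def balanced_sample(n_security=62, n_other=377, seed=0):
    """Keep every security fix and draw an equally sized random sample of
    other fixes. Returns two lists of indices into the respective sets."""
    rng = random.Random(seed)
    return list(range(n_security)), rng.sample(range(n_other), n_security)


def k_folds(n_items, k=10, seed=0):
    """Shuffle item indices and split them into k nearly equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


# One of the three repetitions: a new seed gives a new random sample.
security, other = balanced_sample(seed=1)
folds = k_folds(len(security) + len(other), k=10, seed=1)
print([len(fold) for fold in folds])
```

Each fold would serve once as the test set while an SVM is trained on the other nine, and repeating the whole procedure with a fresh sample gives the three datasets reported in the results.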
  • 34. 34 Support Vector Machine (SVM): Method used for classification (+regression) tasks
      [Figure: data points labeled A and B, shown before and after classification]
  • 35. 35 Result
      Metric | Type of Fix | Dataset 1 | Dataset 2 | Dataset 3 | Average
      Accuracy | (all) | 0.62 | 0.54 | 0.54 | 0.56
      Precision | Security Fix | 0.70 | 0.57 | 0.57 | 0.61
      Precision | Other | 0.59 | 0.54 | 0.54 | 0.55
      Recall | Security Fix | 0.42 | 0.49 | 0.42 | 0.41
      Recall | Other | 0.82 | 0.71 | 0.68 | 0.73
      F-Score | Security Fix | 0.53 | 0.46 | 0.44 | 0.47
      F-Score | Other | 0.68 | 0.61 | 0.63 | 0.64
  • 36. 36 Summary of Result
      - Summary
        - Overall Accuracy: 56% (average)
        - Security Fixes: Precision 61% / Recall 41% (average)
        - Other Fixes: Precision 55% / Recall 73% (average)
      - Discussions
        - Use other metrics? (e.g., Cyclomatic complexity)
      - Glossary
        - Accuracy: Ratio of the number of correctly labeled fixes to the number of all fixes in the dataset
        - Precision: Ratio of the number of correctly labeled security fixes to the number of all fixes labeled as "security fix" by the program
        - Recall: Ratio of the number of correctly labeled security fixes to the number of all security fixes in the dataset
        - F-Score: Harmonic mean of the Precision and Recall
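The glossary definitions translate directly into code. The confusion-matrix counts below are hypothetical: they were chosen for 62 security fixes and 62 other fixes so that the rounded results agree with the Dataset 1 column of the result table, and they are not taken from the original experiment.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-score for the 'security fix' class.
    tp: security fixes labeled as security fixes
    fp: other fixes labeled as security fixes
    fn: security fixes labeled as other fixes
    tn: other fixes labeled as other fixes"""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score


# Hypothetical counts: 26 + 36 = 62 security fixes, 11 + 51 = 62 other fixes.
accuracy, precision, recall, f_score = metrics(tp=26, fp=11, fn=36, tn=51)
print(f"{accuracy:.2f} {precision:.2f} {recall:.2f} {f_score:.2f}")  # 0.62 0.70 0.42 0.53
```

The low recall for security fixes means that more than half of them were labeled as other fixes, which is the main weakness highlighted by the summary.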
  • 37. 37 Summary & Conclusion
      - Patch diffing is still a difficult task because it requires deep knowledge and experience
      - ✔ Extracted security fix patterns which could be used to support semi-automated patch diffing
      - ✔ Conducted an experiment to see if it is possible to distinguish between security fixes and other fixes
      - ✔ Provided insights for future research related to semi-automated patch diffing
  • 38. 38 Appendix [1/3]: Original Paper
      - Asuka Nakajima, Ren Kimura, Yuhei Kawakoya, Makoto Iwamura, Takeo Hariu, "An Investigation of Method to Assist Identification of Patched Part of the Vulnerable Software Based on Patch Diffing", Multimedia, Distributed, Cooperative, and Mobile Symposium, June 2017, Japan
      - Download URL: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_uri&item_id=190132&file_id=1&file_no=1
  • 39. 39 Appendix [2/3]: Other Research (1)
      - Asuka Nakajima, Takuya Watanabe, Eitaro Shioji, Mitsuaki Akiyama, and Maverick Woo, "A Pilot Study on Consumer IoT Device Vulnerability Disclosure and Patch Release in Japan and the United States", Proceedings of the 14th ACM ASIA Conference on Information, Computer and Communications Security (ASIA CCS 2019)
      - [PDF] https://www.cylab.cmu.edu/_files/pdfs/tech_reports/CMUCyLab19001.pdf
      - Revealed Significant 1-Day Risk Related to IoT: Unsynchronized Patch Release (Geographical Arbitrage)
      - Example: CVE-2017-7852 (Vendor: D-Link, Product: Network Camera)
        - Patch Release Timeline: DCS-932L RevA 2015/Nov/18; DCS-932L RevA 2016/Jul/19 (244 Days)
  • 40. 40 Appendix [3/3]: Other Activities (Female InfoSec Community)
      - CTF for GIRLS: http://girls.seccon.jp (Twitter: @ctf4g), Women-Only CTF Workshop
      - Asuka Nakajima, Suhee Kang, Hazel Yen, "Women in Security: Building a Female InfoSec Community in Korea, Japan, and Taiwan", BlackHatUSA 2019 (talk about the Asian female InfoSec community)