Adventures in .NET references

Weak referencing is a really useful feature when you don’t mind if an object is deleted, but you might still potentially want to access it again in future. For those of you who aren’t familiar with the concept of weak referencing, I’ll describe it briefly. If you already know how it works then you can skip ahead.

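.NET is a garbage-collected platform, meaning that objects you create on the heap (e.g. with new) are automatically cleared up by the garbage collector (GC) when they are no longer being used. The definition of “being used” is whether anything still holds a reference to the object. The CLR works this out by tracing references from live variables rather than by keeping an actual counter, but counting references is a convenient way to think about it. Here’s an example: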

// we create an object instance and assign it to the variable 'foo'
// the instance now has one reference
var foo = new object();

// now we assign the value of foo (the instance) to bar
// the instance now has two references
var bar = foo;

// now we set foo to null.
// the instance now has one reference (bar)
foo = null;

When a variable goes out of scope it no longer exists, so the instance loses that reference. When an object instance has no references left, the GC is free to finalize (destroy) it. The GC does this in passes and uses a generation-based model to periodically clean up objects without references. This means that an object may exist on the heap for some time after its last reference goes away. Incidentally, this is why SecureString exists – if you put sensitive data into a string object there is no guarantee when, or even if, that string will be erased from memory. Strings are also immutable in .NET, so you can’t manually overwrite them.

What I haven’t mentioned so far is that there are two types of reference – strong and weak. What I’ve talked about so far refers to strong references. Weak references are a special type of reference that still allow the GC to finalize the object, but also still allow the code to access (and create strong references to) the object if the GC has not yet finalized it. This is useful for caching because the GC will automatically “evict” (finalize) objects that are no longer strongly referenced as it responds to allocation and memory pressure.

Mixing weak references with lazy initialisation

In some cases you may not know if a code path is going to require access to a particular object, or if it will be accessed just once or multiple times. If that object takes up quite a bit of memory on the heap it may be prohibitive to keep it around. You could opt to manually handle this with a caching scheme, but a mixture of lazy initialisation and weak referencing handles this situation in a way that avoids allocation in the first place when the object isn’t used, and automatically manages caching of that object based on memory pressure and age via the GC.

I ran into this situation when I wanted to parse the PE headers and various structures of a lot of executable files, then run a battery of tests against each. Most tests only access a few different sections of the executable, and some tests do not run at all against some files (e.g. some tests only run on 64-bit binaries). The parsed data can take up quite a bit of memory – particularly import tables and disassembled code – but it’s not expensive to regenerate the data, so it makes sense to only initialise it when we need it, and get rid of it if we’re running short of memory. For the latter we can use weak referencing, but for the former we want lazy initialisation. Luckily both of these features are available in the .NET framework and are thread-safe by default.

For convenience I created a helper class that combines WeakReference with Lazy<T>, into WeakLazy<T>:

public class WeakLazy<T> where T : class
{
    readonly Func<T> _constructor;
    readonly Lazy<WeakReference> _lazyRef;

    public WeakLazy(Func<T> constructor)
    {
        _constructor = constructor;
        _lazyRef = new Lazy<WeakReference>(() => new WeakReference(_constructor()));
    }

    public bool IsAlive
    {
        get
        {
            if (!_lazyRef.IsValueCreated)
                return false;
            return _lazyRef.Value.IsAlive;
        }
    }

    public T Value
    {
        get
        {
            // take a strong reference first so the GC can't collect the
            // object between the check and the return
            T obj = (T)_lazyRef.Value.Target;

            // if the object still exists, return that
            if (obj != null)
                return obj;

            // object didn't exist so we need to create it again
            obj = _constructor();
            _lazyRef.Value.Target = obj;
            return obj;
        }
    }
}

This is fairly simple – when we access the Value property it initialises the object (this is the Lazy<T> functionality) and wraps it inside a WeakReference in order to keep the strong reference count at zero.

Here’s an example of how you might use it:

var peHeader = new WeakLazy<PEHeader>(() => new PEHeader(_file));

...

if (Is64bit)
{
    if (peHeader.Value.ImageBase < 0x100000000UL)
        Report.AddIssue(IssueMessages.MissingHiASLR, ...);
}

...

if (SomeOtherCondition)
{
    // some other access here
    if (peHeader.Value.??? ... )
        // ...?
}

In the first line we create a WeakLazy<T> wrapper around a PEHeader class, which represents the parsed PE (or “Optional”) header from some input file. At this point there is no PEHeader instance as its initialisation is lazy.

If the executable is 64-bit we check for HiASLR by validating that the base address is above the 4GB boundary. If the branch is taken we reference peHeader.Value, which triggers lazy instantiation of the PEHeader object via the lambda we passed on the first line.

Later we potentially access peHeader.Value again, at which point there are three possible cases. The first case is that the original branch was not taken (not a 64-bit exe) so the PEHeader gets created for the first time. The second case is that the original branch was taken and the underlying PEHeader object still exists, so we just access it. The third case is that the original branch was taken, but a GC pass occurred between the first and second access and finalized the object in the interim, so it gets recreated.

Unit testing WeakLazy<T>

The above all looks correct, so let’s cover things off with some unit tests! The first couple of tests verify that lazy instantiation works as intended:

class TestObject
{
    public TestObject()
    {
        Bar = 123;
    }
    
    public void Foo() { }

    public int Bar { get; set; }
}

[TestMethod]
public void TestInstantiateViaMethod()
{
    var wl = new WeakLazy<TestObject>(() => new TestObject());
    Assert.IsFalse(wl.IsAlive);
    wl.Value.Foo();
    Assert.IsTrue(wl.IsAlive);
}

[TestMethod]
public void TestInstantiateViaProperty()
{
    var wl = new WeakLazy<TestObject>(() => new TestObject());
    Assert.IsFalse(wl.IsAlive);
    Assert.AreEqual(123, wl.Value.Bar);
    Assert.IsTrue(wl.IsAlive);
}

These tests pass without problems. Next we want to test that weak referencing works:

[TestMethod]
public void TestWeakReferenceFinalize()
{
    var wl = new WeakLazy<TestObject>(() => new TestObject());
    wl.Value.Foo();
    Assert.IsTrue(wl.IsAlive);

    const int BLOWN = 1024;
    int fuse = 0;
    while (wl.IsAlive)
    {
        GC.Collect();
        if (++fuse == BLOWN)
            Assert.Fail("GC did not clear object.");
    }
}

This test first instantiates the object, then forces GC collection repeatedly (up to 1024 times) to make sure the object gets finalized. This test fails – the loop repeats until the assertion failure is hit. Can you see why? Here’s a hint: this unit test fails when the program is built as Debug, but not as Release.

Compiler magic or deeper behaviour?

What you might assume is that the compiler captures the result of wl.get_Value() into a local variable, thus “trapping” a reference to the TestObject instance. If you take a look at the compiled IL, you’ll find that this isn’t the case at all – the generated code is essentially the same barring some extra nops and unoptimised stloc/ldloc pairs in the debug code. In fact I spent quite a bit of time getting all confused about what was happening.

My first guess was that a strong reference was being kept in the CLR’s evaluation stack somewhere, but Visual Studio doesn’t allow you to see the evaluation stack in the CLR. I tried digging into this with mdbg but didn’t get much information out of that either. In the end I had to go hardcore and load up WinDbg.

It turns out that WinDbg has pretty solid support for .NET and CLR process internals via the sos extension. This extension comes inbuilt with WinDbg, but you have to load it manually with .loadby sos clr. Once this is done you can start using the CLR debugging features. I found this cheat sheet to be incredibly helpful.

First I manually modified my code to include some pauses – just some Console.ReadKey calls – then verified that my changes did not alter the behaviour I observed previously. After that I used !threads to find the correct managed thread and switch to it. From there I inspected the stack with !clrstack to verify that everything is as I expected, with no weird calls out to magic debugging methods or anything out of place. At this point it made sense to just directly check what the GC was holding onto, using !gchandles:

          Handle  Type                 Object  Size    Data Type
000002d4945615e8  WeakShort  000002d496196da8    24    PolyutilsTests.WeakLazyTests+TestObject

...

              MT    Count    TotalSize  Class Name
00007ff85ddc6c20        1           24  PolyutilsTests.WeakLazyTests+TestObject

From this we can see that only a weak handle to the object exists, so there isn’t anything holding it back. Yet, despite this, the Debug build of this program refuses to finalize the object that we have a weak reference to, whereas the Release build gets rid of it without problems.

Debug vs. Release assemblies

At this point I was convinced that this is a CLR behaviour unique to debug builds of the application, but not because of the generated IL. Opening up the Debug and Release binaries in JustDecompile showed me a difference in the flags set on the assembly’s DebuggableAttribute.

From the Debug assembly:

[assembly: Debuggable(DebuggableAttribute.DebuggingModes.Default | DebuggableAttribute.DebuggingModes.DisableOptimizations | DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints | DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]

From the Release assembly:

[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]

Using the Reflexil plugin, I modified the Debug assembly’s DebuggableAttribute to match the Release assembly, and re-ran the test harness. This time it completed just fine, proving that this is a CLR behaviour directly related to debugging.

But which of these flags causes this difference in behaviour? For that I needed to go through and unset each flag, one by one, until the test passed. I immediately hit paydirt on my first try – removing the Default flag from the assembly made the test pass, even with the other options there. This doesn’t really make much sense to me, as the reference source says:

Default: Instructs the just-in-time (JIT) compiler to use its default behavior, which includes enabling optimizations, disabling Edit and Continue support, and using symbol store sequence points if present. In the .NET Framework version 2.0, JIT tracking information, the Microsoft intermediate language (MSIL) offset to the native-code offset within a method, is always generated.

The only behaviour I can see that is potentially relevant is further up in the same class:

/// <summary>Gets a value that indicates whether the runtime will track information during code generation for the debugger.</summary>
/// <returns>true if the runtime will track information during code generation for the debugger; otherwise, false.</returns>
/// <filterpriority>2</filterpriority>
public bool IsJITTrackingEnabled
{
    get
    {
        return (this.m_debuggingModes & DebuggableAttribute.DebuggingModes.Default) != DebuggableAttribute.DebuggingModes.None;
    }
}

I’m still not sure if JIT tracking is the cause or if it’s something else.

I found an issue on the CoreCLR project where they ran into the same problem as me, although it didn’t really shed much light on the subject other than informing me that the JIT can arbitrarily extend object lifetimes.

Conclusion

Builds with the Default flag set on the DebuggableAttribute for the assembly seem to force the GC to ignore weak handles that are held by the currently executing method. As for why, I’m not sure, but it might be due to JIT tracking being enabled.

Fixing this is easy – just move the object access to its own method, and mark it with MethodImplOptions.NoInlining to force the object lifetime to be contained separately and not inlined into the calling method:

[TestMethod]
public void TestWeakReferenceFinalize()
{
    var wl = new WeakLazy<TestObject>(() => new TestObject());
    AccessTestObject(wl);
    Assert.IsTrue(wl.IsAlive);

    const int BLOWN = 1024;
    int fuse = 0;
    while (wl.IsAlive)
    {
        GC.Collect();
        if (++fuse == BLOWN)
            Assert.Fail("GC did not clear object.");
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
private void AccessTestObject(WeakLazy<TestObject> wl)
{
    wl.Value.Foo();
}

This causes the unit test to pass on both Debug and Release builds.

Anti-debug with VirtualAlloc’s write watch

A lesser-known feature of the Windows memory manager is that it can maintain write watches on allocations for debugging and profiling purposes. Passing the MEM_WRITE_WATCH flag to VirtualAlloc “causes the system to track pages that are written to in the allocated region”. The GetWriteWatch and ResetWriteWatch APIs can be used to query and reset that record; the “write count” I refer to below is the number of pages reported as written. This can be (ab)used to catch out debuggers and hooks that modify memory outside the expected pattern.

There are four primary ways to exploit this feature.

The first is a simple buffer manipulation check. Allocate a buffer with write watching enabled, write to it once, get the write count, and see if it’s greater than 1.
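A minimal sketch of this first check might look like the following. It assumes a Windows build; the function name write_watch_buffer_check and the four-page buffer are my own choices for illustration, and the non-Windows stub exists only so the snippet compiles elsewhere. Because GetWriteWatch reports written pages rather than individual writes, the buffer spans several pages and only the first one is touched.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>

#define WATCH_PAGES 4

/* Returns 0 if exactly one page was dirtied (what we expect), 1 if the page
 * count differs (something else touched the buffer), and -1 if the check
 * could not be performed. */
int write_watch_buffer_check(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    SIZE_T size = (SIZE_T)si.dwPageSize * WATCH_PAGES;

    /* MEM_WRITE_WATCH has to be combined with MEM_RESERVE */
    unsigned char *buf = VirtualAlloc(NULL, size,
                                      MEM_RESERVE | MEM_COMMIT | MEM_WRITE_WATCH,
                                      PAGE_READWRITE);
    if (buf == NULL)
        return -1;

    buf[0] = 0x41; /* the one write we expect to see recorded */

    PVOID written[WATCH_PAGES];
    ULONG_PTR count = WATCH_PAGES; /* in: array capacity, out: pages written */
    DWORD granularity = 0;
    int result = -1;

    if (GetWriteWatch(0, buf, size, written, &count, &granularity) == 0) {
        assert(count <= WATCH_PAGES);
        result = (count == 1) ? 0 : 1;
    }

    VirtualFree(buf, 0, MEM_RELEASE);
    return result;
}
#else
/* write watches are a Windows feature; report "not performed" elsewhere */
int write_watch_buffer_check(void)
{
    return -1;
}
#endif
```

A caller would treat a return value of 1 as a sign that something other than our own code wrote to the buffer.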

The second is an API buffer manipulation check. Allocate a buffer with write watching enabled, pass it as a parameter to an API that expects a buffer, but pass invalid values to other parameters. If an API hook doesn’t check parameters properly, or manipulates parameters, it may write to the buffer. Check the number of writes to the buffer after the call, and if it’s nonzero then there’s a hook in place. Any API will do as long as it writes to some memory. A particularly good trick is to use an API where there’s some kind of count value passed as a reference – in the real API the value will likely not be set, thus producing no memory writes, but in a hook there’s a bigger likelihood that they’ll set some placeholder value regardless.

Third, we can use the buffer to store the result of some check we care about, e.g. IsDebuggerPresent. If the write count is one and the value in the buffer is FALSE then we can assume that there’s no debugger attached and nobody tampered with the result of the call (or skipped the call).

Finally, we can allocate some memory with RWX protection and write watching enabled, copy some anti-debug check there, call ResetWriteWatch to ensure the write counter is zeroed, execute our payload, then check the write count.

Obviously in all cases these checks themselves can be skipped over, but it’s not a well known trick and may be missed by novice reverse engineers.

I’ve contributed these tricks to al-khaser, a tool for testing VMs, debuggers, sandboxes, AV, etc. against many malware-like defences.

ASUS UEFI Update Driver Physical Memory Read/Write

A short while ago, slipstream/RoL dropped an exploit for the ASUS memory mapping driver (ASMMAP/ASMMAP64), which exposed complete physical memory access (read/write) to unprivileged users, allowing for local privilege escalation and all sorts of other problems. An aside to this was that there were also IOCTLs available to perform direct I/O operations (in/out instructions) directly from unprivileged usermode, which had additional interesting impacts for messing with system firmware without triggering AV heuristics.

When the PoC was released, I noted that I’d reported this to ASUS a while beforehand, later clarifying that I’d actually reported it to them in March 2015. To be fair to ASUS, they were initially very responsive via email, and particularly via Twitter DM on the @ASUSUK account, and it seems like they had some bad luck with both the engineer working on the fixes and the customer support advisor handling my tickets leaving the company, resulting in a significant delay in the triage and patch processes. However, promises to keep me in the loop were not kept, and I was always chasing them up for answers.

In addition to the ASMMAP bugs, I also reported the exact same bugs in their UEFI update driver (AsUpIO.sys). This driver is deployed as part of the usermode UEFI update toolset, and exposes almost identical functionality which (as slipstream/RoL pointed out) is likely from an NT3.1 example driver that was written long before Microsoft took steps to segregate malicious users from physical memory in any meaningful way.

One additional piece of functionality which I believe was missed from the original ASMMAP vulnerability release was the ability to read/write Model Specific Registers (MSRs) as an unprivileged user. This was, again, a function exposed as an IOCTL in the driver. For those of you not versed in MSRs, they’re implementation-specific registers which contain control and status values for the processor and supporting components (e.g. SMM). You can read more about them in chapter 35 of the Intel 64 Architecture Software Developer’s Manual Volume 3. MSRs are particularly powerful registers in that they offer the ability to enable or disable all sorts of internal functionality on the processor, and are at least theoretically capable of bricking hardware if you abuse them in the wrong way. One of the most interesting MSRs is the Extended Feature Enable Register (EFER) at index 0xC0000080, which contains the No-eXecute Enable (NXE) and Secure Virtual Machine Enable (SVME) bits. Switching the NXE bit off on a live VM in VirtualBox crashes the VM with a “Guru Meditation” error (there’s an age cutoff on people who will get that reference), which I suppose is a novel anti-VM trick on its own, not to mention the intended behaviour of switching off NX on real-steel hardware.
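To make the EFER example concrete, here is a small sketch of the bit arithmetic involved. The bit positions are the architectural ones (NXE is bit 11, SVME is bit 12); the helper names and any sample values are made up for illustration, and the snippet deliberately says nothing about the driver's IOCTL interface.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_EFER   0xC0000080u   /* MSR index of EFER */
#define EFER_SCE   (1ull << 0)   /* SYSCALL enable */
#define EFER_LME   (1ull << 8)   /* long mode enable */
#define EFER_LMA   (1ull << 10)  /* long mode active */
#define EFER_NXE   (1ull << 11)  /* no-execute enable */
#define EFER_SVME  (1ull << 12)  /* secure virtual machine enable */

/* Returns non-zero if the NXE bit is set in the given EFER value. */
int efer_nx_enabled(uint64_t efer)
{
    return (efer & EFER_NXE) != 0;
}

/* Computes the value you would write back to EFER to switch NX off,
 * leaving every other bit untouched. */
uint64_t efer_clear_nxe(uint64_t efer)
{
    uint64_t out = efer & ~EFER_NXE;
    assert(!efer_nx_enabled(out));
    return out;
}
```

For instance, a hypothetical EFER value of 0xD01 (SCE, LME, LMA and NXE set) becomes 0x501 once NXE is cleared.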

Rather than just providing you with a bit of PoC code, I thought I’d take the opportunity to go through exactly how I discovered the bugs and what approach I took towards reliable exploitation.

Generally speaking, Windows drivers have a number of interfaces through which usermode code may communicate with them. The most important are I/O Request Packets (IRPs), which are sent to a driver when code performs a particular operation on the driver’s device object. The exposed functions which IRPs are sent to are known as Major Functions, examples of which include open, close, read, write, and I/O control (otherwise known as IOCTL). The descriptor structure for a driver object contains an array of function pointers, each pointing to a dispatch function for a major function. These are fantastic targets for bug-hunting in drivers, since they’re usermode-accessible (and often accessible from non-admin accounts) and can often result in local privilege escalation to kernelmode.

The first key thing to look for is whether or not the driver object is accessible as a low-privilege user. It’s all well and good finding a bug which gets you kernel code execution, but if you’ve got to be an admin to exploit it, it’s a bit of a non-issue. When a driver goes through its initialisation steps, it usually names itself and creates a device object using the IoCreateDevice API, and a symbolic link in the DosDevices object directory using the IoCreateSymbolicLink API. An example is as follows:

NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath)
{
    NTSTATUS status = STATUS_SUCCESS;
    PDEVICE_OBJECT pDeviceObject = NULL;
    UNICODE_STRING driverName, dosDeviceName;
    
    RtlInitUnicodeString(&driverName, L"\\Device\\Example");
    RtlInitUnicodeString(&dosDeviceName, L"\\DosDevices\\Example"); 
    
    status = IoCreateDevice(pDriverObject, 0,
                            &driverName, 
                            FILE_DEVICE_UNKNOWN,
                            FILE_DEVICE_SECURE_OPEN, 
                            FALSE, &pDeviceObject);
    
    // ...
    
    IoCreateSymbolicLink(&dosDeviceName, &driverName);
    
    // ...
    
    return status;
}

In order to check whether or not the driver’s device object is accessible by low-privilege users, we need to know what name it picked for itself. There are a few approaches to this: we could debug the system and breakpoint on the IoCreateDevice API; we could reverse engineer the driver using a tool such as IDA; or we could simply extract all the strings from the binary and look for any that start with “\\Device\\”.
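The last of those options is easy to automate. Driver binaries usually store these names as UTF-16LE string constants, so a rough sketch of the search could look like this. The function name find_device_name and its return convention are my own, not taken from any existing tool.

```c
#include <assert.h>
#include <stddef.h>

/* Scans buf for the UTF-16LE string "\Device\" and copies the path that
 * starts there into out as ASCII. Returns the byte offset of the match,
 * or -1 if the prefix does not appear. */
long find_device_name(const unsigned char *buf, size_t len,
                      char *out, size_t out_len)
{
    static const char prefix[] = "\\Device\\";
    const size_t plen = sizeof(prefix) - 1; /* 8 characters */

    assert(buf != NULL && out != NULL && out_len > 0);

    for (size_t i = 0; i + 2 * plen <= len; i++) {
        /* an ASCII character in UTF-16LE is the character followed by 0x00 */
        size_t k = 0;
        while (k < plen &&
               buf[i + 2 * k] == (unsigned char)prefix[k] &&
               buf[i + 2 * k + 1] == 0)
            k++;
        if (k != plen)
            continue;

        /* copy printable ASCII code units until a terminator or anything else */
        size_t n = 0;
        for (size_t j = i; j + 1 < len && n + 1 < out_len; j += 2) {
            if (buf[j + 1] != 0 || buf[j] < 0x20 || buf[j] > 0x7e)
                break;
            out[n++] = (char)buf[j];
        }
        out[n] = '\0';
        return (long)i;
    }
    return -1;
}
```

This only finds the first match and ignores names built at runtime, so it is a quick triage step rather than a replacement for reading the DriverEntry code.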

In the case of AsUpIO.sys, dropping it into IDA shows that it follows exactly the DriverEntry pattern shown earlier, using the name “AsUpdateio”:

AsUpIO_DriverEntry

This now tells us exactly what we should be looking for. In order to inspect the device object and view its discretionary access control list (DACL), we can use WinObj.

AsUpIO_Object_PoorDACL

As we can see here, the Everyone group is given Read, Write, and Special permissions, allowing the device object to be directly interacted with from low-privilege usermode. Note that these ACEs are not set by the driver; this is a somewhat “hardened” permissions set applied by an up-to-date Windows 10 install, although it is still accessible by everyone. In Windows 8 and earlier, setting the DACL to null simply results in it having no ACEs, allowing everyone complete access, unless you apply a hardened security policy. This is because, prior to Win10, the root object namespace had no inheritable ACEs.

The snippet above also gives us the address of the HandleIoControl function which is assigned to handle the Create, Close, and IOCTL major functions. Reverse engineering this shows the IOCTL number which is used for mapping memory:

AsUpIO_JumpMemoryMap

(note: the ASUS_MemMap name was set by me; I renamed it after analysing each function in this set of branches to work out their functions)

Since we can now talk to the driver, we want to exploit the bugs. In this case the IOCTL for doing the memory mapping is 0xA040244C, which can be found by reverse engineering the HandleIoControl routine we found above. Just as in the original slipstream/RoL exploit, this IOCTL can be used to map any physical memory section to usermode. The downside, from an exploitation perspective, is that the function covers a wide range of potential memory locations, including addresses where the HAL has to translate to bus addresses rather than the usual physical memory. This is fine if we know a specific location we want to map and access, but it becomes a bit fraught if we want to read through all of physical memory; trying to map and read an area of memory reserved for certain hardware might crash or lock up the system, and that’s not much use for a privesc.
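As an aside, IOCTL numbers aren't arbitrary: the CTL_CODE macro in the WDK headers packs a device type, required access, function number, and transfer method into the 32-bit value. The sketch below unpacks the memory mapping IOCTL using that layout. The struct and function names are mine, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an I/O control code, following the CTL_CODE bit layout. */
typedef struct {
    uint32_t device_type; /* bits 31..16 */
    uint32_t access;      /* bits 15..14 */
    uint32_t function;    /* bits 13..2  */
    uint32_t method;      /* bits 1..0   */
} ioctl_fields;

/* Splits a raw IOCTL number into its four fields. */
ioctl_fields decode_ioctl(uint32_t code)
{
    ioctl_fields f;
    f.device_type = code >> 16;
    f.access      = (code >> 14) & 0x3u;
    f.function    = (code >> 2) & 0xFFFu;
    f.method      = code & 0x3u;
    return f;
}

/* Rebuilds the IOCTL number, mirroring what CTL_CODE does. */
uint32_t encode_ioctl(ioctl_fields f)
{
    assert(f.device_type <= 0xFFFFu && f.access <= 0x3u);
    assert(f.function <= 0xFFFu && f.method <= 0x3u);
    return (f.device_type << 16) | (f.access << 14) |
           (f.function << 2) | f.method;
}
```

Decoding 0xA040244C this way gives device type 0xA040, function 0x913, method 0 (METHOD_BUFFERED) and access 0 (FILE_ANY_ACCESS), which fits with the IOCTL being callable without any particular access rights on the handle.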

The approach I took was to find a specific location in kernel memory which I knew was safe, then map and read that, and use that single operation to gain the ability to reliably read memory over the lifetime of the system. The ideal object to gain control over is \Device\PhysicalMemory, as this gives us direct usermode access to the physical memory. The first hurdle is that we need a kernel pointer leak to identify the physical address of that object’s descriptor.

First, we want to know what processes have a handle to this object. By running Process Explorer as an administrator (we don’t need to do this in an actual attack scenario), we can see that the System process keeps a handle to it open:

SystemProcess_PhysicalMemoryHandle

Using an undocumented feature of the NtQuerySystemInformation API, i.e. the SystemHandleInformation information class, we can pull out information about every single handle open on the system. The returned structure for each handle looks like the following:

typedef struct _SYSTEM_HANDLE_INFORMATION
{
    DWORD    dwProcessId;
    BYTE     bObjectType;
    BYTE     bFlags;
    WORD     wValue;
    PVOID    pAddress;
    DWORD    GrantedAccess;
}
SYSTEM_HANDLE;

The pAddress field points to the kernel memory address of the object’s descriptor. By enumerating through all open handles on the system and checking for dwProcessId=4 (i.e. the System process) and bObjectType matching the object type ID of a section (this differs between Windows versions), we can find all the sections open by the System process, one of which we know will be \Device\PhysicalMemory. In fact, System only has three handles open to sections in Windows 10, so we can just give ourselves access to all of them and not worry too much.
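The filtering step can be sketched as follows. To keep it portable I've used a fixed-width mirror of the structure above with the kernel address held as an integer, and the section type ID is passed in as a parameter because it varies between Windows versions. The names handle_entry and find_system_sections are hypothetical, and the code assumes the handle list has already been retrieved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTEM_PID 4u

/* Portable mirror of the per-handle structure shown above. */
typedef struct {
    uint32_t process_id;
    uint8_t  object_type;
    uint8_t  flags;
    uint16_t handle_value;
    uint64_t object_address; /* kernel address of the object */
    uint32_t granted_access;
} handle_entry;

/* Collects the kernel addresses of every section object opened by the
 * System process. Stores at most out_cap addresses in out and returns the
 * total number of matches found. */
size_t find_system_sections(const handle_entry *entries, size_t count,
                            uint8_t section_type,
                            uint64_t *out, size_t out_cap)
{
    size_t found = 0;

    assert(entries != NULL || count == 0);
    assert(out != NULL || out_cap == 0);

    for (size_t i = 0; i < count; i++) {
        if (entries[i].process_id != SYSTEM_PID)
            continue;
        if (entries[i].object_type != section_type)
            continue;
        if (found < out_cap)
            out[found] = entries[i].object_address;
        found++;
    }
    return found;
}
```

Each address returned is a candidate for \Device\PhysicalMemory, which is why it's handy that System holds so few section handles.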

Of course, now that we have the address of the section descriptor in kernel memory, we still need to actually take control of that section object somehow. Let’s take a look at the header structure for the object, 0x30 bytes before the section descriptor, in WinDbg:

0: kd> dt nt!_OBJECT_HEADER 0xffffc001`cca13bd0-0x30
   +0x000 PointerCount     : 0n65537
   +0x008 HandleCount      : 0n2
   +0x008 NextToFree       : 0x00000000`00000002 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0x23 '#'
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0x2 ''
   +0x01b Flags            : 0x16 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y1
   +0x01b KernelOnlyAccess : 0y1
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y1
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Spare            : 0x5000000
   +0x020 ObjectCreateInfo : 0x00000000`00000001 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0x00000000`00000001 Void
   +0x028 SecurityDescriptor : 0xffffc001`cca12273 Void
   +0x030 Body             : _QUAD

Now, remember earlier when I said that having a DACL set to null gives everyone access? The SecurityDescriptor field here is, in fact, exactly what gets set to null in such a situation. If we overwrite the field with zeroes, then (theoretically) everyone has access to the object. However, this object is a special case: it has the KernelOnlyAccess flag set. This means that no usermode processes can gain a handle to it. We need to switch this off too, so we set the Flags field to 0x10 to keep the PermanentObject flag but clear the rest:

0: kd> eb (0xffffc001`cca13bd0-0x30)+0x1b 0x10
0: kd> eq (0xffffc001`cca13bd0-0x30)+0x28 0
0: kd> dt nt!_OBJECT_HEADER 0xffffc001`cca13bd0-0x30
   +0x000 PointerCount     : 0n65537
   +0x008 HandleCount      : 0n2
   +0x008 NextToFree       : 0x00000000`00000002 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0x23 '#'
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0x2 ''
   +0x01b Flags            : 0x10 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y0
   +0x01b KernelOnlyAccess : 0y0
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y1
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Spare            : 0x5000000
   +0x020 ObjectCreateInfo : 0x00000000`00000001 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0x00000000`00000001 Void
   +0x028 SecurityDescriptor : (null) 
   +0x030 Body             : _QUAD
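The arithmetic behind those two edits is simple enough to express in code. This sketch takes its offsets and bit order from the dt output above, so it should be read as specific to that x64 Windows 10 layout; the macro and function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* offsets taken from the dt nt!_OBJECT_HEADER output above */
#define OBJECT_HEADER_SIZE   0x30u
#define OFFSET_FLAGS         0x1bu
#define OFFSET_SECURITY_DESC 0x28u

/* bit values follow the order of the bitfields in the dump */
#define OB_FLAG_NEW_OBJECT         0x01u
#define OB_FLAG_KERNEL_OBJECT      0x02u
#define OB_FLAG_KERNEL_ONLY_ACCESS 0x04u
#define OB_FLAG_EXCLUSIVE_OBJECT   0x08u
#define OB_FLAG_PERMANENT_OBJECT   0x10u

/* The header sits immediately before the object body. */
uint64_t object_header_address(uint64_t body)
{
    assert(body >= OBJECT_HEADER_SIZE);
    return body - OBJECT_HEADER_SIZE;
}

/* Address of the one-byte Flags field (target of the eb command). */
uint64_t flags_address(uint64_t body)
{
    return object_header_address(body) + OFFSET_FLAGS;
}

/* Address of the SecurityDescriptor pointer (target of the eq command). */
uint64_t security_descriptor_address(uint64_t body)
{
    return object_header_address(body) + OFFSET_SECURITY_DESC;
}

/* Keeps PermanentObject and clears every other flag, which also drops
 * KernelObject and KernelOnlyAccess. */
uint8_t patched_flags(uint8_t flags)
{
    return (uint8_t)(flags & OB_FLAG_PERMANENT_OBJECT);
}
```

With the body address from the dump, the header is at 0xffffc001`cca13ba0, and the original Flags value of 0x16 (KernelObject, KernelOnlyAccess and PermanentObject) becomes 0x10.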

Now that the KernelOnlyAccess and SecurityDescriptor fields are zeroed out, we can gain access to the object from usermode as a non-administrative user:

PhysicalMemory_NULL_DACL

In a real exploitation scenario we’d do these edits via the driver bug rather than WinDbg, mapping the page containing the object header and writing to it directly.

Disabling the flags and clearing the security descriptor allows us to map the PhysicalMemory object into any process and use it to gain further control over the system, without worrying about the weird intricacies of how the driver handles certain addresses. This can be done by scanning for EPROCESS structures within memory and identifying one, then jumping through the linked list to find your target process and a known SYSTEM process (e.g. lsass), then duplicating the Token field across to elevate your process. This part isn’t really that novel or interesting, so I won’t go into it here.

One tip I will mention is that you can make the exploitation process much more reliable if you set your process’ priority as high as possible and spin up threads which perform tight loops to keep the processor busy doing nothing, while you mess with memory. This helps keep kernel threads and other process threads from being scheduled as frequently, making it less likely for you to hit a race condition and bugcheck the machine. I only saw this happen once during the hundred or so debugging sessions I did, so it’s not critical, but still worth keeping in mind.

In closing, I hope the teardown of this bug and my exploitation process has been useful to you. While you certainly can directly exploit the bug, it’s not without potential peril, and it’s often safer to pivot to a more stable approach.

Take-home points for security people and driver developers in general:

  • WHQL does not mean the code is secure, nor does it even mean the code is stable or safe. Microsoft happily signed a number of these drivers with vulnerability-as-a-feature code within them. These bugs were trivially identifiable; this indicates that WHQL is likely an automated process to ensure adherence to little more than a “don’t use undocumented / unsupported functions” requirement.
  • Ensure that appropriate DACLs are placed on objects, particularly the device object, via the use of IoCreateDeviceSecure and the Attributes parameters to Create* calls (e.g. CreateMutex, CreateEvent, CreateSemaphore). A null DACL means anyone can access the object.
  • Drivers should not expose administrative functionality (e.g. UEFI updates) to non-administrative users (e.g. the Everyone group). Ensure that object DACLs reflect this.

Take-home points for ASUS:

  • Implement a security contact mailbox as per the guidance in RFC2142 and ensure that it is checked and managed by someone versed in security. Create a page on your website which lists this contact and outlines your expectations from researchers when reporting security issues.
  • Your Twitter support staff are better at communicating with customers than your support ticket people. You could stand to learn from their more informal and responsive model.
  • Ensure that anything assigned to someone who leaves the company is appropriately reassigned with guidance from that individual. This should help ensure that patches don’t end up delayed by 15 months.
  • Get your code assessed by a 3rd party security contractor before releasing it to customers, and ensure that your developers are given appropriate training on secure development practices. The vulnerable code used was likely copied from examples into a number of your drivers, which indicates that problems may be widespread.

Disclosure timeline:

  • 24th March 2015 – Submitted bug as ticket to ASUS (WTM20150324082900771)
  • 25th March 2015 – Acknowledgement from ASUS
  • 25th March 2015 – Sent reply email with additional information.
  • 27th March 2015 – Reply from “J” from ASUS, says engineer has a fix and is liaising with their own security researcher on the matter.
  • < I forgot about the issue for a long time >
  • 4th September 2015 – Sent email to query status of the issue.
  • 7th September 2015 – Reply from “Anthony” from ASUS, informing me that the agent I’d been interacting with before had left the company, asking for more details on the issue.
  • 7th September 2015 – Sent a response with another full report of the issue.
  • 21st September 2015 – No reply, sent a request for a status update.
  • 22nd September 2015 – Contacted @ASUSUK on Twitter. Had conversation via DM trying to get a status update.
  • 28th September 2015 – Chased up @ASUSUK for an update.
  • 29th September 2015 – Reply informing me that the HQ office in Taipei was closed due to a typhoon.
  • 7th October 2015 – Sent another chase-up message to @ASUSUK.
  • 7th October 2015 – Reply from them; no updates from the office but a promise to let me know when the patch is out.
  • 25th November 2015 – Another chase-up DM to @ASUSUK.
  • 25th November 2015 – HQ were offline, told I’d get a reply the next day. No reply came.
  • 9th May 2016 – Still nothing back from ASUS via email or Twitter, sent another chase-up email and DM informing them of my intent to disclose within 28 days due to the long delays in releasing a fix.
  • 10th May 2016 – Told that Anthony is OOO until Monday.
  • 12th May 2016 – Told that the delays were due to the project leader at HQ leaving, they’re trying to source someone to fix it and push a fix out ASAP.
  • 12th May 2016 – Sent reply asking to be kept in the loop. ASUS replies saying they’ll keep me informed.
  • 12th June 2016 – Disclosed.

Vulnerable file details:

  • MD5: 1392B92179B07B672720763D9B1028A5
  • SHA1: 8B6AA5B2BFF44766EF7AFBE095966A71BC4183FA
  • Signing certificate serial number: 12 d5 c9 e2 94 9d 48 ab ac cd 35 14 f0 fb 22 ad

Talking about Windows drivers at 44CON 2015’s Community Evening

I’ll be speaking at 44CON this year, at the community evening on Wednesday 9th September. The community evening is free to attend – you just need to register to attend if you don’t have a conference ticket. My talk is currently scheduled at 19:45, and I’m speaking about writing Windows drivers, with the goal of leaving you a bit more informed about how they work, and how to get started.

In addition to my talk, Saumil Shah will be speaking about Stegosploit, and Michael Boman will be running a workshop on anti-analysis techniques used in malware. After the talks, there will be a showing of the 20th anniversary edition of Hackers, which is guaranteed to be fun.

As usual, there will be drinks and good conversation. Hope to see you all there! 🙂

W^X policy violation affecting all Windows drivers compiled in Visual Studio 2013 and previous

Back in June, I was doing some analysis on a Windows driver and discovered that the INIT section had the read, write, and executable characteristics flags set. Windows executables (drivers included) use these flags to tell the kernel what memory protection flags should be applied to that section’s pages once the contents are mapped into memory. With these flags set, the memory pages become both writable and executable, which violates the W^X policy, a concept which is considered good security practice. This is usually considered a security issue because it can give an attacker a place to write arbitrary code when staging an exploit, similar to how pre-NX exploits used to use the stack as a place to execute shellcode.

While investigating these section flags in the driver, I also noticed a slightly unusual flag was set: the DISCARDABLE flag. Marking a section as discardable in user-mode does nothing; the flag is meaningless. In kernel-mode, however, the flag causes the section’s pages to be unloaded after initialisation completes. There’s not a lot of documentation around this behaviour, but the best resource I discovered was an article on Raymond Chen’s “The Old New Thing” blog, which links off to some other pages that describe the usage and behaviour in various detail. I’d like to thank Hans Passant for giving me some pointers here, too.

The short version of the story is that the INIT section contains the DriverEntry code (think of this like the ‘main()’ function of a driver), and it is marked as discardable because it isn’t used after the DriverEntry function returns. From gathering together scraps of information on this behaviour, it seems that the compiler does this because the memory that backs the DriverEntry function must be pageable (though I’m not sure why), but any driver code which may run at IRQLs at or above DISPATCH_LEVEL must not try to access any memory pages that are pageable, because there’s no guarantee that the OS can service the resulting page fault. This is further evidenced by the fact that the CODE section of drivers is always flagged with the NOT_PAGED characteristic, whereas INIT is not. By discarding the INIT section, there can be no attempt to execute this pageable memory outside of the initialisation phase. My understanding of this is incomplete, so if anyone has any input on this, please let me know.
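
To make the flags a bit more concrete, here’s a minimal sketch of decoding the memory-related bits of a section’s Characteristics field. The constant values are the ones from the PE/COFF specification (the names match winnt.h); the helper names and the example value are my own, made up for illustration rather than taken from any particular driver.

```python
# Memory-related section characteristic bits (PE/COFF specification).
IMAGE_SCN_MEM_DISCARDABLE = 0x02000000
IMAGE_SCN_MEM_NOT_PAGED = 0x08000000
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG_NAMES = {
    IMAGE_SCN_MEM_DISCARDABLE: "DISCARDABLE",
    IMAGE_SCN_MEM_NOT_PAGED: "NOT_PAGED",
    IMAGE_SCN_MEM_EXECUTE: "EXECUTE",
    IMAGE_SCN_MEM_READ: "READ",
    IMAGE_SCN_MEM_WRITE: "WRITE",
}


def describe_characteristics(value):
    """Return the names of the memory-related flags set in `value`."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


def violates_wx(value):
    """True if the section would be mapped both writable and executable."""
    both = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE
    return (value & both) == both


# Hypothetical value: code + discardable + read/write/execute.
EXAMPLE_INIT = 0xE2000020

print(describe_characteristics(EXAMPLE_INIT))
print(violates_wx(EXAMPLE_INIT))
```

The check deliberately ignores DISCARDABLE: the section is still writable and executable for as long as it stays mapped.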

The DISCARDABLE behaviour means that the window of exploitation for targeting the memory pages in the INIT section is much smaller – a vulnerability must be triggered during the initialisation phase of a driver (before the section is discarded), and that driver’s INIT section location must be known. This certainly isn’t a vulnerability on its own (you need at least a write-what-where bug to leverage this) but it is also certainly bad practice.

Here’s where things get fun: in order to compare the driver I was analysing to a “known good” sample, I looked into some other drivers I had on my system. Every single driver I investigated, including ones that are core parts of the operating system (e.g. win32k.sys), had the same protection flags. At this point I was a little stumped – perhaps I got something wrong, and the writable flag is needed for some reason? In order to check this, I manually cleared the writable flag on a test driver, and loaded it. It worked just fine, as did several other test samples, from which I can surmise that it is superfluous. I also deduced that this must be a compiler (or linker) issue, since both Microsoft drivers and 3rd party drivers had the same issue. I tried drivers compiled with VS2005, VS2010, and VS2013, and all seemed to be affected, meaning that pretty much every driver on Windows Vista to Windows 8.1 is guaranteed to suffer from this behaviour, and Windows XP drivers probably do too.
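
If you want to repeat this kind of survey yourself, something along the following lines should do. It’s a sketch, not the tooling I actually used: it walks the section table of a PE image held in memory, lists the sections that are both writable and executable, and can produce a copy with the writable bit cleared on a named section. The function names are hypothetical, and it does no validation beyond the two signatures. Bear in mind that patching a header byte invalidates any signature on the file and leaves the header checksum stale, so this is only suitable for test drivers on a test box.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000
SECTION_HEADER_SIZE = 40


def iter_sections(image):
    """Yield (name, characteristics, offset of the characteristics field)."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature.
    (num_sections,) = struct.unpack_from("<H", image, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", image, pe_offset + 20)
    table = pe_offset + 24 + opt_header_size
    for index in range(num_sections):
        header = table + index * SECTION_HEADER_SIZE
        name = image[header:header + 8].rstrip(b"\0").decode("ascii", "replace")
        field = header + 36  # Characteristics is the last 4 bytes
        (characteristics,) = struct.unpack_from("<I", image, field)
        yield name, characteristics, field


def writable_executable_sections(image):
    """Return the names of sections flagged both writable and executable."""
    both = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE
    return [name for name, chars, _ in iter_sections(image)
            if (chars & both) == both]


def clear_write_flag(image, section_name):
    """Return a copy of `image` with the WRITE bit cleared on one section."""
    patched = bytearray(image)
    for name, chars, field in iter_sections(image):
        if name == section_name:
            struct.pack_into("<I", patched, field,
                             chars & ~IMAGE_SCN_MEM_WRITE)
    return bytes(patched)
```

Usage would be roughly `writable_executable_sections(open(path, "rb").read())` for each driver of interest, where `path` is whatever file you are checking.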

INIT section of ATAPI Driver from Windows 8.1

While the target distribution appears to be pretty high, the only practical exploitation path I can think of is as follows:

  1. Application in unprivileged usermode can trigger a driver to be loaded on demand.
  2. Driver leaks a pointer (e.g. via debug output) during initialisation which can be used to determine the address of DriverEntry in memory.
  3. A write-what-where bug in another driver or in the kernel that is otherwise very difficult to exploit (e.g. due to KASLR, DEP, KPP, etc.) is triggered before the DriverEntry completes.
  4. Bug is used to overwrite the end of the DriverEntry function.
  5. Arbitrary code is executed in kernel.

This is a pretty tall order, but there are some things that make it more likely for some of the conditions to arise. First, since any driver can be used (they all have INIT marked as RWX) you only need to find one that you can trigger from unprivileged usermode. Ordinarily the race condition between step 1 and step 4 would be difficult to hit, but if the DriverEntry calls any kind of synchronisation routine (e.g. ZwWaitForSingleObject) then things get a lot easier, especially if the target sync object happens to have a poor or missing DACL, allowing for manipulation from unprivileged usermode code. These things make it a little easier, but it’s still not very likely.

Since I was utterly stumped at this point as to why drivers were being compiled in this way, I decided to contact Microsoft’s security team. Things were all quiet on that front for a long time; aside from an acknowledgement, I actually only heard back from them yesterday (2015/09/03). To be fair to them, though, it was a complicated issue and even I wasn’t very sure as to its impact, and I forgot all about it until their response email.

Their answer was as follows:

After discussing this issue internally, we have decided that no action will be taken at this time. I am unable to allocate more resources to answer your questions more specifically, but we do thank you for your concern and your commitment to computer security.

And I can’t blame them. Exploiting this issue would need a powerful attack vector already, and even then it’d be pretty rare to find the prerequisite conditions. The only thing I’m a bit bummed about is that they couldn’t get anyone to explain how it all works in full.

But the story doesn’t end there! In preparation for writing this blog post, I opened up a couple of Microsoft’s drivers on my Windows 10 box to refresh my memory, and found that they no longer had the execute flag set on the INIT section. It seems that Microsoft probably patched this issue in Visual Studio 2015, or in a hotfix for previous versions, so that it no longer happens. Makes me feel all warm and fuzzy inside. I should note, however, that 3rd party drivers such as Nvidia’s audio and video drivers still have the same issue, which implies that they haven’t been recompiled with a version of Visual Studio that contains the fix. I suspect that many vendor drivers will continue to have this issue.

I asked Microsoft whether it had been fixed in VS2015, but they wouldn’t comment on the matter. Since I don’t have a copy of VS2015 yet, I can’t verify my suspicion that they fixed it.

In closing, I’d like to invite anyone who knows more than me about this to provide more information about how/why the INIT section is used and discarded. If you’ve got a copy of VS2015 and can build a quick Hello World driver to test it out, I’d love to see whether it has the RWX issue on INIT.


Disclosure timeline:

  • 29th June 2015 – Discovered initial driver bug
  • 30th June 2015 – Discovered wider impact (all drivers affected)
  • 2nd July 2015 – Contacted Microsoft with report / query
  • 2nd July 2015 – Microsoft replied with acknowledgement
  • 6th July 2015 – Follow-up email sent to Microsoft
  • [ mostly forgot about this, so I didn’t chase it up ]
  • 3rd September 2015 – Microsoft respond (see above)
  • 3rd September 2015 – Acknowledgement email sent to Microsoft, querying fix status
  • 4th September 2015 – Microsoft respond, will not comment on fix status
  • 4th September 2015 – Disclosed

Another year, another Securi-Tay, another talk… and this time we’re sponsoring the bar!

Another year has rolled by (damn, I really don’t update this blog much, do I?) and Securi-Tay IV is coming up. I’ll be speaking about security issues related to serialisation and deserialisation of data in modern programming languages, including PHP and C#.

My colleague FreakyClown will be talking about robbing banks for a living, which promises to be amusing at the very least (which reminds me – ask me about coathangers and server rooms when you see me).

Most importantly though: we (and by that I mean Portcullis) are sponsoring the bar this year! I hope to see you all there for plenty of drunken security rambling.

Pentesting Java EE web applications with LAPSE+

Just a quick tip for anyone doing a code review of a Java EE web application: LAPSE+ is a very useful tool to have in the arsenal, whether you’ve got the original source or just the JAR/WAR file.

In my case, the client provided me with a single .WAR file which contained the application. As it was a large application, I didn’t really fancy digging through everything manually with JD-GUI, although it is an excellent Java decompiler. I decided to take the opportunity to give LAPSE+ a try.

Here’s what you’ll need:

You can also grab a PDF instruction manual for LAPSE from the same site. However, be aware that I found some of the information in there to be a bit misleading, e.g. needing a specific version of Eclipse. Also, don’t worry if your client provided you a project for a different IDE, such as IntelliJ IDEA – it doesn’t really matter.

First step is to get Eclipse set up. Drop the .jar file from the LAPSE+ archive into the plugins directory of Eclipse. (Re)start Eclipse, then go to Window -> Show View -> Other… and select the items relating to LAPSE+. A little toolbar should appear on the right with blue spherical buttons. These are your LAPSE+ windows.

Next step is to load your code into a project. This is split up into two parts, but if you’ve already got an Eclipse project for the site’s source code, you can skip the first part. Otherwise, you’ll need to extract the code from your archive and make a project for it. Start by loading the JAR (rename the .WAR to .JAR if needs be) into JD-GUI. It should decompile the archive and let you browse the code. Go to File -> Save All Sources, and save the resulting ZIP file somewhere. This archive now contains all your decompiled source code, split into directories based on the package hierarchy.
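
Before decompiling anything, it can be handy to get a feel for what’s actually in the archive. A WAR is just a ZIP file with the application’s compiled classes under WEB-INF/classes/ and bundled libraries under WEB-INF/lib/, so a quick script can list both. This is only a sketch; the function names are my own and it isn’t part of LAPSE+ or JD-GUI.

```python
import zipfile

CLASSES_PREFIX = "WEB-INF/classes/"
LIB_PREFIX = "WEB-INF/lib/"


def list_war_classes(war):
    """Return fully qualified class names found under WEB-INF/classes/.

    `war` may be a path or a file-like object opened in binary mode.
    """
    names = []
    with zipfile.ZipFile(war) as archive:
        for entry in archive.namelist():
            if entry.startswith(CLASSES_PREFIX) and entry.endswith(".class"):
                # Strip the prefix and extension, then turn the directory
                # path into a dotted package name.
                relative = entry[len(CLASSES_PREFIX):-len(".class")]
                names.append(relative.replace("/", "."))
    return sorted(names)


def list_bundled_jars(war):
    """Return the file names of libraries bundled under WEB-INF/lib/."""
    with zipfile.ZipFile(war) as archive:
        return sorted(entry[len(LIB_PREFIX):]
                      for entry in archive.namelist()
                      if entry.startswith(LIB_PREFIX)
                      and entry.endswith(".jar"))
```

The class list gives you a rough idea of how much application code there is to review, and the JAR list tells you which third-party libraries are along for the ride.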

Now, go back to Eclipse and create an empty Java project, filling the wizard out with whatever values suit you. Once that’s created, go into the project explorer tree and find the src directory, then right click it and select Import. Select your newly exported ZIP file, and Eclipse will populate your project with your reverse-engineered source. Now right-click the project and select Build. In all likelihood, it’ll throw a whole load of errors due to imperfect decompilation – don’t worry, we don’t really care, because LAPSE+ can still function with a broken build.

Once you’ve got your project set up, go to the individual LAPSE+ windows and browse through what they found. You might need to manually refresh them to run through the checking process. In my case, I found about a 10:1 ratio of false positives, which isn’t actually too bad for code scanning. Within an hour or so of digging through the results I’d found a couple of concrete XSS bugs that I’d not spotted yet, plus a whole bunch of potential XSS bugs that I couldn’t immediately find vectors for, and a whole variety of other interesting stuff to dig through. It’s a really nice way to cut down a 400kLoC project into manageable target points.