W^X policy violation affecting all Windows drivers compiled in Visual Studio 2013 and previous

Back in June, I was doing some analysis on a Windows driver and discovered that the INIT section had the read, write, and executable characteristics flags set. Windows executables (drivers included) use these flags to tell the kernel what memory protection flags should be applied to that section’s pages once the contents are mapped into memory. With these flags set, the memory pages become both writable and executable, which violates the W^X policy, a concept which is considered good security practice. This is usually considered a security issue because it can give an attacker a place to write arbitrary code when staging an exploit, similar to how pre-NX exploits used to use the stack as a place to execute shellcode.

While investigating these section flags in the driver, I also noticed a slightly unusual flag was set: the DISCARDABLE flag. Marking a section as discardable in user-mode does nothing; the flag is meaningless. In kernel-mode, however, the flag causes the section’s pages to be unloaded after initialisation completes. There’s not a lot of documentation around this behaviour, but the best resource I discovered was an article on Raymond Chen’s “The Old New Thing” blog, which links off to some other pages that describe the usage and behaviour in various detail. I’d like to thank Hans Passant for giving me some pointers here, too. The short version of the story is that the INIT section contains the DriverEntry code (think of this like the ‘main()’ function of a driver), and it is marked as discardable because it isn’t used after the DriverEntry function returns. From gathering together scraps of information on this behaviour, it seems to be that the compiler does this because the memory that backs the DriverEntry function must be pageable (though I’m not sure why), but any driver code which may run at IRQLs higher than DISPATCH_LEVEL must not try to access any memory pages that are pageable, because there’s no guarantee that the OS can service the memory access operation. This is further evidenced by the fact that the CODE section of drivers is always flagged with the NOT_PAGED characteristic, whereas INIT is not. By discarding the INIT section, there can be no attempt to execute this pageable memory outside of the initialisation phase. My understanding of this is incomplete, so if anyone has any input on this, please let me know.

The DISCARDABLE behaviour means that the window of exploitation for targeting the memory pages in the INIT section is much smaller – a vulnerability must be triggered during the initialisation phase of a driver (before the section is discarded), and that driver’s INIT section location must be known. This certainly isn’t a vulnerability on its own (you need at least a write-what-where bug to leverage this) but it is also certainly bad practice.

Here’s where things get fun: in order to compare the driver I was analysing to a “known good” sample, I looked into some other drivers I had on my system. Every single driver I investigated, including ones that are core parts of the operating system (e.g. win32k.sys), had the same protection flags. At this point I was a little stumped – perhaps I got something wrong, and the writable flag is needed for some reason? In order to check this, I manually cleared the writable flag on a test driver, and loaded it. It worked just fine, as did several other test samples, from which I can surmise that it is superfluous. I also deduced that this must be a compiler (or linker) issue, since both Microsoft drivers and 3rd party drivers had the same issue. I tried drivers compiled with VS2005, VS2010, and VS2013, and all seemed to be affected, meaning that pretty much every driver on Windows Vista to Windows 8.1 is guaranteed to suffer from this behaviour, and Windows XP drivers probably do too.

INIT section of ATAPI Driver from Windows 8.1

While the target distribution appears to be pretty high, the only practical exploitation path I can think of is as follows:

  1. Application in unprivileged usermode can trigger a driver to be loaded on demand.
  2. Driver leaks a pointer (e.g. via debug output) during initialisation which can be used to determine the address of DriverEntry in memory.
  3. A write-what-where bug in another driver or in the kernel that is otherwise very difficult to exploit (e.g. due to KASLR, DEP, KPP, etc.) is triggered before the DriverEntry completes.
  4. Bug is used to overwrite the end of the DriverEntry function.
  5. Arbitrary code is executed in kernel.

This is a pretty tall order, but there are some things that make it more likely for some of the conditions to arise. First, since any driver can be used (they all have INIT marked as RWX) you only need to find one that you can trigger from unprivileged usermode. Ordinarily the race condition between step 1 and step 4 would be difficult to hit, but if the DriverEntry calls any kind of synchronisation routine (e.g. ZwWaitForSingleObject) then things get a lot easier, especially if the target sync object happens to have a poor or missing DACL, allowing for manipulation from unprivileged usermode code. These things make it a little easier, but it’s still not very likely.

Since I was utterly stumped at this point as to why drivers were being compiled in this way, I decided to contact Microsoft’s security team. Things were all quiet on that front for a long time; aside from an acknowledgement, I actually only heard back from yesterday (2015/09/03). To be fair to them, though, it was a complicated issue and even I wasn’t very sure as to its impact, and I forgot all about it until their response email.

Their answer was as follows:

After discussing this issue internally, we have decided that no action will be taken at this time. I am unable to allocate more resources to answer your questions more specifically, but we do thank you for your concern and your commitment to computer security.

And I can’t blame them. Exploiting this issue would need a powerful attack vector already, and even then it’d be pretty rare to find the prerequisite conditions. The only thing I’m a bit bummed about is that they couldn’t get anyone to explain how it all works in full.

But the story doesn’t end there! In preparation for writing this blog post, I opened up a couple of Microsoft’s drivers on my Windows 10 box to refresh my memory, and found that they no longer had the execute flag set on the INIT section. It seems that Microsoft probably patched this issue in Visual Studio 2015, or in a hotfix for previous versions, so that it no longer happens. Makes me feel all warm and fuzzy inside. I should note, however, that 3rd party drivers such as Nvidia’s audio and video drivers still have the same issue, which implies that they haven’t been recompiled with a version of Visual Studio that contains the fix. I suspect that many vendor drivers will continue to have this issue.

I asked Microsoft whether it had been fixed in VS2015, but they wouldn’t comment on the matter. Since I don’t have a copy of VS2015 yet, I can’t verify my suspicion that they fixed it.

In closing, I’d like to invite anyone who knows more than me about this to provide more information about how/why the INIT section is used and discarded. If you’ve got a copy of VS2015 and can build a quick Hello World driver to test it out, I’d love to see whether it has the RWX issue on INIT.

Disclosure timeline:

  • 29th June 2015 – Discovered Initial driver bug
  • 30th June 2015 – Discovered wider impact (all drivers affected)
  • 2nd July 2015 – Contacted Microsoft with report / query
  • 2nd July 2015 – Microsoft replied with acknowledgement
  • 6th July 2015 – Follow-up email sent to Microsoft
  • [ mostly forgot about this, so I didn’t chase it up ]
  • 3rd September 2015 – Microsoft respond (see above)
  • 3rd September 2015 – Acknowledgement email sent to Microsoft, querying fix status
  • 4th September 2015 – Microsoft respond, will not comment on fix status
  • 4th September 2015 – Disclosed

Preventing executable analysis – Part 1, Static Analysis

In this series of posts, I’m going to discuss executable analysis, the methods that are used and mechanisms to prevent them. There are three types of analysis that can be performed on executables:

  • Static – Analysis of the sample file on disk.
  • Emulated – Branch and stack analysis of the sample through an emulator.
  • Live – Analysis of the executing sample on a VM, usually using hooks.

I’m going to look at each type in detail, giving examples of techniques used in each and ways to make analysis difficult.

In this first post, I’ll look at static analysis. This type of analysis involves parsing the executable and disassembling the code and data contained within it, without ever running it. The benefit of this is that it’s safe, since it’s impossible for the code to cause any damage. The downside is that static analysis can’t really make assumptions about high-level behaviours.

Entry Point Check
The first method used to perform static analysis is simple header checks. If the entry point (EP) of the executable resides outside of a section marked as code, it is safe to assume that the application isn’t “normal”. In order to prevent recognising this from being a simple task, the executable should have its BaseOfCode header pointing at the same section the EP is in, even when packed.

Executables are often packed – i.e. their code is encrypted in some way. We can analyse this using entropy calculations on each section, to discover how “random” the data looks. It’s often tempting for authors to try to create a good cipher for encrypting packed sections, but this often leads to a few problems. Firstly, entropy calculations will very quickly spot sections that look too random to be normal code or data. Secondly, there are many applications out there that will look for sequences of data and instructions that match known cryptographic algorithms. It’s relatively easy to spot magic numbers and S-box arrays

In order to prevent this, a packing algorithm should be used that preserves the statistical signature of the original data. A good way to do this is to flip only the lowest two bits of each byte, or to simply shuffle the data rather than encrypting it with xor or a similar operation. By definition, a sample of data will have the same Shannon entropy regardless of how much you shuffle it. The usual way that analysis tools work is to split each section into blocks and compute an entropy graph across the file. By using a cipher that only shuffles bytes that are close, you can achieve an almost identical entropy graph:

Entropy Graph

Since instructions are multi-byte, shuffling completely destroys the code, making it impossible to read. It’s relatively simple to perform half-decent shuffling, given a reasonably large key:

for each byte k in key
	tmp = data[0]
	data[0] = data[k]
	data[k] = tmp

Simply loop the above over a sequence of data, you’ll get reasonable shuffling within each 256-byte block. OllyDbg doesn’t recognise this as packed, since it works on counts of particularly common bytes in code sections.

Jump Tables
Static analysis tools such as IDA Pro work by mapping sequences of jumps together. Some enhance this by performing heuristic analysis of jumps, for example turning jmp [file.exe+0x420c0] into an assumed jump based on the data at file offset 0x420c0. We can try to defeat this type of analysis by using jump tables. These are pointer tables generated at runtime, which are encrypted or obfuscated on disk. Jumps in the code are done by pointing to offsets in the jump table. Often this is further obfuscated by using jumps to register pointers, or stack jumps:

; ecx = function ID
mov eax, [ptrToTable+ecx*4] ; load the encrypted pointer into eax
xor eax, [ptrToKey+ecx*4]   ; xor with the key
push eax                    ; push address to stack
ret                         ; return (jump) to it, obfuscates the jump

Obviously there’s more we can do here – better encryption, values generated at runtime, more obfuscation, etc.

Control Flow Obfuscation
Some analysis tools focus on artifacts of compilers – i.e. the signatures of how common high level language constructs translate into assembly language. For example, some loops may be translated into a dec/jg loop, whereas some others might use rep mov. It all depends on the high level construct in use. By altering these constructs and using them in situations where they are unusual, this can confuse heuristics. One example for short loops is using a switch:

for(int i=0; i<5; i++)
	if(i%2==0) printf("%i is even\n", i);
	else printf("%i is odd\n", i);
	if(i==4) printf("done");

We can turn this into a switch statement that flattens out the flow, instead of being an obvious loop:

for(int i=0; i<5; i++)
		case 0: printf("%i is even\n", i); break;
		case 1: printf("%i is odd\n", i); break;
		case 2: printf("%i is even\n", i); break;
		case 3: printf("%i is odd\n", i); break;
		case 4: printf("%i is even\n", i); printf("done"); break;

Since this uses a switch, we can use a jump table that is easy to obfuscate.

There are many ways to break static analysis, some of which are simple, some of which are more complex. By employing these, it makes it very difficult for any analyst to decode and understand. Such methods can also prevent automated tools from performing in-depth analysis of the code. Understanding these methods helps both implement them and circumvent them. In the next part, I’ll be looking at virtualised and emulated analysis, which uses virtual hardware to analyse and fingerprint software without actually executing the real application code live on a hardware processor.

Further Reading