Pentesting Java EE web applications with LAPSE+

Just a quick tip for anyone doing a code review of a Java EE web application: LAPSE+ is a very useful tool to have in the arsenal, whether you’ve got the original source or just the JAR/WAR file.

In my case, the client provided me with a single .WAR file which contained the application. As it was a large application, I didn’t really fancy digging through everything manually with JD-GUI, although it is an excellent Java decompiler. I decided to take the opportunity to give LAPSE+ a try.

Here’s what you’ll need:

  • Eclipse
  • The LAPSE+ plugin
  • JD-GUI, if you only have the compiled JAR/WAR rather than source

You can also grab a PDF instruction manual for LAPSE+ from the same site. However, be aware that I found some of the information in there to be a bit misleading, e.g. the claim that you need a specific version of Eclipse. Also, don’t worry if your client provided you with a project for a different IDE, such as IntelliJ IDEA – it doesn’t really matter.

First step is to get Eclipse set up. Drop the .jar file from the LAPSE+ archive into Eclipse’s plugins directory. (Re)start Eclipse, then go to Window -> Show View -> Other… and select the items relating to LAPSE+. A little toolbar with blue spherical buttons should appear on the right; these are your LAPSE+ windows.

Next step is to load your code into a project. This is split into two parts, but if you’ve already got an Eclipse project for the site’s source code, you can skip the first part. Otherwise, you’ll need to extract the code from your archive and make a project for it. Start by loading the JAR (rename the .WAR to .JAR if needs be) into JD-GUI. It should decompile the archive and let you browse the code. Go to File -> Export all Sources, and save the resulting ZIP file somewhere. This archive now contains all your decompiled source code, split into directories that follow the package hierarchy.

Now, go back to Eclipse and create an empty Java project, filling the wizard out with whatever values suit you. Once that’s created, find the src directory in the project explorer tree, right-click it, select Import, and choose General -> Archive File. Select your newly exported ZIP file, and Eclipse will populate your project with your reverse-engineered source. Now right-click the project and select Build. In all likelihood it’ll throw a whole load of errors due to imperfect decompilation – don’t worry, we don’t really care, because LAPSE+ can still function with a broken build.

Once you’ve got your project set up, go to the individual LAPSE+ windows and browse through what they found. You might need to manually refresh them to run through the checking process. In my case, I found roughly ten false positives for every real issue, which isn’t actually too bad for code scanning. Within an hour or so of digging through the results I’d found a couple of concrete XSS bugs that I hadn’t spotted yet, plus a whole bunch of potential XSS bugs that I couldn’t immediately find vectors for, and a whole variety of other interesting stuff to dig through. It’s a really nice way to cut a 400kLoC project down into manageable target points.

The Router Review: From nmap to firmware

When I moved into my flat, I found that the previous tenant had left behind his Sky Broadband router. Awesome – a new toy to break! Sadly I got bogged down with silly things like moving house and going to work, so I didn’t get a chance to play with it. Until now, that is.

This isn’t the first embedded device I’ve played with. Over the years I’ve desoldered EEPROMs from routers, done unspeakable things to photocopiers, and even overvolted an industrial UPS unit via SNMP. The router I shall be discussing in this post, however, was one of the easier and more generic bits of kit I’ve played with.

Now, a little about the device. The model is DG934, and the full part number is 272-10452-01. It’s an ADSL router supplied by Sky (also known as BSkyB) as part of their old broadband package, but it’s actually manufactured by Netgear. It’s got four ethernet ports, an ADSL (phone) port, and takes a 12V power supply. Internally, it runs on the Atheros chipset. Unfortunately, this being a UK-only device, there’s no FCC ID – if there had been, I could’ve looked it up on the FCC OET database and found all sorts of internal photos and test data, which is often valuable when looking at the hardware aspects.

My first job was to power it on and get into the config panel. Since the previous tenant clearly wasn’t security conscious, he’d kindly left the device in its default configuration and I was able to log into the configuration interface using the default admin / sky credentials. I exported the config file to my machine, and took a look. In this case it’s plaintext, so there’s nothing to break here, but it’s not exactly good practice – it includes the passwords for WiFi and the configuration interface.

I ran nmap against the device and got the following results:

PORT      STATE SERVICE VERSION
80/tcp    open  http    BSkyB DG934G http config
5000/tcp  open  sip     BSkyB/1.0 UPnP/1.0 miniupnpd/1.0 (Status: 501 Not Implemented)
8080/tcp  open  http    BSkyB DG934G http config
32764/tcp open  unknown

Interestingly, the configuration site was available on both 80 and 8080. This seems to be the norm for many routers, but I have no idea why. UPnP on port 5000 is always a fun one to spot, and we’ll take a look at this shortly. Finally, there’s an unknown protocol running on port 32764.

For messing with UPnP, I have the UPnP Developer Tools for Windows. They’re mainly written in C# and are open source, so you can always port to Mono if you want. I used Device Spy to get the following info:

  • It’s a BSkyB DG934 Router.
  • The firmware date is 2007-08-27.
  • You can pull out stats such as total bytes sent/received, total packets sent/received, and uptime in seconds.
  • Port mapping functions are available.
  • SetEnabledForInternet isn’t present – shame, really, since it leads to a nice DoS condition.

Sadly there’s not much you can play with here.
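For reference, UPnP devices are normally found via SSDP: a client multicasts an M-SEARCH request to 239.255.255.250:1900, and devices answer with a LOCATION header pointing at their description XML, which is what lists services like the ones above. The sketch below sends one M-SEARCH and prints whatever comes back. It's a bare-bones illustration of standard SSDP discovery, not a description of how Device Spy itself works, and build_msearch / ssdp_discover are names I've made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Build an SSDP M-SEARCH request. st is the search target, e.g. "ssdp:all". */
int build_msearch(char *buf, size_t size, const char *st, int mx)
{
    return snprintf(buf, size,
                    "M-SEARCH * HTTP/1.1\r\n"
                    "HOST: 239.255.255.250:1900\r\n"
                    "MAN: \"ssdp:discover\"\r\n"
                    "MX: %d\r\n"
                    "ST: %s\r\n"
                    "\r\n", mx, st);
}

/* Send one M-SEARCH and print each reply until none arrives for 3 seconds. */
int ssdp_discover(const char *st)
{
    char req[256], resp[2048];
    struct sockaddr_in dst;
    struct timeval tv = { .tv_sec = 3, .tv_usec = 0 };
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(1900);
    inet_pton(AF_INET, "239.255.255.250", &dst.sin_addr);

    int len = build_msearch(req, sizeof req, st, 2);
    if (len < 0 || (size_t)len >= sizeof req ||
        sendto(fd, req, (size_t)len, 0, (struct sockaddr *)&dst, sizeof dst) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        ssize_t n = recvfrom(fd, resp, sizeof resp - 1, 0, NULL, NULL);
        if (n < 0)
            break;  /* receive timeout (or error): stop listening */
        resp[n] = '\0';
        printf("%s\n", resp);
    }
    close(fd);
    return 0;
}
```

Calling ssdp_discover("upnp:rootdevice") from a machine on the router's LAN should print the router's reply, assuming it answers multicast discovery.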

Next, we’ll take a look at that weird unknown protocol on port 32764. When connecting to it, the string “MMcS” is returned, along with two binary IP representations: 255.255.255.0 and 0.0.0.0. I tried playing around with this, but honestly I have no idea what it’s for. Google returned a bunch of people asking what it was, and nobody with any real answers. Potentially it’s related to the Multimedia Class Scheduler Service (MMCSS), but that’s speculation at best. Again, no luck finding anything fun here.
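If you want to poke at the service yourself, here's a minimal sketch for decoding a captured response. It assumes (this is a guess, not something I've confirmed) that the reply is the four ASCII bytes "MMcS" followed directly by two 4-byte IPv4 addresses in network byte order; parse_mmcs_banner is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Decode a banner captured from port 32764.
 * ASSUMED layout: "MMcS" + two 4-byte IPv4 addresses in network byte order.
 * Returns 0 on success, -1 if the buffer is too short or the magic is missing. */
int parse_mmcs_banner(const unsigned char *buf, size_t len,
                      char first[INET_ADDRSTRLEN], char second[INET_ADDRSTRLEN])
{
    if (len < 12 || memcmp(buf, "MMcS", 4) != 0)
        return -1;
    /* inet_ntop reads the 4 raw address bytes and formats them as dotted quad. */
    if (inet_ntop(AF_INET, buf + 4, first, INET_ADDRSTRLEN) == NULL)
        return -1;
    if (inet_ntop(AF_INET, buf + 8, second, INET_ADDRSTRLEN) == NULL)
        return -1;
    return 0;
}
```

Feed it whatever bytes you read back after connecting; if the magic isn't at the start, it returns -1 rather than guessing.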

Finally, let’s dig into the firmware. Instead of taking the device apart, desoldering the firmware EEPROM, and interfacing to it with a Bus Pirate to rip the data off, I decided to go the easy route and download the openly available firmware from Netgear. The file provided is a flat binary, with some interesting data inside it. It’s partitioned into various sections, with conveniently obvious data offsets (e.g. 0x10000). In order to properly dissect the file, I used binwalk. In BackTrack 5 it’s located in /pentest/reverse-engineering/binwalk/ and requires you to manually set the magic file via the -m switch.

root@bt:~# binwalk -m /pentest/reverse-engineering/binwalk/magic.binwalk ~/dg834gt_1_02_09.img
DECIMAL HEX DESCRIPTION
-------------------------------------------------------------------------------------------------------
1248 0x4E0 CFE boot loader
1288 0x508 CFE boot loader
4177 0x1051 LZMA compressed data, properties: 0xA4, dictionary size: 285474816 bytes, uncompressed size: 256 bytes
7951 0x1F0F LZMA compressed data, properties: 0xC2, dictionary size: 556793856 bytes, uncompressed size: 67108881 bytes
8087 0x1F97 LZMA compressed data, properties: 0x82, dictionary size: 556793856 bytes, uncompressed size: 67108881 bytes
8227 0x2023 LZMA compressed data, properties: 0xC2, dictionary size: 556793856 bytes, uncompressed size: 67108881 bytes
8371 0x20B3 LZMA compressed data, properties: 0x82, dictionary size: 556793856 bytes, uncompressed size: 67108881 bytes
10563 0x2943 LZMA compressed data, properties: 0xDF, dictionary size: 555220992 bytes, uncompressed size: 167272448 bytes
65792 0x10100 CramFS filesystem, big endian size 2879488 version #2 sorted_dirs CRC 0x51df60ff, edition 0, 1975 blocks, 938 files
1016865 0xF8421 ARJ archive data, v193, backup, original name: \230\346+\210\365 ... [snip]

This gives us a pretty good idea of what we’re dealing with. First, there’s a Common Firmware Environment (CFE) bootloader, which is Broadcom’s alternative to U-Boot. There’s some irony here in that Broadcom and Atheros are competitors, yet CFE is being used on an Atheros chipset device. Anyway, there’s a bunch of LZMA junk after that which looks like various bits of firmware and a Linux kernel image. The bit we’re really interested in is the CramFS data. As a side note, binwalk looks to have been a bit overzealous in identifying an ARJ archive at the end: that hit at 0xF8421 (1016865) falls inside the CramFS image, which starts at 65792 and is 2879488 bytes long, and its original name is garbage. So we can assume that the CramFS block takes up the remainder of the file.
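binwalk's approach is essentially a signature scan over the whole file. As a rough sketch of the idea (not binwalk's actual code), the C below looks for the CramFS superblock magic 0x28cd3d45 in either byte order, and also requires the "Compressed ROMFS" signature that CramFS stores 16 bytes after the magic, which cuts down on false positives like the ARJ hit. find_cramfs is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CramFS superblock magic, stored in the byte order of the machine that built the image. */
#define CRAMFS_MAGIC 0x28cd3d45u

enum cramfs_endian { CRAMFS_NONE = 0, CRAMFS_BIG, CRAMFS_LITTLE };

/* Scan buf from *pos onward for a CramFS superblock. On a hit, *pos is set to its
 * offset and the image's endianness is returned; otherwise CRAMFS_NONE. */
enum cramfs_endian find_cramfs(const uint8_t *buf, size_t len, size_t *pos)
{
    for (size_t i = *pos; i + 32 <= len; i++) {
        uint32_t be = (uint32_t)buf[i] << 24 | (uint32_t)buf[i + 1] << 16 |
                      (uint32_t)buf[i + 2] << 8 | buf[i + 3];
        uint32_t le = (uint32_t)buf[i + 3] << 24 | (uint32_t)buf[i + 2] << 16 |
                      (uint32_t)buf[i + 1] << 8 | buf[i];
        enum cramfs_endian e = (be == CRAMFS_MAGIC) ? CRAMFS_BIG
                             : (le == CRAMFS_MAGIC) ? CRAMFS_LITTLE : CRAMFS_NONE;
        /* The signature follows magic, size, flags and future (4 x 4 bytes). */
        if (e != CRAMFS_NONE && memcmp(buf + i + 16, "Compressed ROMFS", 16) == 0) {
            *pos = i;
            return e;
        }
    }
    return CRAMFS_NONE;
}
```

Start with pos = 0 over the whole image; on this firmware it should land on the big-endian superblock binwalk reported at 0x10100.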

In order to extract the filesystem, we can use good old dd. The following should suffice:

dd bs=256 skip=257 count=20000 if=dg834gt_1_02_09.img of=firmware.cramfs
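If you'd rather do the carve in code (say, as part of a script), it's only a few lines of C. This is a minimal sketch assuming you already know the offset from binwalk; carve_from is a made-up name.

```c
#include <stdio.h>

/* Copy everything from byte `offset` of `in` to the end of the file into `out`,
 * the same as dd with bs=256 skip=257 (256 * 257 = 65792 = 0x10100).
 * Returns the number of bytes copied, or -1 on error. */
long carve_from(FILE *in, FILE *out, long offset)
{
    unsigned char chunk[4096];
    size_t n;
    long total = 0;

    if (fseek(in, offset, SEEK_SET) != 0)
        return -1;
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```

Open dg834gt_1_02_09.img with "rb" and firmware.cramfs with "wb", then call carve_from(in, out, 0x10100) to get the same output as the dd command.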

Note that 257 * 256 = 65792, which is 0x10100, i.e. the offset of the data we want to pull out. I stuck a really big count in there because we’re reading to the end of the file; dd stops at EOF anyway, so you could just as well leave count off. Now, you’re going to want to grab some tools to work with CramFS:

sudo apt-get install cramfsprogs fusecram

This provides you with the modules needed to mount CramFS volumes, as well as some tools to help you along the way. Now we can mount the filesystem:

root@bt:~# sudo mount -t cramfs -o loop ~/firmware.cramfs /media/firmware/
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

Hmmm, that’s odd. Let’s see what dmesg has to say about this…

root@bt:~# dmesg | tail -n 1
[ 4394.319907] cramfs: wrong endianess

Aha! A fun fact about CramFS is that an image takes on the endianness of the architecture it was created on. Since the router is big-endian and my box is little-endian, I need to convert it. Thankfully, cramfsprogs includes a tool called cramfsswap that flips the endianness of a provided image. Side note: if you get “wrong magic” as an error, you didn’t extract the right blocks of data, or the file system isn’t CramFS.
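To see what the kernel is complaining about, you only need the start of the superblock: the magic (0x28cd3d45), then a 32-bit size field and a flags field, all in the creating machine's byte order. The sketch below reads those fields in whichever order the magic matches, and compares the recorded size against the file's actual length, which catches a truncated carve (the "can list files but can't read them" symptom). It's illustrative only: cramfsswap also has to rewrite every inode and block pointer, which this doesn't attempt, and cramfs_read_head is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define CRAMFS_MAGIC 0x28cd3d45u

/* Header fields we care about; on disk the order is magic, size, flags, future. */
struct cramfs_head {
    uint32_t size;   /* total image length in bytes, as recorded when the image was built */
    uint32_t flags;
    int big_endian;  /* 1 if the image was written on a big-endian machine */
};

/* Read a 32-bit value in the given byte order. */
static uint32_t rd32(const uint8_t *p, int big)
{
    return big ? (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3]
               : (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 | (uint32_t)p[1] << 8 | p[0];
}

/* Returns 0 on success, -1 for "wrong magic", or -2 if the buffer is shorter than
 * the size the superblock claims (i.e. the image was truncated when carved). */
int cramfs_read_head(const uint8_t *img, size_t len, struct cramfs_head *out)
{
    if (len < 16)
        return -1;
    if (rd32(img, 1) == CRAMFS_MAGIC)
        out->big_endian = 1;
    else if (rd32(img, 0) == CRAMFS_MAGIC)
        out->big_endian = 0;
    else
        return -1;
    out->size = rd32(img + 4, out->big_endian);
    out->flags = rd32(img + 8, out->big_endian);
    return (len < out->size) ? -2 : 0;
}
```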

root@bt:~# cramfsswap ./firmware.cramfs ./firmware-conv.cramfs
Filesystem is big endian, will be converted to little endian.
Filesystem contains 937 files.
CRC: 0xe86ad3b0
root@bt:~# sudo mount -t cramfs -o loop ~/firmware-conv.cramfs /media/firmware/
root@bt:~#

Excellent! Now to dig around inside the files.

root@bt:~# ls -l /media/firmware/
total 20
drwxr-xr-x 1 root root 452 1970-01-01 01:00 bin
drwxr-xr-x 1 root root 0 1970-01-01 01:00 dev
lrwxrwxrwx 1 root root 8 1970-01-01 01:00 etc -> /tmp/etc
drwxr-xr-x 1 root root 784 1970-01-01 01:00 lib
drwxr-xr-x 1 root root 0 1970-01-01 01:00 proc
drwxr-xr-x 1 root root 176 1970-01-01 01:00 sbin
drwxr-xr-x 1 root root 0 1970-01-01 01:00 tmp
drwxr-xr-x 1 root root 116 1970-01-01 01:00 usr
lrwxrwxrwx 1 root root 8 1970-01-01 01:00 var -> /tmp/var
lrwxrwxrwx 1 root root 8 1970-01-01 01:00 www -> /tmp/www
drwxr-xr-x 1 root root 3900 1970-01-01 01:00 www.deu
drwxr-xr-x 1 root root 3908 1970-01-01 01:00 www.eng
drwxr-xr-x 1 root root 3824 1970-01-01 01:00 www.fre

There’s a full listing on pastebin, if you’re interested. It’s worth noting that if you can mount the filesystem, can see the directories and files inside it, but can’t read the file data, then you probably didn’t copy the entire filesystem and it’s missing chunks of data. Anyway, this looks pretty typical. We can see a very basic file system that comprises all the runtime parts of the device, excluding the kernel and any ramfs stuff. Here’s what I found:

  • The three www prefixed directories contain the template files used for the administration panel.
  • /bin contains busybox binaries.
  • /lib contains the kinds of libraries you’d expect on a router, e.g. libcrypt, libupnp, libpppoe, etc.
  • /lib/modules contains various kernel modules for the router, such as the push button driver and Atheros HAL.
  • /sbin contains various binaries such as ifconfig, insmod, lsmod, etc.
  • /usr/bin contains four binaries, including one called test.
  • /usr/etc contains the default config files and various scripts.
  • /usr/sbin contains various binaries for daemons (including reaim and iptables), as well as some for performing maintenance operations, e.g. WiFi control operations.
  • /usr/upnp contains the definitions for the UPnP endpoint.

The most interesting directory was /usr/etc, which contains both passwd and an svn.info. The passwd file shows only root and nobody, which leads me to believe that all services run as root. The svn.info file has all sorts of interesting info in it:

Path: .
URL: file:///svn/Platform/DG834_PN/Source
Repository Root: file:///svn/Platform/DG834_PN
Repository UUID: 25bc2c04-8815-0410-823d-fa30465ac5aa
Revision: 93
Node Kind: directory
Schedule: normal
Last Changed Author: ethan
Last Changed Rev: 93
Last Changed Date: 2007-02-16 16:23:45 +0800 (Fri, 16 Feb 2007)

Boot Loader version: CFE version 1.0.37-5.11 for BCM96348

So we now know that Netgear use(d) SVN for their source control, that “Ethan” was at least the last person to commit to the DG834 firmware source, and that we’re running CFE 1.0.37-5.11 on the BCM96348 SoC. Hi, Ethan!

I’m going to leave this here for now, primarily because it’s almost 4am, but also because the point of this blog post was to show just how much information you can dig out of a device without even touching it with a screwdriver, or opening a manual. Keep in mind that the techniques I’ve shown here should apply to many routers and other small embedded devices. At some point in the future I’ll get around to digging into some of their custom binaries, as well as their HTTPD. If I find anything interesting, I’ll be sure to post an update. Also, let me know if you’ve got any spare routers you want me to dig into when I get a spare few hours – I’m always happy to take donations!

Preventing executable analysis – Part 1, Static Analysis

In this series of posts, I’m going to discuss executable analysis: the methods used to perform it, and the mechanisms used to prevent it. There are three types of analysis that can be performed on executables:

  • Static – Analysis of the sample file on disk.
  • Emulated – Branch and stack analysis of the sample through an emulator.
  • Live – Analysis of the executing sample on a VM, usually using hooks.

I’m going to look at each type in detail, giving examples of techniques used in each and ways to make analysis difficult.

In this first post, I’ll look at static analysis. This type of analysis involves parsing the executable and disassembling the code and data contained within it, without ever running it. The benefit of this is that it’s safe: since the code never runs, it can’t cause any damage. The downside is that static analysis can’t easily draw conclusions about high-level behaviour.

Entry Point Check
The first method used in static analysis is a set of simple header checks. If the entry point (EP) of the executable lies outside any section marked as code, it’s safe to assume that the application isn’t “normal”. To make this harder to spot, the executable should have its BaseOfCode header field pointing at the same section the EP is in, even when packed.
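Here's a minimal sketch of that check. Rather than parsing a real PE file, it assumes you've already pulled the entry point, BaseOfCode and a simplified section table (RVA, virtual size, characteristics) out of the headers. The struct and function names are hypothetical; IMAGE_SCN_CNT_CODE and IMAGE_SCN_MEM_EXECUTE are the real PE section flag values.

```c
#include <stddef.h>
#include <stdint.h>

#define IMAGE_SCN_CNT_CODE    0x00000020u
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u

/* Simplified section header: only the fields this check needs. */
struct section {
    uint32_t virtual_address;  /* RVA where the section starts */
    uint32_t virtual_size;
    uint32_t characteristics;
};

/* Index of the section containing rva, or -1 if none does. */
static int section_of(const struct section *s, size_t n, uint32_t rva)
{
    for (size_t i = 0; i < n; i++)
        if (rva >= s[i].virtual_address && rva - s[i].virtual_address < s[i].virtual_size)
            return (int)i;
    return -1;
}

/* Bitmask of header anomalies: 1 = EP not in a code/executable section,
 * 2 = BaseOfCode points at a different section from the EP. 0 looks "normal". */
unsigned ep_anomalies(const struct section *s, size_t n,
                      uint32_t entry_point, uint32_t base_of_code)
{
    unsigned flags = 0;
    int ep = section_of(s, n, entry_point);

    if (ep < 0 || !(s[ep].characteristics & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE)))
        flags |= 1;
    if (section_of(s, n, base_of_code) != ep)
        flags |= 2;
    return flags;
}
```

A real tool would also have to cope with sections whose raw and virtual sizes differ; this version only looks at the virtual size.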

Packing
Executables are often packed – i.e. their code is compressed or encrypted in some way. We can analyse this using entropy calculations on each section, to discover how “random” the data looks. It’s tempting for authors to try to create a strong cipher for encrypting packed sections, but this often leads to a few problems. Firstly, entropy calculations will very quickly spot sections that look too random to be normal code or data. Secondly, there are many applications out there that look for sequences of data and instructions that match known cryptographic algorithms; it’s relatively easy to spot magic numbers and S-box arrays.

To prevent this, a packing algorithm should be used that preserves the statistical signature of the original data. A good way to do this is to flip only the lowest two bits of each byte (a fixed XOR with 0x03, which just relabels byte values), or to simply shuffle the data, rather than encrypting it with a keyed xor or a similar operation. A sample of data has the same byte-level Shannon entropy however much you shuffle it, since that entropy depends only on how often each byte value occurs. Analysis tools usually split each section into blocks and compute an entropy graph across the file, so by using a cipher that only shuffles bytes that are close together, you can achieve an almost identical entropy graph:

Entropy Graph

Since instructions span multiple bytes, shuffling completely destroys the code, so a disassembler can’t make sense of it. It’s relatively simple to perform half-decent shuffling, given a reasonably large key:

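/* For each key byte k, swap data[0] with data[k]; data points at a 256-byte block. */
for (size_t i = 0; i < keylen; i++)
{
	uint8_t k = key[i];
	uint8_t tmp = data[0];
	data[0] = data[k];
	data[k] = tmp;
}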

Apply the above to each 256-byte block in turn and you’ll get reasonable shuffling within each block; since every key byte is between 0 and 255, the swaps never leave the block. OllyDbg doesn’t recognise this as packed, since it works on counts of particularly common bytes in code sections.
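Here's a fuller sketch of that shuffle, assuming data is processed in 256-byte blocks and any trailing partial block is simply left alone. Since each swap is its own inverse, unpacking is just the same swaps replayed in reverse key order. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 256  /* key bytes are 0..255, so every swap stays inside one block */

/* Swap block[0] with block[k] for each key byte k, in key order. */
static void shuffle_block(uint8_t *block, const uint8_t *key, size_t keylen)
{
    for (size_t i = 0; i < keylen; i++) {
        uint8_t k = key[i];
        uint8_t tmp = block[0];
        block[0] = block[k];
        block[k] = tmp;
    }
}

/* Undo shuffle_block by replaying the same swaps in reverse order. */
static void unshuffle_block(uint8_t *block, const uint8_t *key, size_t keylen)
{
    for (size_t i = keylen; i-- > 0; ) {
        uint8_t k = key[i];
        uint8_t tmp = block[0];
        block[0] = block[k];
        block[k] = tmp;
    }
}

/* Shuffle (undo == 0) or unshuffle (undo != 0) every full 256-byte block of data. */
void shuffle_buffer(uint8_t *data, size_t len, const uint8_t *key, size_t keylen, int undo)
{
    for (size_t off = 0; off + BLOCK <= len; off += BLOCK) {
        if (undo)
            unshuffle_block(data + off, key, keylen);
        else
            shuffle_block(data + off, key, keylen);
    }
}
```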

Jump Tables
Static analysis tools such as IDA Pro work by following jumps and calls to map out how blocks of code connect. Some enhance this with heuristic analysis of indirect jumps, for example turning jmp [file.exe+0x420c0] into an assumed jump to whatever address is stored at file.exe+0x420c0. We can try to defeat this type of analysis by using jump tables: pointer tables that are generated at runtime and encrypted or obfuscated on disk. Jumps in the code go through offsets in the jump table. Often this is further obfuscated by using jumps to register pointers, or stack jumps:

; ecx = function ID
mov eax, [ptrToTable+ecx*4] ; load the encrypted pointer into eax
xor eax, [ptrToKey+ecx*4]   ; xor with the key
push eax                    ; push address to stack
ret                         ; return (jump) to it, obfuscates the jump

Obviously there’s more we can do here – better encryption, values generated at runtime, more obfuscation, etc.
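The same idea in C might look like the sketch below: function pointers are stored XORed with a per-slot key, and a dispatcher decodes and calls them, much like the mov/xor/push/ret sequence above. It's purely illustrative: the keys are hard-coded, the handlers are made up, and since the compiler still emits the plain addresses inside init_jump_table, it shows the table layout and decoding rather than real protection. Casting function pointers to and from uintptr_t is implementation-defined, though mainstream compilers accept it.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers standing in for real functions. */
static int handler_double(int x) { return x * 2; }
static int handler_negate(int x) { return -x; }
static int handler_square(int x) { return x * x; }

#define N_HANDLERS 3

/* Per-slot keys, hard-coded here; a packer would derive them at runtime. */
static const uintptr_t table_key[N_HANDLERS] = { 0x5a5a5a5au, 0x13371337u, 0x0badf00du };
static uintptr_t encoded_table[N_HANDLERS];

/* Build the table at runtime, storing each pointer XORed with its slot key. */
void init_jump_table(void)
{
    const handler_fn plain[N_HANDLERS] = { handler_double, handler_negate, handler_square };
    for (size_t i = 0; i < N_HANDLERS; i++)
        encoded_table[i] = (uintptr_t)plain[i] ^ table_key[i];
}

/* Decode slot `id` and call through it (the C analogue of mov/xor/push/ret). */
int dispatch(size_t id, int arg)
{
    handler_fn f = (handler_fn)(encoded_table[id] ^ table_key[id]);
    return f(arg);
}
```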

Control Flow Obfuscation
Some analysis tools focus on compiler artifacts, i.e. the signatures of how common high-level language constructs translate into assembly language. For example, some loops may be translated into a dec/jg loop, whereas others might use rep movs. It all depends on the high-level construct in use. Altering these constructs, or using them in situations where they’d be unusual, can confuse heuristics. One example for short loops is using a switch:

for(int i=0; i<5; i++)
{
	if(i%2==0) printf("%i is even\n", i);
	else printf("%i is odd\n", i);
	if(i==4) printf("done");
}

We can turn this into a switch statement that flattens out the flow, instead of being an obvious loop:

for(int i=0; i<5; i++)
{
	switch(i)
	{
		case 0: printf("%i is even\n", i); break;
		case 1: printf("%i is odd\n", i); break;
		case 2: printf("%i is even\n", i); break;
		case 3: printf("%i is odd\n", i); break;
		case 4: printf("%i is even\n", i); printf("done"); break;
	}
}

Compilers typically turn a dense switch like this into a jump table, which we can then obfuscate as described above.
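Taking this further, a common technique known as control-flow flattening removes the loop structure altogether: each basic block becomes a case in a single dispatcher switch, and a state variable decides which block runs next. The sketch below applies that general technique to the same example; it isn't taken from any particular obfuscator, and it writes to a FILE * instead of stdout so the output is easy to check.

```c
#include <stdio.h>

/* Same output as the loops above, but every original block is a case of one
 * dispatcher switch, and `state` picks which block runs next. */
void flattened(FILE *out)
{
    int i = 0;
    int state = 0;

    while (state != 4) {
        switch (state) {
        case 0:  /* loop condition */
            state = (i < 5) ? 1 : 4;
            break;
        case 1:  /* loop body: even/odd */
            if (i % 2 == 0)
                fprintf(out, "%i is even\n", i);
            else
                fprintf(out, "%i is odd\n", i);
            state = (i == 4) ? 2 : 3;
            break;
        case 2:  /* the i == 4 tail */
            fprintf(out, "done");
            state = 3;
            break;
        case 3:  /* increment, then back to the condition */
            i++;
            state = 0;
            break;
        }
    }
}
```

In a real obfuscator the state values would themselves be scrambled and the dispatcher compiled to an (obfuscated) jump table.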

Conclusion
There are many ways to break static analysis, some simple and some more complex. Employing them makes it very difficult for an analyst to decode and understand the code, and can also prevent automated tools from performing in-depth analysis. Understanding these methods helps you both implement them and circumvent them. In the next part, I’ll be looking at virtualised and emulated analysis, which uses virtual hardware to analyse and fingerprint software without executing the real application code live on a hardware processor.
