How To Decode JavaScript Obfuscations: Top Answers from our JavaScript Challenge

Ricky Lawshae
September 27, 2011

When we announced the winner of our JavaScript obfuscation challenge, I mentioned a couple of the big-picture lessons I learned from the successful entries. I’ve been asked to go a little deeper by sharing more specifics from some of the best entries we received, so I thought I’d highlight those here.

Lots of the entrants sent us just the CVE number that was the answer to the challenge, or the answer with an attached .zip file showing their de-obfuscated code. Some even added brief but insightful comments like these:

  • That base-32 encoding is a neat trick; could have made it harder by varying the base, because this way one regex replace could be used to replace a lot of ['length'] expressions at once. :D
  • Nice challenge; the (x=valueOf,x()) trick to get window is really clever. [My reply: I wish I could take credit for the window trick, but as I mentioned in my first post, the cleverness credit for that one goes to the folks at the sla.ckers.org forums.]
  • Constructions of obfuscation are just amazing, especially 'v'[720094129..toString(16<<1)+""], or that buffer filled with ciphers.
  • That's an interesting process of adding cruft to the code. I particularly like the ternaries that are there just to add a whole statement of useless confusion.
  • I see \d+\.\.toString\(\) used a lot. Why not spice things up a bit with \d+\.0+\.toString\(\) ?

Others went into even more detail, and I’m going to share three of the best methods here. The first one is from Mario Heiderich. I liked it because of how embarrassingly easy he made it seem:

  1. String de-obfuscation unveiling the original 'eval' (nice 'Object.valueOf' window accessors!)
  2. Replacing 'eval' by 'console.log()'
  3. Getting hands on the function 'LcXiYjzTRSKyzv(jkPfjgfRwzD)'
  4. De-obfuscating the loop and 'document.write' value
  5. Realize that a, base, and audio tags are concatenated
  6. Search matching CVE — find the one by Daniel Veditz, done!
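
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the challenge code: it assumes the obfuscated script reaches eval through a string-keyed lookup, and `fakeWindow`, `captured`, and the sample payload are hypothetical stand-ins.

```javascript
// Hypothetical stand-in for the global object the obfuscated code looks up.
// Its "eval" slot records the source text instead of running it.
const captured = [];
const fakeWindow = {
  eval: function (source) {
    captured.push(String(source));
    console.log(source);
  },
};

// Obfuscated-style call: the name "eval" is rebuilt at run time from a number.
// 490837 written in base 32 is "eval".
const name = 490837..toString(32);
fakeWindow[name]("document.write('second stage')");
```

Because the payload is only logged, the next layer can be read without ever being executed.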

Prasanna K. also followed an approach that I liked because it gave me further insight into the de-obfuscation techniques employed by manual analysts in the real world:

The exploit is the interleaved calls to document.write and DOM insertion (document.appendChild) causing a heap buffer overflow.

Method to Find the Decryption:

  1. Used the de-obfuscation to find -~-~'zgBq'[720094129..toString(32<<0)+""] strings and then made the first cut readable.
  2. Figured that (pjSkrbvs((YvtOzgP={}.valueOf,YvtOzgP()) equates to the window object.
  3. Created an HTML page that did the following:
    1. Get Unicode from the char code.
    2. Unescape the Unicode.
    3. Call the XOR function with Firefox key (this was brute-forced manually to get the right key) . . . as the result has to be eval, I was sure the result of the operation should be JS, which came only with the Firefox key.
    4. We now had the second JS.
  4. Steps 1 and 2 were repeated again to create a human-readable JS.
  5. Then Googled the exploit to find the exploit that matches the one generated by me; the link that came close was http://www.scriptjunkie.us/2011/06/firefox-exploit-analyzed/ .
  6. Confirmed the exploit from the exploit-db source code (http://www.exploit-db.com/exploits/15352/).
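
The key search in step 3 might look something like the sketch below. Everything specific here is an assumption made for illustration: the candidate keys, the repeating-key XOR in `xorWithKey`, and the `looksLikeJavaScript` heuristic are hypothetical and are not taken from the challenge.

```javascript
// XOR every character code with a repeating key (applying it twice restores the input).
function xorWithKey(text, key) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  }
  return out;
}

// Crude plausibility check: printable ASCII only, plus at least one JS-looking token.
function looksLikeJavaScript(text) {
  return /^[\x20-\x7e\s]+$/.test(text) && /\b(var|function|document)\b/.test(text);
}

// Hypothetical keys, one per browser family; the real challenge derived its own.
const candidateKeys = { MSIE: "MSIE", Chrome: "Chrome", Firefox: "Firefox" };

// Build a sample ciphertext so the sketch is self-contained.
const secondStage = "var a = document.createElement('a');";
const cipherText = xorWithKey(secondStage, candidateKeys.Firefox);

// Try every key and keep the ones whose output reads like JavaScript.
const hits = Object.keys(candidateKeys).filter(function (browser) {
  return looksLikeJavaScript(xorWithKey(cipherText, candidateKeys[browser]));
});
console.log(hits);
```

The same check automates the reasoning in step 3.3: since the result is handed to eval, only a key that yields readable JavaScript can be the right one.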

Finally, "psifertex" offered this detailed analysis in the comment thread on the original post—but I thought it deserved a spotlight here:

There's two layers of obfuscation. The first obviously unpacks the giant blob of raw data, and the second contains the "interesting" functionality.

First, there's a couple of cliches that are used throughout the code that can be rather simply replaced to make it more readable. Any String.fromCharCode(x,y,z) sequences can obviously be replaced with the string formed from the ascii values, and likewise the ######..toString(32) trick was neat as well.

It might have been easier in the long run to write some code to clean that up, but I mostly did it by hand, copying out those sequences, pasting them into Firebug or the standalone JS interpreter and replacing them in the original code to make it more readable.
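
As a rough idea of what such cleanup code could look like, here is a small sketch that folds String.fromCharCode calls into string literals. The helper name `foldFromCharCode` and the sample input are hypothetical, and the pattern only handles calls whose arguments are plain decimal literals.

```javascript
// Replace String.fromCharCode(n1, n2, ...) calls that contain only decimal
// literals with the equivalent quoted string. Anything else is left alone.
function foldFromCharCode(source) {
  return source.replace(/String\.fromCharCode\(\s*(\d+(?:\s*,\s*\d+)*)\s*\)/g,
    function (match, args) {
      const codes = args.split(",").map(Number);
      return JSON.stringify(String.fromCharCode.apply(null, codes));
    });
}

const sample = "x[String.fromCharCode(119, 114, 105, 116, 101)](String.fromCharCode(104,105))";
console.log(foldFromCharCode(sample));
```

Arguments that are themselves obfuscated expressions would need to be simplified first, or evaluated in an interpreter as described above.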

The ..toString sequence is especially neat, so I'll cover that briefly. First, a run of digits followed by a "." is parsed as a complete number literal (the first dot is the decimal point), so the second "." is an ordinary property access; therefore the slightly confusing construction of ####..function() is the same as: var myNum = ####; myNum.function().
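
A short check of that equivalence, using the number that appears in the entries quoted earlier (the variable names are only for illustration):

```javascript
// The first dot ends the numeric literal "720094129."; the second dot is the
// property access. All three forms below call the same method on the same value.
const viaDoubleDot = 720094129..toString(32);
const viaParens = (720094129).toString(32);
const myNum = 720094129;
const viaVariable = myNum.toString(32);

console.log(viaDoubleDot, viaParens, viaVariable);
```

Each of the three expressions yields the string "length".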

Next is the clever use of toString. You usually see toString used to output raw values as hex (great for testing memory leaks from JavaScript, for example!) via: toString(16). But the point is that any base, not just 16, can be used as an argument, and the output will be expressed in that number system.

By passing 32 as the argument, the digits 0-9 and the first 22 letters of the English alphabet (everything up to v) are used. Why 32? No idea—I guess all the functions they wanted to call didn't use "W," "X," "Y," or "Z." What would have been more annoying would be to change the base of the toString function each time depending on the string being encoded. That would make a static analysis tool's job much more difficult. If I saw this in the wild always behaving this way, I'd just take all numbers and pass them through my own equivalent version of .toString(32).
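
A static pass along those lines could be sketched as below. The helper name `foldToString` is hypothetical, and the pattern is an assumption based only on the two forms quoted in this post (a radix written as a plain integer or as a left shift, optionally followed by +""); a real tool would need to cover more variations.

```javascript
// Fold expressions such as 720094129..toString(32<<0)+"" into a quoted string.
// The radix may be a plain integer or a left shift of two integers.
function foldToString(source) {
  const pattern = /(\d+)\.\.toString\(\s*(\d+)(?:\s*<<\s*(\d+))?\s*\)(?:\s*\+\s*"")?/g;
  return source.replace(pattern, function (match, digits, base, shift) {
    const radix = Number(base) << Number(shift || 0);
    if (radix < 2 || radix > 36) return match; // not a valid radix: leave untouched
    return JSON.stringify(Number(digits).toString(radix));
  });
}

const samples = [
  "'v'[720094129..toString(16<<1)+\"\"]",
  "-~-~'zgBq'[720094129..toString(32<<0)+\"\"]",
];
samples.forEach(function (s) { console.log(foldToString(s)); });
```

Because the radix is computed rather than hard-coded, varying the base per string would not defeat this particular pass, although hiding the radix behind a more complex expression would.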

[My response: About the base32 vs base36 thing... The obfuscator itself is written in Ruby, and for whatever reason, I kept getting inconsistent results when I would base36 convert a string in Ruby, and then convert it back in JavaScript. Still not sure what was going on there, and I do need to revisit it to get those precious four characters, but at the time base32 was the magic number to get the consistent results I needed.]

. . . A quick sample of the output can be seen by popping open the Firefox Web console and pasting:

for (var x=0; x<100; x++) { console.log(x.toString(32)); }

You'll quickly see how any ascii text (sans WXYZ) can be easily encoded.
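
Going the other way, a word can be turned into its number with parseInt. This is only a sketch of how an encoder might work, not the obfuscator's actual code; the helper name `encodeWord` and its input checks are assumptions.

```javascript
// parseInt with radix 32 reads a word as a base-32 number; toString(32) reverses it.
// Only the digits 0-9 and the letters a-v are valid base-32 digits.
function encodeWord(word) {
  // A leading "0" would be dropped on the way back, and words longer than
  // 10 characters can exceed the 2^53 limit for exact integers.
  if (!/^[1-9a-v][0-9a-v]{0,9}$/.test(word)) {
    throw new Error("cannot round-trip through base 32: " + word);
  }
  return parseInt(word, 32);
}

console.log(encodeWord("length"));              // 720094129
console.log(encodeWord("length").toString(32)); // length
```

A word such as "window" is rejected by this helper because "w" is not a base-32 digit, which is the limitation discussed above.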

Also, most simple numeric constants were likewise generated with overly verbose constructions. Those were also easily simplified by copying/pasting them into any JS interpreter for simplification.
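
For example, the constructions quoted earlier in this post reduce to small integers. The variable names below are only for illustration.

```javascript
// ~n equals -(n + 1), so -~n equals n + 1; each "-~" prefix adds one.
const four = 'zgBq'['length'];                       // plain string length
const six = -~-~'zgBq'['length'];                    // 4 + 1 + 1
const one = 'v'[720094129..toString(16 << 1) + ""];  // 'v'["length"]

console.log(four, six, one);
```

Pasting any of these into a console gives the constant directly, which is why simplifying them by hand is quick.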

From there it was mostly a matter of just identifying the functions being called (remember, any method in JavaScript can also be accessed with bracket notation, like an array index, which is convenient when decoding the function name as a string).
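
A brief illustration of that equivalence, with sample values chosen only for this sketch:

```javascript
// obj.method(...) and obj["method"](...) are the same call, so the method
// name can be any expression that evaluates to a string.
const direct = "abc".charCodeAt(0);
const bracketed = "abc"["charCodeAt"](0);
const computed = "abc"["char" + "Code" + "At"](0);

// The same applies to globals: globalThis["String"] is String.
const viaGlobal = globalThis["String"]["fromCharCode"](101, 118, 97, 108);

console.log(direct, bracketed, computed, viaGlobal);
```

Once the string inside the brackets has been decoded, the call can be rewritten in the usual dotted form.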

You'll quickly see the different browser user-agent strings produce different decryption keys for the code later on. I simply added JavaScript to dump the contents of the decrypted buffer and then manually changed the user agent to all the different types before finding out that this particular payload only successfully decrypted on Firefox.

Once you get inside the main payload, you repeat the same basic process again. If you read the original instructions well (I didn't!) you'd quickly see a bunch of dom manipulation functions being called; Google a bit and find the correct CVE.

If you didn't, you'd waste a bunch of time de-obfuscating the entire function before realizing there was nothing else hidden there and going back to read the instructions and realizing you were long since done. Oops. ;-)

Besides these three methods, I’d also suggest that you take a look at Kahu Security’s blog post about solving the challenge, which provides another detailed walk-through.

Again, thanks to everyone who participated in something that was highly educational for me, but also a lot of fun.

