Defeating Macro Document Static Analysis with Pictures of My Cat

Over the past few weeks I've spent some time learning Visual Basic for Applications (VBA), specifically for creating malicious Word documents to act as an initial stager. When taking operational security into consideration and brainstorming ways of evading macro detection, I had the question, how does anti-virus detect a malicious macro?

The hypothesis I came up with was that anti-virus would parse out macro content from the word document and scan the macro code for a variety of malicious techniques, nothing crazy. A common pattern I've seen attackers counter this sort-of detection is through the use of macro obfuscation, which is effectively scrambling macro content in an attempt to evade the malicious patterns anti-virus looks for.

The questions I wanted answered were:

How does anti-virus even retrieve the macro content?
What differences are there for the retrieval of macro content between the implementation in Microsoft Word and anti-virus?

Discovery

According to Wikipedia,Open Office XML (OOX) "is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents". This is the file format used for the common Microsoft Word extensions docx and docm. The fact that Microsoft Office documents were essentially a zip file of XML files certainly piqued my interest.

Since the OOX format is just a zip file, I found that parsing macro content from a Microsoft Word document was simpler than you might expect. All an anti-virus would need to do is:

Extract the Microsoft Office document as a ZIP and look for the file word\vbaProject.bin.
Parse the OLE binary and extract the macro content.

The differences I was interested in was how the methods would handle errors and corruption. For example, common implementations of ZIP extraction will often have error checking such as:

Does the local file header begin with the signature 0x04034b50?
Is the minimum version bytes greater than what is supported?

What I was really after was finding ways to break the ZIP parser in anti-virus without breaking the ZIP parser used by Microsoft Office.

Before we get into corrupting anything, we need a base sample first. As an example, I simply wrote a basic macro "Hello World!" that would appear when the document was opened.

For the purposes of testing detection of macros, I needed another sample document that was heavily detected by anti-virus. After a quick google search, I found a few samples shared by @malware_traffic here. The sample named HSOTN2JI.docm had the highest detection rate, coming in at 44/61 engines marking the document as malicious.

To ensure that detections were specifically based on the malicious macro inside the document's vbaProject.bin OLE file, I...

Opened both my "Hello World" and the HSOTN2JI macro documents as ZIP files.
Replaced the vbaProject.bin OLE file in my "Hello World" macro document with the vbaProject.bin from the malicious HSOTN2JI macro document.

Running the scan again resulted in the following detection rate:

Fortunately, these anti-virus products were detecting the actual macro and not solely relying on conventional methods such as blacklisting the hash of the document. Now with a base malicious sample, we can begin tampering with the document.

Exploitation

The methodology I used for the methods of corruption is:

Modify the original base sample file with the corruption method.
Verify that the document still opens in Microsoft Word.
Upload the new document to VirusTotal.
If good results, retry the method on my original "Hello World" macro document and verify that the macro still works.

Before continuing, it's important to note that the methods discussed in this blog post does come with drawbacks, specifically:

Whenever a victim opens a corrupted document, they will receive a prompt asking whether or not they'd like to recover the document:

Before the macro is executed, the victim will be prompted to save the recovered document. Once the victim has saved the recovered document, the macro will execute.

Although adding any user interaction certainly increases the complexity of the attack, if a victim was going to enable macros anyway, they'd probably also be willing to recover the document.

General Corruption

We'll first start with the effects of general corruption on a Microsoft Word document. What I mean by this is I'll be corrupting the file using methods that are non-specific to the ZIP file format.

First, let's observe the impact of adding random bytes to the beginning of the file.

With a few bytes at the beginning of the document, we were able to decrease detection by about 33%. This made me confident that future attempts could reduce this even further.

Result: 33% decrease in detection

Prepending My Cat

This time, let's do the same thing except prepend a JPG file, in this case, a photo of my cat!

You might think that prepending some random data should result in the same detection rate as an image, but some anti-virus marked the file as clean as soon as they saw an image.

To aid in future research, the anti-virus engines that marked the random data document as malicious but did not mark the cat document as malicious were:

Ad-Aware
ALYac
DrWeb
eScan
McAfee
Microsoft
Panda
Qihoo-360
Sophos ML
Tencent
VBA32

The reason this list is larger than the actual difference in detection is because some engines strangely detected this cat document, but did not detect the random data document.

Result: 50% decrease in detection

Prepending + Appending My Cat

Purely appending data to the end of a macro document barely impacts the detection rate, instead we'll be combining appending data with other methods starting with my cat.

What was shocking about all of this was even when the ZIP file was in the middle of two images, Microsoft's parser was able to reliably recover the document and macro! With only extremely basic modification to the document, we were able to essentially prevent most detection of the macro.

Result: 88% decrease in detection

Zip Corruption

Microsoft's fantastic document recovery is not just exclusive to general methods of file corruption. Let's take a look at how it handles corruption specific to the ZIP file format.

Corrupting the ZIP Local File Header

The only file we care about preventing access to is the vbaProject.bin file, which contains the malicious macro. Without corrupting the data, could we corrupt the file header for the vbaProject.bin file and still have Microsoft Word recognize the macro document?

Let's take a look at the structure of a local file header from Wikipedia:

I decided that the local file header signature would be the least likely to break file parsing, hoping that Microsoft Word didn't care whether or not the file header had the correct magic value. If Microsoft Word didn't care about the magic, corrupting it had a high chance of interfering with ZIP parsers that have integrity checks such as verifying the value of the magic.

After corrupting only the file header signature of the vbaProject.bin file entry, we get the following result:

With a ZIP specific corruption method, we almost completely eliminated detection.

Result: 90% decrease in detection

Combining Methods

With all of these methods, we've been able to reduce static detection of malicious macro documents quite a bit, but it's still not 100%. Could these methods be combined to achieve even lower rates of detection? Fortunately, yes!

Method	Detection Rate Decrease
Prepending Random Bytes	33%
Prepending an Image	50%
Prepending and Appending an Image	88%
Corrupting ZIP File Header	90%
Prepending/Appending Image and Corrupting ZIP File Header	100%

Interested in trying out the last corruption method that reduced detection by 100%? I made a script to do just that! To use it, simply execute the script providing document filename as the first argument and a picture filename for the second parameter. The script will spit out the patched document to your current directory.

As stated before, even though these methods can bring down the detection of a macro document to 0%, it comes with high costs to attack complexity. A victim will not only need to click to recover the document, but will also need to save the recovered document before the malicious macro executes. Whether or not that added complexity is worth it for your team will widely depend on the environment you're against.

Regardless, one must heavily applaud the team working on Microsoft Office, especially those who designed the fantastic document recovery functionality. Even when compared to tools that are specifically designed to recover ZIP files, the recovery capability in Microsoft Office exceeds all expectations.