
Streamlining Adversary Infrastructure Hunting With SOAR


Ever since DomainTools began working with SOC practitioners, which goes back many years now, we have been intensely interested in the objectives analysts and hunters have in mind when they delve into our datasets. We’re always on the lookout for ways to enhance your capabilities in the SOC, whether that is by beefing up the data available in Iris and our APIs, or by supporting and enhancing specific investigative activities (which is the raison d’etre of the Iris UI itself). And when it comes to workflows in the SOC, automation is almost always part of the conversation. 

We’re seeing more and more folks using Security Orchestration, Automation, and Response (SOAR) technologies, whether in a ready-made commercial platform, as a series of custom-built scripts, or somewhere in between. This three-part blog series delves into how the kinds of infrastructure hunting and exploration that our users often carry out in Iris can be supported with SOAR playbooks, to help you accomplish more without having to add more resources (or hours to your workday). In this installment, we’ll cover some orientation around domain enrichment and adversary infrastructure hunting, and then we’ll look at what a SOAR playbook can offer for straightforward flows and use cases. In parts 2 and 3, we’ll look at more complex cases where the SOAR playbook and the human analyst have handoffs; specifically, we’ll look at where the playbook leaves off and the analyst goes into Iris for a closer look. The third installment will give a preview of a new tool that we’re going to be open-sourcing to help analysts make sense of seemingly opaque datasets.

Adversary Hunting Basics

First off, for more information on this topic, I encourage you to check out this earlier blog series, which takes a deep dive on how to use DNS-related data to get ahead of attacks; this blog by Joe Slowik about how indicators can act as composite objects for analysis; and another by Joe on inferring adversary intent through examination of infrastructure. The idea behind this type of hunting is to gain better context on domains seen in your logs or events, by answering questions such as:

  • When was this domain created?
  • Does it have a high risk score?
  • Is it part of a larger campaign?
  • Is it related to other domains or IP addresses that also represent threats?
  • What is the nature of the entity controlling it?

The profile information that DomainTools provides, in our APIs or in the Iris UI, helps analysts quickly answer many of those questions. The third and fourth questions, about whether the domain is part of a larger campaign or related to other threatening domains and IP addresses, are what propel investigations in which the analyst “pivots” on attributes such as IP addresses, name servers, or registration details to find other domains (and their associated hosting infrastructure) that may be under the same control as the originally flagged domain. Building out that picture does two key things for the analyst:

  • Gives better context on the first domain seen, potentially exposing a larger campaign
  • Provides additional search items (domains, IPs) to determine whether the larger campaign has touched the protected environment

Based on what is learned, the analyst can then make better-informed decisions about what to do with the domain: ignore it, block traffic to it, or share information about it with a trust group or other threat intelligence partners. Simple Whois and NS lookups alone can’t provide enough detail to drive this kind of analysis.

Where Does SOAR Fit In?

SOAR playbooks are no replacement for human analysts, but they can streamline the operations that “set the table” for analysis and disposition of indicators. For example, a simple SOAR playbook can perform lookups of domain profile information, and provide a list of domains that match predetermined criteria for further scrutiny. Many SOCs have a policy of not trusting newly-created domains; some combine that with obtaining risk scores, so that all young and/or risky domains that have received traffic are flagged. If you are a Splunk Phantom user, as an example, you can use this playbook to perform risk score lookups. If you’re a Demisto user, this playbook auto-enriches domains by calling the Iris Investigate API.

The Simple Path: Playbooks for Enrichment and Basic Hunting

Enrichment of domain data from the protected environment is a foundational activity; it’s something that just about every shop does in one form or another, and as a DomainTools user, chances are you’re enriching indicators seen in your environment in the form of domains or IP addresses. When you enrich a domain, you add information to the raw indicator—information that can help you better understand what the domain is all about, and what (if anything) you need to do when you’ve observed traffic to that destination. The Iris Enrich API allows you to append registration, hosting, and content data about any domain you wish. For many users, this enrichment happens on an automated basis for any domains flagged in a Security Information and Event Management (SIEM) tool. Examples of this enrichment include:

  • Risk: the DomainTools Risk Score for the domain
  • Registration: Whois data including creation and expiration dates, registrant identification, and contact/location information. In the post-GDPR world, much of this information is made private, but in most cases at least the registrant organization is available, and the creation dates are valuable to SOCs who want visibility into connections to newly-created domains.
  • Hosting: DNS-based information such as IP addresses, hosting provider names, Autonomous System Numbers (ASN), IP location, Mail Exchanger (MX) records, and other DNS record types such as TXT, CNAME, and more.
  • Content: SSL certificate data including hashes, SSL organizations, and more; tracking codes such as Google Analytics or AdSense; screenshots

Some shops make use of most of these data points, while others rely on just a couple of key ones (such as creation dates, IPs, and name servers).

Setting the Table

This whole discussion is oriented around enriching domains to help identify malicious infrastructure. But what domains should be candidates for lookups? In a large organization, your volume of traffic may rule out auto-enriching every domain that receives traffic (or a DNS lookup on your resolver). So it may be necessary for you to develop a playbook that “sets the table” for enrichment and hunting. It could look something like this:

  1. Aggregate all log sources that have domain names available (more on this later)
  2. Normalize the logs and extract the domain names (at the SLD level, e.g. “example.com”)
  3. Write these to a file
  4. Look up all domains against the Alexa Top Million
  5. Discard domains in the Top Million
  6. Write remaining domains to a “candidate domains” file

This would give you a list of domains that aren’t among the Internet’s most common. It would by and large select-in most young domains, since young domains are less likely to make the top million than older ones. Likewise, it will tend to select-in domains of higher risk, because malicious domains tend to be flagged and placed on block lists before they reach the top million. Of course, there are exceptions to these, but it’s a first-stage filter that many SOCs like to use.
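Steps 2 through 6 above can be sketched in a few lines of Python. This is a minimal illustration, not a production playbook: it assumes the hostnames have already been pulled out of your logs, and that you have a local copy of a popularity list loaded as plain domain names. The function names `to_sld` and `candidate_domains` are hypothetical, and the "last two labels" reduction is deliberately naive (a Public Suffix List lookup would be needed to handle names such as example.co.uk correctly).

```python
def to_sld(hostname):
    """Naively reduce a hostname to its last two labels.

    'www.example.com' -> 'example.com'. This is an assumption-laden shortcut:
    it mishandles multi-label suffixes such as 'co.uk'.
    """
    labels = hostname.strip().lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def candidate_domains(hostnames, popular_domains):
    """Return sorted domains seen in logs that are not on the popularity list."""
    popular = {d.strip().lower() for d in popular_domains}
    seen = {to_sld(h) for h in hostnames if h.strip()}
    return sorted(seen - popular)


if __name__ == "__main__":
    # Hypothetical sample data standing in for normalized log output.
    observed = ["www.google.com", "mail.google.com.", "login.examp1e-secure.net"]
    top_list = ["google.com"]
    for domain in candidate_domains(observed, top_list):
        print(domain)
```

Using sets keeps the filter cheap even for large lists, since each membership check against the popularity list is a constant-time lookup.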

In this playbook sketch, step 1 is simple to write but much more challenging to pull off in real life. This blog on sources of domain names in logs is a great place to start, and this one, on Windows DNS, also includes this diagram of how this process can work:

[Figure: playbook sketch diagram]

Conceptually, a similar process applies for other sources of domain data, such as web proxies, SMTP servers, endpoint security telemetry, and more. The key is sources that contain domain names.

Enriching, Thresholding, Teeing Up Next Steps

A simple enrichment and thresholding playbook could look like this:

  1. Ingest candidate list of domains to look up (possibly generated by the previous playbook)
  2. Look up domain age and risk score
  3. Age: for domains younger than N days (your choice), flag
  4. Risk: for domains with Risk Scores above a threshold X (also configurable), flag
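As a rough sketch of steps 3 and 4, the thresholding logic might look like the following. It assumes the lookup in step 2 has already produced one record per domain; the record keys (`domain`, `create_date`, `risk_score`), the function name `flag_domains`, and the default thresholds are all hypothetical choices for illustration, not values taken from any DomainTools API.

```python
from datetime import date


def flag_domains(profiles, today, max_age_days=30, min_risk=70):
    """Return (domain, reasons) pairs for domains that are young and/or risky.

    `profiles` is an iterable of dicts with hypothetical keys:
    'domain' (str), 'create_date' (date or None), 'risk_score' (int).
    """
    flagged = []
    for profile in profiles:
        reasons = []
        created = profile.get("create_date")
        if created is not None:
            age = (today - created).days
            if age < max_age_days:
                reasons.append(f"young ({age} days)")
        if profile["risk_score"] > min_risk:
            reasons.append(f"risky (score {profile['risk_score']})")
        if reasons:
            flagged.append((profile["domain"], reasons))
    return flagged


if __name__ == "__main__":
    sample = [
        {"domain": "a.example", "create_date": date(2020, 5, 20), "risk_score": 95},
        {"domain": "b.example", "create_date": date(2010, 1, 1), "risk_score": 10},
    ]
    for domain, reasons in flag_domains(sample, today=date(2020, 6, 1)):
        print(domain, "->", ", ".join(reasons))
```

Recording the reason alongside each flag gives the analyst immediate context on why a domain surfaced, rather than a bare list.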

While this playbook does not automate any response actions, it does help focus the analyst’s attention on domains that may violate the organization’s policies for young or high-risk domains. The thresholding you set in steps 3 and 4 of this playbook could, depending on your policies and practices, set up further automated steps such as generating detection or blocking rules. You could choose to create a script, for example, that creates a Snort signature for any domain beyond your chosen age/risk thresholds; Snort rules can be used for detection and/or blocking in many different platforms.
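To make the Snort idea concrete, here is one way such a script could turn a flagged domain into a detection rule. Treat it as a sketch under stated assumptions: it targets plain UDP DNS queries on port 53, matches the domain in DNS wire format (each label prefixed by its length byte), and uses hypothetical helper names (`dns_wire_content`, `snort_rule`) and a SID from the range conventionally used for local rules. Your sensor's variables and rule conventions may differ.

```python
import re

# Conservative check so that nothing unexpected ends up inside the rule text.
LABEL_RE = re.compile(r"^[a-z0-9_-]{1,63}$")


def dns_wire_content(domain):
    """Encode a domain as a Snort content string in DNS wire format.

    'example.com' -> '|07|example|03|com|00|'
    """
    parts = []
    for label in domain.lower().rstrip(".").split("."):
        if not LABEL_RE.match(label):
            raise ValueError(f"unsupported label in domain: {label!r}")
        parts.append(f"|{len(label):02X}|{label}")
    return "".join(parts) + "|00|"


def snort_rule(domain, sid):
    """Build a Snort alert rule for DNS queries to a flagged domain."""
    name = domain.lower().rstrip(".")
    content = dns_wire_content(name)
    return (
        "alert udp $HOME_NET any -> any 53 "
        f'(msg:"DNS query for flagged domain {name}"; '
        f'content:"{content}"; nocase; sid:{sid}; rev:1;)'
    )


if __name__ == "__main__":
    print(snort_rule("example.com", 1000001))
```

Matching on the DNS query catches lookups regardless of which IP address the domain resolves to, which is useful when adversaries rotate hosting.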

Exposing Larger Campaigns

In the section describing adversary infrastructure hunting, we saw that part of the value of enrichment is that it gives us the opportunity to suss out a larger campaign, of which the original domain seen is just one part. How do we accomplish this with a playbook? This is where the Iris Investigate API comes into play. One of its most intriguing features is that it provides a count on any “pivotable” field (that is, a field such as an IP address, email address, or SSL certificate) that has other domains connected to it. Here is what the IP address section of an Investigate API response looks like:

[Figure: IP address section of the API response, showing count values]

Notice the word “count” in this response: the IP address in question has a count of 3, so there are 3 domains that the DomainTools Iris dataset has seen on that IP. The ASN has a count of 101, meaning that 101 domains share that ASN. 

These counts are valuable for helping you determine what infrastructure might be worth a closer look. If the count is too low (e.g. 1), it is not interesting: only one domain has that value, and it is the domain you’re already looking at. If the count is too high (e.g. 10,000), then it’s hard to infer any meaningful connection between those domains, plus that’s not a particularly manageable number for an analyst to process anyway. But when the count is in a sweet spot of a handful to a few hundred domains, there is a higher likelihood that the connection between those domains is non-coincidental, and the number of domains to be examined is not unreasonable, either. The Iris Guided Pivots Playbook for Phantom identifies infrastructure that may be connected to domains of interest. 
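The "sweet spot" filter is straightforward to express in code. The sketch below walks a parsed JSON response and keeps any value/count pair whose count falls within a configurable range. The sample data is invented for illustration (reusing the counts of 3 and 101 mentioned above), the nesting shown is an assumption rather than the documented response schema, and the names `iter_counts` and `guided_pivots` and the default bounds are hypothetical.

```python
def iter_counts(node, path=""):
    """Yield (field_path, value, count) for every value/count pair found."""
    if isinstance(node, dict):
        if "value" in node and "count" in node:
            yield path, node["value"], node["count"]
        else:
            for key, child in node.items():
                child_path = f"{path}.{key}" if path else key
                yield from iter_counts(child, child_path)
    elif isinstance(node, list):
        for child in node:
            yield from iter_counts(child, path)


def guided_pivots(response, low=2, high=500):
    """Keep pivots shared by more than one domain but not by too many."""
    return [
        (field, value, count)
        for field, value, count in iter_counts(response)
        if low <= count <= high
    ]


if __name__ == "__main__":
    # Illustrative data only; not a real API response.
    sample = {
        "ip": [{
            "address": {"value": "192.0.2.10", "count": 3},
            "asn": [{"value": 64500, "count": 101}],
            "isp": {"value": "Example Hosting", "count": 25000},
        }],
        "name_server": [{"host": {"value": "ns1.example.net", "count": 1}}],
    }
    for field, value, count in guided_pivots(sample):
        print(f"{field}: {value} ({count} domains)")
```

Walking the structure generically, rather than hard-coding field names, means the same filter applies to any pivotable field that carries a count.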

In the next installment of this series, we’ll look at where the analyst takes over from the playbook, such as when the Iris Guided Pivots Playbook delivers a set of infrastructure that could represent a campaign connected to a domain that was flagged by the enrichment/thresholding playbook. You may also have your own ideas about the direction you would take your workflow based on the wealth of data that Iris can surface for you. 

Happy exploring!