
When we speak of the fundamental use cases for domain, DNS, and related data such as SSL certificates or screenshots, there is typically a base level of understanding of what we’re talking about and why it’s important. The conversations include things like:
But what we don’t always spend as much time on is how we can actually spot those potentially risky traffic flows, and how to knit together what we can see in the protected environment and what’s available from sources such as the DomainTools Iris Investigate or Farsight DNSDB databases. This blog seeks to delve into that space.
In the world of roadgoing vehicles, the term contact patch refers to the area of a tire in actual contact with the road—four such patches for a car, two for a motorcycle (unless it’s popping a wheelie, but that’s a different blog). All of the fancy engineering in that vehicle—the engine, transmission, suspension, brakes, and their ancillary systems—plays out in those relatively few square inches of contact between tire and pavement.
It’s like that with traffic flows in a protected environment, and the data that can help security folks make sense of those flows. When an Internet-related event occurs, packets flow from a client device to a server, and back from that server (yes, this is an oversimplification, but the concept works for our purposes here). The places where forensic “tire tracks” are left behind are the network forensics contact patch. Here are some examples:
If we are to get as much context as we can about the traffic flow in question, these are the places to look, because once we have extracted the key observables—domain name and IP address for our purposes today—we can then open up a can of OSINT (open source intelligence) on the flow to find out more about what’s really going on.
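To make that concrete, here is a toy extraction in Python; the log line format is invented for illustration, since every vendor’s differs:

```python
import re

# Hypothetical proxy-style log line; real formats vary by vendor.
line = "2024-05-04T12:00:01Z ALLOW src=10.1.2.3 dst=203.0.113.7 host=files.examp1e-cdn.com"

# Pull out the two observables we care about: destination IP and domain.
ip = re.search(r"dst=(\d{1,3}(?:\.\d{1,3}){3})", line).group(1)
domain = re.search(r"host=([A-Za-z0-9.-]+)", line).group(1)

print(ip, domain)  # 203.0.113.7 files.examp1e-cdn.com
```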
Not every event recorded in the typical protected environment has a domain name associated with it. In fact, relatively few do. Fortunately, those few are present in every network, so you have the veins of gold; you just have to mine them.
Think about a basic Web transaction. Let’s say a user—we’ll call her Sarah—visits a web site. If she’s working in an office environment (or from home, but over a VPN that pushes all traffic through the office, which is common in many remote work environments), the web transaction will look different to us depending on where we’re viewing it. For example:
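A tiny sketch can illustrate the difference in vantage points (the client IP and port are made up, and example.com stands in for Sarah’s destination):

```python
import socket

domain = "example.com"  # stand-in for the site Sarah visits
ip = socket.getaddrinfo(domain, 443)[0][4][0]

# Roughly what each vantage point records for the same transaction:
print(f"DNS resolver log: query {domain} -> answer {ip}")
print(f"Firewall log:     10.1.2.3:51514 -> {ip}:443 (IP only, no domain)")
```

The resolver sees the name Sarah asked for; a device watching packets downstream of the resolution typically sees only the address.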
There are several reasons why it’s particularly valuable to get domain names in our logs and events—but those are matters we’ve discussed before (please do take a look if you’re curious!). For now, let’s assume we are resolved (sorry, not sorry!) that domains are valuable forensic artifacts.
If something about Sarah’s connection to that site raised an alarm—for example, if its IP address was on a reputation list for dangerous addresses—we would want to know more about the event. But if we only had that IP to go on, we would be left with some challenges:
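To name one such challenge: the obvious move with a bare IP is a reverse (PTR) lookup, and it often doesn’t get you far. A quick sketch:

```python
import socket

def reverse_lookup(ip: str):
    """Try a PTR lookup for an IP. On shared hosting or a CDN, hundreds
    of unrelated sites can sit behind one address, and the PTR record
    (if it exists at all) usually names the provider, not the site."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None

print(reverse_lookup("203.0.113.7"))  # documentation address; likely None
```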
So let’s get those domains from the logs!
If we use all, or even most, of the log sources that could possibly give us domains in network transactions, we’ll get a lot of data. Much like the dog that actually caught the car he’d been chasing all those years, collecting all of these logs could give us more than we bargained for. That raises a few different matters we have to address, such as:
If we were going to keep every single DNS record, we’d certainly be in for quite a data management problem. But what if we were to pre-filter the data to narrow it down? Are there methods we can use to curtail the records to a more manageable level while still retaining what might be important from an incident response or proactive defense perspective? The answer is yes. Yes, there are.
One of the most useful pre-filtering techniques would be to eliminate any DNS log data related to requests for popular domains. This will cut things down considerably, since the majority of your users (and even the non-user-initiated automated processes running in most environments) are sending traffic to relatively well-known resources. There are several resources available (after the demise of the famous Alexa web ranking service) that you might consider for getting this data, such as the Tranco list, the Cisco Umbrella Popularity List, and the Majestic Million.
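One wrinkle if you go this route: the popularity lists carry registered domains such as example.com, while your logs carry fully qualified names such as cdn.assets.example.com, so the comparison has to walk up the labels. A rough sketch, assuming the list is already loaded into a set:

```python
def in_top_list(fqdn: str, top_domains: set) -> bool:
    """True if the name or any parent of it appears in the popularity list,
    so cdn.assets.example.com matches a list entry for example.com.
    A production version would use the Public Suffix List to find the
    registered domain instead of this label-walking shortcut."""
    labels = fqdn.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in top_domains for i in range(len(labels) - 1))
```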
But even that might not narrow things down enough. If the amount of non-top-million log data is still too high, there are some other filtering criteria you could consider:
If you choose to use one or more of these methods, you now have the ingredients for paring down your domain list. What you’re aiming for is a much smaller list of domains that are not well-known to the web at large or to your filters. Some of those domains will likely be innocuous, such as your uncle Fred’s new vanity stamp collection site; but when it comes to malicious domains, a large proportion are young (recently created) and comparatively obscure (so they won’t be on that top-million list).
A good pre-filtering script might behave something like this:
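Below is a minimal sketch of that flow in Python. Everything specific here is an assumption for illustration: the input file of queried domains, the rank,domain CSV format of the popularity list, and the internal allowlist.

```python
#!/usr/bin/env python3
"""Pre-filter observed domains down to the obscure remainder worth enriching."""
import csv

ALLOWLIST = {"corp.example.internal"}  # hypothetical internal zones to skip

def load_top_domains(path: str) -> set:
    # Assumes a rank,domain CSV, the format popularity lists commonly use.
    with open(path, newline="") as f:
        return {row[1].lower().rstrip(".") for row in csv.reader(f) if len(row) >= 2}

def in_top_list(fqdn: str, top_domains: set) -> bool:
    # Match the name or any parent label, as in the earlier sketch.
    labels = fqdn.split(".")
    return any(".".join(labels[i:]) in top_domains for i in range(len(labels) - 1))

def prefilter(lines, top_domains):
    seen = set()
    for raw in lines:
        domain = raw.strip().lower().rstrip(".")
        if not domain or domain in seen:
            continue                              # drop blanks and duplicates
        seen.add(domain)
        if domain in ALLOWLIST or in_top_list(domain, top_domains):
            continue                              # drop internal and well-known names
        yield domain                              # obscure: keep for enrichment

if __name__ == "__main__":
    top = load_top_domains("top-1m.csv")          # hypothetical local copy of a top list
    with open("queried-domains.txt") as f:        # hypothetical one-domain-per-line log
        for domain in prefilter(f, top):
            print(domain)
```

What comes out the other end is the shorter candidate list: the domains worth spending enrichment effort on.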

With this narrowed-down list of domains, we still need a programmatic way to identify the ones that might be most risky—especially if we’re planning on doing some hunting or some detection engineering based on what we learn about them. This is where enrichment of the domain names comes into play, often with tools such as SIEM and SOAR platforms. In Part 2, we’ll look at how that enrichment can take place.