Hunting Malicious Domains: Introducing DomainTools Threat Profile

NOTE: Iris Detect has supplanted PhishEye, with dramatically expanded capabilities. Please explore Detect for your brand protection, anti-fraud, and spoof infrastructure analysis needs.

“Every path you take, every domain registration you make, we’ll be watching you.”

Summary (TL;DR)

We created Threat Profile, a set of supervised machine learning classifiers, to find domains which could become weaponized by bad actors for phishing, malware, or spam campaigns
We find these domains before they are weaponized; we think of them as domains registered with “malicious intent”
The accuracy of our classifiers is pretty darn good
We created a new machine learning infrastructure, called the Crank, to make quick changes, run experiments, and automatically evaluate the results
We have a dedicated Data Science and R&D team so that we’ll constantly stay one step ahead of the bad actors
You can easily integrate the Threat Profile score into your firewall rules, Splunk, or other threat intelligence processes
It’s in beta now, and will be released soon

Introduction

There are, unfortunately, bad actors on the Internet who register, weaponize, and deploy domains as part of phishing, malware, or spam campaigns. Malicious domains make the internet a less safe and more annoying place for everyone. Our goal is to identify and flag these domains—domains registered with “malicious intent”—before they are weaponized and they “Cry ‘Havoc!’, and let slip the dogs of war”.

So, we’ve invested heavily in Data Science and R&D over the last 18 months to create Threat Profile, a component of the DomainTools Risk Score that you can use to augment your existing threat intelligence processes. We think of domains with a high Threat Profile score as belonging on a “domain watchlist”, domains which we believe may become dangerous in the near future. An overview of Threat Profile is shown in Figure 1.

Threat Profile embodies our belief that bad actors make the Internet a less safe and more annoying place for everyone.

Figure 1: An overview of the DomainTools Threat Profile, showing how malicious domain data is used to train classifiers which then generate risk scores on new and updated domain registrations

What is a Threat Profile?

A Threat Profile is our view into the mindset of bad actors: how they determine which domains to register and how they set up their malicious infrastructure. From that view, we’ve created a set of three machine learning classifiers: one for phishing, one for malware, and one for spam.

Each classifier is independently engineered, trained, and optimized to find domains registered with malicious intent. We use the same high-quality industry and vendor blocklist information as the Proximity component of the Risk Score, as well as our extensive Whois and DNS databases, to identify important domain features against which we train our classification models. Each model is repeatedly tested and optimized over time to validate its accuracy.

These classifiers analyze all new and updated domain registrations, generating scores indicating our belief that a given domain has malicious intent. Specifically, our models look for domains that may become weaponized anytime within the next 18 months. A high score doesn’t guarantee that a domain will be used maliciously: a bad actor may register many domains but only end up using a few. Our Threat Profile classifiers are designed to find all such domains registered by bad actors, whether or not they ever become weaponized.

Important Note: we’re not looking for compromised domains, only domains we believe are registered with malicious intent by bad actors or their proxies.

Classifying Domains

We are using state-of-the-art supervised machine learning classifiers to build our Threat Profiles. Each classifier is selected and tuned independently to best identify phish, malware, and spam threats respectively. Here’s an overview of how we do it:

Create training and test datasets using curated blocklist data and our Whois and DNS databases
Use our extensive domain knowledge of bad actors, our expertise in cybersecurity TTPs, and detailed analysis of our data to determine which intrinsic properties of the domains, a.k.a. features, are most useful for identifying malicious intent
Run grid searches using our machine learning infrastructure over different sets of features and tunings to optimize our classification models
Compare model accuracy on the test dataset using standard classification metrics
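To make the grid search step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy rows, the feature names (has_hyphen, new_registrar, shared_host), and the simple "flag when enough features fire" rule are hypothetical and are not the features or models behind Threat Profile. It only shows the shape of the search: enumerate feature subsets and tunings, score each, keep the best.

```python
from itertools import combinations

# Toy labeled rows: binary features plus a label (1 = malicious). Made-up data.
ROWS = [
    ({"has_hyphen": 1, "new_registrar": 1, "shared_host": 1}, 1),
    ({"has_hyphen": 1, "new_registrar": 0, "shared_host": 1}, 1),
    ({"has_hyphen": 0, "new_registrar": 1, "shared_host": 1}, 1),
    ({"has_hyphen": 1, "new_registrar": 0, "shared_host": 0}, 0),
    ({"has_hyphen": 0, "new_registrar": 0, "shared_host": 1}, 0),
    ({"has_hyphen": 0, "new_registrar": 0, "shared_host": 0}, 0),
]
FEATURES = ["has_hyphen", "new_registrar", "shared_host"]  # hypothetical names


def f1_for(feature_set, min_hits):
    """F1 of a rule that flags a row when at least min_hits chosen features fire."""
    tp = fp = fn = 0
    for feats, label in ROWS:
        pred = int(sum(feats[f] for f in feature_set) >= min_hits)
        tp += pred == 1 and label == 1
        fp += pred == 1 and label == 0
        fn += pred == 0 and label == 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def grid_search():
    """Try every feature subset and every min_hits value; return the best combo."""
    best = (0.0, None, None)
    for size in range(1, len(FEATURES) + 1):
        for subset in combinations(FEATURES, size):
            for min_hits in range(1, size + 1):
                score = f1_for(subset, min_hits)
                if score > best[0]:
                    best = (score, subset, min_hits)
    return best


best_f1, best_features, best_min_hits = grid_search()
print(best_f1, best_features, best_min_hits)
```

For brevity this sketch scores each combination on the same rows it searches over. A real workflow would score candidates with cross-validation on the training data and keep the test dataset withheld for the final comparison.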

A feature is machine learning terminology for an intrinsic property of an item which is used to train a classifier or to classify an item. “Raw” information about an item isn’t useful on its own; it needs to be encoded so that a computer can work with it. This “encoding” is the feature. For example, if you want to train a classifier to predict someone’s age, you might use height as measured in inches as a feature. The feature is encoded as a number to represent height. When a classifier trains, it looks at the set of features for each item and learns from the patterns it finds.

For the Threat Profile score, we created features from three categories of data:

The domain name itself, including TLD
Domain registration information
Domain infrastructure information

We found that different features are more or less important when predicting phish, malware, or spam intention. One concrete example is a hyphen in the domain name, such as “com-online-today[.]test”. While hyphens are important for identifying phishing domains, they aren’t nearly as important for identifying spam domains.
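As a rough illustration of what encoding a domain name into features can look like, here is a small Python sketch. The feature set below (length, hyphen count, digit count, label count, TLD) is an assumption chosen for the example; only the hyphen and the TLD are mentioned in this post, and the function name encode_domain is hypothetical.

```python
def encode_domain(domain: str) -> dict:
    """Encode a domain name as simple features (illustrative feature set only)."""
    name = domain.lower().replace("[.]", ".")  # undo defanging such as "[.]"
    labels = name.split(".")
    tld = labels[-1]
    body = ".".join(labels[:-1])  # everything to the left of the TLD
    return {
        "length": len(body),
        "hyphen_count": body.count("-"),
        "digit_count": sum(ch.isdigit() for ch in body),
        "label_count": len(labels),
        "tld": tld,  # categorical; a model would typically one-hot encode this
    }


features = encode_domain("com-online-today[.]test")
print(features)
```

A classifier would then learn, per threat type, how much weight a feature such as hyphen_count deserves, which is how the same feature can matter a lot for phishing and little for spam.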

For each classifier, we look for discriminatory features in two ways. First, we leverage our internal expertise in cybersecurity and domain registrations. For example, many of the features used in our phish classifier come from our expertise creating PhishEye. Second, we’re doin’ data science—we look for correlations and patterns among domain metadata. It’s deeper than just the characters in the domain name or TLD. We look at how and when a domain was registered, and we look at the infrastructure used to host the domain.

Train, Test, Repeat

We’ve spent the last 18 months in research and development of the Threat Profile classifiers. To that end, we created a robust machine learning infrastructure (affectionately called “the Crank”) to deploy and test changes to our features and classifiers quickly. Using the Crank, we can run not just a few, but hundreds of classifier experiments on our cluster at once. All of this to find interesting interactions between features and improve our models.

To make sure our models are awesome, we use a consistent training/testing methodology. We randomly sample domains to be in either a training or test dataset and then perform k-fold cross-validation over models built with the training dataset. This helps ensure that the models are not brittle or overly sensitive to the training data. We like k-folds so much, we built it into the Crank.
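For readers who haven't met k-fold cross-validation before, the idea can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the Crank's implementation: the 20% test fraction, k=5, the fixed seed, and the placeholder domain names are all made up for the example.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Randomly assign items to a training set and a withheld test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(train, k=5):
    """Yield (fit, validate) pairs; each training item is validated exactly once."""
    for i in range(k):
        validate = train[i::k]  # every k-th item, starting at offset i
        fit = [x for j, x in enumerate(train) if j % k != i]
        yield fit, validate


domains = [f"domain-{n}.test" for n in range(50)]  # placeholder names
train, test = train_test_split(domains)
folds = list(k_folds(train, k=5))
print(len(train), len(test), len(folds))
```

Each fold fits a model on four fifths of the training data and validates on the remaining fifth. If the scores swing wildly from fold to fold, the model is likely too sensitive to the particular training sample.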

We use a standard set of accuracy metrics to evaluate our models. Some metrics measure the classifier’s overall performance, and others measure its performance at a given threshold. We evaluate against the withheld test datasets. Our metrics include:

Receiver Operating Characteristic (“ROC”) Curves
Precision-Recall (“PR”) Curves
Precision, Recall, and the F1 Score, at given thresholds

For both ROC and PR curves, it is common to look both at a visualization of the curve as well as the area under the curve (AUC). The higher the AUC, the better the classifier is doing, with 1.0 being “perfect”. The F1 score is the harmonic mean of precision and recall, and thus takes both false positives and false negatives into account. It’s more robust than precision or recall alone, and harder to achieve a high score. It’s perfect for us and our high standards. It also ranges from 0.0 to 1.0.
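Two of these numbers are easy to compute by hand, which helps build intuition. The sketch below uses made-up labels and scores; it computes F1 as the harmonic mean of precision and recall, and ROC AUC through its rank interpretation (the chance that a randomly chosen positive outscores a randomly chosen negative). Production code would normally use a library routine instead.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This is O(P * N), fine for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]              # made-up ground truth
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # made-up classifier scores

print(roc_auc(labels, scores))  # 8 of the 9 positive/negative pairs are ordered correctly
print(f1_score(0.9, 0.5))       # lower than the arithmetic mean of 0.7
```

The second print shows why F1 is hard to game: with precision 0.9 and recall 0.5, the harmonic mean lands well below the simple average, so a classifier has to do well on both to score highly.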

The Crank allows us to encode and execute hundreds of classification experiments at once. We can quickly compare each experiment’s results and use that data to help us improve our models over time.

Peeking Under the Hood

So, just how good is it? Let’s look at the metrics for one of our Threat Profile classifiers: Phishing. This data comes from one of our recent rounds of testing; we expect the performance of the released Threat Profile to be even better.

Table 1 shows some summary metric scores for our Phishing Threat Profile classifier. While AUC and F1 score performance is application dependent, we are very happy with these scores and the domains we classify as having phishing intent.

Table 1: Summary Metrics for Phishing

The classifier we’re using for the Phish Threat Profile returns a raw score between 0 and 1, where 0 means not “phishy” at all and 1 means totally “phishy”. To compare the classifier’s score against the test dataset, you select a threshold, typically 0.5, and do “a cut”. Everything below the threshold is considered a 0 (not phishy), and everything above it is a 1 (totally phishy). For this instance of Phish, we set our threshold at 0.46.

Figure 2 shows how the Precision, Recall, and F1 scores for Phish vary as we adjust the Threshold parameter from 0 to 1. In the figure, you can trace the tradeoffs between optimizing a classifier for precision versus recall: as one falls, the other increases. We are happy to see that for a broad set of thresholds our Phish classifier generates high F1 scores. This implies that most of the raw classification scores are not near 0.5 but rather towards the two ends of the spectrum, which gives us high confidence in the quality of our classifications.
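The cut and the threshold sweep can be sketched as follows. The labels and raw scores are made up for illustration and are not our test data; the sketch also assumes a score exactly equal to the threshold counts as a 1, a detail the description above leaves open.

```python
def metrics_at(labels, scores, threshold):
    """Cut raw scores at a threshold, then return (precision, recall, f1)."""
    preds = [int(s >= threshold) for s in scores]  # assumption: ties count as 1
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


labels = [1, 1, 1, 1, 0, 0, 0, 0]                          # made-up ground truth
scores = [0.95, 0.90, 0.85, 0.30, 0.60, 0.10, 0.05, 0.02]  # made-up raw scores

sweep = {t: metrics_at(labels, scores, t) for t in (0.2, 0.46, 0.7)}
for t, (p, r, f1) in sweep.items():
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Raising the threshold from 0.2 to 0.7 on this toy data pushes precision up and recall down, which is the same tradeoff Figure 2 traces across the full range.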

Figure 2: Precision, Recall, and F1 scores for Threat Profile Phish by Threshold. The x-axis is the threshold, and the y-axis is the metric score.

Our Malware and Spam Threat Profile classifiers show even better performance, with F1 scores near or exceeding 0.9 and ROC AUC scores above 0.95. Why did we show you Phishing, the worst performer of the three classifiers? It is important that our customers trust our scores and the data science behind them. In security, trust is earned and not given. Our customers will want to know how we generate and update our scores before including them in their operational security practices and processes. Most importantly, we hold ourselves to the highest standards, both our own and those of our customers and partners.

Interpretation and Use

Threat Profile is a risk score and should be used as part of your existing threat intelligence processes. Think of domains with a high Threat Profile score as belonging on a “domain watchlist,” domains that could become weaponized anytime within the next 18 months. Depending on how severely we score the domain and your organization’s risk tolerance, you may want to take different actions: everything from flagging their appearance in your server logs to blocking the domains outright.

The Threat Profile score format is similar to the Proximity format, following a 0 to 100 scale. The higher the Threat Profile score, the more likely the domain was registered with malicious intent:

0, domain is zerolisted
50+, suspicious
70+, our recommended threshold for indicating malicious intent
90+, strong confidence in near-term weaponization
100, domain is on an industry blocklist
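If you are wiring the score into your own tooling, the bands above translate into a small lookup. This is a sketch only: the function name and the band labels are hypothetical, and the "no flag" label for scores from 1 to 49 is an assumption, since the list above does not name that range.

```python
def interpret(score: int) -> str:
    """Map a 0-100 Threat Profile score to the bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 100:
        return "blocklisted"          # on an industry blocklist
    if score >= 90:
        return "strong confidence"    # near-term weaponization likely
    if score >= 70:
        return "malicious intent"     # recommended threshold
    if score >= 50:
        return "suspicious"
    if score == 0:
        return "zerolisted"
    return "no flag"                  # assumption: 1-49 is not named in the list


for s in (0, 35, 55, 70, 93, 100):
    print(s, interpret(s))
```

What you do with each band depends on your risk tolerance: one team might only log hits in the suspicious band while another blocks everything at 70 and above.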

We combine the results from our three independent classifiers together to create one composite Threat Profile score. This score optionally comes with supporting evidence, outlining how the classifiers fed into the score for a given domain. Threat Profile is designed to be used in conjunction with our Proximity score to help you understand the kinds of threats appearing on your network: Proximity to identify domains closely related to known malicious activity, and Threat Profile to identify domains with malicious intent before they can be weaponized.

If you just want to mitigate your overall risk, use the DomainTools Risk Score, which is the combination of Proximity and Threat Profile. It is the “one score to rule them all” which you can easily integrate into your firewall rules or other automated threat intelligence processes.

A Note About Dormant Domains

Not every domain registered by a bad actor will be weaponized. Many will sit dormant until their registration period ends. The goal of Threat Profile is to find all domains registered with malicious intent, even if they remain dormant. From a classification perspective, these domains are not “false positives” but rather “future positives,” because we believe they have the potential to become weaponized at any time.

There’s always a tradeoff between providing users access to resources online and protecting your network from threats. We believe watching and/or blocking domains flagged with our Threat Profile score is an effective way to isolate potential threats while minimizing the impact to your users and customers.

The DomainTools Advantage

We have a dedicated R&D and Data Science team continually monitoring changes in our DNS databases and evaluating new blocklisted domains to determine how bad actors behave and update our models accordingly. Moreover, we built the Crank, our flexible machine learning infrastructure, to make quick changes to the features, run experiments, and automatically evaluate the results. This infrastructure is just as important as the Risk Scores themselves–it means we can keep putting out high quality predictions in the future, no matter how bad actors change their tactics.

In the cat-and-mouse game of domains registered with malicious intent, we’re not building a better mousetrap, we’re making the cats better hunters. Hunting malicious domains.

Take it for a Test Drive

The DomainTools Risk Score with Threat Profile will be available soon; we’re rolling it out in beta now. Moreover, our latest release of the DomainTools App for Splunk has built-in support for our new Risk Score with Threat Profile & Proximity.
