What do I Mean by Web Assets?
In general, when I refer to web assets, I mean files that are loaded into the main HTML of a site via HTML tags. Examples include:
- CSS files (via link tags)
- Images (via img tags)
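To make the definition concrete, here is a minimal sketch of pulling these assets out of a page with Python's standard library. The names `AssetCollector`, `extract_assets`, and `SAMPLE_HTML` are illustrative, and the sample markup is invented for the example; it is not taken from any site discussed here.

```python
from html.parser import HTMLParser

# Tag -> attribute that carries the asset URL.
ASSET_ATTRS = {"link": "href", "script": "src", "img": "src"}


class AssetCollector(HTMLParser):
    """Collect (tag, url) pairs in the order they appear in the document."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attr_name = ASSET_ATTRS.get(tag)
        if attr_name is None:
            return
        attr_map = dict(attrs)
        # Only count <link> tags that actually load a stylesheet.
        rel = (attr_map.get("rel") or "").lower()
        if tag == "link" and "stylesheet" not in rel:
            return
        url = attr_map.get(attr_name)
        if url:
            self.assets.append((tag, url))


def extract_assets(html):
    """Return the ordered list of (tag, url) assets referenced by the HTML."""
    parser = AssetCollector()
    parser.feed(html)
    return parser.assets


# Hypothetical page, used only to exercise the parser.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="https://cdn.example.com/lib.js"></script>
</head><body><img src="/img/logo.png"></body></html>
"""

for tag, url in extract_assets(SAMPLE_HTML):
    print(tag, url)
```

Keeping the tag alongside each URL preserves both what was loaded and the order in which it was loaded, which matters for the fingerprinting idea discussed next.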
Programmers are lazy… threat actors are no exception
- Reusing CSS and JS files is easier than writing new ones from scratch
- The set of third-party files loaded into an HTML document, and the order in which they are loaded, is highly variable and therefore a good potential fingerprint
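One way to turn that observation into something comparable across sites is to hash the ordered list of third-party asset URLs. This is only a sketch under simple assumptions: `asset_fingerprint` is a hypothetical helper, "third party" is taken to mean "served from a different hostname than the page", and the example hostnames are made up.

```python
import hashlib
from urllib.parse import urlparse


def asset_fingerprint(page_host, asset_urls):
    """Hash the ordered list of third-party asset URLs loaded by a page."""
    third_party = []
    for url in asset_urls:
        host = urlparse(url).hostname
        # Relative URLs have no hostname, so they are treated as first-party.
        if host and host != page_host:
            third_party.append(url)
    # Join with newlines so that order changes the resulting hash.
    joined = "\n".join(third_party)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical asset lists for two pages on different hosts.
page_a = ["/css/site.css", "https://cdn.example.net/a.js", "https://stats.example.org/t.js"]
page_b = ["/theme/other.css", "https://cdn.example.net/a.js", "https://stats.example.org/t.js"]

print(asset_fingerprint("a.example.com", page_a) == asset_fingerprint("b.example.com", page_b))
```

Two pages that load the same third-party files in the same order produce the same hash even when their first-party files differ. An exact hash is brittle, so in practice a looser comparison of the lists may be more useful.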
What Does a Concrete Example Look Like?
Step 1: Choose a Target
I wanted to create a fairly trivial example of searching for connected infrastructure, but first I needed a malicious domain. Thinking like a SOC analyst, I thought it might be nice to look at some web properties that had known phishing components. To find one, I went over to AlienVault’s Open Threat Exchange (OTX) and found an interesting site there.
When I opened this site in a web browser, I noticed that its text centers on earning bitcoins, implying that you can earn bitcoins by shortening Facebook links. This site seems to have a number of malicious things going on, including running a number of scripts loaded from .ru domains.
Right now, I’m not trying to analyze what the site does or what its author’s intentions may be. My goal is to verify that it’s the type of malicious site that might get sent to users on our network. To confirm this, let’s look at the domain in Iris:
Iris highlights that this domain has a risk score of 100, citing its high proximity to other malicious domains and evidence of malware and phishing. This is a great indication that I am onto something here! With any luck, I should be able to see strong infrastructure connections using web assets in addition to the things Iris already mentions.
Step 2: Pick an Asset to Search For
A couple of important notes here: “components.css” by itself is not going to be a good pivot term. It is such a generic and descriptive name that it will be used across many unrelated sites. In this case, it’s the file path that has the unique naming, and the filename that (we hypothesize) has the shared code.
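The split between a distinctive path and a generic filename can be sketched as follows. Everything here is illustrative: `split_asset`, `pivot_term`, and the `GENERIC_NAMES` set are hypothetical, and the example URL is invented rather than the real asset path from the site.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Filenames assumed to be too common to search on by themselves.
GENERIC_NAMES = {"components.css", "style.css", "main.css", "main.js", "app.js"}


def split_asset(url):
    """Return (directory, filename) for the path portion of an asset URL."""
    path = PurePosixPath(urlparse(url).path)
    return str(path.parent), path.name


def pivot_term(url):
    """Use the full path when the filename is generic, else the filename."""
    directory, filename = split_asset(url)
    if filename.lower() in GENERIC_NAMES:
        return f"{directory}/{filename}"
    return filename


# Hypothetical asset URL with a distinctive directory name.
example = "https://site.example/assets/x9-theme-kit/components.css"
print(split_asset(example))
print(pivot_term(example))
```

Searching on the combined path and filename keeps the distinctive directory name in the query while still targeting the file suspected of carrying the shared code.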
Step 3: Search for Connections
Google probably isn’t the best tool for something like this, but it’s the tool everyone has at hand, so let’s see what I can find by searching on our pivot term.
It is pretty apparent that a number of website metadata tracking sites come up among our first results on Google. This is valuable because these sites track what types of components are loaded by other sites, and so can give me the types of answers (infrastructure correlation) that I’m looking for. If I pick one of the sites they list at random, say bandirun[.]com, I can take a look and see if I can confirm my hypothesis that I can explore a threat actor’s infrastructure in this way.
Loading bandirun[.]com up in Iris, I see right away that it has a malicious profile. Interestingly, from Iris’ perspective this time, its malware score is low but its proximity score is high. This is along the lines of what we want to see: it confirms that this site is connected to other known bad sites.
If I look at the Iris domain row for each of 1ink[.]cc and bandirun[.]com, I don’t see any pieces of data that directly relate these two sites. However, Iris is a great infrastructure exploration tool on its own, so let’s do some pivoting and see if I can find something related. If I expand on bandirun[.]com’s IP address, I will get a list of hosts that share that IP.
Now if I scroll through the new list of sites Iris has generated for me, I pretty quickly see that globalmaritimetraining[.]net, which is on the same IP address as bandirun[.]com, shares the same DNS/SOA email address (markabi.twins@gmail[.]com) as 1ink[.]cc.
Bandirun[.]com, expanded on 104.168.58[.]149:
So the two domains I investigated, though not directly related, have a related attribute (DNS/SOA email address) through a third domain (surfaced by Iris). This shows us that even with a very straightforward approach, I can begin to associate malicious actors’ infrastructure via web assets.
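The chain of pivots above can be written down as a small graph search. This is a sketch and not how Iris works internally: `find_link` is a hypothetical helper, and the `records` dictionary contains only the three attributes mentioned in this walkthrough, with everything else left out.

```python
from collections import deque

# Only the attributes named in the walkthrough above; nothing else is assumed.
records = {
    "1ink[.]cc": {("soa_email", "markabi.twins@gmail[.]com")},
    "bandirun[.]com": {("ip", "104.168.58[.]149")},
    "globalmaritimetraining[.]net": {
        ("ip", "104.168.58[.]149"),
        ("soa_email", "markabi.twins@gmail[.]com"),
    },
}


def find_link(records, start, goal):
    """Breadth-first search for a chain of shared attributes between domains."""
    # Invert the records: attribute -> set of domains that carry it.
    by_attr = {}
    for domain, attrs in records.items():
        for attr in attrs:
            by_attr.setdefault(attr, set()).add(domain)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]  # Paths always end on a domain.
        if current == goal:
            return path
        for attr in sorted(records.get(current, ())):
            for neighbor in sorted(by_attr[attr]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(path + [attr, neighbor])
    return None


for step in find_link(records, "1ink[.]cc", "bandirun[.]com"):
    print(step)
```

The returned path alternates between domains and the attribute that connects them, which mirrors the manual pivot: shared SOA email first, then shared IP address.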
Step 4: Expand on This Technique
To continue to grow the sophistication of this technique, I could begin to look at individual code blocks within the files themselves, as well as comments, coding style, and other indicators.
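As a rough sketch of what comparing file contents could look like, the snippet below scores two stylesheets by the lines they share and lists any comments that appear verbatim in both. The helper names and the two CSS strings are invented for illustration, and line-level Jaccard similarity is just one simple choice among many possible measures.

```python
import re

# Matches CSS block comments, including multi-line ones.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def normalized_lines(source):
    """Return the set of non-empty lines with whitespace collapsed."""
    lines = set()
    for line in source.splitlines():
        collapsed = " ".join(line.split())
        if collapsed:
            lines.add(collapsed)
    return lines


def line_similarity(a, b):
    """Jaccard similarity of the normalized line sets of two files."""
    lines_a, lines_b = normalized_lines(a), normalized_lines(b)
    if not lines_a and not lines_b:
        return 0.0
    return len(lines_a & lines_b) / len(lines_a | lines_b)


def shared_comments(a, b):
    """Comments that appear verbatim in both files."""
    return sorted(set(COMMENT_RE.findall(a)) & set(COMMENT_RE.findall(b)))


# Hypothetical stylesheets from two different sites.
CSS_A = "/* kit v2 by dev */\n.btn { color: red; }\n.nav { margin: 0; }\n"
CSS_B = "/* kit v2 by dev */\n.btn   { color: red; }\n.footer { padding: 4px; }\n"

print(line_similarity(CSS_A, CSS_B))
print(shared_comments(CSS_A, CSS_B))
```

Collapsing whitespace before comparing means trivial reformatting does not hide reuse, while an identical author comment in two unrelated-looking sites is a strong hint on its own.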
Inevitably, attackers will adapt their techniques to evade detection via this type of exploration. However, their resources to do so are limited. In essence, I am using my own exploration and asymmetry advantages against threat actors.
For an in-depth discussion on fingerprinting threat actors with similar techniques, watch this webinar I co-presented with Rebekah Brown to learn:
- How the threat intelligence space is evolving
- Practical steps your team can take to get ahead of threat actors
- Real world examples of enumerating attacker infrastructure using web assets and other information scraped from HTML