No One Expects the Pineapple Affix
In a previous DomainTools Report conducted in the summer of 2016, we leveraged DomainTools' extensive dataset to examine the distribution of malicious and neutral domains across a set of affixes (prefixes, suffixes, and infixes). Our analysis confirmed that certain affixes do portend higher risk, and we published data demonstrating which affixes were most represented in domains added to industry blocklists for malware, spam, or phishing.
As threat actors continually evolve their tactics, we periodically refresh DomainTools reports and update earlier findings. In the most recent DomainTools report, we went back to the data for a new study of affix patterns. Our aim was to identify what had changed, what had stayed the same, and what inferences could be drawn from the data. In the interest of regularly evolving and refining our methodology, we also introduced a new method of finding affixes, which contributed some interesting new data.
What You Need to Know
What is an Affix?
noun
af-iks
1. An additional element placed at the beginning or end of a root, stem, or word, or in the body of a word, to modify its meaning.
Examples of affixes as discussed in this report:
www–apple[.]com (the prefix “www–” is attached to “apple”)
googlecom[.]net (the suffix “com” is attached to “google”)
wonderfulprizes[.]stream (the infix “prize” is embedded in the domain name)
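As a rough illustration, the three cases above can be told apart with a few lines of Python. This is a minimal sketch, not the tooling used for the report: the function name affix_position is hypothetical, it looks only at the leftmost label of the domain, and it assumes the prefix in the first example is written with a plain ASCII hyphen.

```python
from typing import Optional


def affix_position(domain: str, affix: str) -> Optional[str]:
    """Classify where `affix` sits in the leftmost label of `domain`.

    Hypothetical helper for illustration only. Returns "prefix",
    "suffix", "infix", or None when the affix is absent or is the
    whole label.
    """
    # Accept defanged domains such as "googlecom[.]net".
    label = domain.lower().replace("[.]", ".").split(".")[0]
    affix = affix.lower()
    if not affix or affix not in label or affix == label:
        return None
    if label.startswith(affix):
        return "prefix"
    if label.endswith(affix):
        return "suffix"
    return "infix"
```

For example, affix_position("googlecom[.]net", "com") returns "suffix", and affix_position("wonderfulprizes[.]stream", "prize") returns "infix" because "prize" is followed by the trailing "s".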
Methodology
We developed a new method of hunting for suspicious affixes for this edition of the report, but some aspects of our work remained the same as in the previous study. First, we amassed a list of affixes that appeared frequently in an initial corpus of domains used in phishing attacks and other nefarious activity. Then we queried our database to assess the rates of appearance of the affixes. Next, using well-known industry blocklist providers, we compared the rates of occurrence of these affixes in any domains that had been identified as spam, phishing, or malware on their blocklists.
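The comparison step can be pictured with a small sketch. It assumes the domains are available as plain Python lists and that an affix "appears" when it occurs anywhere in the leftmost label. The function names, the matching rule, and the toy data are illustrative assumptions, not the database queries or blocklist feeds behind the report.

```python
from typing import Dict, Iterable, List, Tuple


def appearance_rate(affix: str, domains: List[str]) -> float:
    """Fraction of domains whose leftmost label contains `affix`."""
    if not domains:
        return 0.0
    hits = sum(1 for d in domains if affix in d.lower().split(".")[0])
    return hits / len(domains)


def compare_rates(
    affixes: Iterable[str],
    all_domains: List[str],
    blocklisted: List[str],
) -> Dict[str, Tuple[float, float]]:
    """Map each affix to (rate in all domains, rate in blocklisted domains)."""
    return {
        a: (appearance_rate(a, all_domains), appearance_rate(a, blocklisted))
        for a in affixes
    }


# Toy data, invented for illustration only.
all_domains = ["secure-login.com", "example.com", "update-now.net", "flowers.org"]
blocklisted = ["secure-login.com", "update-now.net"]
rates = compare_rates(["update", "flower"], all_domains, blocklisted)
```

An affix whose rate is noticeably higher among blocklisted domains than among all domains is a candidate risk signal. With the toy data, "update" appears in a quarter of all domains but in half of the blocklisted ones.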
Our new technique of identifying interesting affixes worked like this:
- We split every existing domain name into sets of three contiguous letters, a process called tri-gramming. (Example: the word “affix” contains the tri-grams aff, ffi, and fix.) We then used the signal strength algorithm (described below) to identify tri-grams that are overrepresented in the three threat categories of spam, phishing, and malware.
- We combined overlapping high-signal tri-grams into larger word fragments. This provided hints for the most likely malicious patterns (since the tri-grams themselves were generally not words).
- We then generated a new list of affixes based on these patterns and re-ran our affix processing.
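The first two steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the set of high-signal tri-grams is taken as given, because the signal strength computation is not reproduced here, and merge_overlapping is a hypothetical helper that joins consecutive high-signal tri-grams within a single name. It is not the implementation used for the report.

```python
from typing import List, Set


def trigrams(label: str) -> List[str]:
    """Return every run of three contiguous characters in `label`."""
    return [label[i:i + 3] for i in range(len(label) - 2)]


def merge_overlapping(label: str, high_signal: Set[str]) -> List[str]:
    """Join consecutive high-signal tri-grams into longer fragments.

    `high_signal` is assumed to come from a separate scoring step.
    Adjacent tri-grams overlap by two characters, so extending a
    fragment only requires appending the last character of the next
    tri-gram.
    """
    fragments: List[str] = []
    current = ""
    for tri in trigrams(label):
        if tri in high_signal:
            current = current + tri[-1] if current else tri
        else:
            if current:
                fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Hypothetical high-signal set, chosen to show how "update" is recovered.
hot = {"upd", "pda", "dat", "ate"}
found = merge_overlapping("myupdatenow", hot)
```

Here trigrams("affix") yields aff, ffi, and fix, matching the example in the first step, and found contains the single fragment "update", the kind of word-like pattern that could then be added to the affix list for reprocessing.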
Armed with the new data set, we compared how these new affixes stacked up against our previous list. Some of the questions we wanted to answer were these:
- Do certain affixes still carry a strong signal of risk?
- Have the specific affixes favored by threat actors changed over the last 12-18 months?
- Do the malicious activity types (malware, phishing, spam) have different constellations of affixes?
The Results
Now that you have some context for our methodology and affixes themselves, here are the top ten phishing, malware, and spam affixes. Please note: In each of these lists, the affixes in bold are those found with our original methodology that did not make the Top 10 in the previous study. Affixes in italics are those found by our new affix hunting methodology.
The Big Picture
Changes in favored affixes represent one small insight into the many ways that threat actors are evolving. Cybercriminals are nothing if not pragmatic; it is likely that the new affixes unearthed in our Top 10 lists are connected to a demonstrable ROI for the criminals. As with the 2016 affix report, some of the key findings were surprising. For example, we expected to see the prefix “www” and the suffix “com,” and these seem likely to remain in our Top 10 lists for some time, but we didn’t expect to see “pineapple” or “plan-cul.” A slightly discouraging trend is the apparent payoff to criminals for including variations on the words “upgrade,” “update,” and “security”; it appears that victims are being successfully lured by those terms.
These signals may prove extremely valuable in combination with other features we have examined. Our Threat Profile algorithm uses various attributes of domains to develop predictive risk models for domains. These affixes are examples of the kind of attributes that can, at statistically relevant scale, classify dangerous or unsavory domains.
In the meantime, we hope that these analyses are helpful to security professionals, researchers, and anyone else interested in better understanding large-scale patterns in domain registration data with respect to nefarious activities.
For a complete picture including additional key takeaways and historical trends, download the full DomainTools Report.