Optimizing Reputation System Input Data

Abstract

Over the last two articles, you’ve learned what a reputation system is and the types of data with which to seed it. In this last article, I will discuss how to tune your reputation system: how to optimize the data you have in order to accomplish specific security goals.

Goals

One of the best reasons to create your own reputation system is that it gives you greater freedom to tune it to your organization’s specific needs. Commercially available IP and domain reputation systems are often one size fits all. You may find that those systems’ goals are not congruent with yours, or you may wish to fill a need that the products available to you do not meet. I will discuss some common goals and how you might use available data to fit your use case.

Use cases

The most common use cases for IP and domain reputation systems are:

  • Preventing email spam and phishing attacks from entering your network
  • Checking the reputation of domains within the bodies of email messages

Email spam

For email, your inputs are relatively self-explanatory. Check DNSBLs and record each listing for a set amount of time. I prefer to make that time relatively short for most inputs. For example, if an IP is listed in the CBL or the SpamCop BL, that indicates a transient threat like a malware infection; once it is resolved, there is really no security justification for downgrading the reputation of that IP for a long period of time. If the threat persists or recurs, the IP will likely be re-listed quickly. Repeated listings indicate an ongoing problem and justify longer degradation. Longer downgrades are also appropriate for IPs that directly transmitted spam to you in the past, such as ESP outbound servers.
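One way to picture "short for a first listing, longer for repeats" is a hold period that doubles with each re-listing. The sketch below is only an illustration: the two-day base, the 30-day ceiling, the doubling rule, and the function names are all assumptions, not values recommended by any DNSBL operator.

```python
from datetime import datetime, timedelta

BASE_HOLD = timedelta(days=2)   # assumed hold for a first listing
MAX_HOLD = timedelta(days=30)   # assumed ceiling for repeat offenders


def hold_period(times_listed: int) -> timedelta:
    """Double the hold for each repeat listing, capped at MAX_HOLD."""
    if times_listed < 1:
        return timedelta(0)
    return min(BASE_HOLD * (2 ** (times_listed - 1)), MAX_HOLD)


def is_downgraded(last_listed: datetime, times_listed: int, now: datetime) -> bool:
    """True while the most recent listing still counts against the IP."""
    return now - last_listed < hold_period(times_listed)
```

With these assumed values, an IP listed once is back to neutral after two days, while an IP on its third listing stays downgraded for eight.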

Consider downgrading IPs from TLDs and geolocations that have sent a significant amount of spam to your systems in the past. (Note that I am not saying that it is a good idea to drop all mail from certain regions or TLDs. I know of administrators who categorically do not accept mail from IPs allocated to AFRINIC, for example, because they do not want 419 spam. I don’t want 419 spam, either, but I believe rejecting all mail from a continent is too heavy-handed: it is an unwise policy that may result in an unacceptable rate of false positives.) Also remember that a “hit” impacts reputation, but it doesn’t define it. Reputation is made of many positive and negative factors considered together, not just one.
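To make the "a hit impacts reputation but doesn't define it" point concrete, here is a minimal sketch that sums signed factors and only then maps the total to an action. The factor names, point values, score range, and thresholds are hypothetical and would need tuning against your own mail.

```python
def reputation_score(factors: dict[str, float],
                     floor: float = -100.0,
                     ceiling: float = 100.0) -> float:
    """Sum positive and negative factors, clamped to an assumed score range."""
    total = sum(factors.values())
    return max(floor, min(ceiling, total))


def verdict(score: float,
            reject_below: float = -50.0,
            quarantine_below: float = -20.0) -> str:
    """Map a score to an action using assumed thresholds."""
    if score < reject_below:
        return "reject"
    if score < quarantine_below:
        return "quarantine"
    return "accept"


# A regional penalty alone does not block mail from a sender with a good history.
factors = {"region_spam_history": -15.0, "prior_good_mail": 10.0}
print(verdict(reputation_score(factors)))  # accept
```

Under these assumptions, the same sender would be quarantined only if a further negative factor, such as a current DNSBL listing, pushed the total below the threshold.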

Domain reputation

Many sites parse the bodies of email messages in order to look for domains that may be associated with malware or botnets. I would check SURBL and snowshoe DNSBLs first, since that is low-hanging fruit, and I would decay any listings I found very slowly. I would improve the score of any site in the Alexa top 500. (If you want to be more conservative and are inclined to expend the effort, you could check against a list of all URLs your users have ever visited previously and treat anything not on it as suspect. You will get false positives, which you can then add to your database of visited sites manually, but it’s more effective than you might think.) I would also quarantine any mail containing links to file-sharing sites when the mail does not come from the site itself.
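The file-sharing rule can be sketched as a small check on the hosts linked in a message body. This is a rough illustration under stated assumptions: the domains in `FILE_SHARING_DOMAINS` are placeholders, the URL regex is deliberately naive, and `sender_domain` is assumed to be a sender domain you have already authenticated by other means.

```python
import re
from urllib.parse import urlsplit

# Placeholder domains; substitute the file-sharing sites you care about.
FILE_SHARING_DOMAINS = {"files.example", "share.example"}

# Naive URL matcher for plain-text bodies (an assumption, not a full parser).
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def under(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def link_hosts(body: str) -> set[str]:
    """Collect the lowercased hostnames of all URLs found in the body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            hosts.add(host.lower().rstrip("."))
    return hosts


def should_quarantine(body: str, sender_domain: str) -> bool:
    """Quarantine if a file-sharing link appears in mail not sent by that site."""
    sender = sender_domain.lower()
    for host in link_hosts(body):
        for site in FILE_SHARING_DOMAINS:
            if under(host, site) and not under(sender, site):
                return True
    return False
```

A notification sent from `notify.files.example` that links to `files.example` passes, while the same link in mail from an unrelated domain is quarantined.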

Other things to be suspicious of include URLs on sites with dynamic DNS and any URL that resolves to a dynamic IP. In my experience, URLs on newly registered domains are rarely worth delivering, so consider checking domains against Farsight Security’s Newly Observed Domains (NOD) list and letting those listings decay slowly. After a week or so, we can talk, maybe, depending on other criteria. There are lists of domains used by botnet command and control servers; weight those very heavily and let them decay slowly. Some domain reputation services temporarily degrade the reputation of any URL currently seen on Pastebin and similar sites, which you may find useful as well.
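Giving each input its own weight and its own decay rate might look like the sketch below, which uses exponential decay with a per-source half-life. The source labels, weights, and half-lives are assumptions chosen only to mirror the relative ordering described above (command and control lists heaviest and slowest, paste sites lightest and fastest); they are not published values from any list operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    weight: float          # penalty on the day of listing
    half_life_days: float  # days until the penalty halves


# Hypothetical tuning table.
SOURCES = {
    "newly_observed": Source(weight=30.0, half_life_days=7.0),
    "botnet_c2": Source(weight=60.0, half_life_days=90.0),
    "paste_site": Source(weight=10.0, half_life_days=2.0),
}


def penalty(source: str, age_days: float) -> float:
    """Current penalty for one listing that is age_days old."""
    s = SOURCES[source]
    return s.weight * 0.5 ** (age_days / s.half_life_days)


def total_penalty(listings: dict[str, float]) -> float:
    """Sum penalties for a domain, given {source name: listing age in days}."""
    return sum(penalty(name, age) for name, age in listings.items())
```

With this table, a week-old newly observed domain carries half its original penalty, while a command and control listing of the same age has barely faded.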

Conclusion

With this final article, you should have the information you need to create a basic reputation system, identify data that is useful, feed your system with data from your own and other publicly available sources, and weight and degrade that data appropriately.

Kelly Molloy is a Senior Program Manager for Farsight Security, Inc.