Farsight Long View

Building a Reputation System From Available Data

Written by: Kelly Molloy
Published on: Jul 15, 2015

Abstract

Last week, you learned what reputation systems are and what kind of data they consume. This week, I will expand on that data. We will look at the data that most mail server operators already have and can use to create their own reputation systems.

Impetus

To gain access to a reputation system, you could purchase or license one from one of many for-profit companies, but what if you or your employer don’t want to foot the sometimes hefty bill for such a system? Perhaps there’s a cheaper and more interesting way: as a mail server operator for a medium-sized network, you likely have enough data available to develop your own.

In this article, I’ll walk you through the data you could easily collect (or may already be collecting) to feed a homegrown system. The actual code and hardware needed to build the reputation system are left as an exercise for the reader.

Before we begin, note that Bradley Taylor wrote a wonderful primer on how to build your own email reputation system. While almost 10 years old, the paper is still accurate and relevant.

SMTP data

The first place to look is the mail server logs. You’ll want to parse, collect, and/or check for the following:

  • HELOs. Some botnets use the same HELO for every message they spam. Also, do HELOs and reverse DNS (rDNS) match? Matching HELO and rDNS is a good indication of clued-in mail server operation. This would benefit the sender’s score.
  • Matching forward DNS and rDNS for the sender (forward-confirmed rDNS). This would benefit the sender’s score.
  • Is there a Sender Policy Framework (SPF) record for the domain, and is it valid? A valid SPF record would benefit the sender’s score; a missing or invalid one would hurt it.
  • Is the sender IP or domain listed on a blocklist? Look especially for IPs listed in the Spamhaus PBL (Policy Block List), which covers end-user IP ranges that should not be delivering unauthenticated mail directly. A listing would significantly hurt the sender’s score.
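The checks above lend themselves to a simple additive score. The Python sketch below is one hypothetical way to combine them. The `SmtpObservation` fields, the weights, and the shared-HELO threshold are illustrative assumptions, not values from any production system, and the sketch assumes you have already extracted the HELO name, PTR result, forward-confirmation result, SPF outcome, and blocklist status from your logs. It also counts how many distinct client IPs announced each HELO name, since one HELO shared across many unrelated IPs fits the botnet pattern described above. The `dnsbl_query_name` helper shows how a DNSBL lookup name is formed for an IPv4 address (octets reversed, then the list’s zone appended, as in RFC 5782); resolving that name to an answer in 127.0.0.0/8 indicates a listing.

```python
# A minimal sketch of scoring the SMTP-level signals described above.
# All weights and thresholds are hypothetical; tune them against your own mail.
import ipaddress
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class SmtpObservation:
    ip: str               # connecting client IPv4 address
    helo: str             # name announced in HELO/EHLO
    rdns: Optional[str]   # PTR result for ip, or None if there is none
    fcrdns_ok: bool       # True if the PTR name resolves back to ip
    spf: str              # "valid", "invalid", or "none"
    blocklisted: bool     # True if ip or domain is on a DNSBL (e.g. the PBL)


# Hypothetical weights: positive values help the sender, negative values hurt.
WEIGHTS = {
    "helo_matches_rdns": 1.0,
    "fcrdns": 1.0,
    "spf_valid": 1.0,
    "spf_missing_or_invalid": -1.0,
    "blocklisted": -5.0,
    "helo_shared": -2.0,
}


def normalize(name: str) -> str:
    """Lower-case a DNS name and drop any trailing dot."""
    return name.lower().rstrip(".")


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone (RFC 5782 style)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + normalize(zone)


def distinct_ips_per_helo(observations: Iterable[SmtpObservation]) -> Dict[str, int]:
    """Count how many different client IPs announced each HELO name."""
    seen = defaultdict(set)
    for obs in observations:
        seen[normalize(obs.helo)].add(obs.ip)
    return {helo: len(ips) for helo, ips in seen.items()}


def smtp_score(obs: SmtpObservation, helo_ip_counts: Dict[str, int],
               shared_threshold: int = 50) -> float:
    """Add up the weights for each signal this observation exhibits."""
    score = 0.0
    if obs.rdns is not None and normalize(obs.helo) == normalize(obs.rdns):
        score += WEIGHTS["helo_matches_rdns"]
    if obs.fcrdns_ok:
        score += WEIGHTS["fcrdns"]
    if obs.spf == "valid":
        score += WEIGHTS["spf_valid"]
    else:
        score += WEIGHTS["spf_missing_or_invalid"]
    if obs.blocklisted:
        score += WEIGHTS["blocklisted"]
    # A HELO name shared by many unrelated IPs fits the botnet pattern above.
    if helo_ip_counts.get(normalize(obs.helo), 0) >= shared_threshold:
        score += WEIGHTS["helo_shared"]
    return score


if __name__ == "__main__":
    good = SmtpObservation("192.0.2.10", "mail.example.com", "mail.example.com",
                           True, "valid", False)
    bad = SmtpObservation("198.51.100.7", "localhost", None, False, "none", True)
    counts = distinct_ips_per_helo([good, bad])
    print(smtp_score(good, counts))   # 3.0
    print(smtp_score(bad, counts))    # -6.0
    print(dnsbl_query_name("192.0.2.10", "zen.spamhaus.org"))
```

An additive score keeps each signal’s contribution visible, which makes it easier to see why a sender ended up where it did while you test and adjust the weights.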

Other Data Sources

With a little scripting, there is other information you can glean from logs:

  • How many domain names have resolved to the IP within a particular window of time? A large number of domains resolving to the same IP might be cause for concern and could hurt the sender’s score.
  • Has the IP sent spam to you in the past? Clearly, this would significantly hurt the sender’s score, probably with a coefficient that decays based on the date it last sent you spam.
  • Has the domain been seen in spam in the past? This would probably have a similar effect and be governed by a similar back-off as the previous item.
  • What ASN does the IP belong to? Do IPs from that ASN send spam frequently? This would probably have a similar effect and be governed by a similar back-off as the previous two items.
  • Does the TTL of the domain name fluctuate? How frequently? By how much? As this could be indicative of Fast Flux DNS, it could hurt the sender’s score.
  • Data from an intrusion detection system such as Snort, Bro, or Suricata. This can be very useful in some circumstances. Detecting IPs that try to send malware into your network is obviously A Good Thing (TM). Detecting IPs that were the targets of that malware is also good, but may give unintended results when used in a reputation system: if an attack on an internal IP is intercepted or prevented, then “punishing” that internal IP with a degraded score may be unwarranted. Test early and test often. You might find that data from an intrusion prevention system is useful as well, because that data is historical rather than contemporaneous, but again, test your rules thoroughly before using them in production.
  • Reliable DNSBLs are always useful. Consider the CBL from Spamhaus and the SpamCop Blocklist, as well as SURBL for domains.
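Several of the items above (past spam from an IP, domain, or ASN) call for a penalty that fades over time, and others (domains per IP, TTL fluctuation) reduce to counting over a window. The Python sketch below shows one way to express these. The exponential half-life decay, the one-day window, the `(timestamp, domain, ip)` record shape, and all parameter defaults are assumptions chosen for illustration, not part of any specific design; exponential decay is only one of many reasonable back-off curves.

```python
# A minimal sketch of time-decayed penalties and windowed counts for the
# log-derived signals above. Record shapes and defaults are hypothetical.
import statistics
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional, Sequence, Tuple


def decayed_penalty(last_seen: Optional[datetime], now: datetime,
                    weight: float = -5.0, half_life_days: float = 30.0) -> float:
    """Penalty for the most recent spam from an IP, domain, or ASN.

    The full weight applies on the day of the spam and halves every
    half_life_days after that; no history means no penalty.
    """
    if last_seen is None:
        return 0.0
    age_days = max((now - last_seen).total_seconds() / 86400.0, 0.0)
    return weight * 0.5 ** (age_days / half_life_days)


def domains_per_ip(resolutions: Iterable[Tuple[datetime, str, str]],
                   now: datetime,
                   window: timedelta = timedelta(days=1)) -> Dict[str, int]:
    """Count distinct domain names seen resolving to each IP within the window.

    Each record is (timestamp, domain, ip), e.g. pulled from resolver logs.
    """
    seen = defaultdict(set)
    for ts, domain, ip in resolutions:
        if timedelta(0) <= now - ts <= window:
            seen[ip].add(domain.lower().rstrip("."))
    return {ip: len(names) for ip, names in seen.items()}


def ttl_variability(ttls: Sequence[int]) -> Tuple[int, float]:
    """Return (number of distinct TTLs, population std dev) for one name.

    Many distinct, widely spread TTLs may hint at fast-flux hosting.
    """
    if not ttls:
        return 0, 0.0
    return len(set(ttls)), statistics.pstdev(ttls)


if __name__ == "__main__":
    now = datetime(2015, 7, 15, tzinfo=timezone.utc)
    print(decayed_penalty(now - timedelta(days=30), now))   # -2.5
    records = [
        (now - timedelta(hours=1), "a.example", "192.0.2.1"),
        (now - timedelta(hours=2), "b.example", "192.0.2.1"),
        (now - timedelta(days=3), "c.example", "192.0.2.1"),  # outside window
    ]
    print(domains_per_ip(records, now))                     # {'192.0.2.1': 2}
```

The same `decayed_penalty` helper can serve the domain and ASN items by passing the last date each was seen in spam, with its own weight and half-life.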

Farsight Security Datafeeds

Additionally, Farsight provides unique sources of information that can be used as inputs to a reputation system:

Conclusion

With this data, you can be off to a good start on designing an in-house reputation system for your own network. In the next and final installment of this series, we’ll look at the axes on which we might evaluate this data. More specifically, we’ll look at what exactly goes into a reputation score and how that might change depending on your individual goals.

Kelly Molloy is a Senior Program Manager for Farsight Security, Inc.

Read the next part in this series: Optimizing Reputation System Input Data