Farsight TXT Record

Farsight's Real-time DNSDB, Part One

Published on: Oct 28, 2015

Background

I remember back in 2008 when Paul Vixie introduced PassiveDNS replication to me, a real-time stream of names and answers scrolling by in a terminal window. Every now and then I could pause the terminal and see what obviously looked like a “pharma” domain or some kind of phishing. It was exciting and magical, but it wasn’t quite useful yet. Until an indexed database was available, it wasn’t easy to take the IP addresses a criminal used for one campaign and use them to identify other campaigns. We also wanted to ask questions like “What other hostnames are being used inside this domain?” Until we were able to build an index based on the live data, all we could do was “grep”.

We’ve had friends try to use standard SQL databases, and they’ve had difficulty keeping up with inserting new information from the flood of incoming data while still being able to perform queries. That inspired us to look to time-delimited NoSQL solutions. We’ve gone through several iterations of NoSQL database design on the back end, ranging from:

  • a simple DB4 file (disk was too slow), to
  • hybrid CDB sorted indices (not scalable long-term), to
  • Cassandra clusters (reliability/speed issues), to
  • TokyoCabinet on PCIe SSD (generally good performance), to
  • developing a generic MTBL and specific DNStable implementation of sorted string tables.

Each iteration took advantage of technology available at the time to make lookups as efficient as possible. The last two were developed to take advantage of SSD to write data once and let clients read as many times as needed without any spinning media bottlenecks. We generated hourly databases from live data that were merge-sorted into daily databases and then monthly databases and yearly databases. A “fileset” gives an access client the list of databases to open in parallel to query for their answer. The process scales well on RAID arrays of generic 2.5″ SSD drives, and we can replicate linearly if needed.
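To illustrate the merge-sort step, here is a minimal Python sketch of combining several key-sorted tables into one. It is not the MTBL implementation; the table contents, the `merge_tables` name, and the choice to add counts together for duplicate keys are assumptions made for the example.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_tables(tables, combine):
    """Merge several key-sorted (key, value) sequences into one sorted stream.

    Entries that share a key are folded together with combine().
    """
    # heapq.merge walks all inputs in parallel without loading them fully.
    merged = heapq.merge(*tables, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [value for _, value in group]
        result = values[0]
        for value in values[1:]:
            result = combine(result, value)
        yield key, result

# Hypothetical hourly tables: key = name, value = observation count.
hour_00 = [("a.example.", 2), ("c.example.", 1)]
hour_01 = [("a.example.", 3), ("b.example.", 4)]

daily = list(merge_tables([hour_00, hour_01], combine=lambda x, y: x + y))
print(daily)
```

Because every input is already sorted, the merge is a single sequential pass, which suits write-once files that are read many times.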

If researchers wanted or needed to perform lookups for data in the last few minutes, we usually directed them toward the raw Passive DNS real-time data feed that’s available on the Security Information Exchange (SIE). They could develop their own methods to utilize the real-time data, but they found it complex and time-consuming to both use a database and create their own processing scripts for the real-time data.

As announced last week, we’ve improved our DNSDB Export service to real time. Now, users don’t have to wait for the next hourly update to get more information. We now make updates available for DNSDB Export every minute. We developed a TLS-based download manager to speed up transfers and manage consistency on the client side based on what local files are available.

How do I use it?

First, understand that with PassiveDNS replication, we gather matched questions and answers between recursive nameservers from all over the Internet. We have a waterfall computing model where raw uploads from sensors are deduplicated by their query and answer, deduplicated again based on where in the DNS hierarchy the answer arrived from, and then filtered to remove superfluous data. The process is documented in Passive DNS Architecture. The live data includes information that looks like the following:

type: EXPIRATION
count: 1
time_first: 2015-10-22 03:20:19
time_last: 2015-10-22 03:20:19
bailiwick: tumblr.com.
rrname: ziegast.tumblr.com.
rrclass: IN (1)
rrtype: A (1)
rrttl: 30
rdata: 66.6.41.21
rdata: 66.6.42.21
rdata: 66.6.43.21

type: INSERTION
count: 10
time_first: 2015-10-22 07:16:26
time_last: 2015-10-22 18:56:16
response_ip: 194.85.252.62
bailiwick: ru.
rrname: 1f.ru.
rrclass: IN (1)
rrtype: NS (2)
rrttl: 345600
rdata: ns3.nic.ru.
rdata: ns4.nic.ru.
rdata: ns8.nic.ru.
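If you capture records in the text form shown above, a small parser can turn them into dictionaries for further processing. This is a sketch that assumes blank-line-separated blocks of "key: value" lines exactly as printed here; the `parse_records` name is hypothetical and not part of any Farsight tool.

```python
def parse_records(text):
    """Parse blank-line-separated 'key: value' blocks into dictionaries.

    rdata may repeat within a block, so it is always collected into a list.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current record, if any.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(": ")
        if key == "rdata":
            current.setdefault("rdata", []).append(value)
        else:
            current[key] = value
    if current:
        records.append(current)
    return records

sample = """\
type: EXPIRATION
count: 1
time_first: 2015-10-22 03:20:19
time_last: 2015-10-22 03:20:19
bailiwick: tumblr.com.
rrname: ziegast.tumblr.com.
rrclass: IN (1)
rrtype: A (1)
rrttl: 30
rdata: 66.6.41.21
rdata: 66.6.42.21
rdata: 66.6.43.21

type: INSERTION
count: 10
bailiwick: ru.
rrname: 1f.ru.
rrtype: NS (2)
rdata: ns3.nic.ru.
rdata: ns4.nic.ru.
rdata: ns8.nic.ru.
"""

records = parse_records(sample)
for rec in records:
    print(rec["type"], rec["rrname"], rec["rdata"])
```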

The entries are inserted as tuples into the DNStable database. The indices are built to enable queries based mostly on rrname and rdata. We can make simple direct queries like:

  • “What is the history of NS records for 1f.ru?”
  • “What other names are hosted at 66.6.41.21?”
  • “What other domains are hosted by ns3.nic.ru?”

Rdata types like IP addresses have their indices optimized for CIDR lookups, and rrname or rdata names have their indices optimized for wildcard searches. As such, we can quickly provide answers to:

  • “What other names are in the *.tumblr.com domain?”
  • “What other names point their addresses into 66.6.32.0/20?”
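The two kinds of question above can be sketched in plain Python. This shows only the matching logic, not how DNStable lays out its indices. Comparing names by reversed labels, which turns a left-hand wildcard into a prefix match on sorted keys, is an assumption chosen for illustration, and the function names are hypothetical.

```python
import ipaddress

def in_cidr(address, cidr):
    """True if the IP address falls inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

def reversed_labels(name):
    """'ziegast.tumblr.com.' -> ('com', 'tumblr', 'ziegast')"""
    return tuple(reversed(name.rstrip(".").lower().split(".")))

def matches_wildcard(name, pattern):
    """Left-hand wildcard match for patterns such as '*.tumblr.com'."""
    if not pattern.startswith("*."):
        raise ValueError("pattern must start with '*.'")
    suffix = reversed_labels(pattern[2:])
    labels = reversed_labels(name)
    # The name must have at least one extra label under the suffix.
    return len(labels) > len(suffix) and labels[:len(suffix)] == suffix

print(in_cidr("66.6.41.21", "66.6.32.0/20"))
print(matches_wildcard("ziegast.tumblr.com.", "*.tumblr.com"))
```

With names stored label-reversed in sorted order, every name under tumblr.com sits in one contiguous run of keys, so a wildcard search becomes a range scan rather than a full pass.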

The databases are created each and every minute. For example, all of the new data from Oct 22, 2015 at 18:51 UTC get stored in a file named dns.20151022.1851.m.mtbl. We merge-sort databases into combined databases at 10-minute, 1-hour, 1-day, 1-month and 1-year intervals. The collection of files form a set that the dnstable library can open and access in parallel to gather answers.
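As a small illustration of the naming scheme, the sketch below derives the per-minute file name from a UTC timestamp. It covers only the minute-level pattern shown in the example. The names used for the larger merged intervals are not given here, so they are left out, and `minute_filename` is a hypothetical helper.

```python
from datetime import datetime, timezone

def minute_filename(ts):
    """Return the per-minute database file name for an aware datetime."""
    # Normalize to UTC so local time zones never leak into the name.
    return ts.astimezone(timezone.utc).strftime("dns.%Y%m%d.%H%M.m.mtbl")

name = minute_filename(datetime(2015, 10, 22, 18, 51, tzinfo=timezone.utc))
print(name)  # dns.20151022.1851.m.mtbl
```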

A command line lookup tool, dnstable_lookup, can use the DNSTABLE_FNAME environment variable to look up answers in one database file, or the DNSTABLE_SETFILE environment variable to look up answers in a list of files named in the file it specifies.

Another command line tool, dnstable_dump, can take the binary format stored in the databases and convert it to rows of JSON.

We’ll provide examples of both commands below.

Brand name / counterfeit example

Back in April I wrote a blog post about how to look up counterfeit names using SIE access and enhancing it with DNSDB lookups. This time, we’ll just use our DNSDB Export files.

Consider the Burberry line of clothing and accessories. As a popular luxury brand, it is often targeted by counterfeiters. Counterfeiters often make use of freshly created domain names, since their wares tend to be taken down from established online sales platforms (Amazon, eBay, etc.), and they are unable to establish long-lived domain names because rights holders can easily take down domain names with tools like the U.S.’s DMCA and ICANN’s UDRP. The examples below show freshly created domain names that would appear at first glance to fit into this pattern.

Let’s look at the latest minute…

$ dnstable_dump -r dns.20151022.1941.m.mtbl | grep burberry | grep -v ';'
burberrybags808.tumblr.com. IN A 66.6.41.21
burberrybags808.tumblr.com. IN A 66.6.43.21
burberryoutletstores.xyz. IN NS f1g1ns1.dnspod.net.
burberryoutletstores.xyz. IN NS f1g1ns2.dnspod.net.
burberryoutletstores.xyz. IN NS f1g1ns1.dnspod.net.
burberryoutletstores.xyz. IN NS f1g1ns2.dnspod.net.
burberryoutletstores.xyz. IN SOA f1g1ns1.dnspod.net. freednsadmin.dnspod.com. 1444295154 3600 180 1209600 180
www.burberryoutletstores.xyz. IN CNAME burberryoutletstores.xyz.

Looking back over the last year, we can find other merchandise hosted there. Here’s another lookup, using a larger set of DNSDB history:

$ ls dns.2015* > dns.fileset
$ export DNSTABLE_SETFILE=dns.fileset
$ dnstable_lookup rrset burberryoutletstores.xyz A
;; bailiwick: burberryoutletstores.xyz.
;; count: 5
;; first seen: 2015-10-04 03:40:30 -0000
;; last seen: 2015-10-07 17:14:44 -0000
burberryoutletstores.xyz. IN A 142.54.172.171

;; bailiwick: burberryoutletstores.xyz.
;; count: 18
;; first seen: 2015-10-10 21:15:43 -0000
;; last seen: 2015-10-21 19:46:38 -0000
burberryoutletstores.xyz. IN A 151.237.189.86
;;; Dumped 2 entries.

Looking up those addresses finds other trademarked names being hosted on the same servers:

$ dnstable_lookup rdata ip 142.54.172.171
louisvuittonoutletonline.pw. IN A 142.54.172.171
raybaneyeglasses.us.com. IN A 142.54.172.171
3gp-ds.ytconv.net. IN A 142.54.172.171
coachoutletonline.top. IN A 142.54.172.171
michaelkorshandbags.xyz. IN A 142.54.172.171
burberryoutletstores.xyz. IN A 142.54.172.171
;;; Dumped 6 entries.

$ dnstable_lookup rdata ip 151.237.189.86
raybaneyeglasses.us.com. IN A 151.237.189.86
abercrombieandfitchoutletsonline.com. IN A 151.237.189.86
furlaoutletsonline.in.net. IN A 151.237.189.86
burberryoutlet.top. IN A 151.237.189.86
discountnfljerseys.top. IN A 151.237.189.86
burberryoutletonline.top. IN A 151.237.189.86
burberryoutletstores.top. IN A 151.237.189.86
guccioutletonline.xyz. IN A 151.237.189.86
burberryoutletonline.xyz. IN A 151.237.189.86
burberryoutletstores.xyz. IN A 151.237.189.86
;;; Dumped 10 entries.
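Output like this is easy to post-process. The sketch below assumes you have saved the text printed by the two lookups above (only a few lines are reproduced here) and finds the names that appear on both addresses. The `parse_presentation` helper is hypothetical, and skipping lines that begin with ';' mirrors the `grep -v ';'` used earlier.

```python
def parse_presentation(text):
    """Parse 'name class type rdata' lines, skipping blanks and ';' comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        rrname, rrclass, rrtype, rdata = line.split(None, 3)
        records.append((rrname, rrclass, rrtype, rdata))
    return records

first_ip = """\
louisvuittonoutletonline.pw. IN A 142.54.172.171
raybaneyeglasses.us.com. IN A 142.54.172.171
burberryoutletstores.xyz. IN A 142.54.172.171
;;; Dumped 6 entries.
"""

second_ip = """\
raybaneyeglasses.us.com. IN A 151.237.189.86
guccioutletonline.xyz. IN A 151.237.189.86
burberryoutletstores.xyz. IN A 151.237.189.86
;;; Dumped 10 entries.
"""

names_a = {rec[0] for rec in parse_presentation(first_ip)}
names_b = {rec[0] for rec in parse_presentation(second_ip)}

# Names seen on both addresses suggest the same operator moved hosts.
shared = sorted(names_a & names_b)
print(shared)
```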

We don’t have to use command line tools to look at the data. There are C and Python bindings for easily doing lookups against DNStable files.

Consider the following script, lookup_ip.py:

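#!/usr/bin/python
# Print every DNStable entry whose rdata is the IP address given as argv[1].

import sys
import dnstable

# Open the database files listed in dns.fileset.
d = dnstable.reader('dns.fileset')
# Build an rdata query for the IP address supplied on the command line.
q = dnstable.query(dnstable.RDATA_IP, sys.argv[1])

for res in d.query(q):
    print(res.to_json())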

Running it, we get JSON tuples for all of the names that reference that IP address:

$ ./lookup_ip.py 151.237.189.86
{"rrtype": "A", "time_last": 1445413846, "time_first": 1444238290,
"count": 15, "rrname": "raybaneyeglasses.us.com.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445478986, "time_first": 1443590863,
"count": 56, "rrname": "abercrombieandfitchoutletsonline.com.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445061327, "time_first": 1444404624,
"count": 13, "rrname": "furlaoutletsonline.in.net.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1444735169, "time_first": 1444302797,
"count": 42, "rrname": "burberryoutlet.top.", "rdata": "151.237.189.86"}
{"rrtype": "A", "time_last": 1444948864, "time_first": 1444948864,
"count": 2, "rrname": "discountnfljerseys.top.", "rdata": "151.237.189.86"}
{"rrtype": "A", "time_last": 1445428527, "time_first": 1444273015,
"count": 7, "rrname": "burberryoutletonline.top.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1444132851, "time_first": 1444097643,
"count": 6, "rrname": "burberryoutletstores.top.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445492367, "time_first": 1444473075,
"count": 9, "rrname": "guccioutletonline.xyz.", "rdata": "151.237.189.86"}
{"rrtype": "A", "time_last": 1445424613, "time_first": 1444273016,
"count": 22, "rrname": "burberryoutletonline.xyz.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445456798, "time_first": 1444511743,
"count": 18, "rrname": "burberryoutletstores.xyz.", "rdata":
"151.237.189.86"}

rrname is the name that was queried, and rrtype was the DNS type (“A”, “NS”, “MX”, etc.) found in the answer.

The tuple of time_first, time_last and count shows how many times the name was seen within a given period. The time values are Unix epoch seconds (the number of seconds since midnight Jan 1 1970 UTC). A count of “0” means it was seen once in an INSERTION record. Actual counts are made on EXPIRATION records.
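To make the epoch values readable, they can be converted to UTC datetimes. The sketch below uses the burberryoutletstores.xyz row from the JSON output above; the `to_utc` helper name is an assumption for the example.

```python
import json
from datetime import datetime, timezone

line = ('{"rrtype": "A", "time_last": 1445456798, "time_first": 1444511743, '
        '"count": 18, "rrname": "burberryoutletstores.xyz.", '
        '"rdata": "151.237.189.86"}')
row = json.loads(line)

def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

first_seen = to_utc(row["time_first"]).strftime("%Y-%m-%d %H:%M:%S")
last_seen = to_utc(row["time_last"]).strftime("%Y-%m-%d %H:%M:%S")

print(row["rrname"], row["count"], first_seen, last_seen)
```

The converted values, 2015-10-10 21:15:43 and 2015-10-21 19:46:38, match the "first seen" and "last seen" lines that dnstable_lookup printed for the same record.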

The bailiwick is the place in the DNS hierarchy from which we received an answer. Sometimes a registry nameserver and the domain’s authoritative nameserver can be out of sync. If they are out of sync, they will list different bailiwick and rdata for the same rrname and rrtype.

The rdata is an array of answers returned for the given rrname/rrtype/bailiwick during the timeframe. In DNS, order of answers doesn’t matter, so it may make sense to make sure the answers are sorted before importing to a database.
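One way to apply that advice is to sort rdata when building the key used for import, so that two rows differing only in answer order collapse to the same key. This is a sketch under that assumption. The rows are hypothetical, `rrset_key` is not part of dnstable, and plain string sorting is used because any consistent order is enough.

```python
def rrset_key(row):
    """Build an order-independent key for an RRset row."""
    return (
        row["rrname"],
        row["rrtype"],
        row["bailiwick"],
        tuple(sorted(row["rdata"])),
    )

# Two hypothetical rows with the same answers in a different order.
row_a = {"rrname": "ziegast.tumblr.com.", "rrtype": "A",
         "bailiwick": "tumblr.com.",
         "rdata": ["66.6.41.21", "66.6.43.21", "66.6.42.21"]}
row_b = {"rrname": "ziegast.tumblr.com.", "rrtype": "A",
         "bailiwick": "tumblr.com.",
         "rdata": ["66.6.43.21", "66.6.42.21", "66.6.41.21"]}

same = rrset_key(row_a) == rrset_key(row_b)
print(same)
```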

Conclusion

In the next article, I will provide more use-case examples.

Eric Ziegast is a Senior Distributed Systems Engineer for Farsight Security, Inc.

Read the next part in this series: Farsight’s Real-time DNSDB, Part Two