
I remember back in 2008 when Paul Vixie introduced PassiveDNS replication to me: a real-time stream of names and answers scrolling by in a terminal window. Every now and then I could pause the terminal and see what obviously looked like a “pharma” domain or some kind of phishing. It was exciting and magical, but it wasn’t quite useful yet. Until an indexed database was available, it wasn’t easy to take the IP addresses a criminal used for one campaign and use them to identify other campaigns. We also wanted to ask questions like “What other hostnames are being used inside this domain?” Until we were able to build an index based on the live data, all we could do was “grep”.
We’ve had friends try to use standard SQL databases, and they’ve had difficulty keeping up with inserting the flood of incoming data while still being able to perform queries. That inspired us to look to time-delimited NoSQL solutions. We’ve gone through several iterations of NoSQL database design on the back-end ranging from:
Each iteration took advantage of the technology available at the time to make lookups as efficient as possible. The last two were developed to take advantage of SSD: write data once and let clients read it as many times as needed without any spinning-media bottlenecks. We generated hourly databases from live data that were merge-sorted into daily databases, and then into monthly and yearly databases. A “fileset” gives an access client the list of databases to open in parallel to query for their answer. The process scales well on RAID arrays of generic 2.5″ SSD drives, and we can replicate linearly if needed.
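To make the hourly-to-daily roll-up concrete, here is a minimal Python sketch of merge-sorting several already-sorted databases into one. It is not Farsight’s implementation: the record layout `(key, count, time_first, time_last)` and the rule for combining duplicate keys (add the counts, keep the earliest first-seen and the latest last-seen time) are assumptions chosen to match the fields shown later in this post.

```python
import heapq
from itertools import groupby

# Hypothetical record layout: (key, count, time_first, time_last).
# The key is any sortable tuple, e.g. (rrname, rrtype, rdata).
# Each input is assumed to be sorted by key already, like an hourly database.


def merge_databases(*databases):
    """K-way merge of sorted inputs, folding records that share a key."""
    merged = heapq.merge(*databases, key=lambda rec: rec[0])
    for key, group in groupby(merged, key=lambda rec: rec[0]):
        group = list(group)
        yield (
            key,
            sum(rec[1] for rec in group),  # total times observed
            min(rec[2] for rec in group),  # earliest time_first
            max(rec[3] for rec in group),  # latest time_last
        )


hour_18 = [(("a.example.", "A", "192.0.2.1"), 2, 64800, 66000),
           (("b.example.", "A", "192.0.2.2"), 1, 65000, 65000)]
hour_19 = [(("a.example.", "A", "192.0.2.1"), 3, 68400, 70000)]
daily = list(merge_databases(hour_18, hour_19))
print(daily)
```

Because every input is already sorted, `heapq.merge` holds only one pending record per input in memory, which is what lets this kind of roll-up scale to many input files.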
If researchers wanted or needed to look up data from the last few minutes, we usually directed them toward the raw Passive DNS real-time data feed that’s available on the Security Information Exchange (SIE). They could develop their own methods to use the real-time data, but they found it complex and time-consuming to both use a database and write their own processing scripts for the real-time data.
As announced last week, we’ve improved our DNSDB Export service to real time. Users no longer have to wait for the next hourly update to get more information: we now make updates available for DNSDB Export every minute. We also developed a TLS-based download manager to speed up transfers and manage consistency on the client side based on which local files are already available.
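The post doesn’t describe the download manager’s protocol, so the following is only a hypothetical sketch of the client-side consistency check it mentions: compare the files the server offers against the files already on disk and fetch only the difference. `files_to_fetch` and the idea of a plain list of remote file names are assumptions for illustration.

```python
def files_to_fetch(remote_names, local_names):
    """Return .mtbl files the server lists that are missing locally.

    Hypothetical helper: the real download manager's protocol is not
    described in this post, so both name lists are simply assumed.
    """
    have = set(local_names)
    wanted = (n for n in remote_names if n.endswith(".mtbl") and n not in have)
    # Sorting by name keeps same-granularity files (one per minute) in time order.
    return sorted(wanted)


remote = ["dns.20151022.1851.m.mtbl", "dns.20151022.1850.m.mtbl", "index.html"]
local = ["dns.20151022.1850.m.mtbl"]
print(files_to_fetch(remote, local))
```

On a real client, `local_names` would come from something like `os.listdir()` on the download directory.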
First, understand that with PassiveDNS replication, we gather matched questions and answers from recursive nameservers all over the Internet. We have a waterfall computing model in which raw uploads from sensors are deduplicated by their query and answer, deduplicated again based on where in the DNS hierarchy the answer arrived from, and then filtered to remove superfluous data. The process is documented in Passive DNS Architecture. The live data includes information that looks like the following:
type: EXPIRATION
count: 1
time_first: 2015-10-22 03:20:19
time_last: 2015-10-22 03:20:19
bailiwick: tumblr.com.
rrname: ziegast.tumblr.com.
rrclass: IN (1)
rrtype: A (1)
rrttl: 30
rdata: 66.6.41.21
rdata: 66.6.42.21
rdata: 66.6.43.21
type: INSERTION
count: 10
time_first: 2015-10-22 07:16:26
time_last: 2015-10-22 18:56:16
response_ip: 194.85.252.62
bailiwick: ru.
rrname: 1f.ru.
rrclass: IN (1)
rrtype: NS (2)
rrttl: 345600
rdata: ns3.nic.ru.
rdata: ns4.nic.ru.
rdata: ns8.nic.ru.
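Records in this layout are easy to turn into dictionaries. The sketch below assumes the text form shown above, with a `type:` line opening each record and one `rdata:` line per answer; the actual encoding of the SIE feed may differ, and `parse_records` is a hypothetical helper.

```python
def parse_records(lines):
    """Parse 'key: value' lines into dicts; each 'type:' line starts a new record."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(": ")
        if key == "type":
            current = {"type": value, "rdata": []}
            records.append(current)
        elif current is None:
            raise ValueError("expected a 'type:' line first, got %r" % line)
        elif key == "rdata":
            current["rdata"].append(value)  # rdata repeats once per answer
        elif key in ("count", "rrttl"):
            current[key] = int(value)
        else:
            current[key] = value  # times, names, "IN (1)" etc. kept as text


    return records


sample = """\
type: EXPIRATION
count: 1
rrname: ziegast.tumblr.com.
rrtype: A (1)
rrttl: 30
rdata: 66.6.41.21
rdata: 66.6.42.21
type: INSERTION
count: 10
bailiwick: ru.
rrname: 1f.ru.
rdata: ns3.nic.ru.
"""
records = parse_records(sample.splitlines())
print(records)
```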
The entries are inserted as tuples into the DNStable database. The indices are built to enable queries based mostly on rrname and rdata. We can make simple direct queries like:
1f.ru?”
66.6.41.21?”
ns3.nic.ru?”
Rdata types like IP addresses have their indices optimized for CIDR lookups, and rrname or rdata names have their indices optimized for wildcard searches. As such, we can quickly provide answers to:
*.tumblr.com domain?”
66.6.32.0/20?”
The databases are created every minute. For example, all of the new data from Oct 22, 2015 at 18:51 UTC is stored in a file named:
dns.20151022.1851.m.mtbl
We merge-sort these into combined databases at 10-minute, 1-hour, 1-day, 1-month and 1-year intervals. The collection of files forms a set that the dnstable library can open and access in parallel to gather answers.
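The CIDR and wildcard lookups described above both reduce to range scans over sorted keys. This in-memory sketch shows one common way to get there: reverse the labels of each name so that everything under `tumblr.com.` sorts contiguously, and sort IP addresses numerically so that a prefix like `66.6.32.0/20` is one contiguous range. DNStable’s real on-disk key encoding isn’t described here, so `NameIndex` and `AddressIndex` are illustrative assumptions, not its API.

```python
import bisect
import ipaddress


def reverse_name(name):
    """'www.tumblr.com.' <-> 'com.tumblr.www.' so a domain's subtree sorts together."""
    labels = name.rstrip(".").split(".")
    return ".".join(reversed(labels)) + "."


class NameIndex:
    def __init__(self, names):
        self.keys = sorted(reverse_name(n) for n in names)

    def wildcard(self, pattern):
        """Answer '*.domain.' (names strictly below domain) with one range scan."""
        if not pattern.startswith("*."):
            raise ValueError("expected a pattern like '*.example.com.'")
        prefix = reverse_name(pattern[2:])           # '*.tumblr.com.' -> 'com.tumblr.'
        lo = bisect.bisect_right(self.keys, prefix)  # skip the apex name itself
        hi = bisect.bisect_left(self.keys, prefix[:-1] + "/")  # '/' sorts right after '.'
        return [reverse_name(k) for k in self.keys[lo:hi]]


class AddressIndex:
    def __init__(self, pairs):
        # (version, integer address, rrname): IPv4 and IPv6 never interleave.
        self.entries = sorted(
            (ip.version, int(ip), name)
            for ip, name in ((ipaddress.ip_address(a), n) for a, n in pairs)
        )
        self.keys = [(v, i) for v, i, _ in self.entries]

    def cidr(self, prefix):
        """Answer a CIDR query with one range scan over numerically sorted addresses."""
        net = ipaddress.ip_network(prefix)
        lo = bisect.bisect_left(self.keys, (net.version, int(net.network_address)))
        hi = bisect.bisect_right(self.keys, (net.version, int(net.broadcast_address)))
        return [name for _, _, name in self.entries[lo:hi]]


names = NameIndex(["tumblr.com.", "ziegast.tumblr.com.", "www.tumblr.com.", "nottumblr.com."])
print(names.wildcard("*.tumblr.com."))

addrs = AddressIndex([
    ("66.6.41.21", "ziegast.tumblr.com."),
    ("66.6.43.21", "burberrybags808.tumblr.com."),
    ("151.237.189.86", "burberryoutletstores.xyz."),
    ("2001:db8::1", "v6.example."),
])
print(addrs.cidr("66.6.32.0/20"))
```

Note that `nottumblr.com.` is correctly excluded: after reversal it becomes `com.nottumblr.`, which falls outside the `com.tumblr.` range.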
A command line lookup tool, dnstable_lookup, can look up answers in a single database file named by the DNSTABLE_FNAME environment variable, or in a list of files included in a file specified by the DNSTABLE_SETFILE environment variable.
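For scripts that sit alongside these tools, a hypothetical helper can resolve the same two variables. The precedence (a setfile wins over a single file name) and the one-filename-per-line setfile format, which matches what `ls dns.2015* > dns.fileset` produces, are assumptions rather than documented `dnstable_lookup` behavior.

```python
import os


def read_setfile(lines):
    """Assumed setfile format: one database filename per line, blanks ignored."""
    return [line.strip() for line in lines if line.strip()]


def database_files(environ=None):
    """Hypothetical: which files a lookup would consult, given the two variables."""
    env = os.environ if environ is None else environ
    if env.get("DNSTABLE_SETFILE"):  # assumption: the setfile takes precedence
        with open(env["DNSTABLE_SETFILE"]) as fh:
            return read_setfile(fh)
    if env.get("DNSTABLE_FNAME"):
        return [env["DNSTABLE_FNAME"]]
    raise LookupError("set DNSTABLE_SETFILE or DNSTABLE_FNAME")
```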
Another command line tool, dnstable_dump, can read the binary format stored in the databases and convert it to rows of text or JSON.
We’ll provide examples of both commands below.
Back in April I wrote a blog post about how to look up counterfeit names using SIE access, enhanced with DNSDB lookups. This time, we’ll just use our DNSDB Export files.
Consider the Burberry line of clothing and accessories. As a popular luxury brand, it is often targeted by counterfeiters. Counterfeiters often use freshly created domain names: their wares tend to be taken down from established online sales platforms (Amazon, eBay, etc.), and they are unable to keep long-lived domain names because rights holders can easily take those down with tools like the U.S.’s DMCA and ICANN’s UDRP. The examples below show freshly created domain names that, at first glance, appear to fit this pattern.
Let’s look at the latest minute…
$ dnstable_dump -r dns.20151022.1941.m.mtbl | grep burberry | grep -v ';'
burberrybags808.tumblr.com. IN A 66.6.41.21
burberrybags808.tumblr.com. IN A 66.6.43.21
burberryoutletstores.xyz. IN NS f1g1ns1.dnspod.net.
burberryoutletstores.xyz. IN NS f1g1ns2.dnspod.net.
burberryoutletstores.xyz. IN NS f1g1ns1.dnspod.net.
burberryoutletstores.xyz. IN NS f1g1ns2.dnspod.net.
burberryoutletstores.xyz. IN SOA f1g1ns1.dnspod.net. freednsadmin.dnspod.com. 1444295154 3600 180 1209600 180
www.burberryoutletstores.xyz. IN CNAME burberryoutletstores.xyz.
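The same filter can be written in Python when the output needs further processing. This sketch mirrors the shell pipeline above: keep lines containing the search string and drop any line containing `;`, which presumably removes the `;;` annotation lines (and, just as with `grep -v`, any record whose data happens to contain a semicolon). The script name `filter_dump.py` is hypothetical.

```python
import sys


def matching_records(lines, needle):
    """Mimic `grep NEEDLE | grep -v ';'`: keep lines containing needle and no ';'."""
    return [line.rstrip("\n") for line in lines if needle in line and ";" not in line]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. dnstable_dump -r dns.20151022.1941.m.mtbl | python3 filter_dump.py burberry
    for record in matching_records(sys.stdin, sys.argv[1]):
        print(record)
```

Like `grep`, the match is case-sensitive; lower-casing both sides would catch mixed-case names.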
Looking back over the last year, we can find other merchandise hosted there. Using a larger set of DNSDB history, here’s another lookup:
$ ls dns.2015* > dns.fileset
$ export DNSTABLE_SETFILE=dns.fileset
$ dnstable_lookup rrset burberryoutletstores.xyz A
;; bailiwick: burberryoutletstores.xyz.
;; count: 5
;; first seen: 2015-10-04 03:40:30 -0000
;; last seen: 2015-10-07 17:14:44 -0000
burberryoutletstores.xyz. IN A 142.54.172.171
;; bailiwick: burberryoutletstores.xyz.
;; count: 18
;; first seen: 2015-10-10 21:15:43 -0000
;; last seen: 2015-10-21 19:46:38 -0000
burberryoutletstores.xyz. IN A 151.237.189.86
;;; Dumped 2 entries.
Looking up the prior addresses finds other trademarked names now being hosted on the same servers:
$ dnstable_lookup rdata ip 142.54.172.171
louisvuittonoutletonline.pw. IN A 142.54.172.171
raybaneyeglasses.us.com. IN A 142.54.172.171
3gp-ds.ytconv.net. IN A 142.54.172.171
coachoutletonline.top. IN A 142.54.172.171
michaelkorshandbags.xyz. IN A 142.54.172.171
burberryoutletstores.xyz. IN A 142.54.172.171
;;; Dumped 6 entries.
$ dnstable_lookup rdata ip 151.237.189.86
raybaneyeglasses.us.com. IN A 151.237.189.86
abercrombieandfitchoutletsonline.com. IN A 151.237.189.86
furlaoutletsonline.in.net. IN A 151.237.189.86
burberryoutlet.top. IN A 151.237.189.86
discountnfljerseys.top. IN A 151.237.189.86
burberryoutletonline.top. IN A 151.237.189.86
burberryoutletstores.top. IN A 151.237.189.86
guccioutletonline.xyz. IN A 151.237.189.86
burberryoutletonline.xyz. IN A 151.237.189.86
burberryoutletstores.xyz. IN A 151.237.189.86
;;; Dumped 10 entries.
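The two lookups above form a pivot: find every address a suspicious name has used, then every other name seen on those addresses. Here is a minimal sketch of that logic over tuples that have already been fetched (for example, from the `dnstable_lookup` output above); `cohosted_names` is a hypothetical helper, and the sample data is a subset of the results shown.

```python
from collections import defaultdict


def cohosted_names(records, seed):
    """Names sharing any A-record address with `seed`, excluding seed itself."""
    ips_by_name = defaultdict(set)
    names_by_ip = defaultdict(set)
    for rrname, rrtype, rdata in records:
        if rrtype == "A":
            ips_by_name[rrname].add(rdata)
            names_by_ip[rdata].add(rrname)
    related = set()
    for ip in ips_by_name.get(seed, ()):  # step 1: the seed's historical addresses
        related |= names_by_ip[ip]         # step 2: every name seen on them
    related.discard(seed)
    return sorted(related)


records = [
    ("burberryoutletstores.xyz.", "A", "142.54.172.171"),
    ("burberryoutletstores.xyz.", "A", "151.237.189.86"),
    ("coachoutletonline.top.", "A", "142.54.172.171"),
    ("3gp-ds.ytconv.net.", "A", "142.54.172.171"),
    ("guccioutletonline.xyz.", "A", "151.237.189.86"),
]
print(cohosted_names(records, "burberryoutletstores.xyz."))
```

Note that `3gp-ds.ytconv.net.` comes along for the ride: sharing an address is a lead worth checking, not proof that two names belong to the same operator.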
We don’t have to use command line tools to look at the data. There are C and Python bindings that make it easy to do lookups against DNStable files.
Consider the following script, lookup_ip.py:
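#!/usr/bin/env python
# lookup_ip.py: print every record whose rdata is the given IP address
import sys
import dnstable
if len(sys.argv) != 2:
    sys.exit("usage: lookup_ip.py IP_ADDRESS")
d = dnstable.reader('dns.fileset')                  # open the databases listed in the setfile
q = dnstable.query(dnstable.RDATA_IP, sys.argv[1])  # rdata lookup by IP address
for res in d.query(q):
    print(res.to_json())                            # one JSON tuple per matching entry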
Running the script against 151.237.189.86 returns JSON tuples for all of the names that reference that IP address:
$ ./lookup_ip.py 151.237.189.86
{"rrtype": "A", "time_last": 1445413846, "time_first": 1444238290,
"count": 15, "rrname": "raybaneyeglasses.us.com.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445478986, "time_first": 1443590863,
"count": 56, "rrname": "abercrombieandfitchoutletsonline.com.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445061327, "time_first": 1444404624,
"count": 13, "rrname": "furlaoutletsonline.in.net.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1444735169, "time_first": 1444302797,
"count": 42, "rrname": "burberryoutlet.top.", "rdata": "151.237.189.86"}
{"rrtype": "A", "time_last": 1444948864, "time_first": 1444948864,
"count": 2, "rrname": "discountnfljerseys.top.", "rdata": "151.237.189.86"}
{"rrtype": "A", "time_last": 1445428527, "time_first": 1444273015,
"count": 7, "rrname": "burberryoutletonline.top.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1444132851, "time_first": 1444097643,
"count": 6, "rrname": "burberryoutletstores.top.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445492367, "time_first": 1444473075,
"count": 9, "rrname": "guccioutletonline.xyz.", "rdata": "151.237.189.86"}
{"rrtype": "A", "time_last": 1445424613, "time_first": 1444273016,
"count": 22, "rrname": "burberryoutletonline.xyz.", "rdata":
"151.237.189.86"}
{"rrtype": "A", "time_last": 1445456798, "time_first": 1444511743,
"count": 18, "rrname": "burberryoutletstores.xyz.", "rdata":
"151.237.189.86"}
rrname is the name that was queried, and rrtype is the DNS type (“A”, “NS”, “MX”, etc.) found in the answer.
The tuple of time_first, time_last and count shows how many times the name was seen within a given period. The time values are Unix epoch seconds (the number of seconds since midnight, Jan 1, 1970 UTC). A count of “0” means it was seen once in an INSERTION record. Actual counts are made on EXPIRATION records.
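Converting the epoch values back to calendar time takes one call into Python’s standard library. The sketch below uses the `burberryoutletstores.xyz.` record from the output above; its converted times match the first-seen and last-seen values that `dnstable_lookup rrset` printed for the same address. `summarize` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Render Unix epoch seconds (time_first / time_last) as a UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def summarize(json_line):
    """Turn one lookup_ip.py output line into a readable summary."""
    rec = json.loads(json_line)
    return "%s seen %d times between %s and %s" % (
        rec["rrname"], rec["count"],
        epoch_to_utc(rec["time_first"]), epoch_to_utc(rec["time_last"]),
    )


line = ('{"rrtype": "A", "time_last": 1445456798, "time_first": 1444511743, '
        '"count": 18, "rrname": "burberryoutletstores.xyz.", "rdata": "151.237.189.86"}')
print(summarize(line))
# burberryoutletstores.xyz. seen 18 times between 2015-10-10 21:15:43 and 2015-10-21 19:46:38
```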
The bailiwick is the place in the DNS hierarchy from which we received an answer. Sometimes a registry nameserver and the domain’s authoritative nameserver can be out of sync. If they are, the data will show a different bailiwick and different rdata for the same rrname and rrtype.
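Out-of-sync delegations can be spotted by grouping records on `rrname` and `rrtype` and comparing what each bailiwick returned. This is a sketch assuming records shaped like the ones shown earlier (dictionaries with `rrname`, `rrtype`, `bailiwick` and an `rdata` list); `out_of_sync` and the `example.com.` data are hypothetical.

```python
from collections import defaultdict


def out_of_sync(records):
    """Find (rrname, rrtype) pairs whose rdata differs between bailiwicks."""
    # (rrname, rrtype) -> bailiwick -> set of rdata seen from that bailiwick
    answers = defaultdict(lambda: defaultdict(set))
    for rec in records:
        answers[(rec["rrname"], rec["rrtype"])][rec["bailiwick"]].update(rec["rdata"])
    flagged = {}
    for key, by_bailiwick in answers.items():
        distinct = {frozenset(v) for v in by_bailiwick.values()}
        if len(distinct) > 1:  # at least two bailiwicks disagree
            flagged[key] = {b: sorted(v) for b, v in by_bailiwick.items()}
    return flagged


recs = [
    {"rrname": "example.com.", "rrtype": "NS", "bailiwick": "com.", "rdata": ["ns1.old.example."]},
    {"rrname": "example.com.", "rrtype": "NS", "bailiwick": "example.com.", "rdata": ["ns1.new.example."]},
    {"rrname": "1f.ru.", "rrtype": "NS", "bailiwick": "ru.", "rdata": ["ns3.nic.ru.", "ns4.nic.ru."]},
    {"rrname": "1f.ru.", "rrtype": "NS", "bailiwick": "1f.ru.", "rdata": ["ns4.nic.ru.", "ns3.nic.ru."]},
]
print(out_of_sync(recs))
```

Comparing answers as sets means `1f.ru.` is not flagged even though its two bailiwicks listed the same nameservers in a different order.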
The rdata is an array of answers returned for the given rrname/rrtype/bailiwick during the timeframe. In DNS, the order of answers doesn’t matter, so it may make sense to sort the answers before importing them into a database.
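One way to sort answers before import is sketched below: addresses are ordered numerically (so `10.0.0.9` comes before `10.0.0.10`, unlike a plain string sort) and names case-insensitively. The choice of ordering is an assumption; any canonical order works as long as it is applied consistently.

```python
import ipaddress


def canonical_rdata(rrtype, rdata):
    """Return answers in a stable order so identical answer sets compare equal."""
    if rrtype in ("A", "AAAA"):
        return sorted(rdata, key=ipaddress.ip_address)  # numeric, not string, order
    return sorted(rdata, key=str.lower)  # DNS names compare case-insensitively


print(canonical_rdata("A", ["10.0.0.10", "10.0.0.9"]))
print(canonical_rdata("NS", ["ns8.nic.ru.", "NS3.nic.ru.", "ns4.nic.ru."]))
```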
In the next article, I will provide more use-case examples.
Eric Ziegast is a Senior Distributed Systems Engineer for Farsight Security, Inc.
Read the next part in this series: Farsight’s Real-time DNSDB, Part Two