
In Part One of this series (“Finding Top FQDNs Per Day in DNSDB Export MTBL Files”), we looked at how you can:
In this part of the series, our focus will look at how DNSDB count/volume-over-time data can be accessed and visualized, including how DNSDB Export customers can use Perl and GNUplot to graph volume-over-time data. We will also dig in a little on a couple of particularly busy
k12.az.us
domains that were identified in part one of this series.
Recall that DNSDB API reports the
time_first_seen, time_last_seen
and
count
for each unique combination of (resource record name, record type, bailiwick and resource record data values) for the entire time that Farsight has seen that data. For example, checking DNSDB for www.yahoo.com CNAME records from the yahoo.com bailiwick with dnsdbq we see:
$ dnsdbq -r www.yahoo.com/cname/yahoo.com -S -k last
;; record times: 2018-08-14 21:43:57 .. 2018-12-03 20:06:54
;; count: 13927967; bailiwick: yahoo.com.
www.yahoo.com. CNAME atsv2-fp-shed.wg1.b.yahoo.com.
;; record times: 2017-05-16 23:35:00 .. 2018-08-14 21:46:38
;; count: 15841537; bailiwick: yahoo.com.
www.yahoo.com. CNAME atsv2-fp.wg1.b.yahoo.com.
[etc]
That example shows two different CNAMEs used by
www.yahoo.com
, each with counts in the 10’s of million range (large, but not unexpected given the fact that Yahoo is a fairly popular site).There are other similar DNSDB
www.yahoo.com/cname/yahoo.com
results going back to 2010 that we’ve intentionally omitted here in the interest of space. The reported counts cover the ENTIRE time from
time_first_seen
to
time_last_seen
in each case, but there’s NO indication provided of HOW the counts were distributed WITHIN that period of time.
Fortunately, if you’re a DNSDB Export customer, you have access to additional volume-over-time data. Specifically, you can dig out count data from the MTBL files you have at your site, such as MTBL “yearly,” “monthly,” “daily,” etc., files.
For example, arbitrarily picking the DNSDB 2015 yearly file, we find one entry for
www.yahoo.com/cname/yahoo.com
:
$ export DNSTABLE_FNAME="/path-to-mtbl-files/dns.2015.Y.mtbl"
$ dnstable_lookup rrset www.yahoo.com CNAME yahoo.com
;; bailiwick: yahoo.com.
;; count: 35,359,634
;; first seen: 2014-12-31 07:26:34 -0000
;; last seen: 2015-12-31 23:01:35 -0000
www.yahoo.com. IN CNAME fd-fp3.wg1.b.yahoo.com.
That file is showing the count solely for 2015 data for that RRset. We can repeat that process for additional DNSDB yearly files we’ve got, and then graph the values we’ve extracted using Excel or another program:

Figure 1. DNSDB count data for www.yahoo.com/cname/yahoo.com by year
Conceptually, now imagine performing a similar process over a range of daily MTBL files, saving just the starting date and the count for each file. We can do this (“quick-and-dirty”-style) with a somewhat arcane Unix command pipeline:
$ export LOOK_FOR="www.yahoo.com cname yahoo.com" ; ls -l /path-to-mtbl-files/dns\.*D* | awk '{print $9}' | awk '{print "export DNSTABLE_FNAME=" $1 " ; dnstable_lookup -j rrset $LOOK_FOR "}' | bash | jq --unbuffered -r '"\(.time_first|todate) \(.count)"' | sed -u 's/T..:..:..Z//' | sed -u 's/\-//g'
Decoding that command pipeline:
export LOOK_FOR=" www.yahoo.com/cname/yahoo.com"ls -l /path-to-mtbl-files/dns\.*D*awk '{print $9}'awk '{print "export DNSTABLE_FNAME=" $1 " ; dnstable_lookup -j rrset $LOOK_FOR"}'bashjq --unbuffered -r '"\(.time_first|todate) \(.count)"'sed -u 's/T..:..:..Z//'sed -u 's/\-//g'Having run that command, we can get a set of dates and counts that we can display in an Excel graph:
20181031 30779
20181101 24464
20181102 29553
20181103 20790
20181104 31577
20181105 26500
20181106 30638
20181107 29298
20181108 30591
20181109 21037
20181110 28302
20181111 28388
20181112 27235
20181113 28350
20181114 26326
20181115 24542
20181116 31130
20181117 26718
20181118 23614
20181119 31338
20181120 24308
20181121 27419
20181122 28067
20181123 22076
20181124 23211
20181125 29751
20181126 22494
20181127 30519

Figure 2. Volume over time data for www.yahoo.com/cname/yahoo.com
laveeneld.k12.az.usRecall that in part one of this series, we noticed a couple of
k12.az.us
domains that ranked unusually highly in our list of highest-volume non-infrastructural 2nd-level domains. For just one day’s worth of traffic we saw:
Rank Count Domain
8 3,402,660 laveeneld.k12.az.us
16 2,634,318 gesd.k12.az.us
[...]
Those were (and are) VERY high counts for K12 domains.
Checking the web, there’s nothing that immediately jumps out as being very unusual about
laveeneld.k12.az.us
.
For example, Laveen ELD isn’t huge — it’s just an elementary school district of around 7,200 students in Phoenix, Arizona. For comparison, the Portland (Oregon) Public Schools has 48,345 students and the Salem-Keizer (Oregon) School District has 41,120 students.
One potentially interesting/relevant thing we did notice about this elementary school district: we noticed that they are now using
http://www.laveeneld.org/
for their web site, NOT
http://www.laveeneld.k12.az.us/
(even though
laveeneld.k12.az.us
is the “hot running” K12 domain we’d noticed.)
What do we see if we check the live DNS for
laveeneld.k12.az.us
?
$ dig k12.az.us ns
[...]
k12.az.us. 7200 IN NS ns-933.awsdns-52.net.
k12.az.us. 7200 IN NS ns-1716.awsdns-22.co.uk.
k12.az.us. 7200 IN NS ns-1364.awsdns-42.org.
k12.az.us. 7200 IN NS ns-88.awsdns-11.com.
$ dig k12.az.us soa
k12.az.us. 900 IN SOA ns-933.awsdns-52.net. postmaster.email.state.az.us. 2013090501 7200 900 1209600 10800
The serial number of that SOA record (if in common YYYYMMDD format) seems rather oddly old —2013/09/05?
k12.az.us
laveeneld.k12.az.us
$ dig laveeneld.k12.az.us @ns-933.awsdns-52.net
[...]
;; AUTHORITY SECTION:
laveeneld.k12.az.us. 600 IN NS dns1.laveeneld.k12.az.us.
laveeneld.k12.az.us. 600 IN NS dns2.laveeneld.k12.az.us.
;; ADDITIONAL SECTION:
dns1.laveeneld.k12.az.us. 600 IN A 63.229.61.147 <-- Centurylink IP
dns2.laveeneld.k12.az.us. 600 IN A 209.234.221.50 <-- Regus Mgmt Grp
laveeneld.k12.az.us domains:$ dig +norecurse laveeneld.k12.az.us SOA @63.229.61.147
[...]
;; connection timed out; no servers could be reached
$ dig +norecurse laveeneld.k12.az.us SOA @209.234.221.50
[...]
;; connection timed out; no servers could be reached
$ whois laveeneld.k12.az.us
No Data Found
[etc]
Checking the SOA record, we see
postmaster.email.state.az.us
as POC.
Checking Whois for
k12.az.us
, we see
Tech Name: [elided]
Tech Organization: Arizona Dept. of Education
Tech Street: 1535 West Jefferson
Tech City: Phoenix
Tech State/Province: AZ
Tech Postal Code: 85007
Tech Country: US
Tech Phone: [elided]
Tech Email: [email protected]
There is no website at at
www.k12.az.us
, nor at
www.az.us
, but we were able to get in touch with the relevant authorities for
www.k12.az.us
, and shared our findings with them.
laveneld.k12.us in DNSDB MTBL daily files (September 2016-end of August 2017)DNSDB Export customers normally don’t keep daily MTBL files after they’ve been rolled up into monthly files, but Farsight actually has a large internal-only cache of daily MTBL files that we’ve kept for research-related purposes.
We can use that data to get a better picture of how a particular FQDN’s counts have been evolving over a longer period of time. We begin by looking at a year’s worth of daily data for
laveeneld.k12.az.us
, from September 2016 through the end of August 2017.
Because we’re now looking at 365 daily files, let’s replace the arcane Unix command pipeline (plus Excel) used in part 4 of this article with a
and GNUplot program called
vot
(“volume-over-time”). A listing of that program is available in Appendix I, or you can download a copy here.
vot
will let us easily extract the volume-over-time data that’s of interest from hundreds of MTBL files, while also giving us the ability to:
Ensure you’ve got:
Date::Manip
DateTime
File::Temp
Getopt::Long
List::MoreUtils
List::Util
vot code tailored to look in the right directory for our MTBL filesCopy the
vot
program to /usr/local/bin and make it executable:
# cp vot.pl /usr/local/bin/vot
# chmod a+rx /usr/local/bin/vot
We can then see a summary of
vot
command options by saying:
$ vot --help
Usage:
$ vot --fqdn FQDN [--rectype RECTYPE] [--bailiwick BAILIWICK]
[--granularity {Y|M|D|H}] [--smooth INTEGER]
[--notable] [--tableout FILENAME] [--jsonl] [--tabledir DIR]]
[--plot [--plottype {POINT|LINE|VBAR}]
[--plotdev GNUPLOTDEVICE] [--plotout FILENAME] [--plotdir DIR]
[--title 'TITLE'] [--noplotraw] [--plotsmoothed]]
[--start DATE] [--stop DATE] [--daysback INTEGER]
[--mtbldir] [--version] [--help]
fqdn: fully qualified domain name to be graphed (REQUIRED)
aliases: name|rrset|rrname|r
rectype: ONE rectype (a, aaaa, cname, etc.) (def: non-DNSSEC types)
aliases: rrtype|t
bailiwick: for www.abc.com, either abc.com or com (def: use both)
alias: b
granularity: Y(ear), M(onth), D(ay), H(our) (def: D)
aliases: unit|timeperiod|u
smooth: moving average period (def: no smoothing done)
aliases: ma|m
notable: supress tabular output (def: tabular output)
aliases: quiet|q
tableout: file for tabular output (def: generated filename)
aliases: output|outfile
jsonl: produce table in json lines format (def: CSV)
aliases: json|j
[etc]
A sample run of the
vot
program (from a system with access to the relevant MTBL files) looks like:
$ vot --fqdn laveeneld.k12.az.us --ma 14 --plot --plotma --device postscript
--start 20160901 --stop 20170831
Processing: dns.20160901.D.mtbl
[...]
Processing: dns.20170901.D.mtbl
Table output file: ./laveeneld.k12.az.us.20181226656.txt
Plot output file: ./laveeneld.k12.az.us.20181226656.postscript
The data file output from running
vot
(e.g.,
laveeneld.k12.az.us.20181226656.txt
) looks like:
20160901 1094 NaN
[...]
20160914 1189 1401.78571428571
[...]
20170831 1424 1426.85714285714
Graphic output from that vot run looks like:

Figure 3. laveeneld.k12.az.us DNSDB volume over time, 2016/09/01–2017/08/31
Note that there are two data sets displayed on that graph:
ma14(t) = (count(t) + count(t-1) + count (t-2) + ... + count (t-13)) / 14
first-seen_time in a given file’s data may actually be the day before the date of the filename itselfLooking at Figure 3, we can see multiple distinct regions over the course of the yearlong period:
laveeneld.k12.az.us in DNSDB MTBL daily files (July 2016-Date)What do we see if look at
laveeneld.k12.az.us
over the full range of dates for which we have data available?

Figure 4. laveeneld.k12.az.us DNSDB volume over time, full range
Notes:
gesd.k12.az.usWhat about
gesd.k12.az.us
? (You may recall that this was the other
k12.az.us
domain that was also noted as running particularly hot.)
Again, this is not a particularly huge school district, reportedly serving over 11,000 students.
Like Laveen Elementary School District, Glendale Elementary School District has moved to a non-k12.az.us domain for its web presence, in this case now using
https://portals.gesd40.org/
The pattern we see in live DNS data is similar to the pattern we previously saw for
laveeneld.k12.az.us
:
gesd.k12.az.us domain still has name server records defined at the k12.az.us
$ dig gesd.k12.az.us @ns-933.awsdns-52.net
[...]
;; AUTHORITY SECTION:
gesd.k12.az.us. 600 IN NS dns1.gesd.k12.az.us.
gesd.k12.az.us. 600 IN NS dns2.gesd.k12.az.us.
;; ADDITIONAL SECTION:
dns1.gesd.k12.az.us. 600 IN A 12.159.64.132 <-- Oneneck IT
dns2.gesd.k12.az.us. 600 IN A 12.159.65.138 <-- Oneneck IT
gesd.k12.az.us:$ dig +norecurse gesd.k12.az.us @12.159.64.132
[...]
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 30741
[...]
$ `dig +norecurse gesd.k12.az.us @12.159.65.138`
[...]
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 54633
[...]
gesd.k12.az.us domain doesn’t exist.$ whois gesd.k12.az.us
No Data Found
[etc]
Let’s look at the daily data from DNSDB MTBL files for their legacy
k12.az.us
domain:

Figure 5.
gesd.k12.az.us
DNSDB volume over time, full range
That graph is not quite as extreme as the
laveeneld.k12.az.us
data from section 7 (the Y axis for GESD “only” goes to counts of 1,800,000 vs. 2,500,000 for Laveen), but it does exhibit what’s rapidly becoming a very familiar macroscopic pattern: the domain’s data is normally “down in the weeds,” but jumps dramatically (and stays elevated at a new, far-higher, level) as of mid-September 2017, while also exhibiting high levels of heteroskedasticity.
Did anything significant happen in September to dot US? Well, as noted in the US Monthly Progress Report for September 2017, there was substantial publicity (including on social media) around the future of the dot us TLD. It’s possible that at least some K12 sites (such as these in Arizona?) decided to move away from the dot us at that time.
We also note that technical issues with delegated dot us domains have been an ongoing concern; see, e.g.,Table 3.4.1a from the (undated) “.US Locality Compliance Report.” At least as of its date of completion, it reported:

Figure 6. Dot us Domain Health Status
Note the significant “No Name Servers respond” and the “Lame Delegated” levels.
We do not have updated/current data for those measurements at this time.
k12.az.us Data In SIEWe’re still intrigued enough by what we’re seeing to want to know more. The DNS data we saw and see in DNSDB (both DNSDB API and DNSDB Export) comes from the Farsight Security Information Exchange, or SIE. The raw data that eventually gets incorporated into DNSDB initially comes in on SIE Channel 202 (“Ch202”). Because of the volume of traffic carried by that channel (over 500Mbps), we’re going to pull a sample of
k12.az.us
data from a locally-attached system that’s configured to listen to that channel over 10Gig Ethernet. To keep this all reasonable, we’ll just keep five data elements from each record:
$ nmsgtool -C ch202 -J - | grep k12.az.us | jq '"\(.message.query_ip) \(.message.response_ip) \(.message.qname) \(.message.qtype) \(.message.type)"' > k12.az.us.txt
[after a few minutes...]
ctrl-C
That file contained 4,220 lines.
Who’s querying
k12.az.us
? Let’s look at the message.query_ip’s, the first of five fields we extracted from our Ch202 JSON output:
$ cat k12.az.us.txt | awk '{print $1}' | sort | uniq -c | sort -nr
We see 5 IPs, all from the same /24 netblock, collectively accounting for 4,208 of the 4,220 observations seen (we’re going to omit those specific IPs here for policy reasons)
Who’s responding to those queries? message.response_ip came from:
$ cat k12.az.us.txt | awk '{print $2}' | sort | uniq -c | sort -nr
805 205.251.197.84 ($ dig -x 205.251.197.84 ==> ns-1364.awsdns-42.org)
710 205.251.198.180 (ns-1716.awsdns-22.co.uk)
501 63.229.61.147 (63-229-61-147.dia.static.qwest.net)
482 209.234.221.50 (ns1.hairybuffalo.com)
477 12.159.65.138 (dns02-one.phx1.tdc.oneneck.com)
439 12.159.64.132 (dns01-one.phx1.tdc.oneneck.com)
288 205.251.195.165 (ns-933.awsdns-52.net)
[etc]
Note that these are name servers closely related to the
k12.az.us
domains in general, and
laveeneld.k12.az.us
and
gesd.k12.az.us
in particular:
$ dnsdbq -n ns-1364.awsdns-42.org -j | grep k12
{"count":63920034,"`time_first":1378771509,"time_last":1546118896,"rrname":"k12.az.us.","rrtype":"NS","rdata":"ns-1364.awsdns-42.org."}
[snip]
$ dnsdbq -n ns-1716.awsdns-22.co.uk -j | grep k12
{"count":63920034,"time_first":1378771509,"time_last":1546118896,"rrname":"k12.az.us.","rrtype":"NS","rdata":"ns-1716.awsdns-22.co.uk."}
[snip]
$ dnsdbq -n ns-933.awsdns-52.net -j | grep k12
{"count":63920566,"time_first":1378771509,"time_last":1546118896,"rrname":"k12.az.us.","rrtype":"NS","rdata":"ns-933.awsdns-52.net."}
[snip]
$ dnsdbq -i 63.229.61.147 -j | grep k12
{"count":587066681,"time_first":1277453528,"time_last":1546114733,"rrname":"dns1.laveeneld.k12.az.us.","rrtype":"A","rdata":"63.229.61.147"}
$ dnsdbq -i 209.234.221.50 -j | grep k12
{"count":587058832,"time_first":1277453528,"time_last":1546114733,"rrname":"dns2.laveeneld.k12.az.us.","rrtype":"A","rdata":"209.234.221.50"}
$ dnsdbq -i 12.159.65.138 -j | grep k12
{"count":432520817,"time_first":1277847524,"time_last":1546113057,"rrname":"dns2.gesd.k12.az.us.","rrtype":"A","rdata":"12.159.65.138"}
$ dnsdbq -i 12.159.64.132 -j | grep k12
{"count":432516652,"time_first":1277847524,"time_last":1546113057,"rrname":"dns1.gesd.k12.az.us.","rrtype":"A","rdata":"12.159.64.132"}
What’s being asked for? 4,098 out of 4,220 hits were for laveen, gesd, tempe, or mesa related names:
$ cat k12.az.us.txt | awk '{print $3}' | grep "laveen\|gesd\|tempe\|mesa" | wc -l
4098
Specific message.qname (queried names) per site were:
$ cat k12.az.us.txt | awk '{print $3}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr | grep `laveen`
485 lesd-sccm.laveeneld.k12.az.us.[see footnote 1]
183 _ldap._tcp.00-ldo._sites.dc._msdcs.laveeneld.k12.az.us.[see footnote 2]
178 les.laveeneld.k12.az.us. (possibly Laveen Elementary School)
157 rrs.laveeneld.k12.az.us. (possibly Rogers Ranch School)
153 les-lt-1mdxtp2.laveeneld.k12.az.us. (former purpose of this host is unknown)
127 wpad.laveeneld.k12.az.us. (web proxy auto discovery[see footnote 3]
119 pps.laveeneld.k12.az.us. (possibly Paseo Pointe School)
94 ces.laveeneld.k12.az.us. (possibly Cheatham Elementary School)
93 dme.laveeneld.k12.az.us. (possibly Desert Meadow School)
[all remaining names had 20 or fewer hits]
$ cat k12.az.us.txt | awk '{print $3}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr | grep `gesd`
265 ge000ccms03.do.gesd.k12.az.us.[see footnote 4]
209 ge000prnts01.do.gesd.k12.az.us. (possibly a former print server)
104 gesd.do.gesd.k12.az.us. (possibly district offices server)
96 _ldap._tcp.horizon._sites.dc._msdcs.do.gesd.k12.az.us.[see footnote 5]
86 wpad.do.gesd.k12.az.us. (Web Proxy Auto Discovery)
81 _ldap._tcp.desert-spirit._sites.dc._msdcs.do.gesd.k12.az.us.
79 _ldap._tcp.smith._sites.dc._msdcs.do.gesd.k12.az.us.
75 _ldap._tcp.district-office._sites.dc._msdcs.do.gesd.k12.az.us.
73 _ldap._tcp.dc._msdcs.do.gesd.k12.az.us.
56 _ldap._tcp.landmark._sites.dc._msdcs.do.gesd.k12.az.us.
51 _ldap._tcp.sine._sites.dc._msdcs.do.gesd.k12.az.us.
51 192.168.0.1.do.gesd.k12.az.us.
50 _ldap._tcp.bicentennial-south._sites.dc._msdcs.do.gesd.k12.az.us.
44 _ldap._tcp.burton._sites.dc._msdcs.do.gesd.k12.az.us.
39 _ldap._tcp.sunset-vista._sites.dc._msdcs.do.gesd.k12.az.us.
39 _ldap._tcp.pdc._msdcs.do.gesd.k12.az.us.
38 _ldap._tcp.mensendick._sites.dc._msdcs.do.gesd.k12.az.us.
37 _ldap._tcp.american._sites.dc._msdcs.do.gesd.k12.az.us.
30 _ldap._tcp.coyote-ridge._sites.dc._msdcs.do.gesd.k12.az.us.
[all remaining names had 20 or fewer hits]
$ cat k12.az.us.txt | awk '{print $3}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr | grep `tempe`
127 wpad.tempe3.k12.az.us.
115 _ldap._tcp.tempe3._sites.dc._msdcs.tempe3.k12.az.us.
113 _ldap._tcp.tempe3.k12.az.us.
93 _ldap._tcp.dc._msdcs.tempe3.k12.az.us.
60 _gc._tcp.tempe3.k12.az.us.
56 _ldap._tcp.tempe3._sites.gc._msdcs.tempe3.k12.az.us.
45 td3dns2.tempe3.k12.az.us.
45 dns3.tempe3.k12.az.us.
[all remaining names had 20 or fewer hits]
$ cat k12.az.us.txt | awk '{print $3}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr | grep `mesa`
92 wpad.mesa.k12.az.us.
[all remaining names had 20 or fewer hits]
What type of records were being requested? Let’s look at message.qtype, the fourth field we extracted:
$ k12.az.us.txt | awk '{print $4}' | sort | uniq -c | sort -nr
1739 A
1494 SRV
711 AAAA
269 SOA
[all remaining types had 20 or fewer hits]
Finally, because this is Ch202, we need to remember that we have both queries that were successfully answered, and other queries that went unanswered. This is reflected in the message types we saw:
$ cat k12.az.us.txt | awk '{print $5}' | sort | uniq -c | sort -nr
2331 UDP_UNANSWERED_QUERY"
1889 UDP_QUERY_RESPONSE"
The large number of
"UDP_UNANSWERED_QUERY"
results is not unexpected since we know that at least some of the targeted name servers are refusing queries or are simply unreachable.
Since this is an already-too-long blog post, let’s just finish up with a quick “bullet point” wrap-up:
k12.az.us domains, we saw that there was a major shift in traffic volume that happened in mid-September 2017 for several domainsk12.az.us domains were no longer being used by the school districts in question, but NS records were still being returned for those domains by the k12.az.us
k12.az.us domains; all the query traffic we’re seeing for those domains is coming from one set of five related recursive resolvers.IMPORTANT: The little demonstration Perl script shown in this blog article is meant to be run only by trusted users for research-related purposes. There has been no attempt to “sanitize” inputs passed to it, and under NO condition should the script be exposed to data from untrustworthy sources (e.g., as a web portal).
Isn’t it time you talked to Farsight Security about what DNSDB Export or the Security Information Exchange can do for you and your business?
Please contact Farsight Security at [email protected].
#!/usr/bin/perl
use strict;
use warnings;
use Date::Manip;
use DateTime;
use File::Temp qw/ tempfile tempdir /;
use Getopt::Long qw(:config no_ignore_case);
use List::MoreUtils;
use List::Util qw(pairs);
my $bailiwick = '';
my $cmd = '';
my $cutoff = '';
my $daysback = '';
my $dt = '';
my $dt1 = '';
my $dur = '';
my $file = '';
my $filecount = '';
my $filedate = '';
my $myfiledate = '';
my $fqdn = '';
my $fullpattern = '';
my $granularity = 'D';
my $help = '';
my $jsonl = '';
my $key = '';
my $minutesseconds = '';
my $mtbldir = '/export/dnsdb/mtbl/';
my $noplotraw = '';
my $notable = '';
my $now_string = '';
my $now_string_mm_ss = '';
my $plot = '';
my $plotdev = 'postscript';
my $plotdir = '.';
my $plotout = '';
my $plotsmoothed = '';
my $plottype = 'line';
my $rectype = '';
my $roughcutoff = '';
my $roughstart = '';
my $roughstop = '';
my $sizeofresults = '';
my $smooth = '';
my $start = '';
my $start2 = '';
my $stop = '';
my $stop2 = '';
my $tabledir = '.';
my $tableextension ='txt';
my $tableout = '';
my $timefencecheck1 = '';
my $timefencecheck2 = '';
my $timefencecheck3 = '';
my $timefencecheck4 = '';
my $timefencecheck5 = '';
my $title = '';
my $totalresultsfound = '';
my $value = '';
my @keepermtblfiles;
my @mtblfiles;
my @results = '';
my @unsortedfiles;
my %hash = ();
my %smoothedvalues = ();
################### GET AND PROCESS THE ARGUMENTS ##################
GetOptions ('fqdn|name|rrset|rrname|r=s' => \$fqdn,
'rectype|rrtype|t=s' => \$rectype,
'bailiwick|b=s' => \$bailiwick,
'granularity|unit|timeperiod|u=s' => \$granularity,
'smooth|ma|m=s' => \$smooth,
'notable|quiet|q' => \$notable,
'tableout|output|outfile=s' => \$tableout,
'jsonl|json|j' => \$jsonl,
'tabledir|reportdir|directory|dir|d=s' => \$tabledir,
'plot|graph|chart|p' => \$plot,
'plottype|graphtype|charttype=s' => \$plottype,
'plotdev|device=s' => \$plotdev,
'plotout=s' => \$plotout,
'plotdir|graphdir=s' => \$plotdir,
'title|plottitle=s' => \$title,
'noplotraw' => \$noplotraw,
'plotsmoothed|plotma' => \$plotsmoothed,
'start|first|begin=s' => \$start,
'stop|last|end=s' => \$stop,
'daysback|days|window=s' => \$daysback,
'mtbldir|datadir|mtblfiles|mtbl' => \$mtbldir,
'help|info|man|manual|usage|h' => \$help,
);
# b d h j m p q r t u v
# help report
if (($fqdn eq '') || ($help eq 1))
{
die "Usage:
\$ $0 --fqdn FQDN [--rectype RECTYPE] [--bailiwick BAILIWICK]
[--granularity {Y|M|D|H}] [--smooth INTEGER]
[--notable] [--tableout FILENAME] [--jsonl] [--tabledir DIR]]
[--plot [--plottype {POINT|LINE|VBAR}]
[--plotdev GNUPLOTDEVICE] [--plotout FILENAME] [--plotdir DIR]
[--title 'TITLE'] [--noplotraw] [--plotsmoothed]]
[--start DATE] [--stop DATE] [--daysback INTEGER]
[--mtbldir] [--help]
fqdn: fully qualified domain name to be graphed (REQUIRED)
aliases: name|rrset|rrname|r
rectype: ONE rectype (a, aaaa, cname, etc.) (def: non-DNSSEC types)
aliases: rrtype|t
bailiwick: for www.abc.com, either abc.com or com (def: use both)
alias: b
granularity: Y(ear), M(onth), D(ay), H(our) (def: D)
aliases: unit|timeperiod|u
smooth: moving average period (def: no smoothing done)
aliases: ma|m
notable: supress tabular output (def: tabular output)
aliases: quiet|q
tableout: file for tabular output (def: generated filename)
aliases: output|outfile
jsonl: produce table in json lines format (def: CSV)
aliases: json|j
tabledir: directory for tabular output files (def: current dir)
aliases: reportdir|directory|dir|d
plot: request plots (def: no plots output)
aliases: graph|chart|p
plottype: type of plot to make? lines, dots, steps, impulses (def: lines)
aliases: graphtype|charttype
plotdev: gnuplot output format (def: postscript)
aliases: device
plotout: graphic output filename (def: generated filename )
plotdir: directory for plot output files (def: current dir)
aliases: graphdir
title: plot title (def: command line string options)
aliases: plottitle
noplotraw: no plotting of raw data (def: display raw data)
plotsmoothed: plot smoothed data (def: omit smoothed data)
aliases: plotma
start: show data starting from this date forward (def: all dates)
aliases: first|begin
stop: display no data after this date (def: all dates)
aliases: last|end
daysback: only display data for the last N days (def: all dates)
aliases: days|window
mtbldir: location of MTBL files to be used (def: /export/dnsdb/mtbl/)
aliases: datadir|mtblfiles|mtbl
help: show this then exit
aliases: info|man|manual|usage|h
Examples: \$ $0 --fqdn \"www.reed.edu\"
\$ $0 --fqdn \"powells.com\" --smooth 3 --notable\\
--plot --graphformat jpg --graphdir mygraphs\\
--start 20180101 --stop 20180630\n\n";
}
if ($fqdn eq '') { die "Specify a fully qualified domain name with --fqdn";}
# need current time to construct default filenames for output if not specified
# and for relative time ("daysback") options
# get today's date for the starting time
$dt = DateTime->today;
# convert the date to YYYYMMDD format with no separator between elements
$now_string = $dt->ymd('');
# get the local time and convert it to the component chunks
my ($dsec,$dmin,$dhour,$dmday,$dmon,$dyear,$dwday,$dyday,$disdst) =
localtime(time);
# we're going to use this for the generated filenames (. = concatenate)
$minutesseconds = $dmin . $dsec;
$now_string_mm_ss = $now_string . $minutesseconds;
# need to add one to get correct behavior
if ($daysback ne '') {
$daysback++;
$dur = DateTime::Duration->new( days => $daysback );
$dt1 = $dt - $dur;
$cutoff = $dt1->ymd('');
}
# check for illegal time fencing -- can do daysback or start/stop, not both
if ((($daysback ne '') && ($start ne '')) ||
(($daysback ne '') && ($stop ne '')))
{ die "Cannot use --daysback AND --start or --stop"; }
# want to plot but no filename: synthasize an output filename for the plot
# example name: www.facebook.com.2018021029.postscript
if (($plot ne '') && ($plotout eq ''))
{ $plotout = "$plotdir/$fqdn\.$now_string_mm_ss\.$plotdev"; }
# if user wants json output format, set the file extension appropriately
if ($jsonl) { $tableextension='jsonl'; }
# set a default title
if ($title eq '') {
$title = "\$ vot \-\-fqdn $fqdn";
if ($rectype ne '') { $title = $title . "\/$rectype"; }
if ($bailiwick ne '') { $title = $title . "\/$bailiwick"; }
if ($smooth ne '') { $title = $title . " \-\-ma $smooth"; }
}
# build the output filename for the table
# example name: www.facebook.com.2018021029.txt
if (($notable eq '') && ($tableout eq ''))
{ $tableout = "$tabledir/$fqdn\.$now_string_mm_ss\.$tableextension";
}
# user wants to produce SOMETHING, right?
if (($notable ne '') && ($plot eq ''))
{ die "No table output? No plot output? Nothing to do!"; }
# typical input MTBL filename: dns.20181125.D.mtbl
# granularity is a single letter: (Y, M, D, H)
if ($granularity eq "D") {
$fullpattern = $granularity.'.mtbl';
} else {die "granularity must be D only for now, sorry (case sensitive!)";}
# let's get the list of MTBL files (we'll exclude out-of-scope ones later)
opendir DIR, $mtbldir or die "Cannot open mtbldirectory: $mtbldir $!";
@unsortedfiles = readdir DIR;
# the MTBL file array is unsorted, let's tidy that up
@mtblfiles = sort @unsortedfiles;
# do we have at least 1 file to analyze?
$filecount = @mtblfiles;
if ($filecount == 0) {die "no files of requested granularity in $mtbldir $!";}
# we've loaded the MTBL files into the @mtblfiles array, so we can close
# the filehandle
closedir DIR;
# loop over the file array, and just keep the ones (roughly) in scope
# we'll add a one day grace period around the actual time period
# compute the adjusted dates
if ($start ne '') { $roughstart = DateCalc(ParseDate($start), ParseDateDelta('- 1 days')); }
if ($stop ne '') { $roughstop = DateCalc(ParseDate($stop), ParseDateDelta('+ 1 days'));
}
if ($daysback ne '') { $roughcutoff = $dt1->ymd(''); $roughcutoff = DateCalc(ParseDate($roughcutoff), ParseDateDelta('+ 1 days')); }
foreach $file (@mtblfiles) {
# make sure the files have the right granularity and aren't an "in process file" starting with a dot
if ((index($file, $fullpattern) != -1) && (index($file, '^\.') == -1)) {
# trim the junk from the filename
$filedate = $file;
$filedate =~ s/^dns\.//;
$filedate =~ s/\.D\.mtbl//;
# convert the date string from the filename to a real DateTime
$myfiledate = ParseDate($filedate);
# no time fencing
$timefencecheck1 =
(($start eq '') && ($stop eq '') && ($daysback eq ''));
# just after start
$timefencecheck2 =
(($stop eq '') && ($start ne '') && ($myfiledate ge $roughstart));
# just before stop
$timefencecheck3 =
(($start eq '') && ($stop ne '') && ($myfiledate le $roughstop));
# after start and before stop
$timefencecheck4 =
(($start ne '') && ($stop ne '') &&
($filedate ge $roughstart) && ($myfiledate le $roughstop));
# relative time check...
$timefencecheck5 =
(($daysback ne '') && ($myfiledate ge $cutoff));
if ($timefencecheck1 || $timefencecheck2 || $timefencecheck3
|| $timefencecheck4 || $timefencecheck5) {
push (@keepermtblfiles, $file);
}
} # end of granularity check
} # end for each potential MTBL file
# plan is to tally the counts in a perl hash, keyed by the first seen date
# total results found initially? zero, of course
$totalresultsfound = 0;
foreach $file (@keepermtblfiles) {
# build the command we need to run...
$cmd = "export DNSTABLE_FNAME=$mtbldir$file ; dnstable_lookup -j rrset $fqdn $rectype $bailiwick\| jq -r \'\"\\(.time_first\|todate\) \\(.count\)\"\' \| sed \'s\/T\.\* \/ \/\' \| sed \'s\/\-/\/g\' \| sort";
@results = `$cmd`;
my $sizeofresults = @results;
$totalresultsfound = $totalresultsfound + $sizeofresults;
# now load the results from dnstable_lookup into a perl hash
my $i = 0;
while ($i<$sizeofresults)
{
# split out a pair of values (a datestamp and a count)
($key, $value) = split(/\s+/,$results[$i]);
# double check the time fencing
$start2 = DateCalc(ParseDate($start), ParseDateDelta('- 2 days'));
$stop2 = ParseDate($stop);
# no time fencing
my $timefencecheck1 =
(($start2 eq '') && ($stop2 eq '') && ($daysback eq ''));
# just after start
my $timefencecheck2 =
(($stop2 eq '') && ($start2 ne '') && ($key ge $start2));
# just before stop
my $timefencecheck3 =
(($start2 eq '') && ($stop2 ne '') && ($key le $stop2));
# after start and before stop
my $timefencecheck4 =
(($start2 ne '') && ($stop2 ne '') &&
($key ge $start2) && ($key le $stop2));
# relative time check...
my $timefencecheck5 =
(($daysback ne '') && ($key ge $cutoff));
if ($timefencecheck1 || $timefencecheck2 || $timefencecheck3 || $timefencecheck4 || $timefencecheck5)
{
# print periodic status reports
print "Processing: $file\n";
# perl doesn't set initial hash values to zero
# by default, so we need to avoid doing arithmetic
# with UNDEF hash values
if (defined ($hash{$key}))
{ $hash{$key} += $value; }
else
{ $hash{$key} = $value; }
my $temptally=0;
$smoothedvalues{$key} = 'NaN';
if ($smooth ne '') {
my $validhashvalues=0;
for (my $kk=0; $kk < $smooth; $kk++) {
my $offset ='- '.$kk.' days';
my $checkkey = DateCalc(ParseDate($key), ParseDateDelta($offset));
$checkkey =~ s/00:00:00//;
if (defined ($hash{$checkkey})) {
$validhashvalues++;
$temptally=$temptally+$hash{$checkkey};
} # end of the valid hash check+tally
} # end of the for loop
if ($validhashvalues == $smooth){
$smoothedvalues{$key} = $temptally/$validhashvalues;
} # end of smoothing and value is valid
} # end of the "are we smoothing?"
} # if in the time fence
$i++;
} # while results loop
} # looping over all keeper filenames
if ($totalresultsfound == 0)
{die "no results found -- typo in FQDN? wrong rectype or bailiwick?\n";}
# we now have a date=>count hash, let's do something with it
########################### TABLE ###############################
# handle the tabular output case, if tabular output hasn't been supressed
if ($notable eq '')
{
# user wants tabular output, so let's open that table output file
open (my $tfh, '>', $tableout)
or die "$0: open $tableout: $!";
# in printing the following, we have some stuff that's always printed
# and some stuff that's optionally printed
# always printed: "$j" (the date) and $hash{$j} (the count)
# printed if $smooth > 0: $smoothedvalues{$j} (the moving average)
# printed if $jsonl selected: jsonl formatting cruft
# The trailing comma is NOT printed if doing $jsonl and it's the last record
if ($jsonl) {print $tfh '['; }
my $left= keys %hash;
foreach my $j (sort keys %hash)
{
if ($jsonl) {print $tfh '{"date":"'; }
print $tfh "$j";
print "$j ";
if ($jsonl eq '') {print $tfh ' ';}
if ($jsonl) {print $tfh '"},{"count":"';}
print $tfh "$hash{$j}";
print "$hash{$j}";
if ($jsonl) {print $tfh '"}';}
# also print smoothed values?
if ($smooth ne '') {
if ($jsonl) {print $tfh ',{"ma":"';}
if ($jsonl eq '') {print $tfh ' ';
}
print $tfh "$smoothedvalues{$j}";
print " $smoothedvalues{$j}";
if ($jsonl) {print $tfh '"}';}
} # end smoothed block
# no comma on the last obs, need right square bracket instead
if (($jsonl) && ($left >= 1)) {print $tfh ',';}
elsif ($jsonl) {print $tfh ']';}
#everybody gets the newline
print $tfh "\n";
print "\n";
$left--;
}
print "Table output file: $tableout\n";
close ($tfh);
} # end of table processing block
####################### PLOT #############################
if ($plot ne '')
{
$plottype = lc($plottype);
# three files: one for the plot output, one for the gnuplot commands, and
# one for a temporary copy of the data
open (my $pfh, '>', "$plotout")
|| die "$0: open $plotout $!";
my $tempfilename1 = '';
my $tempfilename2 = '';
# temporary file #1 for the gnuplot commands
(my $tempfh1, $tempfilename1) = tempfile();
# temporary file #2 for a copy of the data
(my $tempfh2, $tempfilename2) = tempfile();
# build the graphic code we'll be running
my $gnuplotcode = "set term $plotdev size 10in,7in monochrome font 'Helvetica,14'\n";
print $tempfh1 "$gnuplotcode";
$gnuplotcode = "set output \"$plotout\"\nset title \"$title\"\n";
print $tempfh1 "$gnuplotcode";
$gnuplotcode ="set xdata time\nset timefmt \"\%Y\%m\%d\"\nset format x \"\%Y\%m\%d\"\nset format y '%.0f'\n";
print $tempfh1 "$gnuplotcode";
$gnuplotcode ="set xtics rotate by 90 offset 0,-4 out nomirror\nset mxtics\nset style data $plottype\n";
print $tempfh1 "$gnuplotcode";
$gnuplotcode ="set datafile missing \"NaN\"\nset xlabel offset 2\nset bmargin 7\nset rmargin 5\nset tmargin 3\n";
print $tempfh1 "$gnuplotcode";
if (($noplotraw eq '') && ($plotsmoothed)) {
# plot raw and smoothed
my $gnuplotcode2 ="plot '$tempfilename2' using 1:2 with $plottype t 'raw' lt black dt 3, '$tempfilename2' using 1:3 with $plottype t 'smoothed' lt black dt 1\n";
print $tempfh1 "$gnuplotcode2\n";
} elsif (($noplotraw ne '') && ($plotsmoothed)) {
# plot smoothed only
my $gnuplotcode2 ="plot '$tempfilename2' using 1:3 with $plottype t 'smoothed' lt black dt 1\n";
print $tempfh1 "$gnuplotcode2\n";
} else {
# plot raw only
my $gnuplotcode2 ="plot '$tempfilename2' using 1:2 with $plottype t 'raw' lt black dt 1\n ";
print $tempfh1 "$gnuplotcode2\n";
}
foreach my $j (sort keys %hash)
{
print $tempfh2 "$j, $hash{$j}";
if ($smooth ne '') { print $tempfh2 ", $smoothedvalues{$j}"; }
print $tempfh2 "\n";
}
my $tempcommandline = "gnuplot $tempfilename1 < $tempfilename2\n";
my $ploterrors = `$tempcommandline`;
print "$ploterrors";
close $pfh;
print "Plot output file: $plotout\n";
}
Joe St Sauver Ph.D. is a Distinguished Scientist with Farsight Security, Inc.
Read the next part in this series: Analyzing DNSDB Volume-Over-Time Time Series Data With R and ggplot2 Graphics (Part Three of a Three-Part Series)