
Earlier this year, we published a three-part blog series on DNSDB volume-over-time analyses aimed at DNSDB Export (aka DNSDB “on-premises”) customers. Those customers have direct file-level access to the data in DNSDB MTBL files. You can find the three blogs here: Finding Top FQDNs Per Day in DNSDB Export MTBL Files (Part 1), Volume-Over-Time Data From DNSDB Export MTBL Files, Part 2 and Analyzing DNSDB Volume-Over-Time Time Series Data With R and ggplot2 Graphics, Part 3.
Unfortunately, unless DNSDB users were fortunate enough to have file-level access to DNSDB Export data files, they couldn’t get the data they needed to evaluate volume-across-time trends. This has now changed. Now “regular” DNSDB API users can begin to ask for “deaggregated” counts “across time,” too.
and DNSDB Scout have both been extended to support this new functionality. Since this feature was already demonstrated using DNSDB Scout in the original announcement, this article will focus solely on
dnsdbq
‘s implementation of its volume-across-time implementation, aka the
-g ("gravel")
option.
Normally DNSDB returns a single aggregated count. Consider a sample query and classic aggregated output, such as the following:
$ dnsdbq -r www.reed.edu/A/reed.edu
;; record times: 2010-06-24 17:12:52 .. 2019-05-29 16:37:26
;; count: 971009; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2019-05-29 16:39:54 .. 2019-09-09 17:56:45
;; count: 26747; bailiwick: reed.edu.
www.reed.edu. A 134.10.50.30
Decoded:
Farsight’s sensor network saw www.reed.edu resolve to the IPv4 address 134.10.2.252 a total of 971,009 times between 17:12:52 UTC on 2010-06-24 and 16:37:26 UTC on 2019-05-29.
At 16:39:54 UTC on 2019-05-29, that domain changed and began to resolve to 134.10.50.30, and has continued to do so 26,747 times through 17:56:45 UTC on 2019-09-09.
Those aggregated count ran from:
That aggregated count was useful, but sometimes you need finer-grained details. Being able to drill down and allocate the count over smaller time periods may be of interest to you if you’re attempting to determine if:
Without volume across time data, there was simply no way to answer these sort of questions using DNSDB API.
With the new DNSDB API volume across time functionality, a user can request a result from EACH underlying MTBL file. The duration of each MTBL file will vary according to the “roll-up” status of the data:
Let’s rerun our sample query for www.reed.edu/A/reed.edu, this time adding a
-g
option. Suddenly we get far more granular results:
$ dnsdbq -r www.reed.edu/A/reed.edu -g
;; record times: 2010-06-24 17:12:52 .. 2010-12-31 18:09:34 <== Through 2010
;; count: 20833; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2010-12-28 11:56:00 .. 2011-12-31 18:45:54 <== Roughly 2011
;; count: 57285; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2011-12-31 14:51:10 .. 2012-12-31 23:39:57 <== Roughly 2012
;; count: 76954; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2012-12-31 22:04:40 .. 2013-12-31 22:24:43 <== Roughly 2013
;; count: 82726; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2013-12-31 21:52:26 .. 2014-12-31 16:58:22 <== Roughly 2014
;; count: 75379; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2014-12-31 12:41:43 .. 2015-12-31 21:24:51 <== Roughly 2015
;; count: 100858; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2015-12-31 17:33:54 .. 2016-12-31 21:16:24 <== Roughly 2016
;; count: 167684; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2016-12-31 10:07:17 .. 2017-12-31 21:20:05 <== Roughly 2017
;; count: 182531; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2017-12-31 14:02:42 .. 2018-12-31 19:23:57 <== Roughly 2018
;; count: 156543; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2018-12-31 16:01:04 .. 2019-01-31 22:38:31 <== Roughly Jan 2019
;; count: 10742; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2019-01-31 17:20:34 .. 2019-02-28 23:38:07 <== Roughly Feb 2019
;; count: 10491; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2019-02-28 18:41:19 .. 2019-03-31 23:44:30 <== Roughly Mar 2019
;; count: 10091; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2019-03-31 18:36:58 .. 2019-04-30 23:32:50 <== Roughly Apr 2019
;; count: 9904; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
;; record times: 2019-04-30 18:05:53 .. 2019-05-29 16:37:26 <== Roughly May 2019
;; count: 8988; bailiwick: reed.edu.
www.reed.edu. A 134.10.2.252
[the remaining results, e.g., for www.reed.edu/A/reed.edu ==> 134.10.50.30, omitted here]
You may wonder why the dates shown above don’t “perfectly” coincide with their respective time periods (e.g., why are the time periods only “roughly” a year or “roughly” a month). The answer is that unique RRname/RRtype/Bailiwick/RRname data values are cached during processing, and sometimes cached entries will span a time boundary, resulting in the epoch-spanning time boundaries shown in the above MTBL reports.
While the results shown in section 2 contain the raw numerical gravel’d output data, it’s hard to visualize what’s going on when we’re just looking at a table of raw numbers. Let’s see what our results look like when we graph them. We begin by extracting a CSV file containing just:
for our IP of interest using standard Unix command line tools plus
.
$ dnsdbq -g -r www.reed.edu/A/reed.edu -s -k first -j | grep "134.10.2.252" | jq -r '"\(.time_first), \(.time_last - .time_first), \(.count)"' > reed.csv
$ head -5 reed.csv
1277399572, 16419402, 20833
1293537360, 31819794, 57285
1325343070, 31654127, 76954
1356991480, 31537203, 82726
1388526746, 31518356, 75379
We could then plot that data in a graphing package of your choice. For example, we could use
Microsoft Excel
:

Figure 1. Count of www.reed.edu/A/reed.edu –> 134.10.2.252 Over Time Using Excel
We can also use
R
and
ggplot
to plot that graph. We assume you already have R and ggplot installed. If you need help getting
R
and
ggplot
installed, please visit here for more information.
Once you have
and
ggplot
installed, create the Rscript commands needed to plot the data:
$ cat plot-volume-over-time.R#!/usr/local/bin/Rscript
args <- commandArgs(trailingOnly = TRUE)
filename <- args[1]
mydata <- read.table(file = filename, header = FALSE, sep = ",")
colnames(mydata) <- c("x", "deltax", "y")
mydata$datetime <- as.Date(as.POSIXct(mydata$x, origin="1970-01-01", tz="GMT"))
outputfile <- args[2]
sink(file = outputfile, append = FALSE, type = c("output", "message"), split = FALSE)
graphfile <- paste(outputfile, ".pdf", sep = "", collapse = NULL)
pdf(graphfile, width = 10, height = 7.5)
library("ggplot2")
library("scales")
mytitle <- paste("\n\n", "www.reed.edu/A/reed.edu", "\nCounts Over Time", sep = "")
mydatebreaks = as.Date(c("2010-01-01","2011-01-01",
"2012-01-01","2013-01-01","2014-01-01","2015-01-01",
"2016-01-01","2017-01-01","2018-01-01","2019-01-01", "2020-01-01"))
p <- ggplot()+
geom_point(aes(x = mydata$datetime, y = mydata$y)) +
geom_line (aes(x = mydata$datetime, y = mydata$y )) +
labs(title=mytitle, x = "Date\n\n", y = "\n\nCounts") +
scale_x_date(limits= as.Date(c("2010-01-01", "2020-01-01")),
breaks = mydatebreaks, date_minor_breaks = "1 month",
date_labels="%m/%y") +
scale_y_continuous(labels = comma,
sec.axis = sec_axis(trans=~., name="\n\n", breaks=NULL))
print(p)
To run that program, enter:
$ chmod a+rx plot-volume-over-time.R
$ ./plot-volume-over-time.R reed.csv reed.output
[output will be in reed.output.pdf, and should look like the graph shown on the next page]
You may see a couple of warning messages, such as:
Warning messages:
In min(x) : no non-missing arguments to min; returning Inf
In max(x) : no non-missing arguments to max; returning -Inf
You may safely disregard these. The graphic output in reed.output.pdf looks like Figure 2, below:

Figure 2. Count of www.reed.edu/A/reed.edu –> 134.10.2.252 Over Time Using R and ggplot
Figure 2 — even better than Figure 1 — makes it obvious that while the first measurements are annual totals, the last results are monthly results. As such, it should not be surprising that these are just a fraction of the size of the older annual results.
We hope that this article will serve to help “bootstrap” your use of the
-g
(“gravel”) option in
dnsdbq
.
The Farsight Security Sales Team can be reached at [email protected].
Joe St Sauver Ph.D. is a Distinguished Scientist with Farsight Security®, Inc.