New DNSDB -V summarize option: Sometimes "Less" Is "More"

1. Introduction

As part of a recent update to DNSDB, dnsdbq now offers a “-V summarize” verb, an implementation of the “estimation of result size” feature mentioned in an earlier blog article. Since that article covered the new feature using DNSDB Scout, this one focuses only on dnsdbq, Farsight’s DNSDB command line client written in C.

To make use of the new dnsdbq -V summarize feature, begin by ensuring that you’re running the latest available version of dnsdbq. The manual page for the new verb describes the -V option as:

 -V verb
      The verb to perform, i.e. the type of query, either "lookup" or
      "summarize".  The default is the "lookup" verb.  As an option, you
      can specify the "summarize" verb, which gives you an estimate of
      result size.  At-a-glance, it provides information on when a given
      domain name, IP address or other DNS asset was first-seen and last-
      seen by the global sensor network, as well as the total observation

As noted in the manual page, when you specify -V summarize in dnsdbq you will get JUST:

  • The first- and last-seen time across the returned results
  • The summed counts tallied across the returned results
  • But, NO detail records will be provided for the returned results.

2. Example A: www.mit.edu/A/mit.edu

It may help to consider an example. Let’s ask for RRname results for www.mit.edu/A/mit.edu, and limit that query to three results:

$ dnsdbq -r www.mit.edu/A/mit.edu -l 3
;; record times: 2010-06-24 06:02:21 .. 2013-04-01 16:52:02
;; count: 5215640; bailiwick: mit.edu.
www.mit.edu.  A

;; record times: 2013-01-22 21:10:33 .. 2013-01-23 00:11:57
;; count: 452; bailiwick: mit.edu.
www.mit.edu.  A
www.mit.edu.  A
www.mit.edu.  A

;; record times: 2013-01-22 17:51:20 .. 2013-01-22 17:53:43
;; count: 9; bailiwick: mit.edu.
www.mit.edu.  A

Now let’s run that same query, this time including the -V summarize option:

$ dnsdbq -r www.mit.edu/A/mit.edu -l 3 -V summarize
;; record times: 2010-06-24 06:02:21 .. 2013-04-01 16:52:02
;; count: 5216101; num_results: 3

Note that this output corresponds to our “full” results:

  • Looking at just the time first seen for the three records, the earliest of those (2010-06-24 06:02:21) is shown in the summary output.

  • Looking at just the time last seen for the three records, the latest of those (2013-04-01 16:52:02) is shown in the summary output.

  • And looking at the counts, if we sum up 5215640, 452, and 9, we get 5216101, the count shown in the summary output.

We did NOT get “imputed” information for “all potential results” that DNSDB may know for that query, just the three we asked for.

The dnsdbq summarize verb works “just like a regular query,” EXCEPT:

  • The first-seen time is the earliest first-seen time seen in ANY of the results that would normally be displayed,

  • The last-seen time is the latest last-seen time seen in ANY of the results that would normally be displayed,

  • The displayed count is the sum of the individual counts that were in the results that would normally be displayed, and

  • You aren’t shown the individual detail records.
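
The rules above amount to a minimum, a maximum, and a sum over the limited result set. The following is a minimal Python sketch of that arithmetic, not dnsdbq’s actual implementation: the summarize() function and the tuple layout are hypothetical, and the input values are the three records from the www.mit.edu example in Section 2.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (first seen, last seen, count) for the three records shown in Section 2.
records = [
    ("2010-06-24 06:02:21", "2013-04-01 16:52:02", 5215640),
    ("2013-01-22 21:10:33", "2013-01-23 00:11:57", 452),
    ("2013-01-22 17:51:20", "2013-01-22 17:53:43", 9),
]


def summarize(rows):
    """Collapse detail rows into the values a summary would report."""
    first = min(datetime.strptime(r[0], FMT) for r in rows)
    last = max(datetime.strptime(r[1], FMT) for r in rows)
    total = sum(r[2] for r in rows)
    return {
        "first": first.strftime(FMT),
        "last": last.strftime(FMT),
        "count": total,
        "num_results": len(rows),
    }


summary = summarize(records)
print(summary)
```

With these inputs the sketch arrives at the same first-seen time, last-seen time, and count (5216101) that the summarize verb reported for the three-result query.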

3. Example B: *.uber.com

Let’s consider another example, a dnsdbq summarize query for *.uber.com returning up to a million results. We’ll begin by “manually” summing up the counts for up to a million results with jq and a tiny one-line awk script:

$ dnsdbq -r \*.uber.com -l 1000000 -j | jq -r '.count' | awk '{s+=$1}END{print s}'
2771990113

Now let’s see what the actual dnsdbq summarize verb reports:

$ dnsdbq -r \*.uber.com -l 1000000 -V summarize
;; record times: 2010-06-24 10:38:39 .. 2019-08-29 21:33:59
;;   zone times: 2010-04-24 16:12:21 .. 2018-03-22 16:02:25
;; count: 2771990200; num_results: 1000000

The results for this example are interesting for a couple of reasons:

  • The summarize results include TWO sets of times: “record times” for data observed by Farsight’s sensors, and “zone times” for data derived from zone files. (Our example from Section 2 didn’t include any zone file data, so its output had no zone file timestamps.)

  • This summarize output has a very large count (2,771,990,200), representing the sum of the count values seen in the million results returned for our query.

    When you see a number that large, it can be tempting to assume that summarize MUST somehow be looking at ALL the results that DNSDB knows about for *.uber.com (rather than just the first million results), but that would be wrong. The huge value of 2,771,990,200 is JUST the sum of the counts for the first million results, very close to the result we got when we summed up a million counts “manually” with jq and awk (2,771,990,113). The difference of 87 between the two counts is due to values updating in the brief interval between the two measurements.
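
To illustrate how the two time ranges and the count relate to the detail records, here is a rough Python sketch that tallies them from JSON lines of the kind `dnsdbq -j` emits. It assumes sensor-observed records carry time_first/time_last fields and zone-file records carry zone_time_first/zone_time_last fields (epoch seconds); the tally() function is hypothetical, and the sample records are invented for illustration rather than taken from DNSDB.

```python
import json

# Invented sample lines in the shape of `dnsdbq -j` output (not real data).
sample = """\
{"count": 120, "time_first": 1277375919, "time_last": 1567114439, "rrname": "a.example."}
{"count": 30, "zone_time_first": 1272125541, "zone_time_last": 1521734545, "rrname": "b.example."}
{"count": 7, "time_first": 1300000000, "time_last": 1400000000, "rrname": "c.example."}
"""


def tally(lines):
    """Sum counts and track sensor and zone-file time ranges separately."""
    out = {"count": 0, "num_results": 0, "record": None, "zone": None}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        out["count"] += rec["count"]
        out["num_results"] += 1
        # A record has either sensor times or zone times (assumed field names).
        for key, prefix in (("record", ""), ("zone", "zone_")):
            first = rec.get(prefix + "time_first")
            last = rec.get(prefix + "time_last")
            if first is None or last is None:
                continue
            if out[key] is None:
                out[key] = (first, last)
            else:
                lo, hi = out[key]
                out[key] = (min(lo, first), max(hi, last))
    return out


result = tally(sample.splitlines())
print(result)
```

A result set with no zone-file records would leave the "zone" entry empty, which mirrors the single “record times” line in the Section 2 output.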

4. Quota Considerations

A dnsdbq -V summarize query “counts the same” as a regular query in terms of your quota usage.

A common question, as you might expect, is “So if doing a dnsdbq -V summarize query counts the same as doing a regular dnsdbq query, why not just do a regular query?” The answer is that the summarize verb is a nice option when you ONLY care about aggregate counts and first/last seen times, because it spares you from downloading all the detail records only to end up “throwing them away.”

5. “Why do you show num_results in dnsdbq -V summarize output?”

dnsdbq includes num_results in its output because it provides important context for the summary output.

For example, if you’ve asked for 500,000 results but we only know about 400,000 results, we want to ensure you know that we weren’t able to give you a summary for the full 500,000 you requested.
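
One way to act on num_results in a script is to compare it with the limit you requested. The Python sketch below parses the count line from the summarize output format shown in Section 3; parse_counts() and hit_limit() are hypothetical helper names, and the regular expression assumes the “;; count: N; num_results: M” layout seen in this article.

```python
import re

# Summarize output copied from the *.uber.com example in Section 3.
text = """\
;; record times: 2010-06-24 10:38:39 .. 2019-08-29 21:33:59
;;   zone times: 2010-04-24 16:12:21 .. 2018-03-22 16:02:25
;; count: 2771990200; num_results: 1000000
"""


def parse_counts(output):
    """Pull count and num_results out of the ';; count: ...' line."""
    match = re.search(r"count:\s*(\d+);\s*num_results:\s*(\d+)", output)
    if match is None:
        raise ValueError("no count line found")
    return int(match.group(1)), int(match.group(2))


def hit_limit(num_results, limit):
    """True when the summary may cover only part of what DNSDB holds."""
    return num_results >= limit


count, num_results = parse_counts(text)
print(count, num_results, hit_limit(num_results, 1000000))
```

In the 500,000-requested, 400,000-returned case described above, hit_limit(400000, 500000) is False, meaning the summary covered everything DNSDB had for that query. When num_results equals the limit, as in the *.uber.com example, there may be further results the summary did not include.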

6. What You DON’T and CAN’T Get From Summarize

When you use the dnsdbq -V summarize option, dnsdbq returns its summary based on the results you would otherwise have seen had you not specified the summarize verb. The summarize verb does NOT somehow magically review ALL the results that DNSDB potentially knows about a given query (as if the limit value didn’t matter).

To make this concrete, let’s pretend that DNSDB knows about 25 million unique combinations of (RRname, RRtype, Bailiwick, Rdata, and zone-file vs observed-by-a-sensor) for your query. Let’s also assume you use dnsdbq -V summarize and ask for the maximum number of results you can get from dnsdbq in a single query (e.g., one million results).

The first-seen, last-seen and count values reported through dnsdbq -V summarize will be based on the one million displayable results you would otherwise have been shown in detail, NOT the full set of 25 million results.

This means that you do NOT know, and CANNOT know, how many total unique results for your query may still “lurk” undisclosed in the passive DNS database, nor what the sum of the counts for all those results might be — the summarize verb will just report on what you could otherwise have gotten in normal detail-record form.
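
The effect of the limit can be imitated at small scale. This Python sketch is a toy model under stated assumptions, not a description of DNSDB internals: 25 made-up results stand in for the 25 million, a limit of 10 stands in for the one-million maximum, and the sketch simply assumes the first 10 entries are the ones returned.

```python
# Hypothetical stand-in for the database: 25 results instead of 25 million,
# each a (first_seen, last_seen, count) tuple with made-up integer values.
all_results = [(1000 + i, 2000 + i, 10 * (i + 1)) for i in range(25)]


def summarize(rows):
    """Min first-seen, max last-seen, and summed count over the given rows."""
    return {
        "first": min(r[0] for r in rows),
        "last": max(r[1] for r in rows),
        "count": sum(r[2] for r in rows),
        "num_results": len(rows),
    }


limit = 10  # stands in for the one-million-result maximum
limited = summarize(all_results[:limit])  # what summarize reports
full = summarize(all_results)  # what the client never gets to see

print(limited)
print(full)
```

The limited summary has a smaller count and an earlier last-seen value than the full one, and nothing in it reveals how large the gap is. That is the situation the summarize verb leaves you in whenever num_results equals the limit.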

7. Acknowledgement

The author would like to thank his colleague David Waitzman for his helpful comments on this article, and for all his work in adding new features to the DNSDB API. Any errors remaining in this article are the responsibility of the author.

8. Conclusion

We hope that this introduction to the dnsdbq summarize verb has been helpful and instructive for you.

The Farsight Security Sales Team can be reached at [email protected].

Joe St Sauver Ph.D. is a Distinguished Scientist with Farsight Security®, Inc.