Introduction

Casual passive DNS users often like Farsight DNSDB Scout for its easy-to-use point-and-click graphical user interface, whereas analysts working “at scale” often end up using dnsdbq, our command line interface client to DNSDB.

dnsdbq is particularly convenient for big projects where you may be doing thousands or even hundreds of thousands of queries. For example, we recently mapped the entire Internet IPv4 address space via 16.7 million IPv4 /24 Rdata summary queries (link to the IPv4 report here). Large-scale projects like that one are a good opportunity to “torture test” the same tools that you, our customers, are using. Sometimes that work even helps us find and fix potential issues.

In this post, we will review two improvements made to dnsdbq which will make your life easier when performing large-scale data queries. First, we will show how summarize results can now ‘self-describe’ the query that produced them, and second, we’ll explain the new client-side timeout option.

dnsdbq’s Summarize Verb Gets More Powerful with the qdetail Option

The DNSDB API “summarize” option has been available since the Fall of 2019, but we haven’t had a large project that really required us to work with it for large numbers of IP address queries. 

When we did begin to make millions of DNSDB summarize queries for IP addresses, we quickly noticed something that had somehow escaped anyone’s notice until then. Specifically, we noticed that the results of our dnsdbq -V summarize queries weren’t “self-describing.” That is, the results of the dnsdbq query didn’t include metadata showing the query that produced those results. 

Now if you’re just doing one interactive query, the relationship between the query and the results is pretty self-evident: we see the query, then we see the associated results immediately thereafter. For example, running interactively with JSON Lines format output:

$ dnsdbq -i 128.223.0.0/16 -l0 -V summarize -j
{"count":1470933099,"num_results":35865,"time_first":1277348834,"time_last":1694825301,
"zone_time_first":1271183957,"zone_time_last":1694733344}

But now consider output from four queries (as you might see if you redirected output from a set of dnsdbq -V summarize runs into a file):

{"count":1470933852,"num_results":35865,"time_first":1277348834,"time_last":1694828216,
"zone_time_first":1271183957,"zone_time_last":1694733344}
{"count":738480318,"num_results":67355,"time_first":1277349381,"time_last":1694828542,
"zone_time_first":1271175197,"zone_time_last":1694733344}
{"count":709953789,"num_results":34458,"time_first":1277349361,"time_last":1694828586,
"zone_time_first":1271175197,"zone_time_last":1694733344}
{"count":77148765,"num_results":1181,"time_first":1277356373,"time_last":1694827683,
"zone_time_first":1675115354,"zone_time_last":1694728146}

What query is associated with each of those results? There’s no indication in the results themselves! 

Fortunately, there’s a new dnsdbq option that takes care of this issue. Be sure you’re running the latest version of dnsdbq from GitHub, then specify the -T qdetail option, as described here in the dnsdbq man page:

 -T transform[,...]
          specify one or more transforms to be applied to the output:

          datefix  always show dates in the format selected by the
                   DNSDBQ_TIME_FORMAT environment variable, not in database
                   format.

          reverse  show the DNS owner name (rrname) in TLD-first order (so,
                   COM.EXAMPLE rather than EXAMPLE.COM).

          chomp    strip away the trailing dot (.) from the DNS owner name
                   (rrname).

          qdetail  annotate the response to include query details, mostly for
                   use by -V summarize. This is incompatible with sorting.

With that option used, the results will now look like the following (note the new "_dnsdbq" object at the end of each record, which carries the query metadata):

{"count":1471090542,"num_results":35865,"time_first":1277348834,"time_last":1694833918,
"zone_time_first":1271183957,"zone_time_last":1694733344,
"_dnsdbq":{"descr":"rdata/ip/128.223.0.0,16","limit":0,"gravel":false,"complete":false}}
{"count":738505978,"num_results":67355,"time_first":1277349381,"time_last":1694833827,
"zone_time_first":1271175197,"zone_time_last":1694733344,
"_dnsdbq":{"descr":"rdata/ip/128.100.0.0,16","limit":0,"gravel":false,"complete":false}}
{"count":709969237,"num_results":34458,"time_first":1277349361,"time_last":1694833903,
"zone_time_first":1271175197,"zone_time_last":1694733344,
"_dnsdbq":{"descr":"rdata/ip/128.200.0.0,16","limit":0,"gravel":false,"complete":false}}
{"count":77149376,"num_results":1181,"time_first":1277356373,"time_last":1694831092,
"zone_time_first":1675115354,"zone_time_last":1694728146,
"_dnsdbq":{"descr":"rdata/ip/128.150.0.0,16","limit":0,"gravel":false,"complete":false}}

As shown above, each result now identifies the query that produced it. This frees the user from having to separately track which summarize query goes with which result.
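To illustrate how the new metadata might be consumed downstream, here is a minimal Python sketch that indexes qdetail-annotated summarize results by the query that produced them. It assumes each record arrives on a single line (the records above are wrapped only for display), and the helper names index_by_query and descr_to_cidr are hypothetical, introduced just for this example. The "descr" format it parses is inferred from the sample output above rather than from a formal specification.

```python
import json


def index_by_query(lines):
    """Map each record's "_dnsdbq" query description to its summary fields."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        meta = record.pop("_dnsdbq", None)
        if meta is None:
            # Record was produced without -T qdetail; nothing to key it on.
            continue
        results[meta["descr"]] = record
    return results


def descr_to_cidr(descr):
    """Turn 'rdata/ip/128.223.0.0,16' into '128.223.0.0/16'.

    Assumption: the layout matches the sample output shown in this post.
    """
    prefix = "rdata/ip/"
    if not descr.startswith(prefix):
        raise ValueError("unexpected query description: " + descr)
    return descr[len(prefix):].replace(",", "/")


# Two of the records shown above, each joined onto one line.
sample = [
    '{"count":1471090542,"num_results":35865,"time_first":1277348834,'
    '"time_last":1694833918,"zone_time_first":1271183957,'
    '"zone_time_last":1694733344,"_dnsdbq":{"descr":"rdata/ip/128.223.0.0,16",'
    '"limit":0,"gravel":false,"complete":false}}',
    '{"count":77149376,"num_results":1181,"time_first":1277356373,'
    '"time_last":1694831092,"zone_time_first":1675115354,'
    '"zone_time_last":1694728146,"_dnsdbq":{"descr":"rdata/ip/128.150.0.0,16",'
    '"limit":0,"gravel":false,"complete":false}}',
]

by_query = index_by_query(sample)
for descr, summary in by_query.items():
    print(descr_to_cidr(descr), summary["num_results"])
```

Because every record carries its own query description, results keyed this way stay correctly matched even if records from different runs end up in the same file in a different order.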

Query Timeouts

In the course of doing the “megaruns” described in the preceding section, we ran into an obscure corner case that made some of our queries hang. That incredibly rare corner case has now been patched, but looking at those hangs made us realize that dnsdbq historically hasn’t had a client-side timeout option. Client-side timeout options are routinely available in command-line web clients such as curl and in HTTP client libraries such as the Python3 requests library (pylint will even complain if you don’t specify a timeout as part of your requests call).

Without a client-side timeout option, workflows could potentially block forever on a pending request, which is NOT desirable when you’re trying to complete thousands (or hundreds of thousands) of queries. We’re thus excited that dnsdbq’s maintainers have now added a new timeout option. It’s described in the dnsdbq man page:

   -o timeout
          specifies the timeout, in seconds, for initial connection to
          database server and for each transaction made to that server.

Most of the time you’ll never need to worry about timeouts, but if you’re doing big runs with thousands (or hundreds of thousands) of queries, you may find the new option to be a lifesaver. A reasonable worst-case value for this option might be 900 seconds, although the impatient might set a lower value.
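For a bulk run driven from a script, the two options might be combined as in the following Python sketch. It uses only the dnsdbq flags shown in this post (-i, -l0, -V summarize, -j, -T qdetail and -o). The function names, the grace period, and the outer subprocess timeout are assumptions added for illustration and are not part of dnsdbq. How dnsdbq reports a timeout via its exit status is not covered here, so the sketch simply treats any non-zero exit as a failed query.

```python
import subprocess


def build_summarize_command(cidr, timeout_seconds=900):
    """Build a dnsdbq argument list from the flags shown in this post."""
    return [
        "dnsdbq",
        "-i", cidr,                  # Rdata IP (or CIDR) query
        "-l0",                       # limit of 0, as in the example above
        "-V", "summarize",           # summarize verb
        "-j",                        # JSON Lines output
        "-T", "qdetail",             # annotate results with query details
        "-o", str(timeout_seconds),  # client-side timeout, in seconds
    ]


def run_summarize(cidr, timeout_seconds=900, grace_seconds=30):
    """Run one query; return its stdout, or None on failure or timeout.

    The outer subprocess timeout is a hypothetical backstop in case the
    client itself stops responding; it is not a dnsdbq feature.
    """
    cmd = build_summarize_command(cidr, timeout_seconds)
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds + grace_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout


# Show the command line that would be executed for one /16.
print(" ".join(build_summarize_command("128.223.0.0/16")))
```

Returning None rather than raising lets a driver loop record the failed query and move on to the next one, which is usually what you want partway through a run of many thousands of queries.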

Conclusion

When operating at scale, dnsdbq is the command line tool of choice for interacting with DNSDB. As part of DomainTools’ effort to ensure fast, reliable, and comprehensive access to DNSDB, we reviewed two updates to dnsdbq:

  • A new transform called qdetail (-T qdetail) that, when used in tandem with the summarize verb, allows results to self-describe which query they came from
  • A new client-side timeout option (-o) that covers the initial connection and each transaction with the server

We hope you find the Farsight DNSDB API to be a fun and productive investigatory tool. These recent additions to dnsdbq should help ensure that you can complete even big projects smoothly and efficiently! A big thank you from all dnsdbq users to our dnsdbq maintainer team!