
This article tackles a simple question (which actually turns out to have a surprisingly complex-appearing answer): “How are the results from a DNSDB Standard Search ordered?”
We’ll answer that question in this article and explain why that answer matters to anyone who makes DNSDB Standard Search queries that return large numbers of results.
Let’s begin by recalling that DNSDB data gets stored “server-side” in MTBL (immutable sorted string) files, as previously discussed in “Passive DNS and SIE File Formats” (see https://www.domaintools.com/resources/blog/passive-dns-and-sie-file-formats/). When you run a query against DNSDB, matching results are selected and returned from those MTBL files.
The results you receive will be the lesser of [the maximum number of results requested by the user via a DNSDB client] and [the maximum number of results allowed per-query by the DNSDB server].
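This normally means that authorized users can ask for up to one million results per query and, by using the offset option to request up to three further tranches of one million results each, up to four million results in total for a given query.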
See “Getting More Results from DNSDB Using the New -O (Offset) Option” (https://www.domaintools.com/resources/blog/getting-more-results-from-dnsdb-using-the-new–o-offset-option/) for more around the “offset” concept if you’re not already familiar with it.
While four million results is undeniably a LOT of results, DNSDB may “know about” EVEN MORE than four million potential results, at least for some queries. When that’s the case, the subset of results you’ll see (in cases where there are too many results to return them all) is determined by how results are stored in MTBL files, and how the matching results are found and returned.
The results you’ll get from DNSDB are not selected or ranked according to any preference of yours. You’ll simply get results that match your query in the natural order in which they’re saved in the MTBL files being searched.
This is true even if you subsequently sort your results “client side.” Your client will only receive, and can only SORT, the subset of results received from the server; it CANNOT somehow ask the server to consider “all” possible results that match your query, “cherry-picking” and returning just the “best” results according to your particular preferences.
This means that it is important to understand the natural order of results as they’re saved in MTBL files.
Comprehending that process begins with understanding where DNSDB results originate.
DNSDB API results (as returned by DNSDB clients such as dnsdbq or DNSDB Scout) contain data from two sources: observations from ICANN Top Level Domain Zone Files, and observations from our network sensors. You can determine “which is which” by looking at the timestamp “label” shown for each result. For example, the following observation (shown here in presentation format) comes from ICANN TLD Zone File data, as indicated by its “zone times” label:
;; zone times: 2021-05-20 23:12:48 .. 2021-11-28 23:05:03 (~191d 23h 52m)
;; count: 192; bailiwick: info.
apple.info. NS a.ns.apple.com.
apple.info. NS b.ns.apple.com.
apple.info. NS c.ns.apple.com.
apple.info. NS d.ns.apple.com.
On the other hand, the following observation came from sensor data, as indicated by its “record times” label:
;; record times: 2021-03-14 19:30:11 .. 2021-11-29 15:06:24 (~259d 19h 36m)
;; count: 2841; bailiwick: info.
apple.info. NS a.ns.apple.com.
apple.info. NS b.ns.apple.com.
apple.info. NS c.ns.apple.com.
apple.info. NS d.ns.apple.com.
When results are displayed in “natural” order, you will always see Zone File data FIRST, if Zone File data is available.
That fact naturally leads to two questions:
i) Why won’t I ALWAYS see Zone File data for a query?
Sometimes Zone File data simply isn’t available. For example, the dot edu and dot mil zones don’t share Zone File data, and country code TLDs often don’t share Zone File data either.
Other times, you might be accessing DNSDB via DNSDB Export (sometimes referred to as “DNSDB On-Premises”). We’re not allowed to redistribute ICANN Zone File data in bulk, so that means DNSDB Export users do NOT receive ICANN Zone File data from us, only the data that originates from our sensors. (We’re happy to help DNSDB Export customers import Zone File data that they themselves have arranged to download directly from ICANN, however.)
Lastly, Zone Files only contain a limited set of Resource Record Types — largely “NS” records plus “glue” records (“A” or “AAAA” records referring to the delegation point’s in-domain name servers), plus a limited number of other “infrastructural” resource records related to the TLD zone itself. If your queries are for pretty much anything else, it’s unlikely that there will be anything in the Zone File data relevant to your query.
ii) When I do see Zone File data, why does it always appear FIRST?
DNSDB uses two sets of MTBL files: one set that contains only Zone File data, and another set that contains only sensor data. DNSDB was set up to search and return results from the smaller Zone File data files first (if they exist and are relevant), and then (and only then) return results from the sensor data files.
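A toy model may make this concrete. Assuming each source’s matches are already in MTBL key order, the behavior resembles chaining the Zone File stream ahead of the sensor stream and stopping at the result limit. This is a hypothetical sketch with made-up names and string placeholders for results, not DNSDB’s implementation:

```python
from itertools import chain, islice
from typing import Iterable, List


def natural_order_results(zone_hits: Iterable[str], sensor_hits: Iterable[str],
                          limit: int) -> List[str]:
    """Return at most `limit` results: every Zone File hit first, then sensor hits.

    Both inputs are assumed to be in MTBL key order already. Because the zone
    source is exhausted before the sensor source is read, a small limit can be
    filled entirely by Zone File data.
    """
    return list(islice(chain(zone_hits, sensor_hits), limit))


if __name__ == "__main__":
    zone = ["zone-1", "zone-2", "zone-3"]
    sensor = ["sensor-1", "sensor-2"]
    print(natural_order_results(zone, sensor, 4))
    print(natural_order_results(zone, sensor, 2))
```

With a limit of 2, no sensor results appear at all, which is the same effect Example D below shows at a much larger scale.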
Before talking more about RRset ordering, let’s briefly recap some of the DNS “lingo” we’re about to use. For example, consider a typical DNS Resource Record, as might be returned in answer to the Un*x command:
$ dig www.domaintools.com
We’ll add a “header row” to make clear “what’s what” in the core answer received for that query:
RRname (or "left hand side") TTL Class RRtype Rdata (or "right hand side")
www.domaintools.com. 43200 IN A 199.30.228.112
The above “A” record maps the RRname (or “owner name”) www.domaintools.com to the IPv4 address 199.30.228.112.
The resource record also declares that this relationship can be remembered locally (or “cached”) for up to 60*60*12 = 43,200 seconds (i.e., 12 hours).
The DNS “class” of this Resource Record, like virtually all Resource Records, is “IN” (“INternet”). (For information on other DNS class values, if curious, consult https://datatracker.ietf.org/doc/html/rfc1035 at Section 3.2.4)
The data that’s reported by dnsdbq (see https://github.com/dnsdb/dnsdbq ) in default presentation format is very similar, albeit WITH some added comment lines (denoted by leading semicolons) and WITHOUT TTL or DNS Class data. We’ll use dnsdbq’s -A1d command line option to ask to see only results that have been seen within the last day (as of the time this example was run):
$ dnsdbq -r www.domaintools.com -A1d
;; record times: 2015-03-18 20:21:57 .. 2021-11-30 08:49:37 (~6y ~258d)
;; count: 962765; bailiwick: domaintools.com.
www.domaintools.com. A 199.30.228.112
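Looking at just the comment lines shown above, those lines tell us that this RRset was first seen at 2015-03-18 20:21:57 and most recently seen at 2021-11-30 08:49:37 (a span of roughly 6 years and 258 days), that it has been observed 962,765 times, and that its bailiwick is domaintools.com.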
With that background, we can now say that MTBL “key”/“value” pairs (i.e., DNSDB “observations” or “results”) are ordered (within the overall Zone File data or within the overall sensor data) by each entry’s “key.”
Each MTBL “key” begins with an ENTRY_TYPE. ENTRY_TYPEs are described in the man page that gets installed as part of the software mentioned in “Passive DNS and SIE File Formats” (https://www.domaintools.com/resources/blog/passive-dns-and-sie-file-formats/):
$ man dnstable-encoding
The current full list of ENTRY_TYPEs is documented in that man page.
Our focus today is going to be on the first of those, ENTRY_TYPE_RRSET. These are the MTBL entries that get searched if you look for an exact RRname such as www.example.com in DNSDB. These entries are also what’s used to search for a left hand wildcard RRname (such as *.example.com) in DNSDB.
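As stated in the previously mentioned manual page, the “key” field for the ENTRY_TYPE_RRSET is a composite field consisting of a type byte identifying the entry type, the label-reversed wire-format DNS domain name of the RRset owner, the RRtype, the label-reversed wire-format DNS domain name of the bailiwick, and the Rdata.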
The “label-reversed wire-format DNS domain names” mentioned in the preceding need a bit of explanation. Assume the original RRname (or “RRset owner name”) is
www.example.com
As stated in the man page, the label-reversed wire-format DNS domain name would then be
\x03com\x07example\x03www\x00
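The encoding is easy to reproduce. The following sketch (the function name is ours, and it assumes plain ASCII labels with no escaping) builds the label-reversed wire-format bytes for a name:

```python
def label_reversed_wire(name: str) -> bytes:
    """Encode a DNS name in label-reversed wire format.

    Each label is written as a one-byte length followed by the label bytes,
    starting from the TLD; a zero byte (the root label) ends the name.
    Assumes ASCII labels of at most 63 bytes.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    out = bytearray()
    for label in reversed(labels):
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out.append(len(raw))  # length byte precedes each label
        out += raw
    out.append(0)  # terminating root label
    return bytes(out)


if __name__ == "__main__":
    print(label_reversed_wire("www.example.com"))  # b'\x03com\x07example\x03www\x00'
```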
Given the above, the “label-reversed wire-format DNS domain names” look like:
\x{tld-length}TLD\x{2nd-label-length}2ND-LABEL\x{3rd-label-length}3RD-LABEL[...]\x00
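and are compared byte by byte. Because each label is preceded by its length byte, names are ordered first by the length of the TLD label, then (for TLDs of equal length) by the TLD’s characters, then by the length of the second label, then by that label’s characters, and so on. A name that ends (with \x00) at a point where another name continues with a further label sorts first, since \x00 is smaller than any label length.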
So, let’s assume we were given a very strange/tiny MTBL file with just the following domain names:
. [aka "the DNS root"]
abc.info
af
bbc.co.uk
biz
google.com
host.af
mmcz.co.zw
mx.ucla.edu
www.nic.in
zapp.com
Those names would be saved and returned (perhaps initially counter-intuitively) in the order:
Reversed representation Reason for this ordering
[0] (the root is the smallest possible TLD)
[2]af[0] ("af" 2 character TLD is longer than 0 character (root) domain)
[2]af[4]host[0] (same 2 character TLD, but longer 2nd-level domain)
[2]in[3]nic[3]www[0] ("in" 2 character TLD comes after "af" TLD)
[2]uk[2]co[3]bbc[0] ("uk" 2 character TLD comes after "in" TLD)
[2]zw[2]co[4]mmcz[0] ("zw" 2 character TLD comes after "uk" TLD)
[3]biz[0] ("biz" 3 character TLD is longer than the "zw" TLD)
[3]com[4]zapp[0] ("com" 3 character TLD comes after "biz" TLD)
[3]com[6]google[0] (same TLD, but the 6 char 2LD "google" > the 4 char 2LD "zapp")
[3]edu[4]ucla[2]mx[0] ("edu" 3 character TLD comes after "com" TLD)
[4]info[3]abc[0] ("info", a 4 character TLD comes after all 3 character TLDs)
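If, as the table suggests, keys are compared byte by byte, sorting the encoded names with an ordinary byte comparison should reproduce that order. A minimal sketch, using a hypothetical helper that re-derives the encoding described earlier:

```python
from typing import List


def reversed_wire_key(name: str) -> bytes:
    """Label-reversed wire-format encoding, assuming plain ASCII labels."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    out = bytearray()
    for label in reversed(labels):
        out.append(len(label))  # length byte precedes each label
        out += label.encode("ascii")
    out.append(0)  # root terminator
    return bytes(out)


def natural_sort(names: List[str]) -> List[str]:
    # Python compares bytes objects byte by byte, like a memcmp-style comparator.
    return sorted(names, key=reversed_wire_key)


NAMES = [".", "abc.info", "af", "bbc.co.uk", "biz", "google.com",
         "host.af", "mmcz.co.zw", "mx.ucla.edu", "www.nic.in", "zapp.com"]

if __name__ == "__main__":
    for name in natural_sort(NAMES):
        print(name)
```

Running it prints the names in the same order as the table above, starting with the root and ending with abc.info.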
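Recall that we mentioned in Section IV that the ENTRY_TYPE_RRSET key is made up of a type byte, the RRset owner name, the RRtype, the bailiwick, and the Rdata.
We’ve talked about the type byte and the RRset owner name. The three remaining items that form the rest of the ENTRY_TYPE_RRSET key are the RRtype, the bailiwick, and the Rdata, in that order: once the owner name is the same, entries are ordered by RRtype, then by bailiwick, then by Rdata.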
Let’s now look at a few examples.
Example A): We’ll use dnsdbq to get up to a million “A” records for the arbitrarily selected domain name vod.uoregon.edu, from the vod.uoregon.edu bailiwick, limited to records seen in the last 60 days. The full results for that query in JSON Lines format (with human-readable datetimes) look like the following (lines wrapped for display in this article):
$ dnsdbq -r vod.uoregon.edu/A/vod.uoregon.edu -l0 -j -T datefix -A60d
{"count":6,"time_first":"2021-10-05 17:15:44","time_last":"2021-10-06 18:59:20",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["34.210.135.53","44.238.165.32","44.239.179.132"]}
{"count":2,"time_first":"2021-10-07 23:30:28","time_last":"2021-10-09 01:44:12",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["34.214.137.119","44.238.165.32","44.239.179.132"]}
{"count":7,"time_first":"2021-10-10 04:34:28","time_last":"2021-10-13 14:16:46",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["34.214.137.119","44.238.165.32","54.69.240.214"]}
{"count":3,"time_first":"2021-10-14 15:11:18","time_last":"2021-10-15 17:04:59",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["34.214.137.119","54.69.240.214","54.187.37.159"]}
{"count":4,"time_first":"2021-10-17 21:38:55","time_last":"2021-10-18 23:58:04",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["34.214.137.119","54.187.37.159","54.213.245.156"]}
{"count":4,"time_first":"2021-11-08 23:30:40","time_last":"2021-11-09 01:35:57",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["44.233.200.82","44.237.40.166","52.12.193.206"]}
{"count":1,"time_first":"2021-10-27 17:26:46","time_last":"2021-10-27 17:26:46",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["44.237.40.166","52.43.178.149","54.187.37.159"]}
{"count":31,"time_first":"2021-11-23 19:39:26","time_last":"2021-11-24 16:23:45",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["44.237.209.101","52.25.117.189","52.88.16.106"]}
{"count":3,"time_first":"2021-11-27 23:45:31","time_last":"2021-11-27 23:45:31",
"rrname":"vod.uoregon.edu.","rrtype":"A","bailiwick":"vod.uoregon.edu.",
"rdata":["52.25.117.189","52.25.186.42","52.88.16.106"]}
That’s rather “visually dense.” If we look at just the Rdata for our results with jq (see https://stedolan.github.io/jq/ ), the fact that the results are sorted by Rdata (when the RRname, Bailiwick and RRtype are constant, as they are for this query) is easy to ascertain (even though we won’t look at Rdata sorting in detail today):
$ dnsdbq -r vod.uoregon.edu/A/vod.uoregon.edu -l0 -j -T datefix -A60d | jq -r '.rdata' -c
["34.210.135.53","44.238.165.32","44.239.179.132"]
["34.214.137.119","44.238.165.32","44.239.179.132"]
["34.214.137.119","44.238.165.32","54.69.240.214"]
["34.214.137.119","54.69.240.214","54.187.37.159"]
["34.214.137.119","54.187.37.159","54.213.245.156"]
["44.233.200.82","44.237.40.166","52.12.193.206"]
["44.237.40.166","52.43.178.149","54.187.37.159"]
["44.237.209.101","52.25.117.189","52.88.16.106"]
["52.25.117.189","52.25.186.42","52.88.16.106"]
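To check the Rdata ordering in a way that mirrors the wire format, it helps to compare addresses as bytes rather than as text. This sketch assumes, for A RRsets whose Rdata are all 4 bytes long, that comparing the packed addresses element by element approximates the server-side key order; it is an illustration, not the dnstable code:

```python
import ipaddress
from typing import Callable, List, Sequence

# Rdata arrays from the jq output above, in the order DNSDB returned them.
ROWS: List[List[str]] = [
    ["34.210.135.53", "44.238.165.32", "44.239.179.132"],
    ["34.214.137.119", "44.238.165.32", "44.239.179.132"],
    ["34.214.137.119", "44.238.165.32", "54.69.240.214"],
    ["34.214.137.119", "54.69.240.214", "54.187.37.159"],
    ["34.214.137.119", "54.187.37.159", "54.213.245.156"],
    ["44.233.200.82", "44.237.40.166", "52.12.193.206"],
    ["44.237.40.166", "52.43.178.149", "54.187.37.159"],
    ["44.237.209.101", "52.25.117.189", "52.88.16.106"],
    ["52.25.117.189", "52.25.186.42", "52.88.16.106"],
]


def packed_key(row: Sequence[str]) -> tuple:
    """Compare each address as its 4-byte, network-order wire value."""
    return tuple(ipaddress.IPv4Address(addr).packed for addr in row)


def text_key(row: Sequence[str]) -> tuple:
    """Compare each address as a dotted-quad string (for contrast)."""
    return tuple(row)


def is_sorted(rows: Sequence[Sequence[str]], key: Callable[[Sequence[str]], tuple]) -> bool:
    keys = [key(row) for row in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    print("byte order:", is_sorted(ROWS, packed_key))  # True
    print("string order:", is_sorted(ROWS, text_key))  # False
```

Compared as text, "54.187.37.159" would sort ahead of "54.69.240.214"; compared as bytes, 69 is smaller than 187, which matches the order DNSDB used.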
Example B): Now let’s look at a “less-tightly constrained” example: results for the arbitrarily selected domain name www.cs.uoregon.edu over the last quarter (90 days). We’ll request any/all (non-DNSSEC) RRtypes across all bailiwicks for that domain. For ease of display, we’ve wrapped the results, just as we did in Example A:
$ dnsdbq -r www.cs.uoregon.edu -j -T datefix -A90d
{"count":8598,"time_first":"2010-08-14 05:11:07","time_last":"2021-11-30 02:26:39",
"rrname":"www.cs.uoregon.edu.","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.4.25"]}
{"count":218670,"time_first":"2010-06-24 11:33:02","time_last":"2021-11-30 14:59:10",
"rrname":"www.cs.uoregon.edu.","rrtype":"A","bailiwick":"cs.uoregon.edu.",
"rdata":["128.223.4.25"]}
{"count":2492,"time_first":"2016-09-22 20:38:09","time_last":"2021-11-30 01:57:23",
"rrname":"www.cs.uoregon.edu.","rrtype":"AAAA","bailiwick":"uoregon.edu.",
"rdata":["2607:8400:205e:40::80df:419"]}
{"count":57021,"time_first":"2016-09-22 19:16:39","time_last":"2021-11-30 09:43:08",
"rrname":"www.cs.uoregon.edu.","rrtype":"AAAA","bailiwick":"cs.uoregon.edu.",
"rdata":["2607:8400:205e:40::80df:419"]}
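Looking at those four records in JSON Lines format, we see that we’ve received, in order: an A record from the uoregon.edu bailiwick, an A record from the cs.uoregon.edu bailiwick, an AAAA record from the uoregon.edu bailiwick, and an AAAA record from the cs.uoregon.edu bailiwick. In other words, the results are grouped by RRtype (A before AAAA), and within each RRtype, the label-reversed bailiwick uoregon.edu sorts ahead of cs.uoregon.edu, which continues with one more label.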
Those results all appeared in exactly the order we’ve described and expected to see.
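The ordering seen in Example B can be reproduced with a simplified composite key. This sketch is ours, not dnstable’s encoding: it assumes RRtypes compare by their numeric codes (consistent with the A-before-AAAA order here) and leaves Rdata out, since it never decides the order among these four records:

```python
from typing import Dict, List, Tuple

# Numeric RRtype codes for the two types in Example B (A = 1, AAAA = 28).
RRTYPE_CODES: Dict[str, int] = {"A": 1, "AAAA": 28}


def name_key(name: str) -> Tuple[Tuple[int, str], ...]:
    """Label-reversed name as (length, label) pairs.

    For ASCII labels this orders names the same way as comparing their
    label-reversed wire-format bytes.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    return tuple((len(label), label) for label in reversed(labels))


def rrset_key(rec: Dict[str, str]) -> tuple:
    """Simplified stand-in for the ENTRY_TYPE_RRSET key (Rdata omitted)."""
    return (name_key(rec["rrname"]), RRTYPE_CODES[rec["rrtype"]], name_key(rec["bailiwick"]))


# Example B's four results, deliberately shuffled.
RECORDS: List[Dict[str, str]] = [
    {"rrname": "www.cs.uoregon.edu.", "rrtype": "AAAA", "bailiwick": "cs.uoregon.edu."},
    {"rrname": "www.cs.uoregon.edu.", "rrtype": "A", "bailiwick": "cs.uoregon.edu."},
    {"rrname": "www.cs.uoregon.edu.", "rrtype": "AAAA", "bailiwick": "uoregon.edu."},
    {"rrname": "www.cs.uoregon.edu.", "rrtype": "A", "bailiwick": "uoregon.edu."},
]

if __name__ == "__main__":
    for rec in sorted(RECORDS, key=rrset_key):
        print(rec["rrtype"], rec["bailiwick"])
```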
Example C): Now let’s look at a still more complex example: all *.uoregon.edu domains as seen over the last 90 days. Because that’s likely to be a lot of data, we’ll save those results to a file for ease of review:
$ dnsdbq -r "*.uoregon.edu" -l0 -A90d -j -T datefix > uoregon.jsonl
$ wc -l uoregon.jsonl
20983 uoregon.jsonl
Since we can see that we have nearly 21,000 results from that query, to keep this writeup to reasonable length, we’ll just show selected “snippets” of those results. For example, starting with the top of that file, we see:
$ more uoregon.jsonl
{"count":472411,"time_first":"2019-02-22 01:10:03","time_last":"2021-11-30 19:55:15",
"rrname":"uoregon.edu.","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["184.171.111.233"]}
{"count":3396086,"time_first":"2020-01-29 22:16:36","time_last":"2021-11-30 20:12:37",
"rrname":"uoregon.edu.","rrtype":"NS","bailiwick":"edu.",
"rdata":["lsu-bdds1.lsu.edu.", "phloem.uoregon.edu.","ruminant.uoregon.edu.",
"ns1.f5cloudservices.com."]}
{"count":13074799,"time_first":"2020-01-29 22:17:41","time_last":"2021-11-30 20:35:19",
"rrname":"uoregon.edu.","rrtype":"NS","bailiwick":"uoregon.edu.",
"rdata":["lsu-bdds1.lsu.edu.", "phloem.uoregon.edu.","ruminant.uoregon.edu.",
"ns1.f5cloudservices.com."]}
{"count":3705,"time_first":"2021-09-01 21:46:35","time_last":"2021-09-02 00:07:53",
"rrname":"uoregon.edu.","rrtype":"SOA","bailiwick":"uoregon.edu.",
"rdata":["phloem.uoregon.edu. hostmaster.uoregon.edu. 2021090113 3600 1800 605000 600"]}
[...]
{"count":811398,"time_first":"2018-09-17 18:10:16","time_last":"2021-11-30 14:12:00",
"rrname":"uoregon.edu.","rrtype":"MX","bailiwick":"uoregon.edu.",
"rdata":["10 mxa-000bfd01.gslb.pphosted.com.",
"10 mxb-000bfd01.gslb.pphosted.com."]}
{"count":14194,"time_first":"2021-03-15 19:42:26","time_last":"2021-11-30 09:19:17",
"rrname":"uoregon.edu.","rrtype":"TXT","bailiwick":"uoregon.edu.",
"rdata":["\"v=spf1 mx ip4:128.223.0.0/16 ip4:163.41.128.0/17 ip4:184.171.0.0/17
ip6:2001:468:d00::/40 ip6:2607:8400:2802::/32 ip4:148.163.128.0/19
ip4:72.10.180.28/31 ?all\""]}
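Those first six records shown above are all for just the raw delegation point (i.e., “uoregon.edu”). We see them ordered by RRtype (A, then NS, then SOA, then MX, then TXT), and the two NS RRsets, which differ only in bailiwick, ordered with the edu bailiwick ahead of the uoregon.edu bailiwick.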
The next few records in the results look like:
{"count":51,"time_first":"2016-12-20 06:53:24","time_last":"2021-10-02 21:59:25",
"rrname":"windows-8.1.uoregon.edu.","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.142.45"]}
{"count":543,"time_first":"2019-04-04 11:16:20","time_last":"2021-11-27 16:06:45",
"rrname":"m.uoregon.edu.","rrtype":"CNAME","bailiwick":"uoregon.edu.",
"rdata":["drupal-hosting-web-cluster5-prod.uoregon.edu."]}
{"count":48,"time_first":"2020-10-20 11:48:29","time_last":"2021-09-30 20:54:47",
"rrname":"dyn-128.223.65.70.uoregon.edu.","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.65.70"]}
{"count":43,"time_first":"2020-11-05 23:18:22","time_last":"2021-11-24 18:18:10",
"rrname":"128.223.34.76.uoregon.edu.","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.34.78"]}
{"count":55,"time_first":"2019-10-19 05:54:16","time_last":"2021-09-14 11:25:15",
"rrname":"dyn-128.223.65.78.uoregon.edu.","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.65.78"]}
{"count":66,"time_first":"2019-10-09 06:03:15","time_last":"2021-09-30 20:55:41",
"rrname":"dyn-128.223.65.91.uoregon.edu.","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.65.91"]}
That may seem like a totally crazy order of presentation until you remember that we’re sorting the RRnames by reversed label order, and we pay attention to the length of each label.
That means that in this case, the as-reversed-by-label RRnames actually look like:
edu.uoregon.1.windows-8
edu.uoregon.m
edu.uoregon.70.65.223.dyn-128
edu.uoregon.76.34.223.128
edu.uoregon.78.65.223.dyn-128
edu.uoregon.91.65.223.dyn-128
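The third labels (1, m, 70, 76, 78 and 91) are indeed sorted in ascending order: the one-character labels come first, then the two-character labels, with labels of equal length in byte order. We need look no further than that third label in each of those names to confirm these records are correctly sorted.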
Before continuing to scrutinize those results, let’s rerun our query with our output RRnames reversed “automatically:”
$ dnsdbq -r "*.uoregon.edu" -l0 -A90d -j -T datefix,reverse,chomp > uoregon-reversed.jsonl
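The -T transform options will ensure that:
datefix displays times as human-readable dates;
reverse displays each RRname with its labels reversed, so that www.example.com is shown as com.example.www; and
chomp removes the trailing dot from each RRname.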
When we run that dnsdbq command, some of the results in the output file look like:
{"count":46,"time_first":"2015-07-04 11:13:11","time_last":"2021-11-18 02:43:21",
"rrname":"edu.uoregon.ac","rrtype":"CNAME","bailiwick":"uoregon.edu.",
"rdata":["lcb-web2c.uoregon.edu."]}
{"count":3935,"time_first":"2020-12-21 20:24:54","time_last":"2021-11-30 14:19:25",
"rrname":"edu.uoregon.ad.pki","rrtype":"CNAME","bailiwick":"uoregon.edu.",
"rdata":["ad-sca.ad.uoregon.edu."]}
{"count":3769287,"time_first":"2014-05-30 06:42:49","time_last":"2021-11-30 09:05:33",
"rrname":"edu.uoregon.ad.ad-dc1","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.34.139"]}
{"count":35588723,"time_first":"2014-05-30 06:42:49","time_last":"2021-11-30 09:05:33",
"rrname":"edu.uoregon.ad.ad-dc2","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.34.134"]}
{"count":35588522,"time_first":"2014-05-30 06:42:49","time_last":"2021-11-30 09:05:33",
"rrname":"edu.uoregon.ad.ad-dc3","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.34.140"]}
{"count":35588176,"time_first":"2014-05-30 06:42:49","time_last":"2021-11-30 17:25:19",
"rrname":"edu.uoregon.ad.ad-dc4","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.34.135"]}
{"count":3266,"time_first":"2020-12-21 21:38:58","time_last":"2021-11-30 19:14:29",
"rrname":"edu.uoregon.ad.ad-kms","rrtype":"A","bailiwick":"uoregon.edu.",
"rdata":["128.223.162.89"]}
[etc]
You might briefly think “Uh oh, something must be wrong — what’s the reversed domain edu.uoregon.ad.pki doing ahead of the reversed domain edu.uoregon.ad.ad-dc1?”
Thinking carefully about this, remember that RRnames are compared label by label, by label LENGTH first and then by the values of the label. At the fourth label, “pki” (3 characters) is shorter than “ad-dc1” (6 characters), so it sorts first.
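Because the wire format puts a length byte in front of every label, a per-label (length, text) key gives the same order as the byte comparison. A small sketch (hypothetical helper name, ASCII labels assumed) applied to the reversed names above shows the difference from a plain text sort:

```python
from typing import List, Tuple


def reversed_name_key(reversed_name: str) -> List[Tuple[int, str]]:
    """Sort key for a name already shown TLD-first (as with -T reverse,chomp).

    Each label becomes (length, text): shorter labels sort first, and labels
    of equal length sort by their characters.
    """
    return [(len(label), label) for label in reversed_name.split(".") if label]


SHUFFLED = [
    "edu.uoregon.ad.ad-kms", "edu.uoregon.ad.ad-dc1", "edu.uoregon.ad.pki",
    "edu.uoregon.ac", "edu.uoregon.ad.ad-dc3", "edu.uoregon.ad.ad-dc4",
    "edu.uoregon.ad.ad-dc2",
]

if __name__ == "__main__":
    print(sorted(SHUFFLED, key=reversed_name_key))  # ad.pki right after ac
    print(sorted(SHUFFLED))                         # plain text sort: ad.pki last
```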
The “display-in-reversed-by-label” format output makes it clear that all is well in the default sequencing world for this modest-size dataset of nearly 21,000 results.
Example D): What do we run into if we try doing something totally crazy, like trying to look at all *.net names for the last three days?
$ dnsdbq -r "*.net" -l0 -A3d -j -T datefix,reverse,chomp > net-reversed.jsonl
Database limit: Result limit reached
Hmm. Okay, there are far more than a million results in *.net. That’s not really a surprise: https://www.verisign.com/en_US/channel-resources/domain-registry-products/zone-file/index.xhtml says that there are nearly 13.5 million dot net domains registered (to say nothing of all the combinations of FQDNs, RRtypes, bailiwicks and Rdata involving those domains that DNSDB tracks and reports).
Let’s ask for the three offset tranches we can get for this query, for an aggregate total of up to 4,000,000 results (though even that obviously isn’t going to get us through the total set of results that DNSDB has for *.net):
$ dnsdbq -r "*.net" -l0 -A3d -j -T datefix,reverse,chomp -O1000000 >> net-reversed.jsonl
Database limit: Result limit reached
$ dnsdbq -r "*.net" -l0 -A3d -j -T datefix,reverse,chomp -O2000000 >> net-reversed.jsonl
Database limit: Result limit reached
$ dnsdbq -r "*.net" -l0 -A3d -j -T datefix,reverse,chomp -O3000000 >> net-reversed.jsonl
Database limit: Result limit reached
$ wc -l net-reversed.jsonl
4000000 net-reversed.jsonl
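If you find yourself pulling tranches like this often, the commands can be generated rather than typed. A hypothetical helper, assuming the 1,000,000-result per-query maximum and the offsets up to 3,000,000 used above; it only prints commands, built from the same dnsdbq options shown in this example:

```python
import shlex
from typing import List

RESULTS_PER_QUERY = 1_000_000  # per-query maximum described in this article
MAX_OFFSET = 3_000_000         # largest offset used in Example D


def tranche_commands(rrset: str, time_fence: str = "3d") -> List[List[str]]:
    """Build dnsdbq argument lists for offsets 0, 1M, 2M and 3M (nothing is executed)."""
    commands = []
    for offset in range(0, MAX_OFFSET + 1, RESULTS_PER_QUERY):
        cmd = ["dnsdbq", "-r", rrset, "-l0", f"-A{time_fence}", "-j",
               "-T", "datefix,reverse,chomp"]
        if offset:
            cmd.append(f"-O{offset}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for i, cmd in enumerate(tranche_commands("*.net")):
        redirect = ">" if i == 0 else ">>"  # first tranche creates the file
        print(f"{shlex.join(cmd)} {redirect} net-reversed.jsonl")
```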
Let’s check out what we’ve gotten in those four million results. What does the first observation look like (wrapped for display here)?
$ head -1 net-reversed.jsonl
{"count":212,"zone_time_first":"2019-11-11 15:52:29","zone_time_last":"2021-11-28 22:50:24",
"rrname":"net","rrtype":"NS","bailiwick":"net.",
"rdata":["a.gtld-servers.net.","b.gtld-servers.net.","c.gtld-servers.net.",
"d.gtld-servers.net.","e.gtld-servers.net.","f.gtld-servers.net.",
"g.gtld-servers.net.","h.gtld-servers.net.","i.gtld-servers.net.",
"j.gtld-servers.net.","k.gtld-servers.net.","l.gtld-servers.net.",
"m.gtld-servers.net."]}
That first observation is, as expected, Zone File data (rather than sensor data), and for the raw TLD name (i.e., net) itself.
What about the last of those 4,000,000 results?
$ tail -1 net-reversed.jsonl
{"count":145,"zone_time_first":"2021-07-06 22:50:22","zone_time_last":"2021-11-28 22:50:24",
"rrname":"net.tzsws","rrtype":"NS","bailiwick":"net.",
"rdata":["v1s1.xundns.com.","v1s2.xundns.com."]}
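At 4,000,000 observations for our *.net query, we’re STILL wading through Zone File data, and we’re only up to dot net 2nd-labels that are five characters long. We haven’t seen ANY sensor network data for dot net at all yet! This example perfectly illustrates how the natural order of results (Zone File data first, then names ordered label by label with shorter labels first) determines which subset of results you receive when a query matches more results than DNSDB will return.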
While you can’t preferentially sort or change the subset of results you receive from DNSDB on the DNSDB server itself, you CAN sort the subset of results you’ve received “client side” (i.e., once the software client you’re using to access DNSDB has received those results).
Just for completeness, we’ll show you how to do this in two DNSDB clients: in DNSDB Scout (our GUI point and click web-based client), and in dnsdbq (our command line interface).
In DNSDB Scout, after you’ve run a sample search, simply click on a heading in the table of results to sort by that field.
For example, let’s sort the results for a *.uoregon.edu query by count, by clicking on the “Count” column header:

Want to see values in reverse order, instead? Click the same heading again. Want to sort by a different field? Just click that header.
If you’re using our command line client, dnsdbq, there are two sort-related command line options you need to be aware of — dash ess (with the ess either lower case or capitalized) and dash kay. Cutting and pasting from
$ man dnsdbq
we see:
-s sort output in ascending key order. Limits (if any) specified
by -l and -L will be applied before and after sorting,
respectively. In batch mode, the -f, -ff, and -ffm option sets
will cause each batch entry's result to be sorted independently,
whereas with -fm, all outputs will be combined before sorting.
This means with -fm there will be no output until after the
last batch entry has been processed, due to store and forward
by the sort process.
-S sort output in descending key order. See discussion for -s
above.
-k sort_keys
when sorting with -s or -S, selects one or more comma
separated sort keys, among "first", "last", "duration",
"count", "name", "type", and/or "data". The default order
is "first,last,duration,count,name,type,data" (if sorting is
requested.) Names are sorted right to left (by TLD then 2LD
etc). Data is sorted either by name if present, or else by
numeric value (e.g., for A and AAAA RRsets.) Several -k
options can be given after different -s and -S options, to
sort in ascending order for some keys, descending for others.
Replicating our DNSDB Scout sorting example in dnsdbq:
$ dnsdbq -r "*.uoregon.edu" -l0 -S -k count | more
;; record times: 2010-06-24 03:09:00 .. 2021-11-30 19:21:56 (~11y ~162d)
;; count: 309154620; bailiwick: .
phloem.uoregon.edu. A 128.223.32.35
;; record times: 2010-06-24 03:09:00 .. 2021-11-30 19:21:56 (~11y ~162d)
;; count: 309006772; bailiwick: .
phloem.uoregon.edu. AAAA 2001:468:d01:20::80df:2023
;; record times: 2010-06-24 03:08:15 .. 2021-11-30 19:33:08 (~11y ~162d)
;; count: 203840350; bailiwick: uoregon.edu.
phloem.uoregon.edu. A 128.223.32.35
;; record times: 2010-06-24 03:08:15 .. 2021-11-30 19:33:08 (~11y ~162d)
;; count: 203537192; bailiwick: uoregon.edu.
phloem.uoregon.edu. AAAA 2001:468:d01:20::80df:2023
;; record times: 2010-06-24 03:08:15 .. 2021-11-30 15:38:40 (~11y ~162d)
;; count: 109575370; bailiwick: edu.
phloem.uoregon.edu. AAAA 2001:468:d01:20::80df:2023
[etc]
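The same client-side idea works on saved JSON Lines output. For instance, a short script (ours, not part of dnsdbq) could reorder a file such as uoregon.jsonl from Example C by count, highest first, reading from standard input:

```python
import json
import sys
from typing import Any, Dict, Iterable, List


def sort_by_count_desc(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse dnsdbq -j output (one JSON object per line) and sort by count, highest first.

    Only results already received are reordered; none are added or removed.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    return sorted(records, key=lambda rec: rec["count"], reverse=True)


if __name__ == "__main__":
    for rec in sort_by_count_desc(sys.stdin):
        print(rec["count"], rec["rrname"], rec["rrtype"], rec["bailiwick"])
```

You might run it as python3 sort_by_count.py < uoregon.jsonl, where sort_by_count.py is whatever name you save the script under.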
Misconception #1: Sorting Results Client-Side Will Change the Set of Results You Get — FALSE.
The subset of results you receive (out of all possible matching results) is determined on the DNSDB server. The DNSDB server always returns matching results in the natural order described above.
AFTER those results get downloaded to whatever client you’re using (such as dnsdbq or DNSDB Scout or a DNSDB integration), the client may SORT (and thus change the order of your results as DISPLAYED), but this has NO IMPACT on the set of results RECEIVED from DNSDB.
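Some modifications to your queries that WILL potentially change what the DNSDB server returns as results include narrowing the names you’re asking about (for example, going from *.uoregon.edu to *.cs.uoregon.edu, or all the way down to the single name www.cs.uoregon.edu), as well as constraining the RRtype, the bailiwick, or the time fence of your query, as we did in Example A.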
Misconception #2: You Can Specify Partial Label Wildcards (or Other Complex Pattern Searches) in DNSDB Standard Search to Narrow In On The Results Returned — FALSE.
When confronting a flood of DNSDB results, users may sometimes try to tweak their queries in ways that DNSDB Standard Search simply isn’t able to handle, such as attempting partial-label or mid-label wildcard searches.
By way of illustrating this, let’s consider some queries that are OK in DNSDB Standard Search:
Fully qualified domain names: www.example.com
Left hand wildcards: *.example.com
Right hand wildcards: example.*
Individual IPv4 addresses: 199.30.228.112
Individual IPv6 addresses: 2620:11c:f008::13
IPv4 CIDR network address blocks: 199.30.228.0/24
IPv6 CIDR network address blocks: 2620:11c:f008::/64
IPv4 address dashed ranges: 199.30.228.110-199.30.228.205
IPv6 address dashed ranges: 2620:11c:f008::5-2620:11c:f008::79
Now let’s consider some queries that are NOT OK in DNSDB Standard Search:
Double-sided wildcards: *example*
Mid-label wildcards: www.ex*ple.com
Partial-label wildcards: *ple.com
Wildcarded IP addresses: 128.223.32.*
Note that the above list is illustrative (and not an exhaustive list) of okay and problematic DNSDB Standard Search patterns (for example, we did not show raw hex queries).
If you’d like to find domain names that match “keywords” such as brand names (or domain names that match more complex patterns — up to and including regular expressions), try DNSDB Flexible Search. DNSDB Flexible Search can be an amazing “finding aid,” and it is bundled free with your DNSDB API or DNSDB Export subscription. For more details on getting started with Flexible Search, see the introductory DNSDB Flexible Search slide deck at https://www.domaintools.com/wp-content/uploads/DNSDB_Flexible_Search_Intro.pdf
Misconception #3: “I Can Ask for JUST Sensor Data from DNSDB API, Excluding Zone File Data” — FALSE.
DNSDB API Users: Unfortunately, you cannot currently say “Please exclude Zone File Data from my DNSDB API results.” If relevant Zone File results are available for a given DNSDB API query, you WILL receive them. Naturally, you can drop them once you receive them if you really don’t want them (for example, by using grep -v), but you can’t exclude them a priori. (If this is functionality you think you might find useful, we’d love to hear from you about this.)
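As an alternative to grep -v, a JSON-aware filter can key off the timestamp field names. This is a sketch under the assumption stated in its docstring, not a DNSDB feature; it removes Zone File results only after they have been received:

```python
import json
import sys
from typing import Iterable, Iterator


def drop_zone_file_results(lines: Iterable[str]) -> Iterator[str]:
    """Yield JSON Lines results that do not carry Zone File timestamps.

    Assumes, as in this article's examples, that Zone File observations use
    zone_time_first/zone_time_last while sensor observations use
    time_first/time_last.
    """
    for line in lines:
        if not line.strip():
            continue
        if "zone_time_first" not in json.loads(line):
            yield line


if __name__ == "__main__":
    sys.stdout.writelines(drop_zone_file_results(sys.stdin))
```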
DNSDB Export Users: Bulk Zone File data cannot be provided to DNSDB Export customers, so Zone File data is always “automatically” excluded from searches made by DNSDB Export users (unless the DNSDB Export customer arranges with ICANN to directly download their own Zone File data, in which case we’re happy to help the DNSDB Export customer locally ingest that data).
Misconception #4: If I Just Time Fenced My Request Sufficiently Aggressively, I Could Successfully Dump a Full Slice of a Big TLD (Such as Half-an-Hour’s Worth of .com or .net) — FALSE.
If you attempt this sort of strategy, your query will normally timeout/fail. For example, if you tried to dump half an hour’s worth of dot com, you might see:
$ dnsdbq -r "*.com" -A30m -l0 -j > star-dot-com.jsonl
dnsdbq: warning: libcurl failed with curl error 18 (Transferred a partial file)
Query response missing: Data transfer failed -- No SAF terminator at end of stream
If you’d like to learn more about SAF, Farsight’s Streaming API Framing Protocol, and how it helps to protect you from incomplete results, see https://www.domaintools.com/resources/user-guides/farsight-streaming-api-framing-protocol-documentation/
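dnsdbq performs this check for you, as the error above shows. If you consume the API’s SAF stream with your own code, a similar check might look like the sketch below. It assumes each SAF line is a JSON object and that a complete stream ends with a line whose cond field is one of succeeded, limited or failed; verify those details against the SAF documentation before relying on it:

```python
import json
from typing import Iterable, Optional

# Stream-ending conditions as we understand the SAF documentation linked above;
# treat this set as an assumption and confirm it against that document.
TERMINATING_CONDITIONS = {"succeeded", "limited", "failed"}


def saf_terminator(lines: Iterable[str]) -> Optional[str]:
    """Return the final SAF condition of a raw API stream, or None if it is missing."""
    last = None
    for line in lines:
        if line.strip():
            last = line
    if last is None:
        return None
    try:
        frame = json.loads(last)
    except json.JSONDecodeError:
        return None  # a cut-off final line means the stream was truncated
    if not isinstance(frame, dict):
        return None
    cond = frame.get("cond")
    return cond if isinstance(cond, str) and cond in TERMINATING_CONDITIONS else None
```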
Misconception #5: There’s NO WAY To Dump All Matching Names for A Given Pattern from DNSDB (Even If You Have DNSDB Export) — FALSE.
DNSDB Export (aka “DNSDB On Premises”) is customarily described as “like DNSDB API, but running on local hardware.” It is accessed using a local copy of the same front end that normally handles DNSDB API queries run over the Internet, and behaves similarly — except for the fact it is running “on premises.”
That said, those who have purchased DNSDB Export can request permission from their account executive to directly access DNSDB MTBL files and do custom searches that exceed normal search parameters/normal search limits using dnstable_lookup and/or dnstable_dump.
We hope you now have at least a basic understanding of how results are ordered in DNSDB MTBL files, and how that ordering can impact the subset of results you receive out of the total set of results that may exist. You’ve seen some worked examples of how that ordering appears, and we’ve tackled what you can do “client side” with the results you receive. We’ve also tried to clear up some common misconceptions. We hope this discussion has helped to clarify why DNSDB results come out in the order they do, and why you get the results you get.
The author would like to thank (in alphabetical order) Ben April, Pawel Foremski, Chris Mikkelson, David Waitzman, and Stephen Watt for their review and extremely helpful comments on earlier drafts of this article. Any remaining errors are solely the responsibility of the author.