Enhancing DNSDB to Better Handle DNS Wildcard Names

Introduction

This blog article explains how DomainTools is enhancing DNSDB to better handle wildcard DNS records.

Before we can explain those enhancements, let’s quickly review wildcard DNS records.

DNS Wildcard Records

Normally, we make DNS queries for already-defined fully qualified domain names (FQDNs). For example, using the Un*x dig command, we can see the company’s website resolves to an IPv4 address:

$ dig www.domaintools.com +short
199.30.228.112

This happens because a specific DNS resource record is defined in the company’s authoritative DNS. A given registerable base domain name (such as domaintools.com) might have many predefined names of that sort. Those names might include FQDNs for mail servers, storage servers, network printers, etc. Each of those networked systems would have a corresponding regular DNS entry defined, and as a result, the DNS works for those systems.

In that sort of environment, names that aren’t specifically defined WON’T resolve. You can verify this by trying to resolve a random “gibberish” host name – it won’t work (as expected):

$ dig insdvkncvkxcnvxcvn.domaintools.com +short
[nothing is found or returned for the gibberish host name]

Some domains, however, are set up to leverage “DNS wildcard records.” Those wildcard records will respond to literally ANY hostname folks may try to resolve. For example, Tumblr has set up a wildcard DNS record so that anything at tumblr.com will resolve:

$ dig kjsndvjknsvijn.tumblr.com +short
74.114.154.18
74.114.154.22
$ dig joe-uses-too-much-hot-sauce-in-his-chili.tumblr.com +short
74.114.154.22
74.114.154.18

Sometimes even entire top level domains will have DNS wildcard records. For example, dot ph is currently set up this way:

$ dig ojnosufjojsf.ph +short
45.79.222.138

There are many subtle considerations regarding DNS wildcard records we’re not going to cover in this brief introduction. If interested in a detailed and in-depth treatment, see “The Role of Wildcards in the Domain Name System,” https://datatracker.ietf.org/doc/html/rfc4592
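The "try a gibberish name" test used in the dig examples above can be sketched in Python. This is only a rough heuristic, not how DNSDB identifies wildcards: the function names, the probe count, and the injectable `resolve` callable are all assumptions made for illustration, and a resolver that rewrites NXDOMAIN answers would make every domain look wildcarded.

```python
import secrets
import socket


def random_label(length=16):
    """Return a random lowercase hex label that is very unlikely to be predefined."""
    return secrets.token_hex(length // 2)


def looks_wildcarded(domain, resolve=socket.gethostbyname, probes=3):
    """Return True if every random probe name under `domain` resolves.

    `resolve` is any callable that takes a hostname and either returns an
    address or raises OSError (socket.gaierror is a subclass of OSError).
    """
    for _ in range(probes):
        name = f"{random_label()}.{domain}"
        try:
            resolve(name)
        except OSError:
            return False  # at least one gibberish name did not resolve
    return True
```

Calling `looks_wildcarded("tumblr.com")` would perform live lookups through the system resolver; passing a stub as `resolve` lets the logic be exercised offline.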

“Why Do Sites Have DNS Wildcard Records?”

The two most common reasons why sites define DNS wildcard records are:

(a) Operational Simplicity: Some organizations routinely allow users to create new sites under one of their domains. These sites are accessed via unique hostnames that point at the users’ pages. To properly route visitors to their users’ sites, the organization COULD continually create new specific DNS entries (a new DNS record every time a new site is registered), but it’s far easier to define one wildcard record that routes visitors for all real (or potentially real) sites to a single set of load balancers, leaving it to the load balancing infrastructure to “take it from there.”

(b) Advertising Revenue: Other sites, including some TLDs, may view any typo-prone visitor purely as a potential viewer for advertising. This means that if someone tries to go to some non-existent domain, that’s actually a GOOD thing, because they can then be shown some ads. Wildcard DNS entries are normally used to enable that “show them advertisements if they make a typo” functionality.

DNS wildcard records may also be misused and exploited by those perpetrating distributed denial of service (DDoS) attacks. Wildcards will routinely enable resolution of long hostnames while also permitting multiple levels of non-existent subdomains. A continual stream of exceptionally long, resolvable-but-randomized, spoofed DNS queries will typically result in large attack payloads (and those results will have poor caching properties, too).

“Are DNS Wildcard Names Different From DNSDB Wildcard Queries?”

Yes, DNSDB wildcard queries are DEFINITELY different from DNS wildcard names.

You can make queries for specific FQDNs to DNSDB Standard Search (for example, you could look up the specific FQDN www.princeton.edu in DNSDB Standard Search). However, DNSDB Standard Search also supports whole label “left hand wildcard” and whole label “right hand wildcard” searches.

This means that in addition to being able to look up a specific hostname in DNSDB, you could use a left hand wildcard query to see all the FQDNs harvard.edu has defined:

*.harvard.edu

Or you could use a right hand wildcard search to find FQDNs (if any) that start with “yale” (other than the yale.edu domain we’d expect to see):

yale.*

Those sorts of search patterns are normally referred to as “wildcard searches,” but they are NOT related to the wildcard DNS names described in section two of this article.
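To make the whole-label idea concrete, here is a small Python sketch of how a left hand or right hand wildcard pattern could be compared against an FQDN. It is an illustration only, not DNSDB's implementation: the function names are hypothetical, and the sketch assumes the asterisk stands for one or more whole labels (whether DNSDB also returns the bare domain for a left hand wildcard is not modeled here).

```python
def labels(name):
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return name.rstrip(".").lower().split(".")


def matches_whole_label_wildcard(pattern, fqdn):
    """Match `fqdn` against a left-hand (*.suffix) or right-hand (prefix.*) pattern."""
    pat, name = labels(pattern), labels(fqdn)
    if pat[0] == "*":                      # left-hand wildcard, e.g. *.harvard.edu
        fixed = pat[1:]
        return len(name) > len(fixed) and name[-len(fixed):] == fixed
    if pat[-1] == "*":                     # right-hand wildcard, e.g. yale.*
        fixed = pat[:-1]
        return len(name) > len(fixed) and name[:len(fixed)] == fixed
    return pat == name                     # no wildcard: exact match
```

Because the comparison is label by label, `*.harvard.edu` does not match `notharvard.edu`, and `yale.*` does not match `yalezzz.edu`. Partial-label matches of that kind are what Flexible Search is for.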

We’ll also take this opportunity to remind you that if you’ve ever wished you could make a broader range of wildcard queries in DNSDB (such as partial label wildcards or mid-name wildcards, or even regular expression searches), you now can do so by using DNSDB Flexible Search! Check out the Flexible Search training deck that’s available at https://www.farsightsecurity.com/assets/media/download/DNSDB_Flexible_Search_Intro.pdf

The Problems That DNS Wildcard Records Raise Can Be a Technical Challenge In Passive DNS

At the risk of stating the obvious, DNSDB queries should return usable results. In particular, you need to be able to see results relevant to your underlying analysis, not just drown in noise. Consider the following realities:

A single DNSDB API query will return at most a million results. That said:

Some DNSDB API clients (such as DNSDB Scout) may have a lower result limit due to browser-related limitations in handling tables.

After the first million results from your initial query, if you still need more, you can make three additional supplemental “offset” queries (each potentially returning an additional million results) for up to four million results in all.

A million results is unquestionably a LOT of results. To make that tangible, assume each result is just a single line, and there are 66 lines per printed page. A million lines of output would be over 15,000 printed pages!
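The arithmetic behind those limits can be checked with a few lines of Python. The constants come straight from the text above; the function names are hypothetical, and the sketch only computes offsets and page counts (it does not show how an offset is actually passed to the DNSDB API).

```python
import math

MAX_RESULTS_PER_QUERY = 1_000_000   # per-query cap described in the text
MAX_OFFSET_QUERIES = 3              # supplemental "offset" queries allowed
LINES_PER_PAGE = 66                 # assumption used in the printed-page estimate


def offset_plan():
    """Return the starting offsets for the initial query plus the offset queries."""
    return [i * MAX_RESULTS_PER_QUERY for i in range(1 + MAX_OFFSET_QUERIES)]


def printed_pages(lines, lines_per_page=LINES_PER_PAGE):
    """Number of printed pages needed for `lines` one-line results."""
    return math.ceil(lines / lines_per_page)
```

Four queries of a million results each give the four million result ceiling, and `printed_pages(1_000_000)` comes to 15,152 pages, consistent with the "over 15,000" figure.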

Now combine that with the fact that a single DNS wildcard name will often produce more than a million unique observations in just a single day, and continue to accrete results at that rate day after day after day. Left unmanaged, wildcard values will choke out “real” domain names in user results. That’s a user-visible and important practical problem.

DNS wildcard names can also cause DNSDB MTBL file storage to bloat if left unmanaged, unnecessarily driving up storage costs for anyone running a copy of DNSDB “on premises” via DNSDB Export.

For these reasons and others, DomainTools routinely scrubs DNS wildcard names, preventing their pollution of DNSDB. In the past, the addition of new filters silently stopped data collection for the filtered wildcarded domain. Users might have wondered what happened — did that domain break? Did the sensor(s) covering that domain cease contributing? Is there an error in the software? That has changed.

So What’s Actually NEW?

We now flag filtered wildcard domains with a special uppercase leading label, _WILDCARD_:

$ dnsdbq -r \*.mycricket.com/CNAME -A1d
;; record times: 2022-07-05 11:44:27 .. 2022-07-27 19:16:11 (~22d 7h 31m)
;; count: 81696; bailiwick: mycricket.com.
_WILDCARD_.mycricket.com.  CNAME  www.cricketwireless.com.edgekey.net.

When you see that uppercase leading tag, you’ll know that matching records, many of which have the appearance of randomly generated data, have been “rolled up” into a single consolidated entry. We believe this is better than filtering those entries without a trace because the _WILDCARD_ entries make it possible to continue to collect first seen/last seen timestamps, and counts.

_WILDCARD_ FAQ:

FAQ-1. “Are There Very Many of These New _WILDCARD_ Entries? Can I Perhaps Do A DNSDB Search For _WILDCARD_.* and Get A List of Them All?”

While you might run into _WILDCARD_ tagged names in your output, DNSDB API users CANNOT currently search for a list of all _WILDCARD_ domains using dnsdbq, DNSDB Scout, or similar DNSDB API clients. This exclusion is intentional.

However, DNSDB Export users who have local MTBL files CAN pull a list of those records with the dnstable_dump command. As an example, using one recent daily MTBL file:

$ dnstable_dump -r dns.20220726.D.mtbl -j | fgrep "_WILDCARD_"
{"count":17076,"time_first":1658736570,"time_last":1658857004,"rrname":"_WILDCARD_.gap.ae.","rrtype":"A","bailiwick":"gap.ae.","rdata":["162.13.201.232"]}
[etc]

$ dnstable_dump -r dns.20220726.D.mtbl -j | fgrep "_WILDCARD_" > _WILDCARD_.txt
$ sort -u < _WILDCARD_.txt | wc -l
756

The number of _WILDCARD_ records will vary from MTBL file to MTBL file, depending on the traffic we see.
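If you would rather post-process the JSON than grep it, the same filtering can be sketched in Python. This is a minimal sketch assuming one JSON object per line with the field names shown in the sample output above; the function names are hypothetical. Unlike the fgrep pipeline, it checks only the rrname field, and it counts distinct owner names rather than distinct output lines, so its totals may differ from the `sort -u | wc -l` figure.

```python
import json

WILDCARD_TAG = "_WILDCARD_."


def wildcard_records(lines):
    """Yield parsed records whose rrname starts with the uppercase tag."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or non-JSON lines
        record = json.loads(line)
        if record.get("rrname", "").startswith(WILDCARD_TAG):
            yield record


def unique_wildcard_names(lines):
    """Return the sorted list of distinct _WILDCARD_ owner names."""
    return sorted({r["rrname"] for r in wildcard_records(lines)})
```

Any iterable of lines works as input, for example `sys.stdin` when the dnstable_dump output is piped into a script, or an open file holding a saved dump.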

FAQ-2. “But If I Check DNSDB With dnsdbq, I DO See some _wildcard_.* entries reported!”

Note that these entries are lowercase. They are “as-seen” in our data feeds, and do NOT represent domains we’ve consolidated into _WILDCARD_ records. Special wildcard entries will ALWAYS start with _WILDCARD_ (uppercase only).

FAQ-3. “What Do the Counts Mean for _WILDCARD_ records? And the time first_seen and time last_seen?”

The counts represent the number of matching records “rolled up” into the wildcard, counting from the time the _WILDCARD_ entry was instantiated. Similarly, the datetimes reflect the earliest (and latest) datetimes seen for any matching resource records rolled up into the wildcards.
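One way to picture the roll-up is as a simple aggregation: sum the counts, keep the earliest first-seen time, and keep the latest last-seen time. The Python sketch below is an illustration under stated assumptions, not the actual DNSDB pipeline: the function name, the dict layout, the summing of per-observation counts, and the choice to key entries by rrtype and rdata are all hypothetical.

```python
def roll_up(observations, suffix):
    """Consolidate observations under `suffix` into _WILDCARD_ entries.

    Each observation is a dict with rrname, rrtype, rdata (a string here),
    count, time_first and time_last. Entries are keyed by rrtype and rdata,
    so observations with different rdata land in different entries.
    """
    dotted = "." + suffix
    entries = {}
    for obs in observations:
        if not obs["rrname"].endswith(dotted):
            continue  # not under the wildcarded suffix
        key = (f"_WILDCARD_{dotted}", obs["rrtype"], obs["rdata"])
        if key not in entries:
            entries[key] = {
                "count": obs["count"],
                "time_first": obs["time_first"],
                "time_last": obs["time_last"],
            }
        else:
            entry = entries[key]
            entry["count"] += obs["count"]
            entry["time_first"] = min(entry["time_first"], obs["time_first"])
            entry["time_last"] = max(entry["time_last"], obs["time_last"])
    return entries
```

Only observations fed in after the entry exists contribute to it in this sketch, which mirrors the "counting from the time the _WILDCARD_ entry was instantiated" behavior described above.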

FAQ-4. “Once A DNS Wildcard Is Identified, Will Earlier Wildcard Records (E.G., Entries Already Present in DNSDB) Be Retrospectively Consolidated into the New _WILDCARD_ entries?”

We’re not doing retrospective wildcard rollups at this time.

FAQ-5. “Will Newly Added _WILDCARD_ Entries Be Disclosed to Customers?”

Only in the form of new entries embedded in DNSDB Export MTBL files (as shown above in FAQ-1).

FAQ-6. “Can We ‘Nominate’ Domains That We Believe Are Wildcards? Or Tell You About Records That Are No Longer Operating as Wildcards?”

Wildcards are identified and verified/reverified programmatically. While we appreciate your interest in helping, there is no need for a manual nomination/delisting process at this time.

FAQ-7. “After a _WILDCARD_ Entry Has Been Created, Why Do I Still See Some Other Specific Hostnames Under That Same Domain Name?”

_WILDCARD_ entries may be narrowly scoped, perhaps limited only to “A” records or “CNAME” records. Other entries with the same suffix (such as “NS” records or “TXT” records) may still be narrowly and statically defined. When that’s the case, those entries will continue to be tracked separately. Similarly, all other aspects of the RRset must match for a resource record to be consolidated into a _WILDCARD_ entry. Internally, the previous filtering method only matched records on a per-subdomain or subdomain+rrtype basis. The improved wildcarding also allows for filtering specific rdata per-subdomain or subdomain+rrtype. As an example of what this enables, if there is a wildcard rule for “*.foo.com A 127.0.0.1” then records like “bar.foo.com A 1.2.3.4” will still produce un-filtered observations.
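The three scoping levels described here (per-subdomain, subdomain+rrtype, and subdomain+rrtype+rdata) can be sketched as a small matching function. This is a hypothetical illustration of the idea, not the internal rule format: the tuple layout and the use of None to mean "any" are assumptions.

```python
def rule_matches(rule, record):
    """Return True if `record` would be consolidated under `rule`.

    `rule` and `record` are (owner, rrtype, rdata) tuples. A rule owner of the
    form "*.suffix" covers any name below that suffix; a rule rrtype or rdata
    of None means "any".
    """
    rule_owner, rule_type, rule_rdata = rule
    owner, rrtype, rdata = record
    suffix = rule_owner[1:].lower().rstrip(".")        # "*.foo.com" -> ".foo.com"
    if not owner.lower().rstrip(".").endswith(suffix):
        return False
    if rule_type is not None and rule_type != rrtype:
        return False
    if rule_rdata is not None and rule_rdata != rdata:
        return False
    return True
```

With the rule `("*.foo.com", "A", "127.0.0.1")`, the record `("bar.foo.com", "A", "1.2.3.4")` does not match and so stays visible as an ordinary observation, which is the case called out above.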

FAQ-8. “Is It Possible That I’ll See Two or More Different _WILDCARD_ Entries for the Same Domain Suffix?”

Yes. For example, imagine a _WILDCARD_ “A” record that generates answers for one IP address and then changes to a different one. You might then see two _WILDCARD_ “A” records, one for the old IP and one for the new IP (assuming the domain in question is still a wildcard!)

FAQ-9. “How Do We Know That 3rd Parties Won’t Poison DNSDB With Fake Wildcard Entries? (And How Dare YOU Stick a ‘Made Up’ Value Into DNSDB!)”

An attacker can’t produce entries with UPPERCASE _WILDCARD_ tags — all “regular” “RRnames” (as seen in our sensor data or zone file data) are forced to be lowercase-only during processing. Only our specially-added records will have a leading uppercase _WILDCARD_ tag.

Moreover, because the domains tagged ARE wildcards, they will literally answer for anything, including our special wildcard tag! It’s a very safe/conservative approach.
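The reason the uppercase tag cannot be forged follows directly from the lowercasing step, as this short Python sketch shows. The function names are hypothetical and the real processing is certainly more involved; the sketch only assumes that observed names are lowercased and that the tag check is case-sensitive.

```python
WILDCARD_TAG = "_WILDCARD_."


def normalize_observed_rrname(rrname):
    """Lowercase an owner name as seen in sensor or zone file data."""
    return rrname.lower()


def is_consolidated_entry(rrname):
    """True only for the uppercase tag; the comparison is case-sensitive."""
    return rrname.startswith(WILDCARD_TAG)
```

A query for `_WILDCARD_.victim.example.` that shows up in sensor data is stored as `_wildcard_.victim.example.`, so it can never be mistaken for a consolidated entry. This is also why the lowercase entries mentioned in FAQ-2 are ordinary as-seen data.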

FAQ-10. “How Does This Relate to SIE?”

Live SIE channel traffic serves as the raw input to DNSDB collection, and wildcard filtering actually occurs upstream, in that traffic. As a result, wildcard management will also be noticeable in SIE channels.

FAQ-11. “We’ve Still Got Questions! Who Can We Ask?”

ACKNOWLEDGEMENTS

DNSDB’s wildcard processing work is the result of contributions from many participants. We’d like to thank and acknowledge (in alphabetical order) Marc Evans, Christine Fogel, David Waitzman, and Stephen Watt.
