Farsight TXT Record

How Many "Parts" (or "Labels") Does A Domain Name Typically Have?

Published on: Oct 13, 2017

I. Introduction

Domain names can be thought of as a series of “labels” or “parts” or “chunks” separated with dots.

We’re all familiar with domain names that have three labels, such as www.example.com.

It’s also not unusual to find domain names that have just two labels (such as uoregon.edu) or an “extra” label (as in www.matse.illinois.edu).

But what does the distribution of domain names really look like? Are there domains with more than four labels? More than ten? We decided to look at a day’s worth of data from DNSDB and find out.

II. The Dataset

We arbitrarily selected June 28th, 2017 for this study. The DNSDB Export MTBL file for that date, dns.20170628.D.mtbl, is a medium-sized file, at 31,626,779,104 bytes. That mtbl file is in a compact binary format, but can be exported into a more-convenient text format for simple analyses.

We processed that file with the command pipeline:

$ dnstable_dump -r /export/dnstable/mtbl/dns.20170628.D.mtbl | grep -v ";;" | grep -v "^$" | grep -v ".in.addr." | grep -v ".ip6.arpa." | awk '{print $1}' > rrnames.txt

That command string dumps the rrnames (domain names) from the specified mtbl file, removing any:

-- comment lines (";;")
-- blank lines ("^$")
-- IPv4 inverse address records (".in.addr."; the unescaped dots match any character, so this pattern also matches ".in-addr.") and
-- IPv6 inverse address records (".ip6.arpa.").

The final awk step keeps only the first field of each remaining line (the resource record name), and we save those names into a temporary file, rrnames.txt, as a first step.
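To see what each stage of that pipeline removes, the same filters can be tried on a few lines of made-up input. This is only a minimal sketch: the function name filter_rrnames and the sample records are hypothetical, and the sample merely approximates the kind of text that dnstable_dump -r emits.

```shell
# Sketch of the filtering stage, reading dump-style text on stdin.
# filter_rrnames is a hypothetical helper name, not part of any Farsight tool.
filter_rrnames() {
    grep -v ";;" |            # drop comment lines
    grep -v '^$' |            # drop blank lines
    grep -v ".in.addr." |     # drop IPv4 reverse names (each dot matches any character)
    grep -v ".ip6.arpa." |    # drop IPv6 reverse names
    awk '{print $1}'          # keep only the first field, the rrname
}

# Made-up sample input (assumed format, for illustration only).
printf '%s\n' \
    ';; hypothetical comment line' \
    'www.example.com. IN A 192.0.2.1' \
    '' \
    '1.2.0.192.in-addr.arpa. IN PTR www.example.com.' \
    'example.org. IN NS ns1.example.org.' |
filter_rrnames
```

Only the two forward names (www.example.com. and example.org.) should survive the filters; the comment line, the blank line and the PTR record are dropped.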

Once that job finished, we had a list of unsorted resource record names in rrnames.txt, including potentially many duplicate domain names. To eliminate duplicates, we sorted and uniq’d those names:

$ sort -u < rrnames.txt > uniq-rrnames.txt

We then counted the number of “dots” per unique name, outputting one “dot count” per record. Conveniently, because each name ends in a trailing dot, a name with two labels will also have two dots, a name with three labels will have three dots, etc. The command we used was:

$ sed 's/[^\.]//g' < uniq-rrnames.txt | awk '{ print length }' > rrname-uniq-dot-count.txt
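The dot-counting step can be checked on a handful of names. The sketch below wraps it in a function (count_dots, a name introduced here for illustration) and adds an awk-only variant that should give the same counts for names like these. It writes the bracket expression as [^.], since a dot needs no escaping inside brackets.

```shell
# Count dots in fully qualified names read from stdin, one name per line.
# Each name is assumed to end in a trailing dot, so dots == labels.
count_dots() {
    # Delete every character that is not a dot, then print what is left's length.
    sed 's/[^.]//g' | awk '{ print length }'
}

# Variant using awk alone: split each line on dots; NF - 1 is the dot count.
count_dots_awk() {
    awk -F. '{ print NF - 1 }'
}

# Names taken from the introduction, written with their trailing dots.
printf '%s\n' 'uoregon.edu.' 'www.example.com.' 'www.matse.illinois.edu.' | count_dots
```

For the three sample names this should print 2, 3 and 4, one per line.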

We then tallied the dot counts and sorted the tallies in descending order by observed frequency of occurrence. Generally, that order also ran from fewer labels to more labels, but in a couple of cases we had to move things around manually to get them in ascending order by label count.

$ sort rrname-uniq-dot-count.txt | uniq -c | sort -nr > rrname-uniq-dot-count-tops.txt
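If the goal is a table in ascending order by label count, one way to avoid the manual reordering is to sort the dot counts numerically before tallying them. The following is a sketch under that assumption, not the command used for this study; tally_by_label_count is a hypothetical name.

```shell
# Tally label counts read from stdin (one integer per line) and print
# "frequency label_count" rows in ascending order of label count.
tally_by_label_count() {
    # sort -n groups equal counts together in numeric order, uniq -c tallies
    # each group, and awk strips the padding that uniq -c adds.
    sort -n | uniq -c | awk '{ print $1, $2 }'
}

# Made-up dot counts for illustration.
printf '%s\n' 3 2 3 10 2 3 | tally_by_label_count
```

With the sample input this should print "2 2", "3 3" and "1 10". An already tallied file such as rrname-uniq-dot-count-tops.txt could instead be reordered on its second column with sort -k2,2n.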

III. The Distribution of Label Counts

The number of unique RRnames with <N> labels can be seen in Figure 1. When reviewing Figure 1, note that this graph has log-linear axes.

Figure 1. Unique RRnames with <N> Labels/Name

Summarizing that graph, 99.98% of all unique RRnames seen have 10 or fewer labels, and 78.36% have just 1, 2 or 3 labels:

Table 1.

Count # of Labels % obs Cum %
sn-5ualdn7l.gvt1.com_-.edgedl_-.release2_-.aj7czeerus1-_-.59.0.3071.115_58.0.3029.110_chrome_updater.exe.58af.un-6a4b.v3.url.zvelo.com.ecc-untangle.ecc-clinic[dot]org.

0.68.106.66.73_-.data_-.03ef32f2a1e6db82_-.r5sn-5ualdn7l.gvt1.com_-.edgedl_-.release2_-.aj7czeerus1-_-.59.0.3071.115_58.0.3029.110_chrome_updater.exe.58af.un-6a4b.v3.url.zvelo.com.ecc-untangle.ecc-clinic[dot]org.

paypal.com.us.webapps.mpp.home.signin.country.x.us.locale.x.en.us.mpp.account.selection.customer.personal.account.info.privacy.legal.contact.home.request.form-check.com.yandex[dot]ru.

Long names similar to these are sometimes used to confuse users and potentially lure them into visiting a malware-dropping or phishing site.

Other names looked like:

daewoo.daewoo.daewoo.daihatsu.daihatsu.daihatsu.daihatsu.daihatsu.daihatsu.bmw.suzuki.bmw.bmw.bmw.bmw.subaru.subaru.subaru.subaru.subaru.subaru.subaru.bmw.bmw.bmw.bmw.test.auto.testquelle[dot]de.

daewoo.daewoo.jeep.jeep.jeep.jeep.daewoo.rover.rover.rover.rover.nissan.nissan.nissan.daewoo.daewoo.daewoo.daewoo.daewoo.daewoo.daihatsu.suzuki.suzuki.suzuki.suzuki.subaru.test.staubsauger.testquelle[dot]de.

daewoo.daewoo.rover.rover.rover.rover.rover.rover.rover.rover.donkervoort.donkervoort.rover.rover.rover.rover.rover.rover.rover.rover.audi.audi.audi.audi.audi.lexus.lexus.donkervoort.was-tun-bei-motorschaden[dot]de.

These names may have been crafted in this format in a misguided attempt at improving search engine rankings.
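Names like the ones in this section can be pulled out of a list of unique names by filtering on label count. The sketch below is one hedged way to do that; names_with_more_labels_than is a hypothetical helper, and it assumes every name ends in a trailing dot, as the names in uniq-rrnames.txt do.

```shell
# Print the names from stdin that have more than the given number of labels.
# With a trailing dot on every name, NF - 1 (fields split on dots, minus one)
# equals the label count.
names_with_more_labels_than() {
    awk -F. -v min="$1" 'NF - 1 > min'
}

# Made-up names for illustration: 3 labels and 8 labels.
printf '%s\n' 'www.example.com.' 'a.b.c.d.e.f.example.com.' |
names_with_more_labels_than 4
```

Only the eight-label name should be printed. Against the real data, something like names_with_more_labels_than 10 < uniq-rrnames.txt would list the long tail.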

V. Conclusion

You now know a bit more about how many labels typically make up a domain name on the Internet, and you may be able to see how you could use DNSDB Export datasets to explore DNS-related questions of your own. Please contact Farsight Sales or visit https://www.farsightsecurity.com/order-services/ for more information about obtaining access to DNSDB Export datasets.

Appendix 1. Raw Data

Frequency # of Labels
1,535 1
96,557,627 2
85,837,909 3
22,353,662 4
16,056,670 5
3,094,765 6
6,560,952 7
1,783,962 8
193,365 9
279,332 10
7,994 11
4,039 12
6,590 13
2,378 14
2,248 15
1,426 16
859 17
679 18
555 19
616 20
500 21
520 22
419 23
336 24
393 25
552 26
342 27
214 28
779 29
6,997 30
191 31
169 32
162 33
154 34
168 35
151 36
109 37
108 38
88 39
91 40
72 41
63 42
57 43
57 44
50 45
44 46
44 47
43 48
42 49
40 50
38 51
38 52
38 53
39 54
36 55
35 56
37 57
32 58
33 59
29 60
24 61
15 62
10 63
1 64
2 67
1 70
1 71
1 72
1 79
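
The percentages quoted in Section III can be recomputed from rows in the layout above (frequency, then number of labels). The sketch below is an illustration rather than the method used for the article: label_percentages is a hypothetical name, and the sample rows are toy values, not the real data.

```shell
# Read "frequency label_count" rows on stdin (commas allowed in the numbers)
# and print: frequency, label count, percent of observations, cumulative percent.
label_percentages() {
    tr -d ',' | sort -k2,2n | awk '
        { freq[NR] = $1; labels[NR] = $2; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += freq[i]
                printf "%d %d %.2f %.2f\n", freq[i], labels[i],
                       100 * freq[i] / total, 100 * running / total
            }
        }'
}

# Toy input for illustration only (NOT the Appendix 1 values).
printf '%s\n' '1,000 1' '6,000 2' '3,000 3' | label_percentages
```

For the toy input the rows should show 10.00, 60.00 and 30.00 percent, with cumulative values of 10.00, 70.00 and 100.00. Feeding in the Appendix 1 rows without the header line should give cumulative values at 3 and 10 labels that match the 78.36% and 99.98% figures quoted in Section III.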

Joe St Sauver Ph.D. is a Scientist for Farsight Security, Inc.