Blog Farsight TXT Record

How Many "Parts" (or "Labels") Does A Domain Name Typically Have?

I. Introduction

Domain names can be thought of as a series of “labels” (or “parts,” or “chunks”) separated by dots.

We’re all familiar with domain names that have three labels, such as www.example.com.

It’s also not unusual to find domain names that have just two labels (such as uoregon.edu) or an “extra” label (as in www.matse.illinois.edu).

But what does the distribution of label counts really look like? Are there domain names with more than four labels? More than ten? To find out, we looked at a day’s worth of data from DNSDB.

II. The Dataset

We arbitrarily selected June 28th, 2017 for this study. The DNSDB Export MTBL file for that date, dns.20170628.D.mtbl, is a medium-sized file at 31,626,779,104 bytes (roughly 31.6 GB). That mtbl file is in a compact binary format, but it can be exported into a more convenient text format for simple analyses.

We processed that file with the command pipeline:

$ dnstable_dump -r /export/dnstable/mtbl/dns.20170628.D.mtbl | grep -v ";;" | grep -v "^$" | grep -v "\.in-addr\.arpa\." | grep -v "\.ip6\.arpa\." | awk '{print $1}' > rrnames.txt

That pipeline dumps the records in the specified mtbl file as text, removes any:

-- comment lines (";;")
-- blank lines ("^$")
-- IPv4 reverse-lookup names (".in-addr.arpa.") and
-- IPv6 reverse-lookup names (".ip6.arpa."),

and then keeps only the first field of each remaining line: the resource record name (rrname), i.e., the domain name. As a first step, we saved those rrnames into a temporary file.
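The filtering steps above can also be sketched in Python, for example if you would rather post-process dump output in a script than in a shell pipeline. This is a minimal sketch, assuming (as the original awk '{print $1}' step implies) that the owner name is the first whitespace-separated field of each text line; the sample lines used to exercise it are hypothetical. Unlike grep -v, which drops a line if the pattern appears anywhere in it, this version only checks whether the owner name itself ends in a reverse-lookup suffix.

```python
# Owner-name suffixes for IPv4 and IPv6 reverse-lookup zones.
REVERSE_SUFFIXES = (".in-addr.arpa.", ".ip6.arpa.")


def extract_rrnames(lines):
    """Yield the owner name (first whitespace-separated field) of each
    dump line, skipping blank lines, ";;" comment lines, and
    reverse-lookup names.

    Assumption: the text dump puts the owner name in the first field.
    """
    for line in lines:
        if not line.strip():
            continue  # blank (or whitespace-only) line
        if ";;" in line:
            continue  # comment line, as with grep -v ";;"
        rrname = line.split()[0]
        if rrname.lower().endswith(REVERSE_SUFFIXES):
            continue  # IPv4 or IPv6 reverse-lookup name
        yield rrname
```

Because it is a generator, it can consume a large dump line by line (for example, iterating over an open file) without holding the whole file in memory.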

Once that job finished, we had a list of unsorted resource record names in rrnames.txt, including potentially many duplicate domain names. To eliminate duplicates, we sorted and uniq’d those names:

$ sort -u < rrnames.txt > uniq-rrnames.txt

We then counted the number of dots in each unique name, writing one dot count per line. Conveniently, because each name ends in a trailing dot, the dot count equals the label count: a name with two labels has two dots, a name with three labels has three dots, and so on. We counted the dots with:

$ sed 's/[^.]//g' < uniq-rrnames.txt | awk '{ print length }' > rrname-uniq-dot-count.txt
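Counting dots works because every name ends in a trailing dot. One caveat, offered as an assumption about the text format rather than a documented property of the dump: if names are written in standard DNS presentation format, a literal dot inside a label appears as a backslash-escaped dot and should not be counted as a label separator. A minimal Python sketch of a label counter that skips escaped characters:

```python
def label_count(name: str) -> int:
    """Count the labels in a fully qualified name that ends in a dot.

    Like the sed/awk step, this counts dots, but it ignores any character
    that follows a backslash, so an escaped dot inside a label is not
    treated as a separator. Assumes presentation-format escaping.
    """
    count = 0
    escaped = False
    for ch in name:
        if escaped:
            escaped = False  # this character was escaped; skip it
        elif ch == "\\":
            escaped = True   # the next character is escaped
        elif ch == ".":
            count += 1       # an unescaped dot ends a label
    return count
```

For example, label_count("www.matse.illinois.edu.") returns 4, matching the four-label example in the introduction.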

We then tallied how often each dot count occurred and sorted the tallies in descending order of frequency. Frequency generally falls as label count rises, but not strictly (for example, 7-label names outnumber 6-label names), so we reordered some rows by hand to present them in ascending order by label count.

$ sort rrname-uniq-dot-count.txt | uniq -c | sort -nr > rrname-uniq-dot-count-tops.txt
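The deduplicate, tally, and sort steps can be sketched together in Python with collections.Counter. Sorting the tally by label count rather than by frequency yields the ascending label-count order directly, with no manual reordering. This sketch assumes every name carries a trailing dot, so its dot count equals its label count:

```python
from collections import Counter


def label_histogram(rrnames):
    """Return (label_count, unique_name_count) pairs in ascending order
    of label count.

    Names are deduplicated first (like `sort -u`), then dots are counted
    per name (like the sed/awk step), then tallied (like `uniq -c`).
    """
    unique_names = set(rrnames)
    tally = Counter(name.count(".") for name in unique_names)
    return sorted(tally.items())  # sort by label count, not frequency
```

In the original shell pipeline, sort -n rrname-uniq-dot-count.txt | uniq -c would likewise list the tallies in ascending order of label count.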

III. The Distribution of Label Counts

The number of unique RRnames with <N> labels can be seen in Figure 1. Note that this graph has log-linear axes.

Figure 1. Unique RRnames with <N> Labels/Name

Summarizing that graph, 99.98% of all unique RRnames seen have 10 or fewer labels, and 78.36% have just 1, 2 or 3 labels:

Table 1.

Count        # of Labels       % obs     Cum %
sn-5ualdn7l.gvt1.com_-.edgedl_-.release2_-.aj7czeerus1-_-.59.0.3071.115_58.0.3029.110_chrome_updater.exe.58af.un-6a4b.v3.url.zvelo.com.ecc-untangle.ecc-clinic[dot]org.

0.68.106.66.73_-.data_-.03ef32f2a1e6db82_-.r5sn-5ualdn7l.gvt1.com_-.edgedl_-.release2_-.aj7czeerus1-_-.59.0.3071.115_58.0.3029.110_chrome_updater.exe.58af.un-6a4b.v3.url.zvelo.com.ecc-untangle.ecc-clinic[dot]org.

paypal.com.us.webapps.mpp.home.signin.country.x.us.locale.x.en.us.mpp.account.selection.customer.personal.account.info.privacy.legal.contact.home.request.form-check.com.yandex[dot]ru.

Long names similar to these are sometimes used to confuse users and potentially lure them into visiting a malware-dropping or phishing site.
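As an illustration of how names like the first one might be spotted, here is a simple hypothetical heuristic (not a method from this study): check whether the labels of a well-known domain, such as paypal.com, appear as a contiguous run somewhere other than at the end of a name. It assumes you supply the domains to check, and it does not consult the Public Suffix List, so treat matches as leads rather than verdicts.

```python
def embeds_domain(name: str, domain: str) -> bool:
    """Return True if `domain`'s labels appear as a contiguous run inside
    `name` anywhere except at the very end of the name.

    Hypothetical lookalike heuristic: a name that genuinely sits under
    `domain` (ends with it) is not flagged.
    """
    labels = name.lower().rstrip(".").split(".")
    target = domain.lower().rstrip(".").split(".")
    if labels[-len(target):] == target:
        return False  # the name really lives under `domain`
    # Check every start position except the final (suffix) position.
    last_start = len(labels) - len(target)
    return any(labels[i:i + len(target)] == target for i in range(last_start))
```

A name that begins with paypal.com but is registered under some other domain is flagged, while www.paypal.com itself is not.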

Other names looked like:

daewoo.daewoo.daewoo.daihatsu.daihatsu.daihatsu.daihatsu.daihatsu.daihatsu.bmw.suzuki.bmw.bmw.bmw.bmw.subaru.subaru.subaru.subaru.subaru.subaru.subaru.bmw.bmw.bmw.bmw.test.auto.testquelle[dot]de.

daewoo.daewoo.jeep.jeep.jeep.jeep.daewoo.rover.rover.rover.rover.nissan.nissan.nissan.daewoo.daewoo.daewoo.daewoo.daewoo.daewoo.daihatsu.suzuki.suzuki.suzuki.suzuki.subaru.test.staubsauger.testquelle[dot]de.

daewoo.daewoo.rover.rover.rover.rover.rover.rover.rover.rover.donkervoort.donkervoort.rover.rover.rover.rover.rover.rover.rover.rover.audi.audi.audi.audi.audi.lexus.lexus.donkervoort.was-tun-bei-motorschaden[dot]de.

These names may have been crafted in this format in a misguided attempt at improving search engine rankings.
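One hypothetical way to score names like these is the fraction of labels that repeat an earlier label. This is not a technique used in this study, and any cutoff for flagging a name would be your own choice:

```python
from collections import Counter


def repeated_label_ratio(name: str) -> float:
    """Fraction of a name's labels that repeat an earlier label.

    0.0 means every label is distinct; values approaching 1.0 mean the
    name is built mostly from the same few words repeated.
    """
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    if not labels:
        return 0.0
    counts = Counter(labels)
    repeats = sum(n - 1 for n in counts.values())  # extra occurrences only
    return repeats / len(labels)
```

An ordinary name such as www.example.com scores 0.0, while a name built from "bmw.bmw.bmw.bmw." followed by four distinct labels scores 0.375.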

V. Conclusion

You now know a bit more about how many labels typically make up a domain name on the Internet, and you may be able to see how you could use DNSDB Export datasets to explore DNS-related questions of your own. Please contact Farsight Sales or visit https://www.farsightsecurity.com/order-services/ for more information about obtaining access to DNSDB Export datasets.

Appendix 1. Raw Data

Frequency     # of Labels
1,535         1
96,557,627    2
85,837,909    3
22,353,662    4
16,056,670    5
3,094,765     6
6,560,952     7
1,783,962     8
193,365       9
279,332       10
7,994         11
4,039         12
6,590         13
2,378         14
2,248         15
1,426         16
859           17
679           18
555           19
616           20
500           21
520           22
419           23
336           24
393           25
552           26
342           27
214           28
779           29
6,997         30
191           31
169           32
162           33
154           34
168           35
151           36
109           37
108           38
88            39
91            40
72            41
63            42
57            43
57            44
50            45
44            46
44            47
43            48
42            49
40            50
38            51
38            52
38            53
39            54
36            55
35            56
37            57
32            58
33            59
29            60
24            61
15            62
10            63
1             64
2             67
1             70
1             71
1             72
1             79
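As a cross-check, the % obs and Cum % columns of Table 1, and the 99.98% and 78.36% figures quoted in Section III, can be recomputed from these raw counts. A minimal Python sketch (the dictionary simply transcribes Appendix 1):

```python
# Appendix 1 raw data: {number of labels: number of unique RRnames}.
APPENDIX_1 = {
    1: 1_535, 2: 96_557_627, 3: 85_837_909, 4: 22_353_662, 5: 16_056_670,
    6: 3_094_765, 7: 6_560_952, 8: 1_783_962, 9: 193_365, 10: 279_332,
    11: 7_994, 12: 4_039, 13: 6_590, 14: 2_378, 15: 2_248, 16: 1_426,
    17: 859, 18: 679, 19: 555, 20: 616, 21: 500, 22: 520, 23: 419,
    24: 336, 25: 393, 26: 552, 27: 342, 28: 214, 29: 779, 30: 6_997,
    31: 191, 32: 169, 33: 162, 34: 154, 35: 168, 36: 151, 37: 109,
    38: 108, 39: 88, 40: 91, 41: 72, 42: 63, 43: 57, 44: 57, 45: 50,
    46: 44, 47: 44, 48: 43, 49: 42, 50: 40, 51: 38, 52: 38, 53: 38,
    54: 39, 55: 36, 56: 35, 57: 37, 58: 32, 59: 33, 60: 29, 61: 24,
    62: 15, 63: 10, 64: 1, 67: 2, 70: 1, 71: 1, 72: 1, 79: 1,
}


def percentage_table(hist):
    """Return (labels, count, pct_obs, cum_pct) rows in ascending order of
    label count, mirroring the columns of Table 1."""
    total = sum(hist.values())
    rows, running = [], 0
    for labels in sorted(hist):
        count = hist[labels]
        running += count  # cumulative count up to and including this row
        rows.append((labels, count, 100.0 * count / total,
                     100.0 * running / total))
    return rows


if __name__ == "__main__":
    # Print the first ten rows: count, # of labels, % obs, cum %.
    for labels, count, pct, cum in percentage_table(APPENDIX_1)[:10]:
        print(f"{count:>12,}  {labels:>3}  {pct:6.2f}%  {cum:6.2f}%")
```

The counts sum to 232,760,529 unique RRnames; the cumulative share reaches about 78.36% at 3 labels and about 99.98% at 10 labels.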

Joe St Sauver Ph.D. is a Scientist for Farsight Security, Inc.