
Visualizing the incidence of IPv4 addresses in Farsight's DNSDB


Recently, a customer asked, “...how large is the DNSDB population of IP addresses with ‘current’ DNS records?” I had some rough back-of-the-napkin estimates, but nothing concrete. The article that follows is my attempt at a more carefully calculated answer to that question.

To approach this problem we first need to define some elements of the question. First, we will define “current” as follows: any observation captured in the selected set of DNSDB-export mtbl file(s) is current for that set. We can run this analysis on an hour, day, month, or year of data. For this study, I chose to run three passes: one with a daily file (2017-FEB-03), one with a monthly file (2017-JAN), and one with a yearly file (2015). (At the time of this writing the 2016 yearly file was still being generated, so I chose our 2015 rollup.) We will define DNS activity as the appearance of an A record with a valid IP address within the subject file. (Because we are looking at A records, we are only looking at IPv4 address space.)

The finished product is shown below (read on to learn what it all means and how it was built):

IPv4 Hilbert Curve Heatmap

(A higher-resolution version of this image is available here)

The Tools

To facilitate the process of collecting the data I created a quick set of IPv4 mapping tools in golang to manipulate a one-bit-per-address bit-map. The tool set consists of the following three tools:

  • setup – creates an empty bit-map file,
  • addd – the “add daemon” listens for connections on a local TCP port and sets the bit for each newline-separated IP address submitted to the socket,
  • read – takes the filename of a bit-map file and outputs an ordered stream of newline-separated dotted-quad IPv4 addresses.

These tools provide a simple pipeline model for handling a large list of IPv4 addresses with Unix tools like sed, awk, grep, nmsgtool etc.
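To make the one-bit-per-address layout concrete, here is a minimal Python sketch (the tools themselves are written in Go, and their source is not shown in this post). It assumes the bit for an address lives at byte offset `address // 8`, with least-significant-bit-first ordering inside each byte. That ordering and the helper names `locate`, `set_bit` and `is_set` are illustrative assumptions, not the tools' actual layout. A map of the full 32-bit space needs 2^32 / 8 bytes, or 512 MiB.

```python
import ipaddress

FULL_MAP_BYTES = 2**32 // 8  # one bit per IPv4 address: 512 MiB


def locate(ip):
    """Return (byte offset, bit mask) for a dotted-quad address.

    Assumption: bits are packed least-significant-bit first.
    """
    index = int(ipaddress.IPv4Address(ip))
    return index >> 3, 1 << (index & 7)


def set_bit(bitmap, ip):
    """Mark an address as seen in a mutable byte buffer."""
    offset, mask = locate(ip)
    bitmap[offset] |= mask


def is_set(bitmap, ip):
    """Report whether an address has been marked as seen."""
    offset, mask = locate(ip)
    return bool(bitmap[offset] & mask)


if __name__ == "__main__":
    # A tiny buffer covering only 0.0.0.0/24 keeps the demo cheap;
    # the full map would be bytearray(FULL_MAP_BYTES).
    demo = bytearray(256 // 8)
    set_bit(demo, "0.0.0.9")
    print(is_set(demo, "0.0.0.9"), is_set(demo, "0.0.0.10"))
```

Keeping the map as a flat array means that walking it from the first byte to the last yields addresses in numeric order, which is consistent with the ordered stream that read emits.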

The Process

Let’s work through a simple example. First initialize an empty data-file:

$ ./setup data.dat 

Now start the daemon by specifying the data file and TCP port number to listen on:

$ ./addd data.dat 4444

Next we will populate the data file by transmitting a newline-separated list of dotted-quad IPv4 addresses to the TCP socket. We generate the data using the dnstable_dump command to output an ordered stream of rdata records from the subject mtbl file, then filter the output to contain only “A” records with properly formatted IPv4 addresses:

$ dnstable_dump -d dns.20170203.D.mtbl | grep -e " IN A " | awk '{print $4}' | uniq | nc localhost 4444

The IP addresses are passed over the network socket by way of the netcat utility. In this case, localhost:4444 is our instance of the addd tool. The bit mapped to the IP address of each A record in the data is then set to one. At this point, we can perform some quick analytics and get a feel for the kinds of numbers that we are looking at. For this I use the read tool to generate a list of the IP addresses whose bits are set to one in the bit-map file and pipe the output to typical command-line tools. The read tool can also be called with the argument zero, which emits a list of the IPv4 addresses whose bits are set to zero.
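For readers who would rather not rely on grep and awk, the filtering stage can be approximated in a few lines of Python. This sketch assumes each input line has the whitespace-separated shape `name IN A address`, which is what taking the fourth field with awk implies. The function name `a_record_addresses` is hypothetical. Unlike the shell pipeline, it also validates each address with the standard `ipaddress` module.

```python
import ipaddress


def a_record_addresses(lines):
    """Yield the IPv4 address from each well-formed A record line.

    Assumed line shape: "<owner name> IN A <dotted quad>".
    Adjacent duplicates are skipped, like `uniq` on ordered input.
    """
    previous = None
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[1:3] != ["IN", "A"]:
            continue  # not an A record
        try:
            address = str(ipaddress.IPv4Address(fields[3]))
        except ipaddress.AddressValueError:
            continue  # rdata is not a valid dotted quad
        if address != previous:
            yield address
        previous = address


if __name__ == "__main__":
    sample = [
        "www.example.com. IN A 192.0.2.1",
        "example.com. IN NS ns.example.com.",
        "mail.example.com. IN A 198.51.100.7",
    ]
    for address in a_record_addresses(sample):
        print(address)
```

The yielded addresses could then be written, one per line, to the socket that addd listens on.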

Let’s start with a simple line/IP count:

$ ./read data.dat one | wc -l

Collecting the output of this command for each file into a table, we see the following:

| File | Total | # In Use | % In Use | # Unused | % Unused |
|---|---|---|---|---|---|
| 2017-FEB-03 | 4,294,967,296 | 12,904,017 | 0.30% | 4,282,063,279 | 99.69% |
| 2017-JAN | 4,294,967,296 | 37,266,823 | 0.86% | 4,257,700,473 | 99.13% |
| 2015 | 4,294,967,296 | 69,044,394 | 1.60% | 4,225,922,902 | 98.39% |

We observe a little under 1% of the total IPv4 address space in DNS on a monthly basis and about a third of a percent on a daily basis. The daily in-use count held at about 12 million addresses per day over the week of data that I checked.
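The percentages in the table follow directly from the counts. As a quick check, this Python sketch recomputes them with integer arithmetic. It truncates to two decimal places rather than rounding, which is an inference from the published figures (for example 99.69 rather than 99.70) and not something stated in the text. The helper name `percent_truncated` is illustrative.

```python
TOTAL_IPV4 = 2**32  # 4,294,967,296 addresses


def percent_truncated(count, total=TOTAL_IPV4):
    """Return count/total as a percentage string, truncated to 2 decimals."""
    hundredths = count * 10000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# In-use counts taken from the table.
in_use = {"2017-FEB-03": 12904017, "2017-JAN": 37266823, "2015": 69044394}

for label, count in in_use.items():
    unused = TOTAL_IPV4 - count
    print(label, percent_truncated(count), percent_truncated(unused))
```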

Next, we can compare the daily files:

$ ./read 20170203.dat one > ips.20170203.txt
$ ./read 20170204.dat one > ips.20170204.txt
$ sort ips.20170203.txt ips.20170204.txt | uniq -d | wc -l
10720088

Comparing the day files for 2017-FEB-03 and 2017-FEB-04, we see 10,720,088 IPs that appear in both. Measured against the 12,904,017 addresses seen on 2017-FEB-03, that leaves about 2.2 million addresses' worth of churn in one day.
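Because each day is already stored as a bit-map, the same overlap could also be computed without exporting text files, by ANDing the two maps and counting the set bits. The Python sketch below illustrates the idea on two tiny hand-made bitmaps. The function names are hypothetical, and this is not how the figures above were produced.

```python
def popcount(bitmap):
    """Count the set bits in one bitmap."""
    return sum(bin(byte).count("1") for byte in bitmap)


def overlap_count(map_a, map_b):
    """Count positions whose bit is set in both equal-length bitmaps."""
    if len(map_a) != len(map_b):
        raise ValueError("bitmaps must be the same length")
    return sum(bin(a & b).count("1") for a, b in zip(map_a, map_b))


# Two toy 16-bit "days"; real maps would be 512 MiB each.
day_one = bytes([0b10110000, 0b00000001])
day_two = bytes([0b10010000, 0b00000011])

both = overlap_count(day_one, day_two)  # seen on both days
gone = popcount(day_one) - both         # seen on day one only
print(both, gone)
```

Applied to the real counts, 12,904,017 addresses on 2017-FEB-03 minus the 10,720,088 seen on both days gives 2,183,929 addresses that did not reappear the next day.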

A note about the size of IPv4 address space

Beware that the figure of 4,294,967,296 addresses is based on the entire 32-bit size of the IPv4 address space. Not all of this space is routed or even open for allocation. Carve-outs such as RFC 1918, multicast space and “Class E” (AKA “Future use”) address space reduce the overall usable space of IPv4. The DNS A record, however, ignores these limitations and simply holds a 32-bit number. Some applications use A records to carry an arbitrary 32-bit value, so some addresses found in this study are not functional IP addresses. For this exercise, we are willing to live with this.
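To get a sense of how much space the carve-outs named above account for, and to flag individual addresses that fall inside them, something like the following Python sketch could be used. The list covers only the three categories mentioned here and is not an exhaustive registry of special-purpose space. The names `CARVE_OUTS` and `carve_out` are hypothetical.

```python
import ipaddress

# Carve-outs named in the text; each entry is (label, network).
CARVE_OUTS = [
    ("RFC 1918", ipaddress.ip_network("10.0.0.0/8")),
    ("RFC 1918", ipaddress.ip_network("172.16.0.0/12")),
    ("RFC 1918", ipaddress.ip_network("192.168.0.0/16")),
    ("multicast", ipaddress.ip_network("224.0.0.0/4")),
    ("Class E", ipaddress.ip_network("240.0.0.0/4")),
]


def carve_out(ip):
    """Return the carve-out label for an address, or None if it has none."""
    address = ipaddress.IPv4Address(ip)
    for label, network in CARVE_OUTS:
        if address in network:
            return label
    return None


# Total number of addresses covered by the listed carve-outs.
carved = sum(network.num_addresses for _, network in CARVE_OUTS)
print(carved, f"{carved / 2**32:.1%}")
```

These three categories alone cover 554,762,240 addresses, roughly 13% of the 32-bit space.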

Generating the IPv4 Hilbert Curve Heatmap

After loading the data into the data.dat file, we can use the read utility to extract it and feed it into ipv4-heatmap. The ipv4-heatmap tool takes as input a stream of newline-separated dotted-quad IPv4 addresses and plots them onto a Hilbert curve heatmap (the same Hilbert curve made popular for IP mapping by xkcd). For this example we’ll use the default mode, which generates an image where each pixel in the body of the graph represents a single /24 netblock. The color of the pixel indicates how many of the individual addresses within that /24 were found in the data source.
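To give a feel for how a Hilbert curve places addresses, the Python sketch below maps the index of a /24 (the top 24 bits of the address) to an (x, y) pixel on a 4096 x 4096 grid, using the textbook distance-to-coordinate algorithm. This is a generic illustration: the orientation and starting corner may differ from what ipv4-heatmap actually draws, and the function names are hypothetical.

```python
import ipaddress

GRID_SIDE = 4096  # 4096 * 4096 = 2**24 pixels, one per /24


def hilbert_d2xy(side, distance):
    """Convert a distance along a Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two.
    """
    x = y = 0
    t = distance
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def pixel_for(ip):
    """Return the pixel that holds the /24 containing this address."""
    slash24_index = int(ipaddress.IPv4Address(ip)) >> 8
    return hilbert_d2xy(GRID_SIDE, slash24_index)


if __name__ == "__main__":
    print(pixel_for("192.0.2.1"), pixel_for("192.0.3.1"))
```

Consecutive /24s always land on neighboring pixels, which is why contiguous allocations show up as compact blocks rather than long thin stripes.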

The command is as follows:

$ ./read data.dat one | ipv4-heatmap \
  -u "/24 per pixel" \
  -h -d -i -P rdbu \
  -f ~/ipv4-heatmap/extra/LiberationSans-Regular.ttf \
  -t "IPs found within DNSDB rdata (month)" \
  -a ../slash_eight.annotations \
  -o ./images/map_day.png

The command above generates the image found at the top of this post. Hilbert curves are a powerful projection for rendering IPv4 address space. The output is compact, and because adjacent address space is plotted close together, CIDR blocks appear as rectangles. The way the numbers meander around the image takes a bit of getting used to. Once you get the hang of it, however, it becomes intuitive. The excerpt below zooms in on part of the map: the inner squares each represent a /16, and each pixel represents a /24. Black pixels had zero IP addresses in that /24. Blue to white blocks are < 50% in use, and white to red blocks are > 50% in use, with red being 100% represented.

IPv4 Hilbert Curve Heatmap excerpt
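The color scale described above comes down to counting how many of the 256 addresses in each /24 were seen. A minimal Python sketch of that tally follows. The function name `usage_by_slash24` is hypothetical, and the actual color assignment is left to ipv4-heatmap's palette.

```python
import ipaddress
from collections import Counter


def usage_by_slash24(addresses):
    """Map each /24 to the fraction of its 256 addresses that were seen."""
    seen = Counter()
    for ip in set(addresses):  # count each address only once
        prefix = int(ipaddress.IPv4Address(ip)) >> 8
        seen[prefix] += 1
    return {
        f"{ipaddress.IPv4Address(prefix << 8)}/24": count / 256
        for prefix, count in seen.items()
    }


if __name__ == "__main__":
    sample = ["192.0.2.1", "192.0.2.2", "192.0.2.2", "198.51.100.7"]
    for block, fraction in sorted(usage_by_slash24(sample).items()):
        print(block, f"{fraction:.2%}")
```

A fraction of 0 corresponds to a black pixel, values below 0.5 fall in the blue-to-white range, and values above 0.5 fall in the white-to-red range.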

Ben April is the Research Director for Farsight Security, Inc.