Introduction: What is SIE, and Why Use It? 

The Security Information Exchange (SIE) is DomainTools’ near-real-time security data-sharing service. If you’re like many people, you may wonder how SIE relates to DomainTools’ better-known product, Farsight DNSDB Passive DNS. The simplest way to think about this is:

  • Farsight DNSDB answers specific user queries based on historical DNS traffic that it has previously seen and indexed. DNSDB is a great example of a “pull model” for accessing security information: subscribers ask questions and DNSDB answers them.
  • SIE, on the other hand, is a nice example of a “push model” for security information distribution. SIE distributes near real-time threat intelligence feeds, also known as “channels”, analogous to broadcast TV channels: SIE streams cybersecurity data continually to subscribers. Subscribers can listen to traffic from a channel for insights, or distill traffic into new products or services. For example, DNSDB itself is built from processed SIE data. 

Given that last fact, some might wonder, “But why would I want to listen directly to SIE channels if the data that SIE receives gets put into DNSDB?” There are several reasons. 

One big reason for wanting direct access to SIE is that in order to get an answer from DNSDB, the user needs to already have a question in mind. DNSDB can only respond to specific questions. SIE, on the other hand, will send a continuous stream of data that can be observed and filtered to find hits that may be of interest. 

Another reason to use SIE, rather than just DNSDB, is the fact that only a portion of the data that traverses SIE actually makes it into DNSDB. For example:

  • DNSDB focuses solely on successful DNS queries, ones where the “Return Code” (or RCODE) is equal to zero (“NOERROR”). Many other DNS queries are unsuccessful. When that’s the case, those failures are signaled with an RCODE != 0 (see
    https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6 for a full list). For example:
    • Sometimes domain registrations may have accidentally been allowed to expire. When that happens, bookmarks and links may point at domains that used to work, but no longer do. Those queries will normally return NXDOMAIN (RCODE=3), meaning “the domain you asked about doesn’t exist.” 
    • NXDOMAINs are also returned for domains that have never existed, typographical errors, and mis-entered domain names.
    • Other times, name servers may be down, misconfigured, or intentionally set to not answer some queries. Those conditions will also result in non-zero RCODE values, such as SERVFAIL (RCODE=2) or REFUSED (RCODE=5).

NXDOMAINs and other errors may be important because they can represent domain generation algorithm (DGA)-related traffic, domains whose registration may have inadvertently been allowed to lapse, or domains that speculators may register in an effort to capture traffic associated with common typos, among other things.

Yet NONE of those unsuccessful DNS queries get added to DNSDB – DNSDB only records and indexes successful queries (where RCODE=0). If you wanted to know about unsuccessful queries, you’d need to look at the query failures streaming across an appropriate SIE channel, instead, such as:

  • The NXDOMAINS channel (Channel 221), containing a condensed representation of NXDOMAIN data
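
To make the RCODE distinction concrete, here is a minimal Python sketch that maps the RCODE values mentioned above to their names and reports whether a response would be recorded in DNSDB. The table covers only values 0 through 5, and the function name is illustrative; the IANA registry linked above remains the authoritative list.

```python
# RCODE names for values 0-5, as listed in the IANA DNS parameters registry.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}

def describe_rcode(rcode: int) -> str:
    """Name an RCODE and say whether DNSDB would record the response."""
    name = RCODE_NAMES.get(rcode, f"RCODE{rcode}")
    if rcode == 0:
        return f"{name}: successful, eligible for DNSDB"
    return f"{name}: unsuccessful, not recorded in DNSDB"

for code in (0, 2, 3, 5):
    print(describe_rcode(code))
```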

Another example of data that’s available via SIE, but that isn’t part of DNSDB, is non-DNS-related data. For example, current public non-DNS SIE channels include:

  • Channel 14, our Darknet (or “dark space telescope”) channel. It shows the “background radiation” IP traffic that’s constantly hitting systems on the Internet, e.g., unsolicited traffic including scans, probes, and spoofed traffic from hackers and academic researchers.
  • Channel 25, our proof-of-concept spamtrap channel that contains selected fields from unsolicited emails.
  • Channel 27, which carries newly discovered phishing URLs from PhishLabs.
  • Channel 42, which has firewall and intrusion detection system (IDS) telemetry from ThreatStop.
  • Channel 115, which has DDoS event data identified from Channel 14.

Each of those channels has interesting and powerful content, but that data isn’t DNS-based, so it doesn’t end up in DNSDB. If you want that data, you need to be subscribed to those SIE channels instead.

As yet another example of why SIE is useful, while DNSDB indexes successful DNS queries, as mentioned above, not all successful DNS datapoints are “equally interesting.” For example, some DNS queries may be “seen all the time,” including queries for major search engines, queries for popular social media sites, or queries for top online shopping sites. Those are all “mundane” or “routine” DNS datapoints. While those sites are important, queries for them are not unexpected or necessarily worthy of attention.

Many security analysts are primarily interested in novel DNS names, such as queries for NEW effective 2nd-level domain names, or queries for NEW fully qualified domain names (FQDNs). While DNSDB includes those, it doesn’t highlight or flag them in any way. We have SIE channels, however, that DO isolate and announce those, including:

  • NOD (Channel 212), Newly Observed Domains, listing new “effective 2nd-level domains”[1] that have never been seen by our sensors before.
  • NOH (Channel 213), Newly Observed Hosts, listing new hostnames (or “FQDNs”) that have never been seen by our sensors before.
  • NAD (Channel 211), Newly Active Domains, listing effective 2nd-level domains that have been seen again by one of our sensors after not having been seen for at least ten days.

These names are interesting because often attackers will set up (“stage”) new infrastructure to use for imminent attacks. Promptly discovering sites of that sort can make cyber defense possible and effective.
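
The difference between “newly observed” and “newly active” can be sketched in a few lines of Python. This is only a toy model of the idea, not a description of how the SIE channels are actually produced: the classify function, the in-memory dictionary, and the sample name are all hypothetical, and the ten-day threshold is simply taken from the NAD description above.

```python
from datetime import datetime, timedelta

QUIET_PERIOD = timedelta(days=10)  # threshold from the NAD description

def classify(last_seen: dict, name: str, now: datetime) -> str:
    """Label a sighting and record it in last_seen (name -> datetime)."""
    previous = last_seen.get(name)
    last_seen[name] = now
    if previous is None:
        return "newly observed"
    if now - previous >= QUIET_PERIOD:
        return "newly active"
    return "routine"

seen = {}
t0 = datetime(2023, 3, 1)
print(classify(seen, "example.com.", t0))                       # first sighting
print(classify(seen, "example.com.", t0 + timedelta(days=1)))   # seen one day ago
print(classify(seen, "example.com.", t0 + timedelta(days=12)))  # quiet for 11 days
```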

While all of the above is important in establishing why you might be interested in SIE, that’s not the primary focus of this piece. Our primary goal is to help you understand how you can ACCESS those channels. Let’s explore some technical options (for pricing details, please contact sales).

SIE via a Leased Blade Server (also called “SIE Direct Connect”)

The heart of SIE consists of modern high speed Ethernet switches. SIE traffic gets broadcast over those switches on “channels” (actually Ethernet VLANs). Subscribers are given access to particular channels they’ve purchased, and can “listen to” the traffic flowing over them, extracting all or selected bits of that traffic for further analysis.

The most straightforward SIE channel access option is use of an SIE blade server leased from DomainTools. You’d get full root access to that server, and it comes with SIE access software installed and configured so you can “hit the ground running.” Each blade has two network interfaces: one high speed interface connects directly to SIE while the other high speed interface provides Internet transit connectivity. Blade servers are ideal for:

  • SIE’s highest bandwidth channels. SIE Direct Connect may be the ONLY option for some particularly bandwidth-intensive channels.
  • Those who want to minimize the ongoing bandwidth consumed in transporting data from SIE to their home location (for example, to cull unwanted variables and observations first, on the blade at SIE, before transporting what’s left for further analysis.)
  • Those who may subscribe to multiple SIE channels.
  • Those who want to routinely and persistently monitor SIE channel traffic, rather than just establishing connectivity for a brief transient interval now and then.

Typical usage: 

To use a blade server to access SIE data, you normally begin by using ssh with preshared keys to connect to your blade. 
Once you’ve connected you might then use nmsgtool (see https://github.com/farsightsec/nmsg) or your own custom code to access an SIE channel you’ve subscribed to. For example, assume we want to create a series of per-minute JSON-format data files for Ch213, our Newly Observed Hosts channel, writing a new file every sixty seconds:

$ nmsgtool -C ch213 -t 60 -k '' -J ch213
^C

Details about nmsgtool can be seen in the nmsgtool man page ($ man nmsgtool). Decoding this specific command invocation:

$             the shell prompt (don’t type this in!)
nmsgtool      our command line convenience utility to read a channel
-C ch213      Specify the channel we want to access, in this case, Ch213 (“NOH”)
-t 60 -k ''   Every 60 seconds, rotate the file (note that '' is two single tick marks, not one double)
-J ch213      Tag the JSON Lines format output file with “ch213”
^C            Hit CTRL-C to interrupt the data capture process after it’s been running for a while

We should emphasize that:

  • While that sample accesses ch213, most other channels would work the same way; there’s nothing unique about Ch213
  • We chose to rotate files every sixty seconds, but you can rotate files every five minutes or every hour – it’s up to you
  • We asked for JSON Lines output, but you can also get other formats

You can see examples of what some filenames look like from running that command:

$ ls -lh
[…]
-rw-r--r-- 1 […]  14M Mar  7 01:57 ch213.20230307.0156.1678154160.095391031.json
-rw-r--r-- 1 […]  31M Mar  7 01:58 ch213.20230307.0157.1678154220.085093474.json
-rw-r--r-- 1 […]  41M Mar  7 01:59 ch213.20230307.0158.1678154280.028651345.json
-rw-r--r-- 1 […]  31M Mar  7 02:00 ch213.20230307.0159.1678154340.042128085.json
-rw-r--r-- 1 […]  20M Mar  7 02:00 ch213.20230307.0200.1678154400.041951015.json
-rw-r--r-- 1 […]  19M Mar  7 02:02 ch213.20230307.0201.1678154460.002206639.json
[…]
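
Judging from the listing above, each filename appears to consist of the tag given to -J, a date, an hour-and-minute stamp, a Unix epoch time in seconds, and a nanosecond component. Assuming that layout (it is inferred from the sample names here, not taken from the nmsgtool documentation), a short Python sketch can recover the UTC start time of each file, which may be handy when selecting files to export. The function name is hypothetical.

```python
from datetime import datetime, timezone

def parse_rotated_name(filename: str) -> tuple[str, datetime]:
    """Split '<tag>.<YYYYMMDD>.<HHMM>.<epoch>.<nanos>.json' into (tag, start time)."""
    tag, _date, _hhmm, epoch, _nanos, _ext = filename.split(".")
    # The epoch field is assumed to be seconds since 1970-01-01 UTC.
    return tag, datetime.fromtimestamp(int(epoch), tz=timezone.utc)

tag, started = parse_rotated_name("ch213.20230307.0156.1678154160.095391031.json")
print(tag, started.isoformat())
```

For the first file in the listing, the epoch field decodes to 01:56:00 UTC on 7 March 2023, which agrees with the date and HHMM fields in the name.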

You might then use sftp, scp, or rsync to export those data files to a remote system. For example, from a remote system you might pull those data files with rsync, either interactively or routinely from cron:

$ rsync --stats -h mybladeserver-name.fsi.io:somesubdirectory/*.json .

Number of files: 11
Number of files transferred: 11
Total file size: 265.45M bytes
Total transferred file size: 265.45M bytes
Literal data: 265.45M bytes
Matched data: 0 bytes
File list size: 446
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 258
Total bytes received: 265.51M

sent 258 bytes  received 265.51M bytes  9.66M bytes/sec
total size is 265.45M  speedup is 1.00

Details about using rsync can be found in the rsync man page ($ man rsync).

You can then process the files you’ve downloaded, perhaps beginning by checking to see what effective 2nd-level-domains are most heavily represented in that set of files (a copy of the 2nd-level-dom-large script we like for that is attached as an appendix). We can also use jq (see https://stedolan.github.io/jq/) to extract just selected fields of interest from our data files, and then process the extracts with a Un*x command pipeline. (The domains in the following command output have been “defanged” by replacing a “.” in the domain name with “[dot]”)

$ cat ch213* | jq -r '.message.rrname' | 2nd-level-dom-large | sort | uniq -c | sort -nr | more
  78831 akamaihd[dot]net
  76932 ttd-a[dot]com
  67470 sbb[dot]ch
  36883 blizzardgames[dot]cn
  12586 kinderramadan[dot]com
  10913 allposters[dot]com
   4282 roadshowkinder[dot]pl
   4244 kindercrazyfriends[dot]pl
   4138 kindercrazyfriends[dot]com.pl
   3892 teenidolskinderjoyroadshow[dot]eu
   3204 bme2[dot]net
   2908 sensic[dot]net
   2896 delice[dot]cz
   2697 hawatalk[dot]com
   2572 cargo[dot]site
   2399 ringcentral[dot]com
   2331 lustaufgrillen[dot]de
   2255 bancointer[dot]com.br
   2183 idemia[dot]io
   1876 kristarr[dot]com
   1755 nintendo[dot]net
   [etc]
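
For readers who prefer a script to a shell pipeline, the same tally can be sketched in Python. The version below reads JSON Lines text, pulls out message.rrname as the jq filter does, and counts the results. One loud assumption: naive_base_domain simply keeps the last two labels, so unlike 2nd-level-dom-large it is NOT Public Suffix List aware and will mishandle names under multi-label suffixes such as com.br. It is a placeholder for a real reducer, and all names in the sample data are made up.

```python
import json
from collections import Counter

def naive_base_domain(rrname: str) -> str:
    """Placeholder reducer: keep the last two labels (not PSL-aware)."""
    labels = rrname.rstrip(".").split(".")
    return ".".join(labels[-2:])

def tally(lines, reducer=naive_base_domain) -> Counter:
    """Count reduced rrnames across an iterable of JSON Lines strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rrname = json.loads(line).get("message", {}).get("rrname")
        if rrname:
            counts[reducer(rrname)] += 1
    return counts

sample = [
    '{"message": {"rrname": "a.example.com."}}',
    '{"message": {"rrname": "b.example.com."}}',
    '{"message": {"rrname": "www.example.net."}}',
]
for domain, count in tally(sample).most_common():
    print(f"{count:7d} {domain}")
```

To run it over downloaded files, pass an open file object (or fileinput.input()) as the lines argument instead of the sample list.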

SIE Remote Access (aka “SRA” or “sratool/sratunnel,” Sometimes Also Referred to as “AXA”)

SIE Remote Access, sometimes referred to as “AXA” (“Advanced eXchange Access”), is an SIE encrypted tunnel service for delivering data directly to customers’ environments. It is NOT the same as the confusingly similarly named “AXAMD,” a RESTful interface to SIE sometimes used by developers for web integrations.

Not every channel requires a dedicated SIE blade server. Most channels can be delivered from SIE to the analyst’s remote workstation using SIE Remote Access–that is, via an encrypted network tunnel. That tunnel can be turned up when needed, used for a while, and then terminated – perfect for use cases where dedicated access to high bandwidth channels is NOT required.

There are two versions of SRA: AXA Protocol v1 and AXA Protocol v2. The two versions are NOT interoperable. The services run on different port numbers, use different versions of the required client software, and use different authentication methods. All SIE Remote Access users should now use AXA Protocol v2, the current version.

We’ll now briefly introduce use of the NEW version of SIE Remote Access (based on AXA Protocol v2). Using SIE Remote Access requires AXA client software. DomainTools provides binaries for Debian-based Linux distributions. Building AXA from source involves satisfying a relatively long list of dependencies, but is quite doable on RHEL-based Linux as well as current generation Macs (Intel or M1), among other hosts. To get set up to use SRA, get and build the AXA toolkit as described in the AXA repository (https://github.com/farsightsec/axa). After successfully building AXA, the sratool and sratunnel commands will be installed on your local system. You can check the version information with the capital vee (-V) option:

$ sratool -V
sratool built using AXA library 3.0.0, supporting AXA protocols v1 to v2; currently using v2
client HELLO: {"hostname":"[elided]","uname_sysname":"Darwin","uname_release":"22.3.0",
"uname_version":"Darwin Kernel Version 22.3.0: Mon Jan 30 20:39:35 PST 2023; 
root:xnu-8792.81.3~2/RELEASE_ARM64_T8103","uname_machine":"arm64","origin":"sratool",
"libaxa":"3.0.0","libnmsg":"1.0.1","libwdns":"0.11.0","libyajl":20100,
"OpenSSL":"OpenSSL 1.1.1t  7 Feb 2023","AXA protocol":2}

$ sratunnel -V
sratunnel built using AXA library 3.0.0, AXA protocol 2 in 1 to 2 

To work with sratool and sratunnel, create a config file called ~/.axa/config (note both the leading tilde and the dot before axa). Using your favorite editor, add the line:

alias:sra-v2=apikey:your_API_key_here@axa-sie.domaintools.com,49500

Save the file and ensure it is NOT readable by other users on your system:

$ chmod 0600 ~/.axa/config
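
If you’d like to double-check the result from a script, the following Python sketch tests whether group or other users have any access to the file. It is an illustration only: is_private is a hypothetical helper, and it is not necessarily the same check the AXA tools themselves perform.

```python
import os
import stat

def is_private(path: str) -> bool:
    """True if neither group nor other has any permission bits set."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

config = os.path.expanduser("~/.axa/config")
if os.path.exists(config):
    print(config, "private" if is_private(config) else "too permissive")
else:
    print(config, "not found")
```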

Try connecting with sratool. sratool is meant to be used as an interactive debugging tool for confirming SIE access; sratunnel will normally be used as the production service delivery conduit. Details about sratool are available in its man page ($ man sratool). (Output defanged below by replacing an actual dot with [dot].)

$ sratool
	sra> conn sra-v2
	* HELLO srad v3.0.1 axa-sie-1 supporting AXA protocols v1 to v2; currently using v1
	* Using AXA protocol 2
	* OK USER jsmith authorized
	sra> list
	[the list of provisioned SIE channels gets displayed here]
	ch14 off enp1s0f1.14
 	ch24 off 10.32.24.255/8430 10.32.24.255/9430
	ch25 off 10.32.25.255/8430 10.32.25.255/9430
	[...]
	ch255 off 10.32.255.255/8430 10.32.255.255/9430
	sra> 10 watch ch=213
	sra> channel 213 on
	10 ch213  {"time":"2023-03-21 20:18:58.397164404","vname":"SIE","mname":"newdomain", 	"source":"a1ba02cf","message":{"domain":"thebalance[dot]com.",
	"time_seen":"2023-03-21 20:18:41","bailiwick":"thebalance[dot]com.",
	"rrname":"tinker1-rbass.atlas.thebalance[dot]com.","rrclass":"IN","rrtype":"CNAME",
	"rdata":["ops-us-east-1-redirect-alb-1126811726.us-east-1.elb.amazonaws[dot]com."],
	"keys":[],"new_rr":[]}}
	10 ch213  {"time":"2023-03-21 20:18:58.397256656","vname":"SIE","mname":"newdomain",
	"source":"a1ba02cf","message":{"domain":"thebalance[dot]com.",
	"time_seen":"2023-03-21 20:17:13","bailiwick":"thebalance[dot]com.",
	"rrname":"pearl2.s201.atlas.thebalance[dot]com.","rrclass":"IN","rrtype":"CNAME",
	"rdata":["ops-us-east-1-redirect-alb-1126811726.us-east-1.elb.amazonaws[dot]com."],
	"keys":[],"new_rr":[]}}	
	[etc]
	CTRL-C
	sra> quit

(JSON Lines observations in the preceding output have had manual line breaks added to help with their display here)
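
Each observation is a single JSON object, so custom code can pick out fields with any JSON parser. The Python sketch below uses a shortened observation with the same field names as the output above, but with reserved example names substituted so that nothing real is referenced. The summarize helper is hypothetical.

```python
import json

def summarize(observation: str) -> str:
    """Return 'rrname rrtype rdata[,rdata...]' for one JSON Lines observation."""
    msg = json.loads(observation)["message"]
    return f'{msg["rrname"]} {msg["rrtype"]} {",".join(msg["rdata"])}'

line = (
    '{"time":"2023-03-21 20:18:58.397164404","vname":"SIE","mname":"newdomain",'
    '"source":"a1ba02cf","message":{"domain":"example.com.",'
    '"rrname":"host1.example.com.","rrclass":"IN","rrtype":"CNAME",'
    '"rdata":["target.example.net."]}}'
)
print(summarize(line))
```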
Now let’s try sratunnel. Details about sratunnel can be seen in its man page ($ man sratunnel).

$ sratunnel -s 'sra-v2' -c 213 -w ch=213 -o nmsg:udp:127.0.0.1,8000 &

Traffic will now be streaming across the tunnel. At this point, you can access that tunneled traffic with nmsgtool or custom code you’ve written. For example:

$ nmsgtool -l 127.0.0.1/8000 -o -
[125] [2023-01-11 23:31:45.324660321] [2:5 SIE newdomain] [a1ba02cf] [] [] 
domain: u2d-ka106[dot]de.
time_seen: 2023-01-11 23:30:43
rrname: u2d-ka106[dot]de.
rrclass: IN (1)
rrtype: NS (2)
rdata: ns1103.ui-dns[dot]de.
rdata: ns1103.ui-dns[dot]biz.
rdata: ns1103.ui-dns[dot]com.
rdata: ns1103.ui-dns[dot]org.
[etc]
CTRL-C

Once the sratunnel is running, you can also use any other nmsgtool command (such as writing minutely files as we showed for the SIE Direct Connect example previously).
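
As one sketch of what such custom code might look like, the Python below keeps only observations of a chosen record type from a stream of JSON Lines. It assumes the tunnel traffic has already been converted to JSON upstream, for example by piping the output of nmsgtool -l 127.0.0.1/8000 -J - into the script. Using “-” with -J to mean standard output is an assumption here, so check the nmsgtool man page. The function name, the sample data, and the choice of NS records are purely illustrative.

```python
import json

def filter_rrtype(lines, rrtype: str):
    """Yield parsed observations whose message.rrtype matches rrtype."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obs = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if obs.get("message", {}).get("rrtype") == rrtype:
            yield obs

sample = [
    '{"message": {"rrname": "example.de.", "rrtype": "NS"}}',
    'not json',
    '{"message": {"rrname": "www.example.de.", "rrtype": "A"}}',
]
for obs in filter_rrtype(sample, "NS"):
    print(obs["message"]["rrname"])
```

In practice the lines argument would be sys.stdin rather than a fixed list, so that observations are handled as they arrive.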

Be sure to also kill the tunnel running in the background once you’re done. To do so, AFTER you’ve killed any nmsgtool running:

$ fg          ← brings the background job back to the foreground
CTRL-C        ← kills the tunnel

SIE Batch 

If the preceding options seem too complex, you may want to try SIE Batch, our solution for periodically downloading files of cached SIE data. It is one of the simplest ways to get traffic from SIE channels.  Here’s how it works:

  • The SIE Batch servers listen to select channels for subscribers, temporarily caching up to half a day’s worth of traffic for all but the hottest-running SIE channels.
  • Subscribers can access and download that data using an interactive point-and-click web page, a Un*x command line client, or via the SIE Batch API. 

One cool advantage of SIE Batch: if you’re okay with using data that’s already cached by SIE Batch, you don’t need to wait while data comes in “live” the way you would if you were using SIE Direct Connect or SIE Remote Access. You can just instantly grab up to half a day’s worth of data and begin your analysis.

SIE Batch can also “save your bacon” if an event occurred when you weren’t watching a relevant SIE channel you subscribe to: SIE Batch will let you go “back in time” for up to half a day and grab data that might otherwise have been irretrievably lost “downstream.”

So, SIE Batch is very convenient and easy to use, but it does still have some limitations, including:

  • Not All Channels Are Available in SIE Batch: Some channels may run at too high of a bitrate to be accessed via SIE Batch.
  • Batching Adds Latency: Many people will want data from SIE channels with as little latency as possible, but SIE Batch unavoidably adds latency while data is accumulated for your next “batch” (this can be minimized by using a short interval for batches). 
  • SIE Batch Still Requires Users to Work With NMSG Format Binary Data: Many SIE channels still get delivered in nmsg (binary) format, so you still need to build and use nmsgtool to read and convert the binary files you’ve downloaded via SIE Batch.

We have written a lot more about SIE Batch. See https://www.domaintools.com/resources/blog/whats-sie-batch-why-might-i-be-interested-in-it/ for more on using SIE Batch.

Command line clients that can be used to access SIE Batch (convenient for calling from cron, etc.) are described in
https://www.farsightsecurity.com/assets/documents/SIE-batch-api-command-line-client-whitepaper_1-0.pdf

AXAMD

AXAMD is a RESTful API endpoint for accessing SIE channel data. It also includes both a command line interface client and a Python3 binding. The acronym AXAMD stands for “Advanced eXchange Access Middleware Daemon.”

We recently discussed AXAMD in a mid-length blog article, ‘Using AXAMD to Read Observations from NOD SIE Ch212 (“Newly Observed Domains”) with Python3,’ so we recommend reviewing that article for more information about AXAMD.

Colocation or Cross-Connects

Some may like the idea of working from a system located at or near SIE, but may need a system other than one of the company’s blade servers. In cases of that sort, colocating a customer-provided system with SIE may be a good option to explore. There are many considerations involved in arranging this, including:

  • Arranging for installation
  • Rack space/power/cooling requirements
  • Out-of-band access and remote hands support, etc.

If you’re interested in colocating a system with SIE, please contact your Sales Executive to discuss your requirements.

In other cases, a site may already have their own rack or cage at the same datacenter as SIE. In that case, a network cross-connect from the customer’s cage to the SIE cage may be a worthwhile option to explore. Again, please contact your Sales Executive to discuss network cross-connect options.

Conclusion

We hope this overview of SIE Access Options has been helpful. If you still have questions, please contact us.
We’d be happy to address your questions or concerns.

Appendix: 2nd-level-dom-large

$ cat 2nd-level-dom-large
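#!/usr/bin/perl
# Read hostnames on STDIN, one per line, and print each name's
# effective 2nd-level domain (public suffix plus one more label).
use strict;
use warnings;
use IO::Socket::SSL::PublicSuffix;

# Local copy of the Public Suffix List (see https://publicsuffix.org/)
my $pslfile = '/usr/local/share/public_suffix_list.dat';
my $ps = IO::Socket::SSL::PublicSuffix->from_file($pslfile);

while (my $line = <STDIN>) {
        chomp($line);
        next if $line eq '';    # skip blank input lines
        # Second argument 1 = return the public suffix plus one more label
        my $root_domain = $ps->public_suffix($line,1);
        # Skip names that yield no result (avoids an undef warning)
        next unless defined $root_domain && length $root_domain;
        printf( "%s\n", $root_domain );
}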
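
To show what the script above is computing, here is a Python sketch of the same idea with a tiny, hand-picked suffix set standing in for the real Public Suffix List. It is a deliberate simplification: only plain suffix rules are handled (the real list also contains wildcard and exception rules), and SUFFIXES and effective_2ld are hypothetical names. A maintained PSL library is the better choice for real work.

```python
# Tiny stand-in for the Public Suffix List (assumption: plain rules only).
SUFFIXES = {"com", "net", "pl", "com.pl", "br", "com.br"}

def effective_2ld(name: str):
    """Return the longest matching public suffix plus one label, or None."""
    labels = name.rstrip(".").lower().split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            if i == 0:
                return None  # the name is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched

print(effective_2ld("www.example.com.pl."))
print(effective_2ld("a.b.example.com.br"))
print(effective_2ld("cdn.example.net."))
```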

[1] “Effective 2nd-level domains” include both domains registered directly under TLDs such as com, net, org, edu, mil, gov, etc., as well as domains registered directly under more-complex multi-label “effective TLDs” as defined by the Public Suffix List (see https://publicsuffix.org/).