
Non-U.S. Universities with .edu Domain Names: They're More Common Than You Might Think

1. Introduction

In general, the eligibility policy for the .edu domain states:

Only U.S. postsecondary institutions that are institutionally accredited by an agency on the U.S. Department of Education’s list of Nationally Recognized Accrediting Agencies may obtain an Internet name in the .edu domain.

However, that policy needs to be read in light of a “grandfather clause”, which clarifies:

The Cooperative Agreement between EDUCAUSE and the U.S. Department of Commerce specifies that all .edu names in existence as of October 29, 2001 are “grandfathered,” regardless of current or past eligibility requirements.

We were curious to see how many 2nd-level dot edu domains (including grandfathered edus) actually map to non-US IP address space. This has some practical importance, since people often forget that users coming from legacy dot edu domains may not be in the United States.

Now obviously, a non-US university could elect to host their domain in US address space, or a US university could choose to host their domain in non-US address space, but for the most part we’d expect to see US universities in US IP address space, and international universities in non-US address space.

So can we identify dot edu domains that are hosted outside the US? It turns out that yes, yes we can.

2. Dot Edu Domains Seen In Real Time At The Security Information Exchange

Working from a local SIE blade server, we begin by capturing a small sample of 5,000,000 observations from Channel 204, our deduplicated/filtered/verified passive DNS channel. We’ll just keep the fully qualified domain names (FQDNs) we saw from those observations. This short collection period was meant to capture dot edu base domains that are in routine/continual use. It took just a little over four minutes to collect those observations with the command:

$ nmsgtool -C ch204 -c 5000000 | grep rrname | awk '{print $2}' > ch204.txt
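If you would rather do that filtering step in Python than with grep and awk, the following is a minimal sketch. It assumes that nmsgtool's presentation output contains lines of the form `rrname: <name>`, which is what the pipeline above implies. The function name `extract_rrnames` and the sample text are made up for illustration.

```python
def extract_rrnames(lines):
    """Yield the value of every 'rrname:' field found in the given lines."""
    for line in lines:
        fields = line.split()
        # Similar to `grep rrname | awk '{print $2}'`, but stricter: only
        # lines whose first field is exactly "rrname:" are kept.
        if len(fields) >= 2 and fields[0] == "rrname:":
            yield fields[1]


# Hypothetical excerpt of presentation-format output (assumed layout)
sample = """\
type: INSERTION
rrname: uoregon.edu.
rrtype: A
rdata: 128.223.142.244
""".splitlines()

print(list(extract_rrnames(sample)))
```

In a real run you would pass `sys.stdin` to the function instead of the sample list.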

We now want to simplify those observed names by converting the FQDNs we found into effective 2nd level domain names using a small Perl script that checks the Public Suffix List (that script, and another used in this article, follow the body of the article):

$ 2nd-level-dom < ch204.txt | sort -u > ch204-2nd-level.txt
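To make the idea behind that step concrete, here is a rough Python sketch of the core rule: find the longest matching public suffix, then keep one more label. It is not the Perl script from Appendix I. The suffix set is a tiny hard-coded stand-in for the real Public Suffix List, the wildcard, exception, and default rules are deliberately left out, and the name `second_level` is our own invention.

```python
# Tiny illustrative suffix set; a real run would load public_suffix_list.dat
SUFFIXES = {"edu", "com", "uk", "co.uk"}


def second_level(fqdn, suffixes=SUFFIXES):
    """Return the longest matching suffix plus one label, or None."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Candidates are tried from longest to shortest, so the first hit
    # is the longest matching suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the name is itself a bare suffix
            return ".".join(labels[i - 1:])
    return None  # no listed suffix matched (default rule not implemented)


print(second_level("www.edutech.nodak.edu"))
print(second_level("purgatory.bc.edu."))
```

Names that are themselves a bare suffix, or that match nothing in the set, come back as `None` so the caller can skip them.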

That left us with 1,731,891 unique effective 2nd level domains. Of those, 1,806 had a dot edu domain name. However, some of those 1,806 were hashed DNSSEC owner names such as:

083ued1lj2fhb4d8m9j6q04r899q6kg0.edu
14pj7h7elaug4rk38hmrrrsf4vrpjo1j.edu
1jud27g70lsul7a1m8dkj8qhicbtjf80.edu
amehl7oe9aug1shfmkt24g10d5204gpe.edu
dkfbc38o8216a3bfbdik8l98oh2udkci.edu

Removing those, we’re left with 1,733 domains.
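One way to remove such names is with a simple pattern match. The sketch below assumes the hashed names are always a single 32-character label drawn from the base32hex alphabet (0-9 and a-v), which is true of the examples shown above. A legitimate domain that happened to fit that pattern would be dropped too, so treat this as a heuristic. The function name is hypothetical.

```python
import re

# A single 32-character base32hex label (0-9, a-v) directly under .edu
HASHED_LABEL = re.compile(r"^[0-9a-v]{32}\.edu$")


def drop_hashed_names(domains):
    """Return the domains that do not look like hashed owner names."""
    return [d for d in domains if not HASHED_LABEL.match(d.lower())]


names = [
    "083ued1lj2fhb4d8m9j6q04r899q6kg0.edu",
    "14pj7h7elaug4rk38hmrrrsf4vrpjo1j.edu",
    "uoregon.edu",
    "yale.edu",
]
print(drop_hashed_names(names))
```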

3. Resolving Dot Edus to IPs, and IPs to GeoIP Country Codes

We then ran those 1,733 dot edu domains through a little Python script (see Appendix II):

$ domain-to-country-code.py < edus.txt > edus-processed.txt

That script looked for “A” and/or “AAAA” (IPv4 and IPv6) records associated with each of the observed 2nd level edu domains, and then mapped the resulting IP addresses to two-letter country codes using MaxMind’s GeoLite2 Country database.

When that script finished, we were left with 1,983 records: 1,910 of those records represented dot edu/IP address pairs that had a US country code. 73 of those records had a non-US country code.

Some of you may wonder how we were able to end up with 1,983 resolved addresses from just 1,733 dot edu 2nd level domains… Some of the 2nd level domains we started with didn’t resolve at all, some resolved to just a single IP address, while others resolved to multiple IPv4 or IPv6 addresses.

For example, the 2nd level domain nodak.edu — unquestionably a real dot edu domain — didn’t resolve at all. www.edutech.nodak.edu and other specific fully qualified domain names from nodak.edu definitely resolved, but nodak.edu itself (or www.nodak.edu, for that matter), didn’t resolve at the time this blog article was written.

Another example, uoregon.edu, resolved to just a single IP address. That IP address was reported as being located in the US (as we’d expect) per the GeoLite2 database:

uoregon.edu 128.223.142.244 US

krasnoyarsk.edu, on the other hand, resolved to a single IP address that geolocated to Russia:

krasnoyarsk.edu 193.218.136.140 RU

yale.edu, the domain of Yale University, resolved to six unique IP addresses, all of which geolocated to the United States:

yale.edu 104.16.245.46 US
yale.edu 104.16.243.4 US
yale.edu 104.16.241.46 US
yale.edu 104.16.244.46 US
yale.edu 104.16.242.46 US
yale.edu 2400:cb00:2048:1::6810:f12e US

monash.edu, a domain traditionally associated with Australia, resolved to seven unique IP addresses that geolocated to several other countries as well as Australia:

monash.edu 185.64.253.1 GB
monash.edu 202.9.95.188 AU
monash.edu 54.214.33.151 US
monash.edu 54.232.88.45 BR
monash.edu 119.9.73.89 HK
monash.edu 166.78.109.115 US
monash.edu 176.32.95.209 JP

If we take the file of domains that did resolve to at least one IP, and look just at the BASE DOMAINS (e.g., domains like uoregon.edu, not fully qualified domain names like www.uoregon.edu) there were 1,655 unique domains:

  • 1,594 unique domains associated solely with US IPs,
  • 59 unique domains associated solely with non-US IPs, and
  • 2 unique domains that had a combination of both US and non-US IPs.
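The breakdown above can be reproduced from the script's "domain IP country-code" output with a few lines of Python. This is only a sketch of the bookkeeping, not the exact procedure we used. The function name `classify` is made up, and a country code of None is simply treated as non-US.

```python
from collections import defaultdict


def classify(records):
    """Count domains that are US-only, non-US-only, or a mix of both.

    records: iterable of 'domain ip country_code' lines.
    """
    codes = defaultdict(set)
    for rec in records:
        domain, _ip, cc = rec.split()
        codes[domain].add(cc)

    counts = {"us_only": 0, "non_us_only": 0, "mixed": 0}
    for ccs in codes.values():
        if ccs == {"US"}:
            counts["us_only"] += 1
        elif "US" in ccs:
            counts["mixed"] += 1
        else:
            counts["non_us_only"] += 1
    return counts


# A few of the records shown earlier in this section
sample = [
    "uoregon.edu 128.223.142.244 US",
    "krasnoyarsk.edu 193.218.136.140 RU",
    "monash.edu 202.9.95.188 AU",
    "monash.edu 54.214.33.151 US",
]
print(classify(sample))
```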

4. Full Results From Our Brief SIE Sample

All in all, we found 73 unique records associated with non-US dot edus:

AE dmcg.edu 194.170.31.41
AT mci.edu 193.171.232.28
AU canberra.edu 137.92.97.131
AU monash.edu 202.9.95.188
BG africau.edu 185.62.238.50
BR monash.edu 54.232.88.45
CA columbiacollege.edu 159.203.39.159
CA cursus.edu 192.99.38.118
CA digipen.edu 204.174.42.104
CA marianopolis.edu 206.47.149.100
CA niagara.edu 192.188.5.61
CA pwcs.edu 155.254.146.72
CA toronto.edu 128.100.166.120
CA unb.edu 131.202.1.106
DE glion.edu 52.58.68.105
DE isb.edu 144.76.121.175
DE kit.edu 129.13.40.10
DE kit.edu 2a00:1398:9:fd10::810d:280a
EE nel.edu 217.146.69.9
EE nel.edu 2a02:29e8:770:0:3::19
EG aast.edu 196.219.60.10
ES esade.edu 213.4.197.20
ES esade.edu 84.88.228.20
ES mondragon.edu 193.146.78.2
ES ub.edu 161.116.100.2
ES ucam.edu 193.147.26.228
ES uoc.edu 213.73.40.242
ES upc.edu 147.83.2.135
ES upf.edu 84.89.128.15
GB london.edu 163.119.244.27
GB marygrove.edu 2a02:fe80:1010::10:7
GB monash.edu 185.64.253.1
GR hauniv.edu 194.219.151.109
HK monash.edu 119.9.73.89
HK ncuindia.edu 119.9.107.27
HU ceu.edu 193.6.218.8
ID stts.edu 139.255.65.82
ID upi.edu 103.23.244.5
IE ie.edu 52.218.16.178
IN cds.edu 202.88.238.244
IN jipmer.edu 210.212.230.85
IN nitt.edu 203.129.195.156
IN ritindia.edu 202.38.172.143
IN sastra.edu 14.139.181.236
IN sastra.edu 220.225.137.243
JP monash.edu 176.32.95.209
KR skku.edu 115.145.129.184
NL tilburguniversity.edu 137.56.209.21
NL tilburguniversity.edu 137.56.209.22
NL tilburguniversity.edu 2001:610:1410:280:24ee:f0cd:bb36:7745
NL tul.edu 137.120.30.68
None safa.edu 130.117.92.15
NO ntnu.edu 129.241.56.117
PH aiias.edu 116.93.59.233
PH ateneo.edu 202.125.102.21
PH ubaguio.edu 122.55.103.201
PK pgc.edu 119.159.229.143
PR uprm.edu 136.145.30.119
PS hebron.edu 82.213.57.178
PS hebron.edu 93.184.9.13
PS iugaza.edu 195.189.210.6
RU crimea.edu 80.245.119.130
RU krasnoyarsk.edu 193.218.136.140
RU mpgu.edu 91.143.47.22
RU phystech.edu 93.175.31.131
RU spb.edu 195.70.196.197
SD sustech.edu 41.67.53.4
SD uofk.edu 2c0f:fec8:1000::5
SD uofk.edu 41.67.20.5
SG galgotiacollege.edu 119.81.113.118
TH au.edu 168.120.16.231
TR metu.edu 144.122.144.137
TR sabanciuniv.edu 193.255.135.111

These domains represent non-U.S. dot edus that are in routine/continual use, and serve as a nice reminder that we cannot safely assume that all dot edus are located in the United States.

However, the above came from just a small sample of 5,000,000 observations drawn from SIE, collected in less than five minutes.

What would we see if we were to look at results from DNSDB for a longer period?

5. A Longer Sample of Unique Dot edus from DNSDB

We can pull a longer sample of dot edu A and AAAA records from DNSDB, and demonstrate production of JSON format output from DNSDB, by running:

$ dnsdb_query.py -r \*.edu/A -j --after=2016-08-01 -l 1000000 > pdns.txt
$ dnsdb_query.py -r \*.edu/AAAA -j --after=2016-08-01 -l 1000000 >> pdns.txt

As of August 18th, 2016, a typical observation from our pdns.txt output file looked like:

{"count": 358354, "time_first": 1277351519, "rrtype": "A", 
"rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.", 
"rdata": ["136.167.2.254"], "time_last": 1471365799}
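If the raw record is hard to read, a short Python sketch like the following can decode it. The time_first and time_last fields are Unix epoch seconds, so they are converted to UTC dates here. The helper name `summarize` and the choice of fields to keep are ours, purely for illustration.

```python
import json
from datetime import datetime, timezone

# The observation shown above, joined back into a single JSON document
record_text = ('{"count": 358354, "time_first": 1277351519, "rrtype": "A", '
               '"rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.", '
               '"rdata": ["136.167.2.254"], "time_last": 1471365799}')


def summarize(text):
    """Parse one JSON observation and return a few human-friendly fields."""
    rec = json.loads(text)
    first = datetime.fromtimestamp(rec["time_first"], tz=timezone.utc)
    last = datetime.fromtimestamp(rec["time_last"], tz=timezone.utc)
    return {
        "rrname": rec["rrname"].rstrip("."),  # drop the trailing root dot
        "rdata": rec["rdata"],
        "first_seen": first.date().isoformat(),
        "last_seen": last.date().isoformat(),
    }


print(summarize(record_text))
```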

We can use jq to do data “surgery” on that output, keeping just the rrname values that we’re interested in:

$ cat pdns.txt | jq .rrname | sed 's/"//g' | 2nd-level-dom | sort -u > pdns-edus.txt

That left us with 8,028 lines that looked like:

22cf.edu
3ponts.edu
4cd.edu
4dcollege.edu
aa.edu
aaa.edu
aaaom.edu
aaart.edu
[etc]

6. Data Completeness and Quality

How “complete” and “right” is our new larger list? Did we see any odd pseudo domains? Have we found “all” dot edu domains? If we had a copy of the dot edu zone file, we could compare what we’ve found to what Educause actually includes in their zone, but unfortunately the dot edu zone file is not publicly available.

Fortunately, Educause does at least publish a summary graph that shows the size of the dot edu zone file — 7,524 is the latest value published there.

How does that compare to U.S. Department of Education statistics? Well, we know that as of 2012-2013, there were 7,253 “Title IV Postsecondary Institutions” (i.e., institutions whose students are eligible to receive Stafford loans or other Federal student financial aid), of which 4,726 were/are degree-granting institutions. Given those figures, how do we explain the fact that our list has 8,028 domains? Well, we note the following phenomena:

  • Some grandfathered institutions may have multiple dot edu domains, notwithstanding the fact that current policies only allow new dot edu applicants to obtain a single dot edu domain. For example:
    • iu.edu and indiana.edu
    • orst.edu and oregonstate.edu
    • uw.edu and washington.edu
  • Some dot edu domains seen in DNSDB may not be “real.” For instance, we discovered a number of “pseudo domains” in dot edu associated with a particular Verisign monitoring netblock. Those base domains have very easily recognized formats, such as:
emt-t-1006862691-1429043183732-2-ez.edu
emt-t-1008366659-1428438754423-2-qg.edu
emt-t-1008428805-1429569009365-2-hlmdn.edu
[etc]

and

t-1019030600-1424225115847-2-lclrb.edu
t-102097433-1424812217636-2-dnc.edu
t-1023615998-1398918487508-2-nt.edu
[etc]

After removing those pseudo domains, our list dropped to 7,106 domains.
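Because those pseudo domains follow such a regular format, a regular expression is enough to weed them out. The pattern below is our own guess at the format, inferred only from the six examples shown above (an optional "emt-" prefix, then "t-", three runs of digits, and a short run of letters), so it may miss variants we did not list.

```python
import re

# Optional "emt-" prefix, then "t-", three digit runs, and a run of
# lowercase letters, all within a single label under .edu
PSEUDO = re.compile(r"^(?:emt-)?t-\d+-\d+-\d+-[a-z]+\.edu$")


def drop_pseudo_domains(domains):
    """Return the domains that do not match the pseudo-domain pattern."""
    return [d for d in domains if not PSEUDO.match(d.lower())]


names = [
    "emt-t-1006862691-1429043183732-2-ez.edu",
    "t-102097433-1424812217636-2-dnc.edu",
    "4dcollege.edu",
    "aaart.edu",
]
print(drop_pseudo_domains(names))
```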

Digging a bit further into the Department of Education web site, we find the Department’s database of accredited postsecondary institutions and programs.

Here you will find a spreadsheet with 9,546 unique institutional names known to the Department of Education. Some of those educational activities are not what you or I might typically think of as traditional colleges or universities, including entries for specialized programs, focused offsite professional programs, and/or non-degree-granting programs. Some examples of less-traditional entries included:

  • 2nd Dental Battalion Naval Dental Clinic/Le Jeune Advanced Education in General Dentistry 12 Months
  • Alfred I. duPont Hospital for Children
  • Aurora University at Carpentersville Middle School
  • Earl Warren Adult School-California Correctional Center
  • George Washington University at The National Geospatial Intelligence Agency
  • US Army Armor School
  • [etc]

In other cases, the same overarching organization may be listed multiple times due to multiple branch or satellite locations. For example:

  • Empire Beauty School
  • Empire Beauty School – Baltimore
  • Empire Beauty School – Boston
  • Empire Beauty School – Cincinnati
  • Empire Beauty School – Framingham
  • Empire Beauty School – Glendale
  • Empire Beauty School – Grand Rapids
  • Empire Beauty School – Kennesaw
  • Empire Beauty School – Laconia
  • Empire Beauty School – Lehigh Valley
  • Empire Beauty School – Malden
  • Empire Beauty School – Peekskill
  • Empire Beauty School – Philadelphia
  • Empire Beauty School – Portland
  • Empire Beauty School – Pottstown
  • Empire Beauty School – Pottsville
  • Empire Beauty School – Queens
  • Empire Beauty School – Somersworth
  • Empire Beauty School – Westminster
  • Empire Beauty School – Wyoming Valley

We also know that some recognized institutions may use a dot com, dot org, dot us, or other non-dot edu domain name for any of a variety of reasons.

Bottom line, it can be complex to find exactly how many dot edu-eligible institutions actually exist. Still, our list of 7,106 dot edus amounts to 94.4% of Educause’s 7,524 figure, and we believe it likely represents virtually all dot edu domains in actual active use.

7. Final Results

We then processed those 7,106 dot edu base domains through our same domain-to-country-code.py script, resulting in 7,666 unique returned domain/IP pairs associated with 6,756 unique domains.

687 of those, as shown in Appendix III, were non-US dot edu domain/IP pairs, with 625 unique non-US dot edu domains.

What this all means for you:

  1. Any assumption that not sharing zone files will keep third parties from identifying domain names in routine use is obviously a bad one: we’ve demonstrated that we can easily identify virtually all of dot edu from passive DNS. The implications of this for dot gov and dot mil are obvious: just as we passively enumerated dot edu, an attacker could just as easily passively enumerate the far-more-sensitive dot gov or dot mil zones.

  2. Any assumption that dot edus are all US-located is clearly wrong (potentially relevant for crypto deemed-export rules and ITAR-controlled research work).

  3. Some who are doing cyber security-related work routinely want to map IP addresses to country codes, but think that there’s some complicated magic involved in doing so. There’s not. The process is simple and easily accomplished. That’s one of the things the code provided with the article is meant to concretely demonstrate.

  4. Farsight is routinely in discussions with potential partners offering other data enrichment feeds. This article provides a nice example of how that process can work, and the synergies that can arise from combining passive DNS with other third-party data sources.

**Acknowledgements**: The author gratefully acknowledges the assistance of his colleague Mr. Gabriel Iovino of Farsight Security, Inc., for insightful suggestions related to an earlier draft of this article, although sole responsibility for the content of this article remains with the author.

Also, thank you Mr. Robert Edmonds, now of Fastly, Inc., for your helpful comments.

Appendix I. 2nd-level-dom script

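#!/usr/bin/perl
# Reduce each FQDN read from stdin to its effective 2nd-level domain
# (the public suffix plus one additional label).
use strict;
use warnings;
use IO::Socket::SSL::PublicSuffix;

my $pslfile = '/your_path_to_the/public_suffix_list.dat';
my $ps = IO::Socket::SSL::PublicSuffix->from_file($pslfile);

# Read line by line rather than slurping all of stdin into memory
while (my $line = <>) {
        chomp($line);
        next unless length $line;
        # The second argument (1) asks for one label beyond the public suffix
        my $root_domain = $ps->public_suffix($line, 1);
        # Skip names that yield no result (e.g., a bare public suffix)
        next unless defined $root_domain && length $root_domain;
        print "$root_domain\n";
}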

Notes:

  • public_suffix_list.dat can be downloaded here
  • IO::Socket::SSL::PublicSuffix can be downloaded here

Appendix II. domain-to-country-code.py

#!/usr/bin/python3 -u
# Resolve each domain read from stdin to its A and AAAA records, then map
# every returned IP address to a GeoLite2 two-letter country code.
# Written for Python 3 and dnspython 2.x, where resolve() replaces query().
import itertools
import sys
import dns.exception
import dns.resolver
import geoip2.database
import geoip2.errors

myResolver = dns.resolver.Resolver()
reader = geoip2.database.Reader('/your_path_to_the/GeoLite2-Country.mmdb')

first = sys.stdin.readline()
if not first:
    print("Please pipe in a list of domains to process...")
    sys.exit()

for raw in itertools.chain([first], sys.stdin):
    line = raw.strip()
    if not line:
        continue

    #  IPv4
    try:
        myAnswers = myResolver.resolve(line, "A")
    except dns.exception.DNSException:
        myAnswers = []    # NXDOMAIN, no A records, timeout, etc.

    for rdata in myAnswers:
        try:
            response = reader.country(rdata.address)
        except (geoip2.errors.AddressNotFoundError, ValueError):
            continue      # skip only this address, not the rest of the answer
        print(line, rdata.address, response.country.iso_code)

    #  IPv6
    try:
        myAnswers2 = myResolver.resolve(line, "AAAA")
    except dns.exception.DNSException:
        myAnswers2 = []   # NXDOMAIN, no AAAA records, timeout, etc.

    # Print inside the loop so that every AAAA record is reported
    for rdata2 in myAnswers2:
        try:
            response2 = reader.country(rdata2.address)
        except (geoip2.errors.AddressNotFoundError, ValueError):
            continue
        print(line, rdata2.address, response2.country.iso_code)

Notes:

  • python -u unbuffers the output when this script is run; if you are not running the script interactively you can omit the -u
  • If you don’t care about IPv6, you can omit the IPv6 section of this script to just see IPv4 results
  • Obtain GeoLite2-Country.mmdb here
  • Get the Python library required to process the database here

Required Attribution: This paper includes GeoLite2 data created by MaxMind.

Appendix III. Non-US Dot Edus Seen In Our August DNSDB Data

Available here.

Joe St Sauver, Ph.D. is a Scientist with Farsight Security, Inc.