
In general, as stated here:
Only U.S. postsecondary institutions that are institutionally accredited by an agency on the U.S. Department of Education’s list of Nationally Recognized Accrediting Agencies may obtain an Internet name in the .edu domain.
However, that policy needs to be read in light of a “grandfather clause”, which clarifies:
The Cooperative Agreement between EDUCAUSE and the U.S. Department of Commerce specifies that all .edu names in existence as of October 29, 2001 are “grandfathered,” regardless of current or past eligibility requirements.
We were curious to see how many 2nd-level dot edu domains (including grandfathered edus) actually map to non-US IP address space. This may be of some practical importance, since people often forget that users coming from legacy dot edu domains may not be from the United States.
Now obviously, a non-US university could elect to host their domain in US address space, or a US university could choose to host their domain in non-US address space, but for the most part we’d expect to see US universities in US IP address space, and international universities in non-US address space.
So can we identify dot edu domains that are hosted outside the US? It turns out that yes, yes we can.
Working from a local SIE blade server, we begin by capturing a small sample of 5,000,000 observations from Channel 204, our deduplicated/filtered/verified passive DNS channel. We’ll just keep the fully qualified domain names (FQDNs) we saw in those observations. This short collection period was meant to capture dot edu base domains that are in routine/continual use. It took just a little over four minutes to collect those observations with the command:
$ nmsgtool -C ch204 -c 5000000 | grep rrname | awk '{print $2}' > ch204.txt
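For readers who would rather do this filtering step in Python than with grep and awk, here is a minimal sketch. It assumes, as the pipeline above implies, that the nmsgtool text output contains lines of the form `rrname: <name>`. The function name `extract_rrnames` and the sample lines are hypothetical illustrations, not captured output.

```python
def extract_rrnames(lines):
    """Yield the second whitespace-separated field of lines containing 'rrname'.

    Roughly mirrors `grep rrname | awk '{print $2}'`. The 'rrname: <name>'
    layout is an assumption inferred from that pipeline. Lines with fewer
    than two fields are skipped (awk would print an empty line instead).
    """
    for line in lines:
        if "rrname" in line:
            fields = line.split()
            if len(fields) >= 2:
                yield fields[1]


# Hypothetical sample lines in the assumed layout (not captured output).
sample = [
    "type: INSERTION",
    "rrname: www.uoregon.edu.",
    "rrtype: A (1)",
    "rrname: yale.edu.",
]
names = list(extract_rrnames(sample))
print(names)
```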
We now want to simplify those observed names by converting the FQDNs we found into 2nd level dot edu domain names using a small Perl script that checks the Public Suffix List (that script and another one used in this article follow the body of the article):
$ 2nd-level-dom < ch204.txt | sort -u > ch204-2nd-level.txt
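The idea behind this reduction step can be sketched in Python as "find the longest matching public suffix, then keep one more label". This is only an illustration, not the Perl script used in the article: `TOY_SUFFIXES` is a hypothetical, hand-picked stand-in for the real Public Suffix List, `second_level` is a made-up name, and the sketch ignores the list's wildcard and exception rules.

```python
# Toy stand-in for the Public Suffix List. A real run would load the full
# public_suffix_list.dat; these few entries are illustrative assumptions.
TOY_SUFFIXES = {"edu", "com", "org", "uk", "co.uk", "ac.uk"}


def second_level(fqdn, suffixes=TOY_SUFFIXES):
    """Return the longest matching public suffix plus one label, or None."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Candidates run from longest to shortest, so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the name is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no suffix from the toy list matched


for name in ["www.edutech.nodak.edu.", "www.uoregon.edu", "www.example.co.uk"]:
    print(name, "->", second_level(name))
```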
That left us with 1,731,891 unique effective 2nd level domains. 1,806 of those had a dot edu domain name. However, some of those 1,806 were hashed DNSSEC owner names such as:
083ued1lj2fhb4d8m9j6q04r899q6kg0.edu
14pj7h7elaug4rk38hmrrrsf4vrpjo1j.edu
1jud27g70lsul7a1m8dkj8qhicbtjf80.edu
amehl7oe9aug1shfmkt24g10d5204gpe.edu
dkfbc38o8216a3bfbdik8l98oh2udkci.edu
Removing those, we’re left with 1,733 domains.
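The article does not say how the hashed names were removed. One plausible heuristic, sketched below, relies on the fact that NSEC3 owner names built from a SHA-1 hash are 32 characters of base32hex (digits plus the letters a through v). The name `looks_hashed` is hypothetical, and a legitimate domain that happened to fit the pattern would be dropped as well.

```python
import re

# 32 characters from the base32hex alphabet (0-9, a-v): the length of a
# base32hex-encoded 160-bit hash, as used for NSEC3 hashed owner names.
HASHED_LABEL = re.compile(r"^[0-9a-v]{32}$")


def looks_hashed(domain):
    """Heuristic: True if the leftmost label looks like an NSEC3 hash."""
    first_label = domain.lower().rstrip(".").split(".")[0]
    return bool(HASHED_LABEL.match(first_label))


candidates = [
    "083ued1lj2fhb4d8m9j6q04r899q6kg0.edu",
    "amehl7oe9aug1shfmkt24g10d5204gpe.edu",
    "uoregon.edu",
    "yale.edu",
]
kept = [d for d in candidates if not looks_hashed(d)]
print(kept)
```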
We then ran those 1,733 dot edu domains through a little Python script (see appendix II):
$ domain-to-country-code.py < edus.txt > edus-processed.txt
That script looked for “A” and/or “AAAA” (IPv4 and IPv6) records associated with each of the observed 2nd level edu domains, and then mapped the resulting IP addresses to two letter country codes using MaxMind’s GeoLite2 Country database.
When that script finished, we were left with 1,983 records: 1,910 of those records represented dot edu/IP address pairs that had a US country code. 73 of those records had a non-US country code.
Some of you may wonder how we were able to end up with 1,983 resolved addresses from just 1,733 dot edu 2nd level domains. Some of the 2nd level domains we started with didn’t resolve at all, some resolved to just a single IP address, while others resolved to multiple IPv4 or IPv6 addresses.
For example, the 2nd level domain nodak.edu (unquestionably a real dot edu domain) didn’t resolve at all. www.edutech.nodak.edu and other specific fully qualified domain names from nodak.edu definitely resolved, but nodak.edu itself (or www.nodak.edu, for that matter) didn’t resolve at the time this blog article was written.
In another example, uoregon.edu resolved to just a single IP address. That IP address was reported as being located in the US (as we’d expect) per the GeoLite2 database:
uoregon.edu 128.223.142.244 US
krasnoyarsk.edu, on the other hand, resolved to a single IP address that geolocated to Russia:
krasnoyarsk.edu 193.218.136.140 RU
yale.edu, the domain of Yale University, resolved to six unique IP addresses, all of which geolocated to the United States:
yale.edu 104.16.245.46 US
yale.edu 104.16.243.4 US
yale.edu 104.16.241.46 US
yale.edu 104.16.244.46 US
yale.edu 104.16.242.46 US
yale.edu 2400:cb00:2048:1::6810:f12e US
monash.edu, a domain traditionally associated with Australia, resolved to seven unique IP addresses that geolocated to other countries as well as Australia:
monash.edu 185.64.253.1 GB
monash.edu 202.9.95.188 AU
monash.edu 54.214.33.151 US
monash.edu 54.232.88.45 BR
monash.edu 119.9.73.89 HK
monash.edu 166.78.109.115 US
monash.edu 176.32.95.209 JP
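Because one domain can produce several records, it helps to group the script output by domain before counting. The sketch below assumes the three-column "domain address country-code" layout shown in the examples above. The function names are hypothetical, and the sample is a handful of the records quoted in this article rather than the full result set.

```python
from collections import defaultdict


def countries_by_domain(lines):
    """Map each domain to the set of country codes seen for its addresses."""
    seen = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        domain, _address, country = fields
        seen[domain].add(country)
    return dict(seen)


def domains_with_non_us_address(lines):
    """Return sorted domains having at least one non-US country code."""
    grouped = countries_by_domain(lines)
    return sorted(d for d, codes in grouped.items() if codes - {"US"})


records = [
    "uoregon.edu 128.223.142.244 US",
    "krasnoyarsk.edu 193.218.136.140 RU",
    "monash.edu 185.64.253.1 GB",
    "monash.edu 202.9.95.188 AU",
    "monash.edu 54.214.33.151 US",
]
print(len(records), "records,", len(countries_by_domain(records)), "domains")
print(domains_with_non_us_address(records))
```

Note that a domain such as monash.edu counts as "non-US" here as soon as any one of its addresses geolocates outside the US, even though other addresses for the same name are in US address space.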
If we take the file of domains that did resolve to at least one IP, and look just at the BASE DOMAINS (e.g., domains like uoregon.edu, not fully qualified domain names like www.uoregon.edu), there were 1,655 unique domains.
All-in-all, we found 73 unique records associated with non-US dot edus:
AE dmcg.edu 194.170.31.41
AT mci.edu 193.171.232.28
AU canberra.edu 137.92.97.131
AU monash.edu 202.9.95.188
BG africau.edu 185.62.238.50
BR monash.edu 54.232.88.45
CA columbiacollege.edu 159.203.39.159
CA cursus.edu 192.99.38.118
CA digipen.edu 204.174.42.104
CA marianopolis.edu 206.47.149.100
CA niagara.edu 192.188.5.61
CA pwcs.edu 155.254.146.72
CA toronto.edu 128.100.166.120
CA unb.edu 131.202.1.106
DE glion.edu 52.58.68.105
DE isb.edu 144.76.121.175
DE kit.edu 129.13.40.10
DE kit.edu 2a00:1398:9:fd10::810d:280a
EE nel.edu 217.146.69.9
EE nel.edu 2a02:29e8:770:0:3::19
EG aast.edu 196.219.60.10
ES esade.edu 213.4.197.20
ES esade.edu 84.88.228.20
ES mondragon.edu 193.146.78.2
ES ub.edu 161.116.100.2
ES ucam.edu 193.147.26.228
ES uoc.edu 213.73.40.242
ES upc.edu 147.83.2.135
ES upf.edu 84.89.128.15
GB london.edu 163.119.244.27
GB marygrove.edu 2a02:fe80:1010::10:7
GB monash.edu 185.64.253.1
GR hauniv.edu 194.219.151.109
HK monash.edu 119.9.73.89
HK ncuindia.edu 119.9.107.27
HU ceu.edu 193.6.218.8
ID stts.edu 139.255.65.82
ID upi.edu 103.23.244.5
IE ie.edu 52.218.16.178
IN cds.edu 202.88.238.244
IN jipmer.edu 210.212.230.85
IN nitt.edu 203.129.195.156
IN ritindia.edu 202.38.172.143
IN sastra.edu 14.139.181.236
IN sastra.edu 220.225.137.243
JP monash.edu 176.32.95.209
KR skku.edu 115.145.129.184
NL tilburguniversity.edu 137.56.209.21
NL tilburguniversity.edu 137.56.209.22
NL tilburguniversity.edu 2001:610:1410:280:24ee:f0cd:bb36:7745
NL tul.edu 137.120.30.68
None safa.edu 130.117.92.15
NO ntnu.edu 129.241.56.117
PH aiias.edu 116.93.59.233
PH ateneo.edu 202.125.102.21
PH ubaguio.edu 122.55.103.201
PK pgc.edu 119.159.229.143
PR uprm.edu 136.145.30.119
PS hebron.edu 82.213.57.178
PS hebron.edu 93.184.9.13
PS iugaza.edu 195.189.210.6
RU crimea.edu 80.245.119.130
RU krasnoyarsk.edu 193.218.136.140
RU mpgu.edu 91.143.47.22
RU phystech.edu 93.175.31.131
RU spb.edu 195.70.196.197
SD sustech.edu 41.67.53.4
SD uofk.edu 2c0f:fec8:1000::5
SD uofk.edu 41.67.20.5
SG galgotiacollege.edu 119.81.113.118
TH au.edu 168.120.16.231
TR metu.edu 144.122.144.137
TR sabanciuniv.edu 193.255.135.111
These domains represent non-U.S. dot edus that are in routine/continual use, and serve as a nice reminder that we cannot safely assume that all dot edus are located in the United States.
However, the above was just from a small sample of 5,000,000 observations drawn from SIE, less than five minutes of data.
What would we see if we were to look at results from DNSDB for a longer period?
We can pull a longer sample of dot edu A and AAAA records from DNSDB, and demonstrate production of JSON format output from DNSDB, by saying:
$ dnsdb_query.py -r \*.edu/A -j --after=2016-08-01 -l 1000000 > pdns.txt
$ dnsdb_query.py -r \*.edu/AAAA -j --after=2016-08-01 -l 1000000 >> pdns.txt
As of August 18th, 2016, a typical observation from our pdns.txt output file looked like:
{"count": 358354, "time_first": 1277351519, "rrtype": "A",
"rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.",
"rdata": ["136.167.2.254"], "time_last": 1471365799}
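To make a record like this easier to read, the sketch below parses it and converts the two timestamps to UTC dates. It assumes `time_first` and `time_last` are Unix epoch seconds. The helper name `summarize` is hypothetical.

```python
import json
from datetime import datetime, timezone

# The sample record from the article, joined onto one logical line.
record_text = (
    '{"count": 358354, "time_first": 1277351519, "rrtype": "A", '
    '"rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.", '
    '"rdata": ["136.167.2.254"], "time_last": 1471365799}'
)


def summarize(text):
    """Parse one JSON record; timestamps are assumed to be epoch seconds."""
    rec = json.loads(text)
    return {
        "rrname": rec["rrname"],
        "rrtype": rec["rrtype"],
        "rdata": rec["rdata"],
        "first_seen": datetime.fromtimestamp(rec["time_first"], tz=timezone.utc),
        "last_seen": datetime.fromtimestamp(rec["time_last"], tz=timezone.utc),
    }


summary = summarize(record_text)
print(summary["rrname"], summary["rdata"])
print(summary["first_seen"].date(), "to", summary["last_seen"].date())
```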
We can use jq to do data “surgery” on that output, keeping just the rrname values that we’re interested in:
$ cat pdns.txt | jq .rrname | sed 's/"//g' | 2nd-level-dom | sort -u > pdns-edus.txt
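A rough Python equivalent of this pipeline is sketched below. It is a simplification, not the article's method: instead of consulting the Public Suffix List it just keeps the last two labels of any name ending in "edu". The function name `edu_base_domains` is hypothetical, and all of the sample lines except the bc.edu name are made up.

```python
import json


def edu_base_domains(json_lines):
    """Return sorted unique <label>.edu base domains from JSON lines."""
    seen = set()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        rrname = json.loads(line)["rrname"].rstrip(".").lower()
        labels = rrname.split(".")
        # Simplification: keep the last two labels rather than applying
        # the full Public Suffix List.
        if len(labels) >= 2 and labels[-1] == "edu":
            seen.add(".".join(labels[-2:]))
    return sorted(seen)


# Hypothetical input lines; only the rrname field matters here.
lines = [
    '{"rrname": "purgatory.bc.edu.", "rrtype": "A"}',
    '{"rrname": "www.BC.edu.", "rrtype": "A"}',
    '{"rrname": "www.uoregon.edu.", "rrtype": "AAAA"}',
    "",
]
print(edu_base_domains(lines))
```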
That left us with 8,028 lines that looked like:
22cf.edu
3ponts.edu
4cd.edu
4dcollege.edu
aa.edu
aaa.edu
aaaom.edu
aaart.edu
[etc]
How “complete” and “right” is our new larger list? Did we see any odd pseudo domains? Have we found “all” dot edu domains? If we had a copy of the dot edu zone file, we could compare what we’ve found to what Educause actually includes in their zone, but unfortunately the dot edu zone file is not publicly available.
Fortunately, Educause does at least publish a summary graph that shows the size of the dot edu zone file: 7,524 is the latest value published there.
How does that compare to U.S. Department of Education statistics? Well, we know that as of 2012-2013, there were 7,253 “Title IV Postsecondary Institutions” (i.e., institutions whose students are eligible to receive Stafford loans or other Federal student financial aid), of which 4,726 were/are degree-granting institutions. How do we explain the fact that our list has 8,028 domains, given those figures? Well, we note the following phenomena:
emt-t-1006862691-1429043183732-2-ez.edu
emt-t-1008366659-1428438754423-2-qg.edu
emt-t-1008428805-1429569009365-2-hlmdn.edu
[etc]
and
t-1019030600-1424225115847-2-lclrb.edu
t-102097433-1424812217636-2-dnc.edu
t-1023615998-1398918487508-2-nt.edu
[etc]
After removing those pseudo domains, our list dropped to 7,106 domains.
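The article does not show how the pseudo domains were filtered out. One way to do it, sketched here as an assumption, is a regular expression modeled on the two patterns listed above: an optional "emt-" prefix, "t-", three dash-separated numbers, then a short run of letters. `PSEUDO_DOMAIN` and `is_pseudo` are hypothetical names.

```python
import re

# Modeled on the examples in the article, e.g.
#   emt-t-1006862691-1429043183732-2-ez.edu
#   t-1019030600-1424225115847-2-lclrb.edu
PSEUDO_DOMAIN = re.compile(r"^(?:emt-)?t-\d+-\d+-\d+-[a-z]+\.edu$")


def is_pseudo(domain):
    """True if the domain matches the machine-generated pattern above."""
    return bool(PSEUDO_DOMAIN.match(domain.lower().rstrip(".")))


names = [
    "emt-t-1008366659-1428438754423-2-qg.edu",
    "t-102097433-1424812217636-2-dnc.edu",
    "4dcollege.edu",
    "aa.edu",
]
real = [n for n in names if not is_pseudo(n)]
print(real)
```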
Digging a bit further into the Department of Education web site, we find the Department’s database of accredited postsecondary institutions and programs.
Here you will find a spreadsheet with 9,546 unique institutional names known to the Department of Education. Some of those educational activities are not what you or I might typically think of as traditional colleges or universities, including entries for specialized programs, focused offsite professional programs, and/or non-degree-granting programs. Some examples of less-traditional entries included:
In other cases, the same overarching organization may be listed multiple times due to multiple branch or satellite locations. For example:
We also know that some recognized institutions may use a dot com, dot org, dot us, or other non-dot edu domain name for any of a variety of reasons.
Bottom line, it can be complex to find exactly how many dot edu-eligible institutions actually exist. Still, our list of 7,106 dot edus is 94.4% of Educause’s 7,524 figure, and we believe it likely represents virtually all dot edu domains in actual active use.
We then processed those 7,106 dot edu base domains through our same domain-to-country-code.py script, resulting in 7,666 unique returned domain/IP pairs associated with 6,756 unique domains.
687 of those, as shown in Appendix III, were non-US dot edu domain/IP pairs, with 625 unique non-US dot edu domains.
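The counts quoted in this section can be cross-checked with a few lines of arithmetic. The variable names below are just labels for figures taken from the text, and the helper `pct` is hypothetical.

```python
# Figures quoted in the article.
observed_edus = 7106        # dot edu domains left after removing pseudo domains
educause_zone_size = 7524   # latest zone size published by Educause
resolved_domains = 6756     # domains that returned at least one address
non_us_domains = 625        # domains with at least one non-US address
domain_ip_pairs = 7666      # unique domain/IP pairs returned
non_us_pairs = 687          # pairs with a non-US country code


def pct(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


coverage = pct(observed_edus, educause_zone_size)
print("share of Educause zone size observed:", coverage)
print("share of observed domains that resolved:", pct(resolved_domains, observed_edus))
print("share of resolved domains that are non-US:", pct(non_us_domains, resolved_domains))
print("share of domain/IP pairs that are non-US:", pct(non_us_pairs, domain_ip_pairs))
```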
What this all means for you:
**Acknowledgements**: The author gratefully acknowledges the assistance of his colleague Mr. Gabriel Iovino of Farsight Security, Inc., for insightful suggestions related to an earlier draft of this article, although sole responsibility for the content of this article remains with the author.
Also, thank you Mr. Robert Edmonds, now of Fastly, Inc., for your helpful comments.
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::SSL::PublicSuffix;

my $pslfile = '/your_path_to_the/public_suffix_list.dat';
my $ps = IO::Socket::SSL::PublicSuffix->from_file($pslfile);

my $line;
foreach $line (<>) {
    chomp($line);
    # Keep the public suffix plus one additional label
    my $root_domain = $ps->public_suffix($line,1);
    printf( "%s\n", $root_domain );
}
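Notes:
public_suffix_list.dat: a local copy of the Public Suffix List, read by the script.
IO::Socket::SSL::PublicSuffix: the Perl module the script uses to apply that list.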
#!/usr/bin/python -u
import sys
import dns.resolver
import geoip2.database

myResolver = dns.resolver.Resolver()
myResolver2 = dns.resolver.Resolver()
reader = geoip2.database.Reader('/your_path_to_the/GeoLite2-Country.mmdb')
reader2 = geoip2.database.Reader('/your_path_to_the/GeoLite2-Country.mmdb')

try:
    line = raw_input()
except EOFError:
    print "Please pipe in a list of domains to process..."
    exit()

while line:
    # IPv4
    clean=0
    cleana=0
    try:
        myAnswers = myResolver.query(line, "A")
    except:
        clean=1
    if clean == 0:
        for rdata in myAnswers:
            try:
                response = reader.country(rdata)
            except:
                cleana=1
            if cleana == 0:
                print line,rdata,response.country.iso_code
    # IPv6
    clean2=0
    clean2a=0
    try:
        myAnswers2 = myResolver2.query(line, "AAAA")
    except:
        clean2=1
    if clean2 == 0:
        for rdata2 in myAnswers2:
            try:
                response2 = reader2.country(rdata2)
            except:
                clean2a=1
            if (clean2 == 0) and (clean2a == 0):
                print line,rdata2,response2.country.iso_code
    try:
        line = raw_input()
    except EOFError:
        sys.exit(0)
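Notes:
python -u: the -u option asks Python for unbuffered output.
GeoLite2-Country.mmdb: the GeoLite2 Country database file read by the script, available from MaxMind.
Required Attribution: This paper includes GeoLite2 data created by MaxMind.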
Joe St Sauver, Ph.D. is a Scientist with Farsight Security, Inc.