Non-U.S. Universities with .edu Domain Names: They're More Common Than You Might Think
1. Introduction
In general, as stated here:
Only U.S. postsecondary institutions that are institutionally accredited by an agency on the U.S. Department of Education’s list of Nationally Recognized Accrediting Agencies may obtain an Internet name in the .edu domain.
However, that policy needs to be read in light of a “grandfather clause”, which clarifies:
The Cooperative Agreement between EDUCAUSE and the U.S. Department of Commerce specifies that all .edu names in existence as of October 29, 2001 are “grandfathered,” regardless of current or past eligibility requirements.
We were curious to see how many 2nd-level dot edu domains (including grandfathered edus) actually map to non-US IP address space. This may be of some practical importance since often people forget that users coming from legacy dot edu domains may not be from the United States.
Now obviously, a non-US university could elect to host their domain in US address space, or a US university could choose to host their domain in non-US address space, but for the most part we’d expect to see US universities in US IP address space, and international universities in non-US address space.
So can we identify dot edu domains that are hosted outside the US? It turns out that yes, yes we can.
2. Dot Edu Domains Seen In Real Time At The Security Information Exchange
Working from a local SIE blade server, we begin by capturing a small sample of 5,000,000 observations from Channel 204, our deduplicated/filtered/verified passive DNS channel. We’ll just keep the fully qualified domain names (FQDNs) we saw from those observations. This short collection period was meant to capture dot edu base domains that are in routine/continual use. It took just a little over four minutes to collect those observations with the command:
$ nmsgtool -C ch204 -c 5000000 | grep rrname | awk '{print $2}' > ch204.txt
We now want to simplify those observed names by converting the FQDNs we found into 2nd level dot edu domain names using a small Perl script that checks the Public Suffix List (that script, and another script used in this article, follow the body of the article):
$ 2nd-level-dom < ch204.txt | sort -u > ch204-2nd-level.txt
That left us with 1,731,891 unique effective 2nd level domains. 1,806 of those had a dot edu domain name. However, some of those 1,806 were hashed DNSSEC owner names such as:
083ued1lj2fhb4d8m9j6q04r899q6kg0.edu
14pj7h7elaug4rk38hmrrrsf4vrpjo1j.edu
1jud27g70lsul7a1m8dkj8qhicbtjf80.edu
amehl7oe9aug1shfmkt24g10d5204gpe.edu
dkfbc38o8216a3bfbdik8l98oh2udkci.edu
Removing those, we’re left with 1,733 domains.
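The article does not show how the hashed names were removed. One possible approach, sketched below in Python, relies on the observation that all five examples above are 32-character labels drawn from the base32hex alphabet (0-9, a-v), which is what a SHA-1 NSEC3 hash looks like. The function name `drop_hashed_names` is hypothetical, and the pattern is an assumption based only on those examples; in principle a legitimate domain could also match it.

```python
import re

# Assumption: hashed owner names are 32 base32hex characters (0-9, a-v),
# the length of a SHA-1 NSEC3 hash, directly under .edu.
HASHED_LABEL = re.compile(r"^[0-9a-v]{32}\.edu$")


def drop_hashed_names(domains):
    """Return the domains that do not look like hashed DNSSEC owner names."""
    return [d for d in domains if not HASHED_LABEL.match(d.lower())]


sample = [
    "083ued1lj2fhb4d8m9j6q04r899q6kg0.edu",
    "uoregon.edu",
    "dkfbc38o8216a3bfbdik8l98oh2udkci.edu",
    "yale.edu",
]
print(drop_hashed_names(sample))
```

Matching on the full 32-character length keeps short, ordinary names such as uoregon.edu out of the filter.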
3. Resolving Dot Edus to IPs, and IPs to GeoIP Country Codes
We then ran those 1,733 dot edu domains through a little Python script (see appendix II):
$ domain-to-country-code.py < edus.txt > edus-processed.txt
That script looked for “A” and/or “AAAA” (IPv4 and IPv6) records associated with each of the observed 2nd level edu domains, and then mapped the resulting IP addresses to two letter country codes using Maxmind’s GeoLite2 Country database.
When that script finished, we were left with 1,983 records: 1,910 of those records represented dot edu/IP address pairs that had a US country code. 73 of those records had a non-US country code.
Some of you may wonder how we were able to end up with 1,983 resolved addresses from just 1,733 dot edu 2nd level domains… Some of the 2nd level domains we started with didn’t resolve at all, some resolved to just a single IP address, while others resolved to multiple IPv4 or IPv6 addresses.
For example, the 2nd level domain nodak.edu (unquestionably a real dot edu domain) didn’t resolve at all. www.edutech.nodak.edu and other specific fully qualified domain names from nodak.edu definitely resolved, but nodak.edu itself (or www.nodak.edu, for that matter) didn’t resolve at the time this blog article was written.
Considering another example, uoregon.edu resolved to just a single IP address. That IP address was reported as being located in the US (as we’d expect) per the GeoLite2 database:
uoregon.edu 128.223.142.244 US
krasnoyarsk.edu, on the other hand, resolved to a single IP address that geolocated to Russia:
krasnoyarsk.edu 193.218.136.140 RU
yale.edu, the domain of Yale University, resolved to six unique IP addresses, all of which geolocated to the United States:
yale.edu 104.16.245.46 US
yale.edu 104.16.243.4 US
yale.edu 104.16.241.46 US
yale.edu 104.16.244.46 US
yale.edu 104.16.242.46 US
yale.edu 2400:cb00:2048:1::6810:f12e US
monash.edu, a domain traditionally associated with Australia, resolved to seven unique IP addresses that geolocated to other country codes as well as Australia:
monash.edu 185.64.253.1 GB
monash.edu 202.9.95.188 AU
monash.edu 54.214.33.151 US
monash.edu 54.232.88.45 BR
monash.edu 119.9.73.89 HK
monash.edu 166.78.109.115 US
monash.edu 176.32.95.209 JP
If we take the file of domains that did resolve to at least one IP, and look just at the BASE DOMAINS (e.g., domains like uoregon.edu, not fully qualified domain names like www.uoregon.edu) there were 1,655 unique domains:
- 1,594 unique domains associated solely with US IPs,
- 59 unique domains associated solely with non-US IPs, and
- 2 unique domains that had a combination of both US and non-US IPs.
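The three-way split above can be reproduced from the "domain IP country-code" lines that the lookup script emits. The following Python sketch shows one way to do the tally; the function name `classify` and the dictionary keys are hypothetical, and the sketch assumes that any country code other than US (including a literal None from the GeoIP lookup) counts as non-US.

```python
from collections import defaultdict


def classify(records):
    """Tally base domains by whether their IPs are US-only, non-US-only, or mixed.

    records: iterable of 'domain ip country-code' lines.
    """
    countries = defaultdict(set)
    for record in records:
        fields = record.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        domain, _ip, country = fields
        countries[domain].add(country)

    tally = {"us_only": 0, "non_us_only": 0, "mixed": 0}
    for seen in countries.values():
        if seen == {"US"}:
            tally["us_only"] += 1
        elif "US" in seen:
            tally["mixed"] += 1
        else:
            tally["non_us_only"] += 1
    return tally


sample = [
    "uoregon.edu 128.223.142.244 US",
    "krasnoyarsk.edu 193.218.136.140 RU",
    "monash.edu 202.9.95.188 AU",
    "monash.edu 54.214.33.151 US",
]
print(classify(sample))
```

Grouping by domain first, and only then inspecting the set of country codes, keeps a domain with many addresses from being counted more than once.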
4. Full Results From Our Brief SIE Sample
All-in-all, we found 73 unique records associated with non-US dot edus:
AE dmcg.edu 194.170.31.41
AT mci.edu 193.171.232.28
AU canberra.edu 137.92.97.131
AU monash.edu 202.9.95.188
BG africau.edu 185.62.238.50
BR monash.edu 54.232.88.45
CA columbiacollege.edu 159.203.39.159
CA cursus.edu 192.99.38.118
CA digipen.edu 204.174.42.104
CA marianopolis.edu 206.47.149.100
CA niagara.edu 192.188.5.61
CA pwcs.edu 155.254.146.72
CA toronto.edu 128.100.166.120
CA unb.edu 131.202.1.106
DE glion.edu 52.58.68.105
DE isb.edu 144.76.121.175
DE kit.edu 129.13.40.10
DE kit.edu 2a00:1398:9:fd10::810d:280a
EE nel.edu 217.146.69.9
EE nel.edu 2a02:29e8:770:0:3::19
EG aast.edu 196.219.60.10
ES esade.edu 213.4.197.20
ES esade.edu 84.88.228.20
ES mondragon.edu 193.146.78.2
ES ub.edu 161.116.100.2
ES ucam.edu 193.147.26.228
ES uoc.edu 213.73.40.242
ES upc.edu 147.83.2.135
ES upf.edu 84.89.128.15
GB london.edu 163.119.244.27
GB marygrove.edu 2a02:fe80:1010::10:7
GB monash.edu 185.64.253.1
GR hauniv.edu 194.219.151.109
HK monash.edu 119.9.73.89
HK ncuindia.edu 119.9.107.27
HU ceu.edu 193.6.218.8
ID stts.edu 139.255.65.82
ID upi.edu 103.23.244.5
IE ie.edu 52.218.16.178
IN cds.edu 202.88.238.244
IN jipmer.edu 210.212.230.85
IN nitt.edu 203.129.195.156
IN ritindia.edu 202.38.172.143
IN sastra.edu 14.139.181.236
IN sastra.edu 220.225.137.243
JP monash.edu 176.32.95.209
KR skku.edu 115.145.129.184
NL tilburguniversity.edu 137.56.209.21
NL tilburguniversity.edu 137.56.209.22
NL tilburguniversity.edu 2001:610:1410:280:24ee:f0cd:bb36:7745
NL tul.edu 137.120.30.68
None safa.edu 130.117.92.15
NO ntnu.edu 129.241.56.117
PH aiias.edu 116.93.59.233
PH ateneo.edu 202.125.102.21
PH ubaguio.edu 122.55.103.201
PK pgc.edu 119.159.229.143
PR uprm.edu 136.145.30.119
PS hebron.edu 82.213.57.178
PS hebron.edu 93.184.9.13
PS iugaza.edu 195.189.210.6
RU crimea.edu 80.245.119.130
RU krasnoyarsk.edu 193.218.136.140
RU mpgu.edu 91.143.47.22
RU phystech.edu 93.175.31.131
RU spb.edu 195.70.196.197
SD sustech.edu 41.67.53.4
SD uofk.edu 2c0f:fec8:1000::5
SD uofk.edu 41.67.20.5
SG galgotiacollege.edu 119.81.113.118
TH au.edu 168.120.16.231
TR metu.edu 144.122.144.137
TR sabanciuniv.edu 193.255.135.111
These domains represent non-U.S. dot edus that are in routine/continual use, and serve as a nice reminder that we cannot safely assume that all dot edus are located in the United States.
However, the above came from just a small sample of 5,000,000 observations drawn from SIE, collected in under five minutes.
What would we see if we were to look at results from DNSDB for a longer period?
5. A Longer Sample of Unique Dot edus from DNSDB
We can pull a longer sample of dot edu A and AAAA records from DNSDB, and demonstrate production of JSON format output from DNSDB, by saying:
$ dnsdb_query.py -r \*.edu/A -j --after=2016-08-01 -l 1000000 > pdns.txt
$ dnsdb_query.py -r \*.edu/AAAA -j --after=2016-08-01 -l 1000000 >> pdns.txt
As of August 18th, 2016, a typical observation from our pdns.txt output file looked like:
{"count": 358354, "time_first": 1277351519, "rrtype": "A", "rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.", "rdata": ["136.167.2.254"], "time_last": 1471365799}
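The time_first and time_last fields in that record are Unix epoch seconds, which are hard to read at a glance. As a small illustration (not part of the original workflow), the Python sketch below converts them to UTC timestamps; the helper name `to_utc` is hypothetical. For the record above, it shows the name was first seen in June 2010 and last seen on August 16, 2016.

```python
import json
from datetime import datetime, timezone

# The sample DNSDB record shown in the article, as one JSON line.
line = (
    '{"count": 358354, "time_first": 1277351519, "rrtype": "A", '
    '"rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.", '
    '"rdata": ["136.167.2.254"], "time_last": 1471365799}'
)


def to_utc(epoch_seconds):
    """Format Unix epoch seconds as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


record = json.loads(line)
print(record["rrname"], to_utc(record["time_first"]), to_utc(record["time_last"]))
```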
We can use jq to do data “surgery” on that output, keeping just the rrname values that we’re interested in:
$ cat pdns.txt | jq .rrname | sed 's/"//g' | 2nd-level-dom | sort -u > pdns-edus.txt
That left us with 8,028 lines that looked like:
22cf.edu
3ponts.edu
4cd.edu
4dcollege.edu
aa.edu
aaa.edu
aaaom.edu
aaart.edu
[etc]
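For readers without jq, the rrname extraction step of that pipeline can be approximated in Python. This is only a sketch of the `jq .rrname | sed` portion: the function name `rrnames` is hypothetical, the trailing-dot stripping and lowercasing are additions not present in the original pipeline, and the reduction to 2nd level domains would still be done by the 2nd-level-dom script.

```python
import json


def rrnames(json_lines):
    """Return the sorted, de-duplicated rrname values from DNSDB JSON lines."""
    names = set()
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        # Added normalization: drop the trailing root dot and lowercase.
        names.add(record["rrname"].rstrip(".").lower())
    return sorted(names)


# The article's sample record, repeated to show de-duplication.
sample_record = (
    '{"count": 358354, "time_first": 1277351519, "rrtype": "A", '
    '"rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.", '
    '"rdata": ["136.167.2.254"], "time_last": 1471365799}'
)
print(rrnames([sample_record, "", sample_record]))
```

In practice the input would be the lines of pdns.txt rather than an in-memory list.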
6. Data Completeness and Quality
How “complete” and “right” is our new larger list? Did we see any odd pseudo domains? Have we found “all” dot edu domains? If we had a copy of the dot edu zone file, we could compare what we’ve found to what Educause actually includes in their zone, but unfortunately the dot edu zone file is not publicly available.
Fortunately, Educause does at least publish a summary graph that shows the size of the dot edu zone file — 7,524 is the latest value published there.
How does that compare to U.S. Department of Education statistics? Well, we know that as of 2012-2013, there were 7,253 “Title IV Postsecondary Institutions” (i.e., institutions whose students are eligible to receive Stafford loans or other Federal student financial aid), of which 4,726 were/are degree-granting institutions. How, then, do we explain the fact that our list has 8,028 domains? Well, we note the following phenomena:
- Some grandfathered institutions may have multiple dot edu domains, notwithstanding the fact that current policies only allow new dot edu applicants to obtain a single dot edu domain. For example:
- iu.edu and indiana.edu
- orst.edu and oregonstate.edu
- uw.edu and washington.edu
- Some dot edu domains seen in DNSDB may not be “real.” For instance, we discovered a number of “pseudo domains” in dot edu associated with a particular Verisign monitoring netblock. Those base domains have very easily-recognized formats, such as
emt-t-1006862691-1429043183732-2-ez.edu
emt-t-1008366659-1428438754423-2-qg.edu
emt-t-1008428805-1429569009365-2-hlmdn.edu
[etc]
and
t-1019030600-1424225115847-2-lclrb.edu
t-102097433-1424812217636-2-dnc.edu
t-1023615998-1398918487508-2-nt.edu
[etc]
After removing those pseudo domains, our list dropped to 7,106 domains.
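Because the pseudo domains follow such a regular format, a single regular expression is one plausible way to remove them. The Python sketch below is an illustration rather than the method actually used: the function name `drop_pseudo_domains` is hypothetical, and the pattern is inferred solely from the six examples shown above (an optional "emt-" prefix, "t-", three hyphen-separated numbers, and a short run of letters).

```python
import re

# Assumption: pattern inferred from the six example names in the article.
PSEUDO_DOMAIN = re.compile(r"^(?:emt-)?t-\d+-\d+-\d+-[a-z]+\.edu$")


def drop_pseudo_domains(domains):
    """Return the domains that do not match the monitoring-probe pattern."""
    return [d for d in domains if not PSEUDO_DOMAIN.match(d)]


sample = [
    "4cd.edu",
    "emt-t-1006862691-1429043183732-2-ez.edu",
    "t-102097433-1424812217636-2-dnc.edu",
    "aa.edu",
]
print(drop_pseudo_domains(sample))
```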
Digging a bit further into the Department of Education web site, we find the Department’s database of accredited postsecondary institutions and programs.
Here you will find a spreadsheet with 9,546 unique institutional names known to the Department of Education. Some of those educational activities are not what you or I might typically think of as traditional colleges or universities, including entries for specialized programs, focused offsite professional programs, and/or non-degree-granting programs. Some examples of less-traditional entries included:
- 2nd Dental Battalion Naval Dental Clinic/Le Jeune Advanced Education in General Dentistry 12 Months
- Alfred I. duPont Hospital for Children
- Aurora University at Carpentersville Middle School
- Earl Warren Adult School-California Correctional Center
- George Washington University at The National Geospatial Intelligence Agency
- US Army Armor School
- [etc]
In other cases, the same overarching organization may be listed multiple times
due to multiple branch or satellite locations. For example:
- Empire Beauty School
- Empire Beauty School – Baltimore
- Empire Beauty School – Boston
- Empire Beauty School – Cincinnati
- Empire Beauty School – Framingham
- Empire Beauty School – Glendale
- Empire Beauty School – Grand Rapids
- Empire Beauty School – Kennesaw
- Empire Beauty School – Laconia
- Empire Beauty School – Lehigh Valley
- Empire Beauty School – Malden
- Empire Beauty School – Peekskill
- Empire Beauty School – Philadelphia
- Empire Beauty School – Portland
- Empire Beauty School – Pottstown
- Empire Beauty School – Pottsville
- Empire Beauty School – Queens
- Empire Beauty School – Somersworth
- Empire Beauty School – Westminster
- Empire Beauty School – Wyoming Valley
We also know that some recognized institutions may use a dot com, dot org, dot us, or other non-dot edu domain name for any of a variety of reasons.
Bottom line, it can be complex to find exactly how many dot edu-eligible institutions actually exist. Still, we believe that our list of 7,106 dot edus is 94.4% of Educause’s 7,524 figure, and likely represents virtually all dot edu domains in actual active use.
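The percentage quoted above, along with two other figures implied by the counts in this section, can be checked with a few lines of Python. All four input numbers come from the article; the variable names are only for illustration.

```python
dnsdb_names = 8028       # unique base domains initially pulled from DNSDB
after_cleanup = 7106     # remaining after removing the pseudo domains
educause_total = 7524    # zone size published by Educause

pseudo_removed = dnsdb_names - after_cleanup
coverage = after_cleanup / educause_total
not_observed = educause_total - after_cleanup

print(f"pseudo domains removed: {pseudo_removed}")
print(f"coverage of the Educause figure: {coverage:.1%}")
print(f"names in the zone count but not in our list: {not_observed}")
```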
7. Final Results
We then processed those 7,106 dot edu base domains through our same domain-to-country-code.py script, resulting in 7,666 unique returned domain/IP pairs associated with 6,756 unique domains.
687 of those, as shown in Appendix III, were non-US dot edu domain/IP pairs, with 625 unique non-US dot edu domains.
What this all means for you:
Any assumption that not sharing zone files will keep third parties from identifying domain names in routine use is obviously a bad assumption: we’ve demonstrated that we can easily identify virtually all of dot edu from passive DNS. The implications of this for dot gov and dot mil are obvious: just as we passively enumerated dot edu, an attacker could just as easily passively enumerate the far-more-sensitive dot gov or dot mil zones.
Any assumption that dot edus are all US-located is clearly wrong (potentially relevant for crypto deemed-export rules and ITAR-controlled research work).
Some who are doing cyber security-related work routinely want to map IP addresses to country codes, but think that there’s some complicated magic involved in doing so. There’s not. The process is simple and easily accomplished. That’s one of the things the code provided with the article is meant to concretely demonstrate.
Farsight is routinely in discussions with potential partners offering other data enrichment feeds. This article provides a nice example of how that process can work, and the synergies that can arise from combining passive DNS with other third party data sources.
**Acknowledgements**: The author gratefully acknowledges the assistance of his colleague Mr. Gabriel Iovino of Farsight Security, Inc., for insightful suggestions related to an earlier draft of this article, although sole responsibility for the content of this article remains with the author.
Also, thank you Mr. Robert Edmonds, now of Fastly, Inc., for your helpful comments.
Appendix I. 2nd-level-dom script
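#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::SSL::PublicSuffix;

# Path to a local copy of the Public Suffix List
my $pslfile = '/your_path_to_the/public_suffix_list.dat';
my $ps = IO::Socket::SSL::PublicSuffix->from_file($pslfile);

# Read names from stdin one line at a time and print, for each,
# the public suffix plus one additional label (the base domain)
while (my $line = <>) {
    chomp($line);
    my $root_domain = $ps->public_suffix($line, 1);
    printf("%s\n", $root_domain);
}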
Notes:
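For readers who would rather not install the Perl module, the same reduction can be approximated in Python for the dot edu case only. This is a deliberately simplified sketch, not a port of the script above: the function name `edu_base_domain` is hypothetical, and it assumes the base domain is simply the last two labels of any name ending in .edu. It does not consult the Public Suffix List, so it is not suitable for other TLDs.

```python
def edu_base_domain(fqdn):
    """Return the 2nd level .edu domain for fqdn, or None if it is not under .edu.

    Assumption of this sketch: the base domain is the last two labels.
    The Perl script uses the Public Suffix List instead.
    """
    labels = fqdn.strip().rstrip(".").lower().split(".")
    if len(labels) < 2 or labels[-1] != "edu" or not labels[-2]:
        return None
    return ".".join(labels[-2:])


sample = ["www.edutech.nodak.edu.", "uoregon.edu", "www.example.com", "edu"]
bases = sorted({b for b in map(edu_base_domain, sample) if b})
print(bases)
```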
Appendix II. domain-to-country-code.py
#!/usr/bin/python3 -u
# Reads domain names from stdin, one per line, and prints
# "domain IP country-code" for each A and AAAA record found.
import sys

import dns.exception
import dns.resolver
import geoip2.database
import geoip2.errors

myResolver = dns.resolver.Resolver()
reader = geoip2.database.Reader('/your_path_to_the/GeoLite2-Country.mmdb')

try:
    line = input()
except EOFError:
    print("Please pipe in a list of domains to process...")
    sys.exit()

while line:
    # IPv4
    try:
        # resolve() is the dnspython 2.x name; older releases call it query()
        myAnswers = myResolver.resolve(line, "A")
    except dns.exception.DNSException:
        myAnswers = []
    for rdata in myAnswers:
        try:
            response = reader.country(rdata.address)
        except geoip2.errors.AddressNotFoundError:
            continue  # skip only this address, not the rest of the answer
        print(line, rdata.address, response.country.iso_code)

    # IPv6
    try:
        myAnswers2 = myResolver.resolve(line, "AAAA")
    except dns.exception.DNSException:
        myAnswers2 = []
    for rdata2 in myAnswers2:
        try:
            response2 = reader.country(rdata2.address)
        except geoip2.errors.AddressNotFoundError:
            continue
        print(line, rdata2.address, response2.country.iso_code)

    try:
        line = input()
    except EOFError:
        sys.exit(0)
Notes:
- python -u unbuffers the output when this script is run; if you are not running the script interactively you can omit the -u
- If you don’t care about IPv6, you can omit the IPv6 section of this script to just see IPv4 results
- Obtain GeoLite2-Country.mmdb here
- Get the Python library required to process the database here
Required Attribution: This paper includes GeoLite2 data created by MaxMind.
Appendix III. Non-US Dot Edus Seen In Our August DNSDB Data
Available here.
Joe St Sauver, Ph.D. is a Scientist with Farsight Security, Inc.