Farsight TXT Record

Non-U.S. Universities with .edu Domain Names: They're More Common Than You Might Think

Written by: Joe St Sauver, Ph.D.
Published on: Aug 25, 2016

1. Introduction

In general, as stated here:

Only U.S. postsecondary institutions that are institutionally accredited by an agency on the U.S. Department of Education’s list of Nationally Recognized Accrediting Agencies may obtain an Internet name in the .edu domain.

However, that policy needs to be read in light of a “grandfather clause”, which clarifies:

The Cooperative Agreement between EDUCAUSE and the U.S. Department of Commerce specifies that all .edu names in existence as of October 29, 2001 are “grandfathered,” regardless of current or past eligibility requirements.

We were curious to see how many 2nd-level dot edu domains (including grandfathered edus) actually map to non-US IP address space. This may be of some practical importance since often people forget that users coming from legacy dot edu domains may not be from the United States.

Now obviously, a non-US university could elect to host their domain in US address space, or a US university could choose to host their domain in non-US address space, but for the most part we’d expect to see US universities in US IP address space, and international universities in non-US address space.

So can we identify dot edu domains that are hosted outside the US? It turns out that yes, yes we can.

2. Dot Edu Domains Seen In Real Time At The Security Information Exchange

Working from a local SIE blade server, we begin by capturing a small sample of 5,000,000 observations from Channel 204, our deduplicated/filtered/verified passive DNS channel. We’ll just keep the fully qualified domain names (FQDNs) we saw from those observations. This short collection period was meant to capture dot edu base domains that are in routine/continual use. It took just a little over four minutes to collect those observations with the command:

$ nmsgtool -C ch204 -c 5000000 | grep rrname | awk '{print $2}' > ch204.txt

We now want to simplify those observed names by converting the FQDNs we found into 2nd level dot edu domain names using a small Perl script that checks the Public Suffix List (that script, and another script used in this article, follow the body of the article):

$ 2nd-level-dom < ch204.txt | sort -u > ch204-2nd-level.txt

That left us with 1,731,891 unique effective 2nd level domains. 1,806 of those had a dot edu domain name. However, some of those 1,806 were hashed DNSSEC owner names such as:

083ued1lj2fhb4d8m9j6q04r899q6kg0.edu
14pj7h7elaug4rk38hmrrrsf4vrpjo1j.edu
1jud27g70lsul7a1m8dkj8qhicbtjf80.edu
amehl7oe9aug1shfmkt24g10d5204gpe.edu
dkfbc38o8216a3bfbdik8l98oh2udkci.edu

Removing those, we’re left with 1,733 domains.
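The article does not say how the hashed names were removed. One possible approach, sketched here under the assumption that they are NSEC3 owner names (a SHA-1 hash written as 32 base32hex characters, i.e. digits plus the letters a through v), is a simple pattern match. The function names are hypothetical.

```python
import re

# Assumption: an NSEC3 owner label is a SHA-1 hash in base32hex,
# i.e. exactly 32 characters drawn from 0-9 and a-v.
NSEC3_LABEL = re.compile(r"^[0-9a-v]{32}$")


def looks_like_nsec3_owner(domain):
    """Return True if the leftmost label of `domain` is hash-shaped."""
    first_label = domain.rstrip(".").lower().split(".")[0]
    return bool(NSEC3_LABEL.match(first_label))


def drop_hashed_names(domains):
    """Keep only the names whose leftmost label is not hash-shaped."""
    return [d for d in domains if not looks_like_nsec3_owner(d)]
```

A genuine domain that happened to be exactly 32 characters from that alphabet would be dropped too, so a filter like this is worth spot-checking by eye.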

3. Resolving Dot Edus to IPs, and IPs to GeoIP Country Codes

We then ran those 1,733 dot edu domains through a little Python script (see Appendix II):

$ domain-to-country-code.py < edus.txt > edus-processed.txt

That script looked for “A” and/or “AAAA” (IPv4 and IPv6) records associated with each of the observed 2nd level edu domains, and then mapped the resulting IP addresses to two letter country codes using MaxMind’s GeoLite2 Country database.

When that script finished, we were left with 1,983 records: 1,910 of those records represented dot edu/IP address pairs that had a US country code. 73 of those records had a non-US country code.

Some of you may wonder how we were able to end up with 1,983 resolved addresses from just 1,733 dot edu 2nd level domains. The explanation is that some of the 2nd level domains we started with didn’t resolve at all, some resolved to just a single IP address, while others resolved to multiple IPv4 or IPv6 addresses.
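The counting effect described above can be illustrated with a minimal sketch. The `resolve` and `country_of` arguments are hypothetical stand-ins for a DNS lookup and a GeoIP lookup; they are passed in so the logic can be followed without network access or a database file, and this is not the script used for the article.

```python
def domain_ip_country_rows(domain, resolve, country_of):
    """Return one (domain, ip, country_code) row per resolved address.

    `resolve(domain)` is assumed to return a list of address strings
    (possibly empty); `country_of(ip)` is assumed to return a country code.
    """
    rows = []
    for ip in resolve(domain):  # zero, one, or many addresses
        rows.append((domain, ip, country_of(ip)))
    return rows


def count_rows(domains, resolve, country_of):
    """Total rows across all domains; this need not equal len(domains)."""
    return sum(
        len(domain_ip_country_rows(d, resolve, country_of)) for d in domains
    )
```

A domain that fails to resolve contributes no rows and a multi-homed domain contributes several, which is why the record count and the domain count differ.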

For example, the 2nd level domain nodak.edu (unquestionably a real dot edu domain) didn’t resolve at all. www.edutech.nodak.edu and other specific fully qualified domain names from nodak.edu definitely resolved, but nodak.edu itself (or www.nodak.edu, for that matter) didn’t resolve at the time this blog article was written.

To consider another example, uoregon.edu resolved to just a single IP address. That IP address was reported as being located in the US (as we’d expect) per the GeoLite2 database:

uoregon.edu 128.223.142.244 US

krasnoyarsk.edu, on the other hand, resolved to a single IP address that geolocated to Russia:

krasnoyarsk.edu 193.218.136.140 RU

yale.edu, the domain of Yale University, resolved to six unique IP addresses, all of which geolocated to the United States:

yale.edu 104.16.245.46 US
yale.edu 104.16.243.4 US
yale.edu 104.16.241.46 US
yale.edu 104.16.244.46 US
yale.edu 104.16.242.46 US
yale.edu 2400:cb00:2048:1::6810:f12e US

monash.edu, a domain traditionally associated with Australia, resolved to seven unique IP addresses that geolocated to other country codes as well as Australia:

monash.edu 185.64.253.1 GB
monash.edu 202.9.95.188 AU
monash.edu 54.214.33.151 US
monash.edu 54.232.88.45 BR
monash.edu 119.9.73.89 HK
monash.edu 166.78.109.115 US
monash.edu 176.32.95.209 JP

If we take the file of domains that did resolve to at least one IP, and look just at the BASE DOMAINS (e.g., domains like uoregon.edu, not fully qualified domain names like www.uoregon.edu), there were 1,655 unique domains:

  • 1,594 unique domains associated solely with US IPs,
  • 59 unique domains associated solely with non-US IPs, and
  • 2 unique domains that had a combination of both US and non-US IPs.
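A breakdown like the one above can be derived from the "domain IP country-code" lines the script emits. The following is a minimal sketch of one way to do that grouping; the function name is hypothetical, and it assumes three whitespace-separated fields per line.

```python
from collections import defaultdict


def classify_domains(lines):
    """Group 'domain ip country' lines into US-only, non-US-only, and mixed."""
    codes = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        domain, _ip, country = parts
        codes[domain].add(country)

    us_only, non_us_only, mixed = set(), set(), set()
    for domain, seen in codes.items():
        if seen == {"US"}:
            us_only.add(domain)
        elif "US" in seen:
            mixed.add(domain)
        else:
            non_us_only.add(domain)
    return us_only, non_us_only, mixed
```

Note that in this sketch any code other than "US", including a literal "None" when the database has no country for an address, counts as non-US.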

4. Full Results From Our Brief SIE Sample

All-in-all, we found 73 unique records associated with non-US dot edus:

AE dmcg.edu 194.170.31.41
AT mci.edu 193.171.232.28
AU canberra.edu 137.92.97.131
AU monash.edu 202.9.95.188
BG africau.edu 185.62.238.50
BR monash.edu 54.232.88.45
CA columbiacollege.edu 159.203.39.159
CA cursus.edu 192.99.38.118
CA digipen.edu 204.174.42.104
CA marianopolis.edu 206.47.149.100
CA niagara.edu 192.188.5.61
CA pwcs.edu 155.254.146.72
CA toronto.edu 128.100.166.120
CA unb.edu 131.202.1.106
DE glion.edu 52.58.68.105
DE isb.edu 144.76.121.175
DE kit.edu 129.13.40.10
DE kit.edu 2a00:1398:9:fd10::810d:280a
EE nel.edu 217.146.69.9
EE nel.edu 2a02:29e8:770:0:3::19
EG aast.edu 196.219.60.10
ES esade.edu 213.4.197.20
ES esade.edu 84.88.228.20
ES mondragon.edu 193.146.78.2
ES ub.edu 161.116.100.2
ES ucam.edu 193.147.26.228
ES uoc.edu 213.73.40.242
ES upc.edu 147.83.2.135
ES upf.edu 84.89.128.15
GB london.edu 163.119.244.27
GB marygrove.edu 2a02:fe80:1010::10:7
GB monash.edu 185.64.253.1
GR hauniv.edu 194.219.151.109
HK monash.edu 119.9.73.89
HK ncuindia.edu 119.9.107.27
HU ceu.edu 193.6.218.8
ID stts.edu 139.255.65.82
ID upi.edu 103.23.244.5
IE ie.edu 52.218.16.178
IN cds.edu 202.88.238.244
IN jipmer.edu 210.212.230.85
IN nitt.edu 203.129.195.156
IN ritindia.edu 202.38.172.143
IN sastra.edu 14.139.181.236
IN sastra.edu 220.225.137.243
JP monash.edu 176.32.95.209
KR skku.edu 115.145.129.184
NL tilburguniversity.edu 137.56.209.21
NL tilburguniversity.edu 137.56.209.22
NL tilburguniversity.edu 2001:610:1410:280:24ee:f0cd:bb36:7745
NL tul.edu 137.120.30.68
None safa.edu 130.117.92.15
NO ntnu.edu 129.241.56.117
PH aiias.edu 116.93.59.233
PH ateneo.edu 202.125.102.21
PH ubaguio.edu 122.55.103.201
PK pgc.edu 119.159.229.143
PR uprm.edu 136.145.30.119
PS hebron.edu 82.213.57.178
PS hebron.edu 93.184.9.13
PS iugaza.edu 195.189.210.6
RU crimea.edu 80.245.119.130
RU krasnoyarsk.edu 193.218.136.140
RU mpgu.edu 91.143.47.22
RU phystech.edu 93.175.31.131
RU spb.edu 195.70.196.197
SD sustech.edu 41.67.53.4
SD uofk.edu 2c0f:fec8:1000::5
SD uofk.edu 41.67.20.5
SG galgotiacollege.edu 119.81.113.118
TH au.edu 168.120.16.231
TR metu.edu 144.122.144.137
TR sabanciuniv.edu 193.255.135.111

These domains represent non-U.S. dot edus that are in routine/continual use, and serve as a nice reminder that we cannot safely assume that all dot edus are located in the United States.

However, the above was just from a small sample of 5,000,000 observations drawn from SIE, less than five minutes of data.

What would we see if we were to look at results from DNSDB for a longer period?

5. A Longer Sample of Unique Dot edus from DNSDB

We can pull a longer sample of dot edu A and AAAA records from DNSDB, and demonstrate production of JSON format output from DNSDB, by saying:

$ dnsdb_query.py -r \*.edu/A -j --after=2016-08-01 -l 1000000 > pdns.txt
$ dnsdb_query.py -r \*.edu/AAAA -j --after=2016-08-01 -l 1000000 >> pdns.txt

As of August 18th, 2016, a typical observation from our pdns.txt output file looked like:

{"count": 358354, "time_first": 1277351519, "rrtype": "A",
"rrname": "purgatory.bc.edu.", "bailiwick": "bc.edu.",
"rdata": ["136.167.2.254"], "time_last": 1471365799}

We can use jq to do data “surgery” on that output, keeping just the rrname values that we’re interested in:

$ cat pdns.txt | jq .rrname | sed 's/"//g' | 2nd-level-dom | sort -u > pdns-edus.txt
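For readers without jq, the field extraction in that pipeline can be sketched in Python 3. This is an illustration only: it assumes one JSON object per line in pdns.txt and assumes that time_first and time_last are Unix epoch seconds; the function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize_observation(json_line):
    """Pull the owner name and first/last-seen dates out of one JSON record."""
    record = json.loads(json_line)

    def as_date(epoch_seconds):
        # Assumption: timestamps are seconds since the Unix epoch, in UTC.
        moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
        return moment.date().isoformat()

    return {
        "rrname": record["rrname"].rstrip("."),  # drop the trailing root dot
        "rrtype": record["rrtype"],
        "first_seen": as_date(record["time_first"]),
        "last_seen": as_date(record["time_last"]),
    }
```

The resulting rrname values would still need to be reduced to base domains and deduplicated, as the 2nd-level-dom and sort -u steps do in the command above.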

That left us with 8,028 lines that looked like:

22cf.edu
3ponts.edu
4cd.edu
4dcollege.edu
aa.edu
aaa.edu
aaaom.edu
aaart.edu
[etc]

6. Data Completeness and Quality

How “complete” and “right” is our new larger list? Did we see any odd pseudo domains? Have we found “all” dot edu domains? If we had a copy of the dot edu zone file, we could compare what we’ve found to what Educause actually includes in their zone, but unfortunately the dot edu zone file is not publicly available.

Fortunately, Educause does at least publish a summary graph that shows the size of the dot edu zone file: 7,524 is the latest value published there.

How does that compare to U.S. Department of Education statistics? Well, we know that as of 2012-2013, there were 7,253 “Title IV Postsecondary Institutions” (i.e., institutions whose students are eligible to receive Stafford loans or other Federal student financial aid), of which 4,726 were/are degree-granting institutions. How do we explain the fact that our list has 8,028 domains, given that fact? Well, we note the following phenomena:

  • Some grandfathered institutions may have multiple dot edu domains, notwithstanding the fact that current policies only allow new dot edu applicants to obtain a single dot edu domain. For example:
    • iu.edu and indiana.edu
    • orst.edu and oregonstate.edu
    • uw.edu and washington.edu
  • Some dot edu domains seen in DNSDB may not be “real.” For instance, we discovered a number of “pseudo domains” in dot edu associated with a particular Verisign monitoring netblock. Those base domains have very easily-recognized formats, such as

emt-t-1006862691-1429043183732-2-ez.edu
emt-t-1008366659-1428438754423-2-qg.edu
emt-t-1008428805-1429569009365-2-hlmdn.edu
[etc]

and

t-1019030600-1424225115847-2-lclrb.edu
t-102097433-1424812217636-2-dnc.edu
t-1023615998-1398918487508-2-nt.edu
[etc]

After removing those pseudo domains, our list dropped to 7,106 domains.
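The article does not show how the pseudo domains were filtered out. A minimal sketch of one way to do it follows; the pattern is an assumption inferred only from the sample names listed above (an optional "emt-" prefix, "t-", three dash-separated runs of digits, then a run of letters), and the function names are hypothetical.

```python
import re

# Assumption: pattern inferred solely from the sample names in the article.
PSEUDO_EDU = re.compile(r"^(?:emt-)?t-\d+-\d+-\d+-[a-z]+\.edu$")


def is_pseudo_domain(domain):
    """Return True if `domain` matches the monitoring-style name format."""
    return bool(PSEUDO_EDU.match(domain.rstrip(".").lower()))


def drop_pseudo_domains(domains):
    """Keep only the names that do not match the pseudo domain format."""
    return [d for d in domains if not is_pseudo_domain(d)]
```

Pseudo domains in other formats would pass through this filter, so it is a starting point rather than a complete cleanup.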

Digging a bit further into the Department of Education web site, we find the Department’s database of accredited postsecondary institutions and programs.

Here you will find a spreadsheet with 9,546 unique institutional names known to the Department of Education. Some of those educational activities are not what you or I might typically think of as traditional colleges or universities, including entries for specialized programs, focused offsite professional programs, and/or non-degree-granting programs. Some examples of less-traditional entries included:

  • 2nd Dental Battalion Naval Dental Clinic/Le Jeune Advanced Education in General Dentistry 12 Months
  • Alfred I. duPont Hospital for Children
  • Aurora University at Carpentersville Middle School
  • Earl Warren Adult School-California Correctional Center
  • George Washington University at The National Geospatial Intelligence Agency
  • US Army Armor School
  • [etc]

In other cases, the same overarching organization may be listed multiple times due to multiple branch or satellite locations. For example:

  • Empire Beauty School
  • Empire Beauty School – Baltimore
  • Empire Beauty School – Boston
  • Empire Beauty School – Cincinnati
  • Empire Beauty School – Framingham
  • Empire Beauty School – Glendale
  • Empire Beauty School – Grand Rapids
  • Empire Beauty School – Kennesaw
  • Empire Beauty School – Laconia
  • Empire Beauty School – Lehigh Valley
  • Empire Beauty School – Malden
  • Empire Beauty School – Peekskill
  • Empire Beauty School – Philadelphia
  • Empire Beauty School – Portland
  • Empire Beauty School – Pottstown
  • Empire Beauty School – Pottsville
  • Empire Beauty School – Queens
  • Empire Beauty School – Somersworth
  • Empire Beauty School – Westminster
  • Empire Beauty School – Wyoming Valley

We also know that some recognized institutions may use a dot com, dot org, dot us, or other non-dot edu domain name for any of a variety of reasons.

Bottom line, it can be complex to find exactly how many dot edu-eligible institutions actually exist. Still, our list of 7,106 dot edus is 94.4% of Educause’s 7,524 figure, and we believe it likely represents virtually all dot edu domains in actual active use.
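As a quick check of that percentage, the arithmetic can be reproduced from the two figures quoted in this section (the variable names are ours):

```python
observed = 7106   # dot edu base domains left after removing pseudo domains
published = 7524  # zone size from the Educause summary graph

# Multiply by 100.0 first so the division is floating point
# under both Python 2 and Python 3.
coverage = 100.0 * observed / published
shortfall = published - observed

print("%.1f%% of the published zone size" % coverage)
print("%d domains not seen" % shortfall)
```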

7. Final Results

We then processed those 7,106 dot edu base domains through our same domain-to-country-code.py script, resulting in 7,666 unique returned domain/IP pairs associated with 6,756 unique domains.

687 of those, as shown in Appendix III, were non-US dot edu domain/IP pairs, with 625 unique non-US dot edu domains.

What this all means for you:

  1. Any assumption that not sharing zone files will keep third parties from identifying domain names in routine use is obviously a bad assumption: we’ve demonstrated that we can easily identify virtually all of dot edu from passive DNS. The implications of this for dot gov and dot mil are obvious: just as we passively enumerated dot edu, an attacker could just as easily passively enumerate the far-more-sensitive dot gov or dot mil zones.
  2. Any assumption that dot edus are all US-located is clearly wrong (potentially relevant for crypto deemed-export rules and ITAR-controlled research work).
  3. Some who are doing cyber security-related work routinely want to map IP addresses to country codes, but think that there’s some complicated magic involved in doing so. There’s not. The process is simple and easily accomplished. That’s one of the things the code provided with the article is meant to concretely demonstrate.
  4. Farsight is routinely in discussions with potential partners offering other data enrichment feeds. This article provides a nice example of how that process can work, and the synergies that can arise from combining passive DNS with other third party data sources.

**Acknowledgements**: The author gratefully acknowledges the assistance of his colleague Mr. Gabriel Iovino of Farsight Security, Inc., for insightful suggestions related to an earlier draft of this article, although sole responsibility for the content of this article remains with the author.

Also, thank you Mr. Robert Edmonds, now of Fastly, Inc., for your helpful comments.

Appendix I. 2nd-level-dom script

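#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::SSL::PublicSuffix;

# Path to a local copy of the Public Suffix List
my $pslfile = '/your_path_to_the/public_suffix_list.dat';
my $ps = IO::Socket::SSL::PublicSuffix->from_file($pslfile);

# Read FQDNs one per line; print the public suffix plus one more label
while (my $line = <>) {
    chomp($line);
    my $root_domain = $ps->public_suffix($line, 1);
    next unless defined $root_domain;    # skip names with no usable result
    printf("%s\n", $root_domain);
}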

Notes:

  • public_suffix_list.dat can be downloaded here
  • IO::Socket::SSL::PublicSuffix can be downloaded here
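For readers who only care about dot edu names and do not have the Perl module handy, the reduction to a base domain can be approximated in Python. This is a deliberately simplified sketch, not a port of the script above: it assumes "edu" is the only public suffix of interest and does not consult the Public Suffix List, and the function name is hypothetical.

```python
def edu_base_domain(fqdn):
    """Return the 2nd-level .edu domain for `fqdn`, or None if not under .edu.

    Simplification (assumption): treats "edu" as the only public suffix
    of interest; names under any other suffix are ignored.
    """
    labels = fqdn.strip().rstrip(".").lower().split(".")
    if len(labels) < 2 or labels[-1] != "edu" or not labels[-2]:
        return None
    return ".".join(labels[-2:])
```

Unlike the Perl script, this sketch discards non-edu names rather than reducing them to their own effective 2nd-level domains.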

Appendix II. domain-to-country-code.py

#!/usr/bin/python -u
# Python 2 script: reads domains on stdin, one per line, and prints
# "domain IP country-code" for each A and AAAA record found.
import sys
import dns.resolver
import geoip2.database

myResolver = dns.resolver.Resolver()
myResolver2 = dns.resolver.Resolver()

reader = geoip2.database.Reader('/your_path_to_the/GeoLite2-Country.mmdb')
reader2 = geoip2.database.Reader('/your_path_to_the/GeoLite2-Country.mmdb')

try:
    line = raw_input()
except EOFError:
    print "Please pipe in a list of domains to process..."
    sys.exit()

while line:

    # IPv4
    clean=0

    try:
        myAnswers = myResolver.query(line, "A")
    except:
        clean=1    # no A records, or the lookup failed

    if clean == 0:
        for rdata in myAnswers:
            # reset per address, so one failed GeoIP lookup
            # doesn't suppress the addresses that follow it
            cleana=0
            try:
                # the GeoIP reader expects the address as text
                response = reader.country(str(rdata))
            except:
                cleana=1
            if cleana == 0:
                print line,rdata,response.country.iso_code

    # IPv6

    clean2=0

    try:
        myAnswers2 = myResolver2.query(line, "AAAA")
    except:
        clean2=1    # no AAAA records, or the lookup failed

    if clean2 == 0:
        for rdata2 in myAnswers2:
            clean2a=0
            try:
                response2 = reader2.country(str(rdata2))
            except:
                clean2a=1
            if clean2a == 0:
                print line,rdata2,response2.country.iso_code

    # next domain; stop cleanly at end of input
    try:
        line = raw_input()
    except EOFError:
        sys.exit(0)

Notes:

  • python -u unbuffers the output when this script is run; if you are not running the script interactively you can omit the -u
  • If you don’t care about IPv6, you can omit the IPv6 section of this script to just see IPv4 results
  • Obtain GeoLite2-Country.mmdb here
  • Get the Python library required to process the database here

Required Attribution: This paper includes GeoLite2 data created by MaxMind.

Appendix III. Non-US Dot Edus Seen In Our August DNSDB Data

Available here.

Joe St Sauver, Ph.D. is a Scientist with Farsight Security, Inc.