People say that once something’s on the internet, it’s there forever. And while that seems to be unfortunately true for Facebook posts and embarrassing photos, it isn’t always the case for information that may be useful for an investigation. In this blog post, I’ll discuss a couple of ways of finding information that has disappeared from the internet, as well as how to save something important to your investigation that you are concerned may be taken down at some point.
Finding archived stuff
At times, there may be information that is useful to an investigation that is no longer easily accessible on the internet. If you have the URL of the site you need to access, it may be possible to find it on an archive site.
For example, back in December 2018, the House Committee on Oversight and Reform released a report on the Equifax data breach from September 2017. In it, they described a timeline of the incident and several factors they believed contributed to the actor gaining access to the data. Recently, I went to look for this report again to show to another colleague and discovered it was mysteriously no longer available on the House Oversight website, nor was it cached by Google.
Since I had the former URL of the report from having sent it to a colleague back in December, I was able to check to see if it had been archived on various archive sites, and found that it was still available in its entirety on archive.today.
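If you have a list of URLs to check, doing this by hand gets tedious. As a rough sketch, the Wayback Machine at archive.org offers a small availability API that returns the closest snapshot it holds for a URL. The Python below assumes that endpoint and the JSON shape it returns. The function names and the example URL are hypothetical, and this covers archive.org only, not archive.today.

```python
import json
import urllib.parse
import urllib.request

# Wayback Machine availability endpoint (archive.org only; assumption: the
# response contains an "archived_snapshots" object with an optional "closest" entry).
WAYBACK_API = "https://archive.org/wayback/available"


def build_lookup_url(page_url, timestamp=None):
    """Build the availability query for page_url.

    timestamp is optional and uses the YYYYMMDD form, e.g. "20181231",
    to ask for the snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a parsed response, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Query the API over the network and return the closest snapshot, if any."""
    with urllib.request.urlopen(build_lookup_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    # Hypothetical URL for illustration; substitute the page you are looking for.
    print(lookup("example.com/report.pdf", "20181231"))
```

A result of None only means this one archive has no copy, so it is still worth checking other archive sites by hand.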
Finding cached stuff
Sometimes, finding a former version of a webpage can be even easier than locating it on an archive site. When you search for the page in Google, the result may include a cached copy. The copy may be slightly outdated, but if the current version is completely gone, an outdated version is better than nothing.
To find a cached version of a site, search for it in Google. If a small green arrow appears to the right of the URL in a search result, a cached version may be available.
Click the green arrow to see the menu. Select the “Cached” option to view the cached version of the webpage.
The cached version of the webpage will identify at the top of the page what date it is from. In this case, I took this screenshot on 1 May 2019, but the cached version is from 9 April 2019. This means that any changes made to the DomainTools blog between 9 April and 1 May are not included in this version of the site.
For research, an outdated version of a site may be good enough, so it’s still worth it to check for the cached site.
Additionally, if a web page has changed recently, it may be possible to find the cached version without the change and take a screenshot so you have it for your research if needed. For example, Chronicle recently released a blog post regarding their research on the GOSSIPGIRL supra threat actor. Initially, the blog post contained a screenshot of a slide from a CSEC presentation called “Pay attention to that man behind the curtain: Discovering aliens on CNE infrastructure.” At some point after the article was published, Chronicle went back and blurred out most of the slide that wasn’t relevant to their blog post.
However, using the Google cache, it is still possible to find the earlier version that contains the clear version of the slide.
Archiving stuff that may be useful later
When conducting research, you may come across a page that’s extremely useful to your investigation. If you’re worried that page may disappear from the internet, you can archive it to make sure it can be found later. The only caveat to this is that anybody will be able to access it later, including those with nefarious intentions or the subject of your investigation. So keep that in mind and don’t burn your investigation by archiving an extremely sensitive site.
If you do choose to archive something, sites like archive.org and archive.today are easy-to-use options. Let’s say I came across an interesting Twitter account that may be useful in an investigation later. I can search for the account to see if it’s already been archived (though of course, the archived version may be outdated).
In this case, I can see that my own Twitter account has never been archived on archive.today. If, for whatever reason, I felt that archiving this account might benefit my investigation, I could do so by entering the URL on the homepage of archive.today and clicking the “Save” button.
Now, anyone who searches for my Twitter account on archive.today will be greeted with an exact replica of the way my account looks today (2 May 2019).
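The same idea can be scripted for archive.org. As a minimal sketch, the Wayback Machine accepts capture requests at a "Save Page Now" address formed by putting the target URL after https://web.archive.org/save/. The Python below assumes that behavior. The function names and User-Agent string are hypothetical, I have not covered rate limits or login requirements, and it does not use archive.today. The same caveat applies as before: a capture is public.

```python
import urllib.parse
import urllib.request

# Wayback Machine "Save Page Now" prefix (assumption: a plain GET to
# this prefix followed by the target URL asks archive.org for a capture).
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url):
    """Return the capture-request URL for page_url, which must be a full http(s) URL."""
    parsed = urllib.parse.urlparse(page_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("expected a full http(s) URL, got: " + repr(page_url))
    return SAVE_ENDPOINT + page_url


def request_capture(page_url):
    """Ask archive.org to capture page_url and return the URL the request ended on.

    This makes a network request and the resulting capture is publicly visible.
    """
    req = urllib.request.Request(
        build_save_url(page_url),
        headers={"User-Agent": "archive-sketch/0.1"},  # hypothetical identifier
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.geturl()


if __name__ == "__main__":
    # Hypothetical URL for illustration only.
    print(build_save_url("https://example.com/page"))
```

I kept building the address separate from sending the request so you can review exactly what will be submitted before anything becomes public.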
In summary, archived and cached versions of websites can be extremely useful during the course of an investigation. Tools like archive.org, archive.today, and Google cache are good starting points, and there may be other tools out there that suit your investigative needs better. If you come across a useful site during an investigation, archiving it may save your investigation if the site is later removed or changed. Just remember that archiving a site may alert an actor that they are being researched, so do it at your own risk. I hope this blog post was helpful!