Farsight Long View

Systematically Collecting The Right Sort of Data About Cyber Security Incidents

Written by: Joe St. Sauver
Published on: Sep 17, 2015

Introduction

Given that Farsight Security, Inc. (FSI) is a data-driven cyber security company, it shouldn’t be very surprising that many Farsight staff members have a passionate interest in how data — pretty much any sort of data — gets collected. As data people we know that systematically and consistently collecting data maximizes the value of that data, but surprisingly often, people measure even the simplest of phenomena in unexpectedly inconsistent ways.

Consider the humble apple. Given that this is autumn, many backyard orchardists are harvesting their trees. Orchardists, like fishermen, hunters, and others who harvest Mother Nature’s bounty, are often inclined to brag a bit about their success.

One orchardist might measure the yield of his apple trees in bushels (weighing 40 pounds each), while another with only a few trees might count his modest harvest apple-by-apple. We can attempt to reconcile those measurements, but that’s an imperfect process (there’s natural variability from one apple to the next, and some varieties of apples yield significantly larger and heavier fruit than others).

It’s a lot easier if everyone can agree to measure a given phenomenon the same way: “our hobbyist orchard society agrees that it will measure backyard apple orchard output in 40-pound bushels.”

It can also matter when we measure: the end of August? The end of September? The end of October? The longer we wait, the bigger the apples might become, but the longer we wait, the greater the chance that some apples might become damaged or eaten by deer, birds, insects, or disease, or simply fall to the ground and be ruined.

Speaking of imperfect apples, when we measure, what are we measuring? Only flawless apples picked right from the tree and perfect for eating fresh? Or are we counting all usable apples, including partially flawed apples that might need to be processed into apple sauce or cider to be acceptable to consumers?

Measuring even simple things can be surprisingly tricky, and if we don’t use common units and agreed-upon processes, we might literally find ourselves unable to compare “apples to apples.”

Cyber Incidents

Measuring cyber incidents (such as PII spills, malware infections, or cyber intrusions) is potentially far harder than measuring fruit tree output.

Earlier this month, the Department of Homeland Security (DHS) National Protection and Programs Directorate (NPPD)’s Cyber Incident Data and Analysis Working Group (CIDAWG) released a new 53-page report entitled Enhancing Resilience Through Cyber Incident Data Sharing and Analysis: Establishing Community-Relevant Data Categories in Support of a Cyber Incident Data Repository.

While many government reports may be notorious for tackling obscure topics, having scant readership, and having little if any lasting global impact, that will likely NOT be the case for this report.

The CIDAWG report is important. It points to a path forward that will likely help the developing cyber security industry fill a longstanding and significant gap, and does an excellent job of proposing a practically usable framework for collecting information about cyber incidents, both big and small. If this framework ends up broadly used, we’ll be better positioned to track and understand the cyber security incidents we’re increasingly experiencing.

This report defines what matters about cyber security incidents, and what doesn’t. That makes this a truly critical report. It also implicitly declares what WON’T be measured, and thus what we won’t easily be able to analyze. That’s another critically important point.

A consistent framework, if clearly and carefully defined, and broadly accepted and used, lays a foundation for…

  • Data to be systematically collected and recorded, thereby making it possible for data to be shared and compared by incident response communities both at home and abroad. This means that your data will be able to be cleanly combined or contrasted with my data, and we won’t run into things such as non-comparable data categories* or differences of opinion about what’s defined to be a new bit of malware**.
  • Longitudinal trends to be monitored over time, with confidence that changes in reported statistics are due to substantive phenomena, not just differences in definitions or changes to data collection methodologies.***

Frankly, the adoption of a consistent cyber incident measurement framework is a watershed event, and given the increasing prevalence of cyber security incidents, one that’s long overdue.

Systematically and Consistently Measuring Phenomena of Interest: A Well-Accepted Idea

Many may find it a bit shocking that we don’t already have a framework of this sort for cyber security incident data collection, since we have consistent data collection frameworks for so many other areas of national and international concern.

The Report

With all of that by way of preface, what does the CIDAWG report actually recommend? The report originally sought to identify information that would be needed to create a database that could be used for cyber insurance underwriting purposes, but that’s just one of many uses to which this data could potentially be put.

Most of the body of the report (report pages 3 through 28) is devoted to describing and explaining the 16 types of data the group would like to see collected when cyber security incidents occur. Because you really should read the entire report itself, I won’t rehash those data types in detail except to note the 16 major areas called out by the report:

  1. Type of Incident
  2. Severity of Incident
  3. Use of Information Security Standards and Best Practices
  4. Timeline
  5. Apparent Goals
  6. Contributing Causes
  7. Security Control Decay
  8. Assets Compromised/Affected
  9. Type of Impact(s)
  10. Incident Detection Techniques
  11. Incident Response Playbook
  12. Internal Skill Sufficiency
  13. Mitigation/Prevention Measures
  14. Costs
  15. Vendor Incident Support
  16. Related Events

Please see the body of the report for details (truly, it’s well worth a read), or jump to the summary table in Appendix A for an excellent compact summary.

In our opinion, the 16 recommended areas make sense. They seem to do an excellent job of capturing the right general information associated with cyber incidents, although ultimately the usability of the collected data will depend strongly on the final “checkboxes” offered as possible responses to categorical questions, among other things.
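To make that structure concrete, here is a minimal sketch (ours, not the report’s) of what a single incident record covering the 16 categories might look like in Python. The field types and the example severity values are purely illustrative assumptions; the report leaves the specific response options to be defined.

# Illustrative sketch only: one possible record shape for the report's 16
# data categories. Field names follow the category list above; the types
# and example values are our assumptions, not the report's wording.
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class Severity(Enum):                          # hypothetical "checkbox" values
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class CyberIncidentRecord:
    incident_type: str                          # 1. Type of Incident
    severity: Severity                          # 2. Severity of Incident
    standards_in_use: list[str]                 # 3. Standards and Best Practices in use
    timeline: dict[str, datetime]               # 4. Timeline (e.g., detected, contained)
    apparent_goals: list[str]                   # 5. Apparent Goals
    contributing_causes: list[str]              # 6. Contributing Causes
    security_control_decay: Optional[str]       # 7. Security Control Decay
    assets_affected: list[str]                  # 8. Assets Compromised/Affected
    impact_types: list[str]                     # 9. Type of Impact(s)
    detection_techniques: list[str]             # 10. Incident Detection Techniques
    response_playbook: Optional[str]            # 11. Incident Response Playbook
    internal_skill_sufficiency: Optional[str]   # 12. Internal Skill Sufficiency
    mitigation_measures: list[str]              # 13. Mitigation/Prevention Measures
    costs_usd: Optional[float]                  # 14. Costs
    vendor_incident_support: Optional[str]      # 15. Vendor Incident Support
    related_events: list[str] = field(default_factory=list)  # 16. Related Events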

The notional cyber incident use cases in Appendix B are also realistic and credible, and a fine test of whether the right information is being collected about incidents. That portfolio of scenarios should be augmented with further scenarios in a subsequent report.

Exclusions

Two potential areas of data collection were considered and excluded from the report’s framework: overall organizational cyber security maturity (à la cyber security maturity models), and attack attribution.

It was disappointing to see that the authors of the CIDAWG report considered but rejected a simple summary measure of overall cyber security maturity. As the report conceded, however, enough indicators are available in what will be collected and reported that a rough assessment of organizational maturity can likely be derived or imputed. That concession is at least a partial consolation.

Incident attribution is also excluded from the incident-related areas where data collection is recommended. Unquestionably, attribution is often technically hard, but hard questions are often very interesting.

Moreover, if you think of cyber security incident data collection as analogous to the classic investigative journalism process, “who” is an integral and inextricable part of the “5 W’s.”

We also suspect that many victims will be strongly motivated to identify, or attempt to identify, their proximate attacker if/when they are able to do so.

Conclusion

This report is well worth reading. We urge you to do so.

We further hope that those who do read it consider adopting the framework it outlines for cyber security incident reporting and management.

Notes

* Disjoint or non-comparable categories arise when continuous data is binned inconsistently. For example, one survey might ask if the respondent is under 18, 18 to 24, 25 to 33, 34 to 39, or 40 or over. Another survey might ask if respondents are 21 or under, 22 to 30, 31 to 50, or 51 or over. Those categories simply don’t align.
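A tiny illustration of that binning problem, using hypothetical respondent ages and the two survey schemes described above:

# Hypothetical ages, binned two ways; the resulting category counts
# cannot be mapped onto each other after the fact.
ages = [17, 19, 22, 27, 35, 41, 55]   # made-up respondent ages

survey_a = [(0, 17, "under 18"), (18, 24, "18-24"), (25, 33, "25-33"),
            (34, 39, "34-39"), (40, 999, "40+")]
survey_b = [(0, 21, "21 or under"), (22, 30, "22-30"),
            (31, 50, "31-50"), (51, 999, "51+")]

def bin_label(age, bins):
    """Return the label of the bin a given age falls into."""
    for lo, hi, label in bins:
        if lo <= age <= hi:
            return label

print([bin_label(a, survey_a) for a in ages])
# ['under 18', '18-24', '18-24', '25-33', '34-39', '40+', '40+']
print([bin_label(a, survey_b) for a in ages])
# ['21 or under', '21 or under', '22-30', '22-30', '31-50', '31-50', '51+']
# No mapping between the two sets of counts recovers the underlying ages,
# so results from the two surveys can't be cleanly combined or compared.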

** A simple example of this phenomenon: is “adware” malware, or not? If a malware dropper undergoes minor modifications to make it harder for major antivirus software to detect (but is otherwise unchanged), is that a “new” strain of malware, or just a variant of an existing strain?

*** To see how changing definitions can matter, consider FCC measurements of broadband deployment. At one time, 4 Mbps down/1 Mbps up was fast enough to count as “broadband” Internet. Recently the FCC changed that definition to 25 Mbps down/3 Mbps up. If you were to look at a plot over time of how many Americans have “broadband” access to the Internet, the date of that definitional change will appear to be a time when many Americans suddenly “lost” broadband access, even though the only thing that changed was the FCC’s definition. If you’d like an example of how changes to data collection methodologies can have a profound impact on statistical results, review “The Tragedy of Canada’s Census” (Feb 26, 2015).
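The same effect is easy to reproduce in a few lines of Python over a hypothetical set of household connection speeds (the speeds below are made up purely for illustration):

# Count "broadband" households under the old and new definitions.
speeds_down_mbps = [3, 5, 10, 15, 20, 30, 50, 100]   # hypothetical households

old_count = sum(1 for s in speeds_down_mbps if s >= 4)    # old 4 Mbps threshold
new_count = sum(1 for s in speeds_down_mbps if s >= 25)   # new 25 Mbps threshold

print(old_count)  # 7 households counted as having broadband
print(new_count)  # 3 households counted as having broadband
# Nothing about the households' connections changed; only the definition did,
# yet a time series of "broadband households" would show an abrupt drop.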

Joe St. Sauver, Ph.D. is a Scientist with Farsight Security, Inc.