Quantitative Assessment of Cyber Security Risk using Bayesian Network-based Model

Sheung Yin Kevin Mo, Peter A. Beling, Member, IEEE, and Kenneth G. Crowther, Member, IEEE

Abstract— This paper proposes a quantitative model for assessing cyber security risk. The model can be used to evaluate the security readiness of firms in the marketplace through qualitative and quantitative tools. We propose a Bayesian network methodology that generates a cyber security risk score from two inputs: a firm’s security profile and data breach statistics. The quantitative model enables cyber risk to be captured in a precise and comparable fashion. The objective of the scoring model is to create a common reference in the marketplace that could strengthen firms’ incentives to invest in and improve their security systems. This paper concludes with a demonstration of scoring an intrusion detection network.

I. INTRODUCTION

When the Internet first became public in the 1990s, it would have been difficult to imagine it burgeoning into the busy hub of commercial activity that it is today. Most corporate institutions have adopted strategies to move many of their business transactions to the cyber domain, providing services such as online banking and shopping. However, the increasing reliance of corporate business on the Internet comes with a hefty price. The transmission of personal information across digital networks during these transactions, together with widespread Internet accessibility, has allowed a new breed of crime to surface. Network infrastructures and systems have become a hotspot for malicious cyber attacks of all sorts, such as viruses, worms, trojans, phishing and spam. A Chronology of Data Breaches reported 98 data breach cases since January 1, 2009, with victims ranging from the banking industry to educational institutions [1]. The lesson here is that no company, institution or individual is immune to cyber attacks. The increasing prevalence of cyber attacks calls for constant improvements in cyber security systems to protect the confidentiality, integrity and availability of information [2]. Yet there is currently no effective means of improving information security in the marketplace. Many CEOs and managing directors are hesitant to take a proactive role in adopting these security systems to protect their corporations’ digital resources because of the knowledge gap between the CEOs and the IT department. Senior management underestimates the level of risk and liability should a cyber attack arise. Moreover, businesses no longer operate as discrete entities, and many responsibilities have been outsourced to third parties, usually outside the US. Calls to invest in a more robust security system are often lost in the clamor of near-term profit maximization. Because businesses operate globally, cyber security systems are difficult to regulate. Existing benchmark standard sets, such as ISO 27002, were published to ensure the security of corporate information systems, but they are often loosely enforced because of the ever-changing nature of the threats. More than 230 million records were reported lost in a 4-year period, during which 59% of breaches could have been prevented through standard compliance [3]. Haimes et al. propose a sustainable risk management process as a mechanism for bridging the divides among the various disciplines in cyber security [4]. One of the proposed alternatives is to establish a common reference for a model information security system that will create incentives in the marketplace [4]. The goal of this paper is to demonstrate a feasible scoring approach that effectively evaluates the readiness of a company’s information systems. We hypothesize that such a scoring system could be a driver for corporations to improve their security systems in the marketplace.

Manuscript received April 6, 2009. This material is based upon work supported by the Department of Homeland Security through the Institute for Information Infrastructure Protection (I3P). Points of view in this document are those of the authors and do not necessarily represent the official position of the US Department of Homeland Security. Sheung Yin Kevin Mo is an undergraduate student of Systems Engineering at the University of Virginia. Peter A. Beling is an Associate Professor of Systems and Information Engineering at the University of Virginia. Kenneth G. Crowther is a Research Assistant Professor of Systems and Information Engineering at the University of Virginia.

II. BACKGROUND

A. The Information Asymmetry Problem

Corporations’ indifference to improving their security systems is mainly due to the prevalence of the information asymmetry problem. Information asymmetry arises in a market transaction when buyers do not have the same detailed knowledge of product quality as sellers [5]. This problem creates a distorted pricing mechanism that fails to differentiate the value of high-quality products.
As a result, the asymmetry problem discourages investment in producing high-quality products. George Akerlof illustrates how uncertainty about product quality interacts with the market mechanism, and how reducing that uncertainty leads to greater market transparency and investment [5]. When the asymmetry is reduced, buyers who have adequate knowledge of a specific product are more willing to invest in it because they have greater confidence in their decisions [5]. Sellers, in turn, recognize the need to produce more high-quality products. This scenario provides solid justification for a mechanism that is capable of reducing information asymmetry and, therefore, provides motivation for upgrading the quality of cyber security systems available in the marketplace.

B. Bayesian Network and its Scoring Applications

A Bayesian network is a probabilistic graphical model that represents a set of variables and their interdependencies [6]. Nodes represent variables and arcs represent conditional dependencies between variables. The main concept of Bayesian network modeling is embedded in the formal definition of Bayes’ rule, which states that the posterior probability of an event can be computed from its prior probability and the conditional probability of the observed evidence. One advantage of Bayesian networks is that they provide a natural framework for incorporating data as it becomes available. In Equation (1), the posterior probability P(A|B) is adjusted when the prior value P(A) or the conditional probability P(B|A) is updated:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$

Another basis for Bayesian networks is the total probability theorem, which combines a set of conditional probabilities into an expected probability at a higher level of a hierarchical network. Equation (2) shows that the expected posterior probability P(A) is composed of a set of prior probabilities P(A|B_1), …, P(A|B_n) weighted by their associated P(B_i) values, where the events B_1, …, B_n are mutually exclusive and exhaustive:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i) \tag{2}$$

There are several scoring systems based on Bayesian networks that have been successfully implemented. The domain of consumer credit, in which credit scoring models determine a consumer’s eligibility for products such as credit cards and loans, provides an example of a scoring model that works by incorporating data as it becomes available. In addition, Morris successfully applied a Bayesian network as a scoring model to assess Commercial Off-the-Shelf (COTS) software [7]. The network in [7] is decomposed into multiple attribute levels that capture the assessment of various critical components. These examples of successful scoring systems establish a basis for applying Bayesian networks to the evaluation of cyber security risk, which is arguably a larger-scale application.

III. METHODOLOGY

A. Overview of the Cyber Scoring Model

We created a risk-based model that yields a score providing an accurate assessment of risk as well as insight into how various security investments affect risk profiles. The model can be updated as additional information becomes available from external sources and experts. The scoring framework is based on a hierarchical Bayesian network model (Figure 1) in which the influence of specific vulnerabilities (represented by the nodes) can be mapped to the overall risk score based on threat information (represented by the arcs) derived from the breach statistics.

Fig. 1. Vulnerability, threat and risk relationships.
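Equations (1) and (2) can be made concrete with a minimal Python sketch. The probabilities below are hypothetical and are not drawn from the paper; the sketch only shows that revising the prior P(A) changes the posterior P(A|B), which is the updating property that lets the model absorb new breach data. The function names are illustrative.

```python
def total_probability(cond_probs, weights):
    """Equation (2): P(A) = sum_i P(A | B_i) * P(B_i).

    cond_probs[i] is P(A | B_i) and weights[i] is P(B_i). The B_i must be
    mutually exclusive and exhaustive, so the weights have to sum to 1.
    """
    if len(cond_probs) != len(weights):
        raise ValueError("need one P(B_i) per conditional probability")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("P(B_i) values must sum to 1")
    return sum(p * w for p, w in zip(cond_probs, weights))


def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Equation (1): P(A | B) = P(B | A) * P(A) / P(B).

    The evidence term P(B) is expanded with Equation (2), with the roles of
    A and B swapped, over the partition {A, not A}.
    """
    p_b = total_probability([p_b_given_a, p_b_given_not_a],
                            [prior_a, 1.0 - prior_a])
    if p_b == 0:
        raise ValueError("P(B) is zero; the posterior is undefined")
    return p_b_given_a * prior_a / p_b


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the update.
    print(round(bayes_posterior(0.3, 0.8, 0.2), 3))  # 0.632
    # Revising the prior P(A) from 0.3 to 0.5 shifts the posterior.
    print(round(bayes_posterior(0.5, 0.8, 0.2), 3))  # 0.8
```

The same `total_probability` helper is the operation that the scoring mechanism later applies when it rolls parent scores up into a child node.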

By definition, the Bayesian model uses information on the given state of the system and its causal relationships, expressed through a priori probabilities. In Figure 1, the given state is the most fundamental level of the network and refers to the vulnerabilities present in the security system. The overall goal of the model is to determine the final risk score at the highest level of the network. The risk scores, which describe the readiness of a firm’s protection system, range from 0.00 to 1.00. For example, a firm might receive a risk score of 0.80, which indicates its level of compliance with a prioritized set of benchmarks; such benchmarks could be delivered as XML and embedded into a resource management system. In this example, the score of 0.80 can be interpreted as the company meeting 80% of the highest-priority benchmarks. The methods for estimating this risk score are discussed in detail in Section IV. The benchmarks could be standardized through the adoption of standards such as ISO 27002, which would guide the data acquisition processes for scoring. We expect that such a scoring system would be monitored by an external organization that is part of a consortium of companies that agree to report breaches openly. This practice is essential for maintaining the scoring model in the Bayesian network.

B. Model Construction

The scoring model for cyber risk is constructed in a three-step process. First, the attributes of a good-quality security system and their interrelationships are identified and converted into a network structure. The identification process can be facilitated by hierarchical holographic modeling (HHM), which captures the characteristics of a large-scale security system in a holistic manner [8]. Figure 2 shows how resource-driven security can be broken into various attributes and their hierarchical relations.

Fig. 2. Resource-driven security attributes and relationships.

Second, a standardized questionnaire profile is created to evaluate these attributes as probabilistic values. The questionnaires are structured according to reference benchmark sets such as the ISO 27002 standard. The set of attribute values reflects the firm’s level of compliance with the standard. Lastly, the strengths of the relationships between attributes are represented as prior values, which are determined from existing data breach statistics. These prior values are used in the network as conditional probabilities to compute, via the total probability theorem, the expected posterior probabilities, i.e., the risk scores of attributes at higher levels in the network.

C. Model Implementation

After the qualitative and quantitative setup of the network, the structured questionnaire will be given to firms that wish to participate anonymously. Their responses are first entered into the model as attribute scores. These scores are then used to determine the scores of nodes at higher levels in the network using the total probability theorem. Given the attribute scores and the strengths of the attribute relationships from data breach statistics, a final risk score can be generated to indicate the firm’s cyber risk readiness. Specific areas of deficiency in the firm’s security system can also be identified and addressed.

IV. SCORING MODEL ILLUSTRATION

A. Intrusion Detection Network Overview

The following section illustrates the application of the Bayesian methodology to an intrusion detection network, which is one component of cyber security risk. This section aims to demonstrate how the proposed Bayesian network scoring mechanism is constructed on a benchmark standards set (e.g., ISO/IEC 27002) and prioritized by a collective breach report. The overall objective is to score the system’s readiness against attack pathways in the network through qualitative and quantitative assessment.

B. Qualitative Assessment

Fig. 3. Overview of an intrusion detection network.

Figure 3 is an example of a network model for scoring intrusion detection based on the ISO 27002 standard and a data breach analysis. The intrusion detection network is constructed by identifying the different attack pathways that data thieves use “as interface to gain access to corporate systems and conduct nefarious activities” [3]. These attack pathways are separated into four main groups: Remote and Access Control (AC), Physical Access (PA), Wireless Network (WN), and Web Application (WA). Each group is expressed as a unique node in the scoring network and has a causal relationship with the overall score for Attack Pathway prevention. These relationships are represented as arcs pointing from the lower-level parent nodes to the higher-level child node, and they can be quantified using conditional probability distributions derived from some measure of threat frequency or risk. For instance, effective Remote and Access Control leads to a higher likelihood of attack pathway prevention. The basic features of the network are derived from a benchmark standard. After these features are established, the next step is to design specific questions for each node based on their relevance to the standard benchmarks. The questions for each node assess the extent to which the company has fulfilled the standard requirements related to that node. The data breach analysis incorporates a benchmarking schema mapped from the standard to specific attack pathways. For example, the Access Control section of ISO 27002 has been adopted as the source of reference questions for Remote and Access Control, as shown in Figure 4.

Fig. 4. Remote Access Control questions mapped from ISO 27002 standards.

A company answering True to a particular question is deemed to have fulfilled one of the standard’s requirements, while answering False corresponds to incomplete fulfillment. Other responses, such as Unsure, may be added for completeness. It is expected that much of this answering process could be semi-automated in resource management systems.

Fig. 5. True/False/Unsure as representations of standard requirement fulfillment.
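The True/False/Unsure scheme feeds the proportional sub-score described in Section IV.C, where two fulfilled requirements out of five questions give Remote and Access Control a sub-score of 0.40. The Python sketch below is illustrative only: the answer list is hypothetical rather than taken from Figure 7, and the `unsure_credit` parameter is an assumption, since the paper does not say how Unsure answers are scored.

```python
from enum import Enum


class Answer(Enum):
    """Possible questionnaire responses (cf. Figure 5)."""
    TRUE = "true"      # requirement fulfilled
    FALSE = "false"    # requirement not (fully) fulfilled
    UNSURE = "unsure"  # response added for completeness


def sub_score(answers, unsure_credit=0.0):
    """Proportional sub-score: credited answers / total questions.

    Every question is weighted equally. `unsure_credit` is a hypothetical
    knob (not from the paper) for how much an UNSURE answer counts; the
    default of 0.0 treats it like FALSE.
    """
    if not answers:
        raise ValueError("a node needs at least one question")
    if not 0.0 <= unsure_credit <= 1.0:
        raise ValueError("unsure_credit must lie in [0, 1]")
    credit = {Answer.TRUE: 1.0, Answer.FALSE: 0.0, Answer.UNSURE: unsure_credit}
    return sum(credit[a] for a in answers) / len(answers)


# Hypothetical Remote and Access Control answers: two of five fulfilled.
ac_answers = [Answer.TRUE, Answer.TRUE, Answer.FALSE, Answer.UNSURE, Answer.FALSE]
print(sub_score(ac_answers))  # 0.4
```

Keeping Unsure as a separate value, rather than folding it into False at data-entry time, leaves room to revise its weight later without re-collecting questionnaires.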

C. Quantitative Assessment

The network requires data to determine the prior values that enable the standardized questionnaire to be mapped to a risk score. It is important to incorporate data from reliable statistics and expert input in the quantitative assessment. One of the advantages that the Bayesian network approach offers for the generation of a scoring system is the ability to incorporate additional data as it becomes available. Data can be updated by modifying the prior values in the conditional probability table (summarized below). To illustrate how statistics are translated into the required prior values in the table, we refer to the example in Figure 6.

Fig. 6. Common attack pathways statistics (Verizon Data Breach Report 2008).

The graph in Figure 6 summarizes the four attack pathways along with the percentage of cases in which each was exploited across the approximately 500 cases in the Verizon Data Breach Report [3]. The percentages of breaches are normalized to provide a consistent scale for generating prior values, and the prior values are entered in the table according to these relative normalized percentages. The probability of attack pathway prevention given that all the components are true (i.e., all benchmarks are fulfilled) is 1.00. The prior values are expressed in the conditional probability table shown in Figure 7. The same data extraction process applies to the other categories, which will be defined according to resource-driven security.

Fig. 7. Prior values in conditional probability table and sub-score from questionnaire.

The sub-scores for each component are determined through a proportional measure. In Figure 7, the score of Remote and Access Control is 0.40 because two standard requirements are fulfilled out of a total of five questions. Although this measure weights each question equally, it provides a feasible and organized approach for the network to obtain a proportional score from questionnaires. An alternative approach is to employ the Analytic Hierarchy Process, which assigns priority to the more significant components in a complex system and therefore allows all components to be compared in a rational and consistent way.

D. The Scoring Mechanism

After the prior values derived from statistics and the sub-scores determined from questionnaires are generated, the network is complete in both its qualitative and quantitative assessments. The scoring mechanism proceeds with a series of calculations to determine the score of each higher child node, and similarly the resource-driven security score. The Attack Pathway score is illustrated as an example in Figure 10.
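The paper states that breach percentages are normalized and that the all-True row of the conditional probability table equals 1.00, but it does not spell out how the remaining rows are filled. One plausible reading, used here purely as an assumption, is an additive table in which each fulfilled component contributes its normalized share. The Python sketch below builds such a table; the percentages are hypothetical placeholders, not the Verizon figures, and `normalize` and `build_cpt` are illustrative names.

```python
from itertools import product


def normalize(percentages):
    """Rescale breach percentages so they sum to 1 (relative shares)."""
    total = sum(percentages.values())
    if total <= 0:
        raise ValueError("percentages must have a positive total")
    return {name: pct / total for name, pct in percentages.items()}


def build_cpt(weights):
    """Return {parent states: P(attack pathway prevention | states)}.

    States are tuples of booleans in the key order of `weights`
    (True = benchmark fulfilled). ASSUMPTION: an additive table in which
    each fulfilled component contributes its normalized weight, so the
    all-True row is 1.00 and the all-False row is 0.00.
    """
    return {
        states: sum(w for w, fulfilled in zip(weights.values(), states) if fulfilled)
        for states in product((True, False), repeat=len(weights))
    }


# Hypothetical breach percentages per pathway (NOT the Verizon figures).
# They need not sum to 100, which is why they are normalized first.
breach_pct = {"AC": 40, "PA": 10, "WN": 10, "WA": 20}
weights = normalize(breach_pct)  # AC 0.5, PA 0.125, WN 0.125, WA 0.25
cpt = build_cpt(weights)
print(len(cpt))                         # 16 rows for four binary parents
print(cpt[(True, True, True, True)])    # 1.0
print(cpt[(True, False, False, True)])  # 0.75
```

Updating the model with a newer breach report would then amount to re-running `normalize` on the new percentages and rebuilding the table, without touching the questionnaire side.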

Fig. 8. Intrusion detection network to determine attack pathway prevention sub-score.

The scoring mechanism applies the total probability theorem, which states that the expected probability of an event can be obtained by weighting its conditional probabilities by the probabilities of the conditioning events. The prior values and the sub-scores generated from questionnaires are applied in this equation to determine the score of a higher child node, i.e., the Attack Pathway score. The prior values and the sub-scores of the nodes {AC, PA, WN, WA} are entered into the equation shown in Figure 9.

Fig. 9. Total probability theorem and attack pathway score application.
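The roll-up in Figure 9 can be sketched in Python as a sum over every True/False configuration of the parent nodes. This is a minimal sketch under two assumptions not stated in the paper: each sub-score is read as the probability that its node is True, and the parent nodes are treated as independent. Apart from the 0.40 Remote and Access Control sub-score, the weights, table and sub-scores are hypothetical.

```python
from itertools import product
from math import prod


def child_score(sub_scores, cpt):
    """Total probability theorem (Equation (2)) for one child node.

    sub_scores: {parent: P(parent is True)}, e.g. questionnaire sub-scores.
    cpt: function mapping a tuple of parent states (in sub_scores key order)
         to P(child is True | those states), i.e. the prior values.
    ASSUMPTION: parents are independent, so the probability of a
    configuration is the product of the individual state probabilities.
    """
    probs = list(sub_scores.values())
    total = 0.0
    for states in product((True, False), repeat=len(probs)):
        p_config = prod(p if s else 1.0 - p for p, s in zip(probs, states))
        total += cpt(states) * p_config
    return total


# Hypothetical normalized weights and an additive table (an assumption,
# not the paper's Figure 7 values).
WEIGHTS = {"AC": 0.5, "PA": 0.125, "WN": 0.125, "WA": 0.25}


def additive_cpt(states):
    """P(prevention | states): sum of weights of the fulfilled components."""
    return sum(w for w, fulfilled in zip(WEIGHTS.values(), states) if fulfilled)


# Keys must be in the same order as WEIGHTS; only AC = 0.40 is from the text.
sub_scores = {"AC": 0.40, "PA": 0.80, "WN": 0.60, "WA": 0.50}
attack_pathway = child_score(sub_scores, additive_cpt)
print(round(attack_pathway, 3))  # 0.5
```

With an additive table the result reduces to the weighted sum of sub-scores (0.5·0.40 + 0.125·0.80 + 0.125·0.60 + 0.25·0.50 = 0.50), which is a convenient sanity check; a non-additive table would simply be passed in as a different `cpt` function. The same `child_score` call could be repeated one level up, with the Attack Pathway score and the other category scores as parents, to produce the resource-driven base score.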

Fig. 10. Attack pathway score.

Using the same scoring mechanism, the resource-driven security base score can be determined from the prior values and the scores of its parent nodes, i.e., the attack pathways and other categories. This node forwarding mechanism enables the resource-driven base score to capture the current state of the system in a quantitative fashion.

V. FUTURE RESEARCH

Our hypothesis is that successful implementation of a cyber risk-scoring model, along the lines described above, would relieve the information asymmetry phenomenon and result in a more transparent market structure. A more transparent market structure would, in turn, foster an environment in which companies are encouraged to better equip their security systems. Companies seeking better scores could then focus on tackling specific weaknesses in their security systems. An increased level of market competition would drive greater investment in cyber security and the advancement of security technology in the marketplace. More indirectly, this scoring model could translate into an overall improvement in the resilience of our infrastructure against cyber threats.

Future work will be aimed at implementing the cyber risk-scoring model in the marketplace. The objective is to test how the risk-scoring model would behave when adopted by participants in the market. One way to test the risk-scoring model is to implement it gradually in individual market sectors where data is easily accessible and abundant (such as the financial industry). However, firms may not be willing to participate, as participation requires substantial resources in terms of time and money. More importantly, a large sample of participants is required to produce statistically significant results that justify the use of this scoring model. An alternative is to conduct a simulation to predict how the model would function under different scenarios, designed to include data from real decision makers in a wide spectrum of sectors. Testing the cyber risk-scoring model in this experimental setting would provide more information on the efficacy of the scores, albeit as predictions. Hence, interested firms would gain more confidence in adopting this model. By analyzing feedback from participants and their scoring outcomes, the risk-scoring model can be refined and, in the near future, be widely used in the marketplace.

REFERENCES

[1] Privacy Rights Clearinghouse. (2009). A Chronology of Data Breaches. Available: http://www.privacyrights.org/
[2] International Organization for Standardization. ISO/IEC 27002:2005 Information technology -- Security techniques -- Code of Practice for Information Security Management. Available: http://www.iso27001security.com/html/27002.html/
[3] P. Tippett. (2008). 2008 Verizon Business Data Breach Investigations Report. The Verizon Business RISK Team, Verizon Business.
[4] Y. Y. Haimes, B. M. Horowitz, J. H. Lambert, J. R. Santos, and K. G. Crowther. (2008). Harmonizing and Uniting the key technical discipline for risk management of cyber security. New Hampshire: Institute for Information Infrastructure Protection, Dartmouth College.
[5] G. A. Akerlof. (1970). The market for lemons: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3): 488-500.
[6] F. V. Jensen. (1996). An Introduction to Bayesian Networks. New York: Springer.
[7] A. T. Morris. (2004). A Bayesian network-based scoring methodology for COTS software. Doctoral dissertation, University of Virginia. Dissertation Abstracts International, 65(4), 2072B.
[8] Y. Y. Haimes. (1981). Hierarchical holographic modeling. IEEE Transactions on Systems, Man, and Cybernetics, 11, 606-617.