
Secure Data Aggregation and Intrusion Detection in Wireless Sensor Networks

P. Raghu Vamsi and Krishna Kant
Department of Computer Science and Engineering
Jaypee Institute of Information Technology, Noida, India

Abstract: Data Aggregation (DA) is a data gathering technique in Wireless Sensor Networks (WSNs). It provides advantages such as reporting consolidated data, reducing data redundancy, and improving network lifetime. However, deploying WSNs in hostile and remote environments exposes security vulnerabilities that can lead to attacks such as energy-based attacks and attacks on data aggregation. Numerous secure DA techniques have been proposed in the literature, but lightweight models using a Trust Monitoring System (TMS) and an Intrusion Detection System (IDS) are limited. This paper presents a secure data aggregation framework for WSNs that uses a TMS at the node level and an IDS at the Base Station (BS). Each node assesses the behavior of its neighbors using trust ratings and performs network activities such as cluster head selection, data aggregation, and reporting to the BS. The BS then analyzes the received information using the IDS and reports information about malicious activities back to the nodes in the network. In this way, the proposed model identifies malicious nodes and isolates them from the data aggregation process. Simulation results show the effectiveness of this model.

Index Terms: Data aggregation, energy efficiency, intrusion detection system, malicious activity, security, trust monitoring system.

978-1-4799-6761-2/15/$31.00 ©2015 IEEE

I. INTRODUCTION

Wireless Sensor Networks (WSNs) are highly distributed networks of tiny, low-cost sensing and communication devices called Sensor Nodes (SNs). Their distributed architecture, ad hoc deployment, and wireless communication make them suitable for a wide variety of applications ranging from domestic appliances to battlefields [1]. In recent years, WSNs have gained significant use in applications such as smart grids, smart cities, and machine-to-machine communication. The task of each SN is to sense data and report it to the Base Station (BS) using localized routing decisions. In a large WSN, direct reporting by every node to the BS is not possible because of the nodes' limited transmission range. To overcome this, WSNs adopt an in-network Data Aggregation (DA) process. DA is the process of gathering and summarizing data for statistical analysis, and any DA process includes data gathering and routing. Directed Diffusion (DD) [2] and Sensor Protocols for Information via Negotiation (SPIN) [3] are baseline protocols for the DA process.

The DA process should be energy efficient because of the resource limitations of SNs. To enhance the lifetime of WSNs, hierarchical clustering protocols have evolved. In these protocols, a set of SNs chooses an Aggregation Node (AN) among them and reports the gathered information to it. In turn, the AN aggregates the collected data using an aggregation operation such as SUM, MAX, or MIN, and reports it further to another AN or to the BS. This process increases the network lifetime and discards duplicate data. Low Energy Adaptive Clustering Hierarchy (LEACH), Power-Efficient Gathering in Sensor Information Systems (PEGASIS), Hybrid Energy-Efficient Distributed clustering (HEED), and others [4], [21] are examples of this category.

However, when WSNs are deployed in hostile and remote environments, an adversary can physically capture and tamper with SNs so that the tampered nodes behave maliciously and thwart network operations. Research studies in the field of WSNs have shown that the DA process is seriously affected when malicious nodes are present in the network. Conventional cryptographic methods are effective against external attacks, but they cannot identify internal attacks, because an adversary can launch attacks using the keys obtained from compromised nodes. To support routing security, several security models based on a human behavior pattern called trust have evolved in recent years.

The current work proposes a framework for secure data aggregation using a Trust Monitoring System (TMS) at the node level and an Intrusion Detection System (IDS) at the BS level. Each node in the network assesses its neighbors by observing how they perform network activities such as cluster head selection and data aggregation, and reports to the BS. The BS then analyzes the received information using the IDS and reports the malicious activities back to the nodes in the network. In this way, the proposed model identifies malicious nodes and isolates them from the data aggregation process. Extensive simulations show that the proposed framework is robust in detecting malicious nodes and isolating them from the DA process.
It is also noted that, with the proposed framework, the network lifetime is greatly improved compared to other trust-aware DA methods.

The remainder of the paper is structured as follows. Section II provides background on trust-based data aggregation protocols. Section III explains the proposed data aggregation using TMS and IDS in WSNs. The simulation study showing the efficiency of the proposed model is presented in Section IV. Finally, Section V concludes the paper.


II. BACKGROUND

Various secure data aggregation methods are available in the literature [5], [6]. Many researchers have proposed DA processes that use cryptographic methods. These methods can assure confidentiality, authentication, and integrity. In addition to these features, data privacy also requires attention. Piyi et al. [7] proposed a privacy-preserving data aggregation scheme that detects active and passive compromising attacks, coalition attacks, and others. The method has stages such as initialization of nodes, preparation for key agreement, and re-keying, and it has been devised to achieve constant communication overhead. Miloud et al. [5] described various data aggregation scheduling algorithms in WSNs. Scheduling is an important task in DA; for example, in TDMA scheduling the cluster head allots a fixed time slot to each node in the cluster. Data aggregation can be characterized along three dimensions: when to aggregate, where to aggregate, and how to aggregate. Feng et al. [6] presented sensor data collection issues, challenges, and approaches on the basis of deployment, control message dissemination, and data delivery.

Fei et al. [8] proposed a DA method that uses trust concepts in the LEACH protocol to prolong the network lifetime. Without loss of functionality of the LEACH protocol, the authors associated direct and indirect trust values with vital steps of hierarchical routing, such as cluster head selection and routing, in the presence of malicious nodes. In addition, a cluster-based assisted monitoring scheme was proposed to reduce energy consumption. Hongmei et al. [9] proposed trust-aware in-network data aggregation, devised with the use of well-defined trust and reputation models. The DA process has the following stages: collection of the aggregation input and checking for inconsistencies.
All inconsistent data are dropped and the consistent data are processed for trust evaluation. After the trust evaluation, the trustworthiness information of the nodes is reported so that only reliable and trustworthy data are processed. Bjorn et al. [10] treated trust composition for DA as a multi-criteria decision problem and proposed a framework using a Gaussian probability function with Byzantine decision making. With this method, trust variability with respect to node behavior was achieved. Along with trust-based DA, detection of false data injected by intruders and fault tolerance are also required to receive genuine data from the source. Sandya et al. [11] proposed false data injection elimination in heterogeneous WSNs. The authors utilized the Jacobian Elliptic Chebyshev Rational Map to generate the security key and to provide key agreement. Yan et al. [12] proposed a trust-based fault-tolerant DA method in which the subjective logic trust model is chained in a tree fashion to achieve fault tolerance in the DA process. Mengfan et al. [13] proposed a method for improving the lifetime of WSNs using shortest path data aggregation trees. Sankar et al. [14] proposed a synopsis diffusion method to filter out the attackers' impact on the DA process. The network is organized as a hierarchy. Leaf nodes report the data to the aggregation node, which reports the aggregated data further to another aggregation node or to the BS. In this way, the aggregation nodes filter the values using synopsis diffusion, and the false data injected by the attackers are dropped.

After cryptography and trust management, an Intrusion Detection System (IDS) [17], [18], [20] is used as the second level of defense against security attacks. An IDS takes the observation data and a set of rules to detect outliers in the available data. Maintaining observation data, historical data, and predefined rules, and applying such rules to the data, require high processing power and consume considerable energy. Placing an IDS on each node can therefore shorten the lifetime of the network. Further, SNs are not capable of processing such complicated rules. The BS, in contrast, has an uninterrupted power supply, high processing power and storage, and can control the network. With these features, installing the IDS at the BS level is a suitable alternative. Hence, this paper presents a DA method using TMS at the node level and IDS at the BS level.

III. SECURE DATA AGGREGATION AND INTRUSION DETECTION

A. Network Model

Consider a set of N sensor nodes $SN = \{s_1, s_2, \ldots, s_N\}$ (including malicious nodes) placed in an $M \times M$ m² area, with communication taking place in a hierarchical fashion. Each node has a limited transmission range R. The BS periodically triggers the setup phase in the network. Upon receiving the setup message, the SNs elect an Aggregation Node (AN), also called the cluster head, to which they report the sensed data. The task of the AN is to aggregate the collected data and report it to another AN or to the BS. In this way, a hierarchy is formed to send the data from the SNs to the BS. Each node holds a symmetric key with which the data is encrypted before being sent to the AN. Each node uses the promiscuous mode of its network interface, which allows it to observe the packets passing through its radio range.

B. Trust Based Secure Data Aggregation

The proposed model consists of two stages: the setup phase and the steady state phase. During the setup phase an AN is selected, and during the steady state phase data transmission takes place. These two phases are repeated periodically. The setup phase is vital because a malicious node should not be selected as AN. If the AN is malicious, the data never reach the BS, causing heavy packet loss. To select a trustworthy AN, each node needs to observe the sincerity of its neighboring nodes in carrying out network activities. Based on these observations, each node assesses the direct trust of its neighbors, and the AN is selected based on the trust value. Among the network activities, packet forwarding, packet integrity, and energy are important to check in order to form trust opinions. Each node must also comply with cryptographic primitives such as confidentiality, integrity, and authentication. With these network observations, the direct trust is computed as follows.


TABLE I: Reputation ratings and validation

              (a)     (b)     (c)          (c)
Node          RR      RR      NI    RR     NI    RR
1             88      88       4    88     10    88
2             45      45      10    45      4    45
3             35      35       8    35      4    35
4             50      50      12    50      6    50
5             44      44      11    44      6    44
6             36      36       6    36      6    36
7             87      87       4    87     11    87
8             32      32      10    32      5    32
9             50      50       5    50      5    50
10            46      46       6    46      6    46
11            38      38       9    38      9    38
12            43      43       8    43      8    43
13            82      82       2    82     12    82
14            39      39       6    39      6    39
15            83      83       1    83      1    83
16            85      85       4    85     11    85
17            41      41       6    41      6    41
18            43      43       4    43      4    43
19            47      47       7    47      7    47
20            44      44       9    44      9    44
Final Rating  52.9    42        41.43        45.85

(a) Reputation ratings using [16]
(b) Reputation ratings validation using [15]
(c) Reputation validation with the proposed method (NI: Number of Interactions)

• Packet forwards ($A_1$): Each node observes its neighbors' sincerity in forwarding packets and in providing acknowledgments. Sincerity in packet forwarding reflects coordination and cooperation in forwarding packets, whereas network acknowledgment reflects sincerity in acknowledging packet reception and forwarding.
• Packet integrity ($A_2$): Each node observes the sincerity in packet integrity, that is, sincerity in maintaining data integrity and in node authentication.
• Energy information ($A_3$): Each node periodically beacons its energy information. Wrong energy information can mislead the AN selection and thereby cause packet loss. The energy information is therefore computed as the fraction of energy consumed:

$$E_{re} = \frac{E_0 - E_t}{E_0} \qquad (1)$$

where $E_0$ is the initial energy and $E_t$ is the energy information provided at time $t$.

With these observations a node computes the direct trust of its neighbors. Direct trust computation consists of two components: weight assignment and expectation computation. Each network activity is given a weight that multiplies its expectation, and the sum of the products of the weights and the corresponding expectations gives the direct trust. By analogy with the social sciences, forming trust in a person requires a sufficient number of interactions with that person. The Number of Interactions (NI) therefore plays a major role in predicting node behavior, and in the proposed model the weight of each network activity is assigned based on its NI. Let $A_1$, $A_2$, and $A_3$ be the activities, with numbers of interactions $NI(A_1)$, $NI(A_2)$, and $NI(A_3)$ respectively. The weight of each activity is computed as

$$W(A_i) = \frac{NI(A_i)}{TNI} \qquad (2)$$

where TNI is the total number of interactions over all activities, computed as

$$TNI = \sum_{i=1}^{3} NI(A_i) \qquad (3)$$

With this, each weight lies in [0,1] and the weights of all activities sum to 1. To assess the expectation of the observed activities, the Beta expectation is used. Let $r$ be the number of positive experiences and $s$ the number of negative experiences. The reputation score of a node is modeled using the Beta probability density function [19] with parameters $\alpha = r + 1$ and $\beta = s + 1$. However, the density value itself is small and of little direct use, so the Beta expectation is used instead. The resultant expectation $E(A_i)$ is given by

$$E(A_i) = \frac{r + 1}{r + s + 2} \qquad (4)$$

With the expectations and weights, the direct trust (DT) of a node $i$ is computed as

$$DT_i = \sum_{j=1}^{3} W(A_j) \, E(A_j) \qquad (5)$$
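The direct trust computation of Eqs. (1) to (5) can be illustrated with a minimal Python sketch. The `Activity` class, the function names, the choice of counting NI as the sum of positive and negative observations, and the neutral value returned when no interactions exist are assumptions made for illustration; they are not specified in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Activity:
    """Observation counts for one monitored activity (A1, A2 or A3)."""
    positive: int  # r: observations judged sincere
    negative: int  # s: observations judged insincere

    @property
    def interactions(self) -> int:
        # NI(A_i); assumed here to be the total number of observations.
        return self.positive + self.negative

    def expectation(self) -> float:
        # Eq. (4): Beta expectation (r + 1) / (r + s + 2).
        return (self.positive + 1) / (self.positive + self.negative + 2)


def energy_fraction(e0: float, et: float) -> float:
    """Eq. (1): fraction of the initial energy e0 consumed at time t."""
    return (e0 - et) / e0


def direct_trust(activities: Sequence[Activity]) -> float:
    """Eq. (5): NI-weighted sum of the Beta expectations."""
    tni = sum(a.interactions for a in activities)  # Eq. (3)
    if tni == 0:
        # Assumption: with no interactions, fall back to the neutral
        # Beta expectation obtained from Eq. (4) with r = s = 0.
        return 0.5
    # Eq. (2) weight times Eq. (4) expectation, summed over activities.
    return sum(a.interactions / tni * a.expectation() for a in activities)


if __name__ == "__main__":
    observed = [Activity(8, 2), Activity(5, 0), Activity(3, 2)]
    print(f"direct trust = {direct_trust(observed):.3f}")
```

Because the weights sum to 1 and every expectation lies strictly between 0 and 1, the value returned by `direct_trust` always stays inside [0,1].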

Since each expectation lies in [0,1] and the weights of the network activities sum to 1, the computed direct trust also remains in [0,1]. However, the DT value is a floating-point number, which requires more memory for storage. The DT value is therefore reduced as follows:

$$DT_i^R = \mathrm{ceil}(DT_i \times 10) \qquad (6)$$
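Eq. (6) can be sketched in a few lines of Python. The function names and the idea of packing two reduced values into a single byte are illustrative assumptions; the paper only states that the reduced value fits in 4 bits.

```python
import math


def reduce_trust(dt: float) -> int:
    """Eq. (6): map a direct trust value in [0, 1] to an integer in 0..10."""
    if not 0.0 <= dt <= 1.0:
        raise ValueError("direct trust must lie in [0, 1]")
    return math.ceil(dt * 10)


def pack_pair(first: int, second: int) -> int:
    """Pack two reduced trust values (4 bits each) into one byte.

    Hypothetical helper: shows that a value in 0..10 fits in a nibble.
    """
    if not (0 <= first <= 15 and 0 <= second <= 15):
        raise ValueError("each value must fit in 4 bits")
    return (first << 4) | second


def unpack_pair(byte: int) -> tuple:
    """Inverse of pack_pair: recover the two 4-bit values."""
    return byte >> 4, byte & 0x0F


if __name__ == "__main__":
    reduced = reduce_trust(0.75)  # ceil(7.5)
    print(reduced, unpack_pair(pack_pair(reduced, reduce_trust(1.0))))
```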

With the ceil function, the reduced DT value is an integer between 0 and 10; ceil rounds the computed value up to the next integer. For example, if a node's direct trust value is 0.75, then by Eq. (6) the reduced DT value is 8. To store, transmit, or receive this value, 4 bits are sufficient (since 4 bits support values up to 15). Since the energy consumption of a node is calculated from the number of bits transmitted and received, the reduced DT value can substantially reduce the energy consumption.

C. Reputation ratings collection and validation

Along with DT, Reputation Ratings (RR) are also important, particularly for newly initialized nodes. Research studies have shown that combining DT and RR can improve the selection of trustworthy nodes to forward the data packets. However, a major concern in this combination is how the RR is validated. A malicious node can perform reputation-based attacks such as ballot stuffing and bad mouthing to pollute the RR. In a bad mouth attack, a malicious node broadcasts false reputation ratings to damage the trustworthiness of its neighbors. In a ballot stuffing attack, a malicious node broadcasts a high reputation rating for itself so that it is chosen as a trustworthy node. Validating the reputation ratings is therefore crucial in deciding the final trust.

The values in Table I can be obtained by replacing ceil(DT × 10) with ceil(DT × 100); this scale is used for ease of understanding and discussion. Many solutions have been suggested to validate the reputation ratings sent by nodes. The authors in [16] suggested a simple mean value as the consolidated trust rating. However, the mean value has low accuracy. In [15], the authors proposed a method to validate reputation ratings by extending D-E theory. However, validation of a node's reputation ratings with respect to the NI is missing. Since NI plays a vital role in assessing the behavior of a node, neglecting this factor can lead to low accuracy in the final validation. To overcome this, the proposed model uses NI to validate the RR together with the method proposed in [15].

Assume that 20 nodes have sent RR about a node. Table I lists the RR values sent by the 20 nodes. Using the method of [16], the validated RR is 52.9; with the method of [15], the validated value is 42. In either case, the method of [16] was found to be optimal. When the number of successful interactions with each neighbor is considered (Table I(c)), both the RR and the NI need to be taken into account for validating the final RR. For example, nodes 1, 7, 13, 15, and 16 have each reported an RR above 80 about the node. These values are very high compared to the simple average, but these nodes have very low NI compared to other nodes.
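To make the effect of NI on the ratings concrete, the following Python sketch applies the weight of Eq. (2) directly to the reported ratings of Table I. This plain NI-weighted mean is an assumption made for illustration: it omits the validation step of [15], so it is not expected to reproduce the final ratings of Table I(c); it only shows that shifting interactions toward the high-rating reporters raises the consolidated rating. The function and variable names are hypothetical.

```python
from typing import Sequence

# Ratings reported by the 20 nodes (Table I) and the two NI columns
# of Table I(c).
RATINGS = [88, 45, 35, 50, 44, 36, 87, 32, 50, 46,
           38, 43, 82, 39, 83, 85, 41, 43, 47, 44]
NI_FIRST = [4, 10, 8, 12, 11, 6, 4, 10, 5, 6,
            9, 8, 2, 6, 1, 4, 6, 4, 7, 9]
NI_SECOND = [10, 4, 4, 6, 6, 6, 11, 5, 5, 6,
             9, 8, 12, 6, 1, 11, 6, 4, 7, 9]


def simple_mean(ratings: Sequence[float]) -> float:
    """Consolidated rating as a plain average, as suggested in [16]."""
    return sum(ratings) / len(ratings)


def ni_weighted_mean(ratings: Sequence[float], ni: Sequence[int]) -> float:
    """Weight each reporter's rating by NI / TNI (Eq. (2)) and sum.

    Assumption: a direct weighted mean, without the validation of [15].
    """
    if len(ratings) != len(ni):
        raise ValueError("one NI value is required per rating")
    tni = sum(ni)
    if tni == 0:
        raise ValueError("at least one interaction is required")
    return sum(r * n / tni for r, n in zip(ratings, ni))


if __name__ == "__main__":
    print(f"simple mean      : {simple_mean(RATINGS):.1f}")  # 52.9
    print(f"NI-weighted (1st): {ni_weighted_mean(RATINGS, NI_FIRST):.2f}")
    print(f"NI-weighted (2nd): {ni_weighted_mean(RATINGS, NI_SECOND):.2f}")
```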
Applying the method of [16], multiplying each node's rating by its weight from Eq. (2), and summing over all RR values gives a final rating of 41.2. This value is lower than both 42 (method [15]) and the simple average of 52.9 (method [16]). Conversely, if nodes 1, 7, 13, 15, and 16 have 10, 11, 12, 1, and 11 interactions respectively, their opinions carry more weight, and the RR obtained with the proposed method is 45.85. This value is well below the simple average (method [16]) and slightly greater than 42 (method [15]). It is therefore clear that NI significantly impacts the final RR values. Since the final RR is the validated value, the final trust of a node is computed as the average of the DT and RR values. The final trust value is used in performing network activities such as cluster head election and data aggregation in future rounds.

D. Intrusion Detection System

During the DA process, the BS can receive the consolidated data at regular intervals or on request. During each DA interval, nodes report malicious information about their neighboring nodes to the aggregation node. Finally, the BS receives the consolidated data along with the malicious report of each node in the network. This information is small compared to the data reported by the nodes. It is assumed that the underlying reports follow a normal distribution and that the variance of the malicious activities is unknown. This problem can be modeled using Student's t-distribution with $v$ degrees of freedom as follows:

$$f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}} \qquad (7)$$

where $f(t)$ is the t-distribution density function and $\Gamma$ is the Gamma function. If $T = Z / \sqrt{Y/v}$, where $Y$ has a chi-square distribution with $v$ degrees of freedom and $Z$ has the standard normal distribution and is independent of $Y$, then $T$ has the t-distribution $f(t)$. If, during the aggregation collection time, the BS receives $n$ samples from a normal distribution with mean $\mu$ and unknown variance $\sigma^2$, the test statistic can be defined as

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \qquad (8)$$

where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
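Eqs. (7) and (8) can be sketched with the Python standard library as follows. What the samples represent (for example, per-round counts of malicious reports about one node), the one-sided decision rule, and the critical value passed by the caller are assumptions made for illustration; the paper leaves them to the configured false positive and false negative rates.

```python
import math
import statistics
from typing import Sequence


def t_pdf(t: float, v: float) -> float:
    """Eq. (7): density of Student's t-distribution with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + t * t / v) ** (-(v + 1) / 2)


def t_statistic(samples: Sequence[float], mu: float) -> float:
    """Eq. (8): (sample mean - mu) / (s / sqrt(n)), with s the sample std dev."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    x_bar = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        raise ValueError("samples have zero variance")
    return (x_bar - mu) / (s / math.sqrt(n))


def is_flagged(samples: Sequence[float], mu: float, critical_value: float) -> bool:
    """Accept H1 (node is malicious) when t exceeds the critical value.

    Assumption: a one-sided test; critical_value would be derived by the
    operator from the chosen error rates and n - 1 degrees of freedom.
    """
    return t_statistic(samples, mu) > critical_value


if __name__ == "__main__":
    reports = [2.0, 4.0, 6.0]  # hypothetical per-round report counts
    print(t_statistic(reports, mu=2.0), is_flagged(reports, 2.0, 1.0))
```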

With user-configured false positive and false negative rates, the BS tests the test statistic against two competing hypotheses:

H0: Node is benign
H1: Node is malicious

The BS then collects the information of all nodes for which H1 is accepted and broadcasts the details of the malicious nodes back to the network in the next round of the setup phase. In this way, all normal nodes are vigilant about the malicious nodes while performing network operations.

E. Application of trust and IDS

During the setup phase, each node in the network chooses an AN using the threshold function of the LEACH protocol. However, before sending the CH join message, each node verifies that the CH advertisement message was received from a benign node. Each node chooses a valid node and then sends a join message to the corresponding cluster head. If no valid CH is found in the list, the SN broadcasts its data directly to the BS.

IV. SIMULATION STUDY

The proposed model is simulated using MATLAB. A network area of 100 x 100 square meters is considered, in which 100 nodes are randomly deployed. Each node has a transmission range of 250 meters, and the BS is assumed to be positioned at the center of the simulation area. The first order radio model used in [3] has been utilized for computing energy consumption. It is assumed that 10% of the nodes (10 nodes) in the network are malicious and attempt to disrupt the DA process. The proposed method was validated against LEACH, T-LEACH, and the method of Yanbing et al. [15].

Fig. 1 plots the number of alive nodes vs. the number of rounds in the presence of 10% malicious nodes in the network.

Fig. 1: Number of alive nodes vs. rounds

It is noted from the graph that, since LEACH does not have a trust mechanism to assess the behavior of the nodes, its performance is greatly affected by the malicious nodes. All nodes exhaust their energy by 1500 rounds: the First Node Death (FND) occurs at round 1200 and the Last Node Death (LND) is observed at round 1500. When T-LEACH is implemented, due to the trust mechanism in handling network operations, the FND and LND are raised to 1600 and 2500 rounds respectively. T-LEACH works with the direct trust of a node and does not consider secondary trust opinions. When DT and validated RR are combined to compute the final trust value, each node in the network gains the ability to accurately detect malicious nodes and isolate them from the DA process. With the method of Yanbing et al. [15], the FND and LND were raised to 2100 and 3500 rounds respectively. The proposed method has the merit of considering NI values and the caution messages obtained from the BS. In addition, $DT^R$ occupies only 4 bits. With the validated RR and low communication overhead, the FND and LND were raised to 2500 and 3500 rounds respectively. With the proposed method, the LND is similar to that of [15], but the FND has improved by 400 rounds. To conclude, with the advantage of RR values validated with NI, caution messages from the BS, and validated final trust values, the proposed model performs better, and its FND is later than that of the other trust models.


V. CONCLUSION

This paper presents a secure data aggregation method using TMS and IDS. Each node evaluates the trust of its neighboring nodes using the trust management system. Due to the complexity of an intrusion detection system, it has been placed at the BS. With the computed trust values and the caution messages sent by the base station, each node in the network gains the ability to dynamically detect malicious nodes and isolate them from the data aggregation process. The simulation results have shown that the proposed method outperforms the existing models. Applying the proposed model on a real test bed is left as future work.


REFERENCES

[1] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Intelligent Control, 2005. Proceedings of the 2005 IEEE International Symposium on, Mediterranean Conference on Control and Automation, pp. 719–724, IEEE, 2005.
[2] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed diffusion: a scalable and robust communication paradigm for sensor networks,” in Proceedings of the 6th annual international conference on Mobile computing and networking, pp. 56–67, ACM, 2000.
[3] W. R. Heinzelman, J. Kulik, and H. Balakrishnan, “Adaptive protocols for information dissemination in wireless sensor networks,” in Proceedings of the 5th annual ACM/IEEE international conference on Mobile computing and networking, pp. 174–185, ACM, 1999.
[4] J. Hur, Y. Lee, S. Hong, and H. Yoon, “Trust-based secure aggregation in wireless sensor networks,” in Proceedings of the 3rd International Conference on Computing, Communications and Control Technologies, vol. 3, pp. 1–6, 2005.
[5] M. Bagaa, Y. Challal, A. Ksentini, A. Derhab, and N. Badache, “Data aggregation scheduling algorithms in wireless sensor networks: Solutions and challenges,”
[6] F. Wang and J. Liu, “Networked wireless sensor data collection: Issues, challenges, and approaches,” Communications Surveys & Tutorials, IEEE, vol. 13, no. 4, pp. 673–687, 2011.
[7] P. Yang, Z. Cao, X. Dong, and T. A. Zia, “An efficient privacy preserving data aggregation scheme with constant communication overheads for wireless sensor networks,” Communications Letters, IEEE, vol. 15, no. 11, pp. 1205–1207, 2011.
[8] F. Song and B. Zhao, “Trust-based leach protocol for wireless sensor networks,” in Future Generation Communication and Networking, 2008. FGCN’08. Second International Conference on, vol. 1, pp. 202–207, IEEE, 2008.
[9] H. Deng, G. Jin, K. Sun, R. Xu, M. Lyell, and J. A. Luke, “Trust-aware in-network aggregation for wireless sensor networks,” in Global Telecommunications Conference, 2009. GLOBECOM 2009. IEEE, pp. 1–8, IEEE, 2009.
[10] B. Stelte and A. Matheus, “Secure trust reputation with multi-criteria decision making for wireless sensor networks data aggregation,” in Sensors, 2011 IEEE, pp. 920–923, IEEE, 2011.
[11] M. Sandhya, K. Murugan, and P. Devaraj, “False data elimination in heterogeneous wireless sensor networks using location-based selection of aggregator nodes,” IETE Journal of Research, vol. 60, no. 2, pp. 145–155, 2014.
[12] Y. Sun, H. Luo, and S. K. Das, “A trust-based framework for fault-tolerant data aggregation in wireless multimedia sensor networks,” Dependable and Secure Computing, IEEE Transactions on, vol. 9, no. 6, pp. 785–797, 2012.
[13] M. Shan, G. Chen, D. Luo, X. Zhu, and X. Wu, “Building maximum lifetime shortest path data aggregation trees in wireless sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 11, no. 1, p. 11, 2014.
[14] S. Roy, M. Conti, S. Setia, and S. Jajodia, “Secure data aggregation in wireless sensor networks: filtering out the attackers impact,” 2014.
[15] Y. Liu, X. Gong, and C. Xing, “A novel trust-based secure data aggregation for internet of things,” in Computer Science & Education (ICCSE), 2014 9th International Conference on, pp. 435–439, IEEE, 2014.
[16] H. Alzaid, E. Foo, and J. G. Nieto, “Rsda: reputation-based secure data aggregation in wireless sensor networks,” in Parallel and Distributed Computing, Applications and Technologies, 2008. PDCAT 2008. Ninth International Conference on, pp. 419–424, IEEE, 2008.
[17] F. Bao, R. Chen, M. Chang, and J.-H. Cho, “Trust-based intrusion detection in wireless sensor networks,” in Communications (ICC), 2011 IEEE International Conference on, pp. 1–6, IEEE, 2011.
[18] F. Bao, R. Chen, M. Chang, and J.-H. Cho, “Hierarchical trust management for wireless sensor networks and its applications to trust-based routing and intrusion detection,” Network and Service Management, IEEE Transactions on, vol. 9, no. 2, pp. 169–183, 2012.
[19] A. Jøsang and R. Ismail, “The beta reputation system,” in Proceedings of the 15th bled electronic commerce conference, pp. 41–55, 2002.
[20] K. K. Raghu Vamsi, “Systematic design of trust management systems for wireless sensor networks: A review,” in Advanced Computing & Communication Technologies (ACCT), 2014 Fourth International Conference on, pp. 208–215, IEEE, 2014.
[21] P. K. Batra and K. Kant, “Stable cluster head selection in leach protocol: a cross-layer approach,” in Proceedings of the 7th ACM India Computing Conference, p. 15, ACM, 2014.