Security Aspects of P2P Networks

Ilja Livenson, ilja [email protected]
October 24, 2006

Abstract. The article describes some common security-related aspects of P2P networks.

1 Introduction

Before discussing the security side of peer-to-peer networks, it makes sense to first consider what a P2P network actually is. Perhaps its most characteristic feature is the ability of nodes to act both as servers and as clients ("servents" in Gnutella terminology). A P2P system involves a number of aspects, e.g. resource discovery, routing, identity management, etc. Each such aspect brings security problems that may differ from those in "traditional" client-server applications. For instance, distributed identity management is clearly a much harder problem than maintaining a centrally managed list of identities.

1.1 Motivation for attacks

There is a lot of controversy around P2P networks, mostly concerning the legitimacy of their usage. A major opponent of P2P networks is the entertainment industry, which claims that P2P networks are the main source of pirated video, audio and software and should therefore be brought down. It is interesting to note that this is not the first time the entertainment industry has directed its main anti-piracy effort at attacking the distribution network [4]. In general, the idea is to make piracy as expensive as possible. The price is clearly a function of the costs of extracting the content and of distributing copies. If a one-time extraction costs e and per-copy distribution costs d, then the average cost per copy when distributing n copies is

e/n + d    (1)
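A trivial numeric illustration of formula (1), with made-up cost values: as n grows, the one-time extraction cost amortizes away and the per-copy distribution cost d dominates.

```python
def per_copy_cost(e, d, n):
    """Average per-copy cost: one-time extraction cost e amortized
    over n copies, plus per-copy distribution cost d."""
    return e / n + d

# Illustrative numbers only: extraction costs 1000, distribution 0.5 per copy.
print(per_copy_cost(e=1000.0, d=0.5, n=10))     # 100.5
print(per_copy_cost(e=1000.0, d=0.5, n=10000))  # 0.6
```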

When the world was young and consumer-writable media was very expensive, d was the main factor hindering the distribution of pirated media. However, the flourishing of audio and video tapes and the spread of consumer VCRs led to a situation where the per-copy distribution cost dropped dramatically. So the industry tried to increase the other term, e. VCRs were required to be built with technology that refuses to record audio and video from copyrighted sources; at the same time, manufacturers of content players were forced to include anti-piracy mechanisms in their products. All this increased the value of e and thus the total cost of content distribution.

But it is hard to hold piracy at bay, and the next technological advances that made distribution cheap again were digital content players and cheap digital media. They drove the costs of purchasing and writing media down to almost nothing, additionally eliminating the problem of copy degradation. This was a strong strike against the recording industry, but even stronger was the integration of the end user into the distribution process: the first peer-to-peer networks. Napster made sharing music so cheap that the market for pirated content exploded. This led to extensive legal attacks on Napster, which finally succeeded, mostly because Napster was not a peer-to-peer network in the sense we use today: most of the services, like the file index, were kept centrally. Thus bringing down the main index server brought the network's activity to a complete stop. Napster's successors, such as Kazaa, Gnutella and BitTorrent, were designed to survive even if part of the network is brought down. Therefore the industry now tries to increase distribution costs either by filing lawsuits against specific users and servers to intimidate the remaining peers, or by technical attacks against the P2P network's usability. The first approach can be tackled relatively successfully by introducing the notion of anonymity into the system (the second section covers anonymity in more detail); the second approach and its countermeasures are explored in the third section of the article.

Of course, P2P networks are not limited to file sharing: P2P computational networks (so-called "BOINC-style" computing, with Seti@HOME among the best known) are widely used for tasks requiring large amounts of CPU cycles.
The motivation behind such networks is to gather (or scavenge) unused CPU cycles of commodity PCs for some (mostly scientific) need. The natural security problem that arises is the reliability of the results produced by a remote PC: a result might be corrupted either accidentally or intentionally. For example, if a pharmaceutical company would like to use the resources of a P2P computational network to analyze the effect of a new drug on the human organism, it must make sure that the following situations are taken care of:
• If the analyzed data is confidential, it must be impossible or very hard for the computing peer to steal it. Achieving this through encryption or obfuscation of the data is one option.
• It must be possible to verify that the computation really took place and that the result is not an adversary's falsification. Introducing "witnesses" of the computation might help solve this problem.
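One common way to approach the verification problem, used by BOINC-style systems, is redundant computation with majority voting: the same work unit is issued to several peers and the answers are compared. A minimal sketch (the cheating probability, replica count, and the task itself are all illustrative):

```python
import random
from collections import Counter

def run_on_peer(task, cheat_prob=0.0):
    """Simulate a peer computing task(); a cheating peer returns garbage."""
    if random.random() < cheat_prob:
        return random.randint(0, 10**6)  # falsified result
    return task()

def replicated_result(task, replicas=5, cheat_prob=0.2):
    """Issue the same work unit to several peers; accept the majority answer."""
    answers = [run_on_peer(task, cheat_prob) for _ in range(replicas)]
    value, votes = Counter(answers).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority - reissue the work unit")
    return value

task = lambda: sum(i * i for i in range(100))  # the "real" computation
random.seed(0)  # seeded so the demo is reproducible
print(replicated_result(task))  # 328350
```

This only defends against independent cheaters: colluding peers that return the same wrong answer can still outvote honest ones, which is why the text's observation about a large enough number of malicious users applies here as well.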

2 Anonymity

A natural desire of a peer is to have control over the information it leaks into the system, be it the documents it requests or the files it shares. In a perfect world (yes, an arguable sentence) everything would be anonymous and thus safe to use. But what does "everything" mean? What kind of information could, say, the RIAA extract from the network and use for launching lawsuits? And what properties must the system have in order to evade this threat? Roger Dingledine [2] has classified the following characteristics of anonymity with respect to network systems:
• Author anonymity. Perhaps the most often used characteristic: the author of a document cannot be identified. An example is "anonymous remailers".
• Publisher anonymity addresses the identity of the entity that introduced the file into the system. Note the difference from author anonymity: these two entities do not always coincide.
• Reader anonymity is present when it is not possible to identify the reader of a document, i.e. the person who downloads it from the network. For example, a user might not want to reveal his download interests because of their immorality or possible legal consequences.
• Server anonymity means that the server on which a document is located is not known. Basically, given an identifier of a document, there is no way to prove that a particular server currently possesses or shares it.
• Document anonymity means that a server cannot find out what contents it stores or helps to store, even if communication among all servers is allowed.
• Query anonymity refers to the situation where, while answering a query, the server learns nothing new about the document; that is, the "identity" of the document is not revealed to the server. This situation is called private information retrieval (PIR).
Each item on this list can be implemented in different ways. For instance, to ensure that query anonymity holds there are two main methods:
• information-theoretic PIR, using clever mathematics [3];
• computationally intensive PIR, requesting a large number of different files and thus masking the document that is actually retrieved.
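Information-theoretic PIR can be illustrated with the classic two-server XOR scheme (a simplified sketch, assuming two non-colluding servers that each hold a full copy of the database; block contents are made up): the client sends a uniformly random index subset to one server and the same subset with the wanted index toggled to the other, so neither query alone reveals anything about the target.

```python
import secrets

DB = [b"block-%d" % k for k in range(8)]  # each server holds a full copy

def server_answer(db, indices):
    """A server XORs together the requested blocks. Each query set on its
    own is a uniformly random subset, so it reveals nothing about i."""
    acc = bytes(len(db[0]))
    for k in indices:
        acc = bytes(a ^ b for a, b in zip(acc, db[k]))
    return acc

def pir_fetch(i, n=8):
    s1 = {k for k in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference: the queries differ only in index i
    a1 = server_answer(DB, s1)  # query to server 1
    a2 = server_answer(DB, s2)  # query to server 2
    # XOR of the answers cancels every block except the wanted one
    return bytes(x ^ y for x, y in zip(a1, a2))

print(pir_fetch(3))  # b'block-3'
```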

3 Attacks on P2P networks

As mentioned in the introduction, attacks on P2P networks often have the goal of increasing the distribution costs of pirated content and making the network either unusable or unattractive for peers [4]. According to Wikipedia, examples of attacks against P2P networks include (but are not limited to):
• poisoning attacks (e.g. providing files whose contents are different from the description)
• polluting attacks (e.g. inserting "bad" chunks/packets into an otherwise valid file on the network)

• defection attacks (users or software that make use of the network without contributing resources to it)
• insertion of viruses into carried data (e.g. downloaded or carried files may be infected with viruses or other malware)
• malware in the peer-to-peer network software itself (e.g. distributed software may contain spyware)
• denial-of-service attacks (attacks that may make the network run very slowly or break completely)
• filtering (network operators may attempt to prevent peer-to-peer network data from being carried)
• identity attacks (e.g. tracking down the users of the network and harassing or legally attacking them)
• spamming (e.g. sending unsolicited information across the network, not necessarily as a denial-of-service attack)
Perhaps the most typical attacks are those on the confidentiality, integrity and availability of the network. The following subsections examine them in a bit more detail.

3.1 Confidentiality

Breaches of network confidentiality have two positive outcomes for the attacker: an increase in the distribution cost, and the acquisition of information that can be used to attack the integrity and availability of the system. Typical attack approaches include eavesdropping, traffic analysis and client impersonation. Encryption of the traffic can help, but only against the very simple attacks: encryption alone cannot ensure that the party on the other side of the line is not the attacker. Therefore the anonymity aspects mentioned in the previous section are of high importance for the system.

3.2 Integrity

Attacks on the integrity of information in a peer-to-peer system may be carried out by introducing degraded-quality content or by falsifying content. In the context of music, such attacks have included introducing noisy recordings and falsely labeling songs. Integrity attacks also include attacks on the information describing the operation of the peer-to-peer network, such as its topology and routing information, which may corrupt communication or even prevent users from accessing the network. This leads to a significant increase in the per-copy distribution cost. Although methods exist for making the system more resilient to such attacks (e.g. reputation systems that enable users to rate the validity of content and of those who provide it), no existing system is immune to integrity attacks given a large enough number of malicious users.
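A common defence against polluted content, used for instance by BitTorrent, is to distribute trusted per-chunk hashes in the file's metadata so that every downloaded chunk can be verified before it is accepted. A minimal sketch (chunk size and data are illustrative):

```python
import hashlib

def chunk_hashes(data, chunk_size=16):
    """Trusted metadata: SHA-256 of every chunk, published with the file."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]

def verify_chunk(index, chunk, hashes):
    """Reject a polluted chunk before accepting it; re-request it elsewhere."""
    return hashlib.sha256(chunk).digest() == hashes[index]

original = b"A" * 64
hashes = chunk_hashes(original)  # obtained from a trusted source

good = original[16:32]
bad = b"X" * 16  # a polluted chunk injected by a malicious peer
print(verify_chunk(1, good, hashes), verify_chunk(1, bad, hashes))  # True False
```

Note that this only shifts the trust problem to the metadata: the hash list itself must come from a source the peer trusts.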


3.3 Availability

In a typical file-sharing peer-to-peer network it is much cheaper to issue a search request than to process it. This fact can be used to overload the network with search requests, lowering its ability to respond to the "useful" queries of the peers. Another possible availability attack is dropping the packets and requests of other peers. A number of methods have been developed to fight such attacks, for example asking the client to solve a puzzle before issuing a search request. An analogous approach has been proposed to fight spam: if the computational cost of sending out spam became reasonably high, spam would be unprofitable. Another approach is to use the reputation system mentioned before to track the utilization of network resources, but such a system is easily corrupted once the attacker controls enough nodes.
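A client-puzzle scheme of this kind can be sketched as a hashcash-style proof of work (the difficulty parameter is illustrative): the requesting peer must find a nonce whose hash has a number of leading zero bits, which is moderately expensive to produce but takes the server a single hash to verify.

```python
import hashlib, os

DIFFICULTY = 12  # required leading zero bits; tune to the desired client cost

def solve(challenge):
    """Client: brute-force a nonce. Cheap enough for one honest query,
    expensive for a flood of them (~2**DIFFICULTY hashes per request)."""
    nonce = 0
    while True:
        h = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") >> (256 - DIFFICULTY) == 0:
            return nonce
        nonce += 1

def check(challenge, nonce):
    """Server: verification costs a single hash."""
    h = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(h, "big") >> (256 - DIFFICULTY) == 0

challenge = os.urandom(16)  # server-chosen, so solutions cannot be reused
nonce = solve(challenge)
print(check(challenge, nonce))  # True
```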

3.4 The Byzantine Generals Problem

Most of these attacks and the corresponding defense mechanisms are closely related to the "Byzantine Generals Problem", a theoretical problem of trust in the presence of conflicting information providers. The "legend" behind the problem is as follows: the Byzantine army has surrounded a city and the generals want to work out an attack plan. Without loss of generality we can assume that they just want to decide whether to attack today or not. The generals can communicate with each other only by messengers. Some of the generals vote for the plan, the others against; the majority (as always) wins. Now the problem is complicated by the fact that there might be traitors among the generals, voting for the non-optimal decision. For example, suppose nine generals must choose a strategy whose optimal choice is to attack, and the votes of the eight loyal generals are split four to four. The traitor can then vote against, forcing the selection of the suboptimal solution, that is, missing the perfect day for attacking and conquering the city. Or, even worse, the traitor can tell some loyal generals that she will attack and the others that she will not; in that case the loyal generals take different decisions and possibly suffer even more. If an algorithm can be found such that all loyal generals agree unanimously on their strategy, the system is said to be "Byzantine fault tolerant".
Solutions were proposed in 1982 by Lamport, Shostak and Pease [1]. The first solution does not use any cryptographic machinery but requires that fewer than one third of the generals be traitors. The case where one third or more of the generals are traitors cannot be solved, due to the following observation: consider three generals A, B and C, and let A be the traitor, who tells B to attack and C to retreat. Of course, B and C communicate with each other, forwarding A's reply, but neither B nor C can figure out who the traitor really is: either of them could also be a traitor who forged A's message. More generally, it is shown that if m is the number of traitors, then the solution requires at least 3m + 1 generals in total. The second solution is a bit more elegant: it uses unforgeable signatures (e.g. via a PKI) and can cope with any number of traitorous


generals (although for n < m + 2, where n is the number of generals, the problem does not make sense). In both cases the communication required by the algorithm is substantial, which makes a straightforward implementation impractical as n grows. Therefore, smart optimization techniques are needed to make the system "bullet-proof" against malicious generals/users.
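The first (oral-messages) solution from [1] can be sketched as a small simulation. This is a simplified illustration rather than a faithful network implementation: the general names and the traitor's strategy of sending alternating orders are made up for the demo, and messages are passed by recursion instead of messengers.

```python
from collections import Counter

def om(commander, lieutenants, value, m, traitors):
    """OM(m): returns a dict mapping each lieutenant to its decided value.
    A traitorous commander sends conflicting orders to different lieutenants."""
    # Step 1: the commander sends a value to every lieutenant.
    received = {}
    for idx, lt in enumerate(lieutenants):
        sent = value
        if commander in traitors:
            sent = "attack" if idx % 2 == 0 else "retreat"  # conflicting orders
        received[lt] = sent
    if m == 0:
        return received
    # Step 2: each lieutenant relays what it received by acting as the
    # commander in OM(m-1); everyone then takes the majority of all values.
    relayed = {lt: {} for lt in lieutenants}
    for lt in lieutenants:
        others = [o for o in lieutenants if o != lt]
        for o, v in om(lt, others, received[lt], m - 1, traitors).items():
            relayed[o][lt] = v
    return {lt: Counter([received[lt]] + list(relayed[lt].values()))
                .most_common(1)[0][0]
            for lt in lieutenants}

# n = 4 generals, m = 1 traitor (the commander): the loyal lieutenants
# nevertheless reach a unanimous decision.
print(om("G0", ["G1", "G2", "G3"], "attack", 1, traitors={"G0"}))
```

With a traitorous commander G0 sending mixed orders, the three loyal lieutenants exchange what they received and all settle on the same value, illustrating why 3m + 1 generals suffice for m traitors.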

3.5 Sufficiently secure networks

When designing an attack on anything (not necessarily a computer system) it is always reasonable to weigh the cost of the attack against the profit it might bring. Rupert Gatti et al. [5] argue that the same approach is reasonable for attacks on peer-to-peer networks. In other words, the assumption that the attacker has infinite resources and can do almost anything short of computationally hard tasks is too strong. They propose an analysis model in which both the attacker and the defender choose their level of resource commitment, that is, how much they want to spend on attacking or defending the network, or, to be more precise, a document available in the network. The model derives a utility function both for the publisher (defender) and for the attacker. Consider a network with n nodes. If the publisher decides on the number d of nodes to which the document is replicated, and the attacker decides on the proportion x of the nodes she will attack, then the expected utility functions of the attacker (EU_a) and the publisher (EU_p) are:

EU_a = V_a · x^d − c_a · n · x    (2)
EU_p = V_p · (1 − x^d) − c_p · d    (3)

where V_a and V_p are the utilities the corresponding parties gain if their goal is achieved (compromising all nodes holding document replicas for the attacker; keeping at least one node with the document uncorrupted for the publisher), and c_a and c_p are the costs associated with attacking and publishing. It is shown that in the basic model, where the cost of compromising each extra node is linear, the attacker's best strategy is either to attack the whole network or to do nothing. However, for a non-linear cost of attack it is possible to find Nash equilibria in which the attacker attacks only some of the nodes and the publisher likewise publishes to only a fraction of the nodes.
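The utility functions (2) and (3) are easy to explore numerically. A small sketch with made-up parameter values (n, V_a, V_p, c_a and c_p below are illustrative, not taken from [5]) shows the corner-solution behaviour of the attacker under linear costs:

```python
def eu_attacker(x, d, n, Va, ca):
    """EU_a = Va * x**d - ca * n * x   (eq. 2)"""
    return Va * x**d - ca * n * x

def eu_publisher(x, d, n, Vp, cp):
    """EU_p = Vp * (1 - x**d) - cp * d   (eq. 3)"""
    return Vp * (1 - x**d) - cp * d

# Illustrative parameters: 100 nodes, document worth 40 to the attacker
# and 50 to the publisher, linear per-node attack cost 0.1.
n, Va, Vp, ca, cp = 100, 40.0, 50.0, 0.1, 1.0

# Grid search over the attacked fraction x: with linear costs the best
# reply is a corner solution - attack everything (x = 1) or nothing (x = 0).
for d in (1, 3, 5):
    best_x = max((x / 100 for x in range(101)),
                 key=lambda x: eu_attacker(x, d, n, Va, ca))
    print(d, best_x, round(eu_attacker(best_x, d, n, Va, ca), 2))
```

With these parameters the attacker's payoff at x = 1 exceeds the payoff at any interior point, matching the all-or-nothing result quoted above; making the attack cost convex instead of linear is what opens the door to interior Nash equilibria.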

4 Summary

This article reviewed several security aspects of P2P systems, starting with the motivation for attacks on such networks and ending with an economic model of "weaker" security, which argues that peers and attackers are "reasonable" and try to maximize their utility functions. Different classes of network anonymity were also discussed. Sound support for anonymity is becoming a must for any P2P system, as industry organizations grow more and more aggressive in their quest to bring down peer-to-peer systems.


References
[1] L. Lamport, R. Shostak, M. Pease. "The Byzantine Generals Problem". ACM Transactions on Programming Languages and Systems, 1982.
[2] R. Dingledine. "The Free Haven Project: Design and Deployment of an Anonymous Secure Data Haven". 2000.
[3] T. Malkin. "A Study of Secure Database Access and General Two-Party Computation". Ph.D. Thesis, 2000.
[4] S. Schechter, R. Greenstadt, M. Smith. "Trusted Computing, Peer-To-Peer Distribution, and the Economics of Pirated Entertainment". 2003.
[5] R. Gatti et al. "Sufficiently Secure Peer-to-Peer Networks". Workshop on Economics and Information Security, 2004.
