A Hash-based Pseudonymization Infrastructure for RFID Systems

Dirk Henrici, Joachim Götze, and Paul Müller
University of Kaiserslautern, Germany
{henrici, j_goetze, pmueller}@informatik.uni-kl.de

Abstract

Many proposals have been made to address the privacy implications of RFID systems. The main idea for ensuring location privacy is to change the identifiers of RFID tags regularly. For building inter-organizational RFID systems, pseudonyms can be used to provide a link to the respective owner of a tag without affecting location privacy. Based on these considerations, this paper presents a pseudonymization infrastructure that is built on one-way hash functions and is therefore a better fit for the specific demands of resource-scarce tags than approaches based on public key cryptography.

1. Introduction and Related Work

Pseudonymization is often applied to conceal the real identity of a user or a device for privacy reasons. A pseudonym names an object, but only legitimate parties, or several parties together (shared trust), have the information necessary to link the pseudonym to the real object. In 1981 David Chaum published concepts for using pseudonyms for communication over unsecured networks [1]. He used public key cryptography and mix servers to hide the sender and the content of messages while still allowing the receiver to send replies. Building on this, David Goldschlag et al. developed the concept of “onion routing” [2]. These works became the basis of untraceable pseudonymous remailers (e.g. [3]) and of today's anonymizing networks such as Tor [4] or I2P [5]. Today's systems are much more sophisticated than earlier ones and contain many improvements that guard against weaknesses such as traffic analysis; for instance, messages are reordered in intermediate network nodes to counteract timing attacks.

All these systems are built on the same basic principles: messages are routed over multiple network nodes, with the intention that each node increases the level of protection achieved. The design ensures that only several nodes together can establish a link between sender and receiver, i.e. trust is shared among several entities. Consequently, even if one or several of the nodes on the message path share their data with an attacker, the attacker cannot link the communicating parties.

Based on the principles underlying these infrastructures, a basic building block for a pseudonymization infrastructure is the following: to create a pseudonym, one selects a number of network nodes that shall ensure one's privacy. Each of these nodes holds the private key of an asymmetric cryptosystem and has published the corresponding public key. One now enciphers one's “identity”, in this case one's real communications address, with the public key of one of the nodes and appends that node's communications address. The resulting data is enciphered with the public key of another node, whose network address is then appended. This procedure is repeated for all nodes that shall be employed. The result can serve as a pseudonym for the real address, because it contains all information required to route a message to the real address without revealing it.

Figure 1 shows how such a pseudonym is used to route data to the receiver's address; in this example two intermediate nodes are employed. The sender sends its data to the address noted in the pseudonym. Upon reception, “Node A” deciphers the next address layer using its private key and forwards the message to the node whose address is found in the deciphered data. “Node B” performs the same algorithm. Since “Node B” is the last node in the example, deciphering the data with its private key yields the real address of the receiver.
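The layering and peeling described above can be sketched in Python. This is a minimal sketch, not the scheme of [1] or [2]: the hypothetical helper `toy_cipher` (an XOR with a SHA-256 keystream derived from a per-node secret) stands in for the public key encryption the text describes, so it is neither asymmetric nor secure; it only shows that each node can open exactly one layer. The addresses, the length-prefixed encoding, and the empty payload marking the final address are likewise assumptions.

```python
import hashlib
import struct


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in for per-node encryption: XOR with a SHA-256 counter keystream.
    Calling it again with the same key decrypts. NOT secure, NOT asymmetric."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def pack(address: str, payload: bytes) -> bytes:
    """Length-prefixed encoding of (address, payload)."""
    raw = address.encode()
    return struct.pack(">H", len(raw)) + raw + payload


def unpack(layer: bytes) -> tuple[str, bytes]:
    (n,) = struct.unpack(">H", layer[:2])
    return layer[2:2 + n].decode(), layer[2 + n:]


def build_onion(receiver: str, path: list[str], keys: dict[str, bytes]) -> bytes:
    """Wrap the receiver address in one layer per node (path = forwarding order)."""
    layer = pack(receiver, b"")  # empty payload marks the real address
    for node in reversed(path):  # the innermost layer belongs to the last hop
        layer = pack(node, toy_cipher(keys[node], layer))
    return layer


def route(onion: bytes, keys: dict[str, bytes]) -> tuple[list[str], str]:
    """Simulate forwarding: each node peels exactly one layer with its own key."""
    next_hop, payload = unpack(onion)  # the sender reads only the first hop
    visited = []
    while payload:
        visited.append(next_hop)
        next_hop, payload = unpack(toy_cipher(keys[next_hop], payload))
    return visited, next_hop


keys = {"node-a.example": b"key-a", "node-b.example": b"key-b"}
onion = build_onion("receiver.example", ["node-a.example", "node-b.example"], keys)
print(route(onion, keys))  # (['node-a.example', 'node-b.example'], 'receiver.example')
```

Only the first hop is readable without a key; the receiver's address and the second hop are hidden inside the ciphertext.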

Proceedings of the Second International Workshop on Security, Privacy and Trust in Pervasive and Ubiquitous Computing (SecPerU'06) 0-7695-2549-0/06 $20.00 © 2006 IEEE

Figure 1. Concept of routing onions as pseudonyms

Due to the layered encryption, only all the chosen intermediate nodes together can derive the path between the sending and the receiving party. In onion routing, such layered encrypted data structures are called “routing onions”, because onions also consist of many layers. A system like the one explained can be used in two ways. In the first, the receiver generates a pseudonym and gives it to the sender instead of its real address; the intention here is to hide the receiver's real address from the sender. In the second, the sender knows the real address but creates and uses pseudonyms for sending messages to the receiver; the intention in this case is to conceal the communication relationship from attackers eavesdropping on one or more hops of the communication path. This is important if outsiders shall not learn with whom the sender is communicating. By changing the pseudonyms used, it is also possible to hide how frequently communication with certain receivers takes place. In the next section we motivate the proposed concept and explain an intended application scenario. Afterwards the new concept is introduced, and considerations regarding security and scalability are presented before the paper ends with a conclusion.

2 Motivation for Creating a New Concept

The need for an alternative to the concept explained in the introduction emerges when dealing with location privacy in RFID systems. RFID tags usually contain a static identifier; in the case of the Electronic Product Code (EPC [6]), it consists of manufacturer, product type, and a unique serial number. As discussed in the literature on privacy in RFID systems as well as in the press, such an identifier can be abused (e.g. [7], [8]). One of the problems is the violation of “location privacy”: if tags are affixed to objects that people carry, e.g. clothing or a wrist watch, an attacker can use the identifiers to recognize people. If the attacker controls tag readers at places where a person dwells, he can even track the person's location and create a location profile.

Several solutions have been proposed, for instance “Randomized Hash-Locks” [8] or the proposal of Henrici and Müller [9], amongst many others. The general idea is to change the identifier of a tag regularly. This has to happen securely and in a way that only legitimate readers can link an identifier to the corresponding tag data. An attacker must not be able to correlate identifiers of the same tag, so that he can no longer recognize it. Besides the problem of changing identifiers securely, scalability is difficult to achieve: “Randomized Hash-Locks” do not scale at all; proposals like the one of Henrici and Müller scale much better but are still not well suited for large-scale, i.e. decentralized and inter-organizational, systems without restricting privacy. These schemes focus on securing the insecure RF channel between tags and the reading party and presume that only legitimate readers have the information required to uniquely identify tags despite their changing identifiers. In open, inter-organizational RFID systems, in which an operator of RFID readers shall be allowed to read tags of different organizations, these organizations would need to reveal this information to the operators of the readers. This is undesirable for security reasons, since the reading parties would need to be trusted completely.
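Why “Randomized Hash-Locks” [8] do not scale can be seen in a minimal sketch of that scheme's identification step: the tag answers each query with a fresh random value r and h(ID || r), so the backend can only identify it by hashing r with every known ID. The choice of SHA-256 for h, the 16-byte random values, and all function names are assumptions for illustration.

```python
import hashlib
import os
from typing import Optional


def h(data: bytes) -> bytes:
    """Stand-in one-way hash (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def tag_response(tag_id: bytes) -> tuple[bytes, bytes]:
    """Tag side: a fresh random r per query, answered with (r, h(ID || r))."""
    r = os.urandom(16)
    return r, h(tag_id + r)


def backend_identify(response: tuple[bytes, bytes],
                     known_ids: list[bytes]) -> tuple[Optional[bytes], int]:
    """Backend side: try every known ID; returns (match or None, hash evaluations)."""
    r, digest = response
    for count, candidate in enumerate(known_ids, start=1):
        if h(candidate + r) == digest:
            return candidate, count
    return None, len(known_ids)


ids = [f"tag-{i}".encode() for i in range(10_000)]
first, second = tag_response(ids[7_500]), tag_response(ids[7_500])
print(first != second)               # True: two answers of the same tag look unrelated
print(backend_identify(first, ids))  # (b'tag-7500', 7501)
```

The search cost grows linearly with the number of tags, and in an inter-organizational setting the reading party would additionally need the ID lists of every organization.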

Figure 2. Mapping problem of inter-organizational RFID systems with changing identifiers

If the information is not revealed, a problem occurs when a reading party shall have access to data of tags of several organizations (see figure 2) and the concept of changing identifiers is used: when the reading party queries a tag, it gets the current tag identifier. But this tag identifier must not contain any information that an attacker could use to recognize the tag, or to recognize the person carrying tagged objects by observing tag constellations. Consequently, a tag identifier must not reveal the organization to which the tag belongs. The reading party therefore has no idea which organization to ask or inform about the tag and would have to proceed by trial and error, which limits scalability.

A pseudonymization infrastructure would solve this problem: the tag plays the role of the sender and the responsible backend the role of the receiver. The RFID tag can then create a pseudonym for the responsible backend (i.e. pseudonyms are generated by the sender, see the previous section). If a random number is added in each encryption step, different pseudonyms for the same receiver can be created. As the address of the first intermediate node could be used as a means for unwanted tracking, it must be omitted, so the first intermediate node must always be the same; it acts as a kind of “root node”. As this node could be a single point of failure and a bottleneck, enough mirrors must exist to share the load and ensure reliability. The resulting pseudonym would not reveal any exploitable information about the tag or the responsible backend to an attacker, but the reading party would be able to use it to contact the responsible backend.

One problem is the considerable effort needed to create such a pseudonym, because an asymmetric encryption step must be performed for each intermediate node. This is far beyond the capabilities of RFID tags; even rather expensive ones cannot perform such calculations within a reasonable time. Another problem is that, due to the block size of the ciphers used, the pseudonyms are too long to be used efficiently as identifiers, because transmitting them over the slow air interface of RFID tags would take a long time. Therefore a more lightweight concept tailored to the application is desirable.

Figure 3. Infrastructure Topology

Thus, the concept to be created shall share the positive characteristics of the scheme described but get by with the low resources available in RFID tags:

• Provide adequate security for the intended application
• Require fewer resources for generating pseudonyms
• Optimized pseudonyms, i.e. size less than 1 Kbit, without degrading security
• Offline generation of pseudonyms, i.e. no Internet connection required
• No central authority required (for scalability and security reasons [shared trust])
• Stateless operation (for scalability reasons)
• Avoid writing database accesses in nodes (also for scalability reasons)
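For orientation, the size requirement can be checked against the construction of section 3, in which a pseudonym holds one hash value per node below the root and a tree depth t of five or more is suggested. Assuming untruncated hash values of length ℓ_h, a scrambling function that preserves that length, and t = 5:

```latex
|P| = t \cdot \ell_h, \qquad
|P|_{\mathrm{MD5}} = 5 \cdot 128\,\text{bit} = 640\,\text{bit}, \qquad
|P|_{\mathrm{SHA\text{-}1}} = 5 \cdot 160\,\text{bit} = 800\,\text{bit}
```

Both values stay below 1 Kbit.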

The solution presented in the following still requires too many resources to be applied on low-cost tags. Nevertheless, it might be applicable to more powerful tags, and it explores some new directions for research towards privacy-respecting, inter-organizational RFID systems. Note that the presented infrastructure is not intended to replace other RFID protocols such as the ones mentioned. It shall only help to identify the organization responsible for a tag without revealing information that can be abused for tracking. It does not protect against threats such as mimicking, so other RFID protocols are still required.

3 A Hash-based Pseudonymization Infrastructure

One-way hash functions (see e.g. [10]) are regarded as a feasible primitive for implementing security functionality in RFID tags [11]. Thus they are used in many proposed protocols, such as the “Randomized Hash-Lock” approach [8] or the protocol of Henrici and Müller [9]. It is therefore natural to try to use hash functions for implementing the required pseudonymization infrastructure as well. It makes sense to share trust by using multiple intermediate nodes, as in mix networks. The concept needs to ensure that each node can derive from the pseudonym data only the information required for finding the address of the next node along the path. The security of the concept must obviously rest on the one-way property of the employed hash function. A legitimate node and an attacker are thus distinguished by the information available to them: for instance, a legitimate node possesses database entries that reduce the search space, whereas an attacker does not and needs to search the entire search space to derive information from a hash value. Based on these considerations, we propose a new concept for pseudonymization infrastructures based on one-way hash functions and introduce it in the following sections. To demonstrate and practically test the presented concept, one of our students implemented a demonstrator in the form of a pseudonymous mailing system, which showed the feasibility of the approach.

3.1 Infrastructure Topology

As already explained in section 2, the first of the intermediate nodes must always be the same, since a selector would itself be a means of identification and could thus be abused by an attacker. This first node of the infrastructure will be called the “root node” (see figure 3). The root node has a number of child nodes, each of which has a number of child nodes as well. This way, a hierarchical tree topology is created, similar to the tree topology found in the domain name system (DNS, [12]), which has proved powerful and well scalable. The tree topology has the advantage that it is well structured and that, when the tree is built balanced, the number of child nodes of each node is of the same order of magnitude.

3.2 Node Identifiers and Infrastructure Architecture

Each node in the infrastructure assigns an identifier to each of its child nodes. The identifier will be denoted by “N” in the following and should be chosen in such a way that it cannot be guessed. It can, for instance, be created by concatenating the node's hostname with a random number and hashing the resulting string. The mapping between N and the corresponding node's address is never revealed to any other node. The path through the infrastructure tree from the root node to the receiver is fixed when the receiver selects a leaf node on joining the system. The receiver is then assigned an identifier “D”. The mapping between the receiver's real address and D is stored in a database table of the selected leaf node. Note that no other intermediate node needs to perform any operation when a new receiver joins the system, which is very important for the scalability of the system. On joining, the receiver obtains the node identifiers $N_i$ of the intermediate nodes on its communication path. The path itself can be kept private, because creating pseudonyms requires only the receiver's identifier (D) and the node identifiers of the nodes on the path ($N_i$). Thus the receiver either creates pseudonyms itself or gives these identifiers to all parties that shall be able to create pseudonyms. In the RFID systems mentioned, the tags act as “sender” and therefore need to obtain these identifiers.

Each intermediate node has a database table that links the node identifiers of its child nodes to their hostnames. In another table, rows containing

$$\bigl(h(k, N_{child}),\ k,\ N_{child}\bigr) \qquad \forall\, N_{child},\ k \in [0, k_{max}) \subset \mathbb{N}_0$$

are present, whereby $h(x)$ denotes the one-way hash function and $k_{max}$ is a natural number that trades off the number of different pseudonyms that can be created for a particular receiver against the space required in the database. Obviously, $\mathrm{count}(N_{child}) \cdot k_{max}$ rows need to be stored in that table. A higher $k_{max}$ increases the level of privacy the infrastructure is able to offer. If $k_{max}$ is the same for all nodes on the path, then $n = k_{max}^{t}$ is the number of different pseudonyms that can be created for a particular receiver, where $t$ is the depth of the tree topology, i.e. the number of nodes on the path (intermediate nodes plus receiver) excluding the root node, so that $t = 4$ in figure 3. As $n$ grows exponentially in $t$, we suggest $t \geq 5$ and $k_{max} \geq 10^5$ to obtain a sufficiently large $n$ without putting too much burden on individual nodes.
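The node table described above can be sketched as follows, assuming SHA-1 for h and an encoding of h(k, N) as an 8-byte big-endian k followed by N (the paper does not fix the encoding); the function names, the 16-byte random number, and the hostnames are hypothetical. The usage lines check the row count count(N_child) · k_max and the number of pseudonyms k_max^t for a small example.

```python
import hashlib
import secrets


def h(k: int, n: bytes) -> bytes:
    """h(k, N) with SHA-1; the encoding (8-byte big-endian k, then N) is an assumption."""
    return hashlib.sha1(k.to_bytes(8, "big") + n).digest()


def new_node_identifier(hostname: str) -> bytes:
    """Hash of the hostname concatenated with a random number (16 random bytes here)."""
    return hashlib.sha1(hostname.encode() + secrets.token_bytes(16)).digest()


def build_lookup_table(child_ids: list[bytes], k_max: int) -> dict[bytes, tuple[int, bytes]]:
    """One row h(k, N_child) -> (k, N_child) for every child and every k in [0, k_max)."""
    return {h(k, n): (k, n) for n in child_ids for k in range(k_max)}


children = [new_node_identifier(f"child{i}.example") for i in range(3)]
table = build_lookup_table(children, k_max=1000)
print(len(table))  # 3000 rows = count(N_child) * k_max
print(1000 ** 5)   # 1000000000000000 = k_max ** t pseudonyms per receiver for t = 5
```

A dictionary keyed by the hash value mirrors the role of the table: a legitimate node resolves an incoming element with a single lookup, while the rows are written only when a child is added.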

3.3 Pseudonym Generation and Decomposition

To create a pseudonym in an infrastructure of depth $t$, the receiver's identifier $N_{t-1} = D$, the node identifiers $N_{t-2}, \dots, N_0$, and $t$ random numbers $R_{t-1}, \dots, R_0$ are required. A pseudonym $P$ is then the vector

$$P = (p_{t-1}, p_{t-2}, \dots, p_0), \qquad p_i = f_{R_0}\Bigl(f_{R_1}\bigl(\cdots f_{R_{i-1}}\bigl(h(R_i, N_i)\bigr)\cdots\bigr)\Bigr)$$

in which $R_i \in [0, k_{max}) \subset \mathbb{N}_0$, and $p_0$ reduces to $p_0 = h(R_0, N_0)$. Again, $h(\cdot)$ is the one-way hash function. $f_r(x)$ is an invertible function that scrambles its preimage using the parameter $r$. It can be based on a simple bitwise XOR operation:

$$f_r(x) = x \oplus s, \qquad s = r \cdot k_{max}^{j} + r \cdot k_{max}^{j-1} + \dots + r \cdot k_{max} + r = r \sum_{i=0}^{j} k_{max}^{i}$$

in which $j$ is chosen such that the domain of $s$ is at least as large as the domain of $x$, but as small as possible under that restriction. If $k_{max}$ is a power of 2, $f_r$ can be implemented very efficiently using only bitwise XOR and binary shift operations. The pseudonym $P$ contains one element for each node on the path to the receiver. The nearer a node is to the receiver, the more the hash value of its node identifier is scrambled by random numbers that are not known to any intermediate node in the infrastructure or to the receiver. These random numbers thus make the nodes depend on each other for deriving the hash values, and they also make it possible to create different pseudonyms for a single receiver. Due to the hash calculations, each $p_i$ has the length of the output of the employed hash function, e.g. 128 bits for MD5 or 160 bits for SHA-1.

Message forwarding within the infrastructure works as follows: the root node receives the message and looks up $p_0$ in its database, thereby obtaining the random number $R_0$ used and the node identifier of the next node on the path. Using $R_0$, it strips the outermost $f_{R_0}$ off the other elements of the pseudonym vector by applying the inverse function $f^{-1}_{R_0}$. Then the element $p_0$ is dropped, so that $P$ is one element shorter. With the node identifier obtained, the node looks up the address of the next node in its database and thus knows where to forward the message. The other nodes along the path to the receiver perform the same procedure. Finally, the leaf node concerned analogously obtains the receiver's identifier, looks up the corresponding real address in its database, and forwards the message to the receiver, by which time $P$ has been consumed completely.
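A minimal end-to-end sketch of pseudonym generation and hop-by-hop decomposition follows, for a path root, intermediate node, leaf node, receiver (t = 3). Assumptions: SHA-1 as h with a hypothetical encoding of h(k, N) as an 8-byte k followed by N; a toy k_max = 16 = 2^4, so that s is simply r's 4-bit pattern repeated 40 times across the 160-bit hash (j + 1 = 40); in-memory `Node` objects instead of hostnames and network messages; all identifiers and names hypothetical. The list is stored as [p_0, ..., p_{t-1}], i.e. in reverse order of the vector P in the text. Since XOR masks commute, this simple f_r does not have the non-commutativity recommended in section 3.4.

```python
import hashlib
import secrets

# Hypothetical toy parameters: k_max = 2**B is a power of 2 and B divides WIDTH,
# so the mask s spans exactly the hash length.
B = 4
K_MAX = 2 ** B
WIDTH = 160  # output length of SHA-1 in bits


def h(k: int, n: bytes) -> int:
    """h(k, N); the encoding (8-byte big-endian k followed by N) is an assumption."""
    return int.from_bytes(hashlib.sha1(k.to_bytes(8, "big") + n).digest(), "big")


def mask(r: int) -> int:
    """s = r * (k_max**j + ... + k_max + 1): r's B-bit pattern repeated j + 1 times."""
    s = 0
    for _ in range(WIDTH // B):
        s = (s << B) | r
    return s


def f(r: int, x: int) -> int:
    """f_r(x) = x XOR s; XOR is self-inverse, so f_r also serves as f_r^-1."""
    return x ^ mask(r)


def make_pseudonym(ids: list[bytes]) -> list[int]:
    """ids = [N_0, ..., N_{t-1}] with N_{t-1} = D; returns [p_0, ..., p_{t-1}]."""
    rs = [secrets.randbelow(K_MAX) for _ in ids]  # R_0 ... R_{t-1}
    pseudonym = []
    for i, n in enumerate(ids):
        value = h(rs[i], n)
        for r in reversed(rs[:i]):  # apply f_{R_{i-1}} first and f_{R_0} last
            value = f(r, value)
        pseudonym.append(value)
    return pseudonym


class Node:
    """Lookup table h(k, N_child) -> (k, N_child) plus the child mapping."""

    def __init__(self) -> None:
        self.table: dict[int, tuple[int, bytes]] = {}
        # N_child -> next Node; a leaf maps D -> real address (stands in for hostnames)
        self.children: dict[bytes, object] = {}

    def add_child(self, n_child: bytes, target: object) -> None:
        self.children[n_child] = target
        for k in range(K_MAX):
            self.table[h(k, n_child)] = (k, n_child)


def deliver(root: Node, pseudonym: list[int]) -> str:
    """Per hop: look up the first element, strip f_R from the rest, drop the first."""
    node, p = root, list(pseudonym)
    while True:
        r, n_next = node.table[p[0]]
        p = [f(r, x) for x in p[1:]]
        target = node.children[n_next]
        if isinstance(target, str):  # the leaf node resolved D to the real address
            return target
        node = target


root, middle, leaf = Node(), Node(), Node()
n0, n1, d = (secrets.token_bytes(20) for _ in range(3))  # hypothetical identifiers
root.add_child(n0, middle)
middle.add_child(n1, leaf)
leaf.add_child(d, "receiver@example.org")
print(deliver(root, make_pseudonym([n0, n1, d])))  # receiver@example.org
```

Each call to `make_pseudonym` draws fresh random numbers, so repeated calls yield different vectors that all decompose to the same receiver.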

3.4 Security

The main goal of the approach is to avoid transmitting data over the air interface of RFID tags that can be abused for unwanted tracking. To achieve this, each tag can create $n = k_{max}^{t}$ different pseudonyms, so that an attacker cannot use the pseudonyms for tracking purposes. Linking different pseudonyms that name the same receiver without knowing the node identifiers of the nodes along the path to the receiver would require inverting the one-way hash function. An attacker who does have this information, however, is able to link different pseudonyms. In this respect the proposed solution is weaker than an approach based on asymmetric cryptography, where possessing the data required for creating pseudonyms for a particular receiver does not enable linking different pseudonyms. But as an attacker in the RFID scenario cannot obtain this information without considerable effort (e.g. physical extraction), the proposed approach is applicable. The overall security of the approach depends mainly on the secrecy of the mapping between node identifiers and node addresses and of the knowledge which node identifiers exist on each hierarchy level. Therefore, as already mentioned, each mapping should, like a private key in onion routing, be known only to the responsible node. To raise the level of security, the links between nodes should be encrypted to counteract attackers on the communication path within the infrastructure, for instance using symmetric cryptography with keys known only to the two nodes on each link. Additionally, the function $f$ should be chosen such that its applications with different parameters are not interchangeable, i.e. applications of the function should be non-commutative. Concerning the message flow within the infrastructure, the vulnerabilities are analogous to those of the asymmetric approach described above. To address them, most of the enhancements published for onion routing (e.g. constant-size messages, message reordering, and injection of dummy messages) also apply to the newly introduced concept.

However, the focus of this paper lies on introducing a new concept for securing the RFID air interface, so we do not go into detail concerning attackers within the infrastructure here. The main disadvantage of the infrastructure in the proposed form is that it is static, so security cannot be ensured in the long term. It cannot be prevented that the node identifiers within the infrastructure become public, since the tags need to store those along their respective path for pseudonym generation. As there is no other private component, an attacker can rebuild the database tables of the nodes, which enables him at least to perform tracking by constellation, if not more. A solution to this problem is to make the infrastructure more variable. There should be more than one path to a single receiver, so that two tags can belong to the same responsible party although their node identifiers differ. Additionally, there should be more private information, so that an attacker cannot rebuild the database tables just by knowing the relatively few node identifiers within the system. A detailed description of such a security-enhanced version of the pseudonymization infrastructure is currently in preparation and will be published in a subsequent paper.
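The weakness described above can be made concrete with a sketch of the linking attack: an attacker who has extracted the node identifiers stored in one tag rebuilds the rows (h(k, N), k) for each level and peels a captured pseudonym exactly as the legitimate nodes do. This costs only t · k_max hash evaluations per candidate path, with no hash inversion needed. The toy parameters, the SHA-1 encoding of h(k, N) as an 8-byte k followed by N, and the XOR form of f_r are assumptions for illustration; all names are hypothetical.

```python
import hashlib
import secrets

B, WIDTH = 4, 160  # hypothetical toy parameters: k_max = 2**B, SHA-1 output bits
K_MAX = 2 ** B


def h(k: int, n: bytes) -> int:
    return int.from_bytes(hashlib.sha1(k.to_bytes(8, "big") + n).digest(), "big")


def f(r: int, x: int) -> int:
    s = 0
    for _ in range(WIDTH // B):  # s = r * (k_max**j + ... + k_max + 1)
        s = (s << B) | r
    return x ^ s


def make_pseudonym(ids: list[bytes]) -> list[int]:
    """Legitimate generation; returns [p_0, ..., p_{t-1}]."""
    rs = [secrets.randbelow(K_MAX) for _ in ids]
    pseudonym = []
    for i, n in enumerate(ids):
        value = h(rs[i], n)
        for r in reversed(rs[:i]):
            value = f(r, value)
        pseudonym.append(value)
    return pseudonym


def links_to(pseudonym: list[int], ids: list[bytes]) -> bool:
    """Attacker test: rebuild each level's rows from leaked identifiers and peel P."""
    p = list(pseudonym)
    if len(p) != len(ids):
        return False
    for n in ids:
        rows = {h(k, n): k for k in range(K_MAX)}  # rebuilt rows (h(k, N), k)
        if p[0] not in rows:
            return False
        r = rows[p[0]]
        p = [f(r, x) for x in p[1:]]
    return True


tag_a = [secrets.token_bytes(20) for _ in range(3)]  # identifiers extracted from tag A
tag_b = [secrets.token_bytes(20) for _ in range(3)]
seen = [make_pseudonym(tag_a), make_pseudonym(tag_a), make_pseudonym(tag_b)]
print([links_to(p, tag_a) for p in seen])  # [True, True, False]
```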

3.5 Optimization and Scalability

As the hash values are only used as an index within the database table of a node, they can be shortened by using only the first m bits of each hash value. This further reduces the size of pseudonyms but, on the other hand, increases the probability of a non-unique index, which would make a message undeliverable. When using full-size hash values (e.g. 128 bits), the probability of hash collisions and thus of non-unique indexes is negligible. After a path through the infrastructure has been selected and the node identifiers required for pseudonym generation have been obtained, the path remains static, and its nodes need to be available to deliver a message. This is similar to a routing onion, which likewise fixes the particular nodes on its path. For practical applications it might be useful to arrange for a backup path. Concerning scalability, the presented approach has the advantage that write operations on the databases occur in leaf nodes only when a new receiver is added and in the other nodes only when a new child node is added. Because of that, nodes can be mirrored very efficiently, so that even the root node can be made redundant enough not to become a bottleneck.
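The effect of truncating indexes to m bits can be estimated with the standard birthday approximation P ≈ 1 − exp(−N(N−1)/2^(m+1)) for N uniformly distributed m-bit indexes in one node's table, where N = count(N_child) · k_max. This is a minimal sketch that treats truncated hash values as uniformly random; the function name and the table size in the usage lines are hypothetical.

```python
import math


def collision_probability(rows: int, m: int) -> float:
    """Approximate chance that at least two of `rows` random m-bit indexes coincide."""
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny
    return -math.expm1(-rows * (rows - 1) / 2 ** (m + 1))


# Hypothetical node: 100 children with k_max = 100 000 gives 10**7 table rows.
for m in (32, 64, 96, 128):
    print(m, f"{collision_probability(10**7, m):.2e}")
```

Because the risk is governed by N²/2^m, each doubling of the table size costs roughly two extra index bits for the same collision probability.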

4 Conclusion

In this paper a new concept for creating a pseudonymization infrastructure was introduced. Its basic primitive is the one-way hash function. Messages are forwarded by cooperating intermediate nodes, following the principle of shared trust. Thanks to the hierarchical topology and to database entries that need to be changed only seldom, the infrastructure scales well. Security is based on the one-way property of the employed hash function, on the data security of the databases in the intermediate nodes (mapping of node identifiers to node addresses), and on the secrecy of the node identifiers within the infrastructure. Although its security characteristics are, for particular attacks, not as good as those of onion routing, the approach is suited for application in ubiquitous devices such as RFID tags: the presented concept needs far fewer resources for creating pseudonyms than approaches based on asymmetric cryptography and has the advantage that pseudonyms can be made much shorter while still providing a high number of different pseudonyms for each receiver. Although the infrastructure is currently vulnerable to the collection and abuse of node identifiers, this research is a good first attempt at creating a pseudonymization infrastructure that is not based on asymmetric cryptography. We are confident that the mentioned weakness can be eliminated, so that such an infrastructure becomes usable in practical applications in which infrastructures based on asymmetric cryptography are too resource-consuming.

5 References

[1] Chaum, D.: Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms; Communications of the ACM; vol. 24/2, pp. 84-88, 1981
[2] Goldschlag, D. et al.: Onion Routing; Communications of the ACM; vol. 42/2, pp. 39-41, 1999
[3] Danezis, G. et al.: Mixminion: Design of a Type III Anonymous Remailer Protocol; IEEE Symposium on Security and Privacy, 2003
[4] Dingledine, R. et al.: Tor: The Second-Generation Onion Router; 13th USENIX Security Symposium, 2004
[5] Website of I2P; http://www.i2p.net; 2005
[6] EPCglobal: EPC Tag Data Standard Version 1.1 rev 1.27; see http://www.epcglobalinc.org/standards_technology/specifications.html; 2005
[7] Sarma, S. et al.: Radio-Frequency Identification: Security Risks and Challenges; RSA Laboratories Cryptobytes, vol. 6, no. 1, 2003
[8] Weis, S. et al.: Security and Privacy Aspects of Low-Cost Radio Frequency Identification Systems; First International Conference on Security in Pervasive Computing (SPC), 2003
[9] Henrici, D.; Müller, P.: Hash-based Enhancement of Location Privacy for Radio-Frequency Identification Devices using Varying Identifiers; PerSec'04 at IEEE PerCom, 2004
[10] Menezes, A. et al.: Handbook of Applied Cryptography, chapter “Hash Functions and Data Integrity”, pp. 321-383; CRC Press, 1996
[11] Weis, S.: Security and Privacy in Radio-Frequency Identification Devices; Massachusetts Institute of Technology, 2003
[12] Mockapetris, P.: RFC1034, Domain Names – Concepts and Facilities; Network Working Group, 1987; see e.g. http://www.ietf.org/rfc/rfc1034.txt
