
Content Registry – the distributed database of content localization for the PI CAN network

Sławomir Nowak, Piotr Pecka, Mateusz Nowak
Institute of Theoretical and Applied Informatics, ul. Bałtycka 5, 44-100 Gliwice, Poland
{emanuel, piotr, Mateusz}@iitis.pl

II. THE STRUCTURE OF PI CAN NETWORK

CAN networks can be designed on the basis of different presumptions. Jacobson’s approach [1] is evolutionary: it uses the existing TCP/IP stack and extends current overlay network solutions (peer-to-peer networks). The NetInf project [4], on the other hand, takes a revolutionary approach, proposing a completely new protocol stack in which the network layer protocol (e.g. IP) is replaced by a new protocol operating on content identifiers.

Keywords- CAN networks; content addressing; distributed content database

Section II presents the structure of the PI-CAN network and describes the role of the CR services. Section III presents related work, known solutions and concepts for the localization problem, and the CR solution based on the COLOCAN algorithm. The implementation and performance evaluation of the CR are presented in Section IV.


I. INTRODUCTION

The increasing amount of multimedia content available on the Internet led to the concept of CCN (Content-Centric Networks), also known as CAN (Content-Addressable Networks or Content-Aware Networks) [1]. A CAN, having knowledge about the content that can be delivered, is able to locate the desired content for the user. In CAN networks an identifier plays the role of an address. The user can see the CAN as a “black box” that is expected simply to deliver the desired content according to a unique Content Identifier (CId). If more than one copy of the content object exists in the network, the CAN is able to provide the “best” instance of the content with the required QoS parameters, resulting in the best multimedia quality for the user. Thus one of the key elements of the content delivery system is determining the physical location of the content based on the CId. The paper describes the implementation and evaluation of the Content Registry (CR) mechanism, which stores the set of physical locations of the content in a distributed database. The implemented solution was developed to meet the requirements of the Parallel Internet CAN (PI CAN) [2] network in the Future Internet Engineering project [3].


Abstract— Content addressing and localization are basic issues in the structure of content-centric networks. This paper presents the implementation and performance evaluation of the Content Registry (CR), developed within the framework of the FIE (Future Internet Engineering) project to meet the requirements of the PI CAN network. The CR makes it possible to find the desired Content Record (containing the basic transmission parameters and the localization of a content) in the network-distributed content database. The results confirm that the CR is highly efficient in various configurations of the logical topology of connected nodes.

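Figure 1. CR against the PI CAN network architecture (on the basis of [KSTiT]). Components shown in the figure: AAA Server, Content Mediator, Content Registry, Content Consumer, Content Search Server, Service Control Proxy, Path Detection, Decision Process, Path Configuration, Content Server, Content Forwarder.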

This paper is related to the FIE project, whose concept is to combine different protocols, using virtual network links, so that different virtual Parallel Internets (PIs) share a single physical connection [3]. One of the PIs designed in the project is PI-CAN. For PI CAN a revolutionary approach is taken, and a complete stack of CAN protocols is implemented. The network is to provide CAN functionality with basic QoS guarantees and is built from scratch, with all layers above PHY designed anew [5]. In a CAN, a few types of services (or dedicated nodes) can usually be distinguished. The services used in the PI-CAN project are: the Content Consumer, which requests the content and receives it; the Content Server, which stores the data; the Content Mediator, which provides search and location functions; and the Content Aware Forwarder (CAF), which is a CAN router. The content can be replicated on more than one Content Server.

The Content Consumer is connected to a given Content Mediator (CM). Content Mediators handle content registrations and requests, find the best location of the Content and select the optimal path between the Content Server and the Consumer. By registration we mean putting the Content Record (and optional copies) related to the Content into the proper Content Mediators, and thereby into a distributed localization database (if the network has more than one CM node). This approach is similar to the one used in DONA [6] and some other projects, and differs from e.g. Jacobson’s CCN, where no registration is required. The Content Record contains a list of the physical network locations of the Content, along with additional metadata (such as the full name, author, publisher, performers) and technical parameters (length, size, codec used, bandwidth required etc.). Thus the basic operation of the Content Mediator is to obtain the Content Record instance for a given CId. The Content Registry, described in detail in this paper, is an intrinsic part of the software controlling the CM node and acts as a distributed database of Content Records (see Fig. 1). Once the Content Record is found, details on the locations and technical requirements of the transmission are passed to the Path Detection, Decision and Configuration module. After the transmission path is configured, the role of the CM ends, and the actual transmission of the content is performed among CAF

control or data messages between them should be minimal. 

Redundancy – for fail-safety reasons we decided to register every Content Record in K different CR modules (K = 3 for the FIE project).

III. CR AS SOLUTION FOR CONTENT LOCALIZATION IN PI-CAN

A. Related work

The localization problem arises in different network systems that deliver digital content to their users. Not only CANs or CCNs, but also p2p networks and CDNs have to look up content with a given ID. A number of solutions have therefore been proposed; one should remember, however, that their detailed requirements differed from the requirements for the Content Record database of PI CAN.


All CR nodes perform Content Record processing services. The CR programming interface, available to client applications, includes the following basic operations (irrelevant details are omitted):

    string assignCId(ContentRecord cr);
    int registerCR(string CId, ContentRecord cr);
    int modifyCR(string CId, ContentRecord cr);
    int removeCR(string CId);
    ContentRecord[] retrieveCR(string CId);
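The five operations listed above can be mimicked by a single-node, in-memory stand-in. The following is only a minimal sketch in Python (the actual CR is written in C++): the field names of `ContentRecord`, the snake_case method names and the return codes `0` and `-1` are assumptions made for illustration and are not taken from the PI-CAN specification.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentRecord:
    """Hypothetical Content Record layout; field names are illustrative."""
    locations: List[str]                                       # physical locations of the Content
    metadata: Dict[str, str] = field(default_factory=dict)     # e.g. name, author
    parameters: Dict[str, str] = field(default_factory=dict)   # e.g. codec, bandwidth
    timestamp: float = field(default_factory=time.time)


class LocalContentRegistry:
    """Single-node, in-memory stand-in for the CR programming interface."""

    def __init__(self) -> None:
        self._db: Dict[str, ContentRecord] = {}

    def assign_cid(self, cr: ContentRecord) -> str:
        # Unstructured 128-bit identifier, written as 32 hex digits.
        return secrets.token_hex(16)

    def register_cr(self, cid: str, cr: ContentRecord) -> int:
        if cid in self._db:
            return -1  # assumed error code: CId already registered
        self._db[cid] = cr
        return 0

    def modify_cr(self, cid: str, cr: ContentRecord) -> int:
        if cid not in self._db:
            return -1  # assumed error code: unknown CId
        self._db[cid] = cr
        return 0

    def remove_cr(self, cid: str) -> int:
        return 0 if self._db.pop(cid, None) is not None else -1

    def retrieve_cr(self, cid: str) -> List[ContentRecord]:
        cr = self._db.get(cid)
        return [cr] if cr is not None else []
```

A client would first call `assign_cid`, then `register_cr` with the returned identifier, and later `retrieve_cr` to perform content resolution.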

The interface provides operations for assigning a CId to a Content Record, registering a Content Record under a specific CId, and updating, removing and finally retrieving a Content Record from the database. The retrieving operation is defined in PI-CAN as content resolution. The main operations are registration and content resolution, on which we focus in the performance evaluation. The solution for the CR used in the PI-CAN network had to meet several requirements while conforming to the basic assumptions of the FIE project. An analysis of these requirements, the initial assumptions and a discussion of possible solutions were presented by the authors in [7]. Based on these assumptions, a convention for content naming was proposed. It was recommended to use an unstructured CId of constant 128-bit length, together with a content localization method based on Distributed Hash Tables (DHT). The important requirements that the CR services had to meet include:

Decentralization – the nodes collectively form the system without any central coordination.

High scalability – it should be possible to extend the network with further CR (CM) nodes.

Performance – Content Record registration and retrieval operations should be very fast even for large numbers of requests, number of hops and number of

Figure 2. A) Balanced and B) unbalanced binary tree of Content Registries. In panel A the eight leaves CR 1 to CR 8 carry the CRIDs 000, 001, 010, 011, 100, 101, 110 and 111; in panel B the leaves lie at increasing depths, with the CRIDs 0, 10, 110, 1110, 11110, 111110, 1111110 and 1111111.

Two basic approaches to the localization of content can be distinguished. The first uses a hierarchical structure (conceptually similar to DNS). The second assumes a flat structure of identifiers, with algorithms based on a mathematical mapping of names to locations, or to a node that redirects the request (algorithms using hash functions belong to this group).

The approach using a hierarchical structure of content identifiers is represented by the CCNx project [18], described e.g. in [1]. The project uses hierarchical content IDs, constructed in a way that makes it possible to omit content registration (content publishing) before the content is used by consumers; consequently, content may be created on demand. The relatively old TRIAD project [19] also uses a hierarchical structure of content names. It uses the URL scheme: a request is sent to the location specified by the URL, and a pointer to the "nearest" replica is returned. The second group consists of solutions in which flat content identifiers are mapped to the destinations that store the content copies. Due to their high efficiency, DHT solutions are often used. A number of content-based projects and peer-to-peer solutions based on DHT exist. ROFL [10] uses a localization method based on the classical Chord DHT algorithm [11]. Widely used software applying DHT to content localization includes BitTorrent [8] and (probably, as there is no official specification) Skype [9] (an example of a network oriented towards voice streams as the content). A number of CAN-type projects are also based on DHT: SEATTLE [12] provides a directory service using flat addressing with a one-hop DHT. Other projects using DHT for content localization are i3 [13] and 4WARD NetInf [4].

B. The COLOCAN algorithm

For the realization of the CR functionality we proposed and implemented the COLOCAN (COntent LOcalisation for CAN) algorithm, described in detail in [14]. The algorithm follows the general idea of DHT, but it is not based on any solution known to the authors. The algorithm uses the 128-bit unstructured key as the CId and is redundant, securing a basic level of fail-safety. We decided to register K copies of each Content Record in different Content Mediators. For the purposes of PI-CAN, K = 3 was assumed. This assures that in case of failure of some CMs a copy of a particular Content Record will still be available. Together with the Timestamp stored in the Content Record, this also helps in handling updates of Content, providing access to the most current information on locations. The localization keys are used both to indicate the appropriate CR node and to indicate the search path in the virtual topology of CR nodes (binary tree).
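The combined effect of K copies and the Timestamp can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Node` and `Record` classes, the `alive` flag used to mimic a failed CM, and the rule "the newest Timestamp wins" are hypothetical and are not taken from the COLOCAN description; the buffering of requests for unreachable nodes is also left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    locations: List[str]
    timestamp: float  # set by whoever registers or modifies the record


class Node:
    """Hypothetical CR node; 'alive' mimics a failure or a network break."""

    def __init__(self) -> None:
        self.db: Dict[str, Record] = {}
        self.alive = True

    def store(self, cid: str, rec: Record) -> bool:
        if not self.alive:
            return False
        self.db[cid] = rec
        return True

    def fetch(self, cid: str) -> Optional[Record]:
        return self.db.get(cid) if self.alive else None


def register(replicas: List[Node], cid: str, rec: Record) -> int:
    """Store the record on every reachable replica; return how many accepted it."""
    return sum(node.store(cid, rec) for node in replicas)


def resolve(replicas: List[Node], cid: str) -> Optional[Record]:
    """Ask all replicas and keep the answer with the newest timestamp."""
    answers = [rec for rec in (node.fetch(cid) for node in replicas) if rec is not None]
    return max(answers, key=lambda rec: rec.timestamp, default=None)
```

With K = 3 replicas, `resolve` still returns a record when one or two of them are unreachable, and a replica that missed an update is outvoted by the newer Timestamp held by the others.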

the Consumer’s request is sent. A similar operation takes place in the registration procedure, except that the target CRs receive requests to insert a new Content Record into their databases.

As the Content Registry is intended to work in the Content Mediator nodes of the CAN network, which serve as routers, the nodes are considered stable, with lifetimes counted in weeks or months. Therefore the content localization algorithm does not put emphasis on quick reaction to changes in the network topology, in favor of higher efficiency. However, changes in the network topology can still be introduced dynamically. Attaching new nodes, as well as controlled detaching, is easy, though it requires heavy communication (parts of the database must be transferred). A recovery mechanism after a permanent node failure is provided, thanks to the copies of Content Records stored on different nodes. The proposed mechanism is also resistant to temporary breaks in network communication, as registration and modification requests are buffered and performed after communication is restored.

The copies of Content Records are crucial for safe access to the information about content. Therefore the CId generation algorithm ensures that, for every chosen CId, the associated copies of the Content Record are stored on different physical nodes. This protection is of particular importance in small networks consisting of few nodes. The general properties of COLOCAN meet the requirements of the FIE project. A full description of the algorithm is available at www.iitis.pl/~mateusz/iitis-fiecolocan.pdf (in Polish).

The operation of COLOCAN in networks of thousands of nodes was tested in simulation, using the OMNeT++ discrete event simulator, as well as in a real network (up to 16 nodes). Details of the CR implementation and the performance results are presented below. The idea of COLOCAN is simple, but the simulation-based evaluations show its effectiveness. In most cases it is able to find the desired Content Record in one step (the request makes one hop from one CR to another CR, which stores the Content Record). Only in a transitory state, when changes in the network structure are not yet reflected in the memory of the border CM, are two hops needed. Thus the complexity of the algorithm is O(1), and the number of hops is independent of the size of the network.
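The requirement that all copies of a Content Record land on different physical nodes can be met in several ways; the text does not say which one COLOCAN uses. One simple possibility, shown here purely as an assumption, is rejection sampling: draw random 128-bit CIds until the derived keys map to pairwise different nodes. The helpers `derive_keys` and `node_of` are hypothetical parameters standing for key derivation and tree lookup.

```python
import secrets
from typing import Callable, List


def assign_cid(derive_keys: Callable[[bytes], List[int]],
               node_of: Callable[[int], str],
               max_tries: int = 1000) -> bytes:
    """Draw random 128-bit CIds until all derived keys map to different nodes."""
    for _ in range(max_tries):
        cid = secrets.token_bytes(16)          # 16 bytes = 128 bits
        keys = derive_keys(cid)
        nodes = {node_of(key) for key in keys}
        if len(nodes) == len(keys):            # every copy on its own node
            return cid
    raise RuntimeError("no suitable CId found; are there at least K nodes?")
```

The loop cannot succeed when the network has fewer nodes than copies, which is why the sketch gives up after `max_tries` attempts instead of looping forever.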

Balancing of the tree is not required for the proper operation of COLOCAN, but the better the tree is balanced, the smaller the differences between nodes in database size and in the number of requests to be serviced per unit of time.
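The effect of balancing can be quantified with a short calculation. Assuming that keys are uniformly distributed (an assumption of this sketch, not a statement from the evaluation), a leaf at depth d of the binary tree receives a fraction 2^(-d) of all keys, so the expected share of each CR follows directly from the length of its CRID. The two CRID lists below are example trees with eight leaves, one balanced and one chain-shaped.

```python
from fractions import Fraction
from typing import Dict, List


def key_share(crids: List[str]) -> Dict[str, Fraction]:
    """Expected fraction of uniformly distributed keys handled by each leaf."""
    return {crid: Fraction(1, 2 ** len(crid)) for crid in crids}


# Example trees with eight leaves each (CRIDs written as bit strings).
balanced = [format(i, "03b") for i in range(8)]
unbalanced = ["0", "10", "110", "1110", "11110", "111110", "1111110", "1111111"]

balanced_share = key_share(balanced)      # every leaf handles 1/8 of the keys
unbalanced_share = key_share(unbalanced)  # shares range from 1/2 down to 1/128
```

In the chain-shaped tree the most loaded node is expected to hold 64 times as many records as the least loaded one, whereas in the balanced tree all nodes hold the same share.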

IV. IMPLEMENTATION AND PERFORMANCE EVALUATION OF CR

The prototype implementation of the communication layer was developed using RPC and the Internet Communication Engine (ICE) library [15]. Communication between the CRs runs over IPv6, which was chosen for the administrative channel in PI CAN (as opposed to the content transmission channel, which uses a proprietary protocol). ICE combines the ease of use of an object-oriented RPC mechanism with high-performance communication. For quick access to Content Records in local storage, Kyoto Cabinet was used. Kyoto Cabinet is a simple, yet very fast library for managing a local (serverless) database [16].

Requests to deliver the Content Record are sent to the CRs bound to the given keys, and all delivery requests are sent simultaneously. Based on responses from the CRs, the reply to

As the implementation language we used C++ (gcc 4.4), and the implementation platform was Linux (Debian, kernel 2.6.32). For tests and performance evaluation we used computing

The virtual tree topology is stored in the memory of every CR. Tree leaves correspond to the CR nodes; each leaf is described by the network address of the node and by its Content Registry Id (CRID), determined by its position within the tree (see Fig. 2). After receiving a request for a given CId, the CR generates K content-keys, which are subsequently used as search keys in the tree. The N lowest bits of each key are used for the tree lookup, where N is equal to the maximum depth of the tree (for a balanced tree, N = ceiling(log2 M), where M is the number of CRs).
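The lookup described above can be sketched as follows. Several details are assumptions of this sketch rather than facts about COLOCAN: the K content-keys are derived here with SHA-256 over the CId and a copy index, the N lowest bits are read starting from the most significant of them, and the tree is represented simply as a mapping from CRID bit strings to node addresses (the names `content_keys`, `lookup` and `balanced_depth` are hypothetical).

```python
import hashlib
import math
from typing import Dict, List


def content_keys(cid: bytes, k: int = 3) -> List[int]:
    """Derive K 128-bit content-keys from a CId (the hash choice is an assumption)."""
    keys = []
    for i in range(k):
        digest = hashlib.sha256(cid + bytes([i])).digest()
        keys.append(int.from_bytes(digest[:16], "big"))
    return keys


def lookup(key: int, leaves: Dict[str, str]) -> str:
    """Return the node whose CRID is a prefix of the N lowest bits of the key.

    'leaves' maps CRID bit strings to node addresses; N is the maximum depth.
    """
    n = max(len(crid) for crid in leaves)
    bits = format(key & ((1 << n) - 1), "0{}b".format(n))
    for depth in range(1, n + 1):        # walk down the tree one bit at a time
        node = leaves.get(bits[:depth])
        if node is not None:
            return node
    raise KeyError("no leaf matches; the tree is not a full binary tree")


def balanced_depth(m: int) -> int:
    """Depth of a balanced tree with M leaves: ceiling(log2 M)."""
    return math.ceil(math.log2(m))


# Balanced tree with eight CRs: CRIDs 000..111 mapped to CR 1..CR 8.
balanced = {format(i, "03b"): "CR %d" % (i + 1) for i in range(8)}
```

Because every CR holds the whole tree in memory, the lookup is a local operation, and the request can then be sent directly to the node it returns.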


cluster consisting of 16 servers. Each server has two dual-core AMD Opteron processors and 8 GB of RAM. The servers are connected by a Gigabit Ethernet network within each blade chassis (8 servers in each of the two chassis) and by a 10 Gigabit Ethernet link between the chassis.

Figure 3. The response time (mean value) for a single content resolution (Content Record obtaining) operation, depending on the number of Content Consumers.

Figure 4. The response time (mean value) for a single Content Record registering operation, depending on the number of Content Consumers.

A. The test environment and scenarios

Test scenarios were designed to explore the effectiveness of the solution under an increasing load throughout the whole network (Scenario 1) and under an increasing load on a single specific node in the network (Scenario 2). According to the assumptions of the FIE project, at least one Content Mediator node (with the appropriate CR process) should be available in any vCAN (virtual CAN). Scenario 3 therefore examines the performance of the solution on a single, separate node. Comparing the results of the scenarios, we can assess the overhead associated with the distributed database of Content Records (in comparison to a single CR node).

In the scenarios, a package means 200 000 Content Records, registered one by one and then obtained one by one from the CR. The time values obtained for each scenario represent the average time for a single request.

Scenario 1. The topology consists of 16 connected CR nodes. The logical tree topology is balanced. A client connects to CR 1 and performs the registration and resolution for a package of Content Records. Next, two clients (connected to CR 1 and CR 2) perform the registration and resolution for a package. Then three clients do so, and so on up to 16 clients (one client per CR node). At every step of this scenario the clients start directing their requests simultaneously.

Scenario 2. The topology consists of 16 connected CR nodes with a balanced tree as the logical topology, as above. One client connects to CR 1 and performs the registration and resolution for a package of Content Records. Next, two clients connected to CR 1 perform the same action, directing their requests to CR 1. Then three clients do so, and so on up to 16 clients simultaneously.

Scenario 3. The scenario is analogous to Scenario 2 (from 1 up to 16 simultaneously acting clients), but there is only one CR node (no other CR nodes are connected to it).
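The per-request averages used in all three scenarios can be obtained with a very small timing harness. The sketch below shows the idea for a single client only; the function name `mean_request_time` and the use of a plain Python callable in place of a remote CR call are assumptions for illustration, and the measurements in this paper were not produced with this code.

```python
import time
from typing import Callable, Iterable


def mean_request_time(operation: Callable[[str], object],
                      cids: Iterable[str]) -> float:
    """Issue one request per CId, one by one, and return the mean time in seconds."""
    cid_list = list(cids)
    if not cid_list:
        raise ValueError("the package must contain at least one CId")
    start = time.perf_counter()
    for cid in cid_list:
        operation(cid)                     # e.g. a registration or resolution call
    return (time.perf_counter() - start) / len(cid_list)


# Example: time lookups in a local dictionary standing in for a CR node.
store = {"cid-%d" % i: ["server-a"] for i in range(1000)}
average = mean_request_time(store.get, store.keys())
```

For the multi-client steps of the scenarios, one such loop would run per client, all started at the same moment.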

The results are presented in Fig. 3 (content resolution) and Fig. 4 (Content Record registration). The main conclusion is that the uniformly loaded network (Scenario 1) provides high performance for the basic operation of Content Record resolution, comparable to that of a single CR node (no distributed database). As the load increases, the difference keeps decreasing, and for 12 nodes the performance of content resolution in the distributed system exceeds that of the single-node system under the same load. For Content Record registration the performance of the distributed version is lower, which is associated with the need to register copies over the network and with the communication overhead. Registration, however, is an operation requested less frequently (once per piece of Content). Besides, analyzing the graph in Fig. 4, we can expect that for a higher load on the single-node system the plots for Scenarios 1 and 3 will also intersect.

The worst case is Scenario 2. The results point out that in a network of connected CRs a heavy load on individual CR nodes (or an uneven load) leads to a significant decrease in performance. In that case the communication between the loaded node and the rest of the network is the bottleneck. This is, however, an extreme case, and then (taking into account only the performance) a single CR node seems to be the better solution.

For Scenarios 1 and 2, tests with an unbalanced topology (similar to that shown in Figure 2) were performed too. In this case the local databases stored on the CR nodes differ significantly in size. Big differences were also observed in the load and the response time of individual nodes. Therefore the variance of the service time for content resolution and registration was high. Detailed results for different topologies and configurations are beyond the scope of this paper, but it is worth pointing out that, because of the need for load balancing, maintaining a balanced logical tree topology is important.

The performance tests presented in this paper were carried out in a local network in a clustered environment, which differs from real network conditions. It is therefore also planned to test the implementation in PL-LAB [17], the experimental network dedicated to the FIE project.

V. SUMMARY

The COLOCAN algorithm has been implemented in the CR module and tested. The CR became an intrinsic part of the implementation of PI-CAN (the FIE project). The preliminary, basic assumptions are met, and the presented evaluations indicate that the efficiency of the distributed solution is sufficient for simultaneous handling of requests from multiple Content Consumers (in most cases only one connection is needed to obtain the desired Content Record). The results indicate, however, that in small networks a single CR node is a sufficient and more efficient solution. As the network grows, the load on each node increases and the advantage of the single-node solution is no longer significant. A comparison of the slopes of the curves for Scenarios 1 and 3 (Fig. 3) shows that, for a sufficiently large load, the performance of the distributed system in the content resolution operation exceeds that of the single-node topology. In addition, one should remember that the primary benefit of using a distributed structure is redundancy. It allows multiple copies of the Content Record to be registered and resolution results to be provided even in case of connection problems. The redundant solution used in the CR also makes it possible to recover the entire database of Content Records in case of failure of a single node or a few nodes. The proposed solution is suitable for content localization in content-oriented and content-centric networks with stable and efficient nodes, such as the Content Mediator nodes in the PI-CAN network.
It allows information about the content to be spread in a manner independent of the content and of its location on the servers. Any content available in the network is treated equally, regardless of its popularity or the number of copies available on the servers. The COLOCAN algorithm may also be useful for the general problem of distributed databases. Its application in this area, however, requires further research and a detailed analysis of currently available solutions.

ACKNOWLEDGMENT

The work presented in this paper is sponsored by the Future Internet Engineering (Polish: Inżynieria Internetu Przyszłości) project, EU Funds 2007-2013, contract no. POIG.01.01.02-00-045/09-00.

REFERENCES

[1] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, R. L. Braynard. “Networking named content”. In Proceedings of the 5th international conference on Emerging networking experiments and technologies, CoNEXT '09.
[2] “Specification of Parallel Internet: Content Aware Network”, Future Internet Engineering (IIP) report, EU Funds 2007-2013, contract no. POIG.01.01.02-00-045/09-00, https://www.iip.net.pl/en.
[3] “Future Internet Architecture – state of the art and requirements”, Future Internet Engineering (IIP) report, EU Funds 2007-2013, contract no. POIG.01.01.02-00-045/09-00, https://www.iip.net.pl/en.
[4] “Second NetInf architecture description”. EU 7th Framework Programme, The Network of the Future Project 216041 "4WARD Architecture and Design for the Future Internet".
[5] W. Burakowski, H. Tarasiuk, A. Bęben. Architektura Systemu IIP, Przegląd telekomunikacyjny, 8-9 2011, pp. 720–722.
[6] T. Koponen et al. A data-oriented (and beyond) network architecture. In Proc. ACM SIGCOMM, Kyoto, Japan, August 2007.
[7] M. Nowak, S. Nowak, P. Pecka, K. Grochla. “Content identification in PI CAN Network”. In Computer Networks, A. Kwiecień, P. Gaj, P. Stera (Eds.), Communications in Computer and Information Science 160, pp. 164–172, 2011.
[8] B. Cohen. “The BitTorrent Protocol Specification”, http://bittorrent.org/beps/bep_0003.html
[9] S. A. Baset, H. G. Schulzrinne. “An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol”. Proc. of INFOCOM 2006.
[10] M. Caesar, T. Condie, J. Kannan, K. Lakshminarayanan, I. Stoica, and S. Shenker. ROFL: Routing on Flat Labels. In SIGCOMM, 2006.
[11] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. In SIGCOMM, 2001.
[12] C. Kim, M. Caesar, and J. Rexford. Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises. In SIGCOMM, 2008.
[13] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana. "Internet indirection infrastructure," presented at ACM SIGCOMM Conference, 2002.
[14] P. Pecka, M. Nowak, S. Nowak. “Content localization for non-overlay content-aware networks”. Smart Spaces and Next Generation Wired/Wireless Networking, 11th International Conference, NEW2AN 2011, and 4th Conference on Smart Spaces, ruSMART 2011, St. Petersburg, August 2011. LNCS, vol. 6869, pp. 520–528.
[15] “Ice Manual”. Ice 3.4.2 Documentation. ZeroC, Inc. 2011. Available at: www.zeroc.com/Ice-Manual.pdf.
[16] “Fundamental Specifications of Kyoto Cabinet Version 1”. FAL Labs, 2011, http://fallabs.com/kyotocabinet/spex.html
[17] J. Śliwiński, R. Krzywania, Ł. Dolata. PL-LAB – sieć eksperymentalna projektu Inżynieria Internetu Przyszłości, Przegląd telekomunikacyjny, 8-9 2011, pp. 728–730.
[18] CCNx project: Web site (2011). http://www.ccnx.org.
[19] TRIAD Project, http://www-dsg.stanford.edu/triad.