
The Storage of Weighted Directed Graph in Relational Database and Implementation of Dijkstra Algorithm

Yanchen Li1,2, Yuan Feng1,*, Feng Hong1, and Ying Bi3

1 College of Info. Sci. & Tech., Ocean University of China, Qingdao
2 Hisense Trans-Tech Co., Ltd, Qingdao, China
3 College of Business Administration, QingDao Hismile College

Abstract. As an important data structure, the map (graph) is widely used in modern enterprise management software. Route planning, decision support, critical-path finding, and other activities all rely on algorithms from graph theory. Once a map becomes complex, it has to be stored in a database, but the traditional adjacency matrix and adjacency list have limitations when stored there. If the algorithm is still implemented in a programming language, the software and the database exchange large amounts of data whenever they do not run on the same computer. This inevitably reduces the efficiency of the algorithm and is extremely difficult to optimize. Based on the characteristics of relational databases, this paper designs a storage model for directed graphs in a relational database. Using SQL and set operations, we implement the Dijkstra algorithm. We use the UPDATE mechanism of SQL in place of the inner loop of the Dijkstra algorithm, which greatly reduces the time spent on database operations and raises efficiency.

Keywords: Weighted directed graph, Relational database, Dijkstra algorithm, SQL.

1 Introduction

As an important data structure, the map (graph) is widely used in modern enterprise management software. Route planning, decision support, critical-path finding, and other activities all rely on algorithms from graph theory. Once a map becomes complex, it has to be stored in a database. The traditional adjacency matrix and adjacency list have limitations when stored in a database:

1. For an adjacency matrix, different maps have different numbers of nodes and edges, so the matrices have different numbers of rows and columns. Each map therefore needs its own table, which is not conducive to database performance optimization and makes comparison between maps difficult. When the edge weights become more complex, it is harder to describe what the weights are composed of. Only the computed result can be recorded as the weight, so analyzing weights that are calculated dynamically according to the user's demand becomes difficult.

* Corresponding author.

M. Zhou and H. Tan (Eds.): CSE 2011, Part II, CCIS 202, pp. 48–54, 2011. © Springer-Verlag Berlin Heidelberg 2011


2. An adjacency list depends on pointers, which are difficult to realize in a relational database. The common workaround is to treat a whole row as a pointer target and store the ID of the corresponding row, so that the row can be found from the stored ID. However, this method requires recursive queries and cursors, and its implementation in a relational database is inefficient.

With either of the two storage modes above, the Dijkstra algorithm used in map analysis is usually implemented in a programming language, which causes a large amount of data exchange between the application and the database. Once the software and the database are not on the same computer, network speed may become the operational bottleneck, and the system becomes very difficult to optimize. Implementations of Dijkstra inside a relational database often treat the database language as a programming language: they use application variables and cursors to process the map step by step, so the queries are not efficient. If the database is busy with transactions, this easily leads to deadlock.

According to the characteristics of a relational database, this paper designs a storage model that uses only one table to store a number of directed graphs with the same nodes. The model can store a variety of weights without pointers. It makes it easy to implement horizontal comparison between maps in the database and facilitates the dynamic calculation of weights. Most graph operations can be carried out through database set operations, which greatly reduces the interaction between the programming language and the database and improves the efficiency of the system.

1. On the basis of this storage model, we use SQL to implement the Dijkstra algorithm. Using the UPDATE mechanism of SQL in place of the inner loop of the Dijkstra algorithm, we can find and record the shortest paths to the vertices of the whole graph with only one loop.
2. Implementing Dijkstra in the database language rather than in a programming language reduces the large number of interactions between the database and the program, which greatly improves efficiency and reliability.

2 Table Structure and Algorithm Design

2.1 The Storage of Directed Graph in the Database

While storing graphs as adjacency lists, we found that certain set operations in the database can retrieve the desired data directly. An edge can be represented by recording its start point and end point, and a single node can be regarded as an edge from the point to itself. To distinguish a node from an arc that points to itself, we use a flag. The arc weight and the data of a node can be stored in the same column. A weighted directed graph can then be expressed as in Table 1 (the table is named Graph to facilitate its use in SQL statements).

Table 1. Graph

BgnNode   EndNode   Type   Data
V0        V0        NODE   0
V1        V1        NODE   1
V0        V5        SIDE   100
……        ……        ……     ……
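The layout of Table 1 can be tried out with a small script. The following is only a minimal sketch, assuming SQLite driven from Python: the column types, the CHECK constraints, the helper names build_graph, nodes and out_edges, and the two rows marked as hypothetical are illustrative assumptions and do not come from the paper.

```python
import sqlite3


def build_graph():
    """Create an in-memory Graph table laid out like Table 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Graph (
            BgnNode TEXT    NOT NULL,  -- start of the edge
            EndNode TEXT    NOT NULL,  -- end of the edge
            Type    TEXT    NOT NULL CHECK (Type IN ('NODE', 'SIDE')),
            Data    INTEGER NOT NULL,  -- node data or edge weight
            -- assumed constraint: a NODE row starts and ends at the same vertex
            CHECK (Type <> 'NODE' OR BgnNode = EndNode)
        )
        """
    )
    rows = [
        ("V0", "V0", "NODE", 0),    # row shown in Table 1
        ("V1", "V1", "NODE", 1),    # row shown in Table 1
        ("V0", "V5", "SIDE", 100),  # row shown in Table 1
        ("V5", "V5", "NODE", 5),    # hypothetical row for illustration
        ("V0", "V1", "SIDE", 10),   # hypothetical row for illustration
    ]
    conn.executemany("INSERT INTO Graph VALUES (?, ?, ?, ?)", rows)
    return conn


def nodes(conn):
    """Return all vertex names with one set-based query, no pointers."""
    cur = conn.execute(
        "SELECT BgnNode FROM Graph WHERE Type = 'NODE' ORDER BY BgnNode"
    )
    return [row[0] for row in cur]


def out_edges(conn, node):
    """Return (end node, weight) pairs for the edges leaving `node`."""
    cur = conn.execute(
        "SELECT EndNode, Data FROM Graph "
        "WHERE Type = 'SIDE' AND BgnNode = ? ORDER BY EndNode",
        (node,),
    )
    return cur.fetchall()
```

Calling out_edges(build_graph(), "V0") returns the edges leaving V0 in a single query, which is the kind of set operation the storage model is meant to enable in place of following adjacency-list pointers.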


In Table 1, the BgnNode column holds the start of an edge and the EndNode column holds its end. The Type column marks whether a row is a node or an edge. If Type is NODE, BgnNode and EndNode must be the same. The Data column stores the weight of an edge or the data of a node. Storing a map in this way makes operations easier, since they can be carried out with SQL set operations. Because the actual application does not need a traversal algorithm, this paper does not study traversal; instead, we implement the Dijkstra algorithm.

2.2 The Operation to Achieve Dijkstra Using SQL

The Dijkstra algorithm is useful in many cases, and the storage method described in this paper lets SQL set operations support its implementation. Dijkstra is an algorithm that finds the shortest paths from one point to all other points. The basic idea is as follows. Let V be the set of all vertices in the graph, with the source v0 in V. Let S be the set of vertices whose shortest path has been found, and let T = V-S. Select from T the vertex u with the shortest path to v0 and add u to S. Each time a vertex u is added, we modify the shortest path values from v0 to the remaining vertices in T: the new value for a vertex is the smaller of its original value and the sum of the shortest path value of u and the weight of the edge from u to that vertex. Repeat this process until all vertices of T have been added to S [1].

2.2.1 The Design of Result Table

To implement Dijkstra in the database, we need a table to store the results of the algorithm. As the algorithm shows, the result of each cycle affects the next cycle, so the table needs a flag that identifies whether the shortest path to a particular node has been found. We set the flag to 2 for a node whose shortest path has been found and to 0 otherwise. The table should also have a column for the nodes on the shortest path and a column for the total length of the path. Following the representation of maps, the table also records the start point and the end point. As in Dijkstra, the length for vertices whose shortest path has not yet been found is set to the maximum value the computer allows; in this paper we use 65534. The structure of the result table is shown in Table 2.

Table 2. Djistra

BgnNode   Path   EndNode   PathLength   Flag
V0               V0        65534        0
V0               V1        65534        0
V0               V2        65534        0
……               ……        ……           ……
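The initial state of Table 2 can be produced from the Graph table with a single set-based INSERT. This is a minimal sketch under stated assumptions, using SQLite from Python: the column types, the helper names init_result_table and demo, the node data values, and the sample edge are hypothetical, and filling the table from the NODE rows of Graph is one plausible reading of the design, not a statement of the authors' exact code.

```python
import sqlite3

MAX_LEN = 65534  # path length the paper uses for "shortest path not found yet"


def init_result_table(conn, source):
    """Create the result table and add one unfinished row per node of Graph."""
    conn.execute("DROP TABLE IF EXISTS Djistra")
    conn.execute(
        """
        CREATE TABLE Djistra (
            BgnNode    TEXT    NOT NULL,  -- source vertex
            Path       TEXT    NOT NULL,  -- vertex sequence, empty at start
            EndNode    TEXT    NOT NULL,  -- destination vertex
            PathLength INTEGER NOT NULL,
            -- 0 = not found, 1 = intermediate state, 2 = shortest path found
            Flag       INTEGER NOT NULL CHECK (Flag IN (0, 1, 2))
        )
        """
    )
    # One statement covers every vertex: NODE rows of Graph list them all.
    conn.execute(
        "INSERT INTO Djistra (BgnNode, Path, EndNode, PathLength, Flag) "
        "SELECT ?, '', BgnNode, ?, 0 FROM Graph WHERE Type = 'NODE'",
        (source, MAX_LEN),
    )


def demo():
    """Build a tiny hypothetical graph and return the initial result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Graph (BgnNode TEXT, EndNode TEXT, Type TEXT, Data INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Graph VALUES (?, ?, ?, ?)",
        [
            ("V0", "V0", "NODE", 0),
            ("V1", "V1", "NODE", 1),
            ("V2", "V2", "NODE", 2),
            ("V0", "V1", "SIDE", 7),  # edges produce no result rows
        ],
    )
    init_result_table(conn, "V0")
    cur = conn.execute(
        "SELECT BgnNode, Path, EndNode, PathLength, Flag "
        "FROM Djistra ORDER BY EndNode"
    )
    return cur.fetchall()
```

With the three nodes used here, demo() yields one row per node, each with an empty Path, PathLength 65534 and Flag 0, matching the rows shown in Table 2.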


BgnNode is the starting point, and EndNode ranges over all the nodes in the table, so every path discussed in this paper is recorded. A path consists of a sequence of vertices, which is recorded in the Path field. The PathLength field stores the length of the path. When the shortest path does not yet exist, the value is recorded as the maximum 65534 and Flag is 0. A Flag of 1 indicates an intermediate state in the search for the shortest path.

2.2.2 Algorithm Design

As the description of Dijkstra shows, the algorithm consists of two loops. The inner loop finds the current shortest paths to all vertices. Each pass of the outer loop finds one shortest path and adds its vertex to set S. In the next pass, the vertices of S are the initial conditions, and the inner loop starts again. The UPDATE mechanism of SQL scans the whole table for records that match the given conditions and then updates them. With properly chosen conditions, it can replace the inner loop. In addition, it can be optimized with database indexes, whose performance is superior to that of a hand-written loop. The outline design of the algorithm is as follows:

1. Following Dijkstra, after the relevant variables are initialized, the first step is to check whether there are still nodes whose shortest path has not been found. If not, exit the program directly; otherwise, start the loop.
2. Next, find the current shortest paths from the source to all nodes that can be reached and write them into the result table. This is the function of the inner loop. In this paper, we use an UPDATE statement to achieve it.
3. After getting all paths, choose the shortest path whose end node does not belong to S, add that node to set S, and set its flag to the intermediate state.
4. Next, store the data of the intermediate-state node in variables, which will be used in the next loop.
5. Set the flag of the intermediate-state vertices to 2, and then end this loop and start the next one.

Stepwise refinement: As the outline design and flow chart show, the algorithm centers on how to use the UPDATE statement in place of the inner loop of the Dijkstra algorithm. The UPDATE statement has two aspects: the search conditions and the content of the records. Suppose there is an edge from Vi to Vj; we denote its weight by PathLength (Vi,Vj), and the source node is V0.

1. Search conditions: After the first n cycles, let Vi be the vertex most recently added to set S. In the next loop, we find all vertices Vx that are reached by an edge from Vi and satisfy PathLength (V0,Vi)+PathLength (Vi,Vx) < PathLength (V0,Vx).

2. Content of records: First, we record the weight of the shortest path, which is needed not only for the final result but also for each step of the algorithm. The recording rule is: let ShortestPathLength be the path length of the vertex Vi most recently added to set S; if ShortestPathLength+PathLength (Vi,Vj)