
Concurrency Control in Federated Databases: A Dynamic Approach†

San-Yih Hwang¹, Jiandong Huang², Jaideep Srivastava¹

¹ Computer Science Department, University of Minnesota, 200 Union St. SE, Minneapolis, MN 55455.
² Sensor and System Development Center, Honeywell, 3660 Technology Drive, Minneapolis, MN 55418.

Abstract The concurrency control problem in a federated database system (FDBS) is especially difficult due to the inherent heterogeneity and autonomy of participating local database systems. A number of FDBS concurrency control algorithms have been proposed. However, each algorithm has the drawbacks of low concurrency, global deadlocks or high system resource wastage. In this paper we propose a new protocol, called Dynamic Adjustment of Global Serialization Order (DAGSO), which provides high concurrency and is able to reduce system resource wastage due to early detection and abortion of eventually non-globally serializable transactions. The protocol is proved to be correct for achieving global serializability and free from global deadlocks. This paper also compares, using a detailed simulation model, the global transaction throughput of DAGSO with Top Down, Bottom Up, and Site Graph approaches proposed in the literature. The results show that DAGSO performs the best in most local system operating regions and range of global transaction behavior.

1. Introduction

A federated database system (FDBS) is a system which provides access to a set of pre-existing local database management systems (DBMS). Heterogeneity and autonomy are two main features of an FDBS. Heterogeneity refers to the various types of participating local DBMS's, which may employ different user interfaces, data models, query languages, and transaction management mechanisms. Autonomy dictates that a participating local DBMS should not be modified by the FDBS, and that a local DBMS has the right to decide the types of internal information provided to the FDBS and to execute queries and transactions according to its own rules [Shet90].

Due to autonomy, there are two types of transactions in the entire FDBS environment, namely local transactions and global transactions. A local transaction, which accesses data controlled by a single DBMS, is submitted to and executed by a local DBMS without the control of the FDBS. A global transaction, which accesses data controlled by more than one DBMS, is submitted to the FDBS and decomposed into several subtransactions to be executed by different DBMS's. The objective of transaction management in the FDBS is to guarantee serializable execution of local and global transactions³.

Several solutions have been proposed for achieving serializable execution [Brei88] [Pu88] [Elma90] [Geor91] [Mehr92b] [Batr92]. Each proposed approach has its strengths but also some shortcomings. Most of them suffer from low concurrency, the possibility of global deadlocks, or wasteful resource consumption. The main problem for these approaches is that the global serialization order of a global transaction is decided either at the beginning or at the end of its execution. In [Elma90], [Batr92], and some schemes proposed in [Mehr92b], the global order of a transaction is decided when the transaction starts and is enforced throughout its lifetime. In [Pu88] and [Geor91], a transaction is validated (to match the global orders of other committed transactions) only at the end of its execution. While the former approach may limit the degree of concurrency, the latter approach may cause wastage of resources.

This paper proposes a novel approach that provides high concurrency and reduces wastage of system resources while maintaining global serializability. In particular, our approach is able to dynamically adjust the global transaction serialization order to match the possible local dependency orders. This feature allows our algorithm to accept more executions. Besides, a global transaction is validated early in its execution to reduce resource consumption by eventually aborted global transactions. In summary, our approach preserves the advantages of the existing algorithms, i.e. it has high concurrency and is free from global deadlock, and in addition it reduces the wastage of resources.

We also compare the simulation results of DAGSO with the Top Down, Bottom Up, and Site Graph approaches proposed in the literature. The simulation study unveils the algorithm's run-time behavior under the effects of data contention, resource contention, and transaction access patterns. Our simulation results show that DAGSO performs the best under most local system operating regions and global transaction behaviors.

This paper is organized as follows. Section 2 describes the FDBS concurrency control problems, reviews the related work, and identifies the shortcomings of existing algorithms. Section 3 describes the FDBS environment and transaction model that we considered in this study. Section 4 presents our DAGSO algorithm, proves its correctness, and shows its computational complexity. Section 5 demonstrates simulation results of DAGSO and the other algorithms. Finally, Section 6 summarizes this work and points out future extensions.

†This work was supported in part by sub-contract #B09530013 from Honeywell SSDC, under contract #F30602-91-C-0128 from Rome Laboratory of the US Air Force.

³ Serializability [Eswa76] is a well accepted, but strict, correctness criterion for database transaction management. A number of weaker correctness criteria for the FDBS environment, which model different application scenarios, have been proposed [Du89] [Rusi90] [Pu91] [GaMo91] [Mehr92a]. While the suitability of these weaker correctness criteria for the FDBS environment needs further investigation, we use serializability as the correctness criterion in this paper.

2. Related Work

In this section, we describe the concurrency control problems in the FDBS environment and review the related work. Unless otherwise noted, we refer to serializability as conflict serializability as defined in [Bern87]. Definition 2.1. A serialization order of a set S of transactions on a serializable execution is a total order ≺ of S such that for any pair of transactions Ti and Tj in S, Ti ≺ Tj if Tj depends on Ti in the execution. Given a serializable execution, a set of transactions may have more than one serialization order. The goal of FDBS concurrency control is to make sure that the various local executions
are consistent. However, due to local autonomy, the exact dependency relationships among global transactions in a local execution can not be revealed to the FDBS. One sufficient approach for guaranteeing consistent local executions, as used by many FDBS concurrency control algorithms, is to first determine a serialization order of global transactions in the execution at each local site and then ensure that all the local serialization orders do not conflict with one another. For some local concurrency control mechanisms, e.g. 2-phase locking (2PL), timestamp ordering, and optimistic validation [Bern87], the local serialization order of global transactions in a local execution can be determined by considering the times when global subtransactions reach their serialization points [Leu90]. A transaction is said to reach its serialization point (or to be serialized) if the dependency relationship between it and every other conflicting transaction can be decided. In case of 2PL, a transaction reaches its serialization point when it has obtained all locks requested, which usually is the last data request. In case of timestamp ordering, assuming the timestamp of a transaction is assigned when its first operation is executed, a transaction reaches its serialization point when its first operation is executed. In case the types of local concurrency control mechanisms are not known to the FDBS, or the local concurrency control employs some mechanism, e.g. serialization graph testing approach [Bern87], such that the serialization order of a transaction cannot be decided in its life time, a ticketing approach can be applied to get the local serialization order [Geor91]. The ticketing approach forces each global subtransaction executing at a site to read and increment a common data item called ticket. The local serialization order of a subtransaction is the ticket value it reads. 
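The ticketing idea can be illustrated with a minimal Python sketch. The class name `TicketSite` and the method `take_ticket` are hypothetical, and the lock only stands in for whatever concurrency control the local DBMS applies to the ticket data item; this is not the mechanism of [Geor91] itself, only an illustration of how a ticket value yields a local serialization order.

```python
import threading


class TicketSite:
    """Hypothetical stand-in for one local site holding a ticket data item."""

    def __init__(self):
        self._ticket = 0
        # Assumption: the lock models the local DBMS serializing
        # conflicting accesses to the ticket item.
        self._lock = threading.Lock()

    def take_ticket(self):
        """Read and increment the ticket; the value read is the local order."""
        with self._lock:
            value = self._ticket
            self._ticket += 1
        return value


site = TicketSite()
order_g1 = site.take_ticket()  # subtransaction of G1 takes its ticket first
order_g2 = site.take_ticket()  # subtransaction of G2 takes its ticket next
```

Because each subtransaction both reads and writes the same item, any two ticketing operations conflict, so the ticket values reflect the order in which the local DBMS serialized them.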
Under the ticketing mechanism, a subtransaction is said to reach its serialization point when its ticketing operation completes. Therefore, no matter what types of concurrency control protocols are used in local DBMS's, a local serialization order of global transactions on the execution at each site can be obtained.

A number of FDBS concurrency control algorithms based on local serialization order have been proposed, e.g. [Pu88], [Elma90] and [Geor91]. The approaches described in [Pu88] and [Geor91] are of type Bottom Up. In this approach, all operations of a global transaction are submitted to the local DBMS's for execution without suspension. A global transaction can be validated when all its subtransactions complete (but have not yet committed). Only if the local serialization orders of all previously committed global transactions and the validated global transaction are consistent is the validated global transaction committed. This optimistic approach may provide high concurrency, but has the drawback of late abortion (which occurs when a transaction fails to pass the validation phase at its end), which causes a waste of resources (CPU, I/O, and network), in addition to global deadlocks.

In [Elma90] a Top Down approach is described which assigns a particular order of global transactions beforehand. The local serialization order of global transactions at each site is forced to match this global order by controlling the submission of global subtransactions, requiring that operations of a global subtransaction not be sent to the local DBMS for execution until the previous subtransaction has reached its serialization point. This approach is deadlock-free and incurs less computation overhead. However, it suffers from low concurrency for some local concurrency control mechanisms (e.g. 2PL).

One approach that does not use the local serialization order concept is Site Graph [Brei88]. Site Graph is a pessimistic approach, which assumes conflicts exist between every pair of global transactions executed simultaneously at the same site. The approach maintains an acyclic Site Graph, whose nodes are all the participating sites. All sites from which a transaction G accesses data are connected by a path marked G. The global transaction execution is controlled in such a way that the Site Graph does not contain multi-edges (i.e. two or more edges joining the same pair of vertices) and is always acyclic. However, a cycle in the Site Graph does not usually imply nonserializable execution. Thus, the Site Graph approach is pessimistic and may result in low concurrency of global transaction execution.

Clearly, low concurrency, global deadlocks, and resource wastage are the major problems confronting the development of FDBS global concurrency control protocols. Our goal therefore is to develop an algorithm which achieves high concurrency and low wastage of system resources, while ensuring global serializability and preventing global deadlocks.
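The requirement that local serialization orders "do not conflict with one another" can be made concrete with a short Python sketch. It is an illustration under assumptions, not any of the cited algorithms: the function name `locally_consistent` and the input format (one ordered list of global transaction identifiers per site) are hypothetical. The orders are treated as consistent when the union of their precedence pairs is acyclic.

```python
def locally_consistent(site_orders):
    """Return True if the per-site serialization orders admit one global order.

    site_orders maps a site name to the list of global transactions in the
    order they were serialized at that site (hypothetical input format).
    """
    successors = {}
    for order in site_orders.values():
        # Adjacent pairs suffice because each local order is a total order.
        for earlier, later in zip(order, order[1:]):
            successors.setdefault(earlier, set()).add(later)
            successors.setdefault(later, set())

    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in successors}

    def visit(node):
        # Depth-first search; reaching a GREY node again means a cycle.
        color[node] = GREY
        for nxt in successors[node]:
            if color[nxt] == GREY:
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in successors)


agree = locally_consistent({"s1": ["G1", "G2"], "s2": ["G1", "G2"]})
clash = locally_consistent({"s1": ["G1", "G2"], "s2": ["G2", "G1"]})
```

A Bottom Up scheme would run a check of this kind only at the end of a transaction, which is where the late aborts described above come from.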

3. System Environment

In this section, we describe the system model and the global transaction model considered in this work. The reference architecture is shown in Figure 3-1. A federated autonomous database system (FDBS) is composed of a Global Transaction Manager (GTM) and a number of Global Transaction Agents (GTA), one for each local database management system (LDBMS). The GTM is a logical unit which consists of a Global Concurrency Controller (GCC) and a number of clients, one for each site in the network. Each client is associated with an end-user system, while each GTA is associated with an LDBMS. A global transaction is issued to a client in the GTM. The client decomposes the global transaction into several subtransactions and submits them to the GTA's. A GTA is responsible for submitting the operations of global subtransactions to the associated local database and for interacting with the client and the GCC to achieve global serializability. Due to local autonomy, a GTA is treated as a user process by the LDBMS. The GCC is responsible for the concurrency control of the global transactions. It maintains the information of the global transaction execution. The clients and the GTA's consult the GCC to determine the fate of an executing global transaction. In other words, the GTM and all the GTA's work cooperatively to guarantee globally serializable execution. Local transactions are submitted directly to the local database systems and are not known to the GTA's. Note that Figure 3-1 depicts a logical architecture, not the physical allocation of the FDBS components (GTM, GTA and GCC). With respect to implementation, a client and several GTA's and LDBMS's, for example, can coexist on the same platform.

[Figure 3-1: global transactions enter the clients of the GTM, which contains the GCC; the GTM is connected through the network to the GTA's, each attached to an LDBMS; local transactions enter each LDBMS directly.]

Figure 3-1. FDBS reference architecture

The global transaction model in our work is shown in Figure 3-2. A global subtransaction accessing a local database is modeled as a sequence of read/write operations. A global transaction commits only if all its subtransactions are executed successfully at local sites.

[Figure 3-2: between Global_Trans_Begin and Global_Trans_End, a global transaction runs one subtransaction per site; the subtransaction at Site i accesses x pages (Pi1 ... Pix), the one at Site j accesses pages Pj1 ... Pjy, and the third accesses z pages (Pk1 ... Pkz).]

Figure 3-2. Global Transaction Model

For the algorithm discussed in the next section, we make the following assumptions:

• Each local DBMS guarantees conflict serializable execution of transactions submitted to it. It also has the ability to resolve local deadlocks and recover from local failures.
• A local DBMS cannot distinguish between local and global transactions.
• Each local DBMS provides a visible prepared-to-commit state for its transactions. This assumption is made to focus our study on the concurrency control aspect. We consider recovery issues in [Hwan92].
• Inter-site communication is reliable. The network partition problem is not our concern in this paper.

4. Dynamic Concurrency Control

In this section, we present our global concurrency control protocol, called Dynamic Adjustment of Global Serialization Order (DAGSO). We first discuss the mechanisms employed by the protocol, then describe the protocol itself. We show the correctness and properties of the protocol. Finally, we analyze its computational complexity.

4.1 The Mechanisms

Our goal in developing a new global concurrency control protocol is to reduce wastage of system resources, to increase concurrency, and to prevent global deadlocks. Our strategy is to combine advantageous features of existing approaches and incorporate new mechanisms to overcome existing problems. In particular, we consider a mechanism for dynamically adjusting serialization order among concurrent transactions. This improves system performance by reducing unnecessary transaction aborts and thus increasing useful resource utilization.

Predefined global order

Under DAGSO, when a global transaction starts, it is assigned a unique global order before being submitted to local databases. The serialization order of global transactions in the execution at each local DBMS then follows the global order. Thus, DAGSO is comparable to the Top Down approach [Elma90] since the global serialization order is defined before transaction execution. With the predefined global ordering mechanism, the task of global concurrency control is to monitor the local serialization order of global subtransactions executing at each local DBMS and to match it with the global serialization order.

Optimistic concurrency control

To increase the degree of concurrency, DAGSO employs an optimistic approach [Kung81]. A global transaction starts on obtaining its global order, and each of its subtransactions is submitted to the local DBMS without suspension. To this extent, DAGSO is comparable to the Bottom Up approach, having the advantage of high concurrency. The transaction execution of DAGSO can be described by the state diagram shown in Figure 4-1. A global transaction can be in one of four states:

• Initialization: A new global transaction obtains its global order. It is then decomposed into subtransactions, each of which is assigned the timestamp and submitted for execution at local sites.

• Execution: The global subtransactions are being executed at local DBMS's.

• Validation: A global transaction is validated by comparing the global serialization order against the local serialization orders. If the two kinds of orders do not conflict, it continues execution; otherwise it aborts or the serialization order of global transactions is adjusted to match the existing local serialization orders. For any pair of global subtransactions Ti and Tj executed at the same site, there are four cases: (1) Ti reaches its serialization point before Tj starts. (2) Ti reaches its serialization point after Tj starts but before Tj reaches its serialization point. (3) Tj reaches its serialization point before Ti starts. (4) Tj reaches its serialization point after Ti starts but before Ti reaches its serialization point. For cases (1) and (2), the local serialization order Ti ≺ Tj can be inferred. For cases (3) and (4), the local serialization order Tj ≺ Ti holds. Since new local serialization orders can be derived when each global subtransaction either starts or reaches its serialization point, the validation of a global transaction can be performed when any of its subtransactions either starts or reaches its serialization point.

• Termination: A global transaction terminates either when all of its subtransactions complete their operations or when any of its subtransactions aborts.
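The four cases in the Validation step reduce to comparing serialization points: whichever subtransaction is serialized first precedes the other, whether or not the other has started. The Python sketch below illustrates this; the function name `inferred_order` and the use of `None` for "not yet serialized" are assumptions made for the example.

```python
from typing import Optional


def inferred_order(ser_i: Optional[float], ser_j: Optional[float]) -> Optional[str]:
    """Infer the local order of Ti and Tj from their serialization times.

    A time of None means the subtransaction has not reached its
    serialization point yet (it may or may not have started).
    """
    if ser_i is None and ser_j is None:
        return None  # neither is serialized, so nothing can be inferred yet
    if ser_j is None or (ser_i is not None and ser_i < ser_j):
        return "Ti < Tj"  # cases (1) and (2): Ti is serialized first
    return "Tj < Ti"      # cases (3) and (4): Tj is serialized first


case_1_or_2 = inferred_order(5.0, None)  # Ti serialized, Tj not yet
case_3_or_4 = inferred_order(9.0, 4.0)   # Tj serialized before Ti
undecided = inferred_order(None, None)
```

This is why an order becomes known at exactly two kinds of events, a start and a serialization point, and why validation can be triggered at those events.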

[Figure 4-1: states Initialization, Execution, Validation, and Termination; transition labels: submit, start/serialized, validated, ready to commit, abort.]

Figure 4-1. DAGSO state diagram for global transaction execution

It can be seen that DAGSO has the feature of early validation. Unlike the Bottom Up approach, where a global transaction can enter validation only when all its subtransactions complete, DAGSO is able to validate every subtransaction before its execution and after it is serialized, to reduce wastage of system resources due to potential abortion. Thus, DAGSO can decide the fate of a global transaction earlier so that unnecessary resource consumption is avoided. The validation of DAGSO may potentially adjust the global orders of transactions, which is a key component of the DAGSO protocol, as described below.

Dynamic adjustment of global serialization order

A global subtransaction is validated by comparing the local serialization orders with the global order. The condition for successful validation is that the local serialization orders follow the global order. By the traditional timestamp ordering approach [Bern87], or the global timestamp approach proposed for federated concurrency control [Batr92], a global transaction aborts if any of the local serialization orders do not match the global timestamp order. However, some transactions need not abort, since the validation condition is sufficient but not necessary. Consider global transactions G1 and G2, each having subtransactions executing at Site 1 and Site 2. Assume that the global order is G1 ≺ G2. Local execution of their subtransactions can lead to one of the following four cases:

Case   Site 1     Site 2
(1)    G1 ≺ G2    G1 ≺ G2
(2)    G1 ≺ G2    G2 ≺ G1
(3)    G2 ≺ G1    G1 ≺ G2
(4)    G2 ≺ G1    G2 ≺ G1

Using the sufficiency condition, only case (1) is valid. In fact, case (4) is also valid as long as we can redefine the global serialization order to be G2 ≺ G1. Clearly, the sufficiency condition may cause unnecessary aborts, resulting in wastage of system resources and lowering system performance. DAGSO eliminates the problem of unnecessary aborts by incorporating a global serialization order adjustment mechanism into the validation process. In brief, when the sufficiency condition fails, the validation process checks to see if the global serialization order can be rearranged to match the local serialization orders. If so, it adjusts the global order, and thus there is no need to abort the transaction being validated. Consider the above example again. Assume that the subtransaction of G1 at Site 1 reaches its serialization point after that of G2. The validation process adjusts the global order from G1 ≺ G2 to G2 ≺ G1 if the subtransaction of G1 at Site 2 is not serialized before G2. In this case, it does not have to abort G2. Therefore, DAGSO eliminates the unnecessary abort. In summary, the DAGSO concurrency control protocol is based on three basic mechanisms, i.e., predefined global order, optimistic concurrency control, and dynamic adjustment of global serialization order. The three mechanisms tend to provide the advantages of high concurrency, reduction of late aborts, and elimination of unnecessary aborts.
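The two-transaction example can be enumerated in a few lines of Python. This is only an illustration of the four cases, with a hypothetical `classify` helper; it is not the DAGSO validation procedure, which works on a graph of all active global transactions.

```python
from itertools import product

GLOBAL_ORDER = ("G1", "G2")  # the predefined global order G1 before G2


def classify(site1, site2):
    """Classify one combination of local orders at Site 1 and Site 2."""
    if site1 == GLOBAL_ORDER and site2 == GLOBAL_ORDER:
        return "valid"                   # sufficiency condition holds
    if site1 == site2:
        return "valid after adjustment"  # redefine the global order
    return "abort"                       # the two sites disagree


local_orders = [("G1", "G2"), ("G2", "G1")]
# product() yields the combinations in the order of cases (1) to (4).
results = [classify(s1, s2) for s1, s2 in product(local_orders, repeat=2)]
```

Cases (2) and (3) stay invalid under any global order because the two sites disagree, whereas case (4) is rejected by the sufficiency condition alone even though both sites agree.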

4.2 The Algorithm

We now describe the DAGSO concurrency control algorithm that uses the mechanisms presented above.

The local serialization orders of global transactions on various local executions can be captured by a directed graph, called the Virtual Global Serialization Graph (VirtGlobalSG). VirtGlobalSG is a directed graph (V, E), where V is a set of global transactions. An arc (Gi, Gj, s) ∈ E, where Gi and Gj are global transactions and s is a site, exists if the subtransaction of Gi is serialized before the subtransaction of Gj at site s, and no other global subtransaction is serialized between Gi and Gj at site s. Note that VirtGlobalSG is a multigraph, in which more than one arc may exist between two global transactions, indicating that these transactions have the same serialization order at multiple sites. Let LastSerialized(s) denote the global transaction whose subtransaction is most recently serialized at site s. When a subtransaction of G at site s, denoted as Gs, is started, it can be determined that Gs must be serialized after LastSerialized(s). Therefore the arc (LastSerialized(s), G, s) is added to VirtGlobalSG. When a subtransaction Gs is serialized, it can be inferred that all non-serialized subtransactions executed at site s must be serialized after Gs. Thus, the set of arcs {(G, T, s) | T is executed but not serialized at site s} can be added to VirtGlobalSG. Figure 4-2 shows an example. In Figure 4-2(a), G1 is LastSerialized(s1) and the subtransactions of G2, G4, and G5 at site s1 are not yet serialized. Assume G2 is the next subtransaction at site s1 to be serialized. The arcs (G1, G4, s1) and (G1, G5, s1) will then be removed and the arcs (G2, G4, s1) and (G2, G5, s1) are added. Figure 4-2(b) shows the VirtGlobalSG after G2 is serialized. However, an added arc (Gi, Gj, s) may not match the global order. If the global order of Gi is after that of Gj while the arc (Gi, Gj, s) is to be added, some action must be taken to resolve this mismatch between global and local serialization orders.
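The arc bookkeeping described above can be sketched in Python as follows. This sketch covers only the start and serialize updates to VirtGlobalSG and deliberately omits the comparison with the global order; the class and attribute names (`VirtGlobalSG`, `pending`, `last_serialized`) are illustrative assumptions rather than the authors' implementation.

```python
class VirtGlobalSG:
    """Arc maintenance only; validation against the global order is omitted."""

    def __init__(self):
        self.arcs = set()          # arcs stored as (source, target, site)
        self.last_serialized = {}  # site -> most recently serialized transaction
        self.pending = {}          # site -> started but not yet serialized

    def start(self, g, site):
        """A subtransaction of g starts at site: it follows LastSerialized(site)."""
        last = self.last_serialized.get(site)
        if last is not None:
            self.arcs.add((last, g, site))
        self.pending.setdefault(site, set()).add(g)

    def serialize(self, g, site):
        """The subtransaction of g at site reaches its serialization point."""
        waiting = self.pending.get(site, set()) - {g}
        last = self.last_serialized.get(site)
        for t in waiting:
            # Every non-serialized subtransaction now follows g directly.
            self.arcs.discard((last, t, site))
            self.arcs.add((g, t, site))
        self.pending[site] = waiting
        self.last_serialized[site] = g


# Reproduce the situation of Figure 4-2 at site s1.
sg = VirtGlobalSG()
sg.start("G1", "s1")
sg.serialize("G1", "s1")       # G1 becomes LastSerialized(s1)
for g in ("G2", "G4", "G5"):
    sg.start(g, "s1")          # arcs from G1 to G2, G4, and G5
sg.serialize("G2", "s1")       # G4 and G5 now follow G2 instead of G1
```

After the last call the arcs at s1 are (G1, G2), (G2, G4), and (G2, G5), matching the change from Figure 4-2(a) to Figure 4-2(b) described in the text.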
Under DAGSO, it is checked whether the global order of Gi can be adjusted to be before that of Gj. More specifically, if there does not exist a path from Gj to Gi in VirtGlobalSG, the global order adjustment can be done. Otherwise, some transactions need to be aborted to eliminate this inconsistency. There can be a variety of strategies for choosing abort victims to restore consistency between global and local serialization orders. We adopt a heuristic that aborts the validating requester if not all of its subtransactions are serialized, and aborts other transactions otherwise. Among several heuristics we tested with our simulation model, this heuristic turned out to be the best. Figure 4-3 describes the DAGSO algorithm executed by the GCC, in terms of event-action pairs. Events are triggered by messages from either the clients or the GTA's. Note that the interaction between clients and GTA's is the standard 2-phase commit protocol, whose operations are omitted from Figure 4-3 for simplicity. When a transaction commits, some arcs and nodes in VirtGlobalSG may be deleted. A transaction G and its associated arcs can be removed from VirtGlobalSG if all transactions with smaller global orders commit, and G also commits.

[Figure 4-2: two snapshots of VirtGlobalSG with nodes G0 through G5, arcs labeled s1, s2, and s3, and LastSerialized(s1) marked.]

Figure 4-2. (a) VirtGlobalSG before G2 is serialized at site s1. (b) VirtGlobalSG after G2 is serialized at site s1.

repeat
  wait(event);
  case event of
    A global transaction G is started: /* message from a client */
      global order of G := global_counter;
      global_counter := global_counter + 1;
    Global subtransaction of G at site s is started: /* message from the GTA at site s */
      if the global order of LastSerialized(s) < the global order of G then
        Add arc (LastSerialized(s), G, s) to VirtGlobalSG;
      else /* local order does not follow global order */
        if PathInVirtGlobalSG({G}, LastSerialized(s)) then
          Abort(G)
        else /* adjust the global order */
          call AdjustGlobalOrder({G}, LastSerialized(s));
          Add arc (LastSerialized(s), G, s) to VirtGlobalSG;
    Global subtransaction of G at site s is serialized: /* message from the GTA at site s */
      S := {T | arc (LastSerialized(s), T, s) exists in VirtGlobalSG and T ≠ G};
      S' := {T | T ∈ S and global order of T < global order of G};
      if S' = ∅ then /* local orders follow global order */
        Delete {arc (LastSerialized(s), T, s) | T ∈ S};
        Add {arc (G, T, s) | T ∈ S};
        LastSerialized(s) := G;
      else if PathInVirtGlobalSG(S', G) then
        if all of G's subtransactions are serialized then
          Abort transactions in S'
        else
          Abort(G)
      else /* adjust the global order */
        Delete {arc (LastSerialized(s), T, s) | T ∈ S};
        call AdjustGlobalOrder(S', G);
        Add {arc (G, T, s) | T ∈ S};
        LastSerialized(s) := G;
until FALSE;

Figure 4-3. Pseudo code of DAGSO.

The function PathInVirtGlobalSG(S, G) returns TRUE if there exists a path from some transaction in S to G. Procedure AdjustGlobalOrder(S, G), where the original global order of G is larger than that of any transaction in S, adjusts the global order such that the new global order of G is smaller than that of any transaction in S. Note that AdjustGlobalOrder(S, G) is invoked only when PathInVirtGlobalSG(S, G) returns FALSE. In this case, AdjustGlobalOrder(S, G) adjusts the global order such that the global orders of G and of every transaction having a path to G in VirtGlobalSG become smaller than the global order of every transaction in S. The pseudo code of PathInVirtGlobalSG(S, G) and AdjustGlobalOrder(S, G) is listed in Figure 4-4.

boolean Function PathInVirtGlobalSG(S: set of global_trans; G: global_trans)
begin
  /* global order of any transaction in S < global order of G */
  Min := Minimum {global order of T | T ∈ S};
  Backtrack from G along the inverse of the arcs using Breadth First Search (BFS)
    until either no inverse arc exists or the originating transactions of all
    incoming arcs have smaller global order than Min;
  Let Front_Set be the set of all transactions traversed;
  if S ∩ Front_Set ≠ ∅ then /* there exists a path from some trans. in S to G */
    return TRUE;
  else
    return FALSE;
end;

Procedure AdjustGlobalOrder(S: set of global_trans, G: global_trans)
begin
  /* global order of any transaction in S < global order of G */
  Min := Minimum {global order of T | T ∈ S};
  Max := the global order of G;
  Adjust_Set := {T | T is a global transaction with global order in [Min, Max]};
  Assign [Min, Max] to the transactions in Adjust_Set such that the global order
    of any transaction in Front_Set is less than that of any transaction in
    Adjust_Set − Front_Set
end;

Figure 4-4. Pseudo code of PathInVirtGlobalSG() and AdjustGlobalOrder()

Example: Figure 4-5(a) shows a snapshot of VirtGlobalSG. Consider that arc (G7, G1, s1) is to be added to the VirtGlobalSG, assuming G7 is LastSerialized(s1) when G1 has a subtransaction starting at site s1. Thus, Adjust_Set = {G1...G7}. The subscript of each global transaction represents the original global order, i.e. the original global order is G0 ≺ G1 ≺ G2 ≺ G3 ≺ G4 ≺ G5 ≺ G6 ≺ G7. Firstly, PathInVirtGlobalSG({G1}, G7) is invoked to check if there exists a path from G1 to G7. Using backwards BFS starting from G7, we get Front_Set = {G2, G4, G6, G7}. Since Front_Set does not contain G1, there does not exist a path from G1 to G7. Secondly, AdjustGlobalOrder({G1}, G7) is called to switch the global orders of Front_Set and Adjust_Set − Front_Set. That is, the global orders of {G2, G4, G6, G7} are moved to be before those of {G1, G3, G5}. As a result, the new global order G0 ≺ G2 ≺ G4 ≺ G6 ≺ G7 ≺ G1 ≺ G3 ≺ G5 is obtained as shown in Figure 4-5(b). At this point, arc (G7, G1, s1) can be added without violating the global order.
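The two routines of Figure 4-4 can be sketched in Python as below. Several points are assumptions made for illustration: the arc set is hypothetical (the arcs of Figure 4-5(a) are not recoverable here, so a set was chosen that yields the Front_Set stated in the example), the helper `front_set` is factored out so both routines can share it, and transactions keep their relative order inside each group, which the pseudo code does not prescribe but the example result suggests.

```python
from collections import deque


def front_set(arcs, order, s, g):
    """Backward BFS from g, skipping transactions ordered before min(S)."""
    min_order = min(order[t] for t in s)
    preds = {}
    for src, dst, _site in arcs:
        preds.setdefault(dst, set()).add(src)
    seen, queue = {g}, deque([g])
    while queue:
        node = queue.popleft()
        for p in preds.get(node, ()):
            if p not in seen and order[p] >= min_order:
                seen.add(p)
                queue.append(p)
    return seen


def path_in_virt_global_sg(arcs, order, s, g):
    """TRUE if some transaction in s has a path to g."""
    return bool(set(s) & front_set(arcs, order, s, g))


def adjust_global_order(arcs, order, s, g):
    """Return a new order placing Front_Set before the rest of Adjust_Set."""
    front = front_set(arcs, order, s, g)
    lo, hi = min(order[t] for t in s), order[g]
    adjust = sorted((t for t in order if lo <= order[t] <= hi), key=order.get)
    slots = sorted(order[t] for t in adjust)
    # Assumption: relative order within each group is preserved.
    reordered = [t for t in adjust if t in front] + [t for t in adjust if t not in front]
    new_order = dict(order)
    for slot, t in zip(slots, reordered):
        new_order[t] = slot
    return new_order


order = {f"G{i}": i for i in range(8)}
# Hypothetical arcs, chosen so that Front_Set = {G2, G4, G6, G7}.
arcs = {("G0", "G2", "s1"), ("G2", "G4", "s1"), ("G4", "G6", "s2"),
        ("G6", "G7", "s3"), ("G1", "G3", "s2"), ("G3", "G5", "s2")}
has_path = path_in_virt_global_sg(arcs, order, {"G1"}, "G7")
new_order = adjust_global_order(arcs, order, {"G1"}, "G7")
sequence = sorted(new_order, key=new_order.get)
```

With these assumed arcs, `has_path` is false and `sequence` is G0, G2, G4, G6, G7, G1, G3, G5, the adjusted order given in the example, so the arc (G7, G1, s1) can then be added with G7 ordered before G1.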

s1

s1

G0

G1

G2

serialization graph contains a cycle, this cycle must also exist in VirtGlobalSG. However, from Lemma 1, we know VirtGlobalSG is acyclic. Therefore the real global serialization graph is acyclic, i.e. the set of local schedules is global serializable. ❐

Theorem 2. The algorithm is deadlock free. Proof: Suppose there exists a global deadlock. Without loss of generality, the wait for graph (WFG) contains the following cycle:

Gk Gi+1

G1 G2 G3 Gi

G4

For any pair of transactions Gi and Gj in the cycle, j = (i+1) mod k, Gj must have acquired the exclusive lock that Gi is waiting for at some site, say site 's'. Thus, Gj must be serialized before Gi at site 's'. However, this indicates there exists a corresponding cycle in VirtGlobalSG, which contradicts Theorem 1. Therefore, the WFG must not contain a cycle, i.e. DAGSO is deadlock free. ❐

s1

G3

s2

G4

G5

G6

G7 s3

s2

In the following discussion, we denote N as the number of transactions in VirtGlobalSG and M the number of sites a global transaction accesses. Lemma 2. The number of arcs in VirtGlobalSG is O(NM).

(a)

Lemma 3. The complexity of AdjustGlobalOrder() is O(N).

s1 s1

G0

s1

G2

G4

G6

s3

s2

s2

G7

G1

G3

G5

(b) Figure 4-5 (a) VirtGlobalSG before G1 is started at site s1. (b) VirtGlobalSG after G1 is started at site s1.

4.3

Correctness and Complexity of the Algorithm

DAGSO algorithm guarantees global serializability and is deadlock free. This section provides formal proofs of these properties and analyzes its computational complexity. Due to space limitation, we do not present proofs of Lemma 1, 2 and 3. The reader is referred to [Hwan93] for details. Lemma 1. For each arc (Gi, Gj, s) in VirtGlobalSG, the global order of Gi is less than that of Gj at any time. Theorem 1. The algorithm can achieve global serializability. Proof: Since each path in VirtGlobalSG represents the potential local dependency order, if the real global

Theorem 3. The complexity of DAGSO to process a global transaction is O(NM^2).

Proof: For each subtransaction, GCC is invoked twice to validate the global serialization order: once when the subtransaction is started and again when it reaches its serialization point. The worst case occurs when the added relative local serialization order is the inverse of the global orders. In this case, PathInVirtGlobalSG() and (possibly) AdjustGlobalOrder() are invoked, which takes O(NM) time. Since each transaction has M subtransactions, the time for GCC to process a global transaction is therefore O(NM^2). ❐

With respect to computational complexity, Top Down incurs the least overhead, since there is no need to validate a global transaction either before or after its execution. Now let us compare the complexity of DAGSO with that of Site Graph and Bottom Up, which also perform validation operations. Since each global transaction is validated once, the complexity of both Site Graph and Bottom Up to process a global transaction is O(NM). One might therefore suspect that DAGSO incurs higher overhead in validating a global transaction than the other two approaches. However, notice that Theorem 3 is a worst-case analysis. We have conducted experiments whose results show that most validation requests do not perform graph cycle detection and take only constant time [Hwan93].
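To make the two cost regimes in the discussion above concrete, the following Python fragment is a minimal sketch of one validation request, not the authors' implementation. It assumes that VirtGlobalSG is held as a list of arcs (Gi, Gj, s) and that global orders are kept in a dictionary; the function names, the return labels, and the example graph are hypothetical. When the reported local order agrees with the global order the request is answered in constant time; otherwise a depth-first search stands in for PathInVirtGlobalSG(), and its cost is bounded by the number of transactions plus the number of arcs, i.e. O(NM) by Lemma 2. The adjustment step itself is left out because its details are given in [Hwan93].

```python
from collections import defaultdict


def path_exists(arcs, src, dst):
    """Iterative depth-first search: is there a directed path src -> dst?"""
    successors = defaultdict(list)
    for gi, gj, _site in arcs:
        successors[gi].append(gj)
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def validate(arcs, order, gi, gj):
    """Classify a new local dependency 'gi is serialized before gj'.

    Assumed reading of the validation step (hypothetical labels):
      "accept" - local and global orders agree, constant-time fast path
      "abort"  - a path gj -> gi already exists, so the new arc would close a cycle
      "adjust" - orders disagree but no cycle arises; the global order
                 would have to be adjusted (not modeled here)
    """
    if order[gi] < order[gj]:
        return "accept"
    if path_exists(arcs, gj, gi):
        return "abort"
    return "adjust"


# Hypothetical example graph and global orders (not taken from Figure 4-5).
arcs = [("G0", "G2", "s1"), ("G2", "G4", "s1")]
order = {"G0": 0, "G1": 1, "G2": 2, "G4": 4}

fast_path = validate(arcs, order, "G0", "G2")
cycle_case = validate(arcs, order, "G4", "G0")
adjust_case = validate(arcs, order, "G2", "G1")
```

In this sketch only the second and third requests pay for the graph search, which mirrors the observation that the worst case of Theorem 3 arises only when the local order contradicts the global order.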

5. Performance Evaluation

The performance of a global concurrency control algorithm can be affected by a number of factors. In the previous section, we analyzed DAGSO and three other algorithms with respect to computational complexity. However, the analytical results provide only a partial answer, since an algorithm's behavior also depends on other factors, including data contention due to blocking and resource (i.e., CPU, I/O, memory, and network) contention due to access concurrency. To further investigate the DAGSO algorithm, we have conducted a comprehensive performance study. In this section, we present some of the performance results. For more data and detailed performance analysis, the reader is referred to [Huan93].

5.1 Simulation Model

We have developed a detailed simulation model for studying a variety of global concurrency control algorithms and performance trade-offs. We briefly describe the model in the following. The simulation model includes global and local transaction models as well as system models. It captures the main functional elements of both global concurrency control and local DBMS's, including global transaction manager, global transaction agent, local transaction manager, local lock manager, local deadlock detection, local data manager, global timeout, and CPU and I/O resources. The local DBMS model enforces strict 2PL to ensure local serializability and uses a wait-for graph to detect local deadlocks. In addition, a global timeout is employed for global concurrency control algorithms, such as Bottom Up, that do not prevent global deadlocks. Our performance model has been implemented on a simulation testbed using the SES/workbench simulation tool [SES92].

Table 5-1 summarizes the parameters used for describing the federated database system model. The table also shows the parameter value range varied during the experiments. Table 5-2 presents the global and local transaction model parameters and their default settings. Most of the parameters are self-explanatory. The global transaction timeout factor is used for setting the global timeout period. Let k denote the global transaction timeout factor. The global timeout interval is a function of both the mean (µ) and the standard deviation (σ) of global transaction response time, as specified by formula (5-1).

    Global timeout interval = µ(G) + k * σ(G)    (5-1)

where µ(G) and σ(G) are the mean and standard deviation of global transaction response time, and k is a weighting factor.

Table 5-1. System Model Parameters and Settings

System Parameter                                       Setting
Number of local database sites                         4
Number of global terminals                             10
Number of local terminals                              10
Local database size (pages)                            1000
Number of resource units (1 unit = 1 CPU + 2 disks)    1, 20
CPU time for processing a page (ms)                    10
Disk time for reading/writing a page (ms)              [10,30] uniform
CPU time for sending/receiving a page (ms)             5

Table 5-2. Transaction Model Parameters and Settings

Transaction Parameter                                  Setting
Global transaction length (number of pages)            12
Local sites a global transaction may access            2
Global transaction write probability                   0.0, 0.25, ..., 1.0
Global transaction think time interval                 1 second
Global transaction timeout factor                      2
Local transaction size (number of pages)               12
Local transaction update probability                   0.25
Local transaction think time                           1 second
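As an illustration of formula (5-1), the following Python sketch computes a global timeout interval from a set of observed global transaction response times. The function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator the simulator uses. The default k = 2 matches the timeout factor listed in Table 5-2.

```python
from statistics import mean, pstdev


def global_timeout_interval(response_times, k=2.0):
    """Formula (5-1): mean response time plus k standard deviations.

    Assumption: sigma is the population standard deviation (pstdev).
    """
    mu = mean(response_times)
    sigma = pstdev(response_times)
    return mu + k * sigma


# Hypothetical response times in seconds, for illustration only.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
timeout = global_timeout_interval(samples, k=2)
```

With a larger k, fewer slow but healthy transactions are timed out, at the cost of holding resources longer when a global deadlock has actually occurred.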

The primary metric used throughout the experiments is throughput, which is the number of global or local transactions completed per second. Several secondary performance metrics are used in analyzing the simulation results, including abort ratio, blocking ratio, and CPU and I/O utilizations. Data collection in the experiments is based on the method of replication. The following graphs present only the mean values of the performance measures.
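The throughput metric and the replication-based averaging can be sketched as follows. This is a minimal Python illustration under stated assumptions: each replication is represented by a pair (transactions completed, simulated duration in seconds), and all counts and durations shown are invented for the example rather than taken from the experiments.

```python
from statistics import mean


def throughput(completed, duration_s):
    """Transactions completed per second over one simulation run."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return completed / duration_s


# Hypothetical replications: (completed global transactions, duration in seconds).
replications = [(620, 100.0), (580, 100.0), (600, 100.0)]

per_run = [throughput(done, secs) for done, secs in replications]
mean_throughput = mean(per_run)  # the value that would be plotted
```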

5.2 Simulation Results

In this section we show the simulation results of five FDBS concurrency control algorithms with centralized control, namely Global Timeout, Bottom Up, Top Down, Site Graph, and DAGSO. Global Timeout is an approach that employs a global timeout mechanism. Since our simulation model uses strict 2PL at each local DBMS and uses the 2PC protocol between the client and the GTAs for commitment, global serializability is guaranteed without any further control [Pu88]. Global Timeout uses timeouts to resolve global deadlocks.

We first investigate the throughput of global transactions in a system without resource contention. Each global transaction is assumed to access two sites. We intend to see the performance of the five global concurrency control algorithms under different levels of data contention. To focus on the effects of data contention, we isolate resource contention by setting the number of resource units at each site to 20, where each unit of resource corresponds to one CPU and two disks. Under such a setting, the measured CPU and I/O utilizations are less than 20%, and their queues are almost always empty.

Figure 5-1 shows the throughput of the five algorithms under this setting. As expected, for each algorithm, the throughput decreases as the data contention increases. Furthermore, the throughput of DAGSO is very close to that of Global Timeout, and DAGSO outperforms the other algorithms across the range of data contention levels. Because of its conservative subtransaction submission at the GTA, Top Down results in the lowest throughput.

Figure 5-1. Effect of Data Contention without Resource Contention (G_Throughput versus G_Write_Prob for Global Timeout, Bottom Up, Top Down, Site Graph, and DAGSO).

To study the effects of resource contention, the previous experiment was repeated with the number of resource units at each site set to 1. The CPU and I/O utilizations under this setting are more than 80% and 90%, respectively. The throughput result from this experiment is shown in Figure 5-2. The overall performance trend is similar to that in Figure 5-1, except that the performance of the optimistic approaches (Global Timeout, Bottom Up, and DAGSO) drops more rapidly than that of the pessimistic approaches (Site Graph and Top Down) due to resource contention. However, DAGSO is still comparable to Global Timeout and outperforms the other algorithms. Under low data contention, the validation abort rate of DAGSO is higher than the timeout abort rate of Global Timeout, and thus DAGSO does not perform as well as Global Timeout.

Figure 5-2. Effect of Data Contention with Resource Contention (G_Throughput versus G_Write_Prob for Global Timeout, Bottom Up, Top Down, Site Graph, and DAGSO).

We also conducted experiments to investigate the behavior of the various algorithms in different operating regions by varying parameters such as the number of global terminals and the transaction length. In particular, we examined the validation behavior of DAGSO. The experimental results show that, under reasonable system loads, more than 80% of the validation requests end up with the same global and local serialization orders and do not invoke the expensive graph cycle checking. Thus, DAGSO validation does not incur much overhead at run time. In addition, the dynamic global order adjustment significantly reduces unnecessary aborts. Due to space limitations, these experimental results are not included in this paper. Interested readers are referred to [Huan93] and [Hwan93].

6. Conclusion

A number of FDBS concurrency control algorithms have been proposed recently for ensuring global serializability. Even though each algorithm has its unique characteristics and strengths, it also exhibits some drawbacks, such as low concurrency, global deadlock, or high wastage of system resources. These problems may limit system performance, such as global transaction throughput. Our aim is to reduce these problematic effects while maintaining the advantageous features of the existing algorithms, hence improving the system performance. In this paper we proposed a new protocol, called Dynamic Adjustment of Global Serialization Order (DAGSO). It employs three basic mechanisms, i.e., Predefined Global Order, Optimistic Concurrency Control, and Dynamic Adjustment of Global Serialization Order. By changing the global transaction serialization order at the validation stage, DAGSO is able to accept the same set of transaction execution schedules as the optimistic-oriented Bottom Up algorithm, thus maintaining high concurrency. In addition, using the feature of early validation, DAGSO reduces wastage of system resources. We proved that DAGSO achieves global serializability and is deadlock free. In addition, we formally analyzed the complexity of DAGSO and compared it with those of the Top Down, Bottom Up, and Site Graph approaches proposed in the literature. The analytical results show that DAGSO does not incur much computation overhead.

We further investigated the run-time behavior of DAGSO through a performance study. Our simulation results indicate that DAGSO outperforms the others in most system operating regions and ranges of global transaction behavior.

Our future work includes investigation of different global transaction order assignment policies. In this paper, DAGSO uses the global timestamp for defining the global transaction order. Other possible global assignment policies can use, for example, dependence analysis information to determine the transaction global order. The question to be answered is how different policies affect DAGSO performance.

Acknowledgement

We thank Mr. Mark Foresti of Rome Laboratories, Griffiss Air Force Base, New York, for his comments on an earlier draft of the paper as well as his overall encouragement and support.

References

[Batr92] R. K. Batra, M. Rusinkiewicz, and D. Georgakopoulos, “A Decentralized Deadlock-free Concurrency Control Method for Multidatabase Transactions,” GTE Technical Report, 1992.

[Bern87] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems. Addison Wesley, Reading, MA, 1987.

[Brei88] Y. Breitbart and A. Silberschatz, “Multidatabase Update Issues,” Proc. of the ACM SIGMOD Int’l Conference on Management of Data, 1988.

[Du89] W. Du and A. Elmagarmid, “Quasi Serializability: A Correctness Criterion for Global Concurrency Control in Interbase,” Proc. of the 15th Intl. Conf. on Very Large Data Bases, Amsterdam, Aug. 1989.

[Elma90] A. K. Elmagarmid and W. Du, “A Paradigm for Concurrency Control in Heterogeneous Distributed Database Systems,” Proc. of the 6th Int. Conf. on Data Engineering, Feb. 1990.

[Eswa76] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger, “The Notions of Consistency and Predicate Locks in a Database System,” Comm. ACM, Vol. 19, No. 11, Nov. 1976.

[GaMo91] H. Garcia-Molina, “Global Consistency Constraints Considered Harmful in Heterogeneous Database Systems,” The 1st Intl. Workshop on Interoperability in Multidatabase Systems, Jan. 1991.

[Geor91] D. Georgakopoulos, M. Rusinkiewicz, and A. Sheth, “On Serializability of Multidatabase Transactions Through Forced Local Conflicts,” Proc. of the 7th Int. Conf. on Data Engineering, Feb. 1991.

[Hwan92] S. Y. Hwang, J. Srivastava, and J. Li, “Transaction Recovery in Federated Autonomous Databases,” to appear in Journal of Parallel and Distributed Databases, 1993; also available as Dept. Computer Sci., Univ. Minnesota, Minneapolis, MN, Tech. Rep. TR 92-15, Mar. 1992.

[Hwan93] S. Y. Hwang, J. Huang, and J. Srivastava, “Concurrency Control in Federated Databases: A Dynamic Approach,” Dept. Computer Sci., Univ. Minnesota, Minneapolis, MN, Tech. Rep. TR 92-66, April 1993.

[Huan93] J. Huang, S. Y. Hwang, and J. Srivastava, “Concurrency Control in Federated Database Systems: A Performance Study,” Dept. Computer Sci., Univ. Minnesota, Minneapolis, MN, Tech. Rep. TR 93-15, Feb. 1993.

[Kung81] H. T. Kung and J. T. Robinson, “On Optimistic Methods for Concurrency Control,” ACM TODS, Vol. 6, No. 2, June 1981.

[Leu90] Y. Leu and A. K. Elmagarmid, “A Hierarchical Approach to Concurrency Control for Multidatabases,” Proc. of the 2nd Intl. Symposium on Databases in Parallel and Distributed Systems, June 1990.

[Mehr92a] S. Mehrotra, R. Rastogi, H. F. Korth, and A. Silberschatz, “Relaxing Serializability in Multidatabase System,” Proc. of the 2nd Int’l. Workshop on Research Issues on Data Engineering: Transaction and Query Processing, Feb. 1992.

[Mehr92b] S. Mehrotra, R. Rastogi, H. F. Korth, and A. Silberschatz, “The Concurrency Control Problem in Multidatabases: Characteristics and Solutions,” Proc. of the ACM SIGMOD Conference, 1992.

[Pu88] C. Pu, “Superdatabases for Composition of Heterogeneous Databases,” Proc. of the 4th Intl. Conf. on Data Engineering, Feb. 1988.

[Pu91] C. Pu and A. Leff, “Replica Control in Distributed Systems: An Asynchronous Approach,” Proc. of the ACM SIGMOD Int’l Conference on Management of Data, 1991.

[Rusi90] M. Rusinkiewicz, A. Elmagarmid, Y. Leu, and W. Litwin, “Extending the Transaction Model to Capture More Meaning,” SIGMOD Record, Vol. 19, No. 1, March 1990.

[SES92] “SES/workbench Reference Manual,” Release 2.1, Scientific and Engineering Software, Inc., 4301 Westbank Drive, Building A, Austin, TX 78746, February 1992.

[Shet90] A. P. Sheth and J. A. Larson, “Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases,” ACM Computing Surveys, Vol. 22, No. 3, Sept. 1990.