Performance Comparison of Three Modern DBMS Architectures

Alexios Delis and Nick Roussopoulos

Institute for Advanced Computer Studies and Computer Science Department
University of Maryland, College Park, MD 20742
March 16, 1994

Abstract

The introduction of powerful workstations connected through LANs inspired new DBMS architectures which offer high performance characteristics. In this paper, we examine three such software architecture configurations, namely: Client-Server (CS), the RAD-UNIFY type of DBMS (RU), and Enhanced Client-Server (ECS). Their specific functional components and design rationales are discussed. We use three simulation models to provide a performance comparison under different job workloads. Our simulation results show that the RU almost always performs slightly better than the CS, especially under light workloads, and that ECS offers significant performance improvement over both CS and RU. Under reasonable update rates, the ECS over CS (or RU) performance ratio is almost proportional to the number of participating clients (for less than 32 clients). We also examine the impact of certain key parameters on the performance of the three architectures, and finally show that ECS is more scalable than the other two.

Index Terms: Client-Server DBMS Architectures, Data Caching, DBMS Architecture Modeling, Incremental Database Operations, Performance Analysis.

Appeared in the IEEE Transactions on Software Engineering, February 1993, Volume 19, Number 2, Pages 120-138. This research was sponsored partially by the National Science Foundation under Grant IRI-8719458, by the Air Force Office of Scientific Research under Grant AFOSR-89-0303, and by the University of Maryland Institute for Advanced Computer Studies (UMIACS). UNIFY is a registered trademark of Unify Corporation.

1 Introduction

Centralized DBMSs present performance restrictions due to their limited resources. In the early eighties, a lot of research was geared towards the realization of database machines. Specialized but expensive hardware and software were used to build complex systems that would provide high transaction throughput rates by utilizing parallel processing and accessing multiple disks. In recent years, though, we have observed different trends. Research and technology in local area networks have matured, workstations have become very fast and inexpensive, while data volume requirements continue to grow rapidly [2]. In light of these developments, computer systems, and DBMSs in particular, have adopted alternative configurations in order to overcome long latencies and improve their performance. In this paper, we present three such configurations for DBMSs that strive for high throughput rates, namely: the standard Client-Server [23], the RAD-UNIFY type of DBMS [19], and the Enhanced Client-Server architecture [16]. The primary goal of this study is to examine performance-related issues of these three architectures under different workloads. To achieve that, we develop closed queuing network models for all architectures and implement simulation packages. We experiment with different workloads expressed in the context of job streams, analyze simulated performance ratios, and derive conclusions about system bottlenecks. Finally, we show that under light update rates (1%-5%) the Enhanced Client-Server offers performance almost proportional to the number of participating workstations in the configuration, for 32 or fewer workstations. On the other hand, the RAD-UNIFY almost always performs slightly better than the pure Client-Server architecture. In section 2, we survey related work. Section 3 discusses the three DBMS architectures and identifies their specific functional components. In section 4, we propose three closed queuing network models for the three configurations and talk briefly about the implemented simulation packages. Section 5 presents the different workloads and the simulation experiments, and discusses the derived performance charts. Conclusions are found in section 6.

2 Related Work

There are a number of studies dealing with issues similar to those we investigate here. Roussopoulos and Kang [18] propose the coupling of a number of workstations

with a mainframe. Both workstations and mainframe run the same DBMS, and the workstations are free to selectively download data from the mainframe. The paper describes the protocol for caching and maintenance of cached data. Hagman and Ferrari [11] are among the first who tried to split the functionality of a database system and off-load parts of it to dedicated back-end machines. They instrumented the INGRES DBMS, assigned different layers of the DBMS to two different machines, and performed several experiments comparing the utilization rates of the CPUs, disks, and network. Among other results, they found that generally there is a 60% overhead in disk I/O and a suspected overhead of similar size for CPU cycles. They attribute these findings to the mismatch between the operating system and the DBMS. The cooperation between a server and a number of workstations in an engineering design environment is examined in [13], which describes a DBMS prototype that supports multi-level communication between workstations and server and tries to reduce redundant work at both ends. DeWitt et al. [8] examine the performance of three workstation-server architectures from the Object-Oriented DBMS point of view. Three approaches to building a server are proposed: object, page, and file server. A detailed simulation study is presented with different loads but no concurrency control. They report that the page and file servers gain the most from object clustering, that page and object servers are heavily dependent on the size of the workstation buffers, and finally that file and page servers perform better in experiments with a few write-type transactions (response time measurements). Wilkinson and Neimat [26] propose two algorithms for maintaining consistency of workstation-cached data. These algorithms are based on cache locks and notify locks, and new lock compatibility matrices are proposed. The novel point of this work is that server concurrency control and cache consistency are treated in a unified approach. Simulation results show that cache locks always give better performance than two-phase locking and that notify locks perform better than cache locks whenever jobs are not CPU bound. Alonso et al. [2] support the idea that caching improves performance in information retrieval systems considerably and introduce the concept of quasi-caching. Different caching algorithms, allowing various degrees of cache consistency, are discussed and studied using analytical queuing models. Delis and Roussopoulos [6] examine, through a simulation approach, the performance of server-based information systems under light updates, and show that this architecture offers significant transaction processing rates even under considerable updates of the server-resident data. In [17], we describe modern Client-Server DBMS architectures and report some preliminary results on their

performance, while in [7] we examine the scalability of three such Client-Server DBMS configurations. Carey et al. [5] examine the performance of five algorithms that maintain consistency of cached data in client-server DBMS architectures. The important assumption of this work is that client data are maintained in cache memory and are not disk resident. Wang and Rowe, in a similar study [25], examine the performance of five more cache consistency algorithms in a client-server configuration. Their simulation experiments indicate that either a two-phase locking or a certification consistency algorithm offers the best performance in almost all cases. Some work indirectly related to the issues examined in this paper includes the Coda distributed file system project [20] and the cache coherence algorithms described in [3]. The ECS model presented here is a slightly modified abstraction of the system design described in [18] and is discussed in [16]. In this paper, we extend that work in three ways: first, we give relative performance measures against the two other existing DBMS configurations; second, we analyze the role of some key system parameter values; and finally, we provide insights about the scalability of the architectures.

3 Modern DBMS Architectures

Here, we briefly review three alternatives for modern database system architectures and highlight their differences. There are a number of reasons that made these configurations a reality:

1. The introduction of inexpensive but extremely fast processors on workstations with large amounts of main memory and medium-size disks.
2. The ever-growing volume of operational databases.
3. The need for data staging: that is, extracting for a user class just the portion of the database that defines the user's operational region.

Although the architectures are general, they are described here for the relational model only.

3.1 Client-Server Architecture (CS)

The Client-Server architecture (CS) is an extension of the traditional centralized database system model. It originated in engineering applications where data are mostly processed

in powerful clients, while centralized repositories with check-in and check-out protocols are predominantly used for maintaining data consistency. In a CS database [23], each client runs an application on a workstation (client) but performs database access through the server. This is depicted in Figure 1(a); only one client is shown for the sake of presentation. The communication between server and clients is done through remote calls over a local area network (LAN) [22]. Application processing is carried out at the client sites, leaving the server free to carry out database work only. The same model is also applicable to a single machine in which one process runs the server and others run the clients. This configuration avoids the transmission of data over the network but obviously puts the burden of the system load on the server. In this paper, we assume that server and client processes run on different hardware.

3.2 RAD-UNIFY Type of DBMS Architecture (RU)

The broad availability of high-speed networks and fast diskless workstations were the principal reasons for the introduction of the RAD-UNIFY type of DBMS architecture (RU), presented by Rubenstein et al. in [19]. The main objective of this configuration is to improve response time by utilizing both the client processing capability and its memory. This architecture is depicted in Figure 1(b). The role of the server is to execute low-level DBMS operations such as locking and page reads/writes. As Figure 1(b) suggests, the server maintains the lock and data managers. The client performs the query processing and determines a plan to be executed. As datapages are retrieved, they are sent to the diskless client memory for carrying out the rest of the processing. In the above study, some experimental results are presented where mostly look-up operations perform much better than in traditional Client-Server configurations. It is also acknowledged that the architecture may perform well only under the assumption of light update loads. More specifically, in the system's prototype it is required that only one server database writer is permitted at a time. The novel point of the architecture is the introduction of client cache memory used for database processing. Therefore, the clients can use their own CPUs to process the pages resident in their caches, and the server degenerates to a file server and a lock manager. The paper [19] suggests that the transfer of pages to the appropriate client memory gives improved response times, at least in the case of small to medium size retrieval operations. This is naturally dependent on the size of each client cache memory.

3.3 Enhanced Client-Server Architecture (ECS)

The RU architecture relieves most of the CPU load on the server but does little for the biggest bottleneck, the I/O data manager. The Enhanced Client-Server architecture (ECS) reduces both the CPU and the I/O load by caching query results and by incorporating in the client a full-fledged DBMS for incremental management of cached data [15, 16].

Figure 1: CS, RU, and ECS architectures

Initially, the clients start off with an empty database. Caching query results over time permits a user to create a local database subset which is pertinent to the user's application. Essentially, a client database is a partial replica of the server database. Furthermore, a user can integrate into her/his local database private data not accessible to others. Caching in general presents advantages and disadvantages. The two major advantages are that it eliminates requests for the same data from the server and boosts performance with client CPUs working on local data copies. In the presence of updates, though, the system needs to ensure proper propagation of new item values to the appropriate clients. Figure 1(c) depicts this new architecture. Updates are directed for execution to the server, which is the primary site.

Pages to be modified are read into main memory, updated, and flushed back to the server disk. Every server relation is associated with an update propagation log which consists of timestamped inserted tuples and timestamped qualifying conditions for deleted tuples. Only updated (committed) tuples are recorded in these logs. The number of bytes written in the log per update is generally much smaller than the size of the pages read into main memory. Queries involving server relations are transmitted to and processed initially by the server. When the result of a query is cached into a local relation for the first time, this "new" local relation is bound to the server relations used in extracting the result. Every such binding is recorded in the server DBMS catalog by stating three items of interest: the participating server relation(s), the applicable condition(s) on the relation(s), and a timestamp. The condition is essentially the filtering mechanism that decides which tuples qualify for a particular client. The timestamp indicates the last time that a client has seen the updates that may affect the cached data. There are two possible scenarios for implementing this binding catalog information. The first approach is that the server undertakes the whole task. In this case, the server maintains all the information about who caches what and up to what time. The second alternative is that each client individually keeps track of its binding information. This releases the server's DBMS from the responsibility of keeping track of a large number of cached data subsets and their different update statuses, which multiply quickly with the number of clients. Query processing against bound data is preceded by a request for an incremental update of the cached data. The server is required to look up the portion of the log that maintains timestamps greater than the one seen by the submitting client. This can be done once the binding information for the requesting client is available. If the first implementation alternative is used, this can be done readily. However, if the second solution is followed, then the client request should be accompanied by the proper binding template. This enables the server to perform the correct tuple filtering. Only relevant fractions (increments) of the modifications are propagated to the client's site. The set of algorithms that carry out these tasks is based on the Incremental Access Methods for relational operators described in [15] and involves looking at the update logs of the server and transmitting differential files [21]. This significantly reduces data transmission over the network, as only the increments affecting the bound object are transmitted, compared with traditional CS architectures in which query results are continuously transmitted in their entirety.
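To make the binding and increment mechanism concrete, the sketch below models a scan of the update-propagation log under a single-attribute range condition. It is a minimal illustration written for this article: the names (LogEntry, Binding, ship_increments) and the integer-key predicate are assumptions, not the actual structures of the ADMS prototype [15, 18].

/* Sketch of the ECS update log and timestamp-based increment filtering. */
#include <stdio.h>

typedef enum { INSERTED_TUPLE, DELETE_CONDITION } EntryKind;

typedef struct {
    long ts;        /* commit timestamp of the update            */
    EntryKind kind; /* inserted tuple or qualifying delete cond. */
    int key;        /* single integer attribute, for simplicity  */
} LogEntry;

/* One binding: the server relation, the filtering condition (here a
 * key range), and the last update timestamp the client has seen.   */
typedef struct {
    const char *relation;
    int lo, hi;     /* cached tuples satisfy lo <= key <= hi */
    long last_seen;
} Binding;

/* Scan only the log suffix newer than the client's timestamp and
 * forward the entries that pass the client's filtering condition.  */
static long ship_increments(const LogEntry *log, int n, const Binding *b)
{
    long newest = b->last_seen;
    for (int i = 0; i < n; i++) {
        if (log[i].ts <= b->last_seen) continue;      /* already seen */
        if (log[i].key < b->lo || log[i].key > b->hi) continue;
        printf("send %s key=%d ts=%ld to client\n",
               log[i].kind == INSERTED_TUPLE ? "insert" : "delete",
               log[i].key, log[i].ts);
        if (log[i].ts > newest) newest = log[i].ts;
    }
    return newest; /* new timestamp recorded in the binding catalog */
}

int main(void)
{
    LogEntry log[] = { {10, INSERTED_TUPLE, 5}, {12, DELETE_CONDITION, 40},
                       {15, INSERTED_TUPLE, 30} };
    Binding b = { "R1", 0, 35, 11 }; /* client cached R1 where key <= 35 */
    b.last_seen = ship_increments(log, 3, &b);
    printf("binding timestamp advanced to %ld\n", b.last_seen);
    return 0;
}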

It is important to point out some of the characteristics of the concurrency control mechanism assumed in the ECS architecture. First, since updates are done on the server, a two-phase locking protocol is assumed to be running by the server DBMS (this is also suggested by a recent study [25]). For the time being, and until commercial DBMSs provide a two-phase commit protocol, we assume that updates are single-server transactions. Second, we assume that the update logs on the server are not locked and, therefore, the server can process multiple concurrent requests for incremental updates.

4 Models for DBMS Architectures

In this section, we develop three closed queuing network models, one for each of the three architectures. The implementation of the simulation packages is based on them. We first discuss the common components of all models and then the specific elements of each closed network.

4.1 Common Model Components and DBMS Locking

All models maintain a WorkLoad Generator that is part of the client (workstation) model. The WorkLoad Generator is responsible for the creation of the jobs (of either read or write type). Every job consists of some database operation such as selection, projection, join, or update. These operations are represented in an SQL-like language developed as part of the simulation interface. This interface specifies the type of the operation as well as simple and join page selectivities. The role of the WorkLoad Generator is to randomly generate client jobs from an initial menu of jobs. It is also responsible for maintaining the mix of read and write types of operations. When a job finishes successfully, the WorkLoad Generator submits the next job. This process continues until an entire sequence of queries/updates is processed. To accurately measure throughput, we assume that the sites' query/update jobs are continuous (no think-time between jobs). All three models have a Network Manager that performs two tasks:

1. It routes messages and acknowledgments between the server and the clients.
2. It transfers results/datapages/increments to the corresponding client site, depending on the CS/RU/ECS architecture respectively.

Results are answers to the queries of the requesting jobs. Datapages are pages requested by RU during query processing at the workstation site. Increments are new records and tuple identifiers of deleted records used in the ECS model for the incremental maintenance of cached data. The parameters related to the network manager appear in Table 1. The time overhead for every remote call is represented by init_time [14], while net_rate is the network's transfer rate (Mbits/sec). The average size of each job request message is mesg_length.

Parameter     Meaning                                Value
init_time     overhead for a remote procedure call   10 msec
net_rate      network speed                          10 Mbits/sec
mesg_length   average size of requesting messages    300 bytes

Table 1: Network Parameters

Locking is at the page level. There are two types of locks in all models: shared and exclusive. We use the lock compatibility matrix described in [10] and the standard two-phase locking protocol.
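As a worked example of the Table 1 parameters, the small sketch below estimates the cost of one remote call as a fixed per-call overhead plus transfer time. The arithmetic is illustrative only; it is not code from the simulators.

/* Back-of-the-envelope cost of one remote call under Table 1. */
#include <stdio.h>

#define INIT_TIME_MS 10.0          /* remote procedure call overhead  */
#define NET_RATE_BPS (10e6 / 8.0)  /* 10 Mbits/sec expressed in bytes */
#define MESG_LENGTH  300.0         /* average request message (bytes) */
#define PAGE_SIZE    2048.0        /* data page size from Table 3     */

static double xfer_ms(double bytes)
{
    return INIT_TIME_MS + 1000.0 * bytes / NET_RATE_BPS;
}

int main(void)
{
    /* one request message, and one data page shipped back (RU case) */
    printf("request : %.2f ms\n", xfer_ms(MESG_LENGTH)); /* ~10.24 ms */
    printf("one page: %.2f ms\n", xfer_ms(PAGE_SIZE));   /* ~11.64 ms */
    return 0;
}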

4.2 Client-Server Model

Figure 2 outlines the closed queueing network for the CS model. It is an extension of the model proposed in [1]. It consists of three parts: the server, the network manager, and the client. Only one client is depicted in the figure. The client model runs application programs and directs all DBMS inquiries and updates through the network manager to the CS server. The network manager routes requests to the server and transfers the results of the queries to the appropriate client nodes. A job submitted at the client's site passes through the Send Queue, the network, and the server's Input Queue, and finally arrives at the Ready Queue of the system, where it awaits service. A maximum multiprogramming degree is assumed by the model. This limits the maximum number of jobs concurrently active and competing for system resources in the server. Pending jobs remain in the Ready Queue until the number of active jobs becomes less than the multiprogramming level (MPL). When a job becomes active, it is queued at the concurrency control manager and is ready to compete for system resources such as the CPU, access to disk pages, and lock tables (which are considered to be memory-resident structures). In the presence of multiple jobs, the CPU is allocated in a round-robin fashion.

Figure 2: Queuing Model for CS Architecture

The physical limitations of the server main memory (and the client main memory, as we see later on) allocated to database buffering may contribute to extra data page accesses from the disk. The number of these page accesses is determined by formulas given in [24], and it depends on the type of the database operation. Jobs are serviced at all queues with a FIFO discipline. The concurrency control manager (CCM) attempts to satisfy all the locks requested by a job. There are several possibilities depending on what happens after a set of locks is requested. If the requested locks are acquired, then the corresponding page requests are queued in the Read Queue for processing at the I/O module (RD). Pages read by RD are buffered in the Processing Queue. The CPU works (PRC) on jobs queued in the Processing Queue. If only some of the pages are locked successfully, then part of the request can be processed normally (RD, PRC), but its remaining part has to reenter the concurrency control manager queue. Obviously, this is due to a lock conflict, and the request is routed to the CCM through the Blocked Queue. The lock will be acquired when the conflict ceases to exist. For each update request, the corresponding pages are exclusively locked, and subsequently buffered and updated (UPD). If a job is unsuccessful in obtaining a lock, it is queued in the Blocked Queue. If the same lock request passes unsuccessfully a predefined number of times through the Blocked Queue, then a deadlock detection mechanism is triggered.
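The grant-or-block behaviour of the CCM can be sketched as follows. This is a minimal illustration with made-up identifiers (try_lock, lock_table), assuming the shared/exclusive compatibility matrix of [10]; it is not code taken from the simulators.

/* Page-level lock manager sketch: shared/exclusive modes and a
 * request that may be granted only partially, with the conflicting
 * pages routed through the Blocked Queue. */
#include <stdio.h>
#include <stdbool.h>

typedef enum { FREE, SHARED, EXCLUSIVE } Mode;

/* compat[held][requested]: may the request be granted? */
static const bool compat[3][3] = {
    /* FREE      */ { true, true,  true  },
    /* SHARED    */ { true, true,  false },
    /* EXCLUSIVE */ { true, false, false },
};

#define NPAGES 8
static Mode lock_table[NPAGES]; /* memory-resident lock table */

/* Try to lock each requested page; return the number granted. */
static int try_lock(const int *pages, int n, Mode want)
{
    int granted = 0;
    for (int i = 0; i < n; i++) {
        if (compat[lock_table[pages[i]]][want]) {
            lock_table[pages[i]] = want;
            granted++;
        } else {
            printf("page %d conflicts -> Blocked Queue\n", pages[i]);
        }
    }
    return granted;
}

int main(void)
{
    int q1[] = {0, 1, 2};
    int u1[] = {2, 3};
    printf("reader granted %d/3\n", try_lock(q1, 3, SHARED));
    printf("writer granted %d/2\n", try_lock(u1, 2, EXCLUSIVE));
    return 0;
}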

Parameter          Meaning                                    Value
server_cpu_mips    processing power of server                 21 MIPS
disk_tr            average disk transfer time                 12 msec
main_memory        size of server main memory                 2000 pages
instr_per_lock     instructions executed per lock             4000
instr_selection    instructions executed per selected page    10000
instr_projection   instructions per projected page            11000
instr_join         instructions per joined page               29000
instr_update       instructions per updated page              12500
instr_RD_page      instructions to read a page from disk      6500
instr_WR_page      instructions to write a page to disk       8000
ddlock_search      deadlock search time                       10 msec
kill_time          time required to kill a job                0.2 sec
mpl                multiprogramming level                     10

Table 2: Server Parameters

The detection mechanism simply looks for cycles in a wait-for graph that is maintained throughout the processing of the jobs. If such a cycle is found, then the job with the least amount of processing done so far is aborted and restarted. Aborted jobs release their locks and rejoin the system's Ready Queue. Finally, if a job commits, all changes are reflected on the disk, locks are released, and the job departs. Server-related parameters appear in Table 2 (these parameters are also applicable to the other two models). Most of them are self-describing and give processing overheads, time penalties, ratios, and ranges of system variables. Two issues should be mentioned: first, the queuing model assumes an average value for disk access, thereby avoiding issues related to data placement on the disk(s). Second, whenever a deadlock has been detected, the system timing is charged with an amount equal to ddlock_search * active_jobs + kill_time. Therefore, the deadlock overhead is proportional to the number of active jobs.

The database parameters (applicable to all three models) are described by the set of parameters shown in Table 3. Each relation of the database consists of the following information: a unique name (rel_name), the number of its tuples (card), and the size of each of those tuples (rel_size). The stream_mix indicates the composition of the streams submitted by the WorkLoad Generator(s) in terms of read and write jobs. Note also that the size of the server main memory was defined to hold just a portion of the disk-resident database, which is 25% in the case of multiprogramming equal to one (we have applied this fraction concept later on with the sizes of the client main memories).

Parameter     Meaning                            Value
name_of_db    name of the database               TestDataBase
dp_size       size of the data page              2048 bytes
num_rels      number of relations                8
fill_factor   page filling factor                98%
rel_name      name of a relation                 R1, R2, ..., R8
card          cardinality of every relation      20k
rel_size      size of a relation tuple (bytes)   100
stream_mix    query/update ratio                 10

Table 3: Data Parameters

For the above selected parameter values, every segment of the multiprogrammed main memory is equal to 200 pages (main_memory / mpl).
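The deadlock handling just described can be illustrated with a small sketch: a cycle search over the wait-for graph, followed by the timing charge ddlock_search * active_jobs + kill_time. The single-outgoing-edge chain walk below is an assumption of this sketch (one blocked job waits for one holder at a time), not necessarily how the simulators implement the search.

/* Wait-for graph cycle check and the deadlock cost charge. */
#include <stdio.h>
#include <stdbool.h>

#define MAX_JOBS 10            /* mpl from Table 2 */
#define DDLOCK_SEARCH_MS 10.0
#define KILL_TIME_MS    200.0

static int waits_for[MAX_JOBS]; /* waits_for[i] = j, or -1 if running */

/* Follow wait-for edges from job `start`; a revisit means a cycle. */
static bool in_cycle(int start)
{
    bool seen[MAX_JOBS] = { false };
    for (int j = start; j != -1; j = waits_for[j]) {
        if (seen[j]) return true;
        seen[j] = true;
    }
    return false;
}

int main(void)
{
    for (int i = 0; i < MAX_JOBS; i++) waits_for[i] = -1;
    waits_for[0] = 1; waits_for[1] = 2; waits_for[2] = 0; /* deadlock */
    int active_jobs = 3;
    if (in_cycle(0)) {
        double charge = DDLOCK_SEARCH_MS * active_jobs + KILL_TIME_MS;
        printf("deadlock: abort a victim, charge %.0f ms\n", charge);
    }
    return 0;
}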

4.3 RAD-UNIFY Model

Figure 3: Queuing Model for RAD-UNIFY Architecture

Figure 3 depicts a closed network model for the RAD-UNIFY type of architecture. There are a few differences from the CS model:

- Every client uses a cache memory for buffering datapages during query processing.
- Only one writer at a time is allowed to update the server database.
- Every client works off its finite-capacity cache and, any time it needs an additional page, forwards the request to the server.
- The handling of aborts is done in a slightly different manner.

The WorkLoad Generator creates the jobs which, through the proper Send Queue, the network, and the server's Input Queue, are directed to the server's Ready Queue. The functionality of the MPL processor is modified to account for the single-writer requirement. One writer may coexist with more than one reader at a time. The functionality of the UPD and RD processors, their corresponding queues, as well as the queue of blocked jobs, is similar to that of the previous model. The most prominent difference from the previous model is that the query job pages are only read (Read Queue and RD service module) and then directed through the network to the workstation buffers, where they await processing. The architecture capitalizes on the workstations' processing capability. The server becomes the device responsible for locking and page retrieval, which make up the "low-level" database system operations. Write-type jobs are executed solely on the server and are serviced by the Update Queue and the UPD service module. As soon as pages have been buffered at the client site (Processing Queue), the local CPU may commence processing whenever it is available. The internal loop of the RU client model corresponds to subsequent requests for the same page, which may reside in the client cache. Since the size of the cache is finite, the same page may have to be requested many times, depending on the type of the job [24]. Replacement is performed either in a least-recently-used (LRU) discipline or in the way specific database operations call for (e.g., in the case of the sort-merge join operation). The wait-for graph of the executing processes is maintained at the server, which is the component responsible for deadlock detection. Once a deadlock is found, a job is selected to be killed. This job is queued in the Abort Queue, and the ABRT processing element releases all the locks of this process and sends an abort signal to the appropriate client. The client abandons any further processing of the current job, discards the datapages fetched so far from the server, and instructs the WorkLoad Generator to resubmit the same job. Processes that commit notify the diskless client component with a proper signal, so that correct job scheduling takes place. In the presence of an incomplete job (still either active or pending at the server), the WorkLoad Generator takes no action, awaiting either a commit or an abort signal.
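As an illustration of the client cache behaviour, here is a toy LRU page buffer of the kind an RU client might run; the frame count, the reference trace, and all identifiers are invented for this sketch.

/* Minimal LRU page cache: a hit is served locally, a miss evicts the
 * least recently used frame and fetches the page from the server. */
#include <stdio.h>

#define FRAMES 4               /* client cache capacity, in pages */
static int page_in[FRAMES];    /* page id held by each frame      */
static long last_use[FRAMES];  /* logical clock of last reference */
static long clock_now = 0;

static int server_fetches = 0; /* requests forwarded to the server */

static void reference(int page)
{
    int victim = 0;
    clock_now++;
    for (int f = 0; f < FRAMES; f++) {
        if (page_in[f] == page) {          /* cache hit */
            last_use[f] = clock_now;
            return;
        }
        if (last_use[f] < last_use[victim]) victim = f;
    }
    /* miss: evict the LRU frame and ask the server for the page */
    page_in[victim] = page;
    last_use[victim] = clock_now;
    server_fetches++;
}

int main(void)
{
    for (int f = 0; f < FRAMES; f++) { page_in[f] = -1; last_use[f] = 0; }
    int trace[] = { 1, 2, 3, 1, 4, 5, 1, 2 }; /* page reference string */
    for (int i = 0; i < 8; i++) reference(trace[i]);
    printf("server fetches: %d of 8 references\n", server_fetches);
    return 0;
}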

4.4 Enhanced Client-Server Model

The closed queuing model for the ECS architecture is shown in Figure 4. There is a major difference from the previous models:

- The server is extended to facilitate incremental update propagation of cached data and/or caching of new data. Every time a client sends a request, appropriate sections of the server logs need to be shipped from the server back to the client. This action decides whether there have been more recent changes than the latest seen by the requesting client. In addition, the client may demand new data (not previously cached on the local disk) for the processing of its application.

Figure 4: Queuing Model for ECS Architecture

Initially, client-disk cached data from the server relations can be defined using the parameter Rel_i (1 <= i <= num_rels), which corresponds to the percentage of server relation Rel_i cached in each participating client. Jobs initiated at the client sites are always dispatched to the server for processing through a message output queue (Send Queue). The server receives these requests in its Input Queue via the network and forwards them to the Ready Queue. When a job is finally scheduled by the concurrency control manager (CCM), its type can be determined.

If the job is of write type, it is serviced by the Update, Blocked, and Abort Queues and the UPD and ABRT processing elements. At commit time, updated pages are not only flushed to the disk, but a fraction (write_log_fract) of these pages is appended to the log of the modified relation (assuming that only a fraction of the tuples is modified per page at a time). If it is just a read-only job, it is routed to the LOGRD Queue. This queue is serviced by the log and read manager (LOGRDM), which decides what pertinent changes need to be transmitted before the job is evaluated at the client, or what new portion of the data is to be downloaded to the client. The increments are determined after all the unexamined pieces of the logs are read and filtered through the client's applicable conditions. The factor that decides the amount of new data cached per job is defined for every client individually by the parameter cont_caching_perc. This factor determines the percentage of new pages to be cached every time (this parameter is set to zero for the initial set of our experiments). The log and read manager sends, through a queue, either the requested data increments along with the newly read data, or an acknowledgment. The model for the clients requires some modification due to the existence of the client disk. Transactions commence at their respective WorkLoad Generators and are sent through the network to the server. Once the server's answer has been received, two possibilities exist. The first is that no data has been transferred from the server, just an acknowledgment that no increments exist. In this case, the client's DBMS goes on with the execution of the query. On the other hand, incremental changes received from the server are handled by the increment service module (ISM), which is responsible for reflecting them on the local disk. The service for the query evaluation is provided through a loop of repeated data page accesses and processing. Clearly, there is some overhead involved whenever updates on the server affect a client's cached data or new data are being cached. Table 4 shows some additional parameters for all the models. Average client disk access time, client CPU processing power, and client main memory size are described by client_disk_tr, client_cpu_mips, and client_main_memory respectively. The num_clients parameter is the number of clients participating in the system configuration in every experiment, and instr_log_page is the number of instructions executed by the ECS server in order to process a log page.


Parameter            Meaning                                             Value                   Applicable in Model
client_disk_tr       average disk transfer time                          15 msec                 ECS
client_cpu_mips      processing power of client                          20 MIPS                 RU, ECS
client_main_memory   size of client main memory                          500 pages               RU, ECS
Rel_i                initial cached fraction of server relation Rel_i    0.30 or 0.40            ECS
instr_log_page       instructions to process a log page                  5000                    ECS
write_log_fract      fraction of updated pages recorded in the log       0.10                    ECS
cont_caching_perc    continuous caching factor                           0                       ECS
num_clients          number of clients                                   4, 8, 16, 24, ..., 56   CS, RU, ECS

Table 4: Additional CS, RU and ECS parameters
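To make the log parameters concrete, here is a small worked example combining Tables 2 and 4. The 20-page update is an assumed figure chosen purely for illustration.

/* Worked example: CPU time to filter one log page, and log pages
 * appended by an update that dirties a given number of data pages. */
#include <stdio.h>

int main(void)
{
    double server_mips     = 21.0;   /* Table 2 */
    double instr_log_page  = 5000.0; /* Table 4 */
    double write_log_fract = 0.10;   /* Table 4 */

    /* CPU time to process one log page at the server */
    double ms_per_log_page = instr_log_page / (server_mips * 1e6) * 1e3;
    printf("log page CPU cost : %.3f ms\n", ms_per_log_page); /* ~0.24 */

    /* an update touching 20 data pages appends ~2 log pages */
    double dirtied = 20.0;
    printf("log pages appended: %.1f\n", dirtied * write_log_fract);
    return 0;
}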

4.5 Simulation Packages

The simulation packages were written in C, and their sizes range from 4.6k to 5.4k lines of source code. The two-phase locking protocol is used, and we have implemented time-out mechanisms for detecting deadlocks. Aborted jobs are not scheduled immediately but are delayed for a number of rounds before restart. The execution time of all three simulators for a complete workload is about one day on a DECstation 5000/200. We also ensure, through the method of batch means [9], that our simulations reach stability (confidence of more than 96%).
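For readers unfamiliar with the batch means method [9], the sketch below shows the basic computation: the observation stream is split into batches, and the variability of the batch means yields a confidence half-width for the grand mean. The observations are synthetic, and the 2.05 quantile is a normal approximation used only for illustration.

/* Batch means sketch: grand mean and confidence half-width. */
#include <stdio.h>
#include <math.h>

#define N 120
#define BATCHES 6
#define BATCH (N / BATCHES)

int main(void)
{
    double x[N], bm[BATCHES];
    for (int i = 0; i < N; i++)              /* stand-in observations */
        x[i] = 23.0 + 2.0 * sin(0.7 * i);

    double grand = 0.0;
    for (int b = 0; b < BATCHES; b++) {
        bm[b] = 0.0;
        for (int i = 0; i < BATCH; i++) bm[b] += x[b * BATCH + i];
        bm[b] /= BATCH;
        grand += bm[b];
    }
    grand /= BATCHES;

    double s2 = 0.0;                          /* variance of batch means */
    for (int b = 0; b < BATCHES; b++)
        s2 += (bm[b] - grand) * (bm[b] - grand);
    s2 /= (BATCHES - 1);

    double half = 2.05 * sqrt(s2 / BATCHES);  /* ~96% normal quantile */
    printf("throughput = %.2f +/- %.2f jobs/min\n", grand, half);
    return 0;
}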

5 Simulation Results

In this section, we discuss performance metrics and describe some of the experiments conducted using the three simulators. System- and data-related parameter values appear in Tables 1 to 4.

5.1 Performance Metrics, Query-Update Streams, and Workload Settings

The main performance criteria for our evaluation are the average throughput (jobs/min) and the throughput speedup, defined as the ratio of two throughput values. Speedup is related to the throughput gap [12] and measures the relative performance of two system configurations for several job streams. We also measure the server disk access reduction, which is the ratio of the server disk accesses performed by two system configurations, as well as cache memory hits and various system resource utilizations. The database for our experiments consists of eight base relations. Every relation has a cardinality of 20000 tuples and requires 1000 disk pages. The main memory of the server can retain 2000 pages for all the multiprogrammed jobs, while each client can retain at most 500 pages for database processing in its own main memory in the case of the RU and ECS configurations. Initially, the database is resident on the server's disk, but as time progresses parts of it are cached either in the cache memory of the RU clients or on the disk and in the main memory of the ECS clients.

In order to evaluate the three DBMS architectures, the WorkLoad Generator creates several workloads using a mix of queries and modifications. A Query-Update Stream (QUS) is a sequence of query and update jobs mixed in a predefined ratio. In the first two experiments of this paper, the mix ratio is 10, that is, each QUS includes one update per ten queries. QUS jobs are randomly selected from a predetermined set of operations that describe the workload. Every client submits a QUS for execution and terminates when all its jobs have completed successfully. Exactly the same QUSs are submitted to all configurations. The length of the QUSs was selected to be 132 jobs, since that gave confidence of more than 96% in our results. Our goal is to examine system performance under diverse parameter settings. The principal resources all DBMS configurations compete for are CPU cycles and disk accesses. A QUS consists of three groups of jobs: two of them are queries and the other is updates. Since the mix of jobs plays an important role in system performance (as Boral and DeWitt show in [4]), we chose small and large queries. The first two experiments included in this paper correspond to low and high workloads. The small query set (SQS) consists of 8 selections on the base relations with tuple selectivity of 5% (2 of them are done on clustered attributes) and 4 two-way join operations with join selectivity 0.2. The large query set (LQS) consists of 8 selections with tuple selectivity equal to 25%, 5 selections with tuple selectivity of 40% (4 of these selections are done on clustered attributes), 3 projections, and finally 4 two-way joins with join selectivity 0.40. Update jobs (U) are made up of 8 modifications with varying update rates (4 use clustered attributes). The update rates give the percentage of pages modified in a relation during an update. For our experiments, update rates are set to the following values: 0% (no modifications), 2%, 4%, 6% and 8%. Given the above group classification, we formulate two experiments: SQS-U and LQS-U. A job stream of a particular experiment, say SQS-U, with an x% update rate consists of queries from the SQS group and updates from the U group that modify x% of the server relation pages. Another set of experiments designed around the Wisconsin benchmark can be found in [16].
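The two ratio metrics just defined are simple enough to state in a few lines of code; the numbers below are invented for illustration only.

/* The ratio metrics used in the experiments that follow. */
#include <stdio.h>

int main(void)
{
    double thr_ecs = 400.0, thr_cs = 12.0;   /* jobs/min (illustrative) */
    double disk_cs = 50000, disk_ecs = 1600; /* server disk accesses    */

    printf("ECS/CS throughput speedup : %.1f\n", thr_ecs / thr_cs);
    printf("server disk reduction     : %.1f\n", disk_cs / disk_ecs);
    return 0;
}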

Figure 5: CS and RU Throughput Rates (SQS-U)

5.2 Experiments: SQS-U and LQS-U

In Figure 5, the throughput rates for both the CS and RU architectures are presented. The number of clients varies from 4 to 56. There are clearly two groups of curves: those of the CS (located in the lower part of the chart) and those of the RU (located in the upper part of the chart). Overall, the throughput averages at about 11.6 jobs per minute for the CS and 23.7 for the RU case. From 4 to 16 CS clients, throughput increases as the CS configurations capitalize upon their resources. For more than 24 CS clients, we observe a throughput decline for the non-zero update streams, attributed to high contention and the higher number of aborted and restarted jobs. The RU 0% curve always remains in the range between 32 and 33 jobs/min, since client cache memories essentially provide much larger memory partitions than the corresponding ones of the CS. The "one writer at a time" requirement of the RU configuration plays a major role in the almost linear decrease in throughput performance for the non-zero update curves as the number of clients increases. Note that at 56 clients, the RU performance values obtained for QUSs with 4% to 8% update rates are about the same as those of their CS counterparts. It is evident from the above figure that since RU utilizes both the cache memories and the CPUs of its clients for DBMS page processing, it performs considerably better than its CS counterpart (except in the case of many clients submitting non-zero update streams, where RU throughput rates are comparable with those obtained in the CS configuration). The average RU throughput improvement for this experiment was calculated to be 2.04 times higher than that of CS.

Figure 6: ECS Throughput (SQS-U)

Figure 6 shows the ECS throughput for identical streams. The 0% update curve shows that the throughput of the system increases almost linearly with the number of clients. This benefit is due to two reasons: 1) the clients use only their local disks and achieve maximal parallel access to already cached data; 2) the server carries out a negligible number of disk operations (there are no updates, therefore the logs are empty) and handles only short messages and acknowledgments routed through the network. No data movement across the network is observed in this case. As the update rate increases (2%, 4%), the level of the achieved throughput rates remains high and increases almost linearly with the number of workstations. This holds for up to 32 stations. After that, we observe a small decline that is due to the higher conflict rate caused by the increased number of updates. Similar declines are observed for all the other curves (except the 0% update curve). From Figures 5 and 6, we see that the performance of ECS is significantly higher than those of CS and RU. For ECS, the maximum processing rate is 1316.4 jobs/min (all 56 clients attached to a single server with 0% updates), while the maximum throughput value for the CS is about 12.6 jobs/min and for the RU 32.9 jobs/min. The number of workstations after which we observe a decline in job throughput is termed the "maximum throughput threshold" (mtt) and varies with the update rate. For instance, for the 2% curve it comes at about 35 workstations, and for the 6% and 8% curves it appears in the region around 20 workstations. The mtt greatly depends on the type of submitted jobs, the composition of the QUS, as well as the server concurrency control manager. A more sophisticated (flexible) manager (such as that in [26]) than the one used in our simulation package would further increase the mtt values.

Figure 7: ECS/CS and RU/CS Throughput Speedup (SQS-U)

Figure 7 depicts the throughput speedup for the RU and ECS architectures over CS (the y axis is depicted in logarithmic scale). It suggests that the average throughput for ECS is almost proportional to the number of clients, at least for the light update streams (0%, 2%, and 4%) in the range of 4 to 32 stations. For the 4% update stream, the relative throughput for ECS is 17 times better than its CS and RU counterparts (at 56 clients). It is worth noting that even in the worst case (8% updates), the ECS system performance remains about 9 times higher than that of CS. The decline, though, starts earlier, at 16 clients, where ECS still maintains about 10 times higher job processing capability. In the lower part of Figure 7, the RU over CS speedup is shown. Although RU performs generally better than CS under the SQS-U workload for the reasons given earlier, its corresponding speedup is notably much smaller than that of ECS. The principal reasons for the superiority of ECS are the offloading of the server disk operations and the parallel use of local disks.

Figure 8: ECS/CS and RU/CS Server Disk Reduction (SQS-U)

Figure 8 supports these hypotheses. It presents the server disk access reduction achieved by ECS over CS and, similarly, the one achieved by RU over CS. ECS with 0% update streams causes insignificant server disk access (assuming that all pertinent user data have already been cached on the workstation disks). For 2% update rate QUSs, the reduction varies from 102 to 32 times over the entire range of clients. As expected, this reduction drops further for the more update-intensive streams (i.e., the 4%, 6%, and 8% curves). However, even for 8% updates the disk reduction ranges from 26 to 8 times, which is a significant gain. The disk reduction rates of the RU architecture over the CS vary between 2.4 and 2.6 times throughout the range of clients and for all QUS curves. This is achieved predominantly by the use of the client cache memories, which are larger than the corresponding main memory multiprogramming partitions of the CS configuration, thereby avoiding many page replacements. Figure 9 reveals one of the greatest impediments to the performance of ECS, that is, the log operations. The graph shows the percentage of the log-pertinent disk accesses performed (both log reads and writes) over the total number of server disk accesses. In the presence of more than 40 clients, the log disk operations constitute a large fraction of server disk accesses (around 69%).

Figure 9: Percentage of Server Page Accesses due to Log Operations

Figure 10 depicts the time spent on the network for all configurations and for update rates of 0%, 4%, and 8%. The ECS model causes the least amount of traffic in the network because the increments of the results are small in size. The RU model causes the highest network traffic because all datapages must be transferred to the workstation for processing. On the other hand, CS transfers only qualifying records (results). For 56 clients, the RU network traffic is almost double that of CS (2.15 times) and 7 times more than that of ECS. Figure 11 summarizes the results of the LQS-U experiment. Although the nature of the query mix has changed considerably, we notice that the speedup curves have not (compare with Figure 7). The only difference is that all the non-zero update curves have moved upwards, closer to the 0% curve, and that the mtts appear much later, from 30 to 50 clients. Positive speedup deviations are observed for all the non-zero curves compared with the rates seen in Figure 7. It is interesting to note that the gains produced by the RU model are lower than in the SQS-U experiment. We offer the following explanation: as the number of pages to be processed by the RU server disk increases significantly (larger selection/join selectivities and projection operators are involved in the experiment), imposing delays at the server site (disk utilization ranges between .93 and .99), workstation CPUs cannot contribute as much to the system average throughput.

Figure 10: Time Spent on the Network (SQS-U)

Figure 11: RU/CS and ECS/CS Throughput Speedups for LQS-U

5.3 Effect of Model Parameters

In this section, we vary, one by one, some of the system parameters which have a significant impact on the performance of the examined architectures, run the SQS-U experiment, and finally compare the obtained results with those of Figure 7. The parameters that vary are:

Client disk access time set to 9 msec: This corresponds to a 40% reduction in average access time. Figure 12 presents the throughput speedup achieved for the ECS experiments using this new client access time over the ECS results depicted in Figure 6. Not surprisingly, the new ECS performance values are increased by an average factor of 1.23 for all curves, which represents very serious gains. The no-update curve indicates a constant 53% throughput rate improvement, while the remaining curves indicate significant gains in the range of 4 to 40 clients. For more than 40 clients the gains are insignificant; the explanation is that the large number of updates (which increases linearly with the number of submitting clients) imposes serious delays on the server. Naturally, the RU and CS throughput values are not affected at all in this experiment.

Server disk access time set to 7 msec: We observe that all but one (8%) of the non-zero update curves approach the 0% curve, and the mtt area appears much later, at about 40 stations. By providing a faster disk access time, server jobs are executed faster and cause fewer restarts.

Client CPU set to 110 MIPS: The results are very similar to those of Figure 6, with the exception that we notice an average increase of 15.16 jobs/min in throughput rate for all update curves. This indicates that extremely fast processing workstations alone have a moderate impact on the performance of the ECS architecture.

Server CPU set to 90 MIPS: We observe a shifting of all curves towards the top right corner of the graph. This pushes the mtts further to the right and allows for more clients to work simultaneously. There are two reasons for this behavior: 1) updates are mildly consuming the CPU, and therefore, with a faster CPU, they finish much faster; 2) a fast CPU results in lower lock contention, which in turn leads to fewer deadlocks. Server log processing is also carried out faster.

Figure 12: ECS Speedup for Client Disk Access Time at 9 msec over Client Disk Access Time at 15 msec

5.4 Other Experiments

In this section, we perform four more experiments to examine the role of "clean" update workloads, of specially designed QUSs in which only a small number of clients submits updates, of the effect of the continuous caching parameter cont_caching_perc in the ECS model, as well as the ability of all models to scale up.

Pure Update Workload: In this experiment, the update rate (6%) remains constant and the QUSs are made up of only update jobs (pure update workload). Figure 13 gives the throughput rates for all configurations. The CS and ECS models show similar curves shape-wise. In the range of 4-10 clients, their throughput rates increase; beyond that point, under the influence of the extensive blocking delays and the created deadlocks, their performance declines considerably. At 56 clients, ECS achieves only half of the throughput obtained with 8 clients. Overall, CS does better than ECS because the latter also has to write into the logs (extra disk page accesses) at commit time. It is also interesting to note the almost linear decline in the performance of the RU configuration. Since there is strictly one writer at a time in this set of experiments, all the jobs are sequenced by the MPL processor of the model and are executed one at a time (due to the lack of readers). Concurrency assists both CS and ECS clients in the lower range (4 to 8) to achieve better throughput rates than their RU counterparts, but later the advantages of job concurrency (many writers at the same time in the CS and ECS cases) diminish significantly.

Limited Number of Update Clients: In this experiment, a number of clients is dedicated to updates only, and the rest of the clients submit read-only requests. This simulates environments in which there is a (large) number of read-only users and a constant number of updaters. This class of databases includes systems such as those used in the stock markets.

Figure 13: QUSs Consist of Update Jobs Only (Pure Update Workload)

Figure 14 presents the throughput speedup results of the SQS-U experiment. Note that three clients in each experiment are designated as writers, and the remaining clients query the database. The RU/CS curves remain at the same performance levels as those observed in Figure 7, with the exception that, as the number of clients increases, the effect of the writers is amortized by the readers. Thus, all the non-zero update curves converge to the 0% RU/CS curve for more than 32 clients. The ECS/CS curves suggest spectacular gains for the ECS configuration. The performance of the system increases almost linearly with the number of participating clients for all the update curves. Note as well that the mtts have disappeared from the graph. Naturally, the mtts appear much later, when more than 56 clients are used in the configuration (not shown in the figure).

Changing data locales for the ECS clients: So far, we have not considered the case in which clients continuously demand new data from the server to be cached on their disks. The goal of this experiment is to address this issue and to quantify the degradation in ECS performance. To carry out this experiment, we use the parameter cont_caching_perc (or ccp), which represents the percentage of new pages from the relations involved in a query to be cached on the client disk. This parameter enables us to simulate the constantly changing client working set of data. Figure 15 shows the 0%, 2% and 6% curves of Figure 6 (where ccp is 0%) superimposed with the curves for the same experiment (SQS-U) but with ccp equal to 1% and 2%.

Figure 14: Throughput Speedup Rates for Three Writer-Only Clients per Experiment

Taking into account that every query requires a new part of the server data, this percentage contributes a large number of additional disk accesses and augmented processing time on the part of the server. This very fact makes the almost linear shape of the original 0% curve disappear. More specifically, while 56 clients in the original experiment attain 1316.3 jobs/min, under ccp=1% they achieve 416.1 jobs/min, and under ccp=2% they accomplish only 206.6 jobs/min. The performance degradation for the heavily updating QUSs (i.e., 6%) is less noteworthy since there is already serious server blocking.

Ability to scale up: We are also interested in the scale-up behavior of all architectures in the presence of a large number of workstations. For this purpose, we ran an experiment where the number of clients ranges from 4 to 120 per server. Figure 16 depicts the resulting curves for the SQS-U experiment. The graph indicates that beyond the mtts region the speedup of ECS over CS gradually decreases (after 40 clients). For more update-intensive QUSs, the speedup decline is less sharp than that of the 2% curve. This deterioration is due to the saturation of the server's resources created by the large number of clients. Note, however, that even at 120 clients, the ECS architecture can process between 6.1 and 19.9 times more jobs than the CS architecture (non-zero update curves). The 0% update curve shows no saturation and continues to increase almost linearly.

Figure 15: Experiments with Three Values for the ccp Parameter

The major reasons for this are that the performance of the CS configuration for more than 56 clients remains stable in the range between 9.0 and 12.5 jobs/min (due to the high utilization of its resources) and that the network shows no signs of extremely heavy utilization. The gains for the RU configuration remain at the same levels as those reported in the SQS-U experiment.

Figure 16: Scalability Experiment

6 Conclusions

We have presented, modeled, and compared three contemporary DBMS architectures, all aiming for high throughput rates. The simulation results show the following characteristics:

- The RAD-UNIFY architecture gives an average 2.1 times performance improvement over the CS by utilizing the main memories and the CPUs of the clients.
- Although the ECS performance declines when the QUS query-to-update ratio decreases, it still offers serious speedup rates over the other two architectures. Under pure update workloads, the ECS gives the worst performance for more than 20 clients.
- Under light update rates (1%-5%), the performance speedup of ECS over both CS and RU is almost proportional to the number of participating workstations, in the range of 4-32 for our experiments. For streams without updates, the ECS/CS speedup increases almost linearly with the number of workstations.
- Under heavier update rates and many concurrently imposed modifications, the ECS server becomes the system bottleneck. We have introduced the "maximum throughput thresholds" (mtts) to identify these bottleneck points with respect to each update rate.
- Faster workstation disk access time improves the performance of ECS.
- ECS performance is diminished significantly whenever clients constantly demand new data elements from the server.
- Database environments with few exclusive writers and a number of readers offer very good performance results for the ECS configuration.
- Under all but pure update workloads, the ECS architecture is more scalable than the other two.

Future work includes experimentation using databases local to the workstations, exploring the behavior of the configurations under different server concurrency control protocols, and developing periodic update propagation strategies for bringing transaction throughput close to the 0% update curves.

Acknowledgements: The authors are grateful to the anonymous referees for their valuable comments and suggestions, as well as to Jennifer Carle and George Panagopoulos for commenting on earlier versions of this paper.

References

[1] R. Agrawal, M. Carey, and M. Livny. Models for Studying Concurrency Control Performance: Alternatives and Implications. ACM Transactions on Database Systems, 12(4), December 1987.
[2] R. Alonso, D. Barbara, and H. Garcia-Molina. Data Caching Issues in an Information Retrieval System. ACM Transactions on Database Systems, 15(3):359-384, September 1990.
[3] J. Archibald and J.L. Baer. Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model. ACM Transactions on Computer Systems, 4(4), November 1986.
[4] H. Boral and D. DeWitt. A Methodology for Database System Performance Evaluation. In Proceedings of the ACM SIGMOD Conference on the Management of Data, Boston, MA, June 1984.
[5] M. Carey, M. Franklin, M. Livny, and E. Shekita. Data Caching Tradeoffs in Client-Server DBMS Architectures. In Proceedings of the 1991 ACM SIGMOD International Conference, Denver, CO, May 1991.
[6] A. Delis and N. Roussopoulos. Server Based Information Retrieval Systems Under Light Update Loads. In Proceedings of the 1991 IEEE International Conference on Systems, Man, and Cybernetics, Charlottesville, VA, October 1991.
[7] A. Delis and N. Roussopoulos. Performance and Scalability of Client-Server Database Architectures. In Proceedings of the 18th Very Large Data Bases Conference, pages 610-623, Vancouver, Canada, August 1992.
[8] D. DeWitt, D. Maier, P. Futtersack, and F. Velez. A Study of Three Alternative Workstation-Server Architectures for Object-Oriented Database Systems. In Proceedings of the 16th Very Large Data Bases Conference, pages 107-121, Brisbane, Australia, 1990.
[9] D. Ferrari. Computer Systems Performance Evaluation. Prentice-Hall, Englewood Cliffs, NJ, 1978.
[10] J. Gray, R. Lorie, G. Putzolu, and I. Traiger. Granularity of Locks and Degrees of Consistency in a Shared Database, chapter 6. North-Holland, 1976.
[11] R. Hagman and D. Ferrari. Performance Analysis of Several Back-End Database Architectures. ACM Transactions on Database Systems, 11(1):1-26, March 1986.
[12] A. Kumar and M. Stonebraker. Performance Considerations for an Operating System Transaction Manager. IEEE Transactions on Software Engineering, 15(6):705-714, June 1989.
[13] K. Kuspert, P. Dadam, and J. Gunauer. Cooperative Object Buffer Management in the Advanced Information Management Prototype. In Proceedings of the 13th Very Large Data Bases Conference, Brighton, UK, 1987.
[14] J. Purtilo and P. Jalote. An Environment for Developing Fault-Tolerant Software. IEEE Transactions on Software Engineering, 17(2):153-159, February 1991.
[15] N. Roussopoulos. The Incremental Access Method of View Cache: Concept, Algorithms, and Cost Analysis. ACM Transactions on Database Systems, 16(3):535-563, September 1991.
[16] N. Roussopoulos and A. Delis. Evaluation of an Enhanced Workstation-Server DBMS Architecture. Technical Report UMIACS-TR-91-30, University of Maryland, College Park, February 1991.
[17] N. Roussopoulos and A. Delis. Modern Client-Server DBMS Architectures. ACM SIGMOD Record, September 1991.
[18] N. Roussopoulos and H. Kang. Principles and Techniques in the Design of ADMS. Computer, 19(12), December 1986.
[19] W. Rubenstein, M. Kubicar, and R. Cattell. Benchmarking Simple Database Operations. In Proceedings of the ACM SIGMOD Conference on the Management of Data, pages 387-394, 1987.
[20] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki, E. Siegel, and D. Steere. Coda: A Highly Available File System for a Distributed Workstation Environment. IEEE Transactions on Computers, 39(4), April 1990.
[21] D. G. Severance and G. M. Lohman. Differential Files: Their Application to the Maintenance of Large Databases. ACM Transactions on Database Systems, 1(3), September 1976.
[22] R. Stevens. UNIX Network Programming. Prentice Hall, Englewood Cliffs, NJ, 1990.
[23] M. Stonebraker. Architectures of Future Data Base Systems. IEEE Data Engineering, 13(4), December 1990.
[24] J. Ullman. Database and Knowledge-Base Systems: Volume II, The New Technologies. Computer Science Press, Rockville, MD, 1989.
[25] Y. Wang and L. Rowe. Cache Consistency and Concurrency Control in a Client/Server DBMS Architecture. In Proceedings of the 1991 ACM SIGMOD International Conference, Denver, CO, May 1991.
[26] K. Wilkinson and M.-A. Neimat. Maintaining Consistency of Client-Cached Data. In Proceedings of the 16th International Conference on Very Large Data Bases, Brisbane, Australia, August 1990.
