A Language Multidatabase System Communication Protocol*

Omran Bukhres
Purdue University, Department of Computer Sciences, West Lafayette, IN 47907, USA
bukhres@purdue.cs.edu

eva Kühn, Franz Puntigam
University of Technology Vienna, Institute of Computer Languages, Argentinierstr. 8, 1040 Vienna, Austria
{eva,franz}@nips.complang.tuwien.ac.at

Abstract

Rapid growth in the area of Multidatabase Systems (MDBSs), which involve both access to global data and distributed transaction processing, has created a need for programming languages that provide reliable communication and powerful synchronization. In this paper, we first present the requirements of MDBSs, then explain the VPL (Vienna Parallel Logic) programming language and its features, and finally illustrate how to realize an MDBS communication protocol in this language. VPL is a concurrent logic language that also provides features of both distributed operating systems and database management systems. These features combine to support the communication and synchronization required by distributed transaction processing. VPL is suitable for use as a general-purpose distributed programming and coordination language.

*This work is supported by the Austrian FWF (Fonds zur Förderung der wissenschaftlichen Forschung), project “Multidatabase Transaction Processing”, contract number P9020PHY, in cooperation with the NSF.

1063-6382/93 $03.00 © 1993 IEEE

1 Introduction

The rapid globalization of both the business and the scientific communities is creating an increasing demand for distributed computing systems. While there is no consensus on the definition of a distributed computing system, [1] proposes the following definition:

A distributed computing system consists of multiple autonomous processors that do not share primary memory, but cooperate by sending messages over a communication network.

Factors that make distributed systems desirable include the following: decreased turnaround time for a single computation; increased reliability and availability of the system; functional specialization; and accommodation of the inherent distribution of some applications [1]. A primary motivation for distributed computing is the need to increase the execution speed of high-performance applications. This increase is achieved through parallelism, which involves breaking a process (computation) into independent subtasks that are then executed simultaneously on different local processors. A second important motivation for distributed systems is improved system availability and reliability. Availability increases because a system with multiple processors can continue to provide service when one of them is unavailable. Reliability improves through the partial failure property: the failure of one processor does not affect the functions of the other processors in the system. This property, combined with appropriate replication of functions and/or data, increases the reliability of the distributed computing system. Functional specialization is another motivation for distributed systems. It is achieved by structuring applications, such as a distributed operating system, as a collection of specialized services (e.g., file service, print service, process service) and then implementing these services on a distributed system in such a way that each service can use one or more processors, as required. The final motivation discussed here is the accommodation of applications that are inherently distributed, such as electronic mail or the inventory tracking of a retail firm with several branch locations. In these instances, a distributed system is required if users at different sites are to use the application. These factors, among many others, substantiate the need for distributed computing systems.

In recent years, many distributed computing systems have been proposed and implemented. The development of distributed systems has produced three main issues to be considered in programming a distributed system: parallelism, communication and synchronization, and partial failure recovery. The issues of parallelism and partial failure recovery were discussed above. The remaining issue, communication and synchronization, addresses the two types of interaction involved in the cooperation required between the processes of a program running in parallel on different processors. Communication and synchronization are closely related. Communication refers to the means by which active processes exchange data with related processes, whereas synchronization refers to the mechanisms that determine when interrelated processes are active and when they must wait. In general, communication between distributed processes is achieved by either message passing or data sharing. There are numerous variations of message passing involving either one-way or two-way communication between two or more processes; examples include synchronous, asynchronous, rendezvous, remote procedure call, and one-to-many message passing. Communication by data sharing occurs when two processes have access to the same variable: one process sets the variable and the other reads it. Synchronization may take place at either the global or the local level. Global synchronization can be implemented by languages using atomic transactions. At the local level, the local processor controls access to its data by other processors.

A logical extension of the distributed computing system is the Multidatabase System (MDBS) [16, 17], a collection of local database systems that not only have different geographical locations, but also differ in data models, language interfaces, transaction models, and other important factors. A suitable MDBS should provide the high degree of flexibility and adaptability required to accommodate local systems with all these differences.

The requirements given above may be supported by a distributed operating system [19] or by the use of a distributed programming language [3] to implement the system. Support by a distributed operating system involves operating system primitives that are called by applications programmed in an extended sequential language. Often the data types and control structures provided by the sequential language are not adequate for the demands of distributed programming. For example, the communication requirements of complex queries across the distributed system are tedious to implement with the library calls available in a sequential system. On the other hand, a special language for distributed programming can present communication at a higher level of abstraction than the message passing models supported by most distributed operating systems. In addition, a distributed programming language provides improved readability, portability, and type checking.

In this paper we use such a language, VPL (Vienna Parallel Logic) [14], to implement communication in MDBSs. The language is based on a logic programming paradigm that allows both sequential and parallel execution to be expressed. The major advantages of using VPL are its built-in high-level communication and synchronization operators [12, 13]. VPL provides controlled parallelism and support for shared data objects, as well as concurrency and backtracking through compensation actions. In Section 2 we discuss the requirements of MDBSs as a motivation for VPL. Section 3 introduces VPL and its features. Section 4 indicates how VPL might be employed to implement an MDBS.

2 Multidatabase Systems

In a Multidatabase System (MDBS) environment, the heterogeneity of local systems prevents assumptions about the software systems, the language interfaces, the data and transaction models, the correctness criteria, and the access rights supported by the local systems. To be successful, an MDBS must be very flexible and adaptable in dealing with local systems. In general, the heterogeneity of local systems prohibits making a priori assumptions about the following characteristics:

Software system. MDBSs were introduced to make (relational) database systems interoperable [16]. However, we believe that in its most general form, an MDBS should be able to connect all kinds of software systems, since the problem of connecting existing software exists in many areas [18].

Language interface. If the local systems are relational databases, SQL can be assumed as the interface language, and automatic translation from the language of the MDBS into SQL can be provided. In general, however, the language of an arbitrary software system is not known to the MDBS, nor is the MDBS aware of the semantics of a statement given in a certain language. Thus the degree to which the MDBS is able to reason about a local transaction is severely restricted. In the extreme case, the MDBS might only pass uninterpreted statements to and from the local system.
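The difference between a known and an unknown interface language can be illustrated with a small sketch. It assumes a hypothetical adapter interface, `LocalSystem` with methods `submit` and `may_write`, between the MDBS and each local system; these names, the `forward` helper, and the write log are illustrative inventions, not part of the paper. Python's built-in `sqlite3` module stands in for a local relational database. Because that system's language is known to be SQL, the MDBS can inspect its statements (here, crudely, treating anything that does not start with SELECT as a possible write). For a system whose language is unknown, statements are passed through uninterpreted and every one must conservatively be treated as a possible write.

```python
import sqlite3
from typing import Callable, List, Protocol


class LocalSystem(Protocol):
    """Hypothetical adapter interface between the MDBS and one local system."""

    def submit(self, statement: str) -> str: ...

    def may_write(self, statement: str) -> bool: ...


class RelationalSystem:
    """Local relational database; SQL is known to be its interface language."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")

    def submit(self, statement: str) -> str:
        rows = self.conn.execute(statement).fetchall()
        self.conn.commit()
        return repr(rows)

    def may_write(self, statement: str) -> bool:
        # The language is known, so the statement can be inspected.
        # Simplification: only statements starting with SELECT count as reads.
        return not statement.lstrip().upper().startswith("SELECT")


class OpaqueSystem:
    """Local system (e.g. a command-line tool) whose language the MDBS does not know."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self.handler = handler

    def submit(self, statement: str) -> str:
        # The statement is passed to the local system uninterpreted.
        return self.handler(statement)

    def may_write(self, statement: str) -> bool:
        # Nothing can be inferred about the statement, so assume the worst case.
        return True


def forward(system: LocalSystem, statement: str, write_log: List[str]) -> str:
    """Hypothetical MDBS step: record statements that may have side effects, then forward."""
    if system.may_write(statement):
        write_log.append(statement)
    return system.submit(statement)


if __name__ == "__main__":
    log: List[str] = []
    db = RelationalSystem()
    forward(db, "CREATE TABLE stock (item TEXT, qty INTEGER)", log)
    forward(db, "INSERT INTO stock VALUES ('bolt', 40)", log)
    print(forward(db, "SELECT qty FROM stock WHERE item = 'bolt'", log))  # [(40,)]
    tool = OpaqueSystem(lambda s: f"ran: {s}")
    print(forward(tool, "sort inventory.txt", log))  # ran: sort inventory.txt
    print(len(log))  # 3: CREATE, INSERT and the opaque command; the SELECT is not logged
```

Treating every non-SELECT statement as a write is deliberately conservative. The sketch only shows that the MDBS can draw such conclusions when it knows the local language, and cannot when it merely forwards uninterpreted statements.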

Data model. Even if local systems are relational databases, their integration (whether by global integration or through dynamic MDBS languages) is no simple task [11]. A global data model that is able to represent all possible data and queries will usually not exist.

Transaction model. Local databases are autonomous in their use of transaction models. Often only serializable schedules are produced, and sometimes strict or rigorous schedules. However, common software such as the UNIX™ utilities provides no transaction model at all.

3 The VPL Language

The VPL programming language is based on Prolog. It supports both concurrency and the exploration of several search paths: backtracking is extended so that non-committed side effects can be undone and committed side effects can be compensated. Communication and synchronization are supported by communication variables. A VPL program is a (finite) set of procedures, each consisting of several clauses:

Head :- Body1.
Head :- Body2.

The above two clauses represent a procedure with sequential clause search. To prove a call (i.e., a query) to Head, clause 1 is tried first; if it fails, clause 2 is tried. Clauses form a procedure with parallel clause search if “::-” is used instead of “:-”. In that case, all clauses are started in parallel to prove Head, and the first one to complete is the clause that is taken (OR-parallelism). Bodyi consists of goals (i.e., subqueries) connected by “&” (sequential AND operator) or “&&” (parallel AND operator). If Bodyi is defined by (G1 && G2) & G3, the VPL system starts G1 and G2 in parallel. If both succeed, G3 is started. If G1 or G2 fails, the other goal is aborted and G3 is not started (AND-parallelism). If neither sequential nor parallel execution is required, the neutral operators “and” and “