Trustworthy Storage and Exchange of Theorems

Jim Grundy

Turku Centre for Computer Science
TUCS Technical Reports No 1, April 1996

Turku Centre for Computer Science
Lemminkaisenkatu 14
FIN-20520 Turku
Finland
http://www.tucs.abo.

April 1996 ISBN 951-650-728-X ISSN 1239-1891

Abstract

A large effort is usually required to have a theorem prover establish a complex theorem. Having invested this effort, how can we store the result for later use, or communicate it to others while preserving our trust in its validity? This paper discusses the use of digital signatures to store and exchange theorems in a secure way.

1 Introduction

Suppose you have gone to the trouble of proving a theorem using a mechanical theorem prover. If your work is to have any enduring value, you have to be able to save your result for later use and communicate it to others. A theorem prover should allow its users to create and modify theorems only in a sound way. However, once outside the theorem prover, for example while saved on disk or being transmitted elsewhere, nothing protects a result from being modified in other ways.

At present, this problem is not really addressed in the design of mechanical theorem provers. Some theorem provers, like GETFOL [9] and Isabelle [16], cannot save individual results at all (although they can record the state of the theorem prover). Others, like HOL [10], can save results, but do so in an undocumented format. This lack of documentation makes it difficult to exchange results with other systems, but presents no real obstacle to those who would bypass the security of the prover by modifying stored results in an unsound way. Some systems, like Coq [2], even store their results using a binary data format dictated by their implementation language. While this is efficient, such results are firmly tied to the systems in which they were proved. Indeed, with such systems it is not generally possible to exchange theorems between two versions of the same tool built using different compilers. Systems that save their results in a documented format do exist, the most notable among them being IMPS [7]. The results produced by such systems are the easiest to exchange between tools, but they are also the easiest to modify.

The only truly secure method currently used to store and reuse results is to store both the theorem and its proof, and to recheck the proof before reusing the theorem. This approach, adopted by LEGO [13], can be computationally expensive. Furthermore, this technique cannot be used to share results with other systems, since proofs can usually be checked only by the system in which they were developed. Table 1 presents a summary of the methods by which various theorem provers store and reuse results.

This paper proposes a method by which a theorem prover can use digital signatures to detect any modifications made to a theorem while outside the system. Using this method it is no longer necessary to store theorems together with their proofs in order to ensure the security of the theorem prover. Furthermore, it is perfectly safe to store theorems in a documented format. These two features make the method an ideal basis for exchanging results between different proof tools.

System           Theorems Stored   Proofs Checked(1)   Format
Coq [2]          Y                 N                   B
Ergo [19]        Y                 N                   B(2)
GETFOL [9]       N(3)
HOL [10]         Y                                     U
IMPS [7]         Y                 N                   S(4)
Isabelle [16]    N(5)
LEGO [13]        Y                 Y                   D
LP [8]           Y                                     D
Mizar [17]       Y                 N                   U
Otter [15]       Y                 N                   B
ProofPower [1]   N

Formats:
S  A proposed standard for theorem exchange.
D  A nonstandard, but documented format.
U  An undocumented format.
B  A binary data format.

Notes:
(1) A blank here indicates the system does not store proofs.
(2) Theorems are stored in a binary format, but proofs are stored in a documented format.
(3) The system does not store theorems, but can log the user's commands for later replay.
(4) This standard [12] is yet to be adopted by other systems.
(5) The system does not store theorems, but can save the state of the theorem prover.

Table 1: How Theorem Provers Store Theorems

2 The Architecture of a Theorem Prover

This section describes the architecture of a theorem prover as a precursor to explaining how to extend that architecture to allow for the secure storage of results outside the system. We begin by describing the LCF-style architecture for theorem provers, named after LCF [11], the first system to be built this way. We then show how most theorem provers can be thought of as sharing this architecture, even if only in a trivial sense.

An LCF-style theorem prover is one in which the core of the system implements an abstract data-type that describes the theorems of a given logic. The axioms of the logic are constants of the data-type, while the inference rules are represented by functions that return elements of the data-type. Modus ponens, for example, would be implemented as a function which takes two theorems of the form P and P implies Q and returns a theorem of the form Q. The data-type may also include a function to make new theorems that extend the logic, for example by defining new constants. No other ways of creating a theorem are included in the signature of the data-type.

The remainder of the system, perhaps including an elaborate user interface and automatic proof procedures, can then be built around this core. The modularisation features of the implementation language are used to ensure that the functions of the abstract data-type remain the only way to produce a theorem. Users of the system are therefore free to extend the system further by building on the region outside the core. Errors made outside the abstract data-type can result only in a failure to produce a theorem, or in the production of a theorem other than the one intended; they cannot result in the production of a `nontheorem'.
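To make this concrete, the following is a minimal, hypothetical sketch of an LCF-style kernel, written here in Python for illustration. The formula type, the identity axiom, and all names are inventions of this sketch, not part of any system discussed in this paper; real LCF descendants such as HOL use the module system of ML to enforce the abstraction, which Python can only approximate by convention.

```python
# A hypothetical sketch of an LCF-style kernel. Theorem is the protected
# type; only the kernel functions below construct values of it.
from dataclasses import dataclass

@dataclass(frozen=True)
class Implies:
    antecedent: object
    consequent: object

_KERNEL_TOKEN = object()   # crude stand-in for ML's module abstraction

class Theorem:
    def __init__(self, concl, _token=None):
        if _token is not _KERNEL_TOKEN:
            raise ValueError("theorems may only be made by inference rules")
        self.concl = concl

def axiom_identity(p):
    # An axiom schema: |- p ==> p
    return Theorem(Implies(p, p), _KERNEL_TOKEN)

def modus_ponens(thm_imp, thm_ant):
    # From |- P ==> Q and |- P, conclude |- Q.
    c = thm_imp.concl
    if not isinstance(c, Implies) or c.antecedent != thm_ant.concl:
        raise ValueError("modus ponens: theorems do not match")
    return Theorem(c.consequent, _KERNEL_TOKEN)
```

Errors made by code outside this core can, at worst, raise an exception or produce an unintended theorem; they cannot manufacture a nontheorem.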

Figure 1 illustrates the architecture of such a theorem prover.

[Figure 1: The Architecture of a Theorem Prover]

A non-LCF, or what we might call monolithic, theorem prover presents itself as a closed system to its users. However, interfaces and derived proof rules can still be built for such systems by constructing them as front ends that drive the theorem prover from outside via its user interface. In this way, we can regard a monolithic theorem prover as a special case of an LCF-style system; one in which the implementation of the abstract data-type constitutes the entire theorem prover. Most systems fall somewhere between these two extremes. The abstract data-type at the heart of most LCF-style systems usually contains functions that do not logically belong there, but which have been included for efficiency reasons. On the other hand, many monolithic systems provide some form of interface language that can be used to combine and sequence their more basic commands.

Let us now return to Figure 1, but this time we will consider it as illustrating the general architecture of all theorem provers. Theorems can be created and manipulated only within the region representing the implementation of the theorem data-type. Should there exist a region that is outside of the abstract data-type, but still within the theorem prover, then the modularisation features of the implementation language must ensure that theorems can be neither created nor altered there. However, this protection cannot extend outside the theorem prover, and accordingly the diagram shows no method of importing or exporting theorems. The ability to import a theorem would require the addition of a `back door' to the theorem data-type that would allow a theorem to be created based on evidence supplied from outside the system. The next section discusses the notion of digital signatures, a technique which can be used to put a lock on this back door, and thereby ensure that only theorems that have previously been proved may enter by this route.

3 Digital Signatures

Digital signatures provide a way for the sender of a message to electronically `sign' it so that a recipient can verify who sent the message and that its content has not changed since being signed. This concept was originally proposed by Diffie and Hellman [6]. The two main components of such a method are a secure hash algorithm and an asymmetric, or public key, cryptographic system.

A secure hash algorithm, also known as a message digest function, is an algorithm that creates a hash code summarising the content of a message. It must be `secure' in the sense that, given some message, it is computationally infeasible to find a second message that would produce the same hash code. An asymmetric cryptographic system consists of a method for generating a pair of related cryptographic keys, known as a public key and a private key, together with a pair of algorithms for encrypting and decrypting messages using a key. The encryption algorithm should have the property that it is not computationally feasible to decrypt a message encrypted using a particular key without knowledge of its related key. It should also not be feasible to deduce a private key, even given knowledge of its corresponding public key.

Users of an asymmetric cryptographic system each generate a pair of keys. They keep the knowledge of their private key a secret, but share the knowledge of their public key. One user can send a secret message to another by encrypting it with the recipient's public key. The recipient, as the sole possessor of the corresponding private key, is the only person who can decrypt the message. Alternatively, a user can encrypt a message with their own private key. The recipient of the message can then decrypt it using the sender's public key. This imparts no secrecy to the message; however, the recipient can be certain of who sent it, namely the user who is the sole possessor of the matching private key.

To digitally sign a message, a user first computes its secure hash code. The hash code is then encrypted with the user's private key. The result is a digital signature for the message. Any recipient of the message and its signature can then check its authenticity as follows. By decrypting the signature using the sender's public key, the recipient gains a hash code for the message. The recipient also independently computes the hash code of the message. If the two codes match, then the recipient knows that the message was indeed signed by the sender, and that it has not changed since being signed. Note that Diffie and Hellman's original description of digital signatures does not need a hash algorithm. For the interested reader, Denning [4] gives one of the earlier descriptions of the use of hash algorithms to increase the security of a signature while decreasing its size.
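As an illustration of the hash-then-sign scheme just described, here is a small sketch using Python's hashlib together with the third-party cryptography package. The choice of RSA with SHA-256 is an assumption of this sketch; the paper prescribes no particular algorithms.

```python
# Sketch of hash-then-sign: hash the message, then sign the hash with the
# private key; the recipient recomputes the hash and verifies the signature.
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa, utils

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"|- P /\\ Q ==> Q /\\ P"

# Step 1: compute the secure hash code of the message.
digest = hashlib.sha256(message).digest()

# Step 2: 'encrypt' the hash code with the private key, i.e. sign it.
signature = private_key.sign(
    digest, padding.PKCS1v15(), utils.Prehashed(hashes.SHA256()))

# The recipient independently recomputes the hash and checks the signature.
try:
    public_key.verify(
        signature, hashlib.sha256(message).digest(),
        padding.PKCS1v15(), utils.Prehashed(hashes.SHA256()))
    print("signature valid: message unchanged since signing")
except InvalidSignature:
    print("signature invalid: message was modified")
```

Any change to the message, however small, changes its hash code and so invalidates the signature.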

4 An Extended Architecture

Consider the architecture for a theorem prover described in Figure 1. Suppose we extend the implementation of the abstract data-type of theorems with the hashing and encryption functions necessary to implement a system of digital signatures. Suppose also that, when a theorem prover is being built, a random pair of cryptographic keys is generated and stored within the implementation of the theorem data-type. The signature of the data-type could then be extended to include the following functions, sketched in code after this list:

- A function to return the public key.

- A function which takes a theorem, represents it as a string, and then signs it with the private key.

- A function which, if given a signed string that represents a theorem, where the signature matches the public key, creates the corresponding theorem.
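Continuing the hypothetical Python kernel sketched in Section 2, the three functions might be realised as follows. The serialisation via repr and the parse step are placeholders, and the cryptography package is again an assumption of the sketch, not of the paper.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Generated once, when the prover is built; never revealed outside the kernel.
_PRIVATE_KEY = rsa.generate_private_key(public_exponent=65537, key_size=2048)

def public_key():
    # 1. Return the public key.
    return _PRIVATE_KEY.public_key()

def export_theorem(thm):
    # 2. Represent the theorem as a string and sign it with the private key.
    text = repr(thm.concl).encode()   # placeholder string representation
    signature = _PRIVATE_KEY.sign(text, padding.PKCS1v15(), hashes.SHA256())
    return text, signature

def import_theorem(text, signature, key=None):
    # 3. The locked back door: recreate a theorem only if the signature
    # verifies against our public key (raises InvalidSignature otherwise).
    (key or public_key()).verify(signature, text,
                                 padding.PKCS1v15(), hashes.SHA256())
    # A real kernel would parse 'text' back into a formula here.
    return Theorem(text.decode(), _KERNEL_TOKEN)
```

The optional key parameter of import_theorem anticipates the generalisation to other trusted provers discussed in Section 4.1.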

This extended architecture is shown in Figure 2. A theorem proved in such a system can be converted into a signed string, and stored or transmitted outside the system. In the external environment, strings are not protected from modification. However, any changes to a string will be rejected at the first attempt to convert it back into a theorem.

[Figure 2: Extended Architecture for Secure Storage of Theorems]

Note that, for ease of illustration, Figure 2 shows signed theorems moving directly between the implementation of the theorem data-type and the external environment. However, the routines that actually load, store, or transmit a signed theorem would likely be implemented outside of the abstract data-type, since it is not necessary for them to be trusted.

4.1 Trusting Other Theorem Provers

A theorem prover using the architecture described so far will only accept results that it has previously proved itself. In general this is a good thing: other sites may have made local, possibly unsound, modifications to their copy of a theorem prover. However, there may be some sites whose theorem provers we can trust, and it would be useful to accept results produced by those systems as well.

An obvious generalisation of the proposed system is to maintain a set of public keys of other trusted theorem provers. A theorem could then be created for a string signed by any trusted system. Note that the representation of a theorem should be similarly extended to include the public keys of all the systems on which its validity may depend. We may wish to trust some other theorem prover without having to trust all the systems it trusts as well.
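In terms of the running sketch, this generalisation is a small change: keep a set of trusted public keys and try each one. As before, the names are hypothetical.

```python
from cryptography.exceptions import InvalidSignature

# Public keys of all theorem provers we are prepared to trust.
TRUSTED_KEYS = [public_key()]   # may grow to include other provers' keys

def import_theorem_from(text, signature):
    # Accept the signed string if any trusted prover signed it.
    for key in TRUSTED_KEYS:
        try:
            return import_theorem(text, signature, key)
        except InvalidSignature:
            continue
    raise ValueError("string not signed by any trusted theorem prover")
```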

4.2 A Possibility for Attack

The security of a theorem prover, and of the other provers that trust it, can be compromised if its private key is disclosed. The modularisation features of the language in which the prover is implemented should prevent this. However, a determined attacker could find the information by directly inspecting the object code. This can be prevented if the theorem prover is run on a remote site and communication to that site is restricted to issuing commands to the theorem prover.

4.3 A Complication with Theories

Most theorem provers allow their logic to be extended by expanding its signature with new constants and operators, and by asserting new axioms. In such systems it is not sufficient to save a theorem without recording the extensions that were in force when it was proved. This is necessary to ensure that the theorem cannot be reused unless the same extensions are in force again. Logical extensions are usually collected into related groups and organised into a hierarchy of theories, where a theory is composed of a set of signature descriptions for the new constants and operators, a set of new axioms, and a set of parent theories containing further extensions on which the current theory may depend. Some systems, like HOL [10], require a proof that each new axiom results in a sound extension of the logic.

Theories, then, should also be protected with digital signatures when stored outside the theorem prover. The external representation of a theory can be built up from strings representing its signature descriptions and axioms, and the identity of its parent theories, which can be represented by their hash codes. Finally, in a system where theories are also protected with digital signatures, the representation of an individual theorem need not record all the logical extensions on which it may depend. The same result can be achieved by including the hash code of the theory in force when it was proved.
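One way to realise such hash-code identities is to hash a theory's signature descriptions and axioms together with the hash codes of its parents, in the style of a hash chain. A hypothetical sketch, with an invented notation for the example theories:

```python
import hashlib

def theory_hash(signature_descriptions, axioms, parent_hashes):
    # A theory's identity covers its own extensions and, transitively
    # through the parent hashes, every extension it depends on.
    h = hashlib.sha256()
    for item in sorted(signature_descriptions) + sorted(axioms):
        h.update(item.encode())
        h.update(b"\x00")            # separator keeps adjacent fields distinct
    for parent in sorted(parent_hashes):
        h.update(parent)             # parents are identified by hash code
    return h.digest()

# A theorem then records only the hash of the theory in force when proved.
base = theory_hash(["bool", "==>"], ["|- T"], [])
arith = theory_hash(["num", "SUC"], ["|- !n. ~(SUC n = 0)"], [base])
```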

5 Potential Applications

The use of digital signatures to secure the storage and exchange of theorems may open up new ways to exploit theorem proving results. The remainder of the paper is given over to considering potential future applications of digitally signed theorems.

5.1 Exchanging Results Between Different Theorem Provers

The concept of allowing different theorem provers to cooperate in the production of a proof is a subject of increasing interest [3, 5, 18]. A large proof can often be completed more easily using a collection of proof tools than by any one tool alone. Using digital signatures to ensure the integrity of theorems would allow proof tools that share the same logic to adopt a common documented format for the storage and exchange of theorems without compromising their security. The set of systems that a particular theorem prover is prepared to trust could then include provers of different types. Theorem provers would then be able to import results from other systems as easily and securely as they can reuse their own. Note that since only the results need to be shared, and not their proofs, secure cooperation is possible even between tools based on different proof systems.
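What such a common documented format might look like is suggested below: a hypothetical, self-describing envelope carrying the theorem, the hash code of its theory, and the signature. All field names are inventions of this sketch; the paper does not fix a concrete format.

```python
import base64, json

def exchange_envelope(text, signature, theory_hash_code):
    # A documented format can be read by any cooperating prover; the
    # signature, not obscurity, is what protects the theorem's integrity.
    return json.dumps({
        "theorem":     text.decode(),
        "theory-hash": base64.b64encode(theory_hash_code).decode(),
        "signature":   base64.b64encode(signature).decode(),
    }, indent=2)
```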

5.2 Version Control of Theorem Provers

A special case of exchanging results between theorem provers is that of exchange between different versions of the same prover. From time to time, new versions of almost all theorem provers are released. As with any product, users upgrading to a new version usually expect to keep all the work they have done with the old one. This could be achieved by adding the public key of the user's old version of the theorem prover to the list of systems trusted by their new one.

From time to time, however, new versions of a theorem prover are released to correct logical errors discovered in the system. An example of this came in 1993, when Joakim von Wright found a bug in the type instantiation rule of version 2.01 of the HOL system. This bug was subsequently exploited by Richard Boulton to construct a `proof' of false. Version 2.02 of HOL corrected this bug. In situations like this, some sites upgrading to the new version might prefer to recheck their old results rather than trust them implicitly. Such sites would therefore not add the public key of any old versions of their system to the list trusted by their new one.

5.3 Trustworthy Products from Untrustworthy Suppliers

A customer looking to buy a critical product would like their vendor to supply both the product and evidence that it has been proved to meet its specification. What form should that evidence take? A simple statement that the required theorem has been proved is clearly not sufficient; the vendor could make such a statement without actually doing the proof. A signed theorem produced by the vendor's theorem prover is no more useful; the vendor may have a `theorem prover' that will sign anything they want. Having the vendor supply the customer with a proof would appear to be the answer, but this requires the customer to be able to check the proof. The customer would have to acquire their own copy of a trusted theorem prover, and learn how to use it, a potentially significant investment. The problem can be solved if a disinterested third party with a trusted theorem prover can be found, perhaps an independent proof certification authority. The customer could then turn the proof over to the third party for checking. If the proof is genuine, then the customer can be supplied with the evidence required in the form of a theorem signed by the third party's theorem prover.

5.4 Online Reference Books

Working mathematicians do not usually solve problems from first principles, or even just using results they have personally proved before. Mathematicians also accept theorems they find in various respected reference books. The productivity of a proof tool might be enhanced if it possessed a similar ability to accept theorems found in respected online databases. An online database of theorems could be built up by researchers submitting proofs to a generally trusted theorem prover. If a proof is genuine, then the resulting signed theorem can be added to the database. The signature on a theorem allows a tool to distinguish the theorems it finds that have genuinely been proved from those that have not.

5.5 Theorems in Hypertext Journals

While mathematicians will trust a theorem found in a respected reference book, theorems in journal articles require a little more caution. There are numerous examples of published theorems, often complete with proofs, that are not valid. Trust in a theorem can only be gained by proving it, or by checking the proof presented in the article. This may require a significant effort on the part of the reader, and is not always straightforward, as such proofs are rarely presented in detail. The advent of hypertext journals like The Journal of Universal Computer Science [14] may help improve things. Articles published in such journals could include not only the statement of a theorem and a sketch of its proof, but also a signature from a trusted theorem prover. Verifying the signature would give the reader confidence in the theorem without needing to check its proof. Furthermore, such theorems could be saved for future use in the reader's own proofs. The hope that one day a large proportion of published mathematics could be mechanically checked is not necessarily a forlorn one. Another hypertext journal, The Journal of Formalized Mathematics, has already presented a great deal of work checked using the MIZAR [17] proof system.

Acknowledgments

I would like to thank the following people who were kind enough to read and comment on earlier drafts of this paper: John Harrison, Emil Sekerinski, and Joakim von Wright. In addition, I would like to thank Arto Salomaa and Cunsheng Ding for explaining various issues of cryptography and hashing to me, and Konrad Slind for prompting me to consider the problems of version control for theorem provers. Where I have been wise enough to follow their advice, the comments of these people have helped me greatly to improve the paper. Any remaining flaws are, of course, my own doing.

References

[1] R. D. Arthan. HOL formalized: Formal design of the logical kernel. Specification Document ISS/HAT/DAZ/SPC001, issue 2.5, ICL Secure Systems, Eskdale Road, Berkshire RG11 5TT, England, Mar. 1996.

[2] C. Cornes, J. Courant, J.-C. Filliâtre, G. Huet, P. Manoury, C. Muñoz, C. Murthy, C. Parent, C. Paulin-Mohring, A. Saïbi, and B. Werner. The Coq Proof Assistant Reference Manual. INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 Le Chesnay Cedex, France, 5.10 edition, July 1995.

[3] B. I. Dahn. Cooperation of automated and interactive theorem provers. In R. Matuszewski, editor, The QED Workshop II, number L/1/95 in Technical Reports, pages 31-32, Department of Logic, Warsaw University, Linuarskiego 4, 15-420 Bialystok, Poland, 20-22 July 1995. State Committee for Scientific Research.

[4] D. E. Denning. Digital signatures with RSA and other public-key cryptosystems. Communications of the ACM, 27(4):388-392, Apr. 1984.

[5] J. Denzinger and M. Fuchs. Goal oriented equational theorem proving using team work. In B. Nebel and L. Dreschler-Fischer, editors, KI-94: Advances in Artificial Intelligence, Proceedings of the 18th German Annual Conference on Artificial Intelligence, volume 861 of Lecture Notes in Artificial Intelligence, pages 343-354, Saarbruecken, Germany, 18-23 Sept. 1994. Springer-Verlag, Berlin.

[6] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22(6):644-654, Nov. 1976.

[7] W. M. Farmer, J. D. Guttman, and F. J. Thayer. IMPS: An interactive mathematical proof system. Journal of Automated Reasoning, 11(2):213-248, Oct. 1993.

[8] S. J. Garland and J. V. Guttag. A guide to LP, the Larch prover. Research Report 82, Digital Equipment Corporation, Systems Research Center, 130 Lytton Avenue, Palo Alto CA 94301, United States, Dec. 1991.

[9] F. Giunchiglia. GETFOL manual. Technical Report 9107-01, DIST, University of Genoa, Via Opera Pia 11A, 16146 Genova, Italy, Mar. 1994.

[10] M. J. C. Gordon and T. F. Melham, editors. Introduction to HOL: A theorem proving environment for higher order logic. Cambridge University Press, Cambridge, England, 1993.

[11] M. J. C. Gordon, R. Milner, and C. P. Wadsworth. Edinburgh LCF, volume 78 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1979.

[12] J. D. Guttman. A proposed interface logic for verification environments. Technical Report M91-19, The MITRE Corporation, 202 Burlington Road, Bedford MA 01730-1420, United States, Mar. 1991.

[13] Z. Luo and R. Pollack. LEGO proof development system: User's manual. LFCS Report ECS-LFCS-92-211, Laboratory for Foundations of Computer Science, Department of Computer Science, University of Edinburgh, James Clerk Maxwell Building, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, Scotland, May 1992.

[14] M. Maurer and K. Schmaranz. J.UCS: The next generation of electronic journal publishing. Journal of Universal Computer Science, 0(0):117-126, Nov. 1994.

[15] W. W. McCune. Otter reference manual and guide. Technical Report ANL-94/6, Argonne National Laboratory, Mathematics & Computer Science Division, 9700 South Cass Avenue, Argonne IL 60439-4801, United States, Jan. 1994.

[16] L. C. Paulson. Isabelle: A Generic Theorem Prover, volume 828 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1994.

[17] A. Trybulec and H. A. Blair. Computer aided reasoning. In R. Parikh, editor, Logic of Programs: Proceedings, volume 193 of Lecture Notes in Computer Science, pages 406-412, Brooklyn, New York, 17-19 June 1985. Springer-Verlag, Berlin.

[18] TTCP XTP-1. Proceedings of the Workshop on Effective Use of Automated Reasoning Technology in System Development. Naval Research Laboratory, 4555 Overlook Avenue SW, Washington DC 20375-5337, United States, 6-7 Apr. 1992.

[19] M. Utting and K. Whitwell. Ergo user manual. Technical Report 93-19, Department of Computer Science, University of Queensland, QLD 4072, Australia, Oct. 1993.

