2013 Seventh International Conference on Complex, Intelligent, and Software Intensive Systems

Automated Conformance Testing of Java Virtual Machines

Andrea Calvagna, Emiliano Tramontana
Dipartimento di Matematica e Informatica, University of Catania - Italy
Email: {calvagna, tramontana}@cs.unict.it

Abstract—We present a technique to fully automate the conformance testing of a Java virtual machine (JVM) implementation against the structural constraints it must satisfy to enforce type safety of program execution. The approach is based on a formal model of the JVM as a finite state machine, ruled by the Java standard specifications. The model is used to derive a test suite and a corresponding oracle that systematically explore the space of illegal states reachable by a JVM implementation under test. Moreover, the degree of conformance to the JVM specifications (i.e., too strict or too coarse) can be assessed by counting the number of false positives. Despite the huge test space, the entire proposed process requires no human supervision. The technique is black box, fully automated, and can be applied to validate final products or during development, e.g., for debugging purposes.

Keywords—Software engineering; software testing; formal methods; Java

I. INTRODUCTION

The Java computing environment has largely established itself as one of the main standards for Internet computing, ranging today from enterprise applications to applications embedded in small devices. Its success is also due to the special attention paid to the design of the platform's security features, which are essential in an open computing environment in order to guarantee the safety of running programs and to protect user data. This is even more true for widespread small and smart devices, such as smart cards, phones, and tablets, which we connect every day to the ubiquitous public Internet to access a vast range of personal services, such as financial services. The official Java computing platforms developed by Oracle are indeed bullet-proof implementations of the JVM specifications, which in turn have already been formally proved secure thanks to the substantial research effort spent on their formal design over the last decades [1], [2], [3]. These official versions are released after the vast amount of testing and verification work that a company of the size of Oracle can afford. Note also that they are freely distributed only for non-commercial purposes, and they cannot be used as a basis for the above-mentioned and widespread smart-card-based applications and services. Nevertheless, a non-official JVM implementation at the core of such Java-based embedded runtime systems, developed in-house in order to comply with budget constraints, may contain incorrectly implemented or untested code, compromising the safety features of the system. Therefore, it is highly desirable to have an effective and automatic technique to validate any JVM implementation and so assess its reliability, specifically in terms of its conformance to one very important aspect of the specifications: the type safety of executed programs. The main advantage of an automated approach to validation is that it does not need direct human involvement, which would be both error prone and very hard and time consuming.

The paper is structured as follows. Section II provides an outline of our proposed solution. Section III describes the test generation approach step by step, giving details on both the formal modeling of the JVM specifications and how the test sequences are computed combinatorially. A discussion of an early assessment of the approach is presented in Section IV. Section V provides a comparison with relevant related work, and our conclusions are drawn in Section VI.

II. OVERVIEW

JVM validation by means of testing is a complex and time consuming task due to the huge number of possible internal states, execution paths, and inputs of the system under test. It is therefore hard to find defects that may exist within an implementation of such a large and complex system [4]. Specifically, errors in the implementation of the structural constraints of the Java virtual machine specifications are the most dangerous, because they result in potentially serious weaknesses of the security features of the execution environment, that is, a serious threat to the user's data privacy. We present a fully automated technique to assess a JVM implementation's conformance to the structural constraints of the official specifications, which are the constraints carrying the static security features of the Java computing platform.

The Java language instruction set includes roughly thirty statically typed opcodes of the generic format T-opcode, where T ranges over the eight (basic or reference) supported types listed in Table I. That is, there is a specialized version of each instruction for every supported type of operands. Moreover, many instructions require one or two immediate operands to be specified, such as an index ranging over the pool of local registers, or a start location for an array. Finally, they all require up to three parameters of a given type to be set on top of the operand stack before the opcode execution. As a consequence, a safety violation of the JVM structural constraints may be the result of a wrong operand stack typing setup, or of a stack with inconsistent height with respect to different control flows. Moreover, a safety violation could be caused by a stack with fewer parameters than expected, or without room for pushing the opcode execution results.
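As a simple illustration, consider the following Java method; the comments sketch the int-specialized bytecode produced by a typical compiler and its effect on the typed operand stack (the listing is illustrative, and the exact bytecode may vary with compiler and options):

// Illustrative example (not taken from the specification text): a method whose body
// compiles to int-specialized opcodes.
class SumExample {
    static int sum(int a, int b) {
        return a + b;
    }
    // Typical compiled bytecode of sum (javap -c), with the typed operand stack after each step:
    //   iload_0   // push the int in local register 0          stack: [int]
    //   iload_1   // push the int in local register 1          stack: [int, int]
    //   iadd      // pop two ints, push their sum as an int    stack: [int]
    //   ireturn   // pop an int and return it                  stack: []
}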

Table I
EXAMPLES OF TYPE SUPPORT IN THE JAVA VIRTUAL MACHINE INSTRUCTION SET FOR SOME COMMON OPCODES

opcode | byte | short | int | long | float | double | char | reference

As a result, the space of possible wrong typing situations that the JVM might have to deal with has a size in the order of opcodes × types × immediate operands × operand stack frames. Even if the maximum stack height needed during execution of an average Java method is typically low, i.e., about five or fewer, and the immediate operands of any instruction are two at most, the resulting figure is large enough to discourage any user from manually pursuing a coverage-based testing approach to JVM validation. When considering ten types, ten instructions, each receiving at most three operands, and four positions for the operand stack, all the possible states are in the order of thousands of millions.
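For instance, if each of the three operands is assumed to range over about ten representative values and each of the four stack positions over the ten types, this amounts to roughly 10 instructions × 10 types × 10^3 operand combinations × 10^4 stack typings ≈ 10^9 states.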

Note that the combinatorial space is not simply flat, since not all of these combinations are feasible in an actual Java program: only some combinations are compatible with the language syntax rules. In fact, from the Java specifications it is clear that each opcode may support a different subset of the T typing options, and each of these in turn a different number of immediate and/or stacked operands. These differences could be modeled as constraints in a combinatorial model, but that would still require quite a bit of manual effort. Instead, we observe that a different solution is to enumerate all flat combinations of pairs of opcodes, as if they were scheduled for execution one after the other. In that case, the first opcode is in fact responsible for setting up the operand stack as required for the correct (or incorrect) execution of the following opcode. As a consequence, by purposely selecting just the pairs that trigger an unsafe opcode invocation we indirectly obtain a set of tests that is expected to trigger all actually realizable type errors that any correctly implemented JVM should be able to reveal. This validation technique is thus based on the combinatorial exploration of the space of illegal execution states of a JVM. The test suite creation process is completely automated, so that the large size of the test suite is not a problem except for the computing resources it consumes.

For each of these unsafe/illegal execution states, a triggering byte code test sequence is derived by means of a JVM formal model, as explained in the following sections. The JVM model also tells us how to judge the test result after its execution, thus enabling the automation of the test evaluation as well.

III. TEST SUITE GENERATION PROCESS

The test suite generation process is composed of several steps, whose dependencies are depicted in Figure 1 and described in the following.

Figure 1. Overall test suite generation and application process overview.

A. Opcodes combinatorial enumeration

A representative subset of the JVM opcodes has been modeled as template strings of the generic format <type><opcode> or <type><opcode> <index>, where <index> is an integer index ranging over the JVM local registers array, and Table II lists the currently considered options for the <opcode> class and its type specialization <type>. All modeled opcodes are enumerated with respect to their type variants and a range of representative immediate operand values, by means of a combinatorial tool. Specifically, the ttuples [5] tool has been used for this purpose. This tool allows easy modeling of a set of generic options and their values as abstract input domains [6], [7], possibly subject to constraints expressed in propositional first order logic. Of course, we are interested only in enumerating opcodes that comply with the JVM language syntactic rules. Thus, the combinatorial model includes a set of basic constraints that rule out syntactically malformed opcode expressions from the enumeration. The enumerated opcodes are also combined with themselves in pairs, and the resulting pairs list is saved to be used later on in the test suite generation process.

Table II
CURRENTLY SUPPORTED OPCODE TEMPLATES AND STATIC TYPE SPECIALIZATIONS

opcodes: {load, store, pop, push, binop, swap, new, getfield, putfield, invoke, return}
types: {byte, short, int, long, float, double, char, reference}
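The enumeration and pairing step can be pictured with the following Java sketch; it is not the ttuples tool, and the representative register indices and the rule for which opcode classes take an <index> are assumptions made for illustration only:

import java.util.ArrayList;
import java.util.List;

// Minimal sketch (not the ttuples tool): flat enumeration of the opcode templates
// of Table II and of all their pairs, as used later by the model checking step.
public class OpcodeEnumerator {
    static final String[] OPCODES = {"load", "store", "pop", "push", "binop",
            "swap", "new", "getfield", "putfield", "invoke", "return"};
    static final String[] TYPES = {"byte", "short", "int", "long",
            "float", "double", "char", "reference"};
    static final int[] REGISTER_INDICES = {0, 1, 2};   // representative <index> values (assumption)

    public static void main(String[] args) {
        List<String> templates = new ArrayList<>();
        for (String op : OPCODES) {
            for (String type : TYPES) {
                // A real model would add syntactic constraints here, e.g. ruling out
                // type specializations that a given opcode class does not support.
                if (needsRegisterIndex(op)) {
                    for (int idx : REGISTER_INDICES) {
                        templates.add(type + op + "_" + idx);   // <type><opcode> <index>
                    }
                } else {
                    templates.add(type + op);                   // <type><opcode>
                }
            }
        }
        // Pair every enumerated opcode with every other: the first element of a pair
        // will provide the header (stack setup), the second the trailer under test.
        List<String[]> pairs = new ArrayList<>();
        for (String first : templates) {
            for (String second : templates) {
                pairs.add(new String[]{first, second});
            }
        }
        System.out.println(templates.size() + " opcode templates, " + pairs.size() + " pairs");
    }

    // Assumption for this sketch: only load/store address a local register directly.
    static boolean needsRegisterIndex(String op) {
        return op.equals("load") || op.equals("store");
    }
}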

B. Model checking

A JVM formal model has been realized as a finite state machine (FSM) in NuSMV [8] in order to simulate the operation of a Java virtual machine reference implementation. Of course, only the required core of the JVM has been modeled, that is, the JVM components needed to represent a correctly operating implementation of the static semantic rules of the opcodes. Specifically, these safety rules are modeled as invariants. As an example, to model the safety rule for the load byte code instruction, an invariant has been defined as follows:

load ⟹ ¬stackFull ∧ subtype(local[n], stack[top])

where local[n] is the type of the n-th local register content, stack[top] is the top element of the abstract operand stack, and subtype is a boolean relation returning true if the first operand is a known subtype of the second operand. The model checker is invoked on each enumerated opcode to check the truth of the following computational tree logic template property:

¬(EX opcode = opcode_i ∧ HALT), where HALT := EF(opcode = return)

and opcode_i is the i-th enumerated opcode. This property states that no execution path of the JVM model exists that reaches a state in which the next opcode scheduled for execution is the selected opcode_i and that eventually terminates without errors. This property will always be false, since all opcodes fed to the model checker are correct by construction. The model checker will then output a proof that the property is false, that is, a counterexample execution trace reaching a state where opcode_i is executed and ending with the return opcode. For every input opcode_i, the resulting counterexample execution trace is then split into two parts: the header, which is the starting subsequence up to (and excluding) the first occurrence of the considered input opcode, and the trailer, which is all the remainder. Moreover, the state of the JVM, that is, the abstract operand stack image just before the trailer, is also saved for future reference as the JVM state S_i. Table III reports some of the counterexample byte code sequences computed by the model checker in response to a few example input opcodes.

Table III
AN EXAMPLE OF EXECUTION TRACE OUTPUTS PROVIDED BY THE MODEL CHECKER FOR SOME COMMON OPCODE INSTANCES. NOTE THAT THE MAX DEPTH OF THE OPERAND STACK FRAME IMAGE IS HERE SET TO THREE SLOTS, AND THE LOCAL REGISTERS ARRAY IS ALSO ASSUMED TO BE [REF, SHORT, UNINIT].

opcode | header | stack@opcode | trailer
store_0 | push; | [ref, -, -] | store_0; push; return;
load_1 | push; | [short, -, -] | load_1; pop; return;
invoke | push; new; swap; push; | [ref, ref, ref] | invoke; return;
new | push; | [ref, -, -] | new; store_2; return;
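To give a concrete flavour of the structural preconditions that the model encodes as invariants, the following Java sketch checks a couple of such constraints against an abstract typed operand stack and local register array; the specific checks and the names (AbstractType, canLoad, canBinop) are illustrative and are not taken from the NuSMV model:

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch (not the paper's NuSMV model): structural preconditions of a few
// abstract opcodes checked against a typed operand stack and a local register array.
public class StructuralChecks {
    enum AbstractType { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, CHAR, REFERENCE, UNINIT }

    static final int MAX_STACK = 3;   // matches the three-slot stack image of Table III

    // load_n: the stack must have room and local register n must hold a usable value.
    static boolean canLoad(Deque<AbstractType> stack, AbstractType[] locals, int n) {
        return stack.size() < MAX_STACK && locals[n] != AbstractType.UNINIT;
    }

    // binop of type t: the two topmost stack slots must both hold type t.
    static boolean canBinop(Deque<AbstractType> stack, AbstractType t) {
        if (stack.size() < 2) return false;
        AbstractType[] top = stack.toArray(new AbstractType[0]);   // head (top) first
        return top[0] == t && top[1] == t;
    }

    public static void main(String[] args) {
        Deque<AbstractType> stack = new ArrayDeque<>();
        AbstractType[] locals = {AbstractType.REFERENCE, AbstractType.SHORT, AbstractType.UNINIT};
        stack.push(AbstractType.REFERENCE);                     // the abstract "push" of Table III
        System.out.println(canLoad(stack, locals, 1));          // true: room left, local 1 is short
        System.out.println(canBinop(stack, AbstractType.INT));  // false: not two ints on the stack
    }
}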

C. SAT solving

We now recall all enumerated pairs of opcodes, and for every such pair we invoke the NuSMV tool to check whether the following byte code sequence is legal:

header-opcode_i ; trailer-opcode_j ;

that is, to check the sequence obtained by taking the header subsequence computed for the first opcode in the pair and appending to it the trailer of the second opcode, in order to determine whether it is a correct or a faulty Java program. This is really a boolean satisfiability problem, since to answer the question it suffices to check whether the (saved) operand stack state resulting from the execution of the first opcode's header satisfies the structural requirements of the second opcode in the pair. In fact, the model checker computes the answer to this question by invoking in turn its built-in SAT solver on the following expression:

¬EG(stack = S_i ∧ opcode = opcode_j)

If the answer is no, the whole sequence resulting from the append operation is saved as a test sequence triggering a type safety fault in the JVM, and added to the test suite being built up. Otherwise, the test sequence may be saved as a positive test or simply be discarded. As an example, Table IV reports some of the test sequences resulting from the combination of pairs of headers and trailers previously computed, and the respective outcome of the SAT solver. Note that this outcome is actually an oracle that will later drive the automated test result reporting when the test suite is applied, which is of course the last step of the process.

Table IV
COMBINED TEST SEQUENCES COMPUTED FOR SOME OPCODE PAIRS, AND THEIR RESPECTIVE ORACLE DATA

opcodes pair | test sequence | expected outcome
store_0 - load_1 | push; load_1; pop; return; | ok
store_0 - invoke | push; invoke; return; | fault
store_0 - new | push; new; store_2; return; | ok
load_1 - store_0 | push; store_0; push; return; | fault
load_1 - invoke | push; invoke; return; | fault
load_1 - new | push; new; store_2; return; | ok
invoke - store_0 | push; store_0; push; return; | ok
invoke - load_1 | push; load_1; pop; return; | fault
invoke - new | push; new; store_2; return; | fault
new - store_0 | push; store_0; push; return; | ok
new - load_1 | push; load_1; pop; return; | false
new - invoke | push; invoke; return; | fault
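The oracle decision for a pair can be pictured as follows; this Java sketch replaces the SAT check with a direct comparison of the saved stack image against an assumed encoding of the second opcode's stack requirements, so the requirement lists used in main are illustrative only:

import java.util.List;

// Illustrative sketch of the oracle decision for a pair (opcode_i, opcode_j):
// the saved stack image S_i is checked against opcode_j's structural requirement.
// The requirement encoding below is an assumption made for this example only.
public class PairOracle {
    record TestCase(String sequence, String expectedOutcome) {}

    // Minimal requirement: the types opcode_j expects on top of the stack, topmost first.
    static boolean satisfies(List<String> stackImage, List<String> required) {
        if (stackImage.size() < required.size()) return false;
        for (int k = 0; k < required.size(); k++) {
            if (!stackImage.get(k).equals(required.get(k))) return false;
        }
        return true;
    }

    static TestCase combine(String header, List<String> stackImage,
                            String trailer, List<String> required) {
        String outcome = satisfies(stackImage, required) ? "ok" : "fault";
        return new TestCase(header + " " + trailer, outcome);
    }

    public static void main(String[] args) {
        // Pair (store_0, load_1): per Table III, the stack image saved for store_0 is [ref, -, -];
        // assume load_1 only needs room on the stack, hence no required types.
        System.out.println(combine("push;", List.of("ref"), "load_1; pop; return;", List.of()));
        // Pair (load_1, store_0): per Table III the saved image is [short, -, -];
        // assume store_0 requires a value assignable to local 0, i.e. a ref.
        System.out.println(combine("push;", List.of("short"), "store_0; push; return;", List.of("ref")));
    }
}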

IV. EARLY ASSESSMENT

Although the testing technique described in this paper is fully automated, from the test generation phase to the test application and evaluation, we are still in the process of implementing the several scripts needed to connect each process step to the next. In order to check the soundness and the applicability of the proposed approach in practice as soon as possible, a set of test sequences has been computed and manually applied to a commonly used open source implementation of the JVM byte code verifier, which is part of the Byte Code Engineering Library (BCEL) of the Apache Commons project [9]. To do so, we have used a front end to the BCEL library, the Java byte-code editor (JBE) application. JBE is a bytecode editor suitable for viewing and modifying Java class files. It is built on top of the open-source jclasslib bytecode viewer by ej-technologies. Figure 2 shows an example screen shot of the running application, including an editing window open on a Java method's bytecode. For verifying and exporting the class files, JBE uses the BCEL library.

By means of these tools we were able to create actual byte code test sequences, in the form of .class formatted files, for some example test sequences matching those listed in Table IV. In fact, a dummy template program has been written and compiled with a version 7 Java compiler, and then opened for editing in the JBE editor. This way, it has been possible for us to insert every computed test sequence into the template and save the modified .class file as a syntactically correct Java byte code program. These programs have then been fed to the BCEL built-in verifier, directly accessible through the JBE front end, and each of these invocations has shown that the BCEL verifier correctly revealed the expected type of injected fault, if any, as it should for any JVM with a correct implementation of the static checks for type safety of input programs prescribed by the Java specifications. Figure 3 shows an example screen shot of such an event, with the verifier reporting a type violation with respect to the operand stack top contents, which was expected to be of type int but found to be a null reference. Note that this is just the fault that the test sequence purposely sets up with the preceding aconst_null opcode, which is a push in our model's abstract syntax. We are currently working to complete the implementation of the missing scripts to automate the whole process, in order to extend the evaluation of this approach and present the results of its systematic application, that is, including all computable fault-triggering test sequences, to several existing JVMs, both commercial and free, and also to a commercial JVM implementation currently at the development stage.
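As an indication of how the verifier invocation could be scripted once the missing glue code is in place, the following sketch runs the BCEL verifier passes on a generated test class; it assumes BCEL's verifier API (VerifierFactory, Verifier, VerificationResult), whose exact signatures may differ across BCEL versions, and a hypothetical class name TestSequence0:

import org.apache.bcel.verifier.VerificationResult;
import org.apache.bcel.verifier.Verifier;
import org.apache.bcel.verifier.VerifierFactory;

// Sketch of a scripted check (not the authors' scripts): run the BCEL verifier passes
// on a generated test class assumed to be on the classpath as "TestSequence0".
// The API names reflect BCEL's verifier package; exact signatures may vary by BCEL version.
public class RunBcelVerifier {
    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "TestSequence0"; // hypothetical test class
        Verifier verifier = VerifierFactory.getVerifier(className);

        // Passes 1 and 2 check the class file format and class-level constraints;
        // pass 3a checks the static (structural) constraints of the method's bytecode,
        // which is where the injected type-safety faults are expected to be reported.
        System.out.println("pass 1: " + verifier.doPass1());
        System.out.println("pass 2: " + verifier.doPass2());
        VerificationResult structural = verifier.doPass3a(0); // method index 0 (assumption)
        String verdict = structural.getStatus() == VerificationResult.VERIFIED_OK ? "ok" : "fault";
        System.out.println("pass 3a: " + verdict + " - " + structural.getMessage());
    }
}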

Figure 2. The open source Java Bytecode Editor front end screen-shot.

Figure 3. Screen shot of the BCEL verifier invocation outcome correctly revealing a fault-triggering test sequence.

V. RELATED WORK

One main distinction that should be made is between research aimed at validating the JVM itself, particularly its component implementing byte code verification, and research aimed at validating the applets running on such environments, which is not within our subject of interest. Our aim is to help fill the gap between an already rock-solid specification and its actual implementations, which may instead present flaws. We also chose to follow a validation approach instead of a formal verification of the JVM, as indicated in the introduction.

A test-based approach aimed at the JVM, based on production grammars, was first proposed by Sirer in the Kimera project [10]. However, they produced faults in the test programs by randomly altering one single byte in a correct byte code sequence. This random technique required over sixty thousand tests to achieve a very broad coverage of the specification, but proved unable to test the implementation code deeply. As Sirer himself states on the Kimera project web page: "The randomly introduced errors were often easily detectable and do not exercise the more subtle aspects of Java code verification. Subtle flaws or flaws that may exist under complicated circumstances are typically not triggered with this approach." Indeed, an automatic generation technique able to create deeper, more stressful test cases for the Java virtual machine was still missing. Our test template embedding faulty byte code sequences, directly derived from the specification constraints, provides a solution to this problem. In contrast to Sirer, our test cases' complexity is not a random attribute but a carefully designed feature intended to make verification harder to accomplish, in terms of the computational hardness of detecting the fault in each test case. More specifically, hardness of error detection is expressed in terms of the number of structural axioms that have to be exercised to reveal an illegal byte code in a single test, instead of cumulatively by all tests.

Savary et al. [11] proposed a similar approach to testing a Java Card byte code verifier, whereby test cases are produced by means of an abstract model of the JavaCard specifications in terms of a preamble and a post-amble to a faulty instruction, that is, an instruction whose execution is requested in a state that violates its preconditions. Their approach is based on the enumeration and analysis of all possible negations of the structural constraint preconditions, which have to be modeled as conjunctions of elementary boolean expressions in order to analyze all realizable ways to negate a constraint. Unfortunately, how to automate the production of such expressions is not shown. Actually, manually processing all preconditions can be cumbersome and time consuming. In fact, it seems that preconditions have to be manually split into their elementary boolean parts in order to map each of them to a different guard in a corresponding finite state model. Our approach does not need to do so, as we rely on pairs of instructions, one setting up a proper stack state and the other that may (or may not) be faulty when executed next, whose enumeration and processing is completely automated.

Another important consideration is that an approach to validation entirely based on formal reasoning, and thus on rigorously proving a given JVM implementation correct, is of course much better than a test-based validation, which is unable to guarantee that the implementation is completely fault free, no matter how thorough and smart the applied testing technique. However, there are several practical reasons to reconsider the convenience of a testing-based approach with respect to, e.g., theorem proving. As an example, the time complexity of formal-reasoning-based verification of an incorrect byte-code program is exponential in the size of the program [12]. Bousquet et al. argue that testing and verification have two different objectives, and while verification is difficult and requires a good knowledge of the application code, testing is easier to carry out even without such knowledge [13]. Moreover, verification is limited by the power of the tools and by the risk that insufficient or inappropriate system specifications prevent important faults from being detected. Sirer noted that known verification approaches operate on an abstract model of an implementation instead of its actual code [14]. This has the two-fold disadvantage that modeling a complex implementation is still undermined by the extensive, error-prone human involvement required, hence reducing the value of being able to apply formal reasoning to it; moreover, the model might miss important implementation details that, while apparently not affecting the model, could actually have subtle and undesired effects on the implementation behavior.

Barbu et al. [15] practically demonstrated, with a concrete successful attack, that security breaches of even the most up-to-date Java technology are still possible, in spite of all the advances made by its security-oriented features. The authors point out the strong need to pay very close attention to the validity of the actual software implementations of the Java specification, in particular for embedded systems. In fact, while the latest versions of these specifications already allow very strong confidence with respect to security and robustness against malicious attacks, their software implementations actually found on devices can potentially fail to enforce the expected, matching level of security against, e.g., fault injections or other hardware-based attacks, especially if additional complementary security-oriented features are left out of their software design. It follows that it is of major importance not to lose contact with the actual implementation of a product when selecting a strategy for its validation, that is, to avoid introducing too many or too complex abstraction layers, which increase the complexity of the human modeling activity and in turn the chances of introducing errors as a consequence. Our proposed process has the advantage of requiring only a one-time, minimal modeling of the JVM, which can be verified in arbitrarily strong ways prior to its usage, and is then fully automated: no human errors can be introduced during the application of the testing process.

VI. CONCLUSIONS

We presented an automated testing process suitable for assessing the conformance of a JVM implementation to the structural specifications, without requiring direct human assistance or supervision, thanks to automated test generation and result evaluation. It can be used to validate complete, final JVM implementations or a work in progress, to perform regression testing, or to debug partial implementations available at different stages of development. Unlike existing approaches, this work is based on the combination of a complex control-flow template program, designed to stress the type inference algorithm of the byte code verifier (BCV), and a combinatorial coverage of realizable structural specification violations, by means of faulty byte code sequences that the test template embeds. To the best of our knowledge, this is the first attempt to systematically validate a BCV implementation with a black box automated technique designed to reveal either flaws in instruction precondition checks or in the type inference algorithm. It is also the first combinatorial approach to JVM testing, since existing works aimed at testing Java applets.

VII. ACKNOWLEDGMENT

This work has been supported by the project JACOS, funded within the POR FESR Sicilia 2007-2013 framework.

REFERENCES

[1] D. Basin, S. Friedrich, and M. Gawkowski, "Bytecode verification by model checking," Journal of Automated Reasoning, vol. 30, no. 3, pp. 399–444, 2003.
[2] S. N. Freund and J. C. Mitchell, "A type system for the Java bytecode language and verifier," Journal of Automated Reasoning, vol. 30, no. 3-4, pp. 271–321, Aug. 2003.
[3] T. Nipkow, "Java bytecode verification," Journal of Automated Reasoning, vol. 30, no. 3, pp. 233–233, 2003.
[4] A. Calvagna, G. Pappalardo, and E. Tramontana, "A novel approach to effective parallel computing of t-wise covering arrays," in Proceedings of the International Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE). IEEE, 2012, pp. 149–153.
[5] A. Calvagna and A. Gargantini, "T-wise combinatorial interaction test suites construction based on coverage inheritance," Software Testing, Verification and Reliability, vol. 22, no. 7, pp. 507–526, 2012. [Online]. Available: http://dx.doi.org/10.1002/stvr.466
[6] E. Borger and R. F. Stark, Abstract State Machines: A Method for High-Level System Design and Analysis. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2003.
[7] "The ASMETA project." [Online]. Available: http://asmeta.sourceforge.net
[8] A. Cimatti, E. Clarke, E. Giunchiglia, F. Giunchiglia, M. Pistore, M. Roveri, R. Sebastiani, and A. Tacchella, "NuSMV 2: An opensource tool for symbolic model checking," in Computer Aided Verification. Springer, 2002, pp. 241–268.
[9] "The Byte Code Engineering Library (Apache Commons BCEL)." [Online]. Available: http://commons.apache.org/proper/commons-bcel
[10] E. G. Sirer and B. N. Bershad, "Testing Java virtual machines," in Proc. Int. Conf. on Software Testing And Review, 1999.
[11] A. Savary, M. Frappier, and J. Lanet, "Automatic generation of vulnerability tests for the Java Card byte code verifier," in Proc. Conf. on Network and Information Systems Security. IEEE, 2011.
[12] D. Basin, S. Friedrich, J. Posegga, and H. Vogt, "Java bytecode verification by model checking," in Computer Aided Verification. Springer, 1999, pp. 681–681.
[13] L. du Bousquet, Y. Ledru, O. Maury, C. Oriat, and J.-L. Lanet, "Reusing a JML specification dedicated to verification for testing, and vice-versa: case studies," Journal of Automated Reasoning, vol. 45, no. 4, pp. 415–435, 2010.
[14] E. G. Sirer and B. N. Bershad, "Using production grammars in software testing," in ACM SIGPLAN Notices, vol. 35, no. 1. ACM, 1999, pp. 1–13.
[15] G. Barbu, H. Thiebeauld, and V. Guerin, "Attacks on Java Card 3.0 combining fault and logical attacks," in Smart Card Research and Advanced Application, ser. LNCS, vol. 6035. Springer Berlin Heidelberg, 2010, pp. 148–163.