Use of Software Failure Data from Large Space Systems

Myron Hecht
The Aerospace Corporation
El Segundo, California, USA
[email protected]

1 INTRODUCTION

Large space systems consist of space vehicles, terrestrial broadcast/receiving stations (often referred to as "ground antennas"), tracking and communications networks, and control centers, all of which rely extensively on embedded computing. For both the terrestrial and space-borne elements, software is becoming a significantly more important cause of operational failures [1]. Hence, there is a growing need for the collection of valid software failure data that can be properly used to assess and improve dependability. This paper provides a brief overview of the challenges of collecting and using such failure data and is organized into two general categories: issues that are common with other real-time systems, and those that are (more or less) unique to the space domain.

2 COMMON ISSUES

Software in space systems has much in common with other real-time systems. Satellites have embedded processors for attitude control, navigation, communication, thermal control, and the payload. Terrestrial components of satellite communications systems are similar (and in many cases identical) to other data networks. Satellite ground control centers use the same hardware and software infrastructures as conventional information technology applications. This section discusses three software failure data issues that are common with other applications.

2.1 Qualitative vs. Quantitative Use of Failure Data

Data on software failures (or, more precisely, system failures for which software defects were the primary causes) can be used for both qualitative and quantitative assessments. In this author's experience, the most readily available failure data are those collected during development as part of a discrepancy reporting system. If properly recorded, this type of data can be used qualitatively to identify problems and needed improvements in the development process. For example, classification techniques such as the Orthogonal Defect Classification (ODC) scheme [2] and its variations [3] can be used to determine the measure of association between candidate causes and effects during the development process, and the resulting analyses can then be used to adjust the development process. However, such data are not useful for quantitative measurement of reliability, availability, recovery time, recovery probability, or the extent of common mode failures in replicated systems. For such purposes, most software defect reporting systems (also called "issue trackers" or "bug trackers") are insufficient because they do not ensure that operating time is collected, that all failures are recorded (whether or not they occur repeatedly and whether or not they can be reproduced), or that data on consequent system recovery actions (or the lack of such actions) are recorded. Because of these limitations, operating system event logs have been used instead to estimate quantitative parameters for software and system reliability and availability [4,5,6].
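As an illustration of such qualitative use, the sketch below tallies discrepancy reports that have been coded with two simplified ODC-style attributes. The records and attribute values are hypothetical and do not reproduce the full ODC scheme; a skew of one defect type toward a late detection activity would point to a gap in an earlier development phase.

```python
from collections import Counter

# Hypothetical discrepancy reports, each coded with two ODC-style
# attributes: (defect type, activity in which the defect was found).
reports = [
    ("interface", "integration test"), ("assignment", "unit test"),
    ("timing", "system test"), ("interface", "integration test"),
    ("algorithm", "unit test"), ("interface", "system test"),
]

# Tally defect types per detection activity. Many interface defects
# surfacing at integration or later suggest that earlier reviews and
# unit tests are not exercising component interfaces adequately.
by_activity = Counter(reports)
for (dtype, activity), count in sorted(by_activity.items()):
    print(f"{activity:18s} {dtype:12s} {count}")
```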

2.2 Parameter Estimation from Failure Data

Methods for estimating parameters (and associated confidence intervals) related to software reliability, such as failure rates, recovery times, and recovery probabilities, are well documented [6,7,8]. These methods assume that failures are stochastic processes (the underlying defects are often referred to as Heisenbugs [9]) and that failure times are exponentially distributed. Multiple researchers have confirmed these assumptions for well-tested software [10,11,12]. The parameters for the software components of a larger system can be combined with those of the hardware components to develop system-level dependability models [12,13,14]. However, there are also deterministic failures triggered by predictable circumstances (called Bohrbugs [15]) for which stochastic modeling is inappropriate. A classic example (for which failures were mostly prevented) is the turn-of-the-century rollover of date formats, the so-called "Y2K" bug. Different approaches must be used to predict system reliability for such deterministic failures. When the specific details of all (or substantially all) failures are known, system dependability can be predicted by separating out the deterministic failures, applying parameter estimation to the remaining random failures, and then adding back the effects of the segregated deterministic failures. However, such filtering is a problem when the analyst is remote (in time or space) from the event, particularly when the only available data are operating system logs rather than the more detailed software discrepancy reports. To avoid inaccuracies in system reliability prediction for software failures, data recording and analysis methods are needed that distinguish between these two types of failures.
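A minimal sketch of such an estimate, assuming exponentially distributed failure times (the Heisenbug case) and time-truncated observation, is shown below. The maximum likelihood failure rate is the number of failures divided by cumulative operating time, and the two-sided confidence bounds use the standard chi-square relations; the failure count and operating hours are invented, and SciPy is assumed to be available.

```python
from scipy import stats

def failure_rate_ci(n_failures, total_hours, conf=0.90):
    """Point estimate and two-sided confidence interval for a constant
    (exponential) failure rate from time-truncated operating data."""
    alpha = 1.0 - conf
    rate = n_failures / total_hours  # MLE: lambda = n / T
    # Standard chi-square bounds: 2n degrees of freedom for the lower
    # limit, 2n + 2 for the upper limit under time truncation.
    lower = stats.chi2.ppf(alpha / 2, 2 * n_failures) / (2 * total_hours)
    upper = stats.chi2.ppf(1 - alpha / 2, 2 * n_failures + 2) / (2 * total_hours)
    return rate, lower, upper

# Hypothetical example: 12 software failures in 5,000 operating hours.
rate, lo, hi = failure_rate_ci(12, 5000.0)
print(f"lambda = {rate:.2e}/hr, 90% CI = [{lo:.2e}, {hi:.2e}]")
```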

2.3 Software Reliability Growth Models and Failure Data

Software reliability growth modeling (also referred to, less precisely, as "software reliability engineering") is used during development to predict when (or whether) a given failure rate goal will be reached, or what failure rate will be achieved if testing and other fault removal processes are stopped at a known time [16,17,18,19]. These are key questions for project management, particularly during late-stage testing when budgetary or schedule pressures are strong. CASRE [20] and SMERFS [16] are freely available software tools that have significantly simplified reliability growth modeling by automating its mechanical (although often quite involved) aspects. Both have modest input data requirements (time of occurrence and an optional severity) and incorporate multiple reliability models together with least squares and maximum likelihood methods for parameter estimation. Both also provide figures of merit for ranking the predictive value of alternative software reliability growth models. As a result of this automation, software reliability growth modeling is much more accessible to non-specialists. However, the results produced by these tools are greatly influenced by the choice of models, the data included, and how the data are pre-processed. For example, different models (e.g., Geometric, Musa, and Jelinski-Moranda) can predict failure rates that differ by a factor of three even when they fit the same historical failure data equally well (see, e.g., [21]). A second data-related example is the use of raw calendar time without properly accounting for test processing time and development effort. Improper normalization of testing (i.e., bug detection) and removal effort (whether measured in processor time or labor-months) can lead to seriously erroneous conclusions about the relative maturity of the software and its readiness for release. More widely disseminated guidelines on these issues are necessary for credible software reliability growth modeling.
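As a minimal illustration of what these tools automate, the sketch below fits one model of this family, the Goel-Okumoto model m(t) = a(1 - exp(-b t)), to invented cumulative failure times by maximum likelihood; a real analysis would compare several models and apply the time normalization discussed above.

```python
import math

def fit_goel_okumoto(times, T):
    """Fit m(t) = a*(1 - exp(-b*t)) by maximum likelihood, given the
    cumulative failure times observed over the test interval (0, T]."""
    n, S = len(times), sum(times)
    if S >= n * T / 2.0:
        raise ValueError("no reliability growth trend; MLE does not exist")

    def dldb(b):  # derivative of the log-likelihood with respect to b
        e = math.exp(-b * T)
        return n / b - n * T * e / (1.0 - e) - S

    lo, hi = 1e-9, 1.0
    while dldb(hi) > 0.0:      # widen the bracket if necessary
        hi *= 2.0
    for _ in range(200):       # bisect to find the root of dldb
        mid = (lo + hi) / 2.0
        if dldb(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2.0
    a = n / (1.0 - math.exp(-b * T))  # expected total number of failures
    return a, b

# Invented failure times (test hours), clustered early as growth occurs.
times = [10, 18, 32, 49, 64, 86, 105, 132, 167, 207]
T = 250.0
a, b = fit_goel_okumoto(times, T)
rate_now = a * b * math.exp(-b * T)   # current failure intensity
print(f"a = {a:.1f} expected failures, current rate = {rate_now:.4f}/hr")
```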

3 UNIQUE ISSUES

The unique aspects of space system software derive from the attributes of space systems in general, including their very low production volumes and high unit values, the large military or commercial significance of their missions, environmental stresses, and safety concerns.

3.1 Data Access

Most organizations that own (or control) failure data from space systems have corporate proprietary interests and national security concerns that limit access by external researchers. Identifying information about either the development organization or the specific characteristics of the problem is often difficult or impossible to remove, and even if it were removed, the owners of such data would not be confident that the data had in fact been anonymized. One possible solution is the development of a generalized coding scheme for such purposes. Eventually, a sufficiently widely recognized scheme could be used to support requests for release of, or access to, such data and would also facilitate cross-project studies.

3.2 Standardized Data Specifications and Formats

Progress in empirical software engineering as well as software reliability prediction rests on the ability to perform comparative studies across multiple comparable projects. Research questions such as "what is the effect of a specific development or test practice on operational reliability or dependability?" can best be answered through such studies, and the general similarity of large space systems makes this domain a fertile area for such research. Unfortunately, there are no widely recognized standards for software failure data specifications, formats, and coding schemes to enable interchange in the space domain. Thus, most analysis of failure data (whether quantitative or qualitative) starts with a data conversion process followed by an arduous translation and standardization effort. The absence of standardization is a barrier to low-cost, rapid-turnaround analyses and reduces the viability of failure data feedback as part of the reliability growth process. Some progress in this area is being made: a data standardization guide for space systems (which includes software) was recently published by a consortium of government and industry organizations [22].
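No such interchange standard is assumed here, but as a sketch of what a minimal standardized failure record might contain, the following hypothetical schema captures fields needed for both the qualitative and quantitative uses discussed in Section 2; all field names are illustrative and are not drawn from any published specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FailureRecord:
    """Hypothetical minimal interchange record for one software failure."""
    system_id: str          # anonymized program/system code (see Sec. 3.1)
    component: str          # affected software element
    occurred_utc: str       # ISO 8601 timestamp of the failure
    operating_hours: float  # cumulative exposure time at failure
    reproducible: bool      # deterministic (Bohrbug) vs. Heisenbug
    severity: int           # e.g., 1 = mission-ending ... 5 = negligible
    recovery_action: Optional[str] = None     # None if no recovery occurred
    recovery_seconds: Optional[float] = None  # time to restore service
```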

3.3 Failure Data for "Legacy" or "Reused" Software

The design philosophy for space hardware and software is usually conservative and favors "reuse" of designs and software that have been "qualified," i.e., tested or used successfully on previous missions. At the same time, the use of externally developed (frequently commercially sold and supported) software for common functions (e.g., operating systems, standardized satellite bus control applications, standardized telemetry packages) is often perceived as a means of reducing both development effort and schedule. Determining whether this perception is correct requires data on the failure behavior of the precise software version under consideration, in a comparable environment, and under an operational profile comparable to what the space system will experience. Both qualitative failure mode data and quantitative reliability data are necessary for the assessment, improvement, and integration of reused software. However, such data are typically not collected by the developer of the software under consideration, and program management is faced with three unattractive options: expending resources to qualify the software, not using the software, or accepting the risk. Standards for operating and failure data collection and analysis that enable programs to use past experience would greatly facilitate the use of externally developed software, with consequent cost savings and reductions in development time.


3.4 Prediction of Catastrophic Failure Probabilities

While most software failures in space systems are recoverable, some failures have mission-ending or even life-threatening consequences. Examples include the Ariane 5 launch vehicle, the Mars Climate Orbiter, the Mars Polar Lander, and the Huygens Titan probe. Often, such failures are triggered by rare single events or by combinations of events that were not anticipated during development and testing. Bayesian methods have been used to predict probabilities of low frequency events in hardware systems [23], and extreme value distributions have been used for low frequency software failures [24]. However, at present there is not sufficient experience with such methods for space software to establish their credibility. Hence, the only accepted methods are defect prevention and removal by means of qualitative safety analysis techniques such as Failure Modes and Effects Analysis and Fault Tree Analysis. Methods of operational experience and failure data collection that enable more sophisticated prediction techniques, analogous to the probabilistic "stress-strength" methods used for mechanical systems, are needed.

4 CONCLUSIONS

Many leading authorities in both software engineering and software reliability have identified the importance of the empirical base upon which both disciplines rely. Proper use of valid failure data is as important for these fields as for any other engineering discipline. By addressing the issues identified in this paper, as well as others, we may reach a point where it is possible to use failure data to identify the proper independent variables, collect them from multiple projects, and measure their degree of association with operational reliability and dependability. Such an achievement would set the stage for the technical and mathematical rigor necessary for future advances in these fields.

REFERENCES

[1] P. Cheng, "Ground Software Errors Can Cause Satellites to Fail Too," Proc. Ground Systems Architecture Workshop 2003, available from http://sunset.usc.edu/gsaw

[2] R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, and M.-Y. Wong, "Orthogonal Defect Classification - A Concept for In-Process Measurements," IEEE Transactions on Software Engineering, vol. 18, no. 11, pp. 943-956, Nov. 1992.

[3] Chillarege, Inc., http://www.chillarege.com/odc/odcpapers.html, last visited 27 December 2006.

[4] R. Iyer, Z. Kalbarczyk, and M. Kalyanakrishnam, "Measurement-Based Analysis of Networked System Availability," in Performance Evaluation: Origins and Directions, G. Haring, Ch. Lindemann, and M. Reiser, eds., Lecture Notes in Computer Science, Springer-Verlag, 1999.

[5] J. Xu, Z. Kalbarczyk, and R. Iyer, "Networked Windows NT System Field Failure Data Analysis," Proc. Pacific Rim International Symposium on Dependable Computing (PRDC'99), Hong Kong, 1999.

[6] D. Tang and M. Hecht, "Evaluation of Software Dependability Based on Stability Test Data," Proc. 25th Int. Symp. on Fault-Tolerant Computing, Pasadena, California, pp. 434-443, June 1995.

[7] M. Hecht, D. Tang, H. Hecht, and R. Brill, "Quantitative Reliability and Availability Assessment for Critical Systems Including Software," Proc. 12th Annual Conference on Computer Assurance, Gaithersburg, Maryland, June 16-20, 1997.

[8] I. Lee, D. Tang, R.K. Iyer, and M.C. Hsueh, "Measurement-Based Evaluation of Operating System Fault Tolerance," IEEE Transactions on Reliability, June 1993, pp. 238-249.

[9] S. Bourne, "A Conversation with Bruce Lindsay," ACM Queue, vol. 2, no. 8, November 2004.

[10] P. Nagle and J.A. Skrivan, "Software Reliability: Repetitive Run Experimentation and Modeling," NASA CR-165836, February 1982.

[11] E.N. Adams, "Optimizing Preventive Service of Software Products," IBM Journal of Research & Development, Jan. 1984, pp. 2-14.

[12] M.C. Hsueh and R. Iyer, "Performability Modeling Based on Real Data: A Case Study," IEEE Transactions on Computers, vol. 37, no. 4, April 1988, pp. 478-484.

[13] M. Hecht, D. Tang, H. Hecht, and R. Brill, "Quantitative Reliability and Availability Assessment for Critical Systems Including Software," Proc. 12th Annual Conference on Computer Assurance, Gaithersburg, Maryland, June 16-20, 1997.

[14] I. Lee and R.K. Iyer, "Software Dependability in the Tandem GUARDIAN System," IEEE Transactions on Software Engineering, May 1995, pp. 455-467.

[15] Wikipedia, "Bohrbug" entry, http://en.wikipedia.org/wiki/Bohr_bug, last visited 27 December 2006.

[16] W. Farr, "Statistical Reliability Modeling Survey," in Handbook of Software Reliability Engineering, M. Lyu, ed., McGraw-Hill, New York, pp. 71-117, 1996.

[17] AIAA/ANSI R-013-1992, Recommended Practice: Software Reliability, pp. 1-2.

[18] M.C.K. Yang and A. Chao, "Reliability-Estimation & Stopping-Rules for Software Testing, Based on Repeated Appearances of Bugs," IEEE Transactions on Reliability, vol. 44, no. 2, June 1995, p. 315.

[19] D.R. Wallace, "Is Software Reliability Modeling a Practical Technique?," 2002 Software Technology Conference, available online at www.stconline.org/stc2002proceedings/SpkrPDFS/ThrTracs/p411.pdf

[20] A. Nikora, CASRE, Open Channel Foundation web site, http://www.openchannelfoundation.org/projects/CASRE_3.0/.do

[21] M. Hecht and D. Buettner, "Use of Flight Software Failure Data from Unit and Integration Testing to Predict System Reliability," Proc. 22nd Annual Aerospace Test Seminar, Manhattan Beach, California, October 2006, proceedings available through http://www.aero.org/conferences/ats/22ndATS.html

[22] B. Wong, R. Duphily, and M. Boeck, "Data Description and Format Specification for Space Vehicle Pre-Flight Anomalies," Aerospace Corporation Report No. TOR-2006(8546)-4603, The Aerospace Corporation, El Segundo, California.

[23] R.L. Iman and S.C. Hora, "Bayesian Methods for Modeling Recovery Times with an Application to the Loss of Off-Site Power at Nuclear Power Plants," Risk Analysis, vol. 9, no. 1, p. 25, March 1989.

[24] H. Hecht and P. Crane, "Rare Conditions and their Effect on Software Failures," Proc. 1994 Reliability and Maintainability Symposium, January 1994, pp. 334-337.