Evolution and Adaptation of the VLT Data Flow System

Jens Knudstrup¹, Karim Haggouchi, Michele Peron, Peter Quinn, Pascal Ballester, Klaus Banse, Tim Canavan, Maurizio Chavan, Nicolas Devillard, Dario Dorigo, Carlos Guirao, Carlo Izzo, Yves Jung, Nick Kornweibel, Cynthia Mavros, Gerhard Mekiffer, Andrea Modigliani, Ralf Palsa, Francesco Ricciardi, Cyrus Sabet, Fabio Sogni, Jakob Vinther, Andreas Wicenec, Bruce Wiseman, Stefano Zampieri
Data Management and Operations Division, European Southern Observatory, Karl-Schwarzschild-Str. 2, D-85748 Garching bei München, Germany.

ABSTRACT
The VLT Data Flow System (DFS) has been developed to maximize the scientific output from the operation of the ESO observatory facilities. Much effort, iteration and retrofitting went into the DFS between its original conception in the mid 90s and the system now in production. That system runs at Paranal, at La Silla, at the ESO HQ and externally at the home institutes of astronomers. The work has kept the DFS performing well and up to date. The result is a robust, efficient and reliable ‘science support engine’. Without it, operating the VLT as efficiently and successfully as today would be difficult, if not impossible. Ultimately, the success of the VLT rests on the symbiosis between the VLT Control System (VCS) and the DFS, together with the hard work of dedicated development and operational staff. The basic framework of the DFS can be considered ‘completed’, and the DFS has been in operation for approximately 3 years. Even so, improvements and enhancements are still being implemented, mostly because new requirements keep appearing. This article describes where such new requirements on the DFS come from. It also discusses the challenges faced in adapting the DFS to an ever-changing operational environment.
Examples are given of recent new concepts, designed and implemented to make the base part of the DFS more generic and flexible. The paper also covers the general adaptation of the DFS at system level. This work aims to reduce maintenance costs, increase robustness and reliability and, to some extent, keep the DFS in line with industry standards. Finally, the general infrastructure needed to cope with a changing system is discussed in depth.

Keywords: ESO DFS, system adaptation, software engineering, software testing, database systems.

1. INTRODUCTION
The concepts of the DFS have been described in detail at earlier SPIE conferences ([1]-[6], see also [7]). For this reason we refrain from repeating this information in depth here. Instead, a brief overview is given, using the basic flow model of the DFS as a starting point (figure 1).

Figure 1: Data Flow Model of the VLT based on the Observation Block as the basic component.

¹ Email: [email protected]

Two main modes of operation are defined for the VLT: 1) Visitor Mode and 2) Service Mode. Visitor Mode corresponds to the traditional way of observing. In Service Mode, all operations of the observing programmes are carried out by ESO staff.

Proposal preparation for observing programmes consists of two phases. During Phase 1 a proposal is submitted to the ESO OPC (Observing Programme Committee); it mainly describes the scientific goals of the programme. The selected programmes are then specified in detail during Phase 2, in which the astronomer prepares the observation run with the P2PP (Phase 2 Proposal Preparation) tool. During this phase the Observation Blocks (OBs) are prepared and usually written into the OB Repository at the ESO HQ. During Phase 1 and 2, Exposure Time Calculators (ETCs, [9], [10]) and star catalogue browsers (e.g. SkyCat, [8]) assist the user.

The OB is the smallest observational unit in the DFS and in the VLT as such. It defines a set of sequences, called templates, to:
1) obtain the proper conditions for the observation, and
2) carry out the actual observation, resulting in one or more data frames.

During the execution of an observing programme in Service Mode, OBs are scheduled via the Observation Tool (OT). They are then submitted to the OB execution tool of the VLT Control SW (VCS), BOB (Broker for OBs). BOB converts the actions defined in the templates into commands that can be sent to the VCS subsystems.

The raw frames are archived by the DFS Online Archive System (OLAS, [13]). OLAS performs some basic checks of the data before archiving them and schedules the frames for persistent storage, e.g. on DVDs. OLAS also distributes the data to other processes that subscribe to them. An example is the pipelines ([20]), which reduce the frames, e.g. for quick-look purposes. The operators use these quick-look results to verify the quality and correctness of the observations performed. At the ESO HQ, data frames are checked for quality, covering both the data itself and the FITS header information. During this phase, suitable Master Calibration Frames are produced and archived for use when reducing the raw science data.
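The OB concept can be made concrete with a minimal Python sketch. An OB is modelled as an ordered list of templates, and a toy broker expands each template into subsystem command strings. All class names, template kinds, parameter keys, the ordering rule and the command strings are invented for illustration. This is not the data model of P2PP, OT or BOB, nor the command syntax of the VCS.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """A named sequence within an OB plus its parameters (toy model)."""
    name: str
    kind: str  # "acquisition" or "observation" in this sketch
    params: dict = field(default_factory=dict)


@dataclass
class ObservationBlock:
    """Smallest observational unit: an ordered list of templates."""
    ob_id: int
    templates: list

    def validate(self) -> None:
        # Assumed rule for this sketch: an acquisition template must
        # come before the first observation template.
        kinds = [t.kind for t in self.templates]
        if "acquisition" not in kinds or "observation" not in kinds:
            raise ValueError("OB needs acquisition and observation templates")
        if kinds.index("acquisition") > kinds.index("observation"):
            raise ValueError("acquisition must come before observation")


def expand_to_commands(ob: ObservationBlock) -> list:
    """Toy stand-in for a broker: turn each template into command strings."""
    ob.validate()
    commands = []
    for tpl in ob.templates:
        if tpl.kind == "acquisition":
            # Obtain the proper conditions: here, just point the telescope.
            commands.append(f"PRESET ra={tpl.params['ra']} dec={tpl.params['dec']}")
        else:
            # Carry out the observation: one command per data frame.
            for i in range(tpl.params.get("nexp", 1)):
                commands.append(f"EXPOSE dit={tpl.params['dit']} exp={i + 1}")
    return commands
```

For an OB with one acquisition template and one observation template requesting two exposures, the sketch yields one pointing command followed by two exposure commands. The separation mirrors the text: the OB only describes what to do, and the broker decides how to translate that into subsystem commands.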

2. THE ORIGIN OF CHANGING REQUIREMENTS
Many new requirements imposed on the DFS undoubtedly originate from new, complex, high-performance instruments. When the first building blocks of the DFS were laid in the mid 90s, many concepts and requirements still needed to be refined. Many new concepts also had to be tried out in practice. For this reason the system has to be enhanced and improved gradually. In general, new requirements and requests for modification of the DFS arise from:

• New instruments, e.g. with new features, higher data rates and new instrument modes.
• Operation teams, who have developed schemes for operating the telescopes and have optimized existing procedures. This has led to requirements for new support tools and for new features in existing tools.
• Requests from users at the telescope sites and at the ESO HQ. To a large extent requests also come from external users, e.g. users of the proposal preparation tools, the ETCs and the Science Archive Facility.
• Software problem reports requesting changes to the software in connection with bugs, new features or optimization.
• New observing methods, e.g. for adaptive optics or interferometry (VLTI, [14]).
• Large survey programmes, and in particular the very demanding requirements of high-performance, specialized survey telescopes.
• New technologies, which simplify the system structure and improve the system performance and the level of service that can be provided.

An example of an instrument that has generated requirements for the DFS is VIMOS with its Mask Manufacturing Unit. It placed new and demanding requirements on OT, so that ESO staff can properly prepare the scheduling of VIMOS OBs. The new Mask Tracker feature of OT supports the manufacturing, storage, insertion and discarding of masks in the so-called Instrument Cabinets, in parallel with the usual OB execution process. The related specifications arrived some time after OT was already in operation for the FORS1/2, UVES and ISAAC instruments.

Another issue in connection with VIMOS is its fairly high data rate (up to 20 GB/night). Other instruments are also a challenge for the archive: WFI (La Silla 2.2m, average 15 GB/night, maximum 50 GB/night) and the VLTI instrument MIDI (up to 40 GB/night). In the extreme category are VST/OmegaCam (up to 75 GB/night) and VISTA (up to 300-400 GB/night). For instruments producing more than 40 GB/night, the current DFS has reached its capacity limits. This applies in particular to archiving. It also applies to the pipelines, which must perform very well to process such large amounts of data.

Survey programmes have also generated another type of requirement: the need to group files that belong together into one ‘patch’. Until now, the DFS has not supported linking files together in this way. The problem already starts at OB level, where there is no provision for grouping OBs that logically belong together. The archive DB facility does not offer such a feature either. For the moment, the file handling tool Gasgano ([23]) has been used to tackle this problem.

Figure 2: A few major milestones and events in the history of the VLT and DFS. Milestones shown: First operation with PHRS; First use of DFS components at the NTT; DFS implementation starts based on FORS1/ISAAC; First operation with Tcl/Tk version of P2PP; Implementation of Integration Tests initiated (SEG); NTT overhaul initiated (NTT Big Bang); Java implementation of P2PP initiated; Online Archive System in operation at the NTT; First Light VLT; FORS1 installed at the VLT; First ETC in operation; First operation FORS & ISAAC pipelines (VLT); ISAAC installed at the VLT; First operation of Java version of P2PP; Operations of the VLT start; DFS in operation at Paranal; First release of the Common Pipeline Library; FORS2 installed at the VLT; UVES installed at the VLT; First version of NGAS installed at La Silla, 2.2m; First light VLT Interferometer; VIMOS installed at the VLT; NAOS/CONICA installed at the VLT.

One particularity of the DFS, compared to e.g. the VCS, is the wide range of internal and external users it must support. The DFS therefore deals with a quite complex set of user profiles. This applies e.g. to tools like P2PP, PHRS, the archive facility and the ETCs. The users of the DFS include:
• the astronomical community using the ESO facilities;
• VISAS² and the OPC;
• the user support group at the ESO HQ;
• the Paranal and La Silla operators;
• archive operations personnel;
• in-house and external SW developers;
• data flow operations in Garching, among others.

This situation makes it especially challenging to harmonize requirements and to avoid changes that break existing features. Such a large set of users with a wide spectrum of interests produces many requirements. Extending the system to satisfy all the users’ needs therefore requires a lot of caution and investigation.

3. EFFORTS UNDERTAKEN TO COPE WITH NEW REQUIREMENTS
This section describes some recent initiatives to adapt the DFS to new scenarios and to make it more flexible.

Phase1/Phase2 Tools
A major improvement of the Phase 1/Phase 2 Tools has been the port from Tcl/Tk to Java. One reason for this was to make the tools more portable. Several factors allowed us to make this decision in 2000:

• The Java language was considered mature enough. Java had been in use since the mid 90s, but we wanted to avoid an immature language. We first wanted to be sure that it would become an accepted standard in the SW industry.
• Java had proved to be fairly portable.
• Java is more reliable thanks to strict data typing (as opposed to Tcl/Tk).

The first Java version of P2PP was successfully used in operations by mid 2000. One year later the same applied to OT, and we are now progressively moving the GUI part of the PHRS tools to Java as well. In parallel, we took advantage of the port to Java to implement common modules, which are now used by all Phase 1/Phase 2 tools.

² Visiting Astronomers Section: the unit at ESO that handles observing proposals.

Another important improvement has been to decouple instrument-specific features from the observation handling tools. A good example is the External Verification Module of P2PP and OT, which performs extensive checks of OBs. It was written in a scripting language, so that instrument scientists can change and adapt it directly, without recompilation or reinstallation. This plug-in capability of some DFS tools has proven efficient and should be extended where necessary.

Archive System
One of the biggest challenges for the archive system is to cope with the increasing data rates mentioned above. A more scalable archive system was also desirable. New technologies and concepts had to be introduced to obtain a system that performs well enough.

The solution is the Next Generation Archive System (NGAS, [13], [24]), which uses magnetic disks as storage media. The disks are prepared and checked in Garching and sent to Chile, where they are filled with data. Disks are always used in pairs, so that a back-up copy is kept in Chile. All information about disks and files is stored in the DFS Sybase DB. The information updated at the observatory site is replicated to the Garching DB.

NGAS does more than make it possible to handle larger data rates. It also provides easier access to the data and reduces operational costs. Since the disks are hosted in Linux-based PCs, the archive itself is also a powerful processing unit. The first implementation of NGAS has been tested in Chile with the WFI instrument, with good results. NGAS might be used within this year to archive data from the first high data rate instruments at Paranal.
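The paired-disk principle can be illustrated with a small Python sketch. A file is copied onto both disks of a pair, each copy is verified against a checksum of the source, and the file is registered only if both copies are good. Several choices here are assumptions made for the sketch, not details of the actual NGAS implementation: each disk appears as a mounted directory, the checksum is SHA-256, and a plain dict stands in for the DB tables holding disk and file information.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_to_pair(src: Path, primary: Path, mirror: Path, catalogue: dict) -> str:
    """Copy src to both disks of a pair; register it only if both copies verify.

    `catalogue` is an in-memory stand-in for the DB that records
    which disks hold which files (hypothetical structure).
    """
    expected = file_checksum(src)
    copies = []
    for disk in (primary, mirror):
        disk.mkdir(parents=True, exist_ok=True)
        dest = disk / src.name
        shutil.copy2(src, dest)
        # Re-read the written copy so a bad write is caught before registration.
        if file_checksum(dest) != expected:
            raise OSError(f"checksum mismatch on {dest}")
        copies.append(str(dest))
    catalogue[src.name] = {"checksum": expected, "copies": copies}
    return expected
```

Registering the file only after both copies verify means a catalogue entry always implies two good replicas. Storing the checksum also allows either disk to be re-verified later, e.g. after shipping.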
Pipeline Infrastructure
A major challenge for the instrument pipelines has been to support the particularities of each instrument. As a result, the pipelines were implemented from scratch for each instrument. This created a major burden in terms of implementation, testing and maintenance.

To address this, a Common Pipeline Library (CPL) has now been designed and is entering its final implementation phase. The CPL is implemented in ANSI C to make it portable and efficient. It is intended to form the basis of all future pipelines developed for ESO instruments. Existing pipelines may also be upgraded to use it if the necessary resources can be found. The CPL could not have been designed earlier, since its definition rests on the experience collected during the first years of operating the DFS. An instrument pipeline is a complicated piece of SW that requires a lot of optimization and tuning, and much of this experience has now gone into the CPL. The CPL will therefore ease pipeline development considerably. It will be used both for pipelines implemented by ESO and for pipelines implemented for ESO by external consortia.

Changing technologies and replacing existing, operational components must be done with great care and, ideally, incrementally. In general, our policy has been not to adopt a new technology until it has clearly been embraced by industry. Ideally, it should also have been adopted by a standards organization like ISO.
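The benefit of a common library can be sketched conceptually. The Python below is only an illustration: the CPL itself is written in ANSI C, and none of these function names belong to its API. A generic pixel-wise median combination and a frame subtraction are written once and then reused by two hypothetical instrument recipes. Using a median combination to build Master Calibration Frames is our assumption for the example.

```python
from statistics import median


# --- "common library" part: written, optimized and tested once ---
def median_combine(frames: list) -> list:
    """Pixel-wise median of equally sized 2-D frames (lists of rows)."""
    if not frames:
        raise ValueError("no input frames")
    rows, cols = len(frames[0]), len(frames[0][0])
    if any(len(f) != rows or any(len(r) != cols for r in f) for f in frames):
        raise ValueError("frames differ in shape")
    return [[median(f[y][x] for f in frames) for x in range(cols)]
            for y in range(rows)]


def subtract(frame: list, other: list) -> list:
    """Pixel-wise difference of two frames of equal shape."""
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(frame, other)]


# --- instrument-specific recipes (hypothetical): thin layers over the library ---
def instrument_a_master_bias(raw_biases: list) -> list:
    return median_combine(raw_biases)


def instrument_b_master_dark(raw_darks: list, master_bias: list) -> list:
    return median_combine([subtract(d, master_bias) for d in raw_darks])
```

Each recipe shrinks to a few lines of instrument-specific logic. Tuning or bug fixes in the shared functions then benefit every pipeline at once, which is the maintenance argument made for the CPL above.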

4. SYSTEM LEVEL CONSIDERATIONS AND EFFORTS
Beyond the DFS software itself, the framework for developing and operating it is also constantly being adapted and optimized. This framework includes the software engineering framework, programming languages, hardware platforms, operating systems and DB systems. As the foundation of development and operations, it must also be kept up to date and improved continuously in order to:

• Increase the quality of the software produced.
• Decrease maintenance expenses.
• Make use of new technologies to avoid ending up with an obsolete system.
• To some extent, decrease expenses for equipment and software product licenses.

The following subsections discuss a number of areas affected by this and describe how SW maintenance and operation are handled.

Structure of the DFS Group
The DFS Group is composed of various development teams, as shown in figure 3.

Figure 3: The different teams within the DFS Group: Observation Support System Team, Archive Facility Team, Pipeline Development Teams, Exposure Time Calculators Team and System Engineering Team. The SEG is the central, coordinating unit. It exchanges SW packages, libraries and binaries, as well as test reports, standards, templates and SPR feedback, with the other teams.

Each team is responsible for one or more projects, such as the development and maintenance of the ETCs. Each project has a Project Scientist, usually from outside the DFS Group, who helps define its requirements and priorities.

The central, coordinating unit of the DFS Group is the SEG (System Engineering Team). The SEG:
• provides guidelines and standards for implementing the SW;
• carries out the Integration Tests and reports errors back to the developers;
• coordinates the management of changes to be implemented in the SW;
• coordinates the preparation of each new DFS release;
• installs the DFS at the various ESO sites, or assists in this process, and distributes parts of the DFS to external users.

The SEG therefore plays a very important role within the DFS Group. Without its support, it would be very hard to maintain and deliver the DFS in the complex operational environment found within ESO.

Handling of Software Changes
The DFS receives a large number of requests for new features and for optimization of existing ones, as well as bug reports. A well-defined scheme must be in place to handle these formally and efficiently. Such a scheme also makes it possible to involve everyone with the knowledge and experience needed to take decisions. For this purpose, the SW Problem Report (SPR) system is used ESO-wide for submitting change requests and bug reports. Internal and external users can submit them either via a WEB-based system or via a tool running directly on the user’s workstation.
A Software Configuration Control Board (SCCB) meeting is held regularly. The people responsible for, or interested in, the topic of a change request are invited, and the Project Scientists usually participate. The meeting decides whether a request should be implemented, sometimes also how, and sets a deadline. All remarks, questions and decisions concerning a change request are recorded in the request file. The VCS software development group uses a similar scheme.

Experience with this formal scheme for reporting problems, bugs and change requests has shown it to be indispensable in any software development environment, especially one as complex as the DFS. Without it, it would be virtually impossible to keep an overview of the changes to be introduced. It would also be impossible to avoid backwards incompatibilities and other inconsistencies. The SPR system also makes it easier to share information with other divisions and groups that depend on the DFS. This helps to coordinate the work and to set dates for the next releases. The SPR system further serves as a change log for the various software products, recording precisely when changes were introduced. This log can be used for troubleshooting and, at management level, to track the status and development of the various projects.
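The life cycle of a change request described above can be modelled as a small state machine. Its history doubles as the request file and the change log. The states, the allowed transitions and the field names below are assumptions made for illustration. They are not the interface or workflow of the ESO SPR system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Assumed workflow for illustration only.
TRANSITIONS = {
    "submitted": {"accepted", "rejected"},  # decided at an SCCB meeting
    "accepted": {"implemented"},
    "implemented": {"released"},
    "rejected": set(),
    "released": set(),
}


@dataclass
class ChangeRequest:
    spr_id: int
    summary: str
    state: str = "submitted"
    deadline: Optional[date] = None
    # Ordered remarks and decisions: plays the role of the "request file".
    history: list = field(default_factory=list)

    def record(self, note: str) -> None:
        """Append a remark to the request file without changing state."""
        self.history.append((self.state, note))

    def move_to(self, new_state: str, note: str,
                deadline: Optional[date] = None) -> None:
        """Apply an SCCB decision or progress step; reject illegal transitions."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        if deadline is not None:
            self.deadline = deadline
        self.record(note)
```

Consider a freshly submitted request on which `move_to("released", ...)` is called, skipping the SCCB decision. It raises `ValueError` and stays in its original state. Encoding the workflow explicitly ensures that every change passes through the agreed steps and leaves a dated trace.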

Handling of Software Releases
Because changes are continuously introduced into the DFS, new releases are regularly prepared and distributed to the various sites. A release is usually produced every 6 months (lately every 12 months). Releases are normally coordinated with those of the VCS. The two systems depend on each other, so it makes sense to deliver them together and integrate them simultaneously and iteratively. This avoids incompatibilities and other problems arising from changes in either system.

A formal DFS release procedure has been established and must be followed for all DFS components. This integration cycle encompasses a number of steps and rules. For instance, it fixes when software packages must be delivered to the SEG for Integration Testing, i.e. when they must be frozen. This gives the SEG enough time to implement new test cases as needed before testing. The procedure also defines when and in which format the release notes for a SW package must be delivered. These notes describe the new features and the problem reports fixed since the last release.

The SW is frozen when it is handed over to the SEG for Integration Testing, but preparing a release is still an iterative process:
1) Bugs or problems found during testing are reported back to the responsible developer.
2) The developer deals with them as soon as possible and provides a new release, checked into and tagged in the SW configuration control system.
3) A new installation built from it is delivered to the SEG for testing.
The cycle repeats until all problems have been iterated out of the system. In general, intermediate releases of the complete DFS are kept to a minimum.
These are normally made only when a SW module needs a patch without which a specific service could not be provided, or only with difficulty. A new release with special features for a new instrument is also normally distributed when that instrument is installed. A minor/partial release is produced roughly every month, e.g. to satisfy requirements from external users. Finally, the DFS depends on many external SW packages and systems, such as Sybase, Java, Python, various operating systems and GNU. Each new version of such an external component requires going through the release cycle to test the SW with it.

Development Team Information Sharing
Beyond the SPRs, the SCCB meetings and the integration cycles, many other types of information must be shared and distributed within the DFS Group. A WEB site giving access to all relevant information on the SW development is the obvious solution. For the DFS, a very complete Intranet WEB site has been set up. It covers all the steps and actions involved in SW development, documentation, integration and release cycles. The site also hosts a repository of the documents produced within the DFS, which can be checked out via the WEB. Documents can also be checked out for update, and a new version checked in afterwards. The site further provides templates for creating documents such as User’s Guides and Release Notes.

Common Data Interfaces Definition
The ESO Data Interface Control Board (DICB, [15]) was founded in 1995 to keep the data interfaces within ESO under control.
The DICB maintains an important document, the "Data Interface Control Document" ([16]). It defines the contents and scope of the various data products used and produced within ESO. In particular, it defines:
• the properties of FITS files (mandatory keywords, usage of certain keywords, etc.);
• the format and contents of log files (Operations Logs);
• the format of the so-called Parameter Files, which store keywords for setup and configuration purposes, including the OB and TSF parameters stored by P2PP;
• the format of the Data Interface Dictionaries (DIDs), which contain the definition of all keywords used at ESO;
• the physical units used, and naming conventions, e.g. for optical components and generated files.

The members of the DICB meet regularly to approve the contents of data products, e.g. for new instruments or other sub-systems. The DICB may then send recommendations to change certain definitions back to those responsible for the instrument. The instrument SW should subsequently be updated accordingly.

The DIDs provide the central definition of the keywords and also document them. They are under configuration control and are, to some extent, considered part of the SW. Within the DFS, an online DB repository of the DIDs (DidRep) has been implemented. It is updated daily with new versions of the DIDs as soon as they appear in the SW repository. A WEB interface for browsing the repository is provided ([17]). It makes it easy to look up keyword cards used, e.g., in FITS files from a specific instrument. At present, about 60% of all DIDs used at ESO are maintained in the DidRep, and these DIDs define more than 2500 keywords. Without such a mechanism to update and browse them, it would be very difficult to interpret the keywords found e.g. in FITS files.

Experience has shown how important the DICB is. The DID concept ensures that no party invents keywords independently without the other parties knowing. The DICB is not directly part of the DFS, but the DFS depends heavily on its work. The DFS also continuously contributes to its standards and definitions, and has contributed the online DID repository (DidRep). Work has started on the next generation of XML-based standards for various data formats. This work also matters for projects like the Astrophysical Virtual Observatory (AVO, [18]) and ALMA ([19]). XML-based formats, e.g. for log files and DIDs, are currently under consideration.

Programming Languages
The choice of programming languages is a delicate and controversial subject in an environment like the DFS. Every language has arguments for and against it, and each developer has personal preferences. Using many different programming languages should clearly be avoided. It creates an inhomogeneous development environment in which each SW developer may have to master quite a few languages. This reduces productivity and makes troubleshooting and debugging more cumbersome.
It also makes hiring SW developers harder, because staff with such broad experience are very difficult to find. Training people to become experts in a ‘large’ number of languages is also very costly in time and money. For simplicity and maintainability, and to allow code to be shared and re-used, the number of languages must therefore be kept to an absolute minimum. At the same time, the needs of an efficient development and operational environment must still be met.

In practice, an environment like the DFS seems to need two kinds of language. One is a dynamically typed scripting language like Python, Perl or Tcl; the other a statically typed language like C/C++ or Java. A scripting language would typically be used for automatic regression tests and for prototypes that evaluate new concepts. It also suits small utilities, cron jobs and the like. In general, the DFS policy has been to avoid implementing large projects in scripting languages, whose dynamic typing makes them less suited to this purpose.

Prototypes are the exception: a running system can often be obtained much sooner in a scripting language than in a compiled, statically typed one. That speed is the actual motivation for prototyping this way. The prototype will often have to be re-implemented in a statically typed language later. However, some first implementations in a scripting language have been used in the DFS for years (e.g. P2PP). In some cases the implementation works so well and causes so few problems that it might be kept (NGAS).
Compiled languages are typically used where execution speed is critical and for server applications where stability and testability are very important. A special language should only be chosen for a specific problem after thorough consideration and analysis. It must be shown that the problem cannot be solved with one of the standard programming languages. Over the years it may also be necessary to move some modules and components to another language to keep maintenance manageable. Sometimes the reason is simply to find qualified personnel to take over the code.

In practice, reducing the number of programming languages in a complex environment like the DFS is harder than it might seem. First, the situation with respect to programming languages changes over the years. In addition, a new project might require a special language. Experience with an existing project might also show that another language would be a major advantage.

In the DFS, C/C++ is typically used for the pipelines and for server applications such as the OLAS servers. For the pipelines, ANSI C has generally been favored for portability. For OLAS, where portability requirements are less demanding, C++ has been used to profit from object-oriented methodologies. Most server applications are implemented in C++.

Tcl/Tk was used for GUI applications, particularly in the beginning, because of the powerful features of the Tk package. Mainly for maintainability reasons, however, the tendency is now to use Java for GUIs, e.g. with great success for the P2PP tool. Likewise, Perl was initially favored for CGI scripts. Lately, Python has increasingly taken over this role, because within the DFS it is considered more ‘manageable’ and more structured. More generally, thanks to good experience with it, Python is taking over from Tcl, UNIX shell scripts and Perl.

Testing and Integration
A fair amount of resources is invested in testing the DFS. The tests comprise Unit Tests of the individual modules or components and higher-level Integration Tests, a fairly common scheme.

The developer implements the Unit Tests. They thoroughly verify the basic functioning at class/method/function level. Writing them requires knowledge of the internals of the SW (white box test). These tests help the developer ensure that no bugs have been introduced before checking the SW into the configuration control repository, or at least before delivering a release. Many problems are thus filtered out before the next level of testing, which saves a lot of release iteration time.

The next level, the Integration Tests, is carried out by someone other than the developer, namely a member of the SEG.
This is done to have a double-check of the SW and to have the tests implemented and carried out from a different perspective, typically that of the user or of the application/system with which the SW interfaces. The purpose of the Integration Tests is to verify that the features described in the SW Requirements Specification, the SW Design Specification and the User’s Guide work properly and are implemented as documented. In addition, it is verified that the interfaces between the SW package and the other SW systems it interacts with have remained compatible since the last release. Apart from the Unit and Integration Tests, the DFS tests can roughly be divided into two main categories: 1) automatic tests and 2) manual tests. Until now, manual tests have been used to test GUIs, since implementing an automatic test of such applications may not only be difficult but in some cases simply impossible; certain checks have to be carried out with the ‘naked eye’ by a dedicated tester. This of course means that considerable resources are consumed by testing, but skipping such tests, on the other hand, is unacceptable. The experience with respect to testing is that testing is as important as the SW development itself. One should never try to save resources on testing, although the allocated budget may of course impose some limits such that a compromise is necessary. Many problems have been found before delivering the SW, saving a lot of installation time and avoiding interruptions of the daily operations, and thereby saving valuable observing time. The current status is that a fairly complete set of Integration Tests has been implemented for the DFS and is carried out in connection with each release cycle. The most important components are tested quite thoroughly.
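The white box Unit Tests described above can be illustrated with Python’s standard unittest module, in line with the recommendation to use a higher-level scripting language for test programs. This is a minimal sketch, not taken from the DFS test suite: parse_card is a hypothetical, simplified FITS-style header-card parser invented for the example (it ignores, e.g., FITS quote escaping), and each test deliberately targets one internal branch of it, which is what makes these tests white box rather than black box.

```python
import unittest


def parse_card(card):
    """Parse a simplified FITS-style header card into (keyword, value).

    Hypothetical helper for illustration only. Quoted values are returned as
    strings; other values are converted to int or float when possible and
    otherwise returned as text. FITS quote escaping ('') is not handled.
    """
    keyword, _, rest = card.partition("=")
    keyword = keyword.strip()
    rest = rest.strip()
    if rest.startswith("'"):
        # String value: stop at the closing quote, so that a '/' inside the
        # string is not mistaken for the start of a comment.
        end = rest.find("'", 1)
        if end == -1:  # tolerate a missing closing quote
            end = len(rest)
        return keyword, rest[1:end].rstrip()
    value = rest.split("/", 1)[0].strip()  # drop an optional trailing comment
    for convert in (int, float):
        try:
            return keyword, convert(value)
        except ValueError:
            pass
    return keyword, value


class ParseCardTest(unittest.TestCase):
    """White box tests: each test exercises one branch of parse_card."""

    def test_integer_value(self):
        self.assertEqual(parse_card("NAXIS   =                    2"), ("NAXIS", 2))

    def test_float_value_with_comment(self):
        card = "EXPTIME =                 30.0 / exposure time"
        self.assertEqual(parse_card(card), ("EXPTIME", 30.0))

    def test_string_containing_slash(self):
        # The quote-handling branch must not treat '/' as a comment marker.
        self.assertEqual(parse_card("OBJECT  = 'NGC 1/2   '"), ("OBJECT", "NGC 1/2"))

    def test_missing_closing_quote(self):
        self.assertEqual(parse_card("OBJECT  = 'NGC 253"), ("OBJECT", "NGC 253"))

    def test_unconvertible_value_returned_as_text(self):
        self.assertEqual(parse_card("SIMPLE  =                    T"), ("SIMPLE", "T"))
```

Saved as, e.g., test_parse_card.py (a hypothetical file name), the tests can be run with `python -m unittest test_parse_card`. Keeping such a file under configuration control next to the code it tests makes it easier to extend the tests whenever new features are added.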
Still, tests for some SW packages are missing and should be developed, and existing test cases should be enhanced, so there is still room for improvement. Nevertheless, a rough estimate is that the Automatic Integration Tests cover some 60-70% of the test cases that it would be desirable to implement. In total, some 2000 test cases have been implemented for the various components; for the pipelines, each recipe in use (~360) has at least one test case. In the case of the DFS, the ratio between testers and developers is of the order of 1:8. A more reasonable ratio would probably be of the order of 1:6, which would allow the complete set of required tests to be developed. The test programs should, of course, be under configuration control like the rest of the SW, and common utilities should be used for their implementation. It is advantageous to use a higher-level scripting language like Python, Perl or Tcl for the implementation. Finally, the test programs must be well documented, and suitable GUIs must be provided for executing the tests, together with tools to analyze the test data produced. These last points still need some refinement in the case of the DFS tests. As a final minor remark on testing, keeping the tests up to date, i.e. extending them with new test cases whenever the SW is updated with new features, is a problem. Developers should be reminded about this, since otherwise the tests may after a while become obsolete and inadequate for verifying the SW. Testing can consume an unlimited amount of resources, depending on the level at which it is done. It is therefore necessary to find an appropriate compromise between the investment and the degree of validation of the SW. It is important to view the test cases as an integral part of the SW development and to keep them up to date with the new features introduced. In fact, the tests are treated as real projects within the DFS.

Hardware and Operating Systems

ESO’s strategy with respect to HW platforms is to avoid having too many different types, which of course makes sense. The same applies to operating systems, of which only two were originally chosen to be supported. When the SW development for the VLT started (mid 90s), HP-UX and Solaris were chosen. This also automatically determined the platforms to which most of the SW has been ported and on which it has been tested. For the operational environment at the telescope sites, it was decided to use only HP workstations (WS) and thus HP-UX. The reason for this decision was that the computer equipment from this manufacturer was considered particularly robust and had in fact been tested under the conditions found at ~2600 m altitude in a desert; this was even guaranteed by the manufacturer. Still, all SW has continuously been supported on both HP-UX and Solaris, to be able to switch if it should become necessary. This of course imposes an extra development, testing and integration effort, but it pays off since it ensures that the SW is portable, i.e. that nothing is hardcoded in the SW that locks it to a specific platform. It should be mentioned that the choice of HP has so far not been a bad one, since very few problems have been encountered with the WS HW. In the last years the situation has changed with the increasing popularity of Linux running on cheap PC HW (IBM compatible). The major part of the DFS has now been ported to Linux, and some parts run on Linux on a permanent basis, although so far only at the ESO HQ and at the home institutes of astronomers. Efforts are now being undertaken to unify ESO’s usage of Linux, i.e. to standardize it and to streamline the installations and configurations used.
It may soon be possible to operate Linux-based computers within the VLT control LAN. This has been avoided until now to keep the system as homogeneous as possible and to reduce system administration expenses. Using Linux will of course mean that ESO saves quite a lot of money on HW and OS licenses. In the end, a mix of HP and Linux machines will probably be running at the telescope sites. This in turn generates a requirement for the DFS to be fully supported on Linux as well, which again means that more resources will have to be invested in testing and integration. The usage of Linux PCs at Paranal is still under consideration and discussion, but it is envisioned that this will progress fairly soon. Some SW products are supported on even more platforms. This applies in particular to P2PP, which has to run on many platforms, but also to the image processing library “Eclipse” ([21]) developed within the framework of the instrument pipelines. The pipelines themselves and the Common Pipeline Library are also supported on several platforms, including Linux. Some other products, like the Next Generation Archive System, were dedicated from the beginning to run on Linux and are used in operation on this platform, although they also run on Solaris/HP-UX.

DB Infrastructure

One of the areas in which a lot of consideration and design effort must be invested in a project like the DFS is obtaining a consistent and robust 'DB infrastructure'. The first reason for this is that the expenses for a DBMS and the related services usually end up being a significant portion of the budget for SW development and maintenance: license fees for the DB systems, and the cost of the expertise needed to keep these systems running with a high degree of reliability and availability, cannot be ignored.
Moreover, whereas it is often relatively easy to change or re-implement the front-end applications and tools interacting with the DBMS, changing the DBMS itself or the structure of the data in it is usually a rather painful exercise. This normally requires considerable resources and is in any case a delicate procedure. Great care must therefore be taken when choosing the DBMS and when designing the structure of the DB. The DBMS and the DB as such are to some extent even more important than the SW depending on them: the DB of a data flow system is, so to speak, the very foundation or core of the system and is operationally critical. An important point when selecting a DBMS is to go for a system that seems to have good chances of surviving for many years and that does not have unreasonably high maintenance costs. It is of course also important that aspects such as the supported SQL dialect conform to the standards. For application development it is advisable to use ODBC for DB access. This has been practiced to some extent for the DFS, but should be practiced even more to make it easier to re-use SW. In the case of the DFS, Sybase was chosen. The main reason for this choice was that ESO had already been using Sybase for years. In addition, this company offers substantial reductions in license fees for educational and non-profit research organizations. Furthermore, Sybase is quite well established, in particular as the DBMS used by many banks, which makes it probable that Sybase will survive on the market for many years. So far ESO has had good experiences using Sybase, and only a few problems have been encountered. The major one has been setting up the replication between the various remote DB sites used and administered by ESO and making it work in a reliable and robust manner. It should be mentioned that the DB infrastructure used by ESO is fairly complex. The DB system has to be available 24 hours a day, 7 days a week, and there is usually no room for drop-outs of this service, since these may lead to interruptions in the operation of the telescope facilities. This requires a scheme whereby a DB administrator (DBA) is always available and ready to intervene in case of problems. Such a service is of course an extra expense, but it pays off, since interruptions of operations are more costly. For administering the DB system used by the DFS and by other groups at ESO, ESO has a small team of three experts (EDAT: ESO Database Administration Team). All major requests for changes or configuration, as well as problem resolution, are handled by this team. There are also experts at the telescope sites capable of intervening in case of problems. One service provided by the EDAT is a warm stand-by of the DB holdings, whereby a standby DB system is kept synchronized with the main DB by replication and can take over automatically in case the main DB fails. It should be mentioned that expertise in DB administration is expensive and neither easy to find nor to replace.

The DFS DBs are rather technical and can only be used by ’experts’ who know the instruments well. For this reason the Science Archive Facility (SAF) project has been initiated. The purpose of SAF is to build a layer on top of the existing DB structure that will allow the generic archive user to query and obtain more scientifically valuable information from the DB without knowing the details of the various instruments. Another purpose of SAF is to add the information still missing to the DBs and to obtain a better integration of the DBs of the different sub-systems. In the end, this will probably be done by setting up a data warehouse, i.e. a central repository where the data from the various databases is organized for use by analytical applications and user queries. The current set of databases will be maintained with the main purpose of supporting the various online transaction processing applications. As can be seen, ESO has several activities going on to optimize and improve the running DB system. The data in the DBs is very valuable and, together with the data files produced, represents the fruit of all the effort invested in running the observatory. When this information is properly kept, it can be an important source of astronomical information for generations.
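The recommendation to access the DB through a standard interface such as ODBC, and the SAF idea of a layer that hides instrument-specific details, can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the DFS or SAF implementation: it uses Python’s DB-API 2.0 (PEP 249) as the standard access interface, with the built-in sqlite3 module standing in for the production DBMS; pyodbc would be a corresponding ODBC-based driver. All table, column and view names (isaac_frames, fors_frames, science_frames and their columns) are hypothetical.

```python
import sqlite3

# Hypothetical instrument-specific tables with differing column names, plus a
# view that presents them through one instrument-independent schema (the kind
# of layer SAF aims to provide). All names are invented for this sketch.
SCHEMA = """
CREATE TABLE isaac_frames (dp_id TEXT, targ_name TEXT, exp_sec REAL);
CREATE TABLE fors_frames (file_id TEXT, object TEXT, exptime REAL);
CREATE VIEW science_frames AS
    SELECT dp_id AS frame_id, 'ISAAC' AS instrument,
           targ_name AS target, exp_sec AS exposure_s
    FROM isaac_frames
    UNION ALL
    SELECT file_id, 'FORS', object, exptime FROM fors_frames;
"""


def frames_for_target(conn, target):
    """Return (frame_id, instrument, exposure_s) rows for one target.

    Only generic DB-API 2.0 calls are used, with '?' placeholders (the
    'qmark' style used by both sqlite3 and pyodbc), so this query code does
    not depend on which driver created `conn`, as long as the SQL is portable.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT frame_id, instrument, exposure_s FROM science_frames "
            "WHERE target = ? ORDER BY frame_id",
            (target,),
        )
        return cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    # sqlite3 stands in for the production DBMS; with an ODBC driver only the
    # connection set-up would differ, the query function stays unchanged.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO isaac_frames VALUES ('ISAAC.0001', 'NGC 253', 60.0)")
    conn.execute("INSERT INTO fors_frames VALUES ('FORS1.0001', 'NGC 253', 300.0)")
    conn.execute("INSERT INTO fors_frames VALUES ('FORS1.0002', 'M 83', 120.0)")
    for row in frames_for_target(conn, "NGC 253"):
        print(row)
    conn.close()
```

The caller asks for frames of a target without knowing that ISAAC stores the target name in targ_name and FORS in object; adding an instrument means extending the view, not changing the client code.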

5. CONCLUSION AND FUTURE DEVELOPMENT OF THE DFS

The DFS is a complex but indispensable part of a modern observatory facility like the VLT, which is a major investment and thus has to be utilized as efficiently as possible. The DFS has an estimated lifetime of 15-20 years. This means that the system cannot be regarded as frozen or static after the first integration. One reason for this is that new instruments continuously generate new requirements; in particular, the next generation of VLT instruments will pose new challenges for the DFS. A system like the DFS must evolve and be optimized constantly to improve performance and level of service and to reduce maintenance costs. During its lifetime the DFS will undergo a gradual, partial re-implementation, whereby individual components are renewed and replaced one at a time without interfering with the ongoing operation of the observatory. Another reason for upgrading the system is to apply contemporary technologies, which may simplify the system and increase its performance. In addition, this may make integration with external systems easier and make it easier to recruit new staff for maintenance. At the same time it is crucial to employ only well-proven technologies, to avoid problems with ’premature prototypes’ of, e.g., standards and SW components. This means that a rather conservative approach is advisable, whereby only well-established (but not obsolete) technologies are used. In general, following the industry standards and the major technology trends in industry turns out to be a major advantage. As an example of how much caution to apply when selecting technologies: in the early days of defining the DFS, the use of CORBA for interprocess communication and information exchange was considered. Evaluations were carried out, but in the end it was decided not to use CORBA. The reasons were that CORBA is complex and requires training, and that it was not certain whether it was fully mature at that time. Simpler technologies were used instead; e.g., for transporting data files between the various hosts, something as basic as the UNIX "rcp" command was used. It was certainly not the most advanced technology even at that time, but it has been working without any problems, and millions of frames have been transported around the system without any losses. Often the simplest and most basic technologies are those that give the fewest problems and the highest reliability, and they consume fewer resources for maintenance.

The experience so far in developing, maintaining, upgrading and operating the DFS can overall be characterized as good. Many problems have been faced, which have led to changes in architecture and strategy, but in general none of them have become show stoppers or insurmountable obstacles leading to major delays or interruptions of operations. Despite all the changes that have been necessary to adapt the DFS to new requirements coming from operations and from users, and to the introduction of new instruments, it can be concluded that the original Data Flow Model (figure 1) is still relevant and has remained more or less unchanged. In general, when testing new concepts for which no experience is available, it is always good first to implement a prototype of the tools needed and to operate it for a while to gather experience and define the exact requirements. This might appear more costly; in the end, however, it saves resources, as the end product will often be more robust, consistent and better defined. In addition, it avoids having to keep an obsolete or insufficient implementation alive, which may consume much more resources over time. The experience of the DFS has shown that it is very important to port the SW to various platforms, so that the HW/OS platform can be switched if needed and the code can be delivered on request for platforms other than the original target platform(s). For this reason it is important to choose portable programming languages and to implement the SW in a portable way, e.g. by using standards like ANSI C. The DB is usually the very core of a data flow system. It is often a difficult part to define and design, both with respect to the product(s) to choose and to the structure of the DB. This is a major issue in the case of ESO, where data is shared in a highly distributed environment.
License fees for DBMSs and maintenance costs may in the long run become a significant item in the budget of the data flow system. Apart from the VLTI, which is already being integrated, the next two major projects at ESO that might influence the DFS are AVO and ALMA. To what extent these will make use of the DFS, or the DFS will profit from the developments done for these projects, is not yet completely clear. One example of how the projects could profit from each other is that NGAS might be used as the base archive system for ESO’s ’contribution’ to the AVO ([24]). In this case, additional features would probably have to be implemented to support this new customer, and in this sense the projects generate mutual feedback. NGAS might also be used as the central online archive for ALMA, with a number of adaptations of course. ALMA is also profiting from the DFS experience in proposal preparation and scheduling, and from the experience gained defining and running a complex DB infrastructure. The ALMA approach to the DB infrastructure might differ somewhat from that of the DFS, in that a more centralized and unified approach is taken. One day, when the ’ALMA DFS’ is done, it might be possible to propagate some of its concepts and services back to the VLT, though it remains to be seen whether this is really desirable and possible.

ABBREVIATIONS

The following abbreviations are used in this article:

ALMA    Atacama Large Millimeter Array
AVO     Astrophysical Virtual Observatory
CGI     Common Gateway Interface
CPL     Common Pipeline Library
DB      Database
DBMS    DB Management System
DBA     Database Administrator
DFS     Data Flow System
ETC     Exposure Time Calculator
GUI     Graphical User Interface
HQ      Headquarters
HW      Hardware
NGAS    Next Generation Archive System
OB      Observation Block
ODBC    Open Database Connectivity
OLAS    Online Archive System
OPC     Observation Programme Committee
OS      Operating System
OT      Observation Tool
P2PP    Phase 2 Proposal Preparation
PHRS    Proposal Handling and Reporting System
SCCB    Software Configuration Control Board
SEG     Software Engineering Group
SPR     Software Problem Report
SW      Software
VCS     VLT Control System
VLT     Very Large Telescope
VLTI    Very Large Telescope Interferometer

REFERENCES

[1] P. J. Quinn, et al., "VLT Data Flow System: From Concepts to Operations", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 3349, 1998.
[2] A. M. Chavan, et al., "Support Tools for the VLT Operations: The NTT Prototype Experience", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 3349, 1998.
[3] M. A. Albrecht, et al., "The VLT Science Archive System", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 3349, 1998.
[4] D. R. Silva, et al., "VLT Science Operations: The First Year", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 4010, 2000.
[5] P. J. Quinn, et al., "The ESO Data Flow System in Operations: Closing the Data Loop", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 4010, 2000.
[6] A. M. Chavan, et al., "A Front-End System for the VLT’s Data Flow System", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 4010, 2000.
[7] ESO DMD, Data Flow System Overview, "http://www.eso.org/org/dmd", "http://www.eso.org/projects/dfs".
[8] ESO SkyCat Tool, "http://archive.eso.org/skycat".
[9] ESO Exposure Time Calculators, "http://www.eso.org/observing/etc".
[10] ESO Exposure Time Calculators, Article ADASS 1999, "http://www.adass.org/adass/proceedings/adass99/P1-22".
[11] Science Archive Facility, Brochure, "http://archive.eso.org/archive2000.pdf".
[12] Science Archive Home Page, "http://archive.eso.org".
[13] Science Archive Summary, "http://archive.eso.org/VLT-SciArch/VLT-SciArch-Overview.html".
[13] Next Generation Archive Systems, "http://archive.eso.org/NGAST".
[14] P. Ballester, et al., "The VLT Interferometer Data Flow System: From Observation Preparation to Data Processing", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 4844, 2002; P. Ballester, "Data Flow for the Very Large Telescope Interferometer", in Observatory Operations to Optimize Scientific Return, SPIE Proc. 4477, 2001.
[15] ESO Data Interface Control Board, Home Page, "http://archive.eso.org/dicb".
[16] ESO Data Interface Control Board, "Data Interface Control Document", GEN-SPE-ESO-19940-794/2.0, "http://archive.eso.org/dicb/dic-2.0/dic-2.0.4.pdf".
[17] ESO DID Repository, "http://archive.eso.org/Tools/DidRep/DidRepWebQuery".
[18] Astrophysical Virtual Observatory, ESO Home Page, "http://www.eso.org/avo".
[19] Atacama Large Millimeter Array, ESO Home Page, "http://www.eso.org/projects/alma".
[20] DFS Pipeline Web Page, "http://www.eso.org/projects/dfs/dfs-shared/web/vlt/vlt-instrument-pipelines.html".
[21] The Eclipse Software, Web Page, "http://www.eso.org/projects/aot/eclipse".
[22] Instruments for the Very Large Telescope, "http://www.eso.org/instruments".
[23] GASGANO Web Site, "http://www.eso.org/observing/gasgano".
[24] A. J. Wicenec and J. Knudstrup, "ESO’s Archive Computing Framework", in Advanced Global Communications Technologies for Astronomy II, SPIE Proc. 4845, 2002.