Case-based Reasoning for Complex Telecommunication Systems

Alfons Schuster, Roy Sterritt, Ken Adamson, Mary Shapcott, Edwin P. Curran
University of Ulster, Faculty of Informatics, School of Information and Software Engineering
Shore Road, Newtownabbey, Co. Antrim BT37 0QB, Northern Ireland
email: {a.schuster, r.sterritt, k.adamson, cm.shapcott, ep.curran}@ulst.ac.uk

ABSTRACT: This paper aims to identify the potential of case-based reasoning (CBR) for problem solving in complex telecommunication systems. The system under investigation is an environment on which automated test procedures are carried out frequently. The data gathered in these tests is used by engineers to assess the performance and the quality of the system. However, in some situations the complexity of the system, together with the large size of the recorded data, can make it difficult for an engineer to reach a quick and valid assessment. Building on research carried out in previous projects, this paper emphasises the potential of CBR for data analysis in this area. The paper therefore has two goals: first, to provide an overview of the automated testing environment and the problems in this domain; second, to highlight the advantages CBR may offer for intelligent data analysis.

KEYWORDS: telecommunication systems, automated testing, data analysis, case-based reasoning

1 INTRODUCTION

There is a clear correlation between developments in telecommunications modelling and those within the fields of inferential statistics, computing science and technology. Exploring this correlation, the authors, in collaboration with NITEC (Northern Ireland Telecommunications Engineering Centre), NORTEL NETWORKS (from now on simply referred to as NORTEL), have undertaken research projects in the telecommunications domain. The projects were developed with the aim of further understanding the behavioural issues of large scale Synchronous Digital Hierarchy (SDH) networks, Tanenbaum (1996), Sexton & Reid (1992). For example, a knowledge discovery approach using live data from NORTEL's test SDH network was utilised to produce an abstracted cause and effect behavioural model of the network, Sterritt (1998a). Currently the authors are collaborating with NORTEL to automate their shop floor test facility. This will involve the development of real-time knowledge acquisition tools and aims to extract abstract behavioural system models.

This work has raised new issues and spawned new ideas for improving the efficiency and robustness of the testing process. A number of shortcomings of current techniques have been identified, including deficiencies in the knowledge elicitation process (where parameters of a specific event have not been assimilated into the model), a lack of robustness, the vast complexity of the domain, rigidity and an inability to work within the constraints of limited data, and, finally, inflexibility in accommodating new developments. This paper aims to highlight the potential of CBR as a possible solution to some of these shortcomings. For example, CBR is able to operate in domains that are not well defined, or where problems arise due to missing data and/or data shortage. CBR also provides the facility to reduce the complexity of many systems through simple, robust and easily maintainable entities, so-called cases. Finally, CBR provides further advantages in terms of learning and explainability.

Although the major intention of the paper is therefore to identify more clearly the value of CBR in the automated testing field, the paper also provides an overview of future research directions. These directions propose a hybrid system, not only for (intelligent) data analysis, but also for data storage (database, data warehouse), to enable continuous improvement of the testing process through a rigorous and competent sustained learning approach to automated testing. The remainder of the paper is organised as follows. Section 2 describes the testing of telecommunication equipment, and Section 3 the automated testing environment. Section 4 provides an overview of the data processing involved in the project. CBR and its advantages for some of the problems in this domain are summarised in Section 5. Finally, Section 6 ends the paper with conclusions and future work.

2 TESTING OF TELECOMMUNICATION EQUIPMENT

NORTEL's main business activity is the design and manufacture of telecommunication equipment. They are the world leader in supplying information transmission systems. For example, their 10 gigabit per second transport products provide the equivalent of 1,000 paperback novels per second down a glass fibre thinner than a human hair. Telecommunications is a growth sector driven by a rapid increase in both voice traffic and internet traffic worldwide: data traffic, for example, is growing at 30% per year versus 3% for voice traffic. The sector is fiercely competitive, both on technology and price, and is entering a new era with the liberalisation of telecommunications in Europe. However, as in many domains, testing is a major element in the development cycle of this equipment, both in terms of cost and timescale, Bouloutas et al. (1994). Traditionally this testing was performed manually. An engineer would follow a test case which was loosely coupled to the requirements. This testing would be long and repetitive, requiring much expensive overtime. This was not considered the best way to spend the engineer's time, but it did offer the advantage that, as an expert in the field, the engineer could spot anomalies and probe further beyond the test case. Yet, generally, it was felt that regression testing was poorly done by humans.

3 AUTOMATED TESTING

Automated testing could offer a competitive advantage in terms of reduced cost, reduced time to market, and "freeing up" of specialised engineers for further design and development. Table 1 briefly summarises the advantages of automated testing over manual testing.

Table 1: Advantages of automated testing over manual testing.

Manual testing:
- Time expensive.
- Cost expensive.
- Repetitive task.
- Less "attractive" role for engineers.

Automated testing:
- Reduced time to market.
- Reduced cost, because less labour intensive.
- Potential to increase quality by covering more.
- "Freeing up" of specialised engineers for further design and development.
- Anything from 1 week to 1 month of manual testing can be run in 1 night.

One of the SDH rigs constructed by the test engineers at NORTEL to facilitate the testing of each release of the multiplexer software is illustrated in Figure 1.

Figure 1: SDH network and automated testing.

Typically, an SDH test network comprises many interacting components. For example, the test network illustrated in Figure 1 contains three multiplexers (EUSTON, ENFIELD, and ACTON), connected to each other in a ring topology using fibre optic cables. Each multiplexer has a series of slots which house cards with specific functions. These are Add/Drop multiplexers, since they have connections to tributaries. Once a multiplexer receives data from a tributary, it multiplexes the data into a series of frames. Each frame is sent out on the fibre optic cables until it arrives at its destination multiplexer, where it is de-multiplexed and sent on the appropriate tributary. In the test network, neighbouring tributaries are connected together so that there is a continuous cycle of data being transmitted. Any events that occur on a component (e.g., event type, event time) are detected by that component and sent to the Network Manager (referred to as Element Controller in Figure 1) via the Hub. The Network Manager logs these events in a computer file known as the Event Log.

A series of UNIX test scripts has been developed to automate the testing. They apply commands, or stimuli, to the multiplexers in the network. These stimuli result in various cause and effect relationships, including the generation of faults on certain components. For each fault, the affected components raise one or more alarms. These alarms are special types of events and, as such, are cascaded back to the Network Manager, which logs them in the Event Log. During testing the Event Log can grow rapidly, since it logs all events. For instance, the simulation of a fibre optic break generated 6 MB of data. Although the automation of the testing provides substantial cost and time savings, these would be negated by having the engineer manually analyse the Event Log. In effect, this would only change the means by which the testing is carried out from real-time to batch, and would lose the advantage of additional real-time probing into the network's state. To overcome this inherent disadvantage, the test engineer, when coding the test script, includes feedback comments which are output to a script log (referred to as MUX LOG in Figure 1). Since these logs indicate specifically and clearly the results that the script was designed to test, the engineer can quickly identify whether the test was a pass or a fail. This approach generally works well, but not every situation can be coded for in the script. A fail result highlights a detectable error, so it is easy to interpret. A pass cannot be interpreted in the same way; it only proves that an expected event occurred. This problem leads back to a reliance on the Event Log to obtain a confidence level for that pass.

In collaboration between NORTEL and the University of Ulster, researchers have undertaken work to further support the automated testing process of broadband multiplexer software releases, Sterritt (1998b, 1998c). That work, however, was restricted to a sub-part of the data recorded in a network test. One of the lessons learned was therefore that, to obtain a better understanding of the overall behaviour of the network, it is necessary to collect, store, process, and analyse all the data produced in a network test.
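The paper does not reproduce the Event Log's record format; purely as an illustration of the kind of event data just described (originating component, event time, event type), the following sketch models a log entry. All names here are invented, not NORTEL's:

```java
// Hypothetical, minimal model of an Event Log entry; the real log format
// used by the Network Manager is not described in this paper.
public record LogEvent(String component,   // e.g. "EUSTON", "ENFIELD", "ACTON"
                       long timestamp,     // event time
                       Type type,          // kind of event
                       String detail) {    // free text, e.g. an alarm name

    public enum Type { ALARM, USER_ACTION, SYSTEM, LOGIN }

    // Alarms are the special event type cascaded back to the Network Manager.
    public boolean isAlarm() { return type == Type.ALARM; }
}
```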

4 DATA PROCESSING AND DATA ANALYSIS IN THE AUTOMATED TESTING ENVIRONMENT

The current project identified four different layers in the automated testing environment. Each layer is characterised by a different data processing task (Figure 2).

Figure 2: Data on different levels.

Data processing starts on the bottom layer, Layer 1. Essentially, the data processing in Layer 1 comprises the recording of the raw data generated in a network test in a file called the Event Log. With respect to further aims of the project, data collection may also include the gathering of so-called domain knowledge. This type of information, together with the raw data, is intended to be used at a higher level in the hierarchy of Figure 2. Before that, however, the raw data is pre-processed and cleaned in Layer 2. At this stage it is necessary to mention work done in previous projects, in which generic programs (written in Java) were produced to analyse and visualise the data in order to better understand the behaviour of the network. Special emphasis was given to the "alarms" raised in a network test. The results obtained by these programs provided a first insight into the domain. Yet the researchers involved realised the need to incorporate further network events in the analysis (e.g., user action events, system events, login events). It was further recognised that the data has to be better organised in a repository that allows faster access, higher flexibility, and possibly the advantage of online analytical processing (OLAP). Layer 3 in Figure 2 therefore constitutes a database/data warehouse layer, but it also includes the data analysis and visualisation programs mentioned above. Finally, Layer 4 illustrates the intelligent data analysis layer. This layer represents one of the major goals of the project. It has been realised that conventional statistical analysis techniques may be limited in their potential for dealing with the problems of the complex and difficult automated testing (telecoms) domain. An aim of the project is therefore to employ and test advanced data analysis techniques, summarised under the notion of data mining, as well as knowledge-based systems approaches, to better understand and analyse the collected data and domain knowledge. CBR is one of the knowledge-based systems approaches the authors intend to apply in the domain.
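As a reading aid, the layers of Figure 2 can be viewed as a pipeline in which each stage consumes the output of the stage below. The sketch below is our own illustration of that composition; all interface and method names are invented and are not taken from the project:

```java
import java.util.List;

// Illustrative composition of the four layers in Figure 2. Every type and
// method name is invented; the project's actual interfaces are not
// described in the paper.
public final class TestDataPipeline {

    interface EventLogReader      { List<String> readRawEvents(); }          // Layer 1
    interface EventCleaner        { List<String> clean(List<String> raw); }  // Layer 2
    interface EventStore          { void store(List<String> cleaned); }      // Layer 3
    interface IntelligentAnalyser { void analyse(EventStore store); }        // Layer 4

    public static void run(EventLogReader reader, EventCleaner cleaner,
                           EventStore store, IntelligentAnalyser analyser) {
        List<String> raw = reader.readRawEvents();   // Layer 1: raw Event Log data
        List<String> cleaned = cleaner.clean(raw);   // Layer 2: pre-processing & cleaning
        store.store(cleaned);                        // Layer 3: database/data warehouse
        analyser.analyse(store);                     // Layer 4: CBR, data mining
    }
}
```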

5 CASE-BASED REASONING

CBR is an artificial intelligence (AI) approach to problem solving and learning that has received considerable attention over the last few years, Kolodner (1993), Aamodt & Plaza (1994). Applications of CBR span many domains, for example meteorology, medicine, and telecommunications, Jones & Roydhouse (1995), Schuster et al. (1997), Lewis (1993). CBR relies on the assumption that reminding and adaptation play a crucial role in human expert problem solving. Reminding means that, when facing a new problem situation, domain experts very often are, or try to be, reminded of similar situations that were solved in the past. Whenever such prior solutions are available, the expert adapts these solutions, or the plans that led to a successful solution, to fit the needs of the new problem. In CBR, past experiences are referred to as cases. A case is usually described by a set of attributes, also often referred to as salient features, or simply features. Cases that are considered useful for further problem solving are stored in a memory-like construct called variously the case knowledge base, the case library, or simply the case base. In broad terms, a CBR reasoning cycle consists of (1) solving a new problem, and (2) learning from this experience. Figure 3 illustrates a simplified CBR scenario.

Figure 3: Simplified CBR scenario.

The aim in Figure 3 is to gain information about a new problem situation from similar problems that have been solved in the past. The previously solved problems (Base Case 1, Base Case 2, ..., Base Case n) are stored in the case base, whereas the new problem is represented by a so-called Query Case. Further, the Query Case and each base case are described by a set of features (F1, F2, ..., Fn). According to the data recorded in these features, the system retrieves the base case(s) most similar to the Query Case from the case base, assuming that the information available through these case(s) can be used for solving the new problem. Figure 3 is a very simplified view of the CBR process; in reality CBR is more difficult and complex. For example, the Figure 3 scenario does not address CBR issues such as case indexing, similarity assessment, case adaptation, and learning. This is not the place to address these topics in detail. It is however possible to highlight some of the advantages of CBR, particularly from the automated testing viewpoint (a code sketch of the retrieval step follows the list):

- CBR does not require causal models or a deep understanding of a domain, and therefore it can be used in domains that are poorly defined, where information is incomplete or contradictory, or where it is difficult to obtain sufficient domain knowledge.
- It is usually easier for experts to provide cases than precise rules, and cases in general seem to be a rather uncomplicated and familiar problem representation scheme for domain experts.
- As the complexity of the knowledge base increases, a CBR knowledge base is probably easier to maintain than a rule-based knowledge base. For example, it is easier to add or delete a case in a CBR system than to change rules, which often implies a lot of reorganisation in a rule-based system.
- Cases provide the ability to explain by example (retrieved cases) and to learn (adding a case to the case base). The explicit knowledge representation via cases therefore makes CBR distinct from other AI approaches such as rule-based systems, which rely solely on general knowledge of a problem domain, or neural networks, which do not provide any explanation facility at all.
- Past solutions and the steps involved in the problem solving process can be reused, and they also provide valuable help in preventing the repetition of previous errors.
- One of the most promising advantages from the viewpoint of automated testing is that CBR provides a means to drastically reduce the complexity of a domain through a simple, robust abstraction - a case.
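To make the retrieval and retention steps of Figure 3 concrete, the following minimal sketch (in Java, the language of the project's earlier analysis programs) retrieves the k base cases nearest to a query case. It is an illustration only: the encoding of cases as numeric feature vectors, the unweighted Euclidean distance measure, and all class and method names are our assumptions rather than the project's design.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal illustrative CBR retrieval: find the base cases whose feature
// vectors (F1..Fn) are closest to the query case. Unweighted Euclidean
// distance is an assumption; real systems typically weight features.
public class CaseBase {

    public record Case(String label, double[] features) {}

    private final List<Case> cases = new ArrayList<>();

    // Learning step of the CBR cycle: retain a solved case in the case base.
    public void retain(Case solved) { cases.add(solved); }

    // Retrieval step: return the k base cases most similar to the query.
    public List<Case> retrieve(double[] query, int k) {
        return cases.stream()
                .sorted(Comparator.comparingDouble(c -> distance(c.features(), query)))
                .limit(k)
                .toList();
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

In the automated testing setting described in Section 5.1 below, a test's footprint could serve as the feature vector and its known pass/fail outcome as the label, so that the retrieved neighbours both classify a new test run and explain the classification by example.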

5.1 PLANNED APPLICATION OF CBR IN AUTOMATED TESTING

In previous research in the domain, data pre-processing and data cleaning procedures were implemented to better organise, visualise, and understand the data recorded in a network test (Figure 4). As mentioned before, the data produced in a test is recorded in the Event Log. Data cleaning operates on the Event Log by generating files for the different event types that occur in a test (e.g., the Alarm Events file and the User Action Events file in Figure 4). The cleaning process parses the Event Log, identifies every event recorded in it, and writes each event to the associated file. For example, according to Figure 4, an alarm event found in the Event Log would be registered in the Alarm Events file. Based on the entries in the different files it is possible to generate charts that illustrate frequencies of occurrence, for example that of an alarm (Figure 4 again). In the project such a chart is also referred to as a "footprint". Note also that this splitting-up of the Event Log in the cleaning process is a first attempt to reduce the complexity of the test data.

Figure 4: Pre-processing of system alarms and frequency of occurrence charts.

Further, one assumption is that the footprints can be utilised for the identification (classification) of a pass or a fail of a network test. If a sufficiently large number of pass and fail footprints is available, it should be possible to use classification techniques, for example a neural network, to generate a pass/fail classifier for network testing. A drawback of many classification techniques, neural networks included, is that they do not provide any explanation facilities. This is where CBR comes into play. CBR may provide such a facility, because it is possible to include knowledge about a specific episode (e.g., a network test) in a case. This knowledge might be a piece of information gathered from an engineer involved in the testing procedure. The intention therefore is to form cases from the data generated in the cleaning process and the knowledge that is available in the domain (Figure 4). This process involves the typical CBR and knowledge-based systems issues of case indexing, feature selection, knowledge acquisition, inference, etc. Since this paper is a strategy paper proposing the CBR approach for this specific problem, the results of the approach may be reported at a later stage of the project.
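As an illustration of the cleaning step just described, the sketch below splits a plain-text Event Log into per-event-type files and counts alarm occurrences as footprint data. The assumed line format (an event type, then an alarm or event name, then further detail, separated by whitespace) is invented for the sketch; NORTEL's actual Event Log format is not reproduced in this paper.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Illustrative Event Log cleaning: split the log into one file per event
// type and count alarm occurrences (footprint data for a frequency chart).
// The line format "TYPE NAME DETAIL..." is an assumption of this sketch.
public class EventLogCleaner {

    public static Map<String, Integer> clean(Path eventLog) throws IOException {
        Map<String, PrintWriter> outputs = new HashMap<>();
        Map<String, Integer> alarmFootprint = new HashMap<>();
        try {
            for (String line : Files.readAllLines(eventLog)) {
                if (line.isBlank()) continue;
                String[] parts = line.trim().split("\\s+");
                String type = parts[0];                      // e.g. ALARM, USER_ACTION
                outputs.computeIfAbsent(type, EventLogCleaner::open).println(line);
                if (type.equals("ALARM") && parts.length > 1) {
                    alarmFootprint.merge(parts[1], 1, Integer::sum);  // count per alarm name
                }
            }
        } finally {
            outputs.values().forEach(PrintWriter::close);    // flush the per-type files
        }
        return alarmFootprint;                               // data behind the footprint chart
    }

    private static PrintWriter open(String type) {
        try {
            return new PrintWriter(Files.newBufferedWriter(Path.of(type + "_events.txt")));
        } catch (IOException e) {
            throw new RuntimeException("cannot create output file for " + type, e);
        }
    }
}
```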

6 CONCLUSIONS AND FUTURE WORK

The intention of this paper was (a) to provide an overview of the authors' research in the automated testing domain, (b) to pinpoint some of the problems identified in this field, and (c) to report on possible solutions to these problems. More specifically, the paper proposed a CBR approach to cope with the vast complexity of the domain and the huge amount of data recorded in network tests. To a large extent, the motivation for the approach is based on the experience the researchers involved gathered in previous CBR research. Future work intends to implement the CBR system on top of the already existing data processing procedures (data pre-processing, data cleaning, frequency of occurrence of system alarms). There is also a second focus in the ongoing project. At the moment the data recorded in a test is more or less loosely available in a file system. To achieve a higher degree of organisation, faster accessibility, and increased flexibility, and also to allow online analytical processing, the second focal point of the project is the development of a data warehouse for data storage and data analysis.

ACKNOWLEDGEMENTS

We would like to thank NORTEL NETWORKS for their support and funding of this work under the new "Jigsaw" programme, and also IRTU for previous funding under the Start 7 programme - the GARNET project.

ABBREVIATIONS

AI = artificial intelligence, CBR = case-based reasoning, NITEC = Northern Ireland Telecommunications Engineering Centre, OLAP = online analytical processing, SDH = synchronous digital hierarchy, SDHMS = SDH management system.

REFERENCES

Aamodt A. & Plaza E., 1994, "Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches", AICOM, 7(1), March, pp. 39-59.

Bouloutas A.T., Calo S. & Finkel A., 1994, "Alarm Correlation and Fault Identification in Communication Networks", IEEE Transactions on Communications, Vol. 42, No. 2/3/4, Feb/Mar/Apr.

Jones E.K. & Roydhouse A., 1995, "Intelligent Retrieval of Archived Meteorological Data", IEEE Expert: Intelligent Systems and their Applications, pp. 50-57.

Kolodner J., 1993, "Case-Based Reasoning", Morgan Kaufmann, San Mateo, California, USA.

Lewis L., 1993, "A Case-based Reasoning Approach to the Resolution of Faults in Communications Networks", IFIP Transactions C: Communication Systems, Vol. 12, pp. 671-682.

Schuster A., Dubitzky W., Lopes P., Adamson K., Bell D.A., Hughes J.G. & White J.A., 1997, "Aggregating Features and Matching Cases on Vague Linguistic Expressions", Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI), Nagoya, Japan, pp. 252-257.

Sexton M. & Reid A., 1992, "Transmission Networking: SONET and the Synchronous Digital Hierarchy", Artech House, Boston, USA.

Sterritt R., Adamson K., Shapcott C.M. & Bell D.A., 1998a, "An Architecture for Knowledge Discovery in Complex Telecommunication Systems", in Adey R.A., Rzevski G. & Nolan P. (eds.), Artificial Intelligence in Engineering XIII, Computational Mechanics Publications, Southampton, pp. 627-640 (CD-ROM).

Sterritt R., Adamson K., Shapcott M. & Curran E.P., 1998b, "Adapting an Architecture for Knowledge Discovery in Complex Telecommunication Systems for Testing Assurance", Proceedings of NIMES 98 Conference on Complex Systems, Intelligent Systems and Interfaces, pp. 37-39.

Sterritt R., Curran E.P., Adamson K. & Shapcott M., 1998c, "Application of AI for Automated Testing in Complex Telecommunication Systems", Proceedings of EXPERSYS 98, 10th International Conference on Artificial Intelligence Applications, pp. 97-102.

Tanenbaum A.S., 1996, "Computer Networks", Prentice Hall, Upper Saddle River, USA.