FACULTEIT ECONOMIE EN BEDRIJFSKUNDE
HOVENIERSBERG 24
B-9000 GENT
Tel.: 32 - (0)9 - 264.34.61
Fax: 32 - (0)9 - 264.35.92

WORKING PAPER

Defining and Validating Metrics for Assessing the Maintainability of Entity-Relationship Diagrams

Marcela Genero 1, Geert Poels 2, Mario Piattini 3

October 2003
2003/199

1 Department of Computer Science, University of Castilla-La Mancha, Ciudad Real, Spain
2 Department of Management Information, Operations Management and Technology Policy, Ghent University, Ghent, Belgium
3 Department of Computer Science, University of Castilla-La Mancha, Ciudad Real, Spain

Corresponding author is G. Poels, Faculty of Economics and Business Administration, Ghent University, [email protected], tel: +32 9 264 34 97.

D/2003/7012/37

Defining and Validating Metrics for Assessing the Maintainability of Entity-Relationship Diagrams

Marcela Genero 1, Geert Poels 2 and Mario Piattini 1

1 ALARCOS Research Group, Department of Computer Science, University of Castilla-La Mancha, Paseo de la Universidad, 4 - 13071 - Ciudad Real (Spain), {Marcela.Genero, Mario.Piattini}@uclm.es

2 Department of Management Information, Operations Management and Technology Policy, Faculty of Economics and Business Administration, Ghent University, Hoveniersberg 24, 9000 Ghent (Belgium), [email protected]

Abstract

Database and data model evolution is a significant problem in the highly dynamic business environment that we experience these days. To support the rapidly changing data requirements of agile companies, conceptual data models, which constitute the foundation of database design, should be sufficiently flexible to incorporate changes easily and smoothly. In order to understand what factors drive the maintainability of conceptual data models and to improve conceptual modelling processes, we need to be able to assess conceptual data model properties and qualities in an objective and cost-efficient manner. The scarcity of early available and thoroughly validated maintainability measurement instruments motivated us to define a set of metrics for Entity-Relationship (ER) diagrams, a relevant graphical formalism for conceptual data modelling. In this paper we show that these objective and easily calculated metrics, measuring internal properties of ER diagrams related to their structural complexity, can be used as indirect measures (hereafter called indicators) of the maintainability of the diagrams. These metrics may replace more expensive, subjective, and hence potentially unreliable maintainability measurement instruments that are based on expert judgement. Moreover, these metrics are an alternative to direct measurements that can only be obtained during the actual process of data model maintenance. Another result is that the validation of the metrics as early maintainability indicators opens up the way for an in-depth study of structural complexity as a major determinant of conceptual data model maintainability. Apart from the definition of a metrics suite, a contribution of this study is the methodological approach that was followed to theoretically validate the proposed metrics as structural complexity measures and to empirically validate them as maintainability indicators. This approach is based both on Measurement Theory and on an experimental research methodology, stemming mainly from current research in the field of empirical software engineering. In the paper we specifically emphasize the need to conduct a family of related experiments, improving and confirming each other, to produce relevant, empirically supported knowledge on the validity and usefulness of metrics.

Keywords: conceptual data model, entity relationship diagrams, model evolution, model quality, maintainability, structural complexity, metrics, prediction, theoretical validation, empirical validation, experimentation


1. Introduction

In the highly dynamic business environment we are in these days, existing business models become obsolete at an ever-increasing pace and must therefore be designed with flexibility in mind to satisfy the needs of agile companies. The constant reshaping of business models increases the volatility of the data requirements that must be supported by companies' data resource management policies and technologies. The ability to rapidly change databases and their underlying data models to support the needs of changing business models is a main concern of today's information managers. Database evolution is a significant problem, high on the research agenda of information system researchers. It has been observed recently that conceptual data model quality, which is postulated as a major determinant of the efficiency and effectiveness of data model evolution, is a main topic in current conceptual modelling research (Olivé, 2002). Especially the quality characteristic of model maintainability, i.e. the ability to easily change a model (ISO, 2001), seems to be a key factor in conceptual data model evolution.

In order to evaluate and, if necessary, improve the maintainability of conceptual data models, data analysts need instruments to assess the maintainability characteristics of the models they are producing. The earlier this assessment can be done, the better, as it has proven much more economical to evaluate and, if necessary, improve quality aspects during the development process than afterwards (Boehm, 1981). Maintainability is, however, an external quality property, meaning that it can only be assessed with respect to some operating environment or context of use (Fenton and Pfleeger, 1997). The maintainability of a conceptual data model will, for instance, depend on the model's understandability, which depends in turn on model properties such as structure, clarity and self-expressiveness, but also on the data analyst's familiarity with the model (as for instance evidenced by the software cost drivers included in the COCOMO II model for software cost estimation (Boehm et al., 1995)). Likewise, the model's modifiability, another maintainability sub-characteristic, may depend on factors external to the model, like the type of modifications that must be applied. Therefore, external qualities such as maintainability are hard to measure objectively early on in the modelling process. They generally need to be assessed in a subjective way, for instance using expert opinions expressed through a formal or informal scoring system.

For a more objective (and more cost-efficient) assessment of external quality attributes such as maintainability, an indirect measurement based on internal model properties is required (Fenton and Pfleeger, 1997). If a significant relationship with conceptual data model maintainability can be demonstrated, then measures of internal model properties


(i.e. “metrics”) can be used as indicators of sufficient or insufficient maintainability. Once constructed and validated, a measurement-fed maintainability prediction model can be implemented and employed at relatively low cost compared to an a-posteriori maintainability assessment. Therefore, the early availability of metrics would allow database designers to perform:
− a quantitative comparison of design alternatives, and therefore an objective selection between several conceptual data model alternatives with equivalent semantic content.
− an early assessment of conceptual data model maintainability, even during the modelling activity, and therefore a better resource allocation based on this assessment (e.g. redesigning high-risk models with respect to maintainability).

In this paper we present a set of metrics for measuring the structural complexity of ER diagrams and report on three controlled experiments we carried out in order to gather empirical evidence that these metrics are related to ER diagram maintainability. Our focus on ER modelling is justified by the observation that in today's database design world, it is still the dominant method of conceptual data modelling (Muller, 1999). Our focus on structural complexity as an internal model property that is potentially related to ER diagram maintainability is motivated by similar research in the field of empirical software engineering, where structural complexity has been shown to be a major determinant of external software qualities, including maintainability (Li and Henry, 1993; Harrison et al., 2000; Briand et al., 2001a; Fioravanti and Nesi, 2001; Poels and Dedene, 2001; Bandi et al., 2003). Moreover, as a syntactic model property, the measurement of ER diagram structural complexity can be automated, which assures the cost-efficiency, consistency and repeatability of the (indirect) maintainability assessments.

Apart from the proposal of ER diagram metrics to be used as maintainability indicators, a contribution of this paper is a methodological approach to metric definition, based on suggestions provided in Briand et al. (1995) and Calero et al. (2001), paying extensive attention to the validation of the metrics as measures of internal model properties and as indicators of external quality. For the validation of the metrics as ER diagram structural complexity measures we use an approach grounded in Measurement Theory that was proposed by Poels and Dedene (2000) for the theoretical validation of software metrics. For the validation of the metrics as ER diagram maintainability indicators a series of three related experiments was conducted in which the relationship between the metric values and direct maintainability assessments was statistically analysed for significance. As far as we know, this is the first extensive empirical investigation of the relationship between ER diagram maintainability and ER diagram internal properties related to


structural complexity. Apart from enabling the assessment of ER diagram maintainability in an objective and cost-efficient manner, our study may help towards understanding what factors make a model maintainable and what factors may prove a hindrance to model evolution, in particular to its maintainability. From this understanding, quality-focused model design knowledge may be derived which can then be incorporated into the modelling language, method and process.

This paper is structured as follows. In section 2 we discuss related work in the field of metrics-based conceptual data model quality assurance. Our metric definition and validation methodology is described and motivated in section 3. After proposing our metrics suite in section 4, we proceed with the theoretical and empirical validation of the metrics in sections 5 and 6 respectively. Finally, concluding remarks and an outline of our future work are presented in section 7.

2. Related work

Even though several quality frameworks for conceptual data models have been proposed (Lindland et al., 1994; Moody and Shanks, 1994; Krogstie et al., 1995; Schuette and Rotthowe, 1998), most of them do not include quantitative measures (i.e. metrics) to evaluate the quality of conceptual data models in an objective and cost-efficient way. Moreover, the metrics that can be found in these frameworks have not been thoroughly validated. The evidence that they measure the quality properties that they are purported to measure is scarce. In order to illustrate this, we will briefly comment on the existing studies related to metrics for conceptual data models.

Eick (1991) proposed a single quality metric applied to S-diagrams1 (Eick and Lockemann, 1985; Eick and Raupp, 1991). Three quality aspects, expressiveness, complexity and normalizedness, are meant to be captured by this metric. To our knowledge there is no published work confirming the theoretical or empirical validity of Eick's metric.

Gray et al. (1991) proposed a suite of metrics for evaluating the quality characteristics of ER diagrams (complexity and deviation from third normal form). These authors commented that empirical validation of these metrics has been performed, but they do not provide the results. Independently, Ince and Shepperd have carried out a theoretical validation of the metrics using the algebraic specification language OBJ (Ince and Shepperd, 1991), demonstrating the correctness of the underlying syntax of the metrics.

1 The S-diagram is a data model which was influenced by the work on the binary relation model (Abrial, 1974) and the Semantic Database Model (SDM) (Hammer and McLeod, 1981).


Kesh (1995) proposed a single measure for the quality of ER diagrams, combining different measures of the ontological quality of ER diagrams. In most cases these measures are Likert scales that need to be rated in a subjective way. Kesh's proposal also requires that, to obtain an overall quality assessment, each measurement for each ontological quality factor be weighted, but he did not suggest how to determine the weights. The Kesh measure has not been theoretically validated. After a real world application of the quality model, Kesh concluded that his model provides a useful framework for analysing and making revisions to ER diagrams. Apart from this demonstration experiment, no empirical evidence on the validity and usefulness of the measure has been collected.

Moody et al. (1998) proposed an extensive framework for evaluating the quality of ER diagrams. This framework includes a set of 25 measures which capture different quality characteristics of conceptual data models. Some of them are objectively calculated while others are based only on subjective expert ratings. After having conducted an action research study (Moody, 2000), Moody concluded that only the number of entities and number of relationships metrics and the development cost estimate have benefits that outweigh their cost of collection.

Si-Said Cherfi et al. (2002) defined a framework considering three dimensions of quality: usage, specification and implementation. For each dimension, they defined quality criteria and their corresponding measures (not all of them are objective). This framework was used in a case study, with the aim of choosing between alternative ER diagrams modelling the same reality. Even though the obtained results demonstrate that the best ER diagram can be chosen using this framework, we do not agree that the ER diagrams used in the study have an equivalent semantic content (as claimed by the authors), which confounds the validity of the study.

Table 1 compares the current proposals of conceptual data model quality measures. The first column of the table contains a reference to the study where the proposal has been published. In the second column, the quality focus of the measures (i.e. the quality properties that are measured) is presented. The third column refers to the scope of the measures, meaning the kind of data model that is measured. The fourth column shows whether the measures are objective (e.g. metrics) or subjective (e.g. quality scores or ratings assigned by 'expert judges'). The fifth and sixth columns reflect whether there are published studies in which the theoretical and the empirical validation of the metrics has been attempted. The last column reflects whether an automated tool exists for the metric calculation.


Authors | Quality focus | Scope | Objective/Subjective | Theoretical validation | Empirical validation | Tool
Eick (1991) | Expressiveness, complexity, normalizedness | S-diagrams | Objective | N | N | N
Gray et al. (1991) | Complexity, deviation from third normal form | ER diagram | Objective | Partially | Partially | Y
Kesh (1995) | Ontological quality, behavioural quality | ER diagram | Objective and subjective | N | N | Y
Moody et al. (1998) | Completeness, integrity, flexibility, understandability, correctness, simplicity, integration, implementability | ER diagram | Objective and subjective | N | Partially | Y
Si-Said Cherfi et al. (2002) | Specification dimension: legibility, expressiveness, simplicity, correctness; Usage dimension: completeness, understandability; Implementation dimension: implementability, maintainability | Extended ER diagram | Objective and subjective | N | Partially | Y

Table 1. Summary of quality measures for conceptual data models

Summarising the related work, we can conclude that:
− Most of the existing proposals of quality measures for conceptual data models have not gone beyond the definition stage. There are few published studies about their empirical validity and even fewer on their theoretical validity. The validation studies that have been carried out are demonstration experiments, case studies or action research studies and are of a one-of-a-kind nature. A rigorous empirical validation approach based on controlled and replicated experimentation has not been observed.
− Eick's metric and Kesh's metric, as single measures for quality, are relatively simple, but do not realistically capture conceptual data model quality. Fenton and Pfleeger (1997) do not recommend the use of this type of metric, which intends to capture all aspects of quality in a single number. Furthermore, combining different measurement values into one aggregate value is of limited applicability in practice if the proposal is unclear about the combination rule, weights, etc. (as in Kesh (1995))


or if the choice of weights or coefficients is left to the discretion of the practitioner. Gray et al. (1991), for instance, remark that the values of the coefficients are likely to be highly dependent on the local environment, and may vary from one case to another.

3. Metric definition and validation methodology

In recent literature, a large number of measures have been proposed for capturing properties of software artifacts in a quantitative way. However, few of these 'software metrics' have successfully survived the initial definition phase and are actually used in industry. This is due to a number of problems related to the theoretical and empirical validity of software metrics, the most relevant of which are summarised below (Briand et al., 2002):
− Measures are not always defined in the context of some explicit, well-defined measurement goal of industrial interest, e.g. the reduction of development effort or of the faults present in software products.
− Even if the goal is explicit, the experimental hypotheses are often not made explicit, e.g. What do you expect to learn from the analysis and can you believe it?
− Measurement definitions do not always take into account the environment or context in which they will be applied, e.g. Would you use a complexity measure that was defined for non-object oriented software in an object oriented context?
− A reasonable theoretical validation of the measure is often not possible because the attribute that a measure aims to quantify is often not well defined, e.g. Are you using a measure of complexity that corresponds to your intuition about complexity (the attribute)?
− A large number of measures have never been subject to an empirical validation, e.g. How do you know which measures of size best predict effort in your environment?

This situation has frequently led to some degree of fuzziness in measure definitions, properties, and underlying assumptions, making the use of the measures difficult, their interpretation hazardous, and the results of the various validation studies somewhat contradictory. These problems do not imply that progress cannot be made in the measurement field, but for this purpose metrics must be defined in a methodological and disciplined way. In this work we follow the definition and validation method proposed by Calero et al. (2001), which tries to alleviate some of the problems mentioned above. The main tasks of this method are shown in figure 1.


Figure 1. Method for the definition and validation of software metrics (metric definition; theoretical validation, through property-based or Measurement Theory-based approaches; empirical validation, through experiments, case studies or surveys)

In the rest of this section we will take a closer look at the different steps of the method for obtaining valid metrics.

3.1. Metric definition

Metric definition should be based on clear measurement goals. It is advised to take a goal-oriented measure definition approach such as GQM (Basili and Rombach, 1988; Basili and Weiss, 1984; Van Solingen and Berghout, 1999; Briand et al., 2002) to ensure that the metric selection or definition effort, the validation effort and the subsequent measurement effort all contribute to the achievement of a well-defined goal. Often this measurement goal relates to external quality attributes (e.g. maintainability, reusability) that cannot be measured directly at the desired moment (i.e. when the information is needed), and for which indirect, early available and easy to collect measures need to be found.

The metric definition activity further involves the precise definition of the object of measurement, including its constituents. The unambiguous definition of the domain of a metric is a prerequisite for the validation of the metric, as well as for the practical use of the metric afterwards. Finally, as metrics are functions that take an argument (i.e. the object of measurement) and produce a value, there must be a precise description of the mathematical, logical or other formulation and notation that is used to define and denote the function prescriptions (i.e. how to calculate the value for the given argument).
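To illustrate what such a precise function prescription can look like, consider (as a preview of the NE metric proposed in section 4, and using notation of our own) the measurement of the number of entities. If D denotes the set of well-formed ER diagrams and E(d) the set of entities of a diagram d ∈ D, then the prescription NE: D → ℕ, with NE(d) = |E(d)|, fixes the domain, the codomain and the calculation rule of the metric unambiguously.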


3.2. Theoretical validation

The main goal of theoretical validation is to assess whether a metric actually measures what it purports to measure (Fenton and Pfleeger, 1997). In the context of an empirical study, the theoretical validation of metrics establishes their construct validity, i.e. it 'proves' that they are valid measures for the constructs that are used as variables in the study. The validity of the measurement instruments used for the variables of an empirical study is a key factor in the overall study validity. On the one hand, the theoretical validity of the ER diagram metrics proposed in section 4 is used to claim their construct validity when used as measures of ER diagram structural complexity, which is the independent variable in the empirical studies described in section 6. On the other hand, knowledge of the scale type of the metrics helps when choosing the appropriate statistical technique(s) to analyse the data obtained in the experiments. The preferred theoretical validation approach is one in which measures can be used as ratio scales, allowing the use of a wide range of data analysis techniques, including parametric statistics, and thus providing maximum flexibility to the researcher (Fenton and Pfleeger, 1997).

Unfortunately, as Van den Berg and Van den Broek (1996) remark, even though several attempts have been made at proposing methods and principles to carry out the theoretical validation of metrics (mainly in the context of software engineering), there is not yet a standard, accepted way of theoretically validating a software metric. Work on theoretical validation has followed two paths:
− Measurement-theory based approaches such as those proposed by Whitmire (1997), Zuse (1998), and Poels and Dedene (2000).
− Property-based approaches (also called axiomatic approaches), such as those proposed by Weyuker (1988) and Briand et al. (1996, 1997).

In this work the theoretical validation of the ER diagram structural complexity metrics will be performed using the DISTANCE framework. This framework, proposed by Poels and Dedene (2000), is a conceptual framework for software metric validation grounded in Measurement Theory (Krantz et al., 1971; Roberts, 1979). This theory, originating from the discipline called Philosophy of Science, is a normative theory prescribing the conditions that must be satisfied in order to use mathematical functions as “measures”. Measurement theoretic approaches to software metric validation, such as DISTANCE, propose methods to verify whether these conditions hold for software metrics. Their use of Measurement Theory as the reference theory for measurement distinguishes these approaches from property-based approaches to metric validation (e.g. Weyuker, 1988; Briand et al., 1996, 1997), which are based on argumentation,


subjective experience or even intuition, instead of having a well-established theoretical base (Poels and Dedene, 1997).

The DISTANCE framework offers a measure construction procedure to model properties of software artefacts and define the corresponding software metrics. In this sense, the framework has an added value over other measurement theoretic approaches (e.g. Whitmire (1997); Zuse (1998)) that focus on metric validation and cannot be used for metric definition. The framework is called DISTANCE as it uses the mathematical concept of distance and its measurement theoretic interpretation as the main cornerstone of the validation approach. The basic idea of DISTANCE is to define properties of objects in terms of distances between the objects and other objects that serve as reference points (or norms) for measurement. The larger these distances, the greater the extent to which the objects are characterised by the properties (or for quantitative properties, called quantities by Ellis (1968), the higher their values are). This particular definition of properties of objects allows them to be measured by functions that are called “metrics” in Mathematics. Metrics are functions that satisfy the metric axioms, i.e. the set of axioms that are necessary and sufficient to define distance measures (in the sense of Measurement Theory) (Suppes et al., 1989). The measurement theoretic theorems associated with distance measurement are part of the framework, meaning that the conditions specified by these theorems are met when carrying out the measure construction procedure. This ensures that the theoretical validity of the measures obtained with DISTANCE is formally proven within the framework of Measurement Theory. An important pragmatic consequence of this explicit link with Measurement Theory is that the resulting measures define ratio scales.

In this section we will summarise the measure construction procedure of DISTANCE for ease of reference. A detailed discussion of the process model and the measurement theoretic foundations of DISTANCE is outside the scope of the paper, but can be found in (Poels and Dedene, 2000; Poels, 1999).

3.2.1. The DISTANCE measure construction procedure

The measure construction procedure prescribes five activities. The procedure is triggered by a request to construct a measure for a property that characterises the elements of some set of objects. There might, for instance, be a request that expresses the need for a measure of some structural complexity aspect (i.e. a property) of ER diagrams. The activities of the DISTANCE procedure are briefly summarised below. For notational convenience, let P be a set of objects that are characterised by some property pty for which a measure needs to be constructed.


3.2.1.1. Finding a measurement abstraction

The objects of interest must be modelled in such a way that the property for which a measure is needed is emphasised (Zuse, 1998). This is actually a requirement of all model-based approaches to metrication since the object of interest needs a representation before a metric can be applied (Dumke, 1998). A suitable representation, hereafter called measurement abstraction, should allow one to observe to what extent an object is characterised by the property. By comparing measurement abstractions, we should be able to tell whether an object is more, equally, or less characterised by the property than another object (at least for quantities (Ellis, 1968)). The outcome of this activity is a set of objects M that can be used as measurement abstractions for the objects in P with respect to the property pty. Let abs: P → M be the function that formally describes the rules of the mapping.

3.2.1.2. Defining distances between measurement abstractions

This activity is based on a generic definition of distance that holds for elements in a set. To define distances between elements in a set, the concept of an ‘elementary transformation function’ is used. This is a homogeneous function on a set representing an atomic change of an element in some prescribed way. By applying an elementary transformation function to an element of a set, the element is transformed into another element of the set by changing it. Moreover, this change is atomic, meaning that it cannot be subdivided into a series of ‘smaller’ changes. Elements of a set can be changed in multiple ways. The second activity of the DISTANCE procedure requires that a set Te of elementary transformation functions be found that is sufficient to change any element of M into any other element of M (for a discussion on the conditions required for defining such a set of elementary transformation functions, we refer to (Poels, 1999)). If such a set Te is found, then the distance between two elements of M is defined as a shortest sequence of elementary transformations (i.e. applications of the elementary transformation functions in Te) to transform one element into the other. In general, there are multiple sequences of elementary transformations to go from one element to another. For the notion of distance used by the procedure, only the shortest sequences are taken into account, i.e. those that require the minimum number of elementary transformations.

3.2.1.3. Quantifying distances between measurement abstractions

This activity requires the definition of a distance measure for the elements of M. Basically this means that the distances defined in the previous activity are now quantified by representing (i.e. measuring) them as the number of elementary transformations in the shortest sequences of elementary transformations between elements.


Formally, the activity results in the definition of a metric (in the mathematical sense) δ: M × M → ℜ that can be used to map (the distance between) a pair of elements in M to a real number.

3.2.1.4. Finding a reference abstraction

This activity requires a kind of thought experiment. We need to determine what the measurement abstraction for the objects in P would look like if they were characterised by the theoretical lowest amount of pty (again, on condition that the property is a quantity). If such a hypothetical measurement abstraction (i.e. an object in M) can be found (for necessary conditions we refer to (Poels, 1999)), then this object is called the reference abstraction for P with respect to pty. The idea of DISTANCE is now to use this reference abstraction as a reference point for measurement. More particularly, DISTANCE uses the distance between the measurement abstraction of an object p in P and the reference abstraction of P as a formal definition of the property pty of the object p. This means that the larger the distance between the measurement abstraction and the reference abstraction, the more the property characterises the object (i.e. the greater the amount of pty in p). Let ref: P → M be the function that describes the rules of the mapping.

3.2.1.5. Defining a measure for the property

The final activity consists of defining a measure for pty. Since properties are formally defined as distances, and these distances are quantified with a metric function, the formal outcome of this activity is the definition of a function µ: P → ℜ such that ∀ p ∈ P: µ(p) = δ(abs(p), ref(p)).

3.3. Empirical validation

Common wisdom, intuition, speculation, and proofs of concept are not reliable sources of credible knowledge (Basili et al., 1999); hence it is necessary to subject the metrics to empirical validation. Empirical validation is crucial for the success of any software measurement project (Schneidewind, 1992; Kitchenham et al., 1995; Basili et al., 1999). Through empirical validation we can demonstrate with real evidence that the measures we have proposed serve the purpose they were defined for and that they are useful in practice, i.e. related to some external attribute worth studying and therefore helping to reach some goal (e.g. assessment, prediction). In our case we want to demonstrate the empirical validity of the ER diagram structural complexity metrics; this means that we have to corroborate that they are really related to ER diagram maintainability.
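Before turning to the empirical validation, it may help to see the five DISTANCE activities instantiated on a deliberately simple case. The following sketch is our own illustration and only anticipates, in simplified form, the kind of construction carried out in section 5. Suppose the property pty of interest is the number of entities of an ER diagram d ∈ P:
− Measurement abstraction: abs(d) = the set of entities of d.
− Elementary transformations: Te = {add a single entity, remove a single entity}.
− Distance: δ(A, B) = |A ∆ B|, i.e. the minimum number of single-entity additions and removals needed to transform A into B.
− Reference abstraction: ref(d) = ∅, the empty set of entities.
− Measure: µ(d) = δ(abs(d), ref(d)) = |abs(d)|, so the constructed measure simply counts the entities and, within the DISTANCE framework, can be used as a ratio scale measure.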


3.3.1. Types of empirical studies

There are three major types of empirical research strategy (Robson, 1993):
− Experiments. Experiments are formal, rigorous and controlled investigations. They are launched when we want control over the situation and want to manipulate behaviour directly, precisely and systematically. Hence, the objective is to manipulate one or more variables and control all other variables at fixed levels. An experiment can be carried out in an off-line situation, for example in a laboratory under controlled conditions, where the events are organised to simulate their appearance in the real world. Experiments may alternatively be carried out on-line, which means that the research is executed in the field under normal conditions (Babbie, 1990). Experiments in the context of empirical software engineering research are further discussed in Wohlin et al. (2000).
− Case studies. A case study is an observational study, i.e. it is carried out by the observation of an on-going project or activity. The case study is normally aimed at tracking a specific attribute or establishing relationships between different attributes. The level of control is lower in a case study than in an experiment. Case study research is further discussed in Robson (1993), Stake (1995), Pfleeger (1994; 1995) and Yin (1994).
− Surveys. A survey is often an investigation performed in retrospect, when, for example, a tool or technique has been in use for a while. The primary means of gathering qualitative or quantitative data are interviews or questionnaires. These are completed by samples which are representative of the population to be studied. The results from the survey are then analysed to derive descriptive or explanatory conclusions. Surveys provide no control over the execution or the measurement, though it is possible to compare them to similar ones. Surveys are discussed further in Babbie (1990), Robson (1993) and Pfleeger and Kitchenham (2001).

The prerequisites for an investigation limit the choice of the research strategy. A comparison of strategies can be made bearing in mind the following factors:
− Execution control. Describes how much control the researcher has over the study. For example, in a case study data is collected during the execution of a project. If management decides to stop the studied project due to, for example, economic reasons, the researcher cannot continue the case study. The opposite is the experiment, where the researcher is in control of the execution.
− Measurement control. The degree to which the researcher can decide upon which measures are to be collected, included or excluded during execution of the study.


− Investigation cost. Depending on which strategy is chosen, the cost differs. This cost is related to, for example, the size of the investigation and the need for resources. The difference between case studies and experiments is that if we choose to investigate a project in a case study, the outcome of the project is some form of product that may be retailed, i.e. it is an on-line investigation. In an off-line experiment the outcome is some form of experience, which is not profitable in the same way as a product.
− Ease of replication. Refers to the ease with which we can replicate the basic situation we are investigating. If replication is not possible, then you cannot carry out an experiment. Due to the possibility of replicating experiments, their results are more generalisable.

Table 2 shows each of the factors mentioned above for each empirical strategy and can be used as a general guideline to decide whether to perform a case study, a survey or an experiment.

Factor | Survey | Case study | Experiment
Execution control | No | No | Yes
Measurement control | No | Yes | Yes
Investigation cost | High | Medium | High
Ease of replication | High | Low | High

Table 2. Research strategy factors (Wohlin et al., 2000)

Looking at the comparison of empirical strategies shown in table 2 and the available resources, we decided to undertake controlled experiments to validate the ER diagram metrics proposed in section 4. For this reason we will now look in detail at the steps involved in the experiment process.


3.3.2. Experiment process

The experiment process that is used for the empirical validation of the proposed metrics is based on the software engineering experiment process of Wohlin et al. (2000). Figure 2 shows the five main phases in this process. We will present these phases and their activities in more detail in the following paragraphs. It must be noted though that in this section we focus on the many decision alternatives in the experiment process, whereas in section 6, where we present the results of the empirical metrics validation, more details are provided regarding the specific choices that we made.

Figure 2. Overview of the experiment process (Wohlin et al., 2000): from the experiment idea, through experiment definition, experiment planning, experiment operation, and analysis and interpretation, to presentation and package, leading to the conclusions

3.3.2.1. Definition

The purpose of the definition phase is to define the goals of an experiment, formulated from the problem to be solved. In order to capture the goals of the experiment a template for goal definition, based on the Goal-Question-Metric (GQM) paradigm (Basili and Weiss, 1984; Basili and Rombach, 1988; Van Solingen and Berghout, 1999), has been suggested (Briand et al., 1996b; Lott and Rombach, 1996). This goal template is:

Analyse <Object of study>
for the purpose of <Purpose>
with respect to their <Quality focus>


from the point of view of the <Perspective>
in the context of <Context>

− The Object of study is the entity that is studied in the experiment, which can be products, processes, resources, models, metrics or theories.
− The Purpose defines what the intention of the experiment is.
− The Perspective tells us the viewpoint from which the experiment results are interpreted. The general perspective we take is that of professionals working with ER diagrams, possibly having to maintain them.
− The Context is the “environment” in which the experiment is run. The context briefly defines which personnel is involved in the experiment (subjects) and which software artefacts (objects) are used in the experiment. All the experiments we have done were run in an academic environment.

3.3.2.2. Planning

While the definition step determines why the experiment is conducted, the planning phase prepares how the experiment is conducted. The planning phase involves the following six activities:
− Context selection. The context of the experiment can be characterised according to four dimensions:
o Off-line vs. On-line. Experiments can be executed as large, real projects (on-line). An alternative is to run tasks parallel to the real project (off-line).


variables). All the variables that are controlled and manipulated in the experiment are called independent variables. Those independent variables we want to change to see the effect on the dependent variable are called the factors. The different values of a factor are called treatments.
− Selection of subjects. The selection of subjects is closely connected to the generalisation of the results from the experiment. In order to generalise the results to the desired population, the selection must be representative of that population. The selected subjects are also called a sample of the population. The sampling of the population can be either a probability or a non-probability sample. Examples of probability sampling techniques are: simple random sampling, systematic sampling and stratified random sampling. Examples of non-probability sampling techniques are: convenience sampling and quota sampling. The size of the sample also affects the results when generalising. The larger the sample is, the lower the error becomes when generalising the results. The sample size is also closely related to the power of the statistical test.
− Experiment design. An experiment consists of a series of tests. To get the most out of the experiment, the series of tests must be carefully planned and designed. The design of an experiment describes how the tests are organised and run. The design and the statistical analysis are closely related. To design the experiment, we have to look at the hypotheses to see which statistical analysis we have to perform to reject the null hypothesis. Based on statistical assumptions, e.g. the measurement scales, and on which objects and subjects we are able to use, we design the experiment. During the design we determine how many tests the experiment shall have to make sure that the effect of the treatment is visible. A proper design also forms the basis from which to allow for replication. When designing an experiment we can also distinguish between a within-subjects design and a between-subjects design. In the former design the same subjects are used for all the treatments, while in the latter different subjects are used for each treatment. The best way for the groups assigned to the various experimental conditions to be as similar as possible is for them to actually be the same subjects. That is to say that once the sample which will be used in the experiment has been selected, the experimental conditions will be applied to all subjects. This has the advantage of guaranteeing control of all the variables due to the differences between the subjects which could contaminate the results of the research and, apart from that, it allows us to reduce the effort by obtaining the same information from fewer subjects. For that reason, within-subjects experiments are the most frequently used in software engineering (Basili et al., 1999).


However, a series of distorting effects exists in the within-subjects design:
o To be controlled before carrying out the experiment. Effects which can threaten the internal validity of within-subjects experiments include the learning effect (a form of controlling this effect is that a subject does not repeat an experiment more than once), the fatigue effect (owing to a particularly long experiment), and the motivation effect.
o To be controlled during the course of the experiment. The practice effect (the way of controlling this is by ensuring that all levels of the independent variable appear in all possible orders) and the persistence effect (when we suspect that the effects of a treatment persist for a long time, we should create periods of extinction between one treatment and the next and then study the influence one has on the other).
− Instrumentation. Before execution, the instruments are developed for the specific experiment. The instruments for an experiment are of three types:
o Experiment objects may be, for example, specification or code documents. When planning for an experiment, it is important to choose objects that are appropriate.
o Guidelines are needed to guide the participants in the experiment. Guidelines include, for example, process descriptions and checklists. If different methods are compared in the experiment, guidelines for the methods have to be prepared for the experiment. In addition to the guidelines, the participants also need training in the methods to be used.
o Measurements in an experiment are conducted via data collection. In human intensive experiments, data is generally collected via manual forms or by interviews.
The overall goal of the instrumentation is to provide the means to perform the experiment and to monitor it, without affecting the control of the experiment. The results of the experiment shall be the same independently of how the experiment is instrumented. If the instrumentation affects the outcome of the experiment, the results are invalid.
− Validity evaluation. A fundamental question concerning results from an experiment is how valid the results are. The degree of credibility of any experiment depends on the validity of how conclusions are drawn. It is important to consider the question of validity early on in the planning phase in order to plan adequate validity of the experiments. Four classes of evaluation criteria for experiments exist (Campbell and Stanley, 1963; Cook and Campbell, 1979; Judd et al., 1991):


o Conclusion validity. Defines the extent to which conclusions are statistically valid.
o Construct validity. Defines the extent to which the variables successfully measure the theoretical constructs in the hypothesis. A threat to the construct validity is the lack of theoretical proof of whether the measures for the independent and the dependent variables are really measuring the concepts they purport to measure.
o Internal validity. Defines the degree of confidence in a cause-effect relationship between factors of interest and the observed results (i.e. the degree to which conclusions can be drawn about the cause-effect relationship of independent variables on the dependent variables). Threats to internal validity concern issues that may indicate a causal relationship, although there is none. Factors that have an impact on the internal validity are how the subjects are selected and divided into different classes, how the subjects are treated and compensated during the experiment, whether special events occur during the experiment, the material used in the experiment, etc.
o External validity. External validity is the degree to which the results of the research can be generalised to the population under study and other research settings. The greater the external validity, the more the results of an empirical study can be generalised to actual software engineering practice. Threats to the external validity concern the ability to generalise experiment results outside the experiment setting. External validity is not only affected by the experiment design chosen, but also by the objects in the experiments and the subjects that were chosen. There are three main risks: not having the right participants as subjects, conducting the experiment in the wrong environment and performing it with a timing that affects the results.

3.3.2.3. Operation

When an experiment has been designed and planned it must be carried out in order to collect the data that should be analysed. This is what we mean by operation of an experiment. In the operational phase of an experiment, the treatments are applied to the subjects. This means that this part of the experiment is the part where the experimenter actually meets the subjects. The operational phase is divided into three steps: preparation, execution and data validation.
− Preparation. Before an experiment can be started, people who are willing to act as subjects have to be found. It is essential that people are motivated and willing to


participate throughout the whole experiment. The selection of subjects, in terms of sampling technique, is discussed in the planning phase.
− Execution. The experiment can be executed in a number of different ways. Some experiments can be carried out on one occasion when all participants are gathered together at, for example, a meeting. The advantage of this is that the results of the data collection can be obtained directly at the meeting and there is no need to contact the participants and ask for their respective results later on. Another advantage is that the experimenter is present during the meeting and if any questions arise they can be resolved directly. Some experiments are, however, executed during a much longer time span, and it is impossible for the experimenter to participate in every detail of the experiment and the data collection. Data can be collected either manually by the participants who fill out forms or automatically by tools. The first alternative is probably the most common.
− Data validation. When data has been collected, the experimenter must check that the data is reasonable and that it has been collected correctly. This deals with aspects such as whether the participants have understood the forms and therefore filled them out correctly. It is also important to review that the experiment has actually been conducted in the way that was intended.

3.3.2.4. Analysis and interpretation

After we have collected the relevant data, we must analyse it in an appropriate way. There are three major items to consider when choosing analysis techniques: the nature of the data we have collected, the purpose of the experiments, and the type of experimental design. Pfleeger (1994; 1995) carried out a detailed study on the different statistical tests for hypothesis testing, presenting a decision tree which can be used to choose the best technique. Figure 3 shows the decision tree only for the purpose “Exploring a relationship or Validating software measures”, because this is the objective we pursue in all of the experiments we carry out in this paper. The decision tree must be read from left to right. Beginning with the objective of our experiment, we move along the branch that fits our situation until we reach a leaf node with the appropriate analysis technique2.

2 In this study we use the Statistical Package for the Social Sciences (SPSS, version 11.0) to automatically apply different statistical tests for analysing the data collected in the empirical studies (SPSS, 2001).
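As an illustration of this analysis step, the sketch below is our own and uses entirely hypothetical data and variable names; it is not the analysis of the experiments reported in section 6, and it uses Python with SciPy and scikit-learn rather than SPSS. It computes Spearman correlations between metric values and a direct maintainability measurement, and runs the Principal Component Analysis that is discussed below.

```python
# Minimal sketch (not the authors' actual analysis): Spearman correlations
# between structural complexity metrics and a direct maintainability
# measurement (e.g. the time subjects needed to modify a diagram),
# followed by a PCA to spot redundancy among the metrics.
# All data and names below are hypothetical.
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One row per experimental object (ER diagram); columns are metric values.
metric_names = ["NE", "NA", "NR", "NM:NR", "N1:NR", "NBinaryR"]
metrics = np.array([
    [3,  8,  2, 0, 2, 2],
    [5, 14,  4, 1, 3, 4],
    [7, 20,  7, 2, 4, 6],
    [9, 27, 10, 3, 6, 8],
])
modification_time = np.array([4.1, 6.8, 9.5, 13.2])  # minutes, hypothetical

# Non-parametric correlation of each metric with the maintainability measure.
for name, column in zip(metric_names, metrics.T):
    rho, p_value = spearmanr(column, modification_time)
    print(f"{name}: rho={rho:.2f}, p={p_value:.3f}")

# PCA on the standardised metric values to find orthogonal dimensions.
pca = PCA()
pca.fit(StandardScaler().fit_transform(metrics))
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
```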


Figure 3. Decision tree for analysis techniques. For the purpose “Exploring a relationship or Validating software measures” the tree proposes box plots and scatter diagrams against a baseline; a measure of association (Pearson for normally distributed data, Spearman or Kendall for non-normal data); and statistical confirmation with correlational analysis (for normally distributed data, linear regression with two variables or multivariate regression with more than two variables; for non-normal data, a logarithmic transformation of the equation).

Moreover, when analysing empirical data, it is advisable to realize that in the majority of cases part of the information that the metrics (i.e. measures of the independent variables, mostly internal properties of the object of study) provide is redundant, which in statistical terms is equivalent to saying that the metrics might be highly correlated. This justifies the interest of analysing the information that each metric captures in order to eliminate such redundancy. In experimental research in software engineering (Briand et al., 1998; Briand and Wüst, 2002; Briand et al., 2001b; Hanebutte et al., 2003), as in other disciplines, this problem is solved by using Principal Component Analysis (PCA) (Dunteman, 1989). If a group of variables in a data set is strongly correlated, these variables are likely to measure the same underlying dimension (or group of related internal properties) of the object to be measured. PCA is a standard technique used to identify the underlying, orthogonal dimensions that explain the relationships between the variables in the data set.

3.3.2.5. Presentation and packaging

When an experiment has been carried out, the intention is often to present the findings. This could for example be done in a paper for a conference, a report for decision-making, a laboratory package for replication of the experiment, or as educational material.


During the presentation and packaging of an experiment it is necessary to consider certain issues. It is essential not to forget important aspects or necessary information needed in order to enable others to replicate or take advantage of the experiment and the knowledge gained through the experiment (Shull et al., 2002). The diffusion of experimental materials is essential to encourage other researchers to replicate the experiments (Basili et al., 1999). However, there is no consensus yet about the contents of laboratory packages. For readers who are interested, there are several studies which provide more details about how to carry out empirical studies on software artefacts or models (Wohlin et al., 2000; Perry et al., 2000; Briand et al., 1999a; Juristo and Moreno, 2001; Kitchenham et al., 2002).

4. Proposal of structural complexity metrics for ER diagrams

We will first enumerate the ER modelling constructs that we consider as determinants of the structural complexity of ER diagrams. In principle, all elements of an ER diagram contribute to its structural complexity. There are, however, different flavours of ER modelling and therefore it is advisable to explicitly list (and name) the constructs that will be subjected to measurement. Next, in a second sub-section, our suite of structural complexity metrics is presented. The metric definitions are stated informally, in natural language. Formal definitions, using set theoretic constructs, will be provided in the next section, where the formal validation of the metrics is presented.

4.1. ER modelling constructs

The ER diagrams for which the structural complexity metrics are proposed build upon the constructs offered in the original ER model (Chen, 1976), plus a few additional constructs such as IS-A relationships. Instead of presenting a well-defined meta-model, we only list the modelling constructs. Precise definitions of their semantics can be found in classic textbooks on ER modelling (Elmasri and Navathe, 2000). Note also that, although we do not use the word 'type' (mainly in order to shorten the names of the metrics proposed in the next sub-section), all constructs are considered at the type level. The information contained within an ER diagram does not allow the measurement of structural complexity at the occurrence level. The ER modelling constructs considered in this paper include:
− Entities (we do not distinguish between strong and weak entities)
− Attributes:


o simple and composite attributes (a composite attribute is composed of several simple attributes)
o non-derived and derived attributes (the value of derived attributes can be calculated or deduced from the values of other attributes)
o single-valued and multi-valued attributes (a multi-valued attribute can take several values for the same entity occurrence)
− Relationships:
o reflexive, binary, and n-ary relationships (according to the degree of the relationship, respectively one, two, and more than two)
o “common” and IS-A relationships (an IS-A relationship is a relationship between a super-type and a sub-type that is the result of a specialisation or generalisation process; “common” relationships are the relationships considered in Chen's original ER model)
o “common” relationships can also be divided into 1:1, 1:N, and M:N relationships (according to the cardinalities of the relationship's roles, respectively one-to-one, one-to-many, and many-to-many)

4.2. A metrics suite for ER diagram structural complexity

Using the GQM (Basili and Rombach, 1988; Basili and Weiss, 1984; Van Solingen and Berghout, 1999) template for goal definition, the goal pursued for the definition of the metrics for ER diagrams is:

Analyse ER diagrams
for the purpose of Assessing
with respect to their Maintainability
from the point of view of the Requirements engineers, conceptual modellers, database designers
in the context of Database development

As we said in the introduction, external quality properties of ER diagrams such as maintainability are hard to assess in an objective and efficient way (e.g. using an automatic measurement tool). Therefore, we wish to investigate whether they can be measured indirectly, through internal product properties such as structural complexity, by means of quantitative measurement instruments in the form of metrics. In particular, our goal is to demonstrate that structural complexity metrics for ER diagrams are useful and reliable indicators of ER diagram maintainability. In order to accomplish this goal, we need to define the metrics and show their validity, both as structural complexity measures and as indirect measures (i.e. indicators) of ER diagram maintainability.


In accordance with Complexity Theory, the complexity of a system is based on the number of (different types of) elements and on the number of (different types of) (dynamically changing) relationships between them (Pippenger, 1978). So we considered that the structural complexity of an ER diagram is determined by the different elements that compose it, such as entities, attributes, relationships, etc. Fenton has formally proven that a single measure of complexity cannot capture all possible aspects or viewpoints on complexity (Fenton, 1994). Hence it is not advisable to define a single measure for ER diagram structural complexity. The approach taken here is to propose several objective and direct measures applied at diagram level, each one focussing on a different ER modelling construct. The metrics included in the metrics suite are shown in table 3. As a first step, the metrics are defined informally, in natural language; formal definitions are given later, in the theoretical validation part of the methodology, because the validation approach adopted there includes measure construction, which is of course not the case for all theoretical validation approaches. Moreover, in this study theoretical validation precedes empirical validation, which justifies the informal character of the metric definitions in this first phase.


Metric name | Metric description
NE | The Number of Entities metric is defined as the number of entities within an ER diagram.
NA | The Number of Attributes metric is defined as the total number of attributes defined within an ER diagram, taking into account not only entity attributes but also relationship attributes. In this number are included all attributes (but not the composing parts of composite attributes). By convention, the NA metric does not count the attributes that a subtype inherits from its super-type (i.e. these attributes are counted only once, as attributes of the super-type).
NDA | The Number of Derived Attributes metric is defined as the number of derived attributes within an ER diagram. The value of NDA is always strictly less than the value of NA.
NCA | The Number of Composite Attributes metric is defined as the number of composite attributes within an ER diagram. This value is less than or equal to the NA value.
NMVA | The Number of Multi-valued Attributes metric is defined as the number of multi-valued attributes within an ER diagram. Again, this value is less than or equal to the NA value.
NR | The Number of Relationships metric is defined as the total number of relationships within an ER diagram.
NM:NR | The Number of M:N Relationships metric is defined as the number of M:N relationships within an ER diagram. The value of NM:NR is less than or equal to the NR value.
N1:NR | The Number of 1:N Relationships metric is defined as the total number of 1:N relationships within an ER diagram3. Also this value is less than or equal to the NR value.
NN-AryR | The Number of N-Ary Relationships metric is defined as the number of N-Ary relationships within an ER diagram. Its value is less than or equal to the NR value.
NBinaryR | The Number of Binary Relationships metric is defined as the number of binary relationships within an ER diagram. Again, the value is less than or equal to the NR value.
NRefR | The Number of Reflexive Relationships metric is defined as the number of reflexive relationships within an ER diagram. Its value is less than or equal to the value of the NR metric.
NIS_AR | The Number of IS_A Relationships metric is defined as the number of IS_A relationships within an ER diagram. In this case, we consider one relationship for each super-type/sub-type pair.
Table 3. Metrics for ER diagrams

3 The number of 1:1 relationships was not used as a separate metric because these relationships are considered a subset of the 1:N relationships.
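Because every metric in table 3 is a plain count of constructs, the suite is straightforward to automate. The sketch below illustrates this on a deliberately minimal, hypothetical in-memory representation of an ER diagram; the Entity, Attribute and Relationship classes, their fields and the small example diagram are our own illustrative assumptions and are not part of any tool discussed in this paper. Note that, following the worked example in the next sub-section, NR only counts the "common" relationships (IS_A relationships are counted separately by NIS_AR), and N1:NR also covers 1:1 relationships, as stated in footnote 3.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, minimal ER diagram representation (illustration only).
@dataclass
class Attribute:
    name: str
    derived: bool = False
    composite: bool = False
    multivalued: bool = False

@dataclass
class Relationship:
    name: str
    kind: str                    # "common" or "IS_A"
    degree: int = 2              # 1 = reflexive, 2 = binary, >2 = n-ary
    cardinality: str = "1:N"     # "1:1", "1:N" or "M:N" (common relationships only)
    attributes: List[Attribute] = field(default_factory=list)

@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

@dataclass
class ERDiagram:
    entities: List[Entity]
    relationships: List[Relationship]

def metrics(d: ERDiagram) -> dict:
    # Entity and relationship attributes are both counted (cf. the NA definition).
    attrs = [a for e in d.entities for a in e.attributes] + \
            [a for r in d.relationships for a in r.attributes]
    common = [r for r in d.relationships if r.kind == "common"]
    return {
        "NE": len(d.entities),
        "NA": len(attrs),
        "NDA": sum(a.derived for a in attrs),
        "NCA": sum(a.composite for a in attrs),
        "NMVA": sum(a.multivalued for a in attrs),
        "NR": len(common),
        "NM:NR": sum(r.cardinality == "M:N" for r in common),
        "N1:NR": sum(r.cardinality in ("1:1", "1:N") for r in common),
        "NBinaryR": sum(r.degree == 2 for r in common),
        "NN-AryR": sum(r.degree > 2 for r in common),
        "NRefR": sum(r.degree == 1 for r in common),
        "NIS_AR": sum(r.kind == "IS_A" for r in d.relationships),
    }

# Invented example: CUSTOMER --PLACES(1:N)--> ORDER.
example = ERDiagram(
    entities=[Entity("CUSTOMER", [Attribute("Name"), Attribute("Address", composite=True)]),
              Entity("ORDER", [Attribute("Number"), Attribute("Total", derived=True)])],
    relationships=[Relationship("PLACES", kind="common", degree=2, cardinality="1:N")])
print(metrics(example))  # NE=2, NA=4, NDA=1, NCA=1, NR=1, N1:NR=1, NBinaryR=1, ...
```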


As an example, we will apply the outlined metrics to the ER diagram shown in figure 4, taken from Elmasri and Navathe (1994).

Figure 4. Example of an ER diagram (Elmasri and Navathe, 1994). (The diagram models persons, employees, alumni, students, departments and projects; its entities, attributes and relationships are enumerated in table 4 below.)

Table 4 shows the value of each metric calculated for the example presented above.

Metric | Value | Explanation
NE | 13 | Entities = PROJECT, DEPARTMENT, EMPLOYEE, STAFF, FACULTY, STUDENT_ASSISTANT, ALUMNUS, PERSON, STUDENT, GRADUATE_STUDENT, UNDERGRADUATE_STUDENT, RESEARCH_ASSISTANT, TEACHING_ASSISTANT
NA | 23 | Non-derived, single-valued, simple attributes (not part of composite attributes) = PROJECT(Name, Number, Location), DEPARTMENT(Name, Number), EMPLOYEE(Salary), STAFF(Position), FACULTY(Rank), STUDENT_ASSISTANT(Percent_Time), PERSON(SSN, Sex, Birth_Date), STUDENT(Major_Dept), GRADUATE_STUDENT(Degree_Program), UNDERGRADUATE_STUDENT(Class), RESEARCH_ASSISTANT(Project), TEACHING_ASSISTANT(Course); Derived attributes = DEPARTMENT(Number_Employees), PERSON(Age); Composite attributes = PERSON(Name, Address) and ALUMNUS(Degrees); Multivalued attributes = DEPARTMENT(Location)
NDA | 2 | Derived attributes = DEPARTMENT(Number_Employees), PERSON(Age)
NCA | 3 | Composite attributes = PERSON(Name, Address) and ALUMNUS(Degrees)
NMVA | 1 | Multivalued attributes = DEPARTMENT(Location)
NR | 4 | Relationships = CONTROLS, WORKS_FOR, MANAGES, WORKS_IN
NM:NR | 1 | M:N relationships = WORKS_IN
N1:NR | 3 | 1:N relationships = WORKS_FOR, CONTROLS, MANAGES
NN-AryR | 0 |
NBinaryR | 4 | Binary relationships = CONTROLS, WORKS_FOR, MANAGES, WORKS_IN
NIS_AR | 11 | IS_A relationships = (PERSON, EMPLOYEE), (PERSON, ALUMNUS), (PERSON, STUDENT), (EMPLOYEE, STAFF), (EMPLOYEE, FACULTY), (EMPLOYEE, STUDENT_ASSISTANT), (STUDENT, STUDENT_ASSISTANT), (STUDENT, GRADUATE_STUDENT), (STUDENT, UNDERGRADUATE_STUDENT), (STUDENT_ASSISTANT, RESEARCH_ASSISTANT), (STUDENT_ASSISTANT, TEACHING_ASSISTANT)
NRefR | 0 |
Table 4. Metric values for the ER diagram of figure 4


5. Theoretical validation of the proposed metrics

In this section we present the theoretical validation of the ER diagram metrics using the DISTANCE framework (Poels and Dedene, 2000). We show that all structural complexity related properties of ER diagrams can be modelled by means of measurement abstractions that take the form of sets. For measurement abstractions that are sets, a generic solution to the second measure construction activity (i.e. define distances between measurement abstractions) and the third measure construction activity (i.e. quantify distances between measurement abstractions) has been proposed in Poels (1999). This solution consists of generic definitions of distance and distance measure for sets, which can be instantiated to any kind of set. We will therefore only show the application of the DISTANCE procedure to one of the metrics (the Number of Attributes (NA) metric will be taken as an example). The theoretical validation of the rest of the metrics proceeds in an analogous way, and is therefore only briefly presented.

5.1. DISTANCE-based validation of the NA metric

In section 4 the NA metric was defined as the number of attributes defined within an ER diagram. The metric intends to measure an aspect of structural complexity which basically states that the more attributes defined within an ER diagram, the higher its structural complexity. To exemplify the formal definition and theoretical validation of the NA metric, we use the ER diagrams shown in figure 5.


(ER diagram A: entities DEPARTMENT, with attributes Name and Number, and PROJECT, with attributes Name, Number and Location, connected by the 1:N relationship CONTROLS. ER diagram B: entities EMPLOYEE, with attributes SSN, First_Name and Last_Name, and PROJECT, with attributes Name, Number and Location, connected by the M:N relationship WORKS_FOR.)

Figure 5. Two example ER diagrams

− Finding a measurement abstraction. The set of objects that are characterised by a structural complexity property (referred to as the set P in section 3) is the Universe of ER diagrams4 (notation: UERD), i.e. the set of all conceivable syntactically correct ER diagrams that are relevant to some Universe of Discourse (UoD) (which does not need to be specified any further). The structural complexity property for which the NA metric is proposed (referred to as pty) is the number of attributes defined within an element of UERD. Let UA be the Universe of Attributes relevant to the UoD. The set of attributes defined within an ERD ∈ UERD, denoted by SA(ERD), is a subset of UA. The sets of attributes defined within the ER diagrams of UERD are elements of the power set of UA (notation: ℘(UA)). As a consequence we can equate the set of measurement abstractions (referred to as M in the DISTANCE procedure) to ℘(UA) and define the mapping function as:

absNA: UERD → ℘(UA): ERD → SA(ERD)

4 For the theoretical validation we use the acronym ERD for ER diagrams.


This function simply maps an ER diagram onto its set of attributes, which is a sufficient representation of an ER diagram for the aspect of structural complexity considered. Applied to the example of figure 5, the mapping results in:

absNA(ERD A) = SA(ERD A) = {Department.Number, Department.Name, Project.Name, Project.Number, Project.Location}
absNA(ERD B) = SA(ERD B) = {Project.Name, Project.Number, Project.Location, Employee.SSN, Employee.First_Name, Employee.Last_Name}

− Defining distances between measurement abstractions. A set of elementary transformation functions for ℘(UA), denoted by Te-℘(UA), must be found so that any set of attributes can be transformed into any other set of attributes. It was shown in Poels (1999) that to transform sets, only two elementary transformation functions are required: one that adds an element to a set and another that removes an element from a set. So, given two sets of attributes s1 ∈ ℘(UA) and s2 ∈ ℘(UA), s1 can always be transformed into s2 by first removing all attributes from s1 that are not in s2, and then adding all attributes to s1 that are in s2, but were not in the original s1. In the 'worst case scenario', s1 must be transformed into s2 via an empty set of attributes. Formally, Te-℘(UA) = {t0-℘(UA), t1-℘(UA)}, where t0-℘(UA) and t1-℘(UA) are defined as:

t0-℘(UA): ℘(UA) → ℘(UA): s → s ∪ {a}, with a ∈ UA
t1-℘(UA): ℘(UA) → ℘(UA): s → s - {a}, with a ∈ UA

In our example, the distance between SA(ERD A) and SA(ERD B) can be defined by a sequence of elementary transformations that removes Department.Number and Department.Name from SA(ERD A) and that adds Employee.SSN, Employee.First_Name and Employee.Last_Name to SA(ERD A). This sequence of five elementary transformations is sufficient to transform SA(ERD A) into SA(ERD B). Of course, other sequences, in which for instance the order of these five transformations is changed, exist and can be used to define the distance in sets of attributes between ERD A and ERD B. But it is obvious that no sequence can contain fewer than five elementary transformations if it is going to be used as a definition of this distance.
− Quantifying distances between measurement abstractions. It was shown in Poels (1999) that the symmetric difference model, which is a particular instance of Tversky's contrast model (Suppes et al., 1989), can always be used to define a


metric (in the mathematical sense) for distances between sets. Hence, we define the metric δ℘(UA) as:

δ℘(UA): ℘(UA) × ℘(UA) → ℜ: (s, s') → |s – s'| + |s' – s|

This definition is equivalent to stating that the distance between two sets of attributes, as defined by the shortest sequence of elementary transformations between these sets, is measured by the count of elementary transformations in the sequence. Note that for any element in s but not in s' and for any element in s' but not in s, an elementary transformation is required. The symmetric difference model results in a value of 5 for the distance between the sets of attributes of ERD A and ERD B. Formally,

δ℘(UA)(SA(ERD A), SA(ERD B))
= |{Department.Number, Department.Name, Project.Name, Project.Number, Project.Location} – {Project.Name, Project.Number, Project.Location, Employee.SSN, Employee.First_Name, Employee.Last_Name}|
+ |{Project.Name, Project.Number, Project.Location, Employee.SSN, Employee.First_Name, Employee.Last_Name} – {Department.Number, Department.Name, Project.Name, Project.Number, Project.Location}|
= |{Department.Number, Department.Name}| + |{Employee.SSN, Employee.First_Name, Employee.Last_Name}|
= 5

− Finding a reference abstraction. For the structural complexity property that is considered, the obvious reference point for measurement is the empty set of attributes. It can be argued that an ER diagram with no attributes defined has the lowest structural complexity (at least for the aspect of structural complexity considered here) that can be imagined. Therefore we define the mapping function as:

refNA: UERD → ℘(UA): ERD → ∅

− Defining a measure for the property. The NA metric can be formally defined as a function that returns for any ERD ∈ UERD the value of the metric δ℘(UA) for the pair of sets SA(ERD) and ∅:

∀ ERD ∈ UERD: NA(ERD) = δ℘(UA)(SA(ERD), ∅) = |SA(ERD) - ∅| + |∅ - SA(ERD)| = |SA(ERD)|

As a consequence, a measure that returns the number of attributes in an ER diagram qualifies as a measure (in the sense of Measurement Theory) of the structural complexity property that is determined by the quantity of attributes defined within an


ER diagram. It must be noted here that, although this result seems trivial, other measurement theoretic approaches to metric validation cannot be used to guarantee the ratio scale type of the NA metric. The number of attributes defined in an ER diagram cannot, for instance, be described by means of a modified extensive structure, as advocated in the approach of Zuse (1998), which is the best known way to arrive at ratio scales in software measurement (Genero, 2002).

5.2. DISTANCE-based validation of the other structural complexity metrics

Due to space constraints we cannot present the results of the measure construction procedure for the other structural complexity aspects captured by the metrics proposed in section 4. All these structural complexity properties can be modelled by means of a set abstraction and all reference abstractions take the form of an empty set. As a consequence all the metrics can be validated as measures that count the elements in the set abstraction. The measurement abstractions that have been used when applying DISTANCE to the ER diagram structural complexity metrics are listed in table 5.
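As a companion to the formal treatment in section 5.1, the construction can be mirrored almost literally with Python sets. The following fragment is only an illustration of ours (it is not part of the DISTANCE framework or of any tool): it computes the symmetric-difference distance between the attribute sets of the two diagrams of figure 5 and obtains NA as the distance to the empty reference set.

```python
# Sketch of the DISTANCE-based construction of section 5.1 using Python sets.
def distance(s1: set, s2: set) -> int:
    # Symmetric difference model: the length of a shortest sequence of
    # elementary transformations (single additions/removals) from s1 to s2.
    return len(s1 - s2) + len(s2 - s1)

def NA(attribute_set: set) -> int:
    # NA is the distance between the diagram's attribute set and the
    # reference abstraction, the empty set.
    return distance(attribute_set, set())

sa_a = {"Department.Number", "Department.Name",
        "Project.Name", "Project.Number", "Project.Location"}        # SA(ERD A)
sa_b = {"Project.Name", "Project.Number", "Project.Location",
        "Employee.SSN", "Employee.First_Name", "Employee.Last_Name"}  # SA(ERD B)

print(distance(sa_a, sa_b))  # 5: remove 2 Department attributes, add 3 Employee attributes
print(NA(sa_a), NA(sa_b))    # 5 6
```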


Metric | Measurement abstraction
NE | absNE: UERD → ℘(UE): ERD → SE(ERD), where UE is the Universe of Entities relevant to a UoD and SE(ERD) ⊆ UE is the set of entities within an ERD.
NDA | absNDA: UERD → ℘(UA): ERD → SDA(ERD), where UA is the Universe of Attributes relevant to a UoD and SDA(ERD) ⊆ UA is the set of derived attributes within an ERD.
NCA | absNCA: UERD → ℘(UA): ERD → SCA(ERD), where UA is the Universe of Attributes relevant to a UoD and SCA(ERD) ⊆ UA is the set of composite attributes within an ERD.
NMVA | absNMVA: UERD → ℘(UA): ERD → SMVA(ERD), where UA is the Universe of Attributes relevant to a UoD and SMVA(ERD) ⊆ UA is the set of multivalued attributes within an ERD.
NR | absNR: UERD → ℘(UR5): ERD → SR(ERD), where UR is the Universe of Relationships relevant to a UoD and SR(ERD) ⊆ UR is the set of relationships within an ERD.
N1:NR | absN1:NR: UERD → ℘(UR): ERD → S1:NR(ERD), where UR is the Universe of Relationships relevant to a UoD and S1:NR(ERD) ⊆ UR is the set of 1:N relationships within an ERD.
NM:NR | absNM:NR: UERD → ℘(UR): ERD → SM:NR(ERD), where UR is the Universe of Relationships relevant to a UoD and SM:NR(ERD) ⊆ UR is the set of M:N relationships within an ERD.
NBinaryR | absNBinaryR: UERD → ℘(UR): ERD → SBinaryR(ERD), where UR is the Universe of Relationships relevant to a UoD and SBinaryR(ERD) ⊆ UR is the set of binary relationships within an ERD.
NN-aryR | absNN-aryR: UERD → ℘(UR): ERD → SNN-aryR(ERD), where UR is the Universe of Relationships relevant to a UoD and SNN-aryR(ERD) ⊆ UR is the set of N-ary relationships within an ERD.
NIS_AR | absNIS_AR: UERD → ℘(UR): ERD → SIS_AR(ERD), where UR is the Universe of Relationships relevant to a UoD and SIS_AR(ERD) ⊆ UR is the set of IS_A relationships within an ERD. An IS_A relationship is a child-parent pair within a generalisation/specialisation hierarchy belonging to an ERD.
NRefR | absNRefR: UERD → ℘(UR): ERD → SRefR(ERD), where UR is the Universe of Relationships relevant to a UoD and SRefR(ERD) ⊆ UR is the set of reflexive relationships within an ERD.
NRR | absNRR: UERD → ℘(UR): ERD → SRR(ERD), where UR is the Universe of Relationships relevant to a UoD and SRR(ERD) ⊆ UR is the set of relationships which are redundant within an ERD.

Table 5. Measurement abstractions for the ER diagram structural complexity metrics

6. Empirical validation

As Miller (2000), Basili et al. (1999) and Shull et al. (2002), among others, have suggested, simple studies rarely provide definite answers. We are aware that only after performing a family of experiments can an adequate body of knowledge be built from which useful conclusions can be extracted. Following these suggestions, we have carried out a family of experiments, described in the next sections.

5 In the Universe of Relationships (UR) we consider both common relationships and generalisation relationships (also called IS_A relationships).


In order to describe the first experiment we use (with only minor changes) the experiment process proposed by Wohlin et al. (2000) (see section 3.3.2).

6.1. First Experiment

This is the first experiment of a family of experiments, which we carried out as a preliminary study to corroborate our idea about the relationship between the structural complexity of ER diagrams, as measured by the metrics of section 4, and ER diagram maintainability.

6.1.1. Definition

Using the GQM template for goal definition, the goal of the experiment is defined as follows:

Analyse ER diagram structural complexity metrics
for the purpose of evaluating
with respect to the capability to be used as ER diagram maintainability indicators
from the point of view of requirements engineers, conceptual modellers, database designers
in the context of undergraduate Computer Science students and professors of the Software Engineering area at the University of Castilla-La Mancha.

6.1.2. Planning

− Context selection. The context of the experiment is a group of undergraduate computer science students and staff members of the Software Engineering area, and hence the experiment is run off-line (not in an industrial software development environment). The subjects were nine staff members and seven students enrolled in the final year of Computer Science in the Department of Computer Science at the University of Castilla-La Mancha in Spain.
− The experiment is specific since it is focused on ER diagram structural complexity metrics. The ability to generalise from this specific context is further developed below when discussing threats to the experiment. The experiment addresses a real problem, i.e. what indicators can be used for the maintainability of ER diagrams? To this end it investigates the correlation between ER diagram structural complexity metrics and maintainability sub-characteristics.


− Selection of subjects. The subjects were chosen for convenience, i.e. the subjects are undergraduate students and staff members of the Software Engineering area who have experience in the design of ER diagrams.
− Variables selection. The independent variable is the ER diagram structural complexity. The dependent variables are three maintainability sub-characteristics6:
o Understandability: the capability of the ER diagram to be understood by its users (requirements engineers, conceptual modellers, database designers).
o Analysability: the capability of the ER diagram to be diagnosed for deficiencies or for the identification of the parts to be modified.
o Modifiability: the capability of the ER diagram to be changed when modifications are required.
− Instrumentation. The objects used in the experiment were twenty-four ER diagrams. The independent variable was measured via the metrics we proposed (see section 4). The dependent variables were measured according to the subjects' ratings of understandability, analysability and modifiability. The dependent variables were all rated separately by each subject using an ordinal scale.
− Hypothesis formulation. We wish to test the following hypotheses:
o Null hypothesis, H0: There is no significant correlation between the structural complexity metrics and the subject ratings of the maintainability sub-characteristics understandability, analysability and modifiability.
o Alternative hypothesis, H1: There is a significant correlation between the metrics and the subject ratings of the maintainability sub-characteristics understandability, analysability and modifiability.
It must be noted that in the experiment we only considered the metrics NE, NA, NR, NM:NR, N1:NR, NN-AryR, NBinaryR, NIS_AR, because the others take the value zero in most of the ER diagrams we selected.
− Experiment design. We selected a within-subjects design experiment, i.e. all the diagrams had to be rated by each of the subjects. The diagrams were put in a different order for each subject.
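The within-subjects design requires a different presentation order of the diagrams for each subject. The paper does not state how these orders were generated; a minimal sketch of one possible way to do it (diagram and subject identifiers are simply numbered here, an assumption of ours) is:

```python
import random

NUM_DIAGRAMS = 24
NUM_SUBJECTS = 16

random.seed(42)  # fixed seed so the same orders can be regenerated for the package
orders = {subject: random.sample(range(1, NUM_DIAGRAMS + 1), NUM_DIAGRAMS)
          for subject in range(1, NUM_SUBJECTS + 1)}
print(orders[1])  # e.g. the diagram order handed to the first subject
```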

6 Even though understandability has not been considered as a maintainability sub-characteristic by the ISO 9126 standard (2001), we include it because a lot of work related to software quality exists that considers understandability to be a factor that influences maintainability (Fenton and Pfleeger, 1997; Briand et al., 2001b; Harrison et al., 2000).


6.1.3. Operation

The operational phase is divided into three steps: preparation, execution and data validation.
− Preparation. By the time the experiment was done, all of the students had taken two courses in Software Engineering and one course on Databases, in which they learnt in depth how to build ER diagrams. All the selected staff members had more than four years of experience in the design of ER diagrams. Moreover, subjects were given an intensive training session before the experiment took place. However, the subjects were not aware of the aspects we intended to study. Neither were they informed of the actual hypotheses stated. We prepared the material we handed to the subjects, consisting of a guide explaining the ER notation and twenty-four ER diagrams. These diagrams were related to different universes of discourse that were general enough to be easily understood by each of the subjects. The structural complexity of each diagram was different, because we tried to cover a wide range of metric values (see table 6).

ER diagram | NE | NA | NR | NM:NR | N1:NR | NN-AryR | NBinaryR | NIS_AR | Median of Understandability | Median of Analysability | Median of Modifiability
1 | 5 | 19 | 5 | 2 | 3 | 0 | 5 | 0 | 2 | 2 | 2
2 | 12 | 44 | 11 | 3 | 8 | 2 | 9 | 4 | 3.5 | 4.5 | 4
3 | 6 | 23 | 7 | 0 | 7 | 0 | 7 | 2 | 4 | 4 | 4
4 | 9 | 33 | 11 | 5 | 6 | 0 | 11 | 2 | 5 | 5 | 5
5 | 13 | 15 | 11 | 8 | 3 | 0 | 11 | 5 | 2 | 2.5 | 3
6 | 6 | 16 | 4 | 4 | 0 | 1 | 3 | 3 | 3 | 4 | 4
7 | 6 | 16 | 40 | 4 | 0 | 1 | 3 | 3 | 2 | 2.5 | 3
8 | 11 | 33 | 8 | 1 | 7 | 0 | 8 | 4 | 4 | 5 | 4
9 | 6 | 45 | 7 | 0 | 7 | 0 | 7 | 0 | 3 | 3.5 | 4
10 | 4 | 2 | 3 | 0 | 3 | 0 | 3 | 0 | 1 | 2 | 2
11 | 9 | 31 | 6 | 1 | 5 | 0 | 6 | 2 | 3 | 3 | 4
12 | 6 | 17 | 3 | 0 | 3 | 0 | 3 | 2 | 2 | 2.5 | 3
13 | 12 | 11 | 5 | 4 | 1 | 0 | 5 | 9 | 3 | 4 | 4
14 | 10 | 35 | 11 | 4 | 11 | 1 | 10 | 0 | 3 | 3 | 3.5
15 | 8 | 29 | 6 | 0 | 6 | 0 | 6 | 2 | 3 | 3.5 | 4
16 | 6 | 29 | 4 | 4 | 0 | 2 | 2 | 0 | 3 | 3 | 3
17 | 6 | 34 | 5 | 2 | 3 | 0 | 5 | 0 | 3 | 3.5 | 4
18 | 6 | 15 | 5 | 0 | 5 | 0 | 5 | 0 | 2 | 2 | 2
19 | 6 | 16 | 4 | 2 | 2 | 0 | 4 | 0 | 2 | 2 | 2
20 | 8 | 19 | 5 | 5 | 0 | 0 | 5 | 3 | 3 | 3 | 3
21 | 5 | 10 | 3 | 2 | 1 | 0 | 3 | 1 | 3 | 3 | 3
22 | 9 | 26 | 5 | 2 | 3 | 0 | 5 | 4 | 2 | 3 | 3
23 | 8 | 12 | 5 | 3 | 2 | 0 | 5 | 4 | 3 | 4 | 4
24 | 11 | 32 | 7 | 1 | 6 | 0 | 7 | 4 | 4 | 4 | 4

Table 6. Metric values for each ER diagram in the first experiment


Each ER diagram had a test enclosed, which included a description of the maintainability sub-characteristics. Each subject had to rate each sub-characteristic using a scale consisting of seven linguistic labels. For example, the labels used for understandability are shown in table 7.

Extremely difficult to understand | Very difficult to understand | A bit difficult to understand | Neither difficult nor easy to understand | Quite easy to understand | Very easy to understand | Extremely easy to understand

Table 7. Understandability linguistic labels7

We chose seven linguistic labels because we considered them enough to cover all the possible categories of our variable, understandability. The selection of an odd number of labels was based on suggestions provided in (Godo et al., 1989; Bonissone, 1982), where it is argued that odd numbers of labels contribute to obtaining better results because they are balanced.
− Execution. The subjects were given all the materials described in the previous paragraph. We explained to them how to carry out the tests (i.e. how to use the rating scales). We allowed one week to do the experiment, i.e., each subject had to carry out the experiment on his/her own, and could use unlimited time to solve it. We collected all the subjects' ratings and the metric values for each ER diagram were automatically calculated by means of the MANTICA tool (Genero et al., 2000a).
− Data validation. After the ratings were collected, we checked for completeness. Given that a complete set of responses was obtained and all subjects had sufficient experience in building ER diagrams, we believed their subjective maintainability evaluations were reliable enough to serve the objectives of this preliminary study.

6.1.4. Analysis and interpretation

First we summarised the data that was collected. We had the metric values calculated for each ER diagram (by the MANTICA tool) and we also calculated the median of the subject ratings for each maintainability sub-characteristic (see table 6).

6.1.4.1. Correlational analysis

We first applied the Kolmogorov-Smirnov test to ascertain if the distribution of the data collected was normal or not. As the data were non-normal we decided to use a

7 In order to analyse the data collected in the experiment we assigned a number between 1 and 7 to each linguistic label. The worst case takes value seven (extremely difficult to understand), and the best case takes value one (extremely easy to understand).


non-parametric test like Spearman's correlation coefficient, with a level of significance α = 0.05 (i.e. a 95% confidence level: the probability of incorrectly rejecting H0 when it is in fact true is at most 5%), which is statistically acceptable. Using Spearman's correlation coefficient, each of the metrics was correlated separately to the median of the subject ratings of understandability, analysability and modifiability (see table 8).

 | NE | NA | NR | NM:NR | N1:NR | NN-AryR | NBinaryR | NIS_AR
Understandability | 0.5682 | 0.5266 | 0.4711 | 0.1188 | 0.4768 | 0.1100 | 0.5872 | 0.4067
Analysability | 0.5600 | 0.4905 | 0.4646 | 0.1629 | 0.3742 | 0.0506 | 0.5595 | 0.4946
Modifiability | 0.5866 | 0.7165 | 0.5934 | 0.0408 | 0.5879 | 0.0642 | 0.6899 | 0.3414

Table 8. Spearman's correlation coefficients between the metrics and the median of the subjects' ratings

For a sample size of 24 (median values for each diagram) and α = 0.05, the Spearman cutoff for accepting H0 is 0.38 (Briand et al., 1995; CUHK, 2002). Looking at table 8, we can conclude that:

− There is a high correlation between the median understandability ratings and the metrics NE, NA, NR, N1:NR, NBinaryR and NIS_AR, because the coefficients are above the cut-off, and the p-value < 0.05. So, for these metrics and understandability we reject the null hypothesis. − There is a high correlation between the median analysability ratings and the metrics NE, NA, NR, NBinaryR and NIS_AR, because the coefficients are above the cutoff, and the p-value < 0.05. For these metrics and analysability, H0 is rejected. − There is a high correlation between the median modifiability ratings and the metrics NE, NA, NR, N1:NR and NBinaryR, because the coefficients are above the cut-off, and the p-value < 0.05. The null hypothesis related to these metrics and modifiability is rejected. Observing the results obtained we can conclude that the metrics NE, NA, N1:NR, NBinaryR, and NIS_AR are significantly correlated with two or three of the maintainability sub-characteristics. The same conclusion cannot be drawn for the NM:NR and NN-AryR metrics. Given that the NN-AryR metric is zero for most of the 24 diagrams, we did not expect to find a significant correlation with the maintainability ratings (with hindsight we could also have omitted this metric from the experiment). The lack of correlation for the NM:NR metric is more difficult to interpret. Based on the results we could conclude that the contribution of M:N relationships to the structural


complexity of the ER diagrams used in the experiment is not related to their perceived maintainability, which is surprising given that the structural complexity due to other types of relationships (e.g. 1:N, IS_A) does seem to be related to maintainability. On the other hand, compared to the N1:NR metric, more diagrams take the value zero for NM:NR and the values are generally lower, so a possible effect (if there were any) is harder to demonstrate statistically. The results of this correlational analysis allow us to corroborate to some extent our suspicion that the structural complexity of ER diagrams is associated with their maintainability. Nevertheless, we realise that these findings can only be considered as preliminary, given the threats to this study's validity as discussed below.

6.1.4.2. Principal Component Analysis

Apart from the correlational analysis, we wished to investigate, by means of Principal Component Analysis (PCA), whether the metrics used in the experiment capture different dimensions of structural complexity. Three rotated principal components (PC) were obtained (see table 9), using an eigenvalue larger than 1.0 as a threshold for component selection. With these three rotated PCs 78.59% of the total variability can be explained (see table 10).

Metrics | PC1 | PC2 | PC3
NE | 0.46702401 | 0.84115381 | 0.0541169
NA | 0.8510128 | -0.10265826 | 0.24260479
NR | 0.00489821 | 0.15425342 | 0.68687582
NM:NR | -0.16563128 | 0.66335755 | 0.5165222
N1:NR | 0.94368923 | -0.07542907 | -0.13011791
NN-AryR | 0.10328467 | -0.10255609 | 0.85818971
NBinaryR | 0.4846038 | -0.0069864 | 0.79286859
NIS_AR | -0.11364914 | 0.8870457 | -0.04147915
Table 9. Rotated components of the first experiment

PCs | Eigenvalue | Percentage | Accumulated Percentage
PC1 | 2.51256572 | 31.4070716 | 31.40707156
PC2 | 2.21981402 | 27.7476752 | 59.15474677
PC3 | 1.55556889 | 19.4446111 | 78.59935791
Table 10. Total variability explained by the PCs of the first experiment

Looking at table 9, we can observe that the metrics with high loadings for PC1 refer to the structural complexity that is related to attributes (NA) and relationships (NBinaryR, N1:NR), most of which are from the binary 1:N type, as can be inferred from table 6. The second rotated PC seems to capture another dimension of structural complexity as the metrics with high loadings in PC2 refer to entities (NE and NIS_AR). Note that, although NIS_AR captures the structural complexity that is attributed to IS_A


relationships, its definition can also be interpreted as the number of subtypes of an entity. The NNAry-R metric shows a high loading on PC3, so it seems that the structural complexity that is attributed to complex relationships is another dimension of structural complexity. It should be noted though that the eigenvalue of PC3 is much lower than those of the other two components and that PC3 explains only 20% of the total variability, whereas both PC1 and PC2 explain each about 30%. Besides, the correlational analysis did not show any significant association between this aspect of structural complexity and maintainability. Given that we could as well have removed NNAry-R from our analysis (in the majority of diagrams used in the experiment there are no complex relationships), we believe it is not wrong to discard PC3 from further consideration. The metric NR is not included in any rotated component. This can probably be explained by the fact that a great deal of the relationships captured by this metric are already in PC1 (i.e. structural complexity related to binary 1:N relationships) and PC2 (i.e. structural complexity related to IS_A relationships). The fact that NM:NR does not show a high loading for any component can probably be explained by the absence of a significant correlation with the maintainability ratings, as explained before when discussing the correlational analysis. Summarising, based on the results of the PCA we may conclude that two of the rotated principal components can indeed be interpreted. There is one dimension of structural complexity, mainly captured by the NA, NBinaryR, and N1:NR metrics, that is related to the amount of attributes and binary 1:N relationships in an ER diagram. The other underlying dimension of ER diagram structural complexity mainly refers to the number of entities and sub-types present in the ER diagram. 6.1.5. Validity evaluation We will discuss the empirical study’s various threats to validity and the way we attempted to alleviate them: − Threats to conclusion validity. One issue that could affect the conclusion validity of this study is the size of the sample data (384 values, 24 diagrams and 16 subjects), which is perhaps not enough for both parametric and non-parametric statistical tests (Briand et al., 1995). We are aware of this, so we will consider the results of the experiment only as preliminary. Another threat to the conclusion validity is that some of the metrics considered do not cover a wide range of values or take the value zero. In that case it is harder to demonstrate that there is an effect.


− Threats to construct validity. The construct validity is the degree to which the independent and the dependent variables are accurately measured by the measurement instruments used in the study. The dependent variables we used are the maintainability sub-characteristics understandability, analysability and modifiability. We propose subjective metrics for them (using linguistic variables), based on the judgement of the subjects. As the subjects involved in this experiment have experience in ER design, we think their ratings can be considered reliable. The construct validity of the measures used for the independent variables is guaranteed by Poels and Dedene's framework (2000) used to validate them (see section 5).
− Threats to internal validity. Internal validity defines the degree of confidence in a cause-effect relationship between factors of interest and the observed results. The analysis performed here is correlational in nature. We have demonstrated that several of the metrics investigated had a statistically and practically significant relationship with maintainability sub-characteristics. Such statistical relationships do not demonstrate per se a causal relationship. They only provide empirical evidence of it. Only controlled experiments, where the metrics would be varied in a controlled manner and all other factors would be held constant, could really demonstrate causality. However, such a controlled experiment would be difficult to run since varying structural complexity in a system, while preserving its functionality, is difficult in practice. On the other hand, it is difficult to imagine what could be alternative explanations for our results besides a relationship between structural complexity and maintainability sub-characteristics. The following issues have been dealt with:
o Differences between subjects. Using a within-subjects design, error variance due to differences between subjects is reduced. As Briand et al. (2001a) remark, in software engineering experiments dealing with small samples, variations in participant skills are a major concern that is difficult to fully address by randomisation or blocking.
o Knowledge of the universe of discourse. The ER diagrams were from different universes of discourse but they are general enough to be easily understood by each of the subjects. This means that knowledge of the domain does not affect the internal validity.
o Accuracy of subject responses. Subjects assumed the responsibility for rating each maintainability sub-characteristic. As they have experience in ER diagram design, we think their responses could be considered valid.


o Learning effects. The diagrams were put in a different order, to cancel out learning effects. Subjects were required to answer in the order in which diagrams appeared, though we had no effective control over this because the experiment was not performed in a class-room environment. o Fatigue effects. On average the experiment lasted for less than one hour, so fatigue was not very relevant. Also, the different order of the diagrams helped to cancel out these effects. o Persistence effects. In order to avoid persistence effects, the experiment was carried out by subjects who had never done a similar experiment. o Subject motivation. All the staff members who were involved in this experiment participated voluntarily, in order to help us in our research. We motivated students to participate in the experiment, explaining to them that similar tasks to the experimental ones could be done in exams or practice. o Other factors. Plagiarism and influence between subjects really could not be controlled. The subjects were told that talking to each other about the experiment was forbidden, but they did the experiment on their own without any supervision, so we had to trust them as far as that was concerned. We are conscious that this aspect could, to some extent, be a threat to the validity of the experiment, but at that moment it was impossible to join all the subjects together. − Threats to external validity. The greater the external validity, the more the results of an empirical study can be generalised to actual practice. Two threats to validity have been identified which limit the ability to apply any such generalisation: o Materials and tasks used. In the experiment we tried to use ER diagrams that could be representative of real cases. Related to the tasks, the judgement of the subjects is to some extent subjective, and does not represent a real task. So to alleviate this threat empirical studies taking “real cases” must be done. o Subjects. To solve the difficulty of obtaining professional subjects, we used staff members and advanced students from software engineering and database courses. We are aware that more experiments with practitioners and professionals must be carried out in order to be able to generalise these results. However, in this case, the tasks to be performed did not require high levels of industrial experience, so, experiments involving students can be considered appropriate (Basili et al., 1999). Moreover, students are the next generation of professionals, so they are close to the population under study (Kitchenham et al., 2002).


6.1.6. Presentation and package

We have published this experiment in more detail in Genero et al. (2001a) and we have also put all of the experimental material on the web site http://alarcos.inf-cr.uclm.es so that the experiment can be replicated in other environments.

6.2. Second experiment

As our preliminary (but limited) findings obtained in the first experiment were encouraging, we decided to carry out another experiment trying to improve the design of the previous one. We were looking particularly for more objective tasks to be solved by the subjects, because we realised that the obtained ratings of maintainability sub-characteristics are subjective and could have been influenced by the specific experience that the subjects of the first experiment had with ER diagrams. Another improvement was to introduce a more controlled experiment environment. In the second experiment all the participants were placed in the same room and we had someone supervise the execution of the tasks. As the majority of the steps of the second experiment process are identical to those of the first experiment, we will focus in our discussion on those issues which are different:
− The subjects were forty students enrolled in the third year of Computer Science in the Department of Computer Science at the University of Castilla-La Mancha in Spain.
− The independent variable is ER diagram structural complexity and was measured through the metrics we proposed (see section 4).
− The dependent variables are two maintainability sub-characteristics, understandability and modifiability, which were measured using time values. The time needed to understand an ER diagram is used as a measure of understandability and the time necessary to modify an ER diagram is used as a measure of modifiability. We did not further distinguish between the time to discover which modification needs to be done and the time to do it, so analysability was not separated from modifiability in this experiment. Our assumption underlying the choice of time as our measurement instrument for the dependent variables is that for the same problem or question and for the same modification task, the faster an ER diagram can be understood and the faster an ER diagram can be modified, the more understandable and modifiable the ER diagram is.


o Alternative hypothesis, H1: There is a significant correlation between the structural complexity measures and the understandability and modifiability time.
− It must be noted that in the experiment we only considered the metrics NE, NA, NR, NM:NR, N1:NR and NBinaryR because the others take the value zero in most of the ER diagrams we selected. Moreover, in this experiment all relationships were binary, so the NR and NBinaryR values are the same for each diagram.
− The material we handed to the subjects consisted of a guide explaining the ER notation and four ER diagrams. These diagrams were related to different universes of discourse that were general enough to be easily understood by each of the subjects. The structural complexity of each diagram is different because, as table 11 shows, the values of the measures are different for each diagram.

ER diagram | NE | NR, NBinaryR | NA | N1:NR | NM:NR | Mean of Understandability time (minutes) | Mean of Modifiability time (minutes)
1 | 3 | 2 | 11 | 0 | 2 | 3.9674 | 4.3225
2 | 5 | 6 | 15 | 1 | 5 | 3.9677 | 4.1290
3 | 8 | 10 | 21 | 4 | 6 | 4.25804 | 4.2903
4 | 5 | 6 | 13 | 1 | 5 | 4.2903 | 4.4838

Table 11. Measure values for each ER diagram in the second experiment

− Each diagram had a test enclosed which included two parts:

o Part 1. A questionnaire in order to evaluate if the subjects really understood the content of the ER diagram. Each questionnaire contained exactly the same number of questions (five) and the questions were conceptually similar and in identical order. Each subject had to write down the time spent answering the questionnaire, by recording the initial time and final time. The difference between the two is what we call the understandability time (expressed in minutes). o Part 2. Two new requirements for the ER diagram. The subjects had to modify the ER diagram according to these new requirements, again writing down the initial time and the final time. The difference between these two times is what we called modifiability time, which includes both the time spent analysing what modifications had to be done and the time needed to perform them. − We allowed one hour to do all the tests. Each subject had to work alone. In case of doubt, they could only consult the supervisor who organised the experiment. − We collected all the data, including the initial and final time of the first part of the test, and the initial and final time of the second part related to the modifications tasks. − As shown in table 11 we also calculated the mean of the understandability time and the mean of the modifiability time.


− We discarded the tests of 9 subjects, because they included an incorrect answer or a required modification that was done incorrectly. Therefore, we took into account the responses of 31 subjects.

6.2.1. Correlational Analysis

We want to test the hypotheses formulated above using the data presented in table 11. First, we applied the Kolmogorov-Smirnov test to ascertain if the distribution of the data collected was normal or not. As the data are normal and all measurement instruments use ratio scales, we decided to use a parametric test like Pearson's correlation coefficient, with a level of significance α = 0.05 (i.e. a 95% confidence level: the probability of incorrectly rejecting H0 when it is in fact true is at most 5%), which is statistically acceptable. Using Pearson's correlation coefficient, each of the measures was correlated separately to the mean understandability and modifiability times (see table 12).

 | NE | NA | NR, NBinaryR | N1:NR | NM:NR
Understandability time | 0.7168 | 0.5588 | 0.7168 | 0.7168 | 0.7168
Modifiability time | 0.7246 | 0.5508 | 0.7246 | 0.7246 | 0.7246

Table 12. Pearson´s correlation coefficients between the measures and the understandability and modifiability time (all values are significant) For a sample size of 4 (mean values for each diagram) and α = 0.05, the Pearson cutoff for accepting H0 is 0.68 (Briand et al., 1995; CUHK, 2002). Except for the NA metric, the computed correlation coefficients (see table 12) are all above the cutoff and the corresponding p-values are less than 0.05. Therefore, for all metrics except NA, the null hypothesis H0 is rejected. Given these results, we can conclude that the ER diagram structural complexity metrics, except NA, are correlated to the understandability and modifiability time. 6.2.2. Principal Component Analysis As we wanted to demonstrate the non-redundancy of the metrics in our structural complexity metrics suite, we decided to perform a PCA. We obtained only one rotated PC (see table 13) which explained 94.75% of the total variability (see table 14). This result can probably be explained by the fact that only four diagrams were considered, which do not cover a wide range of metrics values. So the only conclusion that can be drawn from this PCA is that all metrics capture the same underlying dimension of structural complexity. However, we believe that this result is not so relevant (as compared to the PCAs of the other two experiments), given that the number of data points (only 4 per metric) poses a


serious threat to the conclusion validity of this PCA. We therefore discard the results from this PCA from further consideration.

Metrics | PC1
NE | 0.972068
NR | 0.99682149
NA | 0.6746546
NBinaryR | 0.99682149
N1:NR | 0.96410852
NM:NR | 0.9155161
Table 13. Rotated components in the second experiment

PCs | Eigenvalue | Percentage | Accumulated percentage
PC1 | 5.68539195 | 94.7565324 | 94.75653242
Table 14. Total variability explained by the PCs of the second experiment

6.3. Third experiment

After carrying out the previous two experiments we detected some more issues that to some extent could be improved. Even though the second experiment included more objective measures for the dependent variable, we only used 4 ER diagrams, which is too few to obtain conclusive results. So in this third experiment we used nine ER diagrams.
− The subjects who carried out this third experiment were thirty-five students enrolled in the third year of Computer Science in the Department of Computer Science at the University of Castilla-La Mancha in Spain.
− We considered again as the independent variable the ER diagram structural complexity measured via our metrics, and as dependent variables three maintainability sub-characteristics: understandability, analysability and modifiability, measured via the judgement of the subjects (an ordinal scale), according to seven linguistic labels (see the first experiment). Understandability was also measured using time values. We called this understandability time. Our assumption here is that for the same problem or question, the faster an ER diagram can be understood, the more understandable the ER diagram is.
− On the one hand we wanted to test the following hypotheses:
o Null hypothesis, H0: There is no significant correlation between the structural complexity metrics and the subjects' rating of maintainability sub-characteristics, such as understandability, analysability and modifiability.
o Alternative hypothesis, H1: There is a significant correlation between the structural complexity metrics and the subjects' rating of maintainability sub-characteristics, such as understandability, analysability and modifiability.
And on the other hand, the following:
o Null hypothesis, H0: There is no significant correlation between the structural complexity metrics and understandability time.
o Alternative hypothesis, H1: There is a significant correlation between the structural complexity metrics and understandability time.
− As the values of some metrics take the value zero in most of the ER diagrams, we only take into account the metrics NE, NA, NR, NM:NR, N1:NR, NN-AryR, NBinaryR and NRefR.


− The material we handed to the subjects was a guide explaining the ER notation and nine ER diagrams related to the same universe of discourse (Stock Exchange). We considered a wide range of metric values, which means that the diagrams had different structural complexity (see table 16).

ER diagram | NE | NA | NR | N1:NR | NM:NR | NBinaryR | NN-AryR | NRefR | F1 | F2 | F3 | Mean of Understandability time
1 | 2 | 2 | 6 | 2 | 0 | 2 | 0 | 0 | 2 | 3 | 3 | 3.20
2 | 5 | 15 | 5 | 5 | 0 | 5 | 0 | 0 | 3 | 3 | 3 | 3.40
3 | 8 | 27 | 9 | 9 | 0 | 9 | 0 | 0 | 4 | 4 | 4 | 4.18
4 | 11 | 45 | 15 | 12 | 3 | 13 | 2 | 3 | 4 | 5 | 5 | 4.25
5 | 12 | 38 | 7 | 5 | 2 | 5 | 2 | 0 | 4 | 4 | 4 | 3.00
6 | 13 | 54 | 17 | 14 | 3 | 15 | 2 | 3 | 5 | 5 | 5 | 4.60
7 | 7 | 30 | 5 | 5 | 0 | 4 | 1 | 0 | 3 | 3 | 4 | 3.67
8 | 13 | 55 | 17 | 14 | 3 | 15 | 2 | 3 | 5 | 5 | 5 | 4.80
9 | 15 | 41 | 9 | 6 | 3 | 7 | 2 | 0 | 4 | 5 | 5 | 4.26
(F1, F2 and F3 are the medians of the subjects' ratings of understandability, analysability and modifiability, respectively.)
Table 16. Metric values for each ER diagram in the third experiment

− Each diagram had a test enclosed which included two parts: o Part 1. A questionnaire to evaluate if the subjects really understood the content of the ER diagrams. Each questionnaire contained exactly the same number of questions (five) and the questions were conceptually similar. Each subject had to write down the time spent answering the questionnaire, by recording their starting and finishing time. The difference between the two is what we called the understandability time (expressed in minutes). o Part 2. A description of each of the three maintainability sub-characteristics we considered. Each subject had to rate each sub-characteristic using a scale consisting of seven linguistic labels. − The subjects had to solve the tests alone, in no more than one hour. Any doubt could be solved by the person who monitored the experiment. − We collected all the data, including the starting and finishing time of the first part of the test, and the subject ratings of understandability, analysability and modifiability of the second part of the test. − We collected all the tests, checking if they were complete and if the responses were correct. We discarded the test of 8 subjects because they included an incorrect answer. Therefore, we took into account the responses of 27 subjects. The subjective


evaluation of understandability, modifiability and analysability was considered reliable, because the subjects had enough experience to do the kind of task required.
− We summarised subject responses in a single table (see table 16) with 9 records (one record for each ER diagram) and 17 fields (12 metrics and 3 maintainability sub-characteristics: F1 = median of understandability (subjects' rating), F2 = median of analysability (subjects' rating) and F3 = median of modifiability (subjects' rating), and the mean of the understandability time).

6.3.1. Correlational Analysis

The data shown in table 16 was analysed to test the hypotheses stated above. First, we applied the Kolmogorov-Smirnov test to ascertain if the distribution of the data collected was normal or not. As the data are not normal and all measurement instruments use ratio scales, we decided to use a non-parametric test like Spearman's correlation coefficient (α = 0.05). The obtained Spearman's correlation coefficients are shown in table 17.
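The analysis steps described here (a normality check followed by Spearman correlations compared against the cutoff) can be reproduced along the following lines. The DataFrame layout, the column names and the use of pandas/SciPy are our own assumptions; this is a sketch, not the authors' actual analysis script:

```python
import pandas as pd
from scipy import stats

# data: one row per ER diagram, columns for the metrics and the dependent
# variables (F1, F2, F3 medians and the mean understandability time).
metrics = ["NE", "NA", "NR", "N1:NR", "NM:NR", "NBinaryR", "NN-AryR", "NRefR"]
dependents = ["F1", "F2", "F3", "UnderstandabilityTime"]
CUTOFF = 0.66   # Spearman cutoff for n = 9 diagrams and alpha = 0.05
ALPHA = 0.05

def analyse(data: pd.DataFrame) -> None:
    for dep in dependents:
        # Kolmogorov-Smirnov test against a fitted normal distribution.
        ks_stat, ks_p = stats.kstest(data[dep], "norm",
                                     args=(data[dep].mean(), data[dep].std()))
        print(f"{dep}: KS statistic = {ks_stat:.3f}, p-value = {ks_p:.3f}")
        for m in metrics:
            rho, p = stats.spearmanr(data[m], data[dep])
            verdict = "reject H0" if (abs(rho) > CUTOFF and p < ALPHA) else "do not reject H0"
            print(f"  {m} vs {dep}: rho = {rho:.3f}, p = {p:.3f} -> {verdict}")
```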

 | NE | NA | NR | N1:NR | NM:NR | NBinaryR | NN-AryR | NRefR
Median of Understandability | 0.847 | 0.896 | 0.907 | 0.848 | 0.794 | 0.952 | 0.927 | 0.867
Median of Analysability | 0.886 | 0.891 | 0.915 | 0.846 | 0.930 | 0.867 | 0.813 | 0.974
Median of Modifiability | 0.877 | 0.935 | 0.92 | 0.936 | 0.737 | 0.827 | 0.857 | 0.726
Mean of Understandability Time | 0.811 | 0.900 | 0.855 | 0.961 | 0.722 | 0.732 | 0.732 | 0.821

Table 17. Spearman’s correlation coefficients between the metrics and understandability, analysability, and modifiability ratings and the understandability time Looking at table 17, and considering that for a sample size of 9 (median or mean values for each diagram) and α = 0.05, the Spearman cutoff for accepting H0 is 0.66 (Briand et al., 1995; CUHK, 2002), we conclude that: − There is a high correlation (rejecting First hypothesis-H0) between understandability, analysability and modifiability and the metrics NE, NA, NR, N1:NR, NM:NR, NBinaryR, NN-AryR, NRefR because the correlation coefficient is greater than 0.66. − There is a high correlation (rejecting Second hypothesis-H0) between understandability time and the metrics NE, NA, NR, N1:NR, NM:NR, NBinaryR, NN-AryR and NRefR, because the correlation coefficient is greater than 0.66. Moreover, we also wanted to test whether there is a relationship between the subjective evaluation given by the subjects for understandability and the understandability time. Spearman’s correlation coefficient obtained in this case was 0.85, which means that the


subjective evaluation for each ER diagram given by the subjects was also reflected in the time they spent in understanding the diagram.

6.3.2. Principal Components Analysis

After carrying out the correlational analysis, we undertook the PCA, as in the previous experiments. Two rotated PCs were obtained (see table 18) using an eigenvalue of 1.0 as a threshold for component selection. With these PCs 94.60% (see table 19) of the total variability is explained.
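A rotated principal component analysis with the eigenvalue > 1.0 selection rule can be sketched along the following lines. The varimax routine is a standard textbook formulation, and the assumption that the metric values are available in a pandas DataFrame `data` (one row per diagram, one column per metric) is ours; this is not the authors' original analysis code:

```python
import numpy as np
import pandas as pd

def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Standard varimax rotation of a loading matrix (p variables x k components)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p))
        rotation = u @ vt
        new_total = s.sum()
        if new_total < total * (1 + tol):
            break
        total = new_total
    return loadings @ rotation

def rotated_pca(data: pd.DataFrame) -> pd.DataFrame:
    corr = np.corrcoef(data.values, rowvar=False)   # PCA on the correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    keep = eigenvalues > 1.0                         # eigenvalue > 1 selection rule
    print("explained variability:", eigenvalues[keep] / eigenvalues.sum())
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return pd.DataFrame(varimax(loadings), index=data.columns,
                        columns=[f"PC{i + 1}" for i in range(keep.sum())])
```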

Metrics | PC1 | PC2
NE | 0.42246669 | 0.88481206
NA | 0.70803434 | 0.65675211
NR | 0.93675389 | 0.29087072
N1:NR | 0.96018802 | 0.19927711
NM:NR | 0.57962093 | 0.77722392
NBinaryR | 0.94899769 | 0.25263205
NN-AryR | 0.4543879 | 0.83237761
NRefR | 0.95993016 | 0.14120844
NIS_AR | -0.48471617 | 0.8466105
Table 18. Rotated components of the third experiment

PCs | Eigenvalue | Percentage | Accumulated Percentage
PC1 | 5.07870068 | 56.4300075 | 56.43000752
PC2 | 3.43597448 | 38.1774942 | 94.60750174

Table 19. Total variability explained by the PCs of the third experiment The metrics with high loadings for the first rotated principal component all refer to relationships. As opposed to PC1 in the PCA of experiment 1, also the NR and NRefR metrics are included here. The NM:NR metric shows a high loading for neither this component, nor the other, probably for the same reason as in the first experiment. The second rotated PC refers to entities and subtypes (i.e. the NE and NIS_AR metrics), as in the PCA of the first experiment, but now also includes the NN_AryR metric. We cannot really explain this last result, but it should be noted that the variance in the NN_AryR values in the diagrams is rather low, so chances are high that the high loading of this metric for PC2 is accidental.


Although the results of this PCA confirm the interpretation of the rotated components of the first experiment, there is one significant difference, which is that the structural complexity due to the number of attributes in an ER diagram is not captured here in any of the dimensions. Given that NA does have a significant correlation with all maintainability measurements, we cannot really explain this result.

6.4. Summary of the findings

In this section we summarise the findings of the three experiments we carried out. Analysing table 20, it must be noted that, although we wanted to define and validate a full set of metrics, capturing most of the constructs used for ER modelling, we could not use in the experiments the metrics NCA, NMVA, NDA and NRR. On the one hand, this is due to the fact that the majority of tools available for modelling ER diagrams do not incorporate composite attributes, multivalued attributes and derived attributes. On the other hand, redundancy of relationships is difficult to detect without a complete understanding of the problem modelled by the ER diagram. The metrics NE, NA, NR, N1:NR, NM:NR, NBinaryR, NN-AryR, NRefR and NIS_AR capture constructs that are represented in the ER diagrams selected for the experiments. Some of them, however, could not be used in one (i.e. NN-AryR) or even two (i.e. NRefR, NIS_AR) experiments, due to a lack of discriminative power, meaning that they have the value zero for many of the diagrams used as objects in these experiments.

Controlled experiments   Considered metrics                                    Metrics validated
First experiment         NE, NA, NR, NM:NR, N1:NR, NN-AryR, NBinaryR, NIS_AR   NE, NA, NR, N1:NR, NBinaryR, NIS_AR
Second experiment        NE, NA, NR, NM:NR, N1:NR, NBinaryR                    NE, NR, NM:NR, N1:NR, NBinaryR
Third experiment         NE, NA, NR, NM:NR, N1:NR, NN-AryR, NBinaryR, NRefR    NE, NA, NR, NM:NR, N1:NR, NN-AryR, NBinaryR, NRefR

Table 20. Summary of the empirical studies related to the metrics for ER diagrams

From the third column of table 20, we can see that all the metrics considered in the experiments are to some extent (and some more often and more clearly than others) correlated with the maintainability sub-characteristics as measured with our rating and timing measurement instruments. Based on these results we may conclude that the aspects of ER diagram structural complexity that are captured by the metrics proposed in this paper are associated with the maintainability of these diagrams. Generally, the higher the structural complexity of an ER diagram, the longer it takes to modify the


diagram (experiment 2), the longer it takes to understand the diagram (experiments 2 and 3), and the higher the perceived difficulty of maintaining the diagram (experiments 1 and 3).

The goal of the principal component analyses that we conducted was to identify underlying dimensions of ER diagram structural complexity and to demonstrate the non-redundancy of the metrics in our metrics suite. We believe that we partly accomplished this goal. As discussed before, the PCAs of experiments 1 and 3 confirm each other. There seems to be a dimension focused on entities (as captured by the NE and NIS_AR metrics) and another dimension focused on relationships, mainly of the binary 1:N type. It is interesting to note that these empirical results can be backed up by theoretical research in the field of software engineering measurement. Authors such as Briand et al. (1996; 1997), Whitmire (1997) and Poels and Dedene (1997) proposed axiomatic models (see also our discussion of property-based approaches to theoretical metric validation in section 3.2) that distinguish several internal properties of software artefacts, among which size and complexity. According to these authors, the size of a software artefact depends on the number of elements composing the artefact, whereas the complexity of the artefact depends on the number of relationships between the composing elements. In that sense, we can interpret the first rotated principal component, which explains most of the variability, as a complexity dimension of structural complexity, and the second component as a size dimension. This interpretation is also consistent with the Systems Theory-based definition of structural complexity used in this paper (see section 4.2) as being related to the number of elements and the number of relationships. The role of attributes as a distinct aspect of ER diagram structural complexity is still unclear and in fact not substantiated by the empirical data that we gathered.

As far as the non-redundancy of the metrics is concerned, the empirical evidence at hand is less conclusive. On the one hand, it is clear that a suite of metrics for ER diagram structural complexity should include metrics focused on entities and metrics focused on relationships, as they capture different dimensions of structural complexity associated with maintainability. On the other hand, the principal component analyses also showed that some metrics, like NE and NIS_AR or NBinaryR and N1:NR, can be considered redundant, given that they relate to the same underlying dimension of structural complexity.

The correlation analyses and principal component analyses that we performed in our family of experiments suggest that, when building maintainability prediction models, it is advisable to include at least one metric related to the number of entities and at least one other metric related to the number of binary one-to-many relationships. Whether to include metrics for attributes, complex relationships, many-to-many relationships and reflexive relationships depends on the nature of the ER diagrams, i.e. whether they include these constructs or not.
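As a minimal illustration of such a prediction model (the paper itself does not commit to a particular model form; this is simply an ordinary least squares sketch with hypothetical variable names), one entity-related and one relationship-related metric could be combined as follows:

import numpy as np

def fit_maintainability_model(ne, n1_nr, understanding_time):
    """Fit time = b0 + b1*NE + b2*N1:NR by ordinary least squares and return [b0, b1, b2]."""
    X = np.column_stack([np.ones(len(ne)), ne, n1_nr])
    coefficients, *_ = np.linalg.lstsq(X, np.asarray(understanding_time, dtype=float), rcond=None)
    return coefficients

def predict_understanding_time(coefficients, ne, n1_nr):
    """Predicted understanding time for a new diagram with the given metric values."""
    return coefficients[0] + coefficients[1] * ne + coefficients[2] * n1_nr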


7. Conclusions

The main contribution of this paper is the definition of a set of metrics for measuring the structural complexity of ER diagrams in a methodological way, pursuing the goal of obtaining metrics that can be used as ER diagram maintainability indicators. These measures are theoretically valid because they were defined following the DISTANCE framework (Poels and Dedene, 2000), which means that, from a measurement theory point of view, they really measure the attribute they purport to measure. Moreover, the use of DISTANCE guarantees that the metrics can be used as ratio scale measurement instruments, justifying the use of parametric statistics such as Pearson’s correlation coefficient, as in the second experiment.

Empirical validation of metrics is fundamental in order to demonstrate their practical utility as indicators of ER diagram maintainability. In this paper we have summarised a family of three controlled experiments conducted to corroborate the hypothesised correlation between the proposed metrics and measures of the maintainability sub-characteristics analysability, understandability and modifiability. The results of these experiments provide empirical evidence that the structural complexity aspects measured by the metrics NE, NA, NR, N1:NR, NM:NR, NBinaryR, NN-AryR, NRefR and NIS_AR are associated to some extent with these maintainability sub-characteristics. Our conclusion is strongest for the metrics NE, NR, NBinaryR and N1:NR, as they show a significant correlation with maintainability measurements in all three experiments. By means of principal component analysis it was shown that the metrics included in the metrics suite capture at least two distinct underlying dimensions of structural complexity, one related to entities, the other related to relationships. This result can be used to decide which metrics to include in a maintainability prediction model. On the other hand, the experimental data that we gathered also showed that there are redundant metrics in our metrics suite.

Although the results we obtained are encouraging, we should be careful not to consider them conclusive. For instance, as several authors have mentioned (Briand et al., 1998; Briand and Wüst, 2002), PCA results are very dependent on the data at hand. Moreover, further empirical validation, including internal and external replication of the experiments presented in this paper, is needed. We also need to carry out additional


empirical studies, for instance with practitioners instead of students, in order to extend our “family” of experiments and to further build up the cumulative knowledge on ER diagram structural complexity and maintainability. Finally, data related to “real projects” is also needed to strengthen the evidence that these metrics can be used as ER diagram maintainability indicators.

Some of the changes that could be made to improve the experiments presented here are:
− Increase the size of the ER diagrams. Larger diagrams would provide examples that are closer to reality.
− Work with professionals in order to obtain more generalisable results.
− Increase the differences between the metric values across diagrams. This could lead to more conclusive results about the metrics and their relationship with the factor we are trying to control.
− Carry out the experiments in a more controlled environment, using a modelling tool instead of ER diagrams written on paper.

As our final goal is to predict ER diagram maintainability, once we have gathered more empirical evidence about the relation between our metrics and ER diagram maintainability through further empirical studies, we will focus our research on building an ER diagram maintainability prediction model. In previous studies we applied artificial intelligence techniques based on fuzzy logic (Olivas and Romero, 2000; Delgado et al., 2001), which have also proved useful for prediction purposes in software engineering (Genero et al., 2001a; Piattini et al., 2001). As multivariate analysis has been suggested for prediction purposes (Briand and Wüst, 2002), we plan to build an ER diagram maintainability prediction model using this technique as well. Moreover, it could be very interesting to compare the prediction accuracy of the models obtained previously (using techniques based on fuzzy logic) with that of the models obtained by multivariate analysis.

As several authors remark (El-Emam, 2001; De Champeaux, 1997; French, 1999), the practical utility of metrics would be enhanced if meaningful thresholds could be identified. However, a recent series of studies that looked at thresholds for object-oriented metrics (Benlarbi et al., 2000; El-Emam, 2000; Galsberg et al., 2000) demonstrates that there are no threshold values for contemporary object-oriented metrics. To our knowledge, studies trying to find threshold values for ER diagram metrics do not exist, so we believe this could be a good issue to tackle in the near future.

We cannot disregard the increasing diffusion of the object-oriented paradigm in conceptual modelling. Modern approaches towards object-oriented system development, like Catalysis (D´Souza and Wills, 1999) and the Rational Unified Process (Rational,


1998), consider conceptual modelling as a key step in the development life-cycle. Object-oriented models are more suitable than ER diagrams for describing the kind of information systems built nowadays. Regarding OO models, we have been working on measures for UML class diagrams (Genero et al., 2000b; Genero et al., 2001b; Genero, 2002; Genero et al., 2002a, 2002b).

Acknowledgements

This research is part of the DOLMEN (TIC 2000-1673-C06-06) and the CALDEA (TIC 2000-1673-C06-06) projects, financed by the Subdirección General de Proyectos de Investigación - Ministerio de Ciencia y Tecnología (Spain).

References

1. Abrial J. (1974). Data Semantics. IFIP TC2 Conference, North Holland, Amsterdam.
2. Babbie E. (1990). Survey research methods. Wadsworth.
3. Bandi R., Vaishnavi V. and Turk D. (2003). Predicting Maintenance Performance Using Object-Oriented Design Complexity Metrics. IEEE Transactions on Software Engineering, 29(1), 77-87.
4. Basili V. and Rombach H. (1988). The TAME project: towards improvement-oriented software environments. IEEE Transactions on Software Engineering, 14(6), 728-738.
5. Basili V. and Weiss D. (1984). A Methodology for Collecting Valid Software Engineering Data. IEEE Transactions on Software Engineering, 10, 728-738.
6. Basili V., Shull F. and Lanubile F. (1999). Building knowledge through families of experiments. IEEE Transactions on Software Engineering, 25(4), 435-437.
7. Benlarbi S., El-Emam K., Goel N. and Rai S. (2000). Thresholds for Object-Oriented Measures. NRC/ERB 1073.
8. Boehm B. (1981). Software Engineering Economics. Prentice-Hall.
9. Boehm B., Clark B., Horowitz E., Madachy R., Shelby R. and Westland C. (1995). Cost Models for Future Software Life Cycle Processes: COCOMO 2.0. Annals of Software Engineering.
10. Bonissone P. (1982). Fuzzy Sets Based Linguistic Approach: Theory and Applications. Approximate Reasoning in Decision Analysis, M. M. Gupta and E. Sanchez (eds.), North-Holland Publishing Company, 329-339.
11. Briand L., El Emam K. and Morasca S. (1995). Theoretical and empirical validation of software product measures. Technical report ISERN-95-03, International Software Engineering Research Network.


12. Briand L., Morasca S. and Basili V. (1996). Property-Based Software Engineering Measurement. IEEE Transactions on Software Engineering, 22(1), 68-86.
13. Briand L., Morasca S. and Basili V. (1997). Response to: comments on “Property-Based Software Engineering Measurement”: Refining the additivity properties. IEEE Transactions on Software Engineering, 22(3), 196-197.
14. Briand L., Wüst J. and Lounis H. (1998). Replicated Case Studies for Investigating Quality Factors in Object-oriented Designs. Technical report ISERN 98-29 (version 3), International Software Engineering Research Network.
15. Briand L., Arisholm S., Counsell F., Houdek F. and Thévenod-Fosse P. (1999a). Empirical Studies of Object-Oriented Artifacts, Methods, and Processes: State of the Art and Future Directions. Empirical Software Engineering.
16. Briand L., Bunse C. and Daly J. (2001a). A Controlled Experiment for evaluating Quality Guidelines on the Maintainability of Object-Oriented Designs. IEEE Transactions on Software Engineering, 27(6), 513-530.
17. Briand L., Melo W. and Wüst J. (2001b). Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Transactions on Software Engineering, 28(7), 706-720.
18. Briand L., Morasca S. and Basili V. (2002). An Operational process for goal-driven definition of measures. IEEE Transactions on Software Engineering, 30(2), 120-140.
19. Briand L. and Wüst J. (2002). Empirical Studies of Quality Models in Object-Oriented Systems. Advances in Computers, Academic Press, Zelkowitz (ed.), 59, 97-166.
20. Brito e Abreu F., Zuse H., Sahraoui H. and Melo W. (1999). Quantitative Approaches in Object-Oriented Software Engineering. Object-Oriented Technology: ECOOP´99 Workshop Reader, Lecture Notes in Computer Science 1743, Springer-Verlag, 326-337.
21. Calero C., Piattini M. and Genero M. (2001). Empirical validation of referential integrity metrics. Information and Software Technology, 43, 949-957.
22. Campbell D. and Stanley J. (1963). Experimental and quasi-experimental designs for research. Boston, Houghton Mifflin Co.
23. Cook T. and Campbell D. (1979). Quasi-experimentation: design and analysis issues for field settings. Boston, Houghton Mifflin Co.
24. Chen P. (1976). The Entity-Relationship Model: Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1), 9-37.
25. Judd C., Smith E. and Kidder L. (1991). Research methods in social relations. Sixth edition, Orlando, Harcourt Brace Jovanovich.


26. CUHK - Chinese University of Hong Kong - Department of Obstetrics and Gynaecology. Available: http://department.obg.cuhk.edu.hk/ResearchSupport/Minimum_correlation.asp (last visited on July 22nd, 2002).
27. De Champeaux D. (1997). Object-oriented development process and metrics. Upper Saddle River, Prentice-Hall.
28. D´Souza D. and Wills A. (1999). Objects, Components and Frameworks with UML: the Catalysis Approach. Addison-Wesley.
29. Delgado M., Gómez Skarmeta A. and Jiménez L. (2001). A Regression Methodology to Induce a Fuzzy Model. International Journal of Intelligent Systems, 16, 169-190.
30. Dunteman G. (1989). Principal Component Analysis. Sage University Paper 07-69, Thousand Oaks, CA.
31. Dumke R. (1998). Softwaremetrie: Grundlagen und Ziele. Research report, Otto-von-Guericke Universität, Magdeburg, Germany.
32. Eick C. and Lockermann (1985). Acquisition of Terminological Knowledge using Database Design Techniques. ACM-SIGMOD Conference on Management of Data, 84-94.
33. Eick C. and Raupp T. (1991). Towards a Formal Semantics and Inference Rules for Conceptual Data Models. Data and Knowledge Engineering.
34. Eick C. (1991). A Methodology for the Design and Transformation of Conceptual Schemas. 17th International Conference on Very Large Data Bases, Barcelona, 25-34.
35. El-Emam K., Benlarbi S., Goel N., Melo W., Lounis H. and Rai S. (2000). The optimal class size for Object-Oriented Software: a replicated study. NRC/ERB 1074.
36. El-Emam K. (2001). Object-Oriented Metrics: A Review on Theory and Practice. NRC/ERB 1085, National Research Council Canada.
37. Ellis B. (1968). Basic Concepts of Measurement. Cambridge University Press.
38. Elmasri R. and Navathe S. (1994). Fundamentals of Database Systems. Second edition, Addison-Wesley, Massachusetts.
39. Fenton N. (1994). Software Measurement: A Necessary Scientific Basis. IEEE Transactions on Software Engineering, 20(3), 199-206.
40. Fenton N. and Pfleeger S. (1997). Software Metrics: A Rigorous Approach. 2nd edition, London, Chapman & Hall.


41. Fioravanti F. and Nesi P. (2001). Estimation and Prediction Metrics for Adaptive Maintenance Effort of Object-Oriented Systems. IEEE Transactions on Software Engineering, 27(12), 1062-1083.
42. French V. (1999). Establishing Software Metric Thresholds. International Workshop on Software Measurement (IWSM´99).
43. Galsberg D., El-Emam K., Melo W., Machado J. and Madhavji N. (2000). Empirical Validation of Object-Oriented Design Measures (submitted for publication).
44. Genero M., Piattini M., Rincón-Cinca J. and Serrano M. (2000a). MANTICA: Una Herramienta de Métricas para Modelos de Datos. CACIC 2000, Ushuaia, Argentina (in Spanish).
45. Genero M., Olivas J., Piattini M. and Romero F. (2001a). Knowledge Discovery For Predicting Entity Relationship Diagram Maintainability. SEKE 2001, Argentina, Proceedings, Knowledge Systems Institute, 203-211.
46. Genero M., Olivas J., Piattini M. and Romero F. (2001b). Using metrics to predict OO information systems maintainability. CAISE 2001, Interlaken, Switzerland, Lecture Notes in Computer Science (LNCS) 2068, Dittrich K., Geppert A. and Norrie M.C. (eds.), Springer-Verlag, 388-401.
47. Genero M., Poels G. and Piattini M. (2002a). Defining and Validating Measures for Conceptual Data Model Quality. CAISE 2002, Toronto, Canada, LNCS 2348, Springer-Verlag, 724-727.
48. Genero M., Olivas J., Piattini M. and Romero F. (2002b). Assessing object-oriented conceptual models maintainability. 1st International Workshop on Conceptual Modeling Quality (IWCMQ’02), Tampere, Finland, LNCS, Springer-Verlag (to appear).
49. Godo L., López de Mántaras R., Sierra C. and Verdaguer A. (1989). MILORD: the architecture and management of linguistically expressed uncertainty. International Journal of Intelligent Systems, 4, 471-501.
50. Gray R., Carey B., McGlynn N. and Pengelly A. (1991). Design metrics for database systems. BT Technology, 9(4).
51. Hammer M. and McLeod D. (1981). Database Description with SDM: A Semantic Database Model. ACM TODS, 6(3), 351-386.
52. Hanebutte N., Taylor C. and Reiner D. (2003). Techniques of Successful Application of Factor Analysis in Software Measurement. Empirical Software Engineering, 8, 43-57.
53. Harrison R., Counsell S. and Nithi R. (2000). Experimental Assessment of the Effect of Inheritance on the Maintainability of Object-Oriented Systems. The Journal of Systems and Software, 52, 173-179.
54. Ince D. and Shepperd M. (1991). Algebraic validation of software metrics. ESEC 1991.


55. ISO 9126 (2001). Software Product Evaluation - Quality Characteristics and Guidelines for their Use. ISO/IEC Standard 9126, Geneva.
56. Juristo N. and Moreno A.M. (2001). Basics of Software Engineering Experimentation. Kluwer Academic Publishers, Boston.
57. Kesh S. (1995). Evaluating the Quality of Entity Relationship Models. Information and Software Technology, 37(12), 681-689.
58. Kitchenham B., Pfleeger S. and Fenton N. (1995). Towards a Framework for Software Measurement Validation. IEEE Transactions on Software Engineering, 21(12), 929-943.
59. Kitchenham B., Pfleeger S., Pickard L., Jones P., Hoaglin D., El Emam K. and Rosenberg J. (2002). Preliminary Guidelines for Empirical Research in Software Engineering. IEEE Transactions on Software Engineering, 28(8), 721-734.
60. Krantz D., Luce R.D., Suppes P. and Tversky A. (1971). Foundations of Measurement, Vol. 1. Academic Press, New York.
61. Krogstie J., Lindland O. and Sindre G. (1995). Towards a Deeper Understanding of Quality in Requirements Engineering. Proceedings of the 7th International Conference on Advanced Information Systems Engineering (CAISE), Jyvaskyla, Finland, 82-95.
62. Li W. and Henry S. (1993). Object-Oriented Metrics that Predict Maintainability. Journal of Systems and Software, 23(2), 111-122.
63. Lott C. and Rombach H. (1996). Repeatable Software Engineering Experiments for Comparing Defect-Detection Techniques. Journal of Empirical Software Engineering, 1(3), 241-277.
64. Lindland O., Sindre A. and Solvberg A. (1994). Understanding Quality in Conceptual Modeling. IEEE Software, March, 11(2), 42-49.
65. Miller J. (2000). Applying Meta-Analytical Procedures to Software Engineering Experiments. Journal of Systems and Software, 54, 29-39.
66. Moody D. (2000). Metrics for improving the quality of entity relationship models: an empirical investigation. University of Melbourne Information Systems Working Paper, Department of Information Systems, University of Melbourne (under submission).
67. Moody D. (1998). Metrics for Evaluating the Quality of Entity Relationship Models. Proceedings of the Seventeenth International Conference on Conceptual Modelling (ER ´98), Singapore, November 16-19, 213-225.


68. Moody D. and Shanks G. (1994). What Makes A Good Data Model? Evaluating The Quality of Entity Relationship Models. Proceedings of the 13th International Conference on Conceptual Modelling (ER ´94), Manchester, England, 94-111.
69. Moody D., Shanks G. and Darke P. (1998). Improving the Quality of Entity Relationship Models - Experience in Research and Practice. Seventeenth International Conference on Conceptual Modelling (ER ´98), Singapore, 255-276.
70. Muller R. (1999). Database design for smarties. Using UML for data modelling. San Francisco, Morgan Kaufmann.
71. Olivas J. and Romero F. (2000). FPKD. Fuzzy Prototypical Knowledge Discovery. Application to Forest Fire Prediction. SEKE’2000, Knowledge Systems Institute, Chicago, Ill., USA, 47-54.
72. Olivé A. (2002). Specific Relationship Types in Conceptual Modeling: The Cases of Generic and with Common Participants. Unpublished keynote lecture, 4th International Conference on Enterprise Information Systems (ICEIS´02), Ciudad Real, Spain. http://www.iceis.org/iceis2002/keynote.htm.
73. Perry D., Porter A. and Votta L. (2000). Empirical Studies of Software Engineering: A Roadmap. Future of Software Engineering, Anthony Finkelstein (ed.), ACM, 345-355.
74. Pfleeger S. (1994-1995). Experimental design and analysis in software engineering, parts 1-5. ACM SIGSOFT Software Engineering Notes, 19(4), 16-20; 20(1), 22-26; 20(2), 14-16; 20(3), 13-15; 20(4), 14-17.
75. Pfleeger S. and Kitchenham B.A. (2001). Principles of Survey Research. Software Engineering Notes, 26(6), 16-18.
76. Piattini M., Genero M. and Jiménez L. (2001). A metric-based approach for predicting conceptual data models maintainability. International Journal of Software Engineering and Knowledge Engineering, 11(6), 703-729.
77. Pippinger N. (1978). Complexity Theory. Scientific American, 238(6), 1-15.
78. Poels G. (1999). On the Formal Aspects of the Measurement of Object-Oriented Software Specifications. Ph.D. dissertation, Faculty of Economics and Business Administration, Katholieke Universiteit Leuven, Belgium.
79. Poels G. and Dedene G. (1997). Comments on Property-Based Software Engineering Measurement: Refining the Additivity Properties. IEEE Transactions on Software Engineering, 23(3), 190-195.
80. Poels G. and Dedene G. (2000). Distance-based software measurement: necessary and sufficient properties for software measures. Information and Software Technology, 42(1), 35-46.


81. Poels G. and Dedene G. (2001). Evaluating the Effect of Inheritance on the Modifiability of Object-Oriented Business Domain Models. 5th European Conference on Software Maintenance and Reengineering (CSMR 2001), Lisbon, Portugal, 20-28.
82. Rational Software (1998). Object Oriented Analysis and Design. Student Manual. http://www.rational.com/.
83. Roberts F.S. (1979). Measurement Theory with Applications to Decisionmaking, Utility and the Social Sciences. Addison-Wesley.
84. Robson C. (1993). Real world research: A resource for social scientists and practitioner-researchers. Blackwell.
85. Schneidewind N. (1992). Methodology For Validating Software Metrics. IEEE Transactions on Software Engineering, 18(5), 410-422.
86. Schuette R. and Rotthowe T. (1998). The Guidelines of Modeling - An Approach to Enhance the Quality in Information Models. Proceedings of the Seventeenth International Conference on Conceptual Modelling (ER´98), Singapore, 240-254.
87. Shull F., Basili V., Carver J. and Maldonado J. (2002). Replicating Software Engineering Experiments: Addressing the Tacit Knowledge Problem. 2002 International Symposium on Empirical Software Engineering (ISESE 2002), Nara, Japan, IEEE Computer Society, 7-16.
88. Si-Said Cherfi S., Akoka J. and Comyn-Wattiau I. (2002). Conceptual Modelling Quality - from EER to UML Schemas Evaluation. 21st International Conference on Conceptual Modeling (ER 2002), Tampere, Finland, 499-512.
89. SPSS 11.0 (2001). Syntax Reference Guide. Chicago, SPSS Inc.
90. Stake R. (1995). The art of case study research. Sage Publications.
91. Suppes P., Krantz D., Luce R. and Tversky A. (1989). Foundations of Measurement: Geometrical, Threshold, and Probabilistic Representations, Vol. 2. San Diego, Calif., Academic Press.
92. Van Den Berg and Van Den Broek (1996). Axiomatic Validation in the Software Metric Development Process. Chapter 10 in: Software Measurement, Austin Melton (ed.), Thomson Computer Press.
93. Van Solingen R. and Berghout E. (1999). The Goal/Question/Metric Method: A practical guide for quality improvement of software development. McGraw-Hill.
94. Weyuker E.J. (1988). Evaluating Software Complexity Measures. IEEE Transactions on Software Engineering, 14(9), 1357-1365.
95. Whitmire S. (1997). Object Oriented Design Measurement. John Wiley & Sons, Inc.


96. Wohlin C., Runeson P., Höst M., Ohlson M., Regnell B. and Wesslén A. (2000). Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers.
97. Yin R. (1994). Case study research: design and methods. Sage Publications, Beverly Hills, California.
98. Zuse H. (1998). A Framework of Software Measurement. Walter de Gruyter, Berlin.
