Coordination and Productivity Issues in Free Software

Coordination and Productivity Issues in Free Software: the Role of Brooks’ Law Andrea Capiluppi School of Computing, Information Technology and Engineering University of East London – UK [email protected]

Paul J. Adams and Cornelia Boldyreff Centre of Research on Open Source Software University of Lincoln – UK [email protected]

Abstract

of software artefacts and software process data, and the approach validated and pinned with a consideration of validity threats, in order to draw any conclusion. On a theoretical – and logical – basis, several contributions have underpinned the assumption that Brooks’ Law will not apply to FLOSS projects [27, 16, 15]. One of the preferred arguments to justify this position is based on the concept of ”egoless programming”: when developers are not territorial about their code, but instead they encourage others to improve the systems, improvement is much more likely, widespread and faster [34]. Some research studies have used rather simplistic approaches to validate the above assumption, by for instance using the number of developers and an operationalization of a project’s status as a proxy of the complexity of interrelations [29]. One of these studies has shown that in the Sourceforge1 dataset, the correlation between the “output per person” and “number of active programmers” was very small. It concluded that, also given to the strict modularization, Brooks’ Law does not apply (at least) to the SourceForge data set [19]. Other researchers have maintained a more careful profile; given the assumption that Brooks’ Law can be hardly transcended, and will apply anyway to software (and other) collaborating teams, a more appropriate research question is how to distinguish between a core team (where the Brooks’ Law does and will definitely apply) and the larger cohort of contributors. On these fringes, in fact, obscure bugs are tracked and fixed, and, more importantly, radical experimentations are implemented and tested in parallel with the main, more important features: in a few words, the core team can focus on the core work [12]. The real goal is therefore how to diminish the effects of Brooks Law and have a small improvement of the overall software development process [33].

Proponents of the Free Software paradigm have argued that some of the most established software engineering principles do not fully apply when considered in an open, distributed approach found in Free Software development. The objective of this research is to empirically examine the Brooks’ Law in a Free Software context. The principle is separated out into its two primary premises: the first is based on a developer’s ability to become productive when joining a new team; the second premise relates to the quality of coordination as the team grows. Three large projects are studied for this purpose: KDE, Plone and Evince. Based on empirical evidence, the paper provides two main contributions: based on the first premise of Brooks’ Law, it claims that coordination costs increase only in a very specific phase for Free Software projects. After that, these costs become quasi-constant. Secondly, it shows that a ramp up period exists in Free Software projects, and this period marks the divide between projects that are successful at engaging new contributors from others that only benefit from occasional new contributors.

1

Introduction

Among the comparisons between closed- and opensource processes, much attention has lately been given to the applicability of the so-called Brooks’ Law to Free Software projects. Within his famous work, “The Mythical Man Month”, Fred Brooks introduced a simple premise, that, oversimplifying as he put it, states [3]: Adding manpower to a late project makes it later This is, by Brooks’ own admission, not a universally applicable premise. Still, when a comparison is to be made, emphasis should be given to the empirical analysis

1

1

http://sourceforge.net

Within this research an empirical analysis of Brooks’ Law within Free Software is conducted. Two principle studies are analysed: firstly, the ability of a project to successfully integrate new contributors is investigated; and secondly, the network of interaction between existing contributors is studied. A new term, coordination cohesion, is introduced and used to assess the strength of project coordination over time. The assessment is conducted for the KDE, Evince and Plone projects. From an empirical perspective, this research has studied whether the claim of Free Software advocates holds globally, or within specific limited circumstances. The paper is articulated as follows: section 2 illustrates the previous works in the area of productivity of Free Software developers. The role of Brooks’ Law in the Free Software literature will be summarised with respect to its primary, qualitative results. Section 3 will introduce the two working hypotheses, and section 4 will summarize the empirical method used. In terms of results, section 5 will show what was observed in relation to the issue of communication among Free Software developers, while section 6 will present the evidence related to the early productivity of developers when joining an established team. Finally, section 7 will identify the main threats to the external and internal validity of the proposed empirical experiment, and section 8 will conclude.

2

Related Works

In the last decade, the Free/Libre and Open Source Software (FLOSS) development approach has attracted vast and diverse interested audiences, namely software practitioners, software engineering researchers and, in the large, the end users of software systems. Each of these cohorts has brought to attention one (or several) aspects of this approach [28], criticized [10] or advocated it [23], and, at times, compared it with established approaches [1, 31]. At first, the traditional, closed-source development has been questioned in light of this new paradigm; specifically, studies have been made to determine whether or not FLOSS systems show higher quality characteristics than closed source software [30]. Also, in terms of its evolutionary patterns, research papers have reported on whether the average FLOSS project could be considered comparable to a traditional software system [4, 14, 11]. It has also been argued that closed- and open-source paradigms represent just the extremes of the software management spectrum, and that several other flavors should be also considered (e.g., commercial open-source, community open-source, among others, [6]). This has then posed the basis for the comparison of these hybrid systems with traditional software [22]. More recently, researchers have started analyzing the fundamentals differences between the established software

engineering approaches and the FLOSS development style, comparing it both with the traditional development style (i.e., composed of the phases of requirements, design, implementation, testing and maintenance [24]), and finding differences with the development cycles ([25, 5]), and finding similarities with the Agile paradigm [18, 1]. As a further and one of the most serious challenges that the FLOSS approach still has to face is the issue of trust among end-users: specifically, on one side, the internal product and process attributes of the FLOSS have been analyzed to establish its dependability, reliability and overall quality as an alternative to commercial software, both in research papers [35] and in pan-European projects2 . On the other side, the chance for end-users to benefit from proper, centralized support for software created by the community at large should be assessed [20, 2]. In all the above related works on the matter (but specifically for enabling the trust by users), it is absolutely necessary that research studies on FLOSS systems are performed rigorously, with a strong emphasis on the repeatability of experiments, on the rigorous empirical approach in the collection of data, its parsing and the presentation of results, and finally by illustrating any threats to validity of which assumptions were made throughout the experiment. Specifically, comparisons between the termed closed- and opensource approaches should be formulated on a sound empirical basis, avoiding the pitfalls of personal opinions, strong advocacy and the such like.

3

Working Hypothesis

Analyzing the thinking behind Brooks’ Law, two critical observations can be derived, and potentially operationalized with research questions, metrics and verifiable tests: the work described in this paper covers both points in order and they are tested to determine whether both are generally true or not for Free Software projects. In addition, the reasoning behind [12] and [33] will be tested too; it will be in fact studied whether the coordination paths keep their weight among the core developers of a FLOSS project.

3.1

Issues of Communication

The communication paths between newly joining and existing software developers (hence, not specifically to FLOSS projects) become increasingly complex, creating a considerable overhead; in fact, the number of communication links grows as a square of the number of developers in the project. For a project of n developers, the total number of communication links between them will be n2 − n. It is, 2 SQO-OSS: Software Quality Observatory for Open Source Software, www.sqo-oss.eu

therefore, the null hypothesis of this work, that coordination quality deteriorates as a Free Software project grows. However, given the modular nature of Free Software development, it is expected that a growth in the contributor base over time will not necessarily decrease coordination quality. This can largely be attributed to communication channels becoming as modular as the team structure [7], a situation that is known to be problematic in distributed teams [13].

3.2

Issues of Early Productivity

Within a generic software process (not specifically to Free Software projects), new developers, no matter how competent, are not able to work fully productively when they join a new project. This is known to be generally true when building a team for any purpose [32]. Factors such as a need to get to know the team, the underlying process and its codebase all contribute to this. This period, where full productivity has not been achieved, is known as the ramp up period. It is therefore the null hypothesis, for the second part of this research, that developers increase in productivity over time. Given that many Free software contributors are simply “scratching an itch” [26] it is expected that most Free Software developers will show increased productivity, whilst some will show decreased productivity after addressing their area of interest.

4

4.2

Community Graphing

The Community Graph is a simple, undirected, weighted graph in which nodes represent accounts with the SCM system and edges represent a relationship between the nodes where this relationship may be interpreted as “has worked on the same artefact as”. The weight of the edge is the count of artefacts on which the pair of SCM accounts has collaborated. A similar approach to community network analysis was taken by Martinez-Romo et al [21]. Figure 1 shows the Community Graph for the KDE Marble3 project. The graph has been laid out using the Kamada-Kawai algorithm [17]. Within this particular implementation of that algorithm, edges of higher weight are made shorter. The result, here, being that SCM accounts which have worked together frequently, will appear closer together. One of the consequences of this is that the graph representation reveals distinct, concentric “levels” of contribution. These are similar to the central levels of contribution as hypothesized by Crowston et al [8].

Method and Operationalization

In the following, an overview of the definitions and the methods used to extract or evaluate the metrics used throughout this research is reported. Since this research focuses purely on source code management systems (SCM), the definitions of commit (i.e., the atomic transaction to a SCM), and committer (or developer, i.e., the author of such transaction) will be used throughout the paper.

4.1

Communication Paths Figure 1. Community Graph for KDE Marble

Given the total number of developers, potentially each one could share the work on some artefact (code, or other) with several other developers. Each one-to-one interaction is here named a communication link: the maximum number of links, given a number n of developers, is n2 − n, as mentioned above. This value is clearly an upper bound, and summarizes a completely loose approach of collaboration, where each node (i.e., developer) has an interaction with every other node. As it is unlikely that each developer is linked to every other developer extra links may be required to connect one developer to another, via intermediate contributors. A communication path is a set of links connecting one developer to another. A communication path may only be one link long; when two developers are directly linked.

4.3

Graph Processing – Cohesion

Having produced a graph representing all the utilized communication paths amongst developers in a project, a method is required by which we can measure how compact the communication is. In order to achieve this, a slightly modified version of the Floyd–Warshall algorithm [9] for finding the shortest paths between all pairs of nodes is utilized. As described above, the weight of edges within the graphs increases as developers work more with each other 3 KDE

Marble: http://edu.kde.org/marble/

on the same resource. It is, therefore, important to establish the shortest path between nodes in order to measure how compact the communication within a project is. Higher path weights will denote how often two developers have worked together, i.e. more compaction of communication. Therefore, the average weight of the path between nodes in the communication graph will be defined here as the “cohesion” of a community graph. The following conventions have been used in this research: • cohesion between two joined nodes (or pairwise cohesion) is the weight of the edge between them; • cohesion between any two nodes (or path cohesion) is the weight of the shortest path between them, the higher the weight the greater the cohesion; • overall graph cohesion (or coordination cohesion, or just “cohesion”) is the average weight of all the shortest paths between the nodes.

Plone As seen above, the first segment is also well visible in Plone: few developers have strong communication links that confirm the applicability of Brook’s Law to Plone development (Figure 2 – bottom left). Also the second segment is visible: when increasing the number of developers, the number of coordination paths shrinks. The third segment is not visible, apparently because, in Plone, the same critical mass of developers than KDE hasn’t been achieved. Evince Evince achieves levels of coordination cohesion much greater than either Plone or KDE Figure 2 (bottom right), suggesting that a small number of contributors are responsible for maintaining a larger proportion of the repository’s artefacts. This is largely to be expected; both KDE and Plone are very modular in both community and architecture, whereas Evince is more monolithic as a single application. Again segments 1 and 2 are clearly visible, the third still unreached given the fewer number of developers.

5.1

5

Further Analysis

Results – Issues of Communication

KDE The number of developers within the project is plotted against the cohesion (as defined above) of the community graph, for each week over the lifetime of the project (from 1997 until january 2009). The higher the cohesion, the stronger the coordination paths. Figure 2 (top left) shows the results obtained for the KDE project. Three segments can be observed: 1. Few active developers, high cohesion: in this first phase, fewer than 10 developers produce high cohesion scores, greater than 20, Figure 2 (top left, and zoomed, top right). This certainly follows Brooks’ Law and conventional wisdom in both the process-driven and Agile communities. This also confirms what has been reported by earlier works [33, 12]: small Free Software teams behave as traditional software engineering teams, in terms of coordination efforts. 2. Increasing number of developers, quasi-constant cohesion: as the KDE project grows in number of developers (from 50 to 230), it has, as expected, seen a dilution of its coordination paths. Efficient management of the community and regular restructuring of the project into teams produce a bounded coordination effort Figure 2 (top left). 3. Very large number of developers, increasing cohesion: after hitting a certain threshold, an apparent critical mass is achieved, requesting a coordination cohesion vastly larger than when found when the project had only 10 developers (Figure 2 – top left). Further analysis of this is produced below (Section 5.1).

As mentioned previously, Figure 2 (top left) is potentially indicative of the KDE community having reached some form of critical mass required for greater coordination cohesion. In this section it is studied whether changes within the KDE project acted as the catalyst for this improved coordination cohesion. The KDE evolutionary plot is shown in Figure 3. Figure 3 indicates that the improved coordination cohesion is not created by the KDE project reaching a critical mass of developers., but a change in its characteristics. The most likely factors are: 1. No change in the number of developer pairs in the graph, but these pairs have vastly increased edge weights. This would indicate that the existing community members are taking an increased responsibility for more of the project’s artefacts. It would also indicate that growth in the number of active committers has stagnated. This latter point makes this option seem unlikely. 2. An increased number of community members that only connect to only one other community member. This would normally be indicative of an increase in new developers joining the community. Although a new developer only making a link with one other in a week seems likely, the number of new developers required to create the increase in cohesion shown in Figure 3 (above) would be very large and therefore makes this option seem less likely. 3. The number of pairs of active developers decreases. When creating too many coordination paths, this can

180

30 Weekly Data

Weekly Data

160 25 140

20

100

Cohesion

Cohesion

120

80

60

15

10

40 5 20

0

0 0

50

100

150

200 Developer Count

250

300

350

400

0

80

5

10 Developer Count

15

160 Weekly Data

Weekly Data

70

140

60

120

50

100 Cohesion

Cohesion

20

40

80

30

60

20

40

10

20

0

0 0

5

10

15 Developer Count

20

25

30

0

5

10

15 20 Developer Count

25

30

35

Figure 2. Cohesion of KDE, KDE with only 20 developers, Plone and Evince reduce the overall cohesion by making the project’s coordination paths overly complex. A reduction in the number of active pairs would indicate a period of more focused development and still allow for community growth. This makes this option highly likely. As the latest point seems the most likely, the number of pairs of developers is plotted for each week of KDE development and shown chronologically (in Figure 3 below). This new plot supports this point: a reduction in the number of active pairs of developers is the cause for the increased coordination cohesion within KDE development. The first piece of evidence is a correlation between the drop in active developers pairs in late 2001, with the increased cohesion during the same period. The second piece of evidence is the correlation between the decrease in active pairs from late 2004 to 2007 and the subsequent increase, with the increase and drop in coordination cohesion during the same period. Based upon the evidence in the Figures 2,and both Figures 3, it is possible to conclude that it is a distinct change in KDE working practices and not the reaching of a critical mass in the number of contributors that is the primary

contributor to KDE’s changes in coordination cohesion.

6

Results – Issues of Productivity

The second issue tackled in this paper is related to software productivity in the context of Brooks’ Law: it has been hypothesized above that a contributor requires a “ramp up” period in which they are integrated into a team before they are fully effective. Based on earlier, corporate observations on productivity [32], it can be suggested that many of the stages of team development are achieved in this period. The approach taken in this second part is to measure productivity as the contributor’s commit rate (i.e., the number of commits they make in a given time period). In an earlier work, it has been shown how the first three weeks of a Free Software contributor’s contribution can be indicative of their overall effort in the future [1]. Therefore, the first step in evaluating the ramp up time for contributors is to separate contributors in two categories: 1. partially engaged: committers whose initial contribution is less than one commit per week for their first

However, KDE obtains the worst improvement in mean productivity between its two groups: its contributors committing in each of their first three weeks are only 1.4 time more productive than those that do not. This is similar to Plone’s result of 1.6, but significantly less than Evince’s 2.4. So, although Evince is the least effective in engaging new contributors it is more effective in getting the best productivity out of its more-active early contributors.

180 Weekly Data 160 140

Cohesion

120 100 80 60 40

6.1

20

Productivity Results – KDE

0 01/01/2009

01/01/2010

01/01/2009

01/01/2010

01/01/2008

01/01/2007

01/01/2006

01/01/2005

01/01/2004

01/01/2003

01/01/2002

01/01/2001

01/01/2000

01/01/1999

01/01/1998

01/01/1997

Date 7000 Weekly Data 6000

Pair Count

5000

4000

3000

2000

1000

0 01/01/2008

01/01/2007

01/01/2006

01/01/2005

01/01/2004

01/01/2003

01/01/2002

01/01/2001

01/01/2000

01/01/1999

01/01/1998

01/01/1997

Date

Figure 3. Coordination Cohesion Over Time (above) and Development Pair Count Over Time (below) in KDE

three weeks; 2. fully engaged: contributors committing in each of their first three weeks. Table 1 shows the results of these measurements for each of KDE, Plone and Evince and for the two categories. The comparison between the two categories provides an indication of how successful a project is at engaging its developers beyond their ramp up period: projects with a greater number of contributors working in each of the first three weeks would indicate that project is effective in bringing in new contribution. Looking at the count of developers committing in each of their first three weeks it is clear that the KDE project is better at engaging its contributors early on, with 39% (to the nearest percentage point) active in each of their first three weeks. This compares favourably with 14% in the Plone project and 9% in Evince.

Figure 4 (top left) shows the individual productivity of each KDE contributor, during (horizontal axis) and after (vertical axis) their ramp up. Two observations can be made from this plot: at first, the spread of results is evident, indicating that the vast majority of contributors see a change in their productivity after their ramp up period; more-linear data would indicate that contributors do not display a change in productivity. Secondly, the density of contributors with low productivity (those with a commit rate of 1 or less) during their ramp up is considerable. From this project, 447 contributors fit into this set, almost one third of the entire project. Combining these two observations, it is possible to divide the distribution in 4 quadrants, corresponding to different pools of developers: a) top right quadrant (I): high commit rates both before and after the ramp-up (engaged developers); b) top left quadrant (II): low commit rate before and high commit rate after the ramp-up (successful engagement of developers); c) bottom left quadrant (III): low commit rates both before and after the ramp-up (sporadic developers); d) bottom right quadrant (IV): high commit rate before but low commit rate after the ramp-up (unsuccessful engagement of developers). Coupled with the data shown in Table 1 this plot gives a clearer indication that the KDE project, whilst effective and obtaining new contributors, isn’t as effective at keeping those contributors productive over time, given a ramp up of three weeks. The vast majority of developers commit more than once in a week during their ramp up period (all the points at the right of the vertical line, quadrants I and IV), and a considerable amount of them keep the same rate after the ramp up (quadrant I).

6.2

Productivity Results – Plone

Figure 4 (top right) shows that, similar to KDE, a large proportion of its contributors have a low commit rate during

Partially Engaged Project KDE Plone Evince

Min 0.007 0.008 0.015

Max 1.000 1.000 1.000

Mean 0.363 0.285 0.173

Fully Engaged Count 941 370 113

Min 0.024 0.047 0.049

Max 1.000 0.895 0.949

Mean 0.506 0.451 0.413

Count 597 60 11

Table 1. Commit Statistics Before and After the Ramp Up Period

Figure 4. Productivity of KDE, Plone and Evince their ramp up period; they constitute 47% of contributors. Also similar to KDE, most of the Plone contributors show some form of change to their commit rate after their ramp up period is complete. The important difference from KDE, is that this plot shows that very few Plone contributors show improved productivity compared to those that have degraded productivity after their ramp up period. Only 32 of the 131 contributors showed improved productivity; less than 25% of the entire project. This Figure, coupled with the data in Table 1 provides evidence that, whilst not as capable of bringing new developers as KDE, the Plone project is also less effective

at converting new contributors into productive members of the contribution team. This is reflected by a lower ratio of developers committing equally high both during and after the ramp up (quadrant I): the majority of new developers, after an initially high commit rate fall to a lower commit rate after three weeks (quadrant IV).

6.3

Productivity Results – Evince

Figure 4 (bottom) shows that of the three projects, Evince sees the most dramatic change in its contributors after the end of the ramp up period. The vast majority of

Evince contributors have a low initial commit rate; 95 of 121 developers, over 78% of the overall community. In addition, the Evince community is able to compound this by also having the lowest commit rate improvements, with only 2 contributors showing improved commit rates after the end of their ramp up period. This is also visible in the lowest committers belonging to the quadrant I, and in general those ones committing at a high rate after the ramp up (quadrants I and II). Coupled with the data in Table 1, this confirms that the Evince project is not effective at bringing in new developers, and is not very effective at turning those contributors it manages to keep into productive team members.

6.4

Further Analysis

The results above are obtained when a ramp up period of three weeks is used. In this section, different ramp up periods are used: at first, the lengths of 1,2 and 3 weeks are assessed, and the mean commit-per-week rate evaluated for each. The results are shown in Table 2. As visible, when the ramp up period is extended, the mean commit rate drops both during and after the ramp up. However, the mean commit rate appears to be falling faster during the ramp up than in the period after. Potentially there is a point at which the two rates intersect, and the commit rates (before and after) are the same. This point would be a good indicator of when the average developer in a project has reached their full potential. The second step in the extended examination, therefore, is to measure and plot the mean commit rates during and after ramp up for each project for four, eight, twelve, twentyfour and fifty-two weeks. These figures are then plotted to establish is a point of intersection is reached. The resulting plots are shown in the three Figures 5. Figure 5 (top left) and Figure 5 (top right) show that, for the KDE and Plone projects, there is an intersection of the trends in mean commit rate for during and after the ramp up period. For both, the intersection occurs at 29 weeks, to the nearest week. This shows that, for the two projects, if the ramp up period is chosen to be shorter than this, the average contributor will be more productive during the ramp up. If the ramp up is chosen to be longer, then the average contributor will be more productive after the ramp up. Figure 5 (bottom) shows instead that for the Evince project there is also a point of intersection in the trends of average commit rate during and after various durations of ramp up period. This figure indicates, however, that Evince developers require a much longer ramp up period of around 70 weeks.

7

Threats to Validity

This section details the validity considerations inherent in the two studied premises.

7.1

Issues Of Coordination

Although this research has aimed to produce an accurate picture of the actual coordination paths through a project, only one type of data source has been used; email and bug tracking have all been disregarded. This is a serious threat especially when dealing with the analysis of new developers joining a project; it is arguable that their contribution starts at first by being acquainted with the existing developers on the mailing lists, or by submitting bug reports at first. The actual code contribution could be in some ways correlated with the duration of this ramp up period.

7.2

Issues Of Early Productivity

This research aimed to provide an understanding of how contributor productivity changes over time. This was especially important with regards to joining a new team, which is pertinent to Brooks’ Law. The approach taken here has been based upon counting the commits made by contributors to Free Software projects. As discussed above, this is not a classic approach to measuring productivity. Typically, in measuring productivity a metric such as lines of code per day would be used. This approach, however, does not take into account the effort of contributors working on artwork or documentation, for example. Measuring commit activity is equally limited in that the fact that a commit has taken place, although an indication that something has been produced, does not reveal how much was produced. Both approaches have their strengths and weaknesses. Measuring commits was decided upon as it allows for easy automation and encompasses the full range of project contribution in an activity agnostic manner.

8

Conclusions

This paper has proposed an empirical evaluation of a general principle of software engineering (the so-called Brooks’ Law) in the context of Free Software. Several earlier works have hindered the possibility that this principle, when applied in the distributed and open approach as advocated by the FLOSS proponents, will not apply as in traditional software engineering. The contribution that this work aimed to establish was a level of confidence in the conclusions by means of an empirical evaluation. Two aspects were considered: the issue of coordination of distributed developers, and the engagement of new developers through

One Week Project

During 7.651 6.647 2.138

KDE Plone Evince

Two Weeks

After 2.859 2.598 0.344

During 5.844 4.243 1.357

Three Weeks

After 2.714 1.622 0.292

During 5.052 3.300 1.107

After 2.645 1.327 0.279

Table 2. Mean Commit Rates For Varying Ramp Up Periods

8

7 During After

7

6

6

5

Mean Commits / Week

Mean Commits / Week

During After

5

4

3

4

3

2

2

1 0

10

20

30 Duration (Weeks)

40

50

60

0

10

20

30 Duration (Weeks)

40

50

60

2.2 During After 2 1.8

Mean Commits / Week

1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0

10

20

30

40 Duration (Weeks)

50

60

70

80

Figure 5. Mean Commit Rate During And After Ramp Up In KDE, Plone and Evince a “ramp up” period: three large projects were investigated for this reason: KDE, Plone and Evince. Regarding the first issue, this paper aimed at evaluating whether, with a growing number of developers, the coordination effort was also continually increasing. It was found that typically two phases can be found in Free Software projects: at first, a large coordination effort is required to manage few developers. This confirms the applicability of Brooks’ Law in the context of Free Software. A second phase sees a dilution of this effort as long as new developers join in, and a quasi-constant level of effort is needed. A third phase was observed for one of the studied systems,

where the effort required increased to a significant level again: in a further analysis, this was found not to be caused by coordination issues, but by the project’s internal characteristics. Regarding the second issue, the hypothesis to be tested was whether developers increase their productivity after a defined ramp up period. It was found that, given a defined period for ramp up (three weeks), projects can be classified based on how effective they are in engaging new developers: KDE can attract and retain new developers when observing the quadrants of its process. Evince is successful too at attracting new developers, but not as good as KDE in keeping

them attached to the project after the ramp up. This divide is also mirrored by another finding: for both KDE and Plone, developers become effective committers within a 30 weeks’ ramp up. For Evince this process is much longer and takes around 70 weeks.

References [1] P. J. Adams and A. Capiluppi. Bridging the gap between agile and free software approaches: The impact of sprinting. International Journal of Open Source Software and Processes, 1:58–71, 2009. [2] A. Bonaccorsi and C. Rossi. Why open source software can succeed. Research Policy, 32(7):1243–1258, July 2003. [3] F. P. Brooks. The Mythical Man-Month: Essays on Software Engineering, 20th Anniversary Edition. Addison-Wesley Professional, 1995. [4] A. Capiluppi. Models for the evolution of OS projects. In Proceedings of ICSM 2003, pages 65–74, Amsterdam, Netherlands, 2003. IEEE. [5] A. Capiluppi, J. M. González-Barahona, I. Herraiz, and G. Robles. Adapting the ”staged model for software evolution” to free/libre/open source software. In IWPSE ’07: Ninth international workshop on Principles of software evolution, pages 79–82, New York, NY, USA, 2007. ACM. [6] E. Capra and A. Wasserman. A framework for evaluating managerial styles in open source projects. In B. Russo, E. Damiani, S. Hissam, B. Lundell, and G. Succi, editors, Open Source Development, Communities and Quality, page 114. IFIP International Federation for Information Processing, 2008. [7] M. Conway. How do committees invent. Datamation, 14(4):28–31, 1968. [8] K. Crowston, H. Annabi, J. Howison, and C. Masango. Effective work practices for software engineering: free/libre open source software development. In Proceedings of the ACM workshop on Interdisciplinary software engineering research, 2004. [9] R. W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 6:345, June 1962. [10] A. Fuggetta. Controversy corner: open source software-an evaluation. J. Syst. Softw., 66(1):77–90, 2003. [11] M. W. Godfrey and Q. Tu. Eovlution in open source software: A case study. In Proeedings of 16th IEEE Int. Conf. on Software Maintenance, pages 131–142, 2000. [12] R. Goldman and R. Gabriel. Innovation Happens Elsewhere: How and Why a Company Should Participate in Open Source. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004. [13] J. D. Herbsleb and R. E. Grinter. Splitting the organization and integrating the code: Conway’s law revisited. In ICSE ’99: Proceedings of the 21st international conference on Software engineering, pages 85–95, New York, NY, USA, 1999. ACM. [14] I. Herraiz, J. M. González-Barahona, and G. Robles. Determinism and evolution. In A. E. Hassan, M. Lanza, and M. W. Godfrey, editors, Mining Software Repositories, pages 1–10. ACM, 2008.

[15] J. P. Johnson. Economics of open source software. http://opensource.mit.edu/papers/johnsonopensource.pdf. [16] P. Jones. Brooks law and open source: The more the merrier? IBM DeveloperWorks, August 2002. [17] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Information Processing Letters, 31:7–15, 1989. [18] S. Koch. Agile principles and open source software development: A theoretical and empirical discussion. In Extreme Programming and Agile Processes in Software Engineering: Proceedings of the 5th International Conference XP 2004, number 3092 in Lecture Notes in Computer Science (LNCS), pages 85–93. Springer Verlag, 2004. [19] S. Koch. Profiling an open source project ecology and its programmers. Electronic Markets, 14(2):77–88, 2004. [20] K. R. Lakhani and E. von Hippel. How open source software works: ¨freeüser-to-user assistance. Research Policy, 32(6):923–943, June 2003. [21] J. Martinez-Romo, G. Robles, J. M. Gonzalez-Barahona, and M. Otuño-Perez. Using social network analysis to study collaboration between a floss community and a company. In Proceedings of the Fourth International Conference on Open Source Systems. Open Source Development, Communities and Quality, 2008. [22] T. Mens, J. Fernandez-Ramil, and S. Degrandsart. The evolution of Eclipse. In Proc. International Conference on Software Maintance (ICSM). IEEE Computer Society Press, 2008. [23] B. Perens. The emerging economic paradigm of open source. First Monday, 10(SI-2), 2005. [24] R. S. Pressman. Software engineering: a practitioner’s approach (2nd ed.). McGraw-Hill, Inc., New York, NY, USA, 1986. [25] V. T. Rajlich and K. H. Bennett. A staged model for the software lifecycle. IEEE Computer, pages 66–71, July 2000. [26] E. Raymond. The Cathedral and the Bazaar, chapter The Cathedral and the Bazaar. O’Reilly & Associates, Inc., 1999. [27] E. S. Raymond. The Cathedral and the Bazaar. O’Reilly & Associates, Sebastopol, CA, USA, 1999. [28] W. Scacchi. Understanding Open-Source Software Evolution, pages 181–205. Wiley, 2006. [29] C. M. Schweik, R. C. English, M. Kitsing, and S. Haire. Brooks’ versus linus’ law: an empirical test of open source projects. In dg.o ’08: Proceedings of the 2008 international conference on Digital government research, pages 423–424. Digital Government Society of North America, 2008. [30] I. Stamelos, L. Angelis, A. Oikonomou, and G. L. Bleris. Code quality analysis in open source software development. Information Systems Journal, 12(1):43–60, 2002. [31] J. Tirole and J. Lerner. The Simple Economics of Open Source. SSRN eLibrary, 2000. [32] B. Tuckman. Developmental sequence in small groups. Psychological Bulletin, 63, 1965. [33] S. Weber. Why open source works. Ubiquity, 5(11):1–1, 2004. [34] G. M. Weinberg. The Psychology of Computer Programming. John Wiley & Sons, Inc., New York, NY, USA, 1985. [35] L. Zhao and S. Elbaum. Quality assurance under the open source development model. Journal of Systems and Software, 66(1):65–75, 2003.