Using internal benchmarking as a strategy for cultivation: A case of improving COBOL software maintainability

Petter Øgland
Department of Informatics, University of Oslo, Norway
[email protected]

Abstract. Implementing metric systems for organizational learning in a software development environment remains a challenge. Some researchers state that traditional quality management methods from the assembly line do not apply to knowledge workers, while others point to the problem of deciding what to focus on when implementing metrics systems. In this paper, a bottom-up approach is designed as part of an action research program and used to create a robust way of installing such a system, applying Kurt Lewin's force field analysis to compensate for gaps in system design. Using Jeffrey Goldstein's reading of Lewin's force field theory through the perspective of complexity theory, changing structure is then used as an approach for changing culture, indicating that this may be a way of creating organizational learning without having to deal too much with the issues usually emphasized in explicit system design.

Introduction

While the theory of software process improvement and organizational learning has existed for several decades, there are still challenges in implementing the concepts efficiently. Efforts often fail, and often at great cost. As some researchers have observed, not only do organizations fail to install learning mechanisms, but in the process of running installment projects they manage to develop sustainable mechanisms that prevent the organization from becoming a learning organization (Lyytinen & Robey, 1999).

The issue of installing systems or infrastructures that help organizations become learning organizations thus remains an important challenge, especially as the socio-technological environment grows increasingly complex, making problems that could previously be solved by traditional systems thinking less stable (Ciborra et al, 2000). There are, however, schools of organizational change and learning that try to take organizational complexity into account. Goldstein (1994) has been a popular reference among organizational change researchers with a total quality management (TQM) background (e.g. Dooley et al, 1995), but Goldstein's simple ideas do not seem to have made much of an impact on organizational change researchers with an information systems (IS) background.

In the research described in this article, the starting point has been to use the theories of Kurt Lewin in the context of software process improvement, following the example of Iversen, Mathiassen and Nielsen (2004), and then to apply the insights provided by Goldstein on Lewin to the same context. The aim of the research is thus to investigate to what extent simple ideas from the area of complex adaptive systems (CAS) can be applied within software process improvement.

The article starts by relating the problem to research on complexity and cultivation in information systems theory, thus motivating the section on methodology. The results are presented as a case study, followed by a discussion. The final section extracts the essence of what can be learned from the study.

Related research

In both academic and consultancy literature, typical advice for installing metric systems, quality management systems, knowledge management systems and the like is that there should be strong top management commitment (e.g. Davenport & Prusak, 1998; Ahern, Clouse & Turner, 2004). However, top management will often not be able to support quality improvement initiatives as much as one might want. In the 1990s, an approach for creating change in change-resistant organizations by use of complexity theory was developed (e.g. Goldstein, 1994). Complexity theory, as used in this context, is usually a "soft" interpretation of the mathematical theory of dynamical systems, and can consequently be seen as a kind of elaboration on classical sociological functionalism (Parsons, 1951), with the exception that the structures or systems being studied are no longer considered stable. On the contrary, the main idea of the approach is to consider the organization as an evolving organism, driven by positive feedback loops and increasing returns rather than the control and homeostasis perspective of organizational management cybernetics.

As pointed out by Ciborra et al (2000, chapter 1), literature on the management of corporate information infrastructure seldom takes the complexity issue into account, and thus provides implementation and maintenance strategies based on the mistaken assumption that the dynamics of the organization can be described as a predictable system. While the traditional approach to software engineering was based on viewing the development process as stable and predictable, in recent years alternative methods (e.g. "agile methods") have been developed for evolving information systems in unstable or unpredictable environments, breaking the classical development cycle down into several iterative cycles in which specifications are typically minimized and decisions delayed as long as possible.

Goldstein argues that total quality management (TQM) and similar frameworks for creating organizational change are best understood through the paradigm of complexity. There seem, however, to be few ideas on how to use the statistical and mathematical methods of TQM in a way consistent with the complexity view. Looking at insights from the history of management theory, I want to focus on three ideas:

• The Hawthorne effect ("What gets measured gets done")
• Internal benchmarking as a method for stimulating internal competition
• "Double loop learning" (Argyris & Schön, 1978) in terms of challenging existing methods and standards

In this paper I want to discuss the idea of cultivating quality management in a resistant culture by using measurements and benchmarking to stimulate the organization into developing patterns of double loop learning. I want to see to what extent it is possible to take whatever standards may be at hand, use these standards for measuring and benchmarking, and thus either make the work processes comply with the standards ("single loop learning") or frustrate the organization into revising its own standards ("double loop learning") when the current standards are perceived as suboptimal.

Research setting and approach

This section describes the research setting and the empirical strategy adopted. The research was conducted at the Systems Maintenance Section at the IT department of the Norwegian Directorate of Taxes (NTAX).

[Figure: organizational chart. The IT Department comprises an Information Security function, a Research Scientist, and four sections: Systems Development, Systems Maintenance, Systems Production, and Technical Infrastructure. Within Systems Maintenance: Group 1 (projects MVA, FOS, DSF), Group 2 (projects GLD, ER@, FLT), Group 4 (projects PSA, LFP, LEP, RISK).]

Figure 1. Simplified organizational chart for the IT department at NTAX

Ten of the NTAX mainframe information systems are based on COBOL software and need to be maintained on an annual basis. Seven of the systems (LFP, LEP, PSA, ER@, GLD, FLT, FOS) follow annual life cycles, meaning that maintenance and COBOL standardization are carried out during specific times of the year. The remaining three systems (MVA, DSF, RISK) are maintained on an ongoing basis. The maintenance is taken care of by approximately 40 programmers, with the projects delegated among three groups. The distribution between male and female programmers is about 50/50. The age distribution runs from about the mid thirties to the mid sixties, with most people between 40 and 50. Few of the programmers have formal computer education, although the employment policy in recent years has focused on recruiting people with a formal computer background.

The research was designed as part of a quality improvement strategy in 2000, and, as illustrated in figure 1, it was carried out by a researcher who is part of the organization. To handle the problems this raises, an action research approach for doing research in one's own organization was adopted (Coghlan & Brannick, 2001). The study is part of the broader context of an action research initiative dealing with several NTAX processes.

The case study is based on empirical data collected from the COBOL standardization process 1998-2005 by the researcher, who during the period of research held the position of quality manager, a function organized as part of the systems development section (figure 1). The empirical data was collected through unstructured interviews, document analysis and observation. Interviews were held with programmers when presenting them with results from document analysis. Managers were mostly interviewed at the end of one cycle and the beginning of the next. Document analysis consisted of going through various drafts and final versions of the internal COBOL standard, system documentation made by the programmers, quality statistics provided by the programmers, plus various sorts of literature the programmers were using for designing and updating the internal COBOL standard. During the whole period from 2000 to the present, about 50 interviews with programmers were conducted, along with about 3 interviews with group managers, about 3 interviews with the systems maintenance section manager, and 2 interviews with the IT manager. The interviews were conducted in an improvised manner, without notes or minutes being written.

Case study

In 1997 one of the COBOL programmers at the Norwegian Directorate of Taxes died, and other programmers had to step in. As there had been no standardized way of programming, this caused severe problems, making it clear to both management and programmers that there should be a standardized way of working. As part of the IT strategy document of 1998, the need to update current COBOL software and make all new software compliant with modern standards was made into an issue (SKD, 1998, chapter 4.3.1). In 2000, a first version of a local COBOL programming standard was produced by the programmers and accepted by management as the standard to be followed. The programmers also made a metrics system for measuring deviations between the software in use and the standard. The diagrams below show annual measurement results in terms of nonconformity level (NC level), and the rate of change in terms of the difference in nonconformity level between two consecutive years (delta NC).

[Figure: two panels. Left: annual NC level, 1999-2005, showing a steady decline. Right: annual delta NC, 2000-2005, plotted as a control chart with AVG = 14, UCL = 22 and LCL = 6.]

Figure 2. Measuring overall development of the non-conformity (NC) level against the standard
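The paper does not spell out how these figures were computed. The following is a minimal sketch of one plausible reading, assuming the NC level is the total count of deviations from the standard found in a year's audit and that the control limits on delta NC follow an XmR-style calculation; the program names, audit counts, and limit formula are all illustrative assumptions, not the original system:

```python
from statistics import mean

# Hypothetical audit results: nonconformities per program found in the
# annual check against the internal COBOL standard (all values made up).
audits = {
    1999: {"PGM001": 40, "PGM002": 65, "PGM003": 55},
    2000: {"PGM001": 35, "PGM002": 60, "PGM003": 50},
    2001: {"PGM001": 30, "PGM002": 52, "PGM003": 44},
}

def nc_level(year):
    """Overall NC level: total nonconformities across all audited programs."""
    return sum(audits[year].values())

def delta_nc(years):
    """Year-over-year improvement: drop in NC level between consecutive years."""
    levels = [nc_level(y) for y in years]
    return [prev - cur for prev, cur in zip(levels, levels[1:])]

def control_limits(deltas):
    """XmR-style limits for the delta-NC chart. Figure 2 shows AVG = 14,
    UCL = 22, LCL = 6; the original calculation is not documented, so the
    common average +/- 3 sigma from mean moving range is assumed here."""
    avg = mean(deltas)
    moving_ranges = [abs(b - a) for a, b in zip(deltas, deltas[1:])]
    sigma_hat = mean(moving_ranges) / 1.128  # d2 constant for subgroups of 2
    return avg - 3 * sigma_hat, avg, avg + 3 * sigma_hat

deltas = delta_nc(sorted(audits))
print(deltas, control_limits(deltas))
```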

The first of the two diagrams shows that the non-compliance level is decreasing (i.e. the software is becoming increasingly compliant with the standard), while the second diagram shows that there was low speed and great variation in improvement rates in the beginning, and greater speed with less variation during the most recent years. The reason for the significant gap between the years 2003 and 2004 is that the programmers, after using the original standard for a few years, felt uncomfortable with it and decided to revise it. The gap is thus not due to exceptional improvement in the non-conformity level, but to a change in how non-conformity was being measured.

Despite both programmers and management being aware that non-standardized software could become a serious problem if software maintenance responsibility ever needed to be rotated, and despite the programmers themselves being in charge of both the programming standard and the way compliance was monitored, the monitoring has generally been met with resistance among both programmers and managers.

As a first step in the COBOL standardization action research, the researcher was told by management that the person given responsibility for designing the COBOL standard was the COBOL programmer who had had to step in when another COBOL programmer died, and who thus had strong views on what structured COBOL software should look like. During the interviews, it soon became obvious that this person had an axe to grind with other programmers. In particular, there seemed to be a dispute between this person X and another person Y within the same group. X was irritated by Y having more responsibility and a greater salary, and seemed keen on developing and implementing the COBOL standard in a way that would put Y in a bad position. By defining a COBOL standard and metrics program that would make him (X) score well and everybody else badly, he hoped to be able to use these statistics during annual appraisals, and as a way of gaining more power within the community.

When introducing the standard to the community at large, neither the researcher nor X said anything about the metrics. During various random interviews with programmers, the researcher was told that all sorts of standards and procedures already existed for various tasks, but nobody took them seriously, as compliance was not being monitored, so nobody cared all that much about this new COBOL standard either. However, as the measurement program was implemented and the researcher started asking X for data to analyse, the programmers started getting engaged in the process. While the standard itself had been made by a committee of programmers, headed by X, and accepted and signed by management at the section and department levels (figure 1), the researcher started asking the programmers why the measurements indicated this or that, and whether they intended to do anything about it. Part of the data analysis done by the researcher consisted of benchmarking the various projects against each other, and when interviewing the programmers, he also explained how he presented the research data to management. Figure 3 illustrates a typical data analysis done by the researcher in preparation for an interview.

[Figure: two panels. Left: NC level for the LFP project, 1998-2005, with a fitted linear regression line, y = -16.507x + 174.9, R² = 0.9807. Right: benchmark of the 2005 NC levels for all ten projects (LEP, FLT, FOS, RISK, LFP, DSF, ER@, GLD, MVA, PSA), with values ranging from 9 to 113.]

Figure 3. Standardization results for one project (LFP) and benchmarking results for this project as compared to the other nine projects.
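As a sketch of the trend analysis in the left panel of figure 3, the regression can be reproduced with a least-squares fit. The paper reports only the fitted line (y = -16.507x + 174.9, R² = 0.9807), not the annual values, so the NC series below is an illustrative assumption chosen to resemble the plotted decline:

```python
import numpy as np

# Illustrative NC levels for the LFP project, 1998-2005 (made-up values).
years = np.arange(1998, 2006)
nc = np.array([160, 148, 128, 110, 92, 80, 62, 45], dtype=float)

# Least-squares line through the series, with x counted from the start
# of the series, as the reported intercept of 174.9 suggests.
x = years - years[0]
slope, intercept = np.polyfit(x, nc, 1)

# Coefficient of determination for the fit.
pred = slope * x + intercept
ss_res = np.sum((nc - pred) ** 2)
ss_tot = np.sum((nc - nc.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"y = {slope:.3f}x + {intercept:.1f}, R^2 = {r2:.4f}")
```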

The diagram on the left-hand side above shows the improvement results for one of the standardization projects (LFP). The diagram on the right-hand side benchmarks the results for all the projects. As we see, the LFP project ranks about average on the benchmark.

Figure 4 is a typical example of the information that we would show the LFP group on an annual basis, giving them the impression that we were concerned with what they were doing, in the hope that the Hawthorne effect would make them continue. We also hoped the benchmark results would stimulate the competitive instincts within the group, and we told them that we presented the results to management as well, hoping this would add further tension for improvement. As the work environment cannot be seen as a framework for a controlled experiment, the diagrams were used only as weak indications of whether the strategies seemed to work, or rather as indicators for identifying behaviour contrary to what we were expecting. In addition to the run charts and benchmarks in figures 2 and 3, figure 4 includes a benchmark with respect to improvement speed as additional motivation for the programmers. In order to investigate relationships between the two benchmark diagrams, a correlation analysis is included on the right-hand side of figure 4.

[Figure: two panels. Left: benchmark of NC improvement rates per project (FLT, LFP, DSF, GLD, ER@, LEP, RISK, FOS, PSA, MVA), with values ranging from 16.5 down to -4.7. Right: scatter plot of NC level against delta NC with a fitted least-squares parabola, R² = 0.4758.]

Figure 4. Benchmarking against NC improvement rates and performing correlation analysis for understanding the relationship between individual NC levels and NC improvement rates
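The two analyses in figure 4 amount to a ranking plus a quadratic least-squares fit. The sketch below reproduces that shape; the project codes are real, but the pairing of NC levels with improvement rates is invented for illustration (the paper reports only the individual figures and R² = 0.4758, not which value belongs to which project):

```python
import numpy as np

# Illustrative per-project figures: (current NC level, improvement rate).
# Pairings are assumptions made up to mimic figure 4, not reported data.
projects = {
    "FLT": (113, 16.5), "LFP": (62, 15.3), "DSF": (58, 9.8),
    "GLD": (50, 9.6),   "ER@": (66, 7.1),  "LEP": (106, 6.4),
    "RISK": (66, 5.8),  "FOS": (25, 3.6),  "PSA": (9, 3.4),
    "MVA": (13, -4.7),
}

# Left panel: rank projects by improvement rate (best first).
for name, (nc, rate) in sorted(projects.items(),
                               key=lambda kv: kv[1][1], reverse=True):
    print(f"{name:5s} delta NC = {rate:5.1f}")

# Right panel: least-squares parabola relating NC level to improvement rate.
nc_levels = np.array([v[0] for v in projects.values()], dtype=float)
rates = np.array([v[1] for v in projects.values()], dtype=float)
coeffs = np.polyfit(nc_levels, rates, 2)
pred = np.polyval(coeffs, nc_levels)
r2 = 1 - np.sum((rates - pred) ** 2) / np.sum((rates - rates.mean()) ** 2)
print(f"parabola coefficients: {coeffs}, R^2 = {r2:.4f}")
```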

The diagram on the left-hand side of figure 4 shows improvement for nine out of ten projects. We were allowed to interview the group performing “best in class” (LFP), but not allowed to interview the group performing “worst in class”.

The person responsible for the LFP group was the person Y who, in the eyes of person X, was the target of the standardization program. During the interview, Y told us that he had no plans to change any of the programs he was personally responsible for, and that most of the software changes made so far had introduced new errors and production failures. A fellow senior programmer Z and their manager M were also present at the interview. The programmer Y said that the improvement had mainly to do with two things. Firstly, he had asked the group to identify a limited set of programs to standardize, and to modify only a few types of errors that he expected would have an impact on the measurements. Secondly, he said that some of his fellow programmers had been highly motivated by the internal benchmarking, and had put in a lot of effort to improve the results. The other senior programmer Z admitted that he was highly motivated by the benchmarking, and said that he also used the opportunity to make the software better in ways that would not be measurable against the standard. The manager M, who incidentally was the person who had suggested the COBOL standardization quality improvement in the first place, believed that more uniform ways of programming might reduce the chance of severe problems when people retire or software maintenance has to be rotated for other reasons.

Confronted with the fact that this approach seemed to have worked quite well and that they were "best in class", they were asked whether they believed measurements and statistical methods were a good way of doing quality control, and whether they had ideas for other areas where similar control and improvement could be installed. However, none of them seemed particularly happy with measurements, and they had no suggestions for further use of such methods. In fact, they explained that they had been asked by top management not to respond to any questions asked by the researcher unless a manager was present, and generally to participate as little as possible in the research project.

Case analysis

Analysing the case with respect to the force field and complexity issues presented previously, it seems reasonable to consider the force field of the organization as spanned by three groups: the programmers, the managers and the researcher. In table I, the cumulative force (vector) for each group is described in terms of direction (attitude) and strength (engagement).

             Programmers   Management   Researcher
Engagement   Weak          Weak         Strong
Attitude     Negative      Positive     Positive

Table I. Attitudes and engagements for the COBOL standardization

As the attitudes towards and engagement with the COBOL standardization have varied over time, it is only the researcher who has maintained consistent attitudes and engagement. In the case of the programmers, there was little engagement to begin with, as nobody expected to be monitored. Once the measurements started, attitudes and engagement were mixed, but in general the attitude became negative and the engagement strong. This led to the standard being revised, causing the attitude to become more neutral and the engagement weaker. The attitude and engagement of the managers followed the same pattern as those of the programmers, as they generally found it more important to keep the programmers happy than to make sure some standard was being followed.

As the data collection and analysis done by the researcher was part of the control loop, the researcher was in the position where the people who had asked him to do the research were less convinced by the process than he was himself. The researcher could always point towards top management or the national audit as reasons why the standardization was being carried out, but as COBOL standardization was a process far removed from the core business process, there was little real support from the establishment. Probably because the researcher was not part of the programming group, the managers within the programming community apparently felt a greater need to protect the programmers from the research being carried out than any concern for whether the COBOL software was getting any better.

Considering the results in a more abstract context, what happened seems to have been the following:

1. There were mixed views about the standardization approach among the programmers.
2. The low-level managers found it more important to maintain calm among the programmers than to have them adopt a quality standard.
3. The research scientist carried out data collection and analysis, deliberately ignoring whatever the programmers or the managers were saying about the metrics system, thus giving them "single loop learning" feedback through the presentation of the analysis itself, and "double loop learning" from the fact that they were confronted with their own methods and standard being put into a rigid system.

The situation seems similar to the famous Hawthorne studies (Scott, 1998, p. 62) in that improvement was achieved simply because the programmers were being monitored. Unlike the Hawthorne studies, however, the "research questions" (control parameters) were made not by the scientists but by the programmers themselves, and the tension used in the Lewin-like force field analysis consisted not only of the programmers getting attention from the research, but of tension from the following sources:

• Programmers being observed by the researcher
• Programmers being monitored by management
• Programmers monitoring each other through benchmark statistics

To make such a system sustainable, the strategy should be to weaken the relationship between the programmers and the researcher while maintaining the process of programmers being monitored by management and programmers monitoring each other through benchmark statistics. However, as the final interviews showed, there were still mixed feelings about the measurement method, and general experience from action research projects in the sixties and seventies suggests that the metrics system will probably collapse the minute the researcher stops his data collection and analysis activities.

Conclusion

Simple methods for quality improvement, as used on the industrial assembly line, seem to face a challenge when put to use among software programmers. In an initial force field analysis done at NTAX, it was noticed that there were forces in favor of introducing COBOL standards and forces preventing the standard from being implemented. Viewing the overall force field of the organization, the initial analysis indicated a point attractor for maintaining the status quo. The purpose of the research was to investigate whether a simple behaviorist method of benchmarking software quality among groups of programmers could change the force field, and perhaps create a new attractor for maintaining continuous improvement.

By letting the programmers themselves develop the standards and metrics, and by using elements of internal tension and competition to ignite the benchmarking system, software quality has been continuously improved for seven years. However, the improvement seems to be inherent in the dissipative structure of the action researcher requesting data for analysis, and thus no new attractor for remaining in a state of continuous improvement seems to have been established. On the contrary, as seems to be the general case in action research, it is expected that when the research is terminated, improvements will stall and the system will return to its original equilibrium of no improvement.

In conclusion, the main insight from the research seems to be that "what gets measured gets done" holds for knowledge workers as well, although knowledge workers may be more resistant to being monitored, causing management to hesitate over whether monitoring by measurements is a good approach. In terms of further research, there seems to be a need to investigate the action research approach itself, focusing on how research can be designed in a manner that makes improvement processes sustainable.

References

Ahern, D. M., Clouse, A. and Turner, R. (2004). CMMI Distilled. Second Edition. Addison-Wesley, Boston.
Argyris, C. and Schön, D. (1978). Organizational Learning: A Theory of Action Perspective. Addison-Wesley, Reading, MA.
Ciborra, C. et al (2000). From Control to Drift: The Dynamics of Corporate Information Infrastructures. Oxford University Press, Oxford.
Coghlan, D. and Brannick, T. (2001). Doing Action Research in Your Own Organization. SAGE Publications, London.
Davenport, T. H. and Prusak, L. (1998). Working Knowledge: How Organizations Manage What They Know. Harvard Business School Press, Boston, Massachusetts.
Dooley, K., Johnson, T. and Bush, D. (1995). "TQM, Chaos and Complexity", Human Systems Management, 14(4): 1-16.
Fitzgerald, K. (2003). "Why TQM Initiatives Fail to Transform Organizations: The Role of Management and Leadership Quality", Decision Sciences, December 2003.
Goldstein, J. (1994). The Unshackled Organization: Facing the Challenge of Unpredictability Through Spontaneous Reorganization. Productivity Press, Portland, Oregon.
Iversen, J. H., Mathiassen, L. and Nielsen, P. A. (2004). "Managing Risk in Software Process Improvement: An Action Research Approach", MIS Quarterly, Vol. 28, No. 3, pp. 395-433.
Lyytinen, K. and Robey, D. (1999). "Learning failure in information systems development", Information Systems Journal, vol. 7, 85-101.
Parsons, T. (1951). The Social System. Free Press, New York.
Scott, W. R. (1998). Organizations. Fourth Edition. Prentice Hall, Upper Saddle River, New Jersey.
SKD (1998). Strategisk plan for bruk av IT i Skatteetaten [Strategic plan for the use of IT in the tax administration], SKD nr 62/96, Norwegian Directorate of Taxes, Oslo.
SKD (2001). Opprydding og standardisering av COBOL-baserte IT-systemer [Clean-up and standardization of COBOL-based IT systems], SKD nr 61/01, Norwegian Directorate of Taxes, Oslo.
SKD (2002). Opprydding og standardisering av COBOL-baserte IT-systemer, SKD 2002-018, Norwegian Directorate of Taxes, Oslo.
SKD (2004). Opprydding og standardisering av COBOL-baserte IT-systemer, SKD 2004-001, Norwegian Directorate of Taxes, Oslo.
SKD (2005). Opprydding og standardisering av COBOL-baserte IT-systemer, SKD 2005-003, Norwegian Directorate of Taxes, Oslo.
SKD (2006). Opprydding og standardisering av COBOL-programvare [Clean-up and standardization of COBOL software], Internal project report 24.01.06, Norwegian Directorate of Taxes, Oslo.