Applied Discrete Mathematics and Heuristic Algorithms

ISSN 2411–7889

Samara State University (Russia)

Applied Discrete Mathematics and Heuristic Algorithms International Scientific Journal

Vol. 1, No. 3 2015


A periodic electronic scientific journal, founded in 2015 and published four times a year. The founder of the journal is Samara State University (Russia). Editor-in-chief: Boris Melnikov (Russia, Samara State University, Professor). Publishing house “University of Samara”, Russia, 443011, Samara, Academic Pavlov str., 1. Tel. +7 846 334 54 23. E-mail: [email protected] Editorial office: Russia, 445027, Togliatti, Yubileynaya str., 31-G. Tel. +7 848 250 52 33. E-mails: [email protected], [email protected] Websites of the journal: http://samsu.ru/ru/node/4736 and http://www.archive.samsu.ru/ru/node/4736.

We have published the Russian electronic periodical “Heuristic Algorithms and Distributed Computing” since January 2014. Since 2015, our editorial board, in cooperation with Samara State University, has published the English electronic periodical “Applied Discrete Mathematics and Heuristic Algorithms”. It is not merely an English translation of the Russian version. We intend to include in the English journal translations of some Russian papers published in the previous year, but first we will publish redesigned and improved versions of previously published articles along with new papers. The editorial board invites authors to submit articles to both journals. Our journal includes the following sections: “Mathematical Modeling”, “Algorithms and Heuristic”, “Applied Discrete Mathematics and Automata Theory”, “Parallel and Distributed Computing”.

© Samara State University, 2015.



Editorial board

Juraj HROMKOVIČ, honorary editor

Switzerland, Zürich, Eidgenössische Technische Hochschule

Boris MELNIKOV, editor-in-chief

Russia, Samara State University

Alexander KRUTOV, vice-editor-in-chief


Sergey MAKARKIN, executive secretary


Alessandra CHERUBINI

Italy, Milano, Politecnico di Milano

Shi-Jinn HORNG (洪西進)

Republic of China, Taipei, National Taiwan University of Science and Technology

Shamil ISHMUKHAMETOV

Russia, Kazan (Volga Region) Federal University

Antal Miklós IVÁNYI

Hungary, Budapest, Eötvös Loránd Tudományegyetem

Zoltán KÁSA

Romania, Târgu Mureş, Universitatea Sapientia

Jörg KELLER

Germany, Hagen, Fernuniversität in Hagen

Sergey NOVIKOV


Guennady OUGOLNITSKY

Russia, Rostov-on-Don, South Federal University

Shariefuddin PIRZADA

India, Srinagar, University of Kashmir

Sergo REKHVIASHVILI

Russia, Nalchik, Institute of Applied Mathematics and Automation

Vladimir RUDNITSKY

Ukraine, Cherkassy, Technological University

Yuri SMIRNOV

Russia, Penza State University

Andrey SOKOLOV

Russia, Petrozavodsk, Karelian Research Centre of Russian Academy of Sciences

Sergey SOLOVYEV

Russia, Moscow State Lomonosov University

Yuri TYUTYUNOV

Russia, Rostov-on-Don, South Research Centre of Russian Academy of Sciences



CONTENT

Mathematical Modeling
I. Senina, M. Borderies, P. Lehodey. A spatio-temporal model of tuna population dynamics and its sensitivity to the environmental forcing data 5–20

Algorithms and Heuristic
D. B. Hanchate, R. S. Bichkar. The machine learning in software project management: A journey. Part I 21–47

Applied Discrete Mathematics and Automata Theory
B. F. Melnikov. On the star-height of a regular language. Part III: The star-height of an automaton 48–56

Parallel and Distributed Computing
A. M. Iványi, Z. Kása. Parallel partial ranking 57–75

About the authors 76

APPLIED DISCRETE MATHEMATICS AND HEURISTIC ALGORITHMS, 1, 3 (2015) 5–20

A SPATIO-TEMPORAL MODEL OF TUNA POPULATION DYNAMICS AND ITS SENSITIVITY TO THE ENVIRONMENTAL FORCING DATA *

Inna SENINA, email: [email protected]
Collecte Localisation Satellites, Toulouse, France
Southern Federal University, Rostov-on-Don, Russia

Mary BORDERIES, email: [email protected]
INP-ENSEEIHT, Toulouse, France

Patrick LEHODEY, email: [email protected]
Collecte Localisation Satellites, Toulouse, France

Abstract. The SEAPODYM model is used to predict the spatio-temporal dynamics of skipjack tuna (Katsuwonus pelamis) in the Pacific Ocean. It computes the density of tuna prey populations (micronekton) and that of tuna as a top predator species in off-line coupled simulations. The model is based on advection-diffusion-reaction equations whose movement and reaction terms depend on physical and biogeochemical ocean variables, which are either predicted by ocean circulation models or derived from satellite or in-situ observations. A Maximum Likelihood Estimation (MLE) approach with fishing and tagging data is used to estimate the model parameters and to obtain the model solution that best fits the observations. Two physical models, ECCO (http://www.ecco-group.org/) and OMEGA (http://argo.ocean.ru), provided input data to SEAPODYM, and optimal solutions were obtained as a result of the parameter estimation procedure. To explain the differences between the resulting solutions, we performed a sensitivity analysis and implemented a method to study the propagation of uncertainty in the forcing data based on empirical orthogonal functions. The results showed that the model solutions depend strongly on the quality of the ocean currents, because the micronekton model solutions are highly sensitive to these variables.

Keywords and phrases: population dynamics model; skipjack tuna; maximum likelihood estimation; sensitivity analysis; estimation of uncertainty; empirical orthogonal functions.

Computing Classification System 1998: G.3
Mathematics Subject Classification 2010: 93E12

* © I. Senina, M. Borderies, P. Lehodey, 2015.

1. INTRODUCTION

During the last 20 years, tuna catches have kept growing [1], while the general reduction in catch per unit of effort has raised concerns about the status of tuna stocks. When direct observations of population abundance are unavailable or unreliable, modeling provides the only possibility to learn about the population dynamics and to assess the amount of fish that can be harvested without bringing the populations to extinction levels [2]. For tunas, which are very mobile animals, population dynamics models must describe both temporal and spatial dynamics in order to provide correct stock estimates and to serve as a tool for fishery management [3]. Such models should also possess predictive skill; in other words, they must be tightly coupled to the existing observations through a robust method of data assimilation [4-6].

SEAPODYM is one of the models developed with these objectives, to investigate tropical tuna spatial dynamics linked to the ocean ecosystem (see [7-11]). It predicts the spatial distributions of tuna prey density and of the density of the studied tuna population by (usually monthly) age class. Fish movements are driven by environmental factors that control the spawning conditions and the accessibility of forage. Thus, the model takes climate variability into account through the use of physical and biogeochemical forcing variables such as temperature, currents, oxygen, phytoplankton concentration and euphotic depth. These variables are first used by the sub-model component SEAPODYM-MTL to simulate the distribution of prey, i.e., the mid-trophic level (MTL) organisms (micronekton) that are key drivers of tuna dynamics.

The outputs of SEAPODYM are therefore strongly dependent on the quality of its environmental forcing. The physical variables (temperature and currents) are outputs of ocean circulation models, obtained either from hindcast simulations or from reanalyses. In a hindcast, the ocean model dynamics is mostly driven by atmospheric forcing; in a reanalysis, the simulation also includes observations of oceanic variables (e.g. ARGO profilers, satellite sea surface temperature and altimetry), which are assimilated into the model to correct its outputs and produce more realistic circulation patterns.
The biogeochemical variables (primary production, dissolved oxygen concentration and euphotic depth) can be obtained either from a biogeochemical model coupled to the physical model or from satellite ocean color sensors, from which chlorophyll-a, euphotic depth and vertically integrated primary production are estimated. In the latter case, however, the dissolved oxygen concentration is not available and needs to be replaced by a climatology (i.e., interpolated monthly average fields based on all available observations).

SEAPODYM includes a representation of fisheries and predicts total catch (CPUE) and the size frequency of catch by fleet when spatial fishing data (catch and effort) are available. A method of parameter estimation was implemented [11], allowing the model predictions to approach the observations in a quantitative way and thus providing robust metrics to evaluate the predictive skill of the model. The observations are 1) catch-at-age by fishing gear for a given month and 1°×1° grid cell; 2) length frequency distributions for the fishery at quarterly temporal resolution and per sampling region. Recently, tagging data were also integrated into the parameter estimation procedure, allowing better estimation of the movement and habitat parameters [12]. Once the model has converged to an optimal solution with all control parameters estimated, the fit is also tested on an out-of-sample dataset to check the predictive skill of the model and the absence of error bias.

Skipjack tuna is the most abundant tuna species in the Pacific Ocean, and its contribution to the total tuna catch is highly significant (about 80% in the Western and Central Pacific Ocean (WCPO) and 72% of tuna catches in the entire Pacific during the last 15 years, see [1]). An exhaustive dataset of spatially distributed industrial catches is collected by the two tuna commissions in the Pacific Ocean (WCPFC and IATTC) and provided for SEAPODYM studies. Skipjack has a rather short life cycle, only 5 years, which makes this species attractive for modeling. Several applications of SEAPODYM to Pacific skipjack tuna have been produced with different physical and biogeochemical forcings, with the optimization approach providing the model parameter estimates that minimize the errors in the model predictions [12-15].

In this paper, we present and discuss the model predictions for skipjack obtained with two different physical forcing datasets. Although the same fisheries data are used, the results show significant differences in the parameter estimates, the population distributions and, consequently, the overall model predictions. The reasons for these differences are investigated, in particular through a sensitivity analysis (SA) conducted for the SEAPODYM-MTL sub-model.
A better knowledge of the model sensitivity to its input variables allows us to decide which forcing variables are needed with high accuracy and, conversely, which ones may contain errors. Ultimately, a very low sensitivity to a given variable may lead to a simplification of the model by excluding a variable that has no impact on the model solution.

The paper is organized as follows. The Methods section presents the forcing datasets and briefly describes the main dynamical mechanisms that involve the forcing variables; the methodology of the sensitivity analysis is then provided. The Results section gives the optimal solutions of SEAPODYM obtained under the different forcings and lists the results of the sensitivity analysis. Finally, the Conclusion section summarizes and discusses the main results and their implications for further studies.

2. METHODS

2.1. SEAPODYM forcing and related dynamic mechanisms

In the present study, the SEAPODYM forcing dataset included biogeochemical variables derived from satellite ocean color data and two physical reanalyses predicted by different ocean circulation models. The first one, ECCO, is based on the MIT general circulation model (MITgcm), a numerical model designed for the study of the atmosphere, ocean, and climate. It is forced by the atmospheric reanalysis ERA-INTERIM. The second, OMEGA, is a reanalysis of the ocean circulation built from historical temperature and salinity data obtained by ARGO profiling floats¹ and a generalized hydrodynamic adjustment method [16,17]. ARGO datasets for the period 2003-2012 were used as observations to obtain balanced fields of temperature, salinity, and currents over the whole area of the global ocean².

SEAPODYM simulates age-structured tuna populations. Different life stages are considered: larvae, juveniles and (immature and mature) adults. After the juvenile phase, fish become autonomous, i.e., they have their own movement (linked to their size and habitat quality) in addition to being transported by oceanic currents. The movement of both immature and mature fish is controlled by two habitat indices, spawning and feeding, depending on location and season. In the equatorial region, where the seasonal changes of the day length are very moderate, fish movements are always governed by the feeding habitat index, i.e., they depend on the density of prey organisms in the habitat.

Six prey functional groups are described in the model [18,19]: three are resident in their vertical layer and three undertake diurnal migrations to the upper layers during the night. For convenience, let us represent these variables in matrix form, i.e., F is a lower (left) triangular matrix with the row index indicating the day layer of micronekton and the column index indicating the night layer.
The habitat index $H_a$ for tuna of age $a$ is the total accessible micronekton in the water column, calculated as follows:

$$H_a = \Theta_a \left( \tau F e + (1 - \tau) F^{T} e \right), \tag{1}$$

where $\Theta_a$ is the row vector of accessibility measures for the model vertical layers, the accessibility being a function of tuna preference or tolerance at age in relation to temperature and oxygen in each layer; $\tau$ is the daily proportion of daytime; and $e$ is a unit column vector (a column of ones).
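As an illustration of equation (1), the following minimal Python sketch computes the accessible micronekton for a three-layer water column. The function name `habitat_index`, the symbol `day_fraction` for the daily proportion of daytime, and all numerical values are hypothetical and chosen only for this example; the scaling of the index to the interval [0, 1] used elsewhere in the model is not shown.

```python
import numpy as np


def habitat_index(theta_a, F, day_fraction):
    """Accessible micronekton: Theta_a (tau F e + (1 - tau) F^T e).

    theta_a      -- accessibility of each vertical layer for tuna of age a
    F            -- lower triangular matrix of micronekton biomass,
                    rows = day layer, columns = night layer
    day_fraction -- daily proportion of daytime (tau)
    """
    theta_a = np.asarray(theta_a, dtype=float)
    F = np.asarray(F, dtype=float)
    e = np.ones(F.shape[0])
    day_biomass = F @ e      # row sums: biomass present in each layer by day
    night_biomass = F.T @ e  # column sums: biomass present in each layer by night
    mixed = day_fraction * day_biomass + (1.0 - day_fraction) * night_biomass
    return float(theta_a @ mixed)


# Hypothetical example: three resident groups on the diagonal and three
# migrant groups below it (deeper by day, shallower by night).
F = np.array([[1.0, 0.0, 0.0],
              [0.5, 2.0, 0.0],
              [0.25, 0.5, 4.0]])
theta = np.array([1.0, 0.5, 0.0])  # deepest layer assumed inaccessible
H = habitat_index(theta, F, day_fraction=0.5)
```

With a lower triangular 3×3 matrix, the six non-zero entries correspond to the six functional groups, which is why the matrix form is convenient.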

¹ http://www.argo.ucsd.edu/About_Argo.html
² http://argo.ocean.ru


Besides movement, the feeding habitat index is also used to control the local mortality rates of tuna in relation to the environmental conditions:

$$M_a = m_a (1 + \varepsilon)^{1 - 2 H_a}, \tag{2}$$

where $m_a$ is the average mortality at age due to predation and senescence, and $\varepsilon$ is a model parameter. If the habitat index is 0 (unfavorable habitat), the mortality rate increases by a factor of $(1 + \varepsilon)$; conversely, it decreases by a factor of $(1 + \varepsilon)$ if the habitat index is 1 (favorable habitat). It follows from equations (1) and (2) that, if $\varepsilon$ is large enough, micronekton is the critical variable determining tuna spatio-temporal dynamics. Hence, the sensitivity analysis presented in this paper focuses on the SEAPODYM-MTL model and the three environmental variables that control the dynamics of micronekton: ocean currents, temperature and primary production (for more details see [18]).

2.2. Sensitivity analysis

To assess the sensitivity of the SEAPODYM-MTL model to the physical forcing, one needs to choose a method to perturb the input variables in space and time, and then to introduce sensitivity measures that quantify the response of the model to these perturbations. Two methods were employed to generate perturbations of the forcing fields: 1) a linear perturbation of the mean; 2) an ensemble of perturbations based on the error between the variables predicted by the OMEGA and ECCO models (for physical variables) or by the VGPM and VGPM-EPPLEY algorithms (for primary production).

In the first method, the linear perturbation formula was applied to the monthly climatology of each variable as follows:

$$\psi^{(p)} = \bar{\psi} + \alpha \left( \psi - \bar{\psi} \right), \tag{3}$$

where $\alpha \in [0, 1]$ is a discrete-valued factor and $\bar{\psi}$ is the monthly climatology of the variable $\psi$. The objectives of this type of perturbation are 1) to rank the input variables by their respective sensitivity and 2) to determine whether climatological fields derived from in-situ data can be used as the forcing without losing quality in the model predictions.
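The mortality scaling of equation (2) and the linear perturbation of the first method can be sketched in Python as follows. This is only an illustration under stated assumptions: the perturbation is taken to interpolate between the monthly climatology (factor 0) and the original time series (factor 1), the monthly series is assumed to start in the first calendar month and to cover whole years, and the names `mortality_rate`, `monthly_climatology` and `perturb` are hypothetical.

```python
import numpy as np


def mortality_rate(m_a, habitat, eps):
    """Equation (2): M_a = m_a * (1 + eps) ** (1 - 2 * H_a), H_a in [0, 1]."""
    habitat = np.asarray(habitat, dtype=float)
    return m_a * (1.0 + eps) ** (1.0 - 2.0 * habitat)


def monthly_climatology(series):
    """Mean annual cycle of a monthly series, repeated to the series length.

    Assumes shape (n_months, ...) with n_months a multiple of 12 and the
    first record falling in the first calendar month of a year.
    """
    series = np.asarray(series, dtype=float)
    n_years = series.shape[0] // 12
    cycle = series.reshape((n_years, 12) + series.shape[1:]).mean(axis=0)
    return np.tile(cycle, (n_years,) + (1,) * (series.ndim - 1))


def perturb(series, alpha):
    """Assumed linear perturbation: climatology + alpha * anomaly."""
    series = np.asarray(series, dtype=float)
    clim = monthly_climatology(series)
    return clim + alpha * (series - clim)


# Mortality is multiplied by (1 + eps) in the worst habitat and divided
# by (1 + eps) in the best one; H_a = 0.5 leaves the average rate unchanged.
rates = mortality_rate(0.2, [0.0, 0.5, 1.0], eps=0.5)

# Two years of a synthetic monthly series, perturbed halfway to climatology.
x = np.arange(24, dtype=float)
x_half = perturb(x, alpha=0.5)
```

Setting the factor to 0 in this sketch replaces the time series by its climatology, which corresponds to the second objective of the first method: testing whether climatological forcing is sufficient.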
The perturbations of the second type were obtained by calculating the error field between the variables of two models (M1 and M2) and decomposing it using the method of empirical orthogonal functions (EOF) [20-22]:

$$\delta\psi = \psi_{M1} - \psi_{M2} = \sum_i \beta_i(t)\, \varphi_i(x, y), \tag{4}$$

where $\beta_i(t)$ is the time series of amplitudes of the $i$th empirical orthogonal function $\varphi_i(x, y)$. The EOF computation was done with the help of the R routine eof.mca.r³. Adding a random combination of the twenty dominant (in terms of explained variance) EOF modes to the reference variables then provides the ensemble of perturbed variables:

$$\psi_k^{(p)} = \psi + \sum_{i=1}^{20} \gamma_{ik}\, \beta_i(t)\, \varphi_i(x, y), \quad k = 1, \ldots, 30, \quad \gamma \sim (0, 1), \tag{5}$$

where $\gamma_{ik}$ are random weights. An ensemble of 30 perturbed fields was thus generated for each variable, and hence 90 simulations (each forced by a dataset with a single perturbed variable) were prepared. The SA with perturbations (5) can also be viewed as an uncertainty analysis, as it explicitly includes the uncertainty of the predicted forcing fields and, through the model response to these added errors, provides measures of their propagation into the SEAPODYM model predictions. A similar method was employed by Lucas and co-authors in [23] to study the uncertainty of temperature fields with respect to atmospheric forcing variables.

Finally, in order to assess the sensitivity of the model to each input variable, the following sensitivity measures were introduced for the micronekton density F:

$$S_1 = E\left( \frac{\left| F^{(p)}(t) - F(t) \right|}{F(t)} \right) \cdot 100\%, \tag{6}$$

$$S_2 = \left( \frac{E\left( F^{(p)}(t) - E F^{(p)}(t) \right)^2}{\operatorname{Var} F(t)} \right)^{1/2}. \tag{7}$$
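The ensemble generation of equations (4)-(5) and the two sensitivity measures can be sketched in Python as follows. This is a minimal illustration, not the routine used in the study: the EOFs are obtained here from a singular value decomposition with NumPy rather than from the R routine, the random weights are assumed to be standard normal, the relative change in the first measure is taken in absolute value, and the expectation in the second measure is read as the ensemble variance averaged over time. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def eof_decompose(error, n_modes):
    """SVD-based EOFs of an error field of shape (n_time, n_space).

    Returns amplitudes (n_time, n_modes) and patterns (n_modes, n_space)
    such that error is approximated by amplitudes @ patterns.
    """
    u, s, vt = np.linalg.svd(error, full_matrices=False)
    return u[:, :n_modes] * s[:n_modes], vt[:n_modes]


def perturbed_ensemble(reference, error, n_modes=20, n_members=30, seed=0):
    """Ensemble of fields: reference + sum_i w_ki * a_i(t) * phi_i(x)."""
    rng = np.random.default_rng(seed)
    amplitudes, patterns = eof_decompose(error, n_modes)
    # ASSUMPTION: standard normal weights; the source only indicates
    # that the combination of modes is random.
    weights = rng.standard_normal((n_members, amplitudes.shape[1]))
    return reference[None] + np.einsum("ki,ti,ix->ktx",
                                       weights, amplitudes, patterns)


def s1(perturbed, reference):
    """Mean absolute relative change, in percent, over time and space."""
    return float(np.mean(np.abs(perturbed - reference) / reference) * 100.0)


def s2(ensemble, reference):
    """Per-cell ratio of ensemble spread to natural temporal variability."""
    ensemble_var = np.mean(np.var(ensemble, axis=0), axis=0)
    natural_var = np.var(reference, axis=0)
    return np.sqrt(ensemble_var / natural_var)


# Synthetic stand-ins for a model output and an inter-model error field:
# 24 monthly time steps on 50 grid cells.
rng = np.random.default_rng(1)
reference = rng.uniform(1.0, 2.0, size=(24, 50))
error = rng.normal(scale=0.1, size=(24, 50))
ensemble = perturbed_ensemble(reference, error)
variability_ratio = s2(ensemble, reference)
```

In the study, the measures are applied to the micronekton density produced by the model under the perturbed forcing; the sketch applies them directly to synthetic fields only to show the arithmetic.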

The measure S1 was used in the first type of SA, with linear perturbations. The averages were calculated over the entire model domain or region (S1), or for each grid cell (S2). The measure S2 gives the proportion of variability (standard deviation) in the perturbation ensemble with respect to the natural variability observed in the respective variable. Note also that these measures need to be calculated skipping the first years of the simulations, during which the perturbations are not yet effective. In the present study, the simulations started in 2003 and the period 2006-2010 was selected for the computation of the sensitivity statistics.

3. RESULTS

3.1. Skipjack ECCO and OMEGA solutions with SEAPODYM

Two optimal solutions were obtained for skipjack tuna dynamics predicted by SEAPODYM, using the maximum likelihood estimation approach and the same fishing datasets. Thus, the two configurations differ only in their physical forcing datasets, OMEGA and ECCO; the primary production and oxygen fields were the same.

³ http://menugget.blogspot.fr/2011/11/empirical-orthogonal-function-eof.html

Although the quality metrics of the global fit are comparable in both cases (the mean R-squared goodness of fit is 0.69 for OMEGA and 0.67 for ECCO), the differences in the stock estimates and predicted distributions are significant (see Figures 1, 2). The OMEGA forcing generally provides a higher stock than ECCO, except at the end of 2011, when the depleting effects of fishing and environment become stronger in OMEGA (the same fishing effort can affect a population differently depending on its spatial and demographic dynamics). Because of the differences in the recruitment parameter estimates and in the respective spatial distributions of young and adult biomass, the fishing impact on the modeled skipjack population is not the same under the two forcing datasets. Thus, although OMEGA predicts a larger stock, the estimated fishing impact (calculated as $(1 - B_{ref} / B_{F0}) \cdot 100\%$ for the mean biomasses with fishing (reference) and without fishing (F0) predicted for the last year of simulation) is stronger in OMEGA (16%) than in ECCO (11%). These results can lead to very different conclusions if the SEAPODYM predictions are used in management applications (e.g. the task of estimating the efficiency of marine protected areas presented in [24]). Therefore, such differences should be thoroughly examined, the uncertainties estimated and their sources explained.
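The fishing impact metric quoted above is a simple ratio of two mean biomasses. A minimal Python sketch is given below; the function name `fishing_impact` is hypothetical and the biomass values are arbitrary illustrative numbers, not values from the study.

```python
def fishing_impact(biomass_ref, biomass_f0):
    """Relative biomass reduction due to fishing, in percent.

    biomass_ref -- mean biomass in the reference simulation (with fishing)
    biomass_f0  -- mean biomass in the simulation without fishing (F0)
    """
    if not biomass_f0 > 0:
        raise ValueError("unfished biomass must be positive")
    return (1.0 - biomass_ref / biomass_f0) * 100.0


# Arbitrary illustrative biomasses in the same (unspecified) units.
impact = fishing_impact(biomass_ref=84.0, biomass_f0=100.0)
print(f"fishing impact: {impact:.1f}%")
```

Because the metric is relative to the unfished biomass of each configuration, a configuration with a larger stock can still show a stronger fishing impact, as reported for OMEGA.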

Figure 1. SEAPODYM stock predictions based on the ECCO and OMEGA physical forcing fields during the 2004-2012 period. The solid lines indicate the predictions with fishing and the dashed lines correspond to the no-fishing scenario.

As is evident from equations (1)-(2), the spatial distribution of a species that migrates actively and has a restricted habitat in terms of thermal preferences and oxygen demands should be highly correlated with the accessible micronekton biomass. This is the case for skipjack, which prefers the warmer and oxygen-rich waters of the upper pelagic layer and rarely visits the deeper habitats where food is usually more abundant [25]. Figure 2 illustrates that in both models (OMEGA and ECCO) the distributions of young (immature) skipjack are essentially similar to those of surface layer micronekton at night, i.e. the sum of resident and migrant organisms. However, in the Warm Pool (140E-180E, 15S-15N), the major spawning area of skipjack, where more than 60% of Pacific skipjack (% of the exploited population) is caught, the micronekton biomass fields appear to be very distinct in their distributions, although close in absolute values.

Figure 2. Five-year averages of the density of epipelagic micronekton at night (left) and of immature skipjack (right) predicted by SEAPODYM with the OMEGA (top) and ECCO (bottom) physical forcing. Circles are the average catches by purse-seine fleets associated with FAD (Fish Aggregating Devices). Means are calculated over the 2004-2008 period.

3.2. SEAPODYM-MTL sensitivity analysis

The results of the SA for SEAPODYM-MTL obtained from linear perturbations of the OMEGA fields are illustrated in Figure 3. They show that ocean currents are the main variable responsible for changes in the micronekton model outputs in this area. Currents modify the micronekton densities in the surface layer by 30% in the Warm Pool (up to 40% locally) and by 22% on average over the entire Pacific Ocean. The second critical variable is primary production, although its impact is less pronounced: only 7.5% on average in the Warm Pool (with local variations of 5-20%) and 5% Pacific-wide. Temperature seems to be the least significant variable in terms of seasonal to inter-annual variability. The use of climatological fields instead of time series would create a change of less than 3% in micronekton biomass (see Figure 3), both in the Warm Pool and over the whole Pacific area.

Figure 3. The S1 measures of sensitivity analysis with linear perturbations of ocean currents, temperature and primary production. The values are calculated for the Warm Pool area (140E-180E, 15S-15N).

The SA with EOF-based perturbations generally confirms these results, but it also provides information on the error in the model predictions in response to the uncertainty in the input data. Among the three forcing variables considered, the largest errors in terms of absolute values and variability exist between the current fields predicted by the OMEGA and ECCO physical models (see Appendix, Fig. A1). The perturbations in oceanic currents can reach 0.2 m/s, which corresponds to 50% of the statistical maximum of all current velocities (at the 1 degree grid resolution). Since primary production is not a variable of the ocean circulation models, the predictions are taken from two different algorithms, VGPM [26] and Eppley-VGPM, both using satellite-derived ocean color data, but the latter taking into account the temperature dependence in the description of photosynthesis, following the work on phytoplankton growth rates by Eppley [27]. The Eppley-VGPM algorithm predicts a higher productivity of the tropical ocean than VGPM, while the effect is the opposite in the subtropical regions. The error-based perturbations for primary production constitute 57% of the primary production in the tropical region and in the eastern parts of the North and South Pacific subtropical gyres. The smallest errors are obtained for the temperature fields, the maximum being only 0.2°C.

In order to synthesize the results of the SA while providing the most detailed statistics of the sensitivity measures for the purposes of skipjack dynamics modeling, four regions were defined: I) 140E-180E, 10S-10N; II) 180E-150W, 10S-10N; III) 150W-100W, 10S-10N; IV) 135E-180E, 20N-40N. These regions are known to be the principal fishing grounds of the skipjack fisheries. The S1 measures calculated over the ensemble of simulations for these regions are shown in Table 1. The sensitivities to ocean currents are about twice as high as in the first SA. The S1 indices for currents are between 42.7% and 78.7%, although the local changes (not shown) in epipelagic micronekton can be as high as 150% in the tropics and 500% in the subtropical regions. In contrast, the sensitivity to primary production is about half that of the first SA, despite the rather high errors between Eppley-VGPM and VGPM. This indicates that taking into account the inter-annual variability in primary production is critical.

Table 1. Sensitivity measures S1 (in % of reference biomass) calculated for epipelagic micronekton density for four selected regions and the entire Pacific Ocean.

Variable \ region    I       II      III     IV      OCEAN
Currents             42.7    47.6    53.1    78.7    79.4
Production           2.7     2.5     1.4     1.3     2.3
Temperature          0.3     0.3     0.1     0.2     0.3

In line with the smallest errors in the temperature fields, the response to the temperature perturbation is the lowest among all variables, between 0.1% and 0.4%. This is not significant for the model and the optimization procedure, as it is an order of magnitude lower than the errors in the observations.

The variability ratio (S2) is an informative metric of perturbation efficiency, showing how the variability of micronekton resulting from the perturbations of a forcing variable compares to its natural variability. This measure can highlight the areas susceptible to large errors, i.e. where the variability of the perturbed ensemble is higher than the natural one. The results are presented in Table 2. The response in terms of variability (S2) is estimated to be higher than in terms of absolute values (S1), especially in the most dynamic region (IV), the Kuroshio current extension, where all three input variables have a stronger impact on micronekton than in the other regions. The response to temperature appears to be insignificant in view of the variability ratios, which is consistent with the SA-1 results and with the S1 sensitivity metrics of SA-2.

Table 2. Sensitivity measures S2 (variability ratio: ensemble variability over natural variability) calculated for epipelagic micronekton density for four selected regions and the entire Pacific Ocean.

Variable \ region    I       II      III     IV      OCEAN
Currents             1.24    1.33    1.42    4.45    4.81
Production           0.06    0.06    0.03    0.05    0.12
Temperature          0.009   0.007   0.01    0.01    0.007

4. CONCLUSION

Nowadays, predictions of physical ocean variables produced by hydrodynamic models with data assimilation techniques are often freely available on a global scale and at spatial resolutions of up to a few kilometers⁴. The biogeochemical variables (primary production, oxygen) can be predicted by coupled biogeochemical models (e.g. PISCES, Pelagic Interaction Scheme for Carbon and Ecosystem Studies [28]) or derived from either in-situ or satellite observations [26,29]. The processing of in-situ data usually involves interpolation and/or the use of simplified empirical relationships. Every method has its own strengths and weaknesses, and their predictions carry uncertainties that often cannot be strictly evaluated due to the lack of observations. Thus, the errors in the physical and biogeochemical fields are poorly known and hence not provided.

⁴ Please visit https://reanalyses.org/ocean/overview-current-reanalyses for an overview of existing reanalyses.

The present sensitivity analyses with respect to the forcing data were motivated by the significant differences in the results of the optimization studies performed with SEAPODYM under different forcings. The two model configurations used here, OMEGA and ECCO, are good examples, as the predictions of both models are linked to observations via a data assimilation technique (ECCO) or via model dynamical adjustment to the ocean observations (OMEGA). Having by definition minimal errors in the modeled fields, they represent an ideal dataset for understanding and measuring the model responses to the input data.

It should be noted that the results of a sensitivity analysis for a tuna dynamics model depend on the parameter estimates. For instance, if the variability of tuna mortality with respect to the environment is high, the response to the temperature, oxygen and MTL fields is expected to be high as well, and vice versa. If the dynamics is estimated to depend mostly on intrinsic population processes, such as the reproduction rate or the natural (predation and senescence) mortality rates, rather than on the environment (perfect adaptation, wide thermal ranges), then the impact of the input data will be minimal. However, this is not the case for skipjack tuna, which has a rather restricted habitat in terms of temperature and oxygen tolerance [30] and hence in terms of accessibility to prey species.

The results of the SA for SEAPODYM-MTL demonstrated that good oceanic current predictions are critical. Given the uncertainties in the currents predicted by the ocean circulation models, the local changes in the SEAPODYM predictions can reach 150% in the tropical areas and appear to be much higher (up to 500% of the reference values) in the subtropics. At the same time, the use of climatological temperature fields in the MTL model, e.g. derived from in-situ data, seems to be justified by the low (less than 5%) sensitivity to this variable. It can, however, be critical to use time series of temperature fields in tuna model applications, as temperature anomalies such as ENSO are known to have a pronounced effect on skipjack recruitment [8].

It is also interesting to note the non-linearity of the sensitivity measures with respect to temperature and currents. This non-linearity arises from the nonlinear temperature-dependent function that is used in the model to describe the development time of micronekton (during which the micronekton production drifts passively) and its mortality [18]. Intuitively, given the responses of the model to the current and temperature fields, it is likely that the slope parameter of this function can be better estimated as a control variable of the development time function.
Further work is necessary to complete this first sensitivity analysis for the SEAPODYM model of top predator dynamics. In particular, it is important to test the impact of the errors in the temperature fields on the recruitment of tuna and the response to the inter-annual variability of the dissolved oxygen, the time series of which are often not available and replaced by climatology. Nevertheless, this study serves a useful demonstration of the importance of good forcing data for modeling ecosystem dynamics. ACKNOWLEDGMENTS We would like to thank Konstantin Lebedev of Shirshov Institute of Oceanology for running and providing the OMEGA model outputs. The work was funded in part by Oceanic Fisheries Program of Secretariat of the


17

Pacific Community and by INDESO project of Collecte Localisation Satellites.

APPENDIX A

[Figure A1 panels: u, v, PP, T]

Figure A1. The sum of the twenty dominant EOF modes for three ocean variables: (top) current vector field (m/sec), (bottom-left) primary production (mmolC/sq.km) and (bottom-right) temperature (°C).

Applied Discrete Mathematics and Heuristic Algorithms, 1, 3 (2015) 21–47


The machine learning in software project management: A journey. Part I ∗

Dinesh Bhagwan HANCHATE, email: [email protected]
Vidyapratishthan’s College of Engineering, Baramati, Pune, Maharashtra, India.

Rajankumar S. BICHKAR, email: [email protected]
G. H. Raisoni College of Engineering & Management, Wagholi, Pune, Maharashtra, India.

Abstract. Software Project Management (SPM) is among the most important and most difficult jobs in software engineering organizations. Successful project management requires proper planning and scheduling of each software development umbrella activity. Software planning needs proper communication, requirement elicitation, suitable patterns and phase-wise testing of the software in order to fulfil the client’s requirements and complete the umbrella activities. All of these can be achieved and predicted effectively through accurate estimates of the budget and schedule. Here, a literature review illustrates the most commonly used Machine Learning (ML) techniques, such as neural networks, case-based reasoning, classification and regression trees, rule induction, genetic algorithms and genetic programming, for the subdomains of SPM. In the current era, ML delivers consistently promising accuracy in some SPM fields.

Keywords and phrases: Software project management, machine learning.
Computing Classification System 2012: D.2.9, I.2.6
Mathematics Subject Classification 2013: 68N30, 68T05

1 Introduction

Machine Learning (ML) [41] is concerned with improving performance on a task through experience. ML has shown its strength in various application areas and in some interrelated domains. Training and testing an ML model allow it to be evaluated against the ground truth. ML involves learning from experience: programs adapt their performance on a certain task over time. This paper tours the characteristics and applicability of some regularly used ML algorithms. It is difficult to classify ML algorithms [13]. We therefore group them by learning style and by similarity of function [51]. The techniques and algorithms described in this section are listed, with their classification, in Table 1.

∗ © D. B. Hanchate, R. S. Bichkar, 2015.

1.1 Supervised Machine Learning (SML)

The input data is treated as training data in which each example has a known label or result, e.g. spam/not-spam. A model of this type goes through a training process in which it makes predictions and is corrected whenever those predictions are wrong. The training process repeats until the model reaches the desired accuracy on the training data. The basic workflow of an ML approach of this type is illustrated in Figure 1 [21].
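The predict, correct and repeat loop described above can be sketched with a perceptron, one of the simplest supervised learners. This is only an illustrative sketch, not a method taken from the surveyed papers: the function names, the toy spam features and the stopping parameters are all assumptions made for the example.

```python
def train_perceptron(samples, labels, learning_rate=1.0,
                     target_accuracy=1.0, max_epochs=100):
    """Repeat predict-and-correct passes until the training accuracy
    reaches target_accuracy or max_epochs passes have been made."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    accuracy = 0.0
    for _ in range(max_epochs):
        correct = 0
        for features, label in zip(samples, labels):
            activation = bias + sum(w * x for w, x in zip(weights, features))
            prediction = 1 if activation > 0 else 0
            error = label - prediction
            if error == 0:
                correct += 1
            else:
                # Wrong prediction: move the weights towards the true label.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        accuracy = correct / len(samples)
        if accuracy >= target_accuracy:
            break
    return weights, bias, accuracy


def predict(weights, bias, features):
    """Classify one example with the trained weights."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Hypothetical toy data: [contains_link, contains_offer] -> 1 means spam.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]
weights, bias, accuracy = train_perceptron(samples, labels)
```

The toy labels are linearly separable, so the loop stops once a full pass over the training data produces no wrong prediction. On data that is not separable the loop would simply stop after `max_epochs`.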

Figure 1: Evaluation of ML techniques by training and testing.


Table 1: ML techniques and examples; Rgln, Rgrn, Clstr and Ensl denote regularization, regression, clustering and ensemble, respectively.

MLs      Examples
SML      Logistic Regression (LR), Back Propagation Neural Network (BPNN).
SSML     Classification and regression.
USML     Association rule learning and clustering; Apriori and K-means.
RML      Temporal difference learning, Q-learning. Robot control.
Rgrn     Ordinary Least Squares, Logistic Regression, Stepwise Regression, Multivariate Adaptive Regression Splines (MARS), Locally Estimated Scatterplot Smoothing (LOESS).
IBL      kNN, Learning Vector Quantization (LVQ) and Self-Organising Map (SOM).
Rgln     Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net.
DTL      Classification and Regression Tree (CART), Gradient Boosting Machines (GBM), Iterative Dichotomiser 3 (ID3), C4.5, Chi-squared Automatic Interaction Detection (CHAID), Decision Stump, Random Forest, Multivariate Adaptive Regression Splines (MARS).
Kernel   SVM, Radial Basis Function (RBF) and Linear Discriminant Analysis (LDA).
Clstr    K-means, Expectation Maximization (EM).
ARL      Apriori, Eclat algorithm.
ANN      Perceptron, Back-propagation, Hopfield Network, Self-Organising Map (SOM), Learning Vector Quantization (LVQ).
DL       Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN), Convolutional Network, Stacked Auto-encoders.
DR       Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), Sammon Mapping, Multidimensional Scaling (MDS) and Projection Pursuit.
Ensl     Boosting, Bootstrap Aggregation (Bagging), AdaBoost, Stacked Generalisation (Blending), Gradient Boosting Machines (GBM) and Random Forest.

1.2 Unsupervised Machine Learning (USML)

In this learning type, the input data is not labeled and has no known result. A model is created by deducing structures present in the input data itself. Example problems are association rule learning and clustering.

1.3 Semi-Supervised Machine Learning (SSML)

Here the input data is a mixture of labeled and unlabeled examples. The model must learn the structures needed to organize the data as well as to make predictions. SSML algorithms are extensions of other flexible methods that make assumptions about how to model the unlabeled data.

1.4 Reinforcement Machine Learning (RML)

The input data is provided as a stimulus from an environment to which the model must respond and react. Feedback is provided in the form of rewards and punishments in the environment. An example problem is robot control. The ML diagram in Figure 2 gives a mathematical view of the conversion and mapping used to obtain an evaluated solution [69].

1.5 Regression

Regression is not a class of problem but a process. It models the relationships between variables, and these relationships are refined step by step using the error in the predictions made by the model.
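The step-by-step refinement driven by prediction error can be sketched as gradient descent on a straight-line model. The sketch below is only illustrative: the function name, the learning rate, the number of steps and the size/effort data are assumptions chosen for the example, not values from the literature reviewed here.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by repeatedly reducing the
    mean squared prediction error (plain gradient descent)."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error of the current model on every example.
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Refine the parameters a little in the direction that lowers the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data: project size (KLOC) against effort (person-months),
# generated from effort = 3 * size + 2.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
efforts = [5.0, 8.0, 11.0, 14.0, 17.0]
slope, intercept = fit_line(sizes, efforts)
```

With these assumed settings the fitted slope and intercept approach 3 and 2, the values used to generate the toy data.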

1.6 Instance-based Learning (IBL)

An instance-based learning model treats a decision problem in terms of the cases or instances of the training data. Such methods build up a database of examples and compare new data with the database using a similarity measure in order to find the best match and make a prediction. IBL focuses mainly on the representation of the stored instances and on the similarity measures used between instances.
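A minimal sketch of this idea is k-nearest neighbours (kNN, listed under IBL in Table 1): the stored examples are the model, and a prediction is the majority label among the most similar stored instances. The Euclidean distance, the value of k and the project data below are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(examples, query, k=3):
    """examples: list of (feature_vector, label) pairs kept as the database.
    Returns the majority label among the k stored instances closest to query."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical database: (team_size, duration_months) -> schedule risk.
examples = [
    ((2, 3), "low"), ((3, 4), "low"), ((4, 3), "low"),
    ((9, 12), "high"), ((10, 14), "high"), ((12, 11), "high"),
]
small_project = knn_predict(examples, (3, 3))
large_project = knn_predict(examples, (11, 12))
```

Other similarity measures could replace `math.dist` without changing the structure of the method.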


Figure 2: ML mathematics, showing the mapping from learning to the ground reality of the final hypothesis.

1.7 Regularization Methods

Regularization is an extension made to another method that penalizes models based on their complexity.
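As a hedged illustration, ridge regression (listed under Rgln in Table 1) extends least squares by adding a penalty on the size of the weight. The sketch below assumes a single feature and no intercept so that the minimiser has a simple closed form; the function names and data are hypothetical.

```python
def ridge_loss(weight, xs, ys, penalty):
    """Squared prediction error plus a complexity penalty on the weight."""
    squared_error = sum((weight * x - y) ** 2 for x, y in zip(xs, ys))
    return squared_error + penalty * weight ** 2


def fit_ridge(xs, ys, penalty):
    """Closed-form minimiser of ridge_loss for one feature and no intercept:
    weight = sum(x * y) / (sum(x * x) + penalty)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unpenalized = fit_ridge(xs, ys, penalty=0.0)   # ordinary least squares weight
penalized = fit_ridge(xs, ys, penalty=14.0)    # weight shrunk by the penalty
```

Increasing `penalty` pulls the weight towards zero, trading some fit on the training data for a simpler model.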

1.8 Decision Tree Learning (DTL)

DTL methods construct a model of decisions based on the actual attribute values in the data. Decisions fork in a tree structure until a prediction is made for a given record [13].

1.9 Bayesian Learning (BL)

BL methods apply Bayes’ theorem to classification and regression problems. Figure 3 shows the Training-ML-Prediction flow, which reflects the BL method.
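Bayes’ theorem states that P(H | E) = P(E | H) P(H) / P(E). The sketch below applies it to a two-class question; the function name and all probabilities are made-up numbers used only to show the calculation.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical figures: 30% of past projects overran their schedule; late
# requirement changes were seen in 80% of the overrun projects and in 20%
# of the projects that finished on time.
p_overrun = posterior(prior=0.3, likelihood=0.8, likelihood_given_not=0.2)
```

Under these assumed numbers, observing a late requirement change raises the probability of an overrun from 0.3 to 0.24 / 0.38, roughly 0.63.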

Figure 3: Training-ML-Prediction.

1.10 Kernel Methods

Kernel methods are best known through Support Vector Machines (SVM). They map the input data into a higher-dimensional vector space in which some classification problems become easier to model.
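The benefit of a higher-dimensional space can be sketched with the XOR problem, which no straight line separates in two dimensions. Note that this example uses an explicit, hand-picked feature map and hand-picked linear weights; a real kernel method such as an SVM would compute inner products in that space implicitly and learn the separator from data.

```python
def feature_map(point):
    """Map a 2-D input to 3-D by adding the product of its coordinates."""
    x1, x2 = point
    return (x1, x2, x1 * x2)


def linear_classifier(mapped):
    """A plane in the 3-D space; the weights are chosen by hand."""
    z1, z2, z3 = mapped
    return 1 if z1 + z2 - 2 * z3 - 0.5 > 0 else 0


# XOR truth table: not linearly separable in the original 2-D space.
xor_points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
predictions = {p: linear_classifier(feature_map(p)) for p in xor_points}
```

After the mapping, a single linear rule reproduces all four XOR labels.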

1.11 Clustering Methods

Clustering describes both a class of problems and a class of methods. Clustering methods are typically organized by modeling approach, such as centroid-based and hierarchical. They use the inherent information and structure in the data to organize the data into groups of maximum commonality.
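A centroid-based method can be sketched with K-means on one-dimensional data: points are assigned to the nearest centroid, and each centroid is then moved to the mean of its group. The initial centroids, the iteration count and the effort values are assumptions made for this example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid and
    moving each centroid to the mean of the values assigned to it."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical task efforts (person-days) forming two natural groups.
efforts = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(efforts, centroids=[0.0, 5.0])
```

For this toy input the assignment stops changing after a few passes, leaving one group of small tasks and one group of large tasks.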

1.12 Association Rule Learning (ARL)

ARL methods extract rules that describe the observed relationships between variables in the data.
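Two standard measures used to rank such rules are support and confidence. The sketch below computes them for a single candidate rule; the module attributes are hypothetical, and a full algorithm such as Apriori would additionally search over candidate itemsets.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical observations: attributes recorded for four software modules.
transactions = [
    {"large_module", "many_changes", "defect"},
    {"large_module", "defect"},
    {"many_changes"},
    {"large_module", "many_changes", "defect"},
]
rule_support = support(transactions, {"large_module", "defect"})
rule_confidence = confidence(transactions, {"large_module"}, {"defect"})
```

In this toy data the rule "large_module implies defect" holds in three of the four records, and in every record that contains a large module.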


1.13 Artificial Neural Networks (ANN)

ANNs are inspired by the structure and function of biological neural networks. They are a class of pattern-matching methods, mostly used for regression and classification problems.

1.14 Deep Learning

Deep Learning methods are a modern update to ANNs that exploit abundant cheap computation. They build much larger and more complex neural networks.

1.15 Dimensionality Reduction

Dimensionality reduction methods seek and exploit the inherent information and structure present in the data in order to describe the data with fewer dimensions, so that high-dimensional data can be visualized. The simplified data can then be used in a supervised learning method.

1.16 Ensemble Methods

Ensemble methods are made up of multiple weaker models. These are trained independently, and their predictions are then combined to make the overall prediction. Much effort goes into deciding which weak learners to combine and the manner in which they are combined. This is a very powerful and therefore very popular class of techniques.

The classification of ML algorithms by function can be seen in the ML classifier tree, following Jason Brownlee’s classification, in Figure 4 [51]. ML classifiers divide into statistical and structural models; structural models in turn divide into rule-based, distance-based and neural models [13]. The year-wise historical development of some ML techniques is shown in Figure 5 [45]. A comparison of the characteristics of some ML algorithms [38] is given in Table 2. The importance of ML is explained conceptually by Jan van Leeuwen [34] in terms of the Probably Approximately Correct (PAC) and meta-learning models [58].

Figure 4: ML classifier tree as per Jason Brownlee’s classification.

On the other hand, Software Engineering (SE) is the discipline that offers many paths to quality software development. SE indirectly gives indications of learning phases in various directions. Hence, SE is a suitable field in which ML algorithms can be introduced to improve process, product, people and project. Many topics in software engineering [50] [62] are fertile soil in which problems can be solved and questions answered by ML. We provide a comparative study of some software development tasks using learning algorithms that have been used by many authors and researchers. The purpose of this paper is also to take a look at work on machine learning in SPM [9]. The workflow model [14] in SPM is shown in Figure 6, which gives direction for change reports and problem reporting in SPM. The customer and developer reports not only foster a good relationship but also bring transparency to management [14].

2 Software Project Management (SPM)

Cost estimation, measurement and analysis of software are the most important factors in SPM [30, 32]. Software managers routinely come across software projects that contain errors or inconsistencies and exceed budget and time limits [67]. A common assumption in SE is that the software development process follows inherent laws. However, since software is not a tangible product, the nature and size of the factors that influence this process are hard to establish. Therefore, software managers (SM) are routinely confronted with software projects that contain errors [54] or inconsistencies and that exceed budget and time limits. The SM, together with the organization, must decide how to allocate the available resources based on predictions of an unknown future. Accurate prediction of development effort can reduce the costs that arise from inaccurate estimation [66], such as misleading tendering bids and the inability to monitor progress. Accurate modeling can assist in scheduling resources and in evaluating risk factors [39].

Figure 5: Popularity and historical development of ML analyzed and graphed [23].

Figure 6: Change reports and problem reporting in the workflow model in SPM.

Table 2: Comparison of different ML techniques (NN is neural net) [38].

Characteristics                                      NN    SVM   Trees  MARS  k-NN
Natural handling of data of mixed type               Poor  Poor  Good   Good  Poor
Handling of missing values                           Poor  Poor  Good   Good  Good
Robustness to outliers in input space                Poor  Poor  Good   Poor  Good
Insensitivity to monotone transformations of inputs  Poor  Poor  Good   Poor  Poor
Computational scalability (large N)                  Poor  Poor  Good   Good  Poor
Ability to deal with irrelevant inputs               Poor  Poor  Good   Good  Poor
Ability to extract linear combinations of features   Good  Good  Poor   Poor  Fair
Interpretability                                     Poor  Poor  Fair   Good  Poor
Predictive power                                     Good  Good  Poor   Fair  Good

To develop a project in any engineering domain, proper planning, scheduling and implementation of the development process are major factors [4]. For complex and even for simple projects, planning and scheduling are the most important initiatives. The number of possible implementation and resource allocation options makes this a complex and overwhelming task even for a small and simple project. The planning, together with the goals and the life cycle model, paves the way for the SDLC (Software Development Life Cycle), SCMP (Software Configuration Management Plan), SQAP (Software Quality Assurance Plan), project plan and schedule, and CIP (Component Iteration Plan). All these stages and components are shown in Figure 7 [35].

Figure 7: Planning and its component stages in SPM.

Planning and scheduling are clearly different activities [68]. The planning phase [59] decides what must be done and sets restrictions on how it is to be done, which constrain the scheduling process. Figure 8 shows the typical phases in Quality Project Planning (QPP), which require estimates of time and resources for each activity and process. The project manager (PM) should also know and understand the precedence relationships between the umbrella activities or processes, as well as other constraints. In developing any project, it is necessary to measure the performance of the schedule and the feasibility of the plan. This can be seen in the work of Tara [22], who used scheduling tools and techniques to optimize project outcomes. To do this, process groups need to share the knowledge areas of SPM. This is shown in Figure 9, which presents the process groups [25] and the management sub-domains. These are also called Project Management Methodologies (PMM). The Project Management Institute (PMI) recommends eight areas of PMM that are critical to managing a project [47]. The eight areas are the management of Scope, Schedule, Time, Cost, Quality, Human Resources, Communications, Risk and Procurement [25, 47]. The PMM diagram describes how a software project applies the Project Management Body of Knowledge (PMBK) in order to succeed [56]. PMM also gives direction towards the timely and quality completion of the software project, as shown in Figure 10. Figure 10 also reflects the SPM Knowledge Areas (SPMKA) [55] and shows not only the different SPM domains but also the scope of the project to improve the internal triangle (developed by Lewis) [36].

Figure 8: Quality Project Planning (QPP) and phases in SPM (Waterfall Model).

Figure 9: Process groups and management sub-domains.

Figure 10: The way to timely and quality completion of a project with PMM.

Another important process of SPM is the Risk Management Process (RMP) [26]. The RMP of any project has four levels: risk planning, assessment, handling and monitoring at the first level; risk identification and analysis at the second; feedback at the third; and risk documentation at the fourth. The RMP flow is shown in Figure 11 [40]. Some of the objectives of a software project are based on RMP measurement. Using the risk objectives and the goal of a project, the overall performance of the plan and schedule can be assessed. Efficient scheduling requires the integration of different kinds of data. Constructing a schedule and plan requires models of processes, defined relationships between tasks and resources, defined objectives and performance measures, and the algorithm to be implemented along with the required data structures. Resources include people, machines and raw materials [12]. Schedules in a project are used to allocate resources to tasks (or tasks to resources). Tasks or umbrella activities in a project may be a combination of simple tasks, milestones, risks and simple actions for the development of software modules. In general, the objective of a software development project could be to minimize the total time required for development, to maximize the net present value of the project, to minimize the late delivery of products [53] or to minimize the complexity of the project.

Figure 11: Risk Management Process.

Finally, a solution in SPM is more than just a plan. Although project planning and project scheduling are considered separate entities, they are usually connected for the sake of time management. To produce a feasible schedule (which may be any schedule concerned with PMM), we may need to change the plan to use a different set of activities. Conversely, there could be a plan that has no feasible schedule [6]. In either case, objectives such as minimizing the makespan or maximizing the net present value of the cash flows, independent of the plan or schedule, determine the value of a schedule and plan of a project. The plan and the objectives together determine the difficulty of finding a schedule. Software Project Scheduling Problems (SPSP) are not static and are based on incomplete data. Until the project is completed the schedule is not static, and most project plans change as soon as they are announced. These dynamics may be due to poor estimates of cost, time or budget, insufficient data or information, or unexpected disturbances [5]. As a result, it is very difficult to find an optimal schedule that satisfies the existing constraints and also adjusts to added constraints and to changes in the configuration of the given problem. Scheduling problems [1] have many types of constraints, which take forms such as:

• temporal constraints,
• precedence constraints, and
• a combination of both.

Usually, a plan comprises many constraints that leave it little flexibility for change, in addition to components that are nearly unconstrained. Figure 12 shows the role of four factors, i.e. size, cost, schedule and quality, in SPM. The design of SPM shows the various areas to be considered when carrying out SPM seriously [33], where the Effort Multiplier Factors (EMF) of the particular project are taken into account.

Figure 12: Role of size, cost, schedule and quality in SPM.

3 Software Project Scheduling

In SPSP [24], a project development process [48] consists of several sets of tasks. The tasks or activities of project development have precedence relationships with one another, and to maintain that precedence a task cannot be executed until its preceding tasks have completed. Each activity has an estimated duration and cost of execution. For any software project development process [53], the most frequent objective is to minimize the project duration (DUR) from start to completion. Many interesting areas of SPSP have been described by researchers [30]. One example is the Resource-Constrained Project Scheduling Problem (RCPSP) [28], in which a job requires several resources and these resources are limited. Another version is the multi-modal resource-constrained project scheduling problem, in which each job or task may be completed in more than one way, or mode, and each mode may have different resource requirements. In such cases, multiple projects may have to be scheduled for execution.
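How precedence relations and task durations together determine the shortest possible DUR can be sketched with a forward pass over the tasks in precedence order. The sketch assumes unlimited resources, so it deliberately ignores the resource limits of RCPSP; the task names, durations and function name are hypothetical.

```python
from collections import deque


def earliest_finish_times(durations, predecessors):
    """durations: task -> duration.
    predecessors: task -> list of tasks that must finish first.
    Returns task -> earliest finish time, ignoring resource limits."""
    successors = {task: [] for task in durations}
    remaining = {task: len(predecessors.get(task, [])) for task in durations}
    for task, preds in predecessors.items():
        for pred in preds:
            successors[pred].append(task)
    # Tasks with no unfinished predecessors are ready to be scheduled.
    ready = deque(task for task, count in remaining.items() if count == 0)
    finish = {}
    while ready:
        task = ready.popleft()
        start = max((finish[p] for p in predecessors.get(task, [])), default=0)
        finish[task] = start + durations[task]
        for nxt in successors[task]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)
    if len(finish) != len(durations):
        raise ValueError("precedence relations contain a cycle")
    return finish


# Hypothetical project: P2 and P3 both follow P1, and P4 follows P2 and P3.
durations = {"P1": 3, "P2": 2, "P3": 4, "P4": 1}
predecessors = {"P2": ["P1"], "P3": ["P1"], "P4": ["P2", "P3"]}
finish = earliest_finish_times(durations, predecessors)
project_duration = max(finish.values())
```

Here P4 cannot start before time 7, when the slower of its two predecessors finishes, so the project cannot complete before time 8. With limited resources the achievable duration can only be equal or longer.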

Figure 13: Task Precedence Graph [16, 17].

Figure 13 shows a Task Precedence Graph (TPG); each “P” denotes a part of a project scheduling problem, i.e. an activity or task. An individual scheduling problem [30] may include a number of projects. Every project in turn has several tasks to be executed. Each task may in turn carry different types of constraints and conditions, such as resource requirements, temporal restrictions and precedence relations. The majority of tasks include measures associated with project performance, such as product quality [18], estimated DUR and materials cost. Typical resources include devices, machines, materials and people. However, resource allocation also covers physical locations, virtual circuits, processors and hardware (physical) materials [70]. To some extent, the multi-mode project scheduling problem combines scheduling and planning [11]. In simple multi-mode scheduling problems, every task or activity may be implemented in more than one mode, which is similar to selecting one execution plan from a provided set of plans. In complex multi-modal project scheduling problems, if some tasks can be broken up into multiple tasks, or if a few tasks can be exchanged or swapped, then the scheduler effectively performs more combinatorial planning for that scheduling problem. The project area is not critical, but it does determine to some extent the complexity of the problem. It also determines the granularity of the tasks and the time scale [47].

3.1 Factors making scheduling problems hard and complex

3.1.1 Scaling issues: Problem size

Multiple methods have been defined to reduce the size of the search space of a scheduling problem [8]. Some methods have also been devised to decide whether a schedule, or some part of it, could be feasible when only partial knowledge about that schedule is available. These methods try to reduce the search space of the problem by taking advantage of knowledge about the specific problem. However, pruning heuristics are not always available, and they are obviously rare.

3.1.2 Ambiguity and the dynamic behavior of real problems

Practically speaking, it is often less important to find an optimal solution [11] than to cope with ambiguities during planning and unpredictable disturbances during schedule implementation [1]. In some cases, plans depend on well-defined and known processes. In this scenario, the nature of the resources and the important requirements of tasks and activities are all known, so the plan can be predicted well. In many other cases, though, estimates are less precise because of insufficient data or inadequate predictive models. In such cases the schedule or plan is prone to change because of the variable nature of the plan. In either case, unanticipated disturbances to the schedule may occur. Such disturbances are inevitable; they may be a mechanical failure, a human error or inclement weather. These types of disorders and conflicts [27] may require only the substitution of a single resource, or they may require full reformulation of the plan or schedule [19, 20]. Here, any optimization method should be able to adapt to modifications in the problem structure while preserving the context of the work already completed [18].

3.1.3

Infeasibility: Sparseness of the solution space

Depending on the representation and the modeling assumptions, a scheduling problem may have no feasible solution [15]. If all resources are available in every period and there are no temporal conditions on tasks and resources, a feasible solution always exists: in the worst case, one need only extend the project until all tasks are accomplished. On the other hand, if resources are gone forever once they are used, and tasks can be executed only in particular time periods, a feasible solution is not guaranteed. Some algorithms are capable of determining whether such an infeasible situation exists. Constraints also make the search for an optimal solution harder [7]: when multiple constraints are considered, traversal of the search space becomes more complicated, and adding constraints usually reduces the chance of finding a feasible solution for a given problem.
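The effect of added constraints on feasibility can be illustrated with a small brute-force count. The following Python sketch is not taken from the cited literature: the three tasks, their durations, the horizon of four periods and the helper names (`count_feasible`, `a_before_b`, `one_at_a_time`) are hypothetical choices made only for illustration.

```python
from itertools import product

# Hypothetical toy instance: task durations in periods, and a planning horizon.
durations = {"A": 2, "B": 2, "C": 1}
horizon = 4


def count_feasible(constraints):
    """Count the start-time assignments that satisfy every given constraint."""
    tasks = sorted(durations)
    # each task may start at any period that lets it finish within the horizon
    ranges = [range(horizon - durations[t] + 1) for t in tasks]
    count = 0
    for starts in product(*ranges):
        schedule = dict(zip(tasks, starts))
        if all(check(schedule) for check in constraints):
            count += 1
    return count


def a_before_b(schedule):
    """Precedence constraint: A must finish before B starts."""
    return schedule["A"] + durations["A"] <= schedule["B"]


def one_at_a_time(schedule):
    """Resource constraint: a single resource, so no two tasks may overlap."""
    busy = [t for task in schedule
            for t in range(schedule[task], schedule[task] + durations[task])]
    return len(busy) == len(set(busy))


counts = [
    count_feasible([]),
    count_feasible([a_before_b]),
    count_feasible([a_before_b, one_at_a_time]),
]
print(counts)  # [36, 4, 0]
```

In this toy instance the unconstrained problem has 36 assignments, the precedence constraint leaves 4, and adding the resource constraint leaves none, so the instance becomes infeasible within the chosen horizon.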

3.2

Solution Techniques

Exact solution techniques guarantee an optimal solution [52], but they usually become impractical for problems with large sets of constraints.


3.2.1

Critical Path Method

The Critical Path Method (CPM) gives a Resource-Unconstrained Schedule (RUCS) for a set of precedence-constrained activities with deterministic durations. It assumes infinite resources and yields the shortest possible makespan. However, the critical path method does not consider temporal or resource constraints. Stochastic variations [42, 61] and dynamic variations [10] have also been developed in an attempt to bring the modeling assumptions of the critical path method closer to reality; for example, these methods include probabilistic estimates of task duration.

3.2.2

Linear and Integer Programming

Many scheduling problems can be formulated as traditional linear programs or as integer programs, but only if significant simplifications are made. Patterson presented an overview of optimal solution methods for project scheduling [46], and Demeulemeester and Herroelen published a more recent survey [20]. In general, exact methods depend on two things: characteristics of the objective function (e.g. strictly integer values) and specific constraint formulations (e.g. only single-mode tasks). As Lawrence Davis noted, many of the constraints generally found in real scheduling problems do not lend themselves well to traditional operations research or mathematical programming techniques. In addition, linear programming formulations typically do not scale well, so they can be used only for specific instances or small problems. Held and Karp [29] described a dynamic programming approach in which an optimal schedule is developed incrementally by constructing an optimal schedule for any two tasks and then extending that schedule by adding tasks until all tasks have been scheduled.

3.2.3

Bounded Enumeration

Many exact solution methods search a decision tree generated from the precedence relations in the project plan. As shown in Figure 15, the root of the tree corresponds to the first task. The second level of the tree is the set of tasks that may be scheduled once the first task has been scheduled. The final tree thus represents a precedence-feasible set of task sequences. Any one of the root-to-leaf sequences can then be passed to a schedule generator. Alternatively, the sequence of tasks can be scheduled directly if the tree generation/pruning algorithm also considers resource constraints. The search consists of traversing the tree until the best root-to-leaf path is found. Enumerative methods are typically bounded using heuristics in order to reduce the size of the tree. It is easy to see how quickly the tree grows with the number of activities. Depending on the precedence relations, each new task can add many branches to the tree. When tasks are modeled with multiple execution modes, each execution mode adds another layer of combinatorial choices for the scheduler. As noted by Sprecher and Drexl [63, 64], enumerative methods cannot solve large problems; the tree is simply too big. Although significant progress has been made in pruning techniques, branch-and-bound methods are still limited to fewer than one hundred activities, or even fewer in the multi-modal case, and they still require special heuristics to accommodate variations in resource constraint formulations.
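The tree of precedence-feasible sequences described above can be sketched in a few lines of Python. This is only an illustrative depth-first enumeration without any bounding, not the algorithm of [63, 64]; the function name `feasible_sequences` and the five-task project `example` are hypothetical.

```python
from typing import Dict, Iterator, List, Set


def feasible_sequences(preds: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Yield every precedence-feasible ordering of the tasks.

    Each recursive call corresponds to one node of the decision tree;
    each yielded list corresponds to one root-to-leaf path.
    """
    tasks = sorted(preds)

    def extend(prefix: List[str], done: Set[str]) -> Iterator[List[str]]:
        if len(prefix) == len(tasks):
            yield list(prefix)  # a leaf: one complete task sequence
            return
        for task in tasks:
            # a task may be appended once all of its predecessors are scheduled
            if task not in done and preds[task] <= done:
                prefix.append(task)
                done.add(task)
                yield from extend(prefix, done)
                done.remove(task)
                prefix.pop()

    yield from extend([], set())


# Hypothetical project: each task is mapped to the set of its predecessors.
example = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}, "E": {"A"}}
sequences = list(feasible_sequences(example))
print(len(sequences))  # 8
```

With no precedence relations at all, n tasks produce n! sequences, which is one way to see why the unbounded tree becomes too big for large projects.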

Figure 14: Generation of a tree of precedence-feasible solutions from a software project plan. Precedence constraints are shown for a 7-task project (work order).

3.2.4

Heuristic Solution Methods

Heuristic methods sometimes find optimal solutions [44], but usually they simply find “good” solutions. They need far less time and/or space than exact methods. A heuristic specifies how to make a decision in a particular situation: heuristics [2, 3] are simply rules for deciding which action to take.
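As a rough illustration of such a rule, the following Python sketch schedules tasks one at a time under a single renewable resource, always choosing the eligible task that a priority rule ranks first. It is a minimal sketch under stated assumptions (integer durations, one resource with constant capacity); the function `dispatch_schedule`, the rule `shortest_first` and the four-task instance are hypothetical and are not taken from [2, 3].

```python
from typing import Callable, Dict, Set


def dispatch_schedule(durations: Dict[str, int], demands: Dict[str, int],
                      preds: Dict[str, Set[str]], capacity: int,
                      priority: Callable[[str], object]) -> Dict[str, int]:
    """Return a start time for every task, built greedily by a priority rule."""
    if any(d > capacity for d in demands.values()):
        raise ValueError("a task demands more than the available capacity")
    start: Dict[str, int] = {}   # task -> chosen start period
    usage: Dict[int, int] = {}   # period -> resource units already in use
    unscheduled = set(durations)
    while unscheduled:
        # eligible tasks: all predecessors have already been scheduled
        eligible = [t for t in unscheduled if all(p in start for p in preds[t])]
        if not eligible:
            raise ValueError("precedence relations contain a cycle")
        task = min(eligible, key=priority)  # the rule decides what goes next
        t0 = max((start[p] + durations[p] for p in preds[task]), default=0)
        # shift the task to the right until the resource is never overloaded
        while any(usage.get(t, 0) + demands[task] > capacity
                  for t in range(t0, t0 + durations[task])):
            t0 += 1
        for t in range(t0, t0 + durations[task]):
            usage[t] = usage.get(t, 0) + demands[task]
        start[task] = t0
        unscheduled.remove(task)
    return start


# Hypothetical instance with one resource of capacity 2.
durations = {"A": 3, "B": 2, "C": 2, "D": 1}
demands = {"A": 2, "B": 1, "C": 2, "D": 1}
preds = {"A": set(), "B": set(), "C": {"A"}, "D": {"B"}}


def shortest_first(task: str):
    """Shortest-processing-time rule; the task name breaks ties."""
    return (durations[task], task)


plan = dispatch_schedule(durations, demands, preds, capacity=2, priority=shortest_first)
print(plan)  # {'B': 0, 'D': 2, 'A': 3, 'C': 6}
```

The rule is cheap to evaluate and always produces a feasible schedule for this kind of instance, but nothing guarantees that the resulting makespan is optimal; swapping in a different `priority` function changes the schedule without changing the rest of the procedure.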



Figure 15: A tree showing precedence-feasible solutions for scheduling 7 tasks in a project.

In scheduling, heuristics are called dispatch rules. The definition of these rules is often very complicated, and most are made for a specific type of problem with a very specific set of constraints and assumptions. Heuristics may be deterministic, producing the same result on every run, or stochastic, producing different results from run to run. They may apply a single rule at a time, or they may make decisions in parallel. Hybrid algorithms may combine multiple heuristics. Traditional heuristic methods [2] generally include three steps: planning, sequencing, and then scheduling. A few include planning in the generation of schedules by permitting more than one plan and allowing the search to choose between plans as it schedules [49]. Precedence constraints typically dominate the search in the sequencer, whereas resource constraints dominate in the scheduler. We shall continue to consider software project management and machine learning in Part II of this paper.

References

1. Adamuthe, A. C., Bichkar, R. S. (2011), “Hybrid genetic algorithmic approaches for personnel timetabling and scheduling problems in healthcare”, IJCA Proceedings on International Conference on Technology Systems and Management (ICTSM), Vol. 2, pp. 11–18. ⇒ 35, 37
2. Akartunali, K., Miller, A. (2009), “A heuristic approach for big bucket multi-level production planning problems”, European Journal of Operational Research, Vol. 193, No. 2, pp. 396–411. ⇒ 40, 41
3. Wu, T., Shi, L., Geunes, J., Akartunali, K. (2011), “On the equivalence of strong formulations for capacitated multi-level lot sizing problems with setup times”, Journal of Global Optimization, Vol. 53, No. 4, pp. 615–639. ⇒ 40
4. Alba, E., Chicano, J. F. (2007), “Software project management with GAs”, ScienceDirect, Information Sciences, Vol. 177, Iss. 11, pp. 2380–2401. ⇒ 30
5. Anandasivam, G., Konduru, S. (2008), “Research Note On Vendor Preferences for Contract Types in Offshore Software Projects: The Case of Fixed Price vs. Time and Materials Contracts”, Info. Sys. Research, Vol. 19, No. 2, pp. 202–220. ⇒ 35
6. Argyris, C. A. (1992), Knowledge for action: A guide to overcoming barriers to organisational change, Jossey-Bass, San Francisco, California, 336 p. ⇒ 34
7. Artigues, C., Demassey, S., Neron, E. (2007), Resource-Constrained Project Scheduling: Models, Algorithms, Extensions and Applications, Wiley-ISTE, 288 p. ⇒ 38
8. Aytug, H., Bhattacharyya, S., Koehler, G. J., Snowdon, J. L. (1994), “A review of machine learning in scheduling”, IEEE Transactions on Engineering Management, Vol. 41, No. 2, pp. 165–171. ⇒ 37
9. Bhatnagar, S. (2006), Indian software industry, Proceedings of the fifth international conference on genetic algorithms, IIM, Ahmedabad, India, pp. 95–124. ⇒ 28
10. Blazewicz, J., Lenstra, J. K., Rinnooy Kan, A. H. G. (1983), “Scheduling subject to resource constraints: classification and complexity”, Discrete Applied Math., Vol. 5, pp. 11–24. ⇒ 39
11. Boehm, B. (1981), Software Engineering Economics, Prentice Hall PTR, New Jersey, 767 p. ⇒ 37
12. Brooks Jr., F. P. (1995), The Mythical Man-Month, Addison-Wesley, 336 p. ⇒ 34
13. Brownlee, J. (2013), “A tour of machine learning algorithms”, available at: http://machinelearningmastery.com/a-tour-ofmachine-learning-algorithms/ ⇒ 22, 25, 27
14. Chrissis, M., Konrad, M., Shrum, S. (2001), CMMI: Guidelines for Process Integration and Product Improvement, SEI series in software engineering, Addison-Wesley Professional, 688 p. ⇒ 28
15. Cesta, A., Oddi, A. (2002), “A constraint-based method for project scheduling with time windows”, Journal of Heuristics, Kluwer Academic Publishers, Netherlands, Vol. 8, No. 1, pp. 109–136. ⇒ 38
16. “CSC/ECE 501: Operating System Principles” (1998), available at: http://historysquared.com/2012/02/02/machine-learningalgorithms-comparison-table/ ⇒ 36
17. “CSC/ECE 501: Process synchronization” (1998), available at: http://people.engr.ncsu.edu/efg/501/f98/lectures/notes/lec4.html ⇒ 36
18. DeMarco, T., Lister, T. P. (1999), Productive Projects and Teams. 2nd ed., Dorset House Publishing, New York, 264 p. ⇒ 37, 38
19. Demeulemeester, E. L., Herroelen, W. S. (1997), “New benchmark results for the resource-constrained project scheduling problem”, Management Science, Vol. 43, pp. 1485–1492. ⇒ 38
20. Demeulemeester, E. L., Herroelen, W. S. (2002), Project Scheduling. A Research Handbook. International Series in Operations Research & Management Science, Kluwer Academic Publishers, Boston, 591 p. ⇒ 38, 39
21. Duda, R. O., Hart, P. E., Stork, D. G. (2007), Pattern classification, Wiley-Interscience, 680 p. ⇒ 22
22. Duggan, T., Richter, L. (2011), “Using scheduling tools and techniques to optimize project outcomes”, available at: http://www.brighthubpm.com/software-reviews-tips/125683using-scheduling-tools-and-techniques-to-optimize-projectoutcomes/ ⇒ 31
23. Golge, E. (2015), “A blog from a human-engineer-eing”, available at: http://www.erogol.com/ ⇒ 29
24. Hanchate, D. B., Bichkar, R. S. (2014), “SPS by combination of crossover types and changeable mutation SGA”, International Journal of Computer Applications, Vol. 94, No. 10, pp. 1–11. ⇒ 36
25. Hanchate, D. B., Bichkar, R. S. (2014), “Software project contacts by GRGA scheduling and EVM”, International Journal of Computer Applications, Vol. 97, No. 13, pp. 1–26. ⇒ 31, 33
26. Hanchate, D. B., Padulkar, D. M., Shinde, A. S. (2008), “Impact of risk factors in risk management by Bayesian learning”, ACM Proceedings of International Conference on Advances in Computing, ICAC-2008, pp. 1–3. ⇒ 33
27. Hanchate, D. B., Shinde, S. A., Sakhare, Y. N., Kare, S. S. (2011), “A survey of determination of effects to conflicts in project scheduling”, International Journal of electronics communication and computer engineering, Vol. 2, pp. 187–191. ⇒ 38
28. Hanchate, D. B., Thorat, Y. A., Ambole, R. H. (2012), “Review on Multimode Resource Constrained Project Scheduling Problem”, International Journal of Computer Science & Engineering Technology (IJCSET), Vol. 3, No. 5, pp. 155–159. ⇒ 36
29. Held, M., Karp, R. M. (1962), “A dynamic programming approach to sequencing problems”, Journal for the Society for Industrial and Applied Mathematics, Vol. 1, No. 10. ⇒ 39
30. Hughes, B., Cotterell, M., Mall, R. (2011), Software Project Management (SIE), Tata McGraw-Hill Education Pvt. Ltd., 432 p. ⇒ 28, 36
31. Hughes, B., Cotterell, M. (2011), Software Project Management (SEI), Tata McGraw-Hill Education Pvt. Ltd, New Delhi, India, 384 p.
32. Jalote, P. (2004), Software project management in practice, Addison Wesley, New York, 312 p. ⇒ 28
33. Karayaz, G., Keating, C. B., Henrie, M. (2014), “Designing project management systems”, 47th Hawaii International Conference on System Sciences, Kauai, Hawaii, USA, pp. 1–10. ⇒ 35
34. Leeuwen, V. J. (2004), “Approaches in machine learning”, Algorithms in Ambient Intelligence, Series Philips Research, Springer Netherlands, Vol. 2, pp. 151–166. ⇒ 27
35. Lewis, J. P. (2012), Software Development Life Cycle (SDLC). 100 Most asked Questions. SDLC Methodologies, Tools, Process and Business Models, Emereo Publishing, 184 p. ⇒ 31
36. Lewis, J. P. (2010), A Hands-on Guide to Bringing Projects in On Time and On Budget, Tata McGraw-Hill Education Private Limited, New Delhi, 592 p. ⇒ 33
37. Lin Jia Ying (2014), “Brief History of Machine Learning”, available at: http://blog.csdn.net/bestlinjiayin/article/details/38848257/ ⇒ 27
38. “Machine Learning Algorithms Comparison Table” (2012), available at: http://historysquared.com/2012/02/02/machine-learning-algorithms-comparison-table/ ⇒ 27, 30
39. Madachy, R. J. (1995), “Knowledge-Based Risk Assessment and Cost Estimation”, Autom. Softw. Eng., Vol. 2, No. 3, pp. 219–230. ⇒ 30
40. Meyer, D. M. (2010), “Risky business why better risk management can protect lives the environment. Part 2”, available at: https://valuestream2009.wordpress.com/2010/05/14/riskybusiness-why-better-risk-management-can-protect-lives-theenvironment-part-2/ ⇒ 33
41. Mitchell, M. (1999), An Introduction to Genetic Algorithms, MIT Press, Cambridge, USA, 218 p. ⇒ 21
42. Neumann, K. (1990), Stochastic Project Networks, Springer Berlin Heidelberg, 237 p. ⇒ 39
43. Neumann, K., Schwindt, C., Zimmermann, J. (2013), Project Scheduling with Time Windows and Scarce Resources: Temporal and Resource-Constrained Project Scheduling with Regular and Nonregular Objective Functions, Springer Berlin Heidelberg, 340 p.
44. Nielsen, J. (1994), Usability Inspection Methods: Chapter “Heuristic Evaluation”, John Wiley & Sons, Inc., New York, USA, pp. 25–62. ⇒ 40
45. Oliveira, D. M. (2014), “Brief History of Machine Learning”, available at: https://www.linkedin.com/pulse/2014102410111052688293-brief-history-of-machine-learning ⇒ 27
46. Patterson, J. H. (1984), “A Comparison of Exact Approaches for Solving the Multiple Constrained Resource, Project Scheduling Problem”, Manage. Sci., Vol. 30, No. 7, pp. 854–867. ⇒ 39
47. PMBOK (2013 a), “PMBOK Guide and Standards: PMI India”, available at: http://www.pmi.org.in/pmbok.asp ⇒ 32, 33, 37
48. PMBOK (2013 b), “PMBOK Guide and Standards: A Guide to the Project Management Body of Knowledge”, available at: http://www.pmi.org/PMBOK-Guide-and-Standards.aspx ⇒ 36
49. Pochet, Y., Vyve, M. V. (2004), “A general heuristic for production planning problems”, INFORMS Journal on Computing, Vol. 16, No. 3, pp. 316–327. ⇒ 41
50. Pressman, R. S. (1992), Software Engineering: A Practitioner’s Approach, McGraw-Hill, Inc., New York, 976 p. ⇒ 28
51. Przydatek, M. (2014), “Machine Learning”, available at: http://mariuszprzydatek.com/category/machine-learning/ ⇒ 22, 27
52. Ranjbar, M., Davari, M. (2013), “An Exact Method for Scheduling of the Alternative Technologies in R&D Projects”, Computers & Operations Research, Vol. 40, No. 1, pp. 395–405. ⇒ 38
53. Rosen, A. (2008), Effective IT Project Management: Using Teams to Get Projects Completed on Time and Under Budget, PHI, India, 304 p. ⇒ 34, 36
54. SAM (2013), “Software Asset Management”, available at: http://w3.softwareone.com/en-za/Licensing/SoftwareAssetManagement/Pages/default.aspx ⇒ 30
55. Schwalbe, K. (2013 a), Information Technology Project Management, Course Technology, 635 p. ⇒ 33
56. Schwalbe, K. (2013 b), Revised An Introduction to Project Management: With Brief Guides to Microsoft Project 2013, CreateSpace Independent Publishing Platform, 524 p. ⇒ 33
57. Talbot, F. B., Patterson, J. H. (1984), “An Integer Programming Algorithm with Network Cuts for Solving the Assembly Line Balancing Problem”, Management Science: Journal of the Institute for Operations Research and the Management Sciences, Vol. 30, No. 1, pp. 85–89.
58. Shervashidze, N., Schweitzer, P., van Leeuwen, E. J., Mehlhorn, K., Borgwardt, K. M. (2011), “Weisfeiler-Lehman Graph Kernels”, J. Mach. Learn. Res., Vol. 12, pp. 2539–2561. ⇒ 27
59. Smith, B., Crockett, N. M. (1989), Business Services: Catching the wind, Bureau of Employment Security, Division of Economic Analysis & Research, 20 p. ⇒ 31
60. Smola, A. J., Schölkopf, B. (2004), “A Tutorial on Support Vector Regression”, Statistics and Computing, Vol. 14, No. 3, pp. 199–222.
61. Slowinski, R., Weglarz, J. (1998), Advances in Project Scheduling, Elsevier Science Publishing, New York, 531 p. ⇒ 39
62. Sommerville, I. (1998), Software Engineering, Addison Wesley, New York, 684 p. ⇒ 28
63. Sprecher, A., Drexl, A. (1996), Solving Multi-Mode Resource-Constrained Project Scheduling Problems by a Simple, General and Powerful Sequencing Algorithm. Part I: Theory, Research Report No. 385, Christian-Albrechts-Universität zu Kiel, 222 p. ⇒ 40
64. Sprecher, A., Drexl, A. (1996), Solving Multi-Mode Resource-Constrained Project Scheduling Problems by a Simple, General and Powerful Sequencing Algorithm. Part II: Computation, Research Report No. 386, Christian-Albrechts-Universität zu Kiel, 208 p. ⇒ 40
65. Study Guides and Strategies (2011), “Cooperative learning series: Problem-based learning”, available at: www.studygs.net/pbl.htm
66. Trendowicz, A. (1998), Software Cost Estimation, Benchmarking, and Risk Assessment, The Fraunhofer IESE Series on Software and Systems Engineering, Springer-Verlag Berlin Heidelberg, 322 p. ⇒ 30
67. Vandecruys, O., Martens, D., Baesens, B., Mues, C., Backer, M. D., Haesen, R. (2008), “Mining Software Repositories for Comprehensible Software Fault Prediction Models”, Journal of Systems and Software, Vol. 81, No. 5, pp. 823–839. ⇒ 28
68. Wall, M. (1996), A Genetic Algorithm for Resource-Constrained Scheduling, Massachusetts Institute of Technology, Department of Mechanical Engineering, 13 p. ⇒ 31
69. Wang, X. (2013), “What is Machine Learning?”, available at: times.cs.uiuc.edu/course/598f14/ml-ir.pptx ⇒ 24
70. Yourdon, E. E. (1999), Death March: The Complete Software Developer’s Guide to Surviving “Mission Impossible” Projects, Prentice Hall PTR, Upper Saddle River, New Jersey, USA, 218 p. ⇒ 37

On the star-height of a regular language. Part III: The star-height of an automaton ∗

Boris MELNIKOV, email: [email protected]
Samara State University, Samara, Russia

Abstract. The star height problem was posed in 1963 and solved in 1988; up to now, only two solutions have been published. The first one (by K. Hashiguchi) was called “extremely difficult” in some later papers; the second one (by D. Kirsten, 2005) is much simpler. In this paper we consider a new approach to this problem; a short scheme of the proof is the following. We define the star height of an automaton by considering all the possible orders of its states and constructing a regular expression for each order in the usual way. We show that we can construct a corresponding automaton for each regular expression, and therefore we can do this for a hypothetical regular expression that defines the given regular language and has the minimum possible star-height. Hence the minimum possible value of the star-height is attained by some hypothetical automaton defining the given regular language; let this automaton be K. We consider not only K, but also the concrete order τ of its states corresponding to the regular expression having the minimum possible star-height. Considering the states of K in the order τ, we obtain for the next state one of the following three things. Either each of its loops has an equivalent one which does not pass through the considered state. Or there exists some other state which has a smaller value of the order τ and defines the same loops. Or we can add some edges to obtain one of the previous cases. Using a finite sequence of such steps, we obtain an automaton which is equivalent to the given one; moreover, we can a priori limit the number of states of such a “minimum” automaton, using knowledge of the given language only. Thus, this sequence of steps gives a nondeterministic finite automaton that has an a priori limited number of states, defines the given regular language, and has the minimum possible star-height.

Keywords and phrases: nondeterministic finite automaton; regular language; the star height problem.

Computing Classification System 2012: F.4.3
Mathematics Subject Classification 2013: 05C38

∗ © B. Melnikov, 2015.

The third part of this paper is a continuation of [1, 2]. We continue the numbering of sections, propositions, formulas and figures, but number the references anew.

6

The star-height of a finite automaton and related notions

In this section, we introduce our reformulation of the theory considered in [3, 4], and more recently in [5]. Some parts of this reformulation (some terms and examples) were considered in [6].

6.1

Special version of the proof of Kleene’s theorem

Let us consider a special version of the proof of Kleene’s theorem. For the given automaton (1), let us consider some injective (“ordering”) function $\tau : Q \to \mathbb{R}_+$. We shall also write, e.g., $p < r$ meaning $\tau(p) < \tau(r)$, use “max” meaning

$$\max(p, r) = \begin{cases} p, & \text{if } \tau(p) \ge \tau(r), \\ r, & \text{if } \tau(p) < \tau(r), \end{cases}$$

etc. In this section, let us fix $K$ and $\tau$.

For some states $q, p, r \in Q$, where $p \ge q$ and $r \ge q$, let us consider some simple path from $p$ to $r$,¹ whose sequence of vertices is $(p, q_1, q_2, \dots, q_s, r)$ such that $s \ge 0$ and $(\forall i \in \{1, \dots, s\})\ (q_i > q)$. We shall denote the set of all such paths by $\Delta_q(p, r)$. We set

$$\Delta_q = \{\, q' \mid q' \text{ is a state of a path of } \Delta_q(q, q) \text{ and } q' \ne q \,\}.$$

If $\Delta_q(p, r) \ne \emptyset$, then we shall write $V_q(p, r)$ (otherwise $\overline{V_q}(p, r)$). If $V_q(p, r)$ and $V_q(r, p)$, then we shall write $W_q(p, r)$.

¹ We allow $p = r$. In this case, i.e., if $p = r$, it is a simple loop.

For each pair of states $s, f \in Q$ (we allow $s = f$), we shall consider the automaton

$$K_{s \to f} = (Q_{s \to f}, \Sigma, \delta_{s \to f}, \{s\}, \{f\}), \tag{12}$$

where $Q_{s \to f} = \{s, f\} \cup Q_{sf}$, $Q_{sf} = \{\, q \in Q \mid q > \max(s, f) \,\}$, and $\delta_{s \to f}$ is constructed in the following way:

$$p \xrightarrow[\delta_{s \to f}]{a} r \quad \text{if and only if} \quad p \xrightarrow[\delta]{a} r,\ \ p \in \{s\} \cup Q_{sf},\ \ r \in \{f\} \cup Q_{sf}.$$

We shall construct regular expressions corresponding to the languages of automata of the type (12) by induction on $\tau(\min(s, f))$. We shall denote these expressions (i.e., expressions obtained in the way described below) by $\rho_{s \to f}$; if $s = f$, we shall denote the automaton and the expression by $K_s$ and $\rho_s$.

Basis of induction. If $\min(s, f) = q_{\max} = \max(\{\, q \mid q \in Q \,\})$, then automaton (12) (i.e., $K_{q_{\max}}$) defines the language defined also by the regular expression

$$\rho_{q_{\max}} = \{\, a \in \Sigma \mid q_{\max} \xrightarrow{a} q_{\max} \,\}^{*}; \tag{13}$$

remark that anyway $\rho_{q_{\max}} \ni \varepsilon$.

Step of induction. If $s \ne f$, then we write the expression defining the language of automaton $K_{s \to f}$ in the following way:

$$\rho_{s \to f} = \{\, a \in \Sigma \mid s \xrightarrow{a} f \,\} + \bigcup_{q > \max(s, f)} \rho_{s \to q} \cdot \rho_q \cdot \rho_{q \to f} \tag{14}$$

(for expression, $\cup$ symbolizes $+$’s). And if $s = f$, then we write the expression defining the language of automaton $K_s$ in the following way:

$$\rho_s = \Bigl( \{\, a \in \Sigma \mid s \xrightarrow{a} s \,\} + \bigcup_{q > s} \rho_{s \to q} \cdot \rho_q \cdot \rho_{q \to s} \Bigr)^{*}. \tag{15}$$
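The recursion in formulas (13)–(15) can be tried out on small automata. The following Python sketch is only an illustration under stated assumptions: letters are single characters, the transition relation is given as a set of triples, expressions are produced in the syntax of Python's `re` module, and the names `make_rho`, `letters`, `union` and `concat`, as well as the two-state example automaton, are hypothetical and do not come from the paper.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source state, letter, target state)


def make_rho(delta: Set[Edge], tau: Dict[str, float]) -> Callable[[str, str], Optional[str]]:
    """Return rho(s, f), the expression for K_{s->f}; None stands for the empty set."""
    states = sorted(tau, key=tau.get)

    def letters(p: str, r: str) -> List[str]:
        # the set {a | p --a--> r}, one escaped letter per alternative
        return [re.escape(a) for (x, a, y) in sorted(delta) if x == p and y == r]

    def union(parts: List[Optional[str]]) -> Optional[str]:
        kept = [x for x in parts if x is not None]  # the empty set is neutral for "+"
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else "(?:" + "|".join(kept) + ")"

    def concat(*parts: Optional[str]) -> Optional[str]:
        # a product with an empty-set factor is the empty set
        return None if any(x is None for x in parts) else "".join(parts)

    def rho(s: str, f: str) -> Optional[str]:
        bound = max(tau[s], tau[f])
        # one product rho_{s->q} . rho_q . rho_{q->f} per state q above the bound
        via = [concat(rho(s, q), rho(q, q), rho(q, f)) for q in states if tau[q] > bound]
        body = union(letters(s, f) + via)
        if s != f:
            return body  # formula (14)
        # formulas (13) and (15); the star of the empty set is the empty word
        return "" if body is None else "(?:" + body + ")*"

    return rho


# Hypothetical automaton: 1 --a--> 2, 2 --b--> 2, 2 --c--> 1, ordered by tau(1) < tau(2).
delta = {("1", "a", "2"), ("2", "b", "2"), ("2", "c", "1")}
tau = {"1": 1, "2": 2}
rho = make_rho(delta, tau)
expr = rho("1", "1")
print(expr)  # (?:a(?:b)*c)*
```

The recursion terminates because every recursive call raises the larger of the two state values; for automata with more than a handful of states it would be worth caching the results of `rho`, since the same pair is otherwise recomputed many times.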

By the hypothesis of induction, all the expressions of the right parts of (14) and (15) are already constructed. For some pair $s \in S$ and $f \in F$, denote

$$L_{s \to f} = L(K_{s \to f}) + \bigcup_{q \,\dots} L(K_{s \to q}) \cdot L(K_q) \cdot L(K_{q \to f}).$$

[…]

… $T_r$ for each $q \notin S_r$;

(r4) there is no edge of the type $s_1 \xrightarrow[\delta_r]{a} s_2$, where $s_1, s_2 \in S_r$.

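Possible automata for the regular expressions $\emptyset$, $\varepsilon$ and $a$ (for each $a \in \Sigma$) are given on Fig. 18–20 respectively. It is evident, that for each of these expressions, we have $SH(K_r) = 0$. Conditions (r3) and (r4) also hold.

[Fig. 18: automaton for $\emptyset$, with states $s_\emptyset$ and $f_\emptyset$.]

[Fig. 19: automaton for $\varepsilon$, with state $s_\varepsilon$.]

[Fig. 20: automaton for $a$, with an edge labeled $a$ from $s_a$ to $f_a$.]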

Now, let us suppose that we already have automata corresponding to given regular expressions³ $p$ and $r$. Certainly, we can also suppose that we also have functions $\tau$ for these automata obtaining regular expressions whose star-height is equal to the star-height of the given expressions (i.e., giving minimum possible star-height). I.e., we can suppose that conditions (r1–r4) hold. Let these automata be

$$K_p = (Q_p, \Sigma, \delta_p, S_p, F_p) \quad \text{and} \quad K_r = (Q_r, \Sigma, \delta_r, S_r, F_r);$$

and corresponding functions be $\tau_p : Q_p \to \{1, \dots, |Q_p|\}$ and $\tau_r : Q_r \to \{1, \dots, |Q_r|\}$. Then we can use the following automata and ordering functions; let us remark in advance, that the transition functions are considered as the sets, and the facts of defining required regular expressions by these automata are evident.

² Notice that we do not assert, that $r = R(K, \tau)$. See, e.g., the example before, where we have obtained 3 factors $(ab^*c)^*$ in (17), but we can consider, e.g., $\tau = (ab^*c)^*$.

³ I.e., automata defining the same regular languages and having the same star-height.

For expression $(p + r)$, we can consider automaton

$$K_{(p+r)} = (Q_p \cup Q_r, \Sigma, \delta_p \cup \delta_r, S_p \cup S_r, F_p \cup F_r).$$

And the ordering function (for values $T_p$, $T_r$ and $M_p = \max_{q_p \in Q_p} \tau_p(q_p)$) can be the following one:

$$\tau_{(p+r)}(q) = \begin{cases} \tau_p(s_p), & \text{if } s_p \in S_p; \\ \tau_r(s_r) + T_p, & \text{if } s_r \in S_r; \\ \tau_p(q_p) + T_p + T_r, & \text{if } q_p \in Q_p \setminus S_p; \\ \tau_r(q_r) + T_p + T_r + M_p, & \text{if } q_r \in Q_r \setminus S_r. \end{cases}$$

Evidently,

$$SH(R(K_{(p+r)}, \tau_{(p+r)})) = \max\bigl( SH(R(K_p, \tau_p)),\ SH(R(K_r, \tau_r)) \bigr).$$

Conditions (r1), (r3) and (r4) also hold.

For expression $(p \cdot r)$, we can consider automaton

$$K_{(p \cdot r)} = (Q_p \cup Q_r, \Sigma, \delta_p \cup \delta_r \cup \delta', S_p, F_r),$$

where $\delta'(q, a) \ni s_r$ if and only if $\delta(q, a) \ni f_p$ for all possible $f_p \in F_p$, $s_r \in S_r$, $a \in \Sigma$ ($\delta'$ contains no other elements). The ordering function $\tau_{(p \cdot r)}$ can be defined in the following way:

$$\tau_{(p \cdot r)}(q) = \begin{cases} \tau_p(q_p), & \text{if } q_p \in Q_p; \\ \tau_r(q_r) + M_p, & \text{if } q_r \in Q_r. \end{cases}$$

Like the previous case,

$$SH(R(K_{(p \cdot r)}, \tau_{(p \cdot r)})) = \max\bigl( SH(R(K_p, \tau_p)),\ SH(R(K_r, \tau_r)) \bigr),$$

and conditions (r1), (r3) and (r4) also hold.

And for expression $(r^*)$, we can consider automaton

$$K_{(r^*)} = (Q_r \cup \{q'\}, \Sigma, \delta_r \cup \delta', S_r \cup \{q'\}, F_r \cup \{q'\}),$$

where $\delta'(q, a) \ni s_r$ if and only if $\delta(q, a) \ni f_r$ for all possible $f_r \in F_r$, $s_r \in S_r$, $a \in \Sigma$ ($\delta'$ contains no other elements). The ordering function $\tau_{(r^*)}$ can be defined in the following way:

$$\tau_{(r^*)}(q) = \tau_r(q) \ \text{ if } q \in Q_r; \qquad \tau_{(r^*)}(q') = \min_{q \in Q_r} \tau_r(q) \,/\, 2.$$

Conditions (r1), (r3) and (r4) are evident; let us prove (r2). By the way of construction of $K_{(r^*)}$, we obtain the following fact: each path of its transition graph, which belongs to $K_{(r^*)}$ and does not belong to $K_r$, has to contain a state $s_r \in S_r$. Then for any states $q, q', q'' \in Q_r$, where $q \notin S_r$, we obtain the coincidence of the sets $\Delta_q(q', q'')$ for automata $K_{(r^*)}$ and $K_r$. And only for states $s_r \in S_r$, we can obtain some new paths of sets $\Delta_{s_r}(s_r, q)$ and $\Delta_{s_r}(q, s_r)$.⁴

For automaton $K_r$, the corresponding (defined in this section) function $\tau_r$ and some state $q$ of it, we denote the value $SH(\rho_q)$ defined in the previous section by $SH_r(q)$. Then by Proposition 4 and condition (r3) for automaton $K_{(r^*)}$, we obtain that:

$$\begin{cases} SH_{(r^*)}(q) = SH_r(q), & \text{if } q \in Q_r \setminus S_r; \\ SH_{(r^*)}(s) \le SH_r(q) + 1, & \text{if } s \in S_r; \\ SH_{(r^*)}(q') = 0. \end{cases}$$

Then by condition (r4) for automaton $K_{(r^*)}$, we obtain condition (r2).

Thus, using Proposition 6 we can reformulate the star-height problem in the following way: for the given regular language, we have to construct the equivalent finite automaton $K$ having the minimum possible star-height. After that, considering $n!$ bijective functions of the type $\tau : Q \to \{1, \dots, n\}$ (where $n = |Q|$), we construct regular expressions $R(K, \tau)$ and choose the expression having the minimum possible star-height.

⁴ I.e., the paths which belong to automaton $K_{(r^*)}$ and do not belong to $K_r$.


References

1. Melnikov, B. (2015), “On the star-height of a regular language. Part I: The main definitions”, Applied Discrete Mathematics and Heuristic Algorithms, Vol. 1, No. 1, pp. 52–61.
2. Melnikov, B. (2015), “On the star-height of a regular language. Part II: Auxiliary constructions”, Applied Discrete Mathematics and Heuristic Algorithms, Vol. 1, No. 2, pp. 63–73.
3. Eggan, L. (1963), “Transitions graphs and the star height of regular events”, Michigan Math. J., No. 10, pp. 385–397.
4. McNaughton, R. (1967), “The loop complexity of pure-group events”, Information and Control, No. 11, pp. 167–176.
5. Sakarovitch, J. (2009), Elements of Automata Theory, Cambridge University Press, 782 p.
6. Melnikov, B., Vakhitova, A. (1998), “Some more on the finite automata”, The Korean J. of Comp. and Appl. Math., Vol. 5, No. 3, pp. 495–506.



Parallel partial ranking∗

Antal Miklós IVÁNYI, email: [email protected]
Eötvös Loránd University, Budapest, Hungary

Zoltán KÁSA, email: [email protected]
Sapientia Hungarian University of Transylvania, Cluj-Napoca, Romania

Abstract. Let n, k and p be positive integers (2 ≤ n and 1 ≤ k ≤ n). We rank the k largest of n different numbers using at most p processors. We organize a tournament consisting of pairwise matches, each of which ends with the win of the better player. Let R(n, k, p) denote the minimal number of rounds necessary to rank the k largest of the given numbers (in the worst case). We use the parallel mathematical model proposed in 1883 by Lewis Carroll, propose a new ranking algorithm Combined, and use it to characterize R(n, k, p) and its expected values.

Keywords and phrases: parallel ranking, partial ranking.
Computing Classification System 1998: G.2.2
Mathematics Subject Classification 2010: 05C07

1 Introduction

Ranking of objects is a typical practical problem. One of the popular ranking methods is the pairwise comparison of the objects. Many authors describe different applications: e.g. Newman et al. [48] describe network modeling, while Csató, Iványi and Pirzada [16, 36] describe sport applications. In this paper we deal with minimizing the number of rounds needed to determine the correct ordering of the k best participants of a given tournament.

The structure of the paper is the following. After this introductory Section 1, Section 2 gathers the definitions used, while Section 3 presents examples. Section 4 describes the earlier known mathematical and algorithmic results. Section 5 contains a short review of some popular connected topics, while Section 6 describes the new mathematical and algorithmic results. Section 7 describes the proposed algorithms. In Section 8 we summarize the number of rounds required by Combined for n = 2, 6, 8 and the possible k values, while the closing Section 9 summarizes the results.

∗ © A. M. Iványi, Z. Kása, 2015.

2 Definitions

In this paper we deal with loopless directed graphs (with generalized tournaments in which the teams do not play against themselves) and sequences of integers. Let a, b, and n be nonnegative integers with a ≤ b, and let q = (q1, . . . , qn) be a sequence of nonnegative integers. In an (a, b, n)-tournament there are n players, and the number of points distributed in a match lies between a and b. If every integer result c : d with a ≤ c + d ≤ b is permitted, then the tournament is complete, otherwise it is incomplete. For example:

• tennis is a complete (1, 1)-sport with permitted results R = {0 : 1, 1 : 0};
• chess is a complete (2, 2)-sport with R = {0 : 2, 1 : 1, 2 : 0};
• football is an incomplete (2, 3)-sport with R = {0 : 3, 1 : 1, 3 : 0};
• Davis Cup is an incomplete (3, 5)-sport with R = {0 : 3, 0 : 4, 0 : 5, 1 : 3, 1 : 4, 2 : 3, 3 : 0, 3 : 1, 3 : 2, 4 : 0, 4 : 1, 5 : 0}.

A natural tool for representing the results of a tournament with n teams is a directed multigraph G on n vertices (T1, . . . , Tn) or an n × n point matrix M. If Ti gets x points in the match against Tj, then G contains x edges directed from Ti to Tj, and mij = x in M.

Let Rmin(n, k, p) denote the minimal number of rounds which are sufficient (for at least one permutation of the players), and let Q(n, k, p) denote the minimal number of matches, at the minimal number of rounds, needed to rank the largest k numbers using at most ⌊n/2⌋ processors in each round. We suppose that the numbers are different (in chess tournaments the stronger player always wins).

The aim of this paper is to characterize the functions defined in this section.
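The distinction between complete and incomplete sports can be checked mechanically. The following Python sketch assumes that a result c : d is encoded as the pair (c, d); the function name `is_complete` is hypothetical.

```python
def is_complete(a, b, permitted):
    """True if every integer result c:d with a <= c + d <= b is permitted."""
    all_results = {(c, total - c)
                   for total in range(a, b + 1)
                   for c in range(total + 1)}
    if not permitted <= all_results:
        raise ValueError("a permitted result violates a <= c + d <= b")
    return permitted == all_results


# The four sports listed above, with their permitted results R.
TENNIS = {(0, 1), (1, 0)}
CHESS = {(0, 2), (1, 1), (2, 0)}
FOOTBALL = {(0, 3), (1, 1), (3, 0)}
DAVIS_CUP = {(0, 3), (0, 4), (0, 5), (1, 3), (1, 4), (2, 3),
             (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (5, 0)}

for name, (a, b, results) in {"tennis": (1, 1, TENNIS),
                              "chess": (2, 2, CHESS),
                              "football": (2, 3, FOOTBALL),
                              "Davis Cup": (3, 5, DAVIS_CUP)}.items():
    print(name, "complete" if is_complete(a, b, results) else "incomplete")
```

Football is incomplete because, for instance, the result 0 : 2 is not permitted although 2 ≤ 0 + 2 ≤ 3.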

3 Examples

We will use the terminology of sport: the compared objects are called players or teams, the comparisons matches, the given rules of comparison sports, the execution of the pairwise comparisons is called a (round-robin) tournament, and the number of points gathered by a team is its score. The typical results of the matches are win+loss and draw. The winner gets p points, the loser 0 points, and after a draw both teams get 1 point. For example, Table 1 contains the results of Group C in the European Football Championship 2012 [66].

Team           T1   T2   T3   T4   si
T1 = Spain      −    1    3    3    7
T2 = Italy      1    −    1    3    5
T3 = Croatia    0    1    −    3    4
T4 = Ireland    0    0    0    −    0

Table 1: Point matrix of Group C in the European Football Championship 2012 [66].

The results of a tournament are often presented in sport newspapers in compressed form using sport matrices. For example, Table 2 contains the sport matrix of the tournament whose point matrix is shown in Table 1 [66], where wi is the number of wins, di is the number of draws, li is the number of losses, and si is the score of Ti.

Team   wi   di   li   si
T1      2    1    0    7
T2      1    2    0    5
T3      1    1    1    4
T4      0    0    3    0

Table 2: Sport matrix of Group C of the European Football Championship.

Since in the case of p-sports with fixed n we have wi + di + li = n − 1 and si = p·wi + di, the corresponding sport matrix S can be defined by any two of the sequences w, d, l, and s. The sport matrix in Table 2 determines the unique point matrix shown in Table 1. But generally this is not so: usually many point matrices correspond to a given sport matrix, and sometimes no point matrix corresponds to a sport matrix.
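The passage from a point matrix to its sport matrix can be sketched in Python as follows. The data are those of Table 1 with p = 3; the function name `sport_matrix` is hypothetical, and the sketch assumes p > 1 so that a win can be told apart from a draw.

```python
P = 3  # points for a win in football

# Point matrix of Table 1; None marks the diagonal.
POINTS = [
    [None, 1, 3, 3],   # T1 = Spain
    [1, None, 1, 3],   # T2 = Italy
    [0, 1, None, 3],   # T3 = Croatia
    [0, 0, 0, None],   # T4 = Ireland
]


def sport_matrix(points, p):
    """Return one (w, d, l, s) row per team of a p-sport point matrix."""
    n = len(points)
    rows = []
    for i, row in enumerate(points):
        results = [x for j, x in enumerate(row) if j != i]
        w, d, l = results.count(p), results.count(1), results.count(0)
        s = sum(results)
        # The two identities stated in the text.
        assert w + d + l == n - 1
        assert s == p * w + d
        rows.append((w, d, l, s))
    return rows


for team, row in zip(["T1", "T2", "T3", "T4"], sport_matrix(POINTS, P)):
    print(team, *row)
```

The printed rows coincide with Table 2.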

4 Earlier known results

4.1 Optimum sequential sorting (the case n = k and p = 1)

There are many versions of the sorting problem. A problem can be formulated, for example, by the set of allowed operations and by the aim of the selection. The classical sequential version was formulated by Hugo Steinhaus in 1950 [62, pages 323–327]: the numbers are different and only the comparison of two elements is allowed. There are many lower and upper bounds on the number of necessary comparisons [12, 26, 27, 34, 37], and also near-optimal algorithms such as Ford-Johnson [25] and its improvements [10, 42, 43, 44], and brute-force results [51, 52, ?].

4.2 Parallel selection of several ranked elements (the case k = 1, 2, 3)

It is interesting that the first parallel model was proposed in 1883 by Lewis Carroll¹ [?]. He also used only comparisons, but allowed the use of parallel processors. Let R(n, k, p) be the number of rounds necessary for the perfect ranking of the best k players on p processors. Old results are that R(n, 1, ⌈n/2⌉) = ⌈log n⌉ and R(n, 2, ⌈n/2⌉) = ⌈log n⌉ + ⌈log⌈log n⌉⌉, see [38]. The first attempt to investigate R(n, 3, ⌈n/2⌉) was also made by Lewis Carroll [?], who proposed a parallel method to rank the three best tennis players in a lawn tournament of 32 players.

¹ Well-known English writer and mathematician.
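The two classical values R(n, 1, ⌈n/2⌉) = ⌈log n⌉ and R(n, 2, ⌈n/2⌉) = ⌈log n⌉ + ⌈log⌈log n⌉⌉ can be evaluated with exact integer arithmetic. The sketch below assumes that log denotes the binary logarithm, which the text does not state explicitly; the function names are hypothetical.

```python
def ceil_log2(n):
    """Smallest integer e with 2**e >= n, for an integer n >= 1."""
    return (n - 1).bit_length()


def rounds_best(n):
    """R(n, 1, ceil(n/2)) = ceil(log n), binary logarithm assumed."""
    return ceil_log2(n)


def rounds_best_two(n):
    """R(n, 2, ceil(n/2)) = ceil(log n) + ceil(log ceil(log n))."""
    return ceil_log2(n) + ceil_log2(ceil_log2(n))


# The lawn tournament of 32 players mentioned above.
print(rounds_best(32), rounds_best_two(32))
```

Under this assumption, 32 players need 5 rounds to find the best player and 5 + 3 = 8 rounds to rank the best two.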


A rule to approximately decide on the number of necessary rounds for a given number of players and prizes was proposed by Mr. Model² [33]. His formula is

$$R = \frac{P + 7 \cdot Q}{5}, \tag{1}$$

where R is the number of rounds, P is the number of participants, and Q is the number of qualified places. Using the results of concrete tournaments, in 1972 Haág and Meleghegyi [31] proved that formula (1) [33, 46] can give a false result.
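Formula (1) is easy to evaluate exactly. In the following Python sketch the function name and the sample values P = 32, Q = 3 are hypothetical; how the value should be rounded to a whole number of rounds is not specified in the text, so the exact fraction is returned.

```python
from fractions import Fraction


def model_rounds(participants, qualified):
    """Formula (1): R = (P + 7 * Q) / 5, kept as an exact fraction."""
    return Fraction(participants + 7 * qualified, 5)


r = model_rounds(32, 3)
print(r, float(r))
```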

5 Further popular topics

There are further interesting and popular research problems connected with selection. In 1983 Ajtai, Komlós and Szemerédi [3] described a sorting network consisting of n/2 parallel processors and an algorithm which sorts n different numbers in c log n rounds. Alon, Azar, and Vishkin devoted a series of papers [5, 6, 7, 8, 9] to the problem of approximate parallel selection. For example, the problem of the sequential selection of the largest and smallest elements is solved [54, 67]. After unsuccessful attempts of Schreier [58] and Slupecki [61], in 1964 Kislitsyn [38] determined the number of comparisons necessary for the simultaneous sequential selection of the best and second best players.

A popular problem is the selection of only the t-th number from n different numbers using only binary comparisons. In 1989 Ajtai, Komlós, Steiger, and Szemerédi [1, 4] proved that using n processors it is possible to select the k-th largest number from n different numbers in O(log log n) time. The minimal number [32, 71] and the average number [45, 47] of necessary comparisons have also been studied. A popular special case is the selection of the median, that is, the ⌈n/2⌉-th element [11, 18, 57]. There are results in connection with the sequential ranking of the k largest elements too [55].

There are experiments with other operations: e.g., Gasarch used restricted quadratic queries [28], Lam and Lam [40] parity tests, and Yao [70] median tests. Several randomized algorithms have also been elaborated to solve selection problems [?, 65].

² A man who used the name Mr. Model as a pseudonym.

6 New results

6.1 Properties of the Swiss triangle

Some properties of elimination and Swiss tournaments can be characterized by the following triangle (we propose to call it the Swiss triangle). For simplicity, we investigate tournaments where the number n of players is a power of two. Let m = 5 and n = 2^5 = 32. Two possible representations of the Swiss triangle τ5 are as follows:

0                    32
1                  16  16
2                 8  16   8
3               4  12  12   4
4             2   8  12   8   2
5           1   5  10  10   5   1

or

0    32
1    16  16
2     8  16   8
3     4  12  12   4
4     2   8  12   8   2
5     1   5  10  10   5   1
      0   1   2   3   4   5

For a given m, let T^m_{r,c} be the element of τm in row r (r = 0, 1, . . . , m) and column c (c = 0, 1, . . . , r). By the definition of the triangle:

$$
\begin{aligned}
T^m_{0,0} &= 2^m, \\
T^m_{r,0} &= \tfrac{1}{2}\, T^m_{r-1,0}, && r = 1, \dots, m, \\
T^m_{r,r} &= \tfrac{1}{2}\, T^m_{r-1,r-1}, && r = 1, \dots, m, \\
T^m_{r,c} &= \tfrac{1}{2} \bigl( T^m_{r-1,c-1} + T^m_{r-1,c} \bigr), && r = 2, \dots, m;\ c = 1, \dots, r-1.
\end{aligned}
\tag{2}
$$


To solve this recurrence, let us observe the similarity to Pascal's triangle [69], except for a multiplier depending only on the row index; therefore we try to obtain the solution in the form

$$T^m_{r,c} = C_r \binom{r}{c}.$$

Substituting this into the recursive equation (2), we obtain

$$C_r \binom{r}{c} = \frac{1}{2}\, C_{r-1} \left[ \binom{r-1}{c-1} + \binom{r-1}{c} \right],$$

and this yields

$$C_r = \frac{1}{2}\, C_{r-1} = \dots = \frac{1}{2^r}\, C_0 = \frac{1}{2^r}\, T^m_{0,0} = 2^{m-r},$$

that is

$$T^m_{r,c} = 2^{m-r} \binom{r}{c}, \qquad r = 0, 1, \dots, m;\ c = 0, 1, \dots, r.$$

Now we present several further properties of the Swiss triangles.

1. The sum of the elements in each row is 2^m:

$$\sum_{c=0}^{r} T^m_{r,c} = \sum_{c=0}^{r} 2^{m-r} \binom{r}{c} = 2^{m-r}\, 2^{r} = 2^m.$$

This property is a direct consequence of the definition of the Swiss triangles.

2. The sum of the elements in column c is

$$\Sigma_c = \sum_{r=c}^{m} T^m_{r,c} = 2^m \sum_{r=c}^{m} \frac{1}{2^r} \binom{r}{c} = 2^{m-c} \sum_{i=0}^{m-c} \frac{1}{2^i} \binom{c+i}{i}.$$

The difference between the sums of two adjacent columns is

$$\Sigma_{c-1} - \Sigma_c = \binom{m+1}{c}.$$
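The closed form and the row and column properties can be checked numerically. The Python sketch below builds the triangle directly from the recurrence; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def swiss_triangle(m):
    """Rows 0..m of the Swiss triangle, built from the recurrence."""
    rows = [[2 ** m]]
    for r in range(1, m + 1):
        prev = rows[-1]
        row = [prev[0] // 2]                                  # column 0
        row += [(prev[c - 1] + prev[c]) // 2 for c in range(1, r)]
        row.append(prev[-1] // 2)                             # column r
        rows.append(row)
    return rows


def column_sum(rows, c):
    """Sum of column c; the column starts in row c."""
    return sum(rows[r][c] for r in range(c, len(rows)))


m = 5
tau = swiss_triangle(m)
for row in tau:
    print(row)

# Closed form T[r][c] = 2**(m - r) * C(r, c).
assert all(tau[r][c] == 2 ** (m - r) * comb(r, c)
           for r in range(m + 1) for c in range(r + 1))
# Property 1: every row sums to 2**m.
assert all(sum(row) == 2 ** m for row in tau)
# Property 2: adjacent column sums differ by C(m + 1, c).
assert all(column_sum(tau, c - 1) - column_sum(tau, c) == comb(m + 1, c)
           for c in range(1, m + 1))
```

Integer division is exact here, because every halved quantity is a multiple of 2 as long as r ≤ m.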

3. (Right angle property.) By direct computation,

$$T^m_{2r+1,r} = T^m_{2r+1,r+1} = T^m_{2r+2,r+1}, \qquad r = 1, 2, \dots, \frac{m-2}{2}.$$

For example, in τ5 we have T^5_{3,1} = T^5_{3,2} = T^5_{4,2} = 12.

4. (Diagonal property: the c-th diagonal is equal to the c-th column.) By direct computation,

$$T^m_{c+i,i} = T^m_{c+i,c}, \qquad i = 0, 1, \dots, m-c.$$

For example, in τ5 both the diagonal and the column with c = 1 read 16, 16, 12, 8, 5.

6.2 New algorithmic results

At the beginning of the pairing, all players are active. If the rank of a player becomes known, then we delete the given player and he becomes inactive. In the following lemmas we suppose that the investigated tournament has n participants.

Lemma 1 If a player played all matches and has l losses, then his rank is l + 1.

Proof. Only the winners of his lost matches precede the given player.

Lemma 2 If a player has w wins, then his rank is at most n − w.

Proof. The players who lost i matches have larger rank than i.

Lemma 3 (Carroll [?]) If a player has l losses, then his rank is at least l + 1.

Proof. See [?].

Our algorithms organize the matches, and if the rank of an active player is fixed, then we delete him (to decrease the number of necessary matches), and this player becomes inactive.

Lemma 4 If there is only one unbeaten player among the active players, then he is the best among the active players.

Proof. The remaining players have at least one loss, therefore they cannot be the best active player.
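Lemmas 1 to 3 together give an interval for the rank of a player. A minimal Python sketch, with the hypothetical function name `rank_bounds`:

```python
def rank_bounds(n, wins, losses):
    """Interval [l + 1, n - w] for the rank of a player with w wins, l losses.

    The lower bound is Lemma 3, the upper bound is Lemma 2.
    """
    if wins < 0 or losses < 0 or wins + losses > n - 1:
        raise ValueError("a player has at most n - 1 decided matches")
    return losses + 1, n - wins


# A player with all n - 1 results decided: both bounds coincide (Lemma 1).
print(rank_bounds(4, 3, 0))
# One win and one loss among four players: rank 2 or 3.
print(rank_bounds(4, 1, 1))
```

When w + l = n − 1, the two bounds coincide and the rank equals l + 1, which is the statement of Lemma 1.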

7 Algorithms for minimization of the number of rounds

In this section we present several heuristic algorithms. We use the parallel mathematical model proposed by Lewis Carroll in 1883 [?]. We suppose that n is a power of 2 (if not, then we can add weak “dummy” players). We have n/2 processors (or tennis courts, chess boards). The processors work in rounds: each processor is able to execute one comparison in each round. Each number (player) can participate in at most one comparison (match) in each round.

7.1 Swiss-Perfect algorithm

Probably the most widespread parallel ranking method is the Swiss system, which has several good descriptions [15, 16, ?, 20, 21, 22, 23, 29, 33, 39, 50, 59, 60, 63, 64, 68].

7.2 Deleting algorithm

If the rank of a player Pi is known, then we delete him. Another reason for deleting arises if we wish to rank only the k best players: the players having at least k losses have no chance to get a prize, therefore we delete them too.
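The two deleting rules can be sketched in Python as follows. This is an illustration under assumptions, not the authors' pseudocode (which is given in [35]): a rank is treated as known when the bounds l + 1 and n − w coincide, and the win and loss counts are assumed to include results inferred by transitivity.

```python
def delete_step(active, wins, losses, n, k):
    """Split the active players into ranked, hopeless and remaining ones."""
    ranked, hopeless, remaining = {}, [], []
    for p in active:
        lowest, highest = losses[p] + 1, n - wins[p]
        if lowest == highest:        # rank is known: delete the player
            ranked[p] = lowest
        elif losses[p] >= k:         # rank exceeds k: no chance of a prize
            hopeless.append(p)
        else:
            remaining.append(p)
    return ranked, hopeless, remaining


# Four players ordered 1 > 2 > 3 > 4 after the matches 1-3, 2-4, 1-2, 3-4
# and the inferred result 1-4; the match 2-3 is still undecided.
wins = {1: 3, 2: 1, 3: 1, 4: 0}
losses = {1: 0, 2: 1, 3: 1, 4: 3}
print(delete_step([1, 2, 3, 4], wins, losses, n=4, k=3))
```

In this situation the best and the worst player are deleted with known ranks, while the two middle players remain active.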


7.3 Transitive algorithm

Our numbers (players) are different, so their set is well-ordered. Therefore if Pq wins against Pr , and Pr wins against Ps , then we can increase the number of wins of Pq and the number of losses of Ps , since the result of the match between Pq and Ps is determined.
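The inference step of Transitive amounts to a transitive closure of the "wins against" relation. The Python sketch below uses Warshall's algorithm on a set of (winner, loser) pairs; this encoding and the function name are assumptions for illustration, and the authors' own pseudocode may differ.

```python
def transitive_closure(results):
    """Add every result implied by transitivity to a set of (winner, loser)."""
    closure = set(results)
    players = {p for pair in results for p in pair}
    for mid in players:                 # Warshall: allow `mid` in between
        for a in players:
            if (a, mid) in closure:
                for b in players:
                    if (mid, b) in closure:
                        closure.add((a, b))
    return closure


# Four players; 1 beat 3 and 4, 4 beat 2, 2 beat 3.
played = {(1, 3), (4, 2), (1, 4), (2, 3)}
inferred = transitive_closure(played) - played
print(sorted(inferred))
```

Here the results of the matches 1-2 and 4-3 are determined without being played.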

7.4 Combined algorithm

The pairing algorithm Combined combines the ideas of Transitive and Delete. Our numbers (players) are different, so their set is well-ordered. According to Transitive, if Pq wins against Pr, and Pr wins against Ps, then we can increase the number of wins of Pq and the number of losses of Ps, since the result of the match between Pq and Ps is determined. In a given tournament let us denote the players by Pi,ri, where i is the serial number and ri is the rank of the given player for i = 1, . . . , n.

Input.
n: the size parameter of the set of the rankable players (n ≥ 2);
k: the prescribed number of players to be ranked (1 ≤ k ≤ n);
π = P1,r1, P2,r2, . . . , Pn,rn: a permutation of the rankable numbers.

Output.
Mn×n: the result matrix (mi,j = 1ₓ means that the match of Pi and Pj was in the x-th round and Pi won, Pj lost);
R = [ri1, ri2, . . . , rik]: the sequence of the largest k ranked numbers;
ρ: the number of rounds.

7.5 Transitive algorithm

After the second and each further round, the algorithm Transitive [14, 41] adds to the result matrix M the results which are consequences of the transitivity of the tournament. The pseudocodes of these algorithms can be found in [35].

7.6 Combined-All algorithm

Algorithm Combined-All generates all permutations of the players and determines the minimal, maximal and average numbers of the necessary rounds. For the generation of the permutations, we use the algorithm of Nijenhuis and Wilf [49].



8 Minimal, maximal and average number of rounds for concrete permutations

In this section we give the minimal, average and maximal number of rounds in the case of Combined-All. The first analysis of the average performance measures of selection algorithms probably appeared in the paper of Floyd and Rivest [24]. Later, e.g., Cunto and Munro [17] continued the investigations.

8.1 A minimal example with four players

As the first simple example (requiring a minimal number of rounds), let the input data of Combined be the parameter k = 3 and the following permutation of four players: π1 = [P1,1, P2,3, P3,4, P4,2] = [1, 3, 4, 2], where Pi,j denotes the player with index i and rank j. The basic principle of Combined is that it divides each loss group into two parts and pairs the first element of the first part with the first element of the second part, and so on. At the beginning of the pairing we have only one loss group L0(0) = [P1,1, P2,3, P3,4, P4,2], so the pairing in the first round, given by ranks, is 1-4, 3-2 (that is, P1 plays P3 and P2 plays P4). The point matrix M1 after the first round is shown in Table 3.

Player   1/1   2/3   3/4   4/2
1/1       X           1₁
2/3             X           0₁
3/4      0₁           X
4/2            1₁           X

Table 3: Point matrix M of Combined after the first round.

Now we compute the loss groups (given by ranks) L0 = [1, 2] and L1 = [3, 4], resulting in the pairing 1-2, 4-3 of the second round. Table 4 shows the point matrix after the second round, but before the application of Transitive.

Player   1/1   2/3   3/4   4/2
1/1       X           1₁    1₂
2/3             X     1₂    0₁
3/4      0₁    0₂     X
4/2      0₂    1₁           X

Table 4: Point matrix M of Combined after two rounds.

Now Transitive computes the results of the matches P1-P2 and P3-P4. Table 5 contains the resulting point matrix M (the results of Transitive are marked with an asterisk).

Player   1/1   2/3   3/4   4/2
1/1       X    1₂*    1₁    1₂
2/3      0₂*    X     1₂    0₁
3/4      0₁    0₂     X     0₂*
4/2      0₂    1₁    1₂*    X

Table 5: Point matrix M of Combined after two full rounds.

Now using Lemma 1 we can successively determine the ranks of all four players: the rank of P1,1 is 1, the rank of P4,2 is 2, the rank of P2,3 is 3 and the rank of P3,4 is 4. The final result is that Combined ranked the whole permutation π1 in two rounds.
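The pairing principle of this example can be replayed with a short Python sketch. It covers only the splitting of loss groups into halves and the recomputation of the loss groups, not the complete Combined algorithm (see [35] for the pseudocode). Players are identified by their indices, so the first-round pairs P1, P3 and P2, P4 correspond to the rank pairs 1-4 and 3-2. The treatment of groups of odd size is not specified here; in this sketch the last player of such a group simply stays unpaired.

```python
def pair_group(group):
    """Pair the i-th player of the first half with the i-th of the second."""
    half = len(group) // 2
    return list(zip(group[:half], group[half:half * 2]))


def play_round(groups, rank):
    """Play one round; the player with the smaller rank wins each match."""
    results = []
    for group in groups:
        for x, y in pair_group(group):
            results.append((x, y) if rank[x] < rank[y] else (y, x))
    return results


def loss_groups(players, results):
    """Group the players by number of losses, keeping index order."""
    losses = {p: 0 for p in players}
    for _winner, loser in results:
        losses[loser] += 1
    groups = {}
    for p in players:
        groups.setdefault(losses[p], []).append(p)
    return [groups[count] for count in sorted(groups)]


players = [1, 2, 3, 4]
rank = {1: 1, 2: 3, 3: 4, 4: 2}          # the permutation pi_1

round1 = play_round([players], rank)
groups = loss_groups(players, round1)
round2 = play_round(groups, rank)
print(round1, groups, round2)
```

The first round consists of the matches P1-P3 and P2-P4, the loss groups become [P1, P4] and [P2, P3], and the second round consists of P1-P4 and P2-P3, in agreement with Tables 3 and 4.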

8.2 A maximal example with four players

As the second simple example (requiring a maximal number of rounds), let the input data of Combined be the parameter k = 3 and the following permutation of four players: π2 = [P1,1, P2,2, P3,3, P4,4] = [1, 2, 3, 4]. Now the pairing of the first round is 1-3, 2-4, and the pairing of the second round is 1-2, 3-4. The point matrix after two rounds is shown in Table 6.

Player   1/1   2/2   3/3   4/4
1/1       X    1₂    1₁    1₂
2/2      0₂     X          1₂
3/3      0₁           X    1₁
4/4      0₂    0₂    0₁     X

Table 6: Point matrix M of Combined after two rounds.

Now we can delete P1,1 as the winner of the tournament, and also P4,4 as the fourth player. But the perfect ranking of all players requires a third round containing only the match 2-3. So the perfect ranking of this permutation requires three rounds.

The four players can be permuted in 24 ways. Transitive succeeds for those permutations (8 of the 24) in which the second and third best players meet in the first round. So the average is RCOM(4, 2, 2) = (8 · 2 + 16 · 3)/24 = 2 2/3.
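The count of 8 permutations and the resulting average can be verified by enumeration. The Python sketch below assumes the first-round pairing by halves used in the examples (P1-P3 and P2-P4) and takes from the text that exactly these permutations need two rounds while all others need three; it does not run Combined itself.

```python
from fractions import Fraction
from itertools import permutations


def second_and_third_meet_first(ranks):
    """True if the players of rank 2 and 3 are paired in the first round."""
    half = len(ranks) // 2
    first_round = zip(ranks[:half], ranks[half:])
    return any({x, y} == {2, 3} for x, y in first_round)


perms = list(permutations([1, 2, 3, 4]))
two_round = [r for r in perms if second_and_third_meet_first(r)]
average = Fraction(2 * len(two_round) + 3 * (len(perms) - len(two_round)),
                   len(perms))
print(len(two_round), len(perms), average)
```

The enumeration gives 8 of 24 permutations and the average 8/3 = 2 2/3.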

8.3 A near maximal example with six players

In the third example we have six players. Now let k = 5 and let π3 be the following permutation: π3 = [P1,1, P2,6, P3,5, P4,2, P5,3, P6,4]. The pairing of the first round is 1-4, 5-2, 6-3, and the pairing of the second round is 1-2, 3-4, 5-6. Now Transitive does not find new results. Table 7 contains the point matrix M after two full rounds.

Player: nonempty entries of the row, in the column order 1/1, 2/6, 3/5, 4/2, 5/3, 6/4
1/1: X 1₂ 1₂ 1₁
2/6: 0₂ X 0₁ 0₂
3/5: 0₂ X 0₁ 0₂
4/2: 0₁ 1₂ X
5/3: 1₁ X 1₂
6/4: 1₂ 1₁ 0₂ X

Table 7: Point matrix M of Combined after the second round.

The pairing of the third round is 1-5, 4-6, 2-3; the resulting point matrix is shown in Table 8.

Player: nonempty entries of the row, in the column order 1/1∗, 2/6∗, 3/5, 4/2, 5/3, 6/4
1/1∗: X 1₂ 1₂ 1₁ 1₂ 1₃
2/6∗: 0₂ X 0₃ 0₃ 0₁ 0₃
3/5: 0₂ 1₃ X 0₂ 0₂
4/2: 0₁ 1₃ 1₂ X 1 − 4 1₃
5/3: 0₂ 1₁ X 1₂
6/4: 0₃ 1₁ 1₁ 0₃ 0₂ X

Table 8: Point matrix M of Combined after the third round.

Now Delete deletes P1,1 and P2,6. The pairing of the fourth round is 4-5 (P4,2 can play only this match). The result of the fourth round is shown in Table 9.

Player   1/1∗  2/6∗  3/5   4/2   5/3   6/4
1/1∗      X    1₂    1₂    1₁    1₃    1₃
2/6∗     0₂     X    0₃    0₃    0₁    0₃
3/5      0₂    1₃     X    0₂    0₂    0₂
4/2      0₁    1₃    1₂     X    1₄    1₃
5/3      0₃    1₁    1₂    0₄     X    1₂
6/4      0₃    1₃    1₁    0₃    0₂     X

Table 9: Point matrix M of Combined after the fourth round.

Using Transitive we get the ranks of all players, and the result that Combined needs four rounds to rank the given permutation.

9 Summary

n   k   min   max   average
2   1    1     1    2/2 = 1
4   1    2     2    48/24 = 2
4   2    2     3    64/24 = 2.666667
4   3    2     3    64/24 = 2.666667
6   1    2     3    2040/720 = 2.833333
6   2    2     5    2640/720 = 3.666667
6   3    2     5    2880/720 = 4
6   4    2     5    2976/720 = 4.133333
6   5    2     5    2992/720 = 4.155556
8   1    3     3    120960/40320 = 3
8   2    3     5    170112/40320 = 4.219048
8   3    3     6    191488/40320 = 4.749206
8   4    3     7    196096/40320 = 4.863492
8   5    3     7    198272/40320 = 4.917460
8   6    3     7    198784/40320 = 4.930158

Table 10: Minimal, maximal and average number of rounds of Combined.

On the basis of these simulation data we can formulate the following conjectures.

Conjecture 1 If n is an even number, then for algorithm Combined at least ⌊log n⌋ rounds are necessary to determine the best player.

Conjecture 2 If n is an even number, then for algorithm Combined at most n − 1 rounds are necessary to determine the best player.
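The averages of Table 10 are totals over all n! permutations divided by n!, which can be rechecked with exact fractions. In the Python sketch below the rows for n = 4 and k = 2, 3 use the total 64, which is the value implied by the printed average 2.666667 and by the count in Section 8.2; the binary logarithm in Conjecture 1 is an assumption. The checks only compare the conjectures with the rows for k = 1; they do not prove them.

```python
from fractions import Fraction
from math import factorial

# (n, k, min, max, total number of rounds over all n! permutations, printed)
TABLE_10 = [
    (2, 1, 1, 1, 2, 1.0),
    (4, 1, 2, 2, 48, 2.0),
    (4, 2, 2, 3, 64, 2.666667),
    (4, 3, 2, 3, 64, 2.666667),
    (6, 1, 2, 3, 2040, 2.833333),
    (6, 2, 2, 5, 2640, 3.666667),
    (6, 3, 2, 5, 2880, 4.0),
    (6, 4, 2, 5, 2976, 4.133333),
    (6, 5, 2, 5, 2992, 4.155556),
    (8, 1, 3, 3, 120960, 3.0),
    (8, 2, 3, 5, 170112, 4.219048),
    (8, 3, 3, 6, 191488, 4.749206),
    (8, 4, 3, 7, 196096, 4.863492),
    (8, 5, 3, 7, 198272, 4.917460),
    (8, 6, 3, 7, 198784, 4.930158),
]


def check_row(n, k, lo, hi, total, printed):
    """Recompute the average and compare it with the printed value."""
    average = Fraction(total, factorial(n))
    assert lo <= average <= hi
    assert abs(float(average) - printed) < 1e-6
    if k == 1:
        assert lo >= n.bit_length() - 1      # Conjecture 1: floor(log2 n)
        assert hi <= n - 1                   # Conjecture 2
    return average


for row in TABLE_10:
    print(row[0], row[1], check_row(*row))
```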

10 Acknowledgements

The authors thank Ervin Haág and Gyula Császár for the useful consultations, Richard Forster for the computer experiments, and Vladimir N. Potatov and László Csató for their help with the bibliography.

References

1. Ajtai, M., Komlós, J., Steiger, W. L., Szemerédi, E. (1986), “Deterministic selection in O(log log n) parallel time”, ACM Symp. on Theory of Computing, Vol. 18, pp. 188–195.
2. Ajtai, M., Komlós, J., Steiger, W. L., Szemerédi, E. (1989), “Optimal parallel selection has complexity O(log log n)”, J. Comput. Syst. Sci., Vol. 38, pp. 125–133.
3. Ajtai, M., Komlós, J., Szemerédi, E. (1983), “Sorting in c log n parallel steps”, Combinatorica, Vol. 3, pp. 1–19.
4. Ajtai, M., Komlós, J., Steiger, W. L., Szemerédi, E. (1989), “Almost sorting in one round”, Adv. Comput. Res., Vol. 5, pp. 117–125.
5. Alon, N., Azar, Y. (1988), “Sorting, approximate sorting and searching in rounds”, SIAM J. Discrete Math., Vol. 1, pp. 269–280.
6. Alon, N., Azar, Y. (1991), “Parallel comparison algorithms for approximation problems”, Combinatorica, Vol. 11, No. 2, pp. 97–122.
7. Alon, N., Azar, Y. (2006), “The average complexity of deterministic and randomized parallel comparison-sorting algorithms”, SIAM J. Comp., Vol. 17, No. 6, pp. 1178–1192.
8. Alon, N., Azar, Y. (2006), “Sorting, approximate sorting, and searching in rounds”, Vol. 1, No. 3, pp. 269–280.
9. Alon, N., Azar, Y., Vishkin, U. (1986), “Tight complexity bounds for parallel comparison sorting”, IEEE, 27th Annual Symp. Foundations Comp. Science (27–29 October, 1986, Toronto, ON, Canada), pp. 502–510.
10. Ayala-Rincón, M., de Abreu, B. T., de Siqueira, J. (2007), “A variant of the Ford-Johnson algorithm that is more space efficient”, Inf. Proc. Letters, Vol. 102, No. 5, pp. 201–207.
11. Bent, S. W., John, J. W. (1985), “Finding the median requires 2n comparisons”, Proc. Seventeenth annual ACM symposium on Theory of computing (STOC’85), pp. 213–216.
12. Blum, M., Floyd, R. W., Pratt, W., Rivest, R. L., Tarjan, R. E. (1973), “Time bounds for selection”, J. Computer System Sci., Vol. 7, pp. 448–461.
13. Carroll, L. (1883), “Lawn tennis tournaments”, St. James’ Gazette, August 1, pp. 5–6. Reprinted in The Complete Works of Lewis Carroll, New York Modern Library, 1947.
14. Cormen, T. H., Leiserson, Ch. E., Rivest, R. L., Stein, C. (2009), Introduction to Algorithms, The MIT Press, McGraw Hill, Cambridge/New York, 1292 p.
15. Csató, L. (2012), Ranking by pairwise comparisons for Swiss-system tournaments, available at: http://www.springerlink.com/content/1435-246x/
16. Csató, L. (2015), Ranking in Swiss system chess team tournaments, Corvinus Economics Working Papers, Budapest, 1/2015, 38 p.
17. Cunto, W., Munro, J. I. (1989), “Average case selection”, J. ACM, Vol. 36, No. 2, pp. 270–279.
18. Dor, D., Zwick, U. (1999), “Selecting the median”, SIAM J. Comp., Vol. 7, No. 5, pp. 1722–1758.
19. Elo, A. E. (1978), The Rating of Chess Players: Past and Present, Batsford, London, 112 p.
20. FIDE (2013), Handbook. 04. FIDE Swiss rules, available at: http://www.fide.com/component/handbook/?id=83&view=article
21. FIDE (2013), Handbook. 05. FIDE Tournament Rules, available at: https://www.fide.com/component/handbook/?id=20&view=category
22. FIDE (2015), Basic rules for Swiss Systems, available at: http://www.fide.com/fide/handbook.html?id=83&view=article
23. FIDE (2015), Dutch System, available at: https://www.fide.com/fide/handbook.html?id=167&view=article
24. Floyd, R. W., Rivest, R. L. (1975), “Expected time bounds for selection”, Comm. ACM, Vol. 18, No. 3, pp. 165–172.
25. Ford, L. S., Johnson, A. (1959), “Tournament problem”, Amer. Math. Monthly, Vol. 66, pp. 387–389.
26. Fussenegger, F., Gabow, H. N. (1976), “Using comparison trees to derive lower bounds for selection problems”, in Proc. 17th IEEE Symp. Comp. Sci., Houston, IEEE, pp. 178–182.
27. Fussenegger, F., Gabow, H. N. (1979), “A counting approach to lower bounds for selection problems”, J. ACM, Vol. 26, No. 2, pp. 227–238.
28. Gasarch, W. I. (1991), “On selecting the k largest with restricted quadratic queries”, Inform. Process. Lett., Vol. 38, No. 4, pp. 193–195.
29. Good, I. J. (1955), “On the marking of chess players”, The Mathematical Gazette, Vol. 39, pp. 292–296.
30. Gross, J. L., Yellen, J., Zhang, P. (2013), Handbook of Graph Theory, CRC Press, 1630 p.
31. Haág, E., Meleghegyi, Cs. (1972), “A middle final – nothing deciding [Egy középdöntö – amelyik semmit sem döntött el]”, Magyar Sakkélet, Vol. 10, pp. 190–191.
32. Hadian, A., Sobel, M. (1970), “Selecting the t th largest using binary errorless comparisons”, Combinatorial Theory and its Applications, II, Proc. Colloq., Balatonfüred, North-Holland, Amsterdam, pp. 585–599.
33. Hollosi, A., Pahle, M. (2013), Swiss pairing, available at: http://senseis.xmp.net/?SwissPairing
34. Hyafil, L. (1976), “Bounds for selection”, SIAM J. Comput., Vol. 5, No. 5, pp. 109–114.
35. Iványi, A. M., Kása, Z. (2015), “Quick partial ranking”, Acta Univ. Sapientiae Inform., Vol. 7, No. 2, submitted.
36. Iványi, A. M., Pirzada, Sh. (2011), “Comparison based ranking”, in: Algorithms of Informatics, AnTonCom, Budapest, Vol. 3, pp. 1209–1258.
37. Kirkpatrick, D. G. (1981), “A unified lower bound for selection and set partitioning problems”, J. ACM, Vol. 28, pp. 150–165.
38. Kislitsyn, S. S. (1964), “Finding the k th element in ordered set with pairwise comparisons [O vydelenii k-go elementa uporyadochennoy sovokupnosti putem poparnyh sravneniy]”, Siberian Math. J. [Sibirskiy Matematicheskiy Zhurnal], Vol. 5, No. 2, pp. 557–564.
39. Kujansuu, E., Lindberg, T., Mäkinen, E. (1995), The stable roommates problem and chess tournament pairings, University of Tampere, Dept. of Comp. Sci., Report A-1995-2, 10 p.
40. Lam, T. W., Lam, H. F. (2000), “Selecting the k largest elements with parity tests”, Discrete Appl. Math., Vol. 101, No. 1–3, pp. 187–196.
41. Lee, A. J. (2015), Discrete Structures for Computer Science, available at: https://people.cs.pitt.edu/~adamlee/courses/cs0441/lectures/lecture27-closures.pdf
42. Manacher, G. K. (1979), “The Ford-Johnson sorting algorithm is not optimal”, J. ACM, Vol. 26, No. 3, pp. 441–456.
43. Manacher, G. K. (1979), “Significant improvements to the Hwang-Lin merging algorithm”, J. ACM, Vol. 26, No. 3, pp. 434–440.
44. Manacher, G. K., Bu, T. D., Mai, T. (1989), “Optimal combinations of sorting and merging”, J. ACM, Vol. 36, pp. 290–334.
45. Mäkinen, E. (1996), “Programming projects on chess”, ACM SIGCSE Bull., Vol. 28, No. 4, pp. 41–44.
46. Mr. Model (2013), Formula for the number of necessary rounds, available at: http://www.chesscafe.com/text/geurt125.pdf
47. Motoki, T. (1982), “On the average-case complexity of selecting the k th best”, Inf. Proc. Letters, Vol. 15, No. 5, pp. 214–219.
48. Newman, M., Barabási, A. L., Watts, D. J. (2006), The Structure and Dynamics of Networks, Princeton University Press, 592 p.
49. Nijenhuis, A., Wilf, H. S. (1978), Combinatorial Algorithms for Computers and Calculators, Academic Press, New York, 302 p.
50. Ólafsson, S. (1990), “Weighted matchings in chess tournaments”, J. Oper. Res. Soc., Vol. 41, No. 1, pp. 17–24.
51. Peczarski, M. (2004), “New results in minimum-comparison sorting”, Algorithmica, Vol. 40, pp. 133–145.
52. Peczarski, M. (2007), “The Ford-Johnson algorithm still unbeaten for less than 47 elements”, Inf. Proc., Vol. 101, No. 3, pp. 126–128.
53. Peczarski, M. (2012), “Towards optimal sorting of 16 elements”,


75

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = 54. 55. 56. 57. 58.

59. 60. 61. 62. 63. 64. 65.

66. 67. 68. 69. 70. 71.

Acta Univ. Sapientiae, Inform., Vol. 4, No. 2, pp. 215–224. ⇒ Pohl, I. (1972), “A sorting problem and its complexity”, Comm. ACM, Vol. 15, No. 6, pp. .462–464 ⇒ 61 Ramanan, P. V., Hyafil, L. (1984), “New algorithms for selection”, J. Alg., Vol. 5, pp. 557–578. ⇒ 61 Reischuk R. (1985), “Probabilistic parallel algorithms for sorting and selection”, SIAM J. Comput., Vol. 14, No. 2, pp. 396–409. ⇒ Schönhage, A., Pattinson M., Pippenger, N. (1976), “Finding the median”, J. Computer System Sci., Vol. 13, pp. 184–199. ⇒ 61 Schreier J. (1932), “On the tournament elimination systems [O systemach eliminacji w turniejach]”, Mathesis Polska, Vol. 7, pp. 154–160. ⇒ 61 Sensei’s Library (2015), Swiss pairing, available at: http://senseis.xmp.net/?SwissPairing ⇒ 65 Sensei’s Library (2015), Tie breakers, available at: http://senseis.xmp.net/?Tiebreaker ⇒ 65 Slupecki, L. (1951), “On the systems of tournaments”, Colloquium Math., Vol. 2, No. 4, pp. 286–290. ⇒ 61 Steinhaus, H. (1950), Mathematical Snapshots, Oxford University Press, New York, 266 p, ⇒ 60 Swiss Perfect Library (2015), Swiss Perfect 2.4. for DOS, available at: http://www.swissperfect.com/download.htm ⇒ 65 Swiss Perfect Library (2015), Tie-Breaks in Swiss Tournaments, available at: http://www.swissperfect.com/tiebreak.htm ⇒ 65 Ting, H. F., Yao, A. C. (1994), “A randomized algorithm for finding maximum with o((lg n)2 ) polynomial tests”, Inform. Process. Lett., Vol. 49, pp. 39–43. ⇒ 62 UEFA (2015), European Football Championship 2012, available at: http://www.uefa.com/uefaeuro/ ⇒ 59 ´ (1983), “On the largest and smallest elements”, Ann. Varecza, A. Univ. Sci. Budapest. Sect. Comput., Vol. 4, pp. 3–10. ⇒ 61 Wikipedia (2015), Swiss-system tournament, available at: http://en.wikipedia.org/wiki/Swiss-system tournament ⇒ 65 Wikipedia (2015), Pascal’s triangle, available at: http://en.wikipedia.org/wiki/Pascal%27s triangle ⇒ 63 Yao, A. C. (1989), “Selecting the k largest with median tests”, Algorithmica, Vol. 4, pp. 293–300. 
⇒ 62 Yap, C. K. (1976), “New upper bounds for selection”, Comm. ACM, Vol. 19, No. 9, pp. 501–508. ⇒ 61

76


ABOUT THE AUTHORS

Rajankumar S. BICHKAR – doctor of philosophy (computer science), professor of the Electronics Department, College of Engineering and Management, Wagholi (Pune, Maharashtra, India).

Mary BORDERIES – graduate student of the Polytechnical Institute of Toulouse INP-ENSEEIHT (Toulouse, France).

Dinesh Bh. HANCHATE – assistant professor of the Department of Computer Engineering, College of Engineering, Baramati (Pune, Maharashtra, India).

Antal Miklós IVÁNYI – doctor of physical and mathematical sciences, professor of the University of Budapest (Eötvös Loránd Tudományegyetem, Budapest, Hungary).

Zoltán KÁSA – doctor of physical and mathematical sciences, professor of Sapientia Hungarian University of Transylvania (Magyar Tudományos Académia, Târgu Mureş, Romania).

Patrick LEHODEY – doctor of philosophy, senior scientist at Collecte Localisation Satellites (Toulouse, France).

Boris F. MELNIKOV – doctor of physical and mathematical sciences, professor, head of the Department “Applied Mathematics and Informatics”, Togliatti branch of Samara State University (Togliatti, Russia).

Inna N. SENINA – candidate of physical and mathematical sciences, doctor of philosophy, researcher at Collecte Localisation Satellites (Toulouse, France), doctoral student of Southern Federal University (Rostov-on-Don, Russia).