RAxML: A Parallel Program for Phylogenetic Tree Inference

Alexandros P. Stamatakis, Harald Meier
Technical University of Munich, Department of Computer Science

Thomas Ludwig
Ruprecht-Karls-University, Department of Computer Science

Abstract

The computation of large phylogenetic trees based on the maximum-likelihood criterion is computationally extremely expensive. We propose a new, partially randomized parallel algorithm for the reconstruction of large phylogenetic trees, which yields equally good or even better trees in less time. We provide initial results for alignments containing 150 to 1000 sequences from various sources. First results show run time improvements of at least a factor of 2 over PAxML for the experiments we conducted.

1. Randomized A(x)ccelerated Maximum Likelihood

Introduction: In previous work [8] [7] we introduced Subtree Equality Vectors (SEVs) for significantly accelerating the topology evaluation function of maximum likelihood-based phylogeny programs. We implemented SEVs in PAxML (Parallel A(x)ccelerated Maximum Likelihood), which was derived from parallel fastDNAml [9]. PAxML shows run time improvements of approximately 25% to 65% compared to parallel fastDNAml, yielding the best accelerations for large alignments (150 sequences and more) on PC processor architectures. The algorithmic optimizations of PAxML focused on obtaining exactly the same results as parallel fastDNAml in less time. The goal of current work on RAxML is to obtain trees that are better than or as good as those of PAxML in less time with a novel algorithm. We briefly describe the essential parts of the novel program and provide some initial encouraging results.

A Randomized Approach: One of the main disadvantages of the stepwise addition algorithm initially proposed in [2] and implemented with some modifications in parallel fastDNAml is that the final tree strongly depends on the input order of the sequences. Thus, it is recommended to run the program several times with different randomized sequence input order permutations at high rearrangement levels to obtain reliable results. This practice becomes prohibitive for large trees (200 sequences and more) even on supercomputers, since thorough rearrangements and an increase in the number of taxa increase run time by orders of magnitude.

To handle these problems we have designed RAxML, an MPI-based parallel program with a simple master-worker architecture. RAxML initially analyzes a large number of randomized input order permutations, which are evaluated by the worker processes without rearrangements. The master collects the tree topologies from the workers and maintains a list of good initial trees. When the specified number of randomized input order permutations has been evaluated, a fraction or several fractions of the topologies in the tree list are used for building a majority-rule consensus tree with CONSENSE [4] from the PHYLIP package, which we integrated into our code. These consensus topologies are then evaluated by the workers, since they often show a better likelihood than the original best tree in the list. In the final step the master rearranges the tree in the same way as parallel fastDNAml/PAxML.
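The master's bookkeeping described above can be illustrated with a minimal Python sketch. It is not the RAxML code: the names `random_input_orders`, `TreeList` and `majority_rule_splits` are hypothetical, a tree is represented simply as a set of splits (each split a frozenset of taxon names on one side of an internal branch), and the worker-side tree building and likelihood evaluation are not shown. RAxML itself builds the consensus with CONSENSE from PHYLIP; the function below only illustrates the majority-rule idea of keeping splits that occur in more than half of the selected trees.

```python
import heapq
import random
from collections import Counter


def random_input_orders(taxa, n_orders, seed=0):
    """Yield n_orders random permutations of the sequence input order."""
    rng = random.Random(seed)
    for _ in range(n_orders):
        order = list(taxa)
        rng.shuffle(order)
        yield order


class TreeList:
    """Bounded list of the best (highest log-likelihood) trees seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []    # min-heap: the worst retained tree sits on top
        self._count = 0    # unique tie-breaker so trees are never compared

    def offer(self, log_lh, splits):
        """Record a tree reported by a worker if it is good enough to keep."""
        item = (log_lh, self._count, splits)
        self._count += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif log_lh > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)

    def best(self, fraction=1.0):
        """Return the top fraction of retained trees, best first."""
        ranked = sorted(self._heap, reverse=True)
        keep = max(1, int(len(ranked) * fraction))
        return [(lh, splits) for lh, _, splits in ranked[:keep]]


def majority_rule_splits(trees):
    """Return the splits that occur in more than half of the given trees."""
    counts = Counter(split for splits in trees for split in splits)
    return {split for split, n in counts.items() if n > len(trees) / 2}


# Toy data: three topologies with made-up log-likelihood values.
t1 = {frozenset("AB"), frozenset("ABC")}
t2 = {frozenset("AB"), frozenset("ABD")}
t3 = {frozenset("AB"), frozenset("ABC")}

tree_list = TreeList(capacity=2)
for log_lh, tree in [(-105.0, t2), (-100.0, t1), (-101.0, t3)]:
    tree_list.offer(log_lh, tree)

retained = tree_list.best()
consensus = majority_rule_splits([t1, t2, t3])
```

In this toy run the list keeps the two trees with the highest log-likelihood, and the consensus retains the splits shared by at least two of the three topologies. Calling `best` with several different `fraction` values corresponds to building consensus trees from several fractions of the tree list.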

Test Data & Results: For testing our program we extracted alignments comprising 150, 200, 250, 500 and 1000 taxa (150 ARB, ..., 1000 ARB) from the ARB [5] small subunit ribosomal RiboNucleic Acid (ssrRNA) database. These alignments contain organisms from the three kingdoms Eucarya, Bacteria and Archaea. In addition, we used the 150 sequence data set (150 SC) from [9] and performed exactly the same exhaustive search with PAxML as specified in [9], using 10 randomized input order permutations and global as well as local rearrangements set to 5. All tests were conducted on the HELICS [3] 512-processor Linux cluster. In Table 1 we list the required CPU hours and final likelihood values of PAxML and RAxML for the respective alignments, the number of randomized trees evaluated by RAxML, the number of jumbles (J) conducted with PAxML, and the rearrangement setting (R), which is common to both programs. Finally, we provide the acceleration (acc) achieved by RAxML. Note that the 1000 taxon tree has already been calculated with the parsimony-based version of RAxML described below. Furthermore, to demonstrate the advantages of PAxML, we conducted an additional run for one jumble of the 150 SC alignment with parallel fastDNAml, which required 343.09 CPU hours, in contrast to the 165.50 CPU hours required by PAxML.

Table 1. Total amount of CPU hours and final likelihoods (Lh) for PAxML and RAxML

Data       PAxML CPU hrs   PAxML Lh      J    R   RAxML CPU hrs   RAxML Lh      # trees   acc
150 SC     1635.66         -44146.90     10   5   64.29           -44145.98     500       25.44
150 ARB    300.40          -77189.78     1    5   106.14          -77189.69     500       2.83
200 ARB    774.56          -104743.33    1    5   287.57          -104743.32    500       2.69
250 ARB    1947.18         -131468.97    1    5   481.46          -131475.52    500       4.04
1000 ARB   9898.05         -402282.08    1    1   1070.59         -401501.57    3837      9.23

In a second version of RAxML we replaced the maximum likelihood evaluation of randomized input sequence permutations by the parsimony program dnapars from PHYLIP, which uses a similar stepwise addition algorithm for tree building. We first calculate a set of trees using parsimony and then evaluate the final topologies with maximum likelihood. Since there seems to exist a relationship between parsimony and maximum likelihood [1], and parsimony is significantly faster than maximum likelihood, this change significantly reduces the time required for calculating an initial set of good trees. A comparison of randomized tree inference times is given in Table 2. The great variations in tree number and acceleration factor are due to the fact that the number of actual trees cannot be predicted, since dnapars yields an arbitrary number of equally parsimonious trees for each input order permutation.

Table 2. Amount of CPU hours and likelihoods (Lh) for the randomized phase of RAxML with maximum likelihood (ML) and parsimony

Data       ML CPU hrs   ML Lh         Parsimony CPU hrs   Parsimony Lh   # trees   acc
150 SC     24.10        -44217.74     10.80               -44219.21      1732      2.23
150 ARB    53.80        -77235.17     5.89                -77235.49      508       9.13
200 ARB    110.70       -104859.28    12.69               -104822.64     517       8.72
250 ARB    177.44       -131683.32    5.41                -131625.73     171       32.80
500 ARB    913.09       -253103.57    41.93               -253132.42     731       21.77
1000 ARB   2882.22      -402683.21    178.63              -402416.93     3837      16.14

Availability & Future Work: Like PAxML, RAxML will soon become available for download as an open source program at [6]. In addition, all alignments and tree files will become available for download in an effort to establish a first maximum-likelihood benchmark suite, which is urgently needed for comparing maximum likelihood programs. We intend to make more extensive use of the consensus tree information for accelerating the rearrangement process. This information can be used to perform local optimizations of frequently appearing subtrees. Furthermore, we are currently developing [email protected], a [email protected] version of RAxML.
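As a worked check of the acceleration column (acc) in Table 1, the short Python sketch below recomputes each factor as the ratio of PAxML CPU hours to RAxML CPU hours. The dictionary `TABLE_1` and the function `acceleration` are names chosen here for illustration; the numbers are copied from Table 1. Under this assumed definition the recomputed ratios agree with the reported values to within about 0.02.

```python
# (PAxML CPU hrs, RAxML CPU hrs, reported acc) per data set, copied from Table 1.
TABLE_1 = {
    "150 SC": (1635.66, 64.29, 25.44),
    "150 ARB": (300.40, 106.14, 2.83),
    "200 ARB": (774.56, 287.57, 2.69),
    "250 ARB": (1947.18, 481.46, 4.04),
    "1000 ARB": (9898.05, 1070.59, 9.23),
}


def acceleration(baseline_cpu_hours, new_cpu_hours):
    """Acceleration factor: baseline CPU hours divided by new CPU hours."""
    return baseline_cpu_hours / new_cpu_hours


for name, (paxml_hours, raxml_hours, reported) in TABLE_1.items():
    recomputed = acceleration(paxml_hours, raxml_hours)
    print(f"{name:9s} recomputed {recomputed:6.2f}   reported {reported:6.2f}")
```

The same ratio, applied to the maximum likelihood and parsimony CPU hours of the randomized phase, yields the acc column of Table 2.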

References

[1] R. DeBry and L. Abele. The relationship between parsimony and maximum likelihood analyses: tree scores and confidence estimates. Mol. Biol. Evol., 12:291–297, 1995.
[2] J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol., 17:368–376, 1981.
[3] HeLiCs. Heidelberg Linux cluster. Technical report, helics.uni-hd.de, 2002.
[4] L. Jermiin et al. Majority-rule consensus of phylogenetic trees obtained by maximum-likelihood analysis. Mol. Biol. Evol., 14:1297–1302, 1997.
[5] W. Ludwig et al. ARB: a software environment for sequence data. Nucl. Acids Res., 2003.
[6] ParBaum. Homepage and download. Technical report, wwwbode.in.tum.de/stamatak/research.html, 2003.
[7] A. P. Stamatakis, T. Ludwig, H. Meier, and M. J. Wolf. Accelerating parallel maximum likelihood-based phylogenetic tree computations using subtree equality vectors. In Proceedings of SC2002, November 2002.
[8] A. P. Stamatakis, T. Ludwig, H. Meier, and M. J. Wolf. AxML: A fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method. In Proceedings of CSB2002, August 2002.
[9] C. Stewart et al. Parallel implementation and performance of fastDNAml - a program for maximum likelihood phylogenetic inference. In Proceedings of SC2001, November 2001.
