Parallel Multiple Sequence Alignment with Decentralized Cache Support

Denis Trystram 1, Jaroslaw Zola 1,2

1 Laboratoire ID–IMAG, Grenoble, France
2 Institute of Computer & Information Sciences, Czestochowa University of Technology, Poland
email: [email protected]

Abstract. In this paper we present a new method for aligning large sets of biological sequences. The method performs a sequence alignment in parallel and uses a decentralized cache to store intermediate results. The method allows alignments to be recomputed efficiently when new sequences are added or when alignments of different precisions are requested. Our method can be used to solve important biological problems like the adaptive update of a complete evolution tree when new sequences are added (without recomputing the whole tree). To validate the method, some experiments were performed using up to 512 Small Subunit Ribosomal RNA sequences, which were analyzed with different levels of precision.

1 Introduction

Multiple sequence alignment (MSA) is one of the most commonly studied problems in computational biology. It is a general technique utilized in biological sequence analysis such as structure modeling, function prediction or phylogenetic analysis [1]. Unfortunately, finding an accurate multiple alignment is a hard optimization problem. Firstly, it is difficult to provide a formalization which is satisfactory from the biological viewpoint. Secondly, having a good model usually means it is algorithmically very hard to produce the best (or optimal) alignment. Indeed, the Generalized Tree Alignment Problem (GTA), which has been shown to be the most accurate formalization of MSA, is Max–SNP–Hard [2]. Another factor making MSA a complex problem is the size of the analyzed data. Often, an input dataset contains hundreds of long sequences (e.g. longer than 1000 bp). This is especially true for biological sequence databases [3–5]. For example, in June 2004, the Hovergen database contained 312987 aligned nucleic sequences classified into 32820 families [3]. While databases like Hovergen are very useful in molecular phylogenetic studies, they require an enormous number of computations when updated. Adding new sequences to an already aligned family of sequences usually requires the entire alignment to be recomputed from the beginning.

The work of Jaroslaw Zola has been supported by the French Government. The Laboratoire ID–IMAG is funded by CNRS, INRIA, INPG and UJF. A base pair (bp) is the basic unit used to express sequence length; one bp corresponds to one character.

In previous contributions [6, 7] we have reported on two heuristics aimed at solving the GTA problem. In this paper, we extend these approaches. We analyze typical application cases of our alignment procedure (called PhylTree) and we characterize the sources of reference locality in the alignment computations. Furthermore, we propose a parallel server designed to build alignments of large sequence sets. The server utilizes a decentralized cache to store partial alignments so that they can be reused when a new alignment is requested (e.g. one extended with new sequences, or one computed with higher precision). The remainder of this paper is organized as follows. Section 2 presents a brief analysis of possible applications of the PhylTree method. Section 3 gives an overview of our server with decentralized cache. Section 4 provides some experimental results with actual biological data. Finally, possible extensions of this work are discussed in Section 5.

2 A new method for Multiple Sequence Alignment

2.1 The PhylTree method

The PhylTree method is a generic multiple sequence alignment procedure which has been proposed recently [6]. As it has been presented in detail in our previous papers, we provide here only its basic concepts without technical details. The PhylTree method was designed to build a multiple alignment and the corresponding phylogenetic tree simultaneously. It can be characterized as an iterative clustering method whose principle is to group closely related sequences [8]. It consists of two successive phases: first it generates a distance matrix for all input sequences (based on the all-to-all pairwise alignments); then it searches for optimal partial solutions, which are later combined to obtain the final phylogenetic tree and multiple alignment. The method relies on two notions: neighbourhood and cluster. A neighbourhood is a set of k closely related taxa, where k is an input parameter of the PhylTree procedure and is a constant integer value (typically 4 ≤ k ≤ 10). A cluster is a group of m ≤ k taxa which forms a part of the final phylogeny. To determine the clusters, the PhylTree algorithm generates a neighbourhood for every input sequence; then it finds the best phylogenetic tree for each neighbourhood; finally, it analyzes all the found trees to extract their common subtrees. These subtrees describe sets of highly related taxa and correspond to clusters. The above process is iterated. The accuracy of the method depends mostly on the parameter k. Increasing k widens the search space analyzed to find the clusters. Unfortunately, while this should improve the accuracy of the final solution, it also increases the number of computations required to process a single neighbourhood. Basically, to process a neighbourhood we have to compute all possible (2k−2)!/(2^(k−1)·(k−1)!) trees to find the optimal one.

A phylogenetic tree, or phylogeny, represents the evolutionary history of a set of species; in our considerations it is a rooted binary tree whose leaves are the input sequences. A taxon is an individual, a strain, a species or any other unit of classification; in our case, a taxon is a node of the tree.
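To give a feeling for how fast this search space grows, the short Python sketch below (written for illustration; it is not part of the PhylTree code) evaluates the above formula for k = 4, ..., 10.

    from math import factorial

    def rooted_binary_topologies(k):
        # Number of rooted binary tree topologies with k labelled leaves:
        # (2k-2)! / (2^(k-1) * (k-1)!), which equals (2k-3)!! for k >= 2.
        return factorial(2 * k - 2) // (2 ** (k - 1) * factorial(k - 1))

    for k in range(4, 11):
        print(k, rooted_binary_topologies(k))

Already for k = 7 more than 10,000 topologies have to be scored per neighbourhood, which explains why k is kept small in practice.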

An important property of PhylTree is its genericity, which means that the method can use different alignment algorithms and different scoring functions. Moreover, it is possible to use various definitions of a neighbourhood and neighbourhoods of variable size (a variant called QuickTree has been proposed as a way of rapidly solving large instances at the price of decreased accuracy).

2.2 PhylTree as a generic scheme

As mentioned before, PhylTree has been designed to solve the GTA problem. This problem is a formalization of the multiple sequence alignment based on Steiner trees [2]. In general, given a set of sequences S we want to determine an optimal phylogenetic tree T and the corresponding alignment Al. This basic use of PhylTree is referred to as a single execution. In this paper we propose an approach based on decentralized cache support to implement PhylTree efficiently. It allows many variants of the problem, which are of interest to biologists, to be handled easily.

One single execution of PhylTree can be formalized as follows: for a given tuple (S, F, aF, k) find a relevant pair (T, Al), where S is the set of the sequences to align, F describes the alignment method with a scoring function, aF is a set of scoring function arguments, and k is a tuning parameter for the precision. For example, if F denotes Sankoff's Parsimony with a linear affine gap insertion cost, then aF represents the costs of gap opening and gap continuation. Such a formulation can be used to describe extensions of the single execution problem.

It is very common for a set of sequences with an already computed alignment to be extended with new elements. This is especially true for genomic sequence databases with periodic updates. Every time new sequences are added, a single execution is performed to obtain an alignment and its corresponding phylogenetic tree for the extended set. This example can be expressed as follows: for a given (F, aF, k) and {S0, S1, ..., Sl} such that Si ⊂ Sj for i < j, find {(T0, Al0), (T1, Al1), ..., (Tl, All)} (sometimes the relation between sequence sets can be more general: Si ∩ Sj ≠ ∅ for i ≠ j). Surprisingly, while this situation is very common, there are no good solutions able to determine new alignments and new phylogenetic trees based on the previous results.

Another interesting extension is to build alignments with a different level of precision. A series of single executions is performed for the same set of input sequences, differing only by the parameter k. That is: for a given (S, F, aF) and {k0, k1, ..., kl} generate {(T0, Al0), (T1, Al1), ..., (Tl, All)}. Of course, it is possible that in some cases the parameter k will be changed together with the set of input sequences. More precisely, when expanding a set of previously aligned sequences we may wish to change the precision of the new alignment.

In our considerations we assume that all single executions share the same alignment procedure and the same scoring function arguments. While this is a significant simplification (e.g. aF usually describes continuous arguments), it is reliable for most applications. In practice, we assume one evolutionary model when analyzing a given group of sequences.
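To make this formalization concrete, the sketch below models a request as the tuple (S, F, aF, k); it is only an illustration with hypothetical names (Request, the toy sequences) and does not reflect the actual server interface. A series of single executions is then simply a list of such requests sharing F and aF.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Request:
        # A single execution: for the tuple (S, F, aF, k) find a pair (T, Al).
        S: List[str]            # sequences to align
        F: str                  # alignment method / scoring function identifier
        aF: Dict[str, float]    # scoring function arguments
        k: int                  # precision (neighbourhood size)

    base = ["ACGT", "ACGA", "ACCT", "AGGT"]
    series = [
        Request(S=base,            F="sankoff", aF={"gap_open": 2.0, "gap_ext": 1.0}, k=4),
        Request(S=base + ["ACGG"], F="sankoff", aF={"gap_open": 2.0, "gap_ext": 1.0}, k=4),
        Request(S=base + ["ACGG"], F="sankoff", aF={"gap_open": 2.0, "gap_ext": 1.0}, k=5),
    ]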

2.3 Related work

Both multiple sequence alignment and phylogeny reconstruction are of great importance to biologists, but at the same time these problems are very computationally demanding. That is why parallel and distributed programming is often used to improve the efficiency of existing bioinformatics applications [9]. One of the most popular programs is the ClustalW package [8]. This tool implements basic MSA algorithms based on the phylogenetic tree approximation. In recent years a few different parallel versions of ClustalW have been proposed [10–12], designed for both shared and distributed memory architectures. However, in most cases, the parallel approach is limited to the main ClustalW algorithm. While these approaches have proved to be efficient for the single execution case, they make no assumption about possible dependencies between series of executions. Moreover, the accuracy of the solutions generated by ClustalW is usually poor for large amounts of input data. In [13] Catalyurek et al. proposed an implementation of ClustalW based on caching alignment scores. This work, however, is limited to the sequential version of the algorithm, and the cache is only utilized to store the scores of the pairwise alignments.

A significant part of the research on parallel bioinformatics is concerned with maximum likelihood methods [1]. However, this class of algorithms is designed to reconstruct the evolutionary history (phylogenetic tree), and the sequence alignment is only a tool required to build a proper tree. A good example of such software is the RAxML–III package [14]. Recently, the authors of RAxML have reported a phylogeny inference of 10000 taxa [15]. While this result is really impressive, it cannot be directly compared with the results of multiple alignment, since the problems are slightly different.

2.4 Alignment reference locality

The basic idea of the PhylTree method is a greedy exploration of the partial alignments search space. The exploration is performed through the analysis of the neighbourhoods. A set of multiple alignments is computed for every neighbourhood. If two neighbourhoods share two or more common elements, their analysis will require some common computations. Obviously, the alignments computed for the same elements of the first and second neighbourhood will be the same. Furthermore, the analysis of a single neighbourhood of size k requires k!/(2·(k−2)!) = k(k−1)/2 distinct pairwise alignments to be computed. On the other hand, all possible pairwise alignments are generated in the first phase of the method. Consequently, the alignments computed in the first phase of the PhylTree processing can be used in the second phase, and the results computed during the second phase can be reused from one iteration to the next, or even within the same iteration. In fact, our experiments showed that in some cases only 20% of the computations actually have to be performed, as the others are redundant.

The properties described above relate to a single execution of PhylTree. Of course, the same features hold for the cases described in Section 2.2. If we consider a series of single executions, then all requests (executions) have very similar characteristics, e.g. the same evolutionary model and common (or the same) input dataset. Therefore, the intermediate results generated by one query in the series have the potential to be reused by other requests. For example, if a series of single executions is based on the same set of input sequences but different values of k are requested, then all pairwise alignments computed during the first execution can be reused in all subsequent executions. Hence, a distance matrix for the whole group of requests needs to be computed only once.
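The kind of reuse described above can be mimicked with a simple memoization scheme keyed on the aligned sequences and the scoring parameters. The sketch below is our simplification: the "alignment" is a dummy placeholder, and the snippet merely counts how many pairwise alignments are computed versus served from the cache when two overlapping neighbourhoods of size k = 4 are processed.

    from itertools import combinations

    cache = {}                              # canonical key -> previously computed result
    stats = {"computed": 0, "reused": 0}

    def pairwise_align(a, b, gap_open=2.0, gap_ext=1.0):
        # Dummy stand-in for a real pairwise alignment; only the caching pattern
        # matters here, not the returned "score".
        key = (min(a, b), max(a, b), gap_open, gap_ext)
        if key in cache:
            stats["reused"] += 1
            return cache[key]
        stats["computed"] += 1
        cache[key] = len(a) + len(b)        # placeholder score
        return cache[key]

    # Two overlapping neighbourhoods of size k = 4 that share three sequences.
    for neighbourhood in (["s1", "s2", "s3", "s4"], ["s2", "s3", "s4", "s5"]):
        for a, b in combinations(neighbourhood, 2):
            pairwise_align(a, b)

    print(stats)                            # {'computed': 9, 'reused': 3}

Here 3 of the 12 requested pairwise alignments are served from the cache, since the two neighbourhoods share three sequences.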

3 Parallel server and cache support

The PhylTree method has been used to build the PhyloBuilder server. The server allows a series of single executions, submitted by one or several users, to be run. Our server was designed to work under the control of a batch queuing system, for example SGE or PBS. Users submit their computation (request) via some kind of interface, for example a web page, and for every request a single execution script is generated. This script is submitted to the dedicated scheduler queue. Next, each request is executed using our parallel server, and a persistent cache is used to store partial results. User requests are dynamic, which means that we are not able to predict which parameters or what kind of input will be used in a request.

3.1 Parallel PhylTree

The PhylTree method provides good quality results at the expense of being time consuming for most real-life applications. Even if caching is applied, the number of computations to perform remains large. At the same time, however, the PhylTree design makes it easy to parallelize. We have chosen a distributed master-worker architecture with an arbitrarily selected master node. This choice was based on the following observations:

– The first phase of PhylTree is an independent task consisting of (n^2 − n)/2 pairwise alignments. As a result, it is easy to parallelize. However, the parallelisation should support heterogeneous architectures: firstly, because two different pairwise alignments may require different computation times (depending on the lengths of the input sequences); secondly, because caching may change the actual number of computations which have to be performed by a given worker.
– The second phase of PhylTree is an iterative process. At each iteration, a set of neighbourhoods is processed, and a single iteration is completed by the distance matrix update. Thus, each iteration contains a single synchronization point. Moreover, the number of neighbourhoods per iteration is typically close to, or less than, the number of available workers. Hence, the assignment of neighbourhoods to workers may result in large idle times at the end of each iteration.

Summarizing, the parallel PhylTree method proceeds as follows. First, the master processor reads the input sequences and broadcasts them to all workers. Next, each worker receives a part of the distance matrix to compute. At this stage, we use the guided self-scheduling strategy [16], since it allows the load imbalance imposed by the heterogeneous environment to be minimised (see the sketch below). While processing the distance matrix, each worker measures its efficiency as the number of base pairs aligned per second. Thanks to this measurement, the master node can rank all the workers according to their efficiency.
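Guided self-scheduling hands each idle worker a chunk whose size is roughly the remaining work divided by the number of workers, so chunks shrink as the computation progresses and late stragglers receive little work. The following sketch illustrates the chunk sizes for the (n^2 − n)/2 pairwise alignment tasks of the first phase; it is a simplified illustration, not the server's actual scheduling code.

    def guided_chunks(num_tasks, num_workers, min_chunk=1):
        # Guided self-scheduling: each chunk is ceil(remaining / num_workers),
        # bounded below by min_chunk and above by what is left.
        remaining = num_tasks
        while remaining > 0:
            chunk = min(remaining, max(min_chunk, -(-remaining // num_workers)))
            yield chunk
            remaining -= chunk

    n = 64                                  # number of input sequences
    tasks = (n * n - n) // 2                # pairwise alignments in the first phase
    print(list(guided_chunks(tasks, num_workers=7)))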

In the second phase, only the processing of the neighbourhoods is parallelized. That is, at each iteration the master determines the neighbourhood sets and then starts to generate all possible tree topologies for each neighbourhood. Each worker receives its part of the generated topologies and looks for the one with the highest alignment score. To achieve this objective, it has to compute an MSA for every single topology. Because the number of created tree topologies for all neighbourhoods is usually much greater than the number of workers, dynamic scheduling is used again. However, at this stage, the host priorities computed in the first phase are utilized: half of the total number of topologies is distributed proportionally to the workers' priorities, and the other half is distributed using guided self-scheduling. When a worker completes its part of the computations, it sends back the resulting best tree topology and the corresponding alignment to the master. To complete an iteration, the master computes the clusters and updates the distance matrix accordingly.

3.2 Cache of alignments

In order to remove redundant computations, and so improve the efficiency of PhylTree, we have designed and implemented an alignment cache. The purpose of the alignment cache is to store and manage all intermediate alignments so that they can be reused in future computations. An important issue here is to design the caching system in such a way that the cache management cost does not offset the performance improvement obtained from using cached results. Another concern is how to store the alignment results. Cached data should contain all the information required when it is reused, for example a description of the underlying tree topology. This requirement is a direct consequence of the PhylTree genericity: some of the alignment procedures may use various auxiliary structures (e.g. an alignment profile). Finally, since we are dealing with a parallel version of PhylTree, the cache system must be able to work in a distributed environment.

To meet the requirements described above, we have implemented a decentralized caching system based on the CaLi framework [17]. Our solution consists of two subsystems which differ in their management policies and store different types of alignments. The first subsystem is dedicated to caching pairwise alignments. It is managed using the well-known LRU policy and is replicated among all the workers when the first phase of PhylTree is completed. The second subsystem is responsible for storing only multiple alignments. It is managed using a variant of the Greedy-Dual Size (GDS) policy [18] (a minimal sketch is given at the end of this subsection) and is distributed among the worker nodes.

There are several reasons why we distinguish between pairwise and multiple alignments. Pairwise alignments are the most frequently computed alignments in every execution (see Fig. 1). The time required to compute a pairwise alignment is much shorter than the time required to compute a multiple alignment. Finally, all pairwise alignments computed in the first phase of PhylTree will be reused in the second phase. On the other hand, the multiple alignment computations take place only in the second phase and we are not able to predict a priori which of them will be requested most often. In addition, a multiple alignment's popularity decreases with its size, but its cost (time to compute) increases. Unfortunately, because PhylTree may use various alignment procedures, it is not possible to describe the exact dependencies between the size, cost and popularity of alignments.
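For completeness, a minimal sketch of a Greedy-Dual Size cache follows: every entry carries a priority equal to an inflation value L plus its cost divided by its size, the entry with the smallest priority is evicted, and L is raised to that priority, so cheap or rarely re-referenced entries age out first. This is our illustration of the policy described in [18], not the CaLi implementation.

    class GDSCache:
        # Minimal Greedy-Dual Size cache: priority = L + cost / size.

        def __init__(self, capacity):
            self.capacity = capacity     # total size budget (e.g. bytes)
            self.used = 0
            self.L = 0.0                 # inflation value
            self.entries = {}            # key -> (priority, cost, size, value)

        def get(self, key):
            entry = self.entries.get(key)
            if entry is None:
                return None
            _, cost, size, value = entry
            # Refresh the priority on a hit so popular entries stay cached.
            self.entries[key] = (self.L + cost / size, cost, size, value)
            return value

        def put(self, key, value, cost, size):
            while self.used + size > self.capacity and self.entries:
                victim = min(self.entries, key=lambda k: self.entries[k][0])
                self.L = self.entries[victim][0]      # inflate the clock
                self.used -= self.entries[victim][2]
                del self.entries[victim]
            if size <= self.capacity:
                self.entries[key] = (self.L + cost / size, cost, size, value)
                self.used += size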

[Figure 1: bar chart of the percentage of requests (0–30%) plotted against alignment size (2–16) for k = 4, 5, 6 and 7.]
Fig. 1. Percentage of requests for an alignment of given size in a single execution. The same input data set was used for each execution.

Figure 1 shows the percentage of requests for an alignment of a given size in single executions with various levels of precision. As can be seen, in every execution nearly 30% of the requests are related to pairwise alignments. This can be explained by the fact that PhylTree creates only a few new sequences (clusters) per iteration; as a result, in almost every iteration some pairwise alignments are requested. Of course, this general tendency may differ slightly depending on the size and type of the input dataset.

3.3 Cache implementation

As we have already pointed out, our caching system utilizes two distinct subsystems. While these subsystems are based on different management strategies, they use similar techniques to create and describe the entries stored in the cache. A single alignment inserted into the cache is compressed and written to a binary file. Such an entry is identified by a unique key, generated as follows: first, the sequences to be aligned and the tree topology are digested using a hashing function, for example SHA-256; then, the resulting fingerprint is extended with the identifier of the alignment method F and its parameters aF. Each time a worker is requested to compute an alignment, it first generates the key and then queries the proper cache (depending on the number of input sequences). If a hit occurs, the requested alignment is read and the identifiers of the component sequences are compared with the identifiers of the input sequences. This mechanism allows us to detect possible collisions of keys. If no collision is found, the alignment loaded from the cache is used; otherwise, that is in the case of a collision or a cache miss, the alignment is computed and its result is inserted into the cache.

The pairwise alignment cache is a local storage system replicated among the workers. Because the number and the size of pairwise alignments are small compared to the total size of data generated during a single execution, the time of replication is not significant compared to the total processing time.

Replication guarantees that all the pairwise alignments will be served by the local cache in the second phase. The pairwise alignment cache is managed using the LRU policy. As a result, each pairwise alignment computed in the first phase remains in the cache during the second phase. Of course, we assume here that the capacity of the pairwise alignment cache is large enough to store the alignments generated by at least one single execution.

The management of the multiple alignment cache is a more challenging task. Because of the distribution of the computations, the same alignment may be requested or computed by different workers simultaneously. Additionally, an alignment computed by a given worker in one iteration may be requested by another worker during another iteration. Therefore, we have implemented the multiple alignment cache as a decentralized, content-addressable system [19]. Every key describing a cache record is mapped to one of the workers, which becomes the delegate worker for the given key. When a worker requests a multiple alignment, the caching system first checks the local storage. If no proper entry is found, the request is forwarded to the delegate worker cache. If a miss occurs, the worker computes the requested alignment and then inserts the result into the local cache and into the delegate worker cache. If a remote hit occurs, the requested entry is inserted into the local storage. In this way the multiple alignment cache is partially replicated, which in turn increases the number of local hits, while the application of a good hashing function guarantees a uniform distribution of the cache entries. The application of content addressing has a very important advantage: if the requested alignment is not cached locally, it means that it has not been used by the given worker yet. At the same time, this alignment could have been computed by another worker and already be present in the delegate worker cache. Hence, in the worst case only one remote request is necessary to check whether a given alignment has to be computed or not.
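A possible realization of the key generation and of the delegate-worker mapping is sketched below; the function names and the Newick-style topology string are assumptions made for illustration, not the actual CaLi interface.

    import hashlib

    def alignment_key(sequences, topology, method_id, method_args):
        # Digest the sequences and the tree topology, then extend the resulting
        # fingerprint with the alignment method identifier and its parameters.
        h = hashlib.sha256()
        for seq in sorted(sequences):
            h.update(seq.encode())
        h.update(topology.encode())          # e.g. a Newick-style string
        fingerprint = h.hexdigest()
        args = ",".join(f"{name}={value}" for name, value in sorted(method_args.items()))
        return f"{fingerprint}:{method_id}:{args}"

    def delegate_worker(key, num_workers):
        # Content addressing: map the key to the worker acting as its delegate.
        return int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_workers

    key = alignment_key(["ACGT", "ACGA", "ACCT"], "((1,2),3);", "sankoff",
                        {"gap_open": 2.0, "gap_ext": 1.0})
    print(key, "-> worker", delegate_worker(key, num_workers=7))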

4 Performance evaluation

We performed a set of experiments with actual biological data to validate our approach. We randomly created several groups of between 32 and 512 sequences coming from SSU ribosomal RNA [4]. The average length of the analyzed sequences was 1300 bp, varying from 1200 bp to 1400 bp. These sequences were then analyzed using our parallel server. Our experiments were performed on a small cluster of 7 SMP nodes connected by a Gigabit Ethernet network. Every node was equipped with two Itanium2 CPUs (one CPU used by the server and one by the OS) and 4 GB of RAM, and was running Linux. A single node could use 32 MB of storage for the pairwise alignment cache and 128 MB for the multiple alignment cache. The SCSI disk storage was managed by a ReiserFS file system, which allows many large files to be handled efficiently.

4.1 Experiments

In the first experiment, we generated a set of requests to compute MSA with different levels of precision for |S| = 64 sequences. We started our simulation with an empty cache, submitting the first request with parameter k0 = 4. For each further request, the value of k was increased by 1. Table 1 shows the results of request processing for different values of k.

In the next experiment we examined the efficiency of our approach when the set of input sequences is extended in every execution. In the first request we used a set of |S0| = 32 sequences, and then we doubled the number of sequences in each request: Si = Si−1 ∪ Sx, where Sx is a group of new input sequences of the same cardinality. In every request we used k = 5 as the level of precision. Table 2 presents the time of request processing depending on the size of the input dataset. In both experiments we utilized Sankoff's Parsimony with the gap opening cost equal to 2.0 and the gap continuation cost equal to 1.0.

k    Tp [s]    Tc [s]    Hrp    Hrm    Crp     Crm    Cs     E = Tp/Tc
4    3940      2155      0.42   0.28   0.085   0.36   0.45   1.82
5    18330     8591      1.0    0.31   0.096   0.43   0.53   2.13
6    178517    74092     1.0    0.37   0.08    0.50   0.58   2.41

Table 1. Results of request processing for different values of k, where Tp is the execution time for the parallel server and Tc is the execution time for the parallel server with cache support. Hrp and Hrm are, respectively, the hit ratios for the pairwise and the multiple alignment cache. Crp and Crm describe the cost hit ratios for the pairwise and the multiple alignment cache, Cs = 1 − Tc/Tp is the cost saving ratio for the whole cache system, and E = Tp/Tc is the resulting speed-up.

|S|    Tp [s]    Tc [s]    Hrp    Hrm    Crp     Crm    Cs     E = Tp/Tc
32     6747      3430      0.87   0.28   0.099   0.39   0.49   1.96
64     18182     9123      0.82   0.29   0.079   0.41   0.49   1.99
128    55300     24796     0.74   0.34   0.066   0.48   0.55   2.23
256    156831    63631     0.63   0.34   0.067   0.53   0.59   2.46
512    438539    144090    0.43   0.45   0.048   0.63   0.67   3.04

Table 2. Results of request processing for different sizes of the input dataset S.
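For reference, the ratios reported in Tables 1 and 2 can be reproduced from raw timings as in the following sketch; the per-alignment hit and miss costs used in the example call are made-up values, and only Tp and Tc are taken from Table 1.

    def cache_metrics(Tp, Tc, hit_costs, miss_costs):
        # hit_costs  -- compute times of alignments that were served from the cache
        # miss_costs -- compute times of alignments that actually had to be computed
        Hr = len(hit_costs) / (len(hit_costs) + len(miss_costs))   # hit ratio
        Cr = sum(hit_costs) / (sum(hit_costs) + sum(miss_costs))   # cost hit ratio
        Cs = 1.0 - Tc / Tp                                         # cost saving ratio
        E = Tp / Tc                                                # speed-up due to caching
        return Hr, Cr, Cs, E

    print(cache_metrics(Tp=3940, Tc=2155, hit_costs=[1.2, 0.8, 2.5], miss_costs=[3.0, 4.1]))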

4.2 Discussion

As we could expect, the caching technique noticeably improved the performance of the server in both of the described experiments. Table 1 shows that the hit ratio Hr and the cost hit ratio Cr (defined as the cost of the alignments found in the cache divided by the total cost of the alignments requested during the execution, where the cost of an alignment is the time required to compute it) increase with every execution. This can be explained by the fact that the results of previous computations are utilized, and an increase in the size of the neighbourhoods results in a higher redundancy of computations. The cost saving ratio Cs is dominated by the cost hit ratio of the multiple alignment cache. This confirms our claim that it is more profitable to cache multiple alignments. The cost saving generated by the pairwise alignment cache is smaller than we expected; however, it is still significant. Similar tendencies can be observed in the second experiment (Tab. 2),

except that the pairwise alignment cache hit ratio decreases. This is because of the geometric increase in the input dataset size, which in turn increases the relative number of multiple sequence alignment computations. The cache hit ratio for the multiple sequence alignment cache stabilizes around 35% for the input of 256 sequences. At this point, the cache is saturated and the cache replacement policy comes into play. In spite of this, the cost saving ratio, as well as the cache hit ratio, increases. This is possible thanks to the application of a cost-aware replacement policy, that is, GDS. In fact, our trace-driven simulations have shown that GDS attains the highest hit ratio and cost saving ratio in comparison with other strategies, like LRU, LFU or other size-based policies.

The presented results show that the efficiency of the server depends on the size of the input data and the required precision. More precisely, the cost saving ratio will be minor for short sequences, and analysis with a small k will result in a low cache hit ratio. The same factors influence the scalability of our system: increasing the number of workers should allow larger problems to be solved with better precision rather than achieving better performance for small data. In our experiments we did not compare the parallel and sequential versions of the server. This is because the sequential version is too memory consuming, e.g. for k > 5 the alignment of even a few sequences becomes an out-of-core problem. For the same reason only one server process is executed on each SMP node.

5 Conclusions

In this work we presented a new cache-based approach to solving the parallel multiple sequence alignment problem. We conducted a formal analysis and we verified by experiments that the application of decentralized caching can substantially improve the efficiency of the alignment computations in both a single execution and a series of single executions. We believe that our approach can be combined with other existing MSA software such as, for example, ClustalW or 3D-Coffee.

In our considerations we assumed that all requests share the same alignment procedure. The problem of how to use the results of previous computations when an alignment with a different evolution model is requested remains open. In particular, is it possible to exclude some of the tree topologies during neighbourhood processing, knowing that they were of little value when analyzed with different parameters? Both our parallel server and detailed MSA results can be accessed on-line via the https://hal.icis.pcz.pl/PhyloServer web page.

References

1. Holder, M., Lewis, P.O.: Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics 4 (2003) 275–284
2. Jiang, T., Lawler, E.L., Wang, L.: Aligning sequences via an evolutionary tree: complexity and approximation. In: ACM Symp. on Theory of Computing. (1994) 760–769
3. Duret, L., Mouchiroud, D., Gouy, M.: HOVERGEN, a database of homologous vertebrate genes. Nucleic Acids Res. 22 (1994) 2360–2365
4. Wuyts, J., Van de Peer, Y., Winkelmans, T., De Wachter, R.: The European database on small subunit ribosomal RNA. Nucleic Acids Res. 30 (2002) 183–185

5. Cole, J.R., et al.: The Ribosomal Database Project (RDP–II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res. 31 (2003) 442–443
6. Guinand, F., Parmentier, G., Trystram, D.: Integration of multiple alignment and phylogeny reconstruction. In: Eur. Conf. on Comp. Biology, Poster Abstr. (2002)
7. Parmentier, G., Trystram, D., Zola, J.: Cache-based parallelization of multiple sequence alignment problem. In: Proc. of Euro-Par '04. (2004) 1005–1012
8. Higgins, D., Thompson, J., Gibson, T.: CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22 (1994) 4673–4680
9. Yap, T.K., Frieder, O., Martino, R.L.: Parallel computation in biological sequence analysis. IEEE Trans. on Par. and Dist. Proc. 9 (1998) 283–294
10. Li, K.B.: ClustalW–MPI: ClustalW analysis using distributed and parallel computing. Bioinformatics 19 (2003) 1585–1586
11. Ebedes, J., Datta, A.: Multiple sequence alignment in parallel on a workstation cluster. Bioinformatics 20 (2004) 1193–1195
12. Mikhailov, D., Cofer, H., Gomperts, R.: Performance optimization of ClustalW: Parallel ClustalW, HT Clustal, and MULTICLUSTAL. http://www.sgi.com/industries/sciences/chembio/resources/clustalw (2005)
13. Catalyurek, U., Ferreira, R., Kurc, T., Saltz, J.: Improving performance of multiple sequence alignment analysis in multi-client environments. In: Proc. of HiCOMB '02. (2002)
14. Stamatakis, A., Ludwig, T., Meier, H.: RAxML–III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 1 (2004) 1–8
15. Stamatakis, A., Ludwig, T., Meier, H.: Parallel inference of a 10.000-taxon phylogeny with maximum likelihood. In: Proc. of Euro-Par '04. (2004) 997–1004
16. Polychronopoulos, C.D., Kuck, D.J.: Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Trans. on Computers 36 (1987) 1425–1439
17. Zola, J.: CaLi – generic computational buffers library. http://icis.pcz.pl/~zola/CaLi (2005)
18. Balamsh, A., Krunz, M.: An overview of web caching replacement algorithms. IEEE Comm. Surv. & Tutor. 6 (2004) 44–56
19. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Schenker, S.: A scalable content-addressable network. In: Proc. of SIGCOMM '01. (2001) 161–172