

Molecular Distance Geometry Optimization Using Geometric Build-up and Evolutionary Techniques on GPU

Levente Fabry-Asztalos, István Lőrentz, and Răzvan Andonie

Abstract—We present a combination of methods addressing the molecular distance problem, implemented on a graphics processing unit. First, we use geometric build-up and depth-first graph traversal. Next, we refine the solution by simulated annealing (SA). For an exact but sparse distance matrix, the build-up method reconstructs the 3D structures with a root-mean-square error (RMSE) on the order of 0.1 Å. Small and medium structures (up to 10,000 atoms) are computed in less than 10 seconds. For the largest structures (up to 100,000 atoms), the build-up RMSE is 2.2 Å and the execution time is about 540 seconds. The performance of our approach depends largely on the graph structure. The SA step improves the accuracy of the solution at the expense of computational overhead.

Index Terms—Graph algorithms, Graphics processors, Molecular Distance Geometry, Parallel algorithms, Simulated annealing

I. INTRODUCTION

Determining the three-dimensional structures of molecules is very important in many scientific fields, especially in chemical biology and medicinal chemistry. Exact three-dimensional structures play a critical role in molecular interactions (e.g., between two biologically significant large proteins, or between an enzyme and its therapeutically important small-molecule inhibitors) and contribute to chemical and biological properties. Structural characteristics can be determined experimentally using X-ray crystallography and NMR spectroscopy, or theoretically using various potential energy minimization and bioinformatics techniques.

In chemistry, distance geometry problems arise especially when determining the geometric structures of large molecules (e.g., proteins). Many chemical structures (both large and small) cannot be crystallized. Therefore, their exact three-dimensional structures cannot be determined using X-ray crystallography. When structures are determined using nuclear magnetic resonance (NMR) spectroscopy, due to the experimental limitations of this method, often only a subset of inter-atomic distances is determined. The Molecular Distance Geometry Problem (MDGP) aims to reconstruct, from the given pairwise distances, the 3D positions (x, y, z coordinates) of each atom.

Manuscript received December 4, 2011. L. Fabry-Asztalos is with the Department of Chemistry, Central Washington University, Ellensburg, WA, USA. I. Lőrentz is with the Electronics and Computers Department, Transilvania University, Brașov, Romania. R. Andonie is with the Computer Science Department, Central Washington University, Ellensburg, WA, USA, and the Electronics and Computers Department, Transilvania University, Brașov, Romania.

978-1-4673-1191-5/12/$31.00 ©2012 IEEE

Formally, given a set of n atoms in a molecule and D, a set of Euclidean distances between atoms, the task is to find the atom positions $x_1, \dots, x_n \in \mathbb{R}^3$ such that $\|x_i - x_j\| = d_{ij}$, where $d_{ij} \in D$.

The MDGP is interdisciplinary. In graph theory, it corresponds to finding the embedding of an undirected graph in $\mathbb{R}^3$, considering the atoms as graph vertices and the known interatomic distances as weighted edges. In a complete graph, when all pairwise distances are accurately known, classical multidimensional scaling [1], [2] solves the geometric distance problem by eigendecomposition of the squared normalized distance matrix [3]. The MDGP can also be seen as a non-linear global optimization problem: find the conformation X that minimizes the stress function

$$\sigma^2(X) = \sum_{i<j} w_{ij} \left( \|x_i - x_j\| - \delta_{ij} \right)^2 \tag{1}$$

[...] $\lambda_1, \lambda_2, \lambda_3 > 0$ the three largest positive eigenvalues and $Q_+ = [q_1, q_2, q_3]$ the matrix formed by the associated eigenvectors. The coordinates are given by $X = Q_+ \Lambda_+^{1/2}$, where $\Lambda_+$ is the diagonal matrix formed by $(\lambda_1, \lambda_2, \lambda_3)$.
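The coordinate formula $X = Q_+ \Lambda_+^{1/2}$ can be illustrated with a small pure-Python sketch of classical scaling. It assumes a full matrix of exact distances (as for an initial clique), builds the Gram matrix by the standard double-centering step, and uses power iteration with deflation as a simple stand-in for a library eigensolver. The function name `classical_mds_3d` and its parameters are hypothetical and are not taken from the paper's implementation.

```python
import math

def classical_mds_3d(dist, iters=500):
    """Return 3D coordinates whose pairwise distances reproduce `dist`.

    `dist` is a full symmetric matrix (list of lists) of exact distances.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    all_mean = sum(row_mean) / n
    # Double centering turns squared distances into the Gram matrix
    # of the points, expressed relative to their centroid.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + all_mean)
             for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0, 0.0, 0.0] for _ in range(n)]
    for k in range(3):
        v = [float(i + 1) for i in range(n)]  # fixed, non-constant start vector
        for _ in range(iters):  # power iteration (assumed eigensolver substitute)
            w = matvec(gram, v)
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:
                break
            v = [c / norm for c in w]
        length = math.sqrt(sum(c * c for c in v))
        v = [c / length for c in v]
        # Rayleigh quotient gives the eigenvalue; clamp tiny negatives to zero.
        lam = max(sum(a * b for a, b in zip(v, matvec(gram, v))), 0.0)
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflation: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= lam * v[i] * v[j]
    return coords
```

The recovered coordinates match the original points only up to a rigid motion and a possible reflection, which is why the check worth making is on pairwise distances rather than on the coordinates themselves.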

C. Breadth-first build-up

After the coordinates of at least four atoms have been determined, we place the remaining atoms successively, by breadth-first graph traversal. The key computation is to determine x, the vector of coordinates of an atom, knowing the exact distances $d_i$ to its already placed neighbors $x_1, \dots, x_b$. This method computes the intersection of spheres (analogous to the ruler-and-compass method in 2D) by solving the following system of equations:

$$\begin{cases} \|x - x_1\| = d_1 \\ \|x - x_2\| = d_2 \\ \quad \cdots \\ \|x - x_b\| = d_b \end{cases} \tag{2}$$

By squaring the equations and subtracting the first equation from the others, we obtain a linear system:

$$\begin{cases} 2(x_1 - x_2)x' = \|x_1\|^2 - \|x_2\|^2 + d_2^2 - d_1^2 \\ 2(x_1 - x_3)x' = \|x_1\|^2 - \|x_3\|^2 + d_3^2 - d_1^2 \\ \quad \cdots \\ 2(x_1 - x_b)x' = \|x_1\|^2 - \|x_b\|^2 + d_b^2 - d_1^2 \end{cases} \tag{3}$$

where x is the 3-component row vector of the unknowns (x' is its transpose) and $x_1, \dots, x_b$ are the row vectors of the known coordinates. The system can be written in matrix form as

$$Ax' = B \tag{4}$$

The system of equations is overdetermined for b > 4. We solve it by minimizing $\|Ax' - B\|$, using the pseudoinverse of A. If the singular value decomposition (SVD) of A is $U\Sigma V'$, then the pseudoinverse $A^{-1}$ is

$$A^{-1} = V\Sigma^{-1}U' \tag{5}$$

and the approximate solution vector is obtained by

$$x' = A^{-1}B. \tag{6}$$
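The linearization in Eq. (3) can be tried out with a short pure-Python sketch. To stay dependency-free, it solves the least-squares problem through the 3×3 normal equations $(A'A)x' = A'B$ rather than through the SVD pseudoinverse of Eqs. (5) and (6); for a well-conditioned base the two give the same minimizer, but this is a substitution and not the paper's implementation. The names `determine_xyz` and `solve3` are hypothetical.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate base: atoms may be coplanar")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x

def determine_xyz(bases, dists):
    """Place one atom from >= 4 placed neighbours and the exact distances to them."""
    x1, d1 = bases[0], dists[0]
    n1 = sum(c * c for c in x1)
    rows, rhs = [], []
    for xi, di in zip(bases[1:], dists[1:]):
        # One row of Eq. (3): 2(x1 - xi) x' = |x1|^2 - |xi|^2 + di^2 - d1^2
        rows.append([2.0 * (a - b) for a, b in zip(x1, xi)])
        rhs.append(n1 - sum(c * c for c in xi) + di * di - d1 * d1)
    # Normal equations (assumed substitute for the SVD pseudoinverse).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(ata, atb)
```

Normal equations square the condition number of A, so an SVD-based solve is the more robust choice when the base atoms are close to coplanar.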

The procedure which determines the coordinates of vertex v_i using the set B of already determined vertices is:

procedure DETERMINEXYZ(v_i, B)
    Require: d_ij = distance(v_i, v_j), ∀ v_j ∈ B
    Require: x_j, ∀ v_j ∈ B
    Solve the linear system described by Eq. (3)
    return x
end procedure

We use the Apache Commons Math library [33] to perform the SVD computation. To detect degenerate cases (for example, if all base atoms are coplanar), we provide the function

procedure ILLCONDITIONED(A)
    compute λ_max and λ_min ≠ 0, eigenvalues of A′A
    compute rank(A)
    condition number κ = |λ_max| / |λ_min|
    if rank < 3 or κ > κ_max then
        return true
    else
        return false
    end if
end procedure

During the build-up procedure, when an ill-conditioned system is encountered, the current atom is skipped and the build-up continues with the remaining atoms. The heuristic is that, in a second pass over the undetermined atoms, the system may become well conditioned thanks to the additional determined base atoms. This is the algorithm:

Require: P, the set of vertices that have already been determined (initialized from the clique computed in the previous step)
Require: Q, a queue initialized with the vertices from P and the vertices adjacent to them

procedure BFB(V, E, P, Q)                         ⊲ Breadth-first build-up
    f = 0
    while Q ≠ ∅ do
        v = dequeue(Q)                            ⊲ extract the vertex from the head
        B = ∅                                     ⊲ B will form the base of the new placement
        for all vertices u adjacent to v such that u ∈ P do
            B = B ∪ {u}
        end for
        if |B| ≥ 4 and not ILLCONDITIONED(B) then
            DETERMINEXYZ(v, B)
            P = P ∪ {v}
            f = 0
            for all vertices u adjacent to v such that u ∉ P do
                Q = Q ∪ {u}
            end for
        else
            put v back at the end of Q
            f = f + 1
            if f > |Q| then
                return P                          ⊲ partial solution
            end if
        end if
    end while
    return P                                      ⊲ full solution
end procedure

Variable f is incremented whenever an atom's coordinates cannot be determined (due to missing or ill-conditioned bases), and reset to 0 when the determination succeeds. If f exceeds the queue size, the algorithm stops and reports a partial solution, since it cannot advance in the breadth-first traversal. When the BFB procedure terminates with a partial solution (the queue is not empty upon returning), we increase the allowed condition number κ_max and rerun the BFB algorithm. This heuristic helps to revisit previously undetermined atoms.
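The queue logic of the BFB procedure, including the stall counter f, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the geometric placement and the conditioning test are replaced by a caller-supplied predicate `can_place` (hypothetical), the queue is seeded only with the unplaced neighbors of the initial clique, and a helper set prevents duplicate queue entries.

```python
from collections import deque

def breadth_first_buildup(adj, seed, can_place):
    """Return (placed_vertices, complete) for a graph given as an adjacency dict.

    `seed` holds the vertices of the initial clique; `can_place(v, bases)` stands
    in for the conditioning test plus coordinate determination.
    """
    placed = set(seed)
    queue, queued = deque(), set()

    def push(u):
        if u not in placed and u not in queued:
            queue.append(u)
            queued.add(u)

    for p in seed:
        for u in adj[p]:
            push(u)

    stalled = 0  # the counter f of the pseudocode
    while queue:
        v = queue.popleft()
        queued.discard(v)
        bases = [u for u in adj[v] if u in placed]
        if len(bases) >= 4 and can_place(v, bases):
            placed.add(v)
            stalled = 0
            for u in adj[v]:
                push(u)
        else:
            queue.append(v)  # retry later, after more atoms are placed
            queued.add(v)
            stalled += 1
            if stalled > len(queue):
                return placed, False  # partial solution
    return placed, True  # full solution
```

Resetting the counter on every success is what lets a vertex that failed once be retried after its neighborhood has grown; the loop gives up only after a full pass over the queue without any progress.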

D. Merging of partial solutions

Steps A, B, and C are restarted for every vertex not yet determined. Each breadth-first exploration results in a subgraph of determined vertices. The resulting solutions are non-overlapping for every quad-connected component of the graph, and our algorithm stops in this case, reporting the list of distinct components. If the build-up procedure stops because of an ill-conditioned system, two partial solutions may share some vertices. If at least four non-coplanar vertices are shared, their positions can be aligned unambiguously. To align two partial solutions, one must be translated and rotated into the other's coordinate system, using the Kabsch algorithm [34] as described below. Let X and Y be the n × 3 matrices of the coordinates of the two structures, both centered at the origin. Then compute Z = XR, where R is the optimal rotation matrix that aligns X to Y by minimizing the error E (and hence the RMSE):

$$E = \sum_{i=1}^{n} \|(XR)_i - y_i\|^2 \tag{7}$$
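Minimizing Eq. (7) over orthogonal matrices is the orthogonal Procrustes problem. One general property of this construction is worth keeping in mind; it is stated here as background and not as something the paper specifies. With the row-vector convention Z = XR and the SVD $X'Y = U\Sigma V'$, the product $UV'$ is orthogonal but may have determinant −1, which is a reflection and not a rotation. A commonly used guard is:

```latex
\[
  d = \operatorname{sign}\bigl(\det(UV')\bigr), \qquad
  R = U \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & d \end{pmatrix} V'
\]
```

When d = 1 this reduces to R = UV′. When d = −1, flipping the sign of the direction associated with the smallest singular value yields the best proper rotation, at the cost of a slightly larger E.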

procedure ALIGN(X, Y)
    compute the covariance matrix: C = X′Y
    singular value decomposition: UΣV′ = C
    compute the rotation matrix: R = UV′
    rotate the X coordinate set: Z = XR
    return Z
end procedure

The partial solutions are merged by a greedy algorithm. First we sort the sequence S_1, S_2, ..., S_n by decreasing error σ² (given in Eq. (1)); then we try to merge the sets until one solution remains or no more merging can occur:

procedure MERGE(S = S_1, S_2, ..., S_n)
    i = 1
    while |S| > 1 and i < |S| do
        for j = i + 1 to |S| do
            A = S_i ∩ S_j
            if |A| > 4 then
                replace S_i with ALIGN(S_i, S_j)
                remove S_j
            end if
        end for
        i = i + 1
    end while
    return S
end procedure

The algorithm might fail to merge all partial solutions if the distance matrix is too sparse. In this case, the output will be a list of separate portions of the determined conformation. Finally, the complete algorithm is summarized here:

procedure BUILDUP(V, E)
    W = V                                         ⊲ the set of unexplored vertices
    while W ≠ ∅ do
        choose any v ∈ W                          ⊲ choose an unexplored vertex
        W = W − {v}
        if HASCLIQUE(v, neighbors(v)) then
            P = PLACECLIQUE
            Q = P ∪ neighbors(P)
            BFB(V, E, P, Q)
            S = S + P                             ⊲ save partial solution
            W = W − P
        end if
    end while
    MERGE(S)
end procedure

E. Efficiency analysis

The time complexity of the algorithm depends largely on the structure of the graph G(V, E). We consider a graph to be 4-connected if we cannot find three vertices whose removal disconnects the graph. The BFB algorithm stops when the frontier of a 4-connected subgraph is encountered, and it is restarted by the BUILDUP routine from a previously unexplored region. In general, the breadth-first search takes O(|E| + |V|) time.

For the initial clique placement, we need to find a clique of at least four vertices formed by a root vertex and its neighbors. The Bron–Kerbosch algorithm [35] we use has a worst-case time of $O(3^{m/3})$, where m is the number of vertices of the subgraph formed by v and its neighbors, that is, m = degree(v) + 1. From our dataset (Table I), we get an average vertex degree of about 23. We made a trade-off between the accuracy of the clique finder and execution time, limiting the subgraph size when searching for cliques to 20 vertices, which gives us a constant bound on the clique-finding time, T_clique.

Clique search is restarted every time a new build-up is started for a different region of the graph. In the best case, the BFB algorithm traverses the entire graph, needing just one initial clique. In the worst case (for a very sparse graph), the clique search is repeated for every vertex, giving O(|V|) time. However, for our dataset, the number of clique searches performed was far smaller than the number of vertices. For example, for the structure with PDB ID 1VRI, which contains 150,720 atoms, the initial clique search was done 368 times.

Aligning two partial solutions S_i, S_j by the ALIGN routine requires time in O(max(|S_i|, |S_j|)), which is bounded by O(n). Merging time for all partial solutions is in O(n|S|).

F. Refinement by Simulated Annealing

The solutions found so far are generally inaccurate due to the accumulation of numerical errors during build-up. We use them as inputs to the SA refinement step [36]. In our case, the SA algorithm minimizes the following error function:

$$U = \sum_{(i,j) \in D} \left( \|x_i - x_j\| - \delta_{ij} \right)^2 + \sum_{(i,j) \notin D} \max\left( \delta_c - \|x_i - x_j\|, 0 \right)^2 \tag{8}$$

The first term is identical to the error defined in Eq. (1) for known exact distances (in set D), whereas the second term is the penalty for lower-bound violations: a pair of atoms with no known distance is expected to be at least δ_c apart, so only such pairs that come closer than δ_c are penalized. For our data set, this lower bound is δ_c = 5 Å. The time complexity of evaluating the first term of the error is in O(|D|), where |D| is the number of known distances in the associated graph. Direct summation of the second term would require on the order of n² − |D| pair evaluations. To compute the second term efficiently, we need to determine the pairs of atoms that are less than δ_c apart; we do so by building an octree (the Barnes–Hut method [37]), which reduces the complexity to O(n log n) in non-degenerate cases. We write the octree code in Java and pass the data structures to the OpenCL [38], [39] framework for accelerating computations.

[Fig. 2 plot: temperature and placement error (logarithmic axis) versus epoch, for structures 3FXI, 1AON, 1S1I, 3OAA, 3K1Q, and 1VRI.]

Fig. 1. Simulated annealing of structure 1PTQ. Gray levels represent intermediate iterations; the final solution is plotted in black.

For the parallel implementation, we define a per-vertex error over E_i, the set of vertices adjacent to vertex i, and N_i, the set of vertices that are geometrically close to vertex i but not contained in E_i:

$$\epsilon_i(x) = \sum_{j \in E_i} \left( \|x_i - x_j\| - \delta_{ij} \right)^2 + \sum_{j \in N_i} \max\left( \delta_c - \|x_i - x_j\|, 0 \right)^2 \tag{9}$$

In our OpenCL implementation, we assigned a thread to each vertex. The thread performs a local annealing of the assigned vertex. While this method is not strictly equivalent to sequential SA, the net effect is a decrease of the global error.

procedure PARALLELANNEALING
    Require: x_1, ..., x_n                        ⊲ determined previously by build-up
    T = T_0                                       ⊲ initial temperature
    repeat
        for k = 1 to internalIterations do
            for i = 1 to n do in parallel
                y = x_i + Δx                      ⊲ Δx small random vector
                Δε_i = ε_i(y) − ε_i(x)            ⊲ based on Eq. (9)
                if Δε_i ≤ 0 or exp(−Δε_i / T) > rand(0, 1) then
                    x_i = y                       ⊲ accept perturbation
                end if
            end for
        end for
        T = αT
        Epoch = Epoch + 1
    until max_i{ε_i} < ε_limit or Epoch ≥ Epoch_max
    return x_1, ..., x_n
end procedure

Fig. 2. Simulated annealing results.

Since, in most cases, the initial configuration is already a local minimum (sometimes very close to the global optimum), we empirically choose an initial temperature that is low enough not to destroy the conformation. The annealing schedule is $T_{t+1} = \alpha T_t$, with α = 0.999, and the temperature is updated once per epoch. One epoch consists of 100 internal iterations.

IV. EXPERIMENTAL RESULTS

We create an artificial data set using a random set of structures from the Protein Data Bank [40], with sizes ranging from 404 to 150,720 atoms, listed in Table I. For oligomers, all atoms were considered.

From the x, y, z coordinates of the atoms in a molecule, we create a sparse graph, with vertices representing atoms and edges placed only between atoms at distances less than a cutoff of 5 Å. The choice of the 5 Å cutoff corresponds to the distances observable by the Nuclear Overhauser Effect [41]. The inputs of the tested algorithms are solely the inter-atomic distances; no chemical properties or interactions with solvents are used. Fig. 3 depicts the degree distribution for cutoff distances of 4, 5, and 6 Å.

We also determine the graph density and clustering coefficient for the input data structures. The density of graph G(V, E) is given by $\frac{2|E|}{|V| \cdot (|V| - 1)}$, whereas the average local clustering coefficient [42] is $\bar{C} = \frac{1}{|V|} \sum_i C_i$, where

$$C_i = \frac{2|E_i|}{|V_i| \cdot (|V_i| - 1)} \tag{10}$$

defines the local clustering coefficient, and V_i, E_i are the vertices and edges of the subgraph formed by vertex i and its adjacent vertices.
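The density and the local clustering coefficient of Eq. (10) are easy to compute from an adjacency structure. The sketch below follows the definition given in the text, in which the subgraph for vertex i includes i itself, and takes the average as the mean over all vertices. The function names and the dict-of-lists graph representation are assumptions made for this example.

```python
def density(adj):
    """Graph density 2|E| / (|V| (|V| - 1)) for an undirected adjacency dict."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is listed twice
    return 2.0 * m / (n * (n - 1))

def local_clustering(adj, i):
    """C_i of Eq. (10): density of the subgraph on vertex i and its neighbours."""
    vi = set(adj[i]) | {i}
    if len(vi) < 2:
        return 0.0  # isolated vertex: no pairs to connect
    # Count each undirected edge inside the subgraph once (u < w).
    ei = sum(1 for u in vi for w in adj[u] if w in vi and u < w)
    return 2.0 * ei / (len(vi) * (len(vi) - 1))

def average_clustering(adj):
    """Mean of the local coefficients over all vertices."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)
```

A value of C_i equal to 1 means that vertex i and its neighbors already form a clique, which is the favorable case for starting or continuing a build-up.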

TABLE I
THE MOLECULES USED IN THE EXPERIMENTS, CUTOFF DISTANCE 5 Å.

PDB ID  Atoms   Min deg  Avg deg  Known distances  Density (%)  Clustering coeff
1PTQ    404     4        22.03    4450             5.47         0.607
1RDG    516     1        22.51    5808             4.37         0.578
1HOE    581     3        22.72    6600             3.92         0.594
1LFB    641     5        21.76    6974             3.40         0.619
1PHT    988     3        27.77    13720            2.81         0.597
1POA    1067    2        23.48    12525            2.20         0.574
1AX8    1074    2        23.28    12502            2.17         0.591
2KXD    1142    9        39.29    22436            3.44         0.608
1VMP    1166    1        41.20    24020            3.54         0.614
1HAA    1310    8        40.92    26801            3.13         0.603
1F39    1653    4        23.17    19154            1.40         0.575
1GPV    1842    7        25.91    23863            1.41         0.609
1RGS    2059    3        23.04    23720            1.12         0.589
1BPM    3673    3        24.40    44818            0.66         0.567
2G33    4658    4        22.67    52807            0.49         0.617
3R1C    6865    3        27.30    93699            0.40         0.605
2F8V    7409    1        22.29    82587            0.30         0.578
2XTL    7974    4        25.03    99805            0.31         0.553
3FXI    12500   3        22.95    143426           0.18         0.564
1HMV    29596   3        23.37    345757           0.08         0.586
2VZ8    30281   3        23.14    350358           0.08         0.570
2VZ9    31949   3        23.14    369726           0.07         0.568
1AON    58870   3        24.18    711737           0.04         0.585
3OAA    99573   3        23.54    1171764          0.02         0.566
3K1Q    101798  3        24.40    1241974          0.02         0.580
1VRI    150720  3        25.32    1907977          0.02         0.569

TABLE II
RESULTS OF SIMULATED ANNEALING ON GPU.

PDB ID  N atoms  Dur. [sec]  Epochs  e_final    RMSE
1PTQ    404      0.1         3       3.96e-02   1.11e-01
1RDG    516      0.2         12      7.75e-04   2.11e-01
1HOE    581      12.7        1e3     4.65e-02   1.96e-01
1LFB    641      0.8         75      9.15e-04   2.92e-02
1PHT    988      9.9         427     4.53e-03   5.80e-02
1POA    1067     0.4         22      6.78e-04   9.57e-02
1AX8    1074     8.1         422     4.93e-03   1.46e-01
2KXD    1142     0.1         4       2.49e-04   2.10e-03
1VMP    1166     0.2         7       6.95e-04   1.57e-02
1HAA    1310     0.9         23      3.29e-04   4.97e-02
1F39    1653     1.1         42      4.76e-04   8.03e-03
1GPV    1842     0.3         12      6.96e-04   2.56e-02
1RGS    2059     34.3        1e3     4.69e-02   1.55e-01
1BPM    3673     7.5         128     3.36e-04   2.79e-02
2G33    4658     75.3        1e3     4.61e-02   2.41e-01
3R1C    6865     1.5         12      3.46e-04   4.45e-03
2F8V    7409     123.7       1e3     4.57e-02   1.69e-01
2XTL    7974     156.4       1e3     4.65e-02   1.10e-01
3FXI    12500    221.3       1e3     4.64e-02   1.33e-01
1HMV    29596    1530.1      1e3     6.04e-01   8.79e+01
2VZ8    30281    538.0       1e3     4.61e-02   1.33e-01
2VZ9    31949    573.8       1e3     4.64e-02   1.74e-01
1AON    58870    1175.9      1e3     6.85e-02   1.32e+00
3OAA    99573    2043.5      1e3     9.58e-02   1.62e+00
3K1Q    101798   2426.0      1e3     2.00e-01   3.18e+01
1VRI    150720   3073.0      1e3     4.64e-02   1.46e-01

The local clustering coefficient indicates how closely the neighbors of a vertex resemble a clique. We use this coefficient to quantify the ability of the build-up algorithm to successfully reconstruct the coordinates.

[Fig. 3 plot: probability p versus vertex degree, one curve per cutoff (4 Å, 5 Å, 6 Å).]

Fig. 3. Distribution of vertex degrees for different cutoff ranges (4, 5, and 6 Å) for the selected dataset.

We run the programs on an Intel Core i7 @ 3.4 GHz processor (graph build-up). SA was executed on an Nvidia CUDA GTX 560 GPU. We stopped the annealing either upon reaching 1,000 epochs or when the maximal vertex error dropped below a threshold.

The results of the build-up algorithm (without annealing) are shown in Table III. We compare our method to the CMDSCALE (classical scaling) routine from MDSJ [11] and to DGSOL [16]. The NA entries indicate that the program did not run successfully due to memory or time constraints. The quantities measured are: execution time in seconds, number of successfully determined atoms, and two different error measures. The first, in column Error, is the root mean stress $\sqrt{\sigma^2 / |E|}$, where σ² is given by Eq. (1) and |E| is the number of edges. The second, in column RMSE, is the root mean squared error of the differences in x, y, z coordinates, given by Eq. (7), using as reference the coordinates from the original PDB structure.

Execution times are shown in a log-log plot in Fig. 4. The plot reveals the asymptotically faster nature of the build-up algorithm compared to classical multidimensional scaling and the DGSOL program. We also observe that the execution time of the build-up algorithm has more irregularities. These are caused by the breadth-first search algorithm, which depends on the graph structure.

The results of SA are shown in Table II and Fig. 2. Here we measured the duration, the number of epochs, the mean squared distance error at the end of annealing, and finally the root mean squared conformation error as given by Eq. (7) (using the original conformation as reference). Fig. 1 illustrates the intermediate results for the PDB ID 1PTQ structure during SA.

V. CONCLUSIONS

We have introduced a method for the molecular distance problem. Our approach uses several standard algorithms. For an exact but sparse distance matrix, the build-up method reconstructs the 3D structures with an RMSE on the order of 0.1 Å. Small and medium structures (up to 10^4 atoms) are computed in less than 10 seconds. For the largest structures (up to 10^5 atoms), the build-up RMSE is 2.2 Å and the execution time is about 540 seconds. The SA step improves the accuracy of the solution at the expense of longer execution time.

The performance depends largely on the graph structure. Our computational method has the potential to assist in more accurately predicting the three-dimensional structures of molecules that are critical in chemical biology and medicinal chemistry. Currently, we are seeking methods to implement the build-up stage on GPUs as well, to improve the performance of the SA stage, and to solve the problem for inexact input distances.

TABLE III
RESULTS OF THE BUILD-UP ALGORITHM.

                Build-up                            CMDSCALE                     DGSOL
PDB ID  Atoms   Dur.    Placed  Error     RMSE      Dur.     Error     RMSE      Dur.     Error
                [sec]           [Å]       [Å]       [sec]    [Å]       [Å]       [sec]    [Å]
1PTQ    404     0.25    404     1.35e-01  3.15e-01  0.86     2.41e+00  7.52e+00  6.97     5.84e-02
1RDG    516     0.12    516     1.01e-01  2.89e-01  1.05     2.48e+00  7.48e+00  11.23    7.77e-02
1HOE    581     0.14    581     5.59e-02  2.12e-01  1.01     2.56e+00  9.08e+00  12.87    5.96e-02
1LFB    641     0.10    641     1.17e-01  2.86e-01  1.35     2.60e+00  1.08e+01  16.85    5.15e-02
1PHT    988     0.21    988     2.19e-01  5.80e-01  4.77     2.69e+00  1.04e+01  36.84    5.08e-02
1POA    1067    0.26    1067    1.77e-01  4.56e-01  5.97     2.75e+00  1.21e+01  32.64    7.68e-02
1AX8    1074    0.27    1074    1.27e-01  3.32e-01  6.10     2.76e+00  1.21e+01  37.57    6.68e-02
2KXD    1142    0.14    1142    3.79e-02  8.20e-02  7.42     2.63e+00  9.31e+00  36.51    4.73e-02
1VMP    1166    0.28    1166    4.42e-02  1.64e-01  7.85     2.69e+00  1.08e+01  57.89    3.04e-02
1HAA    1310    0.28    1310    7.15e-02  1.62e-01  11.18    2.72e+00  1.04e+01  50.58    4.28e-02
1F39    1653    0.32    1653    8.56e-02  1.97e-01  23.24    2.90e+00  1.67e+01  64.82    6.40e-02
1GPV    1842    0.48    1842    8.46e-02  1.75e-01  32.30    2.87e+00  1.62e+01  93.63    4.76e-02
1RGS    2059    0.46    2059    1.39e-01  3.51e-01  44.55    2.97e+00  1.79e+01  98.20    5.55e-02
1BPM    3673    0.53    3673    7.80e-02  1.87e-01  248.00   3.13e+00  2.17e+01  190.26   7.22e-02
2G33    4658    1.05    4658    1.24e-01  2.71e-01  499.60   3.14e+00  2.86e+01  374.94   4.91e-02
3R1C    6865    3.56    6865    6.44e-02  1.47e-01  1582.26  3.12e+00  3.42e+01  1122.18  6.63e-02
2F8V    7409    3.06    7409    1.17e-01  2.58e-01  NA       NA        NA        NA       NA
2XTL    7974    2.96    7974    8.54e-02  2.16e-01  NA       NA        NA        NA       NA
3FXI    12500   9.76    12500   9.94e-02  2.04e-01  NA       NA        NA        NA       NA
1HMV    29596   27.22   14798   1.11e-01  8.47e+01  NA       NA        NA        NA       NA
2VZ8    30281   35.67   30281   1.05e-01  2.60e-01  NA       NA        NA        NA       NA
2VZ9    31949   54.89   31949   1.13e-01  2.79e-01  NA       NA        NA        NA       NA
1AON    58870   146.91  58501   1.07e-01  1.04e+01  NA       NA        NA        NA       NA
3OAA    99573   226.65  99542   3.77e+00  6.76e+00  NA       NA        NA        NA       NA
3K1Q    101798  179.53  94275   7.39e-01  3.97e+01  NA       NA        NA        NA       NA
1VRI    150720  541.80  150700  3.49e-01  2.22e+00  NA       NA        NA        NA       NA

[Fig. 4 plot: duration in seconds versus number of atoms, both on logarithmic axes, for BFB, CMDSCALE, and DGSOL.]

Fig. 4. Execution times for 3 different algorithms.

REFERENCES

[1] W. Torgerson, “Multidimensional scaling: I. Theory and method,” Psychometrika, vol. 17, no. 4, pp. 401–419, Dec. 1952.
[2] I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications. Springer, 2005.

[3] G. Seber, Multivariate observations, ser. in probability and statistics. Wiley-Interscience, 2004.
[4] T. Kamada and S. Kawai, “An algorithm for drawing general undirected graphs,” Inf. Process. Lett., vol. 31, pp. 7–15, April 1989.
[5] U. Brandes and C. Pich, “Eigensolver methods for progressive multidimensional scaling of large data,” in Graph Drawing, ser. Lecture Notes in Computer Science, M. Kaufmann and D. Wagner, Eds. Springer Berlin / Heidelberg, 2007, vol. 4372, pp. 42–53.
[6] R. Davidson and D. Harel, “Drawing graphs nicely using simulated annealing,” ACM Trans. Graph., vol. 15, pp. 301–331, October 1996.
[7] T. F. Havel, “Distance geometry: Theory, algorithms and chemical applications,” in Encyclopedia of Computational Chemistry. John Wiley and Sons, 1998, pp. 723–742.
[8] W. Glunt, T. L. Hayden, and M. Raydan, “Molecular conformations from distance matrices,” J. Comput. Chem., vol. 14, pp. 114–120, January 1993.
[9] C. Lavor, L. Liberti, and N. Maculan, “Molecular distance geometry problem,” in Encyclopedia of Optimization, C. A. Floudas and P. M. Pardalos, Eds. Springer US, 2009, pp. 2304–2311.
[10] L. M. Blumenthal, Theory and applications of distance geometry. Bronx, NY: Chelsea, 1970.
[11] A. Group, MDSJ: Java Library for Multidimensional Scaling, 2009. [Online]. Available: http://www.inf.uni-konstanz.de/algo/software/mdsj/
[12] T. Havel, I. Kuntz, and G. Crippen, “The theory and practice of distance geometry,” Bulletin of Mathematical Biology, vol. 45, pp. 665–720, 1983.
[13] M. Pharr and R. Fernando, GPU Gems 2: Programming techniques for high-performance graphics and general-purpose computation. Addison-Wesley Professional, 2005. [Online]. Available: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter43.html
[14] J. de Leeuw, “Applications of convex analysis to multidimensional scaling,” in Recent Developments in Statistics, J. Barra, F. Brodeau, G. Romier, and B. V. Cutsem, Eds. Amsterdam: North Holland Publishing Company, 1977, pp. 133–146.
[15] E. R. Gansner, Y. Koren, and S. North, “Graph drawing by stress majorization,” in Graph Drawing. Springer, 2004, pp. 239–250.
[16] J. J. Moré and Z. Wu, “Distance geometry optimization for protein structures,” J. of Global Optimization, vol. 15, pp. 219–234, October 1999.
[17] A. Grosso, M. Locatelli, and F. Schoen, “Solving molecular distance geometry problems by global optimization algorithms,” Computational Optimization and Applications, vol. 43, pp. 23–37, 2009.
[18] M. Nilges, A. M. Gronenborn, A. T. Brunger, and M. G. Clore, “Determination of three-dimensional structures of proteins by simulated annealing with interproton distance restraints. Application to crambin, potato carboxypeptidase inhibitor and barley serine proteinase inhibitor 2,” Protein Eng., vol. 2, no. 1, pp. 27–38, Apr. 1988.
[19] A. H. C. Kampen, L. M. C. Buydens, C. B. Lucasius, and M. J. J. Blommers, “Optimisation of metric matrix embedding by genetic algorithms,” Journal of Biomolecular NMR, vol. 7, pp. 214–224, 1996.
[20] R. Leardi, Nature-inspired methods in chemometrics: genetic algorithms and artificial neural networks, ser. Data handling in science and technology. Elsevier, 2003.
[21] Q. Dong and Z. Wu, “A linear-time algorithm for solving the molecular distance geometry problem with exact inter-atomic distances,” Journal of Global Optimization, vol. 22, pp. 365–375, 2002.
[22] Q. Dong and Z. Wu, “A geometric build-up algorithm for solving the molecular distance geometry problem with sparse distance data,” J. of Global Optimization, vol. 26, pp. 321–333, July 2003.
[23] D. Wu and Z. Wu, “An updated geometric build-up algorithm for solving the molecular distance geometry problems with sparse distance data,” J. of Global Optimization, vol. 37, pp. 661–673, April 2007.
[24] R. Davis, C. Ernst, and D. Wu, “Protein structure determination via an efficient geometric build-up algorithm,” BMC Structural Biology, vol. 10, no. Suppl 1, p. S7, 2010.
[25] A. Sit, Z. Wu, and Y. Yuan, “A geometric buildup algorithm for the solution of the distance geometry problem using Least-Squares approximation,” Bulletin of Mathematical Biology, 2007.
[26] C. Lavor, L. Liberti, N. Maculan, and A. Mucherino, “Recent advances on the discretizable molecular distance geometry problem,” European Journal of Operational Research, 2011.
[27] A. Mucherino, C. Lavor, L. Liberti, and E.-G. Talbi, “A parallel version of the branch and prune algorithm for the molecular distance geometry problem,” in Proceedings of the ACS/IEEE International Conference on Computer Systems and Applications, ser. AICCSA ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1–6.
[28] L. Liberti, C. Lavor, B. Masson, and A. Mucherino, “Polynomial cases of the discretizable molecular distance geometry problem,” CoRR, vol. abs/1103.1264, 2011, informal publication.
[29] G. M. Crippen, “Linearized embedding: a new metric matrix algorithm for calculating molecular conformations subject to geometric constraints,” J. Comput. Chem., vol. 10, pp. 896–902, October 1989.
[30] B. A. Hendrickson, “The Molecular Problem: Determining Conformation from Pairwise Distances,” Cornell University, Ithaca, NY, USA, Tech. Rep., 1990.
[31] W. Rieping, M. Habeck, and M. Nilges, “Inferential Structure Determination,” Science, vol. 309, no. 5732, pp. 303–306, Jul. 2005.
[32] JGraphT – a free Java graph library, 2009. [Online]. Available: http://jgrapht.sourceforge.net
[33] “Commons-Math: The Apache Commons Mathematics Library.” [Online]. Available: http://commons.apache.org/math/
[34] W. Kabsch, “A solution for the best rotation to relate two sets of vectors,” Acta Crystallographica, vol. 32, pp. 922–923, 1976.
[35] C. Bron and J. Kerbosch, “Algorithm 457: finding all cliques of an undirected graph,” Commun. ACM, vol. 16, pp. 575–577, September 1973.
[36] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp. 671–680, 1983.
[37] J. Barnes and P. Hut, “A hierarchical O(N log N) force-calculation algorithm,” Nature, vol. 324, no. 6096, pp. 446–449, Dec. 1986.
[38] Khronos OpenCL Working Group, The OpenCL Specification, version 1.1, 2010. [Online]. Available: http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf
[39] “JOCL - Java binding for the OpenCL API.” [Online]. Available: http://jogamp.org/jocl/www/
[40] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, “The Protein Data Bank,” Nucleic Acids Research, vol. 28, pp. 235–242, 2000. [Online]. Available: http://www.rcsb.org/pdb/
[41] K. Wüthrich, “Protein structure determination in solution by NMR spectroscopy,” Journal of Biological Chemistry, vol. 265, no. 36, pp. 22059–22062, 1990.
[42] R. D. Luce and A. D. Perry, “A method of matrix analysis of group structure,” Psychometrika, 1949.