Text S1.

Supporting Information: Accurate prediction of the dynamical changes within the second PDZ domain of PTP1e

Elisa Cilia1, Geerten W. Vuister2 and Tom Lenaerts1,3

1 – MLG, Département d’Informatique, Université Libre de Bruxelles, Boulevard du Triomphe CP212, 1050 Brussels, Belgium
2 – Department of Biochemistry, University of Leicester, Henry Wellcome Building, Lancaster Road, Leicester LE1 9HN, United Kingdom
3 – AI-lab, Vakgroep Computerwetenschappen, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

Measures of betweenness centrality

We evaluated different measures of shortest-path betweenness centrality. We computed the node and edge betweenness centrality measures proposed in [1] and [2]. In [1], the betweenness centrality of a vertex or edge k is given by:

$$B_k = \sum_{i,j \in V} \frac{\sigma(i,j|k)}{\sigma(i,j)}, \qquad \text{(Eq. 1)}$$

where V is the set of vertices (residues) in the network, σ(i,j) counts the number of shortest paths between vertices i and j, and σ(i,j|k) counts the number of those shortest paths passing through vertex/edge k [3,4]. We normalized this measure over the number of vertices/edges in the network.

The measure proposed in [2] is slightly different. It corresponds to the change in the characteristic path length (L) observed upon removing a given vertex or edge k from the network:

$$\Delta L_k = |L - L_{rem,k}| \qquad \text{(Eq. 2)}$$

The characteristic path length L is essentially the average shortest path length $\frac{1}{N}\sum_{i>j} d(i,j)$, where N is the total number of vertex pairs and d(i,j) is the shortest path length between two vertices i and j.

We also computed node betweenness centrality as proposed in [5]. In that case the algorithm works by removing at each step the vertex with the highest betweenness centrality (according to Eq. 1) together with its incident edges. It then recomputes the betweenness centrality values for the residues affected by the removal in the remaining network. This is repeated until no more edges remain or the network splits into different connected components.

To extract the most relevant residues from the edge betweenness centrality matrix we again applied the CAST algorithm. Edge betweenness centrality measures showed less discriminative power than node betweenness centrality measures in predicting allostery-involved residues in hPDZ2 (data not shown). We selected the node betweenness centrality measure in Eq. 2 as the best performing one in this context and used it for comparison (see Figure S2 and Table S2). Note that we also looked at the performance of a predictor based on the changes in betweenness centrality upon peptide binding. Still, these centrality measures gave better results when computed directly on the networks built from the ligand-free or ligand-bound structures (see Figure S2 and Table S2).
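The two measures in Eq. 1 and Eq. 2 can be illustrated with a minimal Python sketch. It assumes an unweighted, undirected residue contact network stored as an adjacency dictionary; the function names, the toy chain network, the summation over unordered vertex pairs, and the choice to average L over connected pairs only are illustrative assumptions, not the implementation used in this work. The normalization over the number of vertices is omitted.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Return shortest-path lengths and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit fixes the distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def node_betweenness(adj):
    """Eq. 1 for vertices, summed over unordered pairs {i, j} not containing k."""
    info = {s: bfs(adj, s) for s in adj}
    score = {k: 0.0 for k in adj}
    for i, j in combinations(adj, 2):
        dist_i, sigma_i = info[i]
        dist_j, sigma_j = info[j]
        if j not in dist_i:
            continue                     # i and j are not connected
        for k in adj:
            if k in (i, j) or k not in dist_i:
                continue
            # k lies on a shortest i-j path iff the distances add up
            if dist_i[k] + dist_j[k] == dist_i[j]:
                score[k] += sigma_i[k] * sigma_j[k] / sigma_i[j]
    return score


def characteristic_path_length(adj):
    """Average shortest-path length over connected vertex pairs (assumption)."""
    nodes = list(adj)
    total = pairs = 0
    for idx, i in enumerate(nodes):
        dist, _ = bfs(adj, i)
        for j in nodes[idx + 1:]:
            if j in dist:
                total += dist[j]
                pairs += 1
    return total / pairs if pairs else 0.0


def delta_L(adj, k):
    """Eq. 2: |L - L_rem,k| after removing vertex k and its incident edges."""
    reduced = {u: [v for v in nbrs if v != k] for u, nbrs in adj.items() if u != k}
    return abs(characteristic_path_length(adj) - characteristic_path_length(reduced))


# Hypothetical toy network: a linear chain of five residues, 0-1-2-3-4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
bc = node_betweenness(chain)
dl = {k: delta_L(chain, k) for k in chain}
```

In this toy chain the central vertex has both the highest betweenness and the largest change in L, since removing it disconnects the network.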


Tables

Residues   Labeling   Residues   Labeling
ILEA6      N          VALA58     N
VALA9      M          LEUA59     N
LEUA11     M          ALAA60     M
ALAA12     M          VALA61     P
LEUA18     P          VALA64     P
ILEA20     P          LEUA66     P
VALA22     P          ALAA69     P
THRA23     N          THRA70     N
VALA26     P          ALAA74     M
THRA28     M          VALA75     M
VALA30     P          THRA77     N
ILEA35     N          LEUA78     P
VALA37     M          THRA81     P
ALAA39     P          VALA84     M
VALA40     P          VALA85     P
ILEA41     N          LEUA87     M
ALAA45     N          LEUA88     M
ALAA46     N          LEUA89     M
ILEA52     N          THRA96     M

Table S1. Assignment of classes to the experimentally derived hPDZ2 residues. The methyl side-chain residues are labelled as positive (P), negative (N), or missing (M) according to the experimental results reported in [6]. Positive residues are those for which NMR methyl side-chain relaxation experiments showed a significant change in the S² and τe parameters (the difference observed in the fitted parameter was twice the error) [6]. Conversely, negative residues are those for which the change is not significant according to the same criterion. Finally, residues for which the data are missing or contradictory, and which therefore cannot be assigned to the positive or negative set, are treated as missing values and excluded from the computation. Note that V9 is included in the missing data because subsequent results reported in [7] have shown a different dynamic response of this residue to key mutations like I35V and I20F and to peptide binding.

NBC Predictor                     AUC (bound)   AUC (unbound)   AUC (changes)
Freeman betweenness centrality    0.49          0.5             0.44
Del Sol betweenness centrality    0.59          0.55            0.55
Newman betweenness centrality     0.5           0.34            -

Table S2. Areas Under the ROC Curves (AUC) shown in Figure S2, for the Node Betweenness Centrality (NBC)-based predictors.
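As an illustration of how an AUC such as those in Table S2 can be obtained from a residue ranking and the P/N/M labels of Table S1, the following Python sketch uses the rank-based (Mann-Whitney) formulation of the AUC and skips residues labelled M. The function name and the scores are hypothetical; only the labels of the six example residues are taken from Table S1, and this is not necessarily the procedure used to produce the reported values.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC: probability that a positive outranks a negative.

    Residues labelled 'M' (missing) or absent from `scores` are ignored.
    Ties between a positive and a negative count as one half.
    """
    pos = [scores[r] for r, lab in labels.items() if lab == "P" and r in scores]
    neg = [scores[r] for r, lab in labels.items() if lab == "N" and r in scores]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Labels taken from Table S1; the centrality scores are made up for illustration.
labels = {"V26": "P", "L18": "P", "I20": "P", "I35": "N", "T23": "N", "V9": "M"}
scores = {"V26": 0.9, "L18": 0.4, "I20": 0.7, "I35": 0.5, "T23": 0.1, "V9": 0.95}
auc = auc_from_scores(scores, labels)
```

An AUC of 0.5 corresponds to a random ranking, which is the reference against which the values in Table S2 should be read.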


t 0.193 0.101 0.094 0.081 0.076 0.063 0.054 0.053 0.048 0.045 0.034 0.026

Predictions using only METHYL residue changes (Figure 1A) L87, L89, V9, L11, V26 + I20

0.189

L87, L89, V9, H71, D5, Y36, L11, V26

0.120

+ R57, L78

0.115

+ L78 + V40 + V30 + V37, V75, T77 + L66 + I35

t

+ L18 + T81 + V22

0.086 0.085 0.083 0.081 0.071 0.067 0.043 t2=0.027

+ V40 + I35 + E47, I20 + T81 + K54, R51, V75, K13, F7, E67, H86, V37, L18 + S17, S21, D49, N27, T28, V22 + R31, T70

0.024

+ I52 + N16, L88, V85

t1=0.023 0.021 0.02

+ L88, T70 + I52

0.022 0.021

0.018 0.016 0.015 0.012 0.007 0.006

+ V58 + T23 + I41 + T96, L59 + V84, I6

0.019 0.017 0.015 0.013 0.009 0.007

+ V64 + A12, A39, A45, A46, A60, A69, A74

+ L66 + S29, D56, H53, V30 + T77

0.025

+ V61 + T28, V85

0

Predictions using ALL residue changes (Figure S1)

0

+ V61 + H32, E8, T23 + V58 + D15, S48, I41 + Q93, L59 + K91, E10, T96, V84, I6 + E90, K72, S94, K2, V64 + P3, N14, N62, S65, E76, R79, N80, P1, K38, Q43, Q73, P42, Q83, P95, G4, A12, G19, G24, G25, G33, G34, A39 + G4, A12, G19, G24, G25, G33, G34, A39, G44, A45, A46, G50, G55, A60, G63, G68, A69, A74, G82, G92

Table S3. Predictions corresponding to the different points of the ROC curves in Figure 3A, by decreasing the threshold t of the clustering algorithm. The bold lines indicate the points in the ROC curves corresponding to the best performing predictors. The experimentally identified residues are annotated in bold.
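The relation between the thresholded predictions and the points of a ROC curve can be sketched in Python as follows. Each step adds newly predicted residues as the threshold t decreases, and every cumulative prediction set yields one (FPR, TPR) point. The residue names, thresholds, and labels below are hypothetical toy values, and the function name is an assumption for illustration.

```python
def roc_points(steps, labels):
    """Turn cumulative predictions into ROC points.

    steps  : list of (threshold, newly_predicted_residues), by decreasing threshold
    labels : dict residue -> 'P', 'N' or 'M'; 'M' residues do not count
    Returns a list of (threshold, FPR, TPR) tuples.
    """
    n_pos = sum(1 for lab in labels.values() if lab == "P")
    n_neg = sum(1 for lab in labels.values() if lab == "N")
    predicted = set()
    points = []
    for threshold, new_residues in steps:
        predicted.update(new_residues)          # predictions accumulate
        tp = sum(1 for r in predicted if labels.get(r) == "P")
        fp = sum(1 for r in predicted if labels.get(r) == "N")
        points.append((threshold, fp / n_neg, tp / n_pos))
    return points


# Hypothetical toy example with five residues.
toy_labels = {"r1": "P", "r2": "P", "r3": "N", "r4": "N", "r5": "M"}
toy_steps = [(0.3, ["r1", "r5"]), (0.2, ["r3"]), (0.0, ["r2", "r4"])]
points = roc_points(toy_steps, toy_labels)
```

At threshold 0 every residue is predicted, so the last point is always (1, 1).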

Figures

[Figure S1: heat map. Both axes list the hPDZ2 residues PRO1 to THR96 and are annotated with the secondary structure elements β1, β2, β3, α1, β4, β5, α2 and β6.]

Figure S1. A representation of the matrix of dynamical changes (heat map) in hPDZ2. Absolute differences in side-chain coupling, normalized between 0 and 1, are represented with colors from blue (0) to red (1) according to the color scale reported on top. Given two residues R1 and R2, each entry corresponds to $|I_{bound}(R_1, R_2) - I_{free}(R_1, R_2)|$, i.e. the absolute difference of mutual information between the residues' side-chain conformational distributions in the bound and unbound states (see also Materials and Methods). The amino acid sequence is annotated with the corresponding secondary structure elements.
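The quantity described in this caption can be sketched in Python as follows. The sketch assumes that each residue's side-chain conformation has already been discretized into one state per ensemble member, that mutual information is estimated from simple frequency counts (in nats), and that the normalization to [0, 1] is a min-max rescaling. All three points, as well as the function names and the toy data, are assumptions made for illustration; the actual procedure is described in Materials and Methods.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in estimate of the mutual information (nats) of two state sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def change_matrix(bound, free):
    """Normalized |MI_bound - MI_free| for every residue pair.

    bound, free : dict residue -> list of discrete side-chain states,
                  one entry per member of the respective ensemble.
    """
    residues = sorted(bound)
    raw = {}
    for idx, r1 in enumerate(residues):
        for r2 in residues[idx + 1:]:
            raw[(r1, r2)] = abs(mutual_information(bound[r1], bound[r2])
                                - mutual_information(free[r1], free[r2]))
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    # Min-max rescaling to [0, 1]; an all-equal matrix maps to zeros.
    return {pair: (v - lo) / span if span else 0.0 for pair, v in raw.items()}


# Hypothetical toy ensembles with three residues and four conformers each.
bound = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
free = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1], "C": [0, 1, 0, 1]}
changes = change_matrix(bound, free)
```

In the toy data the A-B coupling is lost and the B-C coupling is gained on going from the bound to the free state, so both pairs receive the maximal normalized change while A-C stays at zero.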


[Figure S2: ROC curves (TPR versus FPR) in three panels. A: Node Betweenness (bound form). B: Node Betweenness (unbound form). C: Node Betweenness (changes). Legend entries: random, betweenness centrality, del Sol betweenness centrality, Newman betweenness centrality (panels A and B); random, betweenness centrality, del Sol betweenness centrality (panel C).]

Figure S2. ROC curves of the node betweenness centrality-based predictors. A) Node betweenness centrality is computed according to different algorithms [1], [2], [5], starting from the structure of hPDZ2 in the bound form. B) Node betweenness centrality is computed starting from the structure of hPDZ2 in the unbound form. C) Residues are ranked according to the difference in their node betweenness centrality in the two states (bound and unbound).

[Figure S3: ROC curve (TPR versus FPR). Legend entries: methyl, random, predictor integrating backbone info.]

Figure S3. ROC curve for the predictor based on integrated information extracted from backbone variations and side-chain dynamics. A very slight AUC improvement can be noticed (from 0.74 to 0.75).

[Figure S4: heat map. Both axes list the mPDZ2 residues LYS2 to PRO95 and are annotated with the secondary structure elements β1, β2, β3, α1, β4, β5, α2 and β6.]

Figure S4. Heat map of the matrix of dynamical changes computed for mPDZ2. The amino acid numbering refers to the alignment in Figure 1C. The amino acid sequence is annotated with the corresponding secondary structure elements.


Figure S5. Network of the short-range dynamical changes in mPDZ2. Residues highlighted in green are those predicted from the complete matrix of changes in MI coupling (light green for the methyl side-chain bearing ones and dark green for the others). Red edges represent an increase in MI coupling upon the binding event, while blue edges represent a decrease in coupling. The thickness of the edges represents the amount of the change. Peptide residues and their contacts with the domain residues are highlighted in orange. The network visualization follows the organic layout implemented in Cytoscape [8]. Note that for mPDZ2 the network of residue contacts is extracted from the NMR ensembles by including only the contacts present in more than 50% of the backbones of the ensemble.
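The contact filter mentioned at the end of this caption can be sketched in Python as follows. The sketch assumes that the contacts of each ensemble member are available as a list of residue pairs; how a contact is defined (for example a distance cutoff) is not specified here, and the function name, the `min_fraction` parameter, and the toy models are hypothetical.

```python
from collections import Counter


def consensus_contacts(models, min_fraction=0.5):
    """Keep contacts present in more than `min_fraction` of the ensemble members.

    models : list of contact lists, one per member; each contact is a residue pair.
    Pairs are treated as unordered, so (a, b) and (b, a) are the same contact.
    """
    counts = Counter()
    for contacts in models:
        # A set ensures each contact is counted at most once per member.
        counts.update({tuple(sorted(pair)) for pair in contacts})
    return {pair for pair, n in counts.items() if n / len(models) > min_fraction}


# Hypothetical four-member ensemble.
models = [
    [("A", "B"), ("B", "C")],
    [("B", "A")],
    [("A", "B"), ("C", "D")],
    [("B", "C")],
]
kept = consensus_contacts(models)
```

With the strict inequality, a contact seen in exactly half of the members (B-C in the toy ensemble) is discarded, which matches the wording "more than 50%".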


[Figure S6: network of changes in mutual information for hPDZ2 (nodes labelled with the hPDZ2 residues and P0 to P5) and sequence alignment of hPDZ2 and mPDZ2 with conservation marks, secondary structure elements (β1, β2, β3, α1, β4, β5, α2, β6) and a color legend for the regions BS, DS1, DS2, DS3, DS4 and DS2-4.]

Figure S6. The predictions based on mutual information couplings from the matrices in Figures S1 and S4 are mapped onto a sequence alignment of the two homologous domains (hPDZ2 and mPDZ2), and onto the network of changes in mutual information for hPDZ2. The regions discussed in the paper are highlighted with different colors: binding site (BS), distal regions 1, 2, 3 and 4 (DS1, DS2, DS3, DS4) and the linker region between DS2 and DS4 (DS2-4), as also shown in Figure 2C. The alignment also reports the secondary structure of the domains.


Figure S7. The matrix of raw ΔMI values for the methyl-group bearing residues. The matrix colors the values from blue (-0.2) to red (0.7) according to the color scale reported on the right. To produce the matrix in Figure 1A, the absolute value of each ΔMI was taken and these absolute values were then normalized between 0 and 1.

References

1. Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40: 35-41.
2. del Sol A, Fujihashi H, Amoros D, Nussinov R (2006) Residues crucial for maintaining short paths in network communication mediate signaling in proteins. Molecular systems biology 2: 2006.0019.
3. Brandes U (2001) A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25: 163-177.
4. Brandes U (2008) On variants of shortest-path betweenness centrality and their generic computation. Social Networks 30: 136-145.
5. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America 99: 7821-7826.
6. Fuentes EJ, Der CJ, Lee AL (2004) Ligand-dependent dynamics and intramolecular signaling in a PDZ domain. Journal of molecular biology 335: 1105-1115.
7. Fuentes EJ, Gilmore SA, Mauldin RV, Lee AL (2006) Evaluation of energetic and dynamic coupling networks in a PDZ domain protein. Journal of molecular biology 364: 337-351.
8. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27: 431-432.