Text S1.

8 downloads 0 Views 1MB Size Report
2 – Department of Biochemistry, University of Leicester, Henry Wellcome. Building, Lancaster Road, Leicester LE1 9HN, United Kingdom. 3 – AI-‐lab, Vakgroep ...
Supporting  Information:   Accurate  prediction  of  the  dynamical  changes  within  the   second  PDZ  domain  of  PTP1e     Elisa  Cilia1,  Geerten  W.  Vuister2  and  Tom  Lenaerts1,3     1  –  MLG,  Département  d’Informatique,  Université  Libre  de  Bruxelles,  Boulevard  

du  Triomphe  CP212,  1050  Brussels,  Belgium   2  –  Department  of  Biochemistry,  University  of  Leicester,  Henry  Wellcome  

Building,  Lancaster  Road,  Leicester  LE1  9HN,  United  Kingdom   3  –  AI-­‐lab,  Vakgroep  Computerwetenschappen,  Vrije  Universiteit  Brussel,  

Pleinlaan  2,  1050  Brussels,  Belgium        

 

Measures  of  betweenness  centrality   We   evaluated   different   measures   of   shortest   path   betweenness   centrality.   We   computed  node  and  edge  betweenness  centrality  measures  proposed  in  [1]  and   [2].  In  [1],  the  betweenness  centrality  of  a  vertex  or  edge  k  is  given  by:   !! =  

!(!,!|!) !,!∈! !(!,!) ,  

   

 

 

(Eq.  1)  

where  V  is  the  set  of  vertices  (residues)  in  the  network,  σ  counts  the  number  of   shortest  paths  between  vertices  i  and  j,  and  σ(i,j|k)  counts  the  number  of  shortest   paths  passing  trough  vertex/edge  k  [3,4].  We  normalized  this  measure  over  the   number  of  vertices/edges  in  the  network.  The  measure  proposed  in  [2]  is  slightly   different.   It   corresponds   to   the   change   in   the   characteristic   path   length   (L)   observed  by  removing  from  the  network  a  certain  vertex  or  edge  k:        

∆!! = |! − !!"#,! |  

 

 

 

(Eq.  2)    

The   characteristic   path   length   L   is   essentially   the   average   shortest   path   length   ! !

!!! !(!, !),  where  N  is  the  total  number  of  vertex  pairs  and  d(i,j)  is  the  shortest  

path  length  between  two  vertices  i  and  j.   We  also  computed  node  betweenness  centrality  as  proposed  in  [5].    In  that  case   the   algorithm   works   by   removing   at   each   step   the   vertex   with   highest   betweenness-­‐centrality   (according   to   Eq.   1)   and   its   incident   edges.   Then,   it   computes   again   the   betweenness   centrality   values   for   the   residues   affected   by   the   removal   in   the   remaining   network.   This   is   repeated   until   no   more   edges   remain,  or  the  network  splits  into  different  connected  components.   For  extracting  the  most  relevant  residues  from  the  edge  betweenness  centrality   matrix   we   applied   again   the   CAST   algorithm.   Edge   betweenness   centrality   measures  resulted  in  a  less  discriminant  power  in  predicting  allostery-­‐involved  

 

2  

residues   in   hPDZ2   with   respect   to   node   betweenness   centrality   ones   (data   not   shown).  We  selected  the  measure  of  node  betweenness  centrality  in  Eq.  2  as  the   best  performing  one  in  this  context  and  we  compared  to  that  (see  Figure  S2  and   Table  S2).     Note   that,   we   have   also   looked   at   the   performance   of   a   predictor   based   on   the   changes   in   betweenness   centrality   happening   upon   binding   the   peptide.   Still,   these   measures   of   centrality   gave   us   better   results   when   directly   computed   on   the   networks   built   from   the   ligand-­‐free   or   ligand-­‐bound   structures   (see   Figure   S2  and  Table  S2).          

 

 

3  

Tables   Residues  

Labeling  

Residues  

Labeling  

ILEA6  

N  

VALA58  

N  

VALA9  

M  

LEUA59  

N  

LEUA11  

M  

ALAA60  

M  

ALAA12  

M  

VALA61  

P  

LEUA18  

P  

VALA64  

P  

ILEA20  

P  

LEUA66  

P  

VALA22  

P  

ALAA69  

P  

THRA23  

N  

THRA70  

N  

VALA26  

P  

ALAA74  

M  

THRA28  

M  

VALA75  

M  

VALA30  

P  

THRA77  

N  

ILEA35  

N  

LEUA78  

P  

VALA37  

M  

THRA81  

P  

ALAA39  

P  

VALA84  

M  

VALA40  

P  

VALA85  

P  

ILEA41  

N  

LEUA87  

M  

ALAA45  

N  

LEUA88  

M  

ALAA46  

N  

LEUA89  

M  

ILEA52  

N  

THRA96  

M  

Table   S1.   Assignment   of   classes   to   the   experimentally   derived   hPDZ2   residues.   The   methyl   side-­‐chain   residues   are   labelled   as   positive   (P),   negative   (N),   or   missing   (M)   according   to   the   experimental   results   reported   in   [6].   Positive   residues   are   those   for   which   NMR   methyl   side-­‐chain   relaxation   experiments   showed  a  significant  change  in  S2  and  τe  parameters  (the  difference  observed  in   the  fitted  parameter  was  twice  the  error  [6].  On  the  opposite,  negative  residues   are   those   for   which   the   change   is   not   significant   according   to   the   same   criterion.   Finally,   residues   for   which   there   are   missing   or   contradictory   data,   and   due   to   this   uncertainty   cannot   be   included   in   the   positive   or   negative   sets,   are    

4  

considered   as   missing   values   and   therefore   not   included   in   the   computation.   Note  that  V9  is  included  in  the  missing  data  because  subsequent  results  reported   in  [7]  have  shown  a  different  dynamics  response  of  this  residue  to  key  mutations   like  I35V  and  I20F  and  its  response  to  peptide  binding.                 NBC  Predictor    

AUC  (bound)   AUC  (unbound)   AUC  (changes)  

Freiman  betweenness  centrality    

0.49  

0.5  

0.44  

Del  Sol  betweenness  centrality    

0.59  

0.55  

0.55  

Newman  betweenness  centrality    

0.5  

0.34  

-­‐  

Table   S2.   Areas   Under   the   ROC   Curves   (AUC)   shown   in   Figure   S2,   for   the   Node   Betweenness  Centrality  (NBC)-­‐based  predictors.        

 

 

5  

 

t   0.193   0.101   0.094   0.081   0.076   0.063   0.054   0.053   0.048   0.045   0.034   0.026  

Predictions  using   only  METHYL   residue  changes   (Figure  1A)   L87,  L89,  V9,  L11,   V26   +  I20  

0.189  

L87,  L89,  V9,  H71,  D5,  Y36,   L11,  V26  

0.120  

+  R57,  L78  

0.115  

+  L78   +  V40   +  V30   +  V37,  V75,  T77   +  L66   +  I35  

t  

 

+  L18   +  T81   +  V22  

0.086   0.085   0.083   0.081   0.071   0.067   0.043   t2=0.027  

+  V40   +  I35   +  E47,  I20   +  T81   +  K54,  R51,  V75,  K13,  F7,   E67,  H86,  V37,  L18   +  S17,  S21,  D49,  N27,  T28,   V22   +  R31,  T70  

0.024  

+  I52   +  N16,  L88,  V85  

t1=0.023   0.021   0.02  

+  L88,  T70   +  I52  

0.022   0.021  

0.018   0.016   0.015   0.012   0.007   0.006  

+  V58   +  T23   +  I41   +  T96,  L59   +  V84,  I6  

0.019   0.017   0.015   0.013   0.009   0.007  

+  V64   +  A12,  A39,  A45,   A46,  A60,  A69,  A74  

+  L66   +  S29,  D56,  H53,  V30   +  T77  

0.025  

+  V61   +  T28,  V85  

0    

Predictions  using  ALL   residue  changes     (Figure  S1)  

0        

+  V61   +  H32,  E8,  T23   +  V58   +  D15,  S48,  I41   +  Q93,  L59   +  K91,  E10,  T96,  V84,  I6   +  E90,  K72,  S94,  K2,  V64   +  P3,  N14,  N62,  S65,  E76,   R79,  N80,  P1,  K38,  Q43,  Q73,   P42,  Q83,  P95,  G4,  A12,  G19,   G24,  G25,  G33,  G34,  A39   +    G4,  A12,  G19,  G24,  G25,   G33,  G34,  A39,  G44,  A45,   A46,    G50,  G55,  A60,  G63,   G68,  A69,  A74,  G82,  G92  

  Table  S3.  Predictions  corresponding  to  the  different  points  of  the  ROC  curves  in   Figure   3A,   by   decreasing   the   threshold   t   of   the   clustering   algorithm.   The   bold   lines   indicate   the   points   in   the   ROC   curves   corresponding   to   the   best   performing   predictors.    The  experimentally  identified  residues  are  annotated  in  bold.            

6  

"2 !5

!4 "1 !3 !2 !1

PRO1 LYS2 PRO3 GLY4 ASP5 ILE6 PHE7 GLU8 VAL9 GLU10 LEU11 ALA12 LYS13 ASN14 ASP15 ASN16 SER17 LEU18 GLY19 ILE20 SER21 VAL22 THR23 GLY24 GLY25 VAL26 ASN27 THR28 SER29 VAL30 ARG31 HIS32 GLY33 GLY34 ILE35 TYR36 VAL37 LYS38 ALA39 VAL40 ILE41 PRO42 GLN43 GLY44 ALA45 ALA46 GLU47 SER48 ASP49 GLY50 ARG51 ILE52 HIS53 LYS54 GLY55 ASP56 ARG57 VAL58 LEU59 ALA60 VAL61 ASN62 GLY63 VAL64 SER65 LEU66 GLU67 GLY68 ALA69 THR70 HIS71 LYS72 GLN73 ALA74 VAL75 GLU76 THR77 LEU78 ARG79 ASN80 THR81 GLY82 GLN83 VAL84 VAL85 HIS86 LEU87 LEU88 LEU89 GLU90 LYS91 GLY92 GLN93 SER94 PRO95 THR96

THR96 PRO95 SER94 GLN93 GLY92 LYS91 GLU90 LEU89 LEU88 LEU87 HIS86 VAL85 VAL84 GLN83 GLY82 THR81 ASN80 ARG79 LEU78 THR77 GLU76 VAL75 ALA74 GLN73 LYS72 HIS71 THR70 ALA69 GLY68 GLU67 LEU66 SER65 VAL64 GLY63 ASN62 VAL61 ALA60 LEU59 VAL58 ARG57 ASP56 GLY55 LYS54 HIS53 ILE52 ARG51 GLY50 ASP49 SER48 GLU47 ALA46 ALA45 GLY44 GLN43 PRO42 ILE41 VAL40 ALA39 LYS38 VAL37 TYR36 ILE35 GLY34 GLY33 HIS32 ARG31 VAL30 SER29 THR28 ASN27 VAL26 GLY25 GLY24 THR23 VAL22 SER21 ILE20 GLY19 LEU18 SER17 ASN16 ASP15 ASN14 LYS13 ALA12 LEU11 GLU10 VAL9 GLU8 PHE7 ILE6 ASP5 GLY4 PRO3 LYS2 PRO1

!6

Figures    

!1

!2

!3

"1

!4

!5

"2

!6

  Figure   S1.   A   representation   of   the   matrix   of   dynamical   changes   (heat   map)   in   hPDZ2.  Absolute  differences  in  side-­‐chain  coupling,  normalized  between  0  and  1,   are  represented  with  colors  from  blue  (0)  to  red  (1)  according  to  the  color  scale   reported   on   top.   Given   two   residues   R1   and   R2,   it   corresponds   to   |!!"#$% !! , !! − !!"## (!! , !! )|,   i.e.   the   absolute   difference   of   mutual   information   between   the   residues   side-­‐chain   conformational   distributions   in   the   bound   and   unbound   states   (see   also   Materials   and   Methods).   The   amino   acid   sequence  is  annotated  with  the  corresponding  secondary  structure  elements.      

 

7  

B

Node Betweenness (bound form)

Node Betweenness (unbound form)

1

1

0.9

0.9

0.8

0.8

0.7

0.7

0.6

0.6

TPR

TPR

A

0.5

0.5

0.4

0.4

0.3

0.3

0.2 0.1 0

0.2

random betweenness centrality del Sol betweenness centrality Newman betweenness centrality 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

random betweenness centrality del Sol betweenness centrality Newman betweenness centrality

0.1

0.9

0

1

0

0.1

0.2

0.3

0.4

FPR

0.5

0.6

0.7

0.8

0.9

1

FPR

C

Node Betweenness (changes) 1 0.9 0.8 0.7

TPR

0.6 0.5 0.4 0.3 0.2 random betweenness centrality del Sol betweenness centrality

0.1 0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

 

FPR

Figure   S2.   ROC   curves   of   the   node   betweenness   centrality-­‐based   predictors.   A)   Node  betweenness  centrality  is  computed  according  to  different  algorithms  [1],   [2],   [5],   starting   from   the   structure   of   hPDZ2   in   the   bound   form.   B)   Node   betweenness  centrality  is  computed  starting  from  the  structure  of  hPDZ2  in  the   unbound  form.    C)  Residues  are  ranked  according  to  the  difference  in  their  node   betweenness  centrality  in  the  two  states  (bound  and  unbound).   1 0.9 0.8 0.7

TPR

0.6 0.5 0.4 0.3 0.2 methyl random predictor integrating backbone info

0.1 0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

    Figure  S3.  ROC  curve  for  the  predictor  based  on  integrated  information  extracted   from   backbone   variations   and   side-­‐chain   dynamics.   A   very   slight   AUC   improvement  can  be  noticed  (from  0.74  to  0.75).     FPR

 

8  

"2

!5 !4 "1 !3 !2

!1

LYS2 PRO3 GLY4 ASP5 THR6 PHE7 GLU8 VAL9 GLU10 LEU11 ALA12 LYS13 THR14 ASP15 GLY16 SER17 LEU18 GLY19 ILE20 SER21 VAL22 THR23 GLY24 GLY25 VAL26 ASN27 THR28 SER29 VAL30 ARG31 HIS32 GLY33 GLY34 ILE35 TYR36 VAL37 LYS38 ALA39 ILE40 ILE41 PRO42 LYS43 GLY44 ALA45 ALA46 GLU47 SER48 ASP49 GLY50 ARG51 ILE52 HIS53 LYS54 GLY55 ASP56 ARG57 VAL58 LEU59 ALA60 VAL611 ASN62 GLY63 VAL64 SER65 LEU66 GLU67 GLY68 ALA69 THR70 HIS71 LYS72 GLN73 ALA74 VAL75 GLU76 THR77 LEU78 ARG79 ASN80 THR81 GLY82 GLN83 VAL84 VAL85 HIS86 LEU87 LEU88 LEU89 GLU90 LYS91 GLY92 GLN93 VAL94 PRO95

PRO95 VAL94 GLN93 GLY92 LYS91 GLU90 LEU89 LEU88 LEU87 HIS86 VAL85 VAL84 GLN83 GLY82 THR81 ASN80 ARG79 LEU78 THR77 GLU76 VAL75 ALA74 GLN73 LYS72 HIS71 THR70 ALA69 GLY68 GLU67 LEU66 SER65 VAL64 GLY63 ASN62 VAL61 ALA60 LEU59 VAL58 ARG57 ASP56 GLY55 LYS54 HIS53 ILE52 ARG51 GLY50 ASP49 SER48 GLU47 ALA46 ALA45 GLY44 LYS43 PRO42 ILE41 ILE40 ALA39 LYS38 VAL37 TYR36 ILE35 GLY34 GLY33 HIS32 ARG31 VAL30 SER29 THR28 ASN27 VAL26 GLY25 GLY24 THR23 VAL22 SER21 ILE20 GLY19 LEU18 SER17 GLY16 ASP15 THR14 LYS13 ALA12 LEU11 GLU10 VAL9 GLU8 PHE7 THR6 ASP5 GLY4 PRO3 LYS2

!6

   

!1

!2

!3

"1

!4

!5

"2

!6

  Figure   S4.   Heat   map   of   the   matrix   of   dynamical   changes   computed   for   mPDZ2.   The  numbering  of  the  amino  acid  is  referred  to  the  alignment  in  Figure  1C.  The   amino   acid   sequence   is   annotated   with   the   corresponding   secondary   structure   elements.        

 

9  

Figure   S5.   Network   of   the   short-­‐range   dynamical   changes   in   mPDZ2.   Residues   highlighted  in  green  are  those  predicted  from  the  complete  matrix  of  changes  in   MI  coupling  (light  green  for  the  methyl  side-­‐chain  bearing  ones  and  dark  green   for  the  others).  Red  edges  represent  an  increase  in  MI  coupling  upon  the  binding   event,   while   blue   edges   represent   a   decrease   in   coupling.   The   thickness   of   the   edges  represents  the  amount  of  the  change.  Peptide  residues  and  their  contacts   with   the   domain   residues   are   highlighted   in   orange.   The   network   visualization   follows   the   organic   layout   implemented   in   Cytoscape   [8].   Note   that,   for   mPDZ2   the   network   of   residue   contacts   is   extracted   from   the   NMR   ensembles   by   including  only  the  contacts  present  in  more  than  the  50%  of  the  backbones  of  the   ensemble.  

 

10  

ASP15 GLN83 LYS13

ASN80

THR81

VAL84 ASN16 ASN62

ARG79

SER17

ASN14 VAL85

THR77

GLU76 LYS72

LEU18

GLN73 LEU11

GLU10

LEU78

VAL61

VAL75 P0

HIS86

VAL64

P1

LEU87

LEU66

VAL9

P2

THR70

ILE20 ASP49

HIS71 GLU8

SER48

VAL22 ARG51

LEU59

ILE52 LEU88

VAL58

SER65

ILE35 VAL26

SER21 GLU67

P3

ASN27

P4

VAL40 PHE7

ILE6

LEU89 THR23

VAL37

ASP56 GLU47 HIS53 GLU90

GLN43

THR28

LYS38 ARG57

ILE41

P5

TYR36

ASP5

SER29 LYS54

PRO1 LYS91

VAL30

PRO42

ARG31 HIS32

LYS2

PRO3 GLN93

SER94

1 Conservation * * * * * hPDZ2 PKPGD I F 1- K P G D T F mPDZ2 Conservation * * * * * hPDZ2 PKPGD I F 51 mPDZ2 K P G D T F Conservation * * * * * * * hPDZ2 R I HKGDR 51 mPDZ2 R I HKGDR Conservation * * * * * * * hPDZ2 R I HKGDR mPDZ2 R I HKGDR

* E E * E E * V V * V V

* * VE VE * * V !1 E V * E * LA LA * * LA LA

11 * * * . * . * * * LAKNDNSLG 11 L AK T DGS LG * * * . * . * * * LAKNDNSLG 61 L * A * K * T * D * G * S * L * G * VNGV S L EGA 61 VNGV S L EGA * * * * * * * * * VNGV S L EGA VNGV S L EGA

!4

BS

DS1

DS2

DS3

DS4

!5

DS2-4

* I I * I *I T T * T T

21 * * * * * * S V T GGV 21 S V T GGV * * * * * * S!2 V T G G V 71 S * V * T * G * G * V * HKQAV E 71 HKQAV E * * * * * * HKQAV E HKQAV E "2

* N N * N N * T T * T T

* T T * T T * L L * L L

* * SV SV * * SV S * V * RN RN * * RN RN

31 * * * * * RHGG I 31 RHGG I * * * * * RHGG I 81 R * H * G * G * *I T GQV V 81 T GQV V * * * * * T GQV V T GQV V

* Y Y * Y Y * H H * H H

* * * VKA VKA * * * V!3K A V * K * A * LLL LLL * * * LLL LLL

: V I : V *I E E * E E

41 * * : * * * I PQGAA 41 I PKGAA * * : * * * I P Q G A"1A 91 *I P * K * GA * A KGQS P T 91 KGQV P * * * * KGQS P T KGQV P -

* E E * E E

* S S * S S

* * DG DG * * DG DG

!6

 

Figure   S6.   The   predictions   based   on   mutual   information   couplings   from   the   matrices   in   Figure   S1   and   S4   are   mapped   in   a   sequence   alignment   of   the   two   homologous   domains   (hPDZ2   and   mPDZ2),   and   on   the   network   of   changes   in   mutual   information   for   hPDZ2.   The   regions   discussed   in   the   paper   are   highlighted   with   different   colors   (binding   site   (BS),   distal   region   1,   2,   3   and   4   (DS1,   DS2,   DS3,   DS4)   and   the   linker   region   between   DS2   and   DS4   (DS2-­‐4)   also   shown   in   Figure   2C.   The   alignment   also   reports   the   secondary   structure   of   the   domains.          

 

11  

  Figure  S7.  The  matrix  of  raw  ΔMI  values  for  the  methyl-­‐group  bearing  residues.     The  matrix  colors  the  values  from  blue  (-­‐0.2)  to  red  (0.7)  according  to  the  color   scale   reported   on   the   right.   To   produce   the   matrix   in   Figure   1A,   the   absolute   values  were  taken  of  each  ΔMI  and  then  these  absolute  values  were  normalized   between  0  and  1.         References     1.   Freeman   LC   (1977)   A   set   of   measures   of   centrality   based   on   betweenness.   Sociometry  40:  35-­‐41.   2.   del   Sol   A,   Fujihashi   H,   Amoros   D,   Nussinov   R   (2006)   Residues   crucial   for   maintaining  short  paths  in  network  communication  mediate  signaling  in   proteins.  Molecular  systems  biology  2:  2006  0019.   3.   Brandes   U   (2001)   A   faster   algorithm   for   betweenness   centrality.   Journal   of   Mathematical  Sociology  25:  163-­‐177.   4.   Brandes   U   (2008)   On   variants   of   shortest-­‐path   betweenness   centrality   and   their  generic  computation.  Social  Networks  30:  136-­‐145.  

 

12  

5.  Girvan  M,  Newman  MEJ  (2002)  Community  structure  in  social  and  biological   networks.  Proceedings  of  the  National  Academy  of  Sciences  of  the  United   States  of  America  99:  7821-­‐7826.   6.   Fuentes   EJ,   Der   CJ,   Lee   AL   (2004)   Ligand-­‐dependent   dynamics   and   intramolecular   signaling   in   a   PDZ   domain.   Journal   of   molecular   biology   335:  1105-­‐1115.   7.  Fuentes  EJ,  Gilmore  SA,  Mauldin  RV,  Lee  AL  (2006)  Evaluation  of  energetic  and   dynamic  coupling  networks  in  a  PDZ  domain  protein.  Journal  of  molecular   biology  364:  337-­‐351.   8.  Smoot  ME,  Ono  K,  Ruscheinski  J,  Wang  PL,  Ideker  T  (2011)  Cytoscape  2.8:  new   features   for   data   integration   and   network   visualization.   Bioinformatics   27:  431-­‐432.    

 

13