Aug 12, 2014
SUPPLEMENTARY MATERIAL

Fractional Acetylation of gene bodies

To understand the distribution of acetylations within genic regions, we defined two categories of 100bp bins: those that are accurately classified as genic using just acetylations, and those that are accurately classified using all 24 marks but not acetylations, both at the same false positive rate. Overall, 99% of all active RefSeq genes in IMR90 and 96.54% in H1 could be at least partially recovered using acetylations, which is comparable to the recovery achieved using all 24 modifications. However, the distributions of the fraction of each gene recovered differ between the two cases: using all 24 marks leads to 90-100% recovery of most genes, while the fractions recovered using just acetylations appear to be more evenly distributed (Supplementary Figure 3A,B).
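The per-gene recovery fraction described above can be illustrated with a short sketch. This is not the authors' code: the function names, the half-open coordinate convention, the alignment of bins to multiples of 100bp, and the ten-bucket histogram are all assumptions made for illustration.

```python
from typing import Iterable, List

BIN_SIZE = 100  # bp, matching the 100bp bins used in the text


def fraction_recovered(gene_start: int, gene_end: int,
                       predicted_bin_starts: Iterable[int],
                       bin_size: int = BIN_SIZE) -> float:
    """Fraction of the bins overlapping [gene_start, gene_end) predicted as genic.

    Assumes bins are aligned to multiples of bin_size (hypothetical layout).
    """
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    first_bin = gene_start // bin_size
    last_bin = (gene_end - 1) // bin_size
    gene_bins = set(range(first_bin, last_bin + 1))
    predicted = {start // bin_size for start in predicted_bin_starts}
    return len(gene_bins & predicted) / len(gene_bins)


def recovery_histogram(fractions: Iterable[float],
                       n_buckets: int = 10) -> List[int]:
    """Count genes per recovery bucket (0-10%, 10-20%, ..., 90-100%)."""
    counts = [0] * n_buckets
    for f in fractions:
        # A fraction of exactly 1.0 is placed in the last bucket.
        idx = min(int(f * n_buckets), n_buckets - 1)
        counts[idx] += 1
    return counts
```

Applying `recovery_histogram` once to the fractions obtained with acetylations and once to those obtained with all 24 marks would give two distributions of the kind compared in Supplementary Figure 3A,B.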

To examine the bias of acetylations towards exon-intron boundaries, we compared the proportion of acetylated bins recovered within exonic regions to the proportion recovered without acetylations, and found this difference to be highly significant in both H1 and IMR90 (pvalue85% using all modifications and >70% using just acetylations (Supplementary Figure 5B). Hence, it would appear that all of these exon-intron junctions are either enriched or depleted in acetylations with respect to the gene body. To investigate this, we looked at the Z-score normalized profile of histone modifications (Supplementary Figure 5D), as in H1. Cluster 1 and cluster 2 appear to be enriched with respect to the genic background, while cluster 3 and cluster 4 appear to be depleted.

Hence, in both H1 and IMR90, we observe categories of exon-intron boundaries with varying degrees of enrichment or depletion of histone acetylations with respect to other genic regions.
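One way to express the enrichment or depletion described above is a Z-score of the boundary signal against the genic background. The sketch below is illustrative only: the exact normalization behind Supplementary Figure 5 is not specified here, so the use of the population standard deviation, the function names, and the input layout (a list of RPKM values per mark) are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def zscore_vs_background(boundary: Sequence[float],
                         background: Sequence[float]) -> float:
    """Z-score of the mean boundary signal relative to the background values."""
    mu = mean(background)
    sigma = pstdev(background)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("background has zero variance")
    return (mean(boundary) - mu) / sigma


def enrichment_profile(boundary_rpkm: Dict[str, Sequence[float]],
                       background_rpkm: Dict[str, Sequence[float]]
                       ) -> Dict[str, float]:
    """Per-mark Z-scores; positive means enriched, negative means depleted."""
    return {
        mark: zscore_vs_background(values, background_rpkm[mark])
        for mark, values in boundary_rpkm.items()
    }
```

Under this convention, a cluster whose acetylation marks mostly score above zero would be read as enriched relative to the genic background, and one whose marks mostly score below zero as depleted.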

SUPPLEMENTARY TABLES

Table S1. GO terms for acetylation-enriched genes in H1

Table S2. GO terms for acetylation-enriched genes in IMR90

Available online at http://enhancer.ucsd.edu/nisha/ac_paper_supp_tables

SUPPLEMENTARY FIGURE LEGENDS

Figure S1. Differential histone modifications between enhancers and promoters. A,B.) Preference of various histone modifications from -1kb to +1kb around an enhancer or promoter using a Z-score normalized score (blue) as compared to randomly shuffled class labels (red) in A.) H1 and B.) IMR90. C.) Ordering of histone acetylations by their out-of-bag variable importance in the classification of enhancers against promoters in IMR90.

Figure S2. Genome-wide prediction of promoters and enhancers. A,B.) Ordering of all 24 histone modifications by their out-of-bag variable importance for the prediction of promoters in A.) H1 and B.) IMR90. C,D.) Validation rates and E,F.) misclassification rates for enhancer predictions at various voting percentage cutoffs in C,E.) H1 and D,F.) IMR90 using all 24 modifications (blue), H3K4me1/2/3 (red), H3K4me1, H3K4me3 and H3K27ac (cyan), and 15 acetylations (green).

Figure S3. Recovery of genic regions using acetylations. A,B.) Fraction of gene body predicted in A.) H1 and B.) IMR90 using all 24 modifications (blue) or just acetylations (red). C,D.) Fraction of predicted 100bp bins lying at various distances from exon-intron boundaries using all 24 modifications (blue) or just acetylations (red) in C.) H1 and D.) IMR90. E,F.) Distance of predicted 100bp bins from exon-intron boundaries versus activity of exons in E.) H1 and F.) IMR90.

Figure S4. Acetylations within the gene body distal to exon-intron boundaries and DNase-I hypersensitive sites in H1. A.) ROC curves showing classification of such distal genic regions using all 24 modifications (blue) or only 15 acetylations (green). B.) Variable importance of acetylations in the classification of distal genic regions. C.) Heatmap showing enrichment of acetylations in genic regions as compared to intergenic ones using a Z-score normalized measure. D.) UCSC genome browser snapshot of gene PTPRJ showing enrichment of acetylations as compared to the neighbouring intergenic region. E,F.) Gene expression levels versus fractional enrichment of acetylations within the gene body in E.) H1 and F.) IMR90.

Figure S5. Enrichment of acetylations at exon-intron boundaries for each chromatin state (Fig. 5A,B) with respect to genic background. A,B.) Maximum classification accuracy of each state against genic background using all 24 modifications (blue) or 15 acetylations (red) in A.) H1 and B.) IMR90. C,D.) Histone acetylation RPKM levels at exon-intron boundaries, Z-score normalized against levels at genic regions distal to DNase-I HS and exon-intron boundaries, in C.) H1 and D.) IMR90.

Figure S6. Association of chromatin modification patterns with splice-site usage. A,B.) ROC curves showing classification of A.) Group I exon-intron junctions or B.) Group II exon-intron junctions against constitutive background, at various distances for filtering either constitutive exons neighbouring alternative ones (indicated by CSXkb in the legend) or at distances for filtering the alternative exons for the ones proximal to constitutive exons (indicated by Altxkb). The filtering distance parameter is increased in increments of 2.5kb from 0 to 10kb. C.) Association of various types of splicing events with chromatin states in IMR90 (Fig. 5B). D.) Association of various types of splicing events with chromatin states in H1 (Fig. 5A).

Figure S7. “Promoter-like” chromatin states are associated with various splice variants. Snapshots from the UCSC genome browser of A.) PLEKH3 showing alternative splicing of several exons in H1 as compared to IMR90 and B.) VIM showing alternative splicing of several exons in IMR90 as compared to H1.

SUPPLEMENTARY REFERENCES

Breiman, L. 2001. Random Forests. Machine Learning 45: 5-32.

SUPPLEMENTARY FIGURES

[Supplementary Figure 1: image not reproduced. Panels for H1 and IMR90 plot mean Z-score normalized RPKM read counts at TSS and enhancers (Enh) for real versus random class labels, with histone modifications ordered by classification accuracy. Panel C plots average variable importance for the histone acetylations.]

[Supplementary Figure 2: image not reproduced. Panels for H1 TSS and IMR90 TSS plot average variable importance. Panels for H1 enhancers and IMR90 enhancers plot validation rate and misclassification rate against the number of enhancers (x1000) for four feature sets: Ac, All 24, me1/me2/me3, and me1/me3/27ac.]

[Supplementary Figure 3: image not reproduced. Panels for H1 and IMR90 plot the fraction of total genes recovered against the fraction of gene recovered, the fraction of genic regions recovered against distance from the exon-intron boundary (in kb), and average exonic expression (FPKM) against distance from the center of the exon (in kb). Legend: recovered by Ac only, or by 24 marks/no Ac.]

[Supplementary Figure 5: image not reproduced. Panels for H1 and IMR90 plot maximum classification accuracy against cluster number (All 24 versus Ac), and Z-score normalized RPKM with respect to genic background for clusters 1-6 in H1 and clusters 1-4 in IMR90, with a dashed line marking zero enrichment.]

[Supplementary Figure 6: image not reproduced. ROC panels (true positive rate against false positive rate) for IMR90 Group I vs CS and IMR90 Group II vs CS at filtering distances of 0, 2.5kb, 5kb, 7.5kb and 10kb. Panels for IMR90 and H1 plot log(enrichment of alternative vs constitutive exons) against cluster number for the categories AA active/upstream, AA inactive/downstream, AD active/downstream, AD inactive/upstream, Alternative (0 to 0.9), and Intronic Retention (> 0.9).]

[Supplementary Figure 7: image not reproduced. Panels A and B show genome browser tracks H1_K4me3, IMR_K4me3, H1_K36me3, IMR_K36me3, H1_K27ac and IMR_K27ac.]