File S4 Figures S1-S11

[Figure S1 flowchart, panels A-D. Tools and scripts are shown between the data states they connect.]

A. Raw sequencing reads by library -> Trimmomatic -> trimmed and filtered reads by library -> BWA (aln, samse, sampe); Samtools (view, sort, index) -> reads mapped to a reference by library -> Samtools (merge, sort, index) -> reads mapped to a reference -> Samtools (rmdup) -> removed PCR duplicates -> GATK (UnifiedGenotyper, RealignerTargetCreator, IndelRealigner) -> realignment of reads around indels -> GATK (BaseRecalibrator, PrintReads) -> base quality recalibration -> analysis-ready reads aligned to the reference.

B. Analysis-ready reads aligned to the reference -> GATK (UnifiedGenotyper) -> raw variants -> variant_filter.pl -> filtered set of variants -> GATK (FastaAlternateReferenceMaker) -> consensus sequence; Samtools, bcftools (mpileup, view) -> read depth for every position -> consensus sequence for every gene for every genome and positions with low coverage.

C. Trimmed and filtered reads by library and new references -> BWA (aln, samse, sampe); Samtools (view, sort, index) -> reads mapped to new references by library -> Samtools (merge, sort, index) -> reads mapped to new reference -> Samtools (rmdup) -> removed PCR duplicates -> GATK (UnifiedGenotyper, RealignerTargetCreator, IndelRealigner) -> realignment of reads around indels -> GATK (BaseRecalibrator, PrintReads) -> base quality recalibration -> GATK (UnifiedGenotyper) -> raw variants -> variant_filter.pl -> filtered set of variants -> analysis-ready variants.

D. Analysis-ready variants and reads mapped to the reference -> HapCompass -> variants phased using assembly -> get_halotypes.pl -> haplotype sequences -> haplotype sequences by assembly block.

Figure S1. Haplotype assembly pipeline (Step 2 in Figure 1). Processing proceeds from step A to B to C to D, generating haplotype sequences by assembly blocks and consensus sequences for covered regions of the reference genome. These sequences are subsequently used in Step 3 in Figure 1.

[Figure S2 heatmaps, panels A and B. Columns: genes 1-257. Rows: genomes. Each panel has a color key with histogram (axes: Value, Count).]

Panel A rows (54 genomes; "x" marks an excluded genome): Drymocallis_glandulous_index40, americana_CFRA989_index08, americana_CFRA988_index41, x americana_CFRA554_index47, bracteata_CFRA106_index09, bracteata_CFRA1334_index10, bracteata_CFRA1952_index12, bracteata_CFRA107_index13, x bracteata_CFRA174_index17, bracteata_CFRA456_index29, bucharica_CFRA520_index35, bucharica_CFRA522_index39, bucharica_CFRA1906_index44, californica_CFRA371_index21, x californica_CFRA388_index43, chinensis_CFRA202_index33, chinensis_CFRA1199_index16, daltoniana_CFRA1685_index15, x daltoniana_CFRA1685_index40, iinumae_CFRA2025_index34, iinumae_CFRA1853_index35, x iinumae_CFRA1849_index30, mandshurica_CFRA1947_index07, mexicana_CFRA61_index05, x nilgerrensis_CFRA1188_index20, nilgerrensis_CFRA1223_index28, x nilgerrensis_CFRA1358_index32, x nilgerrensis_CFRA1224_index45, nipponica_CFRA1864_index07, x nipponica_CFRA1009_index33, nipponica_CFRA1862_index38, x nipponica_CFRA1863_index42, nubicola_CFRA1797_index08, x nubicola_CFRA1797_index27, pentaphylla_CFRA1913_index11, x pentaphylla_CFRA1198_index24, x vesca_CFRA612_index04, x vesca_CFRA1922_index15, vesca_CFRA1967_index19, viridis_CFRA1812_index22, vesca_CFRA438_index25, viridis_CFRA1256_index02, viridis_CFRA1597_index36, corymbosa_CFRA1911_index19, gracilis_CFRA1973_index20, moupinensis_CFRA1974_index21, orientalis_CFRA1801_index16, tibetica_CFRA1907_index18, moschata_CFRA117_index13, chiloensis_SAL3-1_index09, chiloensis_GP33-1_index10, virginiana_y33b2-12_index12, virginiana_0477-27-1_index14, iturupensis_CFRA1841_index17.

Panel B rows: the 40 genomes from panel A that are not marked with "x".

Figure S2. Heatmap depicting the fraction of gene length not covered by sequencing. The threshold for depth of coverage was set at 3 reads per position. The fraction depicted is normalized and standardized across all the genomes by gene. Colors represent individual values as described in the color key. The histogram in the color key indicates the distribution of the colors in the heatmap. A. All 54 genomes. Genomes that were excluded because their normalized, standardized fraction of uncovered gene length, averaged across all the genes, exceeded 0.25 are marked by “x” before their names. B. The 40 retained genomes only.
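The filtering described in this caption can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes that a position counts as uncovered when its depth is below 3 reads, and that "normalized and standardized across all the genomes by gene" means a per-gene z-score across genomes; the caption does not spell out either detail. All function names are hypothetical.

```python
from statistics import mean, pstdev

DEPTH_THRESHOLD = 3      # reads per position, from the caption
EXCLUSION_CUTOFF = 0.25  # mean standardized fraction, from the caption


def uncovered_fraction(depths, threshold=DEPTH_THRESHOLD):
    """Fraction of positions in one gene whose depth is below the threshold."""
    return sum(d < threshold for d in depths) / len(depths)


def standardize_by_gene(fractions):
    """Z-score each gene's uncovered fraction across genomes (assumed meaning).

    fractions: {genome_name: [uncovered fraction for gene 0, gene 1, ...]}
    """
    genomes = list(fractions)
    n_genes = len(fractions[genomes[0]])
    z = {g: [] for g in genomes}
    for j in range(n_genes):
        column = [fractions[g][j] for g in genomes]
        mu, sd = mean(column), pstdev(column)
        for g in genomes:
            # A gene with identical values in every genome carries no signal.
            z[g].append((fractions[g][j] - mu) / sd if sd > 0 else 0.0)
    return z


def excluded_genomes(z, cutoff=EXCLUSION_CUTOFF):
    """Genomes whose standardized fraction, averaged over genes, exceeds the cutoff."""
    return sorted(g for g, row in z.items() if mean(row) > cutoff)
```

With per-position depths taken from a tool such as Samtools mpileup, `uncovered_fraction` would be applied to each gene in each genome before standardizing and filtering.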

[Figure S3 heatmap. Color key with histogram (axes: Value, from 0 to 0.03; Count).]

Columns (fragments): g101_f17 g104_f2 g107_f6 g110_f6 g119_f6 g124_f12 g126_f4 g137_f5 g138_f10 g140_f8 g143_f2 g155_f5 g156_f4 g165_f6 g166_f3 g167_f2 g16_f1 g179_f8 g17_f20 g180_f1 g184_f3 g186_f6 g18_f11 g192_f4 g196_f48 g196_f6 g196_f91 g199_f7 g19_f10 g19_f2 g1_f10 g202_f4 g21_f4 g23_f8 g2_f8 g31_f17 g33_f5 g33_f8 g35_f14 g37_f11 g37_f5 g40_f8 g41_f5 g43_f1 g47_f1 g4_f3 g4_f6 g52_f5 g53_f10 g54_f3 g5_f7 g60_f2 g68_f1 g73_f4 g79_f4 g80_f10 g80_f6 g82_f16 g84_f3 g85_f6 g88_f11 g88_f5 g88_f9 g8_f3 g90_f10 g97_f1 g98_f6 g98_f9 g9_f18

Rows (species): Drymocallis, F. vesca, F. bucharica, F. chinensis, F. daltoniana, F. iinumae, F. mandshurica, F. nilgerrensis, F. nipponica, F. nubicola, F. pentaphylla, F. viridis, F. corymbosa, F. gracilis, F. moupinensis, F. orientalis, F. tibetica, F. moschata, F. chiloensis, F. iturupensis, F. virginiana

Figure S3. Distance between haplotypes calculated under the Kimura (1980) substitution model and averaged across all the comparisons for a given fragment and species. Fragments are in columns and are marked with gene number and fragment number; species are in rows. Colors represent individual values as described in the color key. The histogram in the color key indicates the distribution of the colors in the heatmap.
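As an illustration of the quantity plotted, the sketch below computes a Kimura two-parameter distance between two aligned haplotypes and averages it over all pairwise comparisons. It assumes that "Kimura (1980)" refers to the two-parameter model, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. Skipping sites with gaps or ambiguous bases is also an assumption, and the function names are hypothetical.

```python
from itertools import combinations
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    # Keep only alignment columns where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    differences = sum(1 for a, b in pairs if a != b)
    transitions = sum(
        1 for a, b in pairs
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    p = transitions / n                  # proportion of transitions
    q = (differences - transitions) / n  # proportion of transversions
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


def mean_pairwise_distance(haplotypes):
    """Average K2P distance over all pairs of haplotypes for one fragment."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    distances = [k2p_distance(a, b) for a, b in combinations(haplotypes, 2)]
    return sum(distances) / len(distances)
```

Each heatmap cell would then correspond to one call of `mean_pairwise_distance` on the haplotypes of a given fragment and species.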

Input:
  map = matrix mapping positions of consensus sequences from different organisms onto each other, generated from the multiple sequence alignment
  blocks = for each individual, the set of blocks where haplotypes were assembled; positions correspond to positions in the consensus reference sequence
Output:
  haplotype_fragments = regions with continuous haplotype assembly across all of the genomes

blocks_new_borders = obtain new positions of haplotype assembly blocks by individual using map
total_positions = number of columns in the map
haplotype_fragments = empty list
set position to 1
while position
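The pseudocode breaks off at the start of its loop, so the authors' exact procedure cannot be recovered from this text. The Python sketch below shows one possible way to produce the stated output, under assumptions that are not in the source: the map is stored as a dictionary `pos_map` from individual to a list with one entry per alignment column (the position in that individual's consensus sequence, or `None` for a gap), blocks are inclusive `(start, end)` pairs, and a fragment ends whenever any individual hits a gap, leaves a block, or moves into a different block.

```python
def block_id_per_column(map_row, blocks):
    """For each alignment column, the index of the block covering it, or None."""
    ids = []
    for pos in map_row:
        block_id = None
        if pos is not None:
            for i, (start, end) in enumerate(blocks):
                if start <= pos <= end:
                    block_id = i
                    break
        ids.append(block_id)
    return ids


def continuous_fragments(pos_map, blocks):
    """Alignment-column ranges (1-based, inclusive) phased in every individual.

    pos_map: {individual: [consensus position or None per alignment column]}
    blocks:  {individual: [(start, end), ...]} in consensus coordinates
    """
    ids = {ind: block_id_per_column(pos_map[ind], blocks[ind]) for ind in pos_map}
    total_positions = len(next(iter(ids.values())))
    haplotype_fragments = []
    start = None     # first column of the fragment currently being extended
    previous = None  # block ids of all individuals at the previous column
    position = 1
    while position <= total_positions:
        current = tuple(ids[ind][position - 1] for ind in ids)
        phased = None not in current
        # Close the open fragment if phasing stops or any block changes.
        if start is not None and (not phased or current != previous):
            haplotype_fragments.append((start, position - 1))
            start = None
        if phased and start is None:
            start = position
        previous = current
        position += 1
    if start is not None:
        haplotype_fragments.append((start, total_positions))
    return haplotype_fragments
```

Treating gap columns as breaks is the simplest choice and may split fragments more often than the original method did.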