Supplementary Information for - PNAS

Supplementary Information for

Selection and environmental adaptation along a path to speciation in the Tibetan frog Nanorana parkeri

Guo-Dong Wang (a,b,1), Bao-Lin Zhang (a,c,1), Wei-Wei Zhou (a,d), Yong-Xin Li (a,c), Jie-Qiong Jin (a), Yong Shao (a), He-chuan Yang (e), Yan-Hu Liu (f), Fang Yan (a), Hong-Man Chen (a), Li Jin (g), Feng Gao (h), Yaoguang Zhang (g), Haipeng Li (h,b), Bingyu Mao (a,b), Robert W. Murphy (a,i), David B. Wake (j,2), Ya-Ping Zhang (a,b,2), and Jing Che (a,b,d,2)

(a) State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China;
(b) Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China;
(c) Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, Yunnan, China;
(d) Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences, Yezin, 05282 Nay Pyi Taw, Myanmar;
(e) Human Genetics, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, Singapore;
(f) Laboratory for Conservation and Utilization of Bio-Resources & Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming 650091, China;
(g) Key Laboratory of Freshwater Fish Reproduction and Development of the Ministry of Education and Key Laboratory of Aquatic Science of Chongqing, Southwest University School of Life Sciences, Chongqing 400715, China;
(h) CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China;
(i) Centre for Biodiversity and Conservation Biology, Royal Ontario Museum, Toronto, ON, Canada M5S 2C6;
(j) Department of Integrative Biology and Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720-3160, USA.

(1) G.-D.W. and B.-L.Z. contributed equally to this work.
(2) To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].

www.pnas.org/cgi/doi/10.1073/pnas.1716257115

This PDF file includes:
Supplementary text
Figs. S1 to S18
Tables S1 to S15
References for SI reference citations

Supplementary Information Text

1.1 Estimation of population size change after Last Glacial Maximum (LGM, ~20,000 years ago) and coalescent simulations

The PSMC approach indicated that all five populations of the Tibetan frog expanded recently (See SI Appendix, Fig. S4). Because this method lacks power for times more recent than ~20,000 years ago (1), and the G-PhoCS approach cannot model a change in Ne within a single population (2), we used fastsimcoal2 (3) to estimate Ne expansion within the last 30,000 years. The divergence times, migration rates, and ancestral Ne values at 30,000 years ago were fixed in fastsimcoal2 to the demographic parameters inferred from G-PhoCS. Based on the PSMC results, the start of the recent population expansion was allowed to fall between 8,000 and 30,000 years ago (1,600 to 6,000 generations). The Ne before expansion was taken from the current Ne and 95% credibility intervals inferred by G-PhoCS. A relatively wide expansion range (up to 100 times the beginning Ne) was allowed for each population. The analysis was run 50 times and the best-fit run was chosen based on the maximum likelihood value. For each run, demographic estimates were obtained from 100,000 simulations (-n 100000) and 40 Expectation/Conditional Maximization cycles (-L 40) per parameter file.
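The two bookkeeping steps in this setup, converting the expansion window from years to generations and keeping the best of the replicate runs, can be sketched as follows. This is a minimal illustration, not the authors' script: the generation time of 5 years is inferred here from the stated bounds (8,000 years = 1,600 generations), and the run records and their likelihood values are invented placeholders.

```python
# Generation time implied by the text: 8,000 years <-> 1,600 generations.
# (Assumption: inferred from the stated bounds, not given explicitly.)
YEARS_PER_GENERATION = 8000 / 1600  # 5.0


def years_to_generations(years, years_per_generation=YEARS_PER_GENERATION):
    """Convert a time in years to a time in generations."""
    return years / years_per_generation


def pick_best_run(runs):
    """Return the replicate run with the highest (least negative) likelihood."""
    return max(runs, key=lambda run: run["log_likelihood"])


# Hypothetical replicate results; the likelihood values are made up for illustration.
example_runs = [
    {"run": 1, "log_likelihood": -10532.4},
    {"run": 2, "log_likelihood": -10498.7},
    {"run": 3, "log_likelihood": -10611.0},
]

window = (years_to_generations(8000), years_to_generations(30000))
best = pick_best_run(example_runs)
print("expansion window (generations):", window)
print("best run:", best["run"])
```

In practice the list would hold one record per independent fastsimcoal2 run (50 in the text), with the likelihood read from each run's output.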

In the best-fit fastsimcoal2 result, all five populations had undergone expansions of varying degree, dating from 14,000 to 29,000 years ago. The Ne of E1, E2 and E3 expanded 1.67- to 3.00-fold, that of E4 expanded 15.19-fold, and that of W expanded 67.32-fold (See SI Appendix, Table S15 and Fig. S16). Based on the expansion model, we performed a neutral coalescent simulation and again calculated PBSW, finding the distribution of PBS values similar to that obtained with the G-PhoCS model (See SI Appendix, Fig. S17). The average and median PBSW under the expansion model were 0.4017 and 0.3879, respectively, slightly smaller than those under the G-PhoCS model (mean: 0.4202, median: 0.4046). The value of FST tends to be slightly smaller when the population has undergone expansion, because of the effect of genetic drift (4). Using the new model assuming recent population expansion, we re-calculated all population genomic parameters as in the text. Further statistical analysis produced results identical to those obtained earlier (See SI Appendix, Fig. S18). These results indicate that our inference from HDRs based on cut-off values of the upper 2.5% PBS is reliable and robust to recent population expansion.
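For orientation, the population branch statistic and an upper-tail cut-off can be sketched as below. This assumes the standard three-population definition of PBS, in which each pairwise FST is log-transformed as T = -ln(1 - FST) and the branch leading to the focal population is half of (T_focal,A + T_focal,B - T_A,B). The population labels A and B, the helper names, and the simple rank-based cut-off are illustrative assumptions, not details taken from this supplement.

```python
import math


def branch_length(fst):
    """Log-transform a pairwise FST into a divergence-like quantity, T = -ln(1 - FST)."""
    return -math.log(1.0 - fst)


def pbs(fst_focal_a, fst_focal_b, fst_a_b):
    """Population branch statistic for the focal population (standard definition)."""
    t_fa = branch_length(fst_focal_a)
    t_fb = branch_length(fst_focal_b)
    t_ab = branch_length(fst_a_b)
    return (t_fa + t_fb - t_ab) / 2.0


def upper_tail_cutoff(values, fraction=0.025):
    """Smallest value that still belongs to the top `fraction` of `values`."""
    ordered = sorted(values)
    k = max(1, int(round(len(ordered) * fraction)))
    return ordered[-k]


# Toy check: a focal population with FST = 0.5 to two populations that are
# undifferentiated from each other has PBS = ln(2).
print(pbs(0.5, 0.5, 0.0))
```

Applied to simulated windows, `upper_tail_cutoff` would give a threshold comparable to the upper 2.5% cut-off mentioned in the text; the exact windowing and estimator used for FST are not specified here.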

1.2 Control file of G-PhoCS.

GENERAL-INFO-START

    seq-file            nutral_1k_1000.txt
    trace-file          nutral_1k_1000.log
    locus-mut-rate      VAR 10.0

    burn-in             0
    mcmc-iterations     2000000
    mcmc-sample-skip    19
    start-mig           0
    iterations-per-log  20
    logs-per-line       20

    find-finetunes              TRUE
    find-finetunes-num-steps    2500
#   finetune-coal-time  0.01
#   finetune-mig-time   0.3
#   finetune-theta      0.04
#   finetune-mig-rate   0.02
#   finetune-tau        0.0000008
#   finetune-mixing     0.003
#   finetune-locus-rate 0.3

    tau-theta-print     1000.0
    tau-theta-alpha     1.0         # for STD/mean ratio of 100%
    tau-theta-beta      10000.0     # for mean of 1e-4

    mig-rate-print      1.0
    mig-rate-alpha      4.0
    mig-rate-beta       0.002

GENERAL-INFO-END

CURRENT-POPS-START

    POP-START
        name            GRP1
        samples         KIZ05992 d KIZ08194 d KIZ08133 d
        theta-alpha     2.0
        theta-beta      2000
    POP-END

    POP-START
        name            GRP2
        samples         KIZYPX6143 d KIZ05613 d KIZ010938 d
        theta-alpha     2.0
        theta-beta      2000
    POP-END

    POP-START
        name            GRP3
        samples         KIZ08359 d KIZ08398 d KIZ08351 d
        theta-alpha     2.0
        theta-beta      2000
    POP-END

    POP-START
        name            GRP4
        samples         KIZ012631 d KIZ012640 d KIZ02247 d
        theta-alpha     2.0
        theta-beta      2000
    POP-END

    POP-START
        name            GRP5
        samples         KIZ08106 d KIZ08116 d KIZYPX31094 d
        theta-alpha     2.0
        theta-beta      2000
    POP-END

CURRENT-POPS-END

ANCESTRAL-POPS-START

    POP-START
        name            GRP45
        children        GRP4 GRP5
        theta-alpha     2.0
        theta-beta      2000
        tau-alpha       0.1
        tau-beta        100000
    POP-END

    POP-START
        name            GRP345
        children        GRP3 GRP45
        theta-alpha     2.0
        theta-beta      2000
        tau-alpha       0.1
        tau-beta        1000
    POP-END

    POP-START
        name            GRP2345
        children        GRP2 GRP345
        theta-alpha     2.0
        theta-beta      2000
        tau-alpha       0.3
        tau-beta        1000
    POP-END

    POP-START
        name            GRP12345
        children        GRP1 GRP2345
        theta-alpha     2.0
        theta-beta      2000
        tau-alpha       1.0
        tau-beta        1000
    POP-END

ANCESTRAL-POPS-END

MIG-BANDS-START

    BAND-START
        source  GRP4
        target  GRP5
    BAND-END

    BAND-START
        source  GRP2
        target  GRP4
    BAND-END

    BAND-START
        source  GRP3
        target  GRP5
    BAND-END

    BAND-START
        source  GRP3
        target  GRP4
    BAND-END

    BAND-START
        source  GRP5
        target  GRP4
    BAND-END

    BAND-START
        source  GRP4
        target  GRP2
    BAND-END

    BAND-START
        source  GRP5
        target  GRP3
    BAND-END

    BAND-START
        source  GRP5
        target  GRP1
    BAND-END

    BAND-START
        source  GRP4
        target  GRP3
    BAND-END

    BAND-START
        source  GRP345
        target  GRP1
    BAND-END

    BAND-START
        source  GRP2345
        target  GRP1
    BAND-END

    BAND-START
        source  GRP1
        target  GRP5
    BAND-END

    BAND-START
        source  GRP1
        target  GRP345
    BAND-END

    BAND-START
        source  GRP1
        target  GRP2345
    BAND-END

MIG-BANDS-END
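The alpha and beta values in the control file parameterize Gamma priors. A small sketch, assuming the shape/rate convention implied by the file's own comments (alpha = 1.0 gives a STD/mean ratio of 100%, and alpha/beta = 1/10000 gives a mean of 1e-4), shows what the priors above imply. The dictionary keys are descriptive labels chosen for this illustration, not G-PhoCS keywords.

```python
import math


def gamma_mean(alpha, beta):
    """Mean of a Gamma distribution with shape alpha and rate beta."""
    return alpha / beta


def gamma_std_over_mean(alpha):
    """STD/mean ratio of a Gamma distribution; it depends only on the shape."""
    return 1.0 / math.sqrt(alpha)


# (alpha, beta) pairs copied from the control file above.
priors = {
    "tau-theta default": (1.0, 10000.0),
    "theta, all populations": (2.0, 2000.0),
    "tau, GRP45": (0.1, 100000.0),
    "tau, GRP345": (0.1, 1000.0),
    "tau, GRP2345": (0.3, 1000.0),
    "tau, GRP12345": (1.0, 1000.0),
}

for label, (alpha, beta) in priors.items():
    mean = gamma_mean(alpha, beta)
    ratio = gamma_std_over_mean(alpha)
    print(f"{label}: mean = {mean:.2e}, STD/mean = {ratio:.2f}")
```

Under this reading, the theta prior has mean 1e-3 for every population, and the tau priors place progressively larger means on deeper nodes of the population tree.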

1.3 Simulation command lines under a range of possible demographic models.

1.3.1 The command of coalescent simulation under the full set of demographic parameters derived from G-PhoCS analyses.

ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \
    -n 1 0.236312518496597 -n 2 0.365788694880142 -n 3 0.514205386208938 -n 4 0.346552234388872 -n 5 1 \
    -m 2 1 1.2480958236 -m 1 2 1.048192832 \
    -m 4 2 9.07224709448 -m 2 4 1.85090949118 \
    -m 3 1 1.3407885516 -m 1 3 0.8220241976 \
    -m 5 1 1.05602089372 -m 1 5 0 \
    -m 3 2 0.91914267222 -m 2 3 0.88110412036 \
    -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \
    -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \
    -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0.47844768034 \
    -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \
    -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \
    -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132

1.3.2 Simulation without the post-divergence migration E234 to W.

ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \
    -n 1 0.236312518496597 -n 2 0.365788694880142 -n 3 0.514205386208938 -n 4 0.346552234388872 -n 5 1 \
    -m 2 1 1.2480958236 -m 1 2 1.048192832 \
    -m 4 2 9.07224709448 -m 2 4 1.85090949118 \
    -m 3 1 1.3407885516 -m 1 3 0.8220241976 \
    -m 5 1 1.05602089372 -m 1 5 0 \
    -m 3 2 0.91914267222 -m 2 3 0.88110412036 \
    -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \
    -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \
    -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0 \
    -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \
    -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \
    -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132

1.3.3 Simulation without current and post-divergence migration from E to W.

ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \
    -n 1 0.236312518496597 -n 2 0.365788694880142 -n 3 0.514205386208938 -n 4 0.346552234388872 -n 5 1 \
    -m 2 1 1.2480958236 -m 1 2 1.048192832 \
    -m 4 2 9.07224709448 -m 2 4 1.85090949118 \
    -m 3 1 1.3407885516 -m 1 3 0.8220241976 \
    -m 5 1 0 -m 1 5 0 \
    -m 3 2 0.91914267222 -m 2 3 0.88110412036 \
    -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \
    -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \
    -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0 \
    -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \
    -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \
    -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132
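The commands in 1.3.1 to 1.3.3 are long and nearly identical, so it can help to check mechanically which migration flags differ. The sketch below, written for illustration only, parses the -m and -em flags of each command into a table keyed by (time, i, j), with i and j exactly as passed to ms, and reports the entries that change. The helper names are hypothetical; to keep the block short, the flags that are identical in all three commands are stored once in `SHARED`.

```python
def migration_table(ms_args):
    """Map (time, i, j) -> rate for every -m and -em flag; -m entries use time 0.0."""
    tokens = ms_args.split()
    table = {}
    pos = 0
    while pos < len(tokens):
        if tokens[pos] == "-m":
            i, j, rate = tokens[pos + 1:pos + 4]
            table[(0.0, int(i), int(j))] = float(rate)
            pos += 4
        elif tokens[pos] == "-em":
            t, i, j, rate = tokens[pos + 1:pos + 5]
            table[(float(t), int(i), int(j))] = float(rate)
            pos += 5
        else:
            pos += 1
    return table


def changed(args_a, args_b):
    """Return {key: (rate_in_a, rate_in_b)} for entries that differ."""
    a, b = migration_table(args_a), migration_table(args_b)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Migration flags that are identical in commands 1.3.1, 1.3.2 and 1.3.3.
SHARED = (
    "-m 2 1 1.2480958236 -m 1 2 1.048192832 "
    "-m 4 2 9.07224709448 -m 2 4 1.85090949118 "
    "-m 3 1 1.3407885516 -m 1 3 0.8220241976 -m 1 5 0 "
    "-m 3 2 0.91914267222 -m 2 3 0.88110412036 "
    "-em 0.052681266646937 1 5 0 "
    "-em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 "
)
CMD_1_3_1 = SHARED + "-m 5 1 1.05602089372 -em 0.052681266646937 5 1 0.47844768034"
CMD_1_3_2 = SHARED + "-m 5 1 1.05602089372 -em 0.052681266646937 5 1 0"
CMD_1_3_3 = SHARED + "-m 5 1 0 -em 0.052681266646937 5 1 0"

print(changed(CMD_1_3_1, CMD_1_3_2))
print(changed(CMD_1_3_1, CMD_1_3_3))
```

Relative to 1.3.1, command 1.3.2 changes only the `-em 0.052681266646937 5 1` rate, and command 1.3.3 additionally sets `-m 5 1` to zero.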

1.3.4 Simulation with recent population expansion after Last Glacial Maximum (LGM, ~20,000 years ago) (Fig. S16).

ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \
    -n 1 3.544687777 -n 2 0.73157739 -n 3 1.028410772 -n 4 1.039656703 -n 5 67 \
    -en 0.022965374 1 0.236312518496597 -en 0.022965374 2 0.365788694880142 -en 0.022965374 3 0.514205386208938 -en 0.022965374 4 0.346552234388872 -en 0.022965374 5 1 \
    -m 2 1 1.2480958236 -m 1 2 1.048192832 \
    -m 4 2 9.07224709448 -m 2 4 1.85090949118 \
    -m 3 1 1.3407885516 -m 1 3 0.8220241976 \
    -m 5 1 1.05602089372 -m 1 5 0 \
    -m 3 2 0.91914267222 -m 2 3 0.88110412036 \
    -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \
    -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \
    -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0.47844768034 \
    -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \
    -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \
    -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132
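The expansion encoded in command 1.3.4 can be read off by comparing each population's present-day size (-n) with the size it is reset to at the earliest -en event going backward in time. The sketch below is an illustrative helper with hypothetical function names. It copies only the size flags from the first two flag lines of the command and indexes populations by their ms number, without assuming which number corresponds to which named population.

```python
def parse_sizes(ms_args):
    """Return (current, earliest_past): relative sizes from -n and the first -en per population."""
    tokens = ms_args.split()
    current, past = {}, {}
    pos = 0
    while pos < len(tokens):
        if tokens[pos] == "-n":
            current[int(tokens[pos + 1])] = float(tokens[pos + 2])
            pos += 3
        elif tokens[pos] == "-en":
            t = float(tokens[pos + 1])
            pop = int(tokens[pos + 2])
            size = float(tokens[pos + 3])
            # Keep only the most recent size change (smallest backward time).
            if pop not in past or t < past[pop][0]:
                past[pop] = (t, size)
            pos += 4
        else:
            pos += 1
    return current, past


def fold_expansion(ms_args):
    """Present-day size divided by the size before the most recent change."""
    current, past = parse_sizes(ms_args)
    return {pop: current[pop] / past[pop][1] for pop in current if pop in past}


# Size flags copied from the first two flag lines of command 1.3.4.
SIZE_FLAGS = (
    "-n 1 3.544687777 -n 2 0.73157739 -n 3 1.028410772 -n 4 1.039656703 -n 5 67 "
    "-en 0.022965374 1 0.236312518496597 -en 0.022965374 2 0.365788694880142 "
    "-en 0.022965374 3 0.514205386208938 -en 0.022965374 4 0.346552234388872 "
    "-en 0.022965374 5 1"
)

for pop, fold in sorted(fold_expansion(SIZE_FLAGS).items()):
    print(f"population {pop}: {fold:.2f}-fold")
```

The ratios in the command are rounded relative to the fold changes quoted in section 1.1, so small differences from those values are expected.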

Fig. S1. Species-tree inferred using the maximum pseudo-likelihood coalescent method (MP-EST). Numbers beside nodes are bootstrap support values.

Fig. S2. Species-tree inferred using average ranks of coalescence (STAR). Numbers beside nodes are bootstrap support values.

Fig. S3. Tree topology inferred from TreeMix when one migration event is allowed.

Fig. S4. Trajectories of demographic history inferred by the pairwise sequentially Markovian coalescent (PSMC). Because PSMC yields high rates of false negatives at low sequence coverage, only individuals with high-coverage (≥15X) genome sequences are included. The samples of E4 with the highest sequence coverage from within the Lhasa River valley (LRV) and from outside of it are indicated. The early stage of the Last Glacial Maximum (MIS 4, 30~70 Ka), the Penultimate glaciation (MIS 6, 100~200 Ka) and the Naynayxungla glaciation (MIS 14~18, 500~720 Ka) are shaded in light blue.

Fig. S5. Genomic divergence between W and groups of E of Nanorana parkeri. (A). Venn diagram shows overlapping regions of high differentiation (top 2.5% FST distribution) between W and E1–4, which are highly correlated (P