Supplementary material for Saw et al., 2015, Exploring microbial dark matter to resolve the deep archaeal ancestry of eukaryotes, Phil. Trans. R. Soc. B. doi: 10.1098/rstb.2014.0328

Exploring microbial dark matter to resolve the deep archaeal ancestry of eukaryotes

Sampling sites
Environmental samples discussed in this study were collected from three locations. Hot spring sediment sample 10Y13 was collected from a hot spring near Lower Culex Basin (44°34'23.0"N 110°47'40.5"W) in Yellowstone National Park (YNP), USA. The sampled hot spring had a temperature of 68.8°C and a pH of 8.6, and its sediments were light brown in colour. Deep-sea sediment sample LCGC14 was subsampled at 75 cm below the sea floor from a 2-meter-long gravity core retrieved from the Arctic Mid-Ocean Spreading Ridge (73°45'47.4"N 8°27'50.4"W) at a depth of 3283 meters below sea level [1]. Sample LHC4 was taken from a hot spring in the Long Valley Caldera near Mammoth Lakes, CA, USA (37°41'26.2"N 118°50'39.2"W); this spring had a temperature of around 80°C, a near-neutral pH and black sediment [2].

Amplicon data
Amplification and sequencing of the V4 region of the 16S rRNA genes of Bacteria and Archaea from the LHC4 sample were performed as described in Kozich et al. [3], with the following modifications: to capture a broader range of archaeal lineages, forward primer 515F was modified to contain a C or T at the 4th position from the 5’ end (5’-GTGYCAGCMGCCGCGGTAA-3’), and a corresponding modification was made to the read 1 sequencing primer. PCR was conducted using 5 Prime HotMasterMix DNA polymerase (#2200410, 5 Prime Inc., Gaithersburg, MD, USA) with 33 amplification cycles. Amplicons were sequenced on the Illumina MiSeq platform at Micro-Seq Enterprises (Las Vegas, NV, USA). For the Loki’s Castle sediment and Yellowstone hot spring sediment samples, the ‘universal’ primers A519F (5’-CAGCMGCCGCGGTAA-3’) and U1391R (5’-ACGGGCGGTGWGTRC-3’) were used to amplify a ~900 bp fragment of the 16S rRNA gene spanning the V3 to V8 regions. Detailed PCR conditions and library construction methods are described in a previous study [1]. These amplicons were also sequenced on an Illumina MiSeq instrument.
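The degenerate positions in these primers use IUPAC ambiguity codes (Y = C/T, M = A/C, W = A/T, R = A/G). As a minimal sketch (the helper `expand_primer` is hypothetical and not part of the authors' workflow), expanding each primer into the concrete oligonucleotides it represents shows what the 515F modification adds:

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text.
PRIMERS = {
    "515F (modified)": "GTGYCAGCMGCCGCGGTAA",
    "A519F": "CAGCMGCCGCGGTAA",
    "U1391R": "ACGGGCGGTGWGTRC",
}

for name, primer in PRIMERS.items():
    print(f"{name}: {len(expand_primer(primer))} variants")
```

Run as-is, it reports 4 variants for the modified 515F, 2 for A519F and 4 for U1391R; an unmodified 515F with C fixed at position 4 would encode only 2.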

© The Authors, under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/, which permits unrestricted use, provided the original author and source are credited.

A total of 118 Mbp, 506 Mbp and 21 Mbp of raw Illumina MiSeq amplicon data were generated for the Loki’s Castle, Yellowstone and LHC4 samples, respectively. The 5’ and 3’ regions of the amplicons were extracted by scanning read pairs for the respective forward and reverse primers using a custom Python script. Amplicon reads with an average Phred quality below 30 were discarded. Because the amplicons were roughly 900 bp long, the paired reads did not overlap and could not be merged, so they were analyzed separately; only the 5’ end of the amplicon, extracted from both read pairs, was used to estimate OTU and phylum-level diversity. The UPARSE pipeline [4] was used for further quality filtering, OTU clustering and chimera filtering. The resulting OTUs were classified by BLASTn [6] searches against the Silva 16S rRNA database (release 119) [5] with a maximum E-value cutoff of 1e-5 and a minimum identity threshold of 85%.
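The custom Python script used for primer scanning is not provided. The sketch below is a hypothetical illustration of the steps described: it assumes Phred+33 quality encoding, matches degenerate primers with regular expressions, discards reads whose mean quality is below 30, and applies the E-value and identity cutoffs to BLAST tabular output (`-outfmt 6`, whose 3rd and 11th columns are percent identity and E-value). All function names are illustrative.

```python
import re

# IUPAC ambiguity codes as regular-expression character classes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC_RE[base] for base in primer.upper()))


def mean_phred(qual, offset=33):
    """Average Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def extract_after_primer(seq, qual, primer_re, min_q=30.0):
    """Return (sequence, quality) downstream of the primer, or None.

    Reads with mean quality below min_q, or without a primer match,
    are discarded (None).
    """
    if not qual or mean_phred(qual) < min_q:
        return None
    match = primer_re.search(seq)
    if match is None:
        return None
    return seq[match.end():], qual[match.end():]


def passes_blast_cutoffs(row, max_evalue=1e-5, min_identity=85.0):
    """Check one BLAST -outfmt 6 row against E-value and identity cutoffs."""
    fields = row.rstrip("\n").split("\t")
    pident, evalue = float(fields[2]), float(fields[10])
    return evalue <= max_evalue and pident >= min_identity
```

For example, `extract_after_primer(read, qual, primer_regex("CAGCMGCCGCGGTAA"))` returns the sequence downstream of an A519F match, or `None` for reads that fail the quality or primer checks.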

Analysis of SAG data
SAG data from YNP were generated at the Bigelow Single Cell Genomics Center by obtaining cell fractions from hot spring sediment samples with Nycodenz gradient centrifugation, sorting individual cells into 384-well microtiter plates by fluorescence-activated cell sorting (FACS), and subjecting them to alkaline lysis and multiple displacement amplification (MDA). Sequencing libraries and reads were generated by the SNP&SEQ sequencing facility at Uppsala University. The total raw Illumina HiSeq data (paired-end, 2x100 bp) generated for the two SAGs amounted to 1.5 Gbp for MCG SAG (10Y13-A3) and 2.0 Gbp for MCG SAG (10Y13-F10). Detailed methods for single-cell isolation and generation of SAG data for the Korarchaeon from the LHC4 sample have been described previously by Dodsworth et al. [7]. One microliter of the original Korarchaeon SAG DNA was used in an additional MDA reaction to generate more SAG DNA for library preparation. This MDA reaction was performed with the Qiagen REPLI-g Mini kit, and the product was purified with the QIAamp DNA Mini kit (Qiagen, Hilden, Germany). A sequencing library was generated from 1 ng of the resulting SAG DNA with the Nextera XT library preparation kit (Illumina, San Diego, CA, USA) and sequenced on an Illumina MiSeq instrument, yielding 4.9 Gbp of raw data (2x300 bp) for the Korarchaeon SAG. Sequence quality was assessed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and Illumina adapters were trimmed using Scythe (https://github.com/vsbuffalo/scythe). Adapter-trimmed reads were further quality-filtered with Sickle (https://github.com/najoshi/sickle) to keep only reads with Phred quality scores higher than 30. The resulting trimmed and quality-filtered reads were assembled with SPAdes version 3.1.1 [8] using the “--sc” and “--careful” flags and the default k-mer sizes, i.e. 21, 33 and 55 for the HiSeq data and 21, 33, 55, 75, 101 and 125 for the MiSeq data.
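The assembly settings above can be written out as a SPAdes command line. This sketch only builds the argument list and does not run SPAdes; the file and output-directory names are hypothetical, and the validity check reflects SPAdes' requirement that k-mer sizes be odd, below 128 and listed in ascending order.

```python
import shlex

# k-mer sizes stated in the text for each read type.
KMERS = {
    "hiseq": (21, 33, 55),
    "miseq": (21, 33, 55, 75, 101, 125),
}


def spades_sc_command(read1, read2, outdir, platform):
    """Build (but do not run) a SPAdes command for MDA single-cell reads."""
    kmers = KMERS[platform]
    # SPAdes expects odd k-mer sizes below 128, in ascending order.
    if any(k % 2 == 0 or k >= 128 for k in kmers) or list(kmers) != sorted(kmers):
        raise ValueError(f"invalid k-mer list: {kmers}")
    return [
        "spades.py", "--sc", "--careful",
        "-k", ",".join(str(k) for k in kmers),
        "-1", read1, "-2", read2,
        "-o", outdir,
    ]


# Hypothetical file names for the Korarchaeon SAG MiSeq reads.
cmd = spades_sc_command("kor_R1.trimmed.fastq", "kor_R2.trimmed.fastq",
                        "spades_kor_sag", "miseq")
print(shlex.join(cmd))
```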

Generation of metagenomic data
DNA was extracted from LHC4 sediment collected on 01 May 2011 using the FastDNA Spin Kit for Soil (MP Biomedicals, Solon, Ohio, USA) and quantified with a NanoDrop spectrophotometer (Thermo Scientific). Libraries were constructed using the Nextera DNA Sample Preparation Kit (Illumina). Two separate sequencing libraries were generated from the LHC4 sample: a MiSeq library with a total of 7.3 Gbp and a HiSeq library with a total of 10.8 Gbp of raw sequence data. Sickle was used to trim read ends with Phred quality scores below 30, or to discard low-quality reads completely. The quality-trimmed sequences from both libraries (5.7 Gbp for MiSeq and 10.0 Gbp for HiSeq) were assembled using Ray Meta version 2.3.1 [9] with a k-mer size of 35, using 1280 CPU cores on the Beskow supercomputing cluster at the PDC Center for High Performance Computing at KTH (Kungliga Tekniska Högskolan, Sweden).
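Sickle trims reads with a sliding window over the quality scores. The function below is a simplified sketch of that idea, not Sickle's exact algorithm: it assumes Phred+33 qualities, trims from the 5’ end until a window's mean quality reaches 30, cuts at the first later window whose mean falls below 30, and discards the read if too little sequence remains. The 10% window size and 20 bp minimum length are assumptions chosen to resemble Sickle's documented behaviour.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def window_trim(seq, qual, min_q=30, min_len=20, window_frac=0.1):
    """Simplified sliding-window quality trimming (not Sickle's exact algorithm).

    Returns the trimmed (seq, qual), or None if the read should be
    removed completely because too little high-quality sequence remains.
    """
    scores = phred_scores(qual)
    n = len(scores)
    w = max(1, int(n * window_frac))
    # Mean quality of every window of w consecutive bases.
    means = [sum(scores[i:i + w]) / w for i in range(n - w + 1)]
    # 5' end: first window whose mean reaches the threshold.
    start = next((i for i, m in enumerate(means) if m >= min_q), None)
    if start is None:
        return None
    # 3' end: cut where the first later window drops below the threshold.
    end = next((i for i in range(start, len(means)) if means[i] < min_q), n)
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]
```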

Generation of Loki’s Castle metagenome
Loki’s Castle metagenomic reads were generated from sediment samples collected by gravity coring [1]. Briefly, community DNA was extracted from 7.5 g of raw sediment using the FastDNA Spin Kit, and the dsDNA concentration was measured with a fluorescence-based NanoDrop ND-3300 instrument (Thermo Scientific). A sequencing library for the Illumina HiSeq instrument was then generated using the Nextera library preparation kit (Illumina). The HiSeq 2500 instrument generated about 13.1 Gbp of raw data. The assembly and downstream analyses of the Loki’s Castle metagenomic data are described in detail in Spang et al. [1].

Binning of metagenomic contigs using PhymmBL
Contigs belonging to target organisms were identified by supervised binning with PhymmBL [10], trained on available reference genomes; for instance, clean contigs from the Korarchaeon SAG were used to train PhymmBL. Additional training sets were added manually and included curated, contamination-free SAG assemblies. A training set comprising contigs of Lokiarchaeota present in the Loki’s Castle sediment samples was generated as follows. Phylogenetic trees of individual taxonomic marker genes were constructed and manually inspected, and comparison with the archaeal species tree allowed contigs belonging to Lokiarchaeota to be identified (e.g. placement of genes from metagenomic contigs as deep-branching members of the TACK superphylum indicated that these contigs originated from Lokiarchaeota). After several rounds of verification of this training set (see Spang et al. [1] for details), these contigs were used for supervised binning with PhymmBL. Only contigs longer than 1 kbp were analysed.
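PhymmBL classifies sequences with interpolated Markov models combined with BLAST. As a much-simplified, hypothetical stand-in, the sketch below trains one fixed-order Markov chain per bin on that bin's training contigs and assigns each contig longer than 1 kbp to the bin under which it has the highest log-likelihood. It is not PhymmBL's implementation, and the order-3 model and pseudocount are arbitrary choices.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers made only of A, C, G and T."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= set("ACGT"))


class MarkovBinModel:
    """Fixed-order Markov chain trained on one bin's training contigs.

    A simplified, hypothetical stand-in for PhymmBL's interpolated
    Markov models.
    """

    def __init__(self, training_seqs, order=3, pseudocount=1.0):
        self.order = order
        counts = Counter()
        for s in training_seqs:
            counts.update(kmer_counts(s, order + 1))
        # log P(next base | preceding `order` bases), with pseudocounts.
        self.logp = {}
        for context in map("".join, product("ACGT", repeat=order)):
            total = sum(counts[context + b] for b in "ACGT") + 4 * pseudocount
            for b in "ACGT":
                self.logp[context + b] = math.log(
                    (counts[context + b] + pseudocount) / total)

    def score(self, seq):
        """Log-likelihood of seq, conditioned on its first `order` bases."""
        counts = kmer_counts(seq, self.order + 1)
        return sum(n * self.logp[w] for w, n in counts.items())


def assign_bins(contigs, models, min_len=1000):
    """Assign each contig longer than min_len bp to its best-scoring bin."""
    return {
        name: max(models, key=lambda b: models[b].score(seq))
        for name, seq in contigs.items()
        if len(seq) > min_len
    }
```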

Co-assembly of new Korarchaeota
A total of 15.2 million quality-trimmed, adapter-removed Illumina MiSeq reads from the SAG data, together with 237,974 Illumina HiSeq reads that mapped to contigs classified as Korarchaeota by PhymmBL, were assembled with SPAdes [8] using k-mer sizes of 21, 33, 55, 75, 101 and 125 and the “--sc” and “--careful” flags to handle single-cell data.
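The text does not say which read mapper was used to recruit HiSeq reads to the PhymmBL-classified contigs. Assuming the alignments are available in SAM format, a minimal sketch of the selection step keeps reads whose primary alignment falls on one of the binned contigs (the contig names in any call are hypothetical):

```python
def reads_on_contigs(sam_lines, contig_ids):
    """Names of reads whose primary alignment lies on one of contig_ids."""
    selected = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        if rname in contig_ids:
            selected.add(qname)
    return selected
```

Because the function returns read names, both mates of a pair can then be pulled from the original FASTQ files even if only one mate mapped.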

Estimation of genome completeness
Genome completeness and redundancy of the three SAGs and three metagenomic bins in this study were estimated by counting a set of 162 marker genes known to be present in single copies in most archaea [11]. To detect these markers, PSI-BLAST searches were run with alignment profiles of the marker genes, built from a representative set of archaea covering all major lineages, against the proteomes of the SAGs and metagenomic bins.
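Counting marker genes translates into two percentages. The sketch below uses one common convention, which is an assumption here since the exact formulas are not stated: completeness is the fraction of the 162 markers detected at least once, and redundancy is the number of extra copies beyond the first, relative to 162.

```python
from collections import Counter


def completeness_and_redundancy(marker_hits, n_markers=162):
    """Percent completeness and redundancy from detected marker-gene copies.

    marker_hits lists one marker ID per detected copy (assumed convention):
    completeness = markers seen at least once / n_markers,
    redundancy   = copies beyond the first / n_markers.
    """
    copies = Counter(marker_hits)
    extra = sum(n - 1 for n in copies.values())
    return 100.0 * len(copies) / n_markers, 100.0 * extra / n_markers
```

For example, a genome in which 81 of the 162 markers are each found once would be scored as 50% complete with 0% redundancy.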

Phylogenetic analyses
Phylogeny of 16S rRNA genes. A total of 338 16S rRNA gene sequences representing the major taxonomic clades of the TACK superphylum and deeply branching archaeal lineages were selected from Silva release 119. Five taxa from the Euryarchaeota were chosen as the outgroup. Sequences were aligned with MAFFT L-INS-i [12], and alignment columns with gaps in more than 50% of the taxa were removed with trimAl [13]. A maximum likelihood phylogeny was inferred with RAxML version 8.0.22 [14] under the GTRGAMMA model of nucleotide substitution, with 100 bootstrap replicates.
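The gap-filtering rule can be stated precisely: a column is dropped when more than half of the sequences have a gap there. A minimal sketch of that rule on an in-memory alignment follows; trimAl was the tool actually used, and its `-gt 0.5` gap-threshold option is one way to express the same criterion, although the exact trimAl options used are not given.

```python
def trim_gappy_columns(alignment, max_gap_frac=0.5, gap_chars="-."):
    """Remove alignment columns whose gap fraction exceeds max_gap_frac.

    alignment: dict of taxon -> aligned sequence (all the same length).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # Keep column i if at most max_gap_frac of the sequences have a gap there.
    keep = [i for i in range(length)
            if sum(s[i] in gap_chars for s in seqs) / len(seqs) <= max_gap_frac]
    return {taxon: "".join(s[i] for i in keep) for taxon, s in alignment.items()}
```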

Phylogenetic analysis of concatenated marker proteins. A set of 36 conserved single-copy marker genes was identified in the SAGs using PSI-BLAST, and the corresponding proteins were aligned with those from a selected set of archaea, bacteria and eukaryotes using MAFFT L-INS-i. Alignments were trimmed with trimAl to remove columns containing gaps in more than 50% of the taxa and were subsequently concatenated. A maximum likelihood phylogeny was inferred with RAxML with 100 bootstrap replicates. Bayesian phylogenetic analyses were performed with PhyloBayes [15] under the CAT+GTR model of amino acid substitution, running four independent chains for ~10,000 generations. A consensus tree was generated with the bpcomp tool from the PhyloBayes package, discarding the first 2000 generations of each chain as burn-in and sampling a tree every 50 generations. Bootstrap values from the RAxML maximum likelihood tree were mapped onto the corresponding nodes of the PhyloBayes consensus tree using the “sumtrees.py” tool from the DendroPy package [16].
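Concatenation joins the trimmed per-marker alignments column-wise into a single supermatrix. A minimal sketch, assuming each alignment is a dict of taxon names to aligned sequences and that taxa missing from a marker are padded with gaps (a common convention, not stated in the text):

```python
def concatenate(alignments, gap="-"):
    """Concatenate per-marker alignments into a supermatrix.

    alignments: list of dicts (taxon -> aligned sequence), one per marker.
    Taxa missing from a marker are padded with gaps for that block.
    Returns the supermatrix and the (start, end) columns of each marker.
    """
    taxa = sorted({t for aln in alignments for t in aln})
    rows = {t: [] for t in taxa}
    partitions = []
    pos = 0
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for t in taxa:
            rows[t].append(aln.get(t, gap * width))
        partitions.append((pos, pos + width))
        pos += width
    return {t: "".join(parts) for t, parts in rows.items()}, partitions
```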

Actin phylogeny. Representatives of the major eukaryotic actin families (actins and ARP1-3) [17, 18] and of crenactins (arCOG05583) were retrieved from NCBI and merged with the actin homologs identified in Lokiarchaeum [1] and in the archaeal SAGs obtained in this study. The selected sequences were aligned using MAFFT L-INS-i and trimmed with trimAl to retain only columns present in at least 50% of the sequences. Maximum likelihood phylogenetic analyses of the alignments were performed with RAxML (version 8.0.22, GAMMA-LG model) using the slow bootstrap option (100 replicates).
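RAxML's slow (standard) bootstrap is requested with `-b`, as opposed to the rapid bootstrap option `-x`. The sketch below only assembles the three RAxML v8 command lines typically used for this workflow: a best-tree search, 100 bootstrap replicates, and mapping support values onto the best tree with `-f b`. The seed, run names, alignment file name and the `raxmlHPC` binary name are hypothetical, and the authors' exact commands are not given.

```python
def raxml_slow_bootstrap(alignment, name, model="PROTGAMMALG",
                         seed=12345, replicates=100, binary="raxmlHPC"):
    """Return RAxML v8 command lines (not executed) for an ML search,
    standard ("slow") bootstrapping with -b, and support mapping with -f b.
    """
    best = [binary, "-m", model, "-p", str(seed),
            "-s", alignment, "-n", f"{name}_ml"]
    boot = [binary, "-m", model, "-p", str(seed), "-b", str(seed),
            "-N", str(replicates), "-s", alignment, "-n", f"{name}_boot"]
    # RAxML writes RAxML_bestTree.<run name> and RAxML_bootstrap.<run name>.
    support = [binary, "-m", model, "-f", "b",
               "-t", f"RAxML_bestTree.{name}_ml",
               "-z", f"RAxML_bootstrap.{name}_boot",
               "-n", f"{name}_support"]
    return [best, boot, support]


for command in raxml_slow_bootstrap("actin_trimmed.phy", "actin"):
    print(" ".join(command))
```

Calling the same helper with `model="GTRGAMMA"` would give the nucleotide equivalent used for the 16S rRNA tree, although the text does not state which bootstrap mode was used there.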

[Tree image not reproduced in text]

Supplementary Figure 1: Maximum likelihood phylogenetic tree of archaeal lineages within the TACK superphylum, showing all major clades as classified in the Silva rRNA gene database.

[Tree image not reproduced in text]

Supplementary Figure 2: Combined maximum likelihood and Bayesian phylogenetic tree showing all taxa. Bootstrap values were mapped onto the backbone tree produced by PhyloBayes using the “sumtrees.py” tool.

Supplementary References
1. Spang, A., Saw, J.H., Jørgensen, S.L., Zaremba-Niedzwiedzka, K., Martijn, J., Lind, A.E., van Eijk, R., Schleper, C., Guy, L. & Ettema, T.J.G. 2015 Complex archaea that bridge the gap between prokaryotes and eukaryotes. In press.
2. Vick, T.J., Dodsworth, J.A., Costa, K.C., Shock, E.L. & Hedlund, B.P. 2010 Microbiology and geochemistry of Little Hot Creek, a hot spring environment in the Long Valley Caldera. Geobiology 8, 140-154. (doi:10.1111/j.1472-4669.2009.00228.x).
3. Kozich, J.J., Westcott, S.L., Baxter, N.T., Highlander, S.K. & Schloss, P.D. 2013 Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Applied and environmental microbiology 79, 5112-5120. (doi:10.1128/AEM.01043-13).
4. Edgar, R.C. 2013 UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nature methods 10, 996-998. (doi:10.1038/nmeth.2604).
5. Yilmaz, P., Parfrey, L.W., Yarza, P., Gerken, J., Pruesse, E., Quast, C., Schweer, T., Peplies, J., Ludwig, W. & Glockner, F.O. 2014 The SILVA and "All-species Living Tree Project (LTP)" taxonomic frameworks. Nucleic acids research 42, D643-648. (doi:10.1093/nar/gkt1209).
6. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research 25, 3389-3402.
7. Dodsworth, J.A., Blainey, P.C., Murugapiran, S.K., Swingley, W.D., Ross, C.A., Tringe, S.G., Chain, P.S., Scholz, M.B., Lo, C.C., Raymond, J., et al. 2013 Single-cell and metagenomic analyses indicate a fermentative and saccharolytic lifestyle for members of the OP9 lineage. Nature communications 4, 1854. (doi:10.1038/ncomms2884).
8. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., et al. 2012 SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology : a journal of computational molecular cell biology 19, 455-477. (doi:10.1089/cmb.2012.0021).
9. Boisvert, S., Raymond, F., Godzaridis, E., Laviolette, F. & Corbeil, J. 2012 Ray Meta: scalable de novo metagenome assembly and profiling. Genome biology 13, R122. (doi:10.1186/gb-2012-13-12-r122).
10. Brady, A. & Salzberg, S.L. 2009 Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature methods 6, 673-676. (doi:10.1038/nmeth.1358).
11. Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N.N., Anderson, I.J., Cheng, J.F., Darling, A., Malfatti, S., Swan, B.K., Gies, E.A., et al. 2013 Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431-437. (doi:10.1038/nature12352).
12. Katoh, K., Misawa, K., Kuma, K. & Miyata, T. 2002 MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research 30, 3059-3066.
13. Capella-Gutierrez, S., Silla-Martinez, J.M. & Gabaldon, T. 2009 trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973. (doi:10.1093/bioinformatics/btp348).
14. Stamatakis, A. 2014 RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313. (doi:10.1093/bioinformatics/btu033).
15. Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. 2013 PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Systematic biology 62, 611-615. (doi:10.1093/sysbio/syt022).
16. Sukumaran, J. & Holder, M.T. 2010 DendroPy: a Python library for phylogenetic computing. Bioinformatics 26, 1569-1571. (doi:10.1093/bioinformatics/btq228).
17. Goodson, H.V. & Hawse, W.F. 2002 Molecular evolution of the actin family. Journal of cell science 115, 2619-2622.
18. Yutin, N., Wolf, M.Y., Wolf, Y.I. & Koonin, E.V. 2009 The origins of phagocytosis and eukaryogenesis. Biology direct 4, 9. (doi:10.1186/1745-6150-4-9).