A Rapid Genetic Approach for Assessing Sediment Biodiversity and Function RIRDC Publication No. 10/176

RIRDC

Innovation for rural Australia

A Rapid Genetic Approach for Assessing Sediment Biodiversity and Function

by Christopher M. Hardy, Anthony A. Chariton, Leon N. Court and Matthew J. Colloff

December 2010 RIRDC Publication No. 10/176 RIRDC Project No. PRJ-004941

© 2010 Rural Industries Research and Development Corporation. All rights reserved.

ISBN 978-1-74254-143-3
ISSN 1440-6845

A Rapid Genetic Approach for Assessing Sediment Biodiversity and Function
Publication No. 10/176
Project No. PRJ-004941

The information contained in this publication is intended for general use to assist public knowledge and discussion and to help improve the development of sustainable regions. You must not rely on any information contained in this publication without taking specialist advice relevant to your particular circumstances.

While reasonable care has been taken in preparing this publication to ensure that information is true and correct, the Commonwealth of Australia gives no assurance as to the accuracy of any information in this publication.

The Commonwealth of Australia, the Rural Industries Research and Development Corporation (RIRDC), the authors or contributors expressly disclaim, to the maximum extent permitted by law, all responsibility and liability to any person, arising directly or indirectly from any act or omission, or for any consequences of any such act or omission, made in reliance on the contents of this publication, whether or not caused by any negligence on the part of the Commonwealth of Australia, RIRDC, the authors or contributors.

The Commonwealth of Australia does not necessarily endorse the views in this publication.

This publication is copyright. Apart from any use as permitted under the Copyright Act 1968, all other rights are reserved. However, wide dissemination is encouraged. Requests and inquiries concerning reproduction and rights should be addressed to the RIRDC Publications Manager on phone 02 6271 4165.

Researcher Contact Details
Dr Christopher M. Hardy
CSIRO Entomology
GPO Box 1700
Canberra, ACT 2601
Phone: 02 6246 4375
Fax: 02 6246 4296
Email: [email protected]

In submitting this report, the researcher has agreed to RIRDC publishing this material in its edited form.

RIRDC Contact Details
Rural Industries Research and Development Corporation
Level 2, 15 National Circuit
BARTON ACT 2600
PO Box 4776
KINGSTON ACT 2604
Phone: 02 6271 4100
Fax: 02 6271 4199
Email: [email protected]
Web: http://www.rirdc.gov.au

Electronically published by RIRDC in December 2010 Print-on-demand by Union Offset Printing, Canberra at www.rirdc.gov.au or phone 1300 634 313


Foreword

A DNA microarray chip for monitoring biodiversity has been developed and tested under Australian conditions. Application of this technology will eventually enable researchers and environmental managers to determine and predict impacts of environmental perturbation on biodiversity more rapidly, cost-effectively and efficiently than is currently possible using existing techniques. Detailed information on the development, testing and application of the “biodiversity chip” is provided in this report.

The importance of this report is that it demonstrates the basic capability to change the way that environmental assessment and monitoring is done by providing the details on the development and use of ecogenomics technology for biodiversity monitoring. It will be a useful basis for those contemplating investment or formulating policy involving adaptive natural resource management, for which monitoring and assessment are critical steps in the adaptive management cycle.

This work has demonstrated proof-of-concept for ecogenomics biodiversity monitoring in estuarine sediments. The underlying principles and technology are, however, equally applicable to freshwater and terrestrial ecosystems. The implementation of this technology represents the next step in its adoption and use. This process requires partnerships between scientists and natural resource managers, as well as policy frameworks that promote and encourage the use of adaptive management and monitoring approaches.

This project was supported by funds provided by the former Land and Water Australia (LWA), the Rural Industries Research and Development Corporation (RIRDC), CSIRO and the NSW Environmental Trust.

This report, an addition to RIRDC’s diverse range of over 2000 research publications, forms part of our Dynamic Rural Communities R&D program, which aims to promote and sustain vibrant, resilient regional communities through targeted commissioned and collaborative R&D investments.

Most of RIRDC’s publications are available for viewing, free downloading or purchasing online at www.rirdc.gov.au. Purchases can also be made by phoning 1300 634 313.

Craig Burns
Managing Director
Rural Industries Research and Development Corporation


Acknowledgments

The authors acknowledge the financial support provided by Land & Water Australia, the Rural Industries Research and Development Corporation, CSIRO Transformational Biology Platform and CSIRO Water for a Healthy Country National Research Flagship. This project has also been assisted by the New South Wales Government through its Environmental Trust. We gratefully acknowledge the expert technical assistance of Diana Hartley (CSIRO Entomology). We also thank Graeme Batley for reviewing the manuscript.


Executive Summary

Estuarine sediment ecosystem health is commonly assessed by identifying and counting the larger invertebrate animals present in sediments. This approach requires a large number of replicate samples, takes a long time and cannot be done without specialised taxonomic expertise. Because of its selective nature, other organisms that may be critical to ecosystem function, or act as indicators of abiotic changes, tend to be ignored. Ideally, complementary approaches are required that address ecosystem function and represent a comprehensive assessment of biodiversity. Recent advances in ecogenomics approaches involving high throughput DNA sequencing and DNA microarrays (‘gene chips’) now provide, for the first time, an opportunity to measure and understand biological complexity and ecosystem function at a series of environmentally meaningful scales.

Who is the report targeted at?

This report is targeted at researchers and natural resource managers. Considerable time has been spent demonstrating and discussing ecogenomics with researchers and managers from Australia and overseas. These interactions have been through one-on-one meetings, workshops and presentations at Australian and international conferences, as well as via scientific publications. This work is also relevant to policy, especially that relating to design and implementation of large-scale adaptive environmental management and monitoring programs, such as the Murray-Darling Basin Plan currently being developed by the Murray-Darling Basin Authority. This report is of relevance to all agencies that have responsibilities for the conservation and management of biodiversity and ecosystem health.

Where are the relevant industries located in Australia?

The location of the subject matter of this report is Sydney Harbour. The scope of its findings concerns estuarine ecosystems of the east coast of Australia. The relevant industry is environmental impact assessment and monitoring by State and Federal Agencies and private companies.

Several new research and monitoring activities have been generated from the results of the present project. These include genomic analysis of an industrialised groundwater system (Botany Aquifer, Sydney; CSIRO and Macquarie University); ecogenomic monitoring of estuaries in southeast Queensland (CSIRO); and the resistance and resilience of floodplain and wetland communities to drought (Murray-Darling Freshwater Research Centre and CSIRO).

Background

In this report, a DNA microarray chip is described which encapsulates a broad range of biota (invertebrates, algae and micro-organisms, ranging from the very small to the very large) as well as functional genes diagnostic of the major biogeochemical processes that occur within estuarine sediments. The genetic information for the chip is derived from two sources: genes extracted from organisms living in sediments in Sydney Harbour and sequenced using 454 Life Sciences technology, and gene sequences obtained from GenBank (a web-based sequence database). DNA was extracted from sediments taken from locations that had been impacted by industrial contaminants (‘impacted locations’) and compared with those from sites that were not contaminated (‘reference locations’). The DNA from these sediments was directly sequenced and screened against the microarray, providing equivalent information on the differences in biota and the expression of functional genes between the impacted and reference locations.


Aims/objectives

The goal of this project was to develop a technique to concurrently assess the structural and functional status of ecosystems using genetic measures of the organisms within them. The key objective of this project was to develop a rapid and cost-effective method for monitoring the structural and functional biodiversity of estuarine sediments.

Methods used

The proof of concept approach involved design, construction and testing of a DNA microarray chip (Estuarine Sediment Ecology Array) that encapsulates the genetic information for a broad range of biota (invertebrates, algae and micro-organisms) and functional genes diagnostic for major estuarine biogeochemical processes. DNA extracted from sediments was screened against the microarray to obtain information on the differences in biota and genes expressed between control and contaminant-impacted locations.

Results/key findings

Two different and innovative approaches for assessing estuarine sediment biodiversity are presented in this study: direct sequencing of sediments using 454 pyrosequencing, and hybridization of samples to a DNA microarray chip. Both methodologies provide considerable benefits over existing monitoring tools such as taxonomic identification and chemical analyses. In particular, the ecosystem array approach has the potential to provide ecological and biological data for inclusion into environmental impact assessments at a similar cost and turn-around time to that of standard chemical data.

Implications for relevant stakeholders

Application of this technology will eventually enable researchers, industries and environmental managers to determine or predict impacts of environmental perturbation on biodiversity more rapidly, cost-effectively and efficiently than is currently possible using existing techniques. We believe that regulators and industry in NSW (and ultimately worldwide) would directly benefit from more precise and cost-effective risk assessments of sediment quality, enabling any necessary management actions to be prioritised on the basis of defensible science. The techniques and methods developed in this project could be routinely used to monitor and assess coastal developments and activities such as stormwater inputs, marina constructions, dredging and the disposal of contaminated sediments, as well as to provide information on the diets of commercially important fish and feral species. Ecogenomic techniques, especially pyrosequencing, are equally applicable to monitoring programs for a broad range of terrestrial and aquatic ecosystems. For example, high throughput genomic-based approaches are currently being integrated into the Canadian Aquatic Biomonitoring Network (CABIN) (Hajibabaei and Shokralla, 2009).

Recommendations

This report aims to promote the concept, application and likely impacts of genomic technologies to environmental researchers, industry, natural resource managers and decision makers. It is therefore recommended that a workshop be developed to showcase ecogenomic applications in environmental research and to introduce the topic to interested parties.


Contents

Foreword
Acknowledgments
Executive Summary

1. Introduction

2. Objectives

3. Methodology
3.1 Experimental design
3.2 DNA extraction
3.3 DNA sequencing
3.4 DNA microarray design
3.5 DNA microarray screening

4. Results
4.1 Analysis of sediment biodiversity using DNA pyrosequencing
4.2 Analysis of sediment biodiversity using DNA microarrays

5. Discussion
5.1 Objective 1: Innovative, rapid and cost-effective biodiversity assessment
5.2 Objective 2: Estuarine sediment ecology microarrays
5.3 Objective 3: Human impacts on sediment biodiversity
5.4 Objective 4: A standardised approach for environmental monitoring
5.5 Objective 5: Regional biodiversity assessments
5.6 Objective 6: Adoption and promotion of innovation idea

6. Implications

7. Recommendations

8. References


Tables

Table 1: Mean concentrations of contaminants sampled from the six study locations
Table 2: 18S rRNA gene fusion primers, target environmental locations and fraction samples
Table 3: Experimental procedures and tracking of sample tubes
Table 4: Summary of sequence probes included on the Affymetrix DNA microarray chip
Table 5: Results of 454 sequencing run
Table 6: Two-factor PERMANOVA analysis
Table 7: Project Knowledge and Adoption (K&A) activities

Figures

Figure 1: The reference and impacted locations used during the course of this study
Figure 2: Sites at reference locations: a) Boronia Park; b) Woodford Bay; c) Tambourine Bay
Figure 3: Sites at impacted locations: a) Iron Cove; b) Hen & Chicken Bay; c) Five Dock Bay
Figure 4: Environmental sample collection strategy
Figure 5: Collection of sediment core samples from Sydney Harbour
Figure 6: Agarose gel electrophoresis of PCR products
Figure 7: Non-metric multidimensional scaling (nMDS) of biodiversity profiles
Figure 8: Taxa present in reference and impacted estuaries determined by 454 sequencing
Figure 9: Taxa unique to reference and impacted estuaries determined by 454 sequencing
Figure 10: Bacterial control probe match-mismatch hybridization signals
Figure 11: Detection range for Affymetrix positive control hybridization probes
Figure 12: Detection range for 18S rRNA gene control hybridization probes
Figure 13: Intensity plot readouts from hybridized DNA microarrays
Figure 14: Taxa present in reference and impacted estuaries determined by DNA microarrays
Figure 15: Comparison between Affymetrix microarray and unadjusted 454 sequencing
Figure 16: Comparison between Affymetrix microarray and adjusted 454 sequencing
Figure 17: Pipeline for environmental monitoring using 454 pyrosequencing
Figure 18: Pipeline for environmental monitoring using Affymetrix DNA microarrays


1. Introduction

There is growing demand by environmental agencies for predictive, evidence-based information on the nature and extent of ecological responses of ecosystems to management interventions, be it the provision of environmental water allocations in the case of floodplain and wetland ecosystems, or the revegetation and regeneration of grassy woodlands. In the planning and prioritisation process for biodiversity and ecosystem management, decisions need to be made about which ecosystems represent key environmental assets, and deliver key functions, and about the processes that threaten or compromise biodiversity and ecosystem function. Increasingly, such planning and monitoring frameworks are inherently adaptive, requiring processes and mechanisms for monitoring and assessment to ensure successful outcomes. Long-term environmental monitoring data represent a critical resource for successful management of ecological assets and underpin the Adaptive Management Cycle.

Sediments perform an essential role in the maintenance and functioning of aquatic ecosystems and are intrinsically coupled to the overlying waters. Aquatic ecosystem health is commonly assessed by identifying and counting the invertebrate communities living in sediments and the water column. This approach requires a large number of replicate samples, takes a long time, and needs specialised identification skills and taxonomic expertise. It tends to ignore those organisms that are not part of the target groups, even though they may be critical for the functioning of the system or represent indicators of external abiotic changes, e.g. salinity or flow. Traditional approaches examine only a small part of the total diversity of the ecosystem (generally less than 50 groups of organisms), leading to the assumption that changes in invertebrate composition are representative of the overall health of the ecosystem, even though there is strong evidence that other groups of organisms may be better environmental indicators (Kennedy and Jacoby, 1999). The inclusion of microscopic organisms is generally considered too difficult and time consuming for routine use.

Genetic approaches are seen as the tool of the future for research and monitoring of aquatic biota (Hugenholtz and Tyson, 2006; Thomas and Klaper, 2004). Nevertheless, benthic community studies are being recommended as an important line of evidence in the assessment of sediment quality. Ideally, a combination of complementary structural and functional approaches is required. Molecular techniques offer a powerful means of obtaining this information, as genetic diversity and expression are critical states which reflect the health of an ecosystem. More specifically, genetic diversity data can be used to gain information regarding the spatial and temporal distributions of a broad range of biota, including all life-stages, cryptic taxa and microscopic biota, whilst functional data can be obtained from the expression of genetic traits which correlate with major biogeochemical processes (e.g. nitrification).

Recent advances in high-throughput molecular techniques now enable the simultaneous analysis of thousands of genes, dramatically reducing time and costs. This is driving a revolution in the way ecosystems are assessed (DeSantis et al. 2005; Gill et al. 2006; Tringe et al. 2005; Tyson et al. 2004; Venter et al. 2004). Advances in the next generation of affordable, high throughput DNA sequencing (see http://www.454.com) and microarray (see http://www.affymetrix.com/index.affx) capabilities provide, for the first time, an opportunity to understand biological complexity and ecosystem function at environmentally meaningful scales. In addition, microarray studies are not ‘single-hit’ experiments, as the genetic information can be synthesised in vitro on a slide, providing an amendable template which can be used by other researchers.

The innovation described in this report is the development of a microarray-based approach for rapidly assessing the health of aquatic ecosystems. This involved designing and making a microarray chip that contains the genetic information of a broad range of biota and functional genes diagnostic of nitrogen and sulfur cycling. DNA is extracted from sediments and screened against the microarray, providing information which is used to identify differences in biota and the expression of functional genes between reference and impacted locations. Most importantly, researchers and managers will be able to obtain ecological data at a similar cost and timeframe to standard chemical data. The concept has been validated in an estuarine ecosystem; however, the technique can be readily adapted for use in terrestrial and freshwater ecosystems. High density “Affymetrix” DNA microarrays have also been applied elsewhere for detection and monitoring of airborne organisms (DeSantis et al. 2005), but to our knowledge, the technology has not previously been applied in aquatic ecosystems.

This report describes the proof-of-principle for a tangible and innovative approach, tested and validated under Australian conditions, that is capable of generating essential data on biodiversity for inclusion in decision frameworks supporting the management of aquatic ecosystems.


2. Objectives

• To develop a rapid and cost-effective genetic approach for monitoring the structural and functional biological diversity of sediments;

• To produce a microarray chip containing a broad range of genes encompassing the primary biota and functional genes that are present in aquatic sediments and are responsible for key ecosystem processes;

• To compare the diversity and relative abundance of DNA extracted from control and human-impacted locations;

• To develop the findings into a standardized approach for identifying differences between impacted and control locations, as a basis for high-throughput, low cost environmental monitoring;

• To produce region-specific biodiversity assessments based on DNA microarrays and provide the protocols to utilize this approach in other areas;

• To promote the concept, application and likely impacts of genomic technologies to environmental researchers, managers and decision makers.


3. Methodology

3.1 Experimental design

3.1.1 Selection of study sites

Suitability of locations as ‘reference’ or ‘impacted’ (i.e. containing contaminants) was based on the protocol described by Weisberg et al. (1997) and is similar to that currently used by the United States Environmental Protection Agency (USEPA). In this respect, reference locations must not be highly developed or near a known point-source discharge, and no contaminant should exceed Interim Sediment Quality Guideline high concentrations (ISQG-high). Using these criteria, numerous locations within Sydney Harbour and the Hawkesbury and Georges river catchments have been previously examined by the CSIRO and the NSW EPA against Australian and New Zealand Interim Sediment Quality Guidelines (ANZECC/ARMCANZ, 2000). In addition, all reference and contaminated locations selected for use in this study were required to show similar salinity and sediment profiles and be unlikely to be subject to hypoxic events.

On this basis, reference locations were selected from within the Lane Cove River, and impacted locations, with significantly higher levels of particulate and porewater contaminants, were chosen from within the Parramatta River (Figures 1-3). At the impacted locations, Iron Cove sediments contained a complex mixture of organic and metal contaminants, Five Dock Bay was characterised by high concentrations of porewater chromium and zinc, and Hen and Chicken Bay contained high lead and copper concentrations, although concentrations of organics were somewhat lower than at the other two locations. Although ISQG-low values were exceeded in some of the Lane Cove reference locations, concentrations were nevertheless below ISQG-high limits (Table 1).

Figure 1: The reference and impacted locations used during the course of this study. Reference locations: Boronia Park, Tambourine Bay and Woodford Bay. Impacted locations: Hen & Chicken Bay, Five Dock Bay and Iron Cove.

Figure 2: Sites at reference locations: a) Boronia Park; b) Woodford Bay; c) Tambourine Bay

Figure 3: Sites at impacted locations: a) Iron Cove; b) Hen & Chicken Bay; c) Five Dock Bay

Table 1: Mean concentrations of contaminants sampled from the six study locations

Treatment     Location           Ag     As     Cd     Cr     Cu     Ni     Pb     Zn      PAHs-low  PAHs-high  PCBs   TPHs   TOC
                                 mg/kg  mg/kg  mg/kg  mg/kg  mg/kg  mg/kg  mg/kg  mg/kg   µg/kg     µg/kg      µg/kg  µg/kg  %
Reference     Boronia Park       bdl    16     0.5    32     100    11     130*   340*    1020*     2390*      30*    340    4.9
Reference     Tambourine Bay     1.1*   16     0.4    39     130    12     170    400*    710*      1790*      27*    360    4.2
Reference     Woodford Bay       bdl    13     0.2    22     70*    6      100    230*    970*      2420*      46*    160    2.0
Contaminated  Five Dock Bay      1.4*   20*    2.3*   56     140    10     330    4000**  1160*     4530*      57*    3720   4.0
Contaminated  Hen & Chicken Bay  1.5*   31*    1.7*   171*   690*   18     450    1240**  1400*     4980*      38*    320    3.7
Contaminated  Iron Cove          2.3*   26*    4.1*   67     325    21*    70     1450**  1310*     4540*      28*    1390   8.8
Guideline #   ISQG-Low           1.0    20     1.5    80     65     21     50     200     552       1700       23     ~      N/A
Guideline #   ISQG-High          3.7    70     10     370    270    52     220    410     3160      9600       ~      ~      N/A

# Low and high Australian and New Zealand Interim Sediment Quality Guideline (ISQG) values in amount of contaminant present per kilogram (dry weight) of sample (ANZECC/ARMCANZ 2000); ~ Indicates the absence of a guideline value; * The mean concentration (n = 3) for this location exceeded the ISQG-low; ** The mean concentration (n = 3) for this location exceeded the ISQG-High; bdl = concentrations were below the detection limit. PAHs-low and PAHs-high = Total low and high molecular weight polycyclic aromatic hydrocarbons; PCBs = Total polychlorinated biphenyls; TPHs = Total petroleum hydrocarbons. Concentrations for all organic contaminants (PAHs, PCBs and TPHs) have been normalized to 1% total organic carbon. TOC = Total organic carbon. Adapted from Chariton et al. (2010).
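The footnote's flagging convention can be illustrated with a minimal sketch. It assumes a strict "greater than" comparison against the ISQG-low and ISQG-high values for zinc and chromium listed in Table 1; the function name `isqg_flag` and the use of `None` for "bdl" are hypothetical choices for illustration, not part of the original study.

```python
from typing import Optional

# ISQG-low and ISQG-high trigger values (mg/kg dry weight) as listed in Table 1.
ISQG = {
    "Cr": (80.0, 370.0),
    "Zn": (200.0, 410.0),
}


def isqg_flag(contaminant: str, mean_conc: Optional[float]) -> str:
    """Return '', '*' or '**' following the Table 1 footnote convention.

    Hypothetical helper: None stands for 'bdl' (below detection limit).
    """
    if mean_conc is None:
        return ""
    low, high = ISQG[contaminant]
    if mean_conc > high:
        return "**"  # exceeds ISQG-high
    if mean_conc > low:
        return "*"   # exceeds ISQG-low only
    return ""


# Zinc means reported for Boronia Park and Five Dock Bay.
for location, zn in [("Boronia Park", 340.0), ("Five Dock Bay", 4000.0)]:
    print(location, f"{zn:g}{isqg_flag('Zn', zn)}")
```

Keeping the guideline values in a single lookup makes it straightforward to extend the check to other contaminants, although contaminants without a guideline value (marked ~ in the table) would need separate handling.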

3.1.2 Collection strategy and storage of sediment samples

Site selection

Sites with approximate dimensions of 3 × 3 m were nested equidistant across a 100 m section of bay to encapsulate the biodiversity of each location. There were 3 sample sites per location, 4 replicate samples taken per sample site and 3 sieve-size fractions (macro, meio and micro) generated per replicate sample, producing a total of 216 estuarine sediment samples across the six locations (Figure 4).

PVC core sampling

Macro fraction core samples were collected from estuarine sample locations by pushing a PVC pipe (10 cm in diameter) down to a depth of ~10 cm from the surface (Figure 5). Two such PVC core samples were pooled per replicate sample. Core samples used to generate meio and micro fraction samples were collected by pushing a PVC pipe (10 cm in diameter) down to a depth of 10 cm from the surface and subsampling from the top 1 cm. One such PVC core sample was used per replicate sample.

Sample fractionation

Macro fraction samples >500 µm in size were generated by sieving each pair of 10 cm deep macro core samples with a 500 µm sieve and rinsing in distilled water. Meio fraction samples between 63 µm and 500 µm in size were generated by passing the single 1 cm deep core samples through a 500 µm sieve, rinsing with distilled water and then collecting the material retained by a 63 µm sieve. Micro fraction samples <63 µm in size were generated by collecting the material that passed through the 63 µm sieve. New nylon meshing was used for each sample, with the meshing soaked in bleach for 24 h and thoroughly rinsed with Milli-Q filtered water prior to use to avoid cross-contamination between samples.

Storage of estuarine environmental samples

Macro fraction invertebrates were preserved in absolute ethanol in 50 mL or 10 mL tubes in the field and stored at -20°C until required. Core samples used to generate meio and micro fraction samples were stored on ice in the field and preserved at -20°C after sieving.
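The nested design described under "Site selection" and the size cut-offs described under "Sample fractionation" can be sketched as follows. This is an illustration only: the sample ID format is hypothetical, and the treatment of particles of exactly 63 µm or 500 µm as meio is an assumption, since the text does not specify boundary handling.

```python
from itertools import product

# Location names are taken from the report; the ID format below is hypothetical.
LOCATIONS = [
    "Boronia Park", "Tambourine Bay", "Woodford Bay",    # reference
    "Five Dock Bay", "Hen & Chicken Bay", "Iron Cove",   # impacted
]
SITES = (1, 2, 3)
REPLICATES = ("A", "B", "C", "D")
FRACTIONS = ("macro", "meio", "micro")


def sample_ids():
    """Enumerate one ID per location x site x replicate x fraction."""
    return [
        f"{loc}|site{site}|rep{rep}|{frac}"
        for loc, site, rep, frac in product(LOCATIONS, SITES, REPLICATES, FRACTIONS)
    ]


def size_fraction(size_um):
    """Assign a particle size (micrometres) to a sieve fraction.

    Assumption: sizes of exactly 63 or 500 micrometres are treated as meio.
    """
    if size_um > 500:
        return "macro"
    if size_um >= 63:
        return "meio"
    return "micro"


ids = sample_ids()
print(len(ids))  # 6 locations * 3 sites * 4 replicates * 3 fractions = 216
print(ids[0])
```

Enumerating the design this way gives 72 samples per fraction, which matches the 72 macro, 72 meio and 72 micro samples reported in the DNA extraction step.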
Homogenisation of macro fraction samples

Ethanol was removed from the macro samples by washing twice with 2 volumes of chilled PBS buffer, pH 7.2 (50 mM potassium phosphate, 150 mM NaCl), and a 3 mm glass bead was positioned at the base of each tube to fill the dead space. An equal volume of Qiagen Animal Tissue Lysis (ATL) buffer was added to each tube and the macro fraction samples were homogenised to an even slurry using separate disposable pestles mounted on a Dumax homogeniser (model SB1020-C). Each macro homogenate was then digested with 20 µL of 20 mg/mL Proteinase K at 55°C for 1 h, dispensed in up to 400 µL aliquots and stored at -20°C until required for DNA extraction.

[Figure 4 diagram: each of Locations 1 to 6 contains Sites 1 to 3; each site has Replicates A to D; each replicate yields Fraction 1 (macro), Fraction 2 (meio) and Fraction 3 (micro).]

Figure 4: Environmental sample collection strategy
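The nested design summarised in Figure 4 can be checked by enumerating it. The following is a minimal sketch only: the label strings are illustrative assumptions, and only the design sizes (6 locations, 3 sites, 4 replicates, 3 fractions) come from the text.

```python
from itertools import product

# Labels are illustrative; the report specifies only the design sizes.
LOCATIONS = [f"Location {i}" for i in range(1, 7)]
SITES = [f"Site {i}" for i in range(1, 4)]
REPLICATES = ["A", "B", "C", "D"]
FRACTIONS = ["macro", "meio", "micro"]


def enumerate_samples():
    """Return one (location, site, replicate, fraction) tuple per fraction sample."""
    return list(product(LOCATIONS, SITES, REPLICATES, FRACTIONS))


samples = enumerate_samples()
per_fraction = sum(1 for s in samples if s[3] == "macro")

print(len(samples))   # total number of fraction samples
print(per_fraction)   # number of samples of a single fraction type
```

The two printed counts correspond to the 216 fraction samples and the 72 samples per fraction type quoted in Sections 3.1 and 3.2.1.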

Figure 5: Collection of sediment core samples from Sydney Harbour

3.2 DNA extraction

3.2.1 DNA extraction from sediments

The UltraClean Soil DNA kit (MO BIO Laboratories Inc., Carlsbad, CA, USA) was used to extract DNA and remove PCR inhibitors from a total of 216 estuarine environmental fraction samples, comprising up to 400 µL of each of the 72 macro homogenate samples and about 0.6 g of each of the 72 meio- and 72 microbiota fraction samples. The “wet soil sample method” was used to remove excess water from the meio- and microbiota samples and the “alternative protocol for maximum yields” was used to optimize yields of DNA extracted, according to the manufacturer’s instructions. A Bio 101 Savant FastPrep tube agitator (model FP120) was also used during the cell lysis step to enhance the action of the kit’s grinding beads on the environmental samples.

3.2.2 Polymerase chain reaction (PCR) of DNA

PCR optimisation

Preliminary PCR experiments indicated that PCR amplification was being affected by inhibitors in the DNA extracts. Serial dilution of DNA extracts from macro, meio and microbiota fraction samples (undiluted, 1:2, 1:4, 1:8, 1:16, 1:32, 1:64 and 1:128 diluted) identified a 1:20 dilution and the Qiagen Taq PCR Core Kit as optimal for PCR amplification of these DNA extracts (Figure 6). Subsequently, 50 µL PCR reactions were set up in 0.2 mL tubes using 1X coral PCR buffer, 1X Q-solution, MgCl2 at 2.5 mM, each dNTP at 200 µM, the conserved 18S rRNA gene primers at 0.4 µM each, 1 unit Taq DNA Polymerase per reaction and 2 µL of 1:20 diluted environmental fraction DNA extract per reaction. The PCR reactions were cycled in an Eppendorf Mastercycler epgradient S thermal cycler (1 cycle at 94°C for 2 min; 35 cycles of 94°C for 1 min, 50°C for 1 min 15 sec and 72°C for 1 min 30 sec; 1 cycle of 72°C for 5 min; then held at 25°C) once the cycler had reached 94°C (simplified hot start).

PCR for eukaryotic 18S rDNA using conserved gene primers

Two primers were designed to target two conserved areas flanking a variable part in the 5’ region of the eukaryotic 18S rRNA gene. These conserved 18S primers (18S 1560 forward: TGGTGCATGGCCGTTCTTAGT and 18S 2035 reverse: CATCTAAGGGCATCACAGACC) (Hardy et al. 2010) were used to amplify 18S PCR products, 200 bp to 300 bp in length depending on the eukaryotes present, from all 216 estuarine environmental fraction DNA samples. The quantity and quality of amplified DNA and relative band intensities were determined by running 10 µL of each of the 216 18S PCR reactions on a 2% agarose, 1X TBE buffer gel alongside NEB 100 bp ladder (Cat: N3231S). The remaining 40 µL of each 18S PCR reaction was then purified separately using the Qiagen QIAquick PCR Purification Kit (Cat: 28106) and the amplicon DNA bound to each spin column was eluted with 50 µL of 1 mM Tris.HCl, pH 8.5.

PCR for bacterial 16S rDNA, amoA and nifH using conserved gene primers

The 16S rRNA, amoA and nifH genes were amplified for 35-40 cycles, as for the 18S rRNA gene, from 1:20 diluted micro fraction DNAs using the following primers: 16S_27f (5’-AGAGTTTGATCMTGGCTCAG-3’) and 16S_519r (5’-GWATTACCGCGGCKGCTG-3’) (Stackebrandt et al. 1991); amoA_1F (5’-GGGGTTTCTACTGGTGGT-3’) and amoA_2R (5’-CCCCTCKGSAAAGCCTTCTTC-3’) (Rotthauwe et al. 1997); nifH_F (5’-GGHAARGGHGGHATHGGNAARTC-3’) and nifH_R (5’-GGCATNGCRAANCCVCCRCANAC-3’) (Mehta et al. 2003).
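Several of the primers listed above contain IUPAC ambiguity codes (for example M, W, K, S, H, R, V and N), so each stands for a mixture of oligonucleotides. As a minimal sketch, and not part of the reported workflow, the code below expands the standard IUPAC codes to count how many distinct oligonucleotides a primer represents and to build a regular expression for locating its binding site. The function names are illustrative.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Translate a primer containing IUPAC codes into a regular expression."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(base if len(options) == 1 else "[" + options + "]")
    return "".join(parts)


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


# Primer sequences as given in the text.
PRIMERS = {
    "18S_1560_forward": "TGGTGCATGGCCGTTCTTAGT",
    "16S_27f": "AGAGTTTGATCMTGGCTCAG",
    "nifH_F": "GGHAARGGHGGHATHGGNAARTC",
}

for name, seq in PRIMERS.items():
    print(name, degeneracy(seq), primer_to_regex(seq))

# Example: one of the two 16S_27f variants matches the generated pattern.
hit = re.fullmatch(primer_to_regex(PRIMERS["16S_27f"]), "AGAGTTTGATCATGGCTCAG")
```

The 18S forward primer is non-degenerate, whereas the functional-gene primers cover many sequence variants, which is consistent with their use against diverse bacterial genes.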


Figure 6: Agarose gel electrophoresis of PCR products

PCR products shown were obtained from 48 different microbiota fraction DNA extracts using 18S rRNA, 16S rRNA, amoA and nifH gene-specific PCR primers.

3.3 DNA sequencing

3.3.1 Barcoding of PCR products prior to sequencing

Design of 18S fusion primers with barcodes

Fusion primers (9 forward and 1 reverse, Table 2) were designed to add the A and B adapter sequences, required for 454 pyrosequencing, to the amplified 18S rRNA gene amplicons by PCR for library construction (Margulies et al. 2005). In addition, the 9 forward fusion primers contained 10-base barcodes, differing by at least 4 nucleotides (Parameswaran et al. 2007), to allow pooling and sequencing of DNA from independent samples (6 locations and 3 fraction samples).

18S fusion primer PCR

Between 10 µL and 20 µL of purified 18S PCR products from each group of 4 replicate tubes, depending on band intensity from the 18S PCR product gel check (above), were pooled into separate tubes, resulting in a total of 54 mixes of 18S PCR products (Table 3). The 454 adapter A and B sequences containing location- and fraction-specific barcodes were then added to the 54 pooled 18S PCR products using the Qiagen Taq PCR Core Kit (Cat: 201225) in 0.2 mL tubes using 1X coral PCR buffer, 1X Q-solution, MgCl2 at 2.5 mM, each dNTP at 200 µM, the 18S fusion primers at 0.2 µM each, 1 unit Taq DNA Polymerase per reaction and, per reaction, 15 µL of pooled 18S PCR product for macro samples or 30 µL for meio and micro samples. The PCR reactions were cycled in an Eppendorf Mastercycler epgradient S thermal cycler (1 cycle at 94°C for 3 min; 4 cycles of 94°C for 1 min, 50°C for 1 min 15 sec and 72°C for 1 min 30 sec; 1 cycle of 72°C for 3 min; then held at 25°C) once the cycler had reached 94°C (simplified hot start). The 54 mixes of 18S fusion primer PCR products were then divided into 2 groups of 27 (27 control sample tubes, 27 impacted sample tubes). The 27 mixes in each group were pooled into 9 separate tubes, combining 3 reactions for each type of fraction sample, resulting in 18 mixes of barcoded 18S fusion primer PCR products.

The 18 mixes of pooled 18S fusion primer PCR products were then purified separately using the Qiagen QIAquick PCR Purification Kit (Cat: 28106) and the amplicon DNA bound to each spin column was eluted with 50 µL of 1 mM Tris.HCl, pH 8.5. The quantity and quality of amplicon DNA recovered and relative band intensities were compared by running 5 µL of each of the 18 eluates on a 2% agarose, 1X TBE buffer gel alongside NEB 100 bp ladder (Cat: N3231S).

Preparation of 18S rDNA amplicon libraries for sequencing

The 18 mixes were divided into 2 groups of 9 (9 control sample tubes, 9 impacted sample tubes). The 9 mixes in each group were pooled into 3 separate tubes, one tube for each type of fraction sample. The DNA concentration in the final 6 tubes was determined spectrophotometrically from 2 µL subsamples using a NanoDrop ND-1000 spectrophotometer. A total of 3.3 µg of pooled amplicon DNA was prepared in a control amplicon library tube from the 3 control fraction samples (macro: meio: micro) in the ratio of 1:2:2 respectively. Similarly, 3.3 µg of impacted amplicon DNA was prepared in an amplicon library tube from the 3 impacted fraction samples at the 1:2:2 ratio (Table 3). The DNA in each tube was precipitated in 0.3 M sodium acetate, pH 5.2, and 2 volumes of ethanol and reconstituted in 11 µL of 10 mM Tris.HCl, pH 8.5. To confirm DNA quality and quantity, 1 µL of the control amplicon library and 1 µL of the impacted amplicon library were checked on a 2% agarose gel in 1X TBE buffer alongside NEB 100 bp ladder (Cat: N3231S) and Bioline HyperLadder 1 DNA markers (Cat: BIO-33025).

Pyrosequencing was performed on a 454 Life Sciences GS FLX Sequencer at the Australian Genome Research Facility (QLD, Australia). Sequencing was run using 454 sequencing primer A on a picotitre plate with a large gasket format (two regions, one for each amplicon library) on 15/05/08 and the raw data were returned as FASTA sequences.
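The 1:2:2 (macro: meio: micro) composition of each 3.3 µg library translates into a fixed mass of DNA per fraction, and from there into a pipetting volume once the NanoDrop concentration is known. The sketch below works through that arithmetic. The concentrations are hypothetical placeholders, not measurements from this study, and the function names are illustrative.

```python
def split_mass(total_ug, ratio):
    """Divide a total DNA mass (µg) among fractions according to a ratio."""
    parts = sum(ratio.values())
    return {name: total_ug * share / parts for name, share in ratio.items()}


def volume_needed_ul(mass_ug, conc_ng_per_ul):
    """Volume (µL) that contains the requested mass at a given concentration."""
    return mass_ug * 1000.0 / conc_ng_per_ul


RATIO = {"macro": 1, "meio": 2, "micro": 2}   # ratio stated in the text
masses = split_mass(3.3, RATIO)               # 3.3 µg per library, as stated

# Hypothetical NanoDrop readings in ng/µL (assumed for illustration only).
example_conc = {"macro": 55.0, "meio": 60.0, "micro": 66.0}

for name, mass in masses.items():
    volume = volume_needed_ul(mass, example_conc[name])
    print(f"{name}: {mass:.2f} µg, {volume:.1f} µL")
```

With the 1:2:2 ratio, each library contains 0.66 µg of macro amplicon DNA and 1.32 µg each of meio and micro amplicon DNA.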

3.3.3 Processing of DNA sequences

Raw FASTA sequences provided by AGRF were imported into Microsoft EXCEL spreadsheets and processed as follows:

• Sequences were binned into 18 different sets, based on detection of the barcodes and forward primer sequence (allowing for variation at each base);

• Barcodes and forward primer sequences were then cropped from each sequence;

• Reverse primer sequences were then cropped from all sequences (allowing for variation at each base);

• A master file of unique sequences was then created by removing all duplicate reads;

• Any sequence less than 80 nucleotides in length or containing unknown bases was then removed from the dataset;

• The “COUNTIF” function in EXCEL was then used to count the number of times (reads) each unique sequence in the master file was present in each of the barcoded sequence subsets.
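The processing steps listed above were carried out in EXCEL. For readers who prefer a scripted equivalent, the following is a minimal sketch of the same logic and not the procedure actually used. It rests on several assumptions: reads begin directly with the 10-base barcode, the barcode must match exactly, "variation" in the primers is modelled as a maximum of 2 mismatches, and the reverse primer (if present) appears at the 3' end as its reverse complement. Filtering is applied before de-duplication, which gives the same final table. Only two of the nine barcodes from Table 2 are included, and the demonstration reads are synthetic.

```python
from collections import Counter, defaultdict

FORWARD_PRIMER = "TGGTGCATGGCCGTTCTTAGT"   # 18S forward primer from the text
REVERSE_PRIMER = "CATCTAAGGGCATCACAGACC"   # 18S reverse primer from the text
BARCODES = {"CTAAGAACGT", "TGAACAATCG"}    # two of the nine barcodes in Table 2
BARCODE_LEN = 10


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def process_reads(reads, min_len=80, max_mismatch=2):
    """Bin, crop and filter reads; return {barcode: Counter({sequence: reads})}."""
    fwd_len = len(FORWARD_PRIMER)
    rev_site = revcomp(REVERSE_PRIMER)
    binned = defaultdict(Counter)
    for read in reads:
        read = read.upper()
        barcode = read[:BARCODE_LEN]
        primer = read[BARCODE_LEN:BARCODE_LEN + fwd_len]
        if barcode not in BARCODES or len(primer) < fwd_len:
            continue                                   # no recognised barcode
        if mismatches(primer, FORWARD_PRIMER) > max_mismatch:
            continue                                   # forward primer not found
        seq = read[BARCODE_LEN + fwd_len:]             # crop barcode + forward primer
        tail = seq[-len(rev_site):]
        if len(tail) == len(rev_site) and mismatches(tail, rev_site) <= max_mismatch:
            seq = seq[:-len(rev_site)]                 # crop reverse primer
        if len(seq) < min_len or set(seq) - set("ACGT"):
            continue                                   # too short or unknown bases
        binned[barcode][seq] += 1
    return binned


def count_table(binned):
    """Read counts of each unique sequence per barcode (the COUNTIF step)."""
    master = sorted({seq for counts in binned.values() for seq in counts})
    return {seq: {bc: binned[bc][seq] for bc in binned} for seq in master}


# Synthetic demonstration reads (not real data).
insert = "ACGT" * 25
tail = revcomp(REVERSE_PRIMER)
demo_reads = [
    "CTAAGAACGT" + FORWARD_PRIMER + insert + tail,
    "CTAAGAACGT" + FORWARD_PRIMER + insert + tail,
    "TGAACAATCG" + FORWARD_PRIMER + insert + tail,
    "CTAAGAACGT" + FORWARD_PRIMER + "ACGT" * 5 + tail,    # shorter than 80 nt
    "CTAAGAACGT" + FORWARD_PRIMER + "ACGN" * 25 + tail,   # contains unknown bases
    "GGGGGGGGGG" + FORWARD_PRIMER + insert + tail,        # unrecognised barcode
]
table = count_table(process_reads(demo_reads))
print(table)
```

A script of this kind makes the mismatch tolerance and length threshold explicit parameters, which may help when repeating the analysis on other datasets.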

3.3.4 Assignment of environmental DNA sequences to taxonomic groups

All unique 18S rRNA gene sequences were screened by batch BLASTn against GenBank (http://blast.ncbi.nlm.nih.gov/) and assigned to the taxon with the highest bit score (including uncharacterized eukaryotes). All sequences whose lowest identified taxon was Class, Phylum or uncharacterised eukaryote were resubmitted to GenBank and the top 100 closest hits examined. Sequences were then reassigned, as above, to the defined taxon with the highest total score, provided the total similarity score was greater than 60. If nucleotide identity to the highest defined taxon was below 60, sequences were defined as Eukaryote and assigned to the uncharacterized eukaryote with the highest total score among the closest GenBank hits. If no hits with a total score greater than 60 were returned for a sequence, then the sequence was assigned “no match” to GenBank.

Sequences were placed into preliminary taxon cluster sets with the same Order name or, if Order was not defined, at the next available level down (Family or Genus). If no taxon at Order or below was defined, sequences were assigned upwards to the first taxonomically defined level (equivalent to Class, Phylum, then Eukaryote). Each taxon cluster was imported into the sequence manipulation package Vector NTI Version 10.3 (Invitrogen Life Sciences), aligned using ClustalW (6) in batches of up to 800 sequences, and Neighbour Joining (NJ) trees were produced. The resulting NJ trees were examined and groups of sequences that formed clearly separated clades were allocated to 2 or more new taxon cluster sets. Next, sequences from each taxon cluster were compared phylogenetically with all other taxon cluster sets, and sets exhibiting overlapping clades by NJ were combined. Finally, representative sequences from all taxon clusters were aligned and, again, cluster sets with sequences that produced overlapping clades by NJ were combined. This process was repeated until no further overlapping taxon cluster sets were identified by NJ.

Sequences were then allocated to the lowest common taxon for all members of their respective taxon clusters.
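The rule used to form the preliminary taxon cluster sets (Order first, then Family or Genus, otherwise upwards to Class, Phylum and finally Eukaryote) can be expressed compactly. The sketch below illustrates only that naming rule. The lineage dictionaries and sequence identifiers are hypothetical examples, and the subsequent alignment and Neighbour Joining refinement steps are not reproduced.

```python
from collections import defaultdict

LOWER_RANKS = ["order", "family", "genus"]   # Order, else next level down
UPPER_RANKS = ["class", "phylum"]            # otherwise assign upwards


def cluster_name(lineage):
    """Pick the preliminary cluster name for one sequence's BLAST lineage."""
    for rank in LOWER_RANKS + UPPER_RANKS:
        if lineage.get(rank):
            return lineage[rank]
    return "Eukaryote"


def preliminary_clusters(assignments):
    """Group sequence identifiers by their preliminary cluster name."""
    clusters = defaultdict(list)
    for seq_id, lineage in assignments.items():
        clusters[cluster_name(lineage)].append(seq_id)
    return dict(clusters)


# Hypothetical lineages for illustration only.
assignments = {
    "seq1": {"phylum": "Annelida", "class": "Polychaeta", "order": "Phyllodocida"},
    "seq2": {"phylum": "Annelida", "class": "Polychaeta", "order": "Phyllodocida"},
    "seq3": {"phylum": "Kinorhyncha"},
    "seq4": {"class": "Dinophyceae", "family": "Gymnodiniaceae"},
    "seq5": {},
}
clusters = preliminary_clusters(assignments)
print(clusters)
```

In this example the two polychaete sequences share a cluster named after their Order, while the sequence with no defined rank falls back to the Eukaryote cluster.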

Table 2: 18S rRNA gene fusion primers, target environmental locations and fraction samples

Primer  Primer Name         Primer Sequence                                     Sieve-Size Fraction  Control Location  Impacted Location
1       4A-CTAAGAACGT-F18S  GCCTCCCTCGCGCCATCAGCTAAGAACGTTGGTGCATGGCCGTTCTTAGT  Macro                Boronia Park      Iron Cove
2       4A-TGAACAATCG-F18S  GCCTCCCTCGCGCCATCAGTGAACAATCGTGGTGCATGGCCGTTCTTAGT  Meio                 Boronia Park      Iron Cove
3       4A-CGAATAACTG-F18S  GCCTCCCTCGCGCCATCAGCGAATAACTGTGGTGCATGGCCGTTCTTAGT  Micro                Boronia Park      Iron Cove
4       4A-CTAAGTTGCA-F18S  GCCTCCCTCGCGCCATCAGCTAAGTTGCATGGTGCATGGCCGTTCTTAGT  Macro                Tambourine Bay    Five Dock Bay
5       4A-TCAAGTTAGC-F18S  GCCTCCCTCGCGCCATCAGTCAAGTTAGCTGGTGCATGGCCGTTCTTAGT  Meio                 Tambourine Bay    Five Dock Bay
6       4A-TGAACTTGAC-F18S  GCCTCCCTCGCGCCATCAGTGAACTTGACTGGTGCATGGCCGTTCTTAGT  Micro                Tambourine Bay    Five Dock Bay
7       4A-ACTTGTTCAG-F18S  GCCTCCCTCGCGCCATCAGACTTGTTCAGTGGTGCATGGCCGTTCTTAGT  Macro                Woodford Bay      Hen & Chicken Bay
8       4A-AGTTCTTGAC-F18S  GCCTCCCTCGCGCCATCAGAGTTCTTGACTGGTGCATGGCCGTTCTTAGT  Meio                 Woodford Bay      Hen & Chicken Bay
9       4A-CATTGTTAGC-F18S  GCCTCCCTCGCGCCATCAGCATTGTTAGCTGGTGCATGGCCGTTCTTAGT  Micro                Woodford Bay      Hen & Chicken Bay
10      4B-R18S             GCCTTGCCAGCCCGCTCAGCATCTAAGGGCATCACAGACC            All                  All               All

Table 3: Experimental procedures and tracking of sample tubes

       Control library tubes           Impacted library tubes
Total  Macro  Meio  Micro  Subtotal    Macro  Meio  Micro  Subtotal    Procedure
  216     36    36     36       108       36    36     36       108    Collection and storage of estuarine environmental fraction samples
  216     36    36     36       108       36    36     36       108    Extraction of DNA (including preparation of macro sample homogenates)
  216     36    36     36       108       36    36     36       108    Dilution of DNA extracts (1:20)
  216     36    36     36       108       36    36     36       108    PCR using conserved 18S rRNA gene primers and gel check
  216     36    36     36       108       36    36     36       108    PCR reaction purification
   54      9     9      9        27        9     9      9        27    Pooling of 18S PCR products (all 4 replicates per fraction sample) based on original agarose gel band intensities and gel check
   54      9     9      9        27        9     9      9        27    PCR using 18S fusion primers (9 unique primer sets for 3 locations X 3 fractions) and gel check
   18      3     3      3         9        3     3      3         9    Pooling of 18S fusion primer PCR products (3 like-fractions per group)
   18      3     3      3         9        3     3      3         9    PCR reaction purification and gel check
    6      1     1      1         3        1     1      1         3    Pooling of 18S fusion primer PCR products (3 like-fractions per group)
    6      1     1      1         3        1     1      1         3    Spectrophotometric quantitation of amplicon DNA on NanoDrop
    2     ------ 1 ------         1       ------ 1 ------         1    Pooling of 3.3 µg of 18S fusion primer PCR products per library from macro, meio & micro fractions in the ratio of 1:2:2 and gel check

3.3.5 Statistical analysis

Differences in taxon richness (the number of distinct 18S rRNA gene sequences per sample) between reference and contaminant-impacted sites were compared for each biological fraction (macro, meio and micro) at all taxonomic levels using the Wilcoxon Signed Rank Test. Differences in biotic assemblages were examined using a two-factor (treatment and biological fraction) permutational multivariate analysis of variance (PERMANOVA) with interaction, a procedure analogous to a multivariate ANOVA. Separate analyses were performed at each taxonomic level (sequence, taxon cluster, Class, Order and Phylum). SIMPER analysis was used to identify which taxa contributed the most to differences between reference and contaminated treatments. To illustrate graphically the similarities, differences and variability of the biotic assemblages (by treatment and biotic fraction), a non-metric multidimensional scaling (nMDS) ordination map was derived from the sequence data. All multivariate analyses of the biotic data were performed using the statistical package PRIMER 6+ (Primer-E, Plymouth).
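To make the richness comparison concrete, the sketch below computes taxon richness from a table of read counts and then the Wilcoxon signed-rank statistics for paired reference and impacted values. It is an illustration under stated assumptions: the report does not say how samples were paired, the numbers are invented, and only the rank sums are computed (no p-value). Zero differences are dropped and tied absolute differences receive average ranks, which is the conventional treatment.

```python
def richness(read_counts):
    """Taxon richness: number of distinct sequences with at least one read."""
    return sum(1 for n in read_counts.values() if n > 0)


def signed_rank_statistic(x, y):
    """Return (W+, W-), the sums of ranks of positive and negative differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]    # drop zero differences
    abs_sorted = sorted(abs(d) for d in diffs)
    ranks = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        ranks[abs_sorted[i]] = (i + 1 + j) / 2         # average rank for ties
        i = j
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus


# Invented richness values for five hypothetical reference/impacted pairs.
reference = [12, 15, 9, 20, 7]
impacted = [10, 18, 9, 14, 9]
w_plus, w_minus = signed_rank_statistic(reference, impacted)

print(richness({"seq_a": 3, "seq_b": 0, "seq_c": 1}))
print(w_plus, w_minus)
```

The smaller of the two rank sums is the value normally compared against critical values for the Wilcoxon Signed Rank Test.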

3.4 DNA microarray design

3.4.1 Sequence selection

Sequences to be included in the microarray design were derived from 454 sequences from Sydney Harbour and from GenBank database screening. All sequences present more than once in the Sydney Harbour 454 sequence dataset (20,807 sequences) were included in the initial screening stages. The GenBank sequences comprised 30,000 database entries for 18S rDNA and for N-cycle and S-cycle bacterial genes. Sequence sets for 18S, N-cycle and S-cycle genes included entries from around the world, whereas the 16S bacterial rDNA sequences were either from whole genomes or identified from Australian marine, terrestrial and freshwater environments. GenBank sequences identified using ENTREZ and BLAST were downloaded from GenBank and inserted into the sequence analysis package Vector NTI (Invitrogen). Sequences were aligned in their respective taxonomic (18S and 16S rRNA) or functional (N-cycle, S-cycle) groups using ClustalW. Sequences were then cropped to the most variable regions present in most of the sequences for each sequence set. The retained regions were selected to span less than 300 nucleotides to facilitate PCR amplification as well as 454 sequence validation at later dates if required. The criteria for retaining sequences were as follows:

• Sequences containing nucleotide ambiguities in the selected regions were discarded;

• Sequences that did not span the selected cropped variable region for each gene were discarded;

• All duplicate sequences were excluded;

• Sequences that differed from another sequence by less than 3 nucleotides were discarded.
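The four retention criteria above can be sketched as a simple filter. This is an illustrative reading rather than the authors' implementation: it assumes the cropped regions have already been aligned to a common length, so that "spanning the region" can be tested by length and differences can be counted position by position, and it keeps the first sequence encountered from any group of near-identical sequences, so the result depends on input order. The sequences shown are toy examples.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_sequences(candidates, region_len, min_diff=3):
    """Apply the retention criteria to cropped, equal-length candidate regions."""
    kept = []
    for seq in candidates:
        seq = seq.upper()
        if set(seq) - set("ACGT"):
            continue                      # nucleotide ambiguity in the region
        if len(seq) != region_len:
            continue                      # does not span the cropped region
        if any(hamming(seq, other) < min_diff for other in kept):
            continue                      # duplicate (0 differences) or near-duplicate
        kept.append(seq)
    return kept


# Toy 8-nt "regions" for illustration only.
candidates = [
    "ACGTACGT",   # retained
    "ACGTACGT",   # exact duplicate
    "ACGTACGA",   # differs from the first by 1 nucleotide
    "ACGTNCGT",   # contains an ambiguity
    "ACGTAC",     # does not span the region
    "TTTTACGT",   # differs from the first by 3 nucleotides, retained
]
selected = select_sequences(candidates, region_len=8)
print(selected)
```

Treating duplicates as the zero-difference case of the minimum-difference rule keeps the filter to a single pass over the candidates.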

The main sequence set (Set 1, n=25,048 sequences) spanned an ~200 nucleotide (nt) variable segment at the 3’ end of the 18S rRNA gene. This region was the target for the 454 sequencing run on the Sydney Harbour sediments (n=20,678 valid sequences). The remaining 18S rDNA sets (Set 2, n=5497; Set 3, n=4903) were sourced from GenBank and spanned 2 different variable regions (~140 nt and 500 nt) towards the 5’ end and the centre of the 18S rDNA gene, respectively. Wherever possible, sequences in the second reference set were derived from the same GenBank sequence retained in the first dataset. The second and third sets of targets against the 18S rRNA genes were included in the microarray design to enable detection and validation of selected reference species using independent DNA probes.

Two sets of 16S rDNA sequences were created. Set 1 (n=1976) spanned an ~150 nucleotide (nt) variable segment at the centre of the 16S rDNA gene, whereas Set 2 (n=2218) spanned an ~200 nt segment at the 5’ end of the gene. Finally, probe sets for genes of all known bacterial nitrogen (15 genes) and sulfur (3 genes) cycles were included on the array.

The taxonomic distribution and numbers of GenBank reference sequences retained for the DNA microarray design phase are shown in Table 4. They include over 35,000 18S rRNA gene sequences derived from 25 Eukaryotic Kingdoms (animal, fungi and plant) represented by 2993 Taxonomic Clusters.

3.4.6 Affymetrix DNA microarray synthesis

The final sequence dataset, submitted to Affymetrix to generate the DNA microarray design, comprised 42,531 unique sequences. All sequences were labelled with their respective taxonomic, functional or environmental source code in 2993 phylogenetically and/or functionally related groups and imported into EXCEL spreadsheets with their full taxonomic description. The 20,678 18S rDNA sequences from the 454 sequencing on the Sydney Harbour sediments (see above) were labelled in the dataset with their closest “hit” on the GenBank database as their taxonomic descriptor. The initial design parameters for the array were set to produce 4 unique probes of 25 nucleotides for each submitted sequence. In addition, probe sets common to groups of taxonomically related sequences were created to enable the detection of taxa that may not be represented on the array by a unique probe set. The array was then filled to capacity by including additional probes that also fulfilled the design conditions for the first-choice probe sets identified in the first round. Finally, a mismatch probe was included on the array for each exact match probe. The design phase was completed on 30.05.2009 and 95 chips were synthesised and delivered on 30.06.2009. The contents of the final microarray chip are summarised as follows:

• No. of sequences included on array: 42,531

• No. taxonomic clusters (groups of related sequences): 2,993

• No. unique probe pairs (match + mismatch): 244,065

• No. of probe pair sets: 52,446

• Part number: 520690

• SO number: SO#1029465
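The idea of a sequence-specific probe set with paired mismatch probes can be illustrated with a short sketch. This is not the Affymetrix design algorithm, whose selection rules are not described in this report. Here a probe is treated as "unique" simply if its 25-mer occurs in only one submitted sequence, the first four such windows are taken, and the mismatch probe is formed by complementing the central base. The central-base substitution is an assumption based on common Affymetrix practice, not a detail stated in the text.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def unique_probes(sequences, k=25, per_sequence=4):
    """Return up to `per_sequence` k-mers found in one named sequence only."""
    owners = defaultdict(set)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)
    probes = {name: [] for name in sequences}
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if (owners[kmer] == {name} and kmer not in probes[name]
                    and len(probes[name]) < per_sequence):
                probes[name].append(kmer)
    return probes


def mismatch_probe(probe):
    """Complement the central base of a probe (assumed mismatch design)."""
    mid = len(probe) // 2
    return probe[:mid] + probe[mid].translate(COMPLEMENT) + probe[mid + 1:]


# Toy sequences sharing their first 20 bases; k is reduced to 5 for readability.
toy = {
    "a": "A" * 10 + "C" * 10 + "G" * 10,
    "b": "A" * 10 + "C" * 10 + "T" * 10,
}
toy_probes = unique_probes(toy, k=5)
print(toy_probes["a"])
print(mismatch_probe("A" * 25))
```

Windows drawn from the shared region are rejected for both toy sequences, which mirrors the need for group-level probe sets when no sequence-specific probe exists.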

Table 4: Summary of sequence probes included on the Affymetrix DNA microarray chip

Eukaryotes (18S gene)

Taxonomic Group          Seqs  Clusters  Probes
Alveolata                5246       206   21203
Amoebozoa                 746        54    4985
Apusozoa                  214         6     780
Centroheliozoa             53         3     290
Choanoflagellida           81         8     525
Cryptophyta                91         5     558
Diplomonadida              43         4     583
Euglenozoa                132        11     868
Fungi                    1446       130    7146
Metazoa                 17478       878   90677
Parabasalidea              19         3     134
Rhizaria                 2058       118   12350
Rhodophyta                 30         7     276
stramenopiles            3248       201   11709
Viridiplantae             869       125    5036
Other                     355        52    2071
Unclassified             3341       615   19904
Total                   35448      2426  179095

Bacteria (16S gene)

Taxonomic Group          Seqs  Clusters  Probes
Crenarchaeota              48         3     815
Euryarchaeota              78        18     811
Nanoarchaeota               2         2      14
Unclassified Archaea        6         1      54
Acidobacteria             115         5    1341
Actinobacteria            306         8    3222
Aquificae                   3         1      21
Bacteroidetes             477         4    3679
Chlamydiae                  6         2      41
Chlorobi                   10         3      80
Chloroflexi                21         4     289
Cyanobacteria             147        13    1156
Deinococcus_Thermus        22         4     298
Fibrobacteres               5         2     197
Firmicutes                195        10    2038
Fusobacteria                4         1      39
Gemmatimonadetes           17         2     373
Planctomycetes             94         3     897
Proteobacteria           1968        93   15816
Spirochaetes                6         2      56
Verrucomicrobia           113         6    1093
Unclassified Bacteria     551       360    5118
Total                    4194       547   37450

Bacteria (Functional genes)

Gene                     Seqs  Clusters  Probes
Nitrogen-cycle
  amoA_1                  167         1    1395
  amoA_2                  149         1     849
  amoA_3                    4         1      26
  chiA                    102         1     910
  hao                       8         1      86
  narG-3’                 241         1    2296
  narG-5’                  50         1     406
  nasA                     83         1     851
  nifH                    392         1    3584
  nirK                    208         1    1919
  nirS                    158         1    1454
  norB                     77         1     704
  nosZ-3’                 195         1    1996
  nosZ-5’                  92         1     850
  nrfA                     61         1     527
  nxrA                     45         1     319
  ureC                    123         1     904
Sulfur-cycle
  dsrA                    524         1    4710
  dsrB                    134         1    1276
  soxB                     76         1     834
Affymetrix controls
  BioB                                      126
  BioC                                       84
  BioD                                       84
  Cre                                        84
  Other                                    1237
Total                    2889        20   27520

3.5 DNA microarray screening

3.5.1 Labelling of PCR products with a fluorescent dye

Terminal deoxynucleotidyl transferase (TdT) was used to add biotinylated residues to target DNA molecules. Aliquots of PCR products (250 ng for chip hybridization and 300 ng for determining labelling efficiency) were added to 0.2 mL tubes (Eppendorf, cat# 0030124.332). Labelling and hybridization control DNA (containing fish, nematode and yeast PCR products) was added to the 250 ng aliquot and all samples were brought to a final volume of 30 µL using nuclease-free water. Samples were then denatured at 94°C for 4 min and cooled on ice for 3 min. Aliquots (20 µL) of labelling master-mix containing 1X TdT reaction buffer (Promega cat# M189A), 0.3 mM DNA labelling reagent (Affymetrix cat# 900542) and 60 U TdT (Promega cat# M1875) were then added to each sample and incubated at 37°C for 60 min before heating to 70°C for 15 min to stop the reaction. The labelling reactions containing 250 ng target DNA and DNA labelling and hybridization controls (375 pM final concentration) were stored at -20°C until required for chip hybridization. The 300 ng target labelling reactions were further processed using the Affymetrix Gel-Shift Assay procedure.

The Affymetrix Gel-Shift Assay protocol (Affymetrix GeneChip Whole Transcript Double-Stranded Target Assay Manual, Appendix A, P/N 702179 Rev.3) was used with slight modification to assess the efficiency of target labelling. Each 300 ng target labelling reaction prepared above was split into two aliquots of 150 ng in 0.2 mL tubes and heated to 70°C for 2 min. A 10 µL volume of 2 mg/mL NeutrAvidin (Invitrogen cat# A2666) in PBS buffer, pH 7.2, was then added to the second sample of each pair and incubated at room temperature for 10 min. Sample pairs were loaded onto 1.5% agarose/1X TBE gels containing 3 µL GelRed (Biotium, cat# 41003) per 100 mL gel, along with 100 bp DNA ladder (Invitrogen cat# 15628-019), and separated by electrophoresis at 8.5 V/cm.

3.5.2 Hybridization of labelled DNA to microarrays

The Affymetrix custom probe array cartridges contained probe arrays (SHSED1s520690F) with format HD 528K, a hybridization fill volume of 130 µL and a final volume of 160 µL. Two hours prior to hybridization reaction setup, the Affymetrix GeneChip Hybridization Oven 640 was set to 48°C and 60 rpm and the heater block was set to 65°C to equilibrate. Prior to use, the 20X stock of GeneChip Eukaryote Hybridization Controls (Affymetrix cat# 900454, with biotin-labelled, antisense cRNA for bioB, bioC, bioD and cre at 1.5 pM, 5 pM, 25 pM and 100 pM, respectively) was thawed at room temperature then incubated at 65°C for 5 min to completely resuspend the mixture. The Affymetrix GeneChip Fluidics Station 450 enabled up to 4 hybridization cocktail reactions and 4 probe array cartridges to be set up and processed per session. Probe array cartridges stored at 4°C were left at room temperature for 20 min to equilibrate before use.

Aliquots (40 µL) of each target labelling reaction were added to separate 0.5 mL tubes (Eppendorf cat# 0030124502). Aliquots (110 µL) of hybridization master-mix containing 1X hybridization mix (Affymetrix cat# 900720), 50 pM Control Oligonucleotide B2 (Affymetrix cat# 900454), 1X GeneChip Eukaryote Hybridization Controls (Affymetrix cat# 900454), 0.1 mg/mL herring sperm DNA (Promega cat# D181A), 0.5 mg/mL BSA (Invitrogen cat# 15561020) and 10% DMSO (Affymetrix cat# 900720) were then added to each 40 µL target labelling aliquot and mixed by gentle pipetting. The resultant 150 µL hybridization reactions, containing 200 ng (40 µL/50 µL × 250 ng) of labelled PCR product and the fish, nematode and yeast in-house labelling controls at 100 pM each, were incubated for 1 cycle of 99°C for 5 min followed by 1 cycle of 48°C for 5 min in an Eppendorf Mastercycler epgradient S and clarified by centrifugation at 17,320 g for 5 min.

Prior to hybridization, the probe array cartridges were injected with 130 µL of prehybridization mix (Affymetrix cat# 900720) using P200 filter tips (Molecular BioProducts cat# 2069) and incubated at 48°C, 60 rpm for 20 min in the Affymetrix hybridization oven. The prehybridization mix was then removed from each probe array cartridge and replaced with 130 µL of the clarified hybridization mix prepared above. Each probe array cartridge was checked to ensure that the small bubble in each moved freely and was then placed in the Affymetrix hybridization oven to incubate at 48°C, 60 rpm for 16 h.

The Affymetrix GeneChip Operating Software (GCOS) version 1.4 was used to operate the Affymetrix GeneChip Fluidics Station 450 and the Affymetrix GeneChip Scanner 3000. The Affymetrix GeneChip Hybridization, Wash, and Stain kit (cat# 900720) was used with the Affymetrix GeneChip Fluidics Station 450 to wash and stain each hybridized Affymetrix probe array cartridge. Prior to washing and staining Affymetrix probe arrays, the BLEACHv3_450 protocol was run on the Affymetrix fluidics station to eliminate any residual streptavidin phycoerythrin (SAPE)-antibody complex from the fluidics station tubing and needles. Experiment data for each hybridized Affymetrix probe array cartridge were entered into GCOS and the Affymetrix fluidics station was primed with Wash Buffer A and Wash Buffer B (Affymetrix cat# 900720) by running the Prime_450 protocol. Three tubes containing Stain Cocktail 1, Stain Cocktail 2 and Array Holding Buffer (Affymetrix cat# 900720) were placed in the sample holders for each cartridge module and the Affymetrix probe array cartridges were washed and stained by running the FS450_0002 protocol.

Ten minutes prior to scanning, the Affymetrix GeneChip Scanner 3000 was turned on to warm up the green 532 nm laser used for fluorophore excitation.

3.5.3 Microarray scanning and data analysis

Each stained Affymetrix probe array was scanned once using the Affymetrix GeneChip Scanner 3000. The intensity of the yellow 570 nm light emitted by the excited phycoerythrin fluorophore was proportional to the amount of bound target at each location (feature or cell) on the probe array. The probe array cell intensity files (*.cel files) and probe array library file (called a chip description file, *.cdf file) created by GCOS were then analysed using the Gene Expression Statistical System (GESS) software package.
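The quantities quoted for the hybridization reactions in Section 3.5.2 follow from simple dilution arithmetic: 40 µL is taken from a 50 µL labelling reaction (30 µL sample plus 20 µL master-mix) and made up to 150 µL. The short sketch below reproduces the two figures given in the text. The function names are illustrative.

```python
def carried_mass_ng(mass_ng, aliquot_ul, reaction_ul):
    """Mass of DNA carried over when an aliquot is taken from a reaction."""
    return mass_ng * aliquot_ul / reaction_ul


def diluted_concentration(conc, aliquot_ul, final_ul):
    """Concentration after an aliquot is made up to a final volume."""
    return conc * aliquot_ul / final_ul


LABELLING_VOLUME_UL = 30 + 20      # sample plus labelling master-mix
ALIQUOT_UL = 40                    # taken forward to hybridization
HYBRIDIZATION_VOLUME_UL = 40 + 110 # aliquot plus hybridization master-mix

labelled_dna_ng = carried_mass_ng(250, ALIQUOT_UL, LABELLING_VOLUME_UL)
control_pm = diluted_concentration(375, ALIQUOT_UL, HYBRIDIZATION_VOLUME_UL)

print(labelled_dna_ng)   # ng of labelled PCR product per hybridization reaction
print(control_pm)        # pM of each in-house labelling control
```

The results agree with the 200 ng of labelled PCR product and the 100 pM of each in-house control stated for the 150 µL hybridization reactions.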

4. Results

4.1 Analysis of sediment biodiversity using DNA pyrosequencing

4.1.1 Sequence deconvolution

The sequencing run returned 200,294 reads from the reference locations and 230,185 reads from the impacted sites. The average read length was 190 nucleotides. Most sequence reads (96% from reference and 95% from impacted sites) were associated with identifiable DNA barcodes that enabled these reads to be allocated to one of the 6 locations and 3 size fractions (Table 5). All reads less than 80 nucleotides in length or containing ambiguous sequence were removed from the dataset, leaving 387,800 confirmed reads. A total of 57,591 unique sequences were present in this set of 387,800 reads.

Of these, 36,784 sequences (64%) were represented by only a single read in the dataset. The quality of these singleton read sequences was ambiguous and, consequently, they were considered to be of “lower quality”, as they were likely to include sequencing errors and errors produced during PCR in addition to species rare at the study sites. The remaining sequences (n=20,807, 36%) were represented by multiple reads and one or more barcodes. Most of these sequences (n=15,074) represent taxa present only at reference or only at impacted locations. Sequences associated with a single barcode (n=10,716) were considered “higher quality” than singleton read sequences but still likely to include PCR errors, whereas those with multiple barcodes (n=10,091) were designated “highest quality” as they were generated in independent PCR reactions.

The annotated sequences from the Sydney Harbour study have been deposited into GenBank under Genome Project 36317 with the accession numbers FJ919969-FJ930059 (highest quality) and FJ986623-FJ997209 (higher quality).

4.1.2 Biodiversity in reference and impacted locations determined by DNA sequencing

All “highest quality” sequences (n=10,091) were then compared to entries in the GenBank database using BLASTn and assigned to their nearest relative taxon as described above (Section 3.3.4). Analysis of the metagenomic data provided unequivocal discrimination between biota from reference and contaminated sites, irrespective of biological fraction (macro to micro) and taxonomic level (Sequence to Phylum) (Table 6). This separation is illustrated by a non-metric multidimensional scaling (nMDS) ordination map of the sequence data (Figure 7), in which an increase in proximity represents an increase in assemblage similarity. The compositions of the biotic assemblages in the macro fractions, which contain metazoa such as polychaetes and bivalves, were always significantly different to those in the smaller fractions. The sequence data were used to:

(i) examine the composition of sediments, including sub-macrobiota;

(ii) discriminate between biota sampled from reference and impacted sediments (Figure 8);

(iii) identify potential indicator taxa, i.e. taxa which are found exclusively in reference or impacted locations (Figure 9).


Non-metazoans and novel eukaryotes (previously ignored in environmental assessment) were highly prevalent in meio and micro fractions. Comparisons between meio and micro biotic assemblages in particular emphasize far broader compositional differences between locations for taxa such as fungi (Ascomycota), dinoflagellates (Dinophyceae), centric diatoms (Coscinodiscophyceae) and Rhizaria (Cercozoa), in addition to metazoans such as Polychaeta, Kinorhyncha and Maxillopoda. Importantly, responses recorded at the macro-scale (the fauna traditionally used as environmental indicators) were not necessarily preserved in the smaller size fractions. For example, capitellid polychaetes (blood worms) were more dominant in the reference meio and micro, but not macro, fractions compared to contaminated sites. Conversely, a decline in the relative contribution of arthropods was observed in all contaminated treatments, regardless of fraction.

Numerous 18S rDNA sequences (304) were uniquely associated with all contaminated locations, whereas others (176) were uniquely associated with all reference locations. These sequences represent a diverse range of environmentally sensitive organisms including annelids, molluscs, arthropods, fungi, Dinophyceae and stramenopiles, across all 3 size classes (macro, meio and micro) of sediment samples examined. Polychaetes and gastropods in particular were restricted to reference locations, whereas bivalves were more prevalent in contaminated sites. Another group of particular note was Kinorhyncha, which was present in all reference samples, regardless of biological fraction, but undetected in contaminated locations. This result demonstrates that a clear relationship exists between sequence detection (and hence species) and geochemical conditions such as the presence of heavy metals and organic matter contaminants in the sediments of Sydney Harbour.
The results from this component of the study have been accepted for publication in an international journal as: Chariton AA, Court LN, Hartley DM, Colloff MJ and Hardy CM. (2010). Ecological assessment of estuarine sediments by pyrosequencing eukaryotic ribosomal DNA. Frontiers in Ecology and the Environment. e-View doi:10.1890/090115.

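Table 5: Results of 454 sequencing run

Per sample (fraction within location)

| Treatment | Location | Fraction | Barcode name | Barcoded reads | Reads ≥80nt | Unambiguous reads | Seqs by fraction | Singleton seqs |
|---|---|---|---|---|---|---|---|---|
| Reference | Boronia Park | Macro | C01 | 8420 | 8246 | 8194 | 1277 | 514 |
| Reference | Boronia Park | Meio | C02 | 29207 | 28238 | 27958 | 5709 | 2521 |
| Reference | Boronia Park | Micro | C03 | 23629 | 22943 | 22707 | 5582 | 2535 |
| Reference | Tambourine Bay | Macro | C04 | 12004 | 11698 | 11624 | 1486 | 487 |
| Reference | Tambourine Bay | Meio | C05 | 26594 | 25853 | 25565 | 5501 | 2330 |
| Reference | Tambourine Bay | Micro | C06 | 26982 | 26020 | 25715 | 6251 | 2827 |
| Reference | Woodford Bay | Macro | C07 | 12131 | 11966 | 11859 | 1429 | 551 |
| Reference | Woodford Bay | Meio | C08 | 21720 | 21204 | 20992 | 4390 | 1948 |
| Reference | Woodford Bay | Micro | C09 | 27249 | 26274 | 26023 | 5254 | 2408 |
| Impacted | Iron Cove | Macro | I01 | 11467 | 11312 | 11166 | 1473 | 542 |
| Impacted | Iron Cove | Meio | I02 | 31195 | 29910 | 29186 | 7850 | 3860 |
| Impacted | Iron Cove | Micro | I03 | 21908 | 21246 | 20820 | 5478 | 2451 |
| Impacted | Five Dock Bay | Macro | I04 | 16857 | 16565 | 16344 | 1956 | 778 |
| Impacted | Five Dock Bay | Meio | I05 | 28667 | 27561 | 27015 | 6405 | 2864 |
| Impacted | Five Dock Bay | Micro | I06 | 33468 | 32400 | 31680 | 7012 | 3310 |
| Impacted | Hen and Chicken Bay | Macro | I07 | 15356 | 15042 | 14821 | 1902 | 727 |
| Impacted | Hen and Chicken Bay | Meio | I08 | 26658 | 25502 | 25034 | 5844 | 2563 |
| Impacted | Hen and Chicken Bay | Micro | I09 | 32616 | 31808 | 31097 | 7290 | 3567 |
| Totals | | | | 406128 | 393788 | 387800 | | 36783 |

Per location

| Treatment | Location | Seqs by location | Singleton seqs |
|---|---|---|---|
| Reference | Boronia Park | 11665 | 5570 |
| Reference | Tambourine Bay | 12134 | 5644 |
| Reference | Woodford Bay | 10121 | 4907 |
| Impacted | Iron Cove | 13519 | 6853 |
| Impacted | Five Dock Bay | 14045 | 6952 |
| Impacted | Hen and Chicken Bay | 13699 | 6857 |
| Totals | | 75183 | 36783 |

Per treatment

| Treatment | Total reads | Seqs by treatment | Singleton seqs | Seqs present ≥2X |
|---|---|---|---|---|
| Reference | 187936 | 29237 | 16121 | 10709 |
| Impacted | 218192 | 34979 | 20663 | 12004 |
| Totals | 406128 | 64216 | 36784 | |

All samples combined

| | Unique sequences | Seqs present ≥2X | Singleton seqs | Seqs in Ref & Imp |
|---|---|---|---|---|
| Sequences | 57591 | 20807 | 36784 | 5733 |
| Share of unique sequences | | (36%) | (64%) | (10%) |
| Reads | (387800 reads) | (351016 reads) | (36784 reads) | |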
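The summary percentages in Table 5 follow directly from the sequence counts. The short check below is an illustrative sketch only: the variable names are invented here, while the numbers are those reported in the table. It recomputes each percentage as a share of all unique sequences, and the reads behind the multi-copy sequences by subtracting one read per singleton from the unambiguous reads.

```python
# Counts as reported in Table 5 (variable names are illustrative only).
unique_seqs = 57591        # unique sequences across all samples
present_2x = 20807         # sequences observed at least twice
singletons = 36784         # sequences observed exactly once
shared_ref_imp = 5733      # sequences found in both reference and impacted sites
unambiguous_reads = 387800


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


# Singletons and multi-copy sequences together make up all unique sequences.
assert present_2x + singletons == unique_seqs

# Each singleton sequence corresponds to exactly one read.
reads_in_2x = unambiguous_reads - singletons

print(pct(present_2x, unique_seqs), pct(singletons, unique_seqs),
      pct(shared_ref_imp, unique_seqs), reads_in_2x)
```

The results agree with the 36%, 64% and 10% shares and the 351016 reads given in the table.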

Table 6: Two-factor PERMANOVA analysis

| Taxonomic level | Treatment | Fraction | Fraction (pair-wise test) | Interaction (treatment × fraction) |
|---|---|---|---|---|
| Sequence | * | *** | macro ≠ meio ≠ micro | ns |
| Taxon-cluster | *** | ** | macro ≠ meio = micro | ns |
| Order | ** | *** | macro ≠ meio = micro | ns |
| Class | *** | *** | macro ≠ meio = micro | ns |
| Phylum | *** | *** | macro ≠ meio = micro | ns |

Comparisons were between reference and contaminated sites (treatment), biological fractions (macro-, meio- and microbiota) and their interactions. Pair-wise test results are presented where significant differences were detected among fractions: ns, P>0.05; *, P