Biochemical and structural characterization of two Dictyostelium ...

4 downloads 12310 Views 1MB Size Report
Apr 1, 2016 - ... Energy Laboratory, 15013 Denver West Parkway, Golden CO 80401, USA .... Given the similarities in the organism development in D. purpurem, it is .... app) were determined in pNPL solutions of 0.0, 0.8, 1.6, 2.5, 4.2, 5.8 6.6, and 8.3 mM. ... HPX-87P column operated at 65°C with water as mobile phase.
AEM Accepted Manuscript Posted Online 1 April 2016 Appl. Environ. Microbiol. doi:10.1128/AEM.00163-16 Copyright © 2016, American Society for Microbiology. All Rights Reserved.

1

Biochemical and structural characterization of two Dictyostelium cellobiohydrolases from

2

the Amoebozoa kingdom reveal a high conservation between distant phylogenetic trees

3

of life

4

Running title: Dictyostelium cellobiohydrolases

5 6

Authors: Sarah E. Hobdey1,§, Brandon C. Knott2, Majid Haddad Momeni3,¶, Larry E. Taylor II1, Anna S. Borisova3, Kara K.

7

Podkaminer1, Todd A. VanderWall1, Michael E. Himmel1, Stephen R. Decker1, Gregg T. Beckham2,#, Jerry Ståhlberg3,#

8 9

Authors' affiliations:

10

1. Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden CO 80401, USA

11

2. National Bioenergy Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado, 80401,

12

USA

13

3. Department of Chemistry and Biotechnology, Swedish University of Agricultural Sciences, Uppsala BioCenter, Almas allé 5, SE-

14

75651, Uppsala Sweden

15 16

Current affiliations:

17

§ Idaho Veterans Research and Education Foundation, VA Medical Center, 500 W. Fort St., Boise, ID 83706

18

¶ Enzyme and Protein chemistry, Department of Systems Biology, Technical University of Denmark, Elektrovej, 2800 Kongens

19

Lyngby, Denmark

20 21

# E-mail address corresponding author (Gregg T. Beckham): [email protected]

22

# E-mail address corresponding author (Jerry Ståhlberg): [email protected]

23

1

24

ABSTRACT

25

Glycoside Hydrolase Family 7 (GH7) cellobiohydrolases (CBHs) are commonly employed enzymes in plant cell wall degradation

26

across eukaryotic kingdoms of life, as they provide significant hydrolytic potential in cellulose turnover. To date, many fungal

27

GH7 CBHs have been examined, yet many questions remain regarding structure-activity relationships in these important natural

28

and commercial enzymes. Here, we present crystal structures and biochemical analysis of two GH7 CBHs from social amoeba:

29

Dictyostelium discoideum and Dictyostelium purpureum (DdiCel7A and DpuCel7A, respectively). DdiCel7A and DpuCel7A natively

30

consist of a catalytic domain and do not exhibit a carbohydrate-binding module (CBM). The structures, resolved to 2.1 Å

31

(DdiCel7A), and 2.7 Å (DpuCel7A), are homologous to other GH7 CBHs with an enclosed active site tunnel. Two primary

32

differences between the Dictyostelium CBHs and the archetypal model GH7 CBH from Trichoderma reesei Cel7A (TreCel7A)

33

occur near the hydrolytic active site and the product binding sites. To compare the activity of these enzymes with TreCel7A, the

34

Family 1 TreCel7A CBM and linker was added to the C-terminus of the Dictyostelium enzymes, DdiCel7ACBM and DpuCel7ACBM,

35

which were recombinantly expressed in T. reesei. DdiCel7ACBM and DpuCel7ACBM hydrolyze Avicel, pretreated corn stover, and

36

phosphoric acid swollen cellulose as efficiently as TreCel7A when compared at their temperature optima. The Ki of cellobiose is

37

significantly higher for DdiCel7ACBM and DpuCel7ACBM than for TreCel7A: 205, 130, and 29 μM, respectively. Taken together, the

38

present study highlights the remarkable conservation in the activity of these key natural and industrial enzymes across quite

39

distant phylogenetic trees of life.

40

IMPORTANCE

41

GH7 CBHs are among the most important cellulolytic enzymes both in nature and for emerging industrial applications in

42

cellulose breakdown. Understanding the diversity of these key industrial enzymes is critical to engineering them for higher

43

activity and stability. The present work demonstrates that two GH7 CBHs from social amoeba are surprisingly quite similar in

44

structure and activity to the canonical GH7 CBH from the model biomass-degrading fungus, T. reesei, when tested in equivalent

45

conditions (with added CBM-linker domains) and on an industrially relevant substrate.

46

2

47

INTRODUCTION

48

Dictyostelia are a class of social amoebae (historically known as slime molds) containing four different groups of terrestrial

49

bacterivores from the Kingdom Amoebozoa. During vegetative growth, Dictyostelia exist as single-celled organisms; upon

50

starvation, a lack of nutrients become preventive for vegetative growth, and the cells aggregate into a multi-cellular slug. Slugs

51

have a defined posterior and anterior, have the ability to migrate, are sensitive to light and temperature, and exhibit an innate

52

immune system. When conditions are sufficiently severe, the slug can form a fruiting body where cells differentiate into spore

53

and stalk. During the formation of the slug and fruiting body, proteins and cellulose are deposited as an extracellular matrix,

54

providing the organism with environmental protection and structural rigidity (1-4). Cellulose is also found in the sheath that

55

surrounds the cell aggregates, and is deposited in the stalk, stalk cell walls, and spore coats (2, 5). Thus, the deposition and

56

reorganization of cellulose upon morphogenesis into the fruiting body is crucial to the development and propagation of the

57

organism.

58

Cellulose is the homopolymer of β-(1,4)-D-glucose, and individual cellulose chains pack into dense, recalcitrant

59

crystalline microfibrils in plant cell walls and other biological tissues (6). In nature, the intrinsic crystallinity and recalcitrance of

60

cellulose is of significant benefit to plants and other organisms that employ it as a structural material, such as those in the

61

Dictyostelium family. Besides providing a structural framework for biological tissues, cellulose also provides significant

62

protection from physical, chemical, and biological damage. The genomes of Dictyostelium discoideum and Dictyostelium

63

purpurem exhibit >40 genes related to cellulose synthesis and hydrolysis, including glycoside hydrolase Family 3 (GH3) β-

64

glucosidases, Family 5 and 9 endoglucanases (GH5, GH9) and a single gene with sequence homology to a GH Family 7 (GH7)

65

cellobiohydrolase (CBH) (7). GH7 CBHs are of particular interest in that they exhibit significant hydrolytic potential and are

66

commonly used by fungi and other biomass-degrading eukaryotes for cellulose hydrolysis (8). GH7 CBHs also form the basis of

67

most industrial cellulase cocktails for industrial lignocellulosic biomass conversion (8-10). The D. discoideum GH7 CBH (DdiCel7A,

68

previously CbhA) (7, 11) is a developmentally regulated enzyme, with the highest levels of expression occurring during

69

formation of the fruiting body (11). The DdiCel7A null mutant results in normal phenotypes, but delayed morphogenesis from

70

slug to fruiting body (11), suggesting DdiCel7A is required for breakdown and rearrangement of cellulose during these phases of

71

development. Given the similarities in the organism development in D. purpurem, it is likely that DpuCel7A serves a similar

72

function.

73

Structural and biochemical data of GH7s over the last several decades have revealed a similar overall fold and catalytic

74

arrangement (8). Structures of GH7 catalytic domains are known from ten CBHs: Trichoderma reesei Cel7A (12-14), 3

75

Heterobasidion irregulare Cel7A (15), Phanerochaete chrysosporium Cel7D (16), Talaromyces emersonii Cel7A (17), Trichoderma

76

harzianum Cel7A (18), Melanocarpus albomyces Cel7B (19), Aspergillus fumigatus Cel7A (20), Humicola grisea var. thermoidea

77

Cel7A (21) Limnoria quadripunctata Cel7B (22), Geotrichum candidum Cel7A (23), and three endoglucanases (EGs): T. reesei

78

Cel7B (24), Humicola insolens Cel7B (25) and Fusarium oxysporum Cel7B (26). All GH7s share a β-jelly roll fold, with two

79

antiparallel β-sheets packing face to face to form a curved β-sandwich. Long loops extend the edges of the β-sandwich and form

80

a ~45 Å long binding groove along the entire length of the enzyme. In CBHs, loops A1-4 and B1-4 (nomenclature defined in

81

reference (8)) are further elongated and enclose the cellulose chain in a tunnel using a threading-like mechanism, whereas EGs

82

exhibit a more open cleft. Many GH7 enzymes are bi-modular in nature, having a Family 1 carbohydrate-binding module (CBM)

83

connected to the catalytic domain (CD) by a glycosylated, flexible linker comprised of about 30 amino acids (27-29).

84

Deconstruction of cellulose by GH7 CBHs is a multi-step process that includes substrate binding, formation of the

85

catalytically active complex, hydrolysis, product release and processive translation along the substrate chain (8, 9, 30, 31). These

86

cellulases act from the reducing end of cellulose chains and perform multiple, processive hydrolytic events before disassociating

87

from a cellulose chain (32). For cellulases, such as the model GH7 CBH from T. reesei (TreCel7A), this process continues until the

88

enzyme either runs into an obstruction where the enzyme is stalled and eventually releases the substrate or the end of the

89

cellulose chain is reached (33-35). TreCel7A exhibits the most extensively enclosed tunnel amongst known GH7 CBH structures,

90

while PchCel7D displays the most open active site due to several loop deletions and residue size reductions on the tips of tunnel

91

enclosing loops (36). It has been suggested that this more open active site can have increased activity on microcrystalline

92

cellulose by increasing the rate of endo-initiation, while decreasing the effect of product inhibition (34, 36).

93

The post-hydrolysis release of cellobiose, while important for processivity, also has an undesired consequence of

94

becoming a strong inhibitor of hydrolysis. Product inhibition slows the overall conversion rate of cellulose to glucose,

95

particularly in high substrate concentrations (>10%) (37, 38). In commercial cellulase cocktails and natural organismal

96

secretomes, this product inhibition is relieved by the action of β-glucosidases, glycoside hydrolases that cleave cellobiose into

97

two glucose molecules. As GH7 CBHs comprise the majority of industrial cellulase mixtures and are significantly cellobiose-

98

inhibited, overcoming product inhibition in GH7 CBHs is of paramount importance for achieving high product yields in the

99

enzymatic hydrolysis of biomass.

100

Although the study of GH7 CBHs has now spanned several decades, our collective understanding of their action is still

101

being revealed (8). The development of accurate structure-activity relationships in these enzymes, especially given their

102

significant societal importance in the renewable fuels industry as well as their natural importance in the turnover of the most 4

103

abundant biological material on earth warrants concerted efforts to elucidate, understand, and improve their function. With the

104

wealth of new genomics and metagenomics data, new GH7 CBHs have recently been revealed, as described above, outside of

105

the fungal kingdom (7, 39, 40), thus presenting new opportunities to study these key enzymes in phylogenetically distant

106

branches of life from fungi (41). To that end, here we present two GH7 CBH structures and corresponding biochemical

107

characterization from the Amoebozoa kingdom from the social amoebas D. discoideum and D. purpureum. We also construct

108

and express chimeras of these GH7 CBHs in an industrially relevant fungal host, T. reesei, with the Family 1 CBM and linker from

109

TreCel7A such that performance characterizations can be conducted on an equivalent basis to the full-length model GH7 CBH,

110

TreCel7A. Approaches such as this, wherein full-length enzymes are produced in the same expression host to ensure similar

111

glycosylation patterns (42), will be essential to conduct comparisons of cellulase activities on a consistent basis (43). Using this

112

approach, we demonstrate that both Dictyostelia GH7 CBHs are equally as efficient as TreCel7A in the hydrolysis of crystalline

113

cellulose at their relative temperature and pH optima, and also exhibit increased tolerance to product inhibition.

114 115

MATERIALS AND METHODS

116

Cloning and Protein Expression: The genomes of D. discoideum and D. purpureum have been fully sequenced (7, 40) and each has been shown

117

to contain one GH7 CBH (Cel7A, previously CbhA) (7, 11), herein respectively referred to as DdiCel7A and DpuCel7A to conform with current

118

nomenclature (8). DdiCel7A and DpuCel7A sequences were obtained from the Joint Genome Institute genome database (44, 45). Dictyostelium

119

GH7 CBH genes were codon optimized and synthesized (GeneArt) without introns, cloned into, and expressed in the linearized pTrEno

120

plasmid, transformed into a T. reesei QM6a cbh1-delete strain as previously described (46). In nature, DdiCel7A and DpuCel7A do not exhibit a

121

CBM, thus the TreCel7A linker, which connects the CD and Family 1 CBM, was added to the 3’ end to generate a chimeric Cel7A, referred to

122

here as DdiCel7ACBM and DpuCel7ACBM. The linker sequence starts at the “GNPPGGNPP” motif from the TreCel7A linker.

123 124

Expression medium was inoculated from transformed spore stocks and the production volume was scaled up step-wise from 0.1 L to a final volume of 8 L in a 14 L bioreactor (New Brunswick BioFlo3000), as previously described (46).

125 126

Protein purification: Prior to purification, expression of the Dictyostelia GH7 CBHs, with and without the TreCel7A CBM-linker domain, was

127

verified by activity on p-nitrophenyl-β-D-lactopyranoside (pNPL, Sigma Aldrich, St. Louis, MO) and visualization on SDS-PAGE (Figure S1) and

128

western blot with rabbit anti-Cel7A polyclonal IgG.

129

For protein purification, fermentation broths were filtered to 0.45 μm, concentrated ~40x, and exchanged into 20 mM Bis-Tris buffer

130

pH 6.5 by hollow fiber ultrafiltration with 10,000 Da molecular weight cut-off. The filtrate was adjusted to 1.5 M (NH4)2SO4 by the addition of

131

4.0 M (NH4)2SO4 in 20 mM Bis-Tris buffer pH 6.5 and filtered through 0.2 µm PES prior to hydrophobic interaction chromatography on a 26/10

5

132

Phenyl Sepharose Fast Flow column eluted with a 1.5 M – 0.0 M (NH4)2SO4 gradient in 20 mM Bis-Tris buffer pH 6.5 over 8 column volumes.

133

Fractions containing Cel7s, identified by activity on pNPL, were pooled, desalted, and separated on a Source 15Q 10/100 Tricorn anion

134

exchange column with a gradient of 0 to 500 mM NaCl in 20 mM Bis-Tris buffer pH 6.5 for 30 column volumes. Purity was assessed by SDS-

135

PAGE and western blot. The anion exchange chromatography was completed twice for the native recombinant DdiCel7A and DpuCel7A

136

enzymes only. Finally, the enzymes were size-separated and buffer-exchanged over a 26/60 Superdex 75 size exclusion column in 20 mM

137

acetate buffer pH 5.0, 100 mM NaCl containing 0.02% NaN3. Purity was determined by SDS-PAGE. Trypsin digest, followed by LC-mass

138

spectrometry (MS) and peptide mapping, confirmed protein identities.

139 140

Differential Scanning Calorimetry: Thermal stability was evaluated by differential scanning microcalorimetry (DSC) using a MicroCal model VP-

141

DSC calorimeter (MicroCal, Inc., Northhampton, MA). Data were collected by Origin for DSC software (MicroCal). Samples contained 50 μg/mL

142

protein at pH 5.0 in 20 mM acetate buffer, 100 mM NaCl. The calorimeter scan rate was 60°C/h with a range of 30°C to 110°C.

143 144

pH and Temperature optimum: pH profiles for enzymatic activity were generated in duplicate by incubating 1.6 mM pNPL with 0.66 μM

145

indicated enzyme in 150 μL of MacIlvaine buffer (citrate-phosphate-NaCl) with a constant conductivity between 19-21 mS/cm, in a 96-well

146

plate at pH 2.9, 3.9, 5.0, 6.1, 7.1 and 8.3 at 30°C for 0, 1, 5, 10, 20, and 30 min. The reaction was stopped by the addition of 25 μL of 1.0 M

147

sodium carbonate and the A405 was determined by a colorimetric microtiter plate reader. The absorbance values were correlated to pNP

148

concentration via pNP standards and the pH profile graphs were generated by plotting the linear rate of pNP production as a function of pH.

149

Temperature profiles were generated in duplicate by incubating 1.6 mM pNPL with 0.66 μM enzyme in 2.0 mL of 20 mM sodium

150

acetate buffer pH 5.0, 100 mM NaCl and 0.02% NaN3. Reactions were incubated at 30, 35, 40, 45, 50, 55, 60 and 65°C in a water bath. At 0, 1,

151

5, 10, 20 and 30 min, 150 μL were removed from reaction vial to a 96 well plate containing 25 μL of 1.0 M sodium carbonate. Absorbance

152

values were determined as done for pH, and temperature profile graphs were generated by plotting the linear rate of pNPL turnover as a

153

function of temperature.

154 155

Enzyme kinetics: Vmax and apparent Km (Kmapp) were determined in pNPL solutions of 0.0, 0.8, 1.6, 2.5, 4.2, 5.8 6.6, and 8.3 mM. pNPL was

156

incubated with 0.66 μM CBH at 40°C or 45°C in 20 mM acetate buffer pH 5.0, 100 mM NaCl, in 96-well microtiter plates. Reactions were

157

quenched by the addition of 25 μL of 1.0 M sodium carbonate at 0, 1, 5, 10, 15, 20 and 30 min. The absorbance values were converted to

158

molar product concentrations using pNP standard curves. The data were fit to the Michaelis-Menten expression. Fitting and plots were

159

generated using Kaleidagraph (Synergy Software).

160

Inhibition experiments were completed with 0, 0.16.6, 41.6, 75.0, 91.6, and 108.3 μM cellobiose and 0.0, 0.8, 1.7, 2.5, and 4.2 mM

161

pNPL incubated with 0.6 μM Cel7A for 0, 5, 10, 15, and 20 min. Reactions were quenched by the addition of 25 μL of 1.0 M sodium carbonate

162

and the A405 was recorded. To fit kinetic parameters, theoretical rates were calculated from starting input values of Vmax, Km

app

, and Ki using

6

163

the Michaelis-Menten expression for competitive enzyme inhibition. Data were also evaluated for noncompetitive, uncompetitive, and mixed

164

inhibition.

165

Nonlinear regression fitting was accomplished using Excel Solver add-in (Microsoft, Redmond WA). Weighted squared residuals were

166

calculated for each data point using three different weighting schemes: simple, (νobs-νcalc)2; proportional, (νobs-νcalc)/νcalc)2; and statistical,

167

((νobs-νcalc)2/νcalc)), and the sum of residuals was minimized. The best fit to experimental data obtained was used for the reported kinetic

168

parameters derived.

169 170

Hydrolysis of Crystalline and Amorphous Cellulose: Hydrolysis reactions contained 5.0 mg/mL cellulose of the following substrates: Avicel (PH-

171

101), pretreated corn stover (NREL dilute-acid-pretreated corn stover, P050921) or phosphoric acid swollen cellulose (from cotton linter).

172

Substrates were incubated with 20 mg CBH/g-glucan, 1.0 mg/g-glucan -glucosidase (Aspergillus niger, Novozymes 188) and 1.0 mg/g-glucan

173

Cel5ACD:Y245G, (Acidothermus cellulolyticus) in 20 mM sodium acetate buffer pH 5.0, 100 mM NaCl, and 0.02% NaN3. Reactions were

174

conducted in triplicate in 1.8 mL sealed cryogenic vials with continuous mixing by rotational inversion at ~10 rpm, incubated at 55°C or 40°C,

175

as specified. At indicated time points, 100 μL samples were removed from the reaction vials, diluted in water and boiled for 5 min to quench

176

the reaction. Boiled samples were filtered through a 0.22 m syringe filter into glass HPLC vials. Glucose and cellobiose were determined by

177

HPLC analysis on an Aminex HPX-87P column operated at 65°C with water as mobile phase.

178 179

Crystallization and X-ray data collection: The purified DdiCel7A and DpuCel7A proteins (without CBM) were desalted to 20 mM sodium acetate

180

buffer, pH 5.0, prior to crystallization. Initial screens for crystallization conditions, by sitting–drop vapor diffusion in 96-well plates, were setup

181

using a Mosquito crystallization Robot (TTP, Labtech, Cambridge, UK). Equal amount of protein (12 mg/mL) and well solution were mixed in

182

0.3 µL crystallization drops. The most promising crystallization hits for DdiCel7A and DpuCel7A were obtained with the Morpheus (Molecular

183

Dimensions) and PEG ION II (Hampton Research) screens, respectively.

184

Crystals used for data collection were grown at 20°C by the hanging drop vapor diffusion method (47), by mixing 1:1 ratio of protein

185

(7 mg/mL in 20 mM sodium acetate, pH 5.0) and precipitant solutions; DdiCel7A with 12.5% w/v PEG 1000, 12.5% w/v PEG 3350, 12.5% v/v

186

MPD, 0.1 M sodium HEPES/MOPS, pH 7.5, 0.03 M MgCl2, 0.03M CaCl2, and DpuCel7A with 0.1 M sodium malonate pH 7.0, 12% w/v PEG 3350.

187 188

Data collection, processing and structure solution: X-ray diffraction data were collected at 100 K from single crystals of DdiCel7A and DpuCel7A

189

at the MAX-Lab I911-3 (Lund, Sweden) and ESRF ID23-1 (Grenoble, France) beamlines, respectively. The data were indexed and integrated

190

using IMOSFLM2 and scaled with SCALA3, in the CCP4 program suite (48). 5% of the reflections were set aside for calculation of Rfree factors.

191

The crystal structures were solved by molecular replacement using Phaser (49) in the CCP4 package. A structure of TreCel7A (PDB code 4D5J)

192

(50) was used as search model for DdiCel7A, which was then used as model for DpuCel7A. The DdiCel7A structure is solved in space group

193

P212121 with a single molecule in the asymmetric unit, and DpuCel7A in space group P212121 with two molecules in the asymmetric unit. Both

7

194

structures were refined by alternating cycles of maximum-likelihood refinement with REFMAC5 (51) and manual rebuilding using Coot (52).

195

Statistics from diffraction data processing and structure refinement are summarized in the Results section. The coordinates and structure

196

factors for the DdiCel7A and DpuCel7A structures have been deposited with the Protein Data Bank under accession codes 4ZZQ and 4ZZP,

197

respectively.

198 199

Sequence alignments and analyses: A structure based sequence alignment was first generated using Swiss PDB Viewer v. 4.1 ((53);

200

http://www.expasy.org/spdbv/) by superimposing the structures of DdiCel7A (4ZZQ), DpuCel7A (4ZZP), and GH7 CBHs in PDB (TreCel7A, 4C4C;

201

ThaCel7A, 2YOK; AfuCel7A, 4V20; RemCel7A, 3PL3; HgtCel7A, 4CSI; MalCel7B, 2RFZ; HirCel7A, 2XSP; PchCel7D, 1Z3V; LquCel7B, 4IPM). This

202

alignment was opened in Jalview v. 2.9.0b2 (54), and full-length sequences corresponding to the PDB entries and a collection of 38 non-fungal

203

GH7 sequences were added followed by multiple sequence alignment using the MUSCLE Web Service with default settings (55). Regions

204

flanking the GH7 domain were trimmed off (e.g. signal peptide, CBM, linker). The sequences selected for Figure 7 and Table 4 were realigned

205

with MUSCLE, and the program Indonesia (56) was used to calculate pairwise sequence identities and similarities using the Gonnet

206

substitution matrix (57). Phylogenetic analyses of an extended set of 113 GH7 protein sequences were conducted with the program MEGA7

207

(58). Sequences were collected by pBLAST search of each non-fungal GH7 against NCBI and UniProt databases, and selection of taxonomically

208

diverse hits among the many fungal entries retrieved. The sequences were aligned with MUSCLE (55), and regions flanking the GH7 domain

209

were trimmed off. A 90% identity threshold was applied to remove redundancy, and endoglucanases lacking the B4 loop (marked in Figure 1

210

and Figure 4A) were excluded. The evolutionary history was inferred using the Minimum Evolution method (59) and bootstrap phylogeny

211

testing with 700 replicates (60). The optimal tree with the sum of branch length of 19.8 is shown. Branch lengths are drawn to scale with

212

evolutionary distances in units of number of amino acid substitutions per site, computed using the Dayhoff matrix based method (61). The ME

213

tree was searched using the Close-Neighbor-Interchange (CNI) algorithm (62) at a search level of 1. The Neighbor-joining algorithm (63) was

214

used to generate the initial tree. All ambiguous positions were removed for each sequence pair. There were a total of 520 positions in the final

215

dataset.

216 217 218

RESULTS

219

Expression and purification of enzymes

220

DdiCel7A, DdiCel7ACBM, DpuCel7A, DpuCel7ACBM, and TreCel7A were recombinantly expressed in a newly developed T. reesei eno

221

expression system (46). The constitutive eno promotor drives expression of Cel7A to allow growth on glucose, which represses

222

the native cellulase system. Coupling this system with the Δcbh strain of T. reesei ensures that only the recombinant Cel7A

223

protein is expressed, as previously described (46). For evaluation of cellulose digestion on various solid substrates, we added the

8

224

Family 1 CBM from TreCel7A to the CD of native DdiCel7A and DpuCel7A (dubbed DdiCel7CBM and DpuCel7ACBM). Fed-batch

225

fermentations typically yielded 5-25 mg/L recombinant protein after processing and purification. All proteins were purified to

226

electrophoretic homogeneity, with the exception of DpuCel7A, which exhibited a doublet band by SDS-PAGE (Figure S1).

227

Sequencing Mass Spectrometry identified both bands as native DpuCelA, and it is thus unclear why there are two populations.

228

However, as we were not able to separate the two populations, the mixed population was provided for crystallization.

229 230

Temperature, pH, and thermostability profiles

231

The dependence of activity on temperature and pH for DdiCel7ACBM and DpuCel7ACBM, and TreCel7A for comparison was

232

determined using pNPL as substrate. Activities were measured at eight temperatures at pH 5.0, and the data point with highest

233

activity was recorded as the temperature optimum in Table 1. DdiCel7ACBM exhibits a lower temperature optimum, 45°C, than

234

DpuCel7ACBM and TreCel7A, which exhibit optimal activity around 55°C, as shown in Figure 1. For consistency, the optimal pH

235

was determined in MacIlvains buffers with a constant ionic strength. Activity was evaluated at 40°C for DdiCel7ACBM and 45°C for

236

DpuCel7ACBM and TreCel7A to account for the difference in temperature optima. The pH of maximum pNPL turnover for

237

DpuCel7A and DdiCel7A, pH 5.0, is higher than that of TreCel7A, pH 4.0, as shown in Figure 1.

238 239

The denaturation temperature was measured with DSC in the absence of substrate. The Tm of DdiCel7ACBM, DpuCel7ACBM, and TreCel7A are 53°C, 62°C and 64°C, respectively, as shown in Table 1 and Figure 1.

240 241

Enzyme kinetics on pNPL

242

The kinetic properties of DdiCel7ACBM, DpuCel7ACBM, and TreCel7A were compared on pNPL (Table 1). The Vmax and Kmapp were

243

determined by plotting the pNPL turnover rate for increasing substrate concentration and evaluated by the Michaelis-Menten

244

model, as shown in Figure 2. The catalytic rate constant of DpuCel7ACBM was significantly higher at 36.2 min-1 than the kcat of

245

DdiCel7ACBM and TreCel7A, which were nearly equal at 16.2 and 15.9 min-1 respectively.

246

DpuCel7ACBM and DdiCel7ACBM was higher than TreCel7A, 3.2, 3.2 and 1.2 mM, respectively, indicating a lower binding affinity

247

for pNPL.

However, the Kmapp of both

248 249

Cellulose hydrolysis

250

Enzymatic activity comparisons were conducted on 0.5% Avicel, pretreated corn stover (PCS), and phosphoric acid swollen

251

cellulose (PASC) substrates, respectively. The enzyme cocktails comprised β-glucosidase from Novozymes 188, the 9

252

endoglucanase E1 from A. cellulolyticus and DpuCel7ACBM, DdiCel7ACBM, or TreCel7A (64). Assays were completed at the

253

previously determined pH and temperature optimum for DpuCel7ACBM and DdiCel7ACBM, pH 5.0 at 55°C and 40°C, respectively.

254

Glucose and cellobiose concentrations were determined by HPLC and plotted as a function of time. Curves were fit to a single

255

exponential; the time constant (τ) for cellulose hydrolysis is reported in Table 2 and indicates that cocktails with both

256

DpuCel7ACBM and DdiCel7ACBM convert cellulose to glucose in an equivalent manner as TreCel7A on all three cellulose substrates.

257

The hydrolysis profiles are all strikingly similar, Figures 2 and 3, with only one minor difference in the hydrolysis of Avicel by

258

DpuCel7ACBM, Figure 2, where the enzyme reaches a lower maximum conversion but at a faster time than TreCel7A.

259 260

DdiCel7A and DpuCel7A structures

261

After extensive screening, crystals were obtained of the T. reesei-expressed DdiCel7A and DpuCel7A proteins that could be used

262

for synchrotron radiation diffraction data collection. However, in both cases the crystals were small and the diffraction data

263

were notoriously weak and suffered from twinning and high mosaicity. This is reflected by high R/Rfree factors of 0.25/0.35 and

264

0.23/0.31 for the DdiCel7A and DpuCel7A structures, refined at 2.1 and 2.7 Å resolution, respectively, and a rather high average

265

temperature factor (B factor) of 28.5 Å2 for DdiCel7A. Nevertheless, almost the entire backbone could be traced with

266

confidence. All amino acid side chains were placed and refined, apart from Lys-438 of DdiCel7A and Asn-438 and Asn-439 of

267

DpuCel7A, which were not visible in the electron density and thus excluded from the final models. The DdiCel7A structure is

268

solved in space group P212121 with a single protein chain in the asymmetric unit, and DpuCel7A in P212121 with two chains, A

269

and B, in the asymmetric unit. In both structures, the N-terminal glutamine residue is cyclized to pyroglutamate (PCA1), Pro-388

270

is cis-proline, all 18 cysteines form disulfide bonds, N-glycosylation is evident at Asn-269 with density for one N-

271

acetylglucosamine unit, and Lys437 is the last visible residue. In DpuCel7A, an additional GlcNAc is attached to Asn-112.

272

Statistics of diffraction data processing and structure refinement are summarized in Table 3.

273 274

Structural alignments and analysis

275

As expected from the high amino acid sequence identity (Figure 1), the folds of DdiCel7A and DpuCel7A are very similar to one

276

another (80% identity; rmsd 0.45 Å) as well as to the catalytic module of TreCel7A (59% identity; rmsd 0.66 and 0.62 Å) (Figure

277

4). The catalytic residues for the Dictyostelium Cel7A structures (Glu211, Asp213, and Glu216, corresponding to TreCel7A

278

Glu212, Asp214, and Glu217) and aromatic residues that stack on the -7, -4, -2, and +1 binding sites (W40, W38, W373, W382,

279

respectively) overlap the positions of the analogous residues in TreCel7A (PDB code 4C4C). Figures 4A and B show the 10

280

superposition of DdiCel7A and DpuCel7A with the cellononaose Michaelis complex of TreCel7A (PDB code 4C4C) where surface-

281

loops of interest and glucoside-binding subsites are indicated (14). The corresponding loop regions are also highlighted in the

282

structure-based sequence alignment of the catalytic modules of GH7 CBH structures available in the PDB (Figure 1).

283

Structural alignments of DdiCel7A and DpuCel7A to other known GH7 CBH structures show that the eight major binding

284

tunnel loops (A1-A4 and B1-B4) of DdiCel7A and DpCel7A are exceptionally similar to each other and to that of TreCel7A,

285

revealing a highly enclosed active site (Figure 5). Loops A3 and B3 of Ddi and Dpu lack the Tyr residues at their tips that are

286

present in TreCel7A. However, the Dictyostelium Cel7A enzymes have a tyrosine (Tyr202) that protrudes from the base of the B2

287

loop whose side chain resides in the same general space as that of the B3 tyrosine in the other GH7 structures (Tyr247 in

288

TreCel7A) (Figure 6). The opposing residue on loop A3 (Tyr371 in TreCel7A) is Gln377 in DdiCel7A and Leu377 in DpuCel7A.

289

Glutamine at this position is a rare motif among GH7 CBHs and is found only in another protist, Pseudotrichonympha grassii, a

290

parabasilian symbiont in the gut of termites. A few distantly related basidiomycetes have glutamate at this position, e.g., in

291

HirCel7A, where MD simulations showed that the glutamate sidechain can interact with a bound cellulose chain (15). DpuCel7A

292

and DdiCel7A also possess two glutamine residues on the A1 loop that would be in direct contact with the cellulose surface

293

when the enzyme is adsorbed (Q97 and Q99) (65); TreCel7A only possesses the former of these. Near the entrance to the

294

tunnel, an aromatic residue in DpuCel7A (Phe-6) may potentially provide an additional -8 subsite and help to guide a cellulose

295

chain into the tunnel. Leucine is found at the corresponding position in DdiCel7A and TreCel7A.

296

TreCel7A has four attachment sites on the CD for N-glycosylation: N45, N64, N270, and N384. Only the N270 glycan site

297

is maintained (at N269) in DpuCel7A and DdiCel7A, where one attached N-acetylglucosamine is visible in both structures. At the

298

position analogous to TreCel7A N45, both DdiCel7A and DpuCel7A have a one-residue deletion, and TreCel7A N384 is replaced

299

by a threonine. Instead, the Dictyostelium sequences exhibit an N-glycosylation site at N112, also found in ThaCel7A and where

300

GlcNAc is seen in the DpuCel7A structure, and an additional site at N433 in DpuCel7A where there is not clear electron density

301

to observe glycan attachment. We also note that TreCel7A has ten disulfide bridges whereas the Dictyostelium Cel7A enzymes

302

have only nine. The more common scenario in GH7 cellulases is to lack this extra disulfide bridge between Cys4 and Cys72

303

(TreCel7A numbering). In fact, TreCel7A and T. harzianum Cel7A are the only GH7 members with a solved structure that possess

304

the tenth disulfide bridge.

305

Lastly, the product binding sites of DdiCel7A and DpuCel7A are quite similar to TreCel7A with a few notable differences

306

among residues that have been extensively studied and discussed in past GH7 studies (8). Thr246 at the tip of loop B3 in

307

TreCel7A is replaced by Ala245 in DdiCel7A and DpuCel7A, resulting in the loss of a hydrogen bond to the substrate at subsite +1 11

308

(Figure 6A). At the same time, the sidechain of Asp345 in loop B4 of DdiCel7A and DpuCel7A is suitably positioned for H-bonding

309

to OH1 of the reducing-end +2 glucosyl unit. Actually, most GH7 CBHs have aspartate at the corresponding position, but

310

TreCel7A and other Trichoderma species lack this interaction due to a one-residue deletion in loop B4 (15, 50, 66).

311 312

Cellobiose inhibition

313

In light of the structural differences found in the product-binding site, Ki studies for product inhibition were carried out using

314

pNPL as substrate. Simultaneous Nonlinear regression was used to evaluate the events for competitive, uncompetitive,

315

noncompetitive, and mixed inhibition. The data collected suggest that these enzymes are competitively inhibited by cellobiose

316

and the Ki for competitive inhibition is reported in Table 1. Cellobiose is 5-7 times less inhibitory to DdiCel7ACBM and

317

DpuCel7ACBM than to TreCel7A, with Ki values of 205, 130, and 29 μM, respectively. This is consistent with previous studies

318

investigating similar structural changes in the GH7 CBH active site (36).

319 320

DISCUSSION

321

GH7 CBHs are complex, important enzymes in both nature and the growing biofuels industry. Continued structural and

322

biochemical studies of these enzymes are essential to developing detailed structure-function relationships, given their

323

importance. The GH7 CBHs from D. discoideum and D. purpureum are now the second and third known non-fungal GH7 enzyme

324

structures, with the first being from a salt water inhabiting, wood-boring isopod, L. quadripunctata (22).

325

Although many GH7 CBHs are bi-modular, exhibiting a catalytic domain (CD) connected to a carbohydrate-binding

326

module (CBM) by ~30 amino acid flexible linker, the native sequences of DdiCel7A and DpuCel7A do not exhibit a CBM-linker

327

domain. It has long been known that the CBM aids the enzyme by increasing the binding to its crystalline cellulose substrate

328

(67); however, it is likely that there are evolutionary reasons for the lack of a CBM, such as environments of high cellulose

329

density where the substrate concentration greatly surpasses the Km of the CBM or CD. Indeed, a recent study by Varnai et al.

330

demonstrates that TreCel7A without the CBM-linker is able to achieve the same extent of conversion on Avicel and pretreated

331

wheat straw at high (20%) solids loadings, while at lower solids loading, such as 0.5% used here, the CBM-linker aids in

332

conversion by binding crystalline cellulose and increasing the local concentration (68). Consequently, for biochemical studies,

333

we have added the Family 1 CBM and linker from TreCel7A to the CD of native DdiCel7A and DpuCel7A (DdiCel7CBM and

334

DpuCel7ACBM) to more directly be able to compare GH7 CBH conversion at low solids loadings. Going forward, self-consistent

335

comparisons of GH7 CBHs with (or without) CBM-linkers, enabled by the development of a Cel7A-delete T. reesei expression 12

336

host (46), will be important to the continued development of structure-activity relationships and the development of

337

baseline activity of GH7 CBHs from natural diversity in the context of a background enzyme cocktail on process relevant

338

substrates.

339

Since T. reesei is known to be one of the most cellulolytic organisms, it is surprising that both the Dictyostelium GH7s

340

were equally active on biomass and crystalline cellulose at their respective temperature optima (Table 1). However, previous

341

investigations of the cellulose synthase in D. discoideum have shown that the cellulose deposited in the sheath is highly

342

crystalline cellulose I (69) and that the high crystallinity is positively correlated with increased strength and rigidity of the stalk

343

(70). Thus, it is not unreasonable to surmise that the Dictyostelium Cel7s have evolved a highly efficient mechanism for

344

rearrangement of crystalline cellulose I to resourcefully modify the cell wall and sheath structure and potentially recycle glucose

345

during formation of the stalk and fruiting body. Additionally, Eichinger et al. speculate that Dictyostelia may also utilize the

346

cellulose-and hemicellulose-degrading enzymes in their genomes to degrade plant tissue for food (40). Immuno-labeling of GH7

347

CBHs is a well-known technique to visualize these enzymes (71), and could potentially be used to examine the distribution of

348

these enzymes when they are expressed in the presence and absence of either self-generated cellulose or exogenous sources to

349

determine the function of GH7 CBHs in the life cycle of Dictyostelia.

350

In a previous report following in vivo expression of GH7 CBH in D. discoideum, the enzyme was found to be highly

351

controlled by the developmental phase of the organism (11). While the actual quantity of GH7 CBH at each stage was not

352

reported, it may be that this organism only expresses a low amount of Cel7 at critical developmental phases and perhaps

353

expression of large quantities like those secreted by T. reesei during biomass degradation could be detrimental to the organism.

354

It is also possible that the hydrolytic ability of these enzymes is advantageous in that it enables the organism to progress

355

through morphogenesis quickly, therefore allowing it to rapidly escape ill-fated environments. Given this tight regulation of

356

expression, it would also be interesting to investigate the GH7 CBH promoter regulation during morphogenesis to better

357

understand the process of morphogenesis or the mechanism of transcription regulation, as well as for potential use in future

358

GH7 expression systems for industrial applications.

359

The crystal structures of native DdiCel7A and DpuCel7A are exceptionally similar to each other and to TreCel7A (Figure

360

4). Indeed, the degree of binding tunnel closure is the closest to TreCel7A observed in GH7 structures to date (Figure 5).

361

Functionally, this suggests increased processivity and a reduced dissociation constant, perhaps similar to TreCel7A. Of particular

362

significance is the lack of tyrosine in the B3 and A3 loops whose side chains maintain some degree of flexibility (exemplified by

363

their different positions in TreCel7A structures 1CEL and 4C4C) and partially enclose the binding tunnel at the catalytic center 13

364

(8). To compensate, the Dictyostelium Cel7A enzymes have a unique tyrosine (Y202) that protrudes from the base of the B2 loop

365

whose side chain resides in the same general space as that of the A3 and B3 tyrosines in the other GH7 structures (Figure 6A).

366

The result is that the degree of binding tunnel enclosure in DpuCel7A and DdiCel7A in this region is comparable or greater than

367

in the other GH7 CBHs. In addition to the aforementioned implications for processivity and dissociation constant, this may

368

suggest that the Dictyostelium Cel7A enzymes may have elevated probabilities of ‘endo-initiation’ as compared with TreCel7A

369

(34).

370

The product binding sites of DdiCel7A and DpuCel7A are similar to TreCel7A with a few notable, potentially important

371

differences, namely T246 (TreCel7A numbering) and D345 (DpuCel7A and DdiCel7A numbering) (Figure 6A). Stahlberg et al. (72)

372

speculated on the flexibility of the T246 side chain, as evidenced by elevated temperature factors and weaker electron density

373

when the +1 binding site was vacant. However, when the +1 binding site was filled, the position of T246 was as stable as the rest

374

of the protein, thus providing further evidence for the importance of this hydrogen bond. von Ossowski et al. (36) discussed the

375

impact of this residue’s absence in PchCel7D, as did Ubhayasekera et al. (66). The latter of these studies concluded that the two

376

most significant differences in ligand binding between TreCel7A and PchCel7D are the ‘direct hydrogen bond’ with T246 (absent

377

in PchCel7D) and the D336 interaction at the binding tunnel exit present in PchCel7D, but replaced by glycine in TreCel7A. Like

378

PchCel7D, both Dictyostelium structures possess an aspartate at this position (D345 in Ddi and DpuCel7A), as illustrated in Figure

379

6B. The conservation of this interaction in GH7 cellulases has been discussed (66); as mentioned in the Results section, the

380

aspartate residue here is rather well conserved in GH7 CBHs except those in the Trichoderma genus and a few others where the

381

B4 loop is one residue shorter. This aspartate is additionally absent in GH7 EGs due to deletion of the B4 loop, implying this

382

residue may play a role in cellulose chain processivity in some GH7 CBHs.

383

To verify the functional significance of the unique product site motif, we investigated the biochemical characteristics

384

and activities of DdiCel7ACBM and DpuCel7ACBM on soluble (Table 1) and insoluble (Table 2) substrates. We found that on the

385

soluble substrate pNPL, DpuCel7ACBM has a higher specific activity than either DdiCel7ACBM or TreCel7A. Interestingly, we could

386

find no structural difference in the CD of DdiCel7A and DpuCel7A to fully explain the differences in activities. Evolutionarily, it is

387

unclear why the Dictyostelium Cel7s are more cellobiose tolerant. Perhaps the high viscosity of the extracellular matrix amplifies

388

the effective cellobiose concentration due to limited diffusion. Both Dictyostelium organisms contain multiple genes predicted

389

to have β-glucosidase activity (seven for D. discoideum and eight for D. purpureum) (7); however, to our knowledge, it has not

390

been reported if the organism utilizes the glucose for energy or for some other purpose, such as feeding a bacterial symbiote, as

14

391

has been previously suggested (40). The co-secretion of β-glucosidases alongside the GH7 CBHs has also not been examined to

392

our knowledge.

393

GH7 genes occur in branches of the eukaryotic tree of life that diverged over 1 billion years ago (Figure 7) (73),

394

suggesting either an ancient ancestral GH7 gene or gene uptake via horizontal gene transfer (HGT). These genes have been well

395

characterized and identified in many cellulolytic fungi and more recently in animals such as marine wood borers (Crustacea, e.g.,

396

the gribble or Limnoria quadripunctata) (39). Moreover, they have been found in additional eukaryote branches such as amoeba

397

(40), oomycetes (Stramenopiles, e.g., the potato-blight pathogen Phytophthora infestans), haptophytes (e.g., the phytoplankton

398

Emiliania huxleyi), and parabasilids (Excavata, e.g., the termite hindgut symbiont Pseudotrichonympha grassii) (39, 74).

399

Compared to other GH families, GH7 CBHs are noteworthy in how conserved they are in terms of sequence identity (>40%;

400

Figure 7; Table 4). If GH7 cellulases were derived from a single ancestor before the earliest point of divergence in the eukaryotic

401

tree of life, the sequence conservation is remarkable given the early event of divergence. If this is indeed the case, these

402

findings suggest that GH7 CBHs cannot accommodate a broad sequence space for primary function. However, we cannot rule

403

out that HGT has occurred on a more recent timescale, thus limiting the extent of sequence divergence. We have performed

404

phylogenetic analysis of taxonomically diverse GH7 protein sequences to construct an evolutionary tree based on the GH7

405

domain alone. A simplified tree is shown in Figure 8, and the complete tree with all sequence names is shown in Figure S2. The

406

amoeba are placed among ascomycete sequences, indeed suggesting HGT from an early ascomycete to a Dictyostelial ancestor.

407

The event should then have occurred after the separation of Ascomycota and Basidiomycota (~600 Myr ago), but before the

408

divergence of D. discoideum and D. purpureum estimated at around 400 million years ago (7), and supposedly much earlier as

409

indicated by additional nodes further down in the Amoebozoa subtree and the high similarity between DdiCel7A and DpuCel7A

410

(Figure 8). HGT has been hypothesized as a primary means of genome evolution in D. discoideum (40). Given the lifestyle of

411

Dictyostelia inhabiting the forest soil, it seems plausible that HGT of GH7 CBHs from fungi may have been the primary

412

mechanism for these enzymes being present in slime mold genomes (or vice versa). Similar arguments have been made for HGT

413

into wood boring crustaceans, such as L. quadripunctata (39). We note, however, that the genes in Dictyostelia strongly

414

supported to have been acquired by HGT are all of bacterial origin (7), and prokaryote-eukaryote HGT is generally easier to

415

detect than HGT of nuclear genes between eukaryotes (75). For oomycetes, extensive whole-genome, gene-to-gene

416

phylogenetic analysis has been applied to detect putative HGT between fungi and plant pathogenic oomycetes, and the results

417

indicate that HGT has been fundamental to the evolution of plant parasitic traits within the oomycetes (76). Interestingly,

418

though, the GH7 gene is not present among the 34 candidate HGT genes identified in the analysis, suggesting that it does not 15

419

show strong signs of being acquired by HGT in oomycetes (while a GH10 gene annotated as a β1,4-xylanase is), which is also

420

consistent with our phylogenetic analysis (Figure 8). Regardless of the gene transfer mechanism or ancestry, we find it

421

remarkable that, despite a distant phylogenetic and temporal relationship between Dictyostelia GH7 CBHs and their more well-

422

studied fungal counterparts, such high homology and similarity in activity and structure is preserved.

423

In conclusion, this study presents two crystal structures of social amoeba GH7 CBHs, and shows that they are

424

structurally and functionally very similar to the well-studied TreCel7A, revealing a remarkable conservation in these important

425

cellulose-degrading enzymes across distant phylogenetic branches on the eukaryotic tree of life (39). Several key differences in

426

the makeup of the binding tunnel loops and the product binding sites likely give rise to the observed differences in kinetics on

427

the small molecule substrate, pNPL and reduction in product inhibition by cellobiose compared to TreCel7A. Lastly, this study

428

presents an approach to append a CBM-linker on GH7 CBHs to produce chimeric cellulases for direct comparison on insoluble

429

cellulose-rich substrates. This approach will be essential to building a large collection of self-consistent results in the study of

430

GH7 CBH activities and for determining accurate structure-function relationships in these important enzymes (8). The functional

431

relevance of the GH7 CBHs in Dictyostelia, likely intimately related to the reshaping of the extracellular matrix important to the

432

slime mold life cycle in their morphogenesis from slug to fruiting body (11) or in their ability to digest cellulose-containing

433

organisms in the forest soil (40), remains a question of particular interest to understand the evolutionary pressures on the

434

activities of slime mold GH7 CBHs.

435 436

Funding information.

437

SEH, BCK, LET, KKP, TAV, MEH, SRD, and GTB acknowledge the US Department of Energy BioEnergy Technologies Office for

438

funding. MHM and JS acknowledge the Faculty for Natural Resources and Agriculture at the Swedish University of Agricultural

439

Sciences through the research program MicroDrivE. BCK also thanks the National Renewable Energy Laboratory’s Director’s

440

Fellowship Program for funding.

441 442

Acknowledgments.

443

We are grateful to Nils Högberg, Swedish University of Agricultural Sciences, for advice on phylogenetic analysis. The U.S.

444

Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains

445

a nonexclusive, paid up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others

446

to do so, for U.S. Government purposes. 16

447 448 449

References

450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496

1. 2. 3. 4. 5. 6. 7.

8. 9. 10. 11. 12. 13. 14.

15.

16.

17.

18.

19.

Wang Y, Slade MB, Gooley AA, Atwell BJ, Williams KL. 2001. Cellulose-binding modules from extracellular matrix proteins of Dictyostelium discoideum stalk and sheath. Eur J Biochem 268:4334-4345. Zhang P, McGlynn AC, Loomis WF, Blanton RL, West CM. 2001. Spore coat formation and timely sporulation depend on cellulose in Dictyostelium. Differentiation 67:72-79. Huber RJ, O'Day DH. 2015. Proteomic profiling of the extracellular matrix (slime sheath) of Dictyostelium discoideum. Proteomics 15:3315-3319. Freeze H, Loomis WF. 1977. Isolation and characterization of a component of the surface sheath of Dictyostelium discoideum. Journal of Biological Chemistry 252:820-824. Freeze H, Loomis WF. 1978. Chemical analysis of stalk components of Dictyostelium discoideum. Biochimica et Biophysica Acta (BBA)-General Subjects 539:529-537. Nishiyama Y, Langan P, Chanzy H. 2002. Crystal structure and hydrogen-bonding system in cellulose Iβ from synchrotron X-ray and neutron fiber diffraction. Journal of the American Chemical Society 124:9074-9082. Sucgang R, Kuo A, Tian X, Salerno W, Parikh A, Feasley CL, Dalin E, Tu H, Huang E, Barry K, Lindquist E, Shapiro H, Bruce D, Schmutz J, Salamov A, Fey P, Gaudet P, Anjard C, Babu MM, Basu S, Bushmanova Y, van der Wel H, KatohKurasawa M, Dinh C, Coutinho PM, Saito T, Elias M, Schaap P, Kay RR, Henrissat B, Eichinger L, Rivero F, Putnam NH, West CM, Loomis WF, Chisholm RL, Shaulsky G, Strassmann JE, Queller DC, Kuspa A, Grigoriev IV. 2011. Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol 12:R20. Payne CM, Knott BC, Mayes HB, Hansson H, Himmel ME, Sandgren M, Ståhlberg J, Beckham GT. 2015. Fungal Cellulases. Chemical Reviews 115:1308-1448. Chundawat SP, Beckham GT, Himmel ME, Dale BE. 2011. Deconstruction of lignocellulosic biomass to fuels and chemicals. Annu Rev Chem Biomol Eng 2:121-145. Himmel ME, Ding SY, Johnson DK, Adney WS, Nimlos MR, Brady JW, Foust TD. 2007. Biomass recalcitrance: engineering plants and enzymes for biofuels production. Science 315:804-807. Kunii M, Yasuno M, Shindo Y, Kawata T. 2013. A Dictyostelium cellobiohydrolase orthologue that affects developmental timing. Dev Genes Evol 224:25-35. Divne C, Stahlberg J, Reinikainen T, Ruohonen L, Pettersson G, Knowles JK, Teeri TT, Jones TA. 1994. The threedimensional crystal structure of the catalytic core of cellobiohydrolase I from Trichoderma reesei. Science 265:524-528. Divne C, Ståhlberg J, Teeri TT, Jones TA. 1998. High-resolution crystal structures reveal how a cellulose chain is bound in the 50 Å long tunnel of cellobiohydrolase I from Trichoderma reesei. Journal of molecular biology 275:309-325. Knott BC, Haddad Momeni M, Crowley MF, Mackenzie LF, Götz AW, Sandgren M, Withers SG, Ståhlberg J, Beckham GT. 2013. The mechanism of cellulose hydrolysis by a two-step, retaining cellobiohydrolase elucidated by structural and transition path sampling studies. Journal of the American Chemical Society 136:321-329. Momeni MH, Payne CM, Hansson H, Mikkelsen NE, Svedberg J, Engstrom A, Sandgren M, Beckham GT, Stahlberg J. 2013. Structural, biochemical, and computational characterization of the glycoside hydrolase family 7 cellobiohydrolase of the tree-killing fungus Heterobasidion irregulare. J Biol Chem 288:5861-5872. Muñoz IG, Ubhayasekera W, Henriksson H, Szabó I, Pettersson G, Johansson G, Mowbray SL, Ståhlberg J. 2001. Family 7 cellobiohydrolases from Phanerochaete chrysosporium: crystal structure of the catalytic module of Cel7D (CBH58) at 1.32 Å resolution and homology models of the isozymes. Journal of molecular biology 314:1097-1111. Grassick A, Murray PG, Thompson R, Collins CM, Byrnes L, Birrane G, Higgins TM, Tuohy MG. 2004. Threedimensional structure of a thermostable native cellobiohydrolase, CBH IB, and molecular characterization of the cel7 gene from the filamentous fungus, Talaromyces emersonii. European Journal of Biochemistry 271:4495-4506. Textor LC, Colussi F, Silveira RL, Serpa V, Mello BL, Muniz JRC, Squina FM, Pereira N, Skaf MS, Polikarpov I. 2013. Joint X-ray crystallographic and molecular dynamics study of cellobiohydrolase I from Trichoderma harzianum: deciphering the structural features of cellobiohydrolase catalytic activity. Febs Journal 280:56-69. Parkkinen T, Koivula A, Vehmaanpera J, Rouvinen J. 2008. Crystal structures of Melanocarpus albomyces cellobiohydrolase Cel7B in complex with cello-oligomers show high flexibility in the substrate binding. Protein Sci 17:1383-1394.

17

497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552

20.

21.

22.

23.

24.

25.

26.

27.

28.

29.

30. 31.

32. 33. 34. 35. 36.

37.

38. 39.

40.

Moroz O, Maranta M, Shaghasi T, Harris P, Wilson K, Davies G. 2015. The three-dimensional structure of the cellobiohydrolase Cel7A from Aspergillus fumigatus at 1.5 Å resolution. Acta Crystallographica Section F: Structural Biology Communications 71:114-120. Haddad Momeni M, Goedegebuur F, Hansson H, Karkehabadi S, Askarieh G, Mitchinson C, Larenas EA, Ståhlberg J, Sandgren M. 2014. Expression, crystal structure and cellulase activity of the thermostable cellobiohydrolase Cel7A from the fungus Humicola grisea var. thermoidea. Acta Crystallographica Section D: Biological Crystallography 70:23562366. Kern M, McGeehan JE, Streeter SD, Martin RN, Besser K, Elias L, Eborall W, Malyon GP, Payne CM, Himmel ME. 2013. Structural characterization of a unique marine animal family 7 cellobiohydrolase suggests a mechanism of cellulase salt tolerance. Proceedings of the National Academy of Sciences 110:10189-10194. Borisova AS, Eneyskaya EV, Bobrov KS, Jana S, Logachev A, Polev DE, Lapidus AL, Ibatullin FM, Saleem U, Sandgren M. 2015. Sequencing, biochemical characterization, crystal structure and molecular dynamics of cellobiohydrolase Cel7A from Geotrichum candidum 3C. FEBS Journal 282:4515-4537. Kleywegt GJ, Zou JY, Divne C, Davies GJ, Sinning I, Stahlberg J, Reinikainen T, Srisodsuk M, Teeri TT, Jones TA. 1997. The crystal structure of the catalytic core domain of endoglucanase I from Trichoderma reesei at 3.6 Å resolution, and a comparison with related enzymes. Journal of Molecular Biology 272:383-397. MacKenzie L, Sulzenbacher G, Divne C, Jones T, Woldike H, Schulein M, Withers S, Davies G. 1998. Crystal structure of the family 7 endoglucanase I (Cel7B) from Humicola insolens at 2.2 Å; resolution and identification of the catalytic nucleophile by trapping of the covalent glycosyl-enzyme intermediate. Biochem J 335:409-416. Sulzenbacher G, Driguez H, Henrissat B, Schulein M, Davies GJ. 1996. Structure of the Fusarium oxysporum endoglucanase I with a nonhydrolyzable substrate analogue: substrate distortion gives rise to the preferred axial orientation for the leaving group. Biochemistry 35:15280-15287. Stals I, Sandra K, Geysens S, Contreras R, Van Beeumen J, Claeyssens M. 2004. Factors influencing glycosylation of Trichoderma reesei cellulases. I: Postsecretorial changes of the O-and N-glycosylation pattern of Cel7A. Glycobiology 14:713-724. Beckham GT, Bomble YJ, Matthews JF, Taylor CB, Resch MG, Yarbrough JM, Decker SR, Bu L, Zhao X, McCabe C. 2010. The O-glycosylated linker from the Trichoderma reesei Family 7 cellulase is a flexible, disordered protein. Biophysical journal 99:3773-3781. Sammond DW, Payne CM, Brunecky R, Himmel ME, Crowley MF, Beckham GT. 2012. Cellulase linkers are optimized based on domain type and function: insights from sequence analysis, biophysical measurements, and molecular simulation. PLoS One 7:e48615. Beckham GT, Bomble YJ, Bayer EA, Himmel ME, Crowley MF. 2011. Applications of computational science for understanding enzymatic deconstruction of cellulose. Curr Opin Biotechnol 22:231-238. Bu L, Beckham GT, Shirts MR, Nimlos MR, Adney WS, Himmel ME, Crowley MF. 2011. Probing carbohydrate product expulsion from a processive cellulase with multiple absolute binding free energy methods. Journal of Biological Chemistry 286:18161-18169. Vrsanska M, Biely P. 1992. The Cellobiohydrolase-I from Trichoderma reesei Qm-9414 - Action on CelloOligosaccharides. Carbohydr Res 227:19-27. Igarashi K, Uchihashi T, Koivula A, Wada M, Kimura S, Okamoto T, Penttila M, Ando T, Samejima M. 2011. Traffic jams reduce hydrolytic efficiency of cellulase on cellulose surface. Science 333:1279-1282. Kurasin M, Valjamae P. 2011. Processivity of cellobiohydrolases is limited by the substrate. J Biol Chem 286:169-177. Jalak J, Valjamae P. 2014. Multi-mode binding of Cellobiohydrolase Cel7A from Trichoderma reesei to cellulose. PLoS One 9:e108181. von Ossowski I, Stahlberg J, Koivula A, Piens K, Becker D, Boer H, Harle R, Harris M, Divne C, Mahdi S, Zhao Y, Driguez H, Claeyssens M, Sinnott ML, Teeri TT. 2003. Engineering the exo-loop of Trichoderma reesei cellobiohydrolase, Cel7A. A comparison with Phanerochaete chrysosporium Cel7D. Journal of Molecular Biology 333:817-829. Andric P, Meyer AS, Jensen PA, Dam-Johansen K. 2010. Reactor design for minimizing product inhibition during enzymatic lignocellulose hydrolysis: II. Quantification of inhibition and suitability of membrane reactors. Biotechnol Adv 28:407-425. Gan Q, Allen SJ, Taylor G. 2002. Design and operation of an integrated membrane reactor for enzymatic cellulose hydrolysis. Biochemical Engineering Journal 12:223-229. King AJ, Cragg SM, Li Y, Dymond J, Guille MJ, Bowles DJ, Bruce NC, Graham IA, McQueen-Mason SJ. 2010. Molecular insight into lignocellulose digestion by a marine isopod in the absence of gut microbes. Proceedings of the National Academy of Sciences 107:5345-5350. Eichinger L, Pachebat JA, Glockner G, Rajandream MA, Sucgang R, Berriman M, Song J, Olsen R, Szafranski K, Xu Q, Tunggal B, Kummerfeld S, Madera M, Konfortov BA, Rivero F, Bankier AT, Lehmann R, Hamlin N, Davies R, Gaudet P, 18

553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608

41. 42. 43. 44. 45. 46.

47. 48. 49. 50. 51.

52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65.

66.

Fey P, Pilcher K, Chen G, Saunders D, Sodergren E, Davis P, Kerhornou A, Nie X, Hall N, Anjard C, Hemphill L, Bason N, Farbrother P, Desany B, Just E, Morio T, Rost R, Churcher C, Cooper J, Haydock S, van Driessche N, Cronin A, Goodhead I, Muzny D, Mourier T, Pain A, Lu M, Harper D, Lindsay R, Hauser H, et al. 2005. The genome of the social amoeba Dictyostelium discoideum. Nature 435:43-57. Cragg SM, Beckham GT, Bruce NC, Bugg TD, Distel DL, Dupree P, Etxabe AG, Goodell BS, Jellison J, McGeehan JE. 2015. Lignocellulose degradation mechanisms across the Tree of Life. Current Opinion in Chemical Biology 29:108-119. Beckham GT, Dai Z, Matthews JF, Momany M, Payne CM, Adney WS, Baker SE, Himmel ME. 2012. Harnessing glycosylation to improve cellulase activity. Curr Opin Biotechnol 23:338-345. Singh A, Taylor LE, Vander Wall TA, Linger J, Himmel ME, Podkaminer K, Adney WS, Decker SR. 2014. Heterologous protein expression in Hypocrea jecorina: A historical perspective and new developments. Biotechnology advances. Grigoriev IV, Nordberg H, Shabalov I, Aerts A, Cantor M, Goodstein D, Kuo A, Minovitsky S, Nikitin R, Ohm RA. 2011. The genome portal of the department of energy joint genome institute. Nucleic acids research:gkr947. Nordberg H, Cantor M, Dusheyko S, Hua S, Poliakov A, Shabalov I, Smirnova T, Grigoriev IV, Dubchak I. 2014. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic acids research 42:D26-D31. Linger J, Taylor L, Baker J, Vander Wall T, Hobdey S, Podkaminer K, Himmel M, Decker S. 2015. A constitutive expression system for glycosyl hydrolase family 7 cellobiohydrolases in Hypocrea jecorina. Biotechnology for Biofuels 8:45. McPherson A. 1982. Preparation and analysis of protein crystals. John Wiley & Sons. Collaborative CP. 1994. The CCP4 suite: programs for protein crystallography. Acta crystallographica Section D, Biological crystallography 50:760. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ. 2007. Phaser crystallographic software. Journal of applied crystallography 40:658-674. Haddad Momeni M, Ubhayasekera W, Sandgren M, Ståhlberg J, Hansson H. 2015. Structural insights into the inhibition of Cellobiohydrolase Cel7A by xylooligosaccharides. FEBS Journal doi:10.1111/febs.13265:n/a-n/a. Murshudov GN, Skubák P, Lebedev AA, Pannu NS, Steiner RA, Nicholls RA, Winn MD, Long F, Vagin AA. 2011. REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallographica Section D: Biological Crystallography 67:355-367. Emsley P, Cowtan K. 2004. Coot: model-building tools for molecular graphics. Acta Crystallographica Section D: Biological Crystallography 60:2126-2132. Guex N, Peitsch MC. 1997. SWISS-MODEL and the Swiss-Pdb Viewer: an environment for comparative protein modeling. electrophoresis 18:2714-2723. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. 2009. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25:1189-1191. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 32:1792-1797. Madsen D, Johansson P, Johansson N, Arent S, Harris M, Kleywegt G. 2005. INDONESIA: An integrated workbench for sequence and structure alignment and analysis, manuscript submitted, Uppsala, Sweden. Gonnet GH, Cohen MA, Benner SA. 1992. Exhaustive matching of the entire protein sequence database. Science 256:1443-1445. Kumar S, Stecher G, Tamura K. 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular Biology and Evolution accepted. Rzhetsky A, Nei M. 1992. A simple method for estimating and testing minimum-evolution trees. Mol Biol Evol 9:945967. Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution:783-791. Schwartz R, Dayhoff M. 1978. Matrices for detecting distant relationships. Atlas of protein sequence and structure 5:353-358. Nei M, Kumar S. 2000. Molecular evolution and phylogenetics. Oxford university press. Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution 4:406-425. Baker JO, Ehrman CI, Adney WS, Thomas SR, Himmel ME. 1998. Hydrolysis of cellulose using ternary mixtures of purified celluloses. Appl Biochem Biotechnol 70-72:395-403. Payne CM, Resch MG, Chen L, Crowley MF, Himmel ME, Taylor LE, Sandgren M, Ståhlberg J, Stals I, Tan Z. 2013. Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically bind to cellulose. Proceedings of the National Academy of Sciences 110:14646-14651. Ubhayasekera W, Munoz IG, Vasella A, Stahlberg J, Mowbray SL. 2005. Structures of Phanerochaete chrysosporium Cel7D in complex with product and inhibitors. FEBS J 272:1952-1964. 19

609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632

67. 68. 69. 70. 71.

72.

73. 74. 75. 76.

77.

Vantilbeurgh H, Tomme P, Claeyssens M, Bhikhabhai R, Pettersson G. 1986. Limited Proteolysis of the Cellobiohydrolase I from Trichoderma reesei - Separation of Functional Domains. FEBS Lett 204:223-227. Varnai A, Siika-Aho M, Viikari L. 2013. Carbohydrate-binding modules (CBMs) revisited: reduced amount of water counterbalances the need for CBMs. Biotechnol Biofuels 6:30. Blanton RL, Northcote DH. 1990. A 1,4-beta-D-glucan-synthase system from Dictyostelium discoideum. Planta 180:324332. Wakeman H. 1954. Mechanical Properties of Cellulose and its Derivatives, p 1247-1355. In Ott E, Spurlin, H. (ed), Cellulose and Cellulose Derivatives. Wiley Interscience, New York. Resch MG, Donohoe BS, Baker JO, Decker SR, Bayer EA, Beckham GT, Himmel ME. 2013. Fungal cellulases and complexed cellulosomal enzymes exhibit synergistic mechanisms in cellulose deconstruction. Energy & Environmental Science 6:1858-1867. Ståhlberg J, Henriksson H, Divne C, Isaksson R, Pettersson G, Johansson G, Jones TA. 2001. Structural basis for enantiomer binding and separation of a common β-blocker: crystal structure of cellobiohydrolase Cel7A with bound (S)-propranolol at 1.9 Å resolution. Journal of molecular biology 305:79-93. Berney C, Pawlowski J. 2006. A molecular time-scale for eukaryote evolution recalibrated with the continuous microfossil record. Proceedings of the Royal Society B: Biological Sciences 273:1867-1872. Parfrey LW, Grant J, Tekle YI, Lasek-Nesselquist E, Morrison HG, Sogin ML, Patterson DJ, Katz LA. 2010. Broadly Sampled Multigene Analyses Yield a Well-Resolved Eukaryotic Tree of Life. Systematic Biology 59:518-533. Keeling PJ, Palmer JD. 2008. Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9:605-618. Richards TA, Soanes DM, Jones MD, Vasieva O, Leonard G, Paszkiewicz K, Foster PG, Hall N, Talbot NJ. 2011. Horizontal gene transfer facilitated the evolution of plant parasitic mechanisms in the oomycetes. Proceedings of the National Academy of Sciences 108:15258-15263. Brevnova EE, Flatt J, Gandhi C, Rajgarhia V, McBride J, Warner A. 2013. Isolation and characterization of Schizochytrium aggregatum cellobiohydrolase I (Cbh 1), US Patent 8,470,592 B2.

633

20

634

FIGURES

635

Figure 1: Sequence alignment of GH7 CBHs that have been structurally characterized to date. The blue highlighted regions

636

denote N-glycosylation motifs, the residues in red (EXDXXE motif) denote the catalytic residues, and the residues enclosed in

637

boxes represent the A1-A4 and B1-B4 loops enclosing the active site tunnel.

638 639

Figure 2. Biochemical and kinetic characterization of DdiCel7ACBM (green), DpuCel7ACBM (blue) and TreCel7A (red). (A) Activity

640

comparison on 2 mM pNPL, 40°C at various pH values. (B) Activity comparison on 2 mM pNPL, pH 5 at various temperatures. (C)

641

DSC traces comparing denaturation temperature, Tm, of all three enzymes. (D) Michaelis-Menten plot of comparing activity on

642

various concentrations of pNPL at 45°C, pH 5.

643 644

Figure 3. Cellulose hydrolysis on Avicel (A,D), pretreated corn stover (B,E) and phosphoric acid swollen cellulose (C,F) in a three

645

enzyme cocktail: experimental CBH, endoglucanase (E1), and β-glucosidase. A-C, comparison of DdiCel7ACBM (green) and

646

TreCel7A (red) at 40°C, pH 5.0. D-F, comparison of DpuCel7ACBM (blue) and TreCel7A (red) at 50°C, pH 5.0. All digestions

647

evaluated against a no CBH control (purple).

648 649

Figure 4. Superposition of DpuCel7A (A) and DdiCel7A (B) with the cellononaose Michaelis complex of TreCel7A (PDB code

650

4C4C). The key substrate-binding loops are labeled in (A) and the substrate binding sites are labeled in (B).

651 652

Figure 5. Space-fill GH7 structures comparing substrate tunnel enclosure of CBHs from T. reesei (PDB code 4C4C), D. discoideum,

653

D. purpureum, H. irregulare (PDB code 2YG1), P. chrysosporium (PDB code 1GPI), and endoglucanase (EG) Cel7B from T. reesei

654

(PDB code 1EG1). In all frames, the cellononaose ligand from the TreCel7A Michaelis complex is shown in gray ‘sticks’.

655 656

Figure 6. Structural alignments of DdiCel7A and DpuCel7A to other known GH7 CBH structures. A) Alignment around the active

657

site illustrating interactions of the B2, B3, and A3 loops discussed in the main text. B) Product binding sites illustrating the

658

relative positioning of the commonly found aspartate residue that interacts with the +2 glucosyl unit, but which is replaced by

659

glycine in TreCel7A and ThaCel7A. The ligands are both from TreCel7A structures: Michaelis (PDB code 4C4C) and Primed GEI

660

(PDB code 3CEL). The aspartic acid side chains of PchCel7D, HirCel7A, TemCel7A, AfuCel7A, LquCel7B essentially overlay those of

661

DpuCel7A and DdiCel7A and are not shown for clarity. 21

662 663

Figure 7. Eukaryote evolution timeline of branches where GH7 genes have been found, and protein sequence identities (ID%)

664

and similarities (Sim%) within the GH7 domain to D. discoideum Cel7A. Branch names and timepoints of divergence are from

665

(73). Only points of early divergence are included (