Invited Commentary

Glyco-Forum section

A calculation of all possible oligosaccharide isomers both branched and linear yields $1.05 \times 10^{12}$ structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems

This calculation underscores the reason for the long-standing technology barrier for the development of a microchemistry in carbohydrate analysis comparable in sensitivity with Edman protein and Sanger DNA sequencing methods. It also reveals the barrier to facile synthetic methods for oligosaccharides comparable to those developed for peptide synthesis.

Key words: calculation/isomer barrier/oligosaccharide isomers

Roger A. Laine

Departments of Biochemistry and Chemistry, Louisiana State University and The Louisiana Agricultural Center, Baton Rouge, LA 70803, USA

Introduction

© Oxford University Press


Downloaded from http://glycob.oxfordjournals.org/ by guest on November 6, 2015

The number of all possible linear and branched isomers of a hexasaccharide was calculated and found to be $>1.05 \times 10^{12}$. This large number defines the Isomer Barrier, a persistent technological barrier to the development of a single analytical method for the absolute characterization of carbohydrates, regardless of sample quantity. Because of this isomer barrier, no single method can be employed to determine complete oligosaccharide structure in 100 nmol amounts with the same assurance that can be achieved for 100 pmol amounts with single-procedure Edman peptide or Sanger DNA sequencing methods. Difficulties in the development of facile synthetic schemes for oligosaccharides are also explained by this large number. No current method of chemical or physical analysis has the resolution necessary to distinguish among $10^{12}$ structures having the same mass. Therefore the 'characterization' of a middle-weight oligosaccharide solely by NMR or mass spectrometry necessarily contains a very large margin of error. Greater uncertainty accompanies results obtained solely by sequential enzyme degradation followed by gel-permeation chromatography or electrophoresis, as touted by some commercial advertisements. Much of the literature which uses these single methods to 'characterize' complex carbohydrates is, therefore, in question, and journals should beware of publishing structural characterizations unless the authors reveal all alternate possible structures which could result from their analysis. Today, only a combination of quantitative sugar analysis, methylation linkage analysis, partial degradation by enzymes or chemistry, and mass spectrometry can reduce the number of possibilities to one. The present study yields a number of individual formulae and a master set of equations necessary for the determination of all possible reducing-end isomers for di- to octasaccharides, above which branching isomers generate astronomical numbers, larger than Avogadro's number.
Because hexasaccharides are generally among the largest biologically active, protein-recognized oligosaccharide sequences, and also among the largest repeating units in polysaccharides, the present calculation was limited to dp6. Despite this simplification, the number of possible structures calculated for reducing hexasaccharides comprised of D-hexoses alone is $>10^{12}$. Available microchemistry for biologically active oligosaccharides requires between 10 and 100 nmol for a minimum necessary combination of wet chemistry/enzymology/mass spectrometry employing partial degradation. The relatively high limiting quantity for analysis of carbohydrates (compared with proteins and DNA) has remained static for 20 years, despite intense research activity.

Carbohydrates, by their unique branching structure, contain an evolutionary potential of information content several orders of magnitude higher in a short sequence than in any other biological oligomer. This study addresses informational potential inherent in biological recognition systems comprised of complex carbohydrate ligands recognized for targeted activities by specifically binding cognate protein receptors, such as lectins. Evolution of receptor/ligand cognate pairs in carbohydrates is complex and probably very slow. Single point mutations in glycosyl transferase proteins are not likely to alter sugar structures, except in cases where a minor amino acid change could alter recognition among closely related sugars comprising otherwise the same structure (Yamamoto and Hakomori, 1990). The polypeptide-based carbohydrate recognition information is carried in one or more genes. Evolution of biological recognition of just one additional sugar on an existing structure may require a combination of the following: (i) extensive mutation of the peptide sequence of an existing glycosyl transferase, or evolution of a novel transferase, and (ii) evolution of a new lectin to contain the new binding/recognition site. The complex carbohydrate cognate is coded into a specifically ordered set of glycosyl transferase genes where each precursor is part of the recognition system in the binding site of glycosyl transferases for acceptance of the next sugar. Understanding the evolution, genetic control and organization of these newly discovered carbohydrate-protein recognition systems will be a significant research challenge. In all biological heteropolymers, the linear sequence of monomers comprises, in some manner, a biological code. The ability of proteins to conform in a concave or convex manner to recognize all other biological molecules includes the recognition of complex carbohydrates. 
Proteins such as lectins, enzymes and antibodies can exhibit exquisite binding specificities for the shape, charge, epimers, anomers, linkage positions, ring size, branching and monosaccharide sequence of carbohydrate ligand molecules where the maximum recognized size is usually hexamer or smaller (Cisar et al, 1975; Takeo and Kabat, 1978). Carbohydrate sequences possess unique solution structures which, although dynamic, are shown by nuclear Overhauser effect NMR and molecular modelling to be populated mainly by minimum-energy three-dimensional conformations (Smith-Gill et al, 1984; Cummings and Carver, 1987; Poppe et al, 1990; Miller et al, 1992; French et al, 1993). Oligosaccharide haptens, being rather more rigid than short peptides because of steric crowding (ibid.), must be envisioned in three-dimensional space for specific recognition by proteins. Carbohydrate polymers themselves often contain a complex multifaceted sequence, and specific proteins can bind to relatively short subsets or haptens within longer saccharide


Taken together, these interesting findings give bold introduction to a new excitement in carbohydrate biochemistry. A growing specialty area of biochemistry concerns itself with the biology of protein recognition of specific carbohydrates. This field has been coined 'Glycobiology' by Raymond Dwek (Rademacher et al., 1988; Opdenakker et al., 1993), a name which has also been adopted by this journal and an international scientific society of some 700+ members (formerly the Society for Complex Carbohydrates). What, therefore, are the structural components that make carbohydrates so complex and what is the magnitude of the potential information content it is apparent that higher organisms have exploited? Usually, saccharide-binding proteins recognize a six-sugar oligomer or smaller. This paper proposes that within a hexasaccharide sequence comprised of a set of six different sugars which may be repeated, $>1.05 \times 10^{12}$ possible carbohydrate structures exist. In contrast, a set of six amino acids which can be repeated can only generate 46 656 different structures, >7 orders of magnitude lower. Carbohydrates have seven major structural features, comprising (i) epimers, including D and L forms; (ii) linear sequence of core and linear branches; (iii) ring size; (iv) anomeric configuration; (v) linkage position; (vi) branching positions and (vii) reducing terminal attachment, all of which contribute to large numbers of equal-mass isomers in a short sequence potentially recognizable by proteins. Calculation of the isomers of an oligosaccharide was mentioned in Nathan Sharon's collected lectures (Sharon, 1975) as

originating with John Clamp, who estimated 1056 isomers for a trisaccharide comprised of three different hexoses. This calculation was based on six sequence permutations of three different monomers ($3!$), eight permutations of alpha and beta anomeric configurations at each of three sugars ($2^3$), and 16 possibilities of attachment of the reducing terminal and internal sugar (to the 2, 3, 4 or 6 hydroxyl of their respective aglycones) ($4^2$). This product ($6 \times 8 \times 16$) is 768, and it is not clear how the number 1056 was calculated. However, owing to not considering repeating sugars, ring size or branching, as shown below, both Clamp and Sharon underestimated by nearly two orders of magnitude. Richard Schmidt (1986) published a table showing a calculation of 720 isomers for a trisaccharide, 34 560 for a tetrasaccharide and 2 144 640 for a pentasaccharide. In 1988, Laine et al. published a formula including a ring size term, and estimated the resulting number for a linear, reducing pentasaccharide with non-repeating units as follows: $n! \times 2_a^n \times 2_r^n \times 4^{n-1}$, where n is the number of saccharides, $2_a^n$ is the anomeric term, $2_r^n$ is the ring size term and the linkage position is represented by $4^{n-1}$. Employed in a specific calculation for a linear pentasaccharide comprised of five different non-repeating hexoses, this resulted in 31 457 280 isomers (ibid.). However, the number of possible isomers is actually much larger due to branching and repeated monomers. Hellerqvist (1990) estimated 2.72 billion possible structures for a hexasaccharide containing aminosugars, fucose and hexoses. Sugar monomers are often repeated in natural carbohydrates, just as in peptides. Repeating saccharides, for example, were considered in a separate calculation by Schmidt (1986). Therefore, in the Clamp/Sharon formula $3! \times 2^3 \times 4^2$ for the number of possible trisaccharides from a set of three hexoses, the first term should have been $3^3 = 27$ instead of $3! = 6$.
The total should have been multiplied by another term for ring size, since most sugars can occur in either pyranose or furanose forms. This would have increased the result by a factor of $2^3$ possibilities, or x8. The furanose form presents the possibility that in a trisaccharide of sequence ABC, sugar A could have been connected through the 5 position of sugar B, for example. However, this factor is taken into account by the ring size term, keeping the number of possibilities of linkage positions at $4^2 = 16$. Thus, the correct number for linear trisaccharides made up from a set of three hexoses is 27 x 8 x 8 x 16 = 27 648. Another possibility for the configuration of a trisaccharide is a branched structure where sugars A and B are both glycosidically attached to sugar C by a 2,3; 2,4; 2,6; 3,4; 3,6 or 4,6 branching pattern (six possibilities). Where sugar C is in the furanose form, however, additional possibilities include 2,3; 2,5; 2,6; 3,5; 3,6; 5,6 for a total of 12 different branched structures. The ring size term $2_r^n$, however, when applied to the branching sugar, takes into account the additional six furanose structures.

[Structure diagram: branched trisaccharide in which sugars A and B are both attached to sugar C, C(1→R)*]

*R = reducing-end attachment site


sequences, such as in heparin (Riensenfeld et al., 1977; Atha et al., 1987; van Boeckel and Petitou, 1993). A lectin or other carbohydrate-binding protein can act in control mechanisms, as signals for polypeptide location within the cell, such as lysosomal protein markers (Reitman and Kornfeld, 1981) and, in the metazoan, for specific cell surface recognition of one cell by another. A large collective of low-avidity interactions may take place where multimeric intercellular binding occurs (Lee, 1990). Specific spacing of carbohydrate moieties within a structure may confer several orders of magnitude tighter binding (ibid.). Possible higher complexity might occur where low-avidity binding of patterns of sets of carbohydrates by sets of binding proteins may form recognition systems which may play a powerful role in intercellular sociology during development (Feizi, 1985, 1988), in the immune system (Aruffo et al., 1990; Brandley et al., 1990; Polley et al., 1991; Yuen et al., 1992) and in parasitology (Friedman et al., 1985) and other microbial pathogenesis (Srnka et al., 1992). Numerous reviews and recent papers have been written regarding new discoveries in carbohydrate-based recognition systems such as the selectins (Aruffo et al., 1990; Brandley et al., 1990; Polley et al., 1991; Yuen et al., 1992), glycosaminoglycan clotting factors (Lindahl and Hook, 1978; Casu, 1989; Lane et al., 1992; Tollefsen, 1992), tumour markers (Hakomori, 1984; Hoff et al., 1990), parasite recognition systems (Friedman et al., 1985), Rhizobium nodulation systems (Truchet et al., 1991; Fisher and Long, 1992), plant pathogen recognition (Maniara et al., 1984) and others (Karlsson, 1986) which need no further discussion here. Banausic motives and research by new start-up companies have recently driven science to many new discoveries in the immune cell recognition systems. Current molecular understanding of this system alone augurs a giant breakthrough in immunochemistry.


Since each branch can occur in two different ways, such as A6,B3 or B6,A3, there are again 12 different ways to branch these three sugars. The permutation term, $E^n$, however, takes care of this A6,B3 and B6,A3 branching duplex. Possible branched trisaccharides from a set of three hexoses, each one unique and different from the linear structures, are 27 x 8 x 8 x 6 = 10 368. The total structures from a trisaccharide comprised of three hexoses, choosing among a set of only three different hexoses, is 27 648 (linear forms) plus 10 368 (branched forms) = 38 016, ~40-fold higher than Clamp's, Sharon's or Schmidt's estimate of 720-1050. The formula for isomers of a trisaccharide having a reducing end is thus:

possible isomers for a reducing hexasaccharide comprised from a set of six hexoses in the D-configuration. Since both D- and L-configurations of hexoses appear in nature, especially in plants, fungi and microbes, the possible isomers are even higher (by a factor of $2^6$). However, D/L interconversion is not simple chemistry or biochemistry for most hexoses, the molecules having 4-5 chiral centres as mirror images. Also, the occurrence of both D- and L-forms of the same sugar in the same organism is rare. Although in this calculation we will only consider the possible D-isomers, we must consider that the pure L-forms generate an equal number and the mixed D,L-forms would add a multiple of 64 to the total number.

$E^n \times 2_r^n \times 2_a^n \times 4^{n-1}$ (linear forms)

$E^n \times 2_r^n \times 2_a^n \times 6^{n-2}$ (branched forms)
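As a quick arithmetic check of the trisaccharide figures, the two expressions can be evaluated for n = 3 and E = 3. This is only a minimal sketch: the function and variable names are illustrative and do not come from the original text, and the branched expression is exercised here only at n = 3, the case the text works through.

```python
def linear_isomers(n: int, E: int) -> int:
    """E^n sequences x 2^n ring sizes x 2^n anomers x 4^(n-1) linkage positions."""
    return E**n * 2**n * 2**n * 4**(n - 1)


def branched_isomers(n: int, E: int) -> int:
    """Same leading terms, with 6^(n-2) branching patterns in place of 4^(n-1)."""
    return E**n * 2**n * 2**n * 6**(n - 2)


n, E = 3, 3                      # trisaccharide drawn from a set of three hexoses
linear = linear_isomers(n, E)    # 27 x 8 x 8 x 16
branched = branched_isomers(n, E)  # 27 x 8 x 8 x 6
total = linear + branched
clamp_product = 6 * 8 * 16       # 3! x 2^3 x 4^2, the earlier estimate's terms

print(linear, branched, total, clamp_product)
```

The printed total can be compared with the 38 016 quoted in the text, and the last value with the product of the three Clamp/Sharon terms.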

Linear structures

Non-repeating

Non-reducing oligosaccharides

Trisaccharides can also be configured with the trehalose-type aldose-1→1-aldose or the sucrose/raffinose non-reducing aldose-1→2-ketose internal linkage structure, and larger oligosaccharides can be linked in a head-to-tail cyclodextrin fashion. These kinds of permutations would add a large number to this calculation. At first blush, for the set of 'cyclodextric' hexasaccharides, the linear permutations number calculated below would be multiplied by four due to the linkage term added by the extra head-to-tail linkage, making the cyclodextrics alone close to 0.8 trillion. However, since there would be no terminals, many of the cyclic 'isomers' might be identical, depending on the starting position. To simplify, the scope will be limited to the much more common reducing-end saccharides. There have been no reported estimations of all isomers resulting from oligosaccharide branching.

Reducing oligosaccharides

To address the issue of carbohydrate isomers in a biologically relevant size more thoroughly, we have estimated all of the

The total number of possible structures, S*, of a D-hexose-containing hexasaccharide begins with the value for a linear chain of six different non-repeating sugars ABCDEF, whose general formula is as follows:

A: $S^* = n! \times 2_a^n \times 2_r^n \times 4^{n-1}$

where n is the number of different hexoses in a string.

I. $n!$ is the linear permutation term, no sugar monomers repeated ($6! = 720$).
II. $2_a^n$ is the term for anomeric isomers ($2^6 = 64$).
III. $2_r^n$ is the term for ring size (pyranose or furanose) ($2^6 = 64$).
IV. $4^{n-1}$ is the linkage position term ($4^5 = 1024$). While the hydroxyls on all five of carbons 2-6 can participate in the linkage position when considering pyranose and furanose forms, pyranose excludes the 5 linkage and furanose excludes the 4 linkage, therefore this part of term IV is taken into account by term III, above.

This number for linear non-repeating structures of a hexasaccharide considering only D stereochemistry would be:

A: $S^* = 6! \times 2^6 \times 2^6 \times 4^5$ = 3 019 898 880 (three billion!)

A list of linear, non-repeating, D-form, reducing-end oligosaccharides up to six sugars in length is presented in Table I, using equation (A).

Table I. Linear isomers of D-hexoses, each hexose used once

Oligosaccharide size    Hexose set    Linear isomers
Monosaccharide          1             2
Disaccharide            2             128
Trisaccharide           3             6144
Tetrasaccharide         4             393 216
Pentasaccharide         5             31 457 280
Hexasaccharide          6             3 019 898 880
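The di- to hexasaccharide rows of Table I can be regenerated directly from equation (A). The sketch below is illustrative only (the function name and the dictionary are not from the original). The monosaccharide row is left out because equation (A), evaluated literally at n = 1, gives 4 rather than the tabulated 2.

```python
from math import factorial


def linear_nonrepeating(n: int) -> int:
    """Equation (A): n! x 2^n (anomers) x 2^n (ring sizes) x 4^(n-1) (linkages)."""
    return factorial(n) * 2**n * 2**n * 4**(n - 1)


names = ["Disaccharide", "Trisaccharide", "Tetrasaccharide",
         "Pentasaccharide", "Hexasaccharide"]

# Each hexose is used once, so the hexose set size equals the chain length n.
table_I = {n: linear_nonrepeating(n) for n in range(2, 7)}

for name, (n, count) in zip(names, table_I.items()):
    print(f"{name:<16} {n}  {count:>13,}")
```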


Caveat on the use of NMR as a single spectroscopic method. Each trisaccharide would contain 15 ring protons including the anomeric, thus the proton NMR spectrum would need to resolve 38 016 x 15 = 570 240 'different' proton environments within 0.5 p.p.m. It is doubtful that a tenth of this number of lines could be resolved using multidimensional proton NMR. In fact, the carbon-13 spectrum, 30 times more dispersed, would need to resolve 38 016 x 18 carbons = 684 288 lines if they happened all to be different, an impossibility. NMR by itself cannot, therefore, be used to absolutely identify trisaccharides or higher oligomers by virtue of chemical shift values. As for mass spectrometry, all isomers have the same mass. Partial fragmentation in collisionally activated mass spectrometry might provide the combination of partial degradation and spectral pattern to resolve such parameters as position of linkage (Yoon and Laine, 1992), but will not be sufficient without other sensitive chemical and enzymatic manipulations.


Each of the above represented examples of singly branched species can be considered as a separate saccharide that has a fixed branch point with regard to the location of the branching sugar moiety within the chain, the branch being moveable among the hydroxyls on the branch point sugar. All of the monosaccharides in the hexamer are then considered to contribute to isomers just as the linear form, but with the branch positions moveable among carbons on each monomer capable of forming branches within the chain. The general formula for sets of oligosaccharide isomers branched with a single monosaccharide along the core chain would be:

Table II. Linear isomers from a set of 1-6 D-hexoses

Oligosaccharide size    Hexose set    Linear isomers
Monosaccharide          1             2
Disaccharide            2             256
Trisaccharide           3             27 648
Tetrasaccharide         4             4 194 304
Pentasaccharide         5             819 200 000
Hexasaccharide          6             195 689 447 424
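The di- to hexasaccharide rows of Table II follow from equation (A'), in which the permutation term n! of equation (A) is replaced by $E^n$ so that sugars may repeat. A minimal sketch, assuming as in the table that the hexose set size E equals the chain length n; the names used are hypothetical.

```python
def linear_repeating(n: int, E: int) -> int:
    """Equation (A'): E^n x 2^n (anomers) x 2^n (ring sizes) x 4^(n-1) (linkages)."""
    return E**n * 2**n * 2**n * 4**(n - 1)


# Assumption taken from the table layout: an n-mer is built from a set of n hexoses.
table_II = {n: linear_repeating(n, E=n) for n in range(2, 7)}

for n, count in table_II.items():
    print(f"n = {n}: {count:,}")
```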

Repeating hexoses However, as in peptides, individual sugars often occur more than once in a natural oligomer, therefore if each or any of the members of the six-sugar set could be repeated, equation (A) becomes (A') as follows:

A': $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-1}$

A': $S^* = 6^6 \times 2^6 \times 2^6 \times 4^5$

= 46 656 x 64 x 64 x 1024 = 195 689 447 424

Nearly 200 billion, an astonishing number! Table II shows the results of this calculation for mono- to hexasaccharides. Note that all of the mono- to pentasaccharides added together comprise

[Structures II-IV: core chain B→C→D→E→FR carrying the single monosaccharide branch A on D (II), on E (III) or on FR (IV).]

We will omit the arrows in the structures which are understood as pointing toward the reducing-end 'FR'.

This first branching example gives nearly 300 billion additional possible structures!

Disaccharide branches

Hexasaccharides with a single disaccharide branch:

[Structure V: core C-D-E-FR with the disaccharide branch AB on D.]

The monosaccharide in position 'F' is assigned to be the reducing end throughout, designated as 'FR'.

B: $S^*_{B1} = 6^6 \times 2^6 \times 2^6 \times 4^3 \times [6 \times (n-2)]$ = 46 656 x 64 x 64 x 64 x 24 = 293 534 171 136

Branched structures

[Structure I: core chain B→C→D→E→FR carrying the single monosaccharide branch A on C.]

In equation (B), n-2 is the number of core monosaccharides that can originate monosaccharide branches, $4^{n-3}$ counts the permutations of positions of linkage on unbranched monomers within the chain and 6 x (n-2) counts the possible arrangements of branches on each of the hexopyranoses in the chain capable of producing a branch (n-2). These would be, for example, in I, above, the A,B branches on C inserted as either A,B or B,A, respectively, on the 2,3; 2,4; 2,6; 3,4; 3,6; or 4,6 positions of pyranoses and 2,3; 2,5; 2,6; 3,5; 3,6; and 5,6 positions of furanoses. However, we assume that permutations of the ABC monomers are included in the $E^n$ term, therefore 12 possibilities remain for each possible branch position. However, the pyranose/furanose term, $2_r^n$, includes the alternate set of six structures. Since the six possible positions for branching in each ring form account for 12 possibilities by the multiple of two in the ring form term, the factor for single branches should be 6 x (n-2). In this case, the number of isomers for configurations I-IV, above, taken together, is given by equation (B).
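The factor of six branching patterns per ring form can be checked by enumerating pairs of substitutable hydroxyls, and equation (B) can then be evaluated for a hexasaccharide. This is a sketch under the assumptions stated in the text (hydroxyls 2, 3, 4, 6 on a pyranose and 2, 3, 5, 6 on a furanose); the constant and function names are illustrative.

```python
from itertools import combinations

PYRANOSE_OH = (2, 3, 4, 6)   # substitutable hydroxyls on a hexopyranose
FURANOSE_OH = (2, 3, 5, 6)   # substitutable hydroxyls on a hexofuranose

# Two substituents on one residue: choose 2 of the 4 available positions.
pyranose_pairs = list(combinations(PYRANOSE_OH, 2))
furanose_pairs = list(combinations(FURANOSE_OH, 2))


def single_branch_isomers(n: int, E: int) -> int:
    """Equation (B): E^n x 2^n x 2^n x 4^(n-3) x [6 x (n-2)]."""
    return E**n * 2**n * 2**n * 4**(n - 3) * (6 * (n - 2))


print(pyranose_pairs)               # six pyranose branching patterns
print(furanose_pairs)               # six furanose branching patterns
print(single_branch_isomers(6, 6))  # hexasaccharide from six hexoses
```

Only six of the twelve patterns enter the bracketed factor because, as the text argues, the choice between the pyranose and furanose sets is already counted by the ring size term.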

[Structures VI and VII: core C-D-E-FR with the disaccharide branch AB on E (VI) or on FR (VII), etc.]

V is the same as II, where ABDEFR can be considered the 'core' structure with a single monosaccharide branch on D; however, VI and VII are novel arrangements. The formula for this set would be:

C: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times [6 \times (n-4)]$

where disaccharide branches that generate new compounds beyond single branches already considered can only happen on n-4 of the monomers. Tetrasaccharides and below would not produce novel compounds.


In equation (A'), n is the length of the chain in monomers, and E is the number of different kinds of monomers (epimers) in the set. $E^n$ is the linear permutation term where individual sugar types can be repeated within the chain. The remaining terms are the same as in equation (A). For a linear hexasaccharide this gives the (A') value of 195 689 447 424.

B: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times [6 \times (n-2)]$


For hexasaccharides the numerical total for equation (C) is:

46 556 x 64 x 64 x 64 x 12 = 146 452 512 768

Numerically, for hexasaccharides with two monosaccharide branches (equation F):

46 656 x 64 x 64 x 16 x 36 x 3 = 330 225 942 528

Trisaccharide branches: to the core chain

[Structure VIII: core D-E-FR with the trisaccharide branch ABC on E.]

The general formula representing the number of permutations with B monosaccharide branches would be:

F': $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-(B+2)} \times [6^B \times (\text{term for permutation of branches along the core monosaccharides})]$

[Structure IX: core D-E-FR with the trisaccharide branch ABC on FR.]

VIII is the same as III, sugar 'D' being the single branch on the core ABCEF, and IX is the same as VII, with a disaccharide branch on the reducing-end sugar 'FR'. Therefore, new compounds only occur in heptasaccharides and above. The formula is:

A heptasaccharide is the smallest compound capable of triple single branches, as in the core D-E-F-G-(reducing end) carrying the single branches A, B and C on three different core residues.

D: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times \left[6 \times \frac{(n-6) + |n-6|}{2}\right]$

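Triply branched monosaccharides

For hexasaccharides, two single branches on the same core monosaccharide (trisubstituted or triple-branched) represent another novel set:

[Structure XIV: core C-D-E-FR with the single branches A and B both attached to the same core monosaccharide.]

Tetrasaccharide branches

[Structure X: core E-FR with the tetrasaccharide branch ABCD on FR.]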

X is the same as IV with a monosaccharide branch on 'FR', and only produces new compounds with octasaccharides and above. This generates the formula:

E: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times \left[6 \times \frac{(n-7) + |n-7|}{2}\right]$

For a heptasaccharide and smaller, this is numerically zero.

Di-branched compounds

Two single branches on two different core monosaccharides give three new types of arrangement:

[Structures XI-XIII: core C-D-E-FR with the single branches A and B on two different core monosaccharides, etc.]

[Structures XV and XVI: core C-D-E-FR with the single branches A and B both attached to one core monosaccharide, as in XIV.]

Branch possibilities for triple-substituted monosaccharides including both pyranose and furanose forms: (2,3,4); (2,3,6); (2,4,6); (3,4,6) for pyranose branching structures; and (2,3,5); (2,3,6); (2,5,6); (3,5,6) for furanoses (eight configurations, of which we only need to use a factor of four due to the ring size factor $2_r^n$). Each one of these four can have six different permutative arrangements, such as ABC, ACB, BAC, BCA, CAB, CBA, covered by the term $E^n$. Each of three locations in the trisaccharide can be trisubstituted in this way, therefore the term 4 x (n-3) as follows:

G: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-4} \times [4 \times (n-3)]$

This formula does not function for trisaccharides or lower. For a hexasaccharide:

46 656 x 64 x 64 x 16 x 12 = 36 691 771 392
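The four triple-substitution patterns per ring form, and the hexasaccharide value of equation (G), can be checked in the same way. A minimal sketch with illustrative names, assuming the same hydroxyl sets as the text (2, 3, 4, 6 for pyranoses and 2, 3, 5, 6 for furanoses).

```python
from itertools import combinations

PYRANOSE_OH = (2, 3, 4, 6)
FURANOSE_OH = (2, 3, 5, 6)

# Three substituents on one residue: choose 3 of the 4 available positions.
pyranose_triples = list(combinations(PYRANOSE_OH, 3))
furanose_triples = list(combinations(FURANOSE_OH, 3))


def trisubstituted_isomers(n: int, E: int) -> int:
    """Equation (G): E^n x 2^n x 2^n x 4^(n-4) x [4 x (n-3)], for n >= 4."""
    return E**n * 2**n * 2**n * 4**(n - 4) * (4 * (n - 3))


print(pyranose_triples)
print(furanose_triples)
print(trisubstituted_isomers(6, 6))
```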

The factor of six different branch combinations now needs to be applied to two of the monosaccharides in the core while the anomerics and other permutations remain the same.

F: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-4} \times [6^2 \times ((n-4) + (n-5) + \dots + (n-(n-1)))]$

where $6^2$ is the term considering all combinations of two branches among the hydroxyls of two separate hexoses. The term for locations of two branches along the core is $(n-4) + (n-5) + \dots + (n-(n-1))$. This formula is not valid for tetrasaccharides or below, where n-(n-1) - 3 and the series begins with n-4. Pentasaccharides give n-4 = 1 from the n-(n-1) term. Hexasaccharides give (n-4 = 2) + (n-5 = 1) for a value of three. Likewise, heptasaccharides would give six.
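The location series in equation (F) is a sum of consecutive integers from n-4 down to 1. The sketch below evaluates it for the chain lengths quoted in the text and compares it with the number of ways to pick two residues out of n-3; that closed form is an observation added here, not a statement from the original, and all names are illustrative.

```python
from math import comb


def branch_location_term(n: int) -> int:
    """(n-4) + (n-5) + ... + 1, the series written out in equation (F)."""
    return sum(range(1, n - 3))


def two_branch_isomers(n: int, E: int) -> int:
    """Equation (F): E^n x 2^n x 2^n x 4^(n-4) x [6^2 x location term], for n >= 4."""
    return E**n * 2**n * 2**n * 4**(n - 4) * 6**2 * branch_location_term(n)


for n in (5, 6, 7):
    # comb(n - 3, 2) counts unordered pairs among n-3 residues (added observation).
    print(n, branch_location_term(n), comb(n - 3, 2))

print(two_branch_isomers(6, 6))
```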

Double disaccharide branches can occur on n-5 individual monosaccharide core members, hexasaccharides being the smallest saccharide for which this set produces new compounds. F is trisubstituted in this example. This could also be construed as a single and a disaccharide branch on a trisaccharide core.

[Structure XVII: core E-FR with the two disaccharide branches AB and CD both attached to FR.]

H: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-4} \times [4 \times (n-5)]$


For equation (D), a hexasaccharide or smaller generates no new compounds beyond those already considered, therefore the result is zero.


Equation (H) is not valid for pentasaccharides or below. For a hexasaccharide:

46 656 x 64 x 64 x 16 x 4 = 12 230 590 464

One monosaccharide and one disaccharide branch on different core monosaccharides:

[Structures XVIII and XIX: core D-E-FR carrying the disaccharide branch AB and the monosaccharide branch C on different core monosaccharides, etc.]

Triple-branched FR can have four allowed variations, as in equation (G) above, and monosaccharide C in this illustration gives six variations, as in equation (B) above. No singly substituted hexoses occur in saccharides of this general structure smaller than 7-mers. This is not valid for pentasaccharides or lower.

K: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-6} \times 6 \times [4 \times (n-5)]$

For a hexasaccharide:

46 656 x 64 x 64 x 1 x 6 x 4 = 4 586 471 424

Tetra-branched versions are also possible:

[Structure XXIII: reducing-end residue FR tetra-substituted by three monosaccharide branches and one disaccharide branch.]

XVIII is the same as XIII where the core is ABEF, while XIX has the core DEF or ABF, a novel arrangement. Therefore, we need to consider a new class of branched compounds where we have a single, itself branched trisaccharide branch, as follows:

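[Structures XX and XXI: core D-E-FR carrying a single branched trisaccharide branch, in which A and B are both attached to C, on E (XX) or on FR (XXI).]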

XX has the core ACEF, or BCEF, the same as XIII and XVIII, while XXI, being branched on the reducing end by one disaccharide and a branched trisaccharide, is the same new compound as XIX. As the saccharide core length grows, new compounds can be formed in this series. This also introduces another form of dibranched structures in the same molecule. Thus, for single trisaccharide branched oligosaccharides: two branches allow $6^2$ different substitution patterns each as in equation (F) above, and the number of monosaccharides along the core that are substitutable by the branched trisaccharide is (n-5). This is not valid for pentasaccharides or smaller.

J: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-5} \times [6^2 \times (n-5)]$

For a hexasaccharide:

46 656 x 64 x 64 x 4 x 36 = 27 518 828 544

The first linkage component ($4^{n-5}$) represents singly substituted core saccharides. Larger saccharides will present much more complex branching permutations. Another new compound can be envisioned which is a variation on XXI which also has D and E connected to the reducing-end F, as:

[Structure diagram: reducing-end F carrying D, E and the branched trisaccharide in which A and B are both attached to C.]

L: $S^* = E^n \times 2_a^n \times 2_r^n \times (4^{n-5})_s \times [n-4] \times (4^{n-5})_t$

Valid only for pentasaccharides and hexasaccharides.

The penultimate term [n-4] shows the number of core saccharides capable of tetrasubstitution, while the last term, $(4^{n-5})_t$, shows substitution of the disaccharide AB on the hydroxyls of F in XXIII. $(4^{n-5})_s$ is also the factor for the linkages to the single monosubstituted hexose. In heptasaccharides and above, one could also envision a trisaccharide branch that could be inserted in the compound analogous to XXIII, while a disaccharide branch would find itself in the analogue to XXIV. Therefore, for higher oligosaccharides, extra terms need to be added to equation (L). For a hexasaccharide:

46 656 x 64 x 64 x 4 x 2 x 4 = 6 115 295 232

This covers all possibilities for a D-hexasaccharide or smaller, where F is the reducing end or is attached to an aglycon. Hepta-, octa- and nonasaccharides offer possibilities of a number of higher orders of branching. Decasaccharides offer the first possibility of quadruply branched saccharides:

Compound XXII

This compound also has a triple branch on F and opens the door to another form of triple-branched versions of oligosaccharides where a monosaccharide and a branched trisaccharide are both substituted onto a core monosaccharide.

Tetra-branched hexoses are completely substituted as 2,3,4,6 for pyranoses or 2,3,5,6 for furanoses. Since no other branching is possible, the original term $E^n$ for substitution permutations covers all of the possibilities, except that the disaccharide on structure XXIII could occupy four different sites on FR, creating another factor of four in that structure.

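[Structure diagram: a quadruply branched decasaccharide built from residues A-J]

(or three tri-branched residues):

[Structure diagram: core G-H-I-J-(reducing end) with the single branches A, B, C and D, E, F attached to three core residues]

[Structure diagram: core D-E-FR carrying a branched trisaccharide branch in which A and B are both attached to C.]

[Structure XXIV: core E-FR in which E is tetra-substituted by the monosaccharide branches A, B, C and D.]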

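A': $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-1}$ +
B: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times [6 \times (n-2)]$ +
C: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times [6 \times (n-4)]$ +
D: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times \left[6 \times \frac{(n-6) + |n-6|}{2}\right]$ +
E: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-3} \times \left[6 \times \frac{(n-7) + |n-7|}{2}\right]$ +
F: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-4} \times [6^2 \times ((n-4) + (n-5) + \dots + (n-(n-1)))]$ +
G: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-4} \times [4 \times (n-3)]$ +
H: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-4} \times [4 \times (n-5)]$ +
J: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-5} \times [6^2 \times (n-5)]$ +
K: $S^* = E^n \times 2_a^n \times 2_r^n \times 4^{n-6} \times 6 \times [4 \times (n-5)]$ +
L: $S^* = E^n \times 2_a^n \times 2_r^n \times (4^{n-5})_s \times [n-4] \times (4^{n-5})_t$

Totals taken from A' to L for hexasaccharides made up of D-hexoses:

Equation    Isomers
A'          195 689 447 424
B           293 534 171 136
C           146 452 512 768
D           0
E           0
F           330 225 942 528
G           36 691 771 392
H           12 230 590 464
J           27 518 828 544
K           4 586 471 424
L           6 115 295 232
Total       1 053 045 031 000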
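The master equation can be evaluated term by term for n = 6 and E = 6 and compared with the listed totals. The sketch below is only an illustration with hypothetical names; it reads the Abs. expressions in (D) and (E) as (x + |x|)/2, which is zero for negative x. Evaluated this way, every term reproduces the listed value except C: the listed 146 452 512 768 equals 46 556 x 64 x 64 x 64 x 12 as printed, whereas $E^n$ = 46 656 gives 146 767 085 568. With either value the total stays above $1.05 \times 10^{12}$.

```python
def ramp(x: int) -> int:
    """(x + |x|) / 2: equals x for x >= 0 and 0 for negative x."""
    return (x + abs(x)) // 2


n, E = 6, 6
base = E**n * 2**n * 2**n   # sequence x anomer x ring-size terms shared by all equations

terms = {
    "A'": base * 4**(n - 1),
    "B": base * 4**(n - 3) * 6 * (n - 2),
    "C": base * 4**(n - 3) * 6 * (n - 4),
    "D": base * 4**(n - 3) * 6 * ramp(n - 6),
    "E": base * 4**(n - 3) * 6 * ramp(n - 7),
    "F": base * 4**(n - 4) * 6**2 * sum(range(1, n - 3)),
    "G": base * 4**(n - 4) * 4 * (n - 3),
    "H": base * 4**(n - 4) * 4 * (n - 5),
    "J": base * 4**(n - 5) * 6**2 * (n - 5),
    "K": base * 4**(n - 6) * 6 * 4 * (n - 5),
    "L": base * 4**(n - 5) * (n - 4) * 4**(n - 5),
}

# Values as listed in the text for hexasaccharides.
published = {
    "A'": 195_689_447_424, "B": 293_534_171_136, "C": 146_452_512_768,
    "D": 0, "E": 0, "F": 330_225_942_528, "G": 36_691_771_392,
    "H": 12_230_590_464, "J": 27_518_828_544, "K": 4_586_471_424,
    "L": 6_115_295_232,
}

for label, value in terms.items():
    note = "" if value == published[label] else "  <- differs from listed value"
    print(f"{label:<3}{value:>20,}{note}")

print("sum of computed terms:", f"{sum(terms.values()):,}")
print("sum of listed terms:  ", f"{sum(published.values()):,}")
```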

Without considering L-sugars, or non-reducing forms, the total number of compounds from a hexasaccharide comprised of six different hexoses will be the total of the above, $>10^{12}$

Table III. Total reducing oligosaccharide isomers from a set of 1-6 D-hexoses

Oligosaccharide size    Hexose set    Total isomers
Monosaccharide          1             2
Disaccharide            2             256
Trisaccharide           3             38 016
Tetrasaccharide         4             7 602 176
Pentasaccharide         5             2 633 600 000
Hexasaccharide          6             1 053 045 031 000

Fig. 1. Oligosaccharide isomers from D-hexoses. [Plot of the number of isomers (logarithmic axis) against degree of polymerization (2-7); curves labelled 'Branched and Linear Oligosaccharides' and 'Linear Oligosaccharides'; annotation: 'Octasaccharide Isomers Exceed 10e+17'.]

possible compounds. Including the mirror image L-sugar forms as stereochemical isomers within this set would increase this number by a factor of 2^6 = 64, to more than 64 trillion. Table III shows the number of total reducing oligosaccharides possible from mono- to hexasaccharide. Figure 1 shows the data from Tables II and III plotted along with data from peptides of the same length. Extrapolation in Figure 1 shows that linear and branched totals for heptasaccharides would generate ~10^15 compounds and octasaccharides would generate >10^18, with divergence from the linear forms increasing from 1 log at hexamers to >2 logs at octamers. The divergence is due to an increase in branching types. Nonasaccharides would generate >1 mol of isomers!

Oligosaccharide building blocks

Organisms possess a much larger menu of possibilities for oligosaccharides than for peptides. There exist >50 types of aminosugars alone, and probably 50 neutral and acidic sugars. Novel monosaccharides are discovered in plants and microbial cell walls each year. Sugars in nature can be substituted with acyl, alkyl, pyruvyl, sulphate, sulphonate, phosphate, phosphonate and other groups, any one of which would raise the possible isomers to a number much greater than the one we have calculated. A hexasaccharide has 19 substitutable hydroxyls. For example, if one methyl group is substituted anywhere on our set


However, most biological activities are recognized within a proteinaceous binding site of six sugars (or usually fewer), as exemplified by antibodies, enzymes (lysozyme), heparinoids or lectins (selectins), as reviewed in the Introduction. There are a few examples of proteins requiring higher oligomers for activity, e.g. a few enzyme recognition sites in the N-linked anabolic pathway for glycoprotein synthesis which apparently recognize precursors as large as 14 sugars. Taking all of the above calculations together, the total number of permutations for a hexasaccharide can be enumerated. The master equation is given as the addition of all equations (A')-(L); negative values obtained from calculations should be regarded as zero.
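The rule that negative values count as zero can be sketched in a few lines. The example below is an illustration under stated assumptions, not the author's procedure: it covers only terms A', B, C, G and H, it assumes the linkage exponents n-1, n-3, n-3, n-4 and n-4 for those terms (a reading of the garbled equations that reproduces the di-, tri- and tetrasaccharide entries of Table III), and it takes E = n as in the "Hexose set" column. The helper names `base`, `term` and `total_up_to_tetra` are hypothetical.

```python
def base(n: int, e: int) -> int:
    # E^n epimer choices x 2^n anomeric configurations x 2^n ring forms
    return e**n * 2**n * 2**n


def term(n: int, e: int, linkage_exp: int, coeff: int) -> int:
    # A term whose position count is zero or negative contributes nothing;
    # the exponent guard is a safety net for very small n.
    if coeff <= 0 or linkage_exp < 0:
        return 0
    return base(n, e) * 4**linkage_exp * coeff


def total_up_to_tetra(n: int) -> int:
    # Only valid for n <= 4, where the remaining terms (D-F, J-L) vanish.
    e = n  # an n-mer drawn from a set of n hexoses, as in Table III
    return (
        term(n, e, n - 1, 1)              # A': linear chains
        + term(n, e, n - 3, 6 * (n - 2))  # B
        + term(n, e, n - 3, 6 * (n - 4))  # C (negative for n < 5, so zero)
        + term(n, e, n - 4, 4 * (n - 3))  # G
        + term(n, e, n - 4, 4 * (n - 5))  # H (negative for n < 6, so zero)
    )


for size in (2, 3, 4):
    print(size, total_up_to_tetra(size))
# 2 256
# 3 38016
# 4 7602176
```

Clamping inside `term` rather than on the final sum matters: a negative C or H contribution would otherwise cancel part of a legitimate term.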

Table III. Oligosaccharide isomers from D-hexoses, including branched forms


of hexasaccharides, nearly 2 x 10^13 new compounds could be envisioned. This examination of the carbohydrate isomers is exhaustive for hexasaccharides and lower, and covers most isomers for compounds up to octasaccharides, with the proviso that all possible branched compounds are to be considered and their terms are to be added. The numbers are astronomical: the curve in Figure 1 rises by more than 2 logs per monomer through pentasaccharides and by more than 3 logs per monomer above heptasaccharides, which is especially surprising for such a short oligomer sequence.
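The two scaling figures quoted in the text (the factor of 64 for L-sugars and the effect of a single methyl group on any of 19 hydroxyls) follow from simple multiplication. The sketch below starts from the printed hexasaccharide total and is only an arithmetic illustration; the variable names are not from the paper.

```python
hexasaccharide_total = 1_053_045_031_000  # printed total for D-hexoses

# Each of the six residues may be D or L: 2^6 = 64 combinations.
l_sugar_factor = 2**6
with_l_sugars = hexasaccharide_total * l_sugar_factor

# One methyl group placed on any one of the 19 substitutable hydroxyls.
substitutable_hydroxyls = 19
singly_methylated = hexasaccharide_total * substitutable_hydroxyls

print(f"{with_l_sugars:.2e}")      # 6.74e+13
print(f"{singly_methylated:.2e}")  # 2.00e+13
```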

in a five- or six-sided polygon, and there is sequence order of all of these parameters.

Conclusion

While nature has not yet confounded us with numbers of oligosaccharides of such magnitude, this brings little comfort to the analyst or synthetic chemist who must, after all, come to the proof that the oligosaccharide in question is, absolutely, the correct structure.

Managing the problem

Evolutionary potential

There is a very high evolutionary potential in possible epitopes for the establishment of a biological recognition 'code' consisting of the binding pocket of a specific protein on the one hand and the complex sugar structure on the other. Because proteins can evolve more rapidly than carbohydrates (which must have a substantial enzyme change to add a new sugar), saccharide structures are likely to be very conserved over evolution when compared with proteins, whose specificity could change with a single amino acid mutation. Those carbohydrate sequences in metazoans with functions that are conserved will probably be preserved across orders, such as the selectins and heparinoids in mammals. There is obviously adequate chemistry for much further evolution in carbohydrate recognition systems. The above calculation shows the prospect for the most complex known chemical code in a short sequence yet uncovered in biology.

The set model for this project can be described as a series of convex epitopes which have a direction, i.e. they have one or more beginning (non-reducing terminal) termini and only one ending terminus, conventionally called the reducing end and written to the right. The epitopes can be populated with a set of epimeric monomers of defined size, which can be linked to each other at one position on the left hand and four different positions on the right. For each of the four positions there is a relation above (β) and below (α) an imaginary plane (D-forms). Each member can exist

Acknowledgements

I owe gratitude to Narasinga Rao and Peter Fugedi, both of whom read a draft manuscript and made insightful suggestions, to Jeremy Carver and Alan Bush for discussions regarding limits to NMR resolution, and to Don Kiely, Jerry Hart and Dirk van den Eijnden, who encouraged me to publish the calculation.

References

Aruffo,A., Stamenkovic,I., Melnick,M., Underhill,C.B. and Seed,B. (1990) CD44 is the principal cell surface receptor for hyaluronate. Cell, 61, 1303-1310.
Atha,D.H., Lormeau,J.C., Petitou,M., Rosenberg,R.D. and Choay,J. (1987) Contribution of 3-O- and 6-O-sulfated glucosamine residues in the heparin-induced conformational change in antithrombin III. Biochemistry, 26, 6454-6461.
Brandley,B.K., Swiedler,S.J. and Robbins,P.W. (1990) Carbohydrate ligands of the LEC cell adhesion molecules. Cell, 63, 861-870.
Casu,B. (1989) In Lane,D.A. and Lindahl,U. (eds), Heparin: Chemical and Biological Properties, Clinical Applications. Edward Arnold, London, pp. 25-49.
Chang,M., Meyers,H., Nakanishi,K., Ojika,M., Park,J., Takeda,R., Vazquez,J. and Wiesler,W. (1989) Microscale structure determination of oligosaccharides by the exciton chirality method. Pure Appl. Chem., 61, 1193-1200.
Cisar,J., Kabat,E.A., Dorner,M.M. and Liao,J. (1975) Binding properties of immunoglobulin combining sites specific for terminal or nonterminal antigenic determinants in dextran. J. Exp. Med., 142, 435-459.
Cumming,D.A. and Carver,J.P. (1987) Virtual and solution conformations of oligosaccharides. Biochemistry, 26, 6664-6676.
Feizi,T. (1985) Demonstration by monoclonal antibodies that carbohydrate structures of glycoproteins and glycolipids are onco-developmental antigens. Nature, 314, 53-57.
Feizi,T. (1988) Carbohydrate structures as onco-developmental antigens and components of receptor systems. Adv. Exp. Med. Biol., 228, 317-329.
Fisher,R.F. and Long,S.R. (1992) Rhizobium-plant signal exchange. Nature, 357, 655-660.
French,A.D., Mouhous-Riou,N. and Perez,S. (1993) Computer modeling of the tetrasaccharide nystose. Carbohydr. Res., 247, 51-62.
Friedman,M.J., Fukuda,M. and Laine,R.A. (1985) A malaria parasite binding site on the major transmembrane protein of the erythrocyte. Science, 228, 75-77.
Hakomori,S. (1984) Philip Levine award lecture: blood group glycolipid antigens and their modifications as human cancer antigens. Am. J. Clin. Pathol., 82, 635-648.
Hellerqvist,C.G. (1990) Linkage analysis using Lindberg's method. In Mass Spectrometry. Methods Enzymol., 193, 554-573.
Hoff,S.D., Irimura,T., Matsushita,Y., Ota,D.M., Cleary,K.R. and Hakomori,S.-i. (1990) Metastatic potential of colon carcinoma. Expression of ABO/Lewis-related antigens. Arch. Surg., 125, 206-209.
Karlsson,K.A. (1986) Animal glycolipids as attachment sites for microbes. Chem. Phys. Lipids, 42, 153-175.
Laine,R.A. (1989) Tandem mass spectrometry of oligosaccharides. Methods Enzymol., 179, 157-164.
Laine,R.A. (1990) Glycoconjugates: overview and strategy. In McCloskey,J.A. (ed.), Mass Spectrometry. Methods Enzymol., 193, 539-553.
Laine,R.A., Pamidimukkala,K.M., French,A.L., Hall,R.W., Abbas,S.A., Jain,R.K. and Matta,K.L. (1988) Linkage position in oligosaccharides by fast atom bombardment ionization, collision-activated dissociation, tandem mass


Since questions will arise on how to manage this problem analytically, I should refer to an overview article introducing 11 chapters on carbohydrate analysis by mass spectrometry published in Methods in Enzymology, Vol. 193 (Laine, 1990), where strategy for analysis is given careful consideration. In the same volume, Karl Hellerqvist (1990) asserts in his chapter that out of 2.7 billion structures possible in a hexasaccharide set containing aminosugars, fucose and hexoses, methylation linkage analysis alone can reduce the number of possibilities to a few thousand. Other sensitive methods for linkage analysis have been proposed (Chang et al., 1989; Wiesler et al., 1990). The reduction in the number of possible isomers achieved by an analysis can be calculated for any technique that establishes the unknown parameter associated with one element of the equations. For example, the anomeric configuration of all sugars in a chain can be learned by using the correct set of glycosyl hydrolases in succession, and this should reduce the number by a factor of 2^6, or 64. In all cases, fragmentation of the molecule to smaller oligosaccharides simplifies the problem. Similarly, assembly of larger units in organic synthesis may improve yields.
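The idea of removing one factor per established parameter can be illustrated on the linear term alone. The sketch below is hypothetical: it assumes the linear hexasaccharide count factors as E^n x 2^n x 2^n x 4^(n-1) with E = n = 6, and it treats each class of measurement as fixing its factor completely, which a real analysis may only approximate. The names `factors` and `remaining` are illustrative.

```python
n = 6  # residues in the chain
e = 6  # size of the hexose set

# One multiplicative factor per unknown parameter of a linear hexasaccharide.
factors = {
    "epimer": e**n,           # which hexose sits at each position
    "anomer": 2**n,           # alpha or beta at each residue
    "ring": 2**n,             # pyranose or furanose at each residue
    "linkage": 4 ** (n - 1),  # linkage position of each glycosidic bond
}


def remaining(known=()):
    """Candidate count left once the parameters named in `known` are fixed."""
    count = 1
    for name, size in factors.items():
        if name not in known:
            count *= size
    return count


print(remaining())                               # 195689447424
print(remaining(("anomer",)))                    # 3057647616
print(remaining(("anomer", "ring", "linkage")))  # 46656
```

Fixing the anomeric configuration divides the count by 64, as stated in the text; fixing ring size and linkage positions as well leaves only the 6^6 possible sequences.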

Yuen,C.T., Lawson,A.M., Chai,W., Larkin,M., Stoll,M.S., Stuart,A.C., Sullivan,F.X., Ahern,T.J. and Feizi,T. (1992) Novel sulfated ligands for the cell adhesion molecule E-selectin revealed by the neoglycolipid technology among O-linked oligosaccharides on an ovarian cystadenoma glycoprotein. Biochemistry, 31, 9126-9131.

Received on March 21, 1994; revised on July 28, 1994; accepted on August 30, 1994


spectrometry and molecular modeling. L-fucosylp-(α1→X)-D-N-acetyl-D-glucosaminylp-(β1→3)-D-galactosylp-(β1-O-methyl) where X = 3, 4, or 6. J. Am. Chem. Soc., 110, 6931-6939.
Laine,R.A., Yoon,E., Mahier,T.J., Abbas,S.A., de Lappe,B.W., Jain,R.K. and Matta,K.L. (1991) Non-reducing terminal linkage position determination in intact and permethylated synthetic oligosaccharides having a penultimate amino sugar: fast atom bombardment ionization, collisional-induced dissociation and tandem mass spectrometry. Biol. Mass Spectrom., 2, 505-514.
Lane,D.A., Bjork,I. and Lindahl,U. (1992) Heparin and Related Polysaccharides. Plenum Press, New York.
Lee,Y.C. (1990) Binding modes of mammalian hepatic Gal/GalNAc receptors. Ciba Found. Symp., 145, 80-95.
Lindahl,U. and Hook,M. (1978) Glycosaminoglycans and their binding to biological macromolecules. Annu. Rev. Biochem., 47, 385-471.
Maniara,G., Laine,R.A. and Kuc,J. (1984) Oligosaccharides from Phytophthora infestans enhance the elicitation of sesquiterpenoid stress metabolite accumulation by arachidonic acid in potato. Physiol. Plant. Pathol., 24, 177-186.
Miller,K.E., Mukhopadhyay,C., Cagas,P. and Bush,C.A. (1992) Solution structure of the Lewis X oligosaccharide determined by NMR spectroscopy and molecular dynamics simulations. Biochemistry, 31, 6703-6709.
Opdenakker,G., Rudd,P.M., Ponting,C.P. and Dwek,R.A. (1993) Concepts and principles of glycobiology. FASEB J., 7, 1330-1337.
Polley,M.J., Phillips,M.L., Wayner,E., Nudelman,E., Singhal,A.K., Hakomori,S.-i. and Paulson,J.C. (1991) CD62 and endothelial cell-leukocyte adhesion molecule 1 (ELAM-1) recognition. Proc. Natl Acad. Sci. USA, 88, 6224-6229.
Poppe,L., Dabrowski,J., von der Lieth,C.W., Koike,K. and Ogawa,T. (1990) Three-dimensional structure of the oligosaccharide terminus of globotriaosylceramide and isoglobotriaosylceramide in solution. A rotating-frame NOE study using hydroxyl groups as long-range sensors in conformational analysis by 1H-NMR spectroscopy. Eur. J. Biochem., 189, 313-325.
Rademacher,T.W., Parekh,R.B. and Dwek,R.A. (1988) Glycobiology. Annu. Rev. Biochem., 57, 785-838.
Reitman,M.L. and Kornfeld,S. (1981) Lysosomal enzyme targeting. N-Acetylglucosaminyl-phosphotransferase selectively phosphorylates native lysosomal enzymes. J. Biol. Chem., 256, 11977-11980.
Riensenfeld,J., Hook,M., Bjork,I., Lindahl,U. and Ajaxon,B. (1977) Structural requirements for the interaction of heparin with antithrombin III. Fed. Proc., 36, 39-43.
Schmidt,R.R. (1986) New methods for the synthesis of glycosides and oligosaccharides - are there alternatives to the Koenigs-Knorr method? Angew. Chem. Int. Edn Engl., 25, 212-235.
Sharon,N. (1975) Complex Carbohydrates, Their Chemistry, Biosynthesis and Functions. Addison-Wesley, Advanced Book Program, Reading, MA, p. 7.
Smith-Gill,S.J., Rupley,J.A., Pincus,M.R., Carty,R.P. and Scheraga,H.A. (1984) Experimental identification of a theoretically predicted 'left-sided' binding mode for (GlcNAc)6 in the active site of lysozyme. Biochemistry, 23, 993-997.
Srnka,C.A., Tiemeyer,M., Gilbert,J.H., Moreland,M., Schweingruber,H., de Lappe,B.W., James,P.G., Gant,T., Willoughby,R.E., Yolken,R.H., Nashed,M.A., Abbas,S.A. and Laine,R.A. (1992) Cell surface ligands for rotavirus: mouse intestinal glycolipids and synthetic carbohydrate analogs. Virology, 190, 794-805.
Takeo,K. and Kabat,E.A. (1978) Binding constants of dextrans and isomaltose oligosaccharides to dextran-specific myeloma proteins determined by affinity electrophoresis. J. Immunol., 121, 2305-2310.
Tollefsen,M.D. (1992) In Lane,D.A., Bjork,I. and Lindahl,U. (eds), Heparin and Related Polysaccharides. Plenum Press, New York, pp. 167-176.
Truchet,G., Roche,P., Lerouge,P., Vasse,J., Camut,S., de Billy,F., Prome,J.-C. and Denarie,J. (1991) Sulfated lipo-oligosaccharide signals of Rhizobium meliloti elicit root nodule organogenesis in alfalfa. Nature, 351, 670-673.
van Boeckel,C.A.A. and Petitou,M. (1993) The unique antithrombin III binding domain of heparin: a lead to new synthetic antithrombotics. Angew. Chem. Int. Edn, 32, 1671-1718.
Wiesler,W., Berova,N., Ojika,M., Meyers,H., Chang,M., Zhou,P., Lo,L.-C., Niwa,M., Takeda,R. and Nakanishi,K. (1990) A CD-spectroscopic alternative to methylation analysis of oligosaccharides: reference spectra for identification of chromophoric glycopyranoside derivatives. Helv. Chim. Acta, 73, 509-551.
Yamamoto,F. and Hakomori,S.-i. (1990) Sugar-nucleotide donor specificity of histo-blood group A and B transferases is based on amino acid substitutions. J. Biol. Chem., 265, 19257-19262.
Yoon,E. and Laine,R.A. (1992) Linkage position determination of permethylated neutral novel trisaccharides by collisional induced dissociation and tandem mass spectrometry. Biol. Mass Spectrom., 21, 479-485.
