Nucleic Acids Research - Europe PMC

Volume 14 Number 9 1986

Nucleic Acids Research

Mismatches in DNA double strands: thermodynamic parameters and their correlation to repair efficiencies

Heinz Werntges, Gerhard Steger, Detlev Riesner and Hans-Joachim Fritz¹

Institut für Physikalische Biologie, Universität Düsseldorf, 4000 Düsseldorf, and ¹Max-Planck-Institut für Biochemie, Abteilung Zellbiologie, 8033 Martinsried bei München, FRG

Received 5 March 1986; Accepted 3 April 1986

ABSTRACT

The helix-coil transitions of the 16 octadecameric DNA duplexes dCGTCGTTTXACAACGTCG-dCGACGTTGTX'AAACGACG with A, T, G, and C for X and X' were measured by UV-absorption. This sequence was taken from an earlier in vivo study of mismatch repair efficiencies (Kramer, Kramer, and Fritz (1984) Cell 38, 879-887). The thermodynamic parameters for double strand and mismatch formation have been obtained by evaluating the partition function of a stack model which allowed for loop formation. As a result the mismatches could be classified into wobble base pairs (T/G, G/G, C/A, A/A, A/G), open base pairs, i.e. permanent loops (T/T, C/T, T/C, C/C), and intermediate or weak base pairs (G/T, A/C, G/A). There is no correlation between Tm and the biological repair efficiency of X/X'. The structure classes described above, however, show a close correlation: open base pairs show the lowest repair efficiencies, whereas mismatches with high repair efficiency always belong to the structural class of wobble base pairs. Because of the palindromic nearest neighbors of the variation site X/X', the influence of next-nearest neighbor interactions could be detected and estimated at about 1 kJ/mol per stack.

INTRODUCTION

In E. coli, post-replicative correction of nucleotide misincorporations that have escaped proof-reading makes an important contribution to the overall fidelity of DNA replication. The necessary signal for distinguishing the parental from the daughter strand is provided in this organism by the transient under-methylation of GATC-sites in newly synthesized DNA (1). It was recently found that different base/base mismatches are corrected with different efficiencies by the methyl-directed DNA mismatch repair system of E. coli (2, 3). Furthermore, a correlation was found (2) with the frequency of incorporation errors made by DNA polymerase III holoenzyme (4). The post-replicative repair system is apparently specialized to process those base/base mismatches that occur comparatively frequently as the result of errors made by the DNA polymerase, i.e. purine/pyrimidine and certain purine/purine mismatches (2, 3).

© IRL Press Limited, Oxford, England.


The repair efficiencies have not yet been discussed in terms of known thermodynamic or structural features of the mismatches. A priori, one may think in terms of two alternatives: either the repair enzymes recognize the mismatched bases in their unpaired, i.e. locally denatured, state, or the mismatch is recognized for the repair action when it forms a base pair of a non-Watson-Crick type. In the first case the repair efficiency would correlate with the probability of the open state, i.e. the instability of the mismatch; in the second case the correlation would hold for the stability of the odd base pair or for some structural peculiarities. In order to test these models, the thermodynamic stabilities and favorable structural details have to be known for all base/base oppositions.

The thermodynamic parameters of mismatches in double strands are also of general interest for the prediction of secondary structures in DNA and RNA. There are several reports in the literature on systematic studies of the stabilities of Watson-Crick base pairs. The effects of extra bases and mismatched bases have also been studied in oligonucleotides and polynucleotides. These studies, however, did not yield a complete set of thermodynamic data for all mismatches. Recently, Tinoco and co-workers (5) have reported thermodynamic parameters for all base/base oppositions. The neighboring sequence of the mismatch, however, was different from that of the mismatches in the repair efficiency experiments; because a marked influence of the type of the neighboring base pairs has to be expected for the stability of the mismatch, it is not surprising that these results do not correlate with either of the interpretations of the repair efficiencies given above.

In order to link the thermodynamic properties of mismatches as closely as possible to the results of a previously published genetic study (2), the region around codon 11 of the lacZ gene was re-synthesized (cf. ref. 2). In this work we measure the thermal transition curves of the resulting octadecamers containing any one of the 16 possible base/base oppositions in position 9 (Figure 1). The stability parameters of the mismatch are evaluated by applying statistical thermodynamics. A correlation with the known repair efficiencies is discussed.

MATERIALS AND METHODS

Synthesis and analysis of oligonucleotides

Eight 2'-deoxyoligonucleotides (cf. Figure 1) were synthesized by the phosphoramidite method (6) using an Applied Biosystems 380 A DNA synthesizer. The compounds were purified by two consecutive rounds of reversed phase HPLC at the 5'-tritylated and the completely deprotected stage as described earlier (7). All samples were precipitated twice with 5 volumes of ethanol/acetone (50/50) to remove salt. Pellets were dissolved in 1 ml H2O and stored at 20 °C. For the measurements all samples were adjusted to a concentration (8) of about 0.3 A260/ml in a standard buffer containing 500 mM NaCl (suprapur, Merck, Darmstadt, FRG), 1 mM Na-cacodylate, 0.1 mM EDTA, pH 7.1. All solutions were prepared from high purity water (Milli-Q-System, Millipore GmbH). All other chemicals and solvents were at least of reagent grade.

    5' C G T C G T T T X  A C A A C G T C G 3'
    3' G C A G C A A A X' T G T T G C A G C 5'

Figure 1: The octadecameric DNA with the variation site X/X' in position 9.

All oligonucleotide samples were analyzed by HPLC on a Nucleogen-DEAE 60-7 column (9) (Macherey-Nagel, Düren, FRG) and by electrophoresis on 20% polyacrylamide gels stained with silver (10). Only extremely low amounts of contaminating shorter fragments were detected. Two oligonucleotides were sequenced according to Maxam and Gilbert (11) to exclude errors during synthesis.

Equilibrium thermal denaturation curves

The thermal denaturation curves were measured in a dual wavelength spectrophotometer (Sigma ZWS 11, Biochem, München, FRG) with a cuvette of 1 cm pathlength and 40 µl volume (HELLMA, Müllheim, FRG) as described earlier (12). Absorption and temperature values from a Pt100 resistor were sampled at a rate of 20 points/°C by an APPLE II computer, which also controlled the increase or decrease of the temperature of the thermostating bath at a rate of 0.2 °C/min. Each sampled value was the average of about 1000 measured values. Smaller heating rates gave the same results. Fitting and differentiating of the melting curves was performed as described earlier (12). The theoretical calculations for curve fitting were also done on the APPLE; one run took about 3 min. All programs were written in TURBO-PASCAL (Borland Inc.).

EXPERIMENTAL RESULTS

For the synthesis of double-stranded oligonucleotides the sequences were taken from Kramer et al. (2) to assure comparability between the thermodynamic results and the repair efficiencies observed earlier. A length of 18 nucleotides with the mismatch in position 9 was chosen as an optimal compromise between long oligonucleotides with a narrow and well-measurable transition range and short oligonucleotides with a high relative influence of the mismatched base pairs. The ionic strength was 500 mM Na+ in all measurements. This high value assured complete reversibility of the denaturation in a temperature range which was still adequate for measurements. Furthermore, most literature data were determined at similarly high ionic strengths.

Figure 2 shows examples of melting curves, all measured in standard buffer. They are depicted in the integrated as well as in the differentiated form. The curves were well reproducible. The Tm-values, taken as the maxima of the differentiated curves, varied by less than 0.3 °C in repeated measurements, which is very small compared to the broad transition ranges. Viroid RNA melting was used as an internal standard, because its properties are well known from the literature (13) and its sharp transition could easily be identified in mixtures with oligonucleotides. Deviations in the ionic strength between different samples could be excluded by this method.

[Figure 2: melting curves A260 and dA260/dT versus T/°C (40 to 80 °C) for selected oppositions X/X', including G/C, G/G, A/C and C/A]

Figure 2: Denaturation curves of octadecameric DNA double strands in standard buffer. The sequences are identical to that of Figure 1; the opposition X/X' is given in the figure. Values of absorption are normalized to 1 A260 at 25 °C.

Table 1: Tm-values for all combinations of the four different (+)- and (-)-strands. The (+)-strand is the upper strand of Figure 1, the (-)-strand the lower. The error in Tm is about ±0.3 °C.

    Tm/°C            +X
    -X'       A      C      G      T
    A       55.6   53.2   56.6   64.9
    C       54.6   51.5   69.6   56.0
    G       59.3   68.1   56.7   60.0
    T       67.1   58.6   60.1   58.7

The Tm-values of the mismatched double strands range from 51 °C for C/C to 60 °C for G/T, and the Tm-values of the intact double strands from 65 °C for T/A to 70 °C for G/C (Table 1). A broadening of the curves to the high temperature side is obvious for some mismatches, e.g. A/C. This indicates a transition splitting which is not consistent with an all-or-none model.

Significant differences in the Tm-values were obtained when strands with the opposition +X/-X' were compared with the strands containing +X'/-X (Table 1). This is surprising, because the base pairs neighboring the position of the mismatch are palindromic. Consequently, an exchange of the base in the (+)-strand with that in the (-)-strand merely swaps the two types of stacks formed with the neighboring bases, but does not alter them. Thus, the total nearest neighbor interactions are not changed by inverting the opposition +X/-X' to +X'/-X. Therefore, the non-identical Tm-values demonstrate long-range interactions of base pairs in double-stranded nucleic acids, e.g. next-nearest neighbor interactions.
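The definition of Tm used above, the maximum of the differentiated melting curve, can be illustrated with a minimal Python sketch. It is not the TURBO-PASCAL code used in this work; the function names, the central-difference scheme and the synthetic sigmoidal curve are assumptions made only for illustration. The sampling density of 20 points/°C follows the Methods section.

```python
import math


def differentiate(temps, absorbances):
    """Central-difference estimate of dA/dT at every interior sample."""
    points, slopes = [], []
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        points.append(temps[i])
        slopes.append(slope)
    return points, slopes


def melting_temperature(temps, absorbances):
    """Tm taken as the temperature at which dA/dT is largest."""
    points, slopes = differentiate(temps, absorbances)
    best = max(range(len(slopes)), key=lambda i: slopes[i])
    return points[best]


# Synthetic, noise-free curve (an assumption): a sigmoid centred at 60 degrees C,
# sampled at 20 points per degree between 40 and 80 degrees C.
temps = [40.0 + 0.05 * i for i in range(801)]
curve = [1.0 + 0.2 / (1.0 + math.exp(-(t - 60.0) / 3.0)) for t in temps]
tm = melting_temperature(temps, curve)
```

With real data the sampled values would presumably need smoothing before differentiation, since spikes in the absorbance trace would otherwise dominate the maximum search.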

THEORY

A theoretical model has been developed to simulate melting curves. Fitting computed melting curves to the experimentally obtained ones yields the thermodynamic parameters characterizing the influence of mismatches on the local stability of DNA. A modification of D. Poland's algorithm (14) was used for the calculation of the (internal) degree of helicity gi of all double strands at a given T. From gi the hypochromicity is calculated, which may be compared to the experimentally obtained hypochromicity ΔA(T) at the same temperature.


General features of the model

1) Nearest neighbor interactions: For the helix-coil transition of DNA in aqueous solution, most of the reaction enthalpy ΔH is needed to overcome the stacking interactions between adjacent base pairs. Consequently, the model was chosen to describe the opening of the stacks between base pairs, not the base pair opening itself. This leads to a "stack model" instead of a "base pair model".

2) Formation of internal loops: For short DNAs such as the ones used here, melting is expected to proceed only from the ends of the double strands. However, the expected and measured destabilizing influence of a mismatch may initiate and facilitate the formation of a small internal loop around the mismatch (15). Therefore special care was taken in introducing an adequate loop function δ.

3) Modeling of a mismatch: X-ray and NMR data as well as theoretical studies (16-20) suggest that mismatches lead only to local deterioration in DNA structure. In accordance with the nearest neighbor approximation used here, a mismatch is assumed to influence only the two adjacent stacks. Two thermodynamic parameters as defined below will be used to describe this influence.

Two more restrictive models have been tried and both were found to yield unacceptable results: the observed occurrence of two subtransitions in mismatched systems cannot be described adequately when loop formation is neglected, and an all-or-none model does not even reproduce the melting curves of the Watson-Crick systems, neither in Tm nor in ΔT1/2. Corrections for sequence-specific hypochromicity have been introduced and shown to play only a minor role.

Details of the model

1) Stacking parameters: The reaction

$$\text{X-Y (stacked)} \;\rightleftharpoons\; \text{X \ Y (open)}, \qquad K_{eq} = s_{XY}^{-1}$$

i.e. the opening of a stack, is described by two parameters, ΔH_XY and ΔS_XY. Because of the symmetry between the two strands, each stack XY is equivalent to the stack formed by the complementary bases read on the opposite strand. So there are 10 different possible stacks, requiring 20 stacking parameters. In this work a common approximation for DNA is used for the ΔS-values: ΔS_XY =: ΔS, i.e. all reaction entropies are assumed to adopt the same value. The ΔH-values were derived from the Tm(XY)-values of Gotoh (21). ΔS is left as a free parameter that has to be obtained by curve fitting (see next paragraph). It reflects individual buffer conditions.

2) Dissociation into single strands: In a "base pair model" the state without base pairs is the single strand state, while in the "stack model" the state without stacks (state 0) represents an ensemble of N+1 double strand states forming one base pair, N being the number of possible stacks. One equilibrium constant K has been introduced for all N+1 possible final dissociation reactions, in which a double strand held together by a single base pair separates into two single strands. K is assumed to be independent of temperature. This implies that there is no reaction enthalpy involved in the dissociation step. K is another free parameter to be obtained by curve fitting. Once fixed, the same K is used for all 16 duplexes.

3) Adaptation of Poland's algorithm: Poland's algorithm is given for a "base pair model", but it works equally well for the "stack model". Two minor modifications are necessary, though. First, the argument of the loop function should be shifted: a loop of size n open stacks corresponds to a loop of size (n-1) open base pairs. A loop of only one stack does not make sense, so δ(1) should be chosen close to zero. Second, the statistical weight of state 0 has to be added to Poland's partition function, since state 0 is a double strand state only in the stack model. This is performed by starting Poland's recursion for t_i with t_N instead of t_N+1, setting t_N := r_N · (N+1) (cf. Poland's equation no. 28). In addition to these minor adaptations, the treatment of strand dissociation has been performed according to the preceding paragraph.

4) The "virtual stack" approximation: A mismatch is assumed to influence only the two stacks next to it. A priori, there is no reason for retaining the common Watson-Crick parameter ΔS for disturbed stacks as well. So 4 unknown parameters (two ΔH-values and two ΔS-values) are needed for a consistent accounting of the mismatch influence. It turned out, however, that even two parameters could hardly be determined by curve fitting on the basis of the present data. Assuming that one of the destabilized stacks always opens immediately after the opening of the other, we combined both into a cooperative unit, the so-called "virtual stack", thus leaving only two unknown parameters for a mismatch, ΔH_M and ΔS_M, being the sums of the ΔH- and ΔS-values of the two disturbed stacks, respectively. The approximation shortens the model of the molecule by one base pair. This should be kept in mind when using the loop function values or the dissociation constant K. In order to apply the same model to mismatched and mismatch-free duplexes, both stacks of the Watson-Crick base pair in the position of the mismatch were also treated as a "virtual stack" with parameters equalling the sums of the parameters of the individual stacks.

5) The loop function: The probability for the formation of an internal loop of m opened stacks is given by the loop function value δ(m). Since extensive use of curve fitting was made to obtain an adequate loop function, the details will be given in the next chapter.

FITTING OF THE PARAMETERS

The parameters ΔH_M(X/X') and ΔS_M(X/X') of the 12 mismatches X/X' are of central interest in this paper. In order to obtain them it is necessary to determine also the other not yet fixed parameters of the model. The entropy of all Watson-Crick stacks ΔS and the dissociation constant K apply to both mismatched and mismatch-free systems. They may therefore be obtained solely from the experimental melting curves of the mismatch-free duplexes. The loop function parameters are of significant influence only for mismatched systems and therefore have to be determined together with the mismatch parameters.

For parameter determination an experimentally obtained melting curve is plotted on the graphics screen. Then a computed curve is fitted to it by varying the model parameters. The "best fit" is chosen visually. More elaborate numerical methods of curve fitting would hardly be able to improve the accuracy of the fitted variables, owing to the error margin of the basic model parameters ΔH_XY and to spikes in the experimental data.
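The stack parameters introduced under "Details of the model" can be illustrated with a short Python sketch. Several points are assumptions and not taken from the paper: that the opening constant of a stack follows the usual two-state relation K = exp(-(ΔH - TΔS)/RT), that a stacking enthalpy is obtained from a stack melting temperature as ΔH_XY = Tm(XY) · ΔS, and the two stack Tm-values, which are hypothetical placeholders rather than Gotoh's data. The value of ΔS is the one reported in the fitting section of this paper.

```python
import math

R = 8.314e-3      # gas constant in kJ/(mol*K)
DELTA_S = 0.1040  # common stack entropy in kJ/(mol*K), as reported in the fitting section


def stack_enthalpy_from_tm(tm_kelvin, delta_s=DELTA_S):
    """Assumed conversion of a stack melting temperature into a stacking enthalpy."""
    return tm_kelvin * delta_s


def opening_constant(delta_h, delta_s, temp_kelvin):
    """Two-state equilibrium constant for opening a stack (assumed functional form)."""
    return math.exp(-(delta_h - temp_kelvin * delta_s) / (R * temp_kelvin))


def virtual_stack(stack_a, stack_b):
    """Combine two (delta_h, delta_s) stacks into one cooperative unit by summation."""
    return stack_a[0] + stack_b[0], stack_a[1] + stack_b[1]


# Hypothetical melting temperatures (in K) of the two stacks flanking position 9.
tm_left, tm_right = 330.0, 345.0
left = (stack_enthalpy_from_tm(tm_left), DELTA_S)
right = (stack_enthalpy_from_tm(tm_right), DELTA_S)
delta_h_m, delta_s_m = virtual_stack(left, right)
```

For a Watson-Crick pair the summed entropy is 2 · ΔS = 0.208 kJ/(mol·K), and the opening constant of each stack equals 1 at its own melting temperature.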
Parameters from mismatch-free duplexes

For short DNA duplexes without a mismatch, loop formation may be neglected. Since the stacking enthalpies ΔH_XY are given by Gotoh's data (21), this leaves only ΔS and K as open parameters of the model. They were determined simultaneously using the data of the most stable duplex G/C for fitting. Finding ΔS and K by a single fitting procedure was possible, because they govern essentially different features of the model curves. The following values were found:

$$\Delta S = 0.1040\ \mathrm{kJ/(mol \cdot K)}, \qquad K = 5 \cdot 10^{-4}\ \mathrm{M}$$

Models designed for only nearest neighbor interactions are not able to account for the observed asymmetry Tm(X/X') ≠ Tm(X'/X). Therefore we could not expect our model to reproduce Tm(C/G). The same holds for Tm(A/T) and Tm(T/A). The T/A-data were matched without further fitting. Figure 3 shows the experimental and fitted melting curves for G/C (a) and T/A (b) in both integrated and differentiated representation.

[Figure 3: melting curves versus T/°C for G/C (a) and T/A (b)]

Figure 3: Experimental (dotted) and theoretical (solid) melting curves for mismatch-free duplexes.

In order to obtain some quantitative estimate of next-nearest neighbor interactions we treated the ΔH_M values of C/G and A/T as adjustable parameters, using the same ΔS and K as above. The results of fitting are included in Table 3 of the next chapter.

Loop-weighting function

The loop function δ(m) turned out to be the crucial element of the model for a successful description of the melting behavior of mismatched duplexes. At first the simple and commonly used (22) empirical function

$$\delta(m) = \sigma \cdot m^{-1.75} \qquad (4.1)$$

(m = loop size in number of open stacks, σ = cooperativity factor) was applied. However, no value for σ could be found that adequately described the observed transition splitting. Using the experimental RNA data from Gralla and Crothers (15) for δ(2), δ(3), and δ(4), and (4.1) for extrapolation, did not lead to an improvement.

Table 2: The loop function used in the virtual stack model. For loop size m ≥ 5 open stacks equation (4.1) holds, σ equalling 0.07.

    loop size m    δ(m)
    1              10^-10
    2              5 · 10^-6
    3              5 · 10^-5
    4              5 · 10^-4
    5              4.2 · 10^-3
    m > 5          0.07 · m^-1.75
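The loop function of Table 2 can be written down directly. The following Python sketch is only an illustration; the names `SIGMA`, `FITTED` and `loop_weight` are hypothetical, while the numerical values are those of Table 2 and equation (4.1).

```python
SIGMA = 0.07                                    # cooperativity factor from Table 2
FITTED = {1: 1e-10, 2: 5e-6, 3: 5e-5, 4: 5e-4}  # tabulated values for the smallest loops


def loop_weight(m):
    """Loop function value for a loop of m open stacks, following Table 2."""
    if m < 1:
        raise ValueError("loop size must be at least one open stack")
    if m in FITTED:
        return FITTED[m]
    # For m >= 5 the empirical power law of equation (4.1) is used.
    return SIGMA * m ** -1.75


weights = [loop_weight(m) for m in range(1, 9)]
```

Evaluating equation (4.1) at m = 5 gives about 4.2 · 10^-3, the value listed in Table 2, so the tabulated values for small loops join the power law at that loop size.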

Only a strong decrease in probability for the smallest three loops finally succeeded in reproducing the transition splitting. The smallest possible loop is one open base pair, i.e. two open stacks. Therefore, in the stack model δ(1) does not make sense. Since Poland's algorithm requires all δ(m) to be greater than 0, δ(1) was set to 10^-10. We also applied δ(1) to the virtual stack, implying that the formation of a mismatch loop without additional open base pairs at its sides is also highly unlikely. This restriction will be discussed later. δ(2), δ(3), and δ(4) were treated as adjustable parameters. For m ≥ 5 open stacks formula (4.1) was used, so there were four adjustable parameters (δ(2), δ(3), δ(4), σ) left to be determined by curve fitting.

A single melting curve of a short mismatched duplex generally does not contain sufficient information for a simultaneous determination of four loop parameters and of ΔH_M and ΔS_M of the individual stack, to be described later. However, using σ as the main fitting parameter, and using δ(2) to δ(4) only as values roughly interpolating the interval [δ(1), δ(5)], a set of loop parameters was found which is indeed able to produce reasonable fits to the melting curves of all mismatched duplexes. As mentioned above, any particular fit to the melting curve of a mismatched system also requires its specific parameters ΔH_M(X/X') and ΔS_M(X/X'). However, since the same loop function applies to all 12 mismatched systems, this additional need could be overcome by selecting those loop function parameters that give an optimal fit to all mismatched systems. A considerable amount of computation was necessary to find them, though. Table 2 shows the loop function finally used.

Mismatch parameters

With ΔS, K, and δ(m) fixed, the virtual stack model could be used to determine the two characteristic parameters ΔH_M and ΔS_M of each mismatch. Unfortunately, variations of ΔH_M and ΔS_M produce quite similar, though not equal, effects in the computed curves. As a consequence the accuracy of the data did not permit a simultaneous fitting of both ΔH_M and ΔS_M. We eliminated the need to fit two parameters at a time by pre-selecting one of them. Three possible types of mismatches were considered:

a) Wobble pair: By selecting ΔS_M := 2 · ΔS, and by using ΔH_M as the adjustable parameter, one assumes a mismatch structure which is of about the same degree of order as a regular base pair and only less stabilized by enthalpy. A wobble pair is a typical example.

b) Open mismatch: Alternatively, setting ΔH_M := 0 and using ΔS_M as the fitting parameter implies a mismatch structure without any noteworthy interaction between the two bases. The low ΔS_M found for this type of mismatch (see below) indicates that the bases are highly mobile and do not form a rigid structure.

c) Weak base pair: Cases a) and b) are two extremes. To allow for at least one intermediate case, we used ΔS_M as the fitting parameter and selected ΔH_M := 30.6 kJ/mol, which is about half of a typical wobble pair enthalpy change (see below).

All three fitting procedures were applied to all 12 mismatches. For every mismatch one procedure turned out to give the best fit; thus, a clear classification of the mismatches was achieved. Figure 4 shows a typical example of the wobble pair structure (C/A) and of the open mismatch type (C/T) as well as discarded "fits" of different classes for comparison.

[Figure 4: melting curves versus T/°C (40 to 80 °C) for C/A and C/T]

Figure 4: Examples of experimental (dotted) and computed (solid) melting curves of mismatched duplexes. Left column: curve fitting assumes wobble structure; right column: curve fitting assumes open mismatch structure. From the different qualities of curve matching, C/A is identified as a wobble structure and C/T as an open mismatch (loop).

THERMODYNAMIC PARAMETERS FROM CURVE FITTING

The parameters which are identical for all duplexes are given in the previous paragraph. Table 3 summarizes the results of the fitting procedures for the parameters ΔH_M(X/X') and ΔS_M(X/X'), which are specific for the base combination X/X' at the variation site of the sequence. As already mentioned, ΔH_M was also determined by curve fitting for the three Watson-Crick pairs C/G, A/T, and T/A.

Table 3: The parameters for the variation site in the sequence of Figure 1 as obtained by the virtual stack model. The asterisk denotes the parameters which were determined by curve fitting. The Tm-values are defined as the temperatures of the maximum absorption change.

    base combination   experimental    ΔH_M(X/X')   ΔS_M(X/X')     structure type
    X/X'               Tm(X/X')/°C     (kJ/mol)     (kJ/(mol·K))
    G/C                69.6            70.6         0.208          Watson-Crick
    C/G                68.1            68.8*        0.208          Watson-Crick
    A/T                67.7            67.8*        0.206          Watson-Crick
    T/A                64.9            65.5*        0.208          Watson-Crick
    G/T                60.1            30.6         0.116*         weak
    T/G                60.0            60.8*        0.208          wobble
    A/G                59.3            59.1*        0.208          wobble
    T/T                58.7            0            0.031*         open
    C/T                58.6            0            0.031*         open
    G/G                56.7            57.0*        0.208          wobble
    G/A                56.6            30.6         0.128*         weak
    T/C                56.0            0            0.035*         open
    A/A                55.6            55.6*        0.208          wobble
    A/C                54.4            30.6         0.133*         weak
    C/A                53.2            54.1*        0.208          wobble
    C/C                51.5            0            0.045*         open

A set of similar data was given by Tinoco and co-workers (5). Their Tm-values of different mismatched duplexes correlate roughly with those given here. Deviations may be due to the different neighboring sequences. However, the differences in the elementary parameters of the mismatches are significant. Tinoco and co-workers assumed an all-or-none model for the evaluation of their experiments. Because of the importance of loop formation as found in the present study, the discrepancy between both sets of data is not surprising.

Next-nearest neighbor interactions

Up to now no experimental observation has allowed a quantitative estimation of next-nearest neighbor interactions between base pairs. The sequence used in these studies gives a unique chance to do so, because the palindromic nearest neighbors of the variation site permit inverting a base pair from X/X' to X'/X without replacing two stacks by two others, thus leaving the total nearest neighbor interaction constant. Therefore, a change in Tm as big as observed here can be explained only by interactions with the next-nearest neighbors, which are indeed non-palindromic in this case. The differences

    ΔH_M(G/C) - ΔH_M(C/G) = 1.8 kJ/mol,
    ΔH_M(A/T) - ΔH_M(T/A) = 2.3 kJ/mol,

which amount to about 3% of the stacking enthalpy of the two nearest neighbors, may give a rough estimate of the fraction of stacking enthalpy due to next-nearest neighbor interactions. Next-nearest neighbor effects are also evident for the mismatches. They could, however, not be evaluated quantitatively because in most cases different types of structures would have to be compared.
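The arithmetic behind the next-nearest neighbor estimate can be reproduced from the ΔH_M values of the four Watson-Crick pairs in Table 3. The short Python sketch below is only an illustration; the function name `inversion_difference` is hypothetical.

```python
# Virtual stack enthalpies of the Watson-Crick pairs (kJ/mol), taken from Table 3.
DELTA_H_M = {"G/C": 70.6, "C/G": 68.8, "A/T": 67.8, "T/A": 65.5}


def inversion_difference(pair):
    """Enthalpy change on inverting X/X' to X'/X, absolute and relative to X/X'."""
    x, x_prime = pair.split("/")
    inverted = f"{x_prime}/{x}"
    difference = DELTA_H_M[pair] - DELTA_H_M[inverted]
    return difference, difference / DELTA_H_M[pair]


gc_diff, gc_fraction = inversion_difference("G/C")
at_diff, at_fraction = inversion_difference("A/T")
```

The two differences come out as 1.8 and 2.3 kJ/mol, i.e. roughly 2.5% and 3.4% of the respective virtual stack enthalpies, consistent with the figure of about 3% quoted in the text.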
DISCUSSION

The thermodynamic stability of the double-stranded octadecamer was clearly dependent upon the type of base pair or mismatch in position 9. The differences in stability as well as in melting curve shape were experimentally significant and reproducible. However, the main concern of this study was the evaluation of basic thermodynamic parameters of the mismatches from clear-cut experiments and their correlation with biological features observed earlier.


Thermodynamic model

In this study, a model with 8 adjustable parameters (K, ΔS, δ(2), δ(3), δ(4), σ, ΔH_X/X', and ΔS_X/X') was used to simulate a single melting curve. For simple curves as in Figure 2 this number seems far too high. However, 6 of these parameters remain the same for all 16 curves, so the results are actually based on not more than 2 individual parameters (ΔH_X/X' and ΔS_X/X') per experimental curve. The resulting set of parameters can only be consistent within this model, because assumptions like sequence-independence of the loop function or the nearest neighbor approximation are possibly not generally applicable. The assumptions are, however, much less restrictive than those used in other studies on related problems, e.g. the all-or-none model. Although the absolute numbers resulting from this study should be used with care, the qualitative arguments and interpretations should not be affected.

The stacking enthalpies obtained from Gotoh (21) and used in this work seem to be quite reliable. They not only correlate with data from theoretical studies (23), as already pointed out by Gotoh, but also agree with the stacking enthalpies determined calorimetrically by Klump (24).

The mismatch was incorporated into the model as a "virtual stack". The formation of a pure mismatch loop with both neighboring base pairs still intact is inhibited by setting δ(1) = 10^-10 (cf. section "Loop-weighting function"). On the other hand, the so-called open mismatch structure (cf. Table 3) is just such a loop. Both views are consistent, though, because the parameter ΔS_M may be regarded as a sequence-dependent correction of the loop entropy if the virtual stack is an open mismatch.

Structure and thermodynamics

The virtual stack parameters ΔH_M and ΔS_M have been interpreted in terms of different types of mismatch structure. Melting curves were found to give clues to the classification of these structures.
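The three pre-selection schemes behind this classification can be summarized in a small Python sketch. The class and function names are hypothetical, and the tolerance-based lookup is only an illustrative way of reading the structure type back from a parameter pair of Table 3; in the paper the type was assigned from the quality of the curve fit, not from the numbers alone.

```python
from dataclasses import dataclass
from typing import Optional

DELTA_S = 0.1040  # common stack entropy in kJ/(mol*K)


@dataclass(frozen=True)
class Preselection:
    name: str
    fixed_delta_h: Optional[float]  # kJ/mol, None if delta_h is the fitted parameter
    fixed_delta_s: Optional[float]  # kJ/(mol*K), None if delta_s is the fitted parameter

    @property
    def fitted(self):
        return "delta_h" if self.fixed_delta_h is None else "delta_s"


SCHEMES = (
    Preselection("wobble", None, 2 * DELTA_S),  # same order as a regular pair
    Preselection("open", 0.0, None),            # no interaction between the bases
    Preselection("weak", 30.6, None),           # intermediate case
)


def classify(delta_h_m, delta_s_m, tol=1e-3):
    """Return the scheme whose pre-selected value matches a mismatch parameter pair."""
    for scheme in SCHEMES:
        if scheme.fixed_delta_s is not None and abs(delta_s_m - scheme.fixed_delta_s) <= tol:
            return scheme.name
        if scheme.fixed_delta_h is not None and abs(delta_h_m - scheme.fixed_delta_h) <= tol:
            return scheme.name
    return None
```

For example, `classify(60.8, 0.208)` returns `"wobble"` (T/G), `classify(0.0, 0.031)` returns `"open"` (T/T) and `classify(30.6, 0.116)` returns `"weak"` (G/T). The lookup is meant for mismatches only, since a Watson-Crick pair shares the entropy 2 · ΔS with the wobble class.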
While wobble pairs and open mismatches (loops) are known in the literature, the 'weak base pairs' introduced herein are simply thought to represent the class of all intermediate states between the two above extremes. Because a stack model was used i-nstead of a base pair model, the classification of mismatches depends on its neighbors The modeling of a mismatch as a virtual stack and the evaluation of its parameters in terms of nearest neighbor interactions assumes a completely localized structural deformation of the DNA by a mismatch. This view is it accordance with findings from X-ray, UV, and NMR analysis and model considera3786

Nucleic Acids Research Table 4: Melting points and structure types of the mismatched sequences X/X' as compared to the repair efficiencies found by (2). Repair efficiencies are symbolized as following: '+' for good, '0' for intermediate, '-' for poor, and '- -' for very poor. The repair efficiencies of the mismatches marked with parentheses are determined with a slightly different sequence.

X/X'

Tm/0C

X/X'

G/T T/G A/G T/T C/T G/G

60.1 60.0 59.3 58.7 58.6 56.7 56.6 56.0 55.6 54.5 53.2 51.5

T/G G/G C/A A/A A/G G/T

G/A

T/C A/A A/C C/A C/C

A/C

G/A T/T C/T T/C C/C

X/X'

structure type

(+)

T/G

+

G/G

(+)

C/A A/A A/G G/T A/C

wobble wobble wobble wobble wobble weak weak weak open open open open

repair efficiency

0 -

not tested not tested - - - -

|

G/A T/T C/T T/C C/C

tions, which state that the sugar-phosphate backbone is flexible enough to accommodate either purine-pyrimidine or purine-purine mispairs (17, 25, 18, 19). Examples of localized and structurally well-defined mismatches were G/T and G/A (16, 17). Fresco and co-workers (26) pointed out that hydrogen bonds in mismatches may be formed without deformation of the backbone if one of the bases is in an unfavored tautomeric form. More recently they reported (25) that the tendency of the base to stack with its nearest neighbors selects its tautomeric form. Our finding that pyrimidine-pyrimidine pairs do not stack is in accordance with literature reports stating that such a base pairing is not possible (26) and that the loss in stacking energy would be large (19).

Next-nearest neighbor effects
The experimental finding that Tm(X/X') ≠ Tm(X'/X) demonstrates that there are long-range interactions in DNA influencing the stacking enthalpies. A first estimate of a 3% deviation in ΔH(XY) due to next-nearest neighbor interactions was possible. As a consequence, nearest neighbor models are likely to miss experimentally obtained melting points by as much as 10°C. For example, the deviations between theory and experiment reported by Gotoh (21) may be due to such effects and may only be avoided by a future application of a next-nearest neighbor model. Although next-nearest neighbor interactions have not been evaluated quantitatively elsewhere, there are indications of such effects in other studies. Dickerson (27) pointed out that the geometry of a Watson-Crick helix has to be slightly modified in order to avoid purine-purine collisions. Such a local modification depends on the nearest as well as the next-nearest base pairs. As concluded from NMR experiments (28), the mismatch A/C may introduce perturbations extending over several base pairs.

Figure 5: Hypothesis to explain the observed correlation between repair efficiency and postulated mismatch structure. (Panel labels: "wobble type: recognized"; "open mismatch type: not recognized".)

Biological relevance: correlation with repair efficiency
The repair efficiency of a mismatch as observed in E. coli (2, 3) depends strongly on the type of the mismatch and possibly on the surrounding sequence. As expected, the mismatches were found to weaken their surrounding structure, but there is no obvious correlation between Tm and the repair efficiency of X/X', as Table 4 demonstrates. However, the structure classes postulated by the theoretical treatment do show a close correlation with the repair efficiency. As can also be seen in Table 4, the pyrimidine/pyrimidine combinations, which form the class of open mismatches, are exactly the mismatches showing the lowest repair efficiency. The other extreme is given by the wobble type mismatch: among these structures are those repaired most efficiently. Only in the intermediate range is there no quantitative correlation with the repair efficiencies. Examples of poor repair efficiency are found for wobble pairs and for weak pairs. Therefore, we prefer to emphasize the correlation with the extreme cases as outlined above.

The T/G and C/A oppositions (marked in Table 4 by parentheses) also have to be considered with care, because the efficiencies were determined in the neighborhood 5'CCXAA3' (+ strand, (2)) and the thermodynamics in the sequence of Fig. 1. Thus it cannot be concluded safely that the structural differences between T/G and G/T or C/A and A/C as found in the sequence of Fig. 1 correlate with the repair efficiencies. If the structure of the well repairable mismatches T/G and C/A could be determined in the neighborhood 5'CCXAA3', one would expect to find the structure of the wobble base pair.
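The statement that the open class coincides with the pyrimidine/pyrimidine mismatches can be checked mechanically. The following is a minimal sketch in Python, not part of the original analysis (the programs of this work were written in TURBO-PASCAL). The class memberships are taken from the abstract; the names STRUCTURE_CLASS, is_pyr_pyr and classify are illustrative assumptions.

```python
# Structure classes as listed in the abstract of this paper.
# The dictionary and function names are illustrative, not from the original programs.
STRUCTURE_CLASS = {
    "wobble": ["T/G", "G/G", "C/A", "A/A", "A/G"],
    "weak": ["G/T", "A/C", "G/A"],
    "open": ["T/T", "C/T", "T/C", "C/C"],
}

PYRIMIDINES = {"C", "T"}


def is_pyr_pyr(mismatch):
    """Return True if both bases of a mismatch written as X/X' are pyrimidines."""
    x, x_prime = mismatch.split("/")
    return x in PYRIMIDINES and x_prime in PYRIMIDINES


def classify(mismatch):
    """Look up the structure class of a mismatch written as X/X'."""
    for cls, members in STRUCTURE_CLASS.items():
        if mismatch in members:
            return cls
    raise KeyError(mismatch)


# All 12 mismatches, the pyrimidine/pyrimidine subset, and the open class.
all_mismatches = [m for members in STRUCTURE_CLASS.values() for m in members]
pyr_pyr = sorted(m for m in all_mismatches if is_pyr_pyr(m))
open_class = sorted(STRUCTURE_CLASS["open"])

print(pyr_pyr == open_class)
```

The comparison only restates the classification; it says nothing about why pyrimidine/pyrimidine oppositions fail to stack.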

This rule immediately implies a possible mechanism of the repair system: the repair enzymes seem to check the DNA for rigid local deformations rather than for local instabilities of the double strand. Perhaps a mismatch made up of two small pyrimidines is able to escape recognition by the repair system by 'swinging into the helix', as Figure 5 suggests.

Repair of mismatches and formation of mismatches are related problems. Fersht (29) reported on frequencies of mismatch formation during DNA replication. Kramer et al. (2) pointed out that the mismatches which are formed most often show the highest repair efficiency and vice versa. The formation of mismatches may now be understood on a structural basis: the mismatches most likely to occur stack inside the double strand, and partly also as the last nucleotide of the growing double strand, and therefore behave much like regular base pairs, while mismatches forming only loops are of such low transient stability that they cannot be incorporated into the newly synthesized strand. Considering formation and repair of mismatches together, the physical data explain primarily the frequencies of mismatch formation, whereas biological evolution took care of a repair system which eliminates the most frequent errors most efficiently. A more detailed interpretation would also have to consider proof-reading as the step between incorporation and repair. Whatever the mechanistic basis of proof-reading is, it does not cancel the co-operation between incorporation and repair but possibly enhances it.
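As a side calculation for the next-nearest neighbor estimate given earlier in this discussion, the link between a 3% deviation in ΔH and an error of about 10°C in Tm can be made plausible with a rough two-state argument. The sketch below is an assumption-laden illustration, not the partition function treatment used in this work: it assumes Tm = ΔH/ΔS with ΔS unchanged, ignores the strand concentration term, and applies the same relative deviation to the whole helix. The function name tm_shift and the reference temperature of 57°C (chosen within the range of Table 4, 51.5 to 60.1°C) are hypothetical choices.

```python
def tm_shift(tm_celsius, rel_dh_change):
    """Shift of Tm in kelvin for a relative change of the enthalpy.

    Assumes a two-state transition with Tm = dH / dS and constant dS,
    so that a relative change in dH gives the same relative change in
    the absolute melting temperature. This is a simplification, not
    the stack model with loop formation used in the paper.
    """
    tm_kelvin = tm_celsius + 273.15
    return tm_kelvin * rel_dh_change


# Assumed reference melting point of 57 degrees C and a 3% deviation in dH.
shift = tm_shift(57.0, 0.03)
print(f"{shift:.1f} K")
```

Under these assumptions a 3% error in the enthalpy corresponds to a shift of roughly 10 K, the same order of magnitude as the deviation quoted for nearest neighbor models.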

ACKNOWLEDGEMENTS
We thank R. Hecker for help in HPLC purification of the DNA oligomers, and C. Heuer, I. Meier, and P. Loss for help in DNA sequencing and gel electrophoresis analysis. The help of Ms. B. Greuner in preparing the manuscript is gratefully acknowledged. One of us (H.W.) received a fellowship of the Studienstiftung des deutschen Volkes. This work was supported by grants from the Minister für Wissenschaft und Forschung des Landes Nordrhein-Westfalen and the Fonds der Chemischen Industrie.

REFERENCES
1 Pukkila, P.J., Peterson, J., Herman, G., Modrich, P., and Meselson, M. (1983) Genetics 104, 571-582.
2 Kramer, B., Kramer, W., and Fritz, H.-J. (1984) Cell 38, 879-887.
3 Dohet, C., Wagner, R., and Radman, M. (1985) Proc. Natl. Acad. Sci. USA 82, 503-505.
4 Fersht, A.R. and Knill-Jones, J.W. (1981) Proc. Natl. Acad. Sci. USA 78, 4251-4255.
5 Aboul-Ela, F., Koh, D., and Tinoco Jr., I. (1985) Nucl. Acids Res. 13, 4811-4824.
6 Caruthers, M.H. in: "Chemical and Enzymatic Synthesis of Gene Fragments", H.G. Gassen and A. Lang Eds., pp. 71-79, Verlag Chemie, Weinheim, 1982.
7 Fritz, H.-J., Belagaje, R., Brown, E.L., Fritz, R.H., Jones, R.A., Lees, R.G., and Khorana, H.G. (1978) Biochemistry 17, 1257-1267.
8 Handbook of Biochemistry and Molecular Biology, Nucleic Acids I (1975), Fasman, G.D. Ed., 3. edn., p. 589.
9 Colpan, M. and Riesner, D. (1984) J. Chromatogr. 296, 339-353.
10 Schumacher, J., Randles, J.W., and Riesner, D. (1983) Analyt. Biochem. 135, 288-295.
11 Maxam, A.M. and Gilbert, W. (1980) Meth. Enzymol. 65, 499-560.
12 Randles, J.W., Steger, G., and Riesner, D. (1982) Nucl. Acids Res. 10, 5569-5586.
13 Riesner, D. (1985) Ann. Rev. Biochem. 54, 531-564.
14 Poland, D. (1974) Biopol. 13, 1859-1871.
15 Gralla, J. and Crothers, D.M. (1973) J. Mol. Biol. 78, 301-319.
16 Brown, T., Kennard, O., Kneale, G., and Rabinovich, D. (1985) Nature 315, 604-606.
17 Kennard, O. (1985) J. Biomol. Struct. Dynamics 3, 205-226.
18 Kan, L.-S., Chandrasegaran, S., Pulford, S.M., and Miller, P.S. (1983) Proc. Natl. Acad. Sci. USA 80, 4263-4265.
19 Keepers, J.W., Schmidt, P., James, T.L., and Kollman, P.A. (1984) Biopolymers 23, 2901-2929.
20 Chuprina, V.P. and Poltev, V.I. (1983) Nucl. Acids Res. 11, 5205-5223.
21 Gotoh, O. (1983) Adv. Biophys. 16, 1-52.
22 Poland, D. and Scheraga, H.A. (1970) Theory of helix-coil transitions in biopolymers, pp. 188-191, Academic Press, New York & London.
23 Ornstein, R.L., Rein, R., Breen, D.L., and MacElroy, R.D. (1978) Biopolymers 17, 2341-2360.
24 Klump, H., submitted for publication.
25 Fresco, J.R., Broitman, S., and Lane, A.-E. (1980) ICN-UCLA Symposia on Molecular and Cellular Biology, B. Alberts Ed., Academic Press, NY, Vol. XIX, pp. 753-768.
26 Topal, M.D. and Fresco, J.R. (1976) Nature 263, 285-293.
27 Dickerson, R. (1983) Sci. Am. 12/1983, 86-1-2F?
28 Patel, D.J., Kozlowski, S.A., Ikuta, S., and Itakura, K. (1984) Biochemistry 23, 3218-3226.
29 Fersht, A.R., Knill-Jones, J.W., and Tsui, W.-C. (1982) J. Mol. Biol. 156, 37-51.