2 α-Thalassaemia

LUIGI F. BERNINI

Emeritus Professor in Biochemical Genetics

Emeritus Lecturer in Biochemical Genetics at Leiden University

CORNELIS L. HARTEVELD PhD
Researcher in Molecular Genetics

Institute of Human Genetics of the Medical Faculty, University of Leiden, Sylvius Laboratory, Wassenaarseweg 72, 2333 AL Leiden, The Netherlands

α-Thalassaemias are genetic defects extremely frequent in some populations and are characterized by the decrease or complete suppression of α-globin polypeptide chains. The gene cluster, which codes for and controls the production of these polypeptides, maps near the telomere of the short arm of chromosome 16, within a G + C rich and early-replicating DNA region. The genes expressed during the embryonic (ζ) or fetal and adult stage (α2 and α1) can be modified by point mutations which either affect the processing-translation of mRNA or make the polypeptide chains extremely unstable. Much more frequent are the deletions of variable size (from ≈3 to more than 100 kb) which remove one or both α genes in cis or even the whole gene cluster. Deletions of a single gene are the result of unequal pairing during meiosis, followed by reciprocal recombination. These unequal cross-overs, which also produce α gene triplications and quadruplications, are made possible by the high degree of homology of the two α genes and of their flanking sequences. Other deletions involving one or more genes are due to recombinations which have taken place within non-homologous regions (illegitimate recombinations) or in DNA segments whose homology is limited to very short sequences. Particularly interesting are the deletions which eliminate large DNA areas 5' of the ζ gene or of both α genes. These deletions do not include the structural genes but, nevertheless, suppress completely their expression. Larger deletions involving the tip of the short arm of chromosome 16 by truncation, interstitial deletions or translocations result in the contiguous gene syndrome ATR-16. In this complex syndrome α-thalassaemia is accompanied by mental retardation and variable dysmorphic features. The study of mutations of the 5' upstream flanking region has led to the discovery of a DNA sequence, localized 40 kb upstream of the ζ-globin gene, which controls the expression of the α genes (α major regulatory element or HS-40). In the acquired variant of haemoglobin H (HbH) disease found in rare individuals with myelodysplastic disorders and in the X-linked mental retardation associated with α-thalassaemia, a profound reduction or absence of α gene expression has been observed, which is not accompanied by structural alterations of the coding or controlling regions of the α gene complex. Most probably the acquired α-thalassaemia is due to the lack of soluble activators (or presence of repressors) which act in trans and affect the expression of the homologous clusters and are coded by genes not (closely) linked to the α genes. The ATR-X syndrome results from mutations of the XH2 gene, located on the X chromosome (Xq13.3) and coding for a trans-acting factor which regulates gene expression. The interaction of the different α-thalassaemia determinants results in three phenotypes: the α-thalassaemic trait, clinically silent and presenting only limited alterations of haematological parameters; HbH disease, characterized by the development of a haemolytic anaemia of variable degree; and the (lethal) Hb Bart's hydrops fetalis syndrome. The diagnosis of α-thalassaemia due to deletions is implemented by the electrophoretic analysis of genomic DNA digested with restriction enzymes and hybridized with specific molecular probes. Recently polymerase chain reaction (PCR) based strategies have replaced the Southern blotting methodology. The straightforward identification of point mutations is carried out by the specific amplification of the α2 or α1 gene by PCR followed by the localization and identification of the mutation with a variety of screening systems (denaturing gradient gel electrophoresis (DGGE), single strand conformation polymorphisms (SSCP)) and direct sequencing.

Key words: α-thalassaemia; α-globin; haemoglobin variants; haemoglobinopathies; α-thalassaemia deletions; α-thalassaemia point mutations.

Baillière's Clinical Haematology, Vol. 11, No. 1, March 1998. ISBN 0-7020-2460-0. 0950-3536/98/010053 + 38 $12.00/00. Copyright © 1998, by Baillière Tindall. All rights of reproduction in any form reserved.

The term α-thalassaemia relates to a class of inherited disorders of haemoglobin synthesis in which the production of α-globin chains is partially or completely suppressed. The interaction between numerous different α-thalassaemia alleles results in the expression of a large spectrum of phenotypes ranging from a silent trait to a very severe anaemia already lethal in utero or soon after birth.

Between the middle and the end of the 1950s the existence of a new inherited disorder involving the production of α-globin chains was suggested by the presence, in patients with microcytic hypochromic anaemia, of unusual haemoglobins having different physical properties and quaternary structure (Minnich et al, 1954; Gouttas et al, 1955; Rigas et al, 1955). These abnormal haemoglobins were later identified as tetramers of either γ- or β-globin chains and referred to as haemoglobin Bart's and HbH (Hunt and Lehmann, 1959; Jones et al, 1959). Their presence in the haemolysates of the patients was correctly interpreted as the consequence of a relative or absolute deficiency of α-globin chains resulting in the defective production of HbA and HbF and in the aggregation into tetramers of unpaired β-like polypeptide chains (Ingram and Stretton, 1959).

The clinical and haematological findings in patients with HbH disease and their relatives suggested the occurrence of at least two forms of α-thalassaemia trait, an almost asymptomatic one with minimal alterations of the haematological indices (α-thalassaemia 2) and a second with reduced mean corpuscular haemoglobin (MCH) and mean corpuscular volume (MCV), normal HbA2 values and the presence of HbH inclusion bodies in rare erythrocytes (α-thalassaemia 1) (Na-Nakorn et al, 1965; Pootrakul et al, 1967; Wasi et al, 1969). In the same period the direct measurement in vitro of the relative rates of synthesis of α and β chains became possible (Heywood et al, 1964). The application of this new technique demonstrated the increasing deficiency of α-globin chain production in the α-thalassaemia trait, HbH disease and hydrops fetalis (Weatherall et al, 1965; Kan et al, 1968).

At the end of the sixties it was, however, still impossible to decide whether the expression of α-globin chains was controlled by a single gene or by two similar, duplicated genes. According to the first model, α-thalassaemia might have been due to the interactions of a wild type gene and two (or more) thalassaemic alleles having different degrees of severity. Alternatively, the inactivation of one or more genes in cis or trans of a duplicated cluster could have accounted for both the α-thalassaemia traits, HbH disease and Hb Bart's hydrops fetalis (Kattamis and Lehmann, 1970). By the middle of the seventies it appeared very likely that the production of α-globin chains was controlled by two linked genes. This conclusion was supported by the analysis of the interaction between α-thalassaemia 1 and Hb Constant Spring (Milner et al, 1971) and by the finding of people carrying two α-globin chain structural variants in addition to HbA (Hollan et al, 1971; Bernini et al, 1970; Meloni et al, 1980). A direct confirmation of the multiplicity of α genes was obtained after the purification of α-mRNA, the preparation of cDNA probes and the gene quantitation by cDNA/DNA hybridization in solution (reviewed in Bunn and Forget, 1986). The use of the latter technique demonstrated clearly the presence of four α genes in normal individuals, the complete absence of α-globin genes in the Hb Bart's hydrops fetalis syndrome and the deletion of one, two and three genes in the α-thalassaemia 2 trait, in the α-thalassaemia 1 trait, and in HbH disease, respectively (Ottolenghi et al, 1974; Kan et al, 1975).

The cDNA/DNA hybridization experiments could not, however, explain the absence of hydrops fetalis in African populations in spite of the high frequency of the α-thal 1 trait, nor the apparent deletion of only two α genes in some individuals with HbH disease. The solution to these last problems was provided by the application of two new technologies: gene mapping and gene cloning. These techniques allowed the determination of the position of the different genes and pseudogenes along the α gene cluster, the assessment of intergenic distances, the fine structure of α-like genes and of their non-translated 5' and 3' regions and the size of the introns (reviewed in Bunn and Forget, 1986). In the same period, by mouse-human somatic cell fusion and cDNA-DNA hybridization the α genes were localized on chromosome 16 (Deisseroth et al, 1977) and later mapped on the short arm of the chromosome (Barton et al, 1982). By that time the general organization of the α-globin gene cluster was, even if not in the finest detail, sufficiently

Table 1. Physical map of the human α-globin locus between +60 from the ζ-globin gene cap site to the telomere.

[Map not reproduced. Labels recoverable from the extraction: a scale in kb; the telomere (5' Tel); Alu repeats; the genes -99, Dist1, MPG and Prox1 and the IL9-R pseudogene; the unique sequences L0-L3; HS-40; the 5'-HVR, inter-ζ HVR and 3'-HVR; and, in panels (A) and (B), the extents of the individual deletions.]

Sources cited in the map, in order of first appearance: Lacerra et al (1991); Embury et al (1980); Zhao et al (1991); Kulozik et al (1988); Indrak et al (1993); Vickers and Higgs (1989); Fischel-Ghodsian et al (1988); Sabath et al (1994); Kutlar et al (1989a); Harteveld et al (1997a); Fichera et al (1997); Higgs et al (1985); Gonzalez-Redondo et al (1989); Vandenplas et al (1987); Nicholls et al (1985); Pressley et al (1980); Villegas et al (1994); Villegas et al (1990); Fei et al (1992a); Shalmon et al (1994); Gonzalez-Redondo et al (1988); Lamb et al (1993); Felice et al (1984); Harris et al (1990); Waye et al (1992); Harteveld et al (1997b); Lamb et al (1989); Hatton et al (1990); Roman et al (1992); Roman et al (1991); Wilkie et al (1990b); Liebhaber et al (1990); Flint et al (1994); Flint et al (1996).

The genes are indicated as full boxes, the pseudogenes as open boxes. The hypervariable regions (HVRs) are indicated as zigzag lines. The unique sequences L0-L3 are indicated as small open boxes; the position of the element HS-40 involved in regulation of α-globin gene expression is indicated by a vertical arrow. The directions of transcription and localization of the Dist1, MPG, Prox1 and -99 genes are indicated as black bars, on the line when the direction of transcription is towards the centromere and below the line when in the opposite direction. The IL9-R pseudogene is indicated as an open bar. The genes known as 16pHQG;1 and 2 flanking the telomere are of undetermined function and probably represent pseudogenes. Alu repeats are depicted as vertical lines below the physical map (Flint et al, 1997). (A) Deletions involving one α-globin gene. The α⁺-thalassaemia deletions are shown with the deletion extent as full black boxes or unfilled boxes if the endpoints have not been precisely determined. The symbol '-' indicates the deletion of a complete α gene; '(α)' indicates a partial deletion. An open end indicates that the deletion length has not been determined. (B) Deletion of two


[Figure not reproduced. Panel (A) showed the aligned mouse and human α-MRE sequences with the binding sites GATA/A-D, CACC/A-D, NF-E2/A-B and the AG box marked; panel (B) showed the order of these sites in mouse and in human.]

Figure 1. (A) Alignment of the mouse and human α-MRE. Bars represent identical bases, asterisks mark mismatches in potential transcription factor binding sites. Individual binding sites (bold) are alphabetically marked from 5' to 3'. (B) Putative protein binding sites at the mouse and human α-MRE. Note the absence of CACC boxes B, C and D and GATA-1 site D but the conservation of the AG box in the 3' part of the mouse α-MRE.


clear to account for the genetics of α-thalassaemia syndromes (Lauer et al, 1980).

STRUCTURE AND ORGANIZATION OF THE α-GLOBIN GENE CLUSTER

The α-globin gene cluster was finally located at position 16p13.3 by Breuning et al (1987). The embryonic and adult α-like genes are contained in a DNA cluster of about 30 kb inserted in a large isochore of the H3 family contiguous to the telomere region (reviewed in Higgs, 1993). These isochores are preferentially located in subtelomeric regions, show a high GC content (60%) and contain non-methylated CpG-rich islands and hypervariable minisatellite sequences. They are, in addition, early replicating and contain a large number of 'housekeeping' genes (reviewed in Bernardi, 1989). The region extending for about 300 kb from the terminal repeats of the short arm of chromosome 16 shows a very high density of Alu family repeats which constitute close to 26% of the whole sequence (Flint et al, 1997). The frequency of the repeats along the sequence is not uniform but seems to decrease in gene-rich regions. The genomic organization of the region 5' of the α gene cluster conforms to these general rules and shows in a relatively limited area (see Table 1) the existence of at least four genes, 16pHQG;4, Dist1, MPG and Prox1, which are independently regulated and expressed constitutively (Kielman et al, 1993, 1996; Vyas et al, 1995).

The study of the α-globin upstream flanking region (α-UFR) has been stimulated by the discovery that a large deletion which eliminates part of this region (Hatton et al, 1990) is responsible for the silencing of the otherwise intact embryonic and adult α genes in cis. This situation seemed analogous to the β-thalassaemia caused by the deletion of the β locus control region (β-LCR) elements 5' to the β-globin gene cluster (Van der Ploeg et al, 1992). A major regulatory element (α-MRE) was indeed found 40 kb upstream of the embryonic ζ gene and referred to as HS-40 (Higgs et al, 1990) (see Table 1). The cloning and complete characterization of the homologous region in mouse has revealed that the mouse α-UFR-α-globin cluster is located on chromosome 11 in a subcentromeric position and, although smaller than its human homologue, contains the same non-globin genes identified in humans, in the same orientation with respect to the α genes (Kielman et al, 1993). The α-MRE identified in humans was also found in the mouse α-UFR, 26 kb upstream of the mouse embryonic globin gene (Hba-x) (see Figure 1) (Kielman et al, 1994). The α-MRE includes in a 350 bp sequence several binding sites for transcription factors (Jarman et al, 1991). The functional analysis of the α-MRE in in vitro systems and in transgenic mice indicates that this element is necessary for the expression of α-globin genes (Bernet et al, 1995; Chen et al, 1997). In these experimental systems the α-MRE behaves as a strong enhancer of the α- as well as of the β- and γ-globin genes but does not show the copy number dependence observed for the β-LCR (Sharpe et al, 1992). Human and mouse α-UFR regions are highly conserved and, because of the peculiar gene density observed in GC-rich areas of the genome, the non-globin genes are so tightly packed that essential regulatory elements such as α-MREs and numerous erythroid hypersensitive sites are located 10 kb

[Figure not reproduced. Panel (A) showed the α-globin gene cluster on chromosome 16 with HS-40 and the β-globin gene cluster on chromosome 11 with the β-LCR, with the haemoglobins Hb Gower II, Hb Portland I, Hb Portland II, HbA and HbA2 framed between them; panel (B) plotted the percentage of total globin synthesis against prenatal and postnatal age in weeks, with the site of erythropoiesis along the top.]

Figure 2. (A) Schematic representation of the α-globin (upper) and β-globin (lower) gene clusters and their chromosomal location. The embryonic, fetal and adult haemoglobins coded by the different genes during development are indicated in frames between the clusters. Genes are shown as full boxes, pseudogenes as open boxes. The θ gene of undetermined function is indicated in grey. HVRs are indicated as zigzag lines. The positions of the regulatory elements HS-40 and the β-LCR, consisting of hypersensitive sites 1-6, are indicated by vertical arrows. (B) Graphical representation of the expression of human globin genes during development. The site of erythropoiesis is indicated at the top of the figure. Adapted from Weatherall and Clegg (1981, The Thalassaemia Syndromes, 3rd edn. Oxford: Blackwell).


within the transcription unit of PROX1, the most centromeric gene of the α-UFR. In addition, human MPG and PROX1 genes overlap, the 3' end of the PROX1 gene being located within the last intron of MPG (Table 1) (Kielman et al, 1996). The structural and functional interrelationships between the α gene cluster and the α-UFR have probably played an important role in the systemic evolution of this gene complex.

Within a DNA sequence of about 30 kb the α-globin gene cluster includes in the direction 5'→3' the embryonic gene ζ2, three pseudogenes (ψζ1, ψα2, ψα1), the two α2 and α1 genes duplicated in tandem and the θ gene (Figure 2) (Lauer et al, 1980). As in the β-like globin cluster, the 5'→3' sequence of α genes along the chromosome reflects their order of activation and expression during ontogenesis. The two α genes, 5'-α2-α1-3', are the result of a duplication which took place about 60 million years ago (reviewed in Collins and Weissman, 1984). Since then, instead of diverging, the two genes have undergone, through repeated rounds of gene conversion and cross-over fixation, a process of concerted evolution and have remained virtually identical. They differ only by a seven-nucleotide insertion near the end of the second intron and by base substitutions, also in the second intron, at positions 509 and 573 from the cap site, and they diverge considerably in the region downstream of the third exon (Figure 3) (Michelson and Orkin, 1983). α2 and α1 genes are inserted

[Figure not reproduced. Labels recoverable from the extraction: the α2 and α1 genes drawn 5' to 3', the homology boxes Z1 and Z2 (with Z2 subdivided into a, b and c), and positions 509, 568-569 (GGCCCTC), 573, 740 and 804.]

Figure 3. Schematic representation of the duplicated α-globin genes and the location of the X, Y and Z homology boxes. Below the picture the regions of homology between the Z boxes of the α1- and α2-globin genes are depicted. The overall homology is 98.5% and is subdivided by a 7 bp insertion in the IVS 2 of α1 into Z1 (99% homology) and Z2 (93% homology). Z2 can be further subdivided into Z2a (99%), Z2b (78%) and Z2c (100%, surrounding the polyadenylation signal). The exact nucleotide differences between α1 and α2 are indicated at the bottom of the figure. Dots mark the sequence similarity and the numbers indicate the position from the cap site of the α2 gene. Adapted from Higgs et al (1984, Nucleic Acids Research 12: 6965-6977).


within three larger homologous sequences, X, Y and Z, also duplicated in tandem and separated by non-homologous DNA regions (Figure 3) (Lauer et al, 1980). Several hypervariable repetitive minisatellite sequences are embodied along the whole cluster. These minisatellites were identified because restriction enzymes specific for regions flanking the hypervariable sequences revealed in the normal population a range of variable-length alleles inherited in a Mendelian fashion (reviewed by Higgs et al, 1989). The HVRs are located 70 kb upstream of the ζ2 gene (5'-HVR), between the ζ2 and the ψζ1 genes (inter-ζ HVR), within the introns of the ζ2 and ψζ1 genes (intra-ζ HVR) and at the 3' end of the cluster (3'-HVR). The unit size of each repeat ranges from 5 to 57 bp and the number of repeats from 5-55 in the 5'-HVR to 70-450 in the 3'-HVR. These hypervariable minisatellites have been (and still are) extremely useful as genetic markers in the derivation of α-globin gene haplotypes and their study in families and populations (Waye and Eng, 1994).
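The relation between repeat number and allele length described above can be sketched numerically. The following Python fragment is only an illustration: the function name, the 17 bp unit size and the `flank_bp` parameter are assumptions introduced here (the text gives only the 5 to 57 bp range of unit sizes and the repeat-number ranges), and the fragment length is modelled simply as the repeat array plus whatever flanking DNA lies between the two restriction sites.

```python
def allele_length_range(unit_bp, min_repeats, max_repeats, flank_bp=0):
    """Return the (shortest, longest) restriction fragment in bp.

    unit_bp     : size of one minisatellite repeat unit (hypothetical value)
    min_repeats : smallest number of tandem repeats observed
    max_repeats : largest number of tandem repeats observed
    flank_bp    : non-repetitive DNA between the flanking restriction sites
                  (hypothetical; depends on the enzyme used)
    """
    if unit_bp <= 0 or min_repeats < 0 or max_repeats < min_repeats or flank_bp < 0:
        raise ValueError("inconsistent repeat parameters")
    shortest = flank_bp + unit_bp * min_repeats
    longest = flank_bp + unit_bp * max_repeats
    return shortest, longest


# 3'-HVR repeat numbers from the text (70-450); 17 bp is an illustrative
# unit size inside the quoted 5-57 bp range, not a value taken from the text.
print(allele_length_range(17, 70, 450))
```

Under these assumptions the alleles of a single locus span several kilobases, which is why they are readily resolved as fragments of different length after restriction digestion.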

EXPRESSION OF α-LIKE AND β-LIKE GENES DURING DEVELOPMENT AND IN ADULT LIFE

Embryonic ζ and ε chains are already produced at 5 weeks gestation in the primitive erythroblasts of the yolk sac (Peschle et al, 1985). From the sixth week onwards the embryonic chains are progressively replaced by adult (α, β) and fetal (γ) polypeptides (Figure 2). The switch is associated with a change in the transcription of the relative genes within the same erythropoietic lineage and not with the replacement of primitive progenitors by a different cell lineage (Stamatoyannopoulos et al, 1987). The occurrence of an ε→γ switch within the same cell is proved by the presence in the erythroblast population of cells containing either ε chains or γ chains or both kinds of proteins (Mesker et al, in press). The transition ζ→α has not been documented at single-cell level in humans. In mouse the occurrence of an embryonic→fetal switch has been challenged by Leder et al (1997). These authors have shown by in situ hybridization that ζ and α mRNAs are simultaneously present in the earliest erythrocyte population and that mice homozygous for a knock-out of the ζ gene develop normally. This experiment suggests that in mice the complete lack of the ζ peptide is not lethal and that the embryonic chain is largely redundant and can be replaced by the adult α which is expressed at the same time.

During fetal and adult life the globin genes of the α and β clusters are expressed in a co-ordinated fashion, so that the ratio of α-like and β-like chains synthesized always remains very close to 1. The production of globin chains, however, is not kept balanced by any particular mechanism implying mutual dependence. As evident in α- and β-thalassaemic disorders, the complete suppression of the synthesis of one chain does not prevent the production of excessive amounts of the other type of chain. Because unpaired globin chains are unstable and aggregate into functionally useless tetramers, the maintenance of a balanced α : non-α synthetic ratio is of critical importance in the viability of erythrocytes and oxygen exchange.

The physiological 1 : 1 ratio of globin chains is not the consequence of a balanced mRNA output. In fact, the α-globin mRNA : β-globin mRNA ratio is equal to 2.6 (Hunt et al, 1980; Lin et al, 1994) or is even higher (about 4) according to recent reports (Smetanina et al, 1996) and there is good evidence that the balance at the protein level is brought about by a higher translation efficiency of β mRNA. The analysis of the translational profiles of globin mRNAs in reticulocytes has revealed a preferential sequestration of α-globin mRNA into the pre-80S fractions. In addition β-globin chains are assembled on polysomes larger than those sustaining α-globin mRNA translation. This indicates that a differential inhibition of the initiation is responsible for the lower translational yield of α-globin mRNA (Lodish, 1976; Binninger and Weber, 1984). The structural features most important in enhancing the rate of initiation are the length of the leader sequence and the reduction or absence of a secondary structure (Kozak, 1991). The experiments carried out by Kozak (1994) with rabbit β and α mRNAs in a translation-competent reticulocyte lysate system suggest that the reduced translatability of α-globin mRNA is due to a shorter 5' leader sequence and to a much higher degree of secondary structure of the leader in comparison with the β-globin mRNA. The translational efficiency of the α-globin mRNA was indeed improved by increasing the length of the leader or introducing into the sequence mutations able to minimize secondary structure.

There is complete agreement on the relative ratio of α-globin mRNA transcribed by the α2 and α1 genes. This ratio equals 2.6 according to the experiments of Liebhaber et al (1986) and an average of 2.8 as reported by Smetanina et al (1996).
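The arithmetic behind the ratios quoted above can be made explicit with a short worked calculation. The Python sketch below is illustrative only: the function names are introduced here, translational efficiency is taken simply as chains synthesized per mRNA molecule, and the chain ratio is assumed to be exactly 1 : 1. Under those assumptions the efficiency of β mRNA relative to α mRNA equals the α : β mRNA ratio divided by the α : β chain ratio, and an α2 : α1 mRNA ratio r corresponds to an α2 share of r / (1 + r) of the total α mRNA.

```python
def relative_translation_efficiency(mrna_ratio, protein_ratio=1.0):
    """Efficiency per mRNA of species B relative to species A.

    mrna_ratio    : A mRNA : B mRNA (for example alpha : beta)
    protein_ratio : A chains : B chains synthesized (assumed 1.0 by default)
    """
    if mrna_ratio <= 0 or protein_ratio <= 0:
        raise ValueError("ratios must be positive")
    # (chains per mRNA for B) / (chains per mRNA for A)
    return mrna_ratio / protein_ratio


def share_of_first(ratio):
    """Fraction contributed by the first species given a first : second ratio."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1.0 + ratio)


# alpha : beta mRNA ratios quoted in the text
for alpha_to_beta in (2.6, 4.0):
    print(alpha_to_beta, relative_translation_efficiency(alpha_to_beta))

# alpha2 : alpha1 mRNA ratios quoted in the text
for a2_to_a1 in (2.6, 2.8):
    print(a2_to_a1, round(share_of_first(a2_to_a1), 3))
```

On these figures β mRNA would need to be translated roughly 2.6 to 4 times more efficiently than α mRNA to give balanced chain synthesis, and the α2 gene would supply a little over 70% of the α-globin mRNA. The share computed here refers to mRNA only; it says nothing by itself about the protein output of each gene.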
The translational profile of the mRNA expressed by the two genes is also the same (Shakin and Liebhaber, 1986), implying comparable translational efficiencies. Analysis of mRNA ratios in human embryos has revealed that the levels of α2 and α1 mRNA are the same until the eighth week of gestation; afterwards the α2 mRNA assumes the dominant expression which will be maintained in the adult. The same developmental switch has been observed in transgenic mice carrying the whole α-globin gene cluster (Albitar et al, 1992). Assuming a comparable translational efficiency, a close correlation is expected, therefore, between the amount of messenger and the levels of protein synthesized. A recent reassessment of the quantities of stable α-globin chain variants expressed by the α2- and α1-globin genes in heterozygotes suggests, however, that the level of α2 mRNA is about twice the relative amount of protein translated (Molchanova et al, 1994). The same authors impute the altered mRNA : protein ratio to a less efficient translation of the α2 mRNA. The difference reported by Molchanova et al between the output of α-globin chains coded by the α2 and α1 genes, even if smaller than previously realized, is quite significant. It seems difficult to challenge, therefore, the concept of the existence of major (α2) and minor (α1) globin genes.

A large number of variables (including protein and mRNA stability, translational efficiency, influence of cis- or trans-acting factors etc.) determine the output of globin chains coded by the two duplicated genes. The concurrence of many factors may generate an appreciable degree of variability in the relative expression of α-globin genes in cis (α2:α1 ratios) and in trans (α2:α2 or α1:α1 ratios). This variability may suggest the existence of different arrangements within the α gene cluster (haplotypes) which might be sufficient to account for the apparently unusual phenotypes mentioned above (Pagnier et al, 1982).

The dominance of the 5' α gene has also been demonstrated in other species (Snyder, 1980). In mouse the interaction of a deletion-type α⁰-thalassaemia with an α⁺-thalassaemia determinant generated by disruption by gene targeting of the 5' α-globin gene results in a very severe HbH disease already lethal in utero (Chang et al, 1996). This outcome confirms that the 5' α-globin gene is the predominant one also in mouse. In sheep the α-globin genes are duplicated (Iα and IIα) and the 5' Iα gene again shows dominance, contributing, per haplotype, 32% of the α-globin chains against the 18% expressed by the IIα gene (Vestri et al, 1983a,b). Triplication and quadruplication of α genes represent a relatively common genomic rearrangement in sheep. Vestri et al (1994) have been able to show a gradient of expression from the 5' to the 3' gene, at both mRNA and protein level. The α113His gene, for instance, expresses 18% of the total haplotype output when the gene is at the second position from the 5' end but only about 1% when it is located at the fourth. The authors rule out protein instability, variable translational efficiency and transcriptional interference as possible reasons for the decreasing expression and suggest that a competition which takes place between promoters for interaction with a common enhancer might be important in the generation of such a gradient.

α-Thalassaemia determinants

Depending on the production of α-globin chains, α-thalassaemia determinants can be classified into two groups: α⁰ and α⁺. In α⁰-thalassaemia the production of α chains by the affected chromosome is completely abolished; α⁺-thalassaemia is defined by the variable amounts of α polypeptide chains which can still be expressed in cis to the thalassaemic cluster. This nomenclature, which describes α-thalassaemias in terms of α-globin chain expression/haplotype, has replaced the previous classification of these defects into severe (α-thalassaemia 1) and mild (α-thalassaemia 2) forms (Weatherall and Clegg, 1981).

α⁰-Thalassaemia is usually the result of deletions which eliminate (ζ) and α genes in cis, the entire α cluster together with the main regulatory sequence α-MRE or only the HS-40 regulatory element. Another (less frequent) cause of α⁰-thalassaemia is the occurrence of a point mutation within the single (otherwise intact and presumably functional) α gene left over in a haplotype after the deletion of the other partner.

α⁺-Thalassaemia is mainly the result of the deletion of a single gene within the cluster or the consequence of a thalassaemia-generating point mutation of either the α2 or the α1 gene. Some non-deletional mutations may not completely suppress the gene expression; furthermore, the same kind of mutation, because of the dominance of the α2 gene over α1, may give a different degree of α chain imbalance when located on either gene. Finally, some mutations of the α2 gene interfere with the transcription of the downstream α1 gene and induce an α-globin chain deficiency more pronounced than that expected from the inactivation of only one gene in cis (Whitelaw and Proudfoot, 1986).

The homozygosity or compound heterozygosity for α⁰-thalassaemia or the varieties of α⁺-thalassaemia determinants and their interaction with the normal haplotype account for an almost continuous range of clinical severity spanning from lethality in the embryonic or fetal period to a very mild trait. In order of increasing severity the thalassaemic haplotypes can be classified as follows: α2α1T < α2-