The antibody, T cell receptor and MHC loci
Theories of antibody formation
Ten years after the discovery of antibodies by von Behring and Kitasato, Paul Ehrlich, in his Croonian lecture at the Royal Society in London in 1900, suggested that antibodies were cell surface molecules (side-chains) and that interaction of antigen with one such molecule would lead to augmented synthesis of that particular side-chain (see Fig Ehrlich's side-chains theory and Linus Pauling's instructive theory of antibody formation). This theory is the prototype of all selective theories of antibody formation. Modern experiments have changed this theory in only one important respect: different antibodies are not produced by the same cell, as Ehrlich proposed, but by different cells. In fact Nature has gone to considerable lengths to ensure that each cell produces only one type of antibody. Selective theories are attractive to current biological thinking but assume, in modern language, a very large gene pool encoding the different specificities. Such theories were much undermined by the careful work of Karl Landsteiner who, in the thirties, demonstrated that specific antibodies could be raised against a variety of small organic molecules. How could genes encoding antibodies with these specificities have been preserved during evolution? A number of distinguished scientists, including Linus Pauling, could not bring themselves to such a belief and proposed alternative instructive theories. According to early formulations of these theories, the antigen has the ability to direct the synthesis and folding of a specific antibody; hence, the number of genes required for generating numerous antibody specificities could be very small.
Although the difficulties in reconciling instructive theories with the developing knowledge of protein structure and folding soon became obvious and led to their abandonment, the argument for a limited pool of genes was maintained by a number of scientists, and the debate between selective and instructive theories was replaced by the debate between the germ line theory and the so-called somatic theory.
According to the germ line theory each antibody is encoded by an inherited gene that is not modified during somatic development, and thus a very high number of antibody genes must exist in the germ line. According to the somatic theory there exist only a small number of antibody genes but these become greatly diversified in somatic cells through mutation and/or recombination. We now know the number of antibody gene segments in humans; we know in substantial detail how these segments are somatically modified by recombination and mutation, and what the gene pool and the somatic processes each contribute to the specificity and affinity of the antibody response. We also know that certain strategies apply both to antibody and T cell receptor genes (gene recombination), that others apply to antibody but not T cell receptor genes (gene hypermutation), and that the strategy underlying the ability of MHC proteins to interact with different antigens is gene polymorphism. Thus the genetics of vertebrate adaptive immunity has been largely unravelled in the last forty years.
Evidence for somatic recombination of antibody and TCR genes
The hypothesis that antibody genes resulted from the "fusion" of two different genes (a V gene and a C gene) was put forward in 1965 by W Dreyer and J Bennett. It provided an explanation for the protein sequence data that was accumulating at the time and for serological and genetic observations indicating that the same idiotype could be found in association with different isotypes.
The "two genes - one polypeptide" hypothesis was proven correct in the mid-seventies by N Hozumi and S Tonegawa, who demonstrated that probes for the C region or for the whole (V+C) light chain mRNA hybridized with different restriction fragments of embryo genomic DNA but with the same restriction fragment of DNA from a myeloma line. This experiment confirmed that a process of somatic rearrangement had occurred in the antibody-producing cell.
Locus, alleles, polymorphism and haplotype
The location of a gene on a chromosome is defined as its locus. The same term, however, is also applied to gene clusters such as the antibody heavy chain locus, the MHC locus, etc. Alleles are the multiple forms of a gene. In a population a locus is monomorphic if the frequency of the commonest allele exceeds 0.99; it is polymorphic if this frequency is below 0.99. Haplotype (literally the haploid genotype) is the genotype of a set of closely-linked genes.
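The 0.99 criterion can be written as a short check. The following is a minimal sketch in Python, not part of the original notes; the function name and the allele labels are hypothetical, and a locus whose commonest allele has a frequency of exactly 0.99 (a case the definition leaves open) is treated here as polymorphic.

```python
def classify_locus(allele_frequencies, threshold=0.99):
    """Classify a locus as 'monomorphic' or 'polymorphic'.

    allele_frequencies maps each allele name to its population frequency.
    The 0.99 threshold follows the definition in the text; the handling of
    a frequency of exactly 0.99 is an assumption of this sketch.
    """
    if not allele_frequencies:
        raise ValueError("at least one allele frequency is required")
    total = sum(allele_frequencies.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Only the commonest allele matters for the classification.
    commonest = max(allele_frequencies.values())
    return "monomorphic" if commonest > threshold else "polymorphic"


# Hypothetical allele frequencies, for illustration only.
print(classify_locus({"allele_1": 0.995, "allele_2": 0.005}))
print(classify_locus({"allele_1": 0.60, "allele_2": 0.30, "allele_3": 0.10}))
```

The first call describes a locus dominated by a single allele, the second a locus with three reasonably common alleles.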
Banding patterns of metaphase chromosomes
Metaphase chromosomes, unlike interphase chromosomes, are condensed and well amenable to staining and banding with a variety of procedures (Q banding using quinacrine, G banding using Giemsa, etc.). Based on the bands visualised in this way, chromosomes have been assigned a nomenclature in which the letters p and q follow the chromosome number and define the short and long arms respectively. The letter is followed by a number which specifies the region of the arm and the band. A more detailed banding (approximately 2,000 bands in the human haploid genome) is seen using late prophase chromosomes.
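As an illustration of this nomenclature, the sketch below splits a simple location label such as 14q32 into chromosome, arm and region/band digits. It is a hedged example in Python, not part of the original notes: the names are hypothetical, only labels of the plain form "chromosome + p/q + digits" are accepted, and ranges (such as 14q32-33) or sub-bands are deliberately not handled.

```python
import re
from typing import NamedTuple


class CytogeneticLocation(NamedTuple):
    chromosome: str   # "1".."22", "X" or "Y"
    arm: str          # "short" for p, "long" for q
    region_band: str  # digits specifying the region of the arm and the band


# Two-digit chromosome numbers are listed first so that "14" is not read as "1".
_LOCATION = re.compile(r"(1[0-9]|2[0-2]|[1-9]|X|Y)([pq])(\d+)")


def parse_location(label):
    """Parse a simple human cytogenetic label such as '14q32' or '2p12'."""
    match = _LOCATION.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unsupported location label: {label!r}")
    chromosome, arm_letter, digits = match.groups()
    arm = "short" if arm_letter == "p" else "long"
    return CytogeneticLocation(chromosome, arm, digits)


for label in ("14q32", "2p12", "22q11", "6p21"):
    print(label, parse_location(label))
```

The four labels are the locations quoted in these notes for the heavy chain, kappa, lambda and MHC loci.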
Genetic and Physical Maps
The location of a gene or a gene cluster on a chromosome can be established by several means. It can be established genetically, i.e. through linkage analysis or the study of chromosome segregation in somatic cell hybrids. These techniques, however, have rather low resolution (5-10 Mb). Alternatively, maps can be generated by physical means (hybridisation, restriction mapping and sequencing). With sequencing, these techniques reach nucleotide resolution.
Estimating gene numbers in multigene families
There are several procedures for estimating gene numbers: (i) counting bands in Southern blots, (ii) reassociation kinetics of DNA-DNA hybrids in solution, (iii) saturation sequencing of products obtained by PCR amplification, and (iv) complete mapping and sequencing of the locus.
V genes have conserved elements at the 5' end (the octamer promoter box and the leader exon) and at the 3' end (the heptamer/nonamer box) which are suitable for PCR amplification and sequencing. Extensive sequencing of these PCR products thus yielded data highly representative of repertoire size before the relevant loci had been fully mapped and sequenced. The locations of the antibody, T cell receptor and MHC loci in humans are shown in the Table below.
The antibody heavy chain, κ light chain and λ light chain loci
The chromosomal location of the human VH, Vκ and Vλ loci is shown in the Fig above. Links are provided here to very informative and up-to-date graphical representations of the antibody, TCR and MHC loci provided by the International ImMunoGeneTics Information System (IMGT; see the section Further links for the relevant URL). Students should be aware of the colour code used for graphical gene representation on the IMGT server in order to interpret the data correctly.
The human VH locus is located at the telomeric end (i.e. within a few kb of the telomere) of the long arm of chromosome 14 (14q32-33). The position has been confirmed by a variety of cytogenetic and hybridization techniques. The locus comprises 123-129 V segments, the variation being due to insertional polymorphism. The locus spans > 1 Mb.
The human Vκ locus is located near the centromere on the short arm of chromosome 2 (2p12). It comprises 76 Vκ segments, 5 Jκ segments and a single Cκ segment. The locus spans ~1.8 Mb and a large portion (440 kb) is duplicated. The duplication is a recent event, as the two copies are >96% identical at the sequence level. 40 genes reside in the proximal (p) half and 36 in the distal (duplicated) half. 34 genes are functional, 16 have minor defects (one or two nucleotide changes) and 25 are clearly pseudogenes. Six duplicate genes yield the same transcription product and two others yield the same translation product. The transcriptional orientation differs among different groups of genes: approximately half the genes rearrange by a deletion mechanism, the other half by a mechanism involving inversions of Mb-size DNA. Among the haplotypes described, there is one lacking the distal (d) copy of the locus. Subjects with this haplotype have a smaller number of functional genes (18). In the anti-H. influenzae response, which typically uses a d gene (A2), these subjects use a p gene homologue which is heavily hypermutated.
The human Vλ locus is located near the centromere on the long arm of chromosome 22 (22q11). The locus comprises 69-70 genes of which 30 are functional. It spans over 0.9 Mb.
The T cell receptor (TCR) α/δ, β and γ loci
The chromosomal location of the loci encoding the TCR α, β, γ and δ chains is shown in the Fig above. Although there are four types of T cell receptor chains (α, β, γ and δ), there are only three loci because the δ locus is interspersed within the α locus. The human α-δ locus is complex and includes 54 Vα gene segments of which 45 are functional.
The locus also contains a remarkable number (61) of Jα segments, the majority of which are functional. The human β locus was the first TCR locus to be fully mapped and sequenced. It spans ~0.6 Mb and contains 62-65 V genes of which 39-41 are functional. The human γ locus, by contrast, is much simpler: it spans ~0.1 Mb and contains 12-15 V genes of which 4-6 are functional.
The MHC locus
The human MHC locus spans approximately 4 Mb on the short arm of chromosome 6 (6p21) (see Fig below). The locus comprises three regions: a group of around 20 class I genes (telomeric), a group of around 15 class II genes (centromeric) and a heterogeneous group of around 30 class III genes located between the class I and class II clusters. The class III region is gene-rich (a gene every 15 kb) and includes genes for complement factors (C2, C4 and factor B), the heat shock protein HSP70 and the genes for TNFα and β. There are three major polymorphic products of class I genes (defined in man as HLA-A, HLA-B and HLA-C) involved in antigen presentation to cytotoxic T cells. Among the other class I genes there are several pseudogenes but also functional genes (see gene statistics at the MHC locus for up-to-date statistics and the Fig Gene complexity at the MHC locus in man for a summary). The function of a number of these other genes is under investigation. They are less polymorphic than the A, B and C genes and two are expressed in a tissue-specific manner. At least in the mouse one gene is involved in presenting bacterial formyl-methionyl peptides to the cells of the immune system. There are also three major polymorphic products of class II genes (HLA-DR, HLA-DQ and HLA-DP) and several class II pseudogenes (for example the DPA2 and DPB2 genes).
The molecular basis of class I and class II gene polymorphism lies in amino acid substitutions, and most alleles are present at significant frequencies in the population. This argues against neutral mutation and suggests overdominant selection (heterozygote advantage) or frequency-dependent selection. The suggestion made by Peter Doherty and Rolf Zinkernagel that the polymorphism is maintained by viruses is a plausible one. Recombination within the locus occurs at preferred sites. This leads to linkage disequilibrium, i.e. the occurrence of alleles of two genes together at frequencies higher than expected from the frequencies of the single alleles. For example, in North European populations the frequencies of the A1 allele at the A locus and of the B8 allele at the B locus are 0.17 and 0.11 respectively. The predicted frequency of the A1B8 haplotype is therefore 0.17 x 0.11 = 0.0187. Instead, the observed frequency of the A1B8 haplotype is 0.09, giving a linkage disequilibrium value of 0.09 - 0.0187 = 0.0713, i.e. approximately 0.07.
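The linkage disequilibrium calculation can be sketched in a few lines of Python. This is an illustrative example rather than part of the original notes: the function name is hypothetical, the allele frequencies are the A1 and B8 values quoted above, and the observed haplotype frequency of 0.09 is an assumed value (a haplotype cannot be more frequent than either of the alleles that compose it, which the sketch checks).

```python
def linkage_disequilibrium(freq_allele_1, freq_allele_2, observed_haplotype):
    """Return observed minus expected haplotype frequency.

    The expected frequency assumes the two alleles associate at random,
    i.e. it is the product of the two allele frequencies.
    """
    for value in (freq_allele_1, freq_allele_2, observed_haplotype):
        if not 0.0 <= value <= 1.0:
            raise ValueError("frequencies must lie between 0 and 1")
    if observed_haplotype > min(freq_allele_1, freq_allele_2):
        raise ValueError("a haplotype cannot exceed the frequency of its alleles")
    expected = freq_allele_1 * freq_allele_2
    return observed_haplotype - expected


# A1 = 0.17 and B8 = 0.11 as quoted in the text; 0.09 is an assumed observed value.
delta = linkage_disequilibrium(0.17, 0.11, 0.09)
print(f"linkage disequilibrium = {delta:.4f}")
```

A positive value means that the two alleles occur together more often than random association would predict; a value of zero corresponds to linkage equilibrium.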
The distribution of the genetic polymorphism across the sequence of the MHC genes is of remarkable interest and is illustrated for the HLA-A gene in the Fig Genetic polymorphism of the human HLA-A gene. The variability plot shown in panel a clearly indicates that the distribution of amino acid substitutions among HLA-A alleles in the population is non-random (certain positions are highly polymorphic, many are not). Mapping the sequence variability onto the structure of the HLA-A protein (two views, shown in panels a and b) dramatically illustrates this concept and demonstrates that large regions of the protein are unaffected by genetic polymorphism whereas the beta-sheet and the two alpha-helices defining the peptide antigen-binding site are highly polymorphic (colour code for variability, highest to lowest: red, orange, yellow, green and light blue).
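A per-position variability profile of this kind can be computed from aligned allele sequences. The notes do not state which variability measure was used for the figure; the Python sketch below assumes the Wu-Kabat index (number of different amino acids at a position divided by the frequency of the commonest one), and the four short sequences are invented toy data, not real HLA-A alleles.

```python
from collections import Counter


def variability_profile(aligned_sequences):
    """Return the Wu-Kabat variability index for each alignment column.

    Index = (number of different residues at the position)
            / (frequency of the most common residue at the position).
    An invariant position scores 1.0; higher values mean more polymorphism.
    """
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        commonest_frequency = counts.most_common(1)[0][1] / len(column)
        profile.append(len(counts) / commonest_frequency)
    return profile


# Invented toy alignment, for illustration only.
toy_alleles = ["GSHS", "GSHT", "GAHR", "GSHS"]
for position, score in enumerate(variability_profile(toy_alleles), start=1):
    print(position, round(score, 2))
```

In this toy alignment positions 1 and 3 are invariant, while position 4 carries three different residues and scores highest, which is the kind of non-random pattern the variability plot displays.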
Interspersed among the class II genes are two pairs of genes of considerable interest. The TAP1 and TAP2 genes (transporter associated with antigen processing) encode proteins which transport peptides from the cytoplasm to the ER for association with class I molecules. Interestingly, mutations in these genes lead to failure of class I proteins to exit from the ER and to a tendency to dissociate from β2-microglobulin. Two additional genes (LMP2 and LMP7) encode components of the cytoplasmic complex known as the proteasome; this is involved in protein degradation in the cytoplasm and yields peptides for transport by the TAP proteins.
What is the significance of the clustering of class I, class II, complement, TAP and LMP genes in the same complex? While functionally interdependent gene clusters are frequent in prokaryotic genomes, where they are subject to coordinated regulation, eukaryotic gene clusters of this type are infrequent. Exceptions include the Hox cluster, the MHC and the globin cluster. There is little evidence for coordinated regulation of the MHC genes but other processes (selection for particular haplotypes, gene conversion) may have played a role in preserving the cluster.
Topics for Exam
- Theories of antibody formation
- Somatic rearrangement of antibody genes
- Estimating gene numbers at complex loci
- The two genes - one polypeptide hypothesis
- The antibody gene loci
- The TCR gene loci
- The MHC locus
- MHC polymorphism
 Ehrlich P. On immunity with special reference to cell life. Proc R Soc London (Biol) 66:424 (1900)
 Pauling L. A theory of the structure and process of formation of antibodies. J Am Chem Soc 62: 2643 (1940)
 Dreyer WJ, Bennett JC. The molecular basis of antibody formation: a paradox. Proc Natl Acad Sci USA 54:864-9 (1965)
 Hozumi N, Tonegawa S. Evidence for somatic rearrangement of immunoglobulin genes coding for variable and constant regions. Proc Natl Acad Sci USA 73:3628-32 (1976)
 Tonegawa S. Somatic generation of immune diversity. Nobel lecture (1987)
 International ImMunoGeneTics Information System (IMGT; a global reference in Immunogenetics and Bioinformatics at the University of Montpellier)