The Drafting (origin) and Revision (evolution) of the Crick 4×4×4 Genetic Code Table (Part 5)

The stereochemical theory: tantalizing hints but no conclusive evidence

Extensive early experimentation detected, at best, weak and relatively non-specific interactions between amino acids and their cognate triplets (5, 73, 74). Nevertheless, it is not unreasonable to argue that even a relatively weak, moderately selective affinity between codons (or anticodons) and the cognate amino acids could have been sufficient to precipitate the emergence of the primordial code, which subsequently evolved into the modern code in which specificity is maintained by much more precise and elaborate, indirect mechanisms involving tRNAs and aminoacyl-tRNA synthetases. Furthermore, it can be argued that interactions between amino acids and triplets are strong enough for detection only within the context of specific RNA structures that ensure the proper conformation of the triplet; this could be the cause of the failure of straightforward experiments with trinucleotides or the corresponding polynucleotides. Indeed, the modern version of the stereochemical theory, the 'escaped triplet' theory, posits that the primordial code functioned through interactions between amino acids and cognate triplets that resided within amino-acid-binding RNA molecules (75). The experimental observations underlying this theory are that short RNA molecules (aptamers) selected from random-sequence mixtures by amino-acid binding were significantly enriched in the cognate triplets for the respective amino acids (76, 77). Among the eight tested amino acids (phenylalanine, isoleucine, histidine, leucine, glutamine, arginine, tryptophan, and tyrosine) (75), only glutamine showed no correlation between the codon and the selected aptamers. The straightforward statistical test applied in these analyses indicated that the probability of obtaining the observed correlation between the codons and the sequences of the selected aptamers by chance was extremely low; the most convincing results were seen for arginine (75). However, more conservative statistical procedures (applied to earlier aptamer data) suggest that the aptamer-codon correlation could be a statistical artifact (78) (but see (79)).
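
The statistical question here is whether cognate triplets are over-represented in the selected aptamers relative to chance. As a purely illustrative sketch of one such test (a simple permutation test, not the procedure actually used in refs. 75-78), one could compare triplet counts in the selected sequences with counts in shuffled versions of the same sequences; the arginine codon set is real, but the aptamer sequences and the mononucleotide-shuffling null model below are assumptions for illustration only:

```python
import random

# Arginine codons in the RNA alphabet.
ARG_CODONS = {"CGU", "CGC", "CGA", "CGG", "AGA", "AGG"}

def triplet_count(seq, triplets):
    """Count occurrences of any of the given triplets among all overlapping 3-mers."""
    return sum(seq[i:i + 3] in triplets for i in range(len(seq) - 2))

def enrichment_p_value(aptamers, triplets, n_perm=10_000, seed=0):
    """One-sided permutation test: are cognate triplets over-represented in the
    selected aptamers, relative to mononucleotide-shuffled sequences?"""
    rng = random.Random(seed)
    observed = sum(triplet_count(s, triplets) for s in aptamers)
    hits = 0
    for _ in range(n_perm):
        shuffled = ["".join(rng.sample(s, len(s))) for s in aptamers]
        if sum(triplet_count(s, triplets) for s in shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy usage with made-up sequences (real analyses use selected aptamer pools).
pool = ["GGCGAAGACGUCCGAAGG", "AACGACGGCGAAGGCUUA"]
print(enrichment_p_value(pool, ARG_CODONS))
```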

A different kind of statistical analysis has been employed to calculate how unusual the standard code is, given the aptamer-amino-acid binding data (75, 77). A comparison of the standard code with random alternatives has shown that only a tiny fraction of random codes display a stronger correlation with the aptamer selection data than the standard code does (the real genetic code shows a greater codon association than 90.3% of random codes, and a greater anticodon association than 99.8% of random codes). The premises of this calculation can be disputed, however, because the standard code has a highly non-random structure, and one could argue that only comparisons with codes of similar structure are relevant, in which case the results of the aptamer selection might not come out as significant.

On the whole, it appears that the aptamer experiments, although suggestive, fail to clinch the case for the stereochemical theory of the code. As noted above, the affinities are rather weak, so that even the conclusions about their reality hinge on the adopted statistical models. Even more disturbing, for different amino acids the aptamers show enrichment for the codon sequence, the anticodon sequence, or even both (75), a lack of coherence that is hard to reconcile with these interactions being the physical basis of the code.


The adaptive theory: evidence of evolutionary optimization of the code

Quantitative evidence in support of the translation-error minimization hypothesis has been inferred from comparison of the standard code with random alternative codes. For any code its cost can be calculated using the following formula:

Φ(a(c)) = Σ_{c,c′} p(c′∣c) d(a(c′), a(c)),    (I)

where a(c): C → A is a given code, i.e., a mapping of the 64 codons c ∈ C onto the 20 amino acids and the stop signal, a(c) ∈ A; p(c′∣c) is the relative probability of misreading codon c as codon c′; and d(a(c′), a(c)) is the cost associated with the exchange of the cognate amino acid a(c) for the misincorporated amino acid a(c′). Under this approach, the lower the cost Φ(a(c)), the more robust the code is with respect to mistranslation, i.e., the greater the code's fitness.
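
To make formula (I) concrete, here is a minimal Python sketch that evaluates the cost of the standard code under two simplifying assumptions that are ours, not the published parameterization: every single-nucleotide misreading is taken as equally probable, and the amino acid distance d is a 0/1 placeholder (so the cost simply counts non-synonymous misreadings), standing in for a real physicochemical measure such as the polar requirement scale (80):

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (RNA alphabet); '*' marks the stop signal.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

def misreadings(codon):
    """All codons differing from `codon` by exactly one nucleotide."""
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                yield codon[:pos] + b + codon[pos + 1:]

def code_cost(code, dist):
    """Cost (I), with every single-nucleotide misreading c -> c' taken as
    equally probable (unnormalized p(c'|c) = 1)."""
    return sum(dist(aa, code[c2]) for c, aa in code.items() for c2 in misreadings(c))

# Placeholder distance: 0 for synonymous misreadings, 1 otherwise.
toy_dist = lambda a, b: 0.0 if a == b else 1.0

print(code_cost(STANDARD_CODE, toy_dist))
```

With a genuine physicochemical distance plugged in for `dist`, the same loop yields the quantities compared in the studies discussed below.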

The first reasonably reliable numerical estimates of the fraction of random codes that are more robust than the standard code were obtained by Haig and Hurst (16), who showed that, under the assumption that any misreadings between two codons that differ by one nucleotide are equally probable, and if the polar requirement scale (80) is employed as the measure of physicochemical similarity of amino acids, the probability of a random code being fitter than the standard one is P1 ≈ 10⁻⁴. Using a refined cost function that took into account the non-uniformity of codon positions and base-dependent transition bias, Freeland and Hurst showed that the fraction of random codes that outperform the standard one is P2 ≈ 10⁻⁶, i.e., 'the genetic code is one in a million' (81). Subsequent analyses have yielded even higher estimates of the error minimization of the standard code (15, 17, 82, 83).
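
Probabilities such as P1 and P2 are estimated by Monte Carlo comparison of the standard code with randomized codes. The self-contained sketch below (repeating the code table and cost loop from the previous sketch) illustrates this comparison: the randomization shuffles the 20 amino acids among the synonymous codon blocks of the standard code, misreadings to or from stop codons are assigned zero cost as a simplification, and the Kyte-Doolittle hydropathy scale is used only as a convenient stand-in for the polar requirement scale, so the number printed is illustrative rather than a reproduction of P1:

```python
import random
from itertools import product

BASES = "UCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Kyte-Doolittle hydropathy, used here only as a stand-in for the polar
# requirement scale (80) so that the comparison is not degenerate.
HYDROPATHY = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
              "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
              "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}

def dist(a, b):
    """Squared property difference; misreadings to/from stop codons are ignored."""
    if a == "*" or b == "*":
        return 0.0
    return (HYDROPATHY[a] - HYDROPATHY[b]) ** 2

def cost(code):
    """Cost (I) with all single-nucleotide misreadings taken as equally probable."""
    total = 0.0
    for c, aa in code.items():
        for pos in range(3):
            for b in BASES:
                if b != c[pos]:
                    total += dist(aa, code[c[:pos] + b + c[pos + 1:]])
    return total

def random_code(rng):
    """Randomized code: permute the 20 amino acids among the synonymous codon
    blocks of the standard code (20! codes), keeping stop codons in place."""
    aas = sorted(set(STANDARD.values()) - {"*"})
    relabel = dict(zip(aas, rng.sample(aas, len(aas))))
    relabel["*"] = "*"
    return {c: relabel[aa] for c, aa in STANDARD.items()}

rng = random.Random(0)
standard_cost = cost(STANDARD)
n = 10_000
fitter = sum(cost(random_code(rng)) < standard_cost for _ in range(n))
print(f"standard cost = {standard_cost:.1f}; fraction of fitter random codes ≈ {fitter / n:.4f}")
```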

Despite the convincing demonstration of the high robustness of the standard code to misreadings, the translation-error minimization hypothesis seems to have some inherent problems. First, to obtain any estimate of a code's robustness, it is necessary to specify the exact form of the cost function (I), which, even in its simplest form, consists of a specific matrix of codon misreading probabilities and specific costs associated with the amino acid substitutions. The form of the matrix p(c′∣c) proposed by Freeland et al. (81) is widely used (e.g., (15, 83–86)), but the supporting data are scarce. In particular, it has been convincingly shown that mistranslation in the first and third codon positions is more common than in the second position (65, 87, 88), but the transition-biased misreading in the second position is hard to justify from the available data. In part to overcome this problem, Ardell and Sella formulated the first population-genetic model of code evolution, in which the changes in the genomic content of a population are modeled along with the code changes (89–91). This approach is a generalization of the adaptive concept of code evolution that unifies the lethal-mutation and translation-error minimization hypotheses and incorporates the well-known fact that, among mutations, transitions are far more frequent than transversions (92, 93). Essentially, the Ardell-Sella model describes the coevolution of a code with the genes that utilize it to produce proteins and explicitly takes into account the "freezing effect" of genes on a code, which is due to the massive deleterious effect of code changes (90). Under this model, evolving codes tend to "freeze" in structures similar to that of the standard code and with similar levels of robustness.
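
To make the dependence of the cost (I) on p(c′∣c) concrete, here is a hedged sketch of a Freeland-and-Hurst-style weighted misreading scheme: position-dependent misreading rates combined with a transition/transversion bias. The position weights and the bias factor below are illustrative placeholders, not the published parameterization of ref. (81); such weights would replace the uniform p(c′∣c) = 1 used in the earlier sketches.

```python
# Illustrative misreading weights: first and third positions misread more often
# than the second, and transitions (A<->G, U<->C) more often than transversions.
POSITION_WEIGHT = {0: 1.0, 1: 0.1, 2: 1.0}   # placeholder values
TRANSITION_BIAS = 2.0                         # placeholder value
TRANSITIONS = {frozenset("AG"), frozenset("UC")}

def misreading_weight(c_from, c_to):
    """Relative (unnormalized) probability of misreading codon c_from as c_to;
    non-zero only for codons differing at a single position."""
    diffs = [i for i in range(3) if c_from[i] != c_to[i]]
    if len(diffs) != 1:
        return 0.0
    i = diffs[0]
    bias = TRANSITION_BIAS if frozenset((c_from[i], c_to[i])) in TRANSITIONS else 1.0
    return POSITION_WEIGHT[i] * bias

print(misreading_weight("GAU", "GAC"))  # third-position transition (U -> C)
print(misreading_weight("GAU", "GCU"))  # second-position transversion (A -> C)
```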

Another problem with the cost function (I) is that it relies on a measure of the physicochemical similarity of amino acids, and it is clear that no single such measure can be fully adequate. Amino acid substitution matrices such as PAM, which are commonly used for amino acid sequence comparison, appear unsuitable for the study of code evolution because these matrices have been derived from comparisons of protein sequences that are encoded by the standard code and hence cannot be independent of that code (94). Therefore, one must use a code-independent matrix derived from a first-principles comparison of the physicochemical properties of amino acids, such as the polar requirement scale (80). However, the number of possible matrices of this kind is enormous, and there are no clear criteria for choosing the "best" one. Thus, arbitrariness is inherent in the choice of matrix, and its effect on the conclusions about the level of optimization of a code is hard to assess.

A potentially serious objection to the error-minimization hypothesis (95) is that, although the estimates of P1 and P2 indicate that the standard code outperforms most random alternatives, the number of possible codes that are fitter (more robust) than the standard one is still huge. (It should be noted that estimates of code robustness depend on the randomization procedure employed; the one used most frequently involves shuffling the amino acid assignments between the synonymous codon series that are intrinsic to the standard code, so that 20! ≈ 2.4 · 10¹⁸ possible codes are searched; different random code generators can produce substantially different results (86).) It has been suggested that, if selection for the minimization of translation-error effects had been the principal force of code evolution, the relative optimization level of the standard code would be significantly higher than observed (96). The counterargument offered by supporters of the error-minimization hypothesis is that the distribution of random code costs is bell-shaped, with the more robust codes forming a long tail, and because the process of adaptation is non-linear, approaching the absolute minimum is highly improbable (17).

It has been suggested that the apparent robustness of the code could be a by-product of evolution driven by selective forces that have nothing to do with error minimization (97). Specifically, it has been shown that the non-random assignment of amino acids in the standard code can be almost completely explained by incremental code evolution through codon capture or ambiguity reduction. However, this conclusion relies on the exact order of amino acid recruitment into the genetic code (98, 99) and, primarily, on a specific interpretation of the evolution of the biosynthetic pathways for amino acids, which remains a controversial issue.
