Inverse relationship between genetic diversity and epigenetic complexity
- The Burnham Institute, La Jolla, CA 92037
This manuscript is a preprint. A published version is available at:
http://www.amazon.com/Cancer-Epigenetics-Trygve-Tollefsbol/dp/1420045792/ref=sr_1_1?ie=UTF8&s=books&qid=1226425804&sr=1-1 (Peer Reviewed). The main idea of this paper, i.e., the inverse relationship between genetic diversity and epigenetic complexity, has been published as a peer-reviewed book chapter. Citation: Huang, S. (2008) Histone methylation and the initiation of cancer. Cancer Epigenetics, Ed. Tollefsbol, T., CRC Books.
- Document Type:
- Manuscript
- Date:
- Received 13 January 2009 19:33 UTC; Posted 15 January 2009
- Subjects:
- Genetics & Genomics, Bioinformatics, Evolutionary Biology
- Abstract:
Early studies of molecular evolution revealed a correlation between genetic distance and time of species divergence. This observation prompted the molecular clock hypothesis and, in turn, the ‘Neutral Theory’, which nevertheless remains an incomplete explanation: it predicts a constant mutation rate per generation, whereas empirical evidence suggests a constant rate per year. In recent years, data inconsistent with the molecular clock hypothesis, showing no correlation between genetic distance and time of divergence, have steadily accumulated. It has therefore become a challenge to find a testable idea that can reconcile these seemingly conflicting data sets. Here, an inverse relationship between genetic diversity and epigenetic complexity was deduced from a simple intuition about building complex systems: genetic diversity, i.e., genetic distance or dissimilarity in DNA or protein sequences between individuals or species, is restricted by the complexity of epigenetic programs. This inverse relationship leads logically to the maximum genetic diversity (MGD) hypothesis, which proposes that macroevolution from simple to complex organisms involves a punctuational increase in epigenetic complexity that in turn causes a punctuational loss of genetic diversity. The hypothesis fully accepts Neo-Darwinism for what it really is (a theory of microevolution) and explains all the major facts of evolution. Importantly, it predicts the most remarkable result of molecular evolution, the genetic equidistance result, which originally prompted the molecular clock hypothesis.
Discussion
Thank you for alerting me to your paper. Something similar to your observation in plants has long been noted in animals. The first report of this kind that I know of is this paper:
Albumin Evolution in Frogs: A Test of the Evolutionary Clock Hypothesis – DG Wallace, LR Maxson, AC Wilson – Proceedings of the National Academy of Sciences, 68, No. 12, pp. 3127-3129, December 1971
Abstract: Frogs are an ancient group compared to placental mammals. Yet, although there are about as many species of frogs as there are of mammals, zoologists consider that frogs have undergone only limited morphological divergence, while placental mammals have diversified greatly in morphology and way of life. The serum albumins of numerous frog species were compared by the quantitative microcomplement fixation technique. Frogs that are morphologically similar enough to merit taxonomic distinction at only the species level often exhibit differences in the serological properties of their albumins larger than those usually seen between mammals placed in distinct families or suborders. Thus, there seems to be a contrast between albumin evolution and evolution at the organismal level.
So, frogs are genetically much more diverse than mammals but much less diverse in phenotype. Phenotypic or epigenetic diversity/complexity is not possible unless genetic diversity is somehow curtailed.
Regarding your paper, I am not sure from your data whether the genetic diversity of Jatropha is much lower than that of a similar plant with the same evolutionary time span but much less phenotypic diversity. Such data are needed to support your claim of low genetic diversity but high phenotypic diversity.
You may come up with more plant examples similar to the frog/mammal contrast. The one I can think of is the high phenotypic diversity of angiosperms versus the low phenotypic diversity of gymnosperms. Angiosperms comprise more than 250,000 species (about 90% of all plant species), even though they are the most recently evolved and the most complex plants. I have not done the research, but I would readily predict that the genetic diversity of angiosperms is much lower than that of gymnosperms over an equal evolutionary time.
Molecular clock at best explains half the story of ‘genetic equidistance’ and at worst explains none
The genetic equidistance result (sister species are equidistant to a simpler outgroup) has been interpreted through a tautology, the molecular clock hypothesis, which holds that vastly different lineages have very similar mutation rates. The neutral theory was invented to explain the molecular clock by postulating that the vast majority of residue differences between species are neutral mutations.
On the surface, the similar-mutation-rate idea seems to explain the equidistance result in terms of percent identity. But one fatal weakness of this idea, pointed out in my previous paper, is that there is no independent evidence for it. In contrast, there is ample evidence against it. That observation alone in part led me to propose the MGD hypothesis as the correct interpretation of the equidistance result. The MGD interpretation came to me from logical reasoning based on basic biological principles. Thus, I had deduced an important feature of the equidistance result from an axiom before I had a full grasp of the complete story of equidistance. That feature is: most of the residue positions that differ between one sister lineage and the outgroup also differ between the other sister lineage and the outgroup. In other words, suppose that sister species A and B are equidistant to the simpler outgroup C, where A and B have been separated for much longer than the common ancestor of A and B had been separated from C before A and B split. We would observe that most of the residue positions that differ between A and C also differ between B and C. Below, I illustrate this fundamental feature of the equidistance result using cytochrome c, the protein originally used in 1963 to discover the equidistance result, with baker’s yeast as the outgroup to the sister lineages drosophila and human.
The actual BLASTP alignment data (omitted here, since anyone familiar with BLASTP can easily obtain them) show that yeast is approximately equidistant to drosophila (67/104 identity) and to human (66/102 identity). A careful comparison of the alignments shows that, of the 36 residue differences between yeast and human, 34 are also differences between yeast and drosophila.
This nearly complete overlap of mutated residue positions in two separate sister lineages is one of the two fundamental features of the genetic equidistance phenomenon (the other, of course, being equidistance in terms of percent identity). However, this feature, which I call the overlap feature, has been completely ignored or overlooked for the past 46 years. The molecular clock interpretation and the neutral theory were invented in complete ignorance of this feature. They would not have been invented in the first place if people had paid attention to the overlap feature, because this feature clearly contradicts them.
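The overlap feature can be made concrete with a small sketch. It assumes the outgroup and the two sister sequences have already been placed in one shared alignment (BLASTP reports pairwise alignments, so this is a simplifying assumption), and the helper names `differing_positions` and `overlap_feature` are hypothetical. The functions count the positions where each sister differs from the outgroup and how many of those positions the two comparisons share. The toy sequences are invented for illustration and are not cytochrome c data.

```python
def differing_positions(seq1: str, seq2: str) -> set[int]:
    """Indices of aligned positions where two sequences differ (gap '-' positions skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment (equal length)")
    return {
        i
        for i, (a, b) in enumerate(zip(seq1, seq2))
        if a != "-" and b != "-" and a != b
    }


def overlap_feature(outgroup: str, sister_a: str, sister_b: str) -> tuple[int, int, int]:
    """Return (differences outgroup vs A, differences outgroup vs B, positions shared by both)."""
    diff_a = differing_positions(outgroup, sister_a)
    diff_b = differing_positions(outgroup, sister_b)
    return len(diff_a), len(diff_b), len(diff_a & diff_b)


if __name__ == "__main__":
    # Hypothetical toy alignment, not real cytochrome c sequences.
    outgroup = "MKTAYIAKQR"
    sister_a = "MRTAYLAEQK"
    sister_b = "MRSAYLAEQR"
    print(overlap_feature(outgroup, sister_a, sister_b))  # (4, 4, 3)
```

In this toy case both sisters differ from the outgroup at 4 positions and share 3 of them; for the real yeast/human/drosophila comparison, the text reports 34 shared positions out of 36 yeast-human differences.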
The molecular clock and the neutral theory predict that only a minority, rather than a majority, of the mutant residue positions between yeast and human are also mutant positions between yeast and drosophila. The predicted number is at best 17 residue positions, merely half of what is actually observed (at worst, it is only 11). This is easily calculated as follows:
As the BLASTP alignment shows, drosophila and human differ at 22 of 102 positions. So, of the 34 positions that are changed both between yeast and human and between yeast and drosophila, 12 (34 − 22) can be assigned to changes that occurred during the period when the common ancestor of human and drosophila had separated from the yeast lineage but had not yet split into human and drosophila. After the human-drosophila split, the chance for a residue among the remaining 90 positions (102 − 12) to differ between yeast and the human lineage, or between yeast and the drosophila lineage, is approximately 22/90 = 0.24. The chance for the same residue position to be altered in both the yeast-human and the yeast-drosophila comparisons is (22/90)² ≈ 0.060. Over 90 residue positions, this translates to about 0.060 × 90 ≈ 5.4 residue positions. Together with the 12 shared mutant positions accumulated in the common ancestral lineage of drosophila and human, this means that only about 12 + 5.4 ≈ 17 residue positions should be altered in both the yeast-human and the yeast-drosophila comparisons.
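The null-model arithmetic of the paragraph above can be reproduced in a few lines. The function name and parameters are hypothetical; the sketch simply encodes the stated assumption that, after the sister split, each of the free positions changes independently in each lineage with probability p = changes/free_sites, so the expected shared count is the ancestral count plus p² × free_sites.

```python
def expected_overlap(shared_ancestral: int, per_lineage_changes: int, free_sites: int) -> float:
    """Expected number of positions that differ from the outgroup in BOTH sister lineages.

    shared_ancestral: changes accumulated before the sister split (shared by both lineages).
    per_lineage_changes / free_sites: probability that a free site differs in one lineage.
    Lineages are assumed independent, so the joint probability per site is p * p.
    """
    p = per_lineage_changes / free_sites
    return shared_ancestral + p * p * free_sites


if __name__ == "__main__":
    # Numbers from the cytochrome c example: 12 ancestral changes, 22 changes over 90 free sites.
    print(round(expected_overlap(12, 22, 90), 1))  # 17.4, versus 34 observed
```

The unrounded result, about 17.4, corresponds to the "about 17" of the text, roughly half of the 34 shared positions actually observed.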
Even if we grant that about half of the 90 residues are conserved, non-neutral positions, we would still get only 12 + 11 = 23 overlap positions (22/45 × 22/45 × 45 ≈ 10.8), far short of 34. To reach 34, we must assume that only 22 of the 90 residues are neutral. This means that the observed distance between yeast and human, or between yeast and drosophila, represents the maximum possible. But a maximum cap on genetic distance is entirely missing from the practical application of the molecular clock and the neutral theory. That concept did not exist until the recent MGD hypothesis.
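The same independence model can be run in reverse to ask how many freely variable sites would be needed to reach the observed 34 shared positions. The helper names are hypothetical, and the sketch assumes the model used in the calculation above (shared ancestral changes plus (changes/sites)² × sites, which simplifies to changes²/sites). It reproduces the 45-site case (about 23) and shows that only 22 free sites yield 34, at which point p = 1: every free site has changed, i.e., the distance is saturated.

```python
def expected_overlap(shared_ancestral: int, per_lineage_changes: int, free_sites: float) -> float:
    """Ancestral shared changes plus the expected coincident changes after the split."""
    p = per_lineage_changes / free_sites
    return shared_ancestral + p * p * free_sites


def free_sites_needed(shared_ancestral: int, per_lineage_changes: int, observed_overlap: int) -> float:
    """Solve shared_ancestral + per_lineage_changes**2 / n == observed_overlap for n."""
    excess = observed_overlap - shared_ancestral
    if excess <= 0:
        raise ValueError("observed overlap must exceed the shared ancestral count")
    return per_lineage_changes ** 2 / excess


if __name__ == "__main__":
    for sites in (90, 45, 22):
        print(sites, round(expected_overlap(12, 22, sites), 1))  # 17.4, 22.8, 34.0
    print(free_sites_needed(12, 22, 34))  # 22.0: every free site has changed
```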
In short, while the molecular clock and the neutral theory may predict half of the equidistance result (equidistance in terms of percent identity), they cannot predict, and are contradicted by, the other half, in which most of the mutant positions relative to the outgroup are shared between the two sister lineages. Therefore, the molecular clock and the neutral theory are not valid explanations for the equidistance result. This way of invalidating the existing theory did not occur to me until recently, which is why I did not include it in my previous paper refuting the molecular clock interpretation of the equidistance result. Ref. Huang, S. “The genetic equidistance result of molecular evolution is independent of mutation rates.”
The MGD hypothesis is so far the only viable and complete explanation for the equidistance result. It has proven to be the correct one, as it easily passes the highest standard for a scientific theory, i.e., to explain all relevant facts without a single factual or logical contradiction. The cytochrome c example here provides actual data for the simplified illustration of the MGD explanation shown in Table 1 of my MGD paper (Inverse relationship between genetic diversity and epigenetic complexity).
BTW, here is an example of how the field used the molecular clock interpretation in practice to arrive at the famous 5-million-year divergence time between humans and chimpanzees, first reported in 1969: Wilson AC, Sarich VM (1969) A molecular time scale for human evolution. Proc Natl Acad Sci U S A 63: 1088-1093.
Wilson and Sarich wrote in their 1969 paper:
“Table 1 shows that the four primate hemoglobins are about equally distinct in sequence from that of the horse. Therefore, the hemoglobins of monkeys on the one hand, and those of the apes and man on the other, have changed to about the same extent since these species last shared a common ancestor. These results are neither unique nor surprising. Others have already recognized that protein molecules often appear to have evolved in a regular fashion with respect to time. The bulk of the available sequence information is consistent with the hypothesis that for any given protein, such as hemoglobin, the probability of an amino acid substitution occurring in a given interval of time is the same in every lineage.”
The above shows that Wilson and Sarich interpreted the equidistance result (the horse is equidistant to the primates in hemoglobin sequence) by assuming the same mutation rate in every primate lineage. From there, they went on to calculate a 5-million-year split between humans and chimpanzees. Now, given that I have proven the molecular clock interpretation of the equidistance result to be completely false, it follows that any result based on that interpretation is automatically false. Indeed, my calculation based on the MGD hypothesis gave a human-pongid split time of 19.2 million years (manuscript in preparation).
More on the MGD interpretation of certain facts
Here I explain an observation that is not covered in detail in my MGD paper, though it is covered in principle. The observation is that not all variant residues between two complex species are also variant between two simple species, whereas Table 1 of the paper seems to indicate that they all are. Table 1 is, of course, meant to express an idea in simplified form and should not be taken as literally exact.
A typical example: only 10 of the 22 residues that differ between drosophila and human also differ between yeast and neurospora.
So about 50% of the variants between drosophila and human are also variant between yeast and neurospora. That is a significant overlap, which can be explained only by the MGD and not by the molecular clock/neutral theory. Now, why is the overlap not 100%?
As shown in Figure 2 of my paper, the MGD says that, of all the residues conserved between two species at any time, a fraction is conserved because of adaptation to common environmental selection and may change from time to time. In our case, yeast and neurospora share 67% of all positions. Of these, perhaps 20% of all positions are shared because of common selection (the two fungi have very similar ways of life). So the absolutely non-neutral sequence may be 47% and the neutral region 53%. Human and drosophila share 80% of positions, of which perhaps 10% may be due to common environmental selection (their lifestyles are very different, so the region shared because of common selection is smaller). So the neutral region is 30% for the human/drosophila pair. The MGD says that this 30% region should lie completely within the 53% neutral region of the yeast/neurospora pair. But since only 20 of the 30 neutral positions in human/drosophila actually varied, and only 30 of the 53 in yeast/neurospora, the expected number of overlapping residues is 20/30 × 30/53 × 30 ≈ 11, which is very close to the actual number (10).
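The illustrative estimate in the paragraph above can be written out as a short calculation. The helper names are hypothetical, and the percentages are the illustrative guesses from the text, treated as residue counts over roughly 100 positions. The sketch assumes that the complex pair's neutral region lies entirely inside the simple pair's neutral region and that variation in the two pairs is independent.

```python
def neutral_percent(identity_pct: float, selection_shared_pct: float) -> float:
    """Neutral region: everything not held identical by absolute (non-neutral) constraint.

    Identity minus the part shared only through common selection gives the absolutely
    non-neutral part; the remainder of the sequence is treated as neutral.
    """
    return 100 - (identity_pct - selection_shared_pct)


def mgd_overlap(neutral_complex: float, varied_complex: float,
                neutral_simple: float, varied_simple: float) -> float:
    """Expected positions varied in BOTH pairs, assuming the complex pair's neutral region
    lies inside the simple pair's and each pair varies independently."""
    frac_complex = varied_complex / neutral_complex
    frac_simple = varied_simple / neutral_simple
    return frac_complex * frac_simple * neutral_complex


if __name__ == "__main__":
    neutral_yeast_neurospora = neutral_percent(67, 20)   # 53
    neutral_human_drosophila = neutral_percent(80, 10)   # 30
    # Varied counts as used in the text: 20 of 30 and 30 of 53.
    estimate = mgd_overlap(neutral_human_drosophila, 20, neutral_yeast_neurospora, 30)
    print(round(estimate, 1))  # 11.3, versus 10 observed
```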
These numbers are illustrative and may not be exact, but the above shows in principle how the MGD can explain the observation. The key is that a fraction of the shared residues are neutral, or changeable with the environment, even when the distance is at its maximum. This is a reasonable and intuitively obvious point, and it is also true in reality. So a maximum distance does not mean that all the shared residues are absolutely unchangeable or non-neutral.
Latest gene knockout studies confirm the MGD hypothesis
A recent news focus article in Science discusses knockout studies whose results are inexplicable under the existing paradigm of evolution. But, as I explain below, these results are easy predictions of the MGD hypothesis.
“Genomic clues to DNA treasure sometimes lead nowhere”
By Don Monroe, July 10, 2009, Science 325: 142-143
The first paragraph summarizes the basic findings that the existing paradigm cannot account for: “When a gene works, evolution holds on to it, keeping its sequence intact even as bases around it change over time. Genome researchers had come to depend on this conservation to steer them to critical regions in the genome: If a stretch of DNA remains unchanged across different species, that DNA is probably performing a vital function. But a growing number of examples show that not all conserved sequences are important and, worse, that not all important sequences are conserved. That second observation—which would have been considered heresy until about a decade ago—means that researchers who had typically relied on conservation to guide them could have missed critical genes or unknown regulatory regions. But even as they scramble to understand how the “conservation equals function” rule has failed them, they are uncovering profound new subtleties in how genes are controlled and how they adapt during evolution.”
The fact that knocking out non-conserved sequences can be just as lethal as knocking out conserved ones is precisely what the MGD hypothesis predicts.
Consider two genes, A and B, in mouse. A is conserved, showing 90% identity between two strains of mice. B is less conserved, showing merely 40% identity between the two strains. Based on the MGD hypothesis, the bare-bones function of A (the minimal function that shows activity in a test tube) may require merely 40% of the residues rather than 90%. The high degree of conservation of A indicates a broader function associated with epigenetic complexity rather than with bare-bones function. In terms of the percentage of residues needed for bare-bones function, A and B may in fact be quite similar. However, B may not be needed in multiple cell types and so may encounter fewer epigenetic constraints, leading to less sequence conservation. But the bare-bones function of B may be just as essential as that of A, and neither could be knocked out without a lethal effect.
Knockout is not the way to distinguish the functional importance of conserved versus non-conserved sequences, but unfortunately it is what has been done so far, simply because it is easy. A point mutation affects a single base pair, whereas a knockout often deletes the whole gene. The MGD makes the following testable prediction: if one mutates each base pair of a gene, one at a time, by a knock-in method, a point mutation is far more likely to produce an abnormal phenotype in a conserved gene than in a less conserved one.
The fact that knocking out some conserved sequences is not lethal is also precisely what the MGD hypothesis predicts: conservation may reflect epigenetic complexity rather than mere essentiality for viability.
“Complexity Associated Protein Sectors (CAPS), new evidence for the MGD”
A Cell paper published two weeks ago reports an important and beautiful result on protein sequence conservation and evolution (1). The result contradicts the modern evolution theory but is a precise prediction of a more complete evolution theory, the maximum genetic diversity (MGD) hypothesis (2, 3).
The modern evolution theory mainly consists of the natural selection of Darwinism and the random drift of the neutral theory. The theory makes no distinction between microevolution and macroevolution and was originally a theory of microevolution, or population genetics. It was invented by population geneticists in complete ignorance of epigenetics, the other half of heredity, which is equally if not more important for determining heritable phenotypes. A factual observation is explained either by natural selection or by its negation (random drift), depending on which works better. A negation of Darwinian natural selection can be accorded equal weight in the modern evolution theory because natural selection is largely irrelevant to, or contradicted by, molecular evolution, for which the clock/neutral theory seems to work superficially if one overlooks its numerous contradictions. No evolutionary biologist has ever claimed that the modern evolution theory has no factual contradictions. In truth, however, all the contradictions concern macroevolution. The theory has essentially no contradictions within the domain of microevolution, or population genetics, for which it was originally invented and outside of which it should never have been applied.
An obvious difference between microevolution and macroevolution is that the latter involves a change in organismal or epigenetic complexity as roughly defined by the number of cell types or the number of epigenetic molecules. With microevolution only, bacteria would stay forever as bacteria, and would never be able to evolve into complex multicellular organisms.
As reported by the Cell paper, one portion of the S1A family protease, termed the blue sector (an arbitrary color distinguishing it from the two other sectors, red and green), is much more conserved in vertebrates than in invertebrates (Figure 6B of the Cell paper) and is not related to enzyme activity. It represents a domain specific to vertebrates. The existence of sectors in a protein, the blue sector in this case, that distinguish complex vertebrates from simple invertebrates cannot be explained by the modern evolution theory. Thus the paper made no attempt to discuss the blue sector in connection with any evolution theory, perhaps in order not to openly embarrass the paradigm and thus to have a chance of passing the peer censorship of Cell. Here is why. To explain the blue sector by natural selection, one must posit that vertebrates as a whole encounter a natural environment entirely different from that of invertebrates, which is simply not the case. Even if it were, one would need an additional wild ad hoc speculation that natural selection acts only on vertebrates and not on invertebrates, which is unlikely and inconsistent with Darwinism. To explain the blue sector by random drift would require most of the amino acid positions in this sector to be neutral, which is also simply not the case: if it were, the blue sector would never have been discovered in the first place as a group of correlated amino acids.
So what does the blue sector say about actual evolutionary mechanisms? This is a key question that the Cell paper should have discussed but unfortunately did not. First, it adds one more outstanding fact to the long list of facts that contradict the modern evolution theory. Second, every fact that contradicts the modern evolution theory has been found to be evidence for the MGD, and the new result of the Cell paper is no exception. The MGD treats the modern evolution theory as true only for microevolution and suggests that macroevolution is distinctly different and involves a change in epigenetic complexity. Microevolution is mostly about purely genetic changes such as point mutations, whereas macroevolution is mostly about epigenetic changes, e.g., rearrangements of large segments of chromatin and of gene expression patterns.
A good analogy is house building, or any kind of man-made construction. We need both bricks (building blocks) and an architectural plan. Microevolution is about changing brick types, e.g., from clay to rock. Macroevolution is about changing architectural plans, e.g., from a 1-story to a 100-story building. And there is a self-evident inverse relationship between plan and bricks: the more complex the building plan, the more restricted the variation in building blocks. It is always a sign of great science if it can be expressed in common-sense language, as Einstein put it well: “Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in a language comprehensible to anyone.” To ordinary people, any theory that amounts to saying that changing the brick type alone can change the architectural style of a building obviously makes no sense.
An increase in epigenetic complexity will lead to a decrease in genetic diversity, as measured by point mutations, owing to the self-evident inverse relationship between genetic diversity and epigenetic complexity (2, 3). A gene in complex organisms encounters more epigenetic constraint than in simple organisms and is thus less tolerant of point mutations. Macroevolution towards higher epigenetic complexity involves a suppression of point mutations, and in this sense is the exact opposite of microevolution (2-5). Thus the MGD predicts that protein or DNA sequence sectors that are unconstrained in simple organisms would become constrained in complex organisms, even though such sectors may play no role in enzyme function. The blue sector of the S1A protease is the first example of such a Complexity-Associated Protein Sector (CAPS), or, more generally and including DNA, a Complexity-Associated Sequence Sector (CASS). Epigenetic complexity puts maximum CAPS on sequence divergence.
Finally, here is a simple thought experiment on how the blue sector may illustrate the distinction between micro- and macroevolution. A common ancestor gave rise to two invertebrate species, A and B, and a vertebrate species, C, within a couple of million years during the Cambrian period. After 550 million years of evolution, A and B are 40% non-identical in a trypsin of 240 aa. Most of the blue sector residues are located in the non-identical regions. A and B contributed equally to the non-identity between them because they are similar in complexity, i.e., in their tolerance of point mutations. On the other hand, C and A, or C and B, are also 40% non-identical, and most of the blue sector residues of C are also located in the non-identical region. However, mutations in the blue sector are not neutral in C (while they are neutral in A and B) because C is more complex, and so they have occurred much less frequently than at the corresponding positions of A or B. Thus, while the mutation rate of A or B can be calculated, as is done under the modern evolution theory, as 40% × 240 / 2 / 550 ≈ 0.087 aa per million years, the same cannot and should not be done for C.
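For species A and B in the thought experiment, the clock-style rate calculation can be spelled out as follows. The function name is hypothetical; it assumes, as the paragraph does for A and B, that the observed non-identity was contributed equally by the two lineages over the whole divergence time. Per the argument above, the calculation is meant to apply to A or B, not to C.

```python
def substitutions_per_myr(fraction_different: float, length_aa: int, time_myr: float) -> float:
    """Per-lineage substitution rate, splitting the observed divergence equally between two lineages."""
    differences = fraction_different * length_aa  # residues that differ between the two species
    per_lineage = differences / 2                 # each lineage contributed half of the differences
    return per_lineage / time_myr


if __name__ == "__main__":
    # Invertebrates A and B: 40% non-identity in a 240-aa trypsin after 550 million years.
    print(round(substitutions_per_myr(0.40, 240, 550), 3))  # 0.087
```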
Unfortunately, exactly this has been done, and is still being done daily, for C under the existing paradigm for the past 46 years, resulting in numerous contradictions with the facts of macroevolution, in terms of both the fossil record and the DNA record, such as the genetic equidistance result of Margoliash in 1963 (6), the most remarkable result of molecular evolution. The molecular clock/neutral theory was invented to account for the numerical feature of this result, but it would never have been invented in the first place, or lasted as long as it has, if another feature of this result, the overlap feature, had been appreciated 46 years ago rather than only now, as a direct result of inventing the MGD. As things stand today, either we have no theory to explain the overlap feature of the equidistance result and numerous other facts such as CAPS, or we have a perfect one in the MGD.
Ref.
1. Halabi, N., Rivoire, O., Leibler, S., and Ranganathan, R. (2009) Protein sectors: evolutionary units of three-dimensional structure. Cell 138: 774-786.
2. Huang, S. (2008) Histone methylation and the initiation of cancer. Cancer Epigenetics, Ed. Tollefsbol, T., CRC Books.
3. Huang, S. (2009) Inverse relationship between genetic diversity and epigenetic complexity. Submitted. Preprint available at http://precedings.nature.com/documents/1751/version/2
4. Gago, S., et al. (2009) Extremely high mutation rate of a hammerhead viroid. Science 323: 1308.
5. Zimmer, C. (2009) Fast-mutating viroids hold clues to early life. Science magazine blog, http://blogs.sciencemag.org/origins/2009/03/fast-mutating-viroids-hold-clu.html
6. Margoliash, E. (1963) Primary structure and evolution of cytochrome c. PNAS 50: 672-679.
Additional information
- License:
- This document is licensed to the public under the Creative Commons Attribution 3.0 License
- How to cite this document:
Huang, Shi. Inverse relationship between genetic diversity and epigenetic complexity. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2009.1751.2> (2009)
- Version info:
Other versions of this document in Nature Precedings:
- Version 1 (v1), posted 02 April 2008
Other versions of this document elsewhere on the web:
- None known.
Ajay Kohli on 19 January 2009 03:49 UTC
“...an inverse relationship between genetic diversity and epigenetic complexity was deduced from a simple intuition...”
A similar intuition guided us to propose that the phenetic diversity among genetically similar accessions of the plant Jatropha curcas arises from epigenetic mechanisms (hdl:10101/npre.2009.2782.1 at Nature Precedings).
Initial experimental results bear out the hypothesis. Good to see support coming in at the evolutionary level.