Monday, 18 June 2007

evolution - Why are there exactly four nucleobases in DNA?

Here is a possible answer given by this paper:



http://www.ncbi.nlm.nih.gov/pubmed/16794952
or
http://www.math.unl.edu/~bdeng1/Papers/DengDNAreplication.pdf



It offers a Darwinian answer to the question, approaching the problem through Claude Shannon's theory of communication. It treats DNA replication as conceptually and mathematically the same as data transmission, and concludes that a system of four bases, rather than two or six, replicates the most genetic information in the shortest amount of time.



The communication analogy goes like this. Suppose you have two data transmission services: one transmits, say, 1 MB per second, and the other transmits 2 MB per second but costs less than twice as much. The obvious choice is the second service, because it delivers a higher rate per unit cost. A data service does not care what information you consume -- it can be spam, video, audio, etc. All that matters is the transmission rate. DNA replication is like a data transmission channel in which one base is replicated at a time along the mother DNA template. It too does not care whether the genome belongs to a bacterium, a plant, or an animal. The payoff is in information and the cost is in time. Unlike abiotic communication systems, time is both the sender and the receiver of all messages of life, and different life forms or species are merely time's cell phones. So if one system can replicate more information per unit time than another, the faster one will win the evolutionary arms race. A prey species operating on a slow replicator system will not be able to compete with, or adapt to, a predator operating on a fast one.



Because the A-T pair has only two weak hydrogen bonds while the C-G pair has three, A and T take less time to duplicate than C and G. Although each replication step is short, some fraction of a nanosecond, the time adds up quickly for genomes with billions of base pairs. So having the C-G pair may slow down replication, but the gain is in information: one base pair (two bases) gives you 1 bit of information per base, and two pairs (four bases) give you 2 bits per base. Having more base pairs, however, eventually runs into diminishing returns in the information replication rate if the new bases take too long to replicate. Hence the need to consider the optimal rate of replication, measured in bits of information per base per unit time. Without information there would be no diversity and no complexity. Without replication of information there would be no life.



Using a simple transmission rate calculation from Shannon's theory, you can compute the mean replication rate for the AT system, the CG system, the ATCG system, and hypothetical systems of 6 or, in general, 2n bases whose new bases take progressively longer to replicate. The analysis shows that the ATCG system has the optimal replication rate if the CG bases take 1.65 to 3 times longer to replicate than the AT bases. That is, a base-2 system replicates its bases faster but carries too little information per base to achieve a higher bit rate. Likewise, a base-6 system carries more information per base but replicates more slowly on average, and so ends up with a suboptimal bit rate.
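To make the trade-off concrete, here is a minimal sketch of that kind of rate comparison. It is not the paper's own calculation. It assumes every base in a system is used equally often, so the mean rate is log2(number of bases) divided by the average replication time per base. The A-T time is normalised to 1, and the names `mean_rate`, `compare`, `r`, and `third_pair_time` are made up for illustration. The timing of the hypothetical third pair is a free input, so the sketch is not expected to reproduce the paper's 1.65 lower bound.

```python
from math import log2


def mean_rate(times):
    """Bits per unit time for equally likely bases with the given replication times.

    Assumption: each base is used with equal probability, so the information
    per base is log2(n) and the time per base is the plain average.
    """
    n = len(times)
    return log2(n) / (sum(times) / n)


def compare(r, third_pair_time=None):
    """Compare systems when C and G take r times longer than A and T.

    The A-T replication time is normalised to 1 (an assumption made here).
    third_pair_time is the hypothetical replication time of a third base pair.
    """
    t_at = 1.0
    rates = {
        "AT": mean_rate([t_at, t_at]),
        "CG": mean_rate([r, r]),
        "ATCG": mean_rate([t_at, t_at, r, r]),
    }
    if third_pair_time is not None:
        rates["6-base"] = mean_rate(
            [t_at, t_at, r, r, third_pair_time, third_pair_time]
        )
    return rates


if __name__ == "__main__":
    for name, rate in compare(2.0, third_pair_time=3.0).items():
        print(f"{name}: {rate:.3f} bits per unit time")
```

Under these assumptions the four-base rate is 4 / (1 + r), which beats the A-T only rate of 1 exactly when r is below 3. That agrees with the upper end of the range quoted above. How a six-base system fares depends entirely on the time you assume for the third pair.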



DNA Replication Rate



According to a comparison in the paper, the base-4 system is about 40% faster than the A–T only system and 133% faster than the G–C only system. Assuming life on Earth started about 4 billion years ago, an A–T only system would have set evolution back by about 1 billion years, and a G–C only system by about 2.3 billion years. A hypothetical base-6 system would have set it back by 80 million years. In other words, life is where it should be because the base-4 system transmits information through the time bottleneck at the optimal bit rate.
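One way to see where setbacks of this size could come from is the back-of-the-envelope sketch below. It assumes, and this is my reading rather than anything stated in the paper, that evolutionary progress is simply proportional to the replication bit rate. The function name `setback_years` is made up for illustration. If the base-4 system is faster by a fraction f, a slower system covers only 1 / (1 + f) of the same ground in the same time, so it lags by T * f / (1 + f) after T years.

```python
def setback_years(speedup, elapsed_years=4e9):
    """Years by which a slower system lags behind the base-4 system.

    speedup: how much faster base-4 is, as a fraction (0.40 means 40% faster).
    Assumption: progress is proportional to replication rate, so the slower
    system reaches only elapsed_years / (1 + speedup) worth of progress.
    """
    return elapsed_years * speedup / (1.0 + speedup)


if __name__ == "__main__":
    for label, speedup in [("A-T only", 0.40), ("G-C only", 1.33)]:
        lag = setback_years(speedup) / 1e9
        print(f"{label}: about {lag:.2f} billion years behind")
```

With the percentages quoted above this gives roughly 1.1 and 2.3 billion years, in line with the figures in the text.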



In conclusion, life's task is to replicate the most information in the shortest time, and the base-4 system does this best. If other systems ever existed, they would have lost the informational competition to the base-4 system from the get-go. Darwin's principle works at life's most basic and most important level.



There are other explanations, all non-Darwinian. Most are based on the molecular structures of the bases. But explanations of this type border on circular argument -- using observations to explain themselves. They also face a catch-22, since there is no way to exhaust all possible bases for replication. Such lines of exploration are still fruitful, because the more knowledge the better. But without taking information and its replication into consideration, it is hard to imagine a sensible answer to the question.
